research-article

A programming system for future proofing performance critical libraries

Authors:

Wen-mei Hwu

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Article No.: 32, Pages 1 - 2

https://doi.org/10.1145/2851141.2851178

Published: 27 February 2016

Abstract

We present Tangram, a programming system for writing performance-portable programs. The language enables programmers to write computation and composition codelets, supported by tuning knobs and primitives for expressing data parallelism and work decomposition. The compiler and runtime use a set of techniques such as hierarchical composition, coarsening, data placement, tuning, and runtime selection based on input characteristics and micro-profiling. The resulting performance is competitive with optimized vendor libraries.
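The runtime selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not Tangram's API or implementation: the names `sum_sequential`, `make_sum_tiled`, `select_variant`, `portable_sum`, and the `tile` and `sample_size` parameters are all hypothetical, and plain Python functions stand in for codelets. The sketch only shows the general idea of keeping several interchangeable variants of one computation, exposing a tuning knob, and choosing a variant by timing each one on a small sample of the input.

```python
import time
from typing import Callable, Dict, List

# A "codelet" here is just a function; this is an illustrative stand-in,
# not Tangram's codelet syntax.
Codelet = Callable[[List[int]], int]


def sum_sequential(xs: List[int]) -> int:
    """Computation variant: a single sequential pass over the input."""
    total = 0
    for x in xs:
        total += x
    return total


def make_sum_tiled(tile: int) -> Codelet:
    """Composition variant: split the input into tiles, reduce each tile,
    then reduce the partial results. `tile` plays the role of a tuning knob."""
    def sum_tiled(xs: List[int]) -> int:
        partials = [sum_sequential(xs[i:i + tile])
                    for i in range(0, len(xs), tile)]
        return sum_sequential(partials)
    return sum_tiled


def select_variant(variants: Dict[str, Codelet], sample: List[int]) -> str:
    """Micro-profiling: time every variant on a small sample and
    return the name of the fastest one."""
    timings = {}
    for name, fn in variants.items():
        start = time.perf_counter()
        fn(sample)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)


# Hypothetical variant table; the knob values are arbitrary.
VARIANTS: Dict[str, Codelet] = {
    "sequential": sum_sequential,
    "tiled_64": make_sum_tiled(64),
    "tiled_1024": make_sum_tiled(1024),
}


def portable_sum(xs: List[int], sample_size: int = 4096) -> int:
    """Pick a variant from a prefix of the input, then run it on the full input."""
    chosen = select_variant(VARIANTS, xs[:sample_size])
    return VARIANTS[chosen](xs)
```

Every variant returns the same result, so the selection affects only speed. Which variant wins depends on the machine and the input, which is the reason for making the choice at run time in this sketch.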

Published In

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2016

420 pages

ISBN:9781450340922

DOI:10.1145/2851141

General Chair: Rafael Asenjo, University of Málaga, Spain

Program Chair: Tim Harris, Oracle Labs, Cambridge, UK

ACM SIGPLAN Notices Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Editor: Matthew Fluet

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

March 12 - 16, 2016

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
