DOI: 10.1145/2851141.2851178
research-article

A programming system for future proofing performance critical libraries

Published: 27 February 2016

Abstract

We present Tangram, a programming system for writing performance-portable programs. The language enables programmers to write computation and composition codelets, supported by tuning knobs and primitives for expressing data parallelism and work decomposition. The compiler and runtime use a set of techniques such as hierarchical composition, coarsening, data placement, tuning, and runtime selection based on input characteristics and micro-profiling. The resulting performance is competitive with optimized vendor libraries.
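
The runtime selection by micro-profiling mentioned in the abstract can be illustrated with a minimal sketch. This is not Tangram's language, compiler, or API: the function names (`sum_sequential`, `sum_tiled`, `micro_profile`, `run_selected`), the `tile` parameter standing in for a tuning knob, and the sample size are all assumptions made for illustration. The sketch times each interchangeable implementation on a small prefix of the input and then runs the fastest one on the full input.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical "codelets": interchangeable implementations of one computation (a sum).
def sum_sequential(xs: Sequence[float]) -> float:
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_tiled(xs: Sequence[float], tile: int = 256) -> float:
    # `tile` stands in for a tuning knob; partial sums are computed per tile
    # and then combined.
    return sum(sum_sequential(xs[i:i + tile]) for i in range(0, len(xs), tile))

VARIANTS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sequential": sum_sequential,
    "tiled": sum_tiled,
}

def micro_profile(variants: Dict[str, Callable], sample: Sequence[float],
                  repeats: int = 3) -> str:
    """Return the name of the variant with the lowest best-of-`repeats` time."""
    timings = {}
    for name, fn in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(sample)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get)

def run_selected(variants: Dict[str, Callable], data: Sequence[float],
                 sample_size: int = 1024) -> Tuple[str, float]:
    """Profile on a small prefix of the input, then run the winner on all of it."""
    chosen = micro_profile(variants, data[:sample_size])
    return chosen, variants[chosen](data)

if __name__ == "__main__":
    data = [float(i) for i in range(10000)]
    chosen, result = run_selected(VARIANTS, data)
    print(chosen, result)
```

Which variant wins depends on the machine and the input, so the selection is made at run time rather than fixed in advance; either variant returns the same sum.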

Cited By

  • (2019) Automatic generation of warp-level primitives and atomic instructions for fast and portable parallel reduction on GPUs. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pages 73-84. DOI: 10.5555/3314872.3314884. Online publication date: 16-Feb-2019
  • (2019) Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs. 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 73-84. DOI: 10.1109/CGO.2019.8661187. Online publication date: Feb-2019
  • (2017) Collaborative Computing for Heterogeneous Integrated Systems. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, pages 385-388. DOI: 10.1145/3030207.3030244. Online publication date: 17-Apr-2017

Published In

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN: 9781450340922
DOI: 10.1145/2851141
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP '16

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
