DOI: 10.1109/IPDPS.2011.33
Article

Automatic Library Generation for BLAS3 on GPUs

Published: 16 May 2011

Abstract

High-performance libraries, the performance-critical building blocks for high-level applications, will assume greater importance on modern processors as those processors become more complex and diverse. However, automatic library generators are still immature, forcing library developers to tune libraries manually to meet their performance objectives. We are developing a new script-controlled compilation framework that helps domain experts reduce the tedious and error-prone work of manual tuning by enabling them to leverage their expertise and reuse past optimization experience. We demonstrate the improved performance and productivity obtained by using our framework to tune BLAS3 routines on three GPU platforms: speedups over CUBLAS of up to 5.4x on the NVIDIA GeForce 9800, 2.8x on the GTX285, and 3.4x on the Fermi Tesla C2050. Our results highlight the potential benefits of exploiting domain expertise and the relations between different routines (in terms of their algorithms and data structures).


Published In

IPDPS '11: Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
May 2011
1285 pages
ISBN: 9780769543857

Publisher

IEEE Computer Society

United States

Publication History

Published: 16 May 2011

Qualifiers

  • Article

Cited By

  • (2019) PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion. Proceedings of the 28th International Conference on Compiler Construction, pp. 2-16. DOI: 10.1145/3302516.3307350. Online publication date: 16-Feb-2019
  • (2018) SCP. ACM Transactions on Architecture and Code Optimization 15:4, pp. 1-21. DOI: 10.1145/3274654. Online publication date: 10-Oct-2018
  • (2018) Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing, pp. 328-340. DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018
  • (2017) Automatic generation of fast BLAS3-GEMM: a portable compiler approach. Proceedings of the 2017 International Symposium on Code Generation and Optimization, pp. 122-133. DOI: 10.5555/3049832.3049846. Online publication date: 4-Feb-2017
  • (2015) Loo.py: from fortran to performance via transformation and substitution rules. Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 1-6. DOI: 10.1145/2774959.2774969. Online publication date: 13-Jun-2015
  • (2015) Hadoop+. Proceedings of the 29th ACM on International Conference on Supercomputing, pp. 143-153. DOI: 10.1145/2751205.2751236. Online publication date: 8-Jun-2015
  • (2014) Loo.py. Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 82-87. DOI: 10.1145/2627373.2627387. Online publication date: 9-Jun-2014
  • (2013) An empirical model for predicting cross-core performance interference on multicore processors. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, pp. 201-212. DOI: 10.5555/2523721.2523750. Online publication date: 7-Oct-2013
  • (2013) AUGEM. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/2503210.2503219. Online publication date: 17-Nov-2013
  • (2013) Layout-oblivious compiler optimization for matrix computations. ACM Transactions on Architecture and Code Optimization (TACO) 9:4, pp. 1-20. DOI: 10.1145/2400682.2400694. Online publication date: 20-Jan-2013