DOI: 10.1109/SAAHPC.2011.18
Article

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures

Published: 19 July 2011

Abstract

Three of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. On the same list, 95 systems use processors with six or more cores, 365 use quad-core processors, and 37 use dual-core processors. Work to enable hybrid graphics processing unit (GPU)-based multicore platforms for large-scale computational science by developing fundamental numerical libraries for them, in particular dense linear algebra libraries, has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra for GPU and Multicore Architectures (MAGMA) Library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels that were previously factored on the CPU using LAPACK are instead factored in parallel on some number of CPU cores by a highly optimized, dynamically and asynchronously scheduled algorithm. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
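The two extensions described in the abstract can be illustrated with a minimal sketch of a blocked, right-looking Cholesky factorization. This is not the MAGMA implementation or its API: the function names (`factor_panel`, `update_columns`, `hybrid_cholesky`) and the parameters `nb` (panel width) and `cpu_cols` (number of rightmost columns assigned to the spare CPU cores) are hypothetical, and everything runs sequentially in pure Python. The sketch only shows where the work would be divided: the panel factorization (done on several CPU cores in the paper), the trailing update of the left part of the remaining matrix (the GPU's share), and the trailing update of the rightmost columns (the remaining CPU cores' share).

```python
import math


def factor_panel(A, k, kend):
    """Unblocked Cholesky of the column panel A[k:, k:kend], in place.

    In the paper this step runs in parallel on some CPU cores;
    here it is a plain sequential loop (assumption for illustration).
    """
    n = len(A)
    for j in range(k, kend):
        d = A[j][j] - sum(A[j][p] ** 2 for p in range(k, j))
        A[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(A[i][p] * A[j][p] for p in range(k, j))
            A[i][j] = (A[i][j] - s) / A[j][j]


def update_columns(A, k, kend, c0, c1):
    """Apply the trailing update from panel columns [k, kend) to
    columns [c0, c1) of the lower triangle, in place."""
    n = len(A)
    for c in range(c0, c1):
        for i in range(c, n):
            A[i][c] -= sum(A[i][p] * A[c][p] for p in range(k, kend))


def hybrid_cholesky(A, nb=2, cpu_cols=1):
    """Return (L, log) with A = L * L^T for a symmetric positive
    definite A given as a list of lists.

    log records, per panel, which trailing columns went to the
    simulated "GPU" share and which to the simulated "CPU" share.
    """
    n = len(A)
    L = [row[:] for row in A]  # work on a copy
    log = []
    for k in range(0, n, nb):
        kend = min(k + nb, n)
        factor_panel(L, k, kend)
        # Rightmost cpu_cols columns go to the spare CPU cores,
        # everything between the panel and that strip to the GPU.
        split = max(kend, n - cpu_cols)
        update_columns(L, k, kend, kend, split)  # simulated GPU share
        update_columns(L, k, kend, split, n)     # simulated CPU share
        log.append((k, (kend, split), (split, n)))
    # Zero the strictly upper triangle so L is lower triangular.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L, log
```

Calling `hybrid_cholesky(A, nb=2, cpu_cols=1)` on a small symmetric positive definite matrix returns the lower-triangular factor together with a log of how the trailing columns were divided at each step. In a real hybrid code the two `update_columns` calls would run concurrently on different devices, which is the point of the split; the sketch keeps them sequential so the arithmetic is easy to check.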

Published In

SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing
July 2011
169 pages
ISBN: 9780769544489

Publisher

IEEE Computer Society

United States

Author Tags

  1. Cholesky
  2. GPU
  3. LU
  4. QR
  5. multicore

Qualifiers

  • Article

Cited By

  • (2023) An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform. ACM Transactions on Architecture and Code Optimization 20:2 (1-24). DOI: 10.1145/3571856. Online publication date: 1-Mar-2023
  • (2015) Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems. Supercomputing Frontiers and Innovations: an International Journal 2:4 (67-86). DOI: 10.14529/jsfi150405. Online publication date: 1-Mar-2015
  • (2015) A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Computing Surveys 47:4 (1-35). DOI: 10.1145/2788396. Online publication date: 21-Jul-2015
  • (2015) Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures. Parallel Computing 44:C (37-52). DOI: 10.1016/j.parco.2015.03.001. Online publication date: 1-May-2015
  • (2015) Memory-aware tree traversals with pre-assigned tasks. Journal of Parallel and Distributed Computing 75:C (53-66). DOI: 10.1016/j.jpdc.2014.10.004. Online publication date: 1-Jan-2015
  • (2013) Efficient heterogeneous execution on large multicore and accelerator platforms. Journal of Parallel and Distributed Computing 73:12 (1578-1591). DOI: 10.1016/j.jpdc.2013.07.012. Online publication date: 1-Dec-2013
  • (2013) Model and complexity results for tree traversals on hybrid platforms. Proceedings of the 19th international conference on Parallel Processing (647-658). DOI: 10.1007/978-3-642-40047-6_65. Online publication date: 26-Aug-2013
  • (2012) Towards Heterogeneous Computing without Heterogeneous Programming. Proceedings of the 2012 Conference on Trends in Functional Programming - Volume 7829 (279-294). DOI: 10.1007/978-3-642-40447-4_18. Online publication date: 12-Jun-2012
  • (2012) Hierarchical partitioning algorithm for scientific computing on highly heterogeneous CPU + GPU clusters. Proceedings of the 18th international conference on Parallel Processing (489-501). DOI: 10.1007/978-3-642-32820-6_49. Online publication date: 27-Aug-2012
