Article

A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures

Authors:

Mitch Horton,

Stanimire Tomov,

Jack Dongarra

SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing

Pages 150 - 158

https://doi.org/10.1109/SAAHPC.2011.18

Published: 19 July 2011

Abstract

Three of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems on the list use processors with six or more cores, three hundred sixty-five use quad-core processors, and thirty-seven use dual-core processors. The large-scale enabling of hybrid graphics processing unit (GPU)-based multicore platforms for computational science, through the development of fundamental numerical libraries for them (in particular, libraries for dense linear algebra), has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra for GPU and Multicore Architectures (MAGMA) library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels that were previously factored on the CPU using LAPACK are instead factored in parallel, using a highly optimized, dynamically and asynchronously scheduled algorithm on some number of CPU cores. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
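The division of work described in the abstract can be illustrated with a small, purely sequential sketch of a blocked right-looking Cholesky factorization. This is not MAGMA code and not the authors' implementation: the function names (`factor_panel`, `update_columns`, `hybrid_style_cholesky`) and the `cpu_cols` parameter are hypothetical, and the "GPU" and "CPU" roles are only labels on ordinary Python loops. It assumes a symmetric positive definite input stored as a list of lists.

```python
import math


def factor_panel(A, k, kb):
    """Factor columns k..k+kb-1 in place (lower triangle only).

    In the hybrid scheme this is the step that a group of CPU cores
    would carry out in parallel; here it is a plain sequential loop.
    """
    n = len(A)
    for j in range(k, k + kb):
        s = A[j][j] - sum(A[j][p] ** 2 for p in range(k, j))
        A[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            t = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))
            A[i][j] = t / A[j][j]


def update_columns(A, k, kb, col_start, col_end):
    """Apply the panel k..k+kb-1 to trailing columns col_start..col_end-1."""
    n = len(A)
    for c in range(col_start, col_end):
        for i in range(c, n):
            A[i][c] -= sum(A[i][p] * A[c][p] for p in range(k, k + kb))


def hybrid_style_cholesky(A, nb=2, cpu_cols=1):
    """Return lower-triangular L with L * L^T = A.

    nb       -- panel width (assumed tuning parameter)
    cpu_cols -- number of rightmost columns whose update is assigned
                to the "remaining CPU cores" role (assumption)
    """
    n = len(A)
    L = [row[:] for row in A]
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        factor_panel(L, k, kb)                   # role: panel CPU cores
        trail = k + kb
        split = max(trail, n - cpu_cols)
        update_columns(L, k, kb, trail, split)   # role: GPU
        update_columns(L, k, kb, split, n)       # role: remaining CPU cores
    for i in range(n):                           # clear the unused upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

For example, `hybrid_style_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])` returns the factor with rows (2, 0, 0), (6, 1, 0), (-8, 5, 3). Because the two `update_columns` calls write to disjoint column ranges and only read the already-factored panel, they could in principle run concurrently, which is the property the hybrid algorithms rely on.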

Information & Contributors

Information

Published In

SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing

July 2011

169 pages

ISBN:9780769544489

Publisher

IEEE Computer Society

United States

Bibliometrics & Citations

Article Metrics

Total Citations: 9
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Dec 2024

Cited By

Ma W, Liu F, Chen D, Lu Q, Hu Y, Wang H, Yuan X (2023). An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform. ACM Transactions on Architecture and Code Optimization 20:2 (1-24). DOI: 10.1145/3571856. Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1145/3571856
Dongarra J, Abalenkovs M, Abdelfattah A, Gates M, Haidar A, Kurzak J, Luszczek P, Tomov S, Yamazaki I, YarKhan A (2015). Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems. Supercomputing Frontiers and Innovations: an International Journal 2:4 (67-86). DOI: 10.14529/jsfi150405. Online publication date: 1-Mar-2015
https://dl.acm.org/doi/10.14529/jsfi150405
Mittal S, Vetter J (2015). A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Computing Surveys 47:4 (1-35). DOI: 10.1145/2788396. Online publication date: 21-Jul-2015
https://dl.acm.org/doi/10.1145/2788396
Lima J, Gautier T, Danjean V, Raffin B, Maillard N (2015). Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures. Parallel Computing 44:C (37-52). DOI: 10.1016/j.parco.2015.03.001. Online publication date: 1-May-2015
https://dl.acm.org/doi/10.1016/j.parco.2015.03.001
Herrmann J, Marchal L, Robert Y (2015). Memory-aware tree traversals with pre-assigned tasks. Journal of Parallel and Distributed Computing 75:C (53-66). DOI: 10.1016/j.jpdc.2014.10.004. Online publication date: 1-Jan-2015
https://dl.acm.org/doi/10.1016/j.jpdc.2014.10.004
Park A, Perumalla K (2013). Efficient heterogeneous execution on large multicore and accelerator platforms. Journal of Parallel and Distributed Computing 73:12 (1578-1591). DOI: 10.1016/j.jpdc.2013.07.012. Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1016/j.jpdc.2013.07.012
Herrmann J, Marchal L, Robert Y (2013). Model and complexity results for tree traversals on hybrid platforms. Proceedings of the 19th international conference on Parallel Processing (647-658). DOI: 10.1007/978-3-642-40047-6_65. Online publication date: 26-Aug-2013
https://dl.acm.org/doi/10.1007/978-3-642-40047-6_65
Diogo M, Grelck C (2012). Towards Heterogeneous Computing without Heterogeneous Programming. Proceedings of the 2012 Conference on Trends in Functional Programming - Volume 7829 (279-294). DOI: 10.1007/978-3-642-40447-4_18. Online publication date: 12-Jun-2012
https://dl.acm.org/doi/10.1007/978-3-642-40447-4_18
Clarke D, Ilic A, Lastovetsky A, Sousa L (2012). Hierarchical partitioning algorithm for scientific computing on highly heterogeneous CPU + GPU clusters. Proceedings of the 18th international conference on Parallel Processing (489-501). DOI: 10.1007/978-3-642-32820-6_49. Online publication date: 27-Aug-2012
https://dl.acm.org/doi/10.1007/978-3-642-32820-6_49
