[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3581784.3607109acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Open access

High-Performance SVD Partial Spectrum Computation

Published: 11 November 2023 Publication History

Abstract

We introduce a new singular value decomposition (SVD) solver based on the QR-based Dynamically Weighted Halley (QDWH) algorithm for computing the partial spectrum SVD (QDWHpartial-SVD) problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1--20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD 1 on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from the state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD on a Cray XC40 system using 1152 nodes based on two-socket 16-core Intel Haswell CPU, respectively. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-Core AMD EPYC Milan CPU and achieve performance speedup up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations.

Supplemental Material

MP4 File - SC23 paper presentation recording for "High-Performance SVD Partial Spectrum Computation"
SC23 paper presentation recording for "High-Performance SVD Partial Spectrum Computation", by David Keyes, Hatem Ltaief, Yuji Nakatsukasa and Dalal Sukkari

References

[1]
Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G Genton, and David E Keyes. 2018. ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems. IEEE Transactions on Parallel and Distributed Systems 29, 12 (2018), 2771--2784.
[2]
Kadir Akbudak, Hatem Ltaief, Aleksandr Mikhalev, Ali Charara, Aniello Esposito, and David Keyes. 2018. Exploiting Data Sparsity for Large-Scale Matrix Computations. In Euro-Par 2018: Parallel Processing, Marco Aldinucci, Luca Padovani, and Massimo Torquati (Eds.), Vol. 11014. Springer International Publishing, Cham, 721--734.
[3]
Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari, Jean-Yves L'Excellent, and Clément Weisbecker. 2015. Improving Multifrontal Methods by Means of Block Low-Rank Representations. SIAM Journal on Scientific Computing 37, 3 (2015), A1451--A1474.
[4]
Edward Anderson, Zhaojun Bai, Christian Heinrich Bischof, Laura Susan Blackford, James Weldon Demmel, Jack J Dongarra, Jeremy J Du Croz, Anne Greenbaum, Sven Hammarling, A McKenney, and Danny C Sorensen. 1999. LAPACK User's Guide (3rd ed.). SIAM, Philadelphia.
[5]
I.Y. Bar-Itzhack. 1975. Iterative Optimal Orthogonalization of the Strapdown Matrix. IEEE Trans. Aerospace Electron. Systems AES-11, 1 (Jan 1975), 30--37.
[6]
Christian H. Bischof, Bruno Lang, and Xiaobai Sun. 2000. Algorithm 807: The SBR Toolbox---Software for Successive Band Reduction. ACM Trans. Math. Software 26, 4 (2000), 602--616.
[7]
L. Suzan Blackford, J. Choi, Andy Cleary, Eduardo F. D'Azevedo, James W. Demmel, Inderjit S. Dhillon, Jack J. Dongarra, Sven Hammarling, Greg Henry, Antoine Petitet, Ken Stanley, David W. Walker, and R. Clint Whaley. 1997. ScaLAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia.
[8]
Qinglei Cao, Sameh Abdulah, Rabab Alomairy, Yu Pei, Pratik Nag, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, and Ying Sun. 2022. Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC'22). IEEE Press, Dallas, Texas, Article 2, 12 pages.
[9]
Lars Eldén. 2007. Matrix Methods in Data Mining and Pattern Recognition. Society for Industrial and Applied Mathematics. x + 224 pages.
[10]
Aniello Esposito, David E. Keyes, Hatem Ltaief, and Dalal Sukkari. 2018. Performance Impact of Rank-Reordering on Advanced Polar Decomposition Algorithms. In Cray Users' Group Conference. http://hdl.handle.net/10754/628026
[11]
Massimiliano Fasi and Nicholas J. Higham. 2021. Generating Extreme-Scale Matrices With Specified Singular Values or Condition Number. SIAM Journal on Scientific Computing 43, 1 (2021), A663--A684. arXiv:https://doi.org/10.1137/20M1327938
[12]
Jerome A. Goldstein and Mel Levy. 1991. Linear Algebra and Quantum Chemistry. Amer. Math. Monthly 98, 10 (Oct. 1991), 710--718.
[13]
Gene H. Golub and C. Reinsch. 1970. Singular Value Decomposition and Least Squares Solutions. Numerische Mathematik 14 (1970), 403--420.
[14]
Gene H. Golub and Charles F. Van Loan. 2012. Matrix Computations (4th ed.). The Johns Hopkins University Press.
[15]
Ming Gu and Stanley C. Eisenstat. 1996. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing 17, 4 (1996), 848--869.
[16]
Wolfgang Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis. Vol. 49. Springer. Springer Series in Computational Mathematics.
[17]
Azzam Haidar, Hatem Ltaief, and Jack Dongarra. 2011. Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems Using Aggregated Finegrained And Memory-aware Kernels. In Proceedings of SC'11 Conference on High Performance Computing Networking, Storage and Analysis. ACM SIGARCH/IEEE Computer Society, Seattle, WA, USA, 8.
[18]
Azzam Haidar, Hatem Ltaief, and Jack Dongarra. 2012. Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem. SIAM Journal on Scientific Computing 34, 6 (2012), 249--274.
[19]
Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcá, and Thomas Schulthess. 2014. A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine-Grained Memory Aware Tasks. The International Journal of High Performance Computing Applications 28, 2 (2014), 196--209.
[20]
Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 2 (2011), 217--288.
[21]
Nicholas J. Higham. 2008. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. xx+425 pages.
[22]
Bruno Lang. 1999. Efficient Eigenvalue and Singular Value Computations On Shared Memory Machines. Parallel Comput. 25, 7 (1999), 845--860.
[23]
Hatem Ltaief, Piotr Luszczek, and Jack Dongarra. 2012. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction. In Parallel Processing and Applied Mathematics, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, and Jerzy Waśniewski (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 661--670.
[24]
Hatem Ltaief, Piotr Luszczek, and Jack Dongarra. 2012. High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures. ACM Trans. Math. Software 39, 3 (2012), 1--22.
[25]
Hatem Ltaief, Piotr Luszczek, Azzam Haidar, and Jack Dongarra. 2011. Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures. In PARCO (Advances in Parallel Computing, Vol. 22), Koen De Bosschere, Erik H. D'Hollander, Gerhard R. Joubert, David A. Padua, Frans J. Peters, and Mark Sawyer (Eds.). IOS Press, 397--404.
[26]
Hatem Ltaief, Dalal Sukkari, Aniello Esposito, Yuji Nakatsukasa, and David Keyes. 2019. Massively Parallel Polar Decomposition on Distributed-Memory Systems. ACM Transactions on Parallel Computing 6, 1, Article 4 (June 2019), 15 pages.
[27]
Hatem Ltaief, Dalal Sukkari, Oliver Guyon, and David Keyes. 2018. Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside Our Solar System. In Proceedings of the Platform for Advanced Scientific Computing Conference (Basel, Switzerland) (PASC'18). ACM, New York, NY, USA, Article 1, 10 pages.
[28]
Piotr Luszczek, Hatem Ltaief, and Jack Dongarra. 2011. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures. In 2011 IEEE International Parallel & Distributed Processing Symposium. ACM, Anchorage, AK USA, 944--955.
[29]
Osni Marques, James Demmel, and Paulo B. Vasconcelos. 2020. Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem. ACM Trans. Math. Software 46, 2, Article 14 (May 2020), 25 pages.
[30]
Yuji Nakatsukasa. 2020. Fast and stable randomized low-rank matrix approximation. arXiv:2009.11392 (2020).
[31]
Yuji Nakatsukasa, Zhaojun Bai, and François Gygi. 2010. Optimizing Halley's Iteration for Computing the Matrix Polar Decomposition. SIAM J. Matrix Anal. Appl. 31, 5 (2010), 2700--2720. arXiv:https://doi.org/10.1137/090774999
[32]
Yuji Nakatsukasa and Roland W. Freund. 2016. Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Rev. 58, 3 (2016), 461--493.
[33]
Yuji Nakatsukasa and Nicholas J. Higham. 2013. Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD. SIAM Journal on Scientific Computing 35, 3 (2013), A1325--A1349.
[34]
Ivan V. Oseledets and Eugene E. Tyrtyshnikov. 2009. Breaking the Curse of Dimensionality, Or How to Use SVD in Many Dimensions. SIAM Journal on Scientific Computing 31, 5 (Oct. 2009), 3744--3759.
[35]
Robert Schreiber and Beresford Parlett. 1988. Block Reflectors: Theory and Computation. SIAM J. Numer. Anal. 25, 1 (1988), 189--205.
[36]
Rémi Soummer, Laurent Pueyo, and James Larkin. 2012. Detection and characterization of exoplanets and disks using projections on Karhunen-Loève eigenimages. The Astrophysical Journal Letters 755, 2 (2012), L28.
[37]
Dalal Sukkari, Hatem Ltaief, Aniello Esposito, and David Keyes. 2019. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems. ACM Trans. Math. Software 45, 2, Article 18 (April 2019), 21 pages.
[38]
Dalal Sukkari, Hatem Ltaief, Mathieu Faverge, and David Keyes. 2017. Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures. IEEE Transactions on Parallel and Distributed Systems PP, 99 (2017), 1--1.
[39]
Dalal Sukkari, Hatem Ltaief, and David E. Keyes. 2016. A High Performance QDWH-SVD Solver Using Hardware Accelerators. ACM Trans. Math. Software 43, 1 (2016), 6:1--6:25.
[40]
Dalal Sukkari, Hatem Ltaief, and David E. Keyes. 2016. High Performance Polar Decomposition on Distributed Memory Systems. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24--26, 2016, Proceedings (Lecture Notes in Computer Science, Vol. 9833), Pierre-François Dutot and Denis Trystram (Eds.). Springer, 605--616.
[41]
Lloyd N. Trefethen and David Bau. 1997. Numerical Linear Algebra. SIAM, Philadelphia, PA. http://www.siam.org/books/OT50/Index.htm

Cited By

View all
  • (2024)svds-C: A multi-thread C code for computing truncated singular value decompositionSoftwareX10.1016/j.softx.2024.10178127(101781)Online publication date: Sep-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2023
1428 pages
ISBN:9798400701092
DOI:10.1145/3581784
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2023

Check for updates

Author Tags

  1. singular value decomposition
  2. partial spectrum
  3. parallel numerical algorithms
  4. distributed-memory systems

Qualifiers

  • Research-article

Conference

SC '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)270
  • Downloads (Last 6 weeks)35
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)svds-C: A multi-thread C code for computing truncated singular value decompositionSoftwareX10.1016/j.softx.2024.10178127(101781)Online publication date: Sep-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media