[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Public Access

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

Published: 26 April 2021 Publication History

Abstract

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.

References

[1]
Ahmad Abdelfattah, Hartwig Anzt, Erik Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Gruetzmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung Tsai, Ichitaro Yamazaki, and Urike Meier Yang. 2020. A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic. SLATE Working Notes 15, ICL-UT-20-08.
[2]
Hartwig Anzt, Yen-Chen Chen, Terry Cojean, Jack Dongarra, Goran Flegar, Pratik Nayak, Enrique S. Quintana-Ortí, Yuhsiang M. Tsai, and Weichung Wang. 2019a. Towards continuous benchmarking: An automated performance evaluation framework for high performance software. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC’19). ACM, New York, NY, USA, Article 9, 11 pages.
[3]
Hartwig Anzt, Terry Cojean, Yen-Chen Chen, Goran Flegar, Fritz Göbel, Thomas Grützmacher, Pratik Nayak, Tobias Ribizel, and Yu-Hsiang Tsai. 2020a. Ginkgo: A high performance numerical linear algebra library. Journal of Open Source Software 5, 52 (2020), 2260.
[4]
Hartwig Anzt, Terry Cojean, Goran Flegar, Fritz Göbel, Thomas Grützmacher, Pratik Nayak, Tobias Ribizel, Yuhsiang Mike Tsai, and Enrique S. Quintana-Ortí. 2020b. Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing. arxiv:cs.MS/2006.16852
[5]
Hartwig Anzt, Jack Dongarra, Goran Flegar, Nicholas J. Higham, and Enrique S. Quintana-Ortí. 2019b. Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers. Concurrency and Computation: Practice and Experience 31, 6 (2019), e4460.
[6]
Hartwig Anzt, Jack Dongarra, Goran Flegar, and Enrique S. Quintana-Ortí. 2017a. Batched Gauss-Jordan elimination for block-Jacobi preconditioner generation on GPUs. In Proceedings of the8th International Workshop on Programming Models & Applications for Multicores & Manycores (PMAM). 1--10.
[7]
Hartwig Anzt, Jack Dongarra, Goran Flegar, and Enrique S. Quintana-Ortí. 2017b. Variable-size batched LU for small matrices and its integration into block-jacobi preconditioning. In 2017 46th International Conference on Parallel Processing (ICPP). 91--100.
[8]
Hartwig Anzt, Jack Dongarra, Goran Flegar, and Enrique S. Quintana-Ortí. 2018. Variable-size batched Gauss--Jordan elimination for block-Jacobi preconditioning on graphics processors. Parallel Comput. (jan 2018).
[9]
Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí, and Andrés E. Tomás. 2017c. Variable-size batched gauss-huard for block-jacobi preconditioning. Procedia Computer Science 108 (2017), 1783--1792. International Conference on Computational Science, {ICCS} 2017, 12-14 June 2017, Zurich, Switzerland.
[10]
Hartwig Anzt, Jack Dongarra, Goran G. Flegar, and Thomas Grützmacher. 2018. Variable-size batched condition number calculation on GPUs. In 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). 132--139.
[11]
Hartwig Anzt, Goran Flegar, Thomas Grützmacher, and Enrique S. Quintana-Ortí. 2019c. Toward a modular precision ecosystem for high-performance computing. The International Journal of High Performance Computing Applications (2019), 1094342019846547.
[12]
Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In Proceedings of the 21st Annual Symposium on Parallelism in Algorithms and Architectures (SPAA’09). ACM, 233--244.
[13]
Erin Carson and Nicholas J. Higham. 2018. Accelerating the solution of linear systems by iterative refinement in three precisions. SIAM J. Scientific Computing 40, 2 (2018), A817--A847.
[14]
IEEE Standard Commitee. 2000. IEEE standard for modeling and simulation (M& S) high level architecture (HLA) - Framework and rules. IEEE Std. 1516-2000 (2000), i --22.
[15]
Siegfried Cools. 2018. Numerical stability analysis of the class of communication hiding pipelined Conjugate Gradient methods. CoRR abs/1804.02962 (2018). arxiv:1804.02962 http://arxiv.org/abs/1804.02962
[16]
Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1, Article 1 (Dec. 2011), 25 pages.
[17]
Jack Dongarra et al. 2014. Applied Mathematics Research for Exascale Computing. Technical Report. U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research Program. https://science.energy.gov/media/ascr/pdf/research/am/docs/EMWGreport.pdf.
[18]
Markus Goetz and Hartwig Anzt. 2018. Machine learning-aided numerical linear algebra: Convolutional neural networks for the efficient preconditioner generation. In Proceedings of the 2018 IEEE/ACM 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (scalA). 49--56.
[19]
Gene H. Golub and Charles F. Van Loan. 1996. Matrix Computations (3rd ed.). The Johns Hopkins University Press, Baltimore.
[20]
Gene H. Golub and Qiang Ye. 1999. Inexact preconditioned conjugate gradient method with inner-outer iteration. SIAM Journal on Scientific Computing 21, 4 (1999), 1305--1320. arXiv:https://doi.org/10.1137/S1064827597323415
[21]
Google. 2019. https://github.com/google/googletest.
[22]
Nicholas J. Higham. 2002. Accuracy and Stability of Numerical Algorithms (2nd ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA.
[23]
Mark Hoemmen. 2010. Communication-avoiding Krylov Subspace Methods. Ph.D. Dissertation. Berkeley, CA, USA. Advisor(s) Demmel, James W. AAI3413388. University of California, Berkeley.
[24]
Robert Lucas, James Ang, Keren Bergman, Shekhar Borkar, William Carlson, Laura Carrington, George Chiu, Robert Colwell, William Dally, Jack Dongarra, Al Geist, Rud Haring, Jeffrey Hittinger, Adolfy Hoisie, Dean Micron Klein, Peter Kogge, Richard Lethin, Vivek Sarkar, Robert Schreiber, John Shalf, Thomas Sterling, Rick Stevens, Jon Bashor, Ron Brightwell, Paul Coteus, Erik Debenedictus, Jon Hiller, K. H. Kim, Harper Langston, Richard Micron Murphy, Clayton Webster, Stefan Wild, Gary Grider, Rob Ross, Sven Leyffer, and James Laros III. 2014. Top ten Exascale Research Challenges. http://science.energy.gov/media/ascr/ascac/pdf/meetings/Q774420140210/Top10reportFEB14.pdf.
[25]
NVIDIA Corporation 2018. NVIDIA CUDA Toolkit (9.0 ed.). NVIDIA Corporation.
[26]
Y. Saad. 2003. Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM.

Cited By

View all
  • (2024)High performance computing seismic redatuming by inversion with algebraic compression and multiple precisionsInternational Journal of High Performance Computing Applications10.1177/1094342023122619038:3(225-244)Online publication date: 15-May-2024
  • (2024)Adaptive Precision Sparse Matrix–Vector Product and Its Application to Krylov SolversSIAM Journal on Scientific Computing10.1137/22M152261946:1(C30-C56)Online publication date: 25-Jan-2024
  • (2024)Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory ArraysProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00097(1-16)Online publication date: 17-Nov-2024
  • Show More Cited By

Index Terms

  1. Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Mathematical Software
      ACM Transactions on Mathematical Software  Volume 47, Issue 2
      June 2021
      243 pages
      ISSN:0098-3500
      EISSN:1557-7295
      DOI:10.1145/3459727
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 April 2021
      Accepted: 01 December 2020
      Revised: 01 July 2020
      Received: 01 April 2019
      Published in TOMS Volume 47, Issue 2

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. GPU
      2. Krylov solvers
      3. Sparse linear algebra
      4. adaptive precision
      5. block-Jacobi
      6. preconditioning

      Qualifiers

      • Research-article
      • Opinion
      • Refereed limited

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)272
      • Downloads (Last 6 weeks)44
      Reflects downloads up to 12 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)High performance computing seismic redatuming by inversion with algebraic compression and multiple precisionsInternational Journal of High Performance Computing Applications10.1177/1094342023122619038:3(225-244)Online publication date: 15-May-2024
      • (2024)Adaptive Precision Sparse Matrix–Vector Product and Its Application to Krylov SolversSIAM Journal on Scientific Computing10.1137/22M152261946:1(C30-C56)Online publication date: 25-Jan-2024
      • (2024)Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory ArraysProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00097(1-16)Online publication date: 17-Nov-2024
      • (2024)Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUsProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00064(1-16)Online publication date: 17-Nov-2024
      • (2024)Investigating mixed-precision for AGATA pulse-shape analysisEPJ Web of Conferences10.1051/epjconf/202429503020295(03020)Online publication date: 6-May-2024
      • (2024)Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix–Vector ProductEuro-Par 2024: Parallel Processing10.1007/978-3-031-69583-4_2(17-30)Online publication date: 26-Aug-2024
      • (2023)Parallelization of the Fluid Behavior Modeling Algorithm in Real TimeÈlektronnoe modelirovanie10.15407/emodel.45.06.08545:6(85-101)Online publication date: 19-Dec-2023
      • (2023)Mixed-Precision Sparse Approximate Inverse Preconditioning Algorithm on GPUIEEE Access10.1109/ACCESS.2023.333844311(136410-136421)Online publication date: 2023
      • (2023)Three-precision algebraic multigrid on GPUsFuture Generation Computer Systems10.1016/j.future.2023.07.024149(280-293)Online publication date: Dec-2023
      • (2022)Performance impact of precision reduction in sparse linear systems solversPeerJ Computer Science10.7717/peerj-cs.7788(e778)Online publication date: 17-Jan-2022
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format.

      HTML Format

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media