DOI: 10.1145/3673038.3673040
research-article
Open access

FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications

Published: 12 August 2024

Abstract

Hardware support for half precision is now almost ubiquitous. Although half precision is widely used in AI, it is far less common in scientific and engineering computing. The prospect of accelerating scientific computing applications with half precision motivated this study. Focusing on the solution of sparse linear systems, we explore the use of FP16 in multigrid preconditioners. Based on observations of sparse matrix formats, the numerical features of scientific applications, and the performance characteristics of multigrid, we formulate four guidelines for using FP16 in multigrid. The proposed algorithm shows how to avoid FP16 overflow through scaling. A setup-then-scale strategy prevents FP16’s limited accuracy and narrow range from interfering with multigrid’s numerical properties. Another strategy, recover-and-rescale on the fly, reduces the memory footprint of hotspot kernels. The extra precision-conversion overhead in mixed-precision kernels is addressed by transforming storage formats and using SIMD implementations. Two ablation experiments validate the effectiveness of our algorithm and parallel kernel implementation on ARM and x86 architectures. We further evaluate three idealized and five real-world problems to demonstrate the advantage of using FP16 in a multigrid preconditioner. The average speedups are approximately 2.75x for the preconditioner and 1.95x for the end-to-end workflow.
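The scaling idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a plain symmetric diagonal scaling, a dense NumPy matrix with a nonzero diagonal, and hypothetical helper names (`scale_to_fp16`, `matvec_rescaled`). It shows only why scaling avoids FP16 overflow, and how a product with the original matrix might be recovered from FP16 storage by converting to FP32 and rescaling during the kernel.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # largest finite FP16 value, 65504.0


def scale_to_fp16(A):
    """Store A_s = D^{-1/2} A D^{-1/2} in FP16, where D = |diag(A)|.

    Hypothetical helper for illustration. Returns the FP16 matrix and the
    FP32 vector d = sqrt(|diag(A)|) needed to undo the scaling later.
    """
    d = np.sqrt(np.abs(np.diag(A)))
    A_scaled = A / np.outer(d, d)  # scaled diagonal entries have magnitude 1
    if np.max(np.abs(A_scaled)) >= FP16_MAX:
        raise ValueError("scaled entries still exceed the FP16 range")
    return A_scaled.astype(np.float16), d.astype(np.float32)


def matvec_rescaled(A16, d, x):
    """Compute A @ x from the FP16 copy: widen to FP32, rescale on the fly.

    Uses the identity A = D^{1/2} A_s D^{1/2}, so A x = d * (A_s (d * x)).
    """
    x32 = np.asarray(x, dtype=np.float32)
    return d * (A16.astype(np.float32) @ (d * x32))


# 1D Laplacian stencil with a large coefficient (assumed test problem).
n = 8
A = 1e6 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
x = np.sin(np.arange(n))

# Casting A directly overflows, because 2e6 is larger than FP16_MAX.
with np.errstate(over="ignore"):
    naive = A.astype(np.float16)
overflowed = bool(np.isinf(naive).any())

# After scaling, every stored entry is finite and the product is recovered.
A16, d = scale_to_fp16(A)
y = matvec_rescaled(A16, d, x)
rel_err = np.linalg.norm(y - A @ x) / np.linalg.norm(A @ x)
```

In this sketch only the matrix is held in FP16, while the vectors and the accumulation stay in FP32. That choice is an assumption made for the example, picked because the matrix usually dominates memory traffic in a sparse kernel.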

      Published In

      ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
      August 2024
      1279 pages
      ISBN:9798400717932
      DOI:10.1145/3673038
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. multigrid
      2. preconditioner
      3. sparse matrix
      4. structured grid

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICPP '24

      Acceptance Rates

      Overall Acceptance Rate 91 of 313 submissions, 29%
