DOI: 10.1145/3407947.3407951
research-article

Recursive DiamondCandy: non-memory-bound LRnLA algorithm for 3D cross stencil calculations on CUDA GPU

Published: 06 August 2020

Abstract

Making stencil computation on GPUs completely free of the memory bandwidth limitation requires new algorithms. This paper presents DiamondCandy, a new algorithm for tiling the dependency graph in time-space. Its tile is based on the octahedron, so it fits the shape of the dependency region of the 3D cross stencil. The paper proposes the first recursive application of DiamondCandy on a GPU. The algorithm parameters are adjusted to all levels of parallelism in the device, including CUDA threads, warps, and CUDA blocks. The 3D locality of the tile shape allows all of the currently computed values to be stored in the register files, with shared memory used only for data exchange between warps. The concept is illustrated with a code that solves the 3D acoustic wave equation. The achieved performance on a Volta V100 is 261 billion cell updates per second, which is more than 25% of the compute peak.
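For readers unfamiliar with the terms, the sketch below shows what a single "cell update" of a 3D cross stencil could look like for the acoustic wave equation. It is a naive, step-by-step reference in Python, not the DiamondCandy algorithm and not the authors' code. The second-order leapfrog scheme, the periodic boundaries, and the names `idx`, `wave_step`, `run` and `courant2` are all assumptions made for illustration; the abstract does not specify the discretization.

```python
# Naive reference sketch (NOT DiamondCandy): the whole grid is swept once per
# time step, which is the memory-bound pattern that time-space tiling avoids.
# Assumed scheme: second-order leapfrog on a 7-point cross stencil.


def idx(x, y, z, n):
    """Flat index of cell (x, y, z) in an n*n*n grid with periodic wrap."""
    return ((x % n) * n + (y % n)) * n + (z % n)


def wave_step(u_prev, u_curr, n, courant2):
    """Return the field at step t+1 from the fields at steps t-1 and t.

    courant2 is the assumed squared Courant number (c * dt / h) ** 2.
    """
    u_next = [0.0] * (n * n * n)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                centre = u_curr[idx(x, y, z, n)]
                # The cross stencil reads the six face neighbours only.
                neighbours = (
                    u_curr[idx(x - 1, y, z, n)] + u_curr[idx(x + 1, y, z, n)]
                    + u_curr[idx(x, y - 1, z, n)] + u_curr[idx(x, y + 1, z, n)]
                    + u_curr[idx(x, y, z - 1, n)] + u_curr[idx(x, y, z + 1, n)]
                )
                laplacian = neighbours - 6.0 * centre
                u_next[idx(x, y, z, n)] = (
                    2.0 * centre - u_prev[idx(x, y, z, n)] + courant2 * laplacian
                )
    return u_next


def run(n, steps, courant2=0.25):
    """Advance a unit point pulse; return (field, number of cell updates)."""
    u_curr = [0.0] * (n * n * n)
    u_curr[idx(n // 2, n // 2, n // 2, n)] = 1.0
    u_prev = list(u_curr)  # same field at t-1, i.e. zero initial velocity
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, n, courant2)
    return u_curr, n * n * n * steps


if __name__ == "__main__":
    field, updates = run(n=8, steps=4)
    print("cell updates performed:", updates)
```

In this sketch every cell update reads values from arrays that cover the entire grid, so each time step streams the full data set through memory. The approach described in the abstract instead groups updates across several time steps into tiles whose working set stays in registers.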



Published In

HP3C 2020: Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications
June 2020
191 pages
ISBN:9781450376914
DOI:10.1145/3407947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Xi'an Jiaotong-Liverpool University
  • City University of Hong Kong
  • Guangdong University of Technology

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. 3D cross stencil
  2. CUDA GPU
  3. LRnLA algorithms
  4. diamond tiling
  5. stencil computing
  6. time blocking

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HP3C 2020
