Research article, DOI: 10.1145/2503210.2503214

A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

Published: 17 November 2013

Abstract

Stencil computations are a common class of operations that appear in many computational science and engineering applications. They often benefit from compile-time analysis, exploitation of data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method that requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes and has given little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes that increases data locality and cache efficiency, together with a scalable approach for stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach.
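To make the locality problem concrete, the following is a minimal sketch of a stencil evaluation over irregularly placed sample points. It is not the scheme presented in the paper. It assumes a uniform-grid spatial hash as a stand-in for whatever lookup structure an implementation might use, and a plain sum over a square stencil in place of a B-spline kernel. The names `build_grid`, `stencil_sum`, and `stencil_sum_brute` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import floor


def build_grid(points, cell):
    """Bucket point indices by the uniform grid cell that contains them.

    Assumption: a uniform grid stands in for the mesh lookup structure.
    """
    grid = defaultdict(list)
    for k, (x, y) in enumerate(points):
        grid[(floor(x / cell), floor(y / cell))].append(k)
    return grid


def stencil_sum(center, points, values, grid, cell, half_width):
    """Sum the values of all points inside a square stencil around center.

    Only the grid cells that overlap the stencil are visited, so each
    evaluation touches a small, spatially coherent subset of the data.
    """
    cx, cy = center
    lo_i = floor((cx - half_width) / cell)
    hi_i = floor((cx + half_width) / cell)
    lo_j = floor((cy - half_width) / cell)
    hi_j = floor((cy + half_width) / cell)
    total = 0.0
    for i in range(lo_i, hi_i + 1):
        for j in range(lo_j, hi_j + 1):
            for k in grid.get((i, j), ()):
                x, y = points[k]
                if abs(x - cx) <= half_width and abs(y - cy) <= half_width:
                    total += values[k]
    return total


def stencil_sum_brute(center, points, values, half_width):
    """Reference version that scans every point for every stencil."""
    cx, cy = center
    return sum(
        v
        for (x, y), v in zip(points, values)
        if abs(x - cx) <= half_width and abs(y - cy) <= half_width
    )


# Irregularly placed sample points with one value each (made-up data).
points = [(0.1, 0.1), (0.9, 0.2), (2.5, 2.5), (1.1, 0.9), (-0.5, 0.3)]
values = [1.0, 2.0, 4.0, 8.0, 16.0]
grid = build_grid(points, cell=1.0)

result = stencil_sum((0.5, 0.5), points, values, grid, cell=1.0, half_width=0.7)
reference = stencil_sum_brute((0.5, 0.5), points, values, half_width=0.7)
```

Both functions return the same sum, but the grid version inspects only the cells that the stencil overlaps. Under the stated assumptions, this shows why grouping samples by spatial proximity can reduce scattered memory accesses when a stencil covers elements of differing size and shape.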

Cited By

  • (2018) A High Arithmetic Intensity Krylov Subspace Method Based on Stencil Compiler Programs. High Performance Computing in Science and Engineering, pages 1--18. DOI: 10.1007/978-3-319-97136-0_1. Online publication date: 17 July 2018.

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN: 9781450323789
DOI: 10.1145/2503210
General Chair: William Gropp
Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
