
research-article

A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

Authors:

Robert M. Kirby

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No. 79, pages 1-12

https://doi.org/10.1145/2503210.2503214

Published: 17 November 2013

Abstract

Stencil computations are a common class of operations that appear in many computational scientific and engineering applications. Stencil computations often benefit from compile-time analysis, exploitation of data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method that requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes and given little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes that increases data locality and cache efficiency, together with a scalable approach for stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach.
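The abstract does not spell out the stencil itself, so the following is only a minimal 1D sketch under stated assumptions, not the authors' method. It assumes the kernel is a single centred B-spline of order `order` scaled by `h`; post-processing kernels used in practice are linear combinations of several B-splines. It also assumes the dG data are piecewise linear on a nonuniform mesh, and it handles only evaluation points whose kernel support stays inside the mesh. The names `bspline` and `filtered_value` are hypothetical. The sketch shows why unstructured meshes are awkward. For each evaluation point, the support must be split at both the kernel knots and the mesh nodes it covers. The number and location of elements touched therefore varies from point to point, which is the irregular access pattern the abstract describes.

```python
import bisect
import math

# Hypothetical helpers for illustration only; not the paper's implementation.
# 3-point Gauss-Legendre rule on [-1, 1], exact for polynomials of degree <= 5.
GL_NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
GL_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def bspline(order, t):
    """Centred cardinal B-spline of the given order (degree order - 1),
    supported on [-order/2, order/2] with unit integral (Cox-de Boor recursion)."""
    if order == 1:
        return 1.0 if -0.5 <= t < 0.5 else 0.0
    k = order
    return ((k / 2 + t) * bspline(k - 1, t + 0.5)
            + (k / 2 - t) * bspline(k - 1, t - 0.5)) / (k - 1)


def filtered_value(nodes, coeffs, x, h, order=2):
    """Convolve piecewise-linear dG data with (1/h) * bspline(order, (x - y) / h).

    nodes:  sorted element boundaries of a nonuniform 1D mesh (assumption).
    coeffs: per element e, (c0, c1) so that u(y) = c0 + c1 * (y - centre_e).
    """
    if not 1 <= order <= 5:
        raise ValueError("3-point rule is exact only for order <= 5 with linear data")
    lo = x - 0.5 * order * h
    hi = x + 0.5 * order * h
    if lo < nodes[0] or hi > nodes[-1]:
        raise ValueError("sketch handles interior evaluation points only")

    # Split the support at the kernel knots and at every mesh node it contains,
    # so the kernel and the data are each a single polynomial on every piece.
    knots = [lo + j * h for j in range(order + 1)]
    first = bisect.bisect_right(nodes, lo)
    last = bisect.bisect_left(nodes, hi)
    breaks = sorted(set(knots).union(nodes[first:last]))

    total = 0.0
    for a, b in zip(breaks, breaks[1:]):
        mid, half = 0.5 * (a + b), 0.5 * (b - a)
        # Locate the element holding this piece: a data-dependent lookup whose
        # result differs for every evaluation point on an unstructured mesh.
        e = min(max(bisect.bisect_right(nodes, mid) - 1, 0), len(nodes) - 2)
        c0, c1 = coeffs[e]
        centre = 0.5 * (nodes[e] + nodes[e + 1])
        for xi, w in zip(GL_NODES, GL_WEIGHTS):
            y = mid + half * xi
            u = c0 + c1 * (y - centre)
            total += w * half * bspline(order, (x - y) / h) / h * u
    return total


if __name__ == "__main__":
    nodes = [0.0, 0.3, 0.45, 1.0, 1.2, 1.7, 2.1, 2.5, 3.0]
    # Exact piecewise-linear representation of u(y) = 2y + 1 on each element.
    coeffs = [(2 * 0.5 * (a + b) + 1, 2.0) for a, b in zip(nodes, nodes[1:])]
    print(filtered_value(nodes, coeffs, 1.5, 0.4, order=3))
```

Running the script should print a value within rounding error of 4.0. Convolution with a symmetric, unit-mass kernel reproduces linear functions exactly, which makes this a quick correctness check before considering tiling or concurrent execution.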


Cited By

S. Donfack, P. Sanan, O. Schenk, B. Reps, and W. Vanroose. A High Arithmetic Intensity Krylov Subspace Method Based on Stencil Compiler Programs. In High Performance Computing in Science and Engineering, pages 1-18, 2018. Online publication date: 17 July 2018.
https://doi.org/10.1007/978-3-319-97136-0_1


Published In


SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN: 9781450323789

DOI: 10.1145/2503210

General Chair: William Gropp, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Satoshi Matsuoka, Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17-21, 2013

Denver, Colorado

Sponsors: SIGHPC, SIGARCH, IEEE-CS

Acceptance Rates

SC '13 paper acceptance rate: 91 of 449 submissions (20%)

Overall acceptance rate: 1,516 of 6,373 submissions (24%)


Article Metrics

Total citations: 1

Total downloads: 167

Downloads (last 12 months): 5

Downloads (last 6 weeks): 1

Reflects downloads up to 11 Dec 2024
