[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

A Stencil Scaling Approach for Accelerating Matrix-Free Finite Element Implementations

Published: 01 January 2018 Publication History

Abstract

We present a novel approach to fast on-the-fly low order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients optimized for matrix-free implementations. Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant coefficient case. Assuming sufficient regularity, an a priori analysis shows that solutions obtained by this approach are unique and have asymptotically optimal order convergence in the $H^1$- and the $L^2$-norms on hierarchical hybrid grids. For the preasymptotic regime, we present a local modification that guarantees uniform ellipticity of the operator. Cost considerations show that our novel approach requires roughly one-third of the floating-point operations compared to a classical finite element assembly scheme employing nodal integration. Our theoretical considerations are illustrated by numerical tests that confirm the expectations with respect to accuracy and run-time. A large scale application with more than a hundred billion ($1.6\cdot10^{11}$) degrees of freedom executed on 14310 compute cores demonstrates the efficiency of the new scaling approach.

References

[1]
P. Arbenz, G. H. van Lenthe, U. Mennel, R. Müller, and M. Sala, A scalable multi-level preconditioner for matrix-free $\mu$-finite element analysis of human bone structures, Internat. J. Numer. Methods Engrg., 73 (2008), pp. 927--947, https://doi.org/10.1002/nme.2101.
[2]
S. Bauer, M. Mohr, U. Rüde, J. Weismüller, M. Wittmann, and B. Wohlmuth, A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes, Appl. Numer. Math., 122 (2017), pp. 14--38, https://doi.org/10.1016/j.apnum.2017.07.006.
[3]
B. Bergen, Hierarchical Hybrid Grids: Data Structures and Core Algorithms for Efficient Finite Element Simulations on Supercomputers, Ph.D. thesis, Technische Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg, SCS Publishing House, 2006.
[4]
B. Bergen and F. Hülsemann, Hierarchical hybrid grids: Data structures and core algorithms for multigrid, Numer. Linear Algebra Appl., 11 (2004), pp. 279--291, https://doi.org/10.1002/nla.382.
[5]
B. Bergen, G. Wellein, F. Hülsemann, and U. Rüde, Hierarchical hybrid grids: Achieving TERAFLOP performance on large scale finite element simulations, Internat. J. Parallel Emergent Distrib. Syst., 22 (2007), pp. 311--329.
[6]
J. Bey, Tetrahedral grid refinement, Computing, 55 (1995), pp. 355--378, https://doi.org/10.1007/BF02238487.
[7]
J. Bielak, O. Ghattas, and E.-J. Kim, Parallel octree-based finite element method for large-scale earthquake ground motion simulation, CMES Comput. Model. Eng. Sci., 10 (2005), pp. 99--112, https://doi.org/10.3970/cmes.2005.010.099.
[8]
A. Brandt, Barriers to Achieving Textbook Multigrid Efficiency (TME) in CFD, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, 1998.
[9]
J. Brown, Efficient nonlinear solvers for nodal high-order finite elements in 3D, J. Sci. Comput., 45 (2010), pp. 48--63, https://doi.org/10.1007/s10915-010-9396-8.
[10]
G. F. Carey and B.-N. Jiang, Element-by-element linear and nonlinear solution schemes, Commun. Appl. Numer. Methods, 2 (1986), pp. 145--153, https://doi.org/10.1002/cnm.1630020205.
[11]
P. G. Ciarlet, The Finite Element Method for Elliptic Problems, Studies in Mathematics and Its Applications, North-Holland, Amsterdam, New York, Oxford, 1978.
[12]
P. G. Ciarlet and P.-A. Raviart, Maximum principle and uniform convergence for the finite element method, Comput. Methods Appl. Mech. Engrg., 2 (1973), pp. 17--31.
[13]
D. Comer, Essentials of Computer Architecture, Pearson Prentice Hall, 2005.
[14]
C. Engwer, R. D. Falgout, and U. M. Yang, Stencil computations for PDE-based applications with examples from DUNE and Hypre, Concurrency and Computation: Practice and Experience, 29 (2017), e4097, https://doi.org/10.1002/cpe.4097.
[15]
C. Flaig and P. Arbenz, A Highly Scalable Matrix-Free Multigrid Solver for $\mu$FE Analysis Based on a Pointer-Less Octree, in Large-Scale Scientific Computing: 8th International Conference, LSSC 2011, Sozopol, Bulgaria, 2011, Revised Selected Papers, I. Lirkov, S. Margenov, and J. Waśniewski, eds., Springer, Berlin, Heidelberg, 2012, pp. 498--506, https://doi.org/10.1007/978-3-642-29843-1_56.
[16]
L. E.-R. Gariepy and L. C. Evans, Measure Theory and Fine Properties of Functions, 1992.
[17]
B. Gmeiner, Design and Analysis of Hierarchical Hybrid Multigrid Methods for Peta-Scale Systems and Beyond, Ph.D. thesis, Technische Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg, 2013.
[18]
B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth, Towards textbook efficiency for parallel multigrid, Numer. Math. Theory Methods Appl., 8 (2015), pp. 22--46.
[19]
W. J. Gordon and C. A. Hall, Transfinite element methods: Blending-function interpolation over arbitrary curved element domains, Numer. Math., 21 (1973), pp. 109--129.
[20]
G. Hager, J. Treibig, J. Habich, and G. Wellein, Exploring performance and power properties of modern multicore chips via simple machine models, Concurrency and Computation: Practice and Experience, 28 (2014), pp. 189--210, https://doi.org/10.1002/cpe.3180.
[21]
G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers, Computational Science Series, CRC Press, 2011.
[22]
A. Ilic, F. Pratas, and L. Sousa, Cache-aware Roofline model: Upgrading the loft, IEEE Comput. Archit. Lett., 13 (2013), pp. 21--24, https://doi.org/10.1109/L-CA.2013.6.
[23]
Intel Corp., Intel Advisor, https://software.intel.com/en-us/intel-advisor-xe, 2018, Update 2.
[24]
S. Korotov, M. Křížek, and P. Neittaanmäki, Weakened acute type condition for tetrahedral triangulations and the discrete maximum principle, Math. Comp., 70 (2001), pp. 107--119, https://doi.org/10.1090/S0025-5718-00-01270-9.
[25]
M. Kronbichler and K. Kormann, A generic interface for parallel cell-based finite element operator application, Comput. & Fluids, 63 (2012), pp. 135--147, https://doi.org/10.1016/j.compfluid.2012.04.012.
[26]
M. Křížek and L. Qun, On diagonal dominance of stiffness matrices in 3D, East-West J. Numer. Math., 3 (1995), pp. 59--69.
[27]
J. Li, J. M. Melenk, B. Wohlmuth, and J. Zou, Optimal a priori estimates for higher order finite elements for elliptic interface problems, Appl. Numer. Math., 60 (2010), pp. 19--37.
[28]
K. Ljungkvist, Matrix-free finite-element computations on graphics processors with adaptively refined unstructured meshes, in Proceedings of the 25th High Performance Computing Symposium, HPC '17, Society for Computer Simulation International, 2017, pp. 1:1--1:12, http://dl.acm.org/citation.cfm?id=3108096.3108097.
[29]
K. Ljungkvist and M. Kronbichler, Multigrid for Matrix-Free Finite Element Computations on Graphics Processors, Tech. Report 2017-006, Department of Information Technology, Uppsala University, 2017.
[30]
LRZ, SuperMUC Petascale System, https://www.lrz.de/services/compute/supermuc/systemdescription/ (retrieved on 3 July 2018).
[31]
J. Lyness and U. Rüde, Cubature of integrands containing derivatives, Numer. Math., 78 (1998), pp. 439--461, https://doi.org/10.1007/s002110050320.
[32]
D. A. May, J. Brown, and L. L. Pourhiet, A scalable, matrix-free multigrid preconditioner for finite element discretizations of heterogeneous Stokes flow, Comput. Methods Appl. Mech. Engrg., 290 (2015), pp. 496--523, https://doi.org/10.1016/j.cma.2015.03.014.
[33]
C. Pechstein and R. Scheichl, Weighted Poincaré inequalities, IMA J. Numer. Anal., 33 (2013), pp. 652--686, https://doi.org/10.1093/imanum/drs017.
[34]
G. Strang and G. Fix, An Analysis of the Finite Element Method, 2nd ed., Wellesley-Cambridge Press, Wellesley, MA, 2008.
[35]
J. Treibig, G. Hager, and G. Wellein, LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments, in Proceedings of the 2010 39th International Conference on Parallel Processing Workshops, ICPPW '10, Washington, DC, 2010, IEEE Computer Society, pp. 207--216, https://doi.org/10.1109/ICPPW.2010.38.
[36]
H. Triebel, Interpolation Theory, Function Spaces, Differential Operators, 2nd ed., Johann Ambrosius Barth, Heidelberg, 1995.
[37]
B. van Rietbergen, H. Weinans, R. Huiskes, and B. Polman, Computational strategies for iterative solutions of large FEM applications employing voxel data, Internat. J. Numer. Methods Engrg., 39 (1996), pp. 2743--2767, https://doi.org/10.1002/(SICI)1097-0207(19960830)39:16<2743::AID-NME974>3.0.CO;2-A.
[38]
S. Williams, A. Waterman, and D. Patterson, Roofline: An insightful visual performance model for multicore architectures, Commun. ACM, 52 (2009), pp. 65--76, https://doi.org/10.1145/1498765.1498785.
[39]
W.-S. Yang, Variable Viscosity Thermal Convection at Infinite Prandtl Number in a Thick Spherical Shell, Ph.D. thesis, University of Illinois at Urbana-Champaign, 1997.

Cited By

View all
  • (2023)Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementationsInternational Journal of High Performance Computing Applications10.1177/1094342022110788037:2(61-81)Online publication date: 1-Mar-2023
  • (2022)Resiliency in numerical algorithm design for extreme scale simulationsInternational Journal of High Performance Computing Applications10.1177/1094342021105518836:2(251-285)Online publication date: 1-Mar-2022
  • (2020)A Flexible, Parallel, Adaptive Geometric Multigrid Method for FEMACM Transactions on Mathematical Software10.1145/342519347:1(1-27)Online publication date: 8-Dec-2020

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image SIAM Journal on Scientific Computing
SIAM Journal on Scientific Computing  Volume 40, Issue 6
DOI:10.1137/sjoce3.40.6
Issue’s Table of Contents

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 January 2018

Author Tags

  1. matrix-free
  2. finite elements
  3. variable coefficients
  4. stencil scaling
  5. variational crime analysis
  6. optimal order a priori estimates

Author Tags

  1. 65N15
  2. 65N30
  3. 65Y20

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementationsInternational Journal of High Performance Computing Applications10.1177/1094342022110788037:2(61-81)Online publication date: 1-Mar-2023
  • (2022)Resiliency in numerical algorithm design for extreme scale simulationsInternational Journal of High Performance Computing Applications10.1177/1094342021105518836:2(251-285)Online publication date: 1-Mar-2022
  • (2020)A Flexible, Parallel, Adaptive Geometric Multigrid Method for FEMACM Transactions on Mathematical Software10.1145/342519347:1(1-27)Online publication date: 8-Dec-2020

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media