More Web Proxy on the site http://driver.im/

research-article

Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors

Authors:

Martin KronbichlerAuthors Info & Claims

ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 3

Article No.: 20, Pages 1 - 30

https://doi.org/10.1145/3399736

Published: 29 June 2020 Publication History

Abstract

Traditional solution approaches for problems in quantum mechanics scale as O(M³), where M is the number of electrons. Various methods have been proposed to address this issue and obtain a linear scaling O(M). One promising formulation is the direct minimization of energy. Such methods take advantage of physical localization of the solution, allowing users to seek it in terms of non-orthogonal orbitals with local support.

This work proposes a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a-priori chosen support for each vector, we develop algorithms and data structures to perform (i) matrix-free sparse matrix multivector products (SpMM), (ii) the projection of an operator onto a sparse sub-space (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves a performance of 157 GFlop/s on an Intel Cascade Lake processor with 20 cores. Strong and weak scaling results are reported for a representative benchmark problem using quadratic and quartic finite element bases.

References

[1]

Kadir Akbudak, Oguz Selvitopi, and Cevdet Aykanat. 2018. Partitioning models for scaling parallel sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC) 4, 3 (2018), 13.

[2]

Giovanni Alzetta, Daniel Arndt, Wolfgang Bangerth, Vishal Boddu, Benjamin Brands, Denis Davydov, Rene Gassmöller, Timo Heister, Luca Heltai, Katharina Kormann, Martin Kronbichler, Matthias Maier, Jean-Paul Pelteret, Bruno Turcksin, and David Wells. 2018. The deal.II Library, Version 9.0. Journal of Numerical Mathematics 26, 4 (2018), 173--183.

[3]

Hartwig Anzt, Stanimire Tomov, and Jack Dongarra. 2015. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product. In Proceedings of the Symposium on High Performance Computing (HPC’15). Society for Computer Simulation International, 75--82. http://dl.acm.org/citation.cfm?id=2872599.2872609

Digital Library

[4]

Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. 2016. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. SIAM Journal on Scientific Computing 38, 6 (2016), C624--C651.

[5]

Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and Hong Zhang. 2015. PETSc Users Manual. Technical Report ANL-95/11 - Revision 3.6. Argonne National Laboratory.

[6]

Wolfgang Bangerth, Carsten Burstedde, Timo Heister, and Martin Kronbichler. 2011. Algorithms and data structures for massively parallel generic finite element codes. ACM Trans. Math. Softw. 38, 2 (2011), 14:1--14:28.

Digital Library

[7]

Gang Bao, Guanghui Hu, and Di Liu. 2012. Numerical solution of the Kohn--Sham equation by finite element methods with an adaptive mesh redistribution technique. Journal of Scientific Computing 55, 2 (2012), 372--391.

[8]

Thomas L. Beck. 2009. Real-Space and Multigrid Methods in Computational Chemistry. John Wiley 8 Sons, Inc., Hoboken, New Jersey, Chapter 5, 223--285.

[9]

Nicolas Bock and Matt Challacombe. 2013. An optimized sparse approximate matrix multiply for matrices with decay. SIAM Journal on Scientific Computing 35, 1 (2013), C72--C98.

[10]

Nicolas Bock, Matt Challacombe, and Laxmikant V. Kalé. 2016. Solvers for O(N) electronic structure in the strong scaling limit. SIAM Journal on Scientific Computing 38, 1 (2016), C1--C21.

Digital Library

[11]

Urban Borštnik, Joost VandeVondele, Valéry Weber, and Jürg Hutter. 2014. Sparse matrix multiplication: The distributed block-compressed sparse row library. Parallel Computing 40, 5-6 (2014), 47--58.

[12]

David R. Bowler, Ian J. Bush, and Michael J. Gillan. 2000. Practical methods for ab initio calculations on thousands of atoms. International Journal of Quantum Chemistry 77, 5 (2000), 831--842.

[13]

David R. Bowler, Tsuyoshi Miyazaki, and Michael J. Gillan. 2001. Parallel sparse matrix multiplication for linear scaling electronic structure calculations. Computer Physics Communications 137, 2 (2001), 255--273.

[14]

Aydin Buluc and John R. Gilbert. 2008. Challenges and advances in parallel sparse matrix-matrix multiplication. In Proceedings of the 37th International Conference on Parallel Processing. IEEE, 503--510.

[15]

Steven K. Burger and Weitao Yang. 2008. Linear-scaling quantum calculations using non-orthogonal localized molecular orbitals. Journal of Physics: Condensed Matter 20, 29 (2008), 294209.

[16]

Eric J. Bylaska, Michael Holst, and John H. Weare. 2009. Adaptive finite element method for solving the exact Kohn--Sham equation of density functional theory. Journal of Chemical Theory and Computation 5, 4 (2009), 937--948.

[17]

Matt Challacombe. 2000. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory. Computer Physics Communications 128, 1--2 (2000), 93--107.

[18]

Denis Davydov, Tymofiy Gerasimov, Jean-Paul Pelteret, and Paul Steinmann. 2017. Convergence study of the h-adaptive PUM and the hp-adaptive FEM applied to eigenvalue problems in quantum mechanics. Advanced Modeling and Simulation in Engineering Sciences 4, 1 (12 Dec 2017), 7.

[19]

Denis Davydov, Timo Heister, Martin Kronbichler, and Paul Steinmann. 2018. Matrix-free locally adaptive finite element solution of density-functional theory with nonorthogonal orbitals and multigrid preconditioning. Physica Status Solidi B: Basic Solid State Physics 255, 9 (2018).

[20]

Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2018. Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Computing 78 (2018), 33--46.

Digital Library

[21]

Jun Fang, Xingyu Gao, and Aihui Zhou. 2012. A Kohn--Sham equation solver based on hexahedral finite elements. J. Comput. Phys. 231, 8 (2012), 3166--3180.

Digital Library

[22]

Jean-Luc Fattebert and Jerry Bernholc. 2000. Towards grid-based O(N) density-functional theory methods: Optimized nonorthogonal orbitals and multigrid acceleration. Physical Review B 62 (Jul 2000), 1713--1722. Issue 3.

[23]

Jean-Luc Fattebert and Francois Gygi. 2004. Linear scaling first-principles molecular dynamics with controlled accuracy. Computer Physics Communications 162, 1 (2004), 24--36.

[24]

Jean-Luc Fattebert and Francois Gygi. 2006. Linear-scaling first-principles molecular dynamics with plane-waves accuracy. Phys. Rev. B 73 (Mar 2006), 115-124. Issue 11.

[25]

Jean-Luc Fattebert, Richard D. Hornung, and Andrew M. Wissink. 2007. Finite element approach for density functional theory calculations on locally-refined meshes. Journal of Computational Physics 223, 2 (2007), 759--773.

Digital Library

[26]

Paul Fischer, Misun Min, Thilina Rathnayake, Som Dutta, Tzanio Kolev, Veselin Dobrev, Jean-Sylvain Camier, Martin Kronbichler, Tim Warburton, Kasia Świrydowicz, and Jed Brown. 2020. Scalability of high-performance PDE solvers. The International Journal on High Performance Computing Applications in press (2020).

[27]

Jeffrey T. Frey and Douglas J. Doren. 2011. TubeGen 3.4 (web-interface, http://turin.nss.udel.edu/research/tubegenonline.html), University of Delaware, Newark, DE. (2011).

[28]

Giulia Galli and Michele Parrinello. 1992. Large scale electronic structure calculations. Physical Review Letters 69, 24 (1992), 3547.

[29]

Chris M. Goringe, Eduardo Hernández, Michael J. Gillan, and Ian J. Bush. 1997. Linear-scaling DFT-pseudopotential calculations on parallel computers. Computer Physics Communications 102, 1--3 (1997), 1--16.

[30]

Fred G. Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS) 4, 3 (1978), 250--269.

Digital Library

[31]

Kikuji Hirose and Tomoya Ono. 2001. Direct minimization to generate electronic states with proper occupation numbers. Physical Review B 64, 8 (2001), 085105.

[32]

Kikuji Hirose, Tomoya Ono, Yoshitaka Fujimoto, and Shigeru Tsukamoto. 2005. First-Principles Calculations in Real-Space Formalism. Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE.

[33]

Pierre C. Hohenberg and Walter Kohn. 1964. Inhomogeneous electron gas. Physical Review 136, 3B (1964), B864--B871.

[34]

Changwan Hong, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Jinsung Kim, Süreyya Emre Kurt, Israt Nisa, Shivani Sabhlok, Ümit V. Çatalyürek, Srinivasan Parthasarathy, and P. Sadayappan. 2018. Efficient sparse-matrix multi-vector product on GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing. ACM, New York, 66--79.

[35]

Sohrab Ismail-Beigi and Tomas A. Arias. 1999. Locality of the density matrix in metals, semiconductors, and insulators. Physical Review Letters 82 (Mar. 1999), 2127--2130. Issue 10.

[36]

Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM.

[37]

R. D. King-Smith and David Vanderbilt. 1994. First-principles investigation of ferroelectricity in perovskite compounds. Physical Review B 49, 9 (1994), 5828.

[38]

Walter Kohn. 1959. Analytic properties of Bloch waves and Wannier functions. Physical Review 115 (Aug 1959), 809--821. Issue 4.

[39]

Walter Kohn and Lu Jeu Sham. 1965. Self-consistent equations including exchange and correlation effects. Physical Review 140, 4A (1965), A1133--A1138.

[40]

Martin Kronbichler and Katharina Kormann. 2012. A generic interface for parallel cell-based finite element operator application. Computers 8 Fluids 63 (2012), 135--147.

[41]

Martin Kronbichler and Katharina Kormann. 2019. Fast matrix-free evaluation of discontinuous Galerkin finite element operators. ACM Transactions on Mathematical Software 45, 3 (2019), 29/1--40.

Digital Library

[42]

Alfio Lazzaro, Joost VandeVondele, Jürg Hutter, and Ole Schütt. 2017. Increasing the efficiency of sparse matrix-matrix multiplication with a 2.5 D algorithm and one-sided MPI. In Proceedings of the Platform for Advanced Scientific Computing Conference. ACM, 3.

Digital Library

[43]

Jianfeng Lu and Kyle Thicke. 2017. Orbital minimization method with l¹ regularization. Journal of Computational Physics 336 (2017), 87--103.

[44]

Francesco Mauri and Giulia Galli. 1994. Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling. Physical Review B 50, 7 (1994), 4316.

[45]

Michael McCourt, Barry Smith, and Hong Zhang. 2013. Efficient sparse matrix-matrix products using colorings. SIAM Journal on Matrix Analysis and Applications (2013).

[46]

Phani Motamarri, Sambit Das, Shiva Rudraraju, Krishnendu Ghosh, Denis Davydov, and Vikram Gavini. 2019. DFT-FE - A massively parallel adaptive finite-element code for large-scale density functional theory calculations. Computer Physics Communications (2019).

[47]

Phani Motamarri, Michael R. Nowak, Kenneth Leiter, Jaroslaw Knap, and Vikram Gavini. 2012. Higher-order adaptive finite-element methods for Kohn--Sham density functional theory. Journal of Computational Physics 253, 15 (2012), 308--343.

[48]

Peter Munch, Katharina Kormann, and Martin Kronbichler. 2020. hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations. (2020). arxiv:arXiv:2002.08110v1

[49]

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydın Buluç. 2019. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90 (2019), 102545.

Digital Library

[50]

Anna Nissen, Katharina Kormann, Magnus Grandin, and Kristoffer Virta. 2015. Stable difference methods for block-oriented adaptive grids. Journal of Scientific Computing 65, 2 (2015), 486--511.

Digital Library

[51]

Pablo Ordejón, David A. Drabold, Richard M. Martin, and Matthew P. Grumbach. 1995. Linear system-size scaling methods for electronic-structure calculations. Physical Review B 51, 3 (1995), 1456.

[52]

John E. Pask, Barry M. Klein, Philip A. Sterne, and Chingyao Y. Fong. 2001. Finite-element methods in electronic-structure theory. Computer Physics Communications 135, 1 (2001), 1--34.

[53]

John E. Pask and Natarajan Sukumar. 2017. Partition of unity finite element method for quantum mechanical materials calculations. Extreme Mechanics Letters 11 (2017), 8--17.

[54]

Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jongsoo Park, Michael J. Anderson, Satya Gautam Vadlamudi, Dipankar Das, Sergey G. Pudov, Vadim O. Pirogov, and Pradeep Dubey. 2015. Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In International Conference on High Performance Computing. Springer, 48--57.

[55]

Liang Peng, Feng Long Gu, and Weitao Yang. 2013. Effective preconditioning for ab initio ground state energy minimization with non-orthogonal localized molecular orbitals. Physical Chemistry Chemical Physics 15, 37 (2013), 15518--15527.

[56]

L. Ramdas Ram-Mohan. 2002. Finite Element and Boundary Element Applications in Quantum Mechanics. Oxford University Press.

[57]

Emanuel H. Rubensson, Elias Rudberg, and Paweł Sałek. 2006. Sparse matrix algebra for quantum modeling of large systems. In Proceedings of the International Workshop on Applied Parallel Computing. Springer, 90--99.

[58]

Emanuel H. Rubensson, Elias Rudberg, and Paweł Sałek. 2007. A hierarchic sparse matrix data structure for large-scale Hartree--Fock/Kohn--Sham calculations. Journal of Computational Chemistry 28, 16 (2007), 2531--2537.

[59]

Elias Rudberg, Emanuel H. Rubensson, Paweł Sałek, and Anastasia Kruchinina. 2018. Ergo: An open-source program for linear-scaling electronic structure calculations. SoftwareX 7 (2018), 107--111.

[60]

Chandra Saravanan, Yihan Shao, Roi Baer, Philip N. Ross, and Martin Head-Gordon. 2003. Sparse matrix multiplications for linear scaling electronic structure calculations in an atom-centered basis set using multiatom blocks. Journal of Computational Chemistry 24, 5 (2003), 618--622.

[61]

Volker Schauer and Christian Linder. 2015. The reduced basis method in all-electron calculations with finite elements. Advances in Computational Mathematics 41, 5 (2015), 1035--1047.

Digital Library

[62]

Jan Treibig, Georg Hager, and Gerhard Wellein. 2010. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures. San Diego CA.

Digital Library

[63]

Eiji Tsuchida and Masaru Tsukada. 1996. Adaptive finite-element method for electronic-structure calculations. Physical Review B 54, 11 (sep 1996), 7602--7605.

[64]

Eiji Tsuchida and Masaru Tsukada. 1998. Large-scale electronic-structure calculations based on the adaptive finite-element method. Journal of the Physical Society of Japan 67, 11 (1998), 3844--3858.

[65]

Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM 52, 4 (April 2009), 65--76.

Digital Library

[66]

Samuel Webb Williams. 2008. Auto-tuning Performance on Multicore Computers. Ph.D. Dissertation. University of California, Berkeley.

[67]

Toby D. Young, Eloy Romero, and Jose E. Roman. 2013. Parallel finite element density functional computations exploiting grid refinement and subspace recycling. In Computer Physics Communications. IEEE, 66--72.

[68]

Dier Zhang, Lihua Shen, Aihui Zhou, and Xin-Gao Gong. 2008. Finite element method for solving Kohn--Sham equations based on self-adaptive tetrahedral mesh. Physics Letters A 372, 30 (2008), 5071--5076.

Cited By

Panigrahi GKodali NPanda DMotamarri P(2024)Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104925192:COnline publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1016/j.jpdc.2024.104925
Munch PIvannikov VCyron CKronbichler M(2024)On the construction of an efficient finite-element solver for phase-field simulations of many-particle solid-state-sintering processesComputational Materials Science10.1016/j.commatsci.2023.112589231(112589)Online publication date: Jan-2024
https://doi.org/10.1016/j.commatsci.2023.112589
Arndt DFehn NKanschat GKormann KKronbichler MMunch PWall WWitte J(2020)ExaDG: High-Order Discontinuous Galerkin for the Exa-ScaleSoftware for Exascale Computing - SPPEXA 2016-201910.1007/978-3-030-47956-5_8(189-224)Online publication date: 31-Jul-2020
https://doi.org/10.1007/978-3-030-47956-5_8

Index Terms

Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors
1. Mathematics of computing
  1. Mathematical analysis
    1. Differential equations
      1. Partial differential equations
  2. Mathematical software
    1. Mathematical software performance

Recommendations

Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and ...
Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors

This article presents matrix-free finite-element techniques for efficiently solving partial differential equations on modern many-core processors, such as graphics cards. We develop a GPU parallelization of a matrix-free geometric multigrid iterative ...
An efficient framework for matrix-free SpMV computation on GPU for elastoplastic problems
Abstract
High computational cost in elastoplastic analysis is often handled by the use of high performance parallel computers. However, the presence of both elastic and plastic states leads to the branching issue, which prevents the realization of true ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Parallel Computing

ACM Transactions on Parallel Computing Volume 7, Issue 3

Special Issue on PPoPP 2017 (Part 2) and Regular Papers

September 2020

182 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3407694

Editor:
David A. Bader
New Jersey Institute of Technology, USA

Issue’s Table of Contents

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2020

Accepted: 01 April 2020

Revised: 01 January 2020

Received: 01 July 2019

Published in TOPC Volume 7, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

German Research Foundation (Deutsche Forschungsgemeinschaft, DFG)
Bayerisches Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR)

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
184
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)3

Reflects downloads up to 01 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Panigrahi GKodali NPanda DMotamarri P(2024)Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104925192:COnline publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1016/j.jpdc.2024.104925
Munch PIvannikov VCyron CKronbichler M(2024)On the construction of an efficient finite-element solver for phase-field simulations of many-particle solid-state-sintering processesComputational Materials Science10.1016/j.commatsci.2023.112589231(112589)Online publication date: Jan-2024
https://doi.org/10.1016/j.commatsci.2023.112589
Arndt DFehn NKanschat GKormann KKronbichler MMunch PWall WWitte J(2020)ExaDG: High-Order Discontinuous Galerkin for the Exa-ScaleSoftware for Exascale Computing - SPPEXA 2016-201910.1007/978-3-030-47956-5_8(189-224)Online publication date: 31-Jul-2020
https://doi.org/10.1007/978-3-030-47956-5_8

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Figures

Tables

Media

View Issue’s Table of Contents