[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors

Published: 29 June 2020 Publication History

Abstract

Traditional solution approaches for problems in quantum mechanics scale as O(M3), where M is the number of electrons. Various methods have been proposed to address this issue and obtain a linear scaling O(M). One promising formulation is the direct minimization of energy. Such methods take advantage of physical localization of the solution, allowing users to seek it in terms of non-orthogonal orbitals with local support.
This work proposes a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a-priori chosen support for each vector, we develop algorithms and data structures to perform (i) matrix-free sparse matrix multivector products (SpMM), (ii) the projection of an operator onto a sparse sub-space (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves a performance of 157 GFlop/s on an Intel Cascade Lake processor with 20 cores. Strong and weak scaling results are reported for a representative benchmark problem using quadratic and quartic finite element bases.

References

[1]
Kadir Akbudak, Oguz Selvitopi, and Cevdet Aykanat. 2018. Partitioning models for scaling parallel sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC) 4, 3 (2018), 13.
[2]
Giovanni Alzetta, Daniel Arndt, Wolfgang Bangerth, Vishal Boddu, Benjamin Brands, Denis Davydov, Rene Gassmöller, Timo Heister, Luca Heltai, Katharina Kormann, Martin Kronbichler, Matthias Maier, Jean-Paul Pelteret, Bruno Turcksin, and David Wells. 2018. The deal.II Library, Version 9.0. Journal of Numerical Mathematics 26, 4 (2018), 173--183.
[3]
Hartwig Anzt, Stanimire Tomov, and Jack Dongarra. 2015. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product. In Proceedings of the Symposium on High Performance Computing (HPC’15). Society for Computer Simulation International, 75--82. http://dl.acm.org/citation.cfm?id=2872599.2872609
[4]
Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. 2016. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. SIAM Journal on Scientific Computing 38, 6 (2016), C624--C651.
[5]
Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and Hong Zhang. 2015. PETSc Users Manual. Technical Report ANL-95/11 - Revision 3.6. Argonne National Laboratory.
[6]
Wolfgang Bangerth, Carsten Burstedde, Timo Heister, and Martin Kronbichler. 2011. Algorithms and data structures for massively parallel generic finite element codes. ACM Trans. Math. Softw. 38, 2 (2011), 14:1--14:28.
[7]
Gang Bao, Guanghui Hu, and Di Liu. 2012. Numerical solution of the Kohn--Sham equation by finite element methods with an adaptive mesh redistribution technique. Journal of Scientific Computing 55, 2 (2012), 372--391.
[8]
Thomas L. Beck. 2009. Real-Space and Multigrid Methods in Computational Chemistry. John Wiley 8 Sons, Inc., Hoboken, New Jersey, Chapter 5, 223--285.
[9]
Nicolas Bock and Matt Challacombe. 2013. An optimized sparse approximate matrix multiply for matrices with decay. SIAM Journal on Scientific Computing 35, 1 (2013), C72--C98.
[10]
Nicolas Bock, Matt Challacombe, and Laxmikant V. Kalé. 2016. Solvers for O(N) electronic structure in the strong scaling limit. SIAM Journal on Scientific Computing 38, 1 (2016), C1--C21.
[11]
Urban Borštnik, Joost VandeVondele, Valéry Weber, and Jürg Hutter. 2014. Sparse matrix multiplication: The distributed block-compressed sparse row library. Parallel Computing 40, 5-6 (2014), 47--58.
[12]
David R. Bowler, Ian J. Bush, and Michael J. Gillan. 2000. Practical methods for ab initio calculations on thousands of atoms. International Journal of Quantum Chemistry 77, 5 (2000), 831--842.
[13]
David R. Bowler, Tsuyoshi Miyazaki, and Michael J. Gillan. 2001. Parallel sparse matrix multiplication for linear scaling electronic structure calculations. Computer Physics Communications 137, 2 (2001), 255--273.
[14]
Aydin Buluc and John R. Gilbert. 2008. Challenges and advances in parallel sparse matrix-matrix multiplication. In Proceedings of the 37th International Conference on Parallel Processing. IEEE, 503--510.
[15]
Steven K. Burger and Weitao Yang. 2008. Linear-scaling quantum calculations using non-orthogonal localized molecular orbitals. Journal of Physics: Condensed Matter 20, 29 (2008), 294209.
[16]
Eric J. Bylaska, Michael Holst, and John H. Weare. 2009. Adaptive finite element method for solving the exact Kohn--Sham equation of density functional theory. Journal of Chemical Theory and Computation 5, 4 (2009), 937--948.
[17]
Matt Challacombe. 2000. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory. Computer Physics Communications 128, 1--2 (2000), 93--107.
[18]
Denis Davydov, Tymofiy Gerasimov, Jean-Paul Pelteret, and Paul Steinmann. 2017. Convergence study of the h-adaptive PUM and the hp-adaptive FEM applied to eigenvalue problems in quantum mechanics. Advanced Modeling and Simulation in Engineering Sciences 4, 1 (12 Dec 2017), 7.
[19]
Denis Davydov, Timo Heister, Martin Kronbichler, and Paul Steinmann. 2018. Matrix-free locally adaptive finite element solution of density-functional theory with nonorthogonal orbitals and multigrid preconditioning. Physica Status Solidi B: Basic Solid State Physics 255, 9 (2018).
[20]
Mehmet Deveci, Christian Trott, and Sivasankaran Rajamanickam. 2018. Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures. Parallel Computing 78 (2018), 33--46.
[21]
Jun Fang, Xingyu Gao, and Aihui Zhou. 2012. A Kohn--Sham equation solver based on hexahedral finite elements. J. Comput. Phys. 231, 8 (2012), 3166--3180.
[22]
Jean-Luc Fattebert and Jerry Bernholc. 2000. Towards grid-based O(N) density-functional theory methods: Optimized nonorthogonal orbitals and multigrid acceleration. Physical Review B 62 (Jul 2000), 1713--1722. Issue 3.
[23]
Jean-Luc Fattebert and Francois Gygi. 2004. Linear scaling first-principles molecular dynamics with controlled accuracy. Computer Physics Communications 162, 1 (2004), 24--36.
[24]
Jean-Luc Fattebert and Francois Gygi. 2006. Linear-scaling first-principles molecular dynamics with plane-waves accuracy. Phys. Rev. B 73 (Mar 2006), 115-124. Issue 11.
[25]
Jean-Luc Fattebert, Richard D. Hornung, and Andrew M. Wissink. 2007. Finite element approach for density functional theory calculations on locally-refined meshes. Journal of Computational Physics 223, 2 (2007), 759--773.
[26]
Paul Fischer, Misun Min, Thilina Rathnayake, Som Dutta, Tzanio Kolev, Veselin Dobrev, Jean-Sylvain Camier, Martin Kronbichler, Tim Warburton, Kasia Świrydowicz, and Jed Brown. 2020. Scalability of high-performance PDE solvers. The International Journal on High Performance Computing Applications in press (2020).
[27]
Jeffrey T. Frey and Douglas J. Doren. 2011. TubeGen 3.4 (web-interface, http://turin.nss.udel.edu/research/tubegenonline.html), University of Delaware, Newark, DE. (2011).
[28]
Giulia Galli and Michele Parrinello. 1992. Large scale electronic structure calculations. Physical Review Letters 69, 24 (1992), 3547.
[29]
Chris M. Goringe, Eduardo Hernández, Michael J. Gillan, and Ian J. Bush. 1997. Linear-scaling DFT-pseudopotential calculations on parallel computers. Computer Physics Communications 102, 1--3 (1997), 1--16.
[30]
Fred G. Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software (TOMS) 4, 3 (1978), 250--269.
[31]
Kikuji Hirose and Tomoya Ono. 2001. Direct minimization to generate electronic states with proper occupation numbers. Physical Review B 64, 8 (2001), 085105.
[32]
Kikuji Hirose, Tomoya Ono, Yoshitaka Fujimoto, and Shigeru Tsukamoto. 2005. First-Principles Calculations in Real-Space Formalism. Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE.
[33]
Pierre C. Hohenberg and Walter Kohn. 1964. Inhomogeneous electron gas. Physical Review 136, 3B (1964), B864--B871.
[34]
Changwan Hong, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Jinsung Kim, Süreyya Emre Kurt, Israt Nisa, Shivani Sabhlok, Ümit V. Çatalyürek, Srinivasan Parthasarathy, and P. Sadayappan. 2018. Efficient sparse-matrix multi-vector product on GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing. ACM, New York, 66--79.
[35]
Sohrab Ismail-Beigi and Tomas A. Arias. 1999. Locality of the density matrix in metals, semiconductors, and insulators. Physical Review Letters 82 (Mar. 1999), 2127--2130. Issue 10.
[36]
Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM.
[37]
R. D. King-Smith and David Vanderbilt. 1994. First-principles investigation of ferroelectricity in perovskite compounds. Physical Review B 49, 9 (1994), 5828.
[38]
Walter Kohn. 1959. Analytic properties of Bloch waves and Wannier functions. Physical Review 115 (Aug 1959), 809--821. Issue 4.
[39]
Walter Kohn and Lu Jeu Sham. 1965. Self-consistent equations including exchange and correlation effects. Physical Review 140, 4A (1965), A1133--A1138.
[40]
Martin Kronbichler and Katharina Kormann. 2012. A generic interface for parallel cell-based finite element operator application. Computers 8 Fluids 63 (2012), 135--147.
[41]
Martin Kronbichler and Katharina Kormann. 2019. Fast matrix-free evaluation of discontinuous Galerkin finite element operators. ACM Transactions on Mathematical Software 45, 3 (2019), 29/1--40.
[42]
Alfio Lazzaro, Joost VandeVondele, Jürg Hutter, and Ole Schütt. 2017. Increasing the efficiency of sparse matrix-matrix multiplication with a 2.5 D algorithm and one-sided MPI. In Proceedings of the Platform for Advanced Scientific Computing Conference. ACM, 3.
[43]
Jianfeng Lu and Kyle Thicke. 2017. Orbital minimization method with l1 regularization. Journal of Computational Physics 336 (2017), 87--103.
[44]
Francesco Mauri and Giulia Galli. 1994. Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling. Physical Review B 50, 7 (1994), 4316.
[45]
Michael McCourt, Barry Smith, and Hong Zhang. 2013. Efficient sparse matrix-matrix products using colorings. SIAM Journal on Matrix Analysis and Applications (2013).
[46]
Phani Motamarri, Sambit Das, Shiva Rudraraju, Krishnendu Ghosh, Denis Davydov, and Vikram Gavini. 2019. DFT-FE - A massively parallel adaptive finite-element code for large-scale density functional theory calculations. Computer Physics Communications (2019).
[47]
Phani Motamarri, Michael R. Nowak, Kenneth Leiter, Jaroslaw Knap, and Vikram Gavini. 2012. Higher-order adaptive finite-element methods for Kohn--Sham density functional theory. Journal of Computational Physics 253, 15 (2012), 308--343.
[48]
Peter Munch, Katharina Kormann, and Martin Kronbichler. 2020. hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations. (2020). arxiv:arXiv:2002.08110v1
[49]
Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydın Buluç. 2019. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90 (2019), 102545.
[50]
Anna Nissen, Katharina Kormann, Magnus Grandin, and Kristoffer Virta. 2015. Stable difference methods for block-oriented adaptive grids. Journal of Scientific Computing 65, 2 (2015), 486--511.
[51]
Pablo Ordejón, David A. Drabold, Richard M. Martin, and Matthew P. Grumbach. 1995. Linear system-size scaling methods for electronic-structure calculations. Physical Review B 51, 3 (1995), 1456.
[52]
John E. Pask, Barry M. Klein, Philip A. Sterne, and Chingyao Y. Fong. 2001. Finite-element methods in electronic-structure theory. Computer Physics Communications 135, 1 (2001), 1--34.
[53]
John E. Pask and Natarajan Sukumar. 2017. Partition of unity finite element method for quantum mechanical materials calculations. Extreme Mechanics Letters 11 (2017), 8--17.
[54]
Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jongsoo Park, Michael J. Anderson, Satya Gautam Vadlamudi, Dipankar Das, Sergey G. Pudov, Vadim O. Pirogov, and Pradeep Dubey. 2015. Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In International Conference on High Performance Computing. Springer, 48--57.
[55]
Liang Peng, Feng Long Gu, and Weitao Yang. 2013. Effective preconditioning for ab initio ground state energy minimization with non-orthogonal localized molecular orbitals. Physical Chemistry Chemical Physics 15, 37 (2013), 15518--15527.
[56]
L. Ramdas Ram-Mohan. 2002. Finite Element and Boundary Element Applications in Quantum Mechanics. Oxford University Press.
[57]
Emanuel H. Rubensson, Elias Rudberg, and Paweł Sałek. 2006. Sparse matrix algebra for quantum modeling of large systems. In Proceedings of the International Workshop on Applied Parallel Computing. Springer, 90--99.
[58]
Emanuel H. Rubensson, Elias Rudberg, and Paweł Sałek. 2007. A hierarchic sparse matrix data structure for large-scale Hartree--Fock/Kohn--Sham calculations. Journal of Computational Chemistry 28, 16 (2007), 2531--2537.
[59]
Elias Rudberg, Emanuel H. Rubensson, Paweł Sałek, and Anastasia Kruchinina. 2018. Ergo: An open-source program for linear-scaling electronic structure calculations. SoftwareX 7 (2018), 107--111.
[60]
Chandra Saravanan, Yihan Shao, Roi Baer, Philip N. Ross, and Martin Head-Gordon. 2003. Sparse matrix multiplications for linear scaling electronic structure calculations in an atom-centered basis set using multiatom blocks. Journal of Computational Chemistry 24, 5 (2003), 618--622.
[61]
Volker Schauer and Christian Linder. 2015. The reduced basis method in all-electron calculations with finite elements. Advances in Computational Mathematics 41, 5 (2015), 1035--1047.
[62]
Jan Treibig, Georg Hager, and Gerhard Wellein. 2010. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures. San Diego CA.
[63]
Eiji Tsuchida and Masaru Tsukada. 1996. Adaptive finite-element method for electronic-structure calculations. Physical Review B 54, 11 (sep 1996), 7602--7605.
[64]
Eiji Tsuchida and Masaru Tsukada. 1998. Large-scale electronic-structure calculations based on the adaptive finite-element method. Journal of the Physical Society of Japan 67, 11 (1998), 3844--3858.
[65]
Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM 52, 4 (April 2009), 65--76.
[66]
Samuel Webb Williams. 2008. Auto-tuning Performance on Multicore Computers. Ph.D. Dissertation. University of California, Berkeley.
[67]
Toby D. Young, Eloy Romero, and Jose E. Roman. 2013. Parallel finite element density functional computations exploiting grid refinement and subspace recycling. In Computer Physics Communications. IEEE, 66--72.
[68]
Dier Zhang, Lihua Shen, Aihui Zhou, and Xin-Gao Gong. 2008. Finite element method for solving Kohn--Sham equations based on self-adaptive tetrahedral mesh. Physics Letters A 372, 30 (2008), 5071--5076.

Cited By

View all
  • (2024)Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104925192:COnline publication date: 1-Oct-2024
  • (2024)On the construction of an efficient finite-element solver for phase-field simulations of many-particle solid-state-sintering processesComputational Materials Science10.1016/j.commatsci.2023.112589231(112589)Online publication date: Jan-2024
  • (2020)ExaDG: High-Order Discontinuous Galerkin for the Exa-ScaleSoftware for Exascale Computing - SPPEXA 2016-201910.1007/978-3-030-47956-5_8(189-224)Online publication date: 31-Jul-2020

Index Terms

  1. Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Parallel Computing
      ACM Transactions on Parallel Computing  Volume 7, Issue 3
      Special Issue on PPoPP 2017 (Part 2) and Regular Papers
      September 2020
      182 pages
      ISSN:2329-4949
      EISSN:2329-4957
      DOI:10.1145/3407694
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 June 2020
      Accepted: 01 April 2020
      Revised: 01 January 2020
      Received: 01 July 2019
      Published in TOPC Volume 7, Issue 3

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Finite element method
      2. density functional theory
      3. matrix-free method

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • German Research Foundation (Deutsche Forschungsgemeinschaft, DFG)
      • Bayerisches Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR)

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)20
      • Downloads (Last 6 weeks)3
      Reflects downloads up to 01 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Fast hardware-aware matrix-free algorithms for higher-order finite-element discretized matrix multivector products on distributed systemsJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104925192:COnline publication date: 1-Oct-2024
      • (2024)On the construction of an efficient finite-element solver for phase-field simulations of many-particle solid-state-sintering processesComputational Materials Science10.1016/j.commatsci.2023.112589231(112589)Online publication date: Jan-2024
      • (2020)ExaDG: High-Order Discontinuous Galerkin for the Exa-ScaleSoftware for Exascale Computing - SPPEXA 2016-201910.1007/978-3-030-47956-5_8(189-224)Online publication date: 31-Jul-2020

      View Options

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format.

      HTML Format

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media