
GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

  • Regular Paper

Abstract

Matrix-free solvers for the finite element method (FEM) avoid assembly of the elemental matrices and replace the sparse matrix-vector multiplication required by iterative solution methods with element-level dense matrix-vector products. In this paper, a novel matrix-free strategy for FEM is proposed which computes the element-level matrix-vector product using only the symmetric part of the elemental matrices. The proposed strategy is developed to take advantage of the massive parallelism of the Graphics Processing Unit (GPU). A unique data structure is also introduced which ensures localized and coalesced memory access suitable for a GPU while storing only the symmetric part of the elemental matrices. In addition, the proposed strategy emphasizes efficient use of the register cache, uniform workload distribution, reduced thread synchronization, and sufficient granularity to make the best use of GPU resources. The performance of the proposed strategy is evaluated by solving elasticity and heat conduction problems using a 4-noded quadrilateral element with two degrees of freedom (DOFs) per node and one DOF per node, respectively. The performance is compared with matrix-free solver strategies on GPU from the literature. A maximum speedup of 4.9× is obtained for the elasticity problem and a maximum speedup of 3.2× for the heat conduction problem. Further, the proposed strategy requires less GPU memory than the existing strategies.
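The idea summarized in the abstract can be illustrated with a minimal serial sketch. It is not the authors' GPU implementation or data structure; it only assumes that each symmetric elemental matrix is stored as its packed upper triangle (row-major), and the names `pack_upper`, `sym_matvec`, and `matrix_free_product` are hypothetical. For an element with 8 DOFs (a 4-noded quadrilateral with two DOFs per node), such packing stores 36 of the 64 entries.

```python
def pack_upper(K):
    """Store only the upper triangle (row-major) of a symmetric matrix."""
    n = len(K)
    return [K[i][j] for i in range(n) for j in range(i, n)]


def sym_matvec(packed, x):
    """Compute y = K x using only the packed upper triangle of K."""
    n = len(x)
    y = [0.0] * n
    idx = 0
    for i in range(n):
        y[i] += packed[idx] * x[i]      # diagonal entry K[i][i]
        idx += 1
        for j in range(i + 1, n):
            k_ij = packed[idx]
            y[i] += k_ij * x[j]         # contribution of K[i][j]
            y[j] += k_ij * x[i]         # mirrored contribution of K[j][i]
            idx += 1
    return y


def matrix_free_product(packed_elems, connectivity, x):
    """Global y = K x, accumulated element by element without assembly."""
    y = [0.0] * len(x)
    for packed, dofs in zip(packed_elems, connectivity):
        x_e = [x[d] for d in dofs]      # gather the element's DOF values
        y_e = sym_matvec(packed, x_e)   # element-level dense product
        for d, v in zip(dofs, y_e):     # scatter-add into the global vector
            y[d] += v
    return y


# Illustrative data (an assumption, not from the paper): two 2-DOF elements
# sharing DOF 1, each with the elemental matrix [[1, -1], [-1, 1]].
K_e = [[1.0, -1.0], [-1.0, 1.0]]
packed_elems = [pack_upper(K_e), pack_upper(K_e)]
connectivity = [[0, 1], [1, 2]]
y = matrix_free_product(packed_elems, connectivity, [1.0, 2.0, 4.0])
```

On a GPU the scatter-add step is where concurrent elements can write to the same DOF, which is why a real implementation needs atomics, coloring, or a similar scheme; the sketch sidesteps this by running serially.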



Acknowledgements

The authors are grateful to the SERB, DST for supporting this research under Project SR/FTP/ETA-0008/2014.

Author information

Corresponding author

Correspondence to Deepak Sharma.

About this article

Cite this article

Kiran, U., Gautam, S.S. & Sharma, D. GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices. Computing 102, 1941–1965 (2020). https://doi.org/10.1007/s00607-020-00827-4

  • DOI: https://doi.org/10.1007/s00607-020-00827-4
