计算机科学 ›› 2022, Vol. 49 ›› Issue (6): 108-118.doi: 10.11896/jsjkx.210300259
汪晋, 刘江
WANG Jin, LIU Jiang
摘要: 在科学计算和工程领域,大型稀疏线性方程组的求解非常常见,目前已经有许多迭代方法和预处理技术被用于求解这类方程。DILU预处理技术类似于ILU,是开源计算流体力学软件OpenFOAM中重要的预处理技术,但未在OpenFOAM以外的领域引起关注,目前也没有完整的GPU实现。比较了DILU和ILU预处理技术对稳定双共轭梯度法(BiCGStab)加速的效果,以及它们在构造预处理子上的开销,结果表明,DILU在加速效果上不逊于ILU且在稳定性上优于ILU。在GPU并行实现方面,DILU可以使用分层并行和无全局同步并行两种并行策略,详细讨论了DILU预处理技术在这两种策略下的实现方法,给出了相关的算法和参考代码,然后比较了在两种并行策略下DILU预处理技术的性能。数值实验结果表明,在实践中两种并行策略各有优劣,可以根据实际表现进行选择。另外比较了GPU和CPU执行的DILU预处理技术,GPU在性能上具有明显优势,在线性方程组求解上存在性能瓶颈的程序可以移植到GPU平台以提升性能。
中图分类号:
[1] SAAD Y.Iterative Methods for Sparse Linear Systems[M/OL].SIAM,2003.https://www-users.cse.umn.edu/~saad/itermethbook_2nded.pdf. [2] GUTKNECHT M H.A brief introduction to Krylov spacemethods for solving linear systems[M]//Frontiers of Computational Science.Berlin:Springer,2007:53-62. [3] LANGR D,TVRDIK P.Evaluation Criteria for Sparse Matrix Storage Formats[J].IEEE Transactions on Parallel and Distri-buted Systems,2016,27(2):428-440. [4] DAVIS T A,RAJAMANICKAM S,SID-LAKHDAR W M.ASurvey of Direct Methods for Sparse Linear Systems[J].Acta Numerica,2016,25:383-566. [5] HACKBUSCH W.Iterative Solution of Large Sparse Systems of Equations (2nd ed)[M].Berlin:Springer,2016. [6] FILIPPONE S,CARDELLINI V,BARBIERI D,et al.SparseMatrix-vector Multiplication on GPGPUs[J].ACM Transactions on Mathematical Software (TOMS),2017,43(4):30. [7] STEINBERGER M,DERLERY A,ZAYER R,et al.How Naive is Naive SpMV on the GPU?[C]//2016 IEEE High Perfor-mance Extreme Computing Conference(HPEC).IEEE,2016:1-8. [8] GAO J,QI P,HE G.Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU[J].Mathematical Problems in Enginee-ring,2016,2016:1-14. [9] BELL N,GARLAND M.Efficient sparse matrix-vector multiplication on CUDA:Nvidia Technical Report:NVR-2008-004[R].Nvidia Corporation,2008. [10] BARRETT R,BERRY M W,CHAN T F,et al.Templates forthe Solution of Linear Systems:Building Blocks for Iterative Methods[M].SIAM Other Titles in Applied Mathematics,1994. [11] BEHRENS T.OpenFOAM’s basic solvers for linear systems of equations[J/OL].Chalmers,Department of Applied Mechanics,2009,18(2).http://www.tfd.chalmers.se/~hani/kurser/os_cfd_2008/timberhrens/tibeh-report-fin.pdf. [12] OpenFOAM Project[EB/OL].https://openfoam.com/. [13] OWENS J D,HOUSTON M,LUEBKE D,et al.GPU computing[J].Proceedings of the IEEE,2008,96(5):879-899. [14] MEIJERINK J A,VAN DER VORST H A.An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix[J].Mathematics of Computation,1977,31(137):148-162. [15] KERSHAW D S.The incomplete Cholesky-conjugate gradient method for the iterative solution of systems of linear equations[J].Journal of Computational Physics,1978,26(1):43-65. [16] SAAD Y.ILUT:A dual threshold incomplete LU factorization[J].Numerical Linear Algebra with Applications,1994,1(4):387-402. [17] POMMERELL C.Solution of large unsymmetric systems of li-near equations[D].ETH Zürich:Swiss Federal Institute of Technology Zurich,1992. [18] ANDERSON E,SAAD Y.Solving Sparse Triangular LinearSystem on Parallel Computers[J].International Journal of High Speed Computing,1989,1(1):73-95. [19] NAUMOV M.Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU:NVIDIA Technical Report:NVR-2011-001[R].2011. [20] LI R,SAAD Y.GPU-accelerated preconditioned iterative linear solvers[J].The Journal of Supercomputing,2013,63(2):443-466. [21] NAUMOV M,CASTONGUAY P,COHEN J.Parallel GraphColoring with Applications to the Incomplete-LU Factorization on the GPU:NVIDIA Technical Report:NVR-2015-001[R].2015. [22] LIU W,LI A,HOGG J,et al.A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves[C]//European Conference on Parallel Processing.Cham:Springer,2016:617-630. [23] LI R.On parallel solution of sparse triangular linear systems in CUDA[J].arXiv:1710.04985,2017. [24] SU J,ZHANG F,LIU W,et al.CapelliniSpTRSV:A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs[C]//49th International Conference on Parallel Processing-ICPP.2020. [25] XIE C,CHEN J,FIROZ J S,et al.Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures[J].arXiv:.06959,2020. [26] ANZT H,RIBIZEL T,FLEGAR G,et al.ParILUT-A Parallel Threshold ILU for GPUs[C]//2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).2019:231-241. [27] CHEN Y,TIAN X,LIU H,et al.Parallel ILU preconditioners in GPU computation[J].Soft Computing,2018,22(24):8187-8205. [28] NAUMOV M.Incomplete-LU and Cholesky preconditioned ite-rative methods using CUSPARSE and CUBLAS[J/OL].Nvidia white paper,2011:1-16.https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.8934&rep=rep1&type=pdf. [29] BNA S,SPISSOA I,OLESENB M,et al.PETSc4FOAM:A Library to plug-in PETSc into the OpenFOAM Framework[J/OL].PRACE White paper,2020.https://prace-ri.eu/wp-content/uploads/WP294-PETSc4FOAM-A-Library-to-plug-in-PETSc-into-the-OpenFOAM-Framework.pdf. [30] Paralution project[EB/OL].http://www.paralution.com. [31] SAAD Y,SCHULTZ M H.GMRES:A generalized minimal residual algorithm for solving nonsymmetric linear systems[J].SIAM Journal on Scientific and Statistical Computing,1986,7(3):856-869. [32] VAN DER VORST H A.Bi-CGSTAB:A fast and smoothlyconverging variant of Bi-CG for the solution of nonsymmetric linear systems[J].SIAM Journal on Scientific and Statistical Computing,1992,13(2):631-644. [33] JENSEN T R,TOFT B.Graph Coloring Problems[M/OL].1995.https://doi.org/10.3929/ethz-a-000669614. [34] SALTZ J H.Aggregation Methods for Solving Sparse Triangular Systems on Multiprocessors[J].SIAM Journal on Scientific and Statistical Computing.1990,11(1):123-144. [35] KNUTH D E.The art of computer programming,volume 3:(2nd ed.) sorting and searching[M].Addison Wesley Longman Publishing Co.,Inc.,1998,17(4):324. [36] CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to algorithms second edition[M].MIT Press,2001. [37] HORN R A,JOHNSON C R.The Hadamard product[M]//Topics in Matrix Analysis:Cambridge University Press,1991:298-381. |
[1] | 侯钰涛, 阿布都克力木·阿布力孜, 哈里旦木·阿布都克里木. 中文预训练模型研究进展 Advances in Chinese Pre-training Models 计算机科学, 2022, 49(7): 148-163. https://doi.org/10.11896/jsjkx.211200018 |
[2] | 宗迪迪, 谢益武. 基于法线迭代的模型中轴生成方法 Model Medial Axis Generation Method Based on Normal Iteration 计算机科学, 2022, 49(6A): 764-770. https://doi.org/10.11896/jsjkx.210400050 |
[3] | 周琴, 罗飞, 丁炜超, 顾春华, 郑帅. 基于逐次超松弛技术的Double Speedy Q-Learning算法 Double Speedy Q-Learning Based on Successive Over Relaxation 计算机科学, 2022, 49(3): 239-245. https://doi.org/10.11896/jsjkx.201200173 |
[4] | 刘江, 刘文博, 张矩. OpenFoam中多面体网格生成的MPI+OpenMP混合并行方法 Hybrid MPI+OpenMP Parallel Method on Polyhedral Grid Generation in OpenFoam 计算机科学, 2022, 49(3): 3-10. https://doi.org/10.11896/jsjkx.210700060 |
[5] | 黄颖琦, 陈红梅. 基于代价敏感卷积神经网络的非平衡问题混合方法 Cost-sensitive Convolutional Neural Network Based Hybrid Method for Imbalanced Data Classification 计算机科学, 2021, 48(9): 77-85. https://doi.org/10.11896/jsjkx.200900013 |
[6] | 李繁, 严星, 张晓宇. 基于GPU的特征脸算法优化研究 Optimization of GPU-based Eigenface Algorithm 计算机科学, 2021, 48(4): 197-204. https://doi.org/10.11896/jsjkx.200600033 |
[7] | 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立. 基于GPU加速的并行WMD算法 Parallel WMD Algorithm Based on GPU Acceleration 计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213 |
[8] | 文敏华, 汪申鹏, 韦建文, 李林颖, 张斌, 林新华. 基于DGX-2的湍流燃烧问题优化研究 DGX-2 Based Optimization of Application for Turbulent Combustion 计算机科学, 2021, 48(12): 43-48. https://doi.org/10.11896/jsjkx.201200129 |
[9] | 汪亮, 周新志, 严华. 基于GPU的实时SIFT算法 Real-time SIFT Algorithm Based on GPU 计算机科学, 2020, 47(8): 105-111. https://doi.org/10.11896/jsjkx.190700036 |
[10] | 倪晓军, 佘戌豪. 面向无线传感网络应用的改进LZW算法 Improvement of LZW Algorithms for Wireless Sensor Networks 计算机科学, 2020, 47(5): 260-264. https://doi.org/10.11896/jsjkx.190400108 |
[11] | 刘世芳, 赵永华, 于天禹, 黄荣锋. 广义稠密对称特征问题标准化算法在GPU集群上的有效实现 Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster 计算机科学, 2020, 47(4): 6-12. https://doi.org/10.11896/jsjkx.191000009 |
[12] | 左宪禹, 张哲, 苏岳瀚, 刘扬, 葛强, 田军锋. 基于GPU多流并发并行模型的NDVI提取算法 Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model 计算机科学, 2020, 47(4): 25-29. https://doi.org/10.11896/jsjkx.190500029 |
[13] | 曹锋,徐扬,钟建,宁欣然. 基于目标演绎距离的一阶逻辑子句集预处理方法 First-order Logic Clause Set Preprocessing Method Based on Goal Deduction Distance 计算机科学, 2020, 47(3): 217-221. https://doi.org/10.11896/jsjkx.190100004 |
[14] | 陈佳,欧阳金源,冯安琪,吴远,钱丽萍. 边缘计算构架下基于孤立森林算法的DoS异常检测 DoS Anomaly Detection Based on Isolation Forest Algorithm Under Edge Computing Framework 计算机科学, 2020, 47(2): 287-293. https://doi.org/10.11896/jsjkx.190100047 |
[15] | 孙士远, 何勇. 自动机终结字查找算法的设计与实现 Designs and Implementations of Algorithms for Searching Terminal Words of Automata 计算机科学, 2020, 47(11A): 599-603. https://doi.org/10.11896/jsjkx.200300096 |
|