Computer Science, 2022, Vol. 49, Issue (6): 73-80. doi: 10.11896/jsjkx.210900045
YE Yue-jin1, LI Fang1, CHEN De-xun2, GUO Heng2, CHEN Xin1
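Abstract: Handling the discrete (irregular) memory accesses of unstructured grids efficiently has long been one of the central concerns in parallel algorithms and applications for scientific and engineering computing. A distributed block-reconnection optimization algorithm, designed for the domestic Sunway heterogeneous many-core architecture, consistently maintains high computational performance on the unstructured sparse problems that arise in applications. An in-depth analysis of the on-chip communication mechanism of the many-core architecture guides the design of an efficient message-grouping strategy that raises bandwidth utilization across the on-chip array of slave cores, and a barrier-free data distribution algorithm is combined with it to exploit the full performance of the architecture's on-chip network. A performance model and experimental tests show that, across different memory-access patterns, the algorithm reaches on average more than 70% of the theoretical memory bandwidth and delivers an average speedup of 10x, and a maximum of 45x, over the serial algorithm on the master core. Tests on applications from several different domains also demonstrate the generality of the algorithm.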
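The abstract does not give the details of the block-reconnection or message-grouping schemes, so the following is only a minimal Python sketch of the general idea behind them: irregular scatter updates are first grouped by the block that owns the target index, so that each block then receives its updates together. The function names, the `block_size` parameter, and the use of plain lists are all assumptions made for illustration; this is not the authors' implementation and it does not model the Sunway on-chip network.

```python
from collections import defaultdict


def scatter_add_direct(n, indices, values):
    """Reference: apply irregular updates in their original order."""
    out = [0.0] * n
    for i, v in zip(indices, values):
        out[i] += v
    return out


def group_by_block(indices, values, block_size):
    """Bucket each update by the block that owns its target index.

    Hypothetical helper: returns {block_id: [(local_offset, value), ...]}.
    Updates keep their original relative order inside each bucket.
    """
    groups = defaultdict(list)
    for i, v in zip(indices, values):
        groups[i // block_size].append((i % block_size, v))
    return groups


def scatter_add_blocked(n, indices, values, block_size):
    """Apply the same updates, but one owning block at a time."""
    out = [0.0] * n
    groups = group_by_block(indices, values, block_size)
    for block_id in sorted(groups):
        base = block_id * block_size
        # All writes in this loop fall inside one contiguous block.
        for offset, v in groups[block_id]:
            out[base + offset] += v
    return out


if __name__ == "__main__":
    idx = [7, 0, 3, 7, 2, 5]
    val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(scatter_add_blocked(8, idx, val, block_size=4))
```

In this sketch each bucket stands in for a grouped message sent to the core that owns a block; grouping trades a reordering pass for writes that stay within one contiguous region at a time.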