Computer Science ›› 2020, Vol. 47 ›› Issue (8): 32-40. doi: 10.11896/jsjkx.200500093
Special topic: High-Performance Computing
郭杰1, 高希然2, 陈莉2, 傅游1, 刘颖2
GUO Jie1, GAO Xi-ran2, CHEN Li2, FU You1, LIU Ying2
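Abstract: Multigrid is an important and widely used technique for accelerating the convergence of iterative methods in numerical computing. In recent years, large-scale parallel computing systems have been moving toward multi-core and heterogeneous many-core architectures, and multigrid applications urgently need to adapt to these new parallel platforms. This paper uses AceMesh, a data-driven task-parallel language, to port the legacy NAS MG program to two domestic supercomputing platforms with different architectures, Tianhe-2 and Sunway TaihuLight. It shows how the language expresses task parallelism for both computation loops and communication code, and verifies the cross-platform performance portability of AceMesh. The paper qualitatively analyzes the application's task-graph characteristics and its computation-communication overlap, compares performance on the two platforms against the existing programming models MPI/OpenMP and MPI/OpenACC, and analyzes how the AceMesh task-graph parallel program improves memory-access performance and communication-computation overlap. Experimental data show that, compared with traditional parallel programming methods, AceMesh achieves maximum speedups of 1.19x on Sunway TaihuLight and 1.85x on Tianhe-2. Finally, the paper proposes future research directions that address the application's communication characteristics at different grid levels and the problem that communication serialization leaves a large amount of communication unhidden.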