Parallelizing Multigrid Application Using Data-driven Programming Model

Computer Science ›› 2020, Vol. 47 ›› Issue (8): 32-40. doi: 10.11896/jsjkx.200500093

Special topic: High Performance Computing

GUO Jie¹, GAO Xi-ran², CHEN Li², FU You¹, LIU Ying²

1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
2 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Online: 2020-08-15 Published: 2020-08-10
Corresponding author: CHEN Li (lchen@ict.ac.cn)
Author contact: 17854258663@163.com
About author: GUO Jie, born in 1996, postgraduate. His main research interests include parallel optimization and parallel compilation.
CHEN Li, born in 1970, Ph.D., associate professor, is a member of China Computer Federation. Her main research interests include parallel programming languages and parallelizing compiling techniques.
Supported by:
This work was supported by the National Natural Science Foundation of China (61521092), the National Key R&D Program of China (2016YFB0200803) and the Key R&D Project of Shandong Province (2019GGX101066).

Abstract

Abstract: Multigrid is an important family of algorithms for accelerating the convergence of iterative solvers for linear systems, and it is widely used in large-scale scientific computing. In recent years, large-scale parallel systems have evolved toward multi-core and heterogeneous many-core nodes, and legacy multigrid applications urgently need to be ported to these new platforms. This paper uses AceMesh, a data-driven task-parallel directive language, to port the legacy NAS MG program to two domestic supercomputers with different architectures, Tianhe-2 and Sunway TaihuLight. It shows how to taskify computation loops and communication code in AceMesh, and demonstrates the performance portability of the language across platforms. The paper qualitatively analyzes the characteristics of the application's task graph and of its computation-communication overlap, and compares the AceMesh versions with the existing MPI/OpenMP and MPI/OpenACC programming models on the two platforms. Experimental results show that, compared with these traditional programming models, the AceMesh versions achieve speedups of up to 1.19X on Sunway TaihuLight and 1.85X on Tianhe-2. The analysis attributes the improvements to two factors: memory-access optimization and the overlap of communication with computation. Finally, in view of the communication characteristics at different grid levels and the fact that serialized communication leaves a large share of communication unhidden, future research directions are proposed.

Key words: Computation-communication overlap, Data-driven task parallel programming model, Heterogeneous many-core, MPI legacy application, Multigrid
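
As an illustration of the data-driven task model and the computation-communication overlap described in the abstract, the following is a minimal Python sketch. It is not AceMesh code and does not reproduce the AceMesh API; the class `TaskGraph`, its methods, and all task and data names are hypothetical, chosen only to show how dependences can be derived from declared data accesses so that interior computation need not wait for a halo exchange.

```python
from collections import defaultdict


class TaskGraph:
    """Toy data-driven task graph (hypothetical, not the AceMesh API).

    Dependences are inferred from the data each task declares it reads/writes.
    """

    def __init__(self):
        self.tasks = []               # list of (name, reads, writes)
        self.deps = defaultdict(set)  # task index -> indices it must wait for

    def add_task(self, name, reads=(), writes=()):
        idx = len(self.tasks)
        reads, writes = set(reads), set(writes)
        for prev, (_, p_reads, p_writes) in enumerate(self.tasks):
            # A read-after-write, write-after-write or write-after-read
            # conflict on shared data orders the new task after the old one.
            if (p_writes & (reads | writes)) or (p_reads & writes):
                self.deps[idx].add(prev)
        self.tasks.append((name, reads, writes))
        return idx

    def waves(self):
        """Group tasks into waves; tasks in the same wave are independent."""
        level = {}
        for idx in range(len(self.tasks)):
            level[idx] = 1 + max((level[d] for d in self.deps[idx]), default=-1)
        grouped = defaultdict(list)
        for idx, lv in level.items():
            grouped[lv].append(self.tasks[idx][0])
        return [grouped[lv] for lv in sorted(grouped)]


# One smoothing step on a subdomain (all names are illustrative assumptions).
g = TaskGraph()
g.add_task("pack_and_send_halo", reads=["u_boundary"], writes=["send_buf"])
g.add_task("recv_halo", writes=["halo"])
g.add_task("smooth_interior", reads=["u_interior"], writes=["v_interior"])
g.add_task("smooth_boundary", reads=["u_boundary", "halo"], writes=["v_boundary"])

for wave in g.waves():
    print(wave)
```

In this sketch only `smooth_boundary` depends on `recv_halo`, so `smooth_interior` lands in the same wave as the communication tasks. That is the kind of overlap a data-driven runtime can exploit without the programmer ordering the tasks by hand.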

CLC Number:

TP311

GUO Jie, GAO Xi-ran, CHEN Li, FU You, LIU Ying. Parallelizing Multigrid Application Using Data-driven Programming Model[J]. Computer Science, 2020, 47(8): 32-40. https://doi.org/10.11896/jsjkx.200500093
