
Computer Science ›› 2022, Vol. 49 ›› Issue (6): 99-107. doi: 10.11896/jsjkx.210400157

• High Performance Computing •

Parallel Optimization Method of Unstructured-grid Computing in CFD for Domestic Heterogeneous Many-core Architecture

CHEN Xin1, LI Fang1, DING Hai-xin2, SUN Wei-ze1, LIU Xin1, CHEN De-xun1, YE Yue-jin1, HE Xiang1

  1. 1 National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214000, China
    2 China Aerodynamics Research and Development Center, Mianyang, Sichuan 621000, China
  • Received: 2021-04-15 Revised: 2021-07-15 Online: 2022-06-15 Published: 2022-06-08
  • Corresponding author: LI Fang (lifang56@163.com)
  • About author: CHEN Xin (ischen.xin@foxmail.com), born in 1994, master, research assistant. His main research interests include computational fluid dynamics and high-performance parallel computation and application.
    LI Fang, born in 1980, Ph.D., associate researcher. Her main research interests include computational fluid dynamics and high-performance parallel computation and application.
  • Supported by:
    National Key Research and Development Project of China (2016YFB0201100) and National Science and Technology Major Project (2017-I-0004-0004).

Abstract: Sunway TaihuLight ranked first on the global TOP500 supercomputer list from 2016 to 2018, with a peak performance of 125.4 PFlops. Its computing power is mainly attributed to the domestic SW26010 many-core RISC processor. Unstructured-grid computing in CFD has long been difficult to port and optimize on domestic many-core supercomputers because of its complex topology, severe discrete memory access, and strongly dependent linearized-equation solves. To fully exploit the computing efficiency of the domestic heterogeneous many-core architecture, a data reconstruction model is first proposed that improves data locality and parallelism, so that the data structure better suits the characteristics of the many-core architecture. Second, to address the discrete memory access caused by the unordered storage of unstructured-grid data, a discrete memory access optimization method based on prestored relation information is proposed, which transforms discrete memory accesses into continuous ones. Finally, for the strongly dependent linearized-equation solve, pipeline parallelism across the slave-core array is introduced to achieve many-core parallelism. Experiments show that, after optimization, the overall performance of unstructured-grid computing in CFD improves by a factor of 4.19 over the original version and is 1.2x faster than a general-purpose CPU. The code scales to 624,000 computing cores while maintaining a parallel efficiency of 64.5%.

Key words: Computational fluid dynamics, Heterogeneous many-core, Parallel computing, Sunway supercomputer, Unstructured-grid

CLC Number: 

  • TP311
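
The second optimization named in the abstract, turning discrete memory accesses into continuous ones by prestoring relation information, can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not detail. The face-based mesh layout, the names `owner`, `neighbor`, `build_gather_plan`, `pack`, and the simple difference used as a stand-in "flux" are all assumptions made for illustration. The idea shown is only the general one: the face-to-cell relation is stored once, cell data is gathered into a face-ordered contiguous buffer, and the compute kernel then reads unit-stride data.

```python
# Illustrative sketch only; all names and the mesh layout are hypothetical.
# Each face of an unstructured mesh connects an "owner" cell and a "neighbor" cell.


def flux_scattered(owner, neighbor, cell_value):
    """Baseline: every face reads its two cells through indirect (discrete) indexing."""
    return [cell_value[n] - cell_value[o] for o, n in zip(owner, neighbor)]


def build_gather_plan(owner, neighbor):
    """Done once per mesh: prestore the face-to-cell relation as a flat gather list."""
    return list(owner) + list(neighbor)


def pack(plan, cell_value):
    """Gather cell data into one contiguous buffer laid out in face order."""
    return [cell_value[i] for i in plan]


def flux_contiguous(packed, n_faces):
    """Compute kernel: reads only contiguous, unit-stride data."""
    left = packed[:n_faces]    # owner-side values, in face order
    right = packed[n_faces:]   # neighbor-side values, in face order
    return [r - l for l, r in zip(left, right)]


# Tiny hypothetical mesh: 4 cells, 4 interior faces.
owner = [0, 0, 1, 2]
neighbor = [1, 2, 3, 3]
cell_value = [1.0, 4.0, 9.0, 16.0]

plan = build_gather_plan(owner, neighbor)   # reused every time step
packed = pack(plan, cell_value)
result = flux_contiguous(packed, len(owner))
print(result)  # [3.0, 8.0, 12.0, 7.0]
```

In this sketch the gather plan depends only on the mesh topology, so its cost is paid once and reused across iterations. On an architecture with small per-core local memory, a contiguous buffer of this kind is also the natural unit for bulk transfers, although how the paper maps this onto the SW26010 is not stated in the abstract.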