A GPU-Accelerated Method for 3D Nonlinear Kelvin Ship Wake Patterns Simulation
Figure 1. Flow field diagram.
Figure 2. Calculation flow chart of the banded preconditioner JFNK method.
Figure 3. The distribution of eigenvalues of J_t and J_t P⁻¹ on a 31 × 11 mesh.
Figure 4. Construction of the banded preconditioner; the area marked by red lines indicates the bandwidth.
Figure 5. The distribution of eigenvalues of J_t P⁻¹ on a 31 × 11 mesh for band = 1, band = 11 and band = 21.
Figure 6. Runtime plotted against the bandwidth on a 121 × 41 mesh, band = b′ × (N + 1).
Figure 7. Illustration of the CUDA execution mode and thread organization hierarchy [23].
Figure 8. The computation time distribution of the ship wave solver: the letter I denotes inverting the preconditioner matrix, C creating the nonlinear system, B building the preconditioner matrix, S solving the linear equations with the GMRES algorithm, and O the remaining code in the solver.
Figure 9. The computational flow chart of the GPU implementation.
Figure 10. Optimal values of the bandwidth b′ for different mesh sizes.
Figure 11. Comparison of the centerline profiles computed by the CPU solver and the GPU solver on a 361 × 121 mesh with Δx = 0.3, Δy = 0.3, F = 0.7 and ϵ = 0.4. The solid line shows the GPU solver result; the solid circles show the CPU solver result.
Figure 12. Runtime of the GPU solver and the CPU solver at different mesh sizes; red bars show the GPU solver results and blue bars the CPU solver results.
Figure 13. Wake patterns of real ship waves and the corresponding GPU solver results. The photograph of a real speedboat wake is from https://www.quanjing.com (accessed on 1 September 2023), the real fishing ship wake from https://www.shutterstock.com (accessed on 1 September 2023), and the real large vessel wake from https://blogs.worldbank.org (accessed on 6 September 2023).
Abstract
1. Introduction
2. Numerical Model
3. Banded Preconditioner JFNK Algorithm
3.1. Jacobian-Free Newton–Krylov Method
3.2. Banded Preconditioner Method
3.2.1. Building Preconditioner Matrix
3.2.2. Preconditioner Factorisation and Storage
3.2.3. The Banded Preconditioner
4. GPU Parallel Computing Framework
4.1. Parallel Computing Framework Design
- Step 1:
- Input the calculation parameters, including the initial guess; the data are transferred from the CPU to the GPU;
- Step 2:
- According to the calculation parameters, the nonlinear system of equations is created on the GPU device;
- Step 3:
- The banded preconditioner method is applied to build the banded preconditioner matrix on the GPU;
- Step 4:
- The QR decomposition algorithm is used to invert the preconditioner matrix; the decomposition results are stored outside the loop body so that the QR decomposition of the preconditioner is not repeated;
- Step 5:
- The preconditioned vector P⁻¹w is computed directly from the stored QR factors; combining it with the current approximate solution of the nonlinear equations, the finite-difference approximation of the Jacobian-vector product is carried out to obtain the linear system (a sketch of this product is given after this list);
- Step 6:
- The GMRES algorithm is used to solve the linear system, obtain the correction values and update the approximate solution;
- Step 7:
- Check the approximate solution of the nonlinear equations: if the accuracy requirement is not met, return to Step 5; if it is met, the result is transferred from the GPU back to the CPU.
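For reference, the matrix-free product used in Steps 5 and 6 can be written out explicitly. The relations below are the standard right-preconditioned JFNK formulation in our own notation (the perturbation size σ is an assumption and is distinct from the nonlinearity parameter ϵ of the wave model); they are a sketch, not a reproduction of the paper's equations.

```latex
% Right-preconditioned Newton step: GMRES solves for w, then the solution is updated.
\[
  \mathbf{J}(\mathbf{u}_k)\,\mathbf{P}^{-1}\mathbf{w} = -\mathbf{F}(\mathbf{u}_k),
  \qquad
  \delta\mathbf{u} = \mathbf{P}^{-1}\mathbf{w},
  \qquad
  \mathbf{u}_{k+1} = \mathbf{u}_k + \delta\mathbf{u}.
\]
% The Jacobian-vector product is never formed explicitly; it is approximated by a
% first-order finite difference of the residual, reusing the stored QR factors of P:
\[
  \mathbf{J}(\mathbf{u}_k)\,\mathbf{P}^{-1}\mathbf{w}
  \;\approx\;
  \frac{\mathbf{F}\!\left(\mathbf{u}_k + \sigma\,\mathbf{P}^{-1}\mathbf{w}\right) - \mathbf{F}(\mathbf{u}_k)}{\sigma}.
\]
```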
4.2. GPU Solver Implementation
4.2.1. Creating Nonlinear System
Algorithm 1. Creating the nonlinear system on the GPU.
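The listing of Algorithm 1 is not reproduced in this extract. As a minimal, compilable sketch of the one-thread-per-equation pattern it describes, the CUDA kernel below evaluates a residual vector F(u) in parallel; the device function residualStandIn is a hypothetical placeholder (a simple nonlinear stencil), not the free-surface equations of the ship-wake model, and the names and launch configuration are assumptions.

```cuda
// Sketch only: NOT the paper's Algorithm 1. One CUDA thread evaluates one residual
// component F_i(u). residualStandIn() is a placeholder nonlinear stencil standing in
// for the discretised free-surface/boundary-integral equations.
__device__ double residualStandIn(int i, const double* u, int Nx, int Ny,
                                  double dx, double dy)
{
    int ix = i % Nx, iy = i / Nx;
    if (ix == 0 || iy == 0 || ix == Nx - 1 || iy == Ny - 1)
        return u[i];                                   // placeholder boundary condition
    double uxx = (u[i - 1]  - 2.0 * u[i] + u[i + 1])  / (dx * dx);
    double uyy = (u[i - Nx] - 2.0 * u[i] + u[i + Nx]) / (dy * dy);
    return uxx + uyy + u[i] * u[i];                    // placeholder nonlinearity
}

__global__ void createNonlinearSystem(const double* u, double* F,
                                      int Nx, int Ny, double dx, double dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // one thread per equation
    if (i < Nx * Ny)
        F[i] = residualStandIn(i, u, Nx, Ny, dx, dy);
}

// Typical launch, one thread per mesh node:
//   int nEq = Nx * Ny;
//   createNonlinearSystem<<<(nEq + 255) / 256, 256>>>(d_u, d_F, Nx, Ny, dx, dy);
```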
4.2.2. Building Preconditioner Matrix
Algorithm 2. Building the preconditioner matrix on the GPU.
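Again as a sketch rather than the paper's Algorithm 2: the banded preconditioner P keeps only the Jacobian entries within the prescribed band, and each retained entry can be approximated by a finite difference of the residual. The kernel below fills such a band directly into CSR storage; residualStandInPerturbed reuses the placeholder residual from the previous sketch, F0 is the unperturbed residual, and the row pointers and column indices of the band are assumed to be precomputed for the chosen bandwidth.

```cuda
// Sketch only: NOT the paper's Algorithm 2. One thread per matrix row fills the banded
// Jacobian approximation P_ij ≈ (F_i(u + σ e_j) − F_i(u)) / σ for |i − j| ≤ band.
__device__ double perturbed(const double* u, int k, int j, double sigma)
{
    // Entry k of the perturbed vector u + sigma * e_j, without copying u.
    return u[k] + (k == j ? sigma : 0.0);
}

__device__ double residualStandInPerturbed(int i, const double* u, int j, double sigma,
                                           int Nx, int Ny, double dx, double dy)
{
    int ix = i % Nx, iy = i / Nx;
    if (ix == 0 || iy == 0 || ix == Nx - 1 || iy == Ny - 1)
        return perturbed(u, i, j, sigma);              // placeholder boundary condition
    double c   = perturbed(u, i, j, sigma);
    double uxx = (perturbed(u, i - 1,  j, sigma) - 2.0 * c + perturbed(u, i + 1,  j, sigma)) / (dx * dx);
    double uyy = (perturbed(u, i - Nx, j, sigma) - 2.0 * c + perturbed(u, i + Nx, j, sigma)) / (dy * dy);
    return uxx + uyy + c * c;                          // placeholder nonlinearity
}

__global__ void buildBandedPreconditioner(const double* u, const double* F0,
                                          double* csrVal, const int* csrRowPtr,
                                          const int* csrColInd, int nEq, double sigma,
                                          int Nx, int Ny, double dx, double dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // one thread per row of P
    if (i >= nEq) return;
    for (int p = csrRowPtr[i]; p < csrRowPtr[i + 1]; ++p) {
        int j = csrColInd[p];                          // column index inside the band
        double Fi = residualStandInPerturbed(i, u, j, sigma, Nx, Ny, dx, dy);
        csrVal[p] = (Fi - F0[i]) / sigma;              // finite-difference entry of J
    }
}
```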
4.2.3. Inverting Preconditioner Matrix
- Step 1:
- The CSR data format is used to store the preconditioner matrix with an appropriate bandwidth;
- Step 2:
- In the analysis stage, the cusolverSpXcsrqrAnalysis() function is used to analyze the sparsity patterns of the orthogonal and upper triangular matrices in the QR decomposition. This process may consume a large amount of memory; if the memory is insufficient to complete the analysis, the program stops running and returns the corresponding error message;
- Step 3:
- In the preparation stage, the cusolverSpDcsrqrBufferInfo() function is used to determine the computing space required for the QR decomposition. Two memory blocks are prepared on the GPU: one to store the orthogonal and upper triangular matrices, and the other to perform the QR decomposition;
- Step 4:
- The cusolverSpDcsrqrSetup() function is called to allocate storage space for the orthogonal and upper triangular matrices based on the results of the preparation stage. Then, the cusolverSpDcsrqrFactor() function is used to complete the QR decomposition of the coefficient matrix outside the loop;
- Step 5:
- The cusolverSpDcsrqrZeroPivot() function is used to check the singularity of the decomposition result; if the matrix is nearly singular, the program terminates with an error, and the procedure returns to Step 1 to choose the bandwidth again;
- Step 6:
- In the loop body, the cusolverSpDcsrqrSolve() function is called repeatedly, and the solution of the linear equations is obtained directly from the decomposition results stored in the GPU (a condensed sketch of this call sequence is given after this list).
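A condensed host-side sketch of Steps 2-6 follows. It uses the low-level csrqr API from cusolverSp_LOWLEVEL_PREVIEW.h as shipped with the CUDA samples; the wrapper name, tolerance and exact argument lists are written from memory and should be treated as assumptions to be checked against the CUDA Toolkit documentation. Error checking is omitted for brevity.

```cpp
#include <cuda_runtime.h>
#include <cusolverSp.h>
#include <cusolverSp_LOWLEVEL_PREVIEW.h>

// Sketch of the csrqr call sequence (device-pointer variants). The banded
// preconditioner is already on the GPU in CSR form (d_csrVal, d_csrRowPtr,
// d_csrColInd); d_b and d_x are the right-hand side and solution vectors.
void bandedQrSetupAndSolve(cusolverSpHandle_t handle, cusparseMatDescr_t descr,
                           int n, int nnz, const double* d_csrVal,
                           const int* d_csrRowPtr, const int* d_csrColInd,
                           double* d_b, double* d_x)
{
    csrqrInfo_t info = nullptr;
    cusolverSpCreateCsrqrInfo(&info);

    // Step 2: analyse the sparsity structure of Q and R (can be memory hungry).
    cusolverSpXcsrqrAnalysis(handle, n, n, nnz, descr, d_csrRowPtr, d_csrColInd, info);

    // Step 3: query the workspace sizes and allocate the factorisation buffer.
    size_t internalBytes = 0, workspaceBytes = 0;
    cusolverSpDcsrqrBufferInfo(handle, n, n, nnz, descr, d_csrVal, d_csrRowPtr,
                               d_csrColInd, info, &internalBytes, &workspaceBytes);
    void* d_work = nullptr;
    cudaMalloc(&d_work, workspaceBytes);

    // Step 4: set up internal storage and factorise once, outside the JFNK loop.
    cusolverSpDcsrqrSetup(handle, n, n, nnz, descr, d_csrVal, d_csrRowPtr,
                          d_csrColInd, 0.0, info);
    cusolverSpDcsrqrFactor(handle, n, n, nnz, nullptr, nullptr, info, d_work);

    // Step 5: check for (near-)singularity; a larger bandwidth is needed if this fails.
    int singularity = -1;
    cusolverSpDcsrqrZeroPivot(handle, info, 1e-14, &singularity);

    // Step 6: inside the loop, reuse the stored factors for every new right-hand side.
    cusolverSpDcsrqrSolve(handle, n, n, d_b, d_x, info, d_work);

    cudaFree(d_work);
    cusolverSpDestroyCsrqrInfo(info);
}
```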
4.2.4. Solving Linear Equations by GMRES Algorithm
5. Numerical Simulations and Discussion
5.1. Verification of the Banded Preconditioner JFNK Method
5.2. Verification of the GPU Solver
5.2.1. Accuracy
5.2.2. Efficiency
5.2.3. Capability
6. Conclusions
- (1)
- The bandwidth affects both the running memory and the runtime of the GPU solver. For each mesh size there is a most appropriate bandwidth, and choosing it saves more than 66% of GPU memory.
- (2)
- The GPU solver obtains an accurate numerical solution. The mean square error between the GPU solver results and the CPU solver results is acceptably small (its definition is sketched after this list).
- (3)
- By designing the GPU parallel computing framework, the computation of the ship wake simulation is accelerated by up to 20 times.
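The mean square error quoted in conclusion (2) is understood in the usual sense; the definition below is a sketch in our own notation (the symbols ζ and N are assumptions, not reproduced from the paper).

```latex
\[
  \mathrm{MSE} \;=\; \frac{1}{N}\sum_{i=1}^{N}
    \left(\zeta_i^{\mathrm{GPU}} - \zeta_i^{\mathrm{CPU}}\right)^{2},
\]
% where \zeta_i^{GPU} and \zeta_i^{CPU} are the free-surface elevations computed by the
% GPU and CPU solvers at the N compared mesh points (e.g., along the centerline profile).
```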
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
JFNK | Jacobian-free Newton–Krylov |
GMRES | Generalized Minimal Residual |
CUDA | Compute Unified Device Architecture |
GPU | Graphics Processing Unit |
CPU | Central Processing Unit |
MSE | Mean Square Error |
References
1. Dias, F. Ship Waves and Kelvin. J. Fluid Mech. 2014, 746, 1–4.
2. Tuck, E.; Scullen, D. A comparison of linear and nonlinear computations of waves made by slender submerged bodies. J. Eng. Math. 2002, 42, 255–264.
3. Froude, W. Experiments upon the Effect Produced on the Wave-Making Resistance of Ships by Length of Parallel Middle Body; Institution of Naval Architects: London, UK, 1877.
4. Kelvin, L. On Ship Waves. Proc. Inst. Mech. Eng. 1887, 38, 409–434.
5. Rabaud, M.; Moisy, F. Ship Wakes: Kelvin or Mach Angle? Phys. Rev. Lett. 2013, 110, 214503.1–214503.5.
6. Pethiyagoda, R.; Moroney, T.; Lustri, C.; McCue, S. Kelvin Wake Pattern at Small Froude Numbers. J. Fluid Mech. 2021, 915, A126.
7. Ma, C.; Zhu, Y.; Wu, H.; He, J.; Zhang, C.; Li, W.; Noblesse, F. Wavelengths of the Highest Waves Created by Fast Monohull Ships or Catamarans. Ocean Eng. 2016, 113, 208–214.
8. Havelock, T. Wave resistance: Some cases of three-dimensional fluid motion. Proc. R. Soc. Lond. Ser. A Contain. Pap. Math. Phys. Character 1919, 95, 354–365.
9. Michell, J.H. The wave resistance of a ship. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1898, 45, 106–123.
10. Forbes, L. An algorithm for 3-dimensional free-surface problems in hydrodynamics. J. Comput. Phys. 1989, 82, 330–347.
11. Parau, E.; Vanden-Broeck, J. Three-dimensional waves beneath an ice sheet due to a steadily moving pressure. Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 2011, 369, 2973–2988.
12. Sun, X.; Cai, M.; Wang, J.; Liu, C. Numerical Simulation of the Kelvin Wake Patterns. Appl. Sci. 2022, 12, 6265.
13. Crespo, A.; Domínguez, J.; Barreiro, A.; Gómez-Gesteira, M.; Rogers, B. GPUs, a New Tool of Acceleration in CFD: Efficiency and Reliability on Smoothed Particle Hydrodynamics. PLoS ONE 2011, 6, e20685.
14. Hori, C.; Gotoh, H.; Ikari, H.; Khayyer, A. GPU-Acceleration for Moving Particle Semi-Implicit Method. Comput. Fluids 2011, 51, 174–183.
15. Pethiyagoda, R. Mathematical and Computational Analysis of Kelvin Ship Wave Patterns. Ph.D. Thesis, Queensland University of Technology, Brisbane, QLD, Australia, 2016.
16. Lu, X.; Dao, M.H.; Le, Q.T. A GPU-accelerated domain decomposition method for numerical analysis of nonlinear waves-current-structure interactions. Ocean Eng. 2022, 259, 111901.
17. Xie, F.; Zhao, W.; Wan, D. CFD Simulations of Three-Dimensional Violent Sloshing Flows in Tanks Based on MPS and GPU. J. Hydrodyn. 2020, 32, 672–683.
18. Saad, Y.; Schultz, M. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput. 1986, 7, 856–869.
19. Brown, P.; Saad, Y. Hybrid Krylov Methods for Nonlinear Systems of Equations. SIAM J. Sci. Stat. Comput. 1990, 11, 450–481.
20. Dembo, R.; Eisenstat, S.; Steihaug, T. Inexact Newton Methods. SIAM J. Numer. Anal. 1982, 19, 400–408.
21. Trefethen, L.; Bau, D. Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
22. Lustri, C.J.; Chapman, S.J. Steady Gravity Waves Due to a Submerged Source. J. Fluid Mech. 2013, 732, 400–408.
23. NVIDIA. CUDA Toolkit Documentation v11.7.1; NVIDIA: Santa Clara, CA, USA, 2022.
24. Grossman, M.; Mckercher, T. Professional CUDA C Programming; China Machine Press: Beijing, China, 2017.
| No. | Function Name | Goal |
|---|---|---|
| 1 | cusolverSpXcsrqrAnalysisHost() | Analyze structure |
| 2 | cusolverSpDcsrqrBufferInfoHost() | Set up workspace |
| 3 | cusolverSpDcsrqrSetupHost() | QR factorization |
| 4 | cusolverSpDcsrqrFactorHost() | QR factorization |
| 5 | cusolverSpDcsrqrZeroPivotHost() | Check singularity |
| 6 | cusolverSpDcsrqrSolveHost() | Solve system |
|  | CPU | GPU |
|---|---|---|
| Card | Intel Xeon Bronze 3204 | NVIDIA Tesla A100 |
| Memory | 64 GB | 40 GB |
| Max cores | 6 per node | 6912 |
| Programming language | C++ | CUDA, C++ |
| Mesh Size | Before | b′ | After | Reduction Ratio |
|---|---|---|---|---|
|  | 0.91 GB | 19 | 0.28 GB | 3.2 |
|  | 2.9 GB | 24 | 0.88 GB | 3.3 |
|  | 6.9 GB | 33 | 2.3 GB | 3.0 |
|  | 15 GB | 38 | 4.6 GB | 3.3 |
| Mesh Size | CPU Solver | Exp. | GPU Solver | Speed-Up Ratio |
|---|---|---|---|---|
|  | s | s | s | 16.1 |
|  | s | s | s | 23.9 |
|  | s | s | s | 16.0 |
|  | s | s | s | 19.3 |
|  | s | s | s | 20.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).