Open AccessArticle

A Block Iteration with Parallelization Method for the Greedy Selection in Radial Basis Functions Based Mesh Deformation

Ran Zhao

¹,

Chao Li

^2,*,

Xiaowei Guo

²,

Sijiang Fan

^1,†,

Yi Wang

^1,† and

Canqun Yang

College of Computer, National University of Defense Technology, Changsha 410073, China

State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Appl. Sci. 2019, 9(6), 1141; https://doi.org/10.3390/app9061141

Submission received: 29 January 2019 / Revised: 3 March 2019 / Accepted: 13 March 2019 / Published: 18 March 2019

(This article belongs to the Section Applied Physics General)

Download

Browse Figures

Figure 1
Time allocation of conventional greedy selection in different experiment setups. The specific explanation of the experiment setup in this figure is presented in <a href="#sec4-applsci-09-01141" class="html-sec">Section 4</a>. "> Figure 2
The mesh around two-dimensional undulating fish. "> Figure 3
Three-dimensional fish and initial mesh of its cubic tank. "> Figure 4
The distribution of control points in three-dimensional fish (<math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.0001</mn> </mrow> </semantics></math>). "> Figure 5
Convergence history of boundary errors in three-dimensional fish (<math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.0001</mn> </mrow> </semantics></math>). "> Figure 6
Time comparison of three-dimensional fish based on <a href="#applsci-09-01141-t004" class="html-table">Table 4</a>. "> Figure 7
ONERA M6 wing and its initial surface mesh. "> Figure 8
The distribution of boundary errors on wing surface. "> Figure 9
Time comparison of ONERA M6 wing. "> Figure 10
Speedup ratio and parallel efficiency of ONERA M6 wing. (a) Part d, <math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.0005</mn> </mrow> </semantics></math>. (b) Part e, <math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.0005</mn> </mrow> </semantics></math>. (c) Part d, <math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.00055</mn> </mrow> </semantics></math>. (d) Part e, <math display="inline"><semantics> <mrow> <msub> <mi>ξ</mi> <mrow> <mi>t</mi> <mi>o</mi> <mi>l</mi> </mrow> </msub> <mo>=</mo> <mn>0.00055</mn> </mrow> </semantics></math> "> Figure 11
Three-dimensional Super-cavitating Hydrofoil and its initial surface mesh. "> Figure 12
Time cost and speedup of three-dimensional Super-cavitating Hydrofoil. (a) Time Comparison of three-dimensional Super- cavitating Hydrofoil. (b) Speedup Ratio and Parallel Efficiency of three- dimensional Super-cavitating Hydrofoil. ">

Versions Notes

Abstract

Greedy algorithm is one of the important point selection methods in the radial basis function based mesh deformation. However, in large-scale mesh, the conventional greedy selection will generate expensive time consumption and result in performance penalties. To accelerate the computational procedure of the point selection, a block iteration with parallelization method is proposed in this paper. By the block iteration method, the computational complexities of three steps in the greedy selection are all reduced from

O (n^{3})

O (n^{2})

. In addition, the parallelization of two steps in the greedy selection separates boundary points into sub-cores, efficiently accelerating the procedure. Specifically, three typical models of three-dimensional undulating fish, ONERA M6 wing and three-dimensional Super-cavitating Hydrofoil are taken as the test cases to validate the proposed method and the results show that it improves 17.41 times performance compared to the conventional method.

Keywords:

CFD mesh deformation; RBF interpolation; greedy selection; data reduction; block iteration; parallel computing

1. Introduction

Simulation based on computational fluid dynamics (CFD) is an effective solution for various problems in aerospace engineering and ocean engineering [1], etc. Among methods adopted by CFD, the mesh deformation method plays a significant role. To promote the quality and precision of the grid after mesh deformation operation, researchers exploit two different mesh deformation strategies: connectivity methods and point-by-point methods.

One typical instance of connectivity methods is the spring analogy [2]. It assumes that every edge has all the characteristics of a spring, shaping a mesh connected by springs and it is widely applied in aerodynamic field. Subsequently, Farhat et al. [3] improved its robustness by including the torsional condition. However, in large-scale mesh systems, the spring analogy method will produce an expensive cost due to its requirement for the whole mesh connectivity. The linear elastic analogy [4,5,6], in which the mesh cell is abstracted into an elastic solid, solves the mesh deformation based on the partial differential equation. This method has great robustness in large-scale mesh deformation but still produces a huge expensive cost.

The point-by-point methods mean that each node displaced independently and has no relation with other nodes [7]. This means that the method could provide an identical solution for both structured and unstructured mesh. In the later period of time, Liu et al. [8] proposed a method which propagates deformation from boundary points to internal points by interpolation of barycenter. Witteveen [9] proposed a method that the boundary points could affect the deformation of the internal points by inverse distance weighted interpolation. In addition, Luke et al. [10] decreased computational cost of aforementioned methodology via tree-code optimization. Although the point-by-point methods increase the efficiency of computation, it doesn’t perform well in the complex mesh system and is not able to preserve the mesh orthogonality. Fortunately, one of the point-by-point methods, radial basis function interpolation, does not have these aforementioned problems.

In 2007, Boer et al. [11] put forward the application of radial basis function (RBF) interpolation in mesh deformation. It is a promotion of the original point-by-point methods that ensures the preservation of mesh orthogonality while still maintaining its good adaptability in all mesh types. The main idea is to calculate mesh motion by interpolating the displacements of the boundary points to the internal points. The research by Boer et al. [11], Sheng et al. [12] and Bos et al. [13] showed that different RBFs adopted in the mesh deformation could result in different results. Jakobsson and Amoignon [14] proposed a point coarsening method for gradient-based aerodynamic shape optimization. By conducting a variety of testing cases, they finally gave some suggestions about how to choose these RBFs. However, it is demonstrated that, even using the best radial basis function, the computational cost will still grow rapidly as the mesh scale increases. This is due to the calculation redundancy caused by the interpolation from the unnecessary boundary points.

To eliminate the calculation redundancy, Rendall et al. [15] proposed a greedy algorithm to reduce the boundary points for interpolation and the remaining boundary points are selected as a smaller set of control points. Based on the criterion of displacement error, the greedy algorithm could effectively remove the redundant boundary points that has a smaller impact on displacement interpolation. Meanwhile, it could also preserve the mesh quality as well as the conventional RBF mesh deformation. An important work done by Michler [16] is that he directionalized the point selection by selecting a different set of points in each direction using a greedy selection method. However, it is found that, with the increasing quantity of control points, the computational complexity of most greedy point selection processes is approximately

O (n^{4})

, which significantly influences the computational efficiency of the whole process [17].

To solve this problem, in the last decade, researchers have achieved some effective methods by analysing the concrete process of each separated step in the greedy algorithm. Gillebaart et al. [18] adopted an adaptive method to reduce the frequency of greedy point selection. Skala et al. [19] proposed an incremental approach to effectively accelerate the step of matrix inversion. Based on it, Selim et al. [20] presented an improvement method of lower-upper (LU) decomposition. Afterwards, Fang et al. [21] provided an effective method based on recurrence Choleskey decomposition. In addition, in 2017, Kedward et al. [22] proposed a form of data reduction that still includes all the points. In addition, researchers also consider the feasibility of parallel computation in the greedy algorithm. Gerhold et al. [23] parallelized RBF mesh deformation. Then, Rendall et al. [24] proposed a parallel approach applied in multi-bladed rotors. In addition, Li et al. [17] and Fang et al. [21] proposed parallel computation methods based on the master–slave architecture.

Nevertheless, all of the aforementioned methods could just focus on some specific steps without an overall optimization for the greedy algorithm. Meanwhile, considering the continuity of all steps, only some parts could be well parallelized. Therefore, this paper proposes a method combining block iteration with parallel computing. This method provides an optimization scheme for each step in the point selection procedure and could significantly reduce the time cost for the whole RBF mesh deformation. Specifically, the main dedications are summarized as follows:

A block iteration method is developed by analyzing the mathematical characters of the greedy algorithm. With the application of the block iteration, some specific steps that have the feasibility of iteration could be greatly optimized. The computational complexity could be reduced from $O (n^{3})$ to $O (n^{2})$ .
The parallelization is accomplished by analyzing the data dependency of the whole procedure. Steps that have the parallel feasibility could have good speedups because of the low communication cost.
Our block iteration with parallelization method is firstly validated by three-dimensional undulating fish and ONERA M6 wing which are both $10^{6}$ cells mesh models. To validate the method efficiency of large-scale mesh, we adopt a three-dimensional Super-cavitating Hydrofoil model with 11 million cells. All three of the models could obtain an effective improvement via the proposed method.

The rest of this paper is allocated as follows. Section 2 recalls the methodology of RBF mesh deformation and the conventional greedy algorithm. Section 3 demonstrates the fundamental theory of block iteration and the implementation of parallel computing in detail. In Section 4, it exhibits the experimental results and discusses how the results could be explained by the previous studies. Finally, conclusions are described in Section 5.

2. RBF Mesh Deformation with Greedy Algorithm

The utilization of point-by-point mesh deformation in CFD is divided into two processes. The first process is to solve the predefined equation coefficient of the boundary points by its actual displacement. In addition, the second process is to obtain the estimated displacement of the internal points based on the received coefficient value. This is the update of grid points after boundary motion in one time step. In this section, the formulation of RBF mesh deformation is introduced at first, and then how the greedy algorithm is adopted in the RBF mesh deformation to promote the efficiency will be explained.

2.1. RBF Mesh Deformation

For a grid model, whole points could be divided into two types. One is internal point, which fills the internal space of the grid model, and the other is boundary point, which is on the grid boundary and attaches to the rim of moving object. The RBF mesh deformation interpolation could be interpreted as follows:

s (x) = \sum_{j = 1}^{N_{b}} λ_{j} ϕ (∥ x - x_{b_{j}} ∥),

(1)

where

x

is the coordinate of a random point,

x_{b_{j}}

is the coordinate of the

j^{t h}

boundary point,

N_{b}

is the total quantity of boundary points,

λ_{j}

is the weight coefficient,

ϕ

is the selected radial basis function,

∥ x ∥

is the Euclidean norm of vector

x

and

s (x)

is the estimated displacement of point

x

. We could calculate the displacement of the moving boundary points which trace with the rim of moving object, and the displacements of the static boundary points are vector 0. Therefore, the displacements of the boundary points could be calculated as follows:

s (x_{b}) = Δ x_{b} = Φ_{b, b} λ,

(2)

Φ_{b, b} (i, j) = ϕ (∥ x_{b_{i}} - x_{b_{j}} ∥),

(3)

where

s (x_{b})

and

Δ x_{b}

are the estimated displacements and actual displacements of the boundary points,

Φ_{b, b}

is an

N_{b} \times N_{b}

matrix, which stores the RBF calculations and

λ

is the N-dimensional coefficients vector that contains

λ^{x}

λ^{y}

, and

λ^{z}

values. Thus, we could obtain the value of

λ

λ = Φ_{b, b}^{- 1} Δ x_{b} .

(4)

After calculating the value of

λ

, we could obtain the estimated displacement of the internal points by:

s (x_{i n_{i}}) = \sum_{j = 1}^{N_{b}} λ_{j} ϕ (∥ x_{i n_{i}} - x_{b_{j}} ∥),

(5)

where

x_{i n_{i}}

is the

i_{t h}

internal point and

(s (x_{i n_{i}}))

is the estimated displacement of

x_{i n_{i}}

. Therefore, the matrix formulation could be described as:

Δ x_{i n} = Φ_{i n, b} λ,

(6)

Φ_{i n, b} (i, j) = ϕ (∥ x_{i n_{i}} - x_{b_{j}} ∥),

(7)

where

Φ_{i n, b}

is an

N_{i n} \times N_{b}

matrix, and

N_{i n}

is the total quantity of internal points.

For the formulation above, the radial basis functions

ϕ

possess various expressions. In general, it is divided into two categories: compact functions and global functions as presented in Table 1. Compact functions limit the range of the interpolation with a support radius. Global functions operate on the whole grid internal points. The distinct of the two categories is explained by Boer et al. [11]. According to the research by Bos et al. [13], the radius basis function we have chosen is the thin plate spline (TPS), which generates the highest mesh quality of the update grid.

2.2. Greedy Selection

The large-scale mesh contains huge quantity of boundary points and it results in a long time consumption. In fact, part of the boundary points have essential impact on the displacement of internal points. In other words, redundant points bring the unnecessary time cost. To solve this problem, T.C.S. Rendall proposed the greedy algorithm in RBF mesh deformation [15], and it is presented as the following:

α = Φ_{c, c}^{- 1} Δ x_{c},

(8)

Δ x_{i n} = Φ_{i n, c} α,

(9)

where

Δ x_{c}

is the actual displacement of the boundary point,

Φ_{c, c}

is an

N_{c} \times N_{c}

matrix,

N_{c}

is the total quantity of control points and

Φ_{i n, c}

is an

N_{i n} \times N_{c}

matrix.

The greedy selection is the process of selecting the proper control points. Its purpose is to pick out the representative points that affects the estimated displacements most. The detailed process is as follows:

Choose an arbitrary point from boundary points to initialize the set of control points;
Solve the predefined equation coefficient of control points by Equation (8);
Obtain the estimated displacements of the moving boundary points:

$Δ x_{b}^{*} = Φ_{b, c} α,$

(10)

where $Φ_{b, c}$ is an $N_{b} \times N_{c}$ matrix and $Δ x_{b}^{*}$ is the estimated displacement of boundary points;
Obtain the boundary errors of all the unselected boundary points:

$ε = Δ x_{b}^{*} - Δ x_{b},$

(11)

where $ε$ is the boundary errors;
Compare the largest boundary error with the predefined criterion threshold, and figure out whether it is larger than threshold or not:

$\frac{∥ ε_{m a x} ∥_{2}}{∥ Δ x_{b} ∥_{2}} > ξ_{t o l},$

(12)

where ${∥ x ∥}_{2}$ is the Euclidean norm of $x$ and $ξ_{t o l}$ is the artificially setting error tolerance.
If yes, put the point which have the largest boundary error into the set of control points;
If not, end the selection;
Repeat from Step 2 to Step 6.

Greedy selection effectively reduces the quantity and redundancy of operating-attended boundary points compared with the original RBF mesh deformation. However, this algorithm generates additional time cost for the RBF mesh deformation process. The extra time is mainly spent on the loop of points selection, and the cost will increase exponentially with the growth of control points quantity [17]. To accelerate the this procedure, block iteration method and parallel computing method are proposed respectively in Section 3.

3. Block Iteration with Parallelization

To figure out the procedure in which the additional time cost is generated, we experimented on the conventional greedy selection of RBF mesh deformation in different conditions as presented in Figure 1. Conspicuously, it shows that the time allocation of conventional greedy selection could be divided into six parts as presented in Table 2:

As presented in Figure 1, we find out that part b, d and e consume the majority of the time cost and each of them occupies more than 15% of the whole time. Especially for part b, the whole process costs approximately 50% time on it. Thus, the optimization of part b, d and e could efficiently reduce the time cost of the greedy selection in RBF mesh deformation. The following content of this section introduces the methods of

B l o c k i t e r a t i o n

and

P a r a l l e l c o m p u t i n g

, which could efficiently handle these problems. In addition, the methods from Section 3.1.1, Section 3.1.2, Section 3.1.3 and Section 3.2, respectively, optimize the part a, b, d and e.

3.1. Block Iteration

3.1.1. The Iterative Feasibility of Constructing $Φ_{c, c}$

To find the time reduction methods, we initially dissect the construction process of

Φ_{c, c}

. In terms of Equation (3), matrix

Φ_{c, c}

could be unfolded as follows:

Φ_{c, c} = [\begin{matrix} ϕ (∥ x_{c_{1}} - x_{c_{1}} ∥) & ϕ (∥ x_{c_{1}} - x_{c_{2}} ∥) & ϕ (∥ x_{c_{1}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{c_{1}} - x_{c_{N_{c}}} ∥) \\ ϕ (∥ x_{c_{2}} - x_{c_{1}} ∥) & ϕ (∥ x_{c_{2}} - x_{c_{2}} ∥) & ϕ (∥ x_{c_{2}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{c_{2}} - x_{c_{N_{c}}} ∥) \\ ϕ (∥ x_{c_{3}} - x_{c_{1}} ∥) & ϕ (∥ x_{c_{3}} - x_{c_{2}} ∥) & ϕ (∥ x_{c_{3}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{c_{3}} - x_{c_{N_{c}}} ∥) \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ ϕ (∥ x_{c_{N_{c}}} - x_{c_{1}} ∥) & ϕ (∥ x_{c_{N_{c}}} - x_{c_{2}} ∥) & ϕ (∥ x_{c_{N_{c}}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{c_{N_{c}}} - x_{c_{N_{c}}} ∥) \end{matrix}] .

(13)

According to the characteristics of Euclidean norm,

ϕ (∥ x_{c_{i}} - x_{c_{j}} ∥) = ϕ (∥ x_{c_{j}} - x_{c_{i}} ∥)

. Thus, we could get:

Φ_{c, c} = [\begin{matrix} ϕ (0) & ϕ_{1, 2} & ϕ_{1, 3} & \dots & ϕ_{1, N_{c}} \\ ϕ_{1, 2} & ϕ (0) & ϕ_{2, 3} & \dots & ϕ_{2, N_{c}} \\ ϕ_{1, 3} & ϕ_{2, 3} & ϕ (0) & \dots & ϕ_{3, N_{c}} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ ϕ_{1, N_{c}} & ϕ_{2, N_{c}} & ϕ_{3, N_{c}} & \dots & ϕ (0) \end{matrix}] .

(14)

Equation (14) presents that

Φ_{c, c}

is a real symmetric matrix and its diagonal elements are all equal. From the beginning to the end of greedy selection, each loop step puts one point into the set of control points without any decrease. Therefore, we could establish links between

{(i - 1)}_{t h}

and

i_{t h}

loop:

Φ_{i} = [\begin{matrix} Φ_{i - 1} & r \\ r^{T} & ϕ (0) \end{matrix}],

(15)

where

r = {[ϕ_{1, i} ϕ_{2, i} ϕ_{3, i} \dots ϕ_{i - 1, i}]}^{T}

. Apparently, it establishes a relationship between

Φ_{i - 1}

and

Φ_{i}

. In other words, the construction procedure of

Φ_{c, c}

in greedy selection possesses the fundamental precondition of iteration. For each loop, in the process of constructing

Φ_{i}

, the values needed to be calculated are vector

r

and

ϕ (0)

, instead of constructing an

i \times i

matrix as it is presented in Algorithm 1. In this process, the original computational complexity could be calculated by the following equation:

1^{2} + 2^{2} + 3^{2} + 4^{2} + \dots + N_{c}^{2} = \frac{N_{c} (N_{c} + 1) (2 \times N_{c} + 1)}{6} = O (N_{c}^{3}) .

(16)

The after-optimized computational complexity could be calculated by the following equation:

1 + (2 \times 2 - 1) + (3 \times 2 - 1) + (4 \times 2 - 1) + \dots + (N_{c} \times 2 - 1) = N_{c}^{2} = O (N_{c}^{2}) .

(17)

Therefore, for constructing

Φ_{c, c}

, the computing complexity could be reduced from

O (n^{3})

O (n^{2})

by block iteration.

Algorithm 1 Block iteration and Parallel computing in Greedy selection.

Require:

x_{b}

Δ x_{b}

and

ξ_{t o l}

Ensure:

x_{c}

and

Δ x_{c}

1:: Equally distribute $x_{b}^{(i)}$ to $P_{(i)}$ and put $x_{c}$ in $P_{m a i n}$
2:: $x_{c} = [x_{b_{i}}]$ //Select the initial control point from boundary points randomly
3:: Calculate $Φ_{c, c}$ , $Φ_{c, c}^{- 1}$ and $α$ , and make $Φ_{t e m p 1} = Φ_{c, c}$ , $Φ_{t e m p 1}^{- 1} = Φ_{c, c}^{- 1}$
4:: $P_{m a i n}$ sends $α$ and $x_{c}$ to $P_{(i)}$
5:: Calculate $Φ_{b, c}^{(i)}$ , and make $Φ_{t e m p 2}^{(i)} = Φ_{b, c}^{(i)}$
6:: $Δ x_{b}^{(i) *} = Φ_{b, c}^{(i)} α$
7:: $ε^{(i)} = Δ x_{b}^{(i) *} - Δ x_{b}^{(i)}$
8:: $P_{m a i n}$ receives $ε_{m a x}^{(i)}$ from $P_{(i)}$
9:: $ε_{m a x} = m a x (ε_{m a x}^{(i)})$
10:: for $\frac{∥ ε_{m a x} ∥_{2}}{∥ Δ x_{b} ∥_{2}} > ξ_{t o l}$ do
11:: $x_{c} = [x_{c}, x_{b_{ε_{m a x}}}]$
12:: $Φ_{c, c} = [\begin{matrix} Φ_{t e m p 1} & r r^{T} & ϕ (0) \end{matrix}]$
13:: $Φ_{t e m p 1} = Φ_{c, c}$
14:: $β = - Φ_{t e m p 1}^{- 1} r$ , $b = \frac{1}{ϕ (0) - r^{T} Φ_{t e m p 1}^{- 1} r}$
15:: $Φ_{c, c}^{- 1} = [\begin{matrix} Φ_{t e m p 1}^{- 1} + b β β^{T} & b β \\ b β^{T} & b \end{matrix}]$
16:: $Φ_{t e m p 1}^{- 1} = Φ_{c, c}^{- 1}$
17:: $α = Φ_{c, c}^{- 1} Δ x_{c};$
18:: $P_{m a i n}$ sends $α$ and $x_{c}$ to $P_{(i)}$
19:: Calculate $τ^{(i)}$
20:: $Φ_{b, c}^{(i)} = [\begin{matrix} Φ_{t e m p 2}^{(i)} & τ^{(i)} \end{matrix}]$
21:: $Φ_{t e m p 2}^{(i)} = Φ_{b, c}^{(i)}$
22:: $Δ x_{b}^{(i) *} = Φ_{b, c}^{(i)} α$
23:: $ε^{(i)} = Δ x_{b}^{(i) *} - Δ x_{b}^{(i)}$
24:: $P_{m a i n}$ receives $ε_{m a x}^{(i)}$ from $P_{(i)}$
25:: $ε_{m a x} = m a x (ε_{m a x}^{(i)})$
26:: end for
27:: end Greedy selection

3.1.2. The Iterative Feasibility of Inversion

The inversion of

Φ_{c, c}

occupies almost half of the time cost. Optimizing the inverse process is one of the essential parts of the work in this paper. In Section 3.1.1, we separate

Φ_{i}

into a

2 \times 2

block matrix. According to the research done by Lu et al. [25], we could obtain several effective methods to inverse a

2 \times 2

block matrix. Considering that

Φ_{i}

is a real symmetric matrix, we inverse it as the following:

Φ_{i}^{- 1} = Q_{i} = [\begin{matrix} M_{i - 1} & η \\ η^{T} & p \end{matrix}],

(18)

where

Q_{i}

is an

i \times i

matrix,

M_{i - 1}

is an

(i - 1) \times (i - 1)

matrix,

η

is an

(i - 1)

dimensional vector and p is a real number. In addition, we formulate the equation:

Φ_{i} Q_{i} = [\begin{matrix} Φ_{i - 1} & r \\ r^{T} & ϕ (0) \end{matrix}] [\begin{matrix} M_{i - 1} & η \\ η^{T} & p \end{matrix}] = E_{i},

(19)

where

E_{i}

is an

i \times i

identity matrix. Thus, we could obtain the equation set:

\{\begin{matrix} Φ_{i - 1} M_{i - 1} + r η^{T} = E_{i - 1}, \\ Φ_{i - 1} η + r p = 0, \\ r^{T} M_{i - 1} + ϕ (0) η^{T} = 0^{T}, \\ r^{T} η + ϕ (0) p = 1, \end{matrix}

(20)

where

E_{i - 1}

is an

(i - 1) \times (i - 1)

identity matrix, and

0

is i dimensional zero vector. The solution is:

\{\begin{matrix} p = \frac{1}{ϕ (0) - r^{T} Φ_{i - 1}^{- 1} r}, \\ η = - \frac{Φ_{i - 1}^{- 1} r}{ϕ (0) - r^{T} Φ_{i - 1}^{- 1} r}, \\ M_{i - 1} = \frac{Φ_{i - 1}^{- 1} r {(Φ_{i - 1}^{- 1} r)}^{T}}{ϕ (0) - r^{T}, Φ_{i - 1}^{- 1} r} + Φ_{i - 1}^{- 1} - Φ_{i - 1}^{- 1} η^{T} . \end{matrix}

(21)

In addition, we assume the equation set:

\{\begin{matrix} β = - Φ_{i - 1}^{- 1} r, \\ b = \frac{1}{ϕ (0) - r^{T} Φ_{i - 1}^{- 1} r}, \end{matrix}

(22)

where

β

is an

(i - 1)

dimensional vector and b is a real number. Combining Equations (21) and (22), we obtain the solution:

Φ_{i}^{- 1} = [\begin{matrix} Φ_{i - 1}^{- 1} + b β β^{T} & b β \\ b β^{T} & b \end{matrix}] .

(23)

Equation (23) establishes links between

P h i_{i}^{- 1}

and

Φ_{i - 1}^{- 1}

. Therefore, the process of inversion has the ability of iteration. For each loop, in this process, the values needed to be calculated are

β

, b and

Φ_{i - 1}^{- 1} + b β β^{T}

rather than the inversion process of the whole

i \times i

matrix as it presented in Algorithm 1. In general, inversion adopts the method of Gauss elimination or LU factorization to inverse matrix. The computing complexity of Gauss elimination or LU factorization is approximately

O (n^{3})

[26]. For the block iteration method, the computing complexity decreases to

O (n^{2})

[19].

In addition, the initial matrix of some RBFs such as TPS is

Φ_{0} = [ϕ (0)] = [0]

, and their inverse matrices never exist. Therefore, for these kinds of RBF, at the beginning of the loop, we should inverse the initial

2 \times 2

matrix by the classical methods such as LU factorization.

3.1.3. The Iterative Feasibility of Constructing $Φ_{b, c}$

Similar to Section 3.1.1, combining Equations (7) and (10), matrix

Φ_{b, c}

is presented as the following:

Φ_{b, c} = [\begin{matrix} ϕ (∥ x_{b_{1}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{1}} - x_{c_{2}} ∥) & ϕ (∥ x_{b_{1}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{b_{1}} - x_{c_{N_{c}}} ∥) \\ ϕ (∥ x_{b_{2}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{2}} - x_{c_{2}} ∥) & ϕ (∥ x_{b_{2}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{b_{2}} - x_{c_{N_{c}}} ∥) \\ ϕ (∥ x_{b_{3}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{3}} - x_{c_{2}} ∥) & ϕ (∥ x_{b_{3}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{b_{3}} - x_{c_{N_{c}}} ∥) \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ ϕ (∥ x_{b_{N_{b}}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{N_{b}}} - x_{c_{2}} ∥) & ϕ (∥ x_{b_{N_{b}}} - x_{c_{3}} ∥) & \dots & ϕ (∥ x_{b_{N_{b}}} - x_{c_{N_{c}}} ∥) \end{matrix}] .

(24)

For each loop in the greedy selection, the quantity of boundary points is constant. However, the number of control points adds by one in the

k_{t h}

loop contrasted with the

{(k - 1)}_{t h}

. Therefore, we establish links between

{(k - 1)}_{t h}

and

k_{t h}

loop:

Φ_{k} = [\begin{matrix} Φ_{k - 1} & τ \end{matrix}],

(25)

where

τ = {[ϕ_{b_{1}, c_{k}} ϕ_{b_{2}, c_{k}} ϕ_{b_{3}, c_{k}} \dots ϕ_{b_{N_{b}}, c_{k}}]}^{T}

and

Φ_{k - 1}

is a

N_{b} \times (k - 1)

matrix. Thus, the process of constructing

Φ_{b, c}

has iterative feasibility. For each loop, in the process of constructing

Φ_{k}

, the value that should be calculated is vector

τ

instead of constructing an

k \times k

matrix as it presented in Algorithm 1. In this process, with condition that

N_{b} > N_{c}

, the original computing complexity is:

(\sum_{i = 1}^{N_{c}} i) N_{b} = \frac{N_{c} (N_{c} + 1)}{2} N_{b} \approx \frac{σ}{2} N_{c}^{3},

(26)

where

σ = \frac{N_{b}}{N_{c}}

. In addition, the after-optimizing computing complexity is

N_{b} \times N_{c} = σ N_{c}^{2}

. In other words, the computing complexity of this process decreases from

O (n^{3})

O (n^{3})

3.2. Parallel Computing

3.2.1. The Feasibility of Parallel Computing

To optimize the multiplication process of

Φ_{b, c}

and

α

, we dissect Equation (10):

Δ x_{b}^{*} = Φ_{b, c} α = [\begin{matrix} ϕ (∥ x_{b_{1}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{1}} - x_{c_{2}} ∥) & \dots & ϕ (∥ x_{b_{1}} - x_{c_{N_{c}}} ∥) \\ ϕ (∥ x_{b_{2}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{2}} - x_{c_{2}} ∥) & \dots & ϕ (∥ x_{b_{2}} - x_{c_{N_{c}}} ∥) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ ϕ (∥ x_{b_{N_{b}}} - x_{c_{1}} ∥) & ϕ (∥ x_{b_{N_{b}}} - x_{c_{2}} ∥) & \dots & ϕ (∥ x_{b_{N_{b}}} - x_{c_{N_{c}}} ∥) \end{matrix}] [\begin{matrix} α_{1} \\ α_{2} \\ ⋮ \\ α_{N_{c}} \end{matrix}] .

(27)

For the

i_{t h}

element of

Δ x_{b}^{*}

Δ x_{i}^{*} = (ϕ (∥ x_{b_{i}} - x_{c_{1}} ∥) α_{1} ϕ (∥ x_{b_{i}} - x_{c_{2}} ∥) α_{2} \dots ϕ (∥ x_{b_{i}} - x_{c_{N_{c}}} ∥) α_{N_{c}})^{T} .

(28)

According to the Section 3.1.3 and Equation (28), we figure out that

x_{b_{i}}

and

x_{b_{j}}

Δ x_{i}^{*}

and

Δ x_{j}^{*}

, has no data dependency for each other. Therefore, parallelization could be adopted in the process of constructing

Φ_{b, c}

and multiplying

Φ_{b, c}

and

α

3.2.2. Implementation of Parallelization

Considering that every boundary point is data-independent from the others, all the boundary points are equally distributed to every sub-processor. Thus, for the whole process of greedy selection, Section 3.1.1 and Section 3.1.2 are implemented at the main processor, and Section 3.1.3 and Section 3.2.1 are implemented at the sub-processors as presented in Algorithm 1, where

P_{m a i n}

and

P_{i}

refer to the main processor and sub-processors.

4. Results and Discussion

In this paper, the efficiency of proposed methods are contracted with the conventional greedy selection by three-dimensional undulating fish motion, Office National d’ Etudes et de Recherches Aerospatiales (ONERA) M6 wing deformation and three-dimensional super-cavitating hydrofoil. The time consumption of greedy selection is composed of six parts as presented in Table 2. For testing the parallel computational efficiency, we adopt an high performance computing (HPC) cluster that consists of more than one hundred computing nodes. Every node has an Intel Xeon 2.1 GHz E5-2620 CPU (United States of America) that consists of 12 cores and 16 GB total memory.

4.1. Mesh Quality and Three-Dimensional Undulating Fish

4.1.1. Mesh Quality Results

We define a quality metric for each of the mesh cells to assess the mesh quality results after deformation. Actually, the method we proposed is to improve the computational efficiency and its simulation results should be theoretically the same as the original results. Therefore, we just need to compare whether the results of the block iteration with parallelization method are equal to the original results or not. To make the mesh deformation results more intuitive, we adopt the two-dimensional undulating fish models as showed in Figure 2, in which the mesh is discretized with triangular cells.

Knupp [27] proposed that size and shape are the two main metrics for a triangle cell. The size metric is to judge whether the cells are too large or too small compared to the reference. In addition, the reference is the area of the cell before the deformation. The formulation of the size metric is expressed as:

f_{s i z e} = m i n (γ, \frac{1}{γ}),

(29)

where

γ

is the ratio of the area of the deformed cell to the area of the original cell. The

f_{s i z e} = 1

means that if and only if the physical triangle has the same area as the reference triangle, and the

f_{s i z e} = 0

means if and only if the physical triangle is degenerate.

The shape metric is to analyse distortions with the equilateral triangles reference. For two-dimensional models, a triangular cell contains three nodes that could be coordinated as

(x_{(k)}, y_{(k)})

k = 0, 1, 2

. In addition, the formulation of the

J a c o b i a n

matrix

A^{(k)}

A^{(k)} = [\begin{matrix} x_{k + 1} - x_{k} & x_{k + 2} - x_{k} \\ y_{k + 1} - y_{k} & y_{k + 2} - y_{k} \end{matrix}] .

(30)

Make

a_{i, j}^{(k)}

i, j = 1, 2

be the

i j^{t h}

component of the

k^{t h}

metric tensor. For any k, a node independent shape metric is expressed as:

f_{s h a p e} = \frac{\sqrt{3} ν^{(k)}}{a_{1, 1}^{(k)} + a_{2, 2}^{(k)} - a_{1, 2}^{(k)}},

(31)

where

ν^{(k)} = d e t (A^{(k)})

f_{s h a p e} = 1

means if and only if the physical triangle is equilateral and

f_{s h a p e} = 0

means if and only if the physical triangle is degenerate. In addition, the value of

f_{s h a p e}

is independent from a uniform scaling of the element.

In general, to measure both size and shape simultaneously with different weights, a size-shape quality metric for triangles is adopted in this paper, considering that the shape metric plays a more important role than the size metric:

f_{s i z e - s h a p e} = \sqrt{f_{s i z e}} f_{s h a p e} .

(32)

According to the method above, we test the two-dimensional undulating fish model, controlled by Equation (33). In addition, the results are shown in Table 3.

O r i g i n a l

means the original greedy selection.

B l o c k

means that computation procedure only adopts the block iteration method.

B w i t h P

means both block iteration and parallelization method are adopted with a parallelism increment of 12 cores.

We conducted undulation from

t i m e = 0.00 T

t i m e = 0.50 T

and

t i m e = 1.00 T

. It obviously shows that the proposed methods could obtain the same mesh quality results as the original one. In addition, according to the average

f_{s i z e - s h a p e}

, their mesh cells remain well.

4.1.2. Three-Dimensional Undulating Fish

Three-dimensional undulating fish is shown in Figure 3a. A fish is undulating in the centre of a 20 L × 5 L × 5 L cubic tank as presented in Figure 3b. It consists of 55,879 internal points, 5646 moving boundary points inside the mesh. Kern et al. [28] proposed the outline geometry of the fish. The motion of the three-dimensional fish is controlled by the sinusoidal function referred by Carling et al. [29]:

y (s, t) = 0.125 \frac{s + 0.03125}{1.03125} sin [2 π (s - \frac{t}{T})],

(33)

where y is the displacement of mid-line profile. s is the arc length along the mid-line of the fish body, t is the current time from beginning and T is the undulating period. For the motion, we compare the selection procedure of the first 0.0001 unit of time.

For the deformation of three-dimensional fish, the distribution of control points is shown in Figure 4. The convergence history of boundary errors is shown in Figure 5. Thus, we can see the block iteration with the parallelism method could get the same result compared with the conventional greedy selection.

Time comparison is shown in Table 4 and Figure 6.

O r i g i n

means the conventional greedy selection.

B l o c k

i t e r a t i o n

means that computation procedure only adopts the block iteration method.

B l o c k

w i t h

p a r a l l e l

means that both block iteration and the parallelization method are adopted with a parallelism increment of 12 cores. We can see that the time costs of parts a, b, d are efficiently reduced by 94.4% on average via a block iteration method. In addition, the parallel speedup ratio of part d, e is 10.37 on average, calculated by Table 4.

For the deformation

ξ_{t o l} = 0.00015

, there are 489 control points selected. The block iteration time cost of part a is about

\frac{1}{146}

compared with the conventional condition. It could be a validation that computational complexity is reduced from approximate

\frac{1}{3} N_{c}^{3}

N_{c}^{2}

. For part d, the block iteration time is

\frac{1}{231}

of the conventional time cost, reduced from

\frac{σ}{2} N_{c}^{3}

σ N_{c}^{2}

. The same with part a and d, the time cost of part b is reduced from

O (N_{c}^{3})

O (N_{c}^{2})

. Without optimization, the time cost of part c is almost the same as the conventional time cost. However, the time cost of part f is increased because of the additional variable introduction and parallel communication time consumption. For the deformation

ξ_{t o l} = 0.0001

, there are 618 control points selected. The computational complexities of block iteration for part a, b, d are

\frac{1}{191}

\frac{1}{161}

\frac{1}{302}

of the conventional complexities, reduced from

O (N_{c}^{3})

O (N_{c}^{2})

4.2. ONERA M6 Wing

ONERA M6 wing is shown in Figure 7a. The initial surface mesh connecting the outside flow field with the wing is shown in Figure 7b. It consists of 39,496 internal points, 9951 moving boundary points inside the mesh. The motion of the wing is controlled by the sinusoidal function referred by Fang et al. [21]:

Δ y = 0.05 y sin \frac{2 π y}{c},

(34)

where c is the length of root chord. The same as Section 4.1, for the motion, we compare the selection procedure of the first 0.0001 time unit.

For the deformation of ONERA M6 wing, 560 control points are selected under the condition that

ξ_{t o l} = 0.0005

. The distribution of boundary errors on wing surface are shown in Figure 8. Time comparison is shown in Table 5 and Figure 9. We can see that the time costs of parts a, b, d are efficiently reduced by more than 95% on average via the block iteration method. In addition, the parallel speedup of part d, e is 10.33 on average.

For the deformation

ξ_{t o l} = 0.00055

, there are 528 control points selected. The block iteration time cost of part a is about

\frac{1}{148}

compared with the conventional condition. It could be a validation that computational complexity is reduced from approximately

\frac{1}{3} N_{c}^{3}

N_{c}^{2}

. For part d, the block iteration time is

\frac{1}{259}

of the conventional time cost, reduced from

\frac{σ}{2} N_{c}^{3}

σ N_{c}^{2}

. The same with parts a and d, and the time cost of part b is reduced from

O (N_{c}^{3})

O (N_{c}^{2})

. The time costs of part c and f are the same as Section 4.1. For the deformation

ξ_{t o l} = 0.0005

, there are 560 control points selected. The computational complexities of block iteration for parts a, b, d are

\frac{1}{168}

\frac{1}{110}

\frac{1}{278}

of the conventional complexities, reduced from

O (N_{c}^{3})

O (N_{c}^{2})

The parallel speedup ratio is shown in Figure 10. We can see that parallelism could efficiently reduce the time cost of part d and e. However, with the quantity increase of processors, the parallel efficiency rapidly declined.

4.3. Three-Dimensional Super-Cavitating Hydrofoil

The results in Section 4.1 and Section 4.2 are both tested on

10^{5}

scale mesh. For testing the performance of the proposed method in large-scale mesh, we utilize the model of three-dimensional Super-cavitating Hydrofoil. A three-dimensional Super-cavitating Hydrofoil is shown in Figure 11a. The initial surface mesh connecting the outside flow field with the hydrofoil is shown in Figure 11b. It consists of

1.1 \times 10^{7}

cells, 10,309,112 internal points and 25,702 moving boundary points inside the mesh. The hydrofoil axially rotates inside the mesh system. For the motion, we compare the selection procedure of the first 0.0001 time unit.

For the deformation

ξ_{t o l} = 10^{- 5}

, there are 444 control points selected. As shown in Table 6 and Figure 12a, the block iteration time cost of part a is about

\frac{1}{155}

compared with the conventional condition. It could be a validation that computational complexity is reduced from approximately

\frac{1}{3} N_{c}^{3}

N_{c}^{2}

. For part d, the block iteration time is

\frac{1}{216}

of the conventional time cost, reduced from

\frac{σ}{2} N_{c}^{3}

σ N_{c}^{2}

. The same with parts a and d, the time cost of part b is reduced from

O (N_{c}^{3})

O (N_{c}^{2})

. The time cost of part c is almost changeless. However, part f generates additional time costs compared with the conventional method.

The parallel speedup ratio is shown in Figure 12b. We can see that, with the quantity increase of processors, the parallel efficiency rapidly declined, the same as results in Section 4.2.

5. Conclusions

This paper proposes block iteration with the parallel method to improve the efficiency of the greedy selection based on RBF mesh deformation. The block iteration method is proposed based on the fact that the specific selection way has the iterative feasibility. It could reduce the computational complexity of matrices construction and inversion from

O (n^{3})

O (n^{2})

in the process of the greedy selection. The parallelization addresses the load imbalance problem of the selection process and improves the efficiency of data reduction, based on the independence of the boundary point. To adopt the block iteration with the parallel method, the efficiency of greedy selection in RBF deformation is greatly improved while keeping the data accuracy. Three models are tested via this proposed method and validate the good performance of the method. The model tests show that the total speedup ratio of each test could achieve 17.35 on average via the block iteration with the parallel method.

Author Contributions

Conceptualization, R.Z. and C.L.; data curation, R.Z., S.F. and Y.W.; formal analysis, R.Z., C.L. and X.G.; funding acquisition, C.Y.; investigation, R.Z., C.L. and X.G.; methodology, R.Z. and C.L.; project administration, R.Z. C.Y. and C.L.; resources, R.Z., C.L. and X.G.; software, R.Z.; supervision, C.L. and X.G.; validation, R.Z. and C.L.; visualization, R.Z., S.F. and Y.W.; writing—original draft, R.Z.; writing—review and editing, C.Y. and C.L.

Funding

This research was funded by the National Natural Science Foundation of China Grant No. 61872380, 61802426 and 61806213

Acknowledgments

The proposed methods are tested via OpenFOAM.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CFD	Computational Fluid Dynamic
RBF	Radial Basis Function
ONERA	Office National d’ Etudes et de Recherches Aerospatiales
OE	Ocean Engineering
LU	Lower-upper
TPS	Thin Plate Spline
HPC	High Performance Computing
AIAA	American Institute of Aeronautics and Astronautics
CMAME	Computer Methods in Applied Mechanics and Engineering
FEAD	Finite Elements in Analysis and Design
BNM	BIT Numerical Mathematics
AASME	AIAA Aerospace Sciences Meeting and Exhibit
ASME	Aerospace Sciences Meeting and Exhibit
AASMNHFAE	AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace
	Exposition
IJNME	International Journal for Numerical Methods in Engineering
CF	Computers and Fluids
CS	Computers and Structures
JCP	Journal of Computational Physics
CMA	Computers and Mathematics with Applications
ASM	Applied Mathematics, Simulation, Modelling
SIAM	Society for Industrial and Applied Mathematics
JEB	Journal of Experimental Biology

References

Li, C.; Yang, W.; Xu, X.; Wang, J.; Wang, M.; Xu, L. Numerical investigation of fish exploiting vortices based on the Kármán gaiting model. OE 2017, 140, 7–18. [Google Scholar] [CrossRef]
Batina, J.T. Unsteady Euler algorithm with unstructured dynamic mesh for complex-aircraft aerodynamic analysis. AIAA 1991, 29, 327–333. [Google Scholar] [CrossRef]
Farhat, C.; Degand, C.; Koobus, B.; Lesoinne, M. Torsional springs for two-dimensional dynamic unstructured fluid meshes. CMAME 1998, 163, 231–245. [Google Scholar] [CrossRef]
Huo, S.H.; Wang, F.S.; Yan, W.Z.; Yue, Z.F. Layered elastic solid method for the generation of unstructured dynamic mesh. FEAD 2010, 46, 949–955. [Google Scholar] [CrossRef]
Shontz, S.M.; Vavasis, S.A. Analysis of and workarounds for element reversal for a finite element-based algorithm for warping triangular and tetrahedral meshes. BNM 2010, 50, 863–884. [Google Scholar] [CrossRef] [Green Version]
Yang, Z.; Mavriplis, D.J. Unstructured dynamic meshes with higher-order time integration schemes for the unsteady Navier-Stokes equations. In Proceedings of the 43rd AIAA Aerospace Science Meeting and Exhibit, Reno, NV, USA, 10–13 January 2005; Volume 1222. [Google Scholar]
Potsdam, M.A.; Guruswamy, G.P. A parallel multiblock mesh movement scheme for complex aeroelastic applications. In Proceedings of the 39rd AIAA Aerospace Science Meeting and Exhibit, Reno, NV, USA, 8–11 January 2001; Volume 716. [Google Scholar]
Liu, X.; Qin, N.; Xia, H. Fast dynamic grid deformation based on Delaunay graph mapping. JCP 2006, 211, 405–423. [Google Scholar] [CrossRef]
Witteveen, J.A.S. Explicit and robust inverse distance weighting mesh deformation for CFD. In Proceedings of the 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Expositio, Orlando, FL, USA, 4–7 January 2010; p. 165. [Google Scholar]
Luke, E.; Collins, E.; Blades, E. A fast mesh deformation method using explicit interpolation. JCP 2012, 231, 586–601. [Google Scholar] [CrossRef]
De Boer, A.; van der Schoot, M.S.; Bijl, H. Mesh deformation based on radial basis function interpolation. CS 2007, 11, 784–795. [Google Scholar] [CrossRef]
Sheng, C.; Allen, C.B. Efficient mesh deformation using radial basis functions on unstructured meshes. AIAA 2012, 51, 707–720. [Google Scholar] [CrossRef]
Bos, F.M.; van Oudheusden, B.W. Radial basis function based mesh deformation applied to simulation of flow around flapping wings. CF 2013, 79, 167–177. [Google Scholar] [CrossRef]
Jakobsson, S.; Amoignon, O. Mesh deformation using radial basis functions for gradient-based aerodynamic shape optimization. CF 2007, 36, 1119–1136. [Google Scholar] [CrossRef]
Rendall, T.C.S.; Allen, C.B. Efficient mesh motion using radial basis functions with data reduction algorithms. JCP 2009, 17, 6231–6249. [Google Scholar] [CrossRef]
Michler, A.K. Aircraft control surface deflection using RBF-based mesh deformation. IJNME 2011, 88, 986–1007. [Google Scholar] [CrossRef]
Li, C.; Xu, X.; Wang, J.; Xu, L.; Ye, S.; Yang, X. A parallel multiselection greedy method for the radial basis function–based mesh deformation. IJNME 2018, 113, 1561–1588. [Google Scholar] [CrossRef]
Gillebaart, T.; Blom, D.S.; van Zuijlen, A.H.; Bijl, H. Adaptive radial basis function mesh deformation using data reduction. JCP 2016, 321, 997–1025. [Google Scholar] [CrossRef]
Skala, V. Radial basis functions interpolation and applications: An incremental approach. ASM 2010, 7, 209–213. [Google Scholar]
Selim, M.M.; Koomullil, R.P.; Shehata, A.S. Incremental approach for radial basis functions mesh deformation with greedy algorithm. JCP 2017, 340, 556–574. [Google Scholar] [CrossRef]
Fang, H.; Hu, Y.; Yu, C.; Tie, M.; Liu, J.; Gong, C. An efficient radial basis functions mesh deformation with greedy algorithm based on recurrence Choleskey decomposition and parallel computing. JCP 2019, 377, 183–199. [Google Scholar] [CrossRef]
Kedward, L.; Allen, C.B.; Rendall, T.C.S. Efficient and exact mesh deformation using multiscale RBF interpolation. JCP 2017, 345, 732–751. [Google Scholar] [CrossRef]
Gerhold, T.; Neumann, J. The parallel mesh deformation of the DLR TAU-code. In New Results in Numerical and Experimental Fluid Mechanics VI; Springer: Berlin/Heidelberg, Germany, 2007; pp. 162–169. [Google Scholar]
Rendall, T.C.S.; Allen, C.B. Parallel efficient mesh motion using radial basis functions with application to multi-bladed rotors. IJNME 2010, 81, 183–199. [Google Scholar]
Lu, T.T.; Shiou, S.H. Inverses of 2 × 2 block matrices. CMA 2002, 1, 119–129. [Google Scholar] [CrossRef]
Datta, B.N. Gaussian Elimination and LU Factorization. In Numerical Linear Algebra and Applications, 2nd ed.; SIAM: Philadelphia, PA, USA, 2010; pp. 81–116. [Google Scholar]
Knupp, P.M. Algebraic mesh quality metrics for unstructured initial meshes. FEAD 2003, 39, 217–241. [Google Scholar]
Kern, S.; Koumoutsakos, P. Simulations of optimized anguilliform swimming. JEB 2006, 209, 4841–4857. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Carling, J.; williams, T.L.; Bowtell, G. Self-propelled anguilliform swimming: Simultaneous solution of the two-dimensional Navier-Stokes equations and Newton’s laws of motion. JEB 1998, 201, 3143–3166. [Google Scholar]

Figure 1. Time allocation of conventional greedy selection in different experiment setups. The specific explanation of the experiment setup in this figure is presented in Section 4.

Figure 2. The mesh around two-dimensional undulating fish.

Figure 3. Three-dimensional fish and initial mesh of its cubic tank.

Figure 4. The distribution of control points in three-dimensional fish (

ξ_{t o l} = 0.0001

Figure 4. The distribution of control points in three-dimensional fish (

ξ_{t o l} = 0.0001

Figure 5. Convergence history of boundary errors in three-dimensional fish (

ξ_{t o l} = 0.0001

Figure 5. Convergence history of boundary errors in three-dimensional fish (

ξ_{t o l} = 0.0001

Figure 6. Time comparison of three-dimensional fish based on Table 4.

Figure 7. ONERA M6 wing and its initial surface mesh.

Figure 8. The distribution of boundary errors on wing surface.

Figure 9. Time comparison of ONERA M6 wing.

Figure 10. Speedup ratio and parallel efficiency of ONERA M6 wing. (a) Part d,

ξ_{t o l} = 0.0005

. (b) Part e,

ξ_{t o l} = 0.0005

. (c) Part d,

ξ_{t o l} = 0.00055

. (d) Part e,

ξ_{t o l} = 0.00055

Figure 10. Speedup ratio and parallel efficiency of ONERA M6 wing. (a) Part d,

ξ_{t o l} = 0.0005

. (b) Part e,

ξ_{t o l} = 0.0005

. (c) Part d,

ξ_{t o l} = 0.00055

. (d) Part e,

ξ_{t o l} = 0.00055

Figure 11. Three-dimensional Super-cavitating Hydrofoil and its initial surface mesh.

Figure 12. Time cost and speedup of three-dimensional Super-cavitating Hydrofoil. (a) Time Comparison of three-dimensional Super- cavitating Hydrofoil. (b) Speedup Ratio and Parallel Efficiency of three- dimensional Super-cavitating Hydrofoil.

Table 1. Radial basis function (

θ = \frac{x}{r}

, where x is the Euclidean norm and r is the support radius).

Table 1. Radial basis function (

θ = \frac{x}{r}

, where x is the Euclidean norm and r is the support radius).

Global		Compact
Name	f(x)	Name	f(θ)
Thin plate spline (TPS)	$x^{2} log (x)$	CP $C^{0}$	${(1 - θ)}^{2}$
Multi-quadric bi-harmonics (MQB)	$\sqrt{a^{2} + x^{2}}$	CP $C^{2}$	${(1 - θ)}^{4} (4 θ + 1)$
Inverse multi-quadric bi-harmonics (IMQB)	$\sqrt{\frac{1}{a^{2} + x^{2}}}$	CP $C^{6}$	${(1 - θ)}^{8} (32 θ^{3} + 25 θ^{2} + 8 θ + 1)$
Quadric bi-harmonics (QB)	$1 + x^{2}$	CTPS $C^{0}$	${(1 - θ)}^{5}$
Gaussian	$e^{- x^{2}}$	Compact TPS	$θ^{2} log (θ)$

Table 2. Time allocation of conventional greedy selection.

a .

Construct

Φ_{c, c}

in Equation (8);

b .

Solve the value of

Φ_{c, c}^{- 1}

in Equation (8);

c .

Calculate the product of

Φ_{c, c}^{- 1}

and

Δ x_{c}

in Equation (8);

d .

Construct

Φ_{b, c}

in Equation (10);

e .

Calculate the product of

Φ_{b, c}

and

α

in Equation (10);

f .

Others.

Table 3. Mesh deformation results for the undulating fish using different methods,

ξ_{t o l} = 0.0001

Table 3. Mesh deformation results for the undulating fish using different methods,

ξ_{t o l} = 0.0001

$f_{size - shape}$	Origin	Block	B with P
Average (T = 0.50)	0.95	0.95	0.95
Minimum (T = 0.50)	0.541	0.541	0.541
Average (T = 1.00)	0.94	0.94	0.94
Minimum (T = 1.00)	0.653	0.653	0.653

Table 4. Time comparison of three-dimensional fish (value of

ξ_{t o l}