1 Introduction

Nowadays, researchers extensively rely on computer simulations to evaluate candidate designs and to reduce the time and the cost of the engineering design process. Such simulations, which must be properly validated with real-world experiments, transform the design process into an optimization problem having three distinct features:

  1. 1)

    Objective values are obtained from the simulation, which is often a legacy code or a commercial software available only as an executable. As such, the simulation is treated as a black-box function, i.e., a function with no analytic expression.

  2. 2)

    Each simulation run is expensive, i.e., it requires extensive computational resources, often minutes to weeks of CPU time. This implies that the expensive function (the simulation) can be called only a small number of times.

  3. 3)

    Due to the numerical simulation and the underlying physics being modelled, the objective landscape may contain many local optima and exhibit other complicated features, which exacerbate the optimization difficulty.

Accordingly, such problems are often referred to as expensive optimization problems[1].

Due to these difficulties, classical optimization algorithms may perform poorly in such settings, and this has motivated researchers to explore new optimization strategies. One such well-established strategy is that of metamodel-assisted computational intelligence (CI) optimization algorithms, which combines two powerful approaches:

  1. 1)

    In contrast to classical gradient-based optimizers which operate on a single candidate solution, CI optimizers employ a population of candidate solutions and manipulate them at each iteration to produce a new set of candidate solutions[2]. CI optimizers typically perform well over a range of different objective functions, do not require gradient information, and can handle challenges such as multi-modality and discontinuities in the objective function landscape.

  2. 2)

    Metamodels are computationally cheaper approximations of the true expensive function, typically interpolants trained using previously evaluated solutions. During the optimization search, the optimizer receives a predicted response from the metamodel at a fraction of the computational cost when compared to using the true expensive function. Metamodels allow to use CI optimizers efficiently, which typically require many thousands of evaluations to converge, on a limited computational budget.

Although different types of optimizers and metamodels have been proposed, the optimal type to employ is typically problem dependant, i.e., different optimizers and metamodels suit different problems. Therefore, a priori prescribing the optimizer and metamodel type may result in a setup which is unsuitable to the problem being solved, and can degrade the search effectiveness.

Leveraging on this issue, this study proposes a new CI metamodel-assisted algorithm which adapts the type of optimizer and metamodel during the search, and selects an optimal combination of the latter two out of a family of candidates. This eliminates the need to a priori prescribing the type of the metamodel and optimizer, and allows the optimization search to self-adapt to different problems. To the best of our knowledge, such an adaptive metamodel-assisted CI algorithm is new. Extensive performance analysis, using both mathematical test functions and a representative engineering application, demonstrates the effectiveness of the proposed algorithm and highlights the contribution of the proposed approach.

The remainder of this paper is organized as follows. Section 2 provides the pertinent background information on CI optimizers and expensive optimization problems. Section 3 describes the proposed algorithm. Section 4 provides a detailed performance analysis. And Section 5 concludes this paper.

2 Background

2.1 Expensive optimization problems

Expensive optimization problems arise in diverse domains, and as mentioned in Section 1, a typical scenario is an engineering design optimization problem driven by computer simulations. Fig.1 shows the layout of such problem, where the simulation is viewed as a black-box function, i.e., it assigns objective values to candidate designs while its analytic expression is unknown. In such optimization problem candidate designs are represented as vectors of design variables, and are provided as inputs to the simulation.

Fig.1
figure 1

The layout of an expensive black-box optimization problem

Also as mentioned, the resultant objective function often has a complicated, noncnvex landscape, which can lead gradient-based optimizers to converge to a poor final result. This has motivated the application of CI optimizers in such problems, since these optimizers are more explorative and therefore often perform better in nonconvex function landscapes. Such optimizers typically employ a population of candidate solutions, and manipulate them using a variety of operators. For example, the evolutionary algorithm (EA) employs operators inspired by the paradigms of adaption and survival of the fittest in nature, such that the best candidate solutions in the population are combined to produce offspring. As another example, the particle swarm optimizer (PSO) employs a set of particles which explore the search space, and it modifies their trajectories based on the observed function values. Through their respective operators, the various CI optimizers emphasize the best performing member in the population, which leads to gradual adaptation of the population to the function landscape and convergence to an optimum.

Since CI optimizers rely on a population of candidate solutions and do not utilize gradient information, they often require many thousands of function evaluations to obtain a satisfactory solution. This is a major obstacle in applying them to expensive optimization problems where the objective function can be evaluated only a small number of times. An established methodology to circumvent this is to employ a metamodel which is an approximation of the true expensive function, but is computationally much cheaper to evaluate. Metamodels (also termed as surrogates or response surfaces) are typically interpolants trained with previously evaluated vectors, and variants include artificial neural networks, Kriging, polynomials, and radial basis functions (RBF), which are described in Appendix A[3,4].

Much research has been devoted to the development of metamodel-assisted optimization algorithms. Early examples include the algorithm in [5] which studied an evolutionary algorithm assisted with a Kriging metamodel, and the algorithm in [6] which coupled a pattern search optimizer with quadratic polynomials. Later examples include the algorithm which used an evolutionary strategies optimizer coupled with a Kriging metamodel[7], and the algorithm which coupled an evolutionary algorithm with a least-square fit polynomial[8]. Poloni et al.[9] and Muy et al.[10] studied algorithms coupled with an artificial neural network.

In [11], Goel et al. explored the concept of a metamodel ensemble, in which multiple metamodels were employed concurrently, and their predictions were aggregated based on the estimated accuracy of each metamodel. Similar concepts were explored by Jin and Sendhoff[12], which used an ensemble of artificial neural networkss, and Lim et al.[13] which used an ensemble comprised of an RBF and a polynomial metamodel. Acar and Rais-Rohani[14] proposed to optimize the weights of the individual ensemble metamodels to improve the overall ensemble accuracy, which transforms the ensemble training process into an optimization problem of its own. Acar[15] studied the use of the pointwise cross-validation error estimation as a mean for determining the ensemble composition, and observed that the resultant ensembles did not perform well since this measure was not a reliable accuracy estimator. Zhou et al.[16] proposed to configure the ensemble based on a recursive arithmetic averaging procedure, and showed this improved the ensemble prediction accuracy.

Another strategy which considers multiple metamodels is that of model selection, in which the accuracy of several metamodels is estimated, and a single metamodel, which is deemed as the most accurate, is employed. Following this methodology, Gorissen et al.[17] proposed a framework in which the metamodels were selected by an evolutionary algorithm, a setup which resulted in a dynamic selection of the metamodel type and its complexity during the search. In a related study, Tenne and Armfield[18] presented a framework which used global and local metamodels, and selected an optimal type for each based on statistical cross-validation techniques. In this setup, the algorithm autonomously selected the optimal global and local metamodel types at each phase of the optimization search. Recently, Gorissen et al.[19] described a toolbox for training and selection of metamodels.

Also pertinent is the variable fidelity approach, in which the optimization algorithm relies on several simulations which differ by their accuracy, and accordingly by their computational cost. The goal is then to try and maximize the use of the computationally cheaper simulations while ensuring convergence to an optimum of the problem, i.e., an optimum of the objective function of the highest fidelity simulation. In two early examples, both Rodriguez et al.[20] and Alexandrov et al.[21] proposed frameworks which adjusted the simulation fidelity based on the trust-region approach. Sefrioui and Periaux[22] proposed a hierarchical setup which consisted of a bottom layer in which an evolutionary algorithm with a large population used a low fidelity simulation. Also, a top layer was employed in which an EA with a small population used a high fidelity simulation. The best performing individuals from the bottom layer would then migrate into a top layer for more exact evaluation, and vice versa. Also in the class, Lim et al.[23] explored an evolutionary based optimization algorithm which adjusted the simulation fidelity during the search, and based on the correlation between adjacent members in the population of candidate solutions.

An extensive survey on expensive optimization problems and metamodel-assisted optimization is given by [3], which provides a detailed discussion on a variety of related topics. Forrester and Keane[4] addressed similar topics, but focused on the Kriging metamodel and metamodel updating strategies based on the expected-improvement framework of [24]. Tenne and Goh[1] provide a collection of recent studies which focus on CI and metamodel-assisted optimization.

2.2 Related studies: Algorithms employing multiple metamodels or optimizers

Researchers have explored algorithms which employ multiple metamodels or optimizers with the goal of improving the search effectiveness.

With respect to multiple metamodels, an early example, which was also mentioned in Section 2.1, is the framework of hierarchical optimization of [22], which employed multiple simulations of different fidelity. A similar setup was explored by [25] and used evolutionary strategies optimizers. Also as mentioned, Tenne and Armfield[18] proposed an optimization framework which employed global and local metamodels, and autonomously selected the type of each metamodel during the optimization search. Zhou et al.[26] used a hybrid algorithm which coupled an EA and an SQP optimizer, and used an radial basis functions and a polynomial metamodels to benefit from the inaccuracy of the metamodel predictions. Cohen and Intrator[27] used an artificial neural network (ANN) comprising of clusters of both radial basis functions and multilayer perceptron (MLP) components, and used tests, such as error levels and confidence intervals, to configure the network topology.

With respect to employing multiple optimizers, Neri et al.[28] studied the problem of optimizing the topology of an MLP neural network, and optimized its weights using an EA coupled with a simulated annealing (SA) and a Hook-Jeeves optimizers. Caponio et al.[29] coupled an EA with a Hooke-Jeeves and a simplex optimizer for the problem of optimizing the performance of a motor. Egea et al.[30] proposed an algorithm for optimization of chemical processes which employed a scatter search optimizer as a global search procedure, while promising solutions were then refined with a simplex optimizer. Wang et al.[31] studied an algorithm for dynamic optimization problems in which an EA optimizer was coupled with a hill climbing optimizer. They also described various specialized EA operators to enhance the search effectiveness.

Most studies, such as those referenced above, used a fixed set of metamodels or optimizers, and did not select which types to use based on the problem being solved. However, the optimal type of the latter two is problem dependant. To demonstrate this, Table 1 shows the results of the two numerical experiments. In the first, three types of metamodels (radial basis function, radial basis functions network, and Kriging), which are described in Appendix A, were used to approximate the functions Ackley-10D, Rastrigin-20D, and Rosenbrock-20D. For each function, the three meta-models were trained using the same set of 30 vectors, and their prediction error, measured by the root mean square error (RMSE), was calculated using the same set of 20 new test vectors. Table 1 (a) shows the resultant prediction error, from which it follows that no single metamodel was overall best, and that the most suitable metamodel varied with the function and dimension. In the second experiment, three different types of optimizers, namely, differential evolution (DE), evolutionary algorithm, and particle swarm optimizer, which are described in Appendix B, were used to search for the optimum of several metamodels. For each test function, a metamodel was trained using a sample of 50 vectors, and each optimizer was applied repeatedly for 5 times. Table 1 (b) shows the best solution found by each optimizer in each case, from which it follows that no single optimizer was overall best, and the best optimizer to use again varied between the test cases.

Table 1 Comparison of metamodel accuracy and optimizer performance across several test functions (RMS error)

These results imply that a priori prescribing the optimizer and metamodel types may degrade the search effectiveness. This observation motivates the development of algorithms which adapt the type of metamodel and optimizer to the problem being solved. In this context, Gorissen et al.[17] proposed an approach in which multiple metamodels were considered, and EA was used to select an optimal type of metamodel during the search. Taking a different approach, the algorithm proposed in this study adapts both the type of optimizer and metamodel during the search by autonomously selecting the best performing ones from a family candidates. Section 3 provides a detailed description of the proposed algorithm.

3 Proposed algorithm

Before providing a detailed description of the proposed algorithm, it is highlighted that it leverages on two main concepts:

  1. 1)

    Adaptation of the type of optimizer and metamodel during search: The algorithm autonomously selects the best performing metamodel and optimizer out of a family of candidates at each iteration.

  2. 2)

    Trust-region (TR) optimization: To ensure convergence to an optimum of the true expensive objective function in the presence of metamodel inaccuracy, the proposed algorithm employs a TR approach which involves a trial-step search followed by the updates of the TR.

The proposed algorithm operates in five main steps: initialization, selecting the metamodel type, selecting the optimizer type, performing a TR trial step to seek an optimum on the metamodel, and updating the TR. The details of these steps are as follows:

Step 1. Initialization: The proposed algorithm begins by generating an initial sample of vectors using a Latin hy-percube (LH) design of experiments[32]. This technique is used to ensure that the sample is space-filling, as this improves the metamodel accuracy. Briefly, for a sample of l vectors, the range of each variable is split into l equal intervals, and one point is sampled at random in each interval. Next, a sample point is selected at random without replacement for each variable, and these samples are combined to produce a vector. This procedure is repeated for l times to generate the complete sample, which is then evaluated with the expensive simulation, and the vectors are stored in memory. After this initialization step, the main optimization loop begins.

Step 2. Metamodel selection: As mentioned, the proposed algorithm selects an optimal type of metamodel out of a family of candidates. To accomplish this in a mathematically rigorous manner, it leverages on the statistical model selection theory[33], and operates as follows. The set of vectors, which have already been evaluated with the true expensive function and stored in memory is split into a training set and a testing set. For each candidate meta-model type, the proposed algorithm trains a metamodel using the training set, tests its accuracy using the testing set, a procedure termed in the literature as cross-validation (CV). In this procedure, the accuracy of each metamodel is estimated using the mean squared error (MSE) of prediction, which is calculated as

$$\operatorname{MSE} = \frac{1}{l}\sum\limits_{i = 1}^l {{{(\hat m({x_i}) - f({x_i}))}^2}} $$
(1)

where x i, i = 1,⋯, l , are the testing vectors, \(\hat m\)(x) is a metamodel trained using the training set, and f(x i) is the exact and already known function value at x i. A merit of this approach is that there is no restriction on the type or number of the candidate metamodels which can be considered. As the CV procedure relies on splitting the candidate solutions stored in memory into two sets, numerical experiments have been performed to identify a suitable splitting ratio, and the details are given in Section 4.1. With respect to the metamodel types, the proposed algorithm selects between three candidates: radial basis functions (RBF), radial basis functions network (RBFN) and Kriging, which are described in Appendix A. After the selection phase, the proposed algorithm trains a new metamodel by using all the vectors stored in memory, and this metamodel is then used during the TR trial-step, as described in the following step.

Step 3. TR step and optimizer selection: The best vector stored in memory is taken as the TR centre (x b), and the proposed algorithm performs a TR trial-step, i.e., it seeks the optimum of the metamodel in the bounded region

$$g = \left\{ {x:{{\left\| {x - {x_b}} \right\|}_2} \leqslant \Delta } \right\}$$
(2)

where x b is the best solution found so far, and Δ is the TR radius.

Next, a procedure is employed to identify the best performing optimizer out of a family of candidates. To accomplish this, in turn, each candidate optimizer is used to seek the optimum of the metamodel trained in the previous step. In this setup, the TR trial-step is repeated with each optimizer. But since obtaining the objective values from each of the metamodels is computationally cheap, the entire procedure requires only a few seconds. After completing this step, the proposed algorithm selects the overall best solution x* found, and uses it during the TR trial-step as described below. It also records the best performing optimizer, and uses it in the metamodel improvement step as described below. The proposed algorithm selects between three candidate optimizers: a DE algorithm, an EA, and a PSO, which are described in Appendix B.

Step 4. TR updates: The best solution found in the previous step x* , is evaluated with the true expensive function, which yields the exact objective value f(x*). Following the classical TR framework[34], the proposed algorithm updates the metamodel and the TR as follows:

  1. 1)

    Iff(x*) < f(x b), the trial step was successful since the predicted optimum is indeed better than the current best solution x b. Accordingly, the TR is centred at the new optimum, and the TR is enlarged by doubling its radius.

  2. 2)

    If f(x*) ⩾ f(x b) and there are sufficient vectors in the TR, the search was unsuccessful since the predicted optimum is not better than the current best vector. However, since there are sufficient vectors in the TR, the metamodel is considered to be sufficiently accurate to justify contracting the TR, and therefore the TR radius is halved.

  3. 3)

    If f(x*) ⩾ f(x b) and there are insufficient vectors inside the TR, the search was unsuccessful. But this may be due to poor accuracy of the metamodel, i.e., there are too few vectors in the TR. As such, the algorithm adds new vectors in the TR to improve the metamodel accuracy as explained below.

As a change from the classical TR framework, the proposed algorithm monitors the number of vectors in the TR to avoid contracting it too quickly, which can result in premature convergence[34]. Numerical experiments have been performed to identify a suitable threshold value q , and the details are given in Section 4.1.

Another change from the classical TR framework is the addition of new vectors to improve the accuracy of the metamodel in the TR, as mentioned above. There are two considerations in selecting these vectors:

  1. 1)

    They should improve the metamodel accuracy locally, around the current optimum (the TR centre).

  2. 2)

    Alternatively, they should improve the metamodel over the entire TR, particularly in regions sparse with vectors[35].

Since these are typically two opposing considerations, the proposed algorithm generates several new vectors which represent different trade-offs between the above two considerations. The vectors are taken as the minimizers of the objective function

$$h(x) = w{h_1}(x) + (1 - w){h_2}(x)$$
(3)

where the minimization of (3) is performed by the best performing optimizer selected in Step 3.

The variables used in (3) are:

h 1(x) is the rank of a vector based on its objective value. The best vector in the population is assigned a rank of 1, the following one is assigned a rank of 2, etc.

h 2(x) is the rank of a vector based on its distance from the existing vectors in the TR. The vector in the population which is farthest is given a rank of 1, the vector having the second largest distance is given a rank of 2, etc.

The weight w defines the trade-off between the two considerations. The setting w = 1 implies that a vector is searched only based on its objective value, while w = 0 implies that a vector is searched only based on its distance to existing ones.

Equation (3) uses a rank-based approach to make the search insensitive to the magnitude and sign of the objective function values. Numerical experiments have been performed to identify a suitable set of weights, and the details are given in Section 4.1.

While in this study, the proposed algorithm used specific metamodels (Kriging, radial basis functions, and radial basis functions network) and optimizers (DE, EA, and PSO), other types or numbers of the latter can be readily used. To complete the description of the proposed algorithm, Fig. 2 gives a schematic layout of its optimization iteration, and Algorithm 1 gives its pseudo code.

Fig.2
figure 2

The layout of an optimization iteration of the proposed algorithm

4 Numerical experiments: Results and discussion

This section analyzes in detail the performance of the proposed algorithm. It begins by describing a parameter sensitivity analysis which was used to identify suitable settings for the algorithm parameters, it then describes benchmark tests based on a set of mathematical test functions, and concludes with another set of benchmark tests based on an engineering application of airfoil shape optimization.

4.1 Parameter sensitivity analysis

As described in Section 3, the proposed algorithm relies on three main parameters:

  1. 1)

    q: The minimum number of vectors needed in the TR to invoke a TR contraction.

  2. 2)

    w i : The weights used for generating the new vectors in the metamodel improvement step.

  3. 3)

    The split ratio between the training and the testing sets, as used in the CV procedure when estimating the metamodel accuracy.

For a rigorous sensitivity analysis, the following setup was used. A baseline set of parameter values was defined, and for each parameter in question, three candidate values were selected. In turn, only one parameter was changed and assigned a candidate setting, and the proposed algorithm was run for a full optimization search.

Algorithm 1
figure Tab6

Proposed optimization algorithm

To examine the impact of different parameter settings across a variety of objective functions and function dimensions, for each candidate setting, the algorithm was run with the Rastrigin-10D, Rosenbrock-10D, Rastrigin-20D, and Rosenbrock-20D functions, and with 10 repeats per case. Overall, for each parameter, 120 test runs were performed (3 candidate settings per parameter × 4 objective functions × 10 repetitions per setting). The candidate parameter settings were (baseline settings are underlined):

  1. 1)

    q: 0.1d, 05d, d, where d is the function dimension.

  2. 2)

    w: {0.8,0.6,0.4,0.2}, {0.8,0.5,0.2}, {0.8,0.2}.

  3. 3)

    Training testing solit ratio (in percent): 80-20, 50-50, 20-80.

For each candidate setting, Table 2 shows the mean objective value, the rank, and the overall rank, where a lower score is better. From analyzing these test results, it follows:

Table 2 Results for the parameter sensitivity analysis
  1. 1)

    q: Using q = d was the optimal setting for the Ras-trigin function, while it was q = 0.1 d for the Rosenbrock function. This is attributed to the differences between the function landscapes: Rastrigin is highly multimodal, and therefore, a higher threshold number delays the TR contraction and diminishes the changes of converging to a poor local optimum. In contrast, Rosenbrock is unimodal and therefore a lower threshold number accelerates the TR contraction and enables the algorithm to obtain a better approximation of the single global optimum. Since no single setting was overall optimal, we selected the intermediate setting of q = 0.5 d.

  2. 2)

    w: The optimal setting was that of using four weights, w = {0.8,0.6,0.4,0.2} which corresponds to adding four new vectors in the metamodel improvement step. This is attributed to the weights which result in vectors that improve the metamodel accuracy both near the optimum and over the entire TR.

  3. 3)

    Training-testing split ratio: The 50–50 split ratio was selected as it yielded the best results.

The selected parameter setting, as described above, were used in the benchmark tests described in the following sections.

4.2 Benchmark tests based on mathematical test functions

The performance of the proposed algorithm was evaluated using the set of established test functions (Ackley, Griewank, Rastrigin, Rosenbrock, Schwefel 2.13, and Weier-strass) suggested by [36], and in dimension 10 to 40. The functions details are given in Table 3.

Table 3 Mathematical test functions

For a thorough evaluation, the proposed algorithm was benchmarked against four reference algorithms:

  1. 1)

    V1: Variant 1 of the proposed algorithm, which is identical in operation, except that it used a single metamodel (RBF) and a single optimizer (DE).

  2. 2)

    V2: Variant 2 of the proposed algorithm, which was identical to the V1 variant in operation, except that it used a single metamodel (RBFN) and a single optimizer (EA).

  3. 3)

    Metamodel-assisted EA with periodic sampling (EA-PS)[5]: The algorithm combines a Kriging metamodel and an EA. It safeguards the metamodel accuracy by periodically evaluating a small subset of the population with the true objective function and then incorporating the latter into the metamodel.

  4. 4)

    Expected-improvement with a metamodel-assisted CMA-ES (EI-CMA-ES)[37]: The algorithm combines a co-variance matrix adaptation evolutionary strategy (CMA-ES) optimizer with a Kriging metamodel, and uses the expected improvement framework to update the metamodel.

Full details of the latter two reference algorithms are given in Appendix C. This benchmark layout was employed to:

  1. 1)

    Study the contribution of selecting the metamodel and optimizer during the search, by comparing the proposed algorithm to the V1 and V2 variants which did not employ such a selection.

  2. 2)

    Compare the proposed algorithm to the representative metamodel-assisted algorithms from the literature.

Every algorithm-objective function combination was repeated for 30 times to support a valid statistical analysis.

Table 4 gives the resultant test statistics of mean, standard deviation (SD), median, minimum (best) and maximum (worst) objective value for each algorithm-objective function combination, where the test functions were Ackley and Griewank in dimension 10, Rastrigin and Rosenbrock in dimension 20, and Schwefel 2.13 and Weierstrass in dimension 40. Also, the statistic α shows the significance-level at which the performance of the proposed algorithm was better than those of the other algorithms. An empty entry indicates that there was no statistically significant performance advantage at the 0.05 level. The α statistic was obtained using the Mann-Whitney nonparametric test[38].

Table 4 Test statistics for the mathematical test functions

Test results show that the proposed algorithm performed well. With the Ackley-10D function, it obtained the best mean and had the second best median, closely following the V1 algorithm. Its SD was the third best after the EA-PS and EI–CMA–ES algorithms, which indicates that its performance was stable. Its performance had a statistically significant advantage over that of the V2, EA-PS, and EI– CMA–ES algorithms at 0.05 level or better.

A similar pattern is observed with the Griewank-10D function, in which the proposed algorithm obtained both the best mean and the best median statistics. Its SD was the fourth best, indicating it had a higher variability in its performance when compared to the other algorithms. Its performance had a statistically significant advantage over that of the V2, EA-PS and EI–CMA–ES algorithms at 0.01 level, which indicates solid performance gains.

With the Rastrigin-20D function, the proposed algorithm maintained its performance advantage, and obtained the best mean and median statistics. Its SD statistic was the second best after the EA-PS algorithm, which indicates that its performance was stable. Lastly, its performance had a statistically significant advantage over that of the other four algorithms at the 0.01 level, again indicating solid performance gains.

With the Rosenbrock-20D function, a similar trend is observed, and the proposed algorithm obtained the best mean and median statistics while its SD was the second best after the EA-PS algorithm. These results are consistent with those observed with the Rastrigin-20D function, i.e., that the proposed algorithm obtained both a good final result and that its performance was stable. Its performance had a statistically significant advantage over that of the V2, EA– PS, and EI–CMA–ES algorithms at the 0.01 level.

For the Schwefel 2.13-40D function, the proposed algorithm obtained both the best mean and median statistics, while its SD was the second best behind the EA-PS algorithm. Its performance had a statistically significant advantage over the other four algorithms at the 0.01 level.

Finally, for the Weierstrass-40D function, the proposed algorithm again obtained the best mean and median statistics, while its SD was the fourth best, which indicates that its performance had a higher variability than that of the V1, V2, and EA-PS algorithms. Its performance had a statistically significant advantage over the V1 and V2 algorithms at the 0.01 level.

Overall, test results show that the proposed algorithm achieved the best mean and median statistics in all cases except for the Ackley-10D, where it achieved the best mean and the second best median. Analysis of the SD statistic shows that the proposed algorithm performance was typically stable, as it obtained the best SD statistic with the Schwefel 2.13-40D function, the second best with the Rastrigin-20D and Rosenbrock-20D, and the fourth best with the Ackley-10D function.

Analysis also shows that the proposed algorithm had a statistically significant performance advantage over the reference algorithms in 19 out of 24 comparisons (it was compared to 4 reference algorithms across 6 test functions), which indicates that it was the best performing algorithm overall.

Another performance aspect which was studied was the pattern of metamodel and optimizer updates, in order to understand if the latter two were updated frequently or if a single type of metamodel or optimizer dominated the search. Fig.3 shows representative plots of the meta-model and optimizer updates from a test run with functions Ackley-10D and Rosenbrock-20D. It follows that both were updated frequently during the search, which indicates that no single type was overall optimal, and that the optimal type varied as the search progressed. These results also highlight the demerit of using a fixed type of metamodel and optimizer, as this can result in an unsuitable configuration which can degrade the search effectiveness. This observation is also supported by the test results, where the proposed algorithm consistently outperformed algorithms V1 and V2 which were identical in operation to the proposed algorithm, but employed a single type of metamodel and optimizer.

Fig.3
figure 3

Examples of variation of the selected metmodel type and optimizer type during the search

To compare the convergence trends by each of the five benchmarked algorithms, Fig.4 shows the representative convergence plots with functions Ackley-10D and Rosenbrock-20D. The plots show that the proposed algorithm achieved either the best or near-best convergence rate, while it still provided the overall best final result. The contribution of adapting the metamodel and optimizer is evident from the comparison to algorithms V1 and V2, which had a slower convergence rate and achieved an inferior final result. Plots also show that the proposed algorithm outperformed the two reference algorithms from the literature, which follows the trends observed in the test statistics (Table 4).

Fig.4
figure 4

Representative convergence plots for the benchmarked algorithms

4.3 Engineering test problem

For a comprehensive evaluation, the proposed algorithm was also tested with an engineering application of airfoil shape optimization which is representative of a real-world simulation-driven design application, and is formulated as follows. During flight, an aircraft generates lift, i.e., the force which counters the aircraft weight and keeps it airborne, and drag which is an aerodynamic friction force that obstructs the aircrafts movement. Both the lift and drag are strongly determined by the wing cross-section, i.e., the airfoil. The optimization goal is then to find an airfoil shape which maximizes the lift and minimizes the drag.

In practise, the design requirements for airfoils are specified in terms of the nondimensional lift and drag coefficients, cl and c d , respectively, defined as

$${c_l} = \frac{L}{{\frac{1}{2}\rho {V^2}S}}$$
(4a)
$${c_d} = \frac{D}{{\frac{1}{2}\rho {V^2}S}}$$
(4b)

where L and D are the lift and drag forces, respectively, ρ is the air density, V is the aircraft speed, and S is a reference area, such as the wing area. Also, the angle of attack (AOA) is the angle between the aircraft velocity and the airfoil chord line, which is defined as the straight line joining the leading and trailing edges of the airfoil. Fig.5 gives a schematic layout of the airfoil problem.

Fig.5
figure 5

Schematic layout of the airfoil optimization problem

Candidate airfoils were represented using the Hicks-Henne parameterization[39], which defines the profile of a candidate airfoil as

$$y = {y_b} + \sum\limits_{i = 1}^h {{\alpha _i}{b_i}(x)} $$
(5)

where y b is a baseline airfoil profile, taken as the NACA0012 symmetric airfoil, b i is a basis function, which is defined as[40]

$${b_i}(x) = {\left[ {\sin \left( {\pi {x^{\frac{{\log (0.5)}}{{\log \tfrac{i}{{(h + 1)}}}}}}} \right)} \right]^4}$$
(6)

and α i ∈ [−0.01,0.01] are coefficients, which are the problem’s design variables. 10 basis functions were used for the upper and lower airfoil profile, respectively, resulting in a problem consisting of 20 design variables. Fig.5 summarizes the nomenclature of the physical quantities and airfoil parametrization involved in this test problem. The lift and drag coefficients of candidate airfoils were obtained using XFoil, a computational fluid dynamics simulation for analysis of airfoils operating in the subsonic regime Drela[41]. Each airfoil evaluation required up to 30 s on a desktop computer. To ensure structural integrity, between 0.2 to 0.8 of the chord line, the airfoil thickness (t) had to be equal to or larger than a critical value t* = 0.1. The flight conditions were set as a cruise altitude of 30kft (30000 feet), a cruise speed of Mach 0.7, i.e., 70% of the speed of sound, and an AOA of 2°. As mentioned, the optimization goal was to find an airfoil shape which maximizes the lift coefficient (c) and minimizes the aerodynamic (c d ) at the prescribed flight altitude, speed, and AOA. Also, to ensure structural integrity, between 0.2 to 0.8 of the chord length, the airfoil's thickness (t) had to be equal to or larger than a critical value t* =0.1.

To facilitate the metamodel-assisted optimization, i.e., to account both for the objective function and constraint using a single metamodel, the thickness constraint was incorporated into the objective function, which resulted in the following augmented objective function

$$f = - \frac{{{c_l}}}{{{c_d}}} + p$$
(7a)

where p is the penalty for airfoils which violate the thickness constraint, and was defined as

$$p = \left\{ {\begin{array}{*{20}{l}} {\frac{{{t^*}}}{t} \times \left\| {\frac{{{c_l}}}{{{c_d}}}} \right\|,}&{\operatorname{if} \;t < {t^*}} \\ {0,}&{\operatorname{otherwise} .} \end{array}} \right.$$
(7b)

The benchmark tests followed the setup described in Section 4.2, and Table 5 gives the resultant test statistics. It follows that the proposed algorithm outperformed the other candidate algorithms also in this test problem, which shows that it also performed well in a representative simulation-driven test problem. It obtained the best mean and median statistics, and had the highest SD. Its performance had a statistically significant advantage over that of the EA-PS algorithm at the 0.05 level.

Table 5 Test statistics for the airfoil problem

Following Section 4.2, Fig.6 shows the metamodel and optimizer updates from one test run. They follow the trend observed earlier: the metamodel and optimizer types were updated frequently during the search, and no single type was dominant. In this test run, the RBFN metamodel was used much less than the RBF and Kriging ones.

Fig.6
figure 6

Updates of the metmodel and optimizer type during one run with the airfoil problem

Fig.7 gives representative convergence plots for the five algorithms, showing that the proposed algorithm had the fastest convergence rate and obtained the overall best final result. The V1 and V2 variants followed closely but obtained inferior final results, which again highlights the contribution of adapting the metamodel and optimizer types during the search. The EA-PS algorithm did not improve on its initial best solution, while the EI–CMA–ES algorithm obtained a result inferior to that of the proposed algorithm and its two variants.

Fig.7
figure 7

Representative convergence plots for the five algorithms in the airfoil problem

5 Conclusions

The modern engineering design optimization process often replaces laboratory experiments with computer simulations, which results in expensive black-box optimization problems. Such problems introduce several new challenges into the optimization process, e.g., a lack of gradient information, a high computational cost for evaluating candidate solutions, and a complicated nonconvex objective landscape.

Metamodel-assisted CI optimization algorithms have shown to be effective in such challenging problems, but a variety of metamodels and optimizers have been proposed, which introduces the problem of selecting a combination suitable to the problem being solved. The type of meta-model and optimizer being used directly impacts the effectiveness of the optimization search, and a priori prescribing their type may result in a setup which is unsuitable to the problem being solved and degrades the search effectiveness. Furthermore, the high cost of evaluating candidate solutions makes it impractical to experiment with different metamodels and optimizers by performing multiple optimization runs.

Leveraging on this issue, this study has proposed a new metamodel-assisted CI algorithm which autonomously adapts the type of metamodel and optimizer during the search, by selecting the most suitable ones out of a family of candidates.

In an extensive evaluation, the proposed algorithm was benchmarked against two metamodel-assisted algorithms from the literature and two variants without metamodel or optimizer selection. Analysis shows that overall the proposed algorithm outperformed the reference algorithms as it consistently achieved the best final result. Also, the type of metamodel and optimizer were updated frequently during the search, which highlights the merit of the proposed approach.

Appendix

A. Candidate metmodels

As mentioned in Section 2, a metamodel serves as a computationally-cheaper mathematical approximation of the true expensive objective function, and it is typically an interpolant trained using previously evaluated vectors. In this study, the proposed algorithm selected between three metamodel variants[3,4]:

1) Kriging: The metamodel takes a statistical approach to interpolation by combining two components: a drift function which is a global coarse approximation of the true function and a local correction based on the correlation between the interpolation vectors. Given a set of evaluated vectors, x i ∈ Rd, i = 1, ,n, the Kriging metamodel is trained such that it exactly interpolates the observed values, i.e., m(xi) = f(xi), where m(x) and f(x) are the metamodel and true objective function, respectively. Using a constant drift function[42] gives the Kriging metamodel

$$m(x) = \beta + \kappa (x)$$
(8)

with the drift function β and local correction κ(x) . The latter is defined by a stationary Gaussian process with mean zero and covariance

$$\operatorname{Cov} [\kappa (x)\kappa (y)] = {\sigma ^2}c(\theta ,x,\vec y)$$
(9)

where c(θ ,x,y) is a user-prescribed correlation function. A common choice for the correlation is the Gaussian correlation function[42], defined as

$$c(\theta ,x,y) = \prod\limits_{i = 1}^d {\exp \left( { - \theta {{({x_i} - {y_i})}^2}} \right)} .$$
(10)

Combining the latter expression with the constant drift function transforms the metamodel from (8) into the following form

(11)

where \(\hat \beta \) is the estimated drift coefficient, R is the symmetric matrix of correlations between all interpolation vectors, f is the vector of objective values, and 1 is a vector with all elements equal to 1. r T is the correlation vector between a new vector x and the sample vectors, i.e.,

(12)

The estimated drift coefficient \(\hat \beta \) and variance \({{\hat \sigma }^2}\) , which are required in (11), are obtained as

(13a)
(13b)

Fully defining the metamodel requires the correlation parameters θ, which are commonly taken as the maximizers of the metamodel likelihood. This is achieved by minimizing the expression

$$\psi ({\mathbf{\theta }}) = {\left\| {\mathbf{R}} \right\|^{\tfrac{1}{n}}}{\hat \sigma ^2}$$
(14)

which is a function only of the correlation parameters θ and the sample data. While a different correlation parameter can be used for each variable, the metamodel used in this study, as is commonly done in literature, employs a single correlation parameter. This results in a univariate likelihood function, which is relatively easy to optimize.

2) Radial basis functions: The metamodel approximates the objective function by a superposition of basis functions of the form

$${\phi _i}({\mathbf{x}}) = \phi ({\left\| {{\mathbf{x}} - {{\mathbf{x}}_i}} \right\|_2})$$
(15)

where x iis a sampled vector. Given the sample vectors x i ∈ Rd, i = 1, , n, and the corresponding objective values f(x i ), the radial basis function metamodel is then given by

$$m({\mathbf{x}}) = {\alpha _i}\sum\limits_{i = 1}^n {{\phi _i}({\mathbf{x}}) + c} $$
(16)

where α i and c are coefficients determined from the interpolation conditions

$$m({{\mathbf{x}}_i}) = f({{\mathbf{x}}_i}),\;i = 1, \cdots ,n$$
(17a)
$$\sum\limits_{i = 1}^n {{\alpha _i}} = 0.$$
(17b)

A common choice is the Gaussian basis function[43], defined as

$${\varphi _i}({\mathbf{x}}) = exp\left( {\frac{{{\mathbf{x}} - {{\mathbf{x}}_i}}}{\tau }} \right)$$
(18)

where τ controls the width of the Gaussians and is determined by cross-validation[4,44], as explained below. Mathematically, the metamodel training process involves the solution of the linear equation system

(19)

where Φ is the Gram matrix such that Φ i,j = ϕ i (x j ), 1 is a vector whose elements are all one, and f is the vector of objective values of the sample vectors. The linear system (19) is often ill-conditioned, and is therefore solved using the singular value decomposition (SVD) procedure[4].

A main aspect of the training process of the radial basis function metamodel is the identification of an optimal width coefficient (τ) for the Gaussian basis functions. An established procedure to achieve this is by generating a set of candidate values, estimating the accuracy of the radial basis function metamodel with each candidate value using a CV procedure, and selecting the metamodel deemed as the most accurate[4,44].

3) RBFN: The metamodel aims to mimic the operation of a biological neural network, and uses radial basis functions as its neurons or processing units. In contrast to the baseline radial basis function metamodel, the radial basis functions network typically uses fewer basis functions than sample vectors, which is done to improve the generalization ability and prediction accuracy. Given a set of sample vectors, x i, i = 1, , n, and their respective objective values f(x i ) , the radial basis functions network metamodel is given by

$$m({\mathbf{x}}) = \sum\limits_{i = 1}^{\hat n} {{\alpha _i}{\varphi _i}({\mathbf{x}})} $$
(20)

where the processing units ϕ i (x) are

$${\varphi _{_i}}({\mathbf{x}}) = exp\left( {\frac{{{\mathbf{x}} - {{{\mathbf{\hat x}}}_i}}}{\tau }} \right)$$
(21)

and \(\hat n \)n. The neurons’ centres \({{\hat x}_i}\), i = 1, , \(\hat n\), need to be determined such that they provide a good representation of the sample vectors. An established approach to achieve this is to determine them by clustering, and often with the k-means clustering algorithm to determine these centres[45]. The parameter τ , which determines the width of the Gaussian functions of the neurons, is selected by cross-validation CV[4,44]. The coefficients α i are determined from the least-squares solution

(22)

where

$${\Phi _{i,j}} = {\varphi _j}({{\mathbf{x}}_i})$$
(23)

x i, i = 1, ,n, are the sample vectors, and f is the vector of corresponding objective function values. As mentioned in Section 3, the proposed algorithm used an radial basis functions network with n = 0.8n, i.e., the number of neurons was 80 % of the sample size.

B. Candidate optimizers

Computational intelligence and nature-inspired optimizers employ a population of candidate solutions and manipulate them at each iteration to produce a new set of candidate solutions. This manipulation is performed with operators which are often inspired by phenomena in biology and physics. In this study, the proposed algorithm selects between three established CI optimizers which represent different optimization paradigms:

1) Differential evolution[46]: The optimizer employs a population of candidate solutions. At each iteration, it selects a candidate solution x from the population at random, and generates a new solution by probabilistic rules governing how to combine components both from the original candidate solution and other population members. In this study, a DE variant known as DE/rand/1/bin was used, which operates as follows. A new candidate solution y is generated to replace x using the following component-wise rule:

$${y_i} = \left\{ {\begin{array}{*{20}{l}} {{a_i} + w({b_i} - {c_i}),}&{\operatorname{if} \;{r_i} \leqslant p} \\ {{x_i},}&{\operatorname{otherwise} } \end{array}} \right.$$
(24)

where a, b, and c are three distinct candidate solutions selected from the population at random, w is a user-prescribed weight, r i is a random number drawn from a uniform probability distribution U[0,1], and p is the user-prescribed crossover probability. The DE parameters were set based on the recommended settings[47] as: a population size of 18 candidate solutions, a crossover probability of p = 0.502, and a differential weight of w = 0.671.

2) Evolutionary algorithm : The algorithm uses a population of candidate solutions. At each iteration, it applies a series of operators to generate the population of the next iteration. A typical sequence of operators is: a) selection: the best candidate solutions in the population are selected as parents, b) recombination: parents are selected at random and their vectors are combined to produce offspring, c) mutation: a few offsprings are selected at random and some of their design variables are assigned new values. In this study, the real-coded EA of [48] was used, which employs the stochastic universal selection operator, intermediate recombination with a probability of p = 0.7 , and the breeder genetic algorithm mutation operator with probability p = 0.1.

3) Particle swarm optimizer[49]: The algorithm uses a swarm of candidate solutions, termed particles, each having its own velocity. Based on the function values of the particles, the algorithm updates their velocities and positions, such that the swarm is driven to explore more promising regions of the search space. At iteration t, for a swarm particle at position x (t) travelling at a velocity v (t), its new velocity is obtained using the rule

(25)

where ω is a user-defined parameter termed the inertia weight, and ϕ p and ϕ g are the user-defined attraction weights towards the particle’s best known position p and swarm best position g. rp and r g are weights taken as random numbers drawn from the uniform distribution U[0,1] . After obtaining v(t+1), the particle's new position is updated using the rule

$${{\mathbf{x}}^{(t + 1)}} = {{\mathbf{x}}^{(t)}} + {{\mathbf{v}}^{(t + 1)}}$$
(26)

The PSO parameters were set based on the recommended settings in [50]: a swarm size of 150 particles, ω = −0.324, ϕ p = −0.114, and ϕ g = 3.979.

C. Reference algorithms

As mentioned in Section 4.2, the benchmark tests included the following two reference algorithms from literature:

1) Metamodel-assisted with EA-PS[5]: The algorithm combines a Kriging metamodel and an EA, and safeguards the metamodel accuracy by periodically evaluating a small subset of the population with the true objective function, and using them to update the metamodel. The algorithm begins by generating candidate solutions using an LH design, evaluating them with the expensive function, and training an initial Kriging metamodel. A real-coded EA then runs for 10 generations while evaluating only the metamodel. Next, the algorithm evaluates the best ten candidate solutions in the population with the true expensive function, and incorporates them into the metamodel. The process repeats until the maximum number of expensive function evaluations is reached. In the tests, the EA was identical to the one used by the proposed algorithm, and which was described in Appendix B.

2) EI-CMA-ES[37]: The algorithm combines a CMA-ES optimizer[51] with a Kriging metamodel, and updates the latter based on the expected improvement framework[24]. The algorithm begins by generating an initial sample of vectors, and evaluates them with the true function. Its main loop then begins, where at each generation, it trains a local Kriging metamodel using both the recently evaluated vectors, and the vectors stored in memory which are nearest to the best solution. A CMA-ES algorithm then searches for an optimum of the metamodel in a bounded region defined by the training sample. In the spirit of the expected improvement framework[24], it uses the merit function

$$f({\mathbf{x}}) = m({\mathbf{x}}) - \rho \zeta ({\mathbf{x}})$$
(27)

where m(x) is the metamodel prediction, ρ is a prescribed coefficient, and ζ( x) is the estimated Kriging prediction error, which is zero at sampled vectors since the true objective value is known. The search is repeated for ρ = 0,1,2, and 4, to obtain four solutions corresponding to different search profiles, i.e., ranging from a local search (ρ = 0) to a more explorative one (ρ = 4). All non-duplicate solutions found are evaluated with the true expensive function and are stored in memory. If no new vectors were evaluated, e.g., they match vectors which have already been stored in memory, the algorithm generates a new vector by randomly changing the current best one. Following [37], the algorithm used a training set of 100 vectors comprising of the 50 most recently evaluated ones and 50 nearest-neighbours, and CMA–ES used the default values in [51].