Introduction

In the real world, there exist many problems having two or more (often conflicting) objectives that we aim to optimize at the same time. These are called multiobjective optimization problems (MOPs), and their solution has attracted the attention of researchers for many years. Because of the conflict among the objectives, solving an MOP produces a set of solutions representing the best possible trade-offs among them (i.e., solutions in which one objective cannot be improved without worsening another). Such solutions constitute the Pareto optimal set, and the image of this set (i.e., the corresponding objective function values) forms the so-called Pareto front.

Although a wide variety of mathematical programming techniques have been developed to tackle MOPs since the 1970s [115], such techniques have a number of limitations, the most notable being that they are normally quite sensitive to the shape and/or continuity of the Pareto front, and that they usually generate a single element of the Pareto optimal set per execution. Additionally, some mathematical programming techniques require the objective functions and constraints in algebraic form, whereas in many real-world problems such values can only be obtained from a simulator. These limitations have motivated the use of alternative approaches, among which metaheuristics have been a very popular choice, mainly because of their flexibility (i.e., they require little domain-specific information) and ease of use. Among the many metaheuristics currently available [158], evolutionary algorithms [57] have certainly been the most popular in this area in the last few years, giving rise to a field now known as evolutionary multiobjective optimization (EMO) [29].

The first Multi-Objective Evolutionary Algorithm (MOEA), the Vector Evaluated Genetic Algorithm (VEGA), was proposed by Schaffer in the mid-1980s [147, 148, 149]. Interestingly, there was not much interest in EMO research for almost a decade afterwards. In the mid-1990s, however, the area started to attract attention from several research groups around the world, and it has maintained a high level of research activity since then.

The remainder of this paper is organized as follows. In “Basic concepts” we provide some basic mathematical concepts related to multiobjective optimization, with the aim of making this a self-contained paper. “Open research areas” lists research areas that present challenges that are particularly relevant from the authors’ perspective. In the subsequent subsections, each of these areas is described, together with a summary of some of the most relevant research conducted in it so far. The next section presents the main challenges associated with each of the research areas previously discussed. Finally, our conclusions are presented in the last section.

Basic concepts

In multiobjective optimization, the aim is to solve problems of the type:

$$\begin{aligned} \text{ minimize } \mathbf {f}(\mathbf {x}):=\left[ f_{1}(\mathbf {x}), f_{2}(\mathbf {x}), \ldots , f_{k}(\mathbf {x}) \right] \end{aligned}$$
(1)

subject to:

$$\begin{aligned}&g_{i}(\mathbf {x}) \le 0 \ \ \ \ i=1,2,\ldots ,m, \end{aligned}$$
(2)
$$\begin{aligned}&h_{i}(\mathbf {x}) = 0 \ \ \ \ i=1,2,\ldots ,p, \end{aligned}$$
(3)

where \(\mathbf {x}=\left[ x_{1}, x_{2}, \ldots , x_{n} \right] ^{\text {T}}\) is the vector of decision variables, \(f_{i}:\mathbb {R}^{n} \rightarrow {\mathbb {R}}\), \(i=1,\ldots ,k\) are the objective functions and \(g_i,h_j:{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\), \(i=1,\ldots ,m\), \(j=1,\ldots ,p\) are the constraint functions of the problem.

A few additional definitions are required to introduce the notion of optimality used in multiobjective optimization.

Definition 1

Given two vectors \(\mathbf {x},\mathbf {y} \in \mathbb {R}^k\), we say that \(\mathbf {x} \le \mathbf {y}\) if \(x_i \le y_i\) for \(i=1,\ldots ,k\), and that \(\mathbf {x}\) dominates \(\mathbf {y}\) (denoted by \(\mathbf {x}\prec \mathbf {y}\)) if \(\mathbf {x}\le \mathbf {y}\) and \(\mathbf {x} \ne \mathbf {y}\).

Definition 2

We say that a vector of decision variables \(\mathbf {x}\in \mathcal {X}\subset \mathbb {R}^n\) is nondominated with respect to \(\mathcal {X}\), if there does not exist another \(\mathbf {x}'\in \mathcal {X}\) such that \(\mathbf {f}(\mathbf {x}')\prec \mathbf {f}(\mathbf {x})\).

Definition 3

We say that a vector of decision variables \(\mathbf {x}^{*} \in \mathcal {F}\subset \mathbb {R}^n\) (\(\mathcal {F}\) is the feasible region) is Pareto-optimal if it is nondominated with respect to \(\mathcal {F}\).

Definition 4

The Pareto optimal set \(\mathcal {P^*}\) is defined by:

$$\begin{aligned} \mathcal {P^*}=\{\mathbf {x}\in \mathcal {F} |\mathbf {x} \text{ is } \text{ Pareto-optimal } \}. \end{aligned}$$

Definition 5

The Pareto front \(\mathcal {PF^*}\) is defined by:

$$\begin{aligned} \mathcal {PF^*}=\{ \mathbf {f}(\mathbf {x}) \in \mathbb {R}^k |\mathbf {x}\in \mathcal {P^*}\}. \end{aligned}$$

Therefore, our aim is to obtain the Pareto optimal set from the set \(\mathcal {F}\) of all the decision variable vectors that satisfy (2) and (3). Note, however, that in practice not all of the Pareto optimal set is normally desirable or achievable, and decision-makers tend to prefer certain types of solutions or regions of the Pareto front [16].
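
To make these definitions concrete, the following minimal Python sketch (our own illustration; the function names are ours) implements the dominance check of Definition 1 and the naive \(O(kn^2)\) filtering of a set of objective vectors down to its nondominated members (Definition 2):

```python
import numpy as np

def dominates(fx, fy):
    """True if objective vector fx Pareto-dominates fy (minimization):
    fx is no worse in every objective and strictly better in at least one."""
    fx, fy = np.asarray(fx), np.asarray(fy)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

def nondominated(F):
    """Indices of the nondominated rows of the objective matrix F
    (naive O(k n^2) pairwise comparison)."""
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi)
                       for j, fj in enumerate(F) if j != i)]

# (1, 2) and (2, 1) are mutually nondominated; (2, 3) is dominated by (1, 2).
F = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0]])
print(nondominated(F))  # -> [0, 1]
```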

Open research areas

In spite of the intense research activity of the last 25 years in this area, there are still open research areas that are worth exploring in the next few years. The following is a non-exhaustive list of them:

  1. Algorithmic design

  2. Scalability

  3. Dealing with expensive objective functions

  4. Hyper-heuristics.

Next, we briefly discuss some of the most representative research that has been conducted on these topics.

Algorithmic design

In their origins, MOEAs were very simple and naive. A good example is the Vector Evaluated Genetic Algorithm (VEGA) [148], in which the population of a simple genetic algorithm was subdivided into as many subpopulations as there were objectives in the MOP to be solved (at that time, normally only problems with two objectives were considered). Solutions in each subpopulation were selected based on their performance on a single objective (e.g., individuals in the first subpopulation were selected based on their performance on the first objective). The individuals of all the subpopulations were then shuffled, with the aim of recombining solutions that were best in the first objective with those that were best in the second. When combined with proportional selection (e.g., the roulette-wheel method), VEGA produced solutions similar to those obtained with a linear aggregating function that combines all the objective functions into a single scalar value [31]. In spite of the limitations of VEGA, some researchers eventually found applications in which this sort of scheme could be useful (see for example [27]).
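
As an illustration, the following Python sketch (ours, not Schaffer's code) reproduces VEGA's selection step, using a rank-based stand-in for fitness-proportional selection; `F` is an \(n \times k\) matrix of objective values to be minimized:

```python
import numpy as np

def vega_mating_pool(F, rng):
    """VEGA-style selection sketch (minimization): each of the k objectives
    fills an equal share of the mating pool, drawing individuals with a
    probability that grows with their quality on that single objective."""
    n, k = F.shape
    pool = []
    for m in range(k):
        order = np.argsort(F[:, m])           # best-to-worst on objective m
        weights = np.empty(n)
        weights[order] = np.arange(n, 0, -1)  # rank-based stand-in for fitness
        idx = rng.choice(n, size=n // k, p=weights / weights.sum())
        pool.extend(idx.tolist())
    pool = np.array(pool)
    rng.shuffle(pool)  # shuffle before recombination, as in VEGA
    return pool

rng = np.random.default_rng(0)
F = rng.random((8, 2))            # 8 individuals, 2 objectives
print(vega_mating_pool(F, rng))   # indices forming the shuffled mating pool
```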

Linear aggregating functions were among the most popular approaches adopted in the early days of MOEAs [67], but their inability to deal with non-convex Pareto fronts was soon pointed out by some researchers (see for example [36]). Nevertheless, linear aggregating functions and other naive approaches, such as lexicographic ordering, have survived in the EMO literature for many years [29].

Goldberg proposed in his seminal book on genetic algorithms [57] a mechanism called Pareto ranking for the selection scheme of a MOEA. The core idea of Pareto ranking is to rank the population of an evolutionary algorithm based on Pareto optimality, such that the nondominated solutions obtain the highest (best) possible rank and are sampled at the same rate (i.e., all nondominated solutions have the same probability of survival). Since Goldberg did not provide a specific algorithm for Pareto ranking, several implementations were developed based on his proposal. The two main ones were those of the Multi-Objective Genetic Algorithm (MOGA) of Fonseca and Fleming [53] and the Nondominated Sorting Genetic Algorithm (NSGA) of Srinivas and Deb [153]. In the former, the ranking was done in a single pass (by comparing each individual with everybody else in terms of Pareto optimality), whereas the latter required the creation of several layers of solutions, which involved re-ranking the population several times (i.e., NSGA was computationally more expensive than MOGA).
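
The MOGA ranking rule is simple enough to state in a few lines. The following sketch (an illustration under our own naming) assigns to each individual a rank equal to one plus the number of individuals that dominate it, so all nondominated individuals receive rank 1:

```python
import numpy as np

def moga_rank(F):
    """Fonseca-Fleming (MOGA) ranking in a single pass over all pairs:
    rank(i) = 1 + number of individuals that dominate i, so every
    nondominated individual gets the best possible rank, 1."""
    n = len(F)
    ranks = np.ones(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                ranks[i] += 1
    return ranks

F = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 3.0]])
print(moga_rank(F))  # -> [1 1 3 4]
```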

Goldberg [57] also realized that in MOEAs diversity is a key issue if we aim to generate as many elements of the Pareto optimal set as possible in a single algorithmic execution. This gave rise to a mechanism eventually called the density estimator, whose task is to maintain different (nondominated) solutions in the population, thus avoiding convergence to a single one (something that eventually happens in any evolutionary algorithm because of stochastic noise [57]). MOGA [53] and NSGA [153] used fitness sharing [58] as their density estimator, but a wide variety of other approaches have been proposed since then: clustering [184], adaptive grids [85], crowding [40], entropy [129] and parallel coordinates [74], among others.

In the late 1990s, another mechanism was incorporated into MOEAs: elitism. The idea of elitism is to retain the best solutions obtained by a MOEA so that they are not destroyed by the evolutionary operators (e.g., crossover and mutation). However, since all nondominated solutions are considered equally good (unless we have some preference information), elitism leads to the retention of a large number of solutions. Zitzler realized this when developing the Strength Pareto Evolutionary Algorithm (SPEA) [184], and also observed that retaining such a large number of solutions diluted the selection pressure. Thus, he proposed not only to use an external archive to store the nondominated solutions generated by his MOEA, but also to prune this archive (using clustering) once a certain user-defined limit was reached. Elitism is important not only for practical reasons (it is easier to compare the performance of two MOEAs that produce the same number of nondominated solutions), but also for theoretical reasons, since it has been proved that such a mechanism is required in a MOEA to guarantee convergence [141].

Pareto-based MOEAs were very popular in the mid-1990s, but few of the many approaches proposed at that time have actually been used by other researchers. Without doubt, the most popular Pareto-based MOEA has been the Nondominated Sorting Genetic Algorithm-II (NSGA-II) [40], which uses a more efficient ranking scheme (called nondominated sorting) than its predecessor (NSGA) and adopts, as its density estimator, a clever parameterless mechanism called the crowded comparison operator. NSGA-II is still used today by many researchers, in spite of the well-known limitations of its crowded comparison operator when dealing with MOPs having more than three objectives (the so-called many-objective optimization problems [29]).
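
For reference, the crowded comparison operator relies on the crowding distance, which can be sketched as follows (minimization; `F` holds the objective vectors of a single nondominated front; this is an illustration, not NSGA-II's reference implementation):

```python
import numpy as np

def crowding_distance(F):
    """NSGA-II crowding distance for one nondominated front F (n x k):
    per objective, each interior solution accumulates the normalized gap
    between its two neighbors; boundary solutions get infinity so that
    they are always preferred."""
    n, k = F.shape
    dist = np.zeros(n)
    for m in range(k):
        order = np.argsort(F[:, m])
        dist[order[0]] = dist[order[-1]] = np.inf
        span = F[order[-1], m] - F[order[0], m]
        if span == 0.0:
            continue
        for r in range(1, n - 1):
            dist[order[r]] += (F[order[r + 1], m] - F[order[r - 1], m]) / span
    return dist

F = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]])
print(crowding_distance(F))  # -> [inf, 2.0, inf]
```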

For over 10 years, Pareto-based MOEAs were, by far, the most popular approaches in the specialized literature. In 2004, a different type of algorithmic design was proposed, although it remained underdeveloped for several years: indicator-based selection. The core idea was introduced in the Indicator-Based Evolutionary Algorithm (IBEA) [183], an algorithmic framework that allows the incorporation of any performance indicator into the selection mechanism of a MOEA. IBEA was originally tested with the hypervolume [182] and the binary \(\epsilon \) indicator [183].

Indicator-based MOEAs were initially seen as a curiosity in the field, since it was not clear what their advantages were, other than providing an alternative mechanism for selecting solutions. However, when the limitations of Pareto-based selection for dealing with many-objective problems became clear, researchers started to take an interest in indicator-based MOEAs, which did not seem to have scalability limitations. Much of the interest in this area was produced by the introduction of the S Metric Selection Evolutionary Multiobjective Algorithm (SMS-EMOA) [46]. SMS-EMOA randomly generates an initial population and then produces a single solution per iteration (i.e., it uses steady-state selection) using the crossover and mutation operators of NSGA-II, after which it applies nondominated sorting (as in NSGA-II). When the last nondominated front has more than one solution, SMS-EMOA uses the hypervolume [182] to decide which solution should be removed. Beume et al. [13] proposed a new version of SMS-EMOA in which the hypervolume contribution is not used when the nondominated sorting process yields more than one front (i.e., the hypervolume is used only as a density estimator). In that case, they instead use the number of solutions that dominate a given individual (i.e., the solution dominated by the largest number of solutions is removed), which makes SMS-EMOA a bit more efficient. However, since this MOEA relies on exact hypervolume contributions, it becomes too computationally expensive as the number of objectives increases [12].

SMS-EMOA started a trend of designing indicator-based MOEAs (several of which rely on the hypervolume indicator), although it is worth noting that in such approaches the performance indicator has mostly been used as a density estimator (see for example [77]). The use of “pure” indicator-based selection mechanisms has been very rare in the specialized literature (see for example [114]).

At this point, an obvious question arises: why is the hypervolume such an attractive choice for indicator-based selection? The hypervolume (also known as the \(\mathcal {S}\) metric or the Lebesgue measure) of a set of solutions measures the size of the portion of objective space that is collectively dominated by those solutions. One of its main advantages is its mathematical properties, since it has been proved that maximizing this performance measure is equivalent to finding the Pareto optimal set [52]. Additionally, empirical studies have shown that (for a certain number of points previously determined) maximizing the hypervolume indeed produces subsets of the true Pareto front [46, 85]. The hypervolume also assesses both convergence and, to a certain extent, the spread of solutions along the Pareto front (although without necessarily enforcing a uniform distribution of solutions). However, there are several issues regarding its use. First, the computation of this indicator depends on a reference point, which can influence the results in a significant manner. Some researchers have proposed using the worst objective function values in the current population, but this requires scaling of the objectives. The most serious limitation of the hypervolume, however, is its high computational cost. The best known algorithms for computing it have a complexity that is polynomial in the number of points but grows exponentially with the number of objectives [12]. This has triggered a significant amount of research on algorithms that reduce the computational cost of the hypervolume and of the hypervolume contributions required by a hypervolume-based MOEA (see for example [33, 62, 81, 93, 142]).
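
In the bi-objective case the hypervolume can still be computed cheaply with a sweep, as the following sketch shows (minimization; the points are assumed mutually nondominated and the reference point dominated by all of them; the function name is ours):

```python
import numpy as np

def hypervolume_2d(F, ref):
    """Exact hypervolume of mutually nondominated bi-objective points F
    (minimization) w.r.t. a reference point ref dominated by all of them.
    Sorting by f1 makes f2 strictly decreasing, so the dominated region
    decomposes into disjoint rectangles swept left to right."""
    P = F[np.argsort(F[:, 0])]
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in P:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

F = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(hypervolume_2d(F, (4.0, 4.0)))  # -> 6.0 (= 3 + 2 + 1)
```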

An obvious alternative for dealing with this issue is to approximate the actual hypervolume contributions. This is the approach adopted by the Hypervolume Estimation Algorithm for Multi-objective Optimization (HypE) [6], in which Monte Carlo simulation is used to approximate exact hypervolume values. Although HypE can efficiently solve MOPs having a very large number of objectives, its results are not as competitive as those obtained with exact hypervolume contributions.
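
The underlying Monte Carlo idea is easy to sketch: sample the box spanned by the ideal and reference points and count the fraction of samples that are dominated. The following is a minimal illustration of that estimator, not of HypE itself (which approximates per-solution hypervolume contributions):

```python
import numpy as np

def hypervolume_mc(F, ref, n_samples=100_000, rng=None):
    """Monte Carlo hypervolume estimate (minimization): sample uniformly
    in the box [ideal, ref] and measure the fraction of samples dominated
    by at least one member of F. All dominated points lie in this box."""
    if rng is None:
        rng = np.random.default_rng()
    ideal = F.min(axis=0)
    box_volume = np.prod(ref - ideal)
    S = rng.uniform(ideal, ref, size=(n_samples, F.shape[1]))
    # sample s is dominated if some f in F satisfies f <= s componentwise
    dominated = (F[None, :, :] <= S[:, None, :]).all(axis=2).any(axis=1)
    return box_volume * dominated.mean()

F = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(hypervolume_mc(F, np.array([4.0, 4.0])))  # approx. 6.0
```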

Another possibility is to use a different performance indicator whose computation is relatively inexpensive. Unfortunately, the hypervolume is the only unary indicator known to be Pareto compliant [185], which makes the use of other performance indicators less attractive. Nevertheless, there are some performance indicators that are weakly Pareto compliant, such as R2 [17] and the Inverted Generational Distance plus (IGD+) [79]. Although several efficient and effective indicator-based MOEAs have been built around these performance indicators (see for example [18, 72, 97, 105, 106, 160]), their use has remained relatively rare in the specialized literature.
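
IGD+ is indeed inexpensive to compute. Given an approximation set A and a reference set Z sampled from the true Pareto front (both for minimization), it can be sketched as follows (our own vectorized illustration):

```python
import numpy as np

def igd_plus(A, Z):
    """IGD+ (minimization): for each reference point z in Z, take the
    distance to the nearest member of A, counting only the components in
    which the solution is worse than z; then average over Z. Smaller is
    better, and this truncation is what makes the indicator weakly
    Pareto compliant."""
    d = np.maximum(A[None, :, :] - Z[:, None, :], 0.0)  # |Z| x |A| x k
    return np.linalg.norm(d, axis=2).min(axis=1).mean()

Z = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])   # reference set
A = np.array([[0.1, 1.0], [0.6, 0.6], [1.0, 0.1]])   # approximation set
print(igd_plus(A, Z))
```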

In 2007, a different sort of approach was proposed, quickly attracting a lot of interest: the Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D) [178]. The idea of using decomposition (or scalarization) methods was originally proposed in the mathematical programming literature more than 20 years earlier [35]: an MOP is transformed into several single-objective optimization problems, which are then solved to generate the nondominated solutions of the original problem. Unlike linear aggregating functions, scalarization (or decomposition) methods allow the generation of non-convex portions of the Pareto front and work even on disconnected Pareto fronts. MOEA/D presents an important advantage with respect to methods proposed in the mathematical programming literature [such as Normal Boundary Intersection (NBI) [35]]: it uses neighborhood search to simultaneously solve all the single-objective optimization problems generated by the transformation. Additionally, MOEA/D is not only effective and efficient, but can also be used to solve problems with more than three objectives, although in such cases it requires larger population sizes.
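
A commonly used scalarizing function in MOEA/D is the weighted Tchebycheff function, sketched below (a standard formulation; the variable names are ours). Each weight vector defines one of the single-objective subproblems mentioned above:

```python
import numpy as np

def tchebycheff(fx, w, z_star):
    """Weighted Tchebycheff scalarization used by MOEA/D:
    g(x | w, z*) = max_i w_i * |f_i(x) - z*_i|, with z* the ideal point.
    Minimizing g over a spread of weight vectors can reach solutions on
    non-convex and disconnected Pareto fronts, unlike a linear aggregation."""
    return float(np.max(w * np.abs(np.asarray(fx) - z_star)))

z_star = np.zeros(2)                      # ideal point (assumed known here)
for w in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    # each weight vector defines one single-objective subproblem
    print(w, tchebycheff([1.0, 2.0], np.array(w), z_star))
```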

Decomposition-based MOEAs became fashionable around 2010 and have remained an active research area since then [144]. In fact, this sort of approach influenced the development of the Nondominated Sorting Genetic Algorithm-III (NSGA-III) [38], which adopts both decomposition and reference points to deal with many-objective problems. However, it was recently found that decomposition-based MOEAs do not work properly with certain Pareto front geometries [80]. Given the popularity of decomposition-based MOEAs, this will certainly trigger a lot of research in the next few years.

Scalability

As has been pointed out, in the early days of MOEAs their use was frequently limited to solving problems having only two or three objectives. Over the years, however, the need to tackle many-objective problems became more evident, and scalability in objective function space was soon identified as a serious limitation of Pareto-based MOEAs [75].

It is interesting to notice, however, that when Schütze et al. [150] studied the actual source of difficulty in many-objective problems, they concluded that adding more objectives to an MOP does not necessarily make it harder. According to this study, the difficulty is really associated with the intersection of the descent cones of the objectives (these descent cones are obtained by combining the gradients of the individual objectives). This was corroborated to some extent by an empirical study conducted by Ishibuchi et al. [78], which showed that NSGA-II could properly solve many-objective knapsack problems in which the objectives were highly correlated. So, the question arises: why do many-objective problems turn out to be difficult to solve in practice when using Pareto-based MOEAs? A series of experimental [131, 165] and analytical [32, 78, 86] studies have identified the following limitations of Pareto-based MOEAs in many-objective problems:

  1. Deterioration of the search ability: The proportion of nondominated solutions in a population increases rapidly with the number of objectives [49]. According to Bentley et al. [9], the number of nondominated k-dimensional vectors in a set of size n is \(O(\ln ^{k-1} n)\). This implies that in problems with a large number of objectives, the selection of solutions is carried out almost at random or guided solely by the density estimator. In fact, Mostaghim and Schmeck [119] experimentally showed that a random search optimizer could achieve better results than NSGA-II [40] on a problem with ten objectives (a short numerical illustration of this effect is given after this list). This problem is the reason why most indicator-based MOEAs are able to tackle many-objective problems simply by using a more powerful density estimator to guide the search.

  2. Dimensionality of the Pareto front: Due to the ‘curse of dimensionality’, the number of points required to accurately represent a Pareto front increases exponentially with the number of objectives. The number of points necessary to represent a k-dimensional Pareto front with resolution r is \(O(kr^{k-1})\) (see, e.g., [151]). This poses a challenge both to the data structures used to efficiently manage that number of points in the population and in the external archive, and to the density estimators, which must achieve an even distribution of solutions along the Pareto front. In fact, this is a challenge even for performance indicators [78]. In practice, most MOEAs tend to use relatively small population sizes (fewer than 300 individuals) even when tackling MOPs with more than six objectives, in spite of the fact that such population sizes are clearly inappropriate for sampling such high-dimensional Pareto fronts.

  3. Visualization of the Pareto front: Clearly, with more than three objectives it is not possible to plot the Pareto front as usual. This is a serious problem, since visualization plays a key role in a proper decision-making process. In recent years, a number of visualization techniques have been proposed for many-objective problems (see for example [161]), and this is still an active research area (see for example [14, 70, 156]).
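
As a quick numerical illustration of the first limitation above, the following sketch (ours; the function name is illustrative) estimates the fraction of mutually nondominated vectors among n random objective vectors as the number of objectives k grows:

```python
import numpy as np

def nondominated_fraction(n, k, rng):
    """Fraction of n random objective vectors that are nondominated
    when there are k objectives (minimization)."""
    F = rng.random((n, k))
    dom = (F[:, None, :] <= F[None, :, :]).all(axis=2) & \
          (F[:, None, :] < F[None, :, :]).any(axis=2)  # dom[i, j]: i dominates j
    return 1.0 - dom.any(axis=0).mean()

rng = np.random.default_rng(1)
for k in (2, 5, 10):
    print(k, nondominated_fraction(200, k, rng))
# the fraction approaches 1 as k grows, so Pareto-based selection
# loses almost all of its discriminating power
```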

In order to properly deal with many-objective optimization problems, three main approaches have normally been adopted [8, 96, 163]:

  1. As mentioned before, the use of indicator-based MOEAs has been an important research trend to deal with many-objective optimization problems, in spite of the limitations of some performance indicators such as the hypervolume (see for example [82]).

  2. One interesting possibility that was adopted in the early days of many-objective optimization was the use of an optimality relation that yields a solution ordering finer than that produced by Pareto optimality. These are normally called relaxed forms of Pareto dominance. Some examples are k-optimality [50], preference order ranking [42], the favour relation [155] and a method that controls the dominance area [145], among others. Besides providing a richer ordering of the solutions, these relations yield an optimal set that is usually a subset of the Pareto optimal set.

  3. Another interesting approach, which is now rarely used, is dimensionality reduction, in which the number of objectives of the MOP is reduced either during the search process or a posteriori, during the decision-making process [19, 100, 146]. The main aim of reduction techniques is to identify redundant (or at least partially redundant) objectives in order to discard them. A redundant objective is one that can be removed without changing the dominance relation induced by the original objective set. Evidently, if each objective conflicts with every other objective, no reduction is possible. However, this is rarely the case in practice.

Many other approaches are possible for tackling many-objective problems, including, for example, the use of alternative ranking schemes (different from nondominated sorting) (see for example [54]), the use of machine learning techniques (as in MONEDA [108]), or approaches such as the two-archive MOEA, which uses one archive for convergence and another for diversity [131].

Dealing with expensive objective functions

In spite of the several advantages that MOEAs offer for solving complex MOPs (e.g., ease of use and generality), their most important limitation is that they normally require a relatively high number of objective function evaluations to produce a reasonably good approximation of the Pareto front. The reason is that MOEAs are stochastic search techniques and therefore need to sample the search space in order to identify an appropriate search direction. This is indeed a serious limitation and, in some cases, can make MOEAs inappropriate for solving real-world MOPs in which their computational cost becomes prohibitive (e.g., applications in aeronautical and aerospace engineering [122]).

In general, MOEAs can become computationally unaffordable for an application when:

  • The evaluation of the fitness functions is computationally expensive (i.e., it takes from minutes to hours, depending on the quality or granularity of the model and the available computational resources).

  • The fitness functions cannot be defined in an algebraic form (e.g., when the fitness functions are generated by a computational simulation of the physics of the real system).

  • The total number of evaluations of the fitness functions is limited by some financial constraints (i.e., there is a financial cost involved in computing the fitness functions and we cannot exceed a certain pre-defined budget).

As we gain access to more computational power at ever more affordable prices, interest in developing MOEAs for solving computationally expensive MOPs has increased significantly in the last few years. The main approaches developed in this area can be roughly divided into three groups [143]:

  1. Parallelism: The use of parallel processing is perhaps the most obvious choice for solving computationally expensive MOPs, particularly with the decreasing cost of high-speed multi-core processors (see for example [37, 83, 110, 124, 139]). Interestingly, however, in spite of the existence of an important number of papers on parallel MOEAs [159], basic research in this area has remained scarce since the origins of MOEAs [29]. Most papers on parallel MOEAs focus on applications [173] or on relatively straightforward parallel extensions of well-established MOEAs such as NSGA-II [130] or SMS-EMOA [71]. For many years, the emphasis was placed on developing synchronous parallel MOEAs, but in recent years the development of asynchronous implementations (which are more appropriate for the heterogeneous computer architectures that are common nowadays) has become more frequent [41, 68, 171, 174].

  2. Surrogates: In this approach, an empirical model that approximates the real problem is built using information gathered from actual objective function evaluations [43, 121, 157]. The empirical model (on which evaluating the fitness function is computationally inexpensive) is then used to predict promising new solutions [157]. Current functional approximation models include polynomials (response surface methodologies [47, 172]), artificial neural networks (e.g., multi-layer perceptrons (MLPs) [5] and radial-basis function (RBF) networks [2, 177]), Gaussian processes [15, 179], support vector machines [3, 4, 176] and Kriging models [111, 126]. Although frequently used in engineering applications, surrogate methods can normally be adopted only in problems of low dimensionality, which is an important limitation when dealing with real-world MOPs. Additionally, surrogate models tend to lack robustness, which is also an important issue in optimization problems. Nevertheless, there has been recent research oriented towards overcoming the scalability and robustness limitations of surrogate methods (see for example [102, 125, 134, 175]).

  3. Fitness inheritance: This approach was originally proposed by Smith et al. [152] with the aim of reducing the total number of fitness function evaluations performed by a (single-objective) evolutionary algorithm. The mechanism works as follows: when assigning fitness to an individual, in some cases we evaluate the objective function as usual, but the rest of the time we assign to the individual the average of its parents' fitness values, saving one fitness function evaluation. This idea rests on the assumption that an offspring is similar to its parents. Evidently, fitness inheritance must not always be applied, since the algorithm needs the true fitness function value from time to time in order to obtain enough information to guide the search. The fraction of the time in which fitness inheritance is applied is called the inheritance proportion. If this inheritance proportion is one (i.e., 100%), the algorithm is most likely to converge prematurely [24]. Extending fitness inheritance to multiobjective optimization involves several issues, mainly related to its apparent limitation for dealing with non-convex Pareto fronts [45]. However, some researchers have managed to successfully adapt fitness inheritance to MOEAs [55, 128, 135, 169], reporting important savings in the total number of objective function evaluations performed.
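
The basic fitness inheritance mechanism of the third group can be sketched in a few lines (our illustration; the averaging rule follows Smith et al.'s proposal, while the function names and the toy problem are ours):

```python
import numpy as np

def evaluate_with_inheritance(child, parent_fs, evaluate, p_inherit, rng):
    """Fitness inheritance sketch: with probability p_inherit the child's
    objective vector is taken to be the average of its parents' vectors
    (no true evaluation is spent); otherwise the true, possibly expensive,
    evaluation is performed. Returns (objective vector, evaluations spent)."""
    if rng.random() < p_inherit:
        return np.mean(parent_fs, axis=0), 0
    return np.asarray(evaluate(child)), 1

# toy bi-objective function and a 40% inheritance proportion (both ours)
f = lambda x: (x[0] ** 2, (x[0] - 2.0) ** 2)
rng = np.random.default_rng(2)
fv, spent = evaluate_with_inheritance(
    [1.0], [np.array([0.0, 4.0]), np.array([1.0, 1.0])], f,
    p_inherit=0.4, rng=rng)
print(fv, spent)
```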

Hyper-heuristics

A hyper-heuristic is a search method or learning mechanism for selecting or generating heuristics to solve computational search problems [20]. Hyper-heuristics are high-level approaches that operate on a search space (or pool) of heuristics instead of the search space of the problem at hand. They have been promoted with the aim of providing more general search methodologies. Typically, simple heuristics tend to work well on a particular type of problem, but when facing a new problem, or even slightly modified instances of the same problem, they tend to perform poorly. Additionally, identifying which heuristic works efficiently on a certain problem is a very tedious and time-consuming task whose computational cost may become prohibitive in some applications.

Burke et al. [21] proposed a taxonomy of hyper-heuristics considering two dimensions:

  1. The nature of the heuristics’ search space, and

  2. The different sources of feedback information.

Regarding the nature of the search space, there are two options:

  1. Heuristic selection: methodologies for choosing existing heuristics, and

  2. Heuristic generation: methodologies for generating new heuristics from the components of existing ones.

Regarding the source of feedback information obtained during the search process, there are three options:

  1. No-learning, in which there is no learning mechanism and the heuristic selection is based on either a random or an exhaustive process,

  2. Offline learning, in which knowledge is gathered in the form of rules from a set of training instances, which will hopefully generalize to unseen instances, and

  3. Online learning, in which the learning takes place while the algorithm is solving an instance of a problem.

The use of collaborative approaches that work as hyper-heuristics can be found across Operations Research, Computer Science and Artificial Intelligence. Although the ideas behind hyper-heuristics can be traced back to the early 1960s in single-objective optimization, their potential in multiobjective optimization was not explored until relatively recently. Early attempts date back to 2005, when hyper-heuristics started to be used to solve multiobjective combinatorial optimization problems, such as space allocation and timetabling [22], decision-tree induction [7], bin packing and cutting stock problems [59], integration and test order problems [63, 64, 107], spanning trees [89], job shop scheduling [162], knapsack problems [90] and software module clustering [91, 92], among others.

Multiobjective hyper-heuristics for continuous search spaces are still rare in the specialized literature. For example, Vrugt et al. [164] proposed the Multi-ALgorithm Genetically Adaptive Method (AMALGAM), an online selection hyper-heuristic that operates similarly to NSGA-II [40]. In this case, however, the offspring are also created using the variation operators of other stochastic search methods, such as differential evolution [154], particle swarm optimization [84] and adaptive Metropolis search [65]. Although all these methods participate during the optimization process, those showing the highest reproductive success are favored. Additionally, the population is initialized using Latin hypercube sampling.

León et al. [95] proposed the Metaheuristics-based Extensible Tool for Cooperative Optimization (METCO), which is based on the island model and on the cooperation of a set of MOEAs, granting more computational resources to those algorithms that show more promising behavior. A coordinator node is in charge of maintaining the global solution set and of selecting the configurations executed on the islands. A configuration consists of a MOEA plus the variation operators and the set of parameters that define them (population size, mutation and crossover rates, etc.), all defined by the user. The global solution set is obtained by merging the local results achieved by the islands, and its size is limited using the crowding distance operator. Besides the global stopping criterion, a local stopping criterion is defined for the execution of the MOEAs on the islands. When the local stopping criterion is reached, the configuration is scored using a performance indicator, and the coordinator applies the hyper-heuristic to select the configuration that will continue executing on the idle island. If the configuration has changed, the subpopulation is replaced by a random subset of the current global solution set.

Wang and Li [170] proposed the Multi-Strategy Ensemble Dynamic Multi-Objective Evolutionary Algorithm (MS-MOEA), an offline selection hyper-heuristic that adopts AMALGAM's fundamental principle of combining different variation operators. This approach works as \(\epsilon \)-MOEA [39], with an external archive pruned to a size limit using the hypervolume indicator. The heuristics for generating new individuals consist of two re-initialization techniques (based on random sampling and on a Gaussian distribution centered around the previous optimal solutions); the genetic operators SBX (Simulated Binary Crossover) and polynomial-based mutation; the Differential Evolution strategies DE/rand/1 and DE/current-to-best/1; and Gaussian mutation. The approach was designed to solve dynamic multiobjective optimization problems (i.e., problems in which either the Pareto optimal set or the Pareto front changes over time). If an environmental change occurs, one of the re-initialization techniques is applied with a certain probability. The genetic operators are used at the early stages of the evolutionary process; once convergence is detected, Differential Evolution is adopted to enhance diversity. Under this scheme, each strategy creates an offspring, and after a fixed number of solutions has been created, Gaussian mutation is launched to escape from local optima. It is worth noticing that convergence is detected when the external archive has remained full for a certain number of generations.

McClymont et al. [112] proposed the Markov chain hyper-heuristic (MCHH), a selection hyper-heuristic with online learning that works as a (\(\mu + \lambda \))-evolution strategy with an unbounded external archive. The pool of heuristics is composed of four variation operators (mutation, replication, transposition and cloning), all of which operate on the decision variables of a given solution. The heuristic selection mechanism uses a Markov chain, which can be seen as a directed graph in which every vertex is connected to every other vertex and to itself. A vertex represents a state (a heuristic), and the weight of an outgoing edge represents the probability of moving from the current state to the destination state; the weights of the outgoing edges of each state must sum to one. The Markov chain stochastically selects the next heuristic, biased by these probabilities. The selected heuristic is employed for a certain number of generations, after which its performance is measured by counting the number of parents dominated by the offspring it produced. This performance is used to update the corresponding probability in the Markov chain via reinforcement learning.
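
The selection core of such a Markov-chain hyper-heuristic can be sketched as follows (our illustration; the reinforcement update shown is a simple proportional rule, not necessarily the exact scheme used in MCHH):

```python
import numpy as np

class MarkovChainSelector:
    """MCHH-style heuristic selection sketch: a row-stochastic transition
    matrix over the heuristic pool; the next heuristic is drawn from the
    row of the current one, and good transitions are reinforced."""

    def __init__(self, n_heuristics, rng):
        self.P = np.full((n_heuristics, n_heuristics), 1.0 / n_heuristics)
        self.current = 0
        self.rng = rng

    def next_heuristic(self):
        prev = self.current
        self.current = int(self.rng.choice(len(self.P), p=self.P[prev]))
        return prev, self.current

    def reward(self, prev, chosen, r, lr=0.1):
        """Reinforce the prev -> chosen transition by reward r (e.g., the
        number of parents dominated by the offspring) and re-normalize
        the row so it remains a probability distribution."""
        self.P[prev, chosen] += lr * r
        self.P[prev] /= self.P[prev].sum()

# pool of four heuristics, e.g. mutation, replication, transposition, cloning
sel = MarkovChainSelector(4, np.random.default_rng(3))
prev, h = sel.next_heuristic()
sel.reward(prev, h, r=2)  # the offspring dominated two parents
```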

Maashi [104] proposed an online-learning, choice-function-based selection hyper-heuristic framework for multiobjective optimization. The proposed approach controls and combines the strengths of three well-known MOEAs (NSGA-II, SPEA2 and MOGA), which are adopted as its low-level heuristics.

Gonçalves et al. [60] proposed the MOEA/D Hyper-Heuristic (MOEA/D-HH), an online selection hyper-heuristic coupled to a MOEA/D variant [178]. In this approach, an adaptive choice function determines the Differential Evolution (DE) strategy applied to generate new individuals at each iteration.

Walker and Keedwell [166] proposed the indicator-based multiobjective sequence-based hyper-heuristic (MOSSHH) algorithm, which seems to be the first attempt to use a hyper-heuristic on many-objective problems. This online selection hyper-heuristic is based on a hidden Markov model that determines the mutation strategy applied to generate a single child from the current parent. Thus, the approach works as a (\(1+1\))-evolution strategy complemented with an external archive that keeps all the nondominated solutions discovered so far. The pool of seven mutation heuristics consists primarily of (1) adding noise to the current solution using three different continuous probability distributions, and (2) replacing the parent (or a single variable) with another one, either randomly created or taken from the archive. At each iteration, the child replaces the parent if the former dominates the latter. In a later paper [167], however, this comparison rule was replaced with strategies based on the hypervolume indicator [182], the favour relation [44] and the average rank [10]. Moreover, the hidden Markov model is updated if the child is added to the archive and if it was better than the parent.

More recently, Hernández Gómez and Coello [73] proposed a hyper-heuristic that combines the strengths, and compensates for the weaknesses, of different scalarizing functions. The selection is conducted through an indicator called the s-energy [69], which measures the even distribution of a set of points on k-dimensional manifolds.

Challenges

After reviewing some of the most relevant research on the topics selected in “Open research areas”, we now present some of the challenges that lie ahead in each of them.

  • Algorithmic design: Although there has been some debate regarding which paradigm will become dominant in the next few years, it seems quite obvious at this point that decomposition-based MOEAs are the clear winner so far. For indicator-based MOEAs to become more popular, other performance indicators need to be proposed. Although there has been some research activity in this regard (see for example [113, 140]), none of these other performance indicators has become as popular as the hypervolume. Another interesting idea is the combination of performance indicators in order to take advantage of their strengths and compensate for their limitations (see for example [48]).

    Nevertheless, a more relevant question in this area is the following: can we design MOEAs in a different way? This is a very important question, since algorithmic development has been at the heart of research on EMO, and new algorithmic proposals will be needed in the next few years to keep this research area alive. Evidently, it is not trivial to produce a selection mechanism that is not Pareto-based, decomposition-based or indicator-based, but it is indeed possible. For example, Molinet Berenguer and Coello [11, 118] proposed an approach that transforms an MOP into a linear assignment problem using a set of uniformly scattered weight vectors. Uniform design is adopted to obtain the set of weights, and the Kuhn–Munkres (Hungarian) algorithm [87] is used to solve the resulting assignment problem (a sketch of this idea is given at the end of this section). This approach was found to perform quite well (and at a low computational cost) even on many-objective optimization problems. As such, it does not belong to any of the three algorithmic families previously discussed and constitutes an intriguing new family of MOEAs.

    So, it should be clear that a challenge is to develop selection mechanisms for MOEAs that are different from those that have been developed so far. Additionally, popularizing the use of such MOEAs is certainly another (perhaps more difficult) challenge.

    Another challenge in this area is to gain a deeper understanding of the limitations of current MOEAs. For example, knowing that some scalarizing functions offer advantages over others is very useful to design good decomposition-based and even indicator-based MOEAs (see for example [127]).

    Another interesting idea is to combine components of MOEAs under a single framework that exploits their advantages. This is the basic idea of Borg [66], which adopts \(\epsilon \)-dominance, a measure of convergence speed called \(\epsilon \) progress, an adaptive population size, multiple recombination operators and a steady-state selection mechanism.

    A relevant question in this regard is whether this sort of scheme could lead us to the automated design of MOEAs, as has been suggested by researchers working on automated parameter tuning for single-objective evolutionary algorithms [76].

  • Scalability: This area presents several challenges. For example, what sort of performance indicator should we use to assess diversity in many-objective problems? For low-dimensional Pareto fronts, the aim is normally to achieve a uniform distribution of solutions along the front. But what would a desirable distribution be in an MOP having, say, ten objectives, if we are using only 300 solutions to sample it? In recent years, performance indicators such as the Riesz s-energy [69] have shown promise in this regard, but more research in this area is required.

    In contrast with the significant interest that researchers have shown in many-objective optimization in recent years, scalability in decision variable space (i.e., the solution of so-called large-scale problems) has only recently been studied in the context of multiobjective optimization (see for example [103, 116, 181]). This is remarkable if we consider that large-scale multiobjective optimization problems (i.e., problems having more than 100 decision variables) are not rare in real-world applications (see for example [101]). Here, the use of cooperative coevolutionary approaches (which have been very successful in single-objective large-scale optimization) is the most common research trend. However, new test suites are required for large-scale multiobjective optimization, and some work has already been done in this direction (see for example [25]).

    Another challenge in this area is the solution of large-scale many-objective problems, a very recent research topic on which some work has already been published (see for example [23, 180]).

  • Dealing with expensive objective functions: Other approaches for dealing with expensive objective functions are also possible. For example, some researchers have adopted cultural algorithms [30, 34, 133, 136], which gather knowledge during the evolutionary process and use it to perform a more efficient search, at the expense of significantly larger memory usage. Cultural algorithms were proposed by Reynolds [137, 138] as an approach that tries to add domain knowledge to an evolutionary algorithm during the search process, avoiding the need to add it a priori. In addition to the population space commonly adopted in evolutionary algorithms, this approach uses a belief space, which encodes the knowledge obtained from the search points and their evaluation in order to influence the evolutionary operators that guide the search. The belief space is commonly designed based on the group of problems to be solved. At each generation, the cultural algorithm selects some exemplar individuals from the population in order to extract information from them that can be useful during the search. Such information is used to update the belief space, which in turn influences the operators of the evolutionary algorithm, transforming them into informed operators that can enhance the search process. Cultural algorithms can be an effective means of saving objective function evaluations, but since a map of decision variable space must be kept at all times, their cost soon becomes prohibitive even for problems of moderate dimensionality (in decision variable space).

  • Hyper-heuristics: We certainly need theoretical studies on the use of hyper-heuristics in multiobjective optimization, and some work in this direction has already been done. Qian et al. [132] provided a theoretical study of the effectiveness of selection hyper-heuristics for multiobjective optimization, concluding that applying selection hyper-heuristics to any of the three major components of a MOEA (selection, mutation and acceptance) can exponentially speed up the optimization process.

    Another interesting idea is to combine different performance indicators within an indicator-based MOEA as proposed by Falcón-Cardona and Coello [48]. In this case, IGD\(^+\), \(\epsilon +\), \(\varDelta _p\) and R2 are adopted as possible density estimators (i.e., the low-level heuristics).

    One more interesting area of research would be the use of genetic programming to generate components of MOEAs (e.g., evolutionary operators or even scalarizing functions) that can improve their performance when adopted within a multiobjective hyper-heuristic.
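
Returning to the algorithmic-design challenge above, the assignment-based selection idea of Molinet Berenguer and Coello can be sketched as follows (our illustration: we use a weighted Tchebycheff cost and SciPy's Hungarian-algorithm solver; the original work defines its own cost formulation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_selection(F, W, z_star):
    """Selection via a linear assignment problem: each of the |W| weight
    vectors must be served by exactly one of the candidate solutions in F,
    the cost of pairing solution j with weight w_i being (in this sketch)
    the weighted Tchebycheff value. Returns the selected solution indices."""
    cost = np.max(W[:, None, :] * np.abs(F[None, :, :] - z_star), axis=2)
    rows, cols = linear_sum_assignment(cost)  # Kuhn-Munkres, O(N^3)
    return cols

rng = np.random.default_rng(4)
F = rng.random((6, 2))                              # 6 candidate solutions
W = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])  # 3 weight vectors
print(assignment_selection(F, W, F.min(axis=0)))    # 3 distinct indices
```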

Conclusions

As we have seen in this paper, EMO still has plenty of topics to be explored, although it is important to emphasize that some of them require us to move outside the mainstream of research currently being conducted in this area. Besides the topics previously indicated, there are several more that have already been explored but are worth revisiting. For example, we need new performance indicators, particularly for many-objective optimization: we currently have very few performance indicators for assessing diversity in many-objective optimization (see for example [99, 168]), and there are other interesting choices that are also worth exploring (see, for example, the s-energy indicator [69]).

It is also important to design new mechanisms (e.g., operators, encodings) for MOEAs based on specific features of real-world problems (e.g., variable-length encodings, expensive objective functions, uncertainty); see for example [98]. The way in which coevolutionary approaches can help us solve complex multiobjective optimization problems is another interesting avenue for future research. Besides large-scale problems, coevolution can help in other domains (e.g., dynamic multiobjective optimization problems [56]), but its potential has been scarcely studied in this area (see [117]).

However, it is important to keep in mind that a great source of diversity regarding research ideas is the knowledge coming from other disciplines. For example, EMO has adopted advanced data structures (e.g., red–black trees [51]), concepts from computational geometry (e.g., convex hulls [26, 109], quadtrees [120] and Voronoi maps [123]), and from economics (e.g., game theory [61]) to design novel MOEAs and operators.

We also need to explore more ways of bridging the gap between Operations Research and EMO. An example is the development of hybrid approaches that combine a MOEA with a mathematical programming technique (see for example [94]). Another one is the use of the Karush–Kuhn–Tucker optimality conditions to estimate proximity of a solution to the Pareto optimal set (see [1]).

Summarizing, we claim that EMO is still a very promising research area that should remain active for several more years. However, we need to increase the diversity of our research topics and to be more disruptive. If we only do work by analogy, we will suffer stagnation!