1. Introduction
Current measurement in aerial power lines, in winding packs for high-field magnets, or in plasmas for industrial applications cannot be achieved easily using standard sensors, due to poor accessibility of the conductors (e.g., for aerial lines), to demanding technical issues (e.g., in high-field magnet supplies), or to harsh environments (e.g., in high-temperature plasmas). As a matter of fact, in the proposed examples, not only the total current amplitude but frequently also the current distribution inside the support region (the different conductors in aerial lines and high-field magnets, or the plasma column itself) is required. In such cases, the concept of measurement must be understood in a broader sense, and suitable current distribution sensors should be introduced as a combination of magnetic measurements and suitable mathematical treatment to cope with the inverse problem of reconstructing current data from magnetic field sensors. The general purpose of the paper is twofold: on the one hand, to provide an overview of effective methods for inverting data from field sensors in order to identify the current distribution and, on the other hand, to test them in a comparative way against a well-known benchmark problem.
The need for these methods is believed to be important because the underlying inverse problem is ill-posed, leading to spurious solutions. The ultimate goal is to pave the way for a virtual sensor system, i.e., a numerical procedure that could support both the current reconstruction, given a set of measurements, and the current source synthesis, given a set of specifications on the field distribution in a region of interest.
Recently, many research areas have taken advantage of the potential offered by behavioral models based on machine learning (ML) or (deep) neural networks (DNN) [1,2]. In fact, recent works on the DNN-assisted analysis of electromagnetic (EM) field computation problems showed the promising potential of convolutional neural networks (CNN) and ML tools [3,4,5,6,7,8,9,10,11]. A comprehensive review of recent works on ML for the design optimization of electromagnetic devices can be found in [4], where the growing interest of the community is clearly evidenced. Some works adopted ML or DNN models to predict the key performance indicators of electrical machines [5,6,7], whilst others focused on topology optimization [8,9,10,11].
The main appeal of ML or DNN in dealing with inverse problems is their capability of achieving efficient solutions from experiential knowledge rather than mathematical formulations. On the other hand, such models do not always grant accuracy. A combined use with more classical approaches can be pursued to improve overall performance.
Note that the data used to train ML or DNN models are inherently bidirectional: the roles of inputs and outputs can be, up to a certain level, interchanged, so that the model can be trained to directly identify materials, geometries, or sources from measurements of electromagnetic fields. This approach would allow the resolution of inverse problems in much shorter times than with classical methods, especially those based on iterative schemes.
To ease reading, it is fruitful to provide here a definition of inverse problems in terms of the reconstruction of system characteristics, e.g., its inner structure, from observed or desired data. These problems appear in various applications, such as medical imaging with X-rays [12] or other electromagnetic sources [13]. Image processing is the best-known application of behavioral approaches to inverse problems. To cite just a few examples, classical DNNs are compared in [14] with classical sparse reconstruction algorithms, while several CNNs are presented in [15] for medical applications of magnetic resonance imaging. Other possible approaches include recurrent neural networks (where node-connecting weights form a directed graph) and generative adversarial networks (two networks competing in a sort of game [13], each pursuing a different objective in the data processing, thereby regularizing the overall behavior).
In [16], multilayer perceptron (MLP) autoencoders are added to the previously listed approaches. Quite notably, early attempts to solve inverse problems using fully connected neural networks (FCNNs) are reported as early as 1992 [17]. Finally, [18] presents a taxonomy of inverse problems depending on the type of supervision and on the knowledge of the corresponding direct problem.
Although numerous works introduce ML approaches and NNs to solve direct problems in electromagnetism, contributions addressing the inverse case are still rare, yet steadily increasing. Since the inverse problems we are dealing with are classically formulated as the minimization of a reconstruction error, a regularization of the problem (as raw observed data are frequently compatible with multiple solutions) is needed. Usually, iterative processes are used to reach the minimum error. While a DNN, when properly trained, can provide a solution in a single step, much care must be given to the way behavioral approaches regularize the problem. As a matter of fact, the DNN proposes the solution most closely corresponding to the observed data among those considered in the training step. Consequently, DNNs do provide an inherent regularization, ruled by the construction of the learning set and by the teaching algorithm: in the authors' viewpoint, this point needs further investigation.
In this paper, we first identify the characteristics of various Electromagnetic Inverse Problems (EIPs) usually found in practical cases. Then, we investigate different possible ML and NN approaches to the resolution of EIPs, with particular reference to the bi-directionality of the approach, i.e., to the possibility of training the model by exchanging the roles of input and output of the direct problem, obtaining a straightforward resolution of the inverse problem.
In the first section, we provide a synthetic description of the direct problem we use in the paper and introduce the relevant inverse problem and its mathematical characteristics. The considered EIP is a current synthesis problem: examples of this class are the reconstruction of current distributions from external magnetic measurements, as cited above, or the optimal choice of currents to generate a given field distribution. In the following section, we briefly discuss the classical regularization methods used to allow the resolution of EIPs. Then, we present a short review of available behavioral approaches, together with the numerical techniques used to improve their performance. Finally, we test the proposed schemes on the benchmark problem, which, albeit simple in its scheme, shows all the problems usually faced in more complex cases. To the best of our knowledge, this is the first attempt to assess the inherent regularization capabilities of ML and NN approaches to EIPs and to compare the characteristics of such models with the more classical regularization strategies usually adopted in the resolution of EIPs.
2. Materials and Methods
2.1. Direct and Inverse Electromagnetic (Source) Problems
As described above, data-based models require massive amounts of data in the training step. In the class of problems considered here, such data are related to the measurements of magnetic fields and their sources. To focus on the background theoretical aspects of this problem, we preferred in this paper to use numerically simulated data. In particular, we adopted as a simple example the computation of the magnetic field $H$ in free space generated by a set of currents $J$ flowing in conductors with known geometry $\Omega_s$ (Figure 1). Without any pretense of generality, we provide in this section a synthetic description of the mathematical formulation we adopt in the paper for the computation of the magnetic field.
In a homogeneous domain, the equation governing the link between the current distribution and the magnetic field is given by the Biot–Savart integral:

$$H(r_f) = \frac{1}{4\pi} \int_{\Omega_s} \frac{J(r_s) \times (r_f - r_s)}{\left| r_f - r_s \right|^3} \, d\Omega_s \qquad (1)$$

where $H$ is the magnetic field, $r_f$ is the position vector of the field points (the sensors), $r_s \in \Omega_s$ is the source point considered in the integration process, $J$ is the source current density, assumed known in the direct problem, and $\Omega_s$ is the source region. We assume that the field is to be computed externally to $\Omega_s$, to avoid convergence issues. When $J$ is assigned and $H$ is unknown, a direct problem arises.
Specifically, to formulate the direct problem in a concise yet explicative form, we can write:

$$\underline{H} = \mathcal{A}\left( J; \underline{r}_f, \Omega_s \right) \qquad (2)$$

where $\mathcal{A}$ is usually an integral operator, such as (1). In (2), $J$ represents the input data, while $\underline{H}$ represents the output. The dependence on the current density map $J$, the field point(s) $\underline{r}_f$ (the underbar sign "_" indicating an array of points), and the source volume $\Omega_s$ has been highlighted. The formulation (2) is general enough to allow the presence of magnetic or conducting material regions, indicated by $\Omega_{mat}$ and $\Omega_{cnd}$, respectively, but neglected in this analysis for the sake of simplicity; in addition, we will assume that all materials behave linearly with respect to the current–field relationships. The current material support is known and fixed, and the current is constant, thus allowing a magnetostatic formulation.
Let us now flip our point of view and attempt to formulate the problem of looking for an unknown current $J$ from a given field map $\underline{H}$ outside the source regions, which is known as the inverse (source) problem. In this case, we introduce the inverse operator:

$$J = \mathcal{A}^{-1}\left( \underline{H}; \underline{r}_f, \Omega_s \right) \qquad (3)$$
Equation (3) describes a Fredholm (inhomogeneous) equation of the first kind, generally expressed as:

$$g(x) = \int_{\Omega} K(x, y)\, f(y) \, dy \qquad (4)$$

where $g$ represents the data of the problem, $f$ is the unknown, and $K$ is called the kernel of the equation. The possibility of solving the inverse source problem depends on $K$, on the data space (from now on, named $Y$), and on the solutions space (named $X$ from now on). Let us recall here that the inverse operator $\mathcal{A}^{-1}$ exists if and only if $\mathcal{A}$ is bijective, that is:

$$\mathcal{A}(f_1) = \mathcal{A}(f_2) \implies f_1 = f_2 \quad \forall f_1, f_2 \in X$$

and

$$\forall g \in Y, \ \exists f \in X : \mathcal{A}(f) = g$$

The set $\mathcal{A}(X)$ is known as the range of the operator. Unfortunately, the plain existence of an operator $\mathcal{A}^{-1}$ is not enough for $\mathcal{A}$ to be invertible in a stable way, since the solution may not depend continuously on the data. It is possible to state that a linear operator is invertible if it has a bounded inverse.
We will use the additional statement that a compact linear operator $\mathcal{A}$ admits a bounded inverse $\mathcal{A}^{-1}$ if its range, $\mathcal{A}(X)$, has a finite dimension. Thus, as a conclusion, we can state that, to have a stable solution, we need $\mathcal{A}^{-1}$ to be a linear, compact operator, and its range ($\mathcal{A}^{-1}(Y)$) to be of finite dimension. Uniqueness also requires $X$ to be of finite dimension.
Our operator $\mathcal{A}$ is an integral operator, defined by a kernel $K$, which, in the case we are considering, is the fundamental solution of the magnetostatic problem as described in (1). This implies:
- (a) $\mathcal{A}$ is a linear operator;
- (b) the kernel function is well-behaved (smooth, continuous, etc.), quadratically bounded, and grants compactness to $\mathcal{A}$.
So, according to what was stated before, we only need to constrain the data and solution spaces to have a finite dimension.
We usually have a discrete set of measurements, and we need to elaborate them numerically. In order to distinguish the theoretical field $H$ generated by $J$ from the one actually measured, which is affected by uncertainties and noise and is generally known in a (discrete) subset of points, we will indicate the array of available field measurements as $M_H$. In addition, from a practical point of view, the current distribution can usually be represented by a set of parameters, the most straightforward being the current amplitudes in the various conductors, although an alternative view is the set of coefficients of the current density map in some representation basis. In any case, we will use the symbol $I$ to indicate the array of solution parameters. Under the assumed linearity hypothesis, the discrete nature of both sources and measurements allows one to postulate the existence of a matrix transforming the former into the latter. This matrix is usually called the lead field matrix, and we will indicate it by the symbol $\mathbf{A}$. In the example problem we are considering, the elements $a_{mn}$ of $\mathbf{A}$ can be computed by evaluating the Biot–Savart integral (1) on the $n$-th conductor $\Omega_n$ in the $m$-th field point $r_m$.
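As a minimal illustration (ours, not from the original study), the following Python sketch assembles a lead field matrix for a simplified 2D setting in which the conductors are modeled as infinitely long straight wires perpendicular to the measurement plane; all positions and current values are hypothetical.

    import numpy as np

    # Sketch (our illustration): lead field matrix for N_c infinite straight
    # conductors perpendicular to the x-y plane, seen by N_m sensors that
    # measure (Hx, Hy). A unit current in an infinite wire at distance d
    # produces |H| = 1 / (2*pi*d), directed azimuthally.
    def lead_field(conductors, sensors):
        """conductors: (N_c, 2) wire positions; sensors: (N_m, 2) positions.
        Returns A with shape (2*N_m, N_c): Hx and Hy rows for each sensor."""
        A = np.zeros((2 * len(sensors), len(conductors)))
        for n, (xc, yc) in enumerate(conductors):
            dx = sensors[:, 0] - xc
            dy = sensors[:, 1] - yc
            d2 = dx**2 + dy**2
            # Azimuthal field of a unit current: H = (1/(2*pi)) * (-dy, dx)/d^2
            A[0::2, n] = -dy / (2 * np.pi * d2)
            A[1::2, n] = dx / (2 * np.pi * d2)
        return A

    # Example: 4 conductors on a small square, 8 sensors on a surrounding circle
    conductors = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
    angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
    sensors = np.column_stack([np.cos(angles), np.sin(angles)])
    A = lead_field(conductors, sensors)
    MH = A @ np.array([100.0, -50.0, 75.0, -125.0])  # direct problem: H from I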
Having achieved a discrete finite-dimensional data space Y and solutions space X, we have demonstrated that the problem admits a unique solution, but this does not yet grant well-posedness, since the solution may not depend smoothly on the data. If this is the case (as it usually is), we must be aware of the impact of noise and approximation and select the best discrete approach. We must, in any case, keep in mind that a problem in the resolution process is usually not a consequence of a lack of data, but rather of the nature of the operator or of a wrong choice of data and solution spaces.
2.2. Regularization Methods: A Review
The correct approach to obtain solution uniqueness is to adopt regularization methods. In this section, we present a comparative review of some among the best-known regularization methods and their application to the proposed current identification problem. We consider the following schemes:
Classical (direct) methods: Tikhonov method, Truncated SVD, and ν-Method;
Statistical methods: Linear Regression, linear fit with Principal Component Analysis, and Elastic Net Regularization.
The classical and statistical methods considered here are based on the properties of the lead field matrix $\mathbf{A}$, computed using the Finite Element Method (FEM). Both classes of methods apply as well in the case of a lead field matrix recovered from a purposely designed set of measurements.
2.2.1. Direct Methods
The classical linear inverse problem $I = \mathbf{A}^{-1} M_H$ (where $\mathbf{A}^{-1}$ must be understood as the Moore–Penrose pseudo-inverse) has been tackled using many different approaches for its regularization. A non-exhaustive list may include the Tikhonov approach (TA, [19]), the Truncated Singular Value Decomposition (T-SVD, [19]), and the Discrepancy Principle (DP, [20]). A new group of methods, collectively known as iteration-based, has started to be considered more recently. Examples are the ν-Method (νM, [21]) and the ART method [22]. A broader list of possible regularization schemes can be found in [23,24]. We just briefly describe here those that are considered in the following for the comparison with the behavioral models.
TA: The Tikhonov approach is probably the most widespread countermeasure to the ill-posed nature of inverse problems. In the notation adopted here, the solution process of the (regularized) inverse problem can be cast as:

$$I_\lambda = \arg\min_I \left\{ \left\| M_H - \mathbf{A} I \right\|_2^2 + \lambda \left\| I \right\|_2^2 \right\}$$

where $\left\| M_H - \mathbf{A} I \right\|_2$ represents the 2-norm of the residual on the measurements vector, $\left\| I \right\|_2$ represents the 2-norm of the parameters vector, and $\lambda$ is the regularization parameter. The performance of the TA depends on the parameter $\lambda$, balancing the model error and the solution norm. The L-curve approach [25], or alternatively the generalized cross-validation method [26], are the most adopted strategies to choose its value.
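A minimal numerical sketch of the TA (our illustration, on hypothetical synthetic data) shows how the regularized normal equations are solved and how a sweep over λ produces the residual and solution norms needed to trace the L-curve:

    import numpy as np

    # Sketch (ours): Tikhonov solution via regularized normal equations.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])  # ill-conditioned
    I_true = np.array([100.0, -50.0, 75.0, -125.0])
    MH = A @ I_true + 1e-3 * rng.standard_normal(16)    # noisy measurements

    def tikhonov(A, MH, lam):
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ MH)

    # L-curve data: residual norm vs. solution norm over a lambda sweep
    for lam in np.logspace(-10, 0, 6):
        I_hat = tikhonov(A, MH, lam)
        print(lam, np.linalg.norm(MH - A @ I_hat), np.linalg.norm(I_hat))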
T-SVD: The Truncated Singular Value Decomposition is based on the representation of $\mathbf{A}$ in terms of its left and right singular vectors:

$$\mathbf{A} = \sum_{i=1}^{N} s_i \, v_i u_i^T$$

where $u_i$ and $v_i$ are orthonormal vectors in the currents space and in the measurements space, respectively; $s_i$ are the singular values of $\mathbf{A}$, in descending order; and $N$ is the matrix rank. To obtain a (rank-deficient) well-conditioned matrix $\mathbf{A}_n$, it is possible to truncate the summation to an index $n < N$. The pseudo-inverse $\mathbf{A}_n^{-1}$ provides a (regularized) solution: $I_n = \mathbf{A}_n^{-1} M_H$. The smaller $n$ is, the smoother but less detailed the solution will be.
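The following sketch (again ours, on the same kind of synthetic data) illustrates how the truncated pseudo-inverse is built from the SVD factors and how the reconstruction error varies with the truncation index n:

    import numpy as np

    # Sketch (ours): truncated-SVD pseudo-inverse. Keeping only the n largest
    # singular values filters out the noise-amplifying small-s_i modes.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_true = np.array([100.0, -50.0, 75.0, -125.0])
    MH = A @ I_true + 1e-3 * rng.standard_normal(16)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s in descending order
    for n in range(1, len(s) + 1):
        # Regularized solution I_n from the n leading singular triplets
        I_n = Vt[:n].T @ ((U[:, :n].T @ MH) / s[:n])
        print(n, np.linalg.norm(I_n - I_true))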
νM: It can be shown that iterative algorithms (e.g., conjugate gradient) allow smoother components of the solution of the linear problem to converge earlier. The ν-Method leverages this property to regularize the resolution process by stopping the iterations before complete convergence. The role of a regularizing parameter in this case is played by the number of iterations.
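As a hedged illustration of this iteration-as-regularization principle, the sketch below uses a plain Landweber scheme rather than the actual ν-Method weights; the two share the key feature that the number of iterations plays the role of the regularization parameter:

    import numpy as np

    # Sketch (ours): Landweber iterations, a simple stand-in for the nu-Method.
    # Smooth solution components converge first; stopping early regularizes.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_true = np.array([100.0, -50.0, 75.0, -125.0])
    MH = A @ I_true + 1e-3 * rng.standard_normal(16)

    tau = 1.0 / np.linalg.norm(A, 2) ** 2   # step size ensuring convergence
    I_k = np.zeros(A.shape[1])
    for k in range(200):
        I_k = I_k + tau * A.T @ (MH - A @ I_k)   # gradient step on ||MH - A I||^2
        if k in (1, 10, 50, 199):
            print(k, np.linalg.norm(I_k - I_true))  # error falls, then stagnates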
2.2.2. Statistical Approaches
Statistical approaches can be used to solve inverse problems when a dataset of correlated source and measurement values is available. Taking inspiration from experimental physics, we can extract some relationship (e.g., a linear interpolation) between the outputs, in our case the magnetic fields, and the inputs, in our case the currents, fitting the model to the data. Under suitable hypotheses on the distribution of the data and on the underlying actual model, as presented in Section 2, the fitted model will be able to provide reliable estimates of the output even for unseen inputs. Note that, also in the case of fitted models, the ill-conditioned nature of the underlying problem amplifies data nuisances, and some regularizing techniques should be applied. We will briefly analyze here a few well-known interpolation approaches.
MLR: Multi-Linear Regression adopts a linear regression model from multiple input data $m_k$ ($k = 1, 2, \ldots, N_{meas}$) to multiple outputs $I_i$, expressed by:

$$I_i = \beta_{i0} + \sum_{k=1}^{N_{meas}} \beta_{ik} \, m_k + \varepsilon_i$$

where $\beta_{i0}$ and $\beta_{ik}$, $k = 1 \ldots N_{meas}$, are the interpolation coefficients, and $\varepsilon_i$ is the residual error, due, for example, to additive white Gaussian measurement noise. The currents $I_i$ are fitted independently. Least squares minimization is used to estimate the fit coefficients. Thanks to the assumptions on the noise, the coefficients also maximize the likelihood of the prediction vector.
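A minimal MLR sketch (our illustration, using scikit-learn on synthetic paired data) fits the explicit inverse map from measurements to currents:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Sketch (ours): multi-linear regression of the inverse map MH -> I on a
    # synthetic dataset of paired (currents, noisy fields).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(1000, 4))        # sampled currents
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((1000, 16))

    inv_model = LinearRegression().fit(MH_train, I_train)   # each I_i fitted independently
    I_pred = inv_model.predict(MH_train[:1])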
LPCA: Linear fit with Principal Component Analysis starts from the assumption that the information in the (required) field map is highly redundant, so any regression model should address this issue. This is easily verified from the analysis of the lead field matrix and from the correlation analysis of the field measurements. In such cases, PCA can be used to extract the most effective regressors. This helps in regularizing the problem, as PCA removes any redundancy among the input data. The elements of the orthogonal basis made of principal components can be ranked in decreasing order of variance over the dataset, and reduced models explaining any desired level of data variance can be obtained.
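A possible LPCA implementation (our sketch, with a hypothetical variance threshold) chains a PCA projection of the measurements with a linear fit:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    # Sketch (ours): regularize the regression by first projecting the highly
    # redundant field measurements onto their principal components.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(1000, 4))
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((1000, 16))

    # Keep only the components explaining 99.9% of the measurement variance
    lpca = make_pipeline(PCA(n_components=0.999), LinearRegression())
    lpca.fit(MH_train, I_train)
    print(lpca.named_steps["pca"].n_components_)  # effective regressors retained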
ENR: Elastic Net Regularization is a regularization technique minimizing the regression coefficients of the less relevant variables. For each reconstructed variable (currents, in our example), the ENR technique solves the following minimization problem to find the set of interpolation coefficients $\beta_0$, $\beta_k$, $k = 1 \ldots N_{meas}$ [27]:

$$\min_{\beta_0, \beta} \left\{ \frac{1}{2 N_{samples}} \sum_{j=1}^{N_{samples}} \left( I_j - \beta_0 - \sum_{k=1}^{N_{meas}} m_{jk} \beta_k \right)^2 + \lambda P_\alpha(\beta) \right\}$$

where

$$P_\alpha(\beta) = \sum_{k=1}^{N_{meas}} \left( \frac{1-\alpha}{2} \beta_k^2 + \alpha \left| \beta_k \right| \right),$$

$\lambda$ is a nonnegative real number, and $0 < \alpha \le 1$. Note that ENR for $\alpha = 1$ reduces to lasso regularization, while for $\alpha \to 0$, it approaches ridge regression.
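A short ENR sketch follows (ours, using scikit-learn, whose alpha and l1_ratio arguments play the roles of λ and α in the text):

    import numpy as np
    from sklearn.linear_model import ElasticNet

    # Sketch (ours): elastic-net fit of one current from the field measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(1000, 4))
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((1000, 16))

    enr = ElasticNet(alpha=0.1, l1_ratio=0.5)   # l1_ratio=1 -> lasso; ->0 -> ridge
    enr.fit(MH_train, I_train[:, 0])            # each current fitted separately
    print(np.sum(np.abs(enr.coef_) > 1e-6))     # number of retained regressors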
2.3. Machine Learning and Neural Network Models for Electromagnetic Inverse Problems
The data-driven statistical approach described in Section 2.2.2, i.e., learning a behavioral model using an available collection of paired input–output quantities, is the basic operating principle of supervised learning algorithms such as NNs and other ML algorithms. The use of ML is a natural choice when the behavior of the model is too complex to be efficiently described analytically, or when perfect knowledge of the physical parameters is lacking, which is the case in many inverse and direct problems involved in electromagnetic applications.
The success of NNs and other algorithms, such as support vector machines, is due mainly to two factors: they are universal approximators, and their generalization and regularization capabilities can be controlled in several different ways. For example, regularization can be improved by diminishing the number of neurons in the hidden layer, by early stopping of the training (which is equivalent to the νM or ART regularization techniques), by using a regularization term in the loss function that penalizes large neural weights (which is in a sense similar to the TA), or by the so-called dropout method, which randomly removes a certain number of neural connections.
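The sketch below (our illustration) shows how three of these controls map onto the options of scikit-learn's MLPRegressor; dropout is not available there and would require a deep-learning framework such as PyTorch or Keras:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Sketch (ours): regularization knobs of a small neural regressor.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(2000, 4))
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((2000, 16))

    nn = MLPRegressor(
        hidden_layer_sizes=(8,),   # fewer hidden neurons -> smoother model
        alpha=1e-3,                # L2 weight penalty, akin to the Tikhonov term
        early_stopping=True,       # validation-based stopping, akin to the nu-Method
        validation_fraction=0.15,
        max_iter=2000,
    ).fit(MH_train, I_train)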
A further relevant consideration concerns the dimensionality of the input and output vectors. In fact, when the number of outputs exceeds the number of inputs, we are asking the model to generate redundant information possibly not present in the input itself, and this usually leads to poor performance of the training algorithms. When using classical approaches, standard countermeasures include the adoption of a regularization technique. In the case of ML or NN, other possibilities are available. As a matter of fact, it would be preferable to apply a dimensionality reduction technique to the output data before training the model (either ML or NN) or to add some a priori information, as in the case of Physics-Informed Networks [28].
On the other hand, when the number of inputs is greater than the number of outputs, NNs perform quite satisfactorily. However, if the number of inputs is very high or there are many linearly dependent inputs, the neural model can be affected by the curse of dimensionality. In this case, a dimensionality reduction of the inputs is again recommended. As a result, in many cases, it is necessary to exploit methods for reducing dimensionality, which can be linear, such as PCA, or nonlinear, such as autoencoders [1]. Some models, such as DNNs, can deal directly with high-dimensional inputs, avoiding the need to reduce the number of features. In any case, a preliminary PCA is usually very helpful and adds valuable knowledge, revealing the directions along which data points are most distributed and how much information is lost when negligible directions are cut. In addition, in many cases, PCA is strongly related to mathematical features of the input data, which can be directly linked to a physical behavior of the system.
In the remainder of the paper, the authors use a benchmark problem, described in Section 3, to test different data-driven approaches to solving the EIP. Three different EIP solution procedures are implemented and briefly described as follows:
2.3.1. EIP Using Neural Networks and Deep Learning
In this contribution, the forward operators consist of a dataset of Finite Element Models (FEMs) generating lead field matrices $\mathbf{A}$ for different choices of the geometrical quantities. Then, a PCA is applied to represent the $\mathbf{A}$ matrices in a lower-dimension feature space, so that the original matrix can be well reconstructed from a reduced set of principal components. Then, we train an NN to predict the reduced set of principal components given the geometry of the system. The corresponding Lead Field (LF) full matrix is then reconstructed from the predicted principal components. Subsequently, the EIP can be solved by means of one of the above-mentioned regularization methods. In particular, the pseudo-inverse is computed by means of the T-SVD approach. A scheme of this method is shown in Figure 2.
The proposed method, which combines a CNN with an inversion technique, is very general because it is geometry-free. In fact, the network learns the equations, so it is able to generalize to new geometries, enabling a rapid solution of the synthesis problem.
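A compact sketch of this pipeline (ours, with a synthetic stand-in for the FEM-generated lead field dataset and hypothetical geometry parameters) may clarify the sequence PCA → NN prediction → matrix reconstruction → T-SVD inversion:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor

    # Sketch (ours): learn the map from geometry parameters to the PCA
    # coefficients of the lead field matrix, then invert the reconstructed
    # matrix by T-SVD. The (16 x 4) matrices below are synthetic stand-ins.
    rng = np.random.default_rng(0)
    geoms = rng.uniform(0.5, 1.5, size=(500, 3))            # geometry parameters
    LFs = np.stack([np.outer(np.sin(g[0] * np.arange(16)),
                             g[1:3].repeat(2)) for g in geoms])
    flat = LFs.reshape(len(LFs), -1)

    pca = PCA(n_components=10).fit(flat)                    # reduced feature space
    nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000).fit(
        geoms, pca.transform(flat))

    A_hat = pca.inverse_transform(nn.predict(geoms[:1]))[0].reshape(16, 4)
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    n = 3                                                   # truncation index
    # I = A_n^{-1} MH would then follow as in the T-SVD sketch above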
With reference to this approach, a further remark can be made. It is common practice in many applications to solve an EIP by implementing an optimization procedure: the direct problem (from known sources and geometry to the measured field values) is solved iteratively, allowing the calculation of a fitness function; sources (and geometry) are updated to reach the desired value of the fitness function. At the end of the procedure, the resulting model is solicited with a set of (desired) field measurements, and a set of sources is obtained. Mathematically, we first obtain a model of the forward operator $\mathcal{A}$, and the optimization algorithm searches for the best currents $I$, minimizing an objective function of the form $\left\| M_H - \mathcal{A}(I) \right\|$.
This implicit approach may find accurate results, but it is computationally expensive; for instance, in electromagnetics, $\mathcal{A}$ is often evaluated by a numerical procedure (i.e., Finite Element Method, Boundary Element Method, Integral Methods, etc.). The optimization phase is not relevant for the present comparative study, and it will be addressed in future works. However, ML can also play a fundamental role in this case; in particular, the ML-based surrogate model of $\mathcal{A}$ that can be obtained with the NN-LF approach can be used to solve the direct problem at each iteration, resulting in a dramatic reduction of the overall computational time.
Alternatively, without an explicit use of the lead field matrix, a direct estimation of the inverse operator $\mathcal{A}^{-1}$ is obtained using different ML paradigms. In particular, one implementation of $\mathcal{A}^{-1}$ is obtained by training shallow neural networks with sigmoidal activation functions, using different learning approaches. A second ML approach is considered, in which a deep neural network, i.e., one composed of multiple layers, is trained and tested.
Learning the inverse operator by training an NN allows us to exploit the different and powerful regularization approaches usually adopted in NN training, such as early stopping with a validation set or Bayesian regularization. Moreover, since the shallow fully connected sigmoidal NN is a universal approximator, it is likely to correctly learn and represent the inverse model from the training data. A deep, multi-layer neural network is an alternative approach that is often heuristically found to outperform the shallow one, also allowing us to use a combination of linear and nonlinear layers in order to take into account previous knowledge of the model that generated the dataset.
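As an illustration (ours, on synthetic data), both a shallow sigmoidal network and a deeper multi-layer alternative can be trained to approximate the inverse operator directly:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Sketch (ours): directly learning the inverse map MH -> I with a shallow
    # sigmoidal network and a deeper alternative, both with early stopping.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(2000, 4))
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((2000, 16))

    shallow = MLPRegressor(hidden_layer_sizes=(20,), activation="logistic",
                           early_stopping=True, max_iter=3000)
    deep = MLPRegressor(hidden_layer_sizes=(32, 32, 16), activation="relu",
                        early_stopping=True, max_iter=3000)
    for model in (shallow, deep):
        model.fit(MH_train, I_train)
        print(model.score(MH_train, I_train))   # R^2 on the training set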
2.3.2. EIP by Linear Regression
This approach consists of learning an estimate of the inverse operator from the data to predict the currents $I$ from the magnetic measurements $M_H$. We denote this approach as the explicit inverse model. The main advantage of this approach is that, once the model is trained, it can perform the inversion in an extremely short time. Of course, a training dataset containing $N_{samples}$ observations is necessary. One disadvantage is that, if the geometry of the system changes, the model is no longer valid: a new training should be performed on a newly generated dataset. In particular, the use of the Standard Regression Algorithm (SRA), Robust Regression Approaches (RRA), and the Truncated Singular Value Decomposition (T-SVD) pseudo-inverse is investigated and applied to the benchmark problem.
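A minimal sketch follows (ours; the paper's exact RRA variant is not specified here, so the Huber loss is used as a representative robust choice):

    import numpy as np
    from sklearn.linear_model import LinearRegression, HuberRegressor

    # Sketch (ours): standard vs. robust regression for the explicit inverse
    # model, on synthetic data with a few outlying measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((16, 4)) @ np.diag([1.0, 0.1, 0.01, 0.001])
    I_train = rng.uniform(-150, 150, size=(1000, 4))
    MH_train = I_train @ A.T + 1e-3 * rng.standard_normal((1000, 16))
    MH_train[::50] += 0.5                            # inject a few outliers

    sra = LinearRegression().fit(MH_train, I_train)  # standard least squares
    # HuberRegressor is single-output: fit one robust model per current
    rra = [HuberRegressor().fit(MH_train, I_train[:, i]) for i in range(4)]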
5. Discussion and Conclusions
The problem faced here, although showing a simple structure to ease comprehension and reduce the computational burden, exhibits all the pitfalls of electromagnetic inverse problems. In our opinion, the difficulties in the resolution of the problem using classical approaches are intrinsic to their mathematical structures, as discussed in Section 2, and cannot be overcome by a plain, straightforward application of machine learning. This point has been demonstrated, in our opinion, by the poor performance of simple, non-regularized regression or neural approaches. On the other hand, regularization schemes are available also for the latter, so we compared regularized neural networks with similar classical schemes, showing how NNs have the capability of extracting the underlying relationships quite naturally, with minimal tailoring of the learning schemes. This is not always the case for classical approaches, a typical example being the choice of the truncation threshold required in the T-SVD approach or the choice of the regularization parameter in the Tikhonov regularization.
In our opinion, many similarities can be found between some classical regularizations and the way neural networks need to be trained to achieve satisfactory results. As an example, the νM classical approach aims to prevent the overfitting of the dataset, to use the jargon of neural network practitioners. As a second example, Bayesian learning aims to minimize the weights of the network, thus bearing some similarity to the Tikhonov regularization. Conversely, the intrinsic feature of repeated example presentation, possibly in varying order, gives the training process of neural networks an effective capability of dealing with noisy, imprecise data, which is not a characteristic of any classical algorithm, although it can be transferred to regression algorithms by dividing the dataset into smaller sets and fitting repeatedly on each of them. Note that while the handling of ill-posedness in the classical approaches has been designed specifically for the resolution of inverse problems and benefits from long-standing experience in parameter tuning, the countermeasures adopted to improve NN performance are rather general-purpose ones, and we are convinced that better results could be achieved by fine-tuning their parameters.
As a conclusion, the adoption of machine learning and, more specifically, neural networks provides new tools for the resolution of (electromagnetic) inverse problems. The underlying ill-posed nature of these problems, nevertheless, must also be dealt with when adopting data-based approaches. The main contribution of this paper, using a simple yet illustrative benchmark problem, is the attempt to compare some of the well-known classical regularization schemes with some measures adopted in the training of machine learning or neural models.
In our opinion, there are correspondences between many classical regularization approaches and the countermeasures used to allow NNs to converge. A few were highlighted in this paper, but we are convinced that many others can be found.