Abstract
This paper suggests a new limited memory trust region algorithm for large unconstrained black box least squares problems, called LMLS. Its main features are a new non-monotone technique, a new adaptive radius strategy, a new Broyden-like algorithm based on the previous good points, and a heuristic estimation of the Jacobian matrix in a subspace with random basis indices. Our numerical results show that LMLS is robust and efficient, especially in comparison with solvers using traditional limited memory and standard quasi-Newton approximations.
1 Introduction
In this paper, we consider the unconstrained nonlinear least squares problem
$$\begin{aligned} \min _{x\in {\mathbb {R}}^n}\ f(x){:}{=}\frac{1}{2}\Vert E(x)\Vert _2^2 \qquad \qquad (1) \end{aligned}$$
with high-dimensional \(x\in {\mathbb {R}}^n\) and continuously differentiable, possibly expensive \(E:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^r\) (\(r\ge n\)). We assume that no derivative information is available.
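As a minimal MATLAB illustration of this black-box setting, the snippet below defines a hypothetical residual function (the residual, dimension, and names are assumptions of this example, not taken from our test set); the solver only ever evaluates E, never its derivatives, and the scaling of f matches (1).

```matlab
% Black-box least squares setting: only evaluations of the residual E are available.
% The residual below is a hypothetical toy example with r = 2n-1 >= n components.
n  = 5;
E  = @(x) [10*(x(2:n) - x(1:n-1).^2); 1 - x(1:n-1); sum(x)];  % assumed residual
f  = @(x) 0.5 * norm(E(x))^2;                                  % f(x) = (1/2)*||E(x)||_2^2
x0 = zeros(n, 1);
fprintf('r = %d residuals, f(x0) = %g\n', numel(E(x0)), f(x0));
```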
1.1 Related work
In recent years, there has been a large body of literature on least squares problems and their applications. Here we mention only one useful book and one survey paper:
-
Ortega and Rheinboldt (2000) wrote an excellent book covering both algorithms and their analysis.
-
Yuan (2011) wrote an excellent survey covering Levenberg–Marquardt, quasi-Newton, and trust region algorithms and their local analysis without a nonsingularity assumption.
Derivative-free unconstrained nonlinear black box least squares solvers can be classified in two ways: according to how the Jacobian matrix is estimated, and according to whether they are based on line search or on trust region:
-
Quasi-Newton approximation. Sorber et al. (2012) introduced MINLBFGS (a limited memory BFGS algorithm) and MINLBFGSDL (a trust region algorithm using a dogleg algorithm and limited memory BFGS approximation).
-
Finite difference approximation. There are many trust region methods using the finite difference method for the Jacobian matrix estimation such as CoDoSol and STRSCNE by Bellavia et al. (2004, 2012), NMPNTR by Kimiaei (2016), NATRN and NATRLS by Amini et al. (2016, 2016), LSQNONLIN from the MATLAB Toolbox, NLSQERR (an adaptive trust region strategy) by Deuflhard (2011), and DOGLEG by Nielsen (2012). They are suitable for small- and medium-scale problems. Line search methods using the finite difference approximation are NLEQ (a damped affine invariant Newton method) by Nowak and Weimann (1990) and MINFNCG (a family of nonlinear conjugate gradient methods) by Sorber et al. (2012).
FMINUNC from the MATLAB Optimization Toolbox is an efficient solver for small- and medium-scale problems. It uses the finite difference method to estimate the gradient vector and the standard quasi-Newton method to estimate the Hessian matrix. In fact, FMINUNC disregards the least squares structure and only has access to function values. Nevertheless, it will be shown that FMINUNC is more efficient than LSQNONLIN, which uses the least squares structure.
To solve the least squares problem (1), trust region methods use linear approximations of the residual vectors to build surrogate quadratic models whose accuracy is increased by restricting their feasible points. These methods use a computational measure to decide whether the agreement between the actual reduction of the objective function and the predicted reduction of the surrogate quadratic model is good or not. If this agreement is good, the iteration is called successful and the trust region radius is expanded; otherwise, the iteration is called unsuccessful and the trust region radius is reduced; for more details see Conn et al. (2000) and Nocedal and Wright (1999).
The efficiency of trust region methods depends on how the trust region radius is updated (see, e.g., Ahookhosh et al. 2013; Amini et al. 2016; Esmaeili and Kimiaei 2014a, 2014b, 2015; Fan 2006; Fan and Pan 2009, 2010; Kimiaei 2017; Yu and Pu 2008) and whether non-monotone techniques are applied (see, e.g., Ahookhosh and Amini 2011; Ahookhosh et al. 2013, 2015; Amini et al. 2016; Deng et al. 1993; Grippo et al. 1986; Grippo and Sciandrone 2007; Kimiaei 2016, 2017; Yu and Pu 2008). Rounding errors may lead to two problems:
-
(i)
The model function may not decrease numerically for some iterations. In this case, if there is no decrease in the function value for such iterations, the trust region radius may be expanded several times unnecessarily.
-
(ii)
The model function may decrease numerically while the objective function does not, typically when the iterates are near a valley that is deep with a small creek at the bottom and steep sides. In this case, the trust region radius may be reduced many times, leading to a very small radius or even to failure.
Non-monotone techniques can be used in the hope of overcoming the second problem.
1.2 Overview of the new method
We suggest in Sect. 2 a new trust region-based limited memory algorithm for unconstrained black box least squares problems, called LMLS. This algorithm uses
-
a non-monotone ratio and an adaptive radius formula to quickly reach the minimizer when the valley is narrow;
-
a Broyden-like algorithm to get a decrease in the function value when the trust region radius is very small and the iteration is unsuccessful;
-
a finite difference approximation in a subspace with random basis indices to estimate the Jacobian matrix;
-
either a Gauss–Newton or a dogleg algorithm in a subspace with random basis indices to solve the trust region subproblems.
Numerical results for small- to large-scale problems are given in Sect. 3, showing that the new method is suitable for large-scale problems and is more robust and efficient than solvers using limited memory and standard quasi-Newton approximations.
2 The trust region method
In this section, we construct an improved trust region algorithm for handling problems in high dimensions:
-
In Sect. 2.1 a Gauss-Newton direction in a subspace with random basis indices is introduced.
-
In Sect. 2.2 a non-monotone term and an adaptive technique are constructed to quickly reach the minimizer in the presence of a narrow valley.
-
In Sect. 2.3 a dogleg algorithm in a subspace with random basis indices is discussed.
-
In Sect. 2.4 a Broyden-like technique is suggested based on the old best points.
-
In Sect. 2.5 our algorithm using new enhancements is introduced.
We write J(x) for the Jacobian matrix of the residual vector E at x. Then the gradient vector is \( g(x){:}{=}\nabla f(x){:}{=}J(x)^TE(x) \) and the Hessian matrix is
$$\begin{aligned} G(x){:}{=}\nabla ^2f(x)=J(x)^TJ(x)+\sum _{i=1}^{r}E_i(x)\nabla ^2E_i(x). \end{aligned}$$
If the residual vector E(x) is small, the second term in G(x) is small. Hence, we approximate G(x) by the Gauss–Newton Hessian matrix \(J(x)^TJ(x)\). We define the quadratic surrogate objective function
$$\begin{aligned} {\mathcal {Q}}(p){:}{=}f+g^Tp+\frac{1}{2}p^TJ^TJp, \end{aligned}$$
where \(f{:}{=}f(x)\), \(E{:}{=}E(x)\), \(J{:}{=}J(x)\), \(g{:}{=}g(x){:}{=}J^TE\). We denote by \(A_{:k}\) the kth column of a matrix A.
A trust region method finds a minimizer of the constrained problem
$$\begin{aligned} \min _{p}\ {\mathcal {Q}}(p) \quad \hbox {s.t. } \Vert p\Vert \le \varDelta , \qquad \qquad (3) \end{aligned}$$
whose constraint restricts feasible points by the trust region radius \(\varDelta >0\). This problem is called the trust region subproblem. Given a solution p of (3), we define the actual reduction in the objective function by
$$\begin{aligned} f(x)-f(x+p) \end{aligned}$$
and the predicted reduction in the model function by
$$\begin{aligned} {\mathcal {Q}}(0)-{\mathcal {Q}}(p). \end{aligned}$$
The agreement between the actual and predicted reduction around the current iterate x is measured by the monotone trust region ratio
$$\begin{aligned} \rho {:}{=}\frac{f(x)-f(x+p)}{{\mathcal {Q}}(0)-{\mathcal {Q}}(p)}. \end{aligned}$$
If this agreement is good according to a heuristic formula discussed in Sect. 2.2, the iteration is called successful, \(x+p\) is accepted as a new point, called a best point, and the radius is expanded; otherwise, the iteration is called unsuccessful and the radius is reduced.
2.1 A new subspace Gauss–Newton method
In this subsection, we have two goals: estimating the Jacobian matrix and constructing a Gauss-Newton direction in a subspace with random basis indices.
Let \(m_{\mathop {\mathrm{sn}}}\) be the subspace dimension. The Jacobian matrix in a subspace with random basis indices is estimated by a new subspace random finite difference method, called SRFD, using the following steps (a MATLAB sketch is given after this list):
-
(1)
At first, the initial subspace basis indices set S is a random subset of \(\{1,\ldots ,n\}\) consisting of \(m_{\mathop {\mathrm{sn}}}\) members, and its complement is \(S^c{:}{=}\{1,\ldots ,n\}\setminus S\).
-
(2)
Next, if the complement \(S^c_{\mathop {\mathrm{old}}}\) of the old subspace basis indices set is not empty, a new index set \({\mathcal {I}}\) needs to be identified before a new subspace basis indices set is determined. In this case, if \(S^c_{\mathop {\mathrm{old}}}\) consists of at least \(m_{\mathop {\mathrm{sn}}}\) members, \({\mathcal {I}}\) is a random subset of \(\{1,\ldots ,|S^c_{\mathop {\mathrm{old}}}|\}\) with \(m_{\mathop {\mathrm{sn}}}\) members; otherwise, it is a permutation of \(\{1,\ldots ,|S^c_{\mathop {\mathrm{old}}}|\}\). Then, the new subspace basis indices set is \(S{:}{=}S^c_{\mathop {\mathrm{old}}}({\mathcal {I}})\) and its complement is \(S^c{:}{=}S^c_{\mathop {\mathrm{old}}}\setminus S\). But if \(S^c_{\mathop {\mathrm{old}}}\) is empty, a new subspace basis indices set and its complement are restarted and chosen in the same way as the initial ones.
-
(3)
For any \(i\in S\),
-
the step size is computed by
$$\begin{aligned} h_i{:}{=} \left\{ \begin{array}{ll} \gamma _s\ \ &{}\hbox {if }x_i=0, \\ \gamma _s(\mathop {\mathrm{sign}}x_i)\max \Big \{|x_i|,\displaystyle \frac{\Vert x\Vert _1}{n}\Big \}\ \ &{} \hbox {otherwise,}\\ \end{array} \right. \end{aligned}$$where \(0<\gamma _s<1\) is a tiny factor and \(\mathop {\mathrm{sign}}x_i\) identifies the sign of \(x_i\), taking one of values \(-1\) (if \(x_i<0\)), 0 (if \(x_i=0\)), and 1 (if \(x_i>0\)).
-
the random approximation coordinate direction p discussed in Kimiaei (2020) is used with the difference that its ith component is updated by \(p_i=p_i+h_i\).
-
the new trial residual \(E(x+p)\) and the new column \((E(x+p)-E)/h_i\) of the Jacobian matrix are computed.
-
It is well known that standard quasi-Newton methods are more robust than limited memory quasi-Newton ones, but they cannot handle problems in high dimensions; for standard quasi-Newton methods, see (Dennis and Moré 1977; Dennis and Walker 1981; Nocedal 1992; Schnabel 1989), and for limited memory quasi-Newton methods, see (Liu and Nocedal 1989; Nazareth 1979; Nocedal 1992). On the other hand, finite difference methods are more efficient than standard quasi-Newton ones. Hence, if used in a subspace with random basis indices, they can be more efficient than limited memory quasi-Newton methods for small- up to large-scale problems.
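To make the SRFD steps above concrete, here is a minimal MATLAB sketch. It is a simplification: the random approximation coordinate direction of Kimiaei (2020) is replaced by a plain coordinate perturbation, and the function and variable names are ours, not those of the LMLS source code.

```matlab
% Simplified sketch of SRFD: estimate the Jacobian columns J(:,S) on a random
% subspace basis indices set S, reusing the complement Scold of the previous set.
function [JS, S, Sc] = SRFD_sketch(E, x, msn, gammas, Scold)
n = numel(x);  Ex = E(x);
if isempty(Scold)                       % (re)start: random msn-subset of {1,...,n}
    S  = sort(randperm(n, msn));
    Sc = setdiff(1:n, S);
else                                    % reuse not yet visited indices first
    k  = min(msn, numel(Scold));
    I  = randperm(numel(Scold), k);     % random subset (or permutation) of positions
    S  = Scold(I);
    Sc = setdiff(Scold, S);
end
JS = zeros(numel(Ex), numel(S));        % columns J(:,i), i in S
for j = 1:numel(S)
    i = S(j);
    if x(i) == 0                        % step size h_i as in the formula above
        h = gammas;
    else
        h = gammas * sign(x(i)) * max(abs(x(i)), norm(x,1)/n);
    end
    p = zeros(n,1);  p(i) = h;          % plain coordinate perturbation (simplification)
    JS(:,j) = (E(x + p) - Ex) / h;      % forward difference column
end
end
```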
Using S and \(S^c\) generated and updated by SRFD, we construct a new subspace Gauss-Newton direction by
2.2 New non-monotone and adaptive strategies
In this subsection, a new non-monotone term – stronger than the objective function f – is constructed and a new adaptive radius formula to update \(\varDelta \) is derived from it. They help LMLS in finite precision arithmetic to quickly reach the minimizer in the cases where the valley is deep with a small creek at the bottom and steep sides.
Our non-monotone term is updated not only for successful but also for unsuccessful iterations that may have happened before a successful iteration is found. This choice is based on an estimated increase in f defined below which is updated according to whether a decrease in f is found or not. It helps us to generate a somewhat strong non-monotone term when a decrease in f is not found and a somewhat weak non-monotone term otherwise. Somewhat strong non-monotone terms increase the chance of finding a point with better function value or at least a point with a little progress in the function value instead of solving trust region subproblems with high computational costs.
We denote by X a list of best points and by F the list of corresponding function values. Let \(m_{\mathop {\mathrm{rs}}}\) be the maximum number of good points saved in X. To update X and F, we use updateXF, which works as follows. If \(m_{\mathop {\mathrm{rs}}}\) is not exceeded, points with good function values are saved in X and their function values in F. Otherwise, the worst stored point and its function value are replaced by the new best point and its function value, respectively.
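A minimal MATLAB sketch of updateXF as we read the description above (the interface and names are assumptions; "replaced by the new best point" is interpreted as replacing the worst stored pair by the newly accepted point):

```matlab
% X stores up to mrs best points as columns, F their function values.
function [X, F] = updateXF_sketch(X, F, xnew, fnew, mrs)
if size(X, 2) < mrs             % memory not yet exceeded: append the new point
    X = [X, xnew];
    F = [F, fnew];
else                            % memory full: replace the worst stored point
    [~, iworst] = max(F);
    X(:, iworst) = xnew;
    F(iworst)    = fnew;
end
end
```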
Let \(\gamma _t\in (0,1)\), \(\underline{\gamma }\in (0,1)\), \(\gamma ^{{\mathrm{init}}}>0\), and \(\overline{\gamma }>1\) be the tuning parameters and let
Before a new non-monotone term is constructed, an estimated increase in f is computed by
Accordingly, the new non-monotone formula is defined by
and the new adaptive radius is constructed by
where \(\varDelta _{\mathop {\mathrm{nm}}}^0>0\) is a tuning parameter and \(\lambda ^k\) is updated according to
Here \(\lambda ^0>0\), \( 0< \sigma _1< 1 < \sigma _2 \), and \( \overline{\lambda }> \underline{\lambda }> 0 \) are the tuning parameters and the new non-monotone trust region ratio is defined by
where \(p^{k-1}\) is a solution of the following trust region subproblem in a subspace with the random basis indices set S by SRFD
with \(f^{k-1}{:}{=}f(x^{k-1})\), \(E^{k-1}{:}{=}E(x^{k-1})\), \(J^{k-1}_{:S}{:}{=}J_{:S}(x^{k-1})\), and \(g^{k-1}_S{:}{=}(J_{:S}^{k-1})^TE^{k-1}\).
2.3 A subspace dogleg algorithm
We define the Cauchy step by
The goal is to solve the trust region subproblem (14) such that
hold. After the subspace Gauss-Newton direction is computed by (7), if it is outside a trust region, a subspace dogleg algorithm, called subDogleg, is used resulting in an estimated step enforcing (16).
The model function \(\widetilde{\mathcal {Q}}\) is reduced by (15) if \(dq_{\mathop {\mathrm{sn}}} {:}{=} \widetilde{\mathcal {Q}}(0)-\widetilde{\mathcal {Q}}(p_{\mathop {\mathrm{sn}}})>0\). subDogleg first identifies whether \(dq_{\mathop {\mathrm{sn}}}>0\) or not. Then we have one of the following cases:
Case 1. If \(dq_{\mathop {\mathrm{sn}}}>0\), the scaled steepest descent step
is computed. If it is outside the trust region, an estimated solution of (3) is either the Cauchy step computed by (15) or the dogleg step
both of (17) and (18) are on the trust region boundary. Here t is found by solving the equation \(\Vert p_{\mathop {\mathrm{sd}}}+t(p_{\mathop {\mathrm{sn}}}-p_{\mathop {\mathrm{sd}}})\Vert =\varDelta _{\mathop {\mathrm{nm}}}\). If the condition \(dp{:}{=}(p_{\mathop {\mathrm{sd}}})^T(p_{\mathop {\mathrm{sn}}}-p_{\mathop {\mathrm{sd}}})\le 0\) holds, a positive root is computed by
Otherwise, t is computed by
see, e.g., Nielsen (2012).
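Since the displayed formulas (19) and (20) are not reproduced here, the following MATLAB sketch shows the standard numerically stable root selection for \(\Vert p_{\mathop {\mathrm{sd}}}+t(p_{\mathop {\mathrm{sn}}}-p_{\mathop {\mathrm{sd}}})\Vert =\varDelta _{\mathop {\mathrm{nm}}}\), consistent with the case distinction on dp described above (cf. Nielsen 2012); it is an illustration, not the paper's exact formulas.

```matlab
% Positive root t of a*t^2 + 2*dp*t + c = 0 with a = ||d||^2, d = psn - psd,
% dp = psd'*d, c = ||psd||^2 - Delta^2; the two branches avoid cancellation.
function t = dogleg_t_sketch(psd, psn, Delta)
d    = psn - psd;
a    = d' * d;
dp   = psd' * d;
c    = psd' * psd - Delta^2;            % negative whenever psd lies inside the region
disc = sqrt(dp^2 - a * c);
if dp <= 0
    t = (-dp + disc) / a;               % positive root, no cancellation
else
    t = -c / (dp + disc);               % algebraically equivalent, cancellation-free
end
end
```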
Case 2. If \(dq_{\mathop {\mathrm{sn}}}\le 0\), the model function \(\widetilde{\mathcal {Q}}\) is convex since the matrix \((J_Sg_S)^T(J_Sg_S)\) is symmetric and positive semidefinite. An estimated solution of (3) is either \(p_{\mathop {\mathrm{sd}}}\) computed by (17), if it is inside the trust region, or otherwise the Cauchy step \(p{:}{=}\varDelta _{\mathop {\mathrm{nm}}} (p_{\mathop {\mathrm{sd}}}/\Vert p_{\mathop {\mathrm{sd}}}\Vert )\) obtained by scaling \(p_{\mathop {\mathrm{sd}}}\) to the trust region boundary.
2.4 Broyden-like technique
Before a successful iteration is found by a trust region algorithm, the trust region subproblems may be solved many times with high computational cost. Instead, our idea is to use a new algorithm based on the previous best points in the hope of finding a point with good function value.
Whenever LMLS cannot decrease the function value, a new Broyden-like technique, called BroydenLike, is used in the hope of getting a decrease in the function value. Let \(x^1,\ldots ,x^{m_{\mathop {\mathrm{rs}}}}\) be the \(m_{\mathop {\mathrm{rs}}}\) best points stored in X. Then a point in the affine space spanned by these points has the following form
where \(e\in {\mathbb {R}}^{m_{\mathop {\mathrm{rs}}}}\) is a vector all of whose components are one. Given \(B{:}{=}\left( \begin{array}{l} X\\ e\end{array}\right) \), the linear approximation \(E(x_z)\approx Bz\) is used to replace (1) by the surrogate problem
This is an equality constrained convex quadratic problem in \(m_{\mathop {\mathrm{rs}}}\) variables and hence can be solved in closed form. A QR factorization \(B=QR\) is computed, where Q is an orthogonal matrix and R is a square upper triangular matrix. By setting \(Z{:}{=}R^{-1}\), we make the substitution \(z{:}{=}Zy\), define \(a^T{:}{=}e^TZ\), and obtain the \(m_{\mathop {\mathrm{rs}}}\)-dimensional minimal norm linear feasibility problem
whose solution is \(y{:}{=}a/\Vert a\Vert _2\). Hence, a new trial point for the next algorithm is
BroydenLike tries to find a point with a better function value when no decrease in f is found along p. It takes X, F, x, E, B, f, \(\delta _f\), and \(f_{\mathop {\mathrm{nm}}}\) as input and uses the following tuning parameters:
-
\(\underline{\gamma }\in (0,1)\) (tiny factor for adjusting \(\delta _f\)),
-
\(\overline{\gamma }>1\) (parameter for expanding \(\delta _f\)),
-
\(\gamma _s\) (tiny parameter for the finite difference step size), \(m_{\mathop {\mathrm{rs}}}\) (memory for affine space),
-
\(0<\gamma _r<1\) (tiny parameter for adjusting the scaled random directions).
It returns a new value of \(\delta _f\) and \(f_{\mathop {\mathrm{nm}}}\) (and f, X, F, E, S, and \(J_S\) if a decrease in f is found) as output.
Since the Jacobian matrix may be singular or indefinite, a new point may move toward a maximum point or a saddle point. To remedy this disadvantage, BroydenLike does not accept such a point when it has the largest function value.
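The computation of the BroydenLike trial point described in this subsection can be sketched in MATLAB as follows. Since the displayed formula for the trial point is not shown above, the final line \(x_{\mathrm{trial}}=Xz\) (a point in the affine span of the stored best points) and the function name are assumptions of this sketch.

```matlab
% X holds the mrs stored best points as columns (an n x mrs matrix).
function xtrial = broydenlike_sketch(X)
mrs = size(X, 2);
e   = ones(mrs, 1);
B   = [X; e'];                 % B := (X; e^T)
[~, R] = qr(B, 0);             % economy-size QR factorization B = Q*R
Z   = inv(R);                  % substitution z = Z*y
a   = (e' * Z)';               % a^T := e^T * Z
y   = a / norm(a, 2);          % solution of the minimal norm problem, as stated above
z   = Z * y;
xtrial = X * z;                % assumed trial point in the affine span of X
end
```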
2.5 A limited memory trust region algorithm
We describe all steps of the new limited memory algorithm, called LMLS, which uses the new subspace direction (7), the new non-monotone technique (10), the new adaptive radius strategy (11), and BroydenLike.
In each iteration, an estimated solution of the trust region subproblem (14) is found. Whenever the condition \(\rho _{\mathop {\mathrm{nm}}}\ge \gamma _t\) holds, the iteration is successful while updating both the non-monotone term (10) and adaptive radius formula (11), and estimating the Jacobian matrix in a subspace with random basis indices by SRFD. Otherwise, the iteration is unsuccessful. In this case, BroydenLike is performed in the hope of finding a decrease in the function value. If a decrease in the function value is found, the iteration becomes successful; otherwise, it remains unsuccessful while reducing the radius and updating the non-monotone term (10) until a decrease in the function value is found and the iteration becomes successful.
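For orientation, the self-contained MATLAB sketch below shows a much-simplified, monotone version of this iteration on an assumed toy problem: it uses a full forward-difference Jacobian and a truncated Gauss–Newton step, and deliberately omits SRFD, the non-monotone term (10), the adaptive radius (11), and BroydenLike, so it is an illustration of the trust region loop, not LMLS itself.

```matlab
% Simplified monotone trust region Gauss-Newton loop (illustration only).
E      = @(x) [10*(x(2) - x(1)^2); 1 - x(1)];        % assumed toy residual
f      = @(x) 0.5 * norm(E(x))^2;
x      = [-1.2; 1];  n = numel(x);
Delta  = 1;  gammat = 0.1;  sigma1 = 0.5;  sigma2 = 2;  gammas = 1e-7;
for k = 1:100
    Ex = E(x);  J = zeros(numel(Ex), n);
    for i = 1:n                                       % full forward-difference Jacobian
        h = gammas * max(abs(x(i)), 1);
        p = zeros(n, 1);  p(i) = h;
        J(:, i) = (E(x + p) - Ex) / h;
    end
    g = J' * Ex;
    if norm(g) <= 1e-8, break; end
    psn = -(J'*J + 1e-12*eye(n)) \ g;                 % (regularized) Gauss-Newton step
    if norm(psn) > Delta
        psn = Delta * psn / norm(psn);                % crude replacement for subDogleg
    end
    pred = -(g'*psn + 0.5 * psn' * (J'*J) * psn);     % predicted reduction
    rho  = (f(x) - f(x + psn)) / pred;                % monotone trust region ratio
    if rho >= gammat
        x = x + psn;  Delta = sigma2 * Delta;         % successful: accept, expand radius
    else
        Delta = sigma1 * Delta;                       % unsuccessful: reduce radius
    end
end
fprintf('f = %g after %d iterations\n', f(x), k);
```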
LMLS solves unconstrained nonlinear black box least squares problems. It takes the initial point \(x^0\) and the maximal number of function evaluations (nfmax) as input. It uses the following tuning parameters:
-
\(m_{\mathop {\mathrm{sn}}}\) (subspace dimension),
-
\(\gamma _t\in (0,1)\) (parameter for trust region),
-
\( 0< \sigma _1< 1 < \sigma _2 \) (parameters for updating \(\lambda \)),
-
\(\gamma ^{{\mathrm{init}}}\) (parameter for updating the initial \(\delta _f\)),
-
\(\underline{\gamma }\in (0,1)\) (tiny factor for adjusting \(\delta _f\)),
-
\(\overline{\gamma }>1\) (parameter for expanding \(\delta _f\)),
-
\(\underline{\lambda }\) (lower bound for \(\lambda \)),
-
\(\overline{\lambda }\) (upper bound for \(\lambda \)),
-
\(\gamma _s\) (tiny parameter for adjusting finite difference step sizes),
-
\(0<\gamma _r<1\) (tiny parameter for adjusting random approximation coordinate directions).
It returns a solution \(x^{{\mathrm{best}}}\) of a nonlinear least squares problem as output.
LMLS was implemented in MATLAB; the source code is available at
3 Numerical results
We updated the test environment constructed by Kimiaei and Neumaier (2019) to use the test problems suggested by Lukšan et al. (2018). LMLS is compared with unconstrained least squares and unconstrained optimization solvers, described in the first subsection; for some of them we had to choose options different from the default to make them competitive.
3.1 Codes compared
Least squares solvers:
-
CoDoSol is a solver for constrained nonlinear systems of equations, obtained from
It combines a Newton method with a trust region method; see Bellavia et al. (2012). The following option was used
Note that \(\mathtt{delta}=-1\) means that \(\varDelta _0=1\). According to our numerical results, CoDoSol was not sensitive to the initial radius; hence, the default was used.
-
STRSCNE is a solver for constrained nonlinear systems of equations, obtained from
It combines a Newton method with a trust region procedure; see Bellavia et al. (2004). The option
was used. Note that \(\mathtt{delta}=-1\) means that \(\varDelta _0=1\). According to our numerical results, STRSCNE was not sensitive to the initial radius; hence, the default was used for \(\varDelta _0\).
-
NLEQ1 is a damped affine invariant Newton method for nonlinear systems, obtained from
http://elib.zib.de/pub/elib/codelib/nleq1_m/nleq1.m and suggested by Deuflhard (2011) and written by Nowak and Weimann (1990). The default tuning parameters were used; only \(\mathtt{iopt.jacgen}=2\), \(\mathtt{iopt.qrank1}=1\), and \(\mathtt{wk.fmin = 1e-50}\) were selected.
-
NLEQ2 is the same as NLEQ1; only \(\mathtt{iopt.qrank1}=0\) was selected.
-
MINLBFGS is an L-BFGS algorithm with line search, obtained from
and written by Sorber et al. (2012). The default parameters were used, except for m. MINLBFGS1 and MINLBFGS2 are MINLBFGS with \(m=\min (n,30)\) and \(m=\min (n,100)\), respectively.
-
MINLBFGSDL is an L-BFGS algorithm with a dogleg trust region and the option set \(\mathtt{MaxIter = nfmax}\), \(\mathtt{TolFun = 1e-50}\), \(\mathtt{TolX = 1e-50}\). The default parameters were chosen for \(\mathtt{PlaneSearch}\) and M. MINLBFGSDL1, MINLBFGSDL2, MINLBFGSDL3, and MINLBFGSDL4 are MINLBFGSDL with
-
(1)
\(\mathtt{PlaneSearch = false}\) and \(M = \min (30,n)\),
-
(2)
\(\mathtt{PlaneSearch = false}\) and \(M = \min (100,n)\),
-
(3)
\(\mathtt{PlaneSearch = true}\) and \(M = \min (30,n)\),
-
(4)
\(\mathtt{PlaneSearch = true}\) and \(M = \min (100,n)\).
According to our results, MINLBFGSDL1 was the best.
-
MINFNCG is a nonlinear conjugate gradient solver, obtained from
and written by Sorber et al. (2012). We used the following option set
$$\begin{aligned}&\mathtt{MaxIter = nfmax};\ \ \mathtt{TolFun = 1e-50}; \ \ \\&\mathtt{TolX = 1e-50}. \end{aligned}$$The other tuning parameter was \(\mathtt{Beta}\in \{\mathtt{HS},\mathtt{HSm},{} \mathtt{PR},\mathtt{FR},\mathtt{PRm},\mathtt{SD}\}\). MINFNCG1, MINFNCG2, MINFNCG3, MINFNCG4, MINFNCG5, and MINFNCG6 are MINFNCG with \(\mathtt{Beta}=\mathtt{HS}\), \(\mathtt{Beta}=\mathtt{HSm}\), \(\mathtt{Beta}=\mathtt{PR}\), \(\mathtt{Beta}=\mathtt{FR}\), \(\mathtt{Beta}=\mathtt{PRm}\), and \(\mathtt{Beta}=\mathtt{SD}\), respectively.
-
NLSQERR is a global unconstrained Gauss–Newton method with an error oriented convergence criterion and an adaptive trust region strategy (Deuflhard 2011), obtained from
http://elib.zib.de/pub/elib/codelib/NewtonLib/index.html
The following options were used
$$\begin{aligned}&\mathtt{iniscalx} = 0;\ \ \mathtt{rescalx} = 0; \ \ \mathtt{xthrsh} = \mathop {\mathrm{ones}}(n, 1); \\&\mathtt{xtol} = \mathtt{1.e-50};\ \ \mathtt{ftol} = \mathtt{1.e-50};\ \ \mathtt{kmax} = \mathtt{nfmax};\\&\mathtt{printmon} = 2;\ \ \mathtt{printsol} = 1;\ \ \mathtt{fid} = 1; \ \ \\&\mathtt{numdif} = 1; \\&\mathtt{lambda0} = \mathtt{eps}; \mathtt{lambdamin} = \mathtt{1e-50}; \\&\mathtt{ftol} = \mathtt{1.e-50}. \end{aligned}$$ -
NMPNTR, non-monotone projected Newton trust region method, is a bound constrained solver Kimiaei (2016). NMPNTR1, NMPNTR2, NMPNTR3, and NMPNTR4 are NMPNTR with \(\varDelta _0=1\), \(\varDelta _0=10\), \(\varDelta _0=100\), and \(\varDelta _0=500\), respectively. According to our results, NMPNTR2 was the best.
-
NATRN is a non-monotone trust region algorithm Amini et al. (2016) using the full finite difference approximation. The subproblem was solved in the same way as LMLS. NATRN1, NATRN2, NATRN3, and NATRN4 are NATRN with \(\varDelta _0=1\), \(\varDelta _0=10\), \(\varDelta _0=100\), and \(\varDelta _0=500\), respectively. According to our results, NATRN1 had the best performance.
-
NATRLS is a non-monotone line search and trust region algorithm Amini et al. (2016) using the full finite difference approximation. The subproblem was solved in the same way as LMLS. NATRLS1, NATRLS2, NATRLS3, and NATRLS4 are NATRLS with \(\varDelta _0=1\), \(\varDelta _0=10\), \(\varDelta _0=100\), and \(\varDelta _0=500\), respectively. According to our results, NATRLS1 had the best performance.
-
LSQNONLIN1, obtained from the MATLAB Optimization Toolbox at,
https://de.mathworks.com/help/optim/ug/lsqnonlin.html
is a nonlinear least squares solver with the following options:
options = optimoptions(@lsqnonlin, ‘Algorithm’, ‘levenberg-marquardt’, ‘FiniteDifferenceType’, ‘forward’, ‘MaxIter’, Inf, ‘MaxFunEvals’, nfmax, ‘TolX’, 0, ‘SpecifyObjectiveGradient’, ‘false’).
-
LSQNONLIN2, obtained from the MATLAB Optimization Toolbox at,
https://de.mathworks.com/help/optim/ug/lsqnonlin.html
is a nonlinear least squares solver with the following options:
options = optimoptions(@lsqnonlin, ‘Algorithm’, ‘trust-region-reflective’, ‘FiniteDifferenceType’, ‘forward’, ‘MaxIter’, Inf, ‘MaxFunEvals’, nfmax, ‘TolX’, 0, ‘SpecifyObjectiveGradient’, ‘false’).
-
DOGLEG is Powell’s dogleg method for least squares problems, the best algorithm from the immoptibox toolbox (Nielsen 2012), available at
http://www2.imm.dtu.dk/projects/immoptibox/
The following option was used
$$\begin{aligned} \mathtt{opts}= & {} [\varDelta _0,\mathtt{tolg},\mathtt{tolx},\mathtt{tolr},\mathtt{maxeval}]\\= & {} [\varDelta _0,\mathtt{1e-50},\mathtt{1e-50},\mathtt{1e-50},\mathtt{nfmax}]. \end{aligned}$$DOGLEG1, DOGLEG2, DOGLEG3, and DOGLEG4 are DOGLEG with \(\varDelta _0=1\), \(\varDelta _0=10\), \(\varDelta _0=100\), and \(\varDelta _0=500\), respectively. The best version was DOGLEG1.
Unconstrained solvers:
-
FMINUNC, obtained from the MATLAB Optimization Toolbox at
https://ch.mathworks.com/help/optim/ug/fminunc.html,
is a standard quasi-Newton algorithm. We used FMINUNC with
opts = optimoptions(@fminunc, ‘Algorithm’, ‘quasi-newton’, ‘Display’, ‘Iter’, ‘MaxIter’, Inf, ‘MaxFunEvals’, nfmax, ‘TolX’, 0, ‘TolFun’, 0, ‘ObjectiveLimit’, -1e-50).
-
FMINUNC1 is FMINUNC with the limited memory quasi-Newton approximation by Liu and Nocedal (1989), which was added to FMINUNC by the present authors. The same option set as for FMINUNC was used; only the memory \(m=10\) was added.
3.2 Default for tuning parameters of LMLS
The tuning parameters for our new method (LMLS) are chosen as
The remaining tuning parameter \(m_{\mathop {\mathrm{sn}}}\) is varied in the experiment: LMLS1, LMLS2, LMLS3, and LMLS4 are LMLS with \(m_{\mathop {\mathrm{sn}}}=3\), \(m_{\mathop {\mathrm{sn}}}=\min (10,n)\), \(m_{\mathop {\mathrm{sn}}}=\min (30,n)\), and \(m_{\mathop {\mathrm{sn}}}=\min (100,n)\), respectively.
3.3 Test problems used, initial point, and stopping tests
Test problems suggested by Lukšan et al. (2018) are classified in Table 1 according to whether they are overdetermined (\(r>n\)) or not (\(r=n\)). A shift for these problems is chosen as in Kimiaei and Neumaier (2020), namely \(\chi _i {:}{=} (-1)^{i-1}2/(2+i)\) for all \(i=1,\ldots ,n\). This means that the initial point is \(x^0_i=\chi _i\) for all \(i=1,\ldots ,n\) and the initial function value is computed by \(f^0{:}{=}f(x^0)\), while the other function values are computed by \(f^\ell {:}{=}f(x^{\ell }+\chi )\) for all \(\ell \ge 1\).
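In MATLAB, the shift and the shifted starting point can be generated as follows (a small reproduction aid; the variable names are ours):

```matlab
% Shift chi_i = (-1)^(i-1)*2/(2+i) and shifted starting point, as described above.
n   = 100;                               % assumed problem dimension
i   = (1:n)';
chi = (-1).^(i-1) .* 2 ./ (2 + i);
x0  = chi;                               % initial point x^0 = chi
% all later evaluations use f(x + chi), i.e. the problem is solved in shifted coordinates
```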
We denote by \(\mathtt{nf}\) and \(\mathtt{msec}\) the number of function evaluations and the time in milliseconds, respectively. nfmax and secmax are the upper bounds for them, chosen as
and
Denote by \(f^0\) the function value of the starting point (common to all solvers), by \(f^{so}\) the best function value found by the solver so, and by \(f^{\mathop {\mathrm{opt}}}\) the best function value known to us. If the target accuracy satisfies
then the problem counts as solved by the solver so. Otherwise, the problem counts as unsolved: either nfmax or secmax is exceeded, or the solver fails.
3.4 The efficiency and robustness of a solver
For a given collection S of solvers, the strength of a solver \(so \in S\) – relative to an ideal solver that matches on each problem the best solver – is measured, for any given cost measure \(c_s\), by the number \(e_{so}\) defined by
called the efficiency of the solver so with respect to this cost measure. Two cost measures nf and msec are used.
The robustness of a solver is the number of test problems it can solve. Efficiency and robustness are two adequate tools to determine which solver is competitive; in fact, the robustness of a solver is more important than its efficiency. We use two different performance plots in terms of the robustness and efficiency of solvers:
-
The first performance plot is the data profile by Moré and Wild (2009) for nf/(best nf) and msec/(best msec): the percentage of problems solved within the number of function evaluations and time in milliseconds used by the best solver.
-
The second performance plot is the performance profile by Dolan and Moré (2002) for nf/(best nf) and msec/(best msec): the percentage of problems solved within a factor \(\tau \) of the best solver (a minimal computation sketch is given after this list).
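The following MATLAB sketch computes such a performance profile from a cost matrix (the names are assumptions of this illustration; C(i,j) is, e.g., the nf cost of solver j on problem i, with Inf for unsolved problems):

```matlab
% Fraction of problems each solver solves within a factor tau of the best solver.
function rho = perf_profile_sketch(C, tau)
ok   = isfinite(min(C, [], 2));     % ignore problems solved by no solver
C    = C(ok, :);
best = min(C, [], 2);               % best cost per problem
r    = C ./ best;                   % performance ratios (implicit expansion)
rho  = mean(r <= tau, 1);           % one value per solver
end
```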
All tables and data/performance profiles are given in Sects. 4.2–4.4. In Sects. 3.5-3.7, we summarize them as two new performance plots.
3.5 Small scale: \(n\in [1,100]\)
A comparison among LMLS1, LMLS2, LMLS3, LMLS4, and the solvers using quasi-Newton approximations is shown in Subfigures (a) and (b) of Fig. 1, from which we see that
-
LMLS4 using the full estimated Jacobian matrix is the best in terms of the number of solved problems and the nf efficiency;
-
LMLS3, LMLS2, and LMLS1 are more efficient than the solvers using quasi-Newton approximations (FMINUNC, FMINUNC1, MINLBFGS1, and MINLBFGSDL1) in terms of the nf efficiency;
-
FMINUNC and MINLBFGSDL1 are comparable with LMLS3 – only for very large budgets – in terms of the number of solved problems, but LMLS3, LMLS2, and LMLS1 are more efficient than FMINUNC and MINLBFGSDL1 in terms of the nf efficiency not only for very large budgets but also for small up to large budgets.
To determine whether our new non-monotone and adaptive radius strategies are effective, we compare LMLS4 with solvers using other non-monotone and adaptive radius strategies, shown in Subfigures (c) and (d) of Fig. 1. All solvers use the full Jacobian matrix, and the trust region subproblems are solved by the same algorithm. As can be seen, LMLS4 is much more efficient and robust than NATRLS1, NMPNTR2, and NATRN1 in terms of the number of solved problems and the nf efficiency.
We compare LMLS4 with the four well-known solvers LSQNONLIN1, CoDoSol1, NLEQ1, and DOGLEG1 in Subfigures (e) and (f) of Fig. 1. It is seen that LMLS4 and CoDoSol1 are the two best solvers in terms of the nf efficiency, while LMLS4 and DOGLEG1 are the two best solvers in terms of the number of solved problems.
Another comparison is among LMLS3, LMLS2, and LMLS1, which use the Jacobian matrix in a subspace with an adaptive basis indices set, and LSQNONLIN1 and NLEQ1, which use the full Jacobian matrix. We conclude from Subfigures (g) and (h) of Fig. 1 that
-
LMLS3 is the best in terms of the number of solved problems and the nf efficiency;
-
LMLS2 is the second best solver in terms of the number of solved problems and the nf efficiency for medium, large, and very large budgets;
-
LMLS1 with the lowest subspace dimension is more efficient than LSQNONLIN1 in terms of the number of solved problems and the nf efficiency; it is even more efficient than NLEQ1 for very large budgets in terms of the nf efficiency.
As a result, LMLS is competitive for small-scale problems in comparison with the state-of-the-art solvers.
3.6 Medium scale: \(n\in [101,1000]\)
In this subsection, we compare LMLS1, LMLS2, LMLS3, and LMLS4, which use estimated Jacobian matrices in a subspace with random basis indices, with FMINUNC using standard BFGS approximations and FMINUNC1 using limited memory BFGS approximations (Fig. 2).
From Subfigures (a) and (b) of Fig. 2, we conclude that
-
LMLS4, LMLS3, and LMLS2 are the three best solvers in terms of the nf efficiency, in that order;
-
LMLS4 is the best solver in terms of the number of solved problems; only for large budgets is FMINUNC the best.
Fig. 1 Performance plots for small-scale problems. a–b: A comparison of limited memory solvers. c–d: A comparison among LMLS in a full subspace with random basis indices and solvers using other non-monotone and adaptive radius techniques. e–f: A comparison among LMLS in a full subspace with random basis indices and other well-known solvers. g–h: A comparison among the low-dimensional LMLS1, LMLS2, LMLS3 and NLEQ1 and LSQNONLIN1 using the full estimated Jacobian
3.7 Large scale: \(n\in [1001,10000]\)
In this subsection, we compare LMLS1, LMLS2, LMLS3, and LMLS4, which use estimated Jacobian matrices in a subspace with random basis indices, with FMINUNC1 using limited memory BFGS approximations.
In terms of the nf efficiency and the number of solved problems, Subfigures (a) and (b) of Fig. 3 show that
-
LMLS4, LMLS3, and LMLS2 are the three best solvers, in that order;
-
LMLS1 with the lowest subspace dimension is more efficient than FMINUNC1 for small budgets, while FMINUNC1 is more efficient than LMLS1 for large budgets.
4 Additional material for LMLS
4.1 Summarizing tables
In all tables, efficiencies are given as percentages, rounded (towards zero) to integers. Larger efficiencies imply a better average behavior, while a zero efficiency indicates failure.
#100 is the total number of problems for which the solver so was best with respect to nf (\(e_{so}=1=100\%\)). !100 is the total number of problems solved for which the solver so was better than all other solvers with respect to nf.
We denote the time in seconds without the setup time for the objective function by sec. In tables, a sign
-
n indicates that nf \(\ge \mathtt{nfmax}\) was reached.
-
t indicates that sec \(\ge \mathtt{secmax}\) was reached.
-
f indicates that the algorithm failed for other reasons.
\(T_{\mathop {\mathrm{mean}}}\) is the mean of the time in seconds needed by a solver to solve the test problems chosen from the list of test problems \({\mathcal {P}}\), ignoring the times for unsolved problems. It can be a good measure when solvers have approximately the same number of solved problems.
4.2 Tables and data/performance profiles for \(1\le n\le 100\)
This section contains Tables 2, 3, 4, 5 and Figs. 4, 5, 6, 7, summaries of which were discussed in Sect. 3.5.
a and b: Data profiles for nf/(best nf) and msec/(best msec), respectively; \(\rho \) designates the fraction of problems solved within the number of function evaluations and time in milliseconds used by the best solver. c–d: Performance profiles for nf/(best nf) and msec/(best msec), respectively. Problems solved by no solver are ignored
Details as in Fig. 4
Details as in Fig. 4
Details as in Fig. 4
4.3 Tables and data/performance profiles for \(101 \le n \le 1000\)
This section contains Tables 6, 7, 8 and Figs. 8, 9, 10, summaries of which were discussed in Sect. 3.6.
Details as in Fig. 4
Details as in Fig. 4
Details as in Fig. 4
Details as in Fig. 4
4.4 Tables and data/performance profiles for \(1001 \le n \le 10000\)
This section contains Tables 9, 10 and Figs. 11, 12, summaries of which were discussed in Sect. 3.7.
Details as in Fig. 4
References
Ahookhosh M, Amini K (2011) An efficient nonmonotone trust-region method for unconstrained optimization. Numer Algo 59:523–540
Ahookhosh M, Esmaeili H, Kimiaei M (2013) An effective trust-region-based approach for symmetric nonlinear systems. Int J Comput Math 90:671–690
Ahookhosh M, Amini K, Kimiaei M (2015) A globally convergent trust-region method for large-scale symmetric nonlinear systems. Numer Func Anal Opt 36:830–855
Amini K, Shiker MAK, Kimiaei M (2016) A line search trust-region algorithm with nonmonotone adaptive radius for a system of nonlinear equations. 4OR-Q J Oper Res 14:133–152
Amini K, Esmaeili H, Kimiaei M (2016) A nonmonotone trust-region approach with nonmonotone adaptive radius for solving nonlinear systems. IJNAO 6
Bellavia S, Macconi M, Morini B (2004) STRSCNE: a scaled trust-region solver for constrained nonlinear equations. Comput Optim Appl 28:31–50
Bellavia S, Macconi M, Pieraccini S (2012) Constrained dogleg methods for nonlinear systems with simple bounds. Comput Optim Appl 53:771–794
Conn AR, Gould NIM, Toint Ph L (2000) Trust region methods. Society for industrial and applied mathematics
Deng NY, Xiao Y, Zhou FJ (1993) Nonmonotonic trust region algorithm. J Optim Theory Appl 76:259–285
Dennis JE Jr, Moré JJ (1977) Quasi-Newton methods, motivation and theory. SIAM Rev 19:46–89
Dennis JE Jr, Walker HF (1981) Convergence theorems for least-change secant update methods. SIAM J Numer Anal 18:949–987
Deuflhard P (2011) Newton methods for nonlinear problems. Springer, Berlin Heidelberg
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Prog 91:201–213
Esmaeili H, Kimiaei M (2014) An efficient adaptive trust-region method for systems of nonlinear equations. Int J Comput Math 92:151–
Esmaeili H, Kimiaei M (2014) A new adaptive trust-region method for system of nonlinear equations. Appl Math Model 38:3003–3015
Esmaeili H, Kimiaei M (2015) A trust-region method with improved adaptive radius for systems of nonlinear equations. Math Meth Oper Res 83:109–125
Fan J (2006) Convergence rate of the trust region method for nonlinear equations under local error bound condition. Comput Optim Appl 34:215–227
Fan J, Pan J (2009) An improved trust region algorithm for nonlinear equations. Comput Optim Appl 48:59–70
Fan J, Pan J (2010) A modified trust region algorithm for nonlinear equations with new updating rule of trust region radius. Int J Comput Math 87:3186–3195
Grippo L, Sciandrone M (2007) Nonmonotone derivative-free methods for nonlinear equations. Comput Optim Appl 37:297–328
Grippo L, Lampariello F, Lucidi S (1986) A nonmonotone line search technique for Newton’s method. SIAM J Numer Anal 23:707–716
Kimiaei M (2020) Line search in noisy unconstrained black box optimization. http://www.optimization-online.org/DB_HTML/2020/09/8007.html
Kimiaei M (2016) A new class of nonmonotone adaptive trust-region methods for nonlinear equations with box constraints. Calcolo 54:769–812
Kimiaei M (2017) Nonmonotone self-adaptive Levenberg–Marquardt approach for solving systems of nonlinear equations. Numer Func Anal Opt 39:47–66
Kimiaei M, Neumaier A (2019) Testing and tuning optimization algorithm. Preprint, Vienna University, Fakultät für Mathematik, Universität Wien, Oskar-Morgenstern-Platz 1, A-1090 Wien, Austria
Kimiaei M, Neumaier A (2020) Efficient global unconstrained black box optimization. http://www.optimization-online.org/DB_HTML/2018/08/6783.html
Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45:503–528
Lukšan L, Matonoha C, Vlček J (2018) Problems for nonlinear least squares and nonlinear equations. Technical report V-1259, ICS CAS
Moré JJ, Wild SM (2009) Benchmarking derivative-free optimization algorithms. SIAM J Optim 20:172–191
Nazareth L (1979) A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J Numer Anal 16:794–800
Nielsen HB (2012) immoptibox – a MATLAB toolbox for optimization and data fitting. Version 2.2
Nocedal J (1992) Theory of algorithms for unconstrained optimization. Acta Numer 1:199–242
Nocedal J, Wright SJ (1999) Numerical optimization. Springer-Verlag
Nowak U, Weimann L (1990) A family of Newton codes for systems of highly nonlinear equations. Technical report TR 91–10, Zuse Institute Berlin (ZIB)
Ortega JM, Rheinboldt WC (2000) Iterative solution of nonlinear equations in several variables. Society for industrial and applied mathematics
Schnabel RB (1989) Sequential and parallel methods for unconstrained optimization. In: Iri M, Tanabe K (eds) Mathematical programming, recent developments and applications. Kluwer Academic Publishers, Netherlands, pp 227–261
Sorber L, Barel MV, Lathauwer LD (2012) Unconstrained optimization of real functions in complex variables. SIAM J Optim 22:879–898
Yu Z, Pu D (2008) A new nonmonotone line search technique for unconstrained optimization. J Comput Appl Math 219:134–144
Yuan Y (2011) Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer Algebra Control Optim 1:15–34
Funding
Open access funding provided by University of Vienna.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This study does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The first author acknowledges the financial support of the Doctoral Program Vienna Graduate School on Computational Optimization (VGSCO) funded by the Austrian Science Foundation under Project No W1260-N35.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kimiaei, M., Neumaier, A. A new limited memory method for unconstrained nonlinear least squares. Soft Comput 26, 465–490 (2022). https://doi.org/10.1007/s00500-021-06415-8