Abstract
We propose a multistart algorithm to identify all local minima of a constrained, nonconvex stochastic optimization problem. The algorithm uniformly samples points in the domain and then starts a local stochastic optimization run from any point that is the “probabilistically best” point in its neighborhood. Under certain conditions, our algorithm is shown to asymptotically identify all local optima with high probability, even though it almost surely starts only finitely many local stochastic optimization runs. We demonstrate the performance of an implementation of our algorithm on nonconvex stochastic optimization problems, including identifying optimal variational parameters for the quantum approximate optimization algorithm.
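The sampling-and-start rule described above admits a compact illustration. The following Python sketch is hypothetical (it is not the authors' implementation); the noisy test objective, the neighborhood radius `r`, the replication count, and the confidence-interval comparison inside `probabilistically_best` are all placeholder choices. It only shows how uniformly sampled points can be filtered down to a set of start points for local stochastic optimization runs.

```python
# Hypothetical multistart sketch (not the paper's implementation): uniformly sample
# the domain, estimate the noisy objective at each sample, and start a local
# stochastic optimization run only from points that are "probabilistically best"
# in their neighborhood, i.e., no nearby sample is statistically significantly better.
import numpy as np


def noisy_f(x, rng, n_reps=30):
    """Sample mean and standard error of a noisy stand-in objective at x."""
    vals = np.sum((x - 0.5) ** 2) + 0.1 * rng.standard_normal(n_reps)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(n_reps)


def probabilistically_best(i, X, means, sems, r):
    """True if no sampled neighbor of X[i] within radius r is significantly better."""
    for j in range(len(X)):
        if j != i and np.linalg.norm(X[j] - X[i]) <= r:
            # Placeholder rule: neighbor j beats point i when j's upper confidence
            # bound lies below i's lower confidence bound (minimization).
            if means[j] + 2 * sems[j] < means[i] - 2 * sems[i]:
                return False
    return True


def multistart_starts(dim=2, n_samples=200, r=0.15, seed=0):
    """Return the sampled points from which local runs would be started."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, dim))  # uniform samples in [0, 1]^d
    stats = [noisy_f(x, rng) for x in X]
    means = np.array([m for m, _ in stats])
    sems = np.array([s for _, s in stats])
    return [X[i] for i in range(n_samples)
            if probabilistically_best(i, X, means, sems, r)]


if __name__ == "__main__":
    starts = multistart_starts()
    print(f"{len(starts)} local stochastic optimization runs would be started")
```

In a full implementation, the number of samples, the neighborhood radius, and the statistical comparison would be tied to the theory (for example, a radius \(r_k\) shrinking with the sample size, as in the appendix).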
Data availability
No specific datasets were used in this study; all analyses and results are based on simulated data. Accordingly, no data will be made available.
Change history
02 July 2024
A Correction to this paper has been published: https://doi.org/10.1007/s11590-024-02135-8
References
Gheribi, A.E., Robelin, C., Digabel, S.L., Audet, C., Pelton, A.D.: Calculating all local minima on liquidus surfaces using the FactSage software and databases and the mesh adaptive direct search algorithm. J. Chem. Thermodyn. 43(9), 1323–1330 (2011). https://doi.org/10.1016/j.jct.2011.03.021
Adamcik, J., Mezzenga, R.: Amyloid polymorphism in the protein folding and aggregation energy landscape. Angew. Chem. Int. Ed. 57(28), 8370–8382 (2018). https://doi.org/10.1002/anie.201713416
Floudas, C., Klepeis, J., Pardalos, P.: Global optimization approaches in protein folding and peptide docking. In: DIMACS Series in Discrete Mathematics and Theoretical Computer Science vol. 47, pp. 141–171. Am. Math. Soc. (1999). https://doi.org/10.1090/dimacs/047/07
Li, Z., Scheraga, H.A.: Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc. Natl. Acad. Sci. U.S.A. 84(19), 6611–6615 (1987). https://doi.org/10.1073/pnas.84.19.6611
di Serafino, D., Gomez, S., Milano, L., Riccio, F., Toraldo, G.: A genetic algorithm for a global optimization problem arising in the detection of gravitational waves. J. Global Optim. 48(1), 41–55 (2010). https://doi.org/10.1007/s10898-010-9525-9
Krishnamoorthy, M., Schulz, H., Ju, X., Wang, W., Leyffer, S., Marshall, Z., Mrenna, S., Müller, J., Kowalkowski, J.B.: Apprentice for event generator tuning. EPJ Web of Conf. 251, 03060 (2021). https://doi.org/10.1051/epjconf/202125103060
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18(1), 6765–6816 (2017)
Maclaurin, D., Duvenaud, D., Adams, R.: Gradient-based hyperparameter optimization through reversible learning. In: International Conference on Machine Learning, pp. 2113–2122 (2015)
Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm. arXiv:1411.4028 (2014). https://doi.org/10.48550/arXiv.1411.4028
Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm applied to a bounded occurrence constraint problem. arXiv:1412.6062 (2014). https://doi.org/10.48550/arXiv.1412.6062
Rinnooy Kan, A.H.G., Timmer, G.T.: Stochastic global optimization methods part I: Clustering methods. Math. Program. 39(1), 27–56 (1987). https://doi.org/10.1007/bf02592070
Rinnooy Kan, A.H.G., Timmer, G.T.: Stochastic global optimization methods part II: Multi level methods. Math. Program. 39(1), 57–78 (1987). https://doi.org/10.1007/bf02592071
Shashaani, S., Hashemi, F.S., Pasupathy, R.: ASTRO-DF: A class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J. Optim. 28(4), 3145–3176 (2018). https://doi.org/10.1137/15m1042425
Larson, J., Wild, S.M.: Asynchronously parallel optimization solver for finding multiple minima. Math. Program. Comput. 10(3), 303–332 (2018). https://doi.org/10.1007/s12532-017-0131-4
Kushner, H.J.: A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Eng. 86(1), 97 (1964). https://doi.org/10.1115/1.3653121
Frazier, P.I.: Bayesian optimization. In: Recent Advances in Optimization and Modeling of Contemporary Problems, pp. 255–278. INFORMS TutORials in Operations Research (2018). https://doi.org/10.1287/educ.2018.0188
Mathesen, L., Pedrielli, G., Ng, S.H., Zabinsky, Z.B.: Stochastic optimization with adaptive restart: A framework for integrated local and global learning. J. Global Optim. 79(1), 87–110 (2020). https://doi.org/10.1007/s10898-020-00937-5
Locatelli, M.: Relaxing the assumptions of the multilevel single linkage algorithm. J. Global Optim. 13(1), 25–42 (1998). https://doi.org/10.1023/a:1008246031222
Nguyen, V., Rana, S., Gupta, S., Li, C., Venkatesh, S.: Budgeted batch Bayesian optimization with unknown batch sizes. arXiv:1703.04842 (2017). https://doi.org/10.48550/arXiv.1703.04842
Wessing, S., Preuss, M.: The true destination of EGO is multi-local optimization. In: IEEE Latin American Conference on Computational Intelligence (2017). https://doi.org/10.1109/la-cci.2017.8285677
Krityakierne, T., Shoemaker, C.A.: SOMS: SurrOgate MultiStart algorithm for use with nonlinear programming for global optimization. Int. Trans. Oper. Res. 24(5), 1139–1172 (2015). https://doi.org/10.1111/itor.12190
Peri, D., Tinti, F.: A multistart gradient-based algorithm with surrogate model for global optimization. Commun. Appl. Ind. Math. 3(1) (2012). https://doi.org/10.1685/journal.caim.393
Regis, R.G., Shoemaker, C.A.: A quasi-multistart framework for global optimization of expensive functions using response surface models. J. Global Optim. 56(4), 1719–1753 (2012). https://doi.org/10.1007/s10898-012-9940-1
Žilinskas, A., Gillard, J., Scammell, M., Zhigljavsky, A.: Multistart with early termination of descents. J. Global Optim. 79(2), 447–462 (2019). https://doi.org/10.1007/s10898-019-00814-w
Zheng, R., Li, M.: Multistart global optimization with tunnelling and an evolutionary strategy supervised by a martingale. Eng. Optim., 1–19 (2021). https://doi.org/10.1080/0305215x.2021.1940989
Jin, C., Liu, L.T., Ge, R., Jordan, M.I.: On the local minima of the empirical risk. In: Advances in Neural Information Processing Systems, pp. 4896–4905 (2018)
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1–2), 267–305 (2014). https://doi.org/10.1007/s10107-014-0846-1
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2), 59–99 (2015). https://doi.org/10.1007/s10107-015-0871-8
Hu, W., Li, C.J., Li, L., Liu, J.-G.: On the diffusion approximation of nonconvex stochastic gradient descent. Ann. Math. Sci. Appl. 4(1), 3–32 (2019). https://doi.org/10.4310/amsa.2019.v4.n1.a1
Durrett, R.: Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (2010). https://doi.org/10.1017/CBO9780511779398
Brooks, S.H.: A discussion of random methods for seeking maxima. Oper. Res. 6(2), 244–251 (1958). https://doi.org/10.1287/opre.6.2.244
Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009). https://doi.org/10.1137/080724083
Forrester, A., Sobester, A., Keane, A.: Engineering Design via Surrogate Modelling, pp. 195–203. John Wiley & Sons, Ltd (2008). https://doi.org/10.1002/9780470770801.app1
Zhou, L., Wang, S.-T., Choi, S., Pichler, H., Lukin, M.D.: Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. Phys. Rev. X 10(2) (2020). https://doi.org/10.1103/physrevx.10.021067
Shaydulin, R., Safro, I., Larson, J.: Multistart methods for quantum approximate optimization. In: Proceedings of the IEEE High Performance Extreme Computing Conference (2019). https://doi.org/10.1109/hpec.2019.8916288
Cartis, C., Fiala, J., Marteau, B., Roberts, L.: Improving the flexibility and robustness of model-based derivative-free optimization solvers. ACM Trans. Math. Softw. 45(3), 1–41 (2019). https://doi.org/10.1145/3338517
Huyer, W., Neumaier, A.: SNOBFIT - stable noisy optimization by branch and fit. ACM Trans. Math. Softw. 35(2), 1–25 (2008). https://doi.org/10.1145/1377612.1377613
Nogueira, F.: Bayesian Optimization: Open source constrained global optimization tool for Python (2014–). https://github.com/fmfn/BayesianOptimization
Acknowledgements
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Accelerated Research for Quantum Computing program, under contract number DE-AC02-06CH11357.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised due to change in the copyright line of the article.
Appendices
Proofs
Below are the proofs of Lemma 1 and Lemma 2.
Proof of Lemma 1
First, consider the sets
Now, observe that using Chebyshev’s inequality and Assumption 2, for all \(x \in \mathcal {G}(a;r;n)\),
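A plausible form of this Chebyshev step (presumably Eq. (23), whose display is not reproduced here), assuming \({\hat{f}}_n(x)-{\hat{f}}_n(a)\) has mean \(f(x)-f(a)\) and using the definition \({\epsilon }_n(x;a) = \sqrt{{\text {Var}}_{\xi }[ {\hat{f}}_n(x)-{\hat{f}}_n(a)]/\beta }\) recalled later in the proof, is the following sketch:
\[
  \mathbb {P}\Bigl( \bigl|{\hat{f}}_n(x)-{\hat{f}}_n(a) - [f(x)-f(a)]\bigr| \ge {\epsilon }_n(x;a) \Bigr)
  \;\le\; \frac{{\text {Var}}_{\xi }\bigl[{\hat{f}}_n(x)-{\hat{f}}_n(a)\bigr]}{{\epsilon }_n(x;a)^2}
  \;=\; \beta .
\]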
Since \(x \in \mathcal {G}(a;r;n)\) implies \({\epsilon }_n(x;a)> f(x)- f(a)\), it follows that
Next, recall the definition of \(\mathcal {A}\left( a;r;n;\beta \right)\), and observe that Eqs. (23) and (24) together imply that
for all \(x \in \mathcal {G}(a;r;n)\). Since \({\epsilon }_n(x;a) = \sqrt{\frac{{\text {Var}}_{\xi }\left[ {\hat{f}}_n(x)-{\hat{f}}_n(a) \right] }{\beta }}\), Eq. (25) implies that
Observe that \(\mathcal {C}(a;r)\subseteq \mathcal {G}(a;r;n)\) and \(\mathcal {A}\left( a;r;n;\beta \right) \cap \mathcal {G}(a;r;n) \subseteq \mathcal {A}\left( a;r;n;\beta \right)\); therefore, it follows from (26) that
Recall that \(\mathcal {T}_{\omega }\) is the union of \(\omega\)-radius balls centered at stationary points of f in \(\mathcal {D}\). Now consider \(a \in \mathcal {D}{\setminus } (\mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })\). Define the set
where \(\rho\) is the largest eigenvalue of \(\nabla ^2 f(x)\) over all \(x \in \mathcal {D}\). Using the Taylor expansion of f around a, we know that for all \(x\in \mathcal {D}\) with \(\Vert x-a\Vert \le r\), there exists a \(\theta \in [0,1]\) such that
For ease of reference, let \(H = \nabla ^2 f(a + \theta (x-a) )\) and \(v = x-a\).
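The bound stated next is the standard Rayleigh-quotient inequality; a one-line check, writing the spectral decomposition \(H = Q \Lambda Q^T\) with orthogonal \(Q\) and eigenvalues at most \(\rho\) (by the definition of \(\rho\) above):
\[
  v^T H v = (Q^T v)^T \Lambda \,(Q^T v) \;\le\; \rho \,\Vert Q^T v\Vert ^2 \;=\; \rho \, v^T v .
\]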
Since f is twice continuously differentiable by Assumption 1.1, its Hessian is real and symmetric; because \(\rho\) bounds its largest eigenvalue on \(\mathcal {D}\), we have \(v^T H v \le \rho \, v^Tv\) for all \(v \in \mathbb {R}^d\). It follows from (28) that
Equation (29) implies that \(\mathcal {E}(a;r;\rho ) \subseteq \mathcal {C}(a;r) \subseteq \mathcal {A}\left( a;r;n;\beta \right)\). For a given \(a\in \mathcal {D}\), \(m( \mathcal {B}\left( a;r \right) ) \le r^d \frac{\pi ^{d/2} }{ \mathrm {\Gamma }(1+ d/2)}\), where \(\mathrm {\Gamma }(\cdot )\) is the gamma function. Now using the lower bound on \(m(\mathcal {E}(a;r;\rho ))\) derived in Lemma 7 of [11], we obtain
where \(p = \min \{ \Vert \nabla f(a)\Vert : a \in \mathcal {D}{\setminus } (\mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega }) \}\) for any \(k \in \mathbb {N}\). (Note that \(p > 0\) by construction.) Therefore, (30) implies that for all \(a \in \mathcal {D}{\setminus } (\mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })\)
\(\square\)
Proof of Lemma 2
For any \(a\in S_k\setminus ( \mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })\), using the fact that \(S_k\) is sampled uniformly and that \(\{r_k\}\) is a vanishing sequence, Lemma 1 implies that there exists a \(k_0\ge 1\) such that for all \(k\ge k_0\),
where the first inequality bounds the probability that, for some sampled point \(a \in S_k\setminus ( \mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })\), none of the remaining sampled points lies in \(\mathcal {A}\left( a;r_k;n;\beta \right)\) (see (S1) of Table 1). The second inequality follows from Boole’s inequality, and the last inequality in (31) is due to Lemma 1 and the fact that \(|S_k\setminus ( \mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })|< |S_k|\).
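A plausible shape of this chain of inequalities (a sketch only; the exact constants appear in the original display (31)), writing \(\alpha _k\) as a placeholder for the lower bound provided by Lemma 1 on the probability that an independent uniform sample falls in \(\mathcal {A}\left( a;r_k;n;\beta \right)\), is
\[
  \mathbb {P}\Bigl( \exists \, a \in S_k\setminus ( \mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega }) :
  S_k \cap \mathcal {A}\left( a;r_k;n;\beta \right) \setminus \{a\} = \emptyset \Bigr)
  \;\le\; \sum _{a \in S_k\setminus ( \mathcal {Q}_{\tau }\cup \mathcal {T}_{\omega })}
  \bigl( 1 - \alpha _k \bigr)^{|S_k|-1}
  \;\le\; |S_k|\, \bigl( 1 - \alpha _k \bigr)^{|S_k|-1}.
\]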
Recall that for any \(a\in \mathcal {D}\) and \(r_k\) sufficiently small such that \(\mathcal {B}\left( a;r_k \right) \subseteq \mathcal {D}\), \(m( \mathcal {B}\left( a;r_k \right) ) = r_k^d \frac{\pi ^{d/2} }{ \mathrm {\Gamma }(1+ d/2)}\). Combining this fact with the choice of sequence \(r_k = \frac{1}{\sqrt{\pi }} \root d \of {\mathrm {\Gamma }(1+{d/2})\, m(\mathcal {D})\, \sigma \frac{\log |S_k|}{|S_k|}}\) for any \(\sigma >0\), we have
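A short worked substitution, using only the two formulas above (and assuming \(\mathcal {B}\left( a;r_k \right) \subseteq \mathcal {D}\), as stated), shows why this \(r_k\) is chosen: under this choice, each ball has uniform-sampling probability \(\sigma \frac{\log |S_k|}{|S_k|}\),
\[
  m\bigl( \mathcal {B}\left( a;r_k \right) \bigr)
  = r_k^d \,\frac{\pi ^{d/2}}{\mathrm {\Gamma }(1+ d/2)}
  = \frac{1}{\pi ^{d/2}}\,\mathrm {\Gamma }(1+ d/2)\, m(\mathcal {D})\, \sigma \frac{\log |S_k|}{|S_k|}
    \cdot \frac{\pi ^{d/2}}{\mathrm {\Gamma }(1+ d/2)}
  = m(\mathcal {D})\, \sigma \frac{\log |S_k|}{|S_k|},
\]
so that \(m( \mathcal {B}\left( a;r_k \right) )/m(\mathcal {D}) = \sigma \frac{\log |S_k|}{|S_k|}\).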
Since \(|S_k|\rightarrow \infty\) as \(k\rightarrow \infty\), the fact that \(e^{-\frac{\sigma }{2} \frac{\log |S_k|}{|S_k|}}\ge 1-\frac{\sigma }{2} \frac{\log |S_k|}{|S_k|}\) implies
\(\square\)
Shekel with \(d=6\) and \(d=8\)
MANSO with heterogeneous variance
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jaiswal, P., Larson, J. Multistart algorithm for identifying all optima of nonconvex stochastic functions. Optim Lett 18, 1335–1360 (2024). https://doi.org/10.1007/s11590-024-02114-z