
Zeroth-order optimization with orthogonal random directions

  • Full Length Paper
  • Series A
  • Published:
Mathematical Programming

Abstract

We propose and analyze a randomized zeroth-order optimization method based on approximating the exact gradient by finite differences computed in a set of orthogonal random directions that changes with each iteration. A number of previously proposed methods are recovered as special cases, including spherical smoothing, coordinate descent, and discretized gradient descent. Our main contribution is proving convergence guarantees as well as convergence rates under different parameter choices and assumptions. In particular, we consider convex objectives, but also possibly non-convex objectives satisfying the Polyak-Łojasiewicz (PL) condition. Theoretical results are complemented and illustrated by numerical experiments.
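
For illustration, the following is a minimal Python/NumPy sketch of one iteration of the kind described above: forward finite differences along \(\ell \) orthogonal random directions form a surrogate gradient, which then drives a gradient-type update. The helper names (`random_orthogonal_directions`, `zo_step`), the step sizes, and the normalization \(\Vert p^{(j)}\Vert ^2 = d/\ell \) are illustrative assumptions, not a reproduction of the paper's exact Algorithm 1.

```python
import numpy as np

def random_orthogonal_directions(d, ell, rng):
    """d x ell matrix with orthogonal columns, rescaled so that each
    column has squared norm d / ell (one common normalization)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, ell)))  # orthonormal columns
    return Q * np.sqrt(d / ell)

def zo_step(f, x, h, alpha, ell, rng):
    """One zeroth-order step: forward finite differences along the columns
    of P approximate P^T grad f(x); move along P times that estimate."""
    P = random_orthogonal_directions(x.size, ell, rng)
    fx = f(x)
    g = np.array([(f(x + h * P[:, j]) - fx) / h for j in range(ell)])
    return x - alpha * P @ g

# usage on a simple convex quadratic f(x) = 0.5 ||x||^2
rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.sum(x**2)
x = rng.standard_normal(50)
for k in range(1, 2001):
    x = zo_step(f, x, h=1e-6, alpha=0.1, ell=5, rng=rng)
print(f(x))   # should be close to 0
```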

References

  1. Agarwal, A., Dekel, O., Xiao, L.: Optimal algorithms for online convex optimization with multi-point bandit feedback. In: Proceedings of the Twenty Third Annual Conference on Computational Learning Theory, Citeseer, pp. 28–40 (2010)

  2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)

  3. Avron, H., Maymounkov, P., Toledo, S.: Blendenpik: supercharging Lapack’s least-squares solver. SIAM J. Sci. Comput. 32, 1217–1236 (2010)

  4. Baillon, J.-B., Haddad, G.: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Israel J. Math. 26, 137–150 (1977)

  5. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23, 2037–2060 (2013)

  6. Berahas, A.S., Byrd, R.H., Nocedal, J.: Derivative-free optimization of noisy functions via quasi-Newton methods. SIAM J. Optim. 29, 965–993 (2019)

  7. Berahas, A.S., Cao, L., Choromanski, K., Scheinberg, K.: A theoretical and empirical comparison of gradient approximations in derivative-free optimization, arXiv preprint arXiv:1905.01332, (2019)

  8. Berahas, A.S., Cao, L., Scheinberg, K.: Global convergence rate analysis of a generic line search algorithm with noise, arXiv preprint arXiv:1910.04055, (2019)

  9. Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10, 627–642 (2000)

  10. Bollapragada, R., Wild, S.M.: Adaptive sampling quasi-Newton methods for derivative-free stochastic optimization, arXiv preprint arXiv:1910.13516, (2019)

  11. Cartis, C., Roberts, L.: Scalable subspace methods for derivative-free nonlinear least-squares optimization, arXiv preprint arXiv:2102.12016, (2021)

  12. Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. 169, 337–375 (2018)

  13. Cauchy, A.: Méthode générale pour la résolution des systemes d’équations simultanées. Comp. Rend. Sci. Paris 25, 536–538 (1847)

  14. Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Dy, J., Krause, A. (Eds.), Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm Sweden, PMLR, pp. 970–978, (10–15 Jul 2018)

  15. Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)

  16. Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25, 1221–1248 (2015)

  17. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to derivative-free optimization, vol. 8 of MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA (2009)

  18. Dodangeh, M., Vicente, L.N., Zhang, Z.: On the optimal order of worst case complexity of direct search. Optim. Lett. 10, 699–708 (2016)

  19. Drineas, P., Mahoney, M.W., Muthukrishnan, S., Sarlós, T.: Faster least squares approximation. Numer. Math. 117, 219–249 (2011)

  20. Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inf. Theory 61, 2788–2806 (2015)

  21. Ehrhardt, M.J., Riis, E.S., Ringholm, T., Schönlieb, C.-B.: A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method, arXiv preprint arXiv:1805.06444, (2018)

  22. Finkel, D.E., Kelley, C.T.: Convergence analysis of sampling methods for perturbed Lipschitz functions. Pac. J. Optim. 5, 339–349 (2009)

  23. Flaxman, A.D., Kalai, A.T., McMahan, H.B.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, pp. 385–394 (2005)

  24. Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23, 2341–2368 (2013)

  25. Gilmore, P., Kelley, C.T.: An implicit filtering algorithm for optimization of functions with many local minima. SIAM J. Optim. 5, 269–285 (1995)

  26. Gilmore, P., Kelley, C.T., Miller, C.T., Williams, G.A.: Implicit filtering and optimal design problems. In: Optimal design and control (Blacksburg, VA, 1994), vol. 19 of Progr. Systems Control Theory, Birkhäuser Boston, Boston, MA, pp. 159–176 (1995)

  27. Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic descent. SIAM J. Optim. 25, 1515–1541 (2015)

  28. Grimm, V., McLachlan, R.I., McLaren, D.I., Quispel, G., Schönlieb, C.: Discrete gradient methods for solving variational image regularisation models. J. Phys. A: Math. Theor. 50, 295201 (2017)

  29. Gupal, A.: A method for the minimization of almost-differentiable functions. Cybern. 13, 115–117 (1977)

  30. Gupal, A., Norkin, V.: Algorithm for the minimization of discontinuous functions. Cybern. 13, 220–223 (1977)

  31. Hanzely, F., Doikov, N., Nesterov, Y., Richtarik, P.: Stochastic subspace cubic Newton method. In: Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh, eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 4027–4038 (13–18 Jul 2020)

  32. Hanzely, F., Kovalev, D., Richtarik, P.: Variance reduced coordinate descent with acceleration: New method with a surprising application to finite-sum problems. In: Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh, eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 4039–4048 (13–18 Jul 2020)

  33. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp. 795–811 (2016)

  34. Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23, 462–466 (1952)

  35. Knopp, K.: Theory and Application of Infinite Series. Translated from the second German edition by R.C. Young. Blackie, London (1928)

  36. Konecný, J., Richtárik, P.: Simple complexity analysis of simplified direct search, arXiv preprint arXiv:1411.5873, (2014)

  37. Kozak, D., Becker, S., Doostan, A., Tenorio, L.: Stochastic subspace descent, arXiv preprint arXiv:1904.01145, (2019)

  38. Kozak, D., Becker, S., Doostan, A., Tenorio, L.: A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79, 339–368 (2021)

  39. Kushner, H.J., Clark, D.S.: Stochastic approximation methods for constrained and unconstrained systems. Applied Mathematical Sciences, vol. 26. Springer-Verlag, New York-Berlin (1978)

  40. Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 1800–1809. Curran Associates, Inc., New York (2018)

  41. Martinsson, P.-G., Tropp, J.A.: Randomized numerical linear algebra: foundations and algorithms. Acta Numer. 29, 403–572 (2020)

  42. Matyas, J.: Random optimization. Autom. Remote. Control. 26, 246–253 (1965)

  43. Mezzadri, F.: How to generate random matrices from the classical compact groups. Notices Amer. Math. Soc. 54, 592–604 (2007)

  44. Nesterov, Y.: Random gradient-free minimization of convex functions. LIDAM Discussion Papers CORE 2011001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), (Jan. 2011)

  45. Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17, 527–566 (2017). (First appeared as CORE discussion paper 2011)

  46. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Amer. Math. Soc. 73, 591–597 (1967)

  47. Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30, 349–376 (2020)

  48. Polyak, B.T.: Introduction to optimization, vol. 1. Optimization Software Inc., New York (1987)

  49. Rastrigin, L.A.: About convergence of random search method in extremal control of multi-parameter systems. Avtomat. i Telemekh 24, 1467–1473 (1963)

  50. Riis, E.S., Ehrhardt, M.J., Quispel, G., Schönlieb, C.-B.: A geometric integration approach to nonsmooth, nonconvex optimisation, arXiv preprint arXiv:1807.07554, (2018)

  51. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

  52. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing methods in statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971), pp. 233–257 (1971)

  53. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning, arXiv preprint arXiv:1703.03864, (2017)

  54. Salzo, S., Villa, S.: Parallel random block-coordinate forward-backward algorithm: A unified convergence analysis, arXiv preprint arXiv:1906.07392, (2019)

  55. Schmidt, M., Roux, N.L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization, arXiv preprint arXiv:1109.2415, (2011)

  56. Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37, 332–341 (1992)

  57. Tappenden, R., Takáč, M., Richtárik, P.: On the complexity of parallel coordinate descent. Optim. Methods Softw. 33, 372–395 (2018)

  58. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)

  59. Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Found. Trends Theor. Comput. Sci. 10, iv+157 (2014)

  60. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151, 3–34 (2015)

  61. Zavriev, S.K.: On the global optimization properties of finite-difference local descent algorithms. J. Global Optim. 3, 67–78 (1993)

Acknowledgements

L.R. acknowledges support from the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, and the Italian Institute of Technology. L.R. also acknowledges the financial support of the European Research Council (grant SLING 819789), the AFOSR projects FA9550- 18-1-7009, FA9550-17-1-0390 and BAA-AFRL-AFOSR-2016-0007 (European Office of Aerospace Research and Development), and the EU H2020-MSCA-RISE project NoMADS - DLV-777826. This work has been supported by the ITN-ETN project TraDE-OPT funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 861137. LT acknowledges funding by National Science Foundation grant DMS-1723005.

Author information

Corresponding author

Correspondence to David Kozak.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 374 KB)

Auxiliary lemmas

Here we collect the main auxiliary results used in the convergence analysis of Algorithm 1.

Proof of Lemma 1

Let \(k\in \mathbb {N}\) and denote by \(p^{(j)}\) the \(j\)-th column of \(P_k\). We want to estimate

$$\begin{aligned} \left\Vert \nabla _{\left( P_k, h_k\right) } f(x)-P_k^\top \nabla f(x)\right\Vert ^2 = \sum _{j=1}^{\ell } \left( \left[ \nabla _{\left( P_k, h_k\right) } f(x)\right] _j-\langle \nabla f(x), p^{(j)} \rangle \right) ^2. \end{aligned}$$
(9.1)

To obtain an upper bound for the term in parentheses, we use H.1. Since \(\nabla f\) is \(\lambda \)-Lipschitz, the descent lemma (Lemma 7) holds: for every \(x\in \mathbb {R}^d\),

$$\begin{aligned} \left| f(x+h_k p^{(j)}) - f(x) - h_k\langle \nabla f(x), p^{(j)} \rangle \right| \le \frac{\lambda h_k^2}{2}\left\Vert p^{(j)}\right\Vert ^2. \end{aligned}$$
(9.2)

Then, dividing by \(h_k\) and rearranging,

$$\begin{aligned} \left| \left[ \nabla _{\left( P_k, h_k\right) } f(x)\right] _j-\langle \nabla f(x),p^{(j)} \rangle \right| \le \frac{\lambda h_k}{2}\left\Vert p^{(j)}\right\Vert ^2 \quad \text {a.s.} \end{aligned}$$
(9.3)

Note that P.1 implies \(\Vert p^{(j)}\Vert ^2\overset{a.s.}{=}d/\ell \) and so (9.1) and (9.3) yield

$$\begin{aligned} \left\Vert \nabla _{\left( P_k, h_k\right) } f(x)-P_k^\top \nabla f(x)\right\Vert \le \sqrt{\sum _{j=1}^{\ell } \left( \frac{\lambda h_k d}{2\ell }\right) ^2} = \frac{\lambda h_k d}{2\sqrt{\ell }} \quad \text {a.s.} \end{aligned}$$

\(\square \)
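
As a sanity check of the bound just proved, the sketch below evaluates both sides on a quadratic, for which \(\lambda \) is the largest eigenvalue of the Hessian. The QR construction of \(P\) (orthogonal columns with \(\Vert p^{(j)}\Vert ^2 = d/\ell \), as in P.1) and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, ell, h = 40, 8, 1e-3

A = rng.standard_normal((d, d)); A = A.T @ A      # f(x) = 0.5 x^T A x
lam = np.linalg.eigvalsh(A)[-1]                   # Lipschitz constant of grad f
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

Q, _ = np.linalg.qr(rng.standard_normal((d, ell)))
P = Q * np.sqrt(d / ell)                          # ||p^(j)||^2 = d / ell (P.1)

x = rng.standard_normal(d)
fd = np.array([(f(x + h * P[:, j]) - f(x)) / h for j in range(ell)])
err = np.linalg.norm(fd - P.T @ grad(x))
print(err, lam * h * d / (2 * np.sqrt(ell)))      # err <= lambda h d / (2 sqrt(ell))
```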

Lemma 5

Let \(\left( a_k\right) \) and \(\left( c_k\right) \) be non-negative sequences with \(\lim _k c_k =0\) and let \(\eta \in \left( 0,1\right) \). If for every \(k\in \mathbb {N}\)

$$\begin{aligned} a_{k+1}\le \left( 1-\eta \right) a_k +c_k, \end{aligned}$$
(9.4)

then \(\lim _k a_k =0\). Moreover, if \(c_k=c/k^t\) for some \(c\ge 0\) and \(t>0\), then we have \(\limsup _k \ k^t a_k\le {c}/{\eta }.\)

Proof

Iterating the inequality in (9.4), we get

$$\begin{aligned} \begin{aligned} a_{k}&\le (1-\eta )^k a_0 + \left( c_{k-1}+(1-\eta ) c_{k-2} + (1-\eta )^2 c_{k-3} +\cdots +(1-\eta )^{k-1}c_0 \right) \\&\le a_0 + \left( \sup _{j\ge 0} c_j \right) \sum _{j\ge 0} (1-\eta )^j =: {\tilde{C}} <+\infty , \end{aligned} \end{aligned}$$

where we used the fact that \(\left( c_k\right) \) is convergent (and thus bounded) and that \(\eta \in \left( 0,1\right) \). In particular, the sequence \(\left( a_k\right) \) is bounded and so its \(\limsup \) is a real number. Then, again from the hypothesis that \(a_{k+1}\le (1-\eta ) a_k +c_k\) and the existence of \(\lim _k c_k\), we have \(\limsup _k a_{k} \le \limsup _k \left[ (1-\eta ) a_k + c_k \right] = (1-\eta ) \limsup _k a_k + \lim _k c_k.\) So, as \(\lim _k c_k =0\) and \(\limsup _k a_k \in \left[ 0,+\infty \right) \), \(\eta \limsup _k a_k\le 0\). Finally, as \(\eta \in \left( 0,1\right) \), \(\limsup _k a_k\le 0\), and so, as \(a_k\ge 0\), \(\lim _k a_k = 0\). Now assume that \(c_k=c/k^t\) for some \(c\ge 0\) and \(t>0\).

Let \(c'>c\). Then, following the proof of [15, Lemma 4], there exists \(k_0\in \mathbb {N}\) such that for every \(k\ge k_0\) it holds

$$\begin{aligned} \frac{c}{k^t}\le \frac{c'}{\eta }\left[ \frac{1}{(k+1)^t}-(1-\eta )\frac{1}{k^t}\right] . \end{aligned}$$

Using the previous inequality in (9.4), we get

$$\begin{aligned} a_{k+1}-\frac{c'}{\eta (k+1)^t}\le (1-\eta )\left[ a_{k}-\frac{c'}{\eta k^t}\right] . \end{aligned}$$
(9.5)

First suppose that there exists \({\bar{k}}\ge k_0\) such that \(a_{{\bar{k}}}\le \frac{c'}{\eta {\bar{k}}^{t}}\). Using (9.5), it is easy to see by recursion that, for every \(k\ge {\bar{k}}\),

$$\begin{aligned} a_{k}\le \frac{c'}{\eta k^{t}}. \end{aligned}$$

Now suppose the opposite; namely, that for every \(k\ge k_0\)

$$\begin{aligned} a_k-\frac{c'}{\eta k^t}>0. \end{aligned}$$

Then, iterating (9.5), we get

$$\begin{aligned} a_{k}-\frac{c'}{\eta k^t}\le (1-\eta )^{k-1}\left[ a_{1}-\frac{c'}{\eta }\right] . \end{aligned}$$

In either of the two previous cases,

$$\begin{aligned} \limsup _k \ k^t a_k\le \limsup _k \left\{ \frac{c'}{\eta }+k^t(1-\eta )^{k-1}\left[ a_{1}-\frac{c'}{\eta }\right] \right\} =\frac{c'}{\eta }. \end{aligned}$$

Finally, since the previous inequality holds for every \(c'>c\), we get the claim. \(\square \)
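
A quick numerical illustration of the rate claim: iterating (9.4) with equality and \(c_k=c/k^t\) drives \(k^t a_k\) towards \(c/\eta \). The parameter values in this sketch are arbitrary and chosen only for illustration.

```python
eta, c, t = 0.1, 2.0, 1.0
a = 5.0                              # a_1 >= 0, arbitrary
N = 200_000
for k in range(1, N + 1):
    a = (1 - eta) * a + c / k**t     # equality case of (9.4)
print((N + 1)**t * a, c / eta)       # k^t a_k is close to c / eta = 20
```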

Remark 23

Consider a function \(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) with \(\lambda \)-Lipschitz gradient and at least one minimizer \(x_*\). Applying the recursion of Algorithm 1 to such a function with an arbitrary starting point \(x_0 \in {\mathbb {R}}^d\), we get that \(\left\Vert x_k-x_*\right\Vert ^2\), \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \), and \(\left\Vert \nabla f_k\right\Vert ^2\) are bounded for all \(k>0\). In particular, \(\left\Vert \nabla f_k\right\Vert ^2\), \(\left\Vert x_k-x_*\right\Vert ^2\), and \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \) are all integrable. To see this, note first that \(\nabla f_k\) and \(x_k\) are both measurable. Recall also that, for finite \(x_k\), the \(\lambda \)-Lipschitz continuity of \(\nabla f\) implies \(\left\Vert \nabla f_k\right\Vert ^2 \le \lambda ^2 \left\Vert x_k-x_*\right\Vert ^2\). Choose an arbitrary finite \(x_0\) to begin the recursion. Then,

$$\begin{aligned} \left\Vert x_1 -x_*\right\Vert ^2&\le 2\left\Vert x_0-x_*\right\Vert ^2 + 2\alpha _0^2\left\Vert P_0\nabla _{P_0} f_0\right\Vert ^2 \\&\le C_1 + C_2\left\Vert P_0\nabla _{P_0} f_0 - P_0P_0^\top \nabla f_0 + P_0P_0^\top \nabla f_0\right\Vert ^2 \\&\overset{\text {Lemma 1}}{\le } C_1 + C_2\left( C_3 + \left( \frac{d}{\ell }\right) ^2 \left\Vert \nabla f_0\right\Vert ^2\right) , \end{aligned}$$

where \(C_1, C_2, C_3\) are fixed and finite; their exact values are unimportant but can be computed. Repeated recursion reveals that \(\left\Vert x_k-x_*\right\Vert ^2\) is bounded for all \(k>0\). Further, by the \(\lambda \)-Lipschitz continuity of the gradient, \(\left\Vert \nabla f_k\right\Vert ^2 \le \lambda ^2 \left\Vert x_k-x_*\right\Vert ^2\). The claim on the inner product follows from Young's inequality and the previous results. Finally, since \(\left\Vert \nabla f_k\right\Vert ^2\), \(\left\Vert x_k-x_*\right\Vert ^2\), and \(\langle P_kP_k^\top \nabla f_k, x_k-x_* \rangle \) are bounded and measurable, they are integrable.
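
The two elementary facts used above can be checked numerically. The sketch below verifies the Young-type inequality behind the first line of the display and the bound \(\Vert \nabla f_0\Vert ^2 \le \lambda ^2\Vert x_0-x_*\Vert ^2\) on a quadratic with minimizer \(x_*=0\); the surrogate step and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, ell, h, alpha0 = 30, 6, 1e-4, 0.05

A = rng.standard_normal((d, d)); A = A.T @ A    # f(x) = 0.5 x^T A x, x_* = 0
lam = np.linalg.eigvalsh(A)[-1]
f, grad = (lambda x: 0.5 * x @ A @ x), (lambda x: A @ x)

Q, _ = np.linalg.qr(rng.standard_normal((d, ell)))
P0 = Q * np.sqrt(d / ell)
x0, x_star = rng.standard_normal(d), np.zeros(d)
g0 = np.array([(f(x0 + h * P0[:, j]) - f(x0)) / h for j in range(ell)])
x1 = x0 - alpha0 * P0 @ g0                      # one step of the recursion

lhs = np.linalg.norm(x1 - x_star)**2
rhs = 2 * np.linalg.norm(x0 - x_star)**2 + 2 * alpha0**2 * np.linalg.norm(P0 @ g0)**2
print(lhs <= rhs)                               # Young-type bound, first line above
print(np.linalg.norm(grad(x0))**2 <= lam**2 * np.linalg.norm(x0 - x_star)**2)
```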

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kozak, D., Molinari, C., Rosasco, L. et al. Zeroth-order optimization with orthogonal random directions. Math. Program. 199, 1179–1219 (2023). https://doi.org/10.1007/s10107-022-01866-9

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-022-01866-9
