
Stochastic zeroth order descent with structured directions

Published: 18 October 2024

Abstract

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach that approximates a stochastic gradient on a set of l ≤ d orthogonal directions, where d is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form O((d/l)k^{-c}) for every c < 1/2, which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of number of iterations (Garrigos and Gower in Handbook of convergence theorems for (stochastic) gradient methods, arXiv:2301.11235, 2024). Our bound shows the benefits of using l multiple directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic structured zeroth order algorithms under such an assumption. We corroborate our theoretical findings with numerical simulations where the assumptions are satisfied and on the real-world problem of hyper-parameter optimization in machine learning, achieving competitive practical performance.
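To make the structured finite-difference idea concrete, here is a minimal NumPy sketch of one such step under the assumptions stated in the abstract (l ≤ d orthogonal directions, forward finite differences). It is an illustration, not the authors' S-SZD implementation; the function and parameter names (sszd_step, l, h, step_size) are ours.

```python
import numpy as np


def sszd_step(f, x, l=4, h=1e-5, step_size=1e-2, rng=None):
    """One structured zeroth-order step (illustrative sketch, not the authors' code).

    Samples l <= d orthonormal directions (QR of a Gaussian matrix), estimates
    directional derivatives by forward finite differences, and takes a
    gradient-descent-like step along the resulting surrogate gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    # Columns of Q are l random orthonormal directions in R^d.
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    fx = f(x)
    g = np.zeros(d)
    for i in range(l):
        p = Q[:, i]
        # Forward finite difference along direction p.
        g += (f(x + h * p) - fx) / h * p
    # The d/l rescaling makes the surrogate an unbiased gradient estimate
    # (up to the finite-difference error) when the directions are uniform.
    g *= d / l
    return x - step_size * g


# Toy usage on a smooth convex quadratic; noise could be added to f
# to mimic the stochastic setting considered in the paper.
f = lambda x: 0.5 * float(x @ x)
x = np.ones(10)
for _ in range(200):
    x = sszd_step(f, x, l=4)
print(np.linalg.norm(x))  # the norm shrinks toward 0
```

Each step costs l + 1 function evaluations, which is the trade-off the O((d/l)k^{-c}) rate quantifies: more directions per step give a better gradient surrogate at a higher per-iteration cost.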

References

[1]
Garrigos, G., Gower, R.M.: Handbook of convergence theorems for (stochastic) gradient methods (2024). arXiv:2301.11235 [math.OC]
[2]
Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning (2017). arXiv:1703.03864 [stat.ML]
[3]
Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18, pp. 1805–1814. Curran Associates Inc., Red Hook, NY, USA (2018)
[4]
Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 970–978 (2018). https://proceedings.mlr.press/v80/choromanski18a.html
[5]
Flaxman, A., Kalai, A.T., McMahan, B.: Online convex optimization in the bandit setting: Gradient descent without a gradient. In: SODA ’05 Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005). https://www.microsoft.com/en-us/research/publication/online-convex-optimization-bandit-setting-gradient-descent-without-gradient/
[6]
Spall, J.C.: Introduction to Stochastic Search and Optimization, 1st edn. John Wiley & Sons, Inc., USA (2003)
[7]
Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to derivative-free optimization. In: MPS-SIAM Series on Optimization (2009)
[8]
Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015)
[9]
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
[10]
Chen, R., Wild, S.: Randomized derivative-free optimization of noisy convex functions (2015). arXiv:1507.03332 [math.OC]
[11]
Cai, H., McKenzie, D., Yin, W., Zhang, Z.: Zeroth-order regularized optimization (ZORO): approximately sparse gradients and adaptive sampling. SIAM J. Optim. 32(2), 687–714 (2022)
[12]
Cai, H., Lou, Y., Mckenzie, D., Yin, W.: A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 1193–1203 (2021). https://proceedings.mlr.press/v139/cai21d.html
[13]
Gasnikov, A., Novitskii, A., Novitskii, V., Abdukhakimov, F., Kamzolov, D., Beznosikov, A., Takac, M., Dvurechensky, P., Gu, B.: The power of first-order smooth optimization for black-box non-smooth problems. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 7241–7265. PMLR, Virtual Conference (2022). https://proceedings.mlr.press/v162/gasnikov22a.html
[14]
Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev. 45(3), 385–482 (2003)
[15]
Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic descent. SIAM J. Optim. 25(3), 1515–1541 (2015)
[16]
Roberts, L., Royer, C.W.: Direct search based on probabilistic descent in reduced spaces (2023).
[17]
Anderson, E.J., Ferris, M.C.: A direct search algorithm for optimization with noisy function evaluations. SIAM J. Optim. 11(3), 837–857 (2001)
[18]
Kim, S., Zhang, D.: Convergence properties of direct search methods for stochastic optimization (2010)
[19]
Dzahini, K.J.: Expected complexity analysis of stochastic direct-search. Comput. Optim. Appl. 81(1), 179–200 (2022)
[20]
Dzahini, K.J., Wild, S.M.: Direct search for stochastic optimization in random subspaces with zeroth-, first-, and second-order convergence and expected complexity (2024). https://arxiv.org/abs/2403.13320
[21]
Audet, C., Dzahini, K.J., Kokkolaras, M., Le Digabel, S.: Stochastic mesh adaptive direct search for blackbox optimization using probabilistic estimates. Comput. Optim. Appl. 79(1), 1–34 (2021)
[22]
Price, C.J., Reale, M., Robertson, B.: A direct search method for smooth and nonsmooth unconstrained optimization. ANZIAM J. 48, 927–948 (2006)
[23]
Garmanjani, R., Vicente, L.N.: Smoothing and worst-case complexity for direct-search methods in nonsmooth optimization. IMA J. Numer. Anal. 33(3), 1008–1028 (2013)
[24]
Popovic, D., Teel, A.R.: Direct search methods for nonsmooth optimization. In: 2004 43rd IEEE Conference on Decision and Control (CDC)(IEEE Cat. No. 04CH37601), vol. 3, pp. 3173–3178. IEEE (2004)
[25]
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
[26]
Khaled, A., Richtárik, P.: Better Theory for SGD in the Nonconvex World (2020)
[27]
Kozák, D., Molinari, C., Rosasco, L., Tenorio, L., Villa, S.: Zeroth-order optimization with orthogonal random directions. Springer, New York, NY, USA (2023)
[28]
Kozák, D., Becker, S., Doostan, A., Tenorio, L.: A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79, 339–368 (2021)
[29]
Wang, T., Feng, Y.: Convergence rates of zeroth order gradient descent for Łojasiewicz functions. INFORMS J. Comput.
[30]
Berahas, A.S., Cao, L., Choromanski, K., Scheinberg, K.: A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 22(2), 507–560 (2022)
[31]
Rando, M., Molinari, C., Rosasco, L., Villa, S.: An optimal structured zeroth-order algorithm for non-smooth optimization. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems, vol. 36, pp. 36738–36767 (2023). https://proceedings.neurips.cc/paper_files/paper/2023/file/7429f4c1b267cf619f28c4d4f1532f99-Paper-Conference.pdf
[32]
Dodangeh, M., Vicente, L.N.: Worst case complexity of direct search under convexity. Math. Program. 155(1), 307–332 (2016)
[33]
Dodangeh, M., Vicente, L., Zhang, Z.: On the optimal order of worst case complexity of direct search. Optim. Lett. 10(4), 699–708 (2016)
[34]
Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674–701 (2012)
[35]
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2009)
[36]
Balasubramanian, K., Ghadimi, S.: Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates. Advances in Neural Information Processing Systems 31 (2018)
[37]
Konečný, J., Richtárik, P.: Simple Complexity Analysis of Simplified Direct Search (2014)
[38]
Bergou, E.H., Gorbunov, E., Richtárik, P.: Stochastic three points method for unconstrained smooth minimization. SIAM J. Optim. 30(4), 2726–2749 (2020)
[39]
Hall, J.R., Carey, V.: Accelerating derivative-free optimization with dimension reduction and hyperparameter learning (2021). arXiv:2101.07444 [math.OC]
[40]
Cai, H., McKenzie, D., Yin, W., Zhang, Z.: A one-bit, comparison-based gradient estimator. Appl. Comput. Harmon. Anal. 60, 242–266 (2022)
[41]
Kiefer, J.W.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
[42]
Grapiglia, G.N.: Worst-case evaluation complexity of a derivative-free quadratic regularization method (2022)
[43]
Chikuse, Y.: Statistics on Special Manifolds. vol. 174 (2012)
[44]
Lojasiewicz, S.: A topological property of real analytic subsets. Coll. du CNRS, Les équations aux dérivées partielles 117, 87–89 (1963)
[45]
Powell, M.J.D.: In: Gomez, S., Hennart, J.-P. (eds.) A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation, pp. 51–67. Springer, Dordrecht (1994).
[46]
Powell, M.J.D.: UOBYQA: unconstrained optimization by quadratic approximation. Math. Program. 92(3), 555–582 (2002)
[47]
Powell, M.J.D.: In: Di Pillo, G., Roma, M. (eds.) The NEWUOA software for unconstrained optimization without derivatives, pp. 255–297. Springer, Boston, MA (2006).
[48]
Powell, M.J.D.: The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, 26–46 (2009)
[49]
Cartis, C., Roberts, L.: Scalable subspace methods for derivative-free nonlinear least-squares optimization. Math. Program. 199(1), 461–524 (2023)
[50]
Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS J. Optim. 1(2), 92–119 (2019)
[51]
Dzahini, K.J., Wild, S.M.: Stochastic trust-region algorithm in random subspaces with convergence and expected complexity analyses (2022)
[52]
Ha, Y., Shashaani, S.: Iteration Complexity and Finite-Time Efficiency of Adaptive Sampling Trust-Region Methods for Stochastic Derivative-Free Optimization (2024)
[53]
Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.: Gaussian process optimization in the bandit setting: No regret and experimental design. In: Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 1015–1022 (2010)
[54]
Rando, M., Carratino, L., Villa, S., Rosasco, L.: Ada-bkb: Scalable gaussian process optimization on continuous domains by adaptive discretization. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 151, pp. 7320–7348. PMLR, Virtual Conference (2022). https://proceedings.mlr.press/v151/rando22a.html
[55]
Frazier, P.I.: A tutorial on bayesian optimization (2018). arXiv:1807.02811 [stat.ML]
[56]
Shekhar, S., Javidi, T.: Gaussian process bandits with adaptive discretization. Electron. J. Stat. 12(2), 3829–3874 (2018)
[57]
Salgia, S., Vakili, S., Zhao, Q.: A domain-shrinking based bayesian optimization algorithm with order-optimal regret performance. In: NeurIPS (2021)
[58]
Hansen, N.: The CMA evolution strategy: a comparing review. 192, 75–102 (2007)
[59]
Singh, D.N.: Review of particle swarm optimization. Int. J. Comput. Intell. Inf. Secur. 3, 34–44 (2012)
[60]
Totzeck, C.: Trends in consensus-based optimization. In: Bellomo, N., Carrillo, J.A., Tadmor, E. (eds.) Active Particles, Volume 3: Advances in Theory, Models, and Applications, pp. 201–226. Springer, Cham (2022)
[61]
Fornasier, M., Klock, T., Riedl, K.: Consensus-based optimization methods converge globally (2022)
[62]
Rudi, A., Carratino, L., Rosasco, L.: Falkon: An optimal large scale kernel method. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/05546b0e38ab9175cd905eebcc6ebb76-Paper.pdf
[63]
Lyon, R.J., Stappers, B.W., Cooper, S., Brooke, J.M., Knowles, J.D.: Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach. Mon. Not. R. Astron. Soc. 459(1), 1104–1123 (2016)
[64]
Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
[65]
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., Publications Division, New York (1987)
[66]
Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Rustagi, J.S. (ed.) Optimizing Methods in Statistics, pp. 233–257. Academic Press, Cambridge (1971)
[67]
Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25(3), 463–483 (1954)
[68]
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
[69]
Harris, C.R., Millman, K.J., Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., Kerkwijk, M.H., Brett, M., Haldane, A., Río, J.F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., Oliphant, T.E.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
[70]
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., Red Hook, NY, USA (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[71]
Meanti, G., Carratino, L., Rosasco, L., Rudi, A.: Kernel methods through the roof: handling billions of points efficiently. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, pp. 14410–14422. Curran Associates Inc., Red Hook, NY, USA (2020)
[72]
Meanti, G., Carratino, L., De Vito, E., Rosasco, L.: Efficient hyperparameter tuning for large scale kernel ridge regression. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (2022)
[73]
Liu, S., Kailkhura, B., Chen, P.-Y., Ting, P., Chang, S., Amini, L.: Zeroth-order stochastic variance reduction for nonconvex optimization. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31 (2018). https://proceedings.neurips.cc/paper_files/paper/2018/file/ba9a56ce0a9bfa26e8ed9e10b2cc8f46-Paper.pdf
[74]
Ji, K., Wang, Z., Zhou, Y., Liang, Y.: Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 3100–3109 (2019). https://proceedings.mlr.press/v97/ji19a.html
[75]
Mezzadri, F.: How to generate random matrices from the classical compact groups. Not. Am. Math. Soc. 54, 592–604 (2006)


Published In

Computational Optimization and Applications  Volume 89, Issue 3
Dec 2024
421 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 18 October 2024
Accepted: 03 October 2024
Received: 21 July 2023

Author Tags

  1. Zeroth-order optimization
  2. Finite differences
  3. Black-box optimization
  4. Derivative-free optimization
  5. Stochastic optimization
  6. Convex optimization

Author Tags

  1. 90C56
  2. 90C15
  3. 90C25
  4. 90C30

Qualifiers

  • Research-article


