
Stochastic zeroth order descent with structured directions

Published: 18 October 2024

Abstract

We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite difference approach that approximates a stochastic gradient on a set of l ≤ d orthogonal directions, where d is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form O((d/l)k^{-c}) for every c < 1/2, which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of number of iterations (Garrigos and Gower in Handbook of convergence theorems for (stochastic) gradient methods, arXiv:2301.11235, 2024). Our bound shows the benefits of using l multiple directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic structured zeroth order algorithms under such an assumption. We corroborate our theoretical findings with numerical simulations where the assumptions are satisfied and on the real-world problem of hyper-parameter optimization in machine learning, achieving competitive practical performance.
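To make the structured finite-difference idea concrete, here is a minimal NumPy sketch of one such step under the assumptions stated in the abstract (l ≤ d orthogonal directions, forward finite differences). It is an illustration, not the authors' S-SZD implementation; the function and parameter names (sszd_step, l, h, step_size) are ours.

```python
import numpy as np


def sszd_step(f, x, l=4, h=1e-5, step_size=1e-2, rng=None):
    """One structured zeroth-order step (illustrative sketch, not the authors' code).

    Samples l <= d orthonormal directions (QR of a Gaussian matrix), estimates
    directional derivatives by forward finite differences, and takes a
    gradient-descent-like step along the resulting surrogate gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    # Columns of Q are l random orthonormal directions in R^d.
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    fx = f(x)
    g = np.zeros(d)
    for i in range(l):
        p = Q[:, i]
        # Forward finite difference along direction p.
        g += (f(x + h * p) - fx) / h * p
    # The d/l rescaling makes the surrogate an unbiased gradient estimate
    # (up to the finite-difference error) when the directions are uniform.
    g *= d / l
    return x - step_size * g


# Toy usage on a smooth convex quadratic; noise could be added to f
# to mimic the stochastic setting considered in the paper.
f = lambda x: 0.5 * float(x @ x)
x = np.ones(10)
for _ in range(200):
    x = sszd_step(f, x, l=4)
print(np.linalg.norm(x))  # the norm shrinks toward 0
```

Each step costs l + 1 function evaluations, which is the trade-off the O((d/l)k^{-c}) rate quantifies: more directions per step give a better gradient surrogate at a higher per-iteration cost.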

References

[1]
Garrigos, G., Gower, R.M.: Handbook of convergence theorems for (stochastic) gradient methods (2024). arXiv:2301.11235 [math.OC]
[2]
Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning (2017). arXiv:1703.03864 [stat.ML]
[3]
Mania, H., Guy, A., Recht, B.: Simple random search of static linear policies is competitive for reinforcement learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18, pp. 1805–1814. Curran Associates Inc., Red Hook, NY, USA (2018)
[4]
Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 970–978 (2018). https://proceedings.mlr.press/v80/choromanski18a.html
[5]
Flaxman, A., Kalai, A.T., McMahan, B.: Online convex optimization in the bandit setting: Gradient descent without a gradient. In: SODA ’05 Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005). https://www.microsoft.com/en-us/research/publication/online-convex-optimization-bandit-setting-gradient-descent-without-gradient/
[6]
Spall, J.C.: Introduction to Stochastic Search and Optimization, 1st edn. John Wiley & Sons, Inc., USA (2003)
[7]
Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to derivative-free optimization. In: MPS-SIAM Series on Optimization (2009)
[8]
Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015)
[9]
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
[10]
Chen, R., Wild, S.: Randomized derivative-free optimization of noisy convex functions (2015). arXiv:1507.03332 [math.OC]
[11]
Cai, H., McKenzie, D., Yin, W., Zhang, Z.: Zeroth-order regularized optimization (ZORO): approximately sparse gradients and adaptive sampling. SIAM J. Optim. 32(2), 687–714 (2022)
[12]
Cai, H., Lou, Y., Mckenzie, D., Yin, W.: A zeroth-order block coordinate descent algorithm for huge-scale black-box optimization. In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 1193–1203 (2021). https://proceedings.mlr.press/v139/cai21d.html
[13]
Gasnikov, A., Novitskii, A., Novitskii, V., Abdukhakimov, F., Kamzolov, D., Beznosikov, A., Takac, M., Dvurechensky, P., Gu, B.: The power of first-order smooth optimization for black-box non-smooth problems. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 7241–7265. PMLR, Virtual Conference (2022). https://proceedings.mlr.press/v162/gasnikov22a.html
[14]
Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev. 45(3), 385–482 (2003)
[15]
Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic descent. SIAM J. Optim. 25(3), 1515–1541 (2015)
[16]
Roberts, L., Royer, C.W.: Direct search based on probabilistic descent in reduced spaces (2023).
[17]
Anderson, E.J., Ferris, M.C.: A direct search algorithm for optimization with noisy function evaluations. SIAM J. Optim. 11(3), 837–857 (2001)
[18]
Kim, S., Zhang, D.: Convergence properties of direct search methods for stochastic optimization (2010)
[19]
Dzahini, K.J.: Expected complexity analysis of stochastic direct-search. Comput. Optim. Appl. 81(1), 179–200 (2022)
[20]
Dzahini, K.J., Wild, S.M.: Direct search for stochastic optimization in random subspaces with zeroth-, first-, and second-order convergence and expected complexity (2024). https://arxiv.org/abs/2403.13320
[21]
Audet, C., Dzahini, K.J., Kokkolaras, M., Le Digabel, S.: Stochastic mesh adaptive direct search for blackbox optimization using probabilistic estimates. Comput. Optim. Appl. 79(1), 1–34 (2021)
[22]
Price, C.J., Reale, M., Robertson, B.: A direct search method for smooth and nonsmooth unconstrained optimization. ANZIAM J. 48, 927–948 (2006)
[23]
Garmanjani, R., Vicente, L.N.: Smoothing and worst-case complexity for direct-search methods in nonsmooth optimization. IMA J. Numer. Anal. 33(3), 1008–1028 (2013)
[24]
Popovic, D., Teel, A.R.: Direct search methods for nonsmooth optimization. In: 2004 43rd IEEE Conference on Decision and Control (CDC)(IEEE Cat. No. 04CH37601), vol. 3, pp. 3173–3178. IEEE (2004)
[25]
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
[26]
Khaled, A., Richtárik, P.: Better Theory for SGD in the Nonconvex World (2020)
[27]
Kozák, D., Molinari, C., Rosasco, L., Tenorio, L., Villa, S.: Zeroth-order optimization with orthogonal random directions. Springer, New York, NY, USA (2023)
[28]
Kozák, D., Becker, S., Doostan, A., Tenorio, L.: A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79, 339–368 (2021)
[29]
Wang, T., Feng, Y.: Convergence rates of zeroth order gradient descent for Łojasiewicz functions. INFORMS J. Comput.
[30]
Berahas, A.S., Cao, L., Choromanski, K., Scheinberg, K.: A theoretical and empirical comparison of gradient approximations in derivative-free optimization. Found. Comput. Math. 22(2), 507–560 (2022)
[31]
Rando, M., Molinari, C., Rosasco, L., Villa, S.: An optimal structured zeroth-order algorithm for non-smooth optimization. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems, vol. 36, pp. 36738–36767 (2023). https://proceedings.neurips.cc/paper_files/paper/2023/file/7429f4c1b267cf619f28c4d4f1532f99-Paper-Conference.pdf
[32]
Dodangeh, M., Vicente, L.N.: Worst case complexity of direct search under convexity. Math. Program. 155(1), 307–332 (2016)
[33]
Dodangeh, M., Vicente, L., Zhang, Z.: On the optimal order of worst case complexity of direct search. Optim. Lett. 10(4), 699–708 (2016)
[34]
Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674–701 (2012)
[35]
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2009)
[36]
Balasubramanian, K., Ghadimi, S.: Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates. Advances in Neural Information Processing Systems 31 (2018)
[37]
Konečný, J., Richtárik, P.: Simple Complexity Analysis of Simplified Direct Search (2014)
[38]
Bergou, E.H., Gorbunov, E., Richtárik, P.: Stochastic three points method for unconstrained smooth minimization. SIAM J. Optim. 30(4), 2726–2749 (2020)
[39]
Hall, J.R., Carey, V.: Accelerating derivative-free optimization with dimension reduction and hyperparameter learning (2021). arXiv:2101.07444 [math.OC]
[40]
Cai, H., McKenzie, D., Yin, W., Zhang, Z.: A one-bit, comparison-based gradient estimator. Appl. Comput. Harmon. Anal. 60, 242–266 (2022)
[41]
Kiefer, J.W.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
[42]
Grapiglia, G.N.: Worst-case evaluation complexity of a derivative-free quadratic regularization method (2022)
[43]
Chikuse, Y.: Statistics on Special Manifolds. vol. 174 (2012)
[44]
Lojasiewicz, S.: A topological property of real analytic subsets. Coll. du CNRS, Les équations aux dérivées partielles 117, 87–89 (1963)
[45]
Powell, M.J.D.: In: Gomez, S., Hennart, J.-P. (eds.) A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation, pp. 51–67. Springer, Dordrecht (1994).
[46]
Powell, M.J.D.: UOBYQA: unconstrained optimization by quadratic approximation. Math. Program. 92(3), 555–582 (2002)
[47]
Powell, M.J.D.: In: Di Pillo, G., Roma, M. (eds.) The NEWUOA software for unconstrained optimization without derivatives, pp. 255–297. Springer, Boston, MA (2006).
[48]
Powell, M.J.D.: The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, 26–46 (2009)
[49]
Cartis, C., Roberts, L.: Scalable subspace methods for derivative-free nonlinear least-squares optimization. Math. Program. 199(1), 461–524 (2023)
[50]
Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS J. Optim. 1(2), 92–119 (2019)
[51]
Dzahini, K.J., Wild, S.M.: Stochastic trust-region algorithm in random subspaces with convergence and expected complexity analyses (2022)
[52]
Ha, Y., Shashaani, S.: Iteration Complexity and Finite-Time Efficiency of Adaptive Sampling Trust-Region Methods for Stochastic Derivative-Free Optimization (2024)
[53]
Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.: Gaussian process optimization in the bandit setting: No regret and experimental design. In: Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 1015–1022 (2010)
[54]
Rando, M., Carratino, L., Villa, S., Rosasco, L.: Ada-bkb: Scalable gaussian process optimization on continuous domains by adaptive discretization. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 151, pp. 7320–7348. PMLR, Virtual Conference (2022). https://proceedings.mlr.press/v151/rando22a.html
[55]
Frazier, P.I.: A tutorial on bayesian optimization (2018). arXiv:1807.02811 [stat.ML]
[56]
Shekhar, S., Javidi, T.: Gaussian process bandits with adaptive discretization. Electron. J. Stat. 12(2), 3829–3874 (2018)
[57]
Salgia, S., Vakili, S., Zhao, Q.: A domain-shrinking based bayesian optimization algorithm with order-optimal regret performance. In: NeurIPS (2021)
[58]
Hansen, N.: The CMA evolution strategy: a comparing review. 192, 75–102 (2007)
[59]
Singh, D.N.: Review of particle swarm optimization. Int. J. Comput. Intell. Inf. Secur. 3, 34–44 (2012)
[60]
Totzeck, C.: Trends in consensus-based optimization. In: Bellomo, N., Carrillo, J.A., Tadmor, E. (eds.) Active Particles, Volume 3: Advances in Theory, Models, and Applications, pp. 201–226. Springer, Cham (2022)
[61]
Fornasier, M., Klock, T., Riedl, K.: Consensus-based optimization methods converge globally (2022)
[62]
Rudi, A., Carratino, L., Rosasco, L.: Falkon: An optimal large scale kernel method. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/05546b0e38ab9175cd905eebcc6ebb76-Paper.pdf
[63]
Lyon, R.J., Stappers, B.W., Cooper, S., Brooke, J.M., Knowles, J.D.: Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach. Mon. Not. R. Astron. Soc. 459(1), 1104–1123 (2016)
[64]
Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
[65]
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., Publications Division, New York (1987)
[66]
Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Rustagi, J.S. (ed.) Optimizing Methods in Statistics, pp. 233–257. Academic Press, Cambridge (1971)
[67]
Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25(3), 463–483 (1954)
[68]
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
[69]
Harris, C.R., Millman, K.J., Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., Kerkwijk, M.H., Brett, M., Haldane, A., Río, J.F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., Oliphant, T.E.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
[70]
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., Red Hook, NY, USA (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[71]
Meanti, G., Carratino, L., Rosasco, L., Rudi, A.: Kernel methods through the roof: handling billions of points efficiently. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, pp. 14410–14422. Curran Associates Inc., Red Hook, NY, USA (2020)
[72]
Meanti, G., Carratino, L., De Vito, E., Rosasco, L.: Efficient hyperparameter tuning for large scale kernel ridge regression. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (2022)
[73]
Liu, S., Kailkhura, B., Chen, P.-Y., Ting, P., Chang, S., Amini, L.: Zeroth-order stochastic variance reduction for nonconvex optimization. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31 (2018). https://proceedings.neurips.cc/paper_files/paper/2018/file/ba9a56ce0a9bfa26e8ed9e10b2cc8f46-Paper.pdf
[74]
Ji, K., Wang, Z., Zhou, Y., Liang, Y.: Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 3100–3109 (2019). https://proceedings.mlr.press/v97/ji19a.html
[75]
Mezzadri, F.: How to generate random matrices from the classical compact groups. Not. Am. Math. Soc. 54, 592–604 (2006)


Published In

Computational Optimization and Applications  Volume 89, Issue 3
Dec 2024
421 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 18 October 2024
Accepted: 03 October 2024
Received: 21 July 2023

Author Tags

  1. Zeroth-order optimization
  2. Finite differences
  3. Black-box optimization
  4. Derivative-free optimization
  5. Stochastic optimization
  6. Convex optimization

Author Tags

  1. 90C56
  2. 90C15
  3. 90C25
  4. 90C30

Qualifiers

  • Research-article


