
Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling

Published: 28 April 2022

Abstract

We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using only (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new zeroth-order regularized optimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various convex settings, we show that the (theoretical and empirical) convergence rate of ZORO is only logarithmically dependent on the problem dimension. Numerical experiments show that ZORO outperforms existing methods with similar assumptions on both synthetic and real datasets.
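To make the description above concrete, the following is a minimal Python sketch of one ZORO-style iteration under the assumptions stated in the abstract. It is not the authors' implementation: the gradient decoder below is a simple back-projection followed by hard thresholding, standing in for the paper's compressed-sensing-style sparse estimator; the adaptive choice of the number of samples is omitted; and the function names, step size, sparsity level, sampling radius, and l1 regularizer are all illustrative assumptions.

# Minimal, illustrative sketch of a ZORO-style iteration (not the authors' code).
# It estimates an approximately sparse gradient from a few finite-difference
# queries along random Rademacher directions, then takes a proximal-gradient step.
import numpy as np

def zoro_style_step(f, x, s=10, m=150, delta=1e-4, step=0.1, lam=0.01, rng=None):
    """One sketch iteration: sparse zeroth-order gradient estimate + prox step.

    f: black-box objective (function values only), x: current iterate,
    s: assumed gradient sparsity, m: number of queries (m << len(x)),
    delta: sampling radius, step: step size, lam: illustrative l1 weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    # Random Rademacher query directions; each query gives y_i ~ <z_i, grad f(x)>.
    Z = rng.choice([-1.0, 1.0], size=(m, d))
    fx = f(x)
    y = np.array([(f(x + delta * z) - fx) / delta for z in Z])
    # Crude sparse recovery: unbiased back-projection, then keep the s largest
    # entries. (The paper uses a proper sparse decoder; this is only a stand-in.)
    g_hat = Z.T @ y / m
    keep = np.argsort(np.abs(g_hat))[-s:]
    g = np.zeros(d)
    g[keep] = g_hat[keep]
    # Inexact proximal-gradient step; soft thresholding is the prox of the
    # illustrative regularizer lam * ||x||_1.
    v = x - step * g
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

# Toy usage: a 500-dimensional objective whose gradient is 10-sparse.
if __name__ == "__main__":
    d, rng = 500, np.random.default_rng(0)
    support = np.arange(10)                       # only 10 coordinates matter
    f = lambda x: 0.5 * np.sum(x[support] ** 2)
    x = 3.0 * np.ones(d)
    for _ in range(50):
        x = zoro_style_step(f, x, rng=rng)
    print("objective after 50 iterations:", f(x))

The point of the sketch is the query count: each iteration uses m + 1 function evaluations with m much smaller than the dimension d, which is what the approximately sparse gradient assumption buys; the adaptive sampling referred to in the title adjusts this query budget across iterations, which the sketch does not attempt.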

Cited By

  • (2023) Fine-tuning language models with just forward passes, Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 53038-53075. DOI: 10.5555/3666122.3668430. Online publication date: 10-Dec-2023.
  • (2023) Stochastic Zeroth-Order Riemannian Derivative Estimation and Optimization, Mathematics of Operations Research, 48(2), pp. 1183-1211. DOI: 10.1287/moor.2022.1302. Online publication date: 1-May-2023.
  • (2022) Zeroth-order hard-thresholding, Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 22589-22601. DOI: 10.5555/3600270.3601911. Online publication date: 28-Nov-2022.

Published In

SIAM Journal on Optimization, Volume 32, Issue 2
DOI: 10.1137/sjope8.32.2

Publisher

Society for Industrial and Applied Mathematics, United States

Publication History

Published: 28 April 2022

Author Tags

  1. zeroth-order optimization
  2. black-box optimization
  3. derivative-free optimization
  4. regularized optimization
  5. sparse gradients
  6. sparse adversarial attack

MSC Codes

  1. 90C56
  2. 65K05
  3. 68T05
  4. 68Q25

Qualifiers

  • Research-article
