Abstract
In this paper, we present a generic framework that allows one to accelerate almost any non-accelerated deterministic or randomized algorithm for smooth convex optimization problems. The main idea of our envelope is the same as in Catalyst [37]: an accelerated proximal outer gradient method is used as an envelope for a non-accelerated inner method applied to the \(\ell _2\)-regularized auxiliary problem. Our algorithm has two key differences: 1) an easily verifiable stopping condition for the inner algorithm; 2) the regularization parameter can be tuned along the way. As a result, the main contribution of our work is a new framework that applies to adaptive inner algorithms: Steepest Descent, Adaptive Coordinate Descent, and Alternating Minimization. Moreover, in the non-adaptive case, our approach yields Catalyst without the logarithmic factor that appears in the standard Catalyst [37, 38].
The research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) №075-00337-20-03, project No. 0714-2020-0005.
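To make the construction concrete, here is a minimal Python sketch of a Catalyst-type envelope under simplifying assumptions: the inner method is plain gradient descent, the stopping rule and the momentum coefficients are illustrative choices rather than the exact conditions analyzed in the paper, and the regularization parameter L is kept fixed (the adaptive tuning of L is only indicated by a comment).

```python
import numpy as np

def inner_gd(grad, z0, stop, step=1e-2, max_iter=10_000):
    """A non-accelerated inner method: plain gradient descent,
    run until the (verifiable) stopping predicate `stop` holds."""
    z = np.array(z0, dtype=float)
    for _ in range(max_iter):
        if stop(z):
            break
        z = z - step * grad(z)
    return z

def catalyst_sketch(grad_f, x0, L=1.0, n_outer=50, tol_factor=0.5):
    """Illustrative Catalyst-type envelope (not the authors' exact algorithm).

    Each outer step approximately minimizes the l2-regularized auxiliary problem
        F_{L,y}(z) = f(z) + (L/2) * ||z - y||^2
    with a non-accelerated inner method, then performs an accelerated
    (Nesterov-style) extrapolation of the outer iterates.
    """
    x_prev = y = np.array(x0, dtype=float)
    theta = 1.0
    for _ in range(n_outer):
        # Gradient of the auxiliary problem and an easily verifiable stopping
        # rule: ||grad F_{L,y}(z)|| <= tol_factor * L * ||z - y||
        # (illustrative form; the paper uses its own condition).
        grad_aux = lambda z, y=y: grad_f(z) + L * (z - y)
        stop = lambda z, y=y: (np.linalg.norm(grad_aux(z))
                               <= tol_factor * L * np.linalg.norm(z - y) + 1e-12)
        x = inner_gd(grad_aux, y, stop)

        # FISTA-style momentum for the outer accelerated proximal point step.
        theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        y = x + ((theta - 1.0) / theta_next) * (x - x_prev)
        x_prev, theta = x, theta_next
        # An adaptive variant would adjust L here, based on how many inner
        # iterations were needed to satisfy the stopping rule.
    return x_prev

# Example: minimize an ill-conditioned quadratic f(x) = 0.5 * x^T diag(D) x.
D = np.array([100.0, 1.0])
x_star = catalyst_sketch(lambda x: D * x, x0=np.ones(2))
```

The point of the sketch is only the interface: the inner solver sees nothing but the gradient of the regularized auxiliary problem and a checkable stopping test, which is what allows almost any non-accelerated method to be plugged in.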
Notes
- 1.
- 2.
- 3.
For deterministic algorithms one can drop “with probability at least \(1 - \delta \)” and the factor \(\log \tfrac{N}{\delta }\).
- 4.
The number of oracle calls (iterations) of the auxiliary method \(\mathcal {M}\) required to find an \(\varepsilon \)-solution of (1) in terms of the function value.
- 5.
- 6.
Here one should use the following trick when recalculating \(\ln \left( \sum _{i=1}^m \exp \left( [ A x]_i\right) \right) \) and its gradient (partial derivatives). From the structure of the method we know that \(x^{new} = x^{old} + \delta e_i\), where \(e_i\) is the i-th coordinate vector. So if we have already computed \(A x^{old}\), then recalculating \(A x^{new} = A x^{old} + \delta A_i\) requires only O(s) additional operations, independently of n and m.
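A minimal Python sketch of this recomputation trick, assuming A is stored column-wise in sparse (CSC) format so that a single coordinate step touches only the s nonzeros of the corresponding column; the helper names (`update_Ax`, `logsumexp_and_partial`) are ours and are for illustration only.

```python
import numpy as np
from scipy.sparse import csc_matrix

def update_Ax(A_csc, Ax, i, delta):
    """After the coordinate step x_new = x_old + delta * e_i, update the cached
    product: A x_new = A x_old + delta * A_i, touching only the nonzeros of
    column i (O(s) operations, independent of n and m)."""
    start, end = A_csc.indptr[i], A_csc.indptr[i + 1]
    Ax[A_csc.indices[start:end]] += delta * A_csc.data[start:end]
    return Ax

def logsumexp_and_partial(Ax, A_csc, j):
    """ln(sum_i exp([Ax]_i)) and its partial derivative in x_j,
    both computed from the cached vector Ax."""
    shift = Ax.max()                      # standard shift for numerical stability
    w = np.exp(Ax - shift)
    value = np.log(w.sum()) + shift
    start, end = A_csc.indptr[j], A_csc.indptr[j + 1]
    partial = (w[A_csc.indices[start:end]] * A_csc.data[start:end]).sum() / w.sum()
    return value, partial

# Tiny usage example with a random sparse matrix.
rng = np.random.default_rng(0)
A = csc_matrix(rng.random((5, 3)) * (rng.random((5, 3)) < 0.5))
Ax = np.zeros(5)
Ax = update_Ax(A, Ax, i=1, delta=0.7)     # coordinate step along e_1
print(logsumexp_and_partial(Ax, A, j=1))
```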
References
Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)
Bayandina, A., Gasnikov, A., Lagunovskaya, A.: Gradient-free two-points optimal method for non smooth stochastic convex optimization problem with additional small noise. Autom. Rem. Contr. 79(7) (2018). arXiv:1701.03821
Beck, A.: First-order methods in optimization, vol. 25. SIAM (2017)
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)
De Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)
Diakonikolas, J., Orecchia, L.: Alternating randomized block coordinate descent. arXiv preprint arXiv:1805.09185 (2018)
Diakonikolas, J., Orecchia, L.: Conjugate gradients and accelerated methods unified: the approximate duality gap view. arXiv preprint arXiv:1907.00289 (2019)
Doikov, N., Nesterov, Y.: Contracting proximal methods for smooth convex optimization. SIAM J. Optim. 30(4), 3146–3169 (2020)
Doikov, N., Nesterov, Y.: Inexact tensor methods with dynamic accuracies. arXiv preprint arXiv:2002.09403 (2020)
Duchi, J.C., Jordan, M.I., Wainwright, M.J., Wibisono, A.: Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Trans. Inf. Theory 61(5), 2788–2806 (2015)
Dvinskikh, D., et al.: Accelerated meta-algorithm for convex optimization. Comput. Math. Math. Phys. 61(1), 17–28 (2021)
Dvinskikh, D., Omelchenko, S., Gasnikov, A., Tyurin, A.: Accelerated gradient sliding for minimizing a sum of functions. Doklady Math. 101, 244–246 (2020)
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)
Fercoq, O., Richtárik, P.: Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4), 1997–2023 (2015)
Gasnikov, A.: Universal Gradient Descent. MCCME, Moscow (2021)
Gasnikov, A., Lagunovskaya, A., Usmanova, I., Fedorenko, F.: Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Autom. Rem. Contr. 77(11), 2018–2034 (2016). https://doi.org/10.1134/S0005117916110114. arXiv:1412.3890
Gasnikov, A.: Universal gradient descent. arXiv preprint arXiv:1711.00394 (2017)
Gasnikov, A., et al.: Near optimal methods for minimizing convex functions with Lipschitz \( p \)-th derivatives. In: Conference on Learning Theory, pp. 1392–1393 (2019)
Gasnikov, A., Dvurechensky, P., Usmanova, I.: On accelerated randomized methods. Proc. Moscow Inst. Phys. Technol. 8(2), 67–100 (2016). (in Russian), first appeared in arXiv:1508.02182
Gasnikov, A., Gorbunov, E., Kovalev, D., Mokhammed, A., Chernousova, E.: Reachability of optimal convergence rate estimates for high-order numerical convex optimization methods. Doklady Math. 99, 91–94 (2019)
Gazagnadou, N., Gower, R.M., Salmon, J.: Optimal mini-batch and step sizes for SAGA. arXiv preprint arXiv:1902.00071 (2019)
Gorbunov, E., Hanzely, F., Richtarik, P.: A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent (2019)
Gower, R.M., Loizou, N., Qian, X., Sailanbayev, A., Shulgin, E., Richtárik, P.: SGD: general analysis and improved rates. arXiv preprint arXiv:1901.09401 (2019)
Guminov, S., Dvurechensky, P., Gasnikov, A.: Accelerated alternating minimization. arXiv preprint arXiv:1906.03622 (2019)
Hendrikx, H., Bach, F., Massoulié, L.: Dual-free stochastic decentralized optimization with variance reduction. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Ivanova, A., et al.: Oracle complexity separation in convex optimization. arXiv preprint arXiv:2002.02706 (2020)
Ivanova, A., Pasechnyuk, D., Grishchenko, D., Shulgin, E., Gasnikov, A., Matyukhin, V.: Adaptive catalyst for smooth convex optimization. arXiv preprint arXiv:1911.11271 (2019)
Kamzolov, D., Gasnikov, A., Dvurechensky, P.: Optimal combination of tensor optimization methods. In: Olenev, N., Evtushenko, Y., Khachay, M., Malkova, V. (eds.) OPTIMA 2020. LNCS, vol. 12422, pp. 166–183. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62867-3_13
Kamzolov, D., Gasnikov, A.: Near-optimal hyperfast second-order method for convex optimization and its sliding. arXiv preprint arXiv:2002.09050 (2020)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9851, pp. 795–811. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46128-1_50
Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S.J., Stich, S.U., Suresh, A.T.: SCAFFOLD: stochastic controlled averaging for federated learning. arXiv preprint arXiv:1910.06378 (2019)
Kovalev, D., Salim, A., Richtárik, P.: Optimal and practical algorithms for smooth and strongly convex decentralized optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Kulunchakov, A., Mairal, J.: A generic acceleration framework for stochastic composite optimization. arXiv preprint arXiv:1906.01164 (2019)
Li, H., Lin, Z.: Revisiting EXTRA for smooth distributed optimization. arXiv preprint arXiv:2002.10110 (2020)
Li, H., Lin, Z., Fang, Y.: Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization. arXiv preprint arXiv:2009.04373 (2020)
Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)
Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. arXiv preprint arXiv:1712.05654 (2018)
Lin, T., Jin, C., Jordan, M.: On gradient descent ascent for nonconvex-concave minimax problems. In: International Conference on Machine Learning, pp. 6083–6093. PMLR (2020)
Mishchenko, K., Iutzeler, F., Malick, J., Amini, M.R.: A delay-tolerant proximal-gradient algorithm for distributed learning. In: International Conference on Machine Learning, pp. 3587–3595 (2018)
Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91578-4
Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems. arXiv preprint arXiv:1809.05895 (2018)
Nesterov, Y., Stich, S.U.: Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1), 110–123 (2017)
Palaniappan, B., Bach, F.: Stochastic variance reduction methods for saddle-point problems. In: Advances in Neural Information Processing Systems, pp. 1416–1424 (2016)
Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst acceleration for gradient-based non-convex optimization. arXiv preprint arXiv:1703.10993 (2017)
Parikh, N., Boyd, S., et al.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)
Pasechnyuk, D., Anikin, A., Matyukhin, V.: Accelerated proximal envelopes: application to the coordinate descent method. arXiv preprint arXiv:2101.04706 (2021)
Polyak, B.T.: Introduction to optimization. Optimization Software (1987)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)
Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In: International Conference on Machine Learning, pp. 64–72 (2014)
Shamir, O.: An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017)
Tupitsa, N.: Accelerated alternating minimization and adaptability to strong convexity. arXiv preprint arXiv:2006.09097 (2020)
Tupitsa, N., Dvurechensky, P., Gasnikov, A.: Alternating minimization methods for strongly convex optimization. arXiv preprint arXiv:1911.08987 (2019)
Wilson, A.C., Mackey, L., Wibisono, A.: Accelerating rescaled gradient descent: Fast optimization of smooth functions. In: Advances in Neural Information Processing Systems, pp. 13533–13543 (2019)
Woodworth, B., et al.: Is local SGD better than minibatch SGD? arXiv preprint arXiv:2002.07839 (2020)
Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)
Yang, J., Zhang, S., Kiyavash, N., He, N.: A catalyst framework for minimax optimization. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Acknowledgements
We would like to thank Soomin Lee (Yahoo), Erik Ordentlich (Yahoo), César A. Uribe (MIT), Pavel Dvurechensky (WIAS, Berlin) and Peter Richtarik (KAUST) for useful remarks. We would also like to thank the anonymous reviewers for their fruitful comments.