A frequency-domain analysis of inexact gradient methods

  • Full Length Paper
  • Series A
Mathematical Programming

Abstract

We study robustness properties of some iterative gradient-based methods for strongly convex functions, as well as for the larger class of functions with sector-bounded gradients, under a relative error model. Proofs of the corresponding convergence rates are based on frequency-domain criteria for the stability of nonlinear systems. Applications are given to inexact versions of gradient descent and the Triple Momentum Method. To further emphasize the usefulness of frequency-domain methods, we derive improved analytic bounds for the convergence rate of Nesterov’s accelerated method (in the exact setting) on strongly convex functions.
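
The following minimal sketch is not taken from the paper; the quadratic objective, the step size \(1/L\), and the Gaussian error directions are illustrative assumptions. It runs gradient descent with a gradient oracle obeying the relative error bound \(\Vert g(x) - \nabla f(x)\Vert \le \delta \Vert \nabla f(x)\Vert\), the kind of inexactness described in the abstract.

```python
import numpy as np

# Sketch only: gradient descent with a relative-error gradient oracle.
# The returned vector g satisfies ||g - grad f(x)|| <= delta * ||grad f(x)||.
# Objective, step size and noise model are illustrative assumptions.

def inexact_gradient(grad, x, delta, rng):
    """Return a perturbed gradient obeying the relative error bound."""
    g = grad(x)
    e = rng.standard_normal(g.shape)
    e *= delta * np.linalg.norm(g) / max(np.linalg.norm(e), 1e-16)
    return g + e

def inexact_gd(grad, x0, alpha, delta, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * inexact_gradient(grad, x, delta, rng)
    return x

if __name__ == "__main__":
    # f(x) = 0.5*(m*x1**2 + L*x2**2), strongly convex with parameters m, L.
    m, L = 1.0, 10.0
    grad = lambda x: np.array([m, L]) * x
    x_final = inexact_gd(grad, x0=[1.0, 1.0], alpha=1 / L, delta=0.2)
    print(np.linalg.norm(x_final))  # close to 0 despite the gradient errors
```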

Notes

  1. For the reader familiar with the control-theoretic absolute stability problem, the convergence result quoted above shows that gradient descent satisfies a version of the Aizerman conjecture.

  2. In the language of the absolute stability problem, inexact gradient descent satisfies a version of the Kalman conjecture.

  3. In fact, the choice of parameters (1.15) was originally motivated by a frequency-domain analysis; for more, see §4 and [61].

  4. The reader should ignore the statement about the finiteness of \(V_\mathrm {f}\), which is famously false [63].

  5. [52] explicitly states that A should not have spectrum with \(|z| = \rho\), but this is unnecessary. Such an assumption can always be removed when (A, B) is controllable by shifting the eigenvalues off of \(|z| = \rho\) via linear feedback (see the sketch following these notes): an equivalent characterization of controllability is that for any monic polynomial there is a matrix N such that the characteristic polynomial of \(A + BN\) is the given polynomial [50, §34, Theorem 1]. The corresponding linear change of variables \((x, u) \mapsto (x, u + Nx)\) does not affect the validity of (FDI) (it corresponds to a congruence transformation of the Popov function by a rational matrix function), and leaves invariant any solution of (LMI).

  6. In general, this is not true of every solution to (LMI).

  7. In reference to the remarks preceding Lemma 2.7, note that \(\Pi(\bar{z}, z) \equiv 0\) when \(|z| = \rho_\mathrm{GD}\) and \(\alpha = 2/(m+L)\).
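
The following sketch makes the feedback argument of Note 5 concrete; the matrices, the value of \(\rho\), and the target eigenvalues are assumed for illustration, and numerical pole placement (scipy.signal.place_poles) stands in for the polynomial characterization of controllability cited from [50].

```python
import numpy as np
from scipy.signal import place_poles

# Sketch only: for a controllable pair (A, B) with spectrum on |z| = rho,
# a feedback u -> u + N*x moves the eigenvalues of A + B*N off the circle.
# The matrices and target eigenvalues are illustrative assumptions.

rho = 0.9
A = np.array([[0.0, 1.0],
              [-rho**2, 2 * rho * np.cos(0.3)]])  # eigenvalues rho*exp(+-0.3i)
B = np.array([[0.0],
              [1.0]])

# (A, B) is controllable: the controllability matrix [B, AB] has full rank.
assert np.linalg.matrix_rank(np.hstack([B, A @ B])) == A.shape[0]

# Choose N = -K, where K places the spectrum of A - B*K strictly inside |z| = rho.
K = place_poles(A, B, np.array([0.5, 0.6])).gain_matrix
N = -K

print(np.abs(np.linalg.eigvals(A)))          # both on the circle |z| = rho
print(np.abs(np.linalg.eigvals(A + B @ N)))  # both strictly inside the circle
```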

References

  1. Ahmad, N.S., Carrasco, J., Heath, W.P.: A less conservative LMI condition for stability of discrete-time systems with slope-restricted nonlinearities. IEEE Trans. Autom. Control 60(6), 1692–1697 (2014)

  2. Arora, S., Ge, R., Ma, T., Moitra, A.: Simple, efficient, and neural algorithms for sparse coding. In: Conference on Learning Theory, pp. 113–149. PMLR (2015)

  3. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1999)

  4. Boczar, R., Lessard, L., Recht, B.: Exponential convergence bounds using integral quadratic constraints. In: 2015 54th IEEE Conference on Decision and Control (CDC), pp. 7516–7521. IEEE (2015)

  5. Badithela, A., Seiler, P.: Analysis of the heavy-ball algorithm using integral quadratic constraints. In: 2019 American Control Conference (ACC), pp. 4081–4085. IEEE (2019)

  6. Chen, Y., Candes, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. In: Advances in Neural Information Processing Systems, pp. 739–747 (2015)

  7. Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)

  8. Cyrus, S., Hu, B., Van Scoy, B., Lessard, L.: A robust accelerated optimization algorithm for strongly convex functions. In: 2018 Annual American Control Conference (ACC), pp. 1376–1381. IEEE (2018)

  9. Candes, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)

  10. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)

  11. Drummond, R., Duncan, S.: Accelerated gradient methods with memory. arXiv:1805.09077 (2018)

  12. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE discussion papers, 2013016 (2013)

  13. Devolder, O., Glineur, F., Nesterov, Y.: Intermediate gradient methods for smooth convex problems with inexact oracle. Technical report (2013)

  14. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  15. de Klerk, E., Glineur, F., Taylor, A.: Worst-case convergence analysis of gradient and Newton methods through semidefinite programming performance estimation. arXiv:1709.05191 (2017)

  16. de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  17. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184, 183–220 (2020). https://doi.org/10.1007/s10107-019-01410-2

  18. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)

  19. França, G., Bento, J.: An explicit rate bound for over-relaxed ADMM. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2104–2108. IEEE (2016)

  20. Fazlyab, M., Ribeiro, A., Morari, M., Preciado, V.M.: Analysis of optimization algorithms via integral quadratic constraints: nonstrongly convex problems. SIAM J. Optim. 28(3), 2654–2689 (2018)

  21. Fetzer, M., Scherer, C.W.: Absolute stability analysis of discrete time feedback interconnections. IFAC-PapersOnLine 50(1), 8447–8453 (2017)

  22. Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), A1380–A1405 (2012)

  23. Gannot, O.: Frequency criteria for exponential stability. arXiv:1910.10855 (2019)

  24. Gramlich, D., Ebenbauer, C., Scherer, C.W.: Convex synthesis of accelerated gradient algorithms for optimization and saddle point problems using Lyapunov functions. arXiv:2006.09946 (2020)

  25. Hu, B., Lessard, L.: Control interpretations for first-order optimization methods. In: 2017 American Control Conference (ACC), pp. 3114–3119. IEEE (2017)

  26. Hu, B., Lessard, L.: Dissipativity theory for Nesterov’s accelerated method. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1549–1557 (2017)

  27. Hassan-Moghaddam, S., Jovanović, M.R.: Proximal gradient flow and Douglas–Rachford splitting dynamics: global exponential stability via integral quadratic constraints. arXiv:1908.09043 (2019)

  28. Hu, B., Seiler, P.: Exponential decay rate conditions for uncertain linear systems using integral quadratic constraints. IEEE Trans. Autom. Control 61(11), 3631–3637 (2016)

  29. Hu, B., Syed, U.A.: Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory. arXiv:1906.06781 (2019)

  30. Hu, B., Seiler, P., Lessard, L.: Analysis of approximate stochastic gradient using quadratic constraints and sequential semidefinite programs. arXiv:1711.00987 (2017)

  31. Hu, B., Seiler, P., Lessard, L.: Analysis of biased stochastic gradient descent using sequential semidefinite programs. Math. Program., 1–26 (2020)

  32. Hu, B., Wright, S., Lessard, L.: Dissipativity theory for accelerating stochastic variance reduction: a unified analysis of SVRG and Katyusha using semidefinite programs. arXiv:1806.03677 (2018)

  33. Jury, E.I., Lee, B.W.: A stability theory for multinonlinear control systems. Third IFAC World Congr. 28, A1–A11 (1966)

  34. Jury, E.I., Lee, B.W.: On the stability of a certain class of nonlinear sampled-data systems. IEEE Trans. Autom. Control 9(1), 51–61 (1964)

  35. Kalman, R.E.: Lectures on controllability and observability. In: Controllability and Observability, pp. 1–149. Springer (2010)

  36. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)

  37. Kim, D., Fessler, J.A.: On the convergence analysis of the optimized gradient method. J. Optim. Theory Appl. 172(1), 187–205 (2017)

  38. Kleinberg, R., Li, Y., Yuan, Y.: An alternative view: When does SGD escape local minima? arXiv:1802.06175 (2018)

  39. Kalman, R.E., Szegö, G.: Sur la stabilité absolue d’un système d’équations aux différences finies. C. R. Acad. Sci. Paris

  40. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1), 57–95 (2016)

  41. Lessard, L., Seiler, P.: Direct synthesis of iterative algorithms with bounds on achievable worst-case convergence rate. arXiv:1904.09046 (2019)

  42. Lurie, A.I.: Some Nonlinear Problems in the Theory of Automatic Control. H.M. Stationery Office, London (1957)

  43. Li, Y., Yuan, Y.: Convergence analysis of two-layer neural networks with ReLU activation. arXiv:1705.09886 (2017)

  44. Michalowsky, S.: Design of Distributed and Robust Optimization Algorithms: A Systems Theoretic Approach. Logos Verlag Berlin GmbH (2020)

  45. Megretski, A., Rantzer, A.: System analysis via integral quadratic constraints. IEEE Trans. Autom. Control 42(6), 819–830 (1997)

  46. Michalowsky, S., Scherer, C., Ebenbauer, C.: Robust and structure exploiting optimization algorithms: an integral quadratic constraint approach. arXiv:1905.00279 (2019)

  47. Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)

  48. Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.I.: A general analysis of the convergence of ADMM. arXiv:1502.02009 (2015)

  49. O’Shea, R., Younis, M.: A frequency-time domain stability criterion for sampled-data systems. IEEE Trans. Autom. Control 12(6), 719–724 (1967)

  50. Popov, V.M., Georgescu, R.: Hyperstability of Control Systems. Springer, Berlin (1973)

  51. Polyak, B.T.: Introduction to Optimization, vol. 1. Optimization Software, Publications Division, New York (1987)

  52. Rantzer, A.: On the Kalman–Yakubovich–Popov lemma. Syst. Control Lett. 28(1), 7–10 (1996)

  53. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. arXiv:1812.00146 (2018)

  54. Safavi, S., Joshi, B., França, G., Bento, J.: An explicit convergence rate for Nesterov’s method from SDP. In: 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1560–1564. IEEE (2018)

  55. Sun, R., Luo, Z.-Q.: Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inf. Theory 62(11), 6535–6579 (2016)

  56. Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. arXiv:1902.00947 (2019)

  57. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  58. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  59. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. J. Optim. Theory Appl. 178(2), 455–476 (2018)

  60. Taylor, A., Van Scoy, B., Lessard, L.: Lyapunov functions for first-order methods: tight automated convergence guarantees. arXiv:1803.06073 (2018)

  61. Van Scoy, B., Freeman, R.A., Lynch, K.M.: The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Syst. Lett. 2(1), 49–54 (2017)

  62. Willems, J.: Least squares stationary optimal control and the algebraic Riccati equation. IEEE Trans. Autom. Control 16(6), 621–634 (1971)

  63. Willems, J.: On the existence of a nonpositive solution to the Riccati equation. IEEE Trans. Autom. Control 19(5), 592–593 (1974)

  64. Willems, J.C.: Dissipative dynamical systems part I: general theory. Arch. Ration. Mech. Anal. 45(5), 321–351 (1972)

  65. Willems, J.C.: Dissipative dynamical systems part II: linear systems with quadratic supply rates. Arch. Ration. Mech. Anal. 45(5), 352–393 (1972)

  66. Willems, J.C., Polderman, J.W.: Introduction to Mathematical Systems Theory: A Behavioral Approach, vol. 26. Springer, Berlin (1997)

  67. Xiong, H., Chi, Y., Hu, B., Zhang, W.: Analytical convergence regions of accelerated gradient descent in nonconvex optimization under regularity condition. Automatica 113, 108715 (2020)

Acknowledgements

The author is indebted to both anonymous referees for their close readings and numerous suggestions on how to improve the paper. Special thanks are due to the first referee, who pointed out that a proof of Proposition 1.5 can already be inferred from [dKGT1].

Author information

Correspondence to Oran Gannot.

Cite this article

Gannot, O. A frequency-domain analysis of inexact gradient methods. Math. Program. 194, 975–1016 (2022). https://doi.org/10.1007/s10107-021-01665-8
