
Sub-sampled Newton methods

  • Full Length Paper
  • Series B
Mathematical Programming

Abstract

For large-scale finite-sum minimization problems, we study non-asymptotic, high-probability global and local convergence properties of variants of Newton’s method in which the Hessian and/or the gradient are randomly sub-sampled. For Hessian sub-sampling, random matrix concentration inequalities allow one to sub-sample in a way that suitably preserves second-order information, i.e., curvature. For gradient sub-sampling, approximate matrix-multiplication results from randomized numerical linear algebra provide a way to construct a sub-sampled gradient that retains as much of the first-order information as possible. While the sample sizes all depend on problem-specific constants, e.g., the condition number, we demonstrate that the local convergence rates are problem-independent.
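
To make the two sampling schemes concrete, the sketch below applies them to a single Newton step for L2-regularized logistic regression. It is an illustrative sketch under simplifying assumptions, not the algorithm analyzed in the paper: uniform sampling, fixed placeholder sample ssizes n_grad and n_hess, a dense Hessian solve, and a unit step length are all choices made here for brevity, whereas the paper derives the sample sizes from matrix concentration and approximate matrix-multiplication bounds and its globally convergent variants involve step-size selection and allow inexact solves.

    # Minimal, illustrative sketch (not the paper's exact algorithm) of one
    # sub-sampled Newton step for a finite-sum problem
    #   F(x) = (1/n) * sum_i f_i(x),
    # shown here for L2-regularized logistic regression. The sample sizes
    # n_grad and n_hess are placeholders chosen by hand.
    import numpy as np

    def per_sample_parts(x, A, b, lam):
        """Per-sample gradients and Hessian weights for
        f_i(x) = log(1 + exp(-b_i * a_i^T x)) + (lam/2) * ||x||^2."""
        z = b * (A @ x)                              # margins b_i * a_i^T x
        s = 1.0 / (1.0 + np.exp(z))                  # sigma(-z_i)
        grads = -(s * b)[:, None] * A + lam * x      # rows are grad f_i(x)
        hess_w = s * (1.0 - s)                       # weight of a_i a_i^T in the Hessian
        return grads, hess_w

    def subsampled_newton_step(x, A, b, lam, n_grad, n_hess, rng):
        n, d = A.shape
        # Gradient sub-sampling: average per-sample gradients over a random set S_g.
        Sg = rng.choice(n, size=n_grad, replace=False)
        grads, _ = per_sample_parts(x, A[Sg], b[Sg], lam)
        g = grads.mean(axis=0)
        # Hessian sub-sampling: H = (1/|S_H|) * sum_{i in S_H} w_i a_i a_i^T + lam*I.
        Sh = rng.choice(n, size=n_hess, replace=False)
        _, w = per_sample_parts(x, A[Sh], b[Sh], lam)
        H = (A[Sh] * w[:, None]).T @ A[Sh] / n_hess + lam * np.eye(d)
        # Newton direction from the sub-sampled quantities; a conjugate-gradient
        # solve would avoid forming H explicitly when d is large.
        p = np.linalg.solve(H, g)
        return x - p                                 # unit step, for illustration only

    # Tiny synthetic example.
    rng = np.random.default_rng(0)
    n, d = 10_000, 50
    A = rng.standard_normal((n, d))
    b = np.where(A @ rng.standard_normal(d) > 0, 1.0, -1.0)
    x = np.zeros(d)
    for _ in range(10):
        x = subsampled_newton_step(x, A, b, lam=1e-3, n_grad=2_000, n_hess=500, rng=rng)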



Author information

Authors: Farbod Roosta-Khorasani and Michael W. Mahoney

Correspondence to Farbod Roosta-Khorasani.


About this article


Cite this article

Roosta-Khorasani, F., Mahoney, M.W. Sub-sampled Newton methods. Math. Program. 174, 293–326 (2019). https://doi.org/10.1007/s10107-018-1346-5
