Abstract
For large-scale finite-sum minimization problems, we study non-asymptotic, high-probability global and local convergence properties of variants of Newton's method in which the Hessian and/or the gradient are randomly sub-sampled. For Hessian sub-sampling, random matrix concentration inequalities show how to choose the sample size so that second-order information, i.e., curvature, is suitably preserved. For gradient sub-sampling, approximate matrix multiplication results from randomized numerical linear algebra yield a sub-sampled gradient that retains as much first-order information as possible. Although the required sample sizes depend on problem-specific constants, e.g., the condition number, we demonstrate that the local convergence rates are problem-independent.
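To make the setting concrete, the following is a minimal Python sketch of one sub-sampled Newton iteration on an L2-regularized logistic regression, a typical finite-sum problem. The uniform sampling rule, the regularizer `lam`, the unit step size `alpha`, and the direct linear solve are illustrative assumptions for this sketch, not the paper's prescribed algorithmic choices or sample-size formulas.

```python
import numpy as np

def subsampled_newton_step(X, y, w, s_H, s_g, rng):
    """One sub-sampled Newton step for L2-regularized logistic regression.

    A sketch only: both the gradient and the Hessian of the finite sum
    are replaced by averages over uniformly sampled index sets of sizes
    s_g and s_H, respectively.
    """
    n, p = X.shape
    lam = 1e-3    # assumed ridge regularizer (keeps the sub-sampled Hessian invertible)
    alpha = 1.0   # assumed unit step; in practice this is coupled with a line search

    # Gradient sub-sampling: average per-example gradients over a sample of size s_g.
    idx_g = rng.choice(n, size=s_g, replace=False)
    p_g = 1.0 / (1.0 + np.exp(-(X[idx_g] @ w)))
    g = X[idx_g].T @ (p_g - y[idx_g]) / s_g + lam * w

    # Hessian sub-sampling: average per-example Hessians over a sample of size s_H,
    # so that the curvature of the full objective is (w.h.p.) suitably preserved.
    idx_H = rng.choice(n, size=s_H, replace=False)
    p_H = 1.0 / (1.0 + np.exp(-(X[idx_H] @ w)))
    D = p_H * (1.0 - p_H)  # per-example curvature weights
    H = (X[idx_H] * D[:, None]).T @ X[idx_H] / s_H + lam * np.eye(p)

    # Newton direction computed from the sub-sampled quantities.
    d = np.linalg.solve(H, -g)
    return w + alpha * d

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50))
y = (rng.random(10_000) < 0.5).astype(float)
w = np.zeros(50)
for _ in range(20):
    w = subsampled_newton_step(X, y, w, s_H=500, s_g=2000, rng=rng)
```

The point of the sketch is only the structure of the iteration: the per-iteration cost is governed by the sample sizes s_H and s_g rather than by the full data size n, while the analysis in the paper quantifies how large those samples must be for the convergence guarantees to hold with high probability.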
Cite this article
Roosta-Khorasani, F., Mahoney, M.W. Sub-sampled Newton methods. Math. Program. 174, 293–326 (2019). https://doi.org/10.1007/s10107-018-1346-5