
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

Published: 08 December 2014

Abstract

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence and obtain a linear dependence on the average smoothness, dominating previous results; more broadly, we discuss how importance sampling for SGD can also improve convergence in other scenarios. Our results are based on a connection between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
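The reweighting at the core of the abstract, drawing rows (or component functions) with probability proportional to their squared norms rather than uniformly, can be sketched concretely. The following is a minimal illustration and not the authors' code: the randomized Kaczmarz iteration of Strohmer and Vershynin [19] for a consistent linear system Ax = b, with row i sampled with probability proportional to ||a_i||^2, which is the weighted-sampling connection to SGD that the paper exploits. The problem sizes, iteration count, and random seed are arbitrary illustrative choices.

    # Minimal sketch (not the authors' code): randomized Kaczmarz for a
    # consistent system Ax = b, sampling row i with probability
    # proportional to ||a_i||^2 (importance sampling in the sense of [19]).
    # Sizes, the iteration budget, and the seed are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 500, 50
    A = rng.standard_normal((m, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true                               # consistent right-hand side

    row_norms_sq = np.sum(A ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()    # weighted sampling distribution

    x = np.zeros(n)
    for _ in range(5000):
        i = rng.choice(m, p=probs)               # draw a row index, prob. proportional to ||a_i||^2
        a_i = A[i]
        # Kaczmarz step: project x onto the hyperplane {z : <a_i, z> = b_i};
        # viewed as SGD, this is a unit-step gradient step on (1/2)(<a_i, x> - b_i)^2 / ||a_i||^2.
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

Replacing the weighted distribution by uniform sampling (i = rng.integers(m)) and keeping the same projection step recovers plain randomized Kaczmarz; with uniform sampling, the corresponding guarantee depends on the largest row norms rather than on their average, which is the kind of gap the reweighted distribution is designed to close.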

References

[1]
F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in Neural Information Processing Systems (NIPS), 2011.
[2]
Y. Censor, P. P. B. Eggermont, and D. Gordon. Strong underrelaxation in Kaczmarz's method for inconsistent systems. Numerische Mathematik, 41(1):83-92, 1983.
[3]
R. Foygel and N. Srebro. Concentration-based guarantees for low-rank matrix reconstruction. 24th Ann. Conf. Learning Theory (COLT), 2011.
[4]
M. Hanke and W. Niethammer. On the acceleration of Kaczmarz's method for inconsistent linear systems. Linear Algebra and its Applications, 130:83-98, 1990.
[5]
G. T. Herman. Fundamentals of computerized tomography: image reconstruction from projections. Springer, 2009.
[6]
G. N. Hounsfield. Computerized transverse axial scanning (tomography): Part 1. Description of system. British Journal of Radiology, 46(552):1016-1022, 1973.
[7]
S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett. Ser. A, pages 355-357, 1937.
[8]
N. Le Roux, M. W. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. Advances in Neural Information Processing Systems (NIPS), pages 2672-2680, 2012.
[9]
F. Natterer. The mathematics of computerized tomography, volume 32 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. ISBN 0-89871-493-1. URL http://dx.doi.org/10.1137/1.9780898719284. Reprint of the 1986 original.
[10]
D. Needell. Randomized Kaczmarz solver for noisy linear systems. BIT, 50(2):395-403, 2010. ISSN 0006-3835. URL http://dx.doi.org/10.1007/s10543-010-0265-5.
[11]
D. Needell and R. Ward. Two-subspace projection method for coherent overdetermined linear systems. Journal of Fourier Analysis and Applications, 19(2):256-269, 2013.
[12]
A. Nemirovski. Efficient methods in convex programming. 2005.
[13]
Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, 2004.
[14]
Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimiz., 22(2):341-362, 2012.
[15]
P. Richtárik and M. Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program., pages 1-38, 2012.
[16]
M. Schmidt, N. Le Roux, and F. Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013.
[17]
S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res., 14(1):567-599, 2013.
[18]
N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems, 2010.
[19]
T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009. ISSN 1069-5869. URL http://dx.doi.org/10.1007/s00041-008-9030-4.
[20]
K. Tanabe. Projection method for solving a singular system of linear equations and its applications. Numerische Mathematik, 17(3):203-214, 1971.
[21]
T. M. Whitney and R. K. Meany. Two algorithms related to the method of steepest descent. SIAM Journal on Numerical Analysis, 4(1):109-118, 1967.
[22]
P. Zhao and T. Zhang. Stochastic optimization with importance sampling. Submitted, 2014.

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1
December 2014
3697 pages

Publisher

MIT Press, Cambridge, MA, United States

Cited By

  • (2024) Understanding the training speedup from sampling with approximate losses. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3692473, pages 10127-10147. Online publication date: 21-Jul-2024.
  • (2023) Coordinating distributed example orders for provably accelerated training. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3668573, pages 56198-56210. Online publication date: 10-Dec-2023.
  • (2023) Walk for Learning: A Random Walk Approach for Federated Learning From Heterogeneous Data. IEEE Journal on Selected Areas in Communications, 10.1109/JSAC.2023.3244250, 41(4):929-940. Online publication date: 1-Apr-2023.
  • (2022) GraB. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3600922, pages 8969-8981. Online publication date: 28-Nov-2022.
  • (2021) Approximate Gradient Coding with Optimal Decoding. 2021 IEEE International Symposium on Information Theory (ISIT), 10.1109/ISIT45174.2021.9517990, pages 2280-2285. Online publication date: 12-Jul-2021.
  • (2020) The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525723, pages 8469-8479. Online publication date: 13-Jul-2020.
  • (2020) On convergence-diagnostic based step sizes for stochastic gradient descent. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525646, pages 7641-7651. Online publication date: 13-Jul-2020.
  • (2020) An equivalence between loss functions and non-uniform sampling in experience replay. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496916, pages 14219-14230. Online publication date: 6-Dec-2020.
  • (2020) Debiasing averaged stochastic gradient descent to handle missing values. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496811, pages 12957-12967. Online publication date: 6-Dec-2020.
  • (2020) Escaping saddle-point faster under interpolation-like conditions. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496765, pages 12414-12425. Online publication date: 6-Dec-2020.
