
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

Published: 08 December 2014

Abstract

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) is necessary in order to further improve convergence and obtain a linear dependence on the average smoothness, dominating previous results; more broadly, we discuss how importance sampling for SGD can also improve convergence in other scenarios. Our results are based on a connection between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
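The reweighting at the core of the abstract, drawing rows (or component functions) with probability proportional to their squared norms rather than uniformly, can be sketched concretely. The following is a minimal illustration and not the authors' code: the randomized Kaczmarz iteration of Strohmer and Vershynin [19] for a consistent linear system Ax = b, with row i sampled with probability proportional to ||a_i||^2, which is the weighted-sampling connection to SGD that the paper exploits. The problem sizes, iteration count, and random seed are arbitrary illustrative choices.

    # Minimal sketch (not the authors' code): randomized Kaczmarz for a
    # consistent system Ax = b, sampling row i with probability
    # proportional to ||a_i||^2 (importance sampling in the sense of [19]).
    # Sizes, the iteration budget, and the seed are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 500, 50
    A = rng.standard_normal((m, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true                               # consistent right-hand side

    row_norms_sq = np.sum(A ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()    # weighted sampling distribution

    x = np.zeros(n)
    for _ in range(5000):
        i = rng.choice(m, p=probs)               # draw a row index, prob. proportional to ||a_i||^2
        a_i = A[i]
        # Kaczmarz step: project x onto the hyperplane {z : <a_i, z> = b_i};
        # viewed as SGD, this is a unit-step gradient step on (1/2)(<a_i, x> - b_i)^2 / ||a_i||^2.
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i

    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

Replacing the weighted distribution by uniform sampling (i = rng.integers(m)) and keeping the same projection step recovers plain randomized Kaczmarz; with uniform sampling, the corresponding guarantee depends on the largest row norms rather than on their average, which is the kind of gap the reweighted distribution is designed to close.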

References

[1]
F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in Neural Information Processing Systems (NIPS), 2011.
[2]
Y. Censor, P. P. B. Eggermont, and D. Gordon. Strong underrelaxation in Kaczmarz's method for inconsistent systems. Numerische Mathematik, 41(1):83-92, 1983.
[3]
R. Foygel and N. Srebro. Concentration-based guarantees for low-rank matrix reconstruction. 24th Ann. Conf. Learning Theory (COLT), 2011.
[4]
M. Hanke and W. Niethammer. On the acceleration of Kaczmarz's method for inconsistent linear systems. Linear Algebra and its Applications, 130:83-98, 1990.
[5]
G. T. Herman. Fundamentals of computerized tomography: image reconstruction from projections. Springer, 2009.
[6]
G. N. Hounsfield. Computerized transverse axial scanning (tomography): Part 1. Description of system. British Journal of Radiology, 46(552):1016-1022, 1973.
[7]
S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett. Ser. A, pages 355-357, 1937.
[8]
N. Le Roux, M. W. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. Advances in Neural Information Processing Systems (NIPS), pages 2672-2680, 2012.
[9]
F. Natterer. The mathematics of computerized tomography, volume 32 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. ISBN 0-89871-493-1. URL http://dx.doi.org/10.1137/1.9780898719284. Reprint of the 1986 original.
[10]
D. Needell. Randomized Kaczmarz solver for noisy linear systems. BIT, 50(2):395-403, 2010. ISSN 0006-3835. URL http://dx.doi.org/10.1007/s10543-010-0265-5.
[11]
D. Needell and R. Ward. Two-subspace projection method for coherent overdetermined linear systems. Journal of Fourier Analysis and Applications, 19(2):256-269, 2013.
[12]
A. Nemirovski. Efficient methods in convex programming. 2005.
[13]
Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, 2004.
[14]
Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimiz., 22(2):341-362, 2012.
[15]
P. Richtárik and M. Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program., pages 1-38, 2012.
[16]
M. Schmidt, N. Le Roux, and F. Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013.
[17]
S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res., 14(1):567-599, 2013.
[18]
N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems, 2010.
[19]
T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262-278, 2009. ISSN 1069-5869. URL http://dx.doi.org/10.1007/s00041-008-9030-4.
[20]
K. Tanabe. Projection method for solving a singular system of linear equations and its applications. Numerische Mathematik, 17(3):203-214, 1971.
[21]
T. M. Whitney and R. K. Meany. Two algorithms related to the method of steepest descent. SIAM Journal on Numerical Analysis, 4(1):109-118, 1967.
[22]
P. Zhao and T. Zhang. Stochastic optimization with importance sampling. Submitted, 2014.

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1
December 2014
3697 pages

Publisher

MIT Press, Cambridge, MA, United States

Cited By

  • (2024) Understanding the training speedup from sampling with approximate losses. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3692473, pages 10127-10147. Online publication date: 21-Jul-2024.
  • (2023) Coordinating distributed example orders for provably accelerated training. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3668573, pages 56198-56210. Online publication date: 10-Dec-2023.
  • (2023) Walk for Learning: A Random Walk Approach for Federated Learning From Heterogeneous Data. IEEE Journal on Selected Areas in Communications, 10.1109/JSAC.2023.3244250, 41(4):929-940. Online publication date: 1-Apr-2023.
  • (2022) GraB. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3600922, pages 8969-8981. Online publication date: 28-Nov-2022.
  • (2021) Approximate Gradient Coding with Optimal Decoding. 2021 IEEE International Symposium on Information Theory (ISIT), 10.1109/ISIT45174.2021.9517990, pages 2280-2285. Online publication date: 12-Jul-2021.
  • (2020) The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525723, pages 8469-8479. Online publication date: 13-Jul-2020.
  • (2020) On convergence-diagnostic based step sizes for stochastic gradient descent. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525646, pages 7641-7651. Online publication date: 13-Jul-2020.
  • (2020) An equivalence between loss functions and non-uniform sampling in experience replay. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496916, pages 14219-14230. Online publication date: 6-Dec-2020.
  • (2020) Debiasing averaged stochastic gradient descent to handle missing values. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496811, pages 12957-12967. Online publication date: 6-Dec-2020.
  • (2020) Escaping saddle-point faster under interpolation-like conditions. Proceedings of the 34th International Conference on Neural Information Processing Systems, 10.5555/3495724.3496765, pages 12414-12425. Online publication date: 6-Dec-2020.
