Computer Science > Machine Learning

arXiv:2208.05287 (cs)

[Submitted on 10 Aug 2022]

Title: Adaptive Learning Rates for Faster Stochastic Gradient Methods

Authors: Samuel Horváth, Konstantin Mishchenko, Peter Richtárik


Abstract: In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods. Our first method, StoPS, is based on the classical Polyak step size (Polyak, 1987) and extends SPS (Loizou et al., 2021), a recent adaptation of that step size to stochastic optimization. Our second method, GraDS, rescales the step size by the "diversity of stochastic gradients". We provide a theoretical analysis of these methods for strongly convex smooth functions and show that they enjoy deterministic-like rates despite using stochastic gradients. Furthermore, we demonstrate the theoretical superiority of our adaptive methods on quadratic objectives. Unfortunately, both StoPS and GraDS depend on unknown quantities, which makes them practical only for overparametrized models. To remedy this, we drop this undesired dependence and redefine StoPS and GraDS as StoP and GraD, respectively. We show that these new methods converge linearly to a neighbourhood of the optimal solution under the same assumptions. Finally, we corroborate our theoretical claims by experimental validation, which reveals that GraD is particularly useful for deep learning optimization.
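For orientation, the sketch below illustrates the stochastic Polyak step size (SPS) of Loizou et al. (2021), which the abstract names as the starting point for StoPS. It is not the paper's StoPS or GraDS rule, whose exact definitions are not given on this page. The step is gamma = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max). The least-squares objective, the function name `sps_least_squares`, and the values of `c`, `gamma_max`, and `steps` are all assumptions chosen for illustration. The example assumes an overparametrized, consistent linear system, so every per-sample optimum f_i* is 0 and no unknown quantity is needed.

```python
import numpy as np


def sps_least_squares(A, b, steps=5000, c=0.5, gamma_max=10.0, seed=0):
    """SGD with a stochastic Polyak step size on f_i(x) = 0.5 * (a_i @ x - b_i)**2.

    Assumes interpolation: some x fits every row exactly, so each f_i* = 0.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)              # sample one data point uniformly
        residual = A[i] @ x - b[i]
        loss_i = 0.5 * residual ** 2     # f_i(x) - f_i*, with f_i* = 0 assumed
        grad_i = residual * A[i]
        grad_sq = grad_i @ grad_i
        if grad_sq == 0.0:               # sample already fitted; skip the update
            continue
        # Polyak-type step: loss gap over squared gradient norm, capped above.
        gamma = min(loss_i / (c * grad_sq), gamma_max)
        x -= gamma * grad_i
    return x


# Hypothetical overparametrized problem: 20 equations, 50 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
b = A @ rng.standard_normal(50)

initial_loss = 0.5 * np.mean(b ** 2)     # loss at the starting point x = 0
x_hat = sps_least_squares(A, b)
final_loss = 0.5 * np.mean((A @ x_hat - b) ** 2)
print(f"initial loss {initial_loss:.3e}, final loss {final_loss:.3e}")
```

With c = 0.5 and this particular loss, the step reduces to 1 / ||a_i||^2, so each update projects x onto the solution set of the sampled equation (the randomized Kaczmarz update). Outside the interpolation regime the values f_i* are generally unknown, which is the kind of dependence the abstract describes removing when it moves from StoPS and GraDS to StoP and GraD.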

Comments:	14 pages, 5 figures, 10 pages of appendix
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2208.05287 [cs.LG]
	(or arXiv:2208.05287v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2208.05287

Submission history

From: Samuel Horváth
[v1] Wed, 10 Aug 2022 11:36:00 UTC (2,052 KB)
