Mathematics > Optimization and Control

arXiv:2305.12475 (math)

[Submitted on 21 May 2023]

Title: Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods

Authors: Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He


Abstract: The classical analysis of Stochastic Gradient Descent (SGD) with polynomially decaying stepsize $\eta_t = \eta/\sqrt{t}$ relies on a well-tuned $\eta$ that depends on problem parameters such as the Lipschitz smoothness constant, which is often unknown in practice. In this work, we prove that SGD with arbitrary $\eta > 0$, referred to as untuned SGD, still attains an order-optimal convergence rate $\widetilde{O}(T^{-1/4})$ in terms of gradient norm for minimizing smooth objectives. Unfortunately, it comes at the expense of a catastrophic exponential dependence on the smoothness constant, which we show is unavoidable for this scheme even in the noiseless setting. We then examine three families of adaptive methods, namely Normalized SGD (NSGD), AMSGrad, and AdaGrad, unveiling their power in preventing such exponential dependency in the absence of information about the smoothness parameter and boundedness of stochastic gradients. Our results provide theoretical justification for the advantage of adaptive methods over untuned SGD in alleviating the issue with large gradients.
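The contrast described in the abstract can be illustrated with a small noiseless sketch. It is not taken from the paper: the one-dimensional quadratic $f(x) = \frac{L}{2}x^2$, the function name `peak_gradient`, and the parameter values are all assumptions chosen for illustration, and the normalized variant here omits the momentum that NSGD analyses typically include. Both methods use the same stepsize $\eta_t = \eta/\sqrt{t}$ with an $\eta$ that ignores $L$.

```python
import math


def peak_gradient(L, eta, x0, T, normalized):
    """Run T steps on f(x) = (L/2) * x**2 with stepsize eta / sqrt(t).

    Returns the largest |f'(x_t)| seen along the trajectory.
    Hypothetical helper for illustration only; noiseless, 1-D, no momentum.
    """
    x = x0
    peak = abs(L * x)
    for t in range(1, T + 1):
        g = L * x                      # exact gradient of the quadratic
        step = eta / math.sqrt(t)      # stepsize chosen without knowing L
        if normalized:
            # Normalized update: move a distance `step` along the gradient sign.
            direction = 0.0 if g == 0 else g / abs(g)
        else:
            # Plain gradient step: the move scales with the gradient itself.
            direction = g
        x -= step * direction
        peak = max(peak, abs(L * x))
    return peak


# Assumed values: eta * L = 10, so the plain update overshoots until sqrt(t) > 5.
L, eta, x0, T = 20.0, 0.5, 1.0, 1000
peak_plain = peak_gradient(L, eta, x0, T, normalized=False)
peak_normalized = peak_gradient(L, eta, x0, T, normalized=True)
print(f"peak |grad|, untuned plain step: {peak_plain:.3e}")
print(f"peak |grad|, normalized step:    {peak_normalized:.3e}")
```

In this toy setting the plain update multiplies the iterate by $|1 - \eta L/\sqrt{t}| > 1$ for as long as $\sqrt{t} < \eta L / 2$, so the gradient grows by a product of such factors before the stepsize becomes small enough. The normalized update moves at most $\eta_t$ per step regardless of $L$, so the iterate never leaves $[-\max(|x_0|, \eta), \max(|x_0|, \eta)]$.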

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2305.12475 [math.OC]
	(or arXiv:2305.12475v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2305.12475

Submission history

From: Junchi Yang
[v1] Sun, 21 May 2023 14:40:43 UTC (500 KB)
