extended-abstract

Efficient Hessian-based DNN Optimization via Chain-Rule Approximation

CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)

Pages 297 - 298

https://doi.org/10.1145/3570991.3571042

Published: 04 January 2023

Abstract

Learning models that are not specific to a single use case has proven to be a challenging task in Deep Learning (DL). Hyperparameter tuning requires long training sessions that must be restarted whenever the network or the dataset changes, a cost that most stakeholders in industry and research cannot afford. Many attempts have been made to explain the source of the use-case specificity that distinguishes DL problems. To date, second-order optimization methods have been shown to be effective in some cases, but they have not been sufficiently investigated in the context of learning and optimization.

In this work, we present a chain rule for efficiently approximating the Hessian matrix (i.e., the second-order derivatives with respect to the weights) across the layers of a Deep Neural Network (DNN). We apply our approach to weight optimization during DNN training, a step that we believe suffers particularly from the enormous variety of optimizers provided by state-of-the-art libraries such as Keras and PyTorch. We demonstrate, both theoretically and empirically, the improved accuracy of our approximation technique and show that the Hessian is a useful diagnostic tool that helps to optimize training more rigorously. Our preliminary experiments indicate both the efficiency and the improved convergence of our approach, two crucial aspects of DNN training.
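The abstract does not spell out the chain rule itself, so the following is only an illustrative sketch and not the authors' method. It assumes a hypothetical toy network f(x) = w2 * tanh(w1 * x) with a squared-error loss, and shows how the textbook chain rule splits the Hessian of the loss into a first-order (Gauss-Newton) part and a residual-weighted curvature part. All function names are made up for this example; the Gauss-Newton approximation stands in as a well-known example of a cheap Hessian approximation, and a finite-difference check serves as a reference.

```python
import math

def forward(w1, w2, x):
    """Toy two-layer scalar network: hidden h = tanh(w1*x), output f = w2*h."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, y):
    """Squared-error loss L = 0.5 * (f - y)^2."""
    _, f = forward(w1, w2, x)
    return 0.5 * (f - y) ** 2

def exact_hessian(w1, w2, x, y):
    """Exact 2x2 Hessian of L w.r.t. (w1, w2) via the chain rule:
    H = J J^T + r * (Hessian of f), where r = f - y and J = df/dw."""
    h, f = forward(w1, w2, x)
    r = f - y
    s = 1.0 - h * h                    # derivative of tanh at w1*x
    j1, j2 = w2 * x * s, h             # df/dw1, df/dw2
    f11 = -2.0 * w2 * h * x * x * s    # d2f/dw1^2
    f12 = x * s                        # d2f/dw1 dw2 (d2f/dw2^2 is zero)
    return [[j1 * j1 + r * f11, j1 * j2 + r * f12],
            [j1 * j2 + r * f12, j2 * j2]]

def gauss_newton_hessian(w1, w2, x):
    """Gauss-Newton approximation: keep only J J^T, drop the r-weighted term."""
    h, _ = forward(w1, w2, x)
    s = 1.0 - h * h
    j1, j2 = w2 * x * s, h
    return [[j1 * j1, j1 * j2], [j1 * j2, j2 * j2]]

def finite_difference_hessian(w1, w2, x, y, eps=1e-4):
    """Central finite-difference Hessian, used only as a numerical reference."""
    p = [w1, w2]

    def shifted(i, si, j, sj):
        q = list(p)
        q[i] += si * eps
        q[j] += sj * eps
        return loss(q[0], q[1], x, y)

    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            H[i][j] = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                       - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * eps * eps)
    return H

if __name__ == "__main__":
    args = (0.7, -1.3, 0.5, 0.2)       # arbitrary w1, w2, x, y
    print("exact        :", exact_hessian(*args))
    print("gauss-newton :", gauss_newton_hessian(*args[:3]))
    print("finite diff  :", finite_difference_hessian(*args))
```

In this toy setting the approximation coincides with the exact Hessian when the residual f - y is zero and departs from it otherwise, which is one simple way to see why the accuracy of a Hessian approximation matters during training.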

Published In

CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)

January 2023

357 pages

ISBN:9781450397971

DOI:10.1145/3570991

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Extended-abstract
Research
Refereed limited

Conference

CODS-COMAD 2023

CODS-COMAD 2023: 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)

January 4 - 7, 2023

Mumbai, India

Acceptance Rates

Overall Acceptance Rate 197 of 680 submissions, 29%
