Deep learning via Hessian-free optimization

Author: James Martens

ICML'10: Proceedings of the 27th International Conference on Machine Learning

Pages 735-742

Published: 21 June 2010

Abstract

We develop a 2nd-order optimization method based on the "Hessian-free" approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn't limited in applicability to auto-encoders, or any specific model class. We also discuss the issue of "pathological curvature" as a possible explanation for the difficulty of deep-learning and how 2nd-order optimization, and our method in particular, effectively deals with it.
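
As a rough illustration of what "Hessian-free" means in general, the sketch below takes one damped Newton-type step without ever forming the Hessian: conjugate gradient (CG) solves (H + damping*I) p = -g using only Hessian-vector products. It is not the paper's implementation. The finite-difference product stands in for exact Hessian-vector products (see reference [8]), and the function names, the damping value, and the two-variable ill-conditioned quadratic are all assumptions made for the example.

```python
# Illustrative sketch only; names and constants are hypothetical,
# not taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y, elementwise."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def hessian_vector_product(grad_fn, theta, v, eps=1e-6):
    """Approximate H v by a central difference of the gradient.

    Assumption: finite differences are used here for simplicity.
    No Hessian matrix is ever built or stored.
    """
    g_plus = grad_fn(axpy(eps, v, theta))
    g_minus = grad_fn(axpy(-eps, v, theta))
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def conjugate_gradient(matvec, b, max_iters=50, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # first search direction
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs, p, r)
        rs = rs_new
    return x

def hf_step(grad_fn, theta, damping=1e-3):
    """One step: solve (H + damping*I) p = -g with CG, return theta + p."""
    g = grad_fn(theta)

    def matvec(v):
        hv = hessian_vector_product(grad_fn, theta, v)
        return axpy(damping, v, hv)

    p = conjugate_gradient(matvec, [-gi for gi in g])
    return axpy(1.0, p, theta)

def quadratic_grad(theta):
    """Gradient of f(x, y) = 0.5 * (x**2 + 1000 * y**2)."""
    x, y = theta
    return [x, 1000.0 * y]

theta_new = hf_step(quadratic_grad, [1.0, 1.0])
print(theta_new)
```

On this quadratic, whose curvature differs by a factor of 1000 between the two coordinates, a single step lands close to the minimum at the origin, whereas plain gradient descent would need a step size small enough for the steep direction and would then crawl along the shallow one.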

References

[1] Amari, S., Park, H., and Fukumizu, K. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 2000.

[2] Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. Greedy layer-wise training of deep networks. In NIPS, 2007.

[3] Erhan, D., Bengio, Y., Courville, A., Manzagol, P., Vincent, P., and Bengio, S. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 2010.

[4] Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, July 2006.

[5] LeCun, Y., Bottou, L., Orr, G., and Muller, K. Efficient backprop. In Orr, G. and Muller, K. (eds.), Neural Networks: Tricks of the Trade. Springer, 1998.

[6] Mizutani, E. and Dreyfus, S. E. Second-order stagewise back-propagation for Hessian-matrix analyses and investigation of negative curvature. Neural Networks, 21(2-3):193-203, 2008.

[7] Nocedal, J. and Wright, S. J. Numerical Optimization. Springer, 1999.

[8] Pearlmutter, B. A. Fast exact multiplication by the Hessian. Neural Computation, 1994.

[9] Schraudolph, N. N. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 2002.

Information

Published In

ICML'10: Proceedings of the 27th International Conference on Machine Learning

June 2010

1262 pages

ISBN: 9781605589077

Publisher

Omnipress, Madison, WI, United States

Article Metrics

Total Citations: 56

Total Downloads: 0

Downloads (Last 12 months): 0

Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Dec 2024

Cited By

Park J, Kim B, Sung H (2024). NavCim: Comprehensive Design Space Exploration for Analog Computing-in-Memory Architectures. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 168-182. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3676946

Potapczynski A, Finzi M, Pleiss G, Wilson A, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). CoLA. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 43894-43917. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3668026

Eschenhagen R, Immer A, Turner R, Schneider F, Hennig P, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Kronecker-factored approximate curvature for modern neural network architectures. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 33624-33655. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3667583

Ott K, Tiemann M, Hennig P, Briol F, Evans R, Shpitser I (2023). Bayesian numerical integration with neural networks. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 1606-1617. Online publication date: 31-Jul-2023.
https://dl.acm.org/doi/10.5555/3625834.3625985

Loo N, Hasani R, Lechner M, Rus D, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). Dataset distillation with convexified implicit gradients. Proceedings of the 40th International Conference on Machine Learning, pp. 22649-22674. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619350

Wu K, Shen J, Ning Y, Wang T, Wang W, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J (2023). Certified Edge Unlearning for Graph Neural Networks. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2606-2617. Online publication date: 6-Aug-2023.
https://dl.acm.org/doi/10.1145/3580305.3599271

Wu Y, Liu L (2023). Selecting and Composing Learning Rate Policies for Deep Neural Networks. ACM Transactions on Intelligent Systems and Technology 14:2, pp. 1-25. Online publication date: 16-Feb-2023.
https://dl.acm.org/doi/10.1145/3570508

Wu Y, Weimer J, Davidson S (2021). CHEF. Proceedings of the VLDB Endowment 14:11, pp. 2410-2418. Online publication date: 27-Oct-2021.
https://dl.acm.org/doi/10.14778/3476249.3476290

Hu W, Zhang C, Zhan F, Zhang L, Wong T, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021). Conditional Directed Graph Convolution for 3D Human Pose Estimation. Proceedings of the 29th ACM International Conference on Multimedia, pp. 602-611. Online publication date: 17-Oct-2021.
https://dl.acm.org/doi/10.1145/3474085.3475219

Ma P, Du T, Matusik W, Daumé H, Singh A (2020). Efficient continuous pareto exploration in multi-task learning. Proceedings of the 37th International Conference on Machine Learning, pp. 6522-6531. Online publication date: 13-Jul-2020.
https://dl.acm.org/doi/10.5555/3524938.3525543
