
Asymptotic Convergence Rate of Dropout on Shallow Linear Neural Networks

Published: 06 June 2022

Abstract

We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect when these are applied to shallow linear neural networks (NNs), a setting that can also be viewed as matrix factorization with a particular regularizer. Dropout algorithms of this kind are regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid co-adaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
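To make the induced objective concrete, the sketch below illustrates in NumPy how standard Dropout with Bernoulli(p) filters on the hidden layer turns the squared loss of a shallow linear NN, Y ≈ W2 W1 X, into an explicitly regularized matrix-factorization objective. It compares a Monte Carlo estimate of the expected dropout loss with its closed-form expectation. The 1/p (inverted-dropout) scaling, the symbol names, and the problem sizes are illustrative assumptions and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
d_in, d_hidden, d_out, n = 5, 8, 3, 100
p = 0.7  # probability of keeping a hidden node

X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))

def dropout_loss_mc(W1, W2, X, Y, p, num_samples=20000):
    # Monte Carlo estimate of E_F || Y - (1/p) W2 diag(F) W1 X ||_F^2 / n,
    # where F has independent Bernoulli(p) entries (inverted-dropout scaling).
    total = 0.0
    for _ in range(num_samples):
        F = rng.binomial(1, p, size=W1.shape[0])
        total += np.sum((Y - (W2 * F) @ (W1 @ X) / p) ** 2)
    return total / (num_samples * X.shape[1])

def dropout_loss_closed_form(W1, W2, X, Y, p):
    # Closed-form expectation: the plain squared loss plus a product
    # regularizer coupling the columns of W2 with the hidden responses W1 X.
    H = W1 @ X
    m = X.shape[1]
    plain = np.sum((Y - W2 @ H) ** 2) / m
    reg = (1 - p) / p * np.sum((W2 ** 2).sum(axis=0) * (H ** 2).sum(axis=1)) / m
    return plain + reg

print(dropout_loss_mc(W1, W2, X, Y, p))        # agrees with the closed form up to MC noise
print(dropout_loss_closed_form(W1, W2, X, Y, p))

Gradient flow on a closed-form, dropout-induced objective of this type is the kind of dynamics whose local convergence rate the paper bounds in terms of the data, the dropout probability p, and the width of the hidden layer.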




Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 2
June 2022, 499 pages
EISSN: 2476-1249
DOI: 10.1145/3543145
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022
Published in POMACS Volume 6, Issue 2


Author Tags

  1. convergence rate
  2. dropout
  3. gradient flow
  4. neural networks

Qualifiers

  • Research-article

