
Asymptotic Convergence Rate of Dropout on Shallow Linear Neural Networks

Published: 06 June 2022

Abstract

We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect when these are applied to shallow linear neural networks (NNs), a setting that can also be viewed as matrix factorization with a particular regularizer. Dropout algorithms of this kind are regularization techniques that use {0,1}-valued random variables to filter weights during training in order to avoid co-adaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.
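To make the induced objective concrete, the sketch below illustrates in NumPy how standard Dropout with Bernoulli(p) filters on the hidden layer turns the squared loss of a shallow linear NN, Y ≈ W2 W1 X, into an explicitly regularized matrix-factorization objective. It compares a Monte Carlo estimate of the expected dropout loss with its closed-form expectation. The 1/p (inverted-dropout) scaling, the symbol names, and the problem sizes are illustrative assumptions and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
d_in, d_hidden, d_out, n = 5, 8, 3, 100
p = 0.7  # probability of keeping a hidden node

X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))

def dropout_loss_mc(W1, W2, X, Y, p, num_samples=20000):
    # Monte Carlo estimate of E_F || Y - (1/p) W2 diag(F) W1 X ||_F^2 / n,
    # where F has independent Bernoulli(p) entries (inverted-dropout scaling).
    total = 0.0
    for _ in range(num_samples):
        F = rng.binomial(1, p, size=W1.shape[0])
        total += np.sum((Y - (W2 * F) @ (W1 @ X) / p) ** 2)
    return total / (num_samples * X.shape[1])

def dropout_loss_closed_form(W1, W2, X, Y, p):
    # Closed-form expectation: the plain squared loss plus a product
    # regularizer coupling the columns of W2 with the hidden responses W1 X.
    H = W1 @ X
    m = X.shape[1]
    plain = np.sum((Y - W2 @ H) ** 2) / m
    reg = (1 - p) / p * np.sum((W2 ** 2).sum(axis=0) * (H ** 2).sum(axis=1)) / m
    return plain + reg

print(dropout_loss_mc(W1, W2, X, Y, p))        # agrees with the closed form up to MC noise
print(dropout_loss_closed_form(W1, W2, X, Y, p))

Gradient flow on a closed-form, dropout-induced objective of this type is the kind of dynamics whose local convergence rate the paper bounds in terms of the data, the dropout probability p, and the width of the hidden layer.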




Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 2
June 2022, 499 pages
EISSN: 2476-1249
DOI: 10.1145/3543145
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022
Published in POMACS Volume 6, Issue 2


Author Tags

  1. convergence rate
  2. dropout
  3. gradient flow
  4. neural networks

Qualifiers

  • Research-article

