Abstract
Given any deep fully connected neural network initialized with random Gaussian parameters, we bound from above the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities indicate how the sizes of the hidden and output layers affect the Gaussian behaviour of the network, and they quantitatively recover the distributional convergence results in the wide limit, i.e., when the sizes of all hidden layers become large.
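For the reader's convenience, we recall the standard (textbook) definition of the quadratic Wasserstein distance appearing in the abstract; this is not a result of the article. For probability measures $\mu, \nu$ on $\mathbb{R}^k$ with finite second moments,
$$
W_2(\mu,\nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^k \times \mathbb{R}^k} |x-y|^2 \, \mathrm{d}\pi(x,y) \Big)^{1/2},
$$
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$.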
Code availability
The code is available upon request. Its implementation is discussed in Sect. 6.
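As a rough illustration of the kind of experiment discussed in Sect. 6 (this is a minimal sketch, not the authors' implementation: the single test input, ReLU activation, He-type Gaussian initialization, layer widths, and the use of the POT library are assumptions made here for illustration), the snippet below samples outputs of independently re-initialized fully connected networks at one input and estimates their quadratic Wasserstein distance to a Gaussian reference.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)

# Hypothetical architecture for illustration: one input, two hidden layers.
d_in, widths, d_out = 5, [256, 256], 3
x = rng.standard_normal(d_in)


def relu(z):
    return np.maximum(z, 0.0)


def random_relu_network_outputs(x, n_samples):
    """Outputs at x of independently re-initialized fully connected ReLU
    networks with He-type Gaussian weights (variance 2/fan_in), zero biases."""
    outs = np.empty((n_samples, d_out))
    for s in range(n_samples):
        h = x
        for width in widths:
            W = rng.standard_normal((width, h.shape[0])) * np.sqrt(2.0 / h.shape[0])
            h = relu(W @ h)
        W = rng.standard_normal((d_out, h.shape[0])) * np.sqrt(2.0 / h.shape[0])
        outs[s] = W @ h
    return outs


n = 1000
nn_samples = random_relu_network_outputs(x, n)

# Gaussian reference: independent coordinates with the empirical output
# variance (a crude stand-in for the limiting Gaussian-process covariance
# at a single input).
sigma2 = nn_samples.var(axis=0).mean()
gp_samples = rng.standard_normal((n, d_out)) * np.sqrt(sigma2)

# Empirical quadratic Wasserstein distance between the two point clouds.
M = ot.dist(nn_samples, gp_samples)              # squared Euclidean costs
w2 = np.sqrt(ot.emd2(ot.unif(n), ot.unif(n), M))  # exact OT on the samples
print(f"empirical W2 estimate: {w2:.4f}")
```

With hidden widths of a few hundred, the printed estimate is typically small, consistent with bounds that shrink as the hidden layers grow; note, however, that finite-sample estimates of W2 carry their own statistical error, so the printed value is only indicative.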
Funding
A.B. acknowledges partial support from the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute), and support from the European Research Council (grant REAL 947908). D.T. acknowledges the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Pisa, CUP I57G22000700001, the Italian National Centre for HPC, Big Data and Quantum Computing - Proposal code CN1 CN00000013, CUP I53C22000690001, the PRIN 2022 Italian grant 2022WHZ5XH - “understanding the LEarning process of QUantum Neural networks (LeQun)”, CUP J53D23003890006, the INdAM-GNAMPA project 2023 “Teoremi Limite per Dinamiche di Discesa Gradiente Stocastica: Convergenza e Generalizzazione”, the INdAM-GNAMPA project 2024 “Tecniche analitiche e probabilistiche in informazione quantistica” and the project G24-202 “Variational methods for geometric and optimal matching problems” funded by Università Italo Francese. Research was also partly funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI”, funded by the European Commission under the NextGeneration EU programme.
Author information
Authors and Affiliations
Contributions
Both authors contributed equally to this work.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Consent for publication
The authors of this manuscript consent to its publication.
Additional information
Editor: Hendrik Blockeel
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Member of the INdAM GNAMPA group.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Basteri, A., Trevisan, D. Quantitative Gaussian approximation of randomly initialized deep neural networks. Mach Learn 113, 6373–6393 (2024). https://doi.org/10.1007/s10994-024-06578-z