Abstract
Given any deep fully connected neural network initialized with random Gaussian parameters, we bound from above the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities indicate how the sizes of the hidden and output layers affect the Gaussian behaviour of the network, and they quantitatively recover the distributional convergence results in the wide limit, i.e., when the sizes of all hidden layers become large.
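For the reader's convenience, we recall the standard (textbook) definition of the quadratic Wasserstein distance appearing in the abstract; this is not a result of the article. For probability measures $\mu, \nu$ on $\mathbb{R}^k$ with finite second moments,
$$
W_2(\mu,\nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^k \times \mathbb{R}^k} |x-y|^2 \, \mathrm{d}\pi(x,y) \Big)^{1/2},
$$
where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$.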
Code availability
The code is available upon request. Its implementation is discussed in Sect. 6.
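As a rough illustration of the kind of experiment discussed in Sect. 6 (this is a minimal sketch, not the authors' implementation: the single test input, ReLU activation, He-type Gaussian initialization, layer widths, and the use of the POT library are assumptions made here for illustration), the snippet below samples outputs of independently re-initialized fully connected networks at one input and estimates their quadratic Wasserstein distance to a Gaussian reference.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)

# Hypothetical architecture for illustration: one input, two hidden layers.
d_in, widths, d_out = 5, [256, 256], 3
x = rng.standard_normal(d_in)


def relu(z):
    return np.maximum(z, 0.0)


def random_relu_network_outputs(x, n_samples):
    """Outputs at x of independently re-initialized fully connected ReLU
    networks with He-type Gaussian weights (variance 2/fan_in), zero biases."""
    outs = np.empty((n_samples, d_out))
    for s in range(n_samples):
        h = x
        for width in widths:
            W = rng.standard_normal((width, h.shape[0])) * np.sqrt(2.0 / h.shape[0])
            h = relu(W @ h)
        W = rng.standard_normal((d_out, h.shape[0])) * np.sqrt(2.0 / h.shape[0])
        outs[s] = W @ h
    return outs


n = 1000
nn_samples = random_relu_network_outputs(x, n)

# Gaussian reference: independent coordinates with the empirical output
# variance (a crude stand-in for the limiting Gaussian-process covariance
# at a single input).
sigma2 = nn_samples.var(axis=0).mean()
gp_samples = rng.standard_normal((n, d_out)) * np.sqrt(sigma2)

# Empirical quadratic Wasserstein distance between the two point clouds.
M = ot.dist(nn_samples, gp_samples)              # squared Euclidean costs
w2 = np.sqrt(ot.emd2(ot.unif(n), ot.unif(n), M))  # exact OT on the samples
print(f"empirical W2 estimate: {w2:.4f}")
```

With hidden widths of a few hundred, the printed estimate is typically small, consistent with bounds that shrink as the hidden layers grow; note, however, that finite-sample estimates of W2 carry their own statistical error, so the printed value is only indicative.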
Funding
A.B. acknowledges partial support from the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute), and support from the European Research Council (grant REAL 947908). D.T. acknowledges the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Pisa, CUP I57G22000700001, the Italian National Centre for HPC, Big Data and Quantum Computing - Proposal code CN1 CN00000013, CUP I53C22000690001, the PRIN 2022 Italian grant 2022WHZ5XH - “understanding the LEarning process of QUantum Neural networks (LeQun)”, CUP J53D23003890006, the INdAM-GNAMPA project 2023 “Teoremi Limite per Dinamiche di Discesa Gradiente Stocastica: Convergenza e Generalizzazione”, the INdAM-GNAMPA project 2024 “Tecniche analitiche e probabilistiche in informazione quantistica” and the project G24-202 “Variational methods for geometric and optimal matching problems” funded by Università Italo Francese. Research was also partly funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI”, funded by the European Commission under the NextGeneration EU programme.
Author information
Authors and Affiliations
Contributions
Both authors contributed equally to this work.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Consent for publication
The authors of this manuscript consent to its publication.
Additional information
Editor: Hendrik Blockeel
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Member of the INdAM GNAMPA group.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Basteri, A., Trevisan, D. Quantitative Gaussian approximation of randomly initialized deep neural networks. Mach Learn 113, 6373–6393 (2024). https://doi.org/10.1007/s10994-024-06578-z