
Convergence of deep ReLU networks

Published: 12 April 2024

Abstract

We explore the convergence of deep neural networks with the popular ReLU activation function as the depth of the networks tends to infinity. To this end, we introduce the notions of activation domains and activation matrices of a ReLU network. By replacing each application of the ReLU activation function with multiplication by an activation matrix on the corresponding activation domain, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks with the convergence of a class of infinite products of matrices, and study necessary and sufficient conditions for such infinite products to converge. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions, in terms of the weight matrices and bias vectors at the hidden layers, for pointwise convergence of deep ReLU networks. These results provide mathematical insight into the convergence of deep neural networks. Experiments are conducted to numerically verify the results and to illustrate their potential usefulness in the initialization of deep neural networks.
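
The reformulation described in the abstract can be illustrated with a small numerical sketch. The NumPy snippet below is not taken from the paper; the width, depth, and decay rate of the perturbations are illustrative assumptions. It forms the diagonal 0/1 activation matrix of each layer and checks that multiplying by it reproduces the ordinary ReLU forward pass, with weight matrices drawn close to the identity and bias vectors close to zero in the spirit of the necessary conditions stated above.

```python
# Minimal sketch (assumed setup, not the paper's code): each ReLU layer acts,
# on the activation domain containing the current input, as multiplication by
# a diagonal 0/1 activation matrix, so the network output can be written as a
# product of matrices applied to the input.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 4, 20

# Weights near the identity and biases near zero, mimicking the necessary
# conditions for convergence as the depth grows (decay rate is an assumption).
Ws = [np.eye(width) + rng.normal(scale=1.0 / (k + 1) ** 2, size=(width, width))
      for k in range(depth)]
bs = [rng.normal(scale=1.0 / (k + 1) ** 2, size=width) for k in range(depth)]

def relu(z):
    return np.maximum(z, 0.0)

x = rng.normal(size=width)

# Forward pass with the ReLU nonlinearity.
h = x
for W, b in zip(Ws, bs):
    h = relu(W @ h + b)

# Same pass, with each ReLU replaced by its activation matrix
# D = diag(1[W g + b > 0]), valid on the activation domain containing g.
g = x
for W, b in zip(Ws, bs):
    pre = W @ g + b
    D = np.diag((pre > 0).astype(float))  # activation matrix of this layer
    g = D @ pre

print(np.allclose(h, g))  # True: the two expressions agree on this input
```

On the activation domain of a fixed input, the activation matrices are constant, so the network reduces to an affine map given by a product of matrices; convergence as the depth grows then hinges on the convergence of such infinite matrix products.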



Information

Published In

Neurocomputing, Volume 571, Issue C
Feb 2024
210 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Deep learning
  2. ReLU networks
  3. Activation domains
  4. Infinite product of matrices

Qualifiers

  • Research-article

