
Convergence of deep ReLU networks

Published: 12 April 2024

Abstract

We explore the convergence of deep neural networks with the popular ReLU activation function as the depth of the networks tends to infinity. To this end, we introduce the notions of activation domains and activation matrices of a ReLU network. By replacing each application of the ReLU activation function with multiplication by an activation matrix on the corresponding activation domain, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks with the convergence of a class of infinite products of matrices, and study necessary and sufficient conditions for such infinite products to converge. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions, in terms of the weight matrices and bias vectors at the hidden layers, for pointwise convergence of deep ReLU networks. These results provide mathematical insight into the convergence of deep neural networks. Experiments are conducted to numerically verify the results and to illustrate their potential usefulness in the initialization of deep neural networks.
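
The reformulation described in the abstract can be illustrated with a small numerical sketch. The NumPy snippet below is not taken from the paper; the width, depth, and decay rate of the perturbations are illustrative assumptions. It forms the diagonal 0/1 activation matrix of each layer and checks that multiplying by it reproduces the ordinary ReLU forward pass, with weight matrices drawn close to the identity and bias vectors close to zero in the spirit of the necessary conditions stated above.

```python
# Minimal sketch (assumed setup, not the paper's code): each ReLU layer acts,
# on the activation domain containing the current input, as multiplication by
# a diagonal 0/1 activation matrix, so the network output can be written as a
# product of matrices applied to the input.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 4, 20

# Weights near the identity and biases near zero, mimicking the necessary
# conditions for convergence as the depth grows (decay rate is an assumption).
Ws = [np.eye(width) + rng.normal(scale=1.0 / (k + 1) ** 2, size=(width, width))
      for k in range(depth)]
bs = [rng.normal(scale=1.0 / (k + 1) ** 2, size=width) for k in range(depth)]

def relu(z):
    return np.maximum(z, 0.0)

x = rng.normal(size=width)

# Forward pass with the ReLU nonlinearity.
h = x
for W, b in zip(Ws, bs):
    h = relu(W @ h + b)

# Same pass, with each ReLU replaced by its activation matrix
# D = diag(1[W g + b > 0]), valid on the activation domain containing g.
g = x
for W, b in zip(Ws, bs):
    pre = W @ g + b
    D = np.diag((pre > 0).astype(float))  # activation matrix of this layer
    g = D @ pre

print(np.allclose(h, g))  # True: the two expressions agree on this input
```

On the activation domain of a fixed input, the activation matrices are constant, so the network reduces to an affine map given by a product of matrices; convergence as the depth grows then hinges on the convergence of such infinite matrix products.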



Information

Published In

Neurocomputing, Volume 571, Issue C
Feb 2024
210 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Deep learning
  2. ReLU networks
  3. Activation domains
  4. Infinite product of matrices

Qualifiers

  • Research-article

