Abstract
Well-honed CNN architectures trained on massive labeled image datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed to extract their numerical characteristics and spatial distribution laws. The general characteristics are: (1) the weights of a single convolutional layer follow a symmetric power-law distribution; (2) the power exponent is relatively large at the center of the convolutional kernel and decreases radially outward; (3) the power exponents across layers span a continuous range from \(-0.5\) to \(-3.5\). Based on these findings, a weight initialization method is proposed to speed up convergence and improve the performance of CNN models. The proposed method is compared with several commonly used initialization schemes. Extensive experiments show that it improves the convergence speed of CNN models and raises model accuracy by 1–3%.
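The sketch below illustrates one way such an initializer could be realized for a PyTorch `Conv2d` weight, following the characteristics summarized above: per-kernel-position magnitudes drawn from a truncated power law, random signs for symmetry, and an exponent interpolated radially from the kernel center. The exponent schedule (reading "relatively large at the center" as the signed value \(-0.5\), falling to \(-3.5\) at the corners), the magnitude support `W_MIN`/`W_MAX`, and the `gain` rescaling are assumptions for illustration, not the authors' exact algorithm.

```python
# Illustrative sketch only, not the paper's exact algorithm: a radially varying,
# symmetric power-law initializer for a conv kernel, assuming a truncated
# power-law support [W_MIN, W_MAX] and a linear radial exponent schedule.
import math
import torch

W_MIN, W_MAX = 1e-3, 1.0  # assumed support of |w| for the truncated power law


def sample_truncated_power_law(shape, exponent, w_min=W_MIN, w_max=W_MAX):
    """Draw |w| with density p(|w|) ~ |w|**exponent on [w_min, w_max] via inverse CDF."""
    k = -exponent                      # density decays as |w|**(-k), k > 0
    u = torch.rand(shape)
    if abs(k - 1.0) < 1e-6:            # k = 1: log-uniform special case
        return w_min * (w_max / w_min) ** u
    a, b = w_min ** (1.0 - k), w_max ** (1.0 - k)
    return (a + u * (b - a)) ** (1.0 / (1.0 - k))


def power_law_init_(conv_weight, center_exp=-0.5, edge_exp=-3.5, gain=0.05):
    """In-place power-law init of a Conv2d weight of shape (out_ch, in_ch, kH, kW).

    The exponent is interpolated radially: `center_exp` at the kernel center,
    `edge_exp` at the corners (one plausible reading of the reported finding
    that the exponent decreases from the center outward).
    """
    out_ch, in_ch, kh, kw = conv_weight.shape
    cy, cx = (kh - 1) / 2.0, (kw - 1) / 2.0
    r_max = max(math.hypot(cy, cx), 1e-8)

    with torch.no_grad():
        for i in range(kh):
            for j in range(kw):
                r = math.hypot(i - cy, j - cx) / r_max          # 0 at center, 1 at corner
                exp_ij = center_exp + r * (edge_exp - center_exp)
                mag = sample_truncated_power_law((out_ch, in_ch), exp_ij)
                sign = torch.randint(0, 2, (out_ch, in_ch)) * 2 - 1  # symmetric signs
                conv_weight[:, :, i, j] = gain * sign * mag
    return conv_weight


if __name__ == "__main__":
    # Usage: apply to each convolutional layer before training.
    conv = torch.nn.Conv2d(3, 16, kernel_size=3)
    power_law_init_(conv.weight)
    print(conv.weight.abs().mean())
```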
Funding
This research is supported by the Young Scientists Fund of the National Natural Science Foundation of China (41706198) and the Qingdao Independent Innovation Major Special Project (21-1-2-1hy).
Ethics declarations
Data availability
The weight data for the pretrained models supporting the findings of this study are publicly available for download at https://pytorch.org/vision/stable/models.html.
Conflict of interest
The authors declare no potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiang, K., Liu, J., Xing, T. et al. Power-law initialization algorithm for convolutional neural networks. Neural Comput & Applic 35, 22431–22447 (2023). https://doi.org/10.1007/s00521-023-08881-7