Abstract
Well-honed CNN architectures trained on massive labeled image datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed to extract their numerical characteristics and spatial distribution laws. The general characteristics are: (1) the weights of a single convolutional layer follow a symmetric power-law distribution; (2) the power exponent is relatively large at the center of the convolutional kernel and decreases radially outward; (3) the power exponents across layers span a continuous range from \(-0.5\) to \(-3.5\). Based on these findings, a weight initialization method is proposed to speed up convergence and improve the performance of CNN models. The proposed method is compared with several commonly used initialization schemes. Extensive experiments show that it improves the convergence speed of CNN models and raises model accuracy by 1–3%.
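The sketch below illustrates one way such an initializer could be realized for a PyTorch `Conv2d` weight, following the characteristics summarized above: per-kernel-position magnitudes drawn from a truncated power law, random signs for symmetry, and an exponent interpolated radially from the kernel center. The exponent schedule (reading "relatively large at the center" as the signed value \(-0.5\), falling to \(-3.5\) at the corners), the magnitude support `W_MIN`/`W_MAX`, and the `gain` rescaling are assumptions for illustration, not the authors' exact algorithm.

```python
# Illustrative sketch only, not the paper's exact algorithm: a radially varying,
# symmetric power-law initializer for a conv kernel, assuming a truncated
# power-law support [W_MIN, W_MAX] and a linear radial exponent schedule.
import math
import torch

W_MIN, W_MAX = 1e-3, 1.0  # assumed support of |w| for the truncated power law


def sample_truncated_power_law(shape, exponent, w_min=W_MIN, w_max=W_MAX):
    """Draw |w| with density p(|w|) ~ |w|**exponent on [w_min, w_max] via inverse CDF."""
    k = -exponent                      # density decays as |w|**(-k), k > 0
    u = torch.rand(shape)
    if abs(k - 1.0) < 1e-6:            # k = 1: log-uniform special case
        return w_min * (w_max / w_min) ** u
    a, b = w_min ** (1.0 - k), w_max ** (1.0 - k)
    return (a + u * (b - a)) ** (1.0 / (1.0 - k))


def power_law_init_(conv_weight, center_exp=-0.5, edge_exp=-3.5, gain=0.05):
    """In-place power-law init of a Conv2d weight of shape (out_ch, in_ch, kH, kW).

    The exponent is interpolated radially: `center_exp` at the kernel center,
    `edge_exp` at the corners (one plausible reading of the reported finding
    that the exponent decreases from the center outward).
    """
    out_ch, in_ch, kh, kw = conv_weight.shape
    cy, cx = (kh - 1) / 2.0, (kw - 1) / 2.0
    r_max = max(math.hypot(cy, cx), 1e-8)

    with torch.no_grad():
        for i in range(kh):
            for j in range(kw):
                r = math.hypot(i - cy, j - cx) / r_max          # 0 at center, 1 at corner
                exp_ij = center_exp + r * (edge_exp - center_exp)
                mag = sample_truncated_power_law((out_ch, in_ch), exp_ij)
                sign = torch.randint(0, 2, (out_ch, in_ch)) * 2 - 1  # symmetric signs
                conv_weight[:, :, i, j] = gain * sign * mag
    return conv_weight


if __name__ == "__main__":
    # Usage: apply to each convolutional layer before training.
    conv = torch.nn.Conv2d(3, 16, kernel_size=3)
    power_law_init_(conv.weight)
    print(conv.weight.abs().mean())
```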
Funding
This research is supported by the Young Scientists Fund of the National Natural Science Foundation of China (41706198) and the Qingdao Independent Innovation Major Special Project (21-1-2-1hy).
Ethics declarations
Data availability
The weight data for the pretrained models supporting the findings of this study are publicly available for download at https://pytorch.org/vision/stable/models.html.
Conflict of interest
The authors declare no potential conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiang, K., Liu, J., Xing, T. et al. Power-law initialization algorithm for convolutional neural networks. Neural Comput & Applic 35, 22431–22447 (2023). https://doi.org/10.1007/s00521-023-08881-7