Power-law initialization algorithm for convolutional neural networks

  • Original Article
  • Neural Computing and Applications

Abstract

Well-honed CNN architectures trained on massive labeled image datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed to extract their numerical characteristics and spatial distribution law. The general characteristics are: (1) the weights of a single convolutional layer follow a symmetric power-law distribution; (2) the power exponent is largest at the center of the convolutional kernel and decreases radially toward the edges; (3) across layers, the power exponents span a continuous range from \(-0.5\) to \(-3.5\). Based on these findings, a weight initialization method is proposed to speed up convergence and improve the performance of CNN models. The proposed method is compared with several commonly used initialization methods. Extensive experiments show that it improves the convergence speed of CNN models and raises model accuracy by 1–3%.
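
The excerpt does not reproduce the paper's full algorithm, but the findings above suggest its general shape. Below is a minimal, illustrative PyTorch sketch of a power-law weight initializer, assuming a truncated symmetric power law sampled by inverse transform, a linear radial schedule for the exponent between the reported extremes (magnitude 3.5 at the kernel center, 0.5 at the border), and a final Kaiming-style rescaling; the function names, bounds, and parameter choices are hypothetical, not the authors' exact method.

```python
# Illustrative sketch only: a power-law initializer for a conv layer.
# The exponent schedule, truncation bounds, and rescaling are assumptions
# made for demonstration; they are not the paper's exact algorithm.
import math
import torch


def truncated_power_law(shape, alpha, x_min=1e-3, x_max=1.0):
    """Sample |w| from p(x) ~ x**(-alpha) on [x_min, x_max] by inverse
    transform, then attach a random sign to make the law symmetric."""
    u = torch.rand(shape)
    if abs(alpha - 1.0) < 1e-6:
        mag = x_min * (x_max / x_min) ** u                 # special case alpha == 1
    else:
        a, b = x_min ** (1.0 - alpha), x_max ** (1.0 - alpha)
        mag = (a + u * (b - a)) ** (1.0 / (1.0 - alpha))
    sign = torch.randint(0, 2, shape) * 2.0 - 1.0          # random +/-1
    return sign * mag


def power_law_init_(conv_weight, alpha_center=3.5, alpha_edge=0.5):
    """In-place power-law initialization of a 4-D conv weight (out, in, kH, kW).
    The exponent magnitude is largest at the kernel center and decays linearly
    with radial distance toward the border (an illustrative schedule)."""
    out_c, in_c, k_h, k_w = conv_weight.shape
    c_y, c_x = (k_h - 1) / 2.0, (k_w - 1) / 2.0
    r_max = max(math.hypot(c_y, c_x), 1e-8)
    for i in range(k_h):
        for j in range(k_w):
            r = math.hypot(i - c_y, j - c_x) / r_max       # 0 at center, 1 at corner
            alpha = alpha_center + (alpha_edge - alpha_center) * r
            conv_weight[:, :, i, j] = truncated_power_law((out_c, in_c), alpha)
    # Rescale to a Kaiming-like fan-in standard deviation (assumption).
    fan_in = in_c * k_h * k_w
    conv_weight *= math.sqrt(2.0 / fan_in) / conv_weight.std()
    return conv_weight


layer = torch.nn.Conv2d(64, 128, kernel_size=3)
with torch.no_grad():
    power_law_init_(layer.weight)
```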

Funding

This research is supported by the Young Scientists Fund of the National Natural Science Foundation of China (41706198) and the Qingdao Independent Innovation Major Special Project (21-1-2-1hy).

Author information

Corresponding author

Correspondence to Rencheng Sun.

Ethics declarations

Data availability

The weight data for the pretrained models supporting the findings of this study are publicly available for download at https://pytorch.org/vision/stable/models.html.
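
As a minimal illustration of how these publicly available weights can be retrieved for analysis, the sketch below loads one torchvision model and prints simple per-layer statistics of its convolutional kernels; the choice of resnet18 and of skewness/kurtosis as summary statistics is an assumption for demonstration, not the paper's exact pipeline.

```python
# Illustrative sketch only: load pretrained torchvision weights and summarize
# each convolutional layer. Model choice and statistics are assumptions.
import torch
from torchvision import models
from scipy import stats

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        w = module.weight.detach().flatten().numpy()
        print(f"{name}: n={w.size}, std={w.std():.4f}, "
              f"skew={stats.skew(w):.3f}, kurtosis={stats.kurtosis(w):.3f}")
```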

Conflict of interest

The authors declare no potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, K., Liu, J., Xing, T. et al. Power-law initialization algorithm for convolutional neural networks. Neural Comput & Applic 35, 22431–22447 (2023). https://doi.org/10.1007/s00521-023-08881-7
