A novel weight initialization with adaptive hyper-parameters for deep semantic segmentation

Nuhman Ul Haq¹,
Ahmad Khan¹,
Zia ur Rehman¹ (ORCID: orcid.org/0000-0001-5836-0488),
Ahmad Din¹,
Ling Shao² &
Sajid Shah¹

Abstract

Semantic segmentation divides an image into its constituent objects and background by assigning a class label to each pixel. It is an important area of computer vision with wide practical applications. Contemporary semantic segmentation approaches are primarily based on two types of deep neural network architectures: symmetric and asymmetric networks. Both types consist of several layers of neurons arranged in two sections, called the encoder and the decoder. The encoder receives the input image and the decoder outputs the segmented image. In symmetric networks, both sections have the same number of layers, and each encoder layer has the same number of neurons as its corresponding decoder layer; asymmetric networks do not strictly follow this one-to-one correspondence between encoder and decoder layers. At present, SegNet and ESNet are the two leading state-of-the-art symmetric encoder-decoder architectures. However, both require extensive training for good generalization and need several hundred epochs to converge. This paper aims to speed up convergence and improve generalization by introducing two novelties into the network training process: a weight initialization method and an adaptive mechanism that dynamically adjusts per-layer learning rates in the training loop. The proposed initialization technique uses transfer learning to initialize the encoder section of the network; the decoder section is then initialized by copying the weights of the encoder layers to the corresponding decoder layers.
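
The encoder-to-decoder copy described above can be illustrated with a minimal sketch. The abstract does not say how layers are paired or how shape mismatches are handled, so everything here is an assumption: layers are plain weight matrices stored as lists of lists, decoder layer j is paired with encoder layer n-1-j (mirror order), a matrix is transposed when the paired shapes are swapped, and a decoder layer with no compatible encoder layer keeps its own initialization. The names `mirror_encoder_to_decoder`, `shape`, `transpose`, and `random_matrix` are hypothetical helpers, and the random "pretrained" encoder merely stands in for weights obtained by transfer learning.

```python
import copy
import random


def shape(w):
    """Return (rows, cols) of a weight matrix stored as a list of lists."""
    return (len(w), len(w[0]))


def transpose(w):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*w)]


def mirror_encoder_to_decoder(encoder, decoder):
    """Return new decoder weights initialized from the encoder where shapes allow.

    encoder, decoder: lists of weight matrices ordered from input to output.
    Assumption (not from the paper): decoder layer j mirrors encoder layer n-1-j.
    """
    n = len(encoder)
    result = []
    for j, dec_w in enumerate(decoder):
        src = encoder[n - 1 - j] if j < n else None
        if src is not None and shape(src) == shape(dec_w):
            # Same shape: copy the encoder weights directly.
            result.append(copy.deepcopy(src))
        elif src is not None and shape(src) == shape(dec_w)[::-1]:
            # Swapped in/out sizes: copy the transposed encoder weights.
            result.append(transpose(src))
        else:
            # No compatible encoder layer: keep the decoder's own initialization.
            result.append(copy.deepcopy(dec_w))
    return result


def random_matrix(rows, cols):
    """Small random matrix used as a stand-in for real layer weights."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
# Stand-in for a pretrained encoder with (out, in) shapes 3 -> 4 -> 8.
encoder = [random_matrix(4, 3), random_matrix(8, 4)]
# Randomly initialized decoder with mirrored shapes 8 -> 4 -> 3.
decoder = [random_matrix(4, 8), random_matrix(3, 4)]

decoder_init = mirror_encoder_to_decoder(encoder, decoder)
print([shape(w) for w in decoder_init])
```

In a real convolutional network the same idea would apply per kernel tensor rather than per matrix, and the final classification layer would usually fall into the "keep own initialization" branch.
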
The second contribution of the paper is an adaptive per-layer learning rate method, in which the learning rates of the encoder layers are updated based on a metric representing the difference between the probability distributions of the input images and the encoder weights. Likewise, the learning rates of the decoder layers are updated based on the difference between the probability distributions of the output labels and the decoder weights. Extensive empirical validation of the proposed approach shows significant improvement in convergence speed and generalization.
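
As a rough illustration of a distribution-driven layer learning rate, the sketch below compares normalized histograms of a reference signal (input pixels for an encoder layer, output labels for a decoder layer) and of the layer's weights. The abstract names neither the metric nor the update rule, so both are assumptions here: Kullback-Leibler divergence is used as a stand-in metric, and the rule `base_lr * (1 + scale * divergence)` is a hypothetical choice made only for demonstration. The function names and the `bins` and `scale` parameters are likewise hypothetical.

```python
import math
import random


def histogram(values, bins=16):
    """Normalized histogram of values after min-max scaling to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant inputs
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    eps = 1e-8  # smoothing so that no bin has zero probability
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def layer_learning_rate(base_lr, reference, weights, scale=1.0):
    """Hypothetical rule: raise the layer's rate as the distributions diverge."""
    d = kl_divergence(histogram(reference), histogram(weights))
    return base_lr * (1.0 + scale * d)


random.seed(0)
base_lr = 0.01
# Stand-ins: uniformly distributed pixel intensities and Gaussian layer weights.
pixels = [random.random() for _ in range(1000)]
enc_weights = [random.gauss(0.0, 0.05) for _ in range(1000)]

lr_encoder = layer_learning_rate(base_lr, pixels, enc_weights)
print(lr_encoder)
```

In a training loop, a value like `lr_encoder` would be recomputed periodically and assigned to the optimizer's setting for that layer; decoder layers would use the label distribution as the reference instead of the pixels.
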

Author information

Authors and Affiliations

COMSATS University Islamabad (CUI), Abbottabad Campus, University Road, Tobe Camp, Abbottabad, Pakistan
Nuhman Ul Haq, Ahmad Khan, Zia ur Rehman, Ahmad Din & Sajid Shah
Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Ling Shao

Corresponding author

Correspondence to Zia ur Rehman.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Haq, N.U., Khan, A., Rehman, Z.u. et al. A novel weight initialization with adaptive hyper-parameters for deep semantic segmentation. Multimed Tools Appl 80, 21771–21787 (2021). https://doi.org/10.1007/s11042-021-10510-1

Received: 09 December 2019
Revised: 27 October 2020
Accepted: 05 January 2021
Published: 20 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-021-10510-1
