Abstract
Semantic image segmentation is a key task in computer vision. A major challenge is that feature extraction loses information from the image and weakens context information, which leads to coarse segmentation results. This paper therefore proposes a novel semantic segmentation model built on an encoder–decoder structure. The model consists of two parallel branches. The first branch is a Laplacian pyramid network, which stores residual features that are fed into the second branch to restore the lost information. The second branch is the backbone segmentation network, realized as an encoder–decoder. In the encoder stage, we use ResNet-50 to extract features and replace traditional convolution with deformable convolution, which makes object contours clearer after segmentation. In the decoder stage, we apply spatial attention with filtering and centralization to obtain a consistent spatial attention matrix, which captures the correlation between pixels and acquires context information. We also design a weighted-sum module for the feature maps, which performs an element-wise weighted sum of each decoder feature and the corresponding residual feature from the Laplacian pyramid. This module further recovers boundary and detail information. The proposed model produces dense feature predictions and improves segmentation accuracy. Experimental results show that the model reaches 91.2% average accuracy and 65.0% mIoU on the CamVid dataset, which demonstrates the rationality and effectiveness of the proposed model.
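As a rough illustration of why Laplacian pyramid residuals can restore information lost by downsampling, the sketch below builds a small pyramid on a toy single-channel image and reconstructs the image exactly from the residuals. It is not the authors' implementation: 2x2 average pooling and nearest-neighbour upsampling stand in for the Gaussian filtering of a classical pyramid, plain Python lists are used to avoid dependencies, and the function names and the fusion weight `alpha` are hypothetical.

```python
# Illustrative sketch only. Average pooling and nearest-neighbour upsampling
# are simplifying assumptions; all names and the weight alpha are hypothetical.

def downsample(img):
    """Halve height and width by averaging each 2x2 block (even sizes assumed)."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def upsample(img):
    """Double height and width by repeating each pixel."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def laplacian_pyramid(img, levels):
    """Return (residuals, coarsest); residuals are ordered finest first."""
    residuals = []
    current = img
    for _ in range(levels):
        coarse = downsample(current)
        approx = upsample(coarse)
        # The residual holds exactly the detail that downsampling discarded.
        residuals.append(
            [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(current, approx)]
        )
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert the pyramid: upsample, then add back each stored residual."""
    current = coarsest
    for residual in reversed(residuals):
        up = upsample(current)
        current = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(up, residual)]
    return current


def weighted_sum(decoder_feat, residual_feat, alpha=0.5):
    """Hypothetical fusion: alpha * decoder + (1 - alpha) * residual, per element."""
    return [
        [alpha * d + (1.0 - alpha) * r for d, r in zip(rd, rr)]
        for rd, rr in zip(decoder_feat, residual_feat)
    ]


# Toy 4x4 image with small integer values.
image = [[float((r * 4 + c) % 7) for c in range(4)] for r in range(4)]
residuals, coarsest = laplacian_pyramid(image, levels=2)
restored = reconstruct(residuals, coarsest)
```

In this toy setting `restored` matches `image`, which shows the residuals carry everything the coarse levels drop. In the model described above, the residual at each scale would instead be fused with the decoder feature of the same resolution, for which `weighted_sum` is only a stand-in with a fixed scalar weight.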
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Chen, Y. Semantic Image Segmentation with Feature Fusion Based on Laplacian Pyramid. Neural Process Lett 54, 4153–4170 (2022). https://doi.org/10.1007/s11063-022-10801-0
DOI: https://doi.org/10.1007/s11063-022-10801-0