Salient feature network for semantic segmentation

Yan Zhou ORCID: orcid.org/0000-0002-2372-4947¹,
Zhen Zhou¹,
Haibin Zhou²,
Jinzhen Mu³ &
Dongli Wang¹

Abstract

In the encoding stage, some existing semantic segmentation networks capture rich multi-scale context information. However, these multi-scale approaches ignore the correlation between feature maps of different scales when the multi-scale features are fused. In the decoding stage, a simple fusion of high- and low-dimensional channels is used to improve segmentation, but simple fusion leaves segmentation boundaries insufficiently clear. In this paper, a salient feature network is proposed to address these two shortcomings. For the first, an atrous spatial pyramid pooling with Euclidean distance similarity (EDS-ASPP) module is proposed to enhance the representation of high-level semantic features, that is, to boost meaningful features while suppressing weak ones. This module thereby addresses segmentation errors inside objects. For the second, a supplementary details (SD) module is proposed to rearrange the low-level spatial details and the activation map obtained from the EDS-ASPP module in the decoding stage. This module repairs the edge details lost during downsampling. The proposed model achieves 73.45% mIoU on PASCAL VOC2012 and 64.27% mIoU on Cityscapes.
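The mIoU figures quoted in the abstract are mean intersection-over-union scores. As a minimal sketch of how this metric is commonly computed (not the authors' evaluation code), the example below accumulates a confusion matrix over flattened label maps and averages the per-class IoU. The function names, the toy labels, and the use of 255 as an "ignore" label are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels by (ground-truth class, predicted class)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention: unlabeled pixels are skipped
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six pixels, three classes, flattened label maps.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {miou:.4f}")
```

In practice the confusion matrix is usually accumulated over every image in the validation set before the per-class IoU is taken, so that small images do not weigh the same as large ones.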

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61773330), the National Key R&D Program of China (SQ2020YFA070205), the Project of Shanghai Municipal Science and Technology Commission (19511120900), the Research Project of the Department of Education of Hunan Province (19C1740), and the Xiangtan University Innovation Foundation for Postgraduate (XDCX2020B083).

Author information

Authors and Affiliations

School of Automation and Electronic Information, Xiangtan University, Xiangtan, 411105, China
Yan Zhou, Zhen Zhou & Dongli Wang
School of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
Haibin Zhou
Shanghai Aerospace Control Technology Institute, Shanghai, 201109, China
Jinzhen Mu

Corresponding author

Correspondence to Yan Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, Y., Zhou, Z., Zhou, H. et al. Salient feature network for semantic segmentation. SIViP 16, 763–771 (2022). https://doi.org/10.1007/s11760-021-02016-y

Received: 25 October 2020
Revised: 18 May 2021
Accepted: 18 August 2021
Published: 16 September 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11760-021-02016-y
