[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

BG-Net: boundary-guidance network for object consistency maintaining in semantic segmentation

  • Original article
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

Semantic segmentation suffers from boundary shift and shape deformation problems due to the neglect of overall guidance information. Motivated by that the object boundaries have a stronger representation for overall information of target objects, we propose a simple yet effective network, boundary-guidance network (BG-Net), for object consistency maintaining in semantic segmentation. In our work, we explore the pixels both near and far from the boundary, characterizing each pixel by utilizing pixel-boundary association. The boundary feature is integrated into the segmentation feature to mitigate boundary shift and shape deformation problem. We explicitly supervise the angle and distance information of pixels pointing to the nearest object boundary. Then the association can be learned by geometric modelling. Meanwhile, the low-level feature emphasized up-sampling (LFEU) module is designed to supplement the detail representation in high-level feature without direct interference. Finally, we evaluate our method on Cityscapes and CamVid datasets. The experimental results demonstrate the superiority of our BG-Net.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Data Availability

The datasets used in this paper are public datasets and can be obtained by contacting the relevant providers.

References

  1. Peng, G., Yang, S., Wang, H.: Refine for semantic segmentation based on parallel convolutional network with attention model. Neural Process. Lett. 53(6), 4177–4188 (2021)

    Article  Google Scholar 

  2. Liu, S., Ye, H., Jin, K., Cheng, H.: Ct-unet: context-transfer-unet for building segmentation in remote sensing images. Neural Process. Lett. 53(6), 4257–4277 (2021)

    Article  Google Scholar 

  3. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010)

    Article  Google Scholar 

  4. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) European Conference on Computer Vision. Springer, Cham (2014)

    Google Scholar 

  5. Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W., Fidler, S., Yuille, A.: The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 891-898 (2014)

  6. Neuhold, G., Ollmann, T., Bulo, S.R., Kontschieder, P.: The mapillary vistas dataset for semantic understanding of street scenes. In: IEEE International Conference on Computer Vision (2017)

  7. Abu Alhaija, H., Mustikovela, S.K., Mescheder, L., Geiger, A., Rother, C.: Augmented reality meets computer vision : efficient data generation for urban driving scenes. Int. J. Comput. Vis. 2, 1–12 (2017)

    Google Scholar 

  8. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  9. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. PAMI 40(4), 834–848 (2018)

    Article  Google Scholar 

  10. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)

  11. Chen, L.C., Papandreou, G., Scroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. CoRR (2017) arXiv:1706.05587

  12. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)

  13. Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. arXiv preprint arXiv:1909.11065 (2019)

  14. Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., Agrawal, A.: Context encoding for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160 (2018)

  15. Zhang, H., Zhang, H., Wang, C., Xie, J.: Co-occurrent features in semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 548–557 (2019)

  16. Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., Liu, H.: Expectation-maximization attention networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9167–9176 (2019)

  17. He, J., Deng, Z., Zhou, L., Wang, Y., Qiao, Y.: Adaptive pyramid context network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7519–7528 (2019)

  18. Zhang, D., Zhang, H., Tang, J., Hua, X.-S., Sun, Q.: Self-regulation for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6953–6963 (2021)

  19. Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6881–6890 (2021)

  20. Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2613–2622 (2021)

  21. Acuna, D., Kar, A., Fidler, S.: Devil is in the edges: Learning semantic boundaries from noisy annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11075–11083 (2019)

  22. Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11632–11640 (2019)

  23. Sun, K., Zhao, Y., Jiang, B., Cheng, T., Xiao, B., Liu, D., Mu, Y., Wang, X., Liu, W., Wang, J.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)

  24. Ke, T.-W., Hwang, J.-J., Liu, Z., Yu, S.X.: Adaptive affinity fields for semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 587–602 (2018)

  25. Liu, Y., Cheng, M.-M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3000–3009 (2017)

  26. Ding, H., Jiang, X., Liu, A.Q., Thalmann, N.M., Wang, G.: Boundary-aware feature propagation for scene segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6819–6829 (2019)

  27. Fieraru, M., Khoreva, A., Pishchulin, L., Schiele, B.: Learning to refine human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 205–214 (2018)

  28. Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 603–612 (2019)

  29. Kuo, W., Angelova, A., Malik, J., Lin, T.-Y.: Shapemask: Learning to segment novel objects by refining shape priors. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9207–9216 (2019)

  30. Kirillov, A., Wu, Y., He, K., Girshick, R.: Pointrend: Image segmentation as rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9799–9808 (2020)

  31. Yuan, Y., Xie, J., Chen, X., Wang, J.: Segfix: Model-agnostic boundary refinement for segmentation. In: European Conference on Computer Vision, pp. 489–506. Springer, (2020)

  32. Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)

    Article  Google Scholar 

  33. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)

  34. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer, (2015)

  35. Peng, X., Feris, R.S., Wang, X., Metaxas, D.N.: A recurrent encoder-decoder network for sequential face alignment. In: European Conference on Computer Vision, pp. 38–56. Springer, (2016)

  36. Wu, H., Zhang, J., Huang, K., Liang, K., Yu, Y.: Fastfcn: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv preprint arXiv:1903.11816 (2019)

  37. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6399–6408 (2019)

  38. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818 (2018)

  39. Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)

  40. Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: Denseaspp for semantic segmentation in street scenes. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3684–3692 (2018)

  41. Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., Lu, H.: Dual attention network for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2019)

  42. Zhu, Z., Xu, M., Bai, S., Huang, T., Bai, X.: Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 593–602 (2019)

  43. Zhang, D., Zhang, H., Tang, J., Wang, M., Hua, X., Sun, Q.: Feature pyramid transformer. arXiv preprint arXiv:2007.09451 (2020)

  44. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)

  45. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  46. Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., Urtasun, R.: Upsnet: A unified panoptic segmentation network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8818–8826 (2019)

  47. Liao, X., Yin, J., Chen, M., Qin, Z.: Adaptive payload distribution in multiple images steganography based on image texture features. IEEE Trans. Dependable Secur. Comput. (2020). https://doi.org/10.1109/TDSC.2020.3004708

    Article  Google Scholar 

  48. Liao, X., Yu, Y., Li, B., Li, Z., Qin, Z.: A new payload partition strategy in color image steganography. IEEE Trans. Circ. Syst. Video Technol. 30(3), 685–696 (2019)

    Article  Google Scholar 

  49. Liao, X., Li, K., Zhu, X., Liu, K.R.: Robust detection of image operator chain with two-stream convolutional neural network. IEEE J. Select. Topics Signal Process. 14(5), 955–968 (2020)

    Article  Google Scholar 

  50. Sun, Y., Chen, Q., He, X., Wang, J., Feng, H., Han, J., Ding, E., Cheng, J., Li, Z., Wang, J.: Singular value fine-tuning: Few-shot segmentation requires few-parameters fine-tuning. arXiv:2206.06122 [cs.CV] (2022)

  51. Li, Z., Sun, Y., Zhang, L., Tang, J.: Ctnet: context-based tandem network for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2021). https://doi.org/10.1109/TPAMI.2021.3132068

    Article  Google Scholar 

  52. Sun, Y., Li, Z.: Ssa: Semantic structure aware inference for weakly pixel-wise dense predictions without cost. arXiv preprint arXiv:2111.03392 (2021)

  53. Ma, Z., Yuan, M., Gu, J., Meng, W., Xu, S., Zhang, X.: Triple-strip attention mechanism-based natural disaster images classification and segmentation. Vis. Comput. 38(9), 3163–3173 (2022)

    Article  Google Scholar 

  54. Fu, Y., Chen, Q., Zhao, H.: CGFNet: cross-guided fusion network for RGB-thermal semantic segmentation. Vis. Comput. 38(9–10), 3243–3252 (2022)

    Article  Google Scholar 

  55. Liu, T., Cai, Y., Zheng, J., Thalmann, N.M.: Beacon: a boundary embedded attentional convolution network for point cloud instance segmentation. Vis. Comput. 38(7), 2303–2313 (2022)

    Article  Google Scholar 

  56. Takikawa, T., Acuna, D., Jampani, V., Fidler, S.: Gated-scnn: Gated shape cnns for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5229–5238 (2019)

  57. Li, X., Li, X., Zhang, L., Cheng, G., Shi, J., Lin, Z., Tan, S., Tong, Y.: Improving semantic segmentation via decoupled body and edge supervision. arXiv preprint arXiv:2007.10035 (2020)

  58. Roy, K., Sahay, R.R.: A robust multi-scale deep learning approach for unconstrained hand detection aided by skin segmentation. Vis. Comput. 38(8), 2801–2825 (2022)

    Article  Google Scholar 

  59. Jiang, M., Zhai, F., Kong, J.: Sparse attention module for optimizing semantic segmentation performance combined with a multi-task feature extraction network. Vis. Comput. 38(7), 2473–2488 (2022)

    Article  Google Scholar 

  60. Lyu, C., Hu, G., Wang, D.: Attention to fine-grained information: hierarchical multi-scale network for retinal vessel segmentation. Vis. Comput. (2020). https://doi.org/10.1007/s00371-020-02018-w

    Article  Google Scholar 

  61. Cheng, Z., Qu, A., He, X.: Contour-aware semantic segmentation network with spatial attention mechanism for medical image. Vis. Comput. 38(3), 749–762 (2022)

    Article  Google Scholar 

  62. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected crfs with gaussian edge potentials. In: Advances in Neural Information Processing Systems, pp. 109–117 (2011)

  63. Dias, P.A., Medeiros, H.: Semantic segmentation refinement by monte carlo region growing of high confidence detections. In: Asian Conference on Computer Vision, pp. 131–146. Springer, (2018)

  64. Zhang, C., Lin, G., Liu, F., Yao, R., Shen, C.: Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5217–5226 (2019)

  65. Fan, M., Lai, S., Huang, J., Wei, X., Chai, Z., Luo, J., Wei, X.: Rethinking BiSeNet for real-time semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9716-9725 (2021)

  66. Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28 (2015)

  67. Kimmel, R., Kiryati, N., Bruckstein, A.M.: Sub-pixel distance maps and weighted distance transforms. J. Math. Imaging Vis. 6(2), 223–233 (1996)

    Article  MathSciNet  Google Scholar 

  68. Brostow, G.J., Shotton, J., Fauqueur, J., Cipolla, R.: Segmentation and recognition using structure from motion point clouds. In: ECCV (1), pp. 44–57 (2008)

  69. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  70. Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

  71. Tao, A., Sapra, K., Catanzaro, B.: Hierarchical multi-scale attention for semantic segmentation. arXiv preprint arXiv:2005.10821 (2020)

  72. Chen, Y., Kalantidis, Y., Li, J., Yan, S., Feng, J.: \(A^2\)-nets: Double attention networks. Adv. Neural Inform. Process. Syst. 31 (2018)

  73. Amirul Islam, M., Rochan, M., Bruce, N.D., Wang, Y.: Gated feedback refinement network for dense image labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3751–3759 (2017)

  74. Mohan, R., Valada, A.: Efficientps: Efficient panoptic segmentation. arXiv preprint arXiv:2004.02307 (2020)

  75. Cheng, B., Collins, M.D., Zhu, Y., Liu, T., Huang, T.S., Adam, H., Chen, L.-C.: Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12475–12485 (2020)

  76. Chen, L.-C., Lopes, R.G., Cheng, B., Collins, M.D., Cubuk, E.D., Zoph, B., Adam, H., Shlens, J.: Naive-student: Leveraging semi-supervised learning in video sequences for urban scene segmentation. European Conference on Computer Vision, ECCV 2020, pp. 695–714

  77. Kundu, A., Vineet, V., Koltun, V.: Feature space optimization for semantic video segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3168–3175 (2016)

  78. Bilinski, P., Prisacariu, V.: Dense decoder shortcut connections for single-pass semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6596–6605 (2018)

  79. Chandra, S., Couprie, C., Kokkinos, I.: Deep spatio-temporal random fields for efficient video segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8915–8924 (2018)

  80. Zhu, Y., Sapra, K., Reda, F.A., Shih, K.J., Newsam, S., Tao, A., Catanzaro, B.: Improving semantic segmentation via video propagation and label relaxation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8856–8865 (2019)

Download references

Acknowledgements

This study was funded by Zhejiang Provincial Key Laboratory of Harmonized Application of Vision & Transmission, No. 1199, Bin’an Road, Binjiang District, Hangzhou, China.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bingyan Liao.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cheng, X., Huang, S., Liao, B. et al. BG-Net: boundary-guidance network for object consistency maintaining in semantic segmentation. Vis Comput 40, 373–391 (2024). https://doi.org/10.1007/s00371-023-02787-0

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-023-02787-0

Keywords

Navigation