Credible Dual-Expert Learning for Weakly Supervised Semantic Segmentation

International Journal of Computer Vision

Abstract

Great progress has been made in weakly supervised semantic segmentation, which aims to segment objects without dense pixel annotations. Most approaches concentrate on generating high-quality pseudo labels, which are then fed into a standard segmentation model as supervision. However, such a solution has one major limitation: noise in the pseudo labels is inevitable, and a standard segmentation model has no mechanism to handle it. In this paper, we propose a credible dual-expert learning (CDL) framework to mitigate the noise in pseudo labels. Specifically, we first observe that model predictions obtained with different optimization loss functions have different credible regions; thus, it is possible to make self-corrections using multiple predictions. Based on this observation, we design a dual-expert structure to mine credible predictions, which are then processed by our noise correction module to update the pseudo labels online. Meanwhile, to handle the case in which both experts produce non-credible predictions for the same region, we design a relationship transfer module that provides feature relationships, enabling our noise correction module to transfer predictions from credible regions to such non-credible regions. Building on these designs, we propose a base CDL network and an extended CDL network to satisfy different requirements. Extensive experiments show that directly replacing the conventional fully supervised segmentation model with our model boosts the performance of various weakly supervised semantic segmentation pipelines, achieving new state-of-the-art results on both PASCAL VOC 2012 and MS COCO by a clear margin. Code will be available at: https://github.com/zbf1991/CDL.
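
The idea of correcting pseudo labels from two sets of predictions can be illustrated with a minimal sketch. This is not the CDL implementation: the function name `correct_pseudo_labels`, the confidence threshold of 0.9, the use of maximum softmax probability as the credibility criterion, and the rule for resolving disagreements are all assumptions made for illustration only.

```python
import numpy as np


def correct_pseudo_labels(prob_a, prob_b, pseudo, threshold=0.9, ignore_index=255):
    """Hypothetical pseudo-label update from two experts' predictions.

    prob_a, prob_b: (C, H, W) per-class probabilities from the two experts.
    pseudo:         (H, W) integer pseudo labels.
    A pixel counts as "credible" for an expert when its top probability
    reaches `threshold` (an assumed criterion, not taken from the paper).
    """
    conf_a, pred_a = prob_a.max(axis=0), prob_a.argmax(axis=0)
    conf_b, pred_b = prob_b.max(axis=0), prob_b.argmax(axis=0)
    cred_a = conf_a >= threshold
    cred_b = conf_b >= threshold

    updated = pseudo.copy()

    # Both experts are credible and agree: adopt the shared prediction.
    agree = cred_a & cred_b & (pred_a == pred_b)
    updated[agree] = pred_a[agree]

    # Exactly one expert is credible: adopt that expert's prediction.
    only_a = cred_a & ~cred_b
    only_b = cred_b & ~cred_a
    updated[only_a] = pred_a[only_a]
    updated[only_b] = pred_b[only_b]

    # Both credible but in conflict: exclude the pixel from supervision.
    conflict = cred_a & cred_b & (pred_a != pred_b)
    updated[conflict] = ignore_index

    # Neither expert credible: the original pseudo label is left unchanged.
    return updated
```

In this sketch, pixels where neither expert is credible simply keep their original pseudo label. The abstract describes a relationship transfer module for that case, which is not modelled here.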

Data Availability Statement

We use two datasets: PASCAL VOC 2012 (Everingham et al., 2010) with SBD (Hariharan et al., 2011), and MS COCO-2014 (Lin et al., 2014). All of them are publicly available. The PASCAL VOC 2012 dataset analyzed during the current study is available in the PASCAL repository (Everingham et al., 2010) at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html. The SBD dataset (Hariharan et al., 2011) is available at http://home.bharathh.info/pubs/codes/SBD/download.html. The MS COCO-2014 dataset analyzed during the current study is available in the COCO repository (Lin et al., 2014) at https://cocodataset.org/.

References

  • Ahn, J., & Kwak, S. (2018). Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4981–4990).

  • Ahn, J., Cho, S., & Kwak, S. (2019). Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Asgari Taghanaki, S., Abhishek, K., Cohen, J. P., Cohen-Adad, J., & Hamarneh, G. (2021). Deep semantic segmentation of natural and medical images: A review. Artificial Intelligence Review, 54(1), 137–178.

  • Bearman, A., Russakovsky, O., Ferrari, V., & Fei-Fei, L. (2016). What’s the point: Semantic segmentation with point supervision. In Proceedings of the European conference on computer vision (pp. 549–565).

  • Chang, Y. T., Wang, Q., Hung, W. C., Piramuthu, R., Tsai, Y. H., & Yang, M. H. (2020). Weakly-supervised semantic segmentation via sub-category exploration. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8991–9000).

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.

  • Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.

  • Chen, L., Wu, W., Fu, C., Han, X., & Zhang, Y. (2020). Weakly supervised semantic segmentation with boundary exploration. In Proceedings of the European conference on computer vision (pp. 347–362).

  • Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (pp. 801–818).

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.

  • Dai, J., He, K., & Sun, J. (2015). Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1635–1643).

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248–255).

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Fan, J., Zhang, Z., & Tan, T. (2020). Employing multi-estimations for weakly-supervised semantic segmentation. In Proceedings of the European conference on computer vision.

  • Fan, J., Zhang, Z., Song, C., & Tan, T. (2020). Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4283–4292).

  • Gao, S. H., Cheng, M. M., Zhao, K., Zhang, X. Y., Yang, M. H., & Torr, P. (2021). Res2net: A new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2019.2938758

  • Hariharan, B., Arbeláez, P., Bourdev, L., Maji, S., & Malik, J. (2011). Semantic contours from inverse detectors. In Proceedings of the IEEE international conference on computer vision (pp. 991–998).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Huang, Z., Wang, X., Wang, J., Liu, W., & Wang, J. (2018). Weakly-supervised semantic segmentation network with deep seeded region growing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7014–7023).

  • Jadon, S. (2020). A survey of loss functions for semantic segmentation. In 2020 IEEE conference on computational intelligence in bioinformatics and computational biology (CIBCB) (pp. 1–7).

  • Khoreva, A., Benenson, R., Hosang, J., Hein, M., & Schiele, B. (2017). Simple does it: Weakly supervised instance and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 876–885).

  • Krähenbühl, P., & Koltun, V. (2013). Parameter learning and convergent inference for dense random fields. In International conference on machine learning (pp. 513–521).

  • Kulharia, V., Chandra, S., Agrawal, A., Torr, P., & Tyagi, A. (2020). Box2seg: Attention weighted loss and discriminative feature learning for weakly supervised segmentation. In Proceedings of the European conference on computer vision (pp. 290–308).

  • Kweon, H., Yoon, S. H., Kim, H., Park, D., & Yoon, K. J. (2021). Unlocking the potential of ordinary classifier: Class-specific adversarial erasing framework for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6994–7003).

  • Lee, J., Choi, J., Mok, J., & Yoon, S. (2021). Reducing information bottleneck for weakly supervised semantic segmentation. Advances in Neural Information Processing Systems, 34, 27408–27421.

  • Lee, J., Kim, E., & Yoon, S. (2021). Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4071–4080).

  • Lee, J., Kim, E., Lee, S., Lee, J., & Yoon, S. (2019). Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference. arXiv preprint arXiv:1902.10421.

  • Lee, S., Lee, M., Lee, J., & Shim, H. (2021). Railroad is not a train: Saliency as pseudo-pixel supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5495–5505).

  • Lee, J., Yi, J., Shin, C., & Yoon, S. (2021). Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2643–2652).

  • Li, Y., Kuang, Z., Liu, L., Chen, Y., & Zhang, W. (2021). Pseudo-mask matters in weakly-supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6964–6973).

  • Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., & Liu, H. (2019). Expectation-maximization attention networks for semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9167–9176).

  • Lin, D., Dai, J., Jia, J., He, K., & Sun, J. (2016). Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3159–3167).

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117–2125).

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Proceedings of the European conference on computer vision (pp. 740–755).

  • Liu, Y., Wu, Y. H., Wen, P., Shi, Y., Qiu, Y., & Cheng, M. M. (2020). Leveraging instance-, image- and dataset-level information for weakly supervised instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2020.3023152

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).

  • Luo, W., Yang, M., & Zheng, W. (2021). Weakly-supervised semantic segmentation with saliency and incremental supervision updating. Pattern Recognition, 115, 107858.

  • Milletari, F. (2018). Hough voting strategies for segmentation, detection and tracking (Ph.D. Thesis, Technische Universität München).

  • Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 4th international conference on 3D vision (3DV) (pp. 565–571).

  • Nakashima, K. (2017). Deeplab with pytorch. https://github.com/kazuto1011/deeplab-pytorch.

  • Oh, Y., Kim, B., & Ham, B. (2021). Background-aware pooling and noise-aware loss for weakly-supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6913–6922).

  • Pan, J., Zhu, P., Zhang, K., Cao, B., Wang, Y., Zhang, D., Han, J., & Hu, Q. (2022). Learning self-supervised low-rank network for single-stage weakly and semi-supervised semantic segmentation. International Journal of Computer Vision, 130(5), 1181–1195.

  • Pu, M., Huang, Y., Guan, Q., & Zou, Q. (2018). Graphnet: Learning image pseudo annotations for weakly-supervised semantic segmentation. In Proceedings of the 26th ACM international conference on multimedia (pp. 483–491).

  • Ru, L., Du, B., Zhan, Y., & Wu, C. (2022). Weakly-supervised semantic segmentation with visual words learning and hybrid pooling. International Journal of Computer Vision, 130(4), 1127–1144.

  • Shimoda, W., & Yanai, K. (2019). Self-supervised difference detection for weakly-supervised semantic segmentation. In Proceedings of the IEEE international conference on computer vision (pp. 5208–5217).

  • Song, C., Huang, Y., Ouyang, W., & Wang, L. (2019). Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation. arXiv preprint arXiv:1904.11693.

  • Su, Y., Sun, R., Lin, G., & Wu, Q. (2021). Context decoupling augmentation for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7004–7014).

  • Sun, K., Shi, H., Zhang, Z., & Huang, Y. (2021). Ecs-net: Improving weakly supervised semantic segmentation by using connections between class activation maps. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7283–7292).

  • Sun, G., Wang, W., Dai, J., & Van Gool, L. (2020). Mining cross-image semantics for weakly supervised semantic segmentation. In Proceedings of the European conference on computer vision (pp. 347–365).

  • Tang, M., Perazzi, F., Djelouah, A., Ben Ayed, I., Schroers, C., & Boykov, Y. (2018). On regularized losses for weakly-supervised cnn segmentation. In Proceedings of the European conference on computer vision (pp. 507–522).

  • Wang, B., Qi, G., Tang, S., Zhang, T., Wei, Y., Li, L., & Zhang, Y. (2019). Boundary perception guidance: A scribble-supervised semantic segmentation approach. In International Joint Conference on Artificial Intelligence.

  • Wang, Y., Zhang, J., Kan, M., Shan, S., & Chen, X. (2020). Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. arXiv preprint arXiv:2004.04581.

  • Wang, X., Liu, S., Ma, H., & Yang, M. H. (2020). Weakly-supervised semantic segmentation by iterative affinity learning. International Journal of Computer Vision, 128(6), 1736–1749.

  • Wei, Y., Feng, J., Liang, X., Cheng, M. M., Zhao, Y., & Yan, S. (2017). Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1568–1576).

  • Wei, Y., Xiao, H., Shi, H., Jie, Z., Feng, J., & Huang, T. S. (2018). Revisiting dilated convolution: A simple approach for weakly-and semi-supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7268–7277).

  • Wu, T., Huang, J., Gao, G., Wei, X., Wei, X., Luo, X., & Liu, C. H. (2021). Embedded discriminative attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16765–16774).

  • Wu, Z., Shen, C., & Van Den Hengel, A. (2019). Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition, 90, 119–133.

  • Xiao, T., Liu, Y., Zhou, B., Jiang, Y., & Sun, J. (2018). Unified perceptual parsing for scene understanding. In Proceedings of the European conference on computer vision.

  • Xu, L., Ouyang, W., Bennamoun, M., Boussaid, F., Sohel, F., & Xu, D. (2021). Leveraging auxiliary tasks with affinity learning for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6984–6993).

  • Yao, Y., Chen, T., Xie, G. S., Zhang, C., Shen, F., Wu, Q., Tang, Z., & Zhang, J. (2021). Non-salient region object mining for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2623–2632).

  • Zhang, F., Gu, C., Zhang, C., & Dai, Y. (2021). Complementary patch for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7242–7251).

  • Zhang, T., Lin, G., Liu, W., Cai, J., & Kot, A. (2020). Splitting vs. merging: Mining object regions with discrepancy and intersection loss for weakly supervised semantic segmentation. In Proceedings of the European conference on computer vision.

  • Zhang, B., Xiao, J., & Zhao, Y. (2021). Dynamic feature regularized loss for weakly supervised semantic segmentation. arXiv preprint arXiv:2108.01296.

  • Zhang, B., Xiao, J., Jiao, J., Wei, Y., & Zhao, Y. (2021). Affinity attention graph neural network for weakly supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 8082–8096.

  • Zhang, B., Xiao, J., Wei, Y., Huang, K., Luo, S., & Zhao, Y. (2022). End-to-end weakly supervised semantic segmentation with reliable region mining. Pattern Recognition, 128, 108663.

  • Zhang, D., Zhang, H., Tang, J., Hua, X., & Sun, Q. (2020). Causal intervention for weakly-supervised semantic segmentation. arXiv preprint arXiv:2009.12547.

  • Zhang, B., Xiao, J., Wei, Y., Sun, M., & Huang, K. (2020). Reliability does matter: An end-to-end weakly supervised semantic segmentation approach. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 12765–12772.

  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881–2890).

  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2921–2929).

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2022YFE0200300), the National Natural Science Foundation of China (No. 61972323), and the Independent Innovation Research Project of China University of Petroleum (East China) (No. 22CX06060A).

Author information

Corresponding author

Correspondence to Jimin Xiao.

Additional information

Communicated by Karteek Alahari.

About this article

Cite this article

Zhang, B., Xiao, J., Wei, Y. et al. Credible Dual-Expert Learning for Weakly Supervised Semantic Segmentation. Int J Comput Vis 131, 1892–1908 (2023). https://doi.org/10.1007/s11263-023-01796-9

  • DOI: https://doi.org/10.1007/s11263-023-01796-9
