When CNN Meet with ViT: Towards Semi-supervised Learning for Multi-class Medical Image Semantic Segmentation

Ziyang Wang¹⁰,
Tianze Li¹¹,
Jian-Qing Zheng¹² &
…
Baoru Huang¹³

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13807))

Included in the following conference series:

European Conference on Computer Vision

2742 Accesses
18 Citations

Abstract

Due to the lack of quality annotation in medical imaging community, semi-supervised learning methods are highly valued in image semantic segmentation tasks. In this paper, an advanced consistency-aware pseudo-label-based self-ensembling approach is presented to fully utilize the power of Vision Transformer (ViT) and Convolutional Neural Network (CNN) in semi-supervised learning. Our proposed framework consists of a feature-learning module which is enhanced by ViT and CNN mutually, and a guidance module which is robust for consistency-aware purposes. The pseudo labels are inferred and utilized recurrently and separately by views of CNN and ViT in the feature-learning module to expand the data set and are beneficial to each other. Meanwhile, a perturbation scheme is designed for the feature-learning module, and averaging network weight is utilized to develop the guidance module. By doing so, the framework combines the feature-learning strength of CNN and ViT, strengthens the performance via dual-view co-training, and enables consistency-aware supervision in a semi-supervised manner. A topological exploration of all alternative supervision modes with CNN and ViT are detailed validated, demonstrating the most promising performance and specific setting of our method on semi-supervised medical image segmentation tasks. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set with a variety of metrics. The code is publicly available (https://github.com/ziyangwang007/CV-SSL-MIS).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 79.50; Price includes VAT (United Kingdom)

Softcover Book: GBP 99.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Prototype Consistency Learning for Medical Image Segmentation by Cross Pseudo Supervision

Article 07 September 2023

Dual-Branch Differentiated Similarity Network for Semi-supervised Medical Image Segmentation

Self-aware and Cross-Sample Prototypical Learning for Semi-supervised Medical Image Segmentation

References

Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging 37(11), 2514–2525 (2018)
Article Google Scholar
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp. 92–100 (1998)
Google Scholar
Cao, H., et al.: Swin-UNet: UNet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)
Chang, Y.T., et al.: Weakly-supervised semantic segmentation via sub-category exploration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8991–9000 (2020)
Google Scholar
Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
Chen, L.-C., et al.: Naive-student: leveraging semi-supervised learning in video sequences for urban scene segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 695–714. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_40
Chapter Google Scholar
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Article Google Scholar
Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision, pp. 801–818 (2018)
Google Scholar
Chen, X., et al.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR (2021)
Google Scholar
Deng, J., et al.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Dong-DongChen, W., WeiGao, Z.H.: Tri-net for semi-supervised deep learning. In: Proceedings of Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 2014–2020 (2018)
Google Scholar
Dosovitskiy, A., et al.: An image is worth \(16 \times 16\) words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059. PMLR (2016)
Google Scholar
Han, B., et al.: Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Google Scholar
He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Google Scholar
Huang, B., et al.: Simultaneous depth estimation and surgical tool segmentation in laparoscopic images. IEEE Trans. Med. Robot. Bionics 4(2), 335–338 (2022)
Article Google Scholar
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Google Scholar
Hung, W.C., et al.: Adversarial learning for semi-supervised semantic segmentation. In: 29th British Machine Vision Conference, BMVC 2018 (2018)
Google Scholar
Ibrahim, M.S., et al.: Semi-supervised semantic image segmentation with self-correcting networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12715–12725 (2020)
Google Scholar
Ibtehaz, N., Rahman, M.S.: MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 121, 74–87 (2020)
Article Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)
Google Scholar
Isensee, F., et al.: nnU-Net: self-adapting framework for U-Net-based medical image segmentation. In: Handels, H., Deserno, T., Maier, A., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2019. I, p. 22. Springer, Wiesbaden (2019). https://doi.org/10.1007/978-3-658-25326-4_7
Chapter Google Scholar
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Google Scholar
Ji, W., et al.: Learning calibrated medical image segmentation via multi-rater agreement modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12341–12351 (2021)
Google Scholar
Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Google Scholar
Luo, X., et al.: Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 318–329. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87196-3_30
Chapter Google Scholar
Luo, X., et al.: Semi-supervised medical image segmentation via cross teaching between CNN and transformer. arXiv preprint arXiv:2112.04894 (2021)
Mendel, R., de Souza, L.A., Rauber, D., Papa, J.P., Palm, C.: Semi-supervised segmentation based on error-correcting supervision. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12374, pp. 141–157. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58526-6_9
Chapter Google Scholar
Qiao, S., Shen, W., Zhang, Z., Wang, B., Yuille, A.: Deep co-training for semi-supervised image recognition. In: Proceedings of the European Conference on Computer Vision, pp. 135–152 (2018)
Google Scholar
Reiß, S., Seibold, C., Freytag, A., Rodner, E., Stiefelhagen, R.: Every annotation counts: multi-label deep supervision for medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9532–9542 (2021)
Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Woo, S., et al.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Google Scholar
Song, C., et al.: Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3136–3145 (2019)
Google Scholar
Souly, N., Spampinato, C., Shah, M.: Semi supervised semantic segmentation using generative adversarial network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5688–5696 (2017)
Google Scholar
Strudel, R., Garcia, R., Laptev, I., Schmid, C.: Segmenter: transformer for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7262–7272 (2021)
Google Scholar
Tarvainen, A., et al.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Google Scholar
Verma, V., et al.: Interpolation consistency training for semi-supervised learning. In: International Joint Conference on Artificial Intelligence (2019)
Google Scholar
Vu, T.H., et al.: Advent: adversarial entropy minimization for domain adaptation in semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2517–2526 (2019)
Google Scholar
Wang, Z.: Deep learning in medical ultrasound image segmentation: a review. arXiv preprint arXiv:2002.07703 (2020)
Wang, Z., et al.: RAR-U-Net: a residual encoder to attention decoder by residual connections framework for spine segmentation under noisy labels. In: 2021 IEEE International Conference on Image Processing (ICIP). IEEE (2021)
Google Scholar
Wang, Z., Voiculescu, I.: Triple-view feature learning for medical image segmentation. In: Xu, X., Li, X., Mahapatra, D., Cheng, L., Petitjean, C., Fu, H. (eds.) REMIA 2022. LNCS, vol. 13543, pp. 42–54. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16876-5_5
Chapter Google Scholar
Wang, Z., Voiculescu, I.: Quadruple augmented pyramid network for multi-class Covid-19 segmentation via CT. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC) (2021)
Google Scholar
Wang, Z., et al.: Computationally-efficient vision transformer for medical image semantic segmentation via dual pseudo-label supervision. In: IEEE International Conference on Image Processing (ICIP) (2022)
Google Scholar
Wang, Z., Zheng, J.Q., Voiculescu, I.: An uncertainty-aware transformer for MRI cardiac semantic segmentation via mean teachers. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, C.B. (eds.) MIUA 2022. LNCS, vol. 13413, pp. 497–507. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-12053-4_37
Chapter Google Scholar
Xia, Y., et al.: 3D semi-supervised learning with uncertainty-aware multi-view co-training. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3646–3655 (2020)
Google Scholar
You, X., et al.: Segmentation of retinal blood vessels using the radial projection and semi-supervised approach. Pattern Recogn. 44(10–11), 2314–2324 (2011)
Article Google Scholar
Yu, L., Wang, S., Li, X., Fu, C.-W., Heng, P.-A.: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 605–613. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_67
Chapter Google Scholar
Zhang, Y., Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 408–416. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_47
Chapter Google Scholar
Zhou, B., et al.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Google Scholar
Zoph, B., et al.: Rethinking pre-training and self-training. In: Advances in Neural Information Processing Systems, vol. 33, pp. 3833–3845 (2020)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Oxford, Oxford, UK
Ziyang Wang
Canford School, Wimborne, UK
Tianze Li
The Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK
Jian-Qing Zheng
Department of Surgery and Cancer, Imperial College London, London, UK
Baoru Huang

Authors

Ziyang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Tianze Li
View author publications
You can also search for this author in PubMed Google Scholar
Jian-Qing Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Baoru Huang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ziyang Wang .

Editor information

Editors and Affiliations

IBM Research - MIT-IBM Watson AI Lab, Massachusetts, USA
Leonid Karlinsky
Technion – Israel Institute of Technology, Haifa, Israel
Tomer Michaeli
Kyoto University, Kyoto, Japan
Ko Nishino

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 253 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, Z., Li, T., Zheng, JQ., Huang, B. (2023). When CNN Meet with ViT: Towards Semi-supervised Learning for Multi-class Medical Image Semantic Segmentation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_28

Download citation

DOI: https://doi.org/10.1007/978-3-031-25082-8_28
Published: 12 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25081-1
Online ISBN: 978-3-031-25082-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

When CNN Meet with ViT: Towards Semi-supervised Learning for Multi-class Medical Image Semantic Segmentation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Prototype Consistency Learning for Medical Image Segmentation by Cross Pseudo Supervision

Dual-Branch Differentiated Similarity Network for Semi-supervised Medical Image Segmentation

Self-aware and Cross-Sample Prototypical Learning for Semi-supervised Medical Image Segmentation

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 253 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

When CNN Meet with ViT: Towards Semi-supervised Learning for Multi-class Medical Image Semantic Segmentation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Prototype Consistency Learning for Medical Image Segmentation by Cross Pseudo Supervision

Dual-Branch Differentiated Similarity Network for Semi-supervised Medical Image Segmentation

Self-aware and Cross-Sample Prototypical Learning for Semi-supervised Medical Image Segmentation

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 253 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation