Class-conditional domain adaptation for semantic segmentation

Yue Wang¹,
Yuke Li²,
James H. Elder³,
Runmin Wu⁴ &
Huchuan Lu¹


Abstract

Semantic segmentation is an important sub-task for many applications. However, pixel-level ground-truth labeling is costly, and models tend to overfit to the training data, which limits their ability to generalize. Unsupervised domain adaptation can potentially address these problems by allowing systems trained on labelled datasets from a source domain (including less expensive synthetic domains) to be adapted to a novel target domain. The conventional approach automatically extracts representations of the source and target domains and aligns them globally. One limitation of this approach is that it tends to neglect the differences between classes: representations of certain classes can be extracted and aligned between the source and target domains more easily than others, which limits adaptation over all classes. Here, we address this problem by introducing a Class-Conditional Domain Adaptation (CCDA) method. It incorporates a class-conditional multi-scale discriminator and class-conditional losses for both segmentation and adaptation. Together, these components measure the segmentation, shift the domain in a class-conditional manner, and equalize the loss over classes. Experimental results demonstrate that the performance of our CCDA method matches, and in some cases surpasses, that of state-of-the-art methods.
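To make the idea of equalizing the loss over classes concrete, the following is a minimal sketch rather than the authors' formulation, which is not reproduced on this page. It assumes one simple way to equalize: average the per-pixel cross-entropy within each class first, then average across the classes that are present. The function name `class_equalized_cross_entropy` and the toy data are hypothetical.

```python
import numpy as np

def class_equalized_cross_entropy(probs, labels, num_classes, eps=1e-12):
    """Illustrative class-equalized loss (an assumption, not the CCDA loss).

    probs:  (N, C) array of predicted class probabilities, one row per pixel.
    labels: (N,) array of integer ground-truth class indices.
    """
    # Negative log-likelihood of the true class for every pixel.
    nll = -np.log(probs[np.arange(labels.size), labels] + eps)
    per_class = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # skip classes that do not occur in this batch
            per_class.append(nll[mask].mean())
    # Each present class contributes equally, regardless of its pixel count.
    return float(np.mean(per_class))

# Toy batch: nine well-predicted pixels of a frequent class (0)
# and one poorly predicted pixel of a rare class (1).
probs = np.array([[0.9, 0.1]] * 10)
labels = np.array([0] * 9 + [1])

pixel_mean = float(np.mean(-np.log(probs[np.arange(10), labels])))
balanced = class_equalized_cross_entropy(probs, labels, num_classes=2)
print(pixel_mean, balanced)
```

In this toy batch the pixel-averaged loss is dominated by the frequent class, while the class-equalized version gives the rare, poorly predicted class the same weight as the frequent one, so the rare class's error is no longer diluted by pixel count.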

Acknowledgements

We would like to thank the York University Vision: Science to Applications (VISTA) program and the Intelligent Systems for Sustainable Urban Mobility (ISSUM) project, funded by the Ontario Research Fund-Research Excellence program, for their support.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, China
Yue Wang & Huchuan Lu
School of Computer Science, Wuhan University, Wuhan, 430072, China
Yuke Li
Department of Electrical Engineering and Computer Science, York University, Toronto, M3J 1P3, Canada
James H. Elder
Department of Computer Science, the University of Hong Kong, Hong Kong, 999077, China
Runmin Wu

Corresponding author

Correspondence to Huchuan Lu.

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Yue Wang is a Ph.D. student in Signal and Information Processing, Dalian University of Technology. Her research interest is in saliency detection and unsupervised learning.

Yuke Li received his Ph.D. degree in communication and information systems from Wuhan University. His research interests include computer vision and deep learning.

James H. Elder is presently a professor in the Department of Electrical Engineering and Computer Science and the Department of Psychology, York University. His research interests include shape perception and single-view 3D reconstruction.

Runmin Wu is currently studying computer science at the University of Hong Kong. Her research interest is in computer vision.

Huchuan Lu is a professor in the Department of Electronic Information and Electrical Engineering, Dalian University of Technology. His recent research interests focus on computer vision, artificial intelligence, pattern recognition, and machine learning.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Wang, Y., Li, Y., Elder, J.H. et al. Class-conditional domain adaptation for semantic segmentation. Comp. Visual Media 10, 1013–1030 (2024). https://doi.org/10.1007/s41095-023-0362-4

Received: 10 February 2023
Accepted: 23 June 2023
Published: 22 March 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s41095-023-0362-4
