DOI: 10.1145/3581783.3611792
research-article

Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

Published: 27 October 2023

Abstract

Prototypical contrastive learning (PCL) has recently been widely used to learn class-wise domain-invariant features. These methods assume that the prototypes, each defined as the central value of a class within a given domain, are domain-invariant. Because prototypes also differ across domains, the class-wise domain-invariant features that PCL learns from the source domain need to be aligned with the prototypes of other domains simultaneously. However, prototypes of the same class may differ across domains, while prototypes of different classes may be similar, and both effects can hinder the learning of class-wise domain-invariant features. Based on these observations, we propose a calibration-based dual prototypical contrastive learning (CDPCL) approach that reduces the domain discrepancy between the learned class-wise features and the prototypes of different domains for domain generalization semantic segmentation. CDPCL consists of an uncertainty-guided PCL (UPCL) and a hard-weighted PCL (HPCL). Since the domain discrepancy of the prototypes may vary from class to class, we propose an uncertainty probability matrix that represents the domain discrepancies of the prototypes of all classes. The UPCL estimates this matrix to calibrate the weights of the prototypes during PCL. Moreover, because the prototypes of different classes may be similar in some circumstances, making them hard to align, the HPCL generates a hard-weighted matrix to calibrate the weights of these hard-aligned prototypes during PCL. Extensive experiments demonstrate that our approach achieves superior performance over current approaches on domain generalization segmentation tasks. The source code will be released at https://github.com/seabearlmx/CDPCL.
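
To make the idea of calibrating prototype weights during PCL more concrete, the following is a minimal NumPy sketch of a generic prototypical contrastive loss with an optional weight matrix. It is not the authors' CDPCL implementation: the function names (`class_prototypes`, `weighted_pcl_loss`), the (C, C) shape of `weights`, the way the weights enter the loss (as multiplicative factors on the exponentiated similarities), and the temperature value are all assumptions made for illustration. The abstract does not specify how the uncertainty probability matrix or the hard-weighted matrix is computed, so the sketch simply takes a weight matrix as input.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Return one prototype per class: the mean feature of that class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def weighted_pcl_loss(features, labels, prototypes, weights=None, temperature=0.1):
    """InfoNCE-style loss that pulls each feature toward its class prototype.

    weights[y, c] (assumed form, must be > 0) scales the term that compares
    a feature of class y with prototype c. All-ones weights give the plain,
    unweighted prototypical contrastive loss.
    """
    num_classes = prototypes.shape[0]
    if weights is None:
        weights = np.ones((num_classes, num_classes))
    # Cosine similarity between every feature and every prototype: (N, C).
    logits = l2_normalize(features) @ l2_normalize(prototypes).T / temperature
    # Multiplying exp(logit) by a weight equals adding log(weight) to the logit.
    logits = logits + np.log(weights[labels])
    # Numerically stable log-softmax over the prototype axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    labels = np.array([0, 0, 1, 1])
    protos = class_prototypes(feats, labels, num_classes=2)
    print("unweighted loss:", weighted_pcl_loss(feats, labels, protos))
    # Hypothetical weights that emphasise the off-diagonal (negative) pairs.
    w = np.array([[1.0, 5.0], [5.0, 1.0]])
    print("weighted loss:  ", weighted_pcl_loss(feats, labels, protos, weights=w))
```

In this sketch, raising an off-diagonal weight makes the corresponding negative prototype count more in the denominator, which increases the loss for features that lie close to it. This is one simple way to emphasise prototypes that are hard to separate. How CDPCL actually derives and applies its two matrices should be taken from the paper and the released source code.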

Cited By

  • (2024) Calibration-Based Multi-Prototype Contrastive Learning for Domain Generalization Semantic Segmentation in Traffic Scenes. IEEE Transactions on Intelligent Transportation Systems 25:12 (20985-21001). DOI: 10.1109/TITS.2024.3454274. Online publication date: Dec-2024
  • (2024) Cross-modal domain generalization semantic segmentation based on fusion features. Knowledge-Based Systems 302 (112356). DOI: 10.1016/j.knosys.2024.112356. Online publication date: Oct-2024

      Published In

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN:9798400701085
      DOI:10.1145/3581783

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. domain generalization
      2. hard-weighted prototypical contrastive learning
      3. semantic segmentation
      4. uncertainty-guided prototypical contrastive learning

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Natural Science Foundation of Guangdong Province, China
      • The Tencent "Rhinoceros Birds" - Scientific Research Foundation for Young Teachers of Shenzhen University, China
      • The Key Project of Shenzhen Science and Technology Plan
      • The Interdisciplinary Innovation Team of Shenzhen University
      • The Key Project of DEGP (Department of Education of Guangdong Province)

      Conference

      MM '23
      MM '23: The 31st ACM International Conference on Multimedia
      October 29 - November 3, 2023
      Ottawa ON, Canada

      Acceptance Rates

      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
