CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

Authors:

Ming-Hsuan Yang,

Jiaya Jia

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXI

Pages 59–77

https://doi.org/10.1007/978-3-031-19821-2_4

Published: 23 October 2022

Abstract

To improve instance-level detection and segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data. We show that these two approaches, which sit at the two extreme ends of the task-specificity spectrum, are suboptimal for task performance. Using too few task-specific training signals causes underfitting to the ground-truth labels of downstream tasks, while using too many causes overfitting to them. To this end, we propose a novel Class-Agnostic Semi-Supervised Learning (CA-SSL) framework that achieves a more favorable task-specificity balance when extracting training signals from unlabeled data. CA-SSL has three training stages, each of which acts on either ground-truth labels (labeled data) or pseudo labels (unlabeled data). This decoupling strategy avoids the complicated scheme in traditional SSL methods that balances the contributions from both data types. In particular, we introduce a warmup training stage that achieves a better balance in task specificity by ignoring class information in the pseudo labels while preserving localization training signals. As a result, our warmup model better avoids underfitting and overfitting when fine-tuned on the ground-truth labels in detection and segmentation tasks. Using 3.6M unlabeled images, we achieve a significant performance gain of 4.7% over the ImageNet-pretrained baseline on FCOS object detection. In addition, our warmup model demonstrates excellent transferability to other detection and segmentation frameworks.
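The key step of the warmup stage, discarding class information from pseudo labels while keeping their localization, can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: the `Detection` record, the `to_class_agnostic` helper, the 0.5 score threshold, and the class-agnostic NMS at IoU 0.7 are all assumptions chosen for clarity, and the predictions are assumed to come from a teacher detector trained on the labeled data. Confident predictions are kept, every class is collapsed into a single foreground class, and boxes that previously differed only in class are suppressed as duplicates.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical: the single class id given to every class-agnostic pseudo label.
FOREGROUND = 0


@dataclass
class Detection:
    box: Box
    score: float
    label: int  # class id predicted by the (assumed) teacher detector


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def to_class_agnostic(
    dets: List[Detection],
    score_thresh: float = 0.5,  # assumed confidence cut-off
    iou_thresh: float = 0.7,    # assumed class-agnostic NMS threshold
) -> List[Detection]:
    """Drop class information from teacher predictions, keeping only boxes."""
    # Keep only confident predictions, highest score first.
    confident = sorted(
        (d for d in dets if d.score >= score_thresh),
        key=lambda d: d.score,
        reverse=True,
    )
    pseudo: List[Detection] = []
    for d in confident:
        # Class-agnostic NMS: boxes that differed only in class now look like
        # duplicates of the same object, so keep only the highest-scoring one.
        if all(iou(d.box, p.box) < iou_thresh for p in pseudo):
            pseudo.append(Detection(d.box, d.score, FOREGROUND))
    return pseudo


if __name__ == "__main__":
    teacher_output = [
        Detection((0, 0, 10, 10), 0.9, label=3),
        Detection((1, 1, 10, 10), 0.8, label=7),    # same object, other class
        Detection((20, 20, 30, 30), 0.6, label=1),
        Detection((40, 40, 50, 50), 0.3, label=2),  # below the score threshold
    ]
    for p in to_class_agnostic(teacher_output):
        print(p)
```

In this toy input, the two overlapping boxes with different classes collapse into one foreground box and the low-confidence box is dropped, leaving two single-class targets. Under these assumptions, such boxes would supervise only localization during warmup, and class-specific learning would be left to the later fine-tuning on ground-truth labels.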

Information

Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXI

Oct 2022, 809 pages

ISBN: 978-3-031-19820-5

DOI: 10.1007/978-3-031-19821-2

Editors:
Shai Avidan (Tel Aviv University, Tel Aviv, Israel),
Gabriel Brostow (University College London, London, UK),
Moustapha Cissé (Google AI, Accra, Ghana),
Giovanni Maria Farinella (University of Catania, Catania, Italy),
Tal Hassner (Facebook (United States), Menlo Park, CA, USA)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.

Publisher

Springer-Verlag, Berlin, Heidelberg
