DOI:10.1007/978-3-031-19821-2_4
Article

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

Published: 23 October 2022

Abstract

To improve instance-level detection and segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data. We show that these two approaches, at the two extreme ends of the task-specificity spectrum, are suboptimal for task performance. Using too few task-specific training signals causes underfitting to the ground-truth labels of downstream tasks, while using too many causes overfitting to them. To address this, we propose a novel Class-Agnostic Semi-Supervised Learning (CA-SSL) framework that achieves a more favorable task-specificity balance when extracting training signals from unlabeled data. CA-SSL has three training stages, each of which acts on either ground-truth labels (labeled data) or pseudo labels (unlabeled data). This decoupling strategy avoids the complicated schemes that traditional SSL methods use to balance the contributions of the two data types. In particular, we introduce a warmup training stage that achieves a better balance in task specificity by ignoring class information in the pseudo labels while preserving localization training signals. As a result, our warmup model better avoids underfitting and overfitting when fine-tuned on the ground-truth labels in detection and segmentation tasks. Using 3.6M unlabeled samples, we achieve a significant performance gain of 4.7% over the ImageNet-pretrained baseline on FCOS object detection. In addition, our warmup model demonstrates excellent transferability to other detection and segmentation frameworks.
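
As a rough illustration of what "ignoring class information in the pseudo labels, while preserving localization training signals" could look like in practice, the following Python sketch collapses every pseudo-label category into a single foreground class and leaves the boxes untouched. It is not the authors' implementation: the `Detection` type, the `to_class_agnostic` helper, the `FOREGROUND` constant, and the score threshold are all hypothetical names and values introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """A single predicted box (hypothetical container, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int                              # category id predicted by a detector
    score: float                            # detector confidence in [0, 1]


# Single class id used for every object once class information is dropped.
FOREGROUND = 0


def to_class_agnostic(dets: List[Detection], score_thr: float = 0.5) -> List[Detection]:
    """Turn detector outputs into class-agnostic pseudo labels.

    Boxes and scores are preserved (localization signal); the category id is
    replaced by FOREGROUND (class signal removed). The score threshold is an
    assumed filtering step, not a value taken from the paper.
    """
    return [
        Detection(box=d.box, label=FOREGROUND, score=d.score)
        for d in dets
        if d.score >= score_thr
    ]


if __name__ == "__main__":
    predictions = [
        Detection((0.0, 0.0, 10.0, 10.0), label=3, score=0.9),
        Detection((5.0, 5.0, 20.0, 20.0), label=7, score=0.2),
    ]
    for pseudo in to_class_agnostic(predictions):
        print(pseudo)
```

Under these assumptions, a model trained on such labels would learn where objects are without committing to the label set of any one downstream task, which matches the balance the abstract describes between task-unrelated and task-specific signals.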


          Published In

          Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXI
          Oct 2022
          809 pages
          ISBN:978-3-031-19820-5
          DOI:10.1007/978-3-031-19821-2

          Publisher

          Springer-Verlag

          Berlin, Heidelberg

          Author Tags

          1. Semi-supervised
          2. Class-agnostic
          3. Instance-level detection
