Semi-supervised 3D Object Detection with Proficient Teachers

Junbo Yin¹²,
Jin Fang^13,14,15,
Dingfu Zhou^13,14,
Liangjun Zhang^13,14,
Cheng-Zhong Xu¹⁵,
Jianbing Shen¹⁵ &
…
Wenguan Wang¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13698))

Included in the following conference series:

European Conference on Computer Vision

3305 Accesses
49 Citations

Abstract

Dominated point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on the huge amount of accurately labeled samples, however, 3D annotation in the point cloud is extremely tedious, expensive and time-consuming. To reduce the dependence on large supervision, semi-supervised learning (SSL) based approaches have been proposed. The Pseudo-Labeling methodology is commonly used for SSL frameworks, however, the low-quality predictions from the teacher model have seriously limited its performance. In this work, we propose a new Pseudo-Labeling framework for semi-supervised 3D object detection, by enhancing the teacher model to a proficient one with several necessary designs. First, to improve the recall of pseudo labels, a Spatial-temporal Ensemble (STE) module is proposed to generate sufficient seed boxes. Second, to improve the precision of recalled boxes, a Clustering-based Box Voting (CBV) module is designed to get aggregated votes from the clustered seed boxes. This also eliminates the necessity of sophisticated thresholds to select pseudo labels. Furthermore, to reduce the negative influence of wrongly pseudo-labeled samples during the training, a soft supervision signal is proposed by considering Box-wise Contrastive Learning (BCL). The effectiveness of our model is verified on both ONCE and Waymo datasets. For example, on ONCE, our approach significantly improves the baseline by 9.51 mAP. Moreover, with half annotations, our model outperforms the oracle model with full annotations on Waymo.

J. Yin and J. Fang—Equal contribution. Work done when J. Yin was an intern at Baidu Research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 79.50; Price includes VAT (United Kingdom)

Softcover Book: GBP 99.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes

Article 04 November 2024

TCC-Det: Temporarily Consistent Cues for Weakly-Supervised 3D Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Notes

1.
https://once-for-auto-driving.github.io/benchmark.html#benchmark.

References

Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.A.: Mixmatch: a holistic approach to semi-supervised learning. In: NeurIPS (2019)
Google Scholar
Caesar, H., et al.: nuscenes: a multimodal dataset for autonomous driving. In: CVPR (2020)
Google Scholar
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML (2020)
Google Scholar
Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3d object detection network for autonomous driving. In: CVPR (2017)
Google Scholar
Deng, J., Shi, S., Li, P., gang Zhou, W., Zhang, Y., Li, H.: Voxel R-CNN: towards high performance voxel-based 3d object detection. In: AAAI (2021)
Google Scholar
Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. IJCV 88(2), 303–338 (2010)
Article Google Scholar
Fang, J., et al.: Augmented lidar simulator for autonomous driving. IEEE Robot. Autom. Lett. 5(2), 1931–1938 (2020)
Article Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: CVPR (2012)
Google Scholar
Huang, X., Wang, P., Cheng, X., Zhou, D., Geng, Q., Yang, R.: The apolloscape open dataset for autonomous driving and its application. PAMI 42(10), 2702–2719 (2019)
Article Google Scholar
Jeong, J., Lee, S., Kim, J., Kwak, N.: Consistency-based semi-supervised learning for object detection. In: NeurIPS (2019)
Google Scholar
Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: Fast encoders for object detection from point clouds. In: CVPR (2019)
Google Scholar
Mao, J., et al.: One million scenes for autonomous driving: once dataset. In: NeurIPS Datasets and Benchmarks (2021)
Google Scholar
Meng, Q., Wang, W., Zhou, T., Shen, J., Jia, Y., Van Gool, L.: Towards a weakly supervised framework for 3d point cloud object detection and annotation. TPAMI (2021)
Google Scholar
Meng, Q., Wang, W., Zhou, T., Shen, J., Van Gool, L., Dai, D.: Weakly supervised 3D object detection from lidar point cloud. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12358, pp. 515–531. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58601-0_31
Chapter Google Scholar
Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint (2018)
Google Scholar
Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep hough voting for 3D object detection in point clouds. In: CVPR (2019)
Google Scholar
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
Google Scholar
Rizve, M.N., Duarte, K., Rawat, Y.S., Shah, M.: In defense of pseudo-labeling: an uncertainty-aware pseudo-label selection framework for semi-supervised learning. In: ICLR (2021)
Google Scholar
Samuli, L., Timo, A.: Temporal ensembling for semi-supervised learning. In: ICLR (2017)
Google Scholar
Shanmugam, D., Blalock, D., Balakrishnan, G., Guttag, J.: Better aggregation in test-time augmentation. In: ICCV (2021)
Google Scholar
Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., Li, H.: Pv-rcnn: Point-voxel feature set abstraction for 3D object detection. In: CVPR (2020)
Google Scholar
Shi, S., Wang, X., Li, H.: Pointrcnn: 3d object proposal generation and detection from point cloud. In: CVPR (2019)
Google Scholar
Shi, W., Rajkumar, R.: Point-GNN: Graph neural network for 3d object detection in a point cloud. In: CVPR (2020)
Google Scholar
Sohn, K., et al.: Fixmatch: simplifying semi-supervised learning with consistency and confidence. In: NeurIPS (2020)
Google Scholar
Sohn, K., Zhang, Z., Li, C.L., Zhang, H., Lee, C.Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint (2020)
Google Scholar
Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: CVPR (2020)
Google Scholar
Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS (2017)
Google Scholar
van Engelen, J.E., Hoos, H.H.: A survey on semi-supervised learning. Mach. Learn. 109(2), 373–440 (2019). https://doi.org/10.1007/s10994-019-05855-6
Article MathSciNet MATH Google Scholar
Wang, H., Cong, Y., Litany, O., Gao, Y., Guibas, L.J.: 3dioumatch: leveraging IOU prediction for semi-supervised 3d object detection. In: CVPR (2021)
Google Scholar
Wang, W., Zhou, T., Yu, F., Dai, J., Konukoglu, E., Van Gool, L.: Exploring cross-image pixel contrast for semantic segmentation. In: ICCV (2021)
Google Scholar
Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves imagenet classification. In: CVPR (2020)
Google Scholar
Xie, S., Gu, J., Guo, D., Qi, C.R., Guibas, L., Litany, O.: PointContrast: unsupervised pre-training for 3d point cloud understanding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 574–591. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_34
Chapter Google Scholar
Yan, Y., Mao, Y., Li, B.: Second: sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018)
Article Google Scholar
Yang, Z., Sun, Y., Liu, S., Jia, J.: 3DSSD: point-based 3D single stage object detector. In: CVPR (2020)
Google Scholar
Yin, J., Shen, J., Gao, X., Crandall, D., Yang, R.: Graph neural network and spatiotemporal transformer attention for 3d video object detection from point clouds. TPAMI (2021)
Google Scholar
Yin, J., Shen, J., Guan, C., Zhou, D., Yang, R.: Lidar-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention. In: CVPR (2020)
Google Scholar
Yin, J., Zhou, D., Zhang, L., Fang, J., Xu, C.Z., Shen, J., Wang, W.: Proposalcontrast: Unsupervised pre-training for lidar-based 3D object detection. In: ECCV (2022)
Google Scholar
Yin, T., Zhou, X., Krahenbuhl, P.: Center-based 3D object detection and tracking. In: CVPR (2021)
Google Scholar
Zhao, N., Chua, T.S., Lee, G.H.: SESS: self-ensembling semi-supervised 3D object detection. In: CVPR (2020)
Google Scholar
Zhou, D., et al.: Joint 3d instance segmentation and object detection for autonomous driving. In: CVPR (2020)
Google Scholar
Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: CVPR (2018)
Google Scholar
Zhu, X., Goldberg, A.B.: Introduction to semi-supervised learning. Synthesis Lect. Artif. Intell. Mach. Learn. 3(1), 1–130 (2009)
Article Google Scholar

Download references

Acknowledgements

This work was partially supported by Zhejiang Lab’s International Talent Fund for Young Professionals (ZJ2020GZ023), ARC DECRA DE220101390, FDCT under grant 0015/2019/AKP, and the Start-up Research Grant (SRG) of University of Macau.

Author information

Authors and Affiliations

School of Computer Science, Beijing Institute of Technology, Beijing, China
Junbo Yin
Baidu Research, Beijing, China
Jin Fang, Dingfu Zhou & Liangjun Zhang
National Engineering Laboratory of Deep Learning Technology and Application, Beijing, China
Jin Fang, Dingfu Zhou & Liangjun Zhang
SKL-IOTSC, CIS, University of Macau, Zhuhai, China
Jin Fang, Cheng-Zhong Xu & Jianbing Shen
ReLER, AAII, University of Technology Sydney, Ultimo, Australia
Wenguan Wang

Authors

Junbo Yin
View author publications
You can also search for this author in PubMed Google Scholar
Jin Fang
View author publications
You can also search for this author in PubMed Google Scholar
Dingfu Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Liangjun Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Cheng-Zhong Xu
View author publications
You can also search for this author in PubMed Google Scholar
Jianbing Shen
View author publications
You can also search for this author in PubMed Google Scholar
Wenguan Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Jianbing Shen or Wenguan Wang .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yin, J. et al. (2022). Semi-supervised 3D Object Detection with Proficient Teachers. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13698. Springer, Cham. https://doi.org/10.1007/978-3-031-19839-7_42

Download citation

DOI: https://doi.org/10.1007/978-3-031-19839-7_42
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19838-0
Online ISBN: 978-3-031-19839-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Semi-supervised 3D Object Detection with Proficient Teachers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes

TCC-Det: Temporarily Consistent Cues for Weakly-Supervised 3D Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Semi-supervised 3D Object Detection with Proficient Teachers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes

TCC-Det: Temporarily Consistent Cues for Weakly-Supervised 3D Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation