
LISO: Lidar-Only Self-supervised 3D Object Detection

Published: 26 October 2024

Abstract

3D object detection is one of the most important components in any self-driving stack, but current state-of-the-art (SOTA) lidar object detectors require costly and slow manual annotation of 3D bounding boxes to perform well. Recently, several methods have emerged that generate pseudo ground truth without human supervision; however, all of them have drawbacks. Some require sensor rigs with full camera coverage and accurate calibration, partly supplemented by an auxiliary optical flow engine. Others require expensive high-precision localization to find objects that disappeared over multiple drives.
We introduce a novel self-supervised method to train SOTA lidar object detection networks, requiring only unlabeled sequences of lidar point clouds. We call this trajectory-regularized self-training. Under the hood, it uses a SOTA self-supervised lidar scene flow network to generate, track, and iteratively refine pseudo ground truth. We demonstrate the effectiveness of our approach for multiple SOTA object detection networks across multiple real-world datasets. Code will be released (https://github.com/baurst/liso).
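The pseudo-ground-truth mining step behind such trajectory-regularized self-training can be illustrated with a toy sketch: points with large scene-flow magnitude are clustered per frame, centroids are associated across frames into tracks, and only tracks that persist long enough survive as pseudo labels. All function names, thresholds, the single-cluster-per-frame simplification, and the `(x, y, flow_magnitude)` point format below are illustrative assumptions, not the paper's actual implementation (which additionally retrains the detector on the mined labels in each round):

```python
# Toy sketch of trajectory-regularized pseudo-label mining (illustrative only;
# names and thresholds are assumptions, not the authors' API).

FLOW_THRESH = 0.5    # m/frame: points moving faster are candidate objects
MIN_TRACK_LEN = 3    # trajectory regularization: shorter tracks are noise
ASSOC_GATE_SQ = 4.0  # squared distance gate (2 m) for frame-to-frame matching

def mine_pseudo_labels(frames):
    """frames: list of frames; each frame is a list of (x, y, flow_magnitude).

    Returns tracks, each a list of (frame_idx, centroid) pairs, keeping only
    tracks that persist for at least MIN_TRACK_LEN consecutive frames.
    """
    tracks = []
    for t, points in enumerate(frames):
        # 1. Keep only moving points (scene flow above threshold).
        moving = [(x, y) for x, y, flow in points if flow > FLOW_THRESH]
        if not moving:
            continue
        # 2. Collapse moving points into one centroid (single-object toy case;
        #    a real pipeline would cluster, e.g. with DBSCAN).
        cx = sum(p[0] for p in moving) / len(moving)
        cy = sum(p[1] for p in moving) / len(moving)
        # 3. Greedy association to a track that ended in the previous frame.
        matched = None
        for tr in tracks:
            last_t, (lx, ly) = tr[-1]
            if last_t == t - 1 and (lx - cx) ** 2 + (ly - cy) ** 2 < ASSOC_GATE_SQ:
                matched = tr
                break
        if matched is not None:
            matched.append((t, (cx, cy)))
        else:
            tracks.append([(t, (cx, cy))])
    # 4. Trajectory regularization: discard short, inconsistent tracks.
    return [tr for tr in tracks if len(tr) >= MIN_TRACK_LEN]
```

In a full self-training loop, the surviving tracks would be converted to 3D boxes, used to train the detector, and the detector's own (tracked) outputs would seed the next refinement round.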




            Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXVI
September 2024, 554 pages
ISBN: 978-3-031-73015-3
DOI: 10.1007/978-3-031-73016-0
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg


            Author Tags

            1. Self-Supervised
            2. LiDAR
            3. Object Detection
