Abstract
Current 3D object detection methods achieve promising results on conventional tasks that involve frequently occurring objects such as cars, pedestrians and cyclists. However, they require many annotated bounding boxes and class labels for training, which are expensive and hard to obtain. Detecting infrequently occurring objects, such as police vehicles, is nevertheless essential for autonomous driving to be successful. We therefore explore the potential of few-shot learning for detecting infrequent categories. Current 3D object detectors lack the architecture needed to support this type of learning. This paper thus presents a new method, termed few-shot single-stage network for 3D object detection (FS-3DSSN), to predict infrequent categories of objects. FS-3DSSN uses a class-incremental few-shot learning approach to detect infrequent categories without compromising the detection accuracy of frequent categories. It consists of two modules: (i) a single-stage network architecture for 3D object detection (3DSSN) that uses deformable convolutions to detect small objects and (ii) a class-incremental meta-learning module that learns and predicts infrequent class categories. 3DSSN obtains 84.53 \(\textrm{mAP}_{\textrm{3D}}\) on the KITTI car category and 73.4 NDS on the nuScenes dataset, outperforming the previous state of the art. Further, the results of FS-3DSSN on nuScenes are encouraging for detecting infrequent categories while maintaining accuracy on frequent classes.
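The class-incremental idea described in the abstract can be illustrated with a minimal sketch. It is not the FS-3DSSN meta-learning module: it swaps in a simple prototype (nearest-class-mean) classifier, and the feature vectors, class names and the `IncrementalPrototypeClassifier` helper are all hypothetical. It only shows how a novel category could be registered from a few support examples while the representations of the frequent categories stay untouched.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class IncrementalPrototypeClassifier:
    """Hypothetical nearest-prototype classifier (illustration only)."""

    def __init__(self, base_prototypes):
        # Base (frequent) class prototypes are copied once and never modified.
        self.prototypes = {name: list(vec) for name, vec in base_prototypes.items()}

    def add_novel_class(self, name, support_features):
        # A novel (infrequent) class is summarised by the mean of its
        # few-shot support features; existing prototypes are left as they are.
        if name in self.prototypes:
            raise ValueError(f"class {name!r} is already registered")
        self.prototypes[name] = mean_vector(support_features)

    def predict(self, feature):
        # Return the class whose prototype is most similar to the query feature.
        return max(self.prototypes, key=lambda c: cosine(feature, self.prototypes[c]))


# Assumed toy features: two frequent classes and one three-shot novel class.
base = {"car": [1.0, 0.0, 0.0], "pedestrian": [0.0, 1.0, 0.0]}
clf = IncrementalPrototypeClassifier(base)
clf.add_novel_class(
    "police_vehicle",
    [[0.6, 0.0, 0.8], [0.8, 0.1, 0.6], [0.7, 0.0, 0.7]],
)

print(clf.predict([0.9, 0.1, 0.0]))
print(clf.predict([0.65, 0.0, 0.75]))
```

Because adding a class only appends a new prototype, predictions for the frequent classes depend on exactly the same stored vectors as before, which is the property the class-incremental setting aims to preserve.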
Data Availability
The datasets used in this manuscript are available at the following links: KITTI dataset: https://www.cvlibs.net/datasets/kitti/; nuScenes dataset: https://www.nuscenes.org/nuscenes
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Tiwari, A.K., Sharma, G.K. FS-3DSSN: an efficient few-shot learning for single-stage 3D object detection on point clouds. Vis Comput 40, 8125–8139 (2024). https://doi.org/10.1007/s00371-023-03228-8
DOI: https://doi.org/10.1007/s00371-023-03228-8