Keypoint-Aware Single-Stage 3D Object Detector for Autonomous Driving
Figure 1. Various BEV (bird's eye view) illustrations of predicted bounding boxes. The purple circle denotes the predefined anchor point that predicts the 3D bounding box. The blue circles denote the extracted keypoints on the boundary, and the green circles denote the extracted keypoints inside the bounding box. The blue dashed line denotes the BEV of the predicted bounding box. The red circles denote the points on the detected object scanned by the LiDAR, and the yellow star denotes the center of the bounding box. (a,e) show different relations between predefined anchor points and bounding boxes. (b,c) show different distributions of boundary keypoints of the bounding boxes. (f,g) show different distributions of inner keypoints of the bounding boxes. (d,h) show the sampling methods that cover both the boundary and inner parts.
Figure 2. Overview of KASSD. First, KASSD converts the raw points into voxel features. Then, the 3D backbone module applies 3D sparse convolutions for feature extraction. The 3D features are subsequently converted into a BEV representation, on which the LLM module is used to obtain more expressive features for detection. Finally, the KAM takes the regressed box as input and generates accurate confidence scores for post-processing.
Figure 3. Voxel feature encoding layer. (a) Complex encoding method that stacks layers. (b) Computing the mean value of the points inside each voxel grid.
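As a minimal illustration of the simplified encoding in (b), the sketch below averages the points that fall into each occupied voxel. The voxel size and point-cloud range are placeholder values (SECOND-style defaults), not necessarily the configuration used in this paper, and the function name is ours.

```python
import numpy as np

def voxelize_mean(points, voxel_size=(0.05, 0.05, 0.1),
                  pc_range=(0, -40, -3, 70.4, 40, 1)):
    """Simplified voxel feature encoding: each occupied voxel is
    represented by the mean of the (x, y, z, intensity) points inside it."""
    # Keep only points inside the detection range
    points = points[
        (points[:, 0] >= pc_range[0]) & (points[:, 0] < pc_range[3]) &
        (points[:, 1] >= pc_range[1]) & (points[:, 1] < pc_range[4]) &
        (points[:, 2] >= pc_range[2]) & (points[:, 2] < pc_range[5])
    ]
    # Integer voxel coordinate of every point
    coords = ((points[:, :3] - np.array(pc_range[:3])) /
              np.array(voxel_size)).astype(np.int32)
    # Group points by voxel and average them
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    counts = np.bincount(inverse, minlength=len(uniq)).astype(np.float32)
    feats = np.zeros((len(uniq), points.shape[1]), dtype=np.float32)
    for d in range(points.shape[1]):
        feats[:, d] = np.bincount(inverse, weights=points[:, d],
                                  minlength=len(uniq)) / counts
    return uniq, feats  # voxel indices (x, y, z) and mean features

# Example: 10,000 random points with (x, y, z, intensity)
pts = np.random.rand(10000, 4).astype(np.float32) * [70.0, 80.0, 4.0, 1.0]
pts[:, 1] -= 40.0
pts[:, 2] -= 3.0
coords, feats = voxelize_mean(pts)
```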
Figure 4. The structure of the 3D backbone module. The first orange box converts the voxel features into a 4D sparse tensor. The green boxes are submanifold convolution layers. The blue boxes are sparse convolutions with stride = 2.
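A minimal sketch of such a sparse 3D backbone, assuming the spconv library (v2.x) with SECOND-style channel widths; the layer counts, channel sizes, and spatial shape are illustrative rather than the exact architecture described here, and running it requires a CUDA build of spconv.

```python
import torch
import torch.nn as nn
import spconv.pytorch as spconv

def subm_block(cin, cout, key):
    # Submanifold convolution: keeps the sparsity pattern unchanged (green boxes)
    return spconv.SparseSequential(
        spconv.SubMConv3d(cin, cout, 3, padding=1, bias=False, indice_key=key),
        nn.BatchNorm1d(cout), nn.ReLU())

def down_block(cin, cout):
    # Regular sparse convolution with stride 2: downsamples the grid (blue boxes)
    return spconv.SparseSequential(
        spconv.SparseConv3d(cin, cout, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm1d(cout), nn.ReLU())

backbone = spconv.SparseSequential(
    subm_block(4, 16, "s1"),
    down_block(16, 32),
    subm_block(32, 32, "s2"),
    down_block(32, 64),
    subm_block(64, 64, "s3"),
).cuda()

# Build the 4D sparse tensor from voxel features (orange box); indices are
# unique int32 coordinates in (batch_idx, z, y, x) order.
spatial_shape = [41, 400, 352]  # illustrative grid size
idx = torch.stack([torch.zeros(3000, dtype=torch.long),
                   torch.randint(0, 41, (3000,)),
                   torch.randint(0, 400, (3000,)),
                   torch.randint(0, 352, (3000,))], dim=1)
idx = torch.unique(idx, dim=0).int().cuda()
feats = torch.randn(idx.shape[0], 4).cuda()

x = spconv.SparseConvTensor(feats, idx, spatial_shape, batch_size=1)
out = backbone(x)
bev = out.dense().flatten(1, 2)  # collapse depth into channels -> BEV feature map
print(bev.shape)
```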
Figure 5. The structure of the LLM module. (a) Feature reuse module. (b) Location attention module for multi-layer feature fusion.
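The exact layer configuration of the LLM module is not given in this excerpt; the sketch below is only a generic illustration of location (spatial) attention used to fuse two BEV feature maps of different depths. The class name, channel sizes, and the residual reuse of the shallow map are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationAttentionFusion(nn.Module):
    """Fuse a shallow and a deep BEV feature map with a per-location attention mask."""
    def __init__(self, channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, 1, bias=False)
        # Location-wise attention weights predicted from the concatenated maps
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels // 4, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 3, padding=1),
            nn.Sigmoid())

    def forward(self, shallow, deep):
        # Upsample the deeper (coarser) map to the shallow map's resolution
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                             align_corners=False)
        cat = torch.cat([shallow, deep], dim=1)
        mask = self.attn(cat)              # (B, 1, H, W) location weights
        fused = self.reduce(cat)
        # Reuse the shallow features where the attention weight is low
        return mask * fused + (1.0 - mask) * shallow

fusion = LocationAttentionFusion(128)
out = fusion(torch.randn(1, 128, 200, 176), torch.randn(1, 128, 100, 88))
```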
Figure 6. Illustration of the KAM module. The predicted bounding box is projected onto the feature map, and keypoints are extracted to yield a rich representation.
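As an illustration of this idea, the sketch below bilinearly samples BEV features at a set of keypoint locations with grid_sample and feeds them to a small MLP that predicts a confidence score. The head sizes, keypoint count, and normalization scheme are assumptions for the example, not the paper's exact KAM design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointConfidenceHead(nn.Module):
    """Score a predicted box from BEV features sampled at its keypoints."""
    def __init__(self, channels=128, num_keypoints=31):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels * num_keypoints, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1))

    def forward(self, bev, keypoints_xy, pc_range=(0.0, -40.0, 70.4, 40.0)):
        # bev: (B, C, H, W); keypoints_xy: (B, K, 2) in LiDAR coordinates (x, y)
        x_min, y_min, x_max, y_max = pc_range
        # Normalize keypoints to [-1, 1] for grid_sample (x -> width, y -> height)
        gx = 2.0 * (keypoints_xy[..., 0] - x_min) / (x_max - x_min) - 1.0
        gy = 2.0 * (keypoints_xy[..., 1] - y_min) / (y_max - y_min) - 1.0
        grid = torch.stack([gx, gy], dim=-1).unsqueeze(1)       # (B, 1, K, 2)
        feats = F.grid_sample(bev, grid, align_corners=False)   # (B, C, 1, K)
        feats = feats.squeeze(2).flatten(1)                     # (B, C*K)
        return torch.sigmoid(self.mlp(feats))                   # confidence in (0, 1)

head = KeypointConfidenceHead()
score = head(torch.randn(2, 128, 200, 176), torch.rand(2, 31, 2) * 40.0)
```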
Figure 7. The method used to calculate keypoints. The boundary keypoints are represented by blue circles and the inner keypoints by green circles. The center keypoint is indicated by a purple circle. The blue dashed line denotes the BEV of the predicted bounding box.
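A minimal sketch of how such keypoints could be generated from a BEV box parameterized as (cx, cy, w, l, yaw): points are sampled uniformly along the four edges and on an interior grid (which includes the center), then rotated and translated into LiDAR coordinates. The sampling counts and spacing are illustrative assumptions, not the exact scheme evaluated in the ablation study.

```python
import numpy as np

def box_keypoints(cx, cy, w, l, yaw, n_edge=5, n_inner=3):
    """Boundary + inner keypoints of a rotated BEV box, in LiDAR coordinates."""
    # Sample in the box's canonical frame: length along x, width along y
    t = np.linspace(-0.5, 0.5, n_edge)
    top    = np.stack([t * l, np.full(n_edge,  0.5 * w)], axis=1)
    bottom = np.stack([t * l, np.full(n_edge, -0.5 * w)], axis=1)
    left   = np.stack([np.full(n_edge, -0.5 * l), t * w], axis=1)
    right  = np.stack([np.full(n_edge,  0.5 * l), t * w], axis=1)
    boundary = np.unique(np.concatenate([top, bottom, left, right]), axis=0)

    # Inner grid of keypoints (excludes the edges, includes the center)
    gx, gy = np.meshgrid(np.linspace(-0.5, 0.5, n_inner + 2)[1:-1] * l,
                         np.linspace(-0.5, 0.5, n_inner + 2)[1:-1] * w)
    inner = np.stack([gx.ravel(), gy.ravel()], axis=1)

    pts = np.concatenate([boundary, inner], axis=0)
    # Rotate by yaw and translate to the box center
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return pts @ rot.T + np.array([cx, cy])

kps = box_keypoints(10.0, 2.0, 1.6, 3.9, np.deg2rad(30))
print(kps.shape)  # (number of boundary + inner keypoints, 2)
```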
Figure 8. Visualization results for cars on the KITTI validation set. We present paired samples: in each pair, row 1 shows the 3D bounding boxes projected onto the image for clearer visualization, while row 2 shows the detection results on the LiDAR point cloud. Red and green boxes denote detections and ground-truth boxes, respectively.
Figure 9. Visualization results for cars on the KITTI test set. The detection results are indicated by red boxes.
Abstract
1. Introduction
- (1) In order to better retain and extract spatial information from LiDAR, as well as to extract effective cross-layer features, a novel lightweight location attention module named LLM is proposed, which maintains an efficient flow of spatial information and incorporates multi-level features.
- (2) A keypoint sampling method is adopted to enhance the correlation between the predicted bounding box and its confidence score, thus improving detection performance.
- (3) Extensive experiments are conducted on the KITTI benchmark dataset, demonstrating that the proposed network attains good performance.
2. Related Work
2.1. Multi-Modal Fusion Detector
2.2. LiDAR-Based Detector
2.3. Location Quality Estimation
3. Approach
3.1. Framework
3.2. Voxelization
3.3. 3D Backbone Module
3.4. Lightweight Location Attention Module
3.5. Keypoint-Aware Module
3.6. Loss Function
4. Experiments
4.1. Dataset and Evaluation Metrics
4.2. Implementation Details
4.3. Evaluation with the KITTI Dataset
4.4. Evaluation on the WHUT Dataset
4.5. Ablation Study
4.6. Runtime Analysis
4.7. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Christine, D.; Rung-Ching, C.; Hui, Y.; Xiaoyi, J. Robust detection method for improving small traffic sign recognition based on spatial pyramid pooling. J. Ambient Intell. Humaniz. Comput. 2021.
- Zhang, S.; Wang, C.; Lin, L.; Wen, C.; Yang, C.; Zhang, Z.; Li, J. Automated Visual Recognizability Evaluation of Traffic Sign Based on 3D LiDAR Point Clouds. Remote Sens. 2019, 11, 1453.
- Bayoudh, K.; Hamdaoui, F.; Mtibaa, A. Transfer learning based hybrid 2D-3D CNN for traffic sign recognition and semantic road detection applied in advanced driver assistance systems. Appl. Intell. 2020, 51, 124–142.
- Wu, L.; Zhang, R.; Zhou, R.; Wu, D. An edge computing based data detection scheme for traffic light at intersections. Comput. Commun. 2021, 176, 91–98.
- Lee, E.; Kim, D. Accurate traffic light detection using deep neural network with focal regression loss. Image Vis. Comput. 2019, 87, 24–36.
- Xiang, Z.; Chao, Z.; Hangzai, L.; Wanqing, Z.; Sheng, Z.; Lei, T.; Jinye, P.; Jianping, F. Automatic Learning for Object Detection. Neurocomputing 2022, in press.
- Shi, S.; Jiang, L.; Deng, J.; Wang, Z.; Guo, C.; Shi, J.; Wang, W.; Li, H. PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10526–10535.
- Li, J.; Luo, S.; Zhu, Z.; Dai, H.; Krylov, A.; Ding, Y.; Shao, L. 3D IoU-Net: IoU Guided 3D Object Detector for Point Clouds. arXiv 2020, arXiv:2004.04962.
- Qian, R.; Lai, X.; Li, X. Boundary-Aware 3D Object Detection from Point Clouds. arXiv 2021, arXiv:2104.10330.
- Zheng, W.; Tang, W.; Chen, S.; Jiang, L.; Fu, C. CIA-SSD: Confident IoU-Aware Single-Stage Object Detector from Point Cloud. arXiv 2020, arXiv:2012.03015.
- Qi, C.; Liu, W.; Wu, C.; Su, H.; Guibas, L. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 918–927.
- Wang, Z.; Jia, K. Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1742–1749.
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-View 3D Object Detection Network for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6526–6534.
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1–8.
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-Task Multi-Sensor Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 7345–7353.
- Pang, S.; Morris, D.; Radha, H. CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10386–10393.
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 770–779.
- Yang, Z.; Sun, Y.; Liu, S.; Shen, X.; Jia, J. STD: Sparse-to-Dense 3D Object Detector for Point Cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1951–1960.
- Su, Z.; Tan, P.; Wang, Y. DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization. arXiv 2021, arXiv:2107.12707.
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4490–4499.
- Yan, Y.; Mao, Y.; Li, B. SECOND: Sparsely Embedded Convolutional Detection. Sensors 2018, 18, 3337.
- Lang, A.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J. PointPillars: Fast Encoders for Object Detection from Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 12689–12697.
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-Based 3D Single Stage Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11037–11045.
- He, C.; Zeng, H.; Huang, J.; Hua, X.; Zhang, L. Structure Aware Single-Stage 3D Object Detection from Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11870–11879.
- Yang, B.; Luo, W.; Urtasun, R. PIXOR: Real-Time 3D Object Detection from Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7652–7660.
- Xu, J.; Ma, Y.; He, S.; Zhu, J. 3D-GIoU: 3D Generalized Intersection over Union for Object Detection in Point Cloud. Sensors 2019, 19, 4093.
- Wang, B.; An, J.; Cao, J. Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds. Sensors 2020, 20, 704.
- Wang, G.; Tian, B.; Ai, Y.; Xu, T.; Chen, L.; Cao, D. CenterNet3D: An Anchor-Free Object Detector for Autonomous Driving. arXiv 2020, arXiv:2007.07214.
- Song, Q.; Mei, K.; Huang, R. AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing. arXiv 2021, arXiv:2103.05930.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. arXiv 2016, arXiv:1605.06409.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Liu, Z.; Zhao, X.; Huang, T.; Hu, R.; Zhou, Y.; Bai, X. TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11677–11684.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2017, arXiv:1608.03983.
- Xie, L.; Xiang, C.; Yu, Z.; Xu, G.; Yang, Z.; Cai, D.; He, X. PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12460–12467.
- Chen, Y.; Liu, S.; Shen, X.; Jia, J. Fast Point R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9775–9784.
- He, Y.; Xia, G.; Luo, Y.; Su, L.; Zhang, Z.; Li, W.; Wang, P. DVFENet: Dual-branch voxel feature extraction network for 3D object detection. Neurocomputing 2021, 459, 201.
- Du, L.; Ye, X.; Tan, X.; Feng, J.; Xu, Z.; Ding, E.; Wen, S. Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13326–13335.
- Shi, W.; Rajkumar, R. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1711–1719.
- Zhang, J.; Wang, J.; Xu, D.; Li, Y. HCNET: A Point Cloud Object Detection Network Based on Height and Channel Attention. Remote Sens. 2021, 13, 5071.
- Li, H.; Zhao, S.; Zhao, W.; Zhang, L.; Shen, J. One-Stage Anchor-Free 3D Vehicle Detection from LiDAR Sensors. Sensors 2021, 21, 2651.
- Chen, Q.; Fan, C.; Jin, W.; Zou, L.; Li, F.; Li, X.; Jiang, H.; Wu, M.; Liu, Y. EPGNet: Enhanced Point Cloud Generation for 3D Object Detection. Sensors 2020, 20, 6927.
- Li, F.; Jin, W.; Fan, C.; Zou, L.; Chen, Q.; Li, X.; Jiang, H.; Liu, Y. PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud. Sensors 2021, 21, 136.
- Choi, H.; Jeong, J.; Choi, J.Y. Rotation-Aware 3D Vehicle Detection from Point Cloud. IEEE Access 2021, 9, 99276–99286.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. DenseASPP for Semantic Segmentation in Street Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7345–7353.
| Type | Method | Modality | AP (Easy) | AP (Mod) | AP (Hard) |
|---|---|---|---|---|---|
| 2-stage | MV3D [13] | LiDAR + Camera | 74.97 | 63.63 | 54.00 |
| | F-PointNet [11] | LiDAR + Camera | 82.19 | 69.79 | 60.59 |
| | PI-RCNN [35] | LiDAR + Camera | 84.37 | 74.82 | 70.03 |
| | PointRCNN [17] | LiDAR | 85.94 | 75.76 | 68.32 |
| | Fast Point RCNN [36] | LiDAR | 84.28 | 75.73 | 67.39 |
| | STD [18] | LiDAR | 87.95 | 79.71 | 75.09 |
| | VoxelNet [20] | LiDAR | 77.49 | 65.11 | 62.85 |
| | DVFENet [37] | LiDAR | 86.20 | 79.18 | 74.58 |
| | 3D IoU-Net [8] | LiDAR | 87.96 | 79.03 | 72.78 |
| 1-stage | SECOND [21] | LiDAR | 87.44 | 79.46 | 73.97 |
| | PointPillars [22] | LiDAR | 82.58 | 74.31 | 68.99 |
| | TANet [33] | LiDAR | 84.39 | 75.94 | 68.82 |
| | Associate-3Ddet [38] | LiDAR | 85.99 | 77.40 | 70.53 |
| | Point-GNN [39] | LiDAR | 88.33 | 79.47 | 72.29 |
| | 3DSSD [23] | LiDAR | 88.36 | 79.57 | 74.55 |
| | HCNET [40] | LiDAR | 81.31 | 73.56 | 68.42 |
| | AVEF [41] | LiDAR | 84.41 | 75.39 | 69.89 |
| | Ours | LiDAR | 88.92 | 79.75 | 72.17 |
| Type | Method | Modality | AP (Easy) | AP (Mod) | AP (Hard) |
|---|---|---|---|---|---|
| 2-stage | MV3D [13] | LiDAR + Camera | 86.55 | 78.10 | 76.67 |
| | F-PointNet [11] | LiDAR + Camera | 88.16 | 84.02 | 76.44 |
| | PI-RCNN [35] | LiDAR + Camera | 88.27 | 78.53 | 77.75 |
| | PointRCNN [17] | LiDAR | 88.88 | 78.63 | 77.38 |
| | Fast Point RCNN [36] | LiDAR | 89.12 | 79.00 | 77.48 |
| | STD [18] | LiDAR | 89.70 | 79.80 | 79.30 |
| | VoxelNet [20] | LiDAR | 81.97 | 65.46 | 62.85 |
| | DVFENet [37] | LiDAR | 89.81 | 79.52 | 78.35 |
| | 3D IoU-Net [8] | LiDAR | 89.31 | 79.26 | 78.68 |
| 1-stage | SECOND [21] | LiDAR | 87.43 | 76.48 | 69.10 |
| | PointPillars [22] | LiDAR | 88.91 | 79.88 | 78.37 |
| | TANet [33] | LiDAR | 87.52 | 76.64 | 73.86 |
| | CIA-SSD [10] | LiDAR | 90.04 | 79.81 | 78.80 |
| | Associate-3Ddet [38] | LiDAR | 89.29 | 79.17 | 77.76 |
| | Point-GNN [39] | LiDAR | 87.89 | 78.34 | 77.38 |
| | 3DSSD [23] | LiDAR | 89.71 | 79.45 | 78.67 |
| | HCNET [40] | LiDAR | 88.45 | 78.01 | 77.72 |
| | EPGNet [42] | LiDAR | 89.30 | 78.98 | 77.79 |
| | AVEF [41] | LiDAR | 87.94 | 77.74 | 76.39 |
| | PSANet [43] | LiDAR | 89.02 | 78.70 | 77.57 |
| | RAVD [44] | LiDAR | 89.61 | 79.04 | 77.81 |
| | Ours | LiDAR | 90.14 | 80.06 | 78.91 |
| Method | Modality | AP |
|---|---|---|
| SECOND | LiDAR | 72.31 |
| Ours | LiDAR | 73.28 |
| LLM | Keypoints | AP (Easy) | AP (Mod) | AP (Hard) |
|---|---|---|---|---|
| | | 89.09 | 78.95 | 77.67 |
| √ | | 89.41 | 79.35 | 77.94 |
| | √ | 89.94 | 79.84 | 78.82 |
| √ | √ | 90.14 | 80.06 | 78.91 |
| Module | AP (Easy) | AP (Mod) | AP (Hard) |
|---|---|---|---|
| PSA | 90.14 | 79.61 | 78.35 |
| DenseASPP | 89.74 | 79.12 | 78.08 |
| SPP | 89.62 | 79.42 | 78.32 |
| Ours | 90.14 | 80.06 | 78.91 |
| Boundary Points | Center Points | AP (Easy) | AP (Mod) | AP (Hard) |
|---|---|---|---|---|
| 0 | 0 | 89.09 | 78.95 | 77.67 |
| 4 | 0 | 89.42 | 79.30 | 78.15 |
| 18 | 0 | 89.65 | 79.45 | 78.31 |
| 28 | 0 | 89.62 | 79.42 | 78.32 |
| 0 | 11 | 89.89 | 79.76 | 78.45 |
| 18 | 13 | 89.94 | 79.84 | 78.82 |
| 28 | 19 | 89.66 | 79.32 | 78.43 |
| Method | Point-GNN | Associate-3Ddet | SECOND | 3DSSD | Ours |
|---|---|---|---|---|---|
| Runtime (ms) | 643 | 60 | 46.3 | 38 | 45.9 |