HP3D-V2V: High-Precision 3D Object Detection Vehicle-to-Vehicle Cooperative Perception Algorithm
Figure 1. Diagram of the three types of collaboration strategies [31]. (a) Early feature fusion process of the cooperative vehicles. (b) Intermediate feature fusion process of the cooperative vehicles. (c) Late feature fusion process of the cooperative vehicles.
Figure 2. Proposed algorithm architecture for high-precision 3D object detection vehicle-to-vehicle cooperative perception.
Figure 3. 3D LiDAR point cloud denoising flow chart.
Figure 4. Comparison of feature extraction strategies.
Figure 5. Structure of the voxel feature extraction network.
Figure 6. Feature pyramid backbone network with multibranch target detection head structure.
Figure 7. Cross-vehicle feature fusion network for intermediate features.
Figure 8. Data augmentation visualization results; the green boxes are the ground truth. (a) Original data. (b) Flipped. (c) Rotated. (d) Scaled.
Figure 9. Detection results of mainstream intermediate-fusion models and the proposed intermediate-fusion model. (a) AttFuse on Default Towns and Culver City. (b) V2VNet. (c) The proposed method.
Figure 10. Schematic diagram of the ablation experiment. (a) Detection results of the baseline method in Default Towns. (b) After adding the point cloud denoising method (Section 3.1). (c) After adding the voxel point-column fusion feature extraction (Section 3.2). (d) After adding the cross-vehicle feature fusion module (Section 3.4).
Abstract
1. Introduction
- A voxel grid-based statistical filter is introduced in the preprocessing stage to improve the cleanliness and reliability of the point cloud data (PCD).
- We present a feature extraction network for voxel point-column fusion that addresses the lack of spatial feature interaction in pillar-based feature extraction methods. Max pooling replaces the feature-splicing operation of voxel-based methods, reducing feature dimensionality and generating a pseudoimage that a 2D CNN subsequently processes.
- We establish a cooperative perceptual feature fusion module that constructs feature compression and feature sharing networks, introducing residual blocks to reduce information loss during network transmission. In addition, building on max and mean dimensionality-reduction operators, we propose an adaptive feature fusion module that better captures the spatial relationships between features, thereby improving the accuracy of the model.
2. Related Works
2.1. Early Collaboration
2.2. Intermediate Collaboration
2.3. Late Collaboration
3. HP3D-V2V Algorithm
- Filtering the input point cloud data to enhance its quality.
- Utilizing voxel point-column fusion to encode the filtered point cloud with a pillar feature network (PFN), yielding a pseudoimage representation.
- Extracting multiscale intermediate features from the pseudoimage using a feature pyramid network (FPN).
- Performing intervehicle data sharing, in which the intermediate feature map of the cooperative autonomous vehicle (CAV) is projected onto the self-vehicle coordinates (see the warping sketch after this list).
- Conducting intervehicle feature fusion to generate a combined feature map that integrates information from multiple vehicles.
- Performing 3D object detection to output a bird’s-eye view (BEV) representation of the detected 3D targets.
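The data-sharing step resamples each CAV's bird's-eye-view feature map into the self-vehicle frame. Below is a minimal PyTorch sketch of such a feature warp; it assumes a planar relative pose and a BEV grid with equal metric extent in x and y, and the function name `project_cav_features` and its argument layout are illustrative rather than the paper's API. Note that `grid_sample` looks up a source location for every target cell, so the transform must be supplied in the ego-to-CAV direction.

```python
import torch
import torch.nn.functional as F

def project_cav_features(cav_feat, tf_ego_to_cav, x_range, y_range):
    """Resample a CAV's BEV feature map into the self-vehicle (ego) frame.

    cav_feat:      (1, C, H, W) intermediate feature map of the CAV.
    tf_ego_to_cav: (4, 4) homogeneous pose mapping ego coordinates into the
                   CAV frame (the inverse direction, since grid_sample looks
                   up a source location for every target cell).
    x_range/y_range: (min, max) metric extent of the BEV map; axis
                   conventions (x vs. width) vary between codebases.
    """
    R = tf_ego_to_cav[:2, :2]                       # planar rotation part
    t = tf_ego_to_cav[:2, 3]                        # planar translation part
    # Express the translation in the normalised [-1, 1] coordinates
    # that affine_grid works with.
    t_norm = torch.stack([2.0 * t[0] / (x_range[1] - x_range[0]),
                          2.0 * t[1] / (y_range[1] - y_range[0])])
    theta = torch.cat([R, t_norm.view(2, 1)], dim=1).unsqueeze(0)  # (1, 2, 3)
    grid = F.affine_grid(theta, list(cav_feat.shape), align_corners=False)
    return F.grid_sample(cav_feat, grid, align_corners=False)
```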
3.1. Point Cloud Denoising
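As stated in the contributions, the preprocessing stage combines a voxel grid with statistical outlier rejection. The NumPy/SciPy sketch below shows one common realization of these two ingredients; the parameter values (`k`, `std_ratio`, `voxel_size`) are illustrative defaults, not the paper's settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is
    statistically implausible (sparse, isolated returns are likely noise)."""
    tree = cKDTree(points[:, :3])
    dists, _ = tree.query(points[:, :3], k=k + 1)  # first hit is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

def voxel_grid_filter(points, voxel_size=0.1):
    """Quantise points to a voxel grid and keep one centroid per occupied
    voxel, which regularises density and suppresses duplicate returns."""
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    _, inv, counts = np.unique(coords, axis=0, return_inverse=True,
                               return_counts=True)
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inv, points)                   # accumulate per-voxel sums
    return sums / counts[:, None]                  # centroid per voxel
```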
3.2. Feature Learning Network
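A sketch of the pillar-encoding idea described in the contributions: a shared linear layer over the points of each pillar, max pooling to collapse the point dimension (in place of feature splicing), and a scatter step that forms the pseudoimage for the 2D backbone. This is the PointPillars-style skeleton with illustrative dimensions, not the paper's exact VFE_VP layer.

```python
import torch
import torch.nn as nn

class PillarEncoder(nn.Module):
    """Shared linear layer over the points of each pillar, then max pooling
    over the point dimension as the dimensionality-reduction step."""

    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
        self.norm = nn.BatchNorm1d(out_dim)

    def forward(self, pillars):                    # (P, N, in_dim) padded pillars
        x = self.linear(pillars)                   # (P, N, out_dim)
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        x = torch.relu(x)
        return x.max(dim=1).values                 # (P, out_dim): one vector per pillar

def scatter_to_pseudoimage(pillar_feats, coords, H, W):
    """Place each pillar feature at its (row, col) BEV cell, forming the
    pseudoimage that the 2D CNN backbone consumes."""
    C = pillar_feats.shape[1]
    canvas = pillar_feats.new_zeros(C, H * W)
    canvas[:, coords[:, 0] * W + coords[:, 1]] = pillar_feats.t()
    return canvas.view(C, H, W)
```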
3.3. Backbone
3.4. Multivehicle Information Fusion Pipeline
3.4.1. Data Sharing and Feature Extraction
3.4.2. Feature Compression and Sharing
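A minimal sketch of a compress-transmit-decompress codec with residual shortcuts, following the description in the contribution list; the channel count and compression ratio are assumptions for illustration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block; the identity shortcut limits information loss."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FeatureCodec(nn.Module):
    """1x1 convolutions shrink the channel dimension before transmission and
    restore it on the receiving vehicle, with a residual block on each side."""
    def __init__(self, ch=256, ratio=4):
        super().__init__()
        mid = ch // ratio
        self.encoder = nn.Sequential(nn.Conv2d(ch, mid, 1), ResBlock(mid))
        self.decoder = nn.Sequential(ResBlock(mid), nn.Conv2d(mid, ch, 1))

    def forward(self, feat):                # (B, ch, H, W) intermediate features
        compressed = self.encoder(feat)     # (B, ch // ratio, H, W): what is shared
        return self.decoder(compressed)     # reconstructed on the ego vehicle
```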
3.4.3. Cross-Vehicle Feature Fusion (CVFF)
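The adaptive fusion module is described as building on max and mean dimensionality-reduction operators. One plausible reading, sketched below, derives a per-agent spatial attention map from channelwise max and mean statistics and normalizes it across agents; this illustrates the idea rather than reproducing the paper's exact module.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Per-agent spatial weights from channelwise max and mean maps,
    followed by a weighted sum over the ego and CAV feature maps."""

    def __init__(self, k=7):
        super().__init__()
        self.att = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, feats):                          # (A, C, H, W): A agents
        stats = torch.cat([feats.max(dim=1, keepdim=True).values,
                           feats.mean(dim=1, keepdim=True)], dim=1)  # (A, 2, H, W)
        w = torch.softmax(self.att(stats), dim=0)      # agents compete per cell
        return (w * feats).sum(dim=0)                  # (C, H, W) fused map
```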
3.5. Loss Functions
4. Experiments
4.1. Dataset and Split
4.2. Implementation Details
4.2.1. Device Information
4.2.2. Metrics
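Detection accuracy is reported as average precision (AP) at 3D IoU thresholds of 0.5 and 0.7 (Tables 1 and 2). Once predictions have been matched to ground-truth boxes at the chosen IoU, AP is the area under the precision-recall curve; the sketch below shows that final step, assuming all-point interpolation (the benchmark's official script may interpolate differently).

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one IoU threshold.

    scores: (D,) detection confidences; is_tp: (D,) bool array marking
    detections already matched to ground truth at the chosen 3D IoU;
    num_gt: number of ground-truth boxes."""
    order = np.argsort(-scores)                 # rank detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    return float(np.trapz(precision, recall))   # all-point interpolation
```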
4.2.3. Model Details
4.2.4. Training
4.2.5. Data Augmentation
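Figure 8 visualizes the three augmentations used: flipping, rotation, and scaling. A sketch of how such global transforms are typically applied jointly to the point cloud and the ground-truth boxes follows; the probabilities and parameter ranges are common defaults, not necessarily the paper's.

```python
import numpy as np

def augment(points, boxes, rng):
    """Global flip / rotation / scaling applied jointly to the point cloud
    (N, >=3) and to boxes given as (x, y, z, l, w, h, yaw)."""
    if rng.random() < 0.5:                      # random flip over the x axis
        points[:, 1] *= -1.0
        boxes[:, 1] *= -1.0
        boxes[:, 6] *= -1.0                     # yaw mirrors with the y axis
    ang = rng.uniform(-np.pi / 4, np.pi / 4)    # global rotation about z
    c, s = np.cos(ang), np.sin(ang)
    R = np.array([[c, -s], [s, c]])
    points[:, :2] = points[:, :2] @ R.T
    boxes[:, :2] = boxes[:, :2] @ R.T
    boxes[:, 6] += ang
    scale = rng.uniform(0.95, 1.05)             # global scaling
    points[:, :3] *= scale
    boxes[:, :6] *= scale                       # centres and sizes scale together
    return points, boxes

# Usage: points, boxes = augment(points, boxes, np.random.default_rng(0))
```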
4.3. Comparison Experiments
4.3.1. Results
4.3.2. Discussion
4.4. Ablation Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Fang, J.; Zhou, D.; Yan, F.; Zhao, T.; Zhang, F.; Ma, Y.; Wang, L.; Yang, R. Augmented LiDAR simulator for autonomous driving. IEEE Robot. Autom. Lett. 2020, 5, 1931–1938. [Google Scholar] [CrossRef]
- Wang, Z.; Han, Y.; Zhang, Y.; Hao, J.; Zhang, Y. Classification and Recognition Method of Non-Cooperative Objects Based on Deep Learning. Sensors 2024, 24, 583. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; He, L.; Chen, J.; Wang, B.; Wang, Y.; Zhou, Y. Multiattention mechanism 3D object detection algorithm based on RGB and LiDAR fusion for intelligent driving. Sensors 2023, 23, 8732. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.C.; Tian, J.; Ma, C.Y.; Glaser, N.; Kuo, C.W.; Kira, Z. Who2com: Collaborative perception via learnable handshake communication. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020. [Google Scholar]
- Liang, Z.; Huang, Y.; Liu, Z. Efficient graph attentional network for 3D object detection from Frustum-based LiDAR point clouds. J. Vis. Commun. Image Represent. 2022, 89, 103667. [Google Scholar] [CrossRef]
- Zhou, S.; Tian, Z.; Chu, X.; Zhang, X.; Zhang, B.; Lu, X.; Feng, C.; Jie, Z.; Chiang, P.Y.; Ma, L. FastPillars: A Deployment-friendly Pillar-based 3D Detector. arXiv 2023, arXiv:2302.02367. [Google Scholar]
- Zhang, G.; Li, S.; Zhang, K.; Lin, Y.J. Machine Learning-Based Human Posture Identification from Point Cloud Data Acquisitioned by FMCW Millimetre-Wave Radar. Sensors 2023, 23, 7208. [Google Scholar] [CrossRef]
- Tsukada, M.; Oi, T.; Ito, A.; Hirata, M.; Esaki, H. AutoC2X: Open-source software to realize V2X cooperative perception among autonomous vehicles. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 18 November–16 December 2020. [Google Scholar]
- Li, Y.; Ma, D.; An, Z.; Wang, Z.; Zhong, Y.; Chen, S.; Feng, C. V2X-Sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving. IEEE Robot. Autom. Lett. 2022, 7, 10914–10921. [Google Scholar] [CrossRef]
- Llatser, I.; Michalke, T.; Dolgov, M.; Wildschütte, F.; Fuchs, H. Cooperative automated driving use cases for 5G V2X communication. In Proceedings of the IEEE 2nd 5G World Forum (5GWF), Dresden, Germany, 30 September–2 October 2019. [Google Scholar]
- Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Moessner, K.; Saad, W.; Poor, H.V. 6G for vehicle-to-everything (V2X) communications: Enabling technologies, challenges, and opportunities. Proc. IEEE 2022, 110, 712–734. [Google Scholar] [CrossRef]
- Zhao, X.; Sun, P.; Xu, Z.; Min, H.; Yu, H. Fusion of 3D LIDAR and camera data for object detection in autonomous vehicle applications. IEEE Sensors J. 2020, 20, 4901–4913. [Google Scholar] [CrossRef]
- Choe, J.; Joo, K.; Imtiaz, T.; Kweon, I.S. Volumetric propagation network: Stereo-lidar fusion for long-range depth estimation. IEEE Robot. Autom. Lett. 2021, 6, 4672–4679. [Google Scholar] [CrossRef]
- Hu, C.; Pan, Z.; Li, P. A 3D point cloud filtering method for leaves based on manifold distance and normal estimation. Remote Sens. 2019, 11, 198. [Google Scholar] [CrossRef]
- Kim, S.U.; Roh, J.; Im, H.; Kim, J. Anisotropic SpiralNet for 3D Shape Completion and Denoising. Sensors 2022, 22, 6457. [Google Scholar] [CrossRef] [PubMed]
- Liu, K.; Xiao, A.; Huang, J.; Cui, K.; Xing, Y.; Lu, S. D-lc-nets: Robust denoising and loop closing networks for lidar slam in complicated circumstances with noisy point clouds. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022. [Google Scholar]
- Zhao, Q.; Gao, X.; Li, J.; Luo, L. Optimization algorithm for point cloud quality enhancement based on statistical filtering. J. Sens. 2021, 2021, 7325600. [Google Scholar] [CrossRef]
- Xu, Y.; Tong, X.; Stilla, U. Voxel-based representation of 3D point clouds: Methods, applications, and its potential use in the construction industry. Autom. Constr. 2021, 126, 103675. [Google Scholar] [CrossRef]
- Duan, Y.; Yang, C.; Li, H. Low-complexity adaptive radius outlier removal filter based on PCA for lidar point cloud denoising. Appl. Opt. 2021, 60, E1–E7. [Google Scholar] [CrossRef] [PubMed]
- He, C.; Zeng, H.; Huang, J.; Hua, X.S.; Zhang, L. Structure aware single-stage 3d object detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Hu, Y.; Ding, Z.; Ge, R.; Shao, W.; Huang, L.; Li, K.; Liu, Q. Afdetv2: Rethinking the necessity of the second stage for object detection from point clouds. Proc. AAAI Conf. Artif. Intell. 2022, 36, 969–979. [Google Scholar] [CrossRef]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Noh, J.; Lee, S.; Ham, B. Hvpr: Hybrid voxel-point representation for single-stage 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Imad, M.; Doukhi, O.; Lee, D.J. Transfer learning based semantic segmentation for 3D object detection from point cloud. Sensors 2021, 21, 3964. [Google Scholar] [CrossRef] [PubMed]
- Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef]
- Ye, M.; Xu, S.; Cao, T. Hvnet: Hybrid voxel network for lidar-based 3D object detection. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Song, J.; Lee, J. Online Self-Calibration of 3D Measurement Sensors Using a Voxel-Based Network. Sensors 2022, 22, 6447. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud-based 3D object detection. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Arnold, E.; Dianati, M.; de Temple, R.; Fallah, S. Cooperative perception for 3D object detection in driving scenarios using infrastructure sensors. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1852–1864. [Google Scholar] [CrossRef]
- Su, S.; Li, Y.; He, S.; Han, S.; Feng, C.; Ding, C.; Miao, F. Uncertainty quantification of collaborative detection for self-driving. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 5588–5594. [Google Scholar]
- Li, Y.; Ren, S.; Wu, P.; Chen, S.; Feng, C.; Zhang, W. Learning distilled collaboration graph for multi-agent perception. Adv. Neural Inf. Process. Syst. 2021, 34, 29541–29552. [Google Scholar]
- Chen, Q.; Ma, X.; Tang, S.; Guo, J.; Yang, Q.; Fu, S. F-cooper: Feature-based cooperative perception for an autonomous vehicle edge computing system using 3D point clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing (2019), Washington, DC, USA, 7–9 November 2019. [Google Scholar]
- Xu, R.; Xiang, H.; Xia, X.; Han, X.; Li, J.; Ma, J. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA) 2022, Philadelphia, PA, USA, 23–27 May 2022. [Google Scholar]
- Wang, T.H.; Manivasagam, S.; Liang, M.; Yang, B.; Zeng, W.; Urtasun, R. V2vnet: Vehicle-to-vehicle communication for joint perception and prediction. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part II; Springer International Publishing: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Xu, R.; Xiang, H.; Tu, Z.; Xia, X.; Yang, M.H.; Ma, J. V2x-vit: Vehicle-to-everything cooperative perception with vision transformer. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022. [Google Scholar]
- Lin, C.; Tian, D.; Duan, X.; Zhou, J.; Zhao, D.; Cao, D. V2VFormer: Vehicle-to-Vehicle Cooperative Perception with Spatial-Channel Transformer. IEEE Trans. Intell. Veh. 2024. [Google Scholar] [CrossRef]
- Wang, B.; Zhang, L.; Wang, Z.; Zhao, Y.; Zhou, T. CORE: Cooperative Reconstruction for Multi-Agent Perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2023, Paris, France, 1–6 October 2023. [Google Scholar]
- Wang, T.; Chen, G.; Chen, K.; Liu, Z.; Zhang, B.; Knoll, A.; Jiang, C. UMC: A unified bandwidth-efficient and multi-resolution based collaborative perception framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2023, Paris, France, 1–6 October 2023; pp. 8187–8196. [Google Scholar]
- Allig, C.; Wanielik, G. Alignment of perception information for cooperative perception. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019. [Google Scholar]
- Cheng, D.; Zhao, D.; Zhang, J.; Wei, C.; Tian, D. PCA-Based Denoising Algorithm for Outdoor Lidar Point Cloud Data. Sensors 2021, 21, 3703. [Google Scholar] [CrossRef] [PubMed]
- Shi, G.; Li, R.; Ma, C. Pillarnet: High-performance pillar-based 3D object detection. arXiv 2022, arXiv:2205.07403. [Google Scholar]
- Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational image compression with a scale hyperprior. arXiv 2018, arXiv:1802.01436. [Google Scholar]
- Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Xu, R.; Guo, Y.; Han, X.; Xia, X.; Xiang, H.; Ma, J. OpenCDA: An open cooperative driving automation framework integrated with co-simulation. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) 2021, Indianapolis, IN, USA, 19–22 September 2021. [Google Scholar]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017. [Google Scholar]
Table 1. Detection accuracy (AP at IoU 0.5 and 0.7) on Default Towns and Culver City, and model size.

| Method | Default Towns AP@0.5 | Default Towns AP@0.7 | Culver City AP@0.5 | Culver City AP@0.7 | Model Size (MB) |
|---|---|---|---|---|---|
| No Fusion | 49.1 | 38.3 | 40.6 | 26.7 | 18.2 |
| Early Fusion | 52.3 | 40.6 | 42.5 | 35.3 | 20.0 |
| Late Fusion | 59.6 | 42.5 | 49.4 | 39.7 | 19.5 |
| F-Cooper [32] | 61.7 | 49.8 | 53.7 | 44.5 | 35.3 |
| Who2com [4] | 62.0 | 50.5 | 54.1 | 44.2 | 37.4 |
| AttFuse [33] | 62.8 | 50.8 | 54.0 | 46.3 | 34.3 |
| V2VNet [34] | 63.3 | 51.6 | 54.5 | 45.8 | 36.8 |
| HP3D-V2V (Ours) | 67.4 | 56.5 | 58.8 | 50.5 | 35.0 |
Table 2. Ablation results (AP at IoU 0.5 and 0.7) on Default Towns and Culver City with SECOND and PointPillar baselines.

| Method | Backbone | Default Towns AP@0.5 | Default Towns AP@0.7 | Culver City AP@0.5 | Culver City AP@0.7 |
|---|---|---|---|---|---|
| Baseline | SECOND | 60.4 | 48.7 | 55.3 | 45.1 |
| Baseline | PointPillar | 61.5 | 49.2 | 54.5 | 44.4 |
| +Denoising | SECOND | 61.7 | 49.6 | 56.0 | 45.8 |
| +Denoising | PointPillar | 63.1 | 54.5 | 55.3 | 46.4 |
| +VFE_VP | SECOND | – | – | – | – |
| +VFE_VP | PointPillar | 65.5 | 55.0 | 56.7 | 47.5 |
| +CVFF | SECOND | 64.3 | 53.1 | 56.1 | 47.2 |
| +CVFF | PointPillar | 67.4 | 56.5 | 58.8 | 50.5 |
Share and Cite
Chen, H.; Wang, H.; Liu, Z.; Gu, D.; Ye, W. HP3D-V2V: High-Precision 3D Object Detection Vehicle-to-Vehicle Cooperative Perception Algorithm. Sensors 2024, 24, 2170. https://doi.org/10.3390/s24072170