DOI: 10.1145/3664647.3681565
research-article
Open access

SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection

Published: 28 October 2024

Abstract

Multi-modal fusion of sensors such as radar and cameras enables complementary, cost-effective perception of the surrounding environment regardless of lighting and weather conditions. However, existing fusion methods for surround-view images and radar are challenged by the inherent noise and positional ambiguity of radar, which lead to significant performance losses. To address this limitation, our paper presents a robust, end-to-end fusion framework dubbed SparseInteraction. First, we introduce the Noisy Radar Filter (NRF) module, which extracts foreground features by using queried semantic features from the image to filter out noisy radar features. Furthermore, we implement the Sparse Cross-Attention Encoder (SCAE) to blend foreground radar features and image features at a sparse level, addressing the positional ambiguity issue. Finally, to accelerate model convergence and improve performance, foreground prior queries carrying the position information of the foreground radar are concatenated with predefined queries and fed into the subsequent transformer-based decoder. The experimental results demonstrate that the proposed fusion strategies markedly enhance detection performance and achieve new state-of-the-art results on the nuScenes benchmark. Source code is available at https://github.com/GG-Bonds/SparseInteraction.
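The abstract describes two structural ideas: filtering radar features by image-derived semantic foreground scores (NRF), and concatenating foreground prior queries with predefined queries before the decoder. The following is a minimal illustrative sketch of those two steps only, not the authors' implementation; all function names, the top-k selection rule, and the toy shapes are assumptions for illustration.

```python
import numpy as np

def noisy_radar_filter(radar_feats, semantic_scores, k):
    """Sketch of the NRF idea (hypothetical): keep the k radar features
    with the highest image-derived foreground scores."""
    order = np.argsort(-semantic_scores)   # sort indices by descending score
    keep = order[:k]
    return radar_feats[keep], keep

def build_decoder_queries(foreground_priors, predefined_queries):
    """Concatenate foreground prior queries with predefined queries,
    as the abstract describes for the transformer-based decoder input."""
    return np.concatenate([foreground_priors, predefined_queries], axis=0)

# Toy example: 6 radar features of dim 4; retain the 2 most "foreground".
rng = np.random.default_rng(0)
radar = rng.standard_normal((6, 4))
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.05, 0.2])
fg, idx = noisy_radar_filter(radar, scores, k=2)
queries = build_decoder_queries(fg, rng.standard_normal((3, 4)))
print(idx.tolist())   # -> [1, 3]  (indices of the retained radar features)
print(queries.shape)  # -> (5, 4)
```

In the paper itself the foreground selection is learned from queried image semantics rather than a fixed top-k cut; the sketch only conveys the data flow of filter-then-concatenate.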




Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Author Tags

            1. 3d object detection
            2. autonomous driving
            3. multi-modal

            Funding Sources

            • Science and Technology Development Fund, Macau SAR
            • Guangdong Science and Technology Department
            • International Science and Technology Project of Guangzhou Development District
            • Zhuhai Science and Technology Innovation Bureau
            • Zhuhai UM Research Institute
            • University of Macau

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

            Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)
