research-article

RoCo: Robust Cooperative Perception By Iterative Object Matching and Pose Adjustment

Authors:

Lei Wang

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 7833 - 7842

https://doi.org/10.1145/3664647.3680559

Published: 28 October 2024

Abstract

Collaborative autonomous driving with multiple vehicles usually requires fusing data from multiple modalities. For the fusion to be effective, the data from each modality must be of reasonably high quality. In collaborative perception, however, the quality of object detection based on a single modality is highly sensitive to relative pose errors among the agents. These errors cause feature misalignment and significantly reduce collaborative performance. To address this issue, we propose RoCo, a novel unsupervised framework that performs iterative object matching and agent pose adjustment. To the best of our knowledge, our work is the first to model the pose correction problem in collaborative perception as an object matching task, which reliably associates common objects detected by different agents. Building on this, we propose a graph optimization process that adjusts the agent poses by minimizing the alignment errors of the associated objects; object matching is then redone based on the adjusted poses. This process is repeated until convergence. Experiments on both simulated and real-world datasets demonstrate that RoCo consistently outperforms existing relevant methods in collaborative object detection performance and remains robust when the agents' pose information is highly noisy. Ablation studies show the impact of its key parameters and components. The code is released at https://github.com/HuangZhe885/RoCo.
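The alternation described in the abstract (match objects, adjust the pose, re-match, repeat until convergence) can be illustrated with a small sketch. This is not the authors' implementation: it assumes two agents, 2D object centers, and a planar pose `(tx, ty, yaw)`. Greedy nearest-neighbour matching stands in for the paper's object matching, and a closed-form rigid least-squares fit stands in for its graph optimization. All function names and parameters (`match_objects`, `fit_pose`, `align`, `max_dist`) are hypothetical.

```python
import math

def transform(pose, pt):
    """Map a point from agent B's frame into agent A's frame; pose = (tx, ty, yaw)."""
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * pt[0] - s * pt[1] + tx, s * pt[0] + c * pt[1] + ty)

def match_objects(objs_a, objs_b_in_a, max_dist):
    """Greedy one-to-one nearest-neighbour matching (assumed stand-in matcher)."""
    candidates = sorted(
        (math.dist(a, b), i, j)
        for i, a in enumerate(objs_a)
        for j, b in enumerate(objs_b_in_a)
    )
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in candidates:
        if d > max_dist:
            break  # remaining candidates are even farther apart
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return pairs

def fit_pose(objs_a, objs_b, pairs):
    """Closed-form 2D rigid least-squares fit (assumed stand-in for graph optimization)."""
    n = len(pairs)
    ax = sum(objs_a[i][0] for i, _ in pairs) / n
    ay = sum(objs_a[i][1] for i, _ in pairs) / n
    bx = sum(objs_b[j][0] for _, j in pairs) / n
    by = sum(objs_b[j][1] for _, j in pairs) / n
    dot = cross = 0.0
    for i, j in pairs:
        pax, pay = objs_a[i][0] - ax, objs_a[i][1] - ay
        pbx, pby = objs_b[j][0] - bx, objs_b[j][1] - by
        dot += pbx * pax + pby * pay
        cross += pbx * pay - pby * pax
    yaw = math.atan2(cross, dot)
    c, s = math.cos(yaw), math.sin(yaw)
    return (ax - (c * bx - s * by), ay - (s * bx + c * by), yaw)

def align(objs_a, objs_b, init_pose, max_dist=3.0, max_iters=20, tol=1e-6):
    """Alternate matching and pose fitting until the pose stops changing."""
    pose, pairs = init_pose, []
    for _ in range(max_iters):
        objs_b_in_a = [transform(pose, p) for p in objs_b]
        pairs = match_objects(objs_a, objs_b_in_a, max_dist)
        if len(pairs) < 2:
            break  # too few associations to constrain a rigid transform
        new_pose = fit_pose(objs_a, objs_b, pairs)
        converged = max(abs(n - o) for n, o in zip(new_pose, pose)) < tol
        pose = new_pose
        if converged:
            break
    return pose, pairs

# Toy scene: agent B's detections, and the same objects as seen by agent A.
true_pose = (2.0, -1.0, 0.1)
objs_b = [(10.0, 0.0), (0.0, 8.0), (-12.0, 5.0), (6.0, -9.0), (-4.0, -7.0)]
objs_a = [transform(true_pose, p) for p in objs_b]

noisy_pose = (2.5, -0.6, 0.13)  # pose reported with localization noise
est_pose, pairs = align(objs_a, objs_b, noisy_pose)
```

In this noise-free toy scene the objects are far enough apart that the first round of matching is already correct, so `est_pose` should land very close to `true_pose`. With detection noise, missed objects, or more than two agents, a more robust matcher and a joint optimization over all agent poses would be needed, which is the setting the paper addresses.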

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision problems
        Object detection

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China Grant
China Scholarship Council award
Public Computing Cloud and the Blockchain Lab, School of Information, Renmin University of China

Conference

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)

Overall acceptance rate: 2,145 of 8,556 submissions (25%)
