DOI: 10.1145/3595916.3626441
research-article

RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network

Published: 01 January 2024

Abstract

The integration of dual-modal features has been pivotal in advancing RGB-Depth (RGB-D) tracking. However, current trackers are inefficient and rely solely on single-level features, which weakens fusion robustness and leaves them too slow for real-world applications. In this paper, we introduce a novel network, HMAD (Hierarchical Modality Aggregation and Distribution), to address these challenges. HMAD leverages the distinct feature representation strengths of the RGB and depth modalities and takes a hierarchical approach to feature distribution and fusion, thereby enhancing the robustness of RGB-D tracking. Experimental results on various RGB-D datasets demonstrate that HMAD achieves state-of-the-art performance. Moreover, real-world experiments further validate HMAD’s capacity to handle a range of tracking challenges in real time.
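
The abstract does not describe HMAD's architecture, so the following is only a toy sketch of what multi-level (rather than single-level) RGB and depth fusion can look like in general. It is not the authors' method. The function names `fuse_level` and `hierarchical_fuse`, the fixed `rgb_weight`, and the mean-based shared context are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def fuse_level(rgb: Vector, depth: Vector, rgb_weight: float = 0.5) -> Vector:
    """Aggregate one feature level as a convex combination of RGB and depth.

    Hypothetical stand-in for a learned fusion module.
    """
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth features must have the same length")
    return [rgb_weight * r + (1.0 - rgb_weight) * d for r, d in zip(rgb, depth)]


def hierarchical_fuse(
    rgb_levels: List[Vector],
    depth_levels: List[Vector],
    rgb_weight: float = 0.5,
) -> List[Vector]:
    """Fuse every level, then distribute a shared context back to each level.

    Assumption: all levels share one feature length so they can be averaged.
    """
    if not rgb_levels or len(rgb_levels) != len(depth_levels):
        raise ValueError("Need the same non-zero number of RGB and depth levels")

    # Aggregation: fuse the two modalities at every level, not just the last one.
    fused = [fuse_level(r, d, rgb_weight) for r, d in zip(rgb_levels, depth_levels)]

    dim = len(fused[0])
    if any(len(level) != dim for level in fused):
        raise ValueError("All levels must have the same feature length")

    # Shared context: element-wise mean of the fused features across levels.
    context = [sum(level[i] for level in fused) / len(fused) for i in range(dim)]

    # Distribution: add the shared context back into every level.
    return [[v + c for v, c in zip(level, context)] for level in fused]


if __name__ == "__main__":
    rgb = [[1.0, 2.0], [3.0, 4.0]]
    depth = [[3.0, 2.0], [1.0, 0.0]]
    print(hierarchical_fuse(rgb, depth))
```

In a real tracker the fixed weight and the mean would be replaced by learned modules (the paper's author tags mention an attention mechanism), and the levels would be backbone feature maps rather than flat lists.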

Cited By

  • (2024) RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory. Proceedings of the 2024 International Conference on Multimedia Retrieval, 1016-1024. DOI: 10.1145/3652583.3658036. Online publication date: 30-May-2024

    Published In

    MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
    December 2023
    745 pages
    ISBN:9798400702051
    DOI:10.1145/3595916
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. RGB-D
    2. attention mechanism
    3. multi-modal fusion
    4. single object tracking

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Natural Science Foundation of China
    • Key R&D Project of Jiangsu Province
    • Program B for Outstanding Ph.D. candidate of Nanjing University
    • Collaborative Innovation Center of Novel Software Technology and Industrialization
    • Fundamental Research Funds for the Central Universities

    Conference

    MMAsia '23: ACM Multimedia Asia
    December 6 - 8, 2023
    Tainan, Taiwan

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%

    Article Metrics

    • Downloads (Last 12 months): 62
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 23 Dec 2024
