
SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

Published in: The Journal of Supercomputing

Abstract

In recent years, RGBT trackers based on the Siamese network have attracted significant attention for their balance of accuracy and efficiency. However, these trackers often rely on similarity matching between features of a fixed-size target template and a search region, which can yield unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, while these trackers commonly employ feature-level fusion of the different modalities, they frequently overlook the benefits of decision-level fusion, which diminishes their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features of each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network matches the local features of the template and search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network performs decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate tracking performance that surpasses other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
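The two key ideas in the abstract, graph-attention matching of local template and search-region features, followed by decision-level fusion of the per-modality responses, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the dot-product attention, and the scalar modality weights are illustrative assumptions standing in for the learned networks described above.

```python
import math


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graph_attention_response(template_nodes, search_nodes):
    """Toy stand-in for graph-attention matching: for each search-region
    node, attend over all template nodes (attention = softmax of dot-product
    similarity), aggregate the template features, and score the search node
    against the aggregate. Returns one similarity response per search node."""
    dim = len(template_nodes[0])
    responses = []
    for s in search_nodes:
        attn = softmax([dot(s, t) for t in template_nodes])
        agg = [sum(w * t[i] for w, t in zip(attn, template_nodes))
               for i in range(dim)]
        responses.append(dot(s, agg))
    return responses


def fuse_responses(resp_rgb, resp_tir, w_rgb, w_tir):
    """Toy decision-level fusion: normalize two (here hand-set, in the paper
    learned) reliability weights and blend the per-node modality responses."""
    wr, wt = softmax([w_rgb, w_tir])
    return [wr * r + wt * t for r, t in zip(resp_rgb, resp_tir)]
```

With equal reliability weights the fusion reduces to a plain average, e.g. `fuse_responses([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)` gives `[2.0, 3.0]`; the paper instead learns these weights so that the more reliable modality dominates the final prediction.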




Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This work is supported by the National Natural Science Foundation of China (62066047, 61966037), Practice Innovation Fund of Yunnan University (ZC-23234092).

Author information


Contributions

LG and KW wrote the body of the manuscript, YL prepared Figs. 1–5, and KY prepared Figs. 6–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dongming Zhou.

Ethics declarations

Conflict of interest

To the best of our knowledge, the named authors have no conflicts of interest, financial or otherwise.

Ethical approval

No human or animal experiments are involved in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Geng, L., Zhou, D., Wang, K. et al. SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning. J Supercomput 80, 25888–25910 (2024). https://doi.org/10.1007/s11227-024-06443-9
