Abstract
In recent years, RGBT trackers based on Siamese networks have gained significant attention because they balance accuracy and efficiency. However, these trackers typically rely on similarity matching between the features of a fixed-size target template and those of the search region, which can lead to unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, although these trackers often fuse the modalities at the feature level, they frequently overlook the benefits of decision-level fusion, which can limit their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features from each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network is used to match the local features of the template and the search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network is designed to perform decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate outstanding tracking performance compared with other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
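The matching and fusion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dot-product attention score, and the hand-crafted reliability measure (response peak minus response mean) are all assumptions made for illustration, whereas the proposed tracker learns its modality weights. Each template or search location is treated as a graph node holding a local feature vector.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_attention_response(template_nodes, search_nodes):
    """For each search node, attend over all template nodes and score the match."""
    responses = []
    for s in search_nodes:
        # Attention weights from this search node to every template node.
        attn = softmax([dot(s, t) for t in template_nodes])
        # Aggregate template features, weighted by attention.
        agg = [sum(a * t[k] for a, t in zip(attn, template_nodes))
               for k in range(len(s))]
        # Scalar similarity response at this search location.
        responses.append(dot(agg, s))
    return responses

def reliability(response):
    # Hypothetical proxy for modality reliability: a sharper peak scores higher.
    return max(response) - sum(response) / len(response)

def fuse_decisions(resp_rgb, resp_t):
    """Decision-level fusion: weight each modality's response by its reliability."""
    w_rgb, w_t = softmax([reliability(resp_rgb), reliability(resp_t)])
    fused = [w_rgb * a + w_t * b for a, b in zip(resp_rgb, resp_t)]
    return fused, (w_rgb, w_t)

# Toy data (assumed): two template nodes and three search nodes per modality.
template = [[1.0, 0.0], [0.0, 1.0]]
search_rgb = [[0.1, 0.1], [2.0, 0.0], [0.0, 0.2]]  # clear target at index 1
search_t = [[0.3, 0.3], [0.4, 0.3], [0.3, 0.4]]    # ambiguous thermal response

resp_rgb = graph_attention_response(template, search_rgb)
resp_t = graph_attention_response(template, search_t)
fused, weights = fuse_decisions(resp_rgb, resp_t)
print(weights, fused)
```

In this toy case the RGB response has a distinct peak while the thermal response is nearly flat, so the fusion step gives RGB the larger weight. Because each modality keeps its own response until the final step, either one can dominate the prediction when the other degrades.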
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This work is supported by the National Natural Science Foundation of China (62066047, 61966037) and the Practice Innovation Fund of Yunnan University (ZC-23234092).
Ethics declarations
Conflict of interest
To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.
Ethical approval
No human or animal experiments are involved in this paper.
About this article
Cite this article
Geng, L., Zhou, D., Wang, K. et al. SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning. J Supercomput 80, 25888–25910 (2024). https://doi.org/10.1007/s11227-024-06443-9
DOI: https://doi.org/10.1007/s11227-024-06443-9