
SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning

Published in: The Journal of Supercomputing

Abstract

In recent years, RGBT trackers based on the Siamese network have attracted significant attention for their balance of accuracy and efficiency. However, these trackers often rely on similarity matching between features of a fixed-size target template and a search region, which can yield unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, while these trackers commonly employ feature-level fusion of the different modalities, they frequently overlook the benefits of decision-level fusion, which diminishes their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features of each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network matches the local features of the template and search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network performs decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate tracking performance that surpasses other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
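The two key ideas in the abstract, graph-attention matching of local template and search-region features, followed by decision-level fusion of the per-modality responses, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the dot-product attention, and the scalar modality weights are illustrative assumptions standing in for the learned networks described above.

```python
import math


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graph_attention_response(template_nodes, search_nodes):
    """Toy stand-in for graph-attention matching: for each search-region
    node, attend over all template nodes (attention = softmax of dot-product
    similarity), aggregate the template features, and score the search node
    against the aggregate. Returns one similarity response per search node."""
    dim = len(template_nodes[0])
    responses = []
    for s in search_nodes:
        attn = softmax([dot(s, t) for t in template_nodes])
        agg = [sum(w * t[i] for w, t in zip(attn, template_nodes))
               for i in range(dim)]
        responses.append(dot(s, agg))
    return responses


def fuse_responses(resp_rgb, resp_tir, w_rgb, w_tir):
    """Toy decision-level fusion: normalize two (here hand-set, in the paper
    learned) reliability weights and blend the per-node modality responses."""
    wr, wt = softmax([w_rgb, w_tir])
    return [wr * r + wt * t for r, t in zip(resp_rgb, resp_tir)]
```

With equal reliability weights the fusion reduces to a plain average, e.g. `fuse_responses([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)` gives `[2.0, 3.0]`; the paper instead learns these weights so that the more reliable modality dominates the final prediction.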




Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This work is supported by the National Natural Science Foundation of China (62066047, 61966037), Practice Innovation Fund of Yunnan University (ZC-23234092).

Author information


Contributions

LG and KW wrote the body of the manuscript, YL prepared Figs. 1–5, and KY prepared Figs. 6–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dongming Zhou.

Ethics declarations

Conflict of interest

To the best of our knowledge, the named authors have no conflicts of interest, financial or otherwise.

Ethical approval

No human or animal experiments are involved in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Geng, L., Zhou, D., Wang, K. et al. SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning. J Supercomput 80, 25888–25910 (2024). https://doi.org/10.1007/s11227-024-06443-9
