Abstract
Recently, a considerable number of top-performing Transformer-based trackers have been proposed. From a spatial-spectral analysis perspective, however, most of them exploit mainly low-frequency information, which limits their performance in complicated scenes. To address this problem, we propose a spectral tracker that jointly captures high- and low-frequency information for robust tracking. Specifically, we design a novel dual-spectral information extraction and aggregation module (DSM), consisting of a high-frequency branch and a low-frequency branch, to effectively capture and combine the complementary frequency components of Transformer features. First, the high-frequency branch partitions features into local windows so that attention focuses on fine-grained high-frequency information. Then, the low-frequency branch applies average pooling, which acts as a low-pass filter, to amplify the Transformer's low-frequency information. Furthermore, we design a shared-MLP strategy that polarizes the two branches toward high- and low-frequency attention, respectively. Finally, an MLP fuses the complementary high- and low-frequency information for frequency-domain modeling. Comprehensive experiments on five tracking benchmarks (i.e., GOT-10k, TrackingNet, LaSOT, UAV123, and TNL2K) show that our spectral tracker outperforms state-of-the-art trackers.
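To make the DSM design concrete, below is a minimal PyTorch sketch of a dual-branch module of this kind: window-partitioned attention for the high-frequency path, average pooling (a low-pass operation) before attention for the low-frequency path, a shared MLP applied to both branches, and an MLP that fuses them. This is an illustration under our own assumptions, not the authors' released code; the class name DSM, the window_size, pool_size, and num_heads parameters, and the use of standard multi-head self-attention inside each branch are all hypothetical choices for exposition.

```python
# Illustrative PyTorch sketch of a dual-spectral module (DSM).
# All names and hyper-parameters here (DSM, window_size, pool_size, num_heads)
# are assumptions for exposition, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DSM(nn.Module):
    """Dual-spectral information extraction and aggregation (sketch)."""

    def __init__(self, dim: int, num_heads: int = 4,
                 window_size: int = 8, pool_size: int = 2):
        super().__init__()
        self.window_size = window_size
        # High-frequency branch: self-attention inside small local windows,
        # biasing the branch toward fine-grained (high-frequency) detail.
        self.hi_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Low-frequency branch: average pooling acts as a low-pass filter
        # before global self-attention on the downsampled map.
        self.pool = nn.AvgPool2d(pool_size)
        self.lo_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Shared MLP applied to both branches to polarize them toward
        # high- and low-frequency attention (shared weights by assumption).
        self.shared_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Fusion MLP that complementarily combines the two branches.
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by window_size.
        B, C, H, W = x.shape
        ws = self.window_size
        # High-frequency branch: non-overlapping ws x ws window attention.
        hi = x.view(B, C, H // ws, ws, W // ws, ws)
        hi = hi.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
        hi, _ = self.hi_attn(hi, hi, hi)
        hi = hi.reshape(B, H // ws, W // ws, ws, ws, C)
        hi = hi.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        # Low-frequency branch: pool (low-pass), attend globally, upsample.
        lo = self.pool(x)
        h2, w2 = lo.shape[-2:]
        lo = lo.flatten(2).transpose(1, 2)            # (B, h2*w2, C)
        lo, _ = self.lo_attn(lo, lo, lo)
        lo = lo.transpose(1, 2).reshape(B, C, h2, w2)
        lo = F.interpolate(lo, size=(H, W), mode="bilinear",
                           align_corners=False)
        # Shared-MLP polarization, then complementary fusion of both branches.
        hi = self.shared_mlp(hi.flatten(2).transpose(1, 2))  # (B, H*W, C)
        lo = self.shared_mlp(lo.flatten(2).transpose(1, 2))  # (B, H*W, C)
        out = self.fuse(torch.cat([hi, lo], dim=-1))
        return out.transpose(1, 2).reshape(B, C, H, W)


# Usage: a 256-channel 16x16 feature map passes through with shape preserved.
dsm = DSM(dim=256)
feat = torch.randn(2, 256, 16, 16)
assert dsm(feat).shape == feat.shape
```

The key design point the sketch illustrates is that the two branches see the same features through different spectral lenses: window partitioning restricts attention to local detail (high frequency), while pooling discards that detail before attention (low frequency), so concatenating and fusing them recovers complementary information.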
Acknowledgment
This work was supported by the Project of Guangxi Science and Technology (No. 2022GXNSFDA035079), the National Natural Science Foundation of China (No. 61972167 and U21A20474), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Research Project of Guangxi Normal University (No. 2022TD002).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Rong, Y., Liang, Q., Li, N., Mo, Z., Zhong, B. (2024). SpectralTracker: Jointly High and Low-Frequency Modeling for Tracking. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14436. Springer, Singapore. https://doi.org/10.1007/978-981-99-8555-5_17
Print ISBN: 978-981-99-8554-8
Online ISBN: 978-981-99-8555-5