DOI: 10.1145/3447450.3447475
research-article

Siamese Region Proposal Networks and Attention Module for Real-time Visual Tracking

Published: 09 April 2021

Abstract

Region proposal networks have recently been combined with Siamese networks for visual tracking and have attracted considerable attention because they balance accuracy and speed. However, tracking an object while simultaneously meeting requirements on robustness and discriminative power remains challenging. A key to balancing accuracy and speed in online visual tracking is to learn rich features. In this paper, we propose an attention-module-based Siamese region proposal network, named AM-Siam, for real-time visual tracking. The basic idea is to refine the features extracted by a convolutional neural network with a channel attention module and a spatial attention module, providing reliable attentional features for tracking. In addition, we design a multi-task loss function with a balanced L1 loss to accelerate convergence of the proposed tracking network. AM-Siam is trained offline in an end-to-end manner and does not update its network parameters during tracking. Experiments on the Visual Object Tracking (VOT) benchmark show that AM-Siam achieves state-of-the-art results, with a tracking accuracy of 61% and a tracking speed of 189 fps. The experimental results further demonstrate the effectiveness of AM-Siam compared with state-of-the-art trackers.
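The abstract names a balanced L1 loss but does not give its formula. As a rough illustration only, the sketch below assumes the form introduced for object detection in Libra R-CNN, with its usual defaults alpha = 0.5 and gamma = 1.5; whether AM-Siam uses exactly this form, these defaults, or this way of summing over box offsets is an assumption. The function names `balanced_l1`, `smooth_l1`, and `box_regression_loss` are hypothetical and do not come from the paper.

```python
import math


def balanced_l1(diff, alpha=0.5, gamma=1.5):
    """Balanced L1 loss for a single regression residual (assumed Libra R-CNN form)."""
    x = abs(diff)
    # b is fixed by requiring the gradient to be continuous at |x| = 1:
    # alpha * ln(b + 1) = gamma.
    b = math.exp(gamma / alpha) - 1.0
    if x < 1.0:
        # Inlier branch: its gradient alpha * ln(b * x + 1) is larger than
        # the smooth L1 gradient for small residuals.
        return alpha / b * (b * x + 1.0) * math.log(b * x + 1.0) - alpha * x
    # Outlier branch: linear with slope gamma; the constant makes the two
    # branches meet at |x| = 1.
    return gamma * x + gamma / b - alpha


def smooth_l1(diff):
    """Standard smooth L1 loss, shown only for comparison."""
    x = abs(diff)
    return 0.5 * x * x if x < 1.0 else x - 0.5


def box_regression_loss(pred, target):
    """Sum the balanced L1 loss over the offsets of one box (hypothetical helper)."""
    return sum(balanced_l1(p - t) for p, t in zip(pred, target))


if __name__ == "__main__":
    for r in (0.05, 0.5, 1.0, 2.0):
        print(f"residual={r:4.2f}  balanced={balanced_l1(r):.4f}  smooth={smooth_l1(r):.4f}")
```

In a multi-task setup of the kind the abstract describes, a regression term like this would be added, with some weighting, to a classification loss; the weighting is not stated in the abstract.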



Information & Contributors

Information

Published In

ICVIP '20: Proceedings of the 2020 4th International Conference on Video and Image Processing
December 2020
255 pages
ISBN:9781450389075
DOI:10.1145/3447450
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Attention module
  2. Real-time visual tracking
  3. Siamese networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Guangdong Science and Technology Program
  • Guangdong Basic and Applied Basic Research Foundation
  • National Key R&D Program of China

Conference

ICVIP 2020


Article Metrics

  • Total Citations: 0
  • Total Downloads: 39
  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 28 Jan 2025
