research-article

Improving Feature Discrimination for Object Tracking by Structural-similarity-based Metric Learning

Published: 04 March 2022

Abstract

Existing approaches usually formulate the tracking task as an appearance matching procedure. However, the appearance features in these trackers are not discriminative enough, because their feature supervision constraints are weak and they exploit spatial context inadequately. To tackle this issue, this article proposes a novel appearance matching tracking (AMT) method that strengthens the feature constraints and captures discriminative spatial representations. Specifically, we first utilize a triplet structural loss function, which improves feature learning by applying a structural similarity constraint, in a triplet metric format, to the features. It leverages feature statistics to capture the complex interactions of visual parts. Second, we put forward an adaptive matching module that exploits a dual spatial enhancement module to reinforce target feature discrimination. This not only boosts the representation ability of spatial context but also realizes spatially dynamic feature selection by attending to target deformation information. Moreover, this model introduces a simple but effective matching unit to intuitively evaluate the relative appearance differences between the target and the proposals. In addition, with the obtained discriminative features, AMT can localize the target precisely. The spatial suppression imposed by window functions can therefore be alleviated, which allows effective tracking of high-speed moving objects. Extensive experiments show that AMT outperforms state-of-the-art methods on six public datasets and demonstrate the effectiveness of each component in AMT.
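
The abstract does not give the exact form of the triplet structural loss, so the following is only a minimal sketch of one plausible reading: a standard SSIM-style score computed from global feature statistics (means, variances, covariance), plugged into a hinge-style triplet loss. The function names, the margin value, the constants c1 and c2, and the use of global rather than windowed statistics are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def structural_similarity(x, y, c1=1e-4, c2=9e-4):
    """SSIM-style score from global statistics of two equal-shaped arrays.

    c1 and c2 are the usual stabilizing constants (assumed values for
    features scaled to roughly unit range).
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def triplet_structural_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss using (1 - SSIM) as the distance (hypothetical form)."""
    d_pos = 1.0 - structural_similarity(anchor, positive)
    d_neg = 1.0 - structural_similarity(anchor, negative)
    # Penalize triplets where the positive is not closer than the negative
    # by at least `margin`.
    return float(max(0.0, d_pos - d_neg + margin))


if __name__ == "__main__":
    anchor = np.arange(16, dtype=np.float64).reshape(4, 4)
    positive = anchor.copy()            # identical structure
    negative = anchor[::-1, ::-1]       # reversed, so anti-correlated
    print(triplet_structural_loss(anchor, positive, negative))
    print(triplet_structural_loss(anchor, negative, positive))
```

In this sketch a well-separated triplet yields zero loss, while swapping the positive and negative samples yields a positive loss. In a real tracker the inputs would be learned feature maps and the loss would be written in a differentiable framework.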

      Information & Contributors

      Information

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 4
      November 2022
      497 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3514185
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 March 2022
      Accepted: 01 November 2021
      Revised: 01 September 2021
      Received: 01 March 2021
      Published in TOMM Volume 18, Issue 4

      Author Tags

      1. Visual object tracking
      2. appearance matching
      3. feature discrimination
      4. triplet structural loss
      5. adaptive matching module
      6. reduction of spatial suppression

      Qualifiers

      • Research-article
      • Refereed

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 58
      • Downloads (Last 6 weeks): 8
      Reflects downloads up to 02 Mar 2025

      Citations

      Cited By

      • (2025) CBIF-M. Computers and Electrical Engineering 118:PB. DOI: 10.1016/j.compeleceng.2024.109450. Online publication date: 7-Jan-2025
      • (2024) Enhanced Multi-Object Tracking: Inferring Motion States of Tracked Objects. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3699960. Online publication date: 11-Oct-2024
      • (2024) Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3678882. Online publication date: 19-Jul-2024
      • (2024) Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications 20:5 (1-23). DOI: 10.1145/3638558. Online publication date: 22-Jan-2024
      • (2023) Towards Food Image Retrieval via Generalization-Oriented Sampling and Loss Function Design. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-19). DOI: 10.1145/3600095. Online publication date: 25-Aug-2023
      • (2023) Complementary Coarse-to-Fine Matching for Video Object Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-21). DOI: 10.1145/3596496. Online publication date: 12-Jul-2023
      • (2023) A2SC: Adversarial Attacks on Subspace Clustering. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-23). DOI: 10.1145/3587097. Online publication date: 12-Jul-2023
      • (2023) Robust Video Stabilization based on Motion Decomposition. ACM Transactions on Multimedia Computing, Communications, and Applications 19:5 (1-24). DOI: 10.1145/3580498. Online publication date: 16-Mar-2023
      • (2023) LSTAloc: A Driver-Oriented Incentive Mechanism for Mobility-on-Demand Vehicular Crowdsensing Market. IEEE Transactions on Mobile Computing 23:4 (3106-3122). DOI: 10.1109/TMC.2023.3271671. Online publication date: 1-May-2023
      • (2022) Real Time Facial Expression Recognition for Online Lecture. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/9684264. Online publication date: 1-Jan-2022
