research-article

Improving Feature Discrimination for Object Tracking by Structural-similarity-based Metric Learning

Authors:

Yimin Liu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 18, Issue 4

Article No.: 90, Pages 1 - 23

https://doi.org/10.1145/3497746

Published: 04 March 2022

Abstract

Existing approaches usually formulate the tracking task as an appearance matching procedure. However, the appearance features in these trackers are not discriminative enough, owing to weak feature supervision constraints and inadequate exploitation of spatial context. To tackle this issue, this article proposes a novel appearance matching tracking (AMT) method that strengthens the feature constraints and captures discriminative spatial representations. Specifically, we first use a triplet structural loss function, which improves feature learning by applying a structural similarity constraint, in triplet metric form, to the features. This loss leverages feature statistics to capture the complex interactions of visual parts. Second, we put forward an adaptive matching module that uses a dual spatial enhancement module to reinforce target feature discrimination. This not only boosts the representation ability of spatial context but also realizes spatially dynamic feature selection by attending to target deformation information. Moreover, this model introduces a simple but effective matching unit to directly evaluate the relative appearance differences between the target and the proposals. In addition, with the resulting discriminative features, AMT can localize the target precisely. As a result, the spatial suppression imposed by window functions can be relaxed, allowing effective tracking of fast-moving objects. Extensive experiments show that AMT outperforms state-of-the-art methods on six public datasets and demonstrate the effectiveness of each component of AMT.
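
The abstract does not give the form of the triplet structural loss, so the following is only a minimal sketch of one way a structural similarity constraint could be combined with a triplet metric. It assumes the standard global SSIM index (computed from means, variances, and covariance, i.e., feature statistics) applied to flattened feature vectors, and a hinge-style triplet margin. The function names, the constants `c1` and `c2`, and the `margin` value are illustrative assumptions, not the authors' definitions.

```python
from typing import Sequence


def ssim(x: Sequence[float], y: Sequence[float],
         c1: float = 1e-4, c2: float = 9e-4) -> float:
    """Global SSIM index between two equal-length feature vectors.

    c1 and c2 are small stabilizing constants (assumed values).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variance and covariance of the two feature vectors.
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def triplet_structural_loss(anchor: Sequence[float],
                            positive: Sequence[float],
                            negative: Sequence[float],
                            margin: float = 0.5) -> float:
    """Hinge triplet loss using 1 - SSIM as the dissimilarity (assumed form)."""
    d_pos = 1.0 - ssim(anchor, positive)  # should be small
    d_neg = 1.0 - ssim(anchor, negative)  # should be large
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical feature vectors for illustration only.
anchor = [0.1, 0.9, 0.3, 0.7]
positive = [0.12, 0.88, 0.31, 0.69]   # structurally similar to the anchor
negative = [0.9, 0.1, 0.7, 0.3]       # anti-correlated with the anchor

loss_good = triplet_structural_loss(anchor, positive, negative)
loss_bad = triplet_structural_loss(anchor, negative, positive)
```

In this sketch the loss vanishes once the positive sample is structurally closer to the anchor than the negative sample by at least the margin, and it grows when the two are swapped. A real tracker would compute this on learned feature maps inside an automatic differentiation framework rather than on Python lists.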

Index Terms

Improving Feature Discrimination for Object Tracking by Structural-similarity-based Metric Learning
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking
      2. Computer vision representations
        Appearance and texture representations

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 18, Issue 4

November 2022

497 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3514185

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2022

Accepted: 01 November 2021

Revised: 01 September 2021

Received: 01 March 2021

Published in TOMM Volume 18, Issue 4

Qualifiers

Research-article
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 12
Total Downloads: 735
Downloads (Last 12 months): 58
Downloads (Last 6 weeks): 8

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Mondal S, Pal A, Islam S. (2025). CBIF-M. Computers and Electrical Engineering 118:PB. DOI: 10.1016/j.compeleceng.2024.109450. Online publication date: 7-Jan-2025.
https://dl.acm.org/doi/10.1016/j.compeleceng.2024.109450
Liao P, Yang F, Wu D, Liu B, Zhang X, Zhou S. (2024). Enhanced Multi-Object Tracking: Inferring Motion States of Tracked Objects. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3699960. Online publication date: 11-Oct-2024.
https://doi.org/10.1145/3699960
Wu J, Zhou X, Li X, Liu H, Qi M, Hong R. (2024). Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3678882. Online publication date: 19-Jul-2024.
https://doi.org/10.1145/3678882
Li J, Mao Z, Li H, Chen W, Zhang Y. (2024). Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications 20:5 (1-23). DOI: 10.1145/3638558. Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3638558
Song J, Li Z, Min W, Jiang S. (2023). Towards Food Image Retrieval via Generalization-Oriented Sampling and Loss Function Design. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-19). DOI: 10.1145/3600095. Online publication date: 25-Aug-2023.
https://dl.acm.org/doi/10.1145/3600095
Chen Z, Yang M, Zhang S. (2023). Complementary Coarse-to-Fine Matching for Video Object Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-21). DOI: 10.1145/3596496. Online publication date: 12-Jul-2023.
https://dl.acm.org/doi/10.1145/3596496
Xu Y, Wei X, Dai P, Cao X. (2023). A2SC: Adversarial Attacks on Subspace Clustering. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-23). DOI: 10.1145/3587097. Online publication date: 12-Jul-2023.
https://dl.acm.org/doi/10.1145/3587097
Wang J, Ling Q, Li P. (2023). Robust Video Stabilization based on Motion Decomposition. ACM Transactions on Multimedia Computing, Communications, and Applications 19:5 (1-24). DOI: 10.1145/3580498. Online publication date: 16-Mar-2023.
https://dl.acm.org/doi/10.1145/3580498
Xiang C, Cheng W, Lin C, Zhang X, Liu D, Zheng X, Li Z. (2023). LSTAloc: A Driver-Oriented Incentive Mechanism for Mobility-on-Demand Vehicular Crowdsensing Market. IEEE Transactions on Mobile Computing 23:4 (3106-3122). DOI: 10.1109/TMC.2023.3271671. Online publication date: 1-May-2023.
https://dl.acm.org/doi/10.1109/TMC.2023.3271671
Wu H. (2022). Real Time Facial Expression Recognition for Online Lecture. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/9684264. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/9684264
