Deblurring transformer tracking with conditional cross-attention

Published: 30 December 2022

Abstract

In object tracking, motion blur is a common challenge caused by rapid motion of the target object or a long camera exposure time, and it degrades tracking performance. Traditional solutions usually perform image recovery before tracking the object; however, most image recovery methods have a high computational cost, which reduces tracking speed. To solve these problems, we propose a deblurring Transformer-based tracking method that embeds conditional cross-attention. The proposed method integrates three modules: (1) an image quality assessment (IQA) module that estimates image quality; (2) an image deblurring module, based on a lightweight adversarial network, that improves image quality; and (3) a tracking module, based on a Transformer with conditional cross-attention, that enhances object localization. Experimental results on two UAV object tracking benchmarks show that the proposed trackers achieve competitive results compared with several state-of-the-art trackers.
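The abstract only names the three modules, so the following is a minimal PyTorch sketch of how such a pipeline could compose at inference time. Every class name, layer size, and the quality threshold below is an illustrative assumption, not the authors' implementation: an IQA score gates a lightweight deblurring generator, whose output feeds a Transformer tracking head in which search-region tokens cross-attend to template tokens conditioned on a learned spatial query (in the spirit of Conditional DETR). Gating the deblurring step on the IQA score is what keeps the added cost low on sharp frames.

```python
# Illustrative sketch of the three-module pipeline described in the abstract.
# All module names, layer sizes, and the quality threshold are assumptions.
import torch
import torch.nn as nn

class IQAModule(nn.Module):
    """Predicts a scalar quality score in [0, 1] for an input frame."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))

class DeblurGenerator(nn.Module):
    """Lightweight generator; the residual connection keeps the output
    close to the input when little deblurring is needed."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return torch.clamp(x + self.body(x), 0.0, 1.0)

class ConditionalCrossAttentionTracker(nn.Module):
    """Transformer head: search-region tokens attend to template tokens,
    conditioned on a learned spatial query added to the content tokens."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 16, stride=16)      # patchify frames
        self.pos_query = nn.Parameter(torch.randn(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.box_head = nn.Linear(dim, 4)                  # (cx, cy, w, h)

    def tokens(self, x):
        f = self.embed(x)                                  # B, C, H', W'
        return f.flatten(2).transpose(1, 2)                # B, H'*W', C

    def forward(self, template, search):
        t, s = self.tokens(template), self.tokens(search)
        q = s + self.pos_query                             # conditional query
        out, _ = self.cross_attn(q, t, t)
        return self.box_head(out.mean(dim=1)).sigmoid()    # normalized box

def track_frame(frame, template, iqa, deblur, tracker, threshold=0.5):
    """Run deblurring only when the IQA score says the frame is degraded."""
    if iqa(frame).item() < threshold:
        frame = deblur(frame)
    return tracker(template, frame)

if __name__ == "__main__":
    iqa, deblur = IQAModule().eval(), DeblurGenerator().eval()
    tracker = ConditionalCrossAttentionTracker().eval()
    template = torch.rand(1, 3, 128, 128)   # exemplar crop
    frame = torch.rand(1, 3, 256, 256)      # search region
    with torch.no_grad():
        print(track_frame(frame, template, iqa, deblur, tracker))
```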

Published In

Multimedia Systems, Volume 29, Issue 3, June 2023 (936 pages)

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Received: 20 November 2022
Accepted: 19 December 2022
Published: 30 December 2022

Author Tags

  1. Object tracking
  2. Transformer
  3. Motion blur
  4. Image quality assessment
  5. Image deblurring

Qualifiers

  • Research-article

Cited By

  • (2024) A Novel Long Short-Term Memory Learning Strategy for Object Tracking. International Journal of Intelligent Systems, 2024. DOI: 10.1155/2024/6632242. Online publication date: 1-Jan-2024
  • (2024) SOCF. Expert Systems with Applications: An International Journal, 238:PE. DOI: 10.1016/j.eswa.2023.122131. Online publication date: 27-Feb-2024
  • (2024) SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression. Multimedia Systems, 30:5. DOI: 10.1007/s00530-024-01450-5. Online publication date: 20-Aug-2024
  • (2024) Siamada: visual tracking based on Siamese adaptive learning network. Neural Computing and Applications, 36:14, pp. 7639-7656. DOI: 10.1007/s00521-024-09481-9. Online publication date: 1-May-2024
