DOI: 10.1609/aaai.v33i01.33018287
Research article, free access

STA: spatial-temporal attention for large-scale video-based person re-identification

Published: 27 January 2019

Abstract

In this work, we propose a novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. Unlike most existing methods, which simply compute representations of video clips using frame-level aggregation (e.g., average pooling), the proposed STA adopts a more effective way of producing robust clip-level feature representations. Concretely, STA fully exploits the discriminative parts of a target person in both the spatial and temporal dimensions, producing a 2-D attention score matrix via inter-frame regularization that measures the importance of spatial parts across different frames. A more robust clip-level feature representation is then generated by a weighted sum guided by the mined 2-D attention score matrix. In this way, challenging cases for video-based person re-identification, such as pose variation and partial occlusion, can be handled well by STA. We conduct extensive experiments on two large-scale benchmarks, MARS and DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, significantly outperforming the state of the art by a large margin of more than 11.6%.
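The abstract describes the aggregation step only at a high level, so the following is a minimal sketch contrasting average pooling with an attention-weighted sum over a 2-D (frames x parts) score matrix. It is an illustration under our own assumptions, not the authors' implementation: we assume each frame has already been split into K spatial part features of dimension D, we score each (frame, part) cell by the L2 norm of its feature, and we read "inter-frame regularization" as normalizing each part's scores so they sum to 1 across frames. All function names (`attention_scores`, `sta_weighted_sum`, `clip_feature`, and so on) are hypothetical.

```python
from typing import List

# A clip is indexed as clip[n][k][d]: frame n, spatial part k, feature dim d.
Vector = List[float]
Clip = List[List[Vector]]


def l2_norm(v: Vector) -> float:
    return sum(x * x for x in v) ** 0.5


def average_pooling(clip: Clip) -> List[Vector]:
    """Baseline: average each part feature over frames with equal weights."""
    n_frames, n_parts, dim = len(clip), len(clip[0]), len(clip[0][0])
    return [
        [sum(clip[n][k][d] for n in range(n_frames)) / n_frames for d in range(dim)]
        for k in range(n_parts)
    ]


def attention_scores(clip: Clip) -> List[List[float]]:
    """2-D score matrix S[n][k] over frames x parts.

    Assumption (not from the paper): the raw score of a cell is the L2 norm
    of its part feature, and each part's scores are normalized to sum to 1
    across frames.
    """
    n_frames, n_parts = len(clip), len(clip[0])
    raw = [[l2_norm(clip[n][k]) for k in range(n_parts)] for n in range(n_frames)]
    scores = [[0.0] * n_parts for _ in range(n_frames)]
    for k in range(n_parts):
        total = sum(raw[n][k] for n in range(n_frames))
        for n in range(n_frames):
            # Fall back to uniform weights if a part is all zeros in every frame.
            scores[n][k] = raw[n][k] / total if total > 0 else 1.0 / n_frames
    return scores


def sta_weighted_sum(clip: Clip) -> List[Vector]:
    """Per-part weighted sum over frames, guided by the 2-D score matrix."""
    scores = attention_scores(clip)
    n_frames, n_parts, dim = len(clip), len(clip[0]), len(clip[0][0])
    return [
        [sum(scores[n][k] * clip[n][k][d] for n in range(n_frames)) for d in range(dim)]
        for k in range(n_parts)
    ]


def clip_feature(clip: Clip) -> Vector:
    """Concatenate the aggregated part features into one clip-level vector."""
    return [x for part in sta_weighted_sum(clip) for x in part]


# Toy clip: 2 frames, 2 parts, 2-D features. Part 0 is empty in frame 1
# (e.g. occluded), so the attention weights should ignore that cell.
clip = [
    [[3.0, 4.0], [0.0, 1.0]],  # frame 0
    [[0.0, 0.0], [0.0, 3.0]],  # frame 1
]
print(attention_scores(clip))  # [[1.0, 0.25], [0.0, 0.75]]
print(sta_weighted_sum(clip))  # [[3.0, 4.0], [0.0, 2.5]]
print(average_pooling(clip))   # [[1.5, 2.0], [0.0, 2.0]]
```

In this toy clip, the empty cell receives zero weight, so the weighted sum keeps frame 0's part 0 feature intact, whereas average pooling halves it.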


Cited By

  • (2023) Attention-guided Adversarial Attack for Video Object Segmentation. ACM Transactions on Intelligent Systems and Technology 14:6, 1-22. DOI: 10.1145/3617067. Online publication date: 14-Nov-2023.
  • (2023) Context Sensing Attention Network for Video-based Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 19:4, 1-20. DOI: 10.1145/3573203. Online publication date: 27-Feb-2023.
  • (2023) Reliable Cross-Camera Learning in Random Camera Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 34:6, 4556-4567. DOI: 10.1109/TCSVT.2023.3341877. Online publication date: 12-Dec-2023.




          Published In

          AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
          January 2019
          10088 pages
          ISBN:978-1-57735-809-1

          Sponsors

          • Association for the Advancement of Artificial Intelligence

          Publisher

          AAAI Press


          Qualifiers

          • Research-article
          • Research
          • Refereed limited


          Bibliometrics

          • Downloads (last 12 months): 62
          • Downloads (last 6 weeks): 13
          Reflects downloads up to 15 Jan 2025

          Citations

          • (2023) RPI-CapsuleGAN. Pattern Recognition 141:C. DOI: 10.1016/j.patcog.2023.109626. Online publication date: 5-Jun-2023.
          • (2022) Information Bottleneck Enhanced Video-based Person Re-identification. Proceedings of the 2022 6th International Conference on Video and Image Processing, 149-156. DOI: 10.1145/3579109.3579135. Online publication date: 23-Dec-2022.
          • (2022) Multiple Biological Granularities Network for Person Re-Identification. Proceedings of the 2022 International Conference on Multimedia Retrieval, 54-62. DOI: 10.1145/3512527.3531365. Online publication date: 27-Jun-2022.
          • (2022) Temporal-Consistent Visual Clue Attentive Network for Video-Based Person Re-Identification. Proceedings of the 2022 International Conference on Multimedia Retrieval, 72-80. DOI: 10.1145/3512527.3531362. Online publication date: 27-Jun-2022.
          • (2022) Relation-based global-partial feature learning network for video-based person re-identification. Neurocomputing 488:C, 424-435. DOI: 10.1016/j.neucom.2022.03.032. Online publication date: 18-May-2022.
          • (2022) Learning discriminative representations via variational self-distillation for cross-view geo-localization. Computers and Electrical Engineering 103:C. DOI: 10.1016/j.compeleceng.2022.108335. Online publication date: 1-Oct-2022.
          • (2021) A Multi-task Deep Network for video-based Person Re-identification. Proceedings of the 2021 1st International Conference on Control and Intelligent Robotics, 320-324. DOI: 10.1145/3473714.3473770. Online publication date: 18-Jun-2021.
