DOI: 10.1145/3461353.3461357

Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Published: 04 September 2021

Abstract

In this paper, we propose a new approach to three problems that traditional video-based Re-ID methods leave unsolved: the independent treatment of temporal and spatial cues, shallow feature extraction, and heavy computation. Because weak feature extraction in conventional backbones propagates errors to every later stage, we design an attention network named Parallel Spatio-Temporal Attention (PSTA) to fuse spatio-temporal features. After deep features are extracted, existing methods must stack convolutional operations to model large receptive fields, whereas we use a Non-local operation to capture long-range dependencies directly. For the Non-local method, we propose an Attention-Like Similarity (ALS) that learns the weights of the similarity matrix adaptively and then filters out redundant similarities. To curb the high complexity introduced by the Non-local method while maintaining accuracy, we perform Spatial Pyramid Pooling (SPP) inside the Non-local structure, which reduces complexity and combines multi-scale features. Extensive experiments with ablation analysis demonstrate the effectiveness of our methods, and state-of-the-art results are achieved on large-scale video datasets.
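To make the complexity argument concrete, the following NumPy sketch illustrates the general idea of embedding spatial pyramid pooling inside a non-local block: queries come from every spatial position, but keys and values are compressed into a small set of multi-scale pooled bins, shrinking the similarity matrix from (HW × HW) to (HW × 21). This is a hypothetical illustration only; the function names, pooling levels, and single-head design are assumptions, not the authors' PSTA/ALS/SPP implementation.

```python
import numpy as np

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool an (H, W, C) feature map into multi-scale bins.

    For levels (1, 2, 4) this yields 1 + 4 + 16 = 21 pooled vectors of
    dimension C, i.e. an array of shape (21, C).
    """
    H, W, C = x.shape
    bins = []
    for l in levels:
        # average-pool into an l x l grid of cells
        for i in range(l):
            for j in range(l):
                hs, he = i * H // l, (i + 1) * H // l
                ws, we = j * W // l, (j + 1) * W // l
                bins.append(x[hs:he, ws:we].mean(axis=(0, 1)))
    return np.stack(bins)

def nonlocal_spp(x, levels=(1, 2, 4)):
    """Simplified non-local block with SPP-compressed keys/values.

    Queries are taken at every spatial position, but keys and values are
    the 21 pooled bins, so the similarity matrix is (HW, 21) instead of
    (HW, HW): complexity drops from O((HW)^2) to O(HW * 21).
    """
    H, W, C = x.shape
    q = x.reshape(H * W, C)                # queries: all HW positions
    kv = spatial_pyramid_pool(x, levels)   # keys/values: 21 pooled bins
    sim = q @ kv.T / np.sqrt(C)            # (HW, 21) similarity matrix
    # numerically stable softmax over the pooled positions
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ kv).reshape(H, W, C)    # aggregated features
```

For an 8×8×16 feature map, the similarity matrix is 64×21 rather than 64×64, and the saving grows quadratically with spatial resolution; a learned reweighting of `sim` (in the spirit of the ALS module) would slot in before the softmax.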



Published In

ICIAI '21: Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence
March 2021, 246 pages
ISBN: 9781450388634
DOI: 10.1145/3461353

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. attention-like similarity
        2. non-local operation
        3. parallel spatio-temporal attention
        4. spatial pyramid pooling in non-local structure
