DOI: 10.1145/3461353.3461357

Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Published: 04 September 2021

Abstract

In this paper, we propose a new approach to three problems that traditional video-based Re-ID methods leave unsolved: the independent treatment of temporal and spatial cues, shallow feature extraction, and heavy computation. Because weak feature extraction in conventional backbones propagates errors to every later stage, we design an attention network named Parallel Spatio-Temporal Attention (PSTA) to fuse spatio-temporal features. After deep features are extracted, existing methods must stack convolutional operations to model large receptive fields, whereas we use a Non-local operation to capture long-range dependencies directly. For the Non-local method, we propose an Attention-Like Similarity (ALS) that learns the weights of the similarity matrix adaptively and then filters out redundant similarities. To curb the high complexity introduced by the Non-local method while maintaining accuracy, we perform Spatial Pyramid Pooling (SPP) inside the Non-local structure, which reduces complexity and combines multi-scale features. Extensive experiments with ablation analysis demonstrate the effectiveness of our methods, and state-of-the-art results are achieved on large-scale video datasets.
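To make the complexity argument concrete, the following NumPy sketch illustrates the general idea of embedding spatial pyramid pooling inside a non-local block: queries come from every spatial position, but keys and values are compressed into a small set of multi-scale pooled bins, shrinking the similarity matrix from (HW × HW) to (HW × 21). This is a hypothetical illustration only; the function names, pooling levels, and single-head design are assumptions, not the authors' PSTA/ALS/SPP implementation.

```python
import numpy as np

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool an (H, W, C) feature map into multi-scale bins.

    For levels (1, 2, 4) this yields 1 + 4 + 16 = 21 pooled vectors of
    dimension C, i.e. an array of shape (21, C).
    """
    H, W, C = x.shape
    bins = []
    for l in levels:
        # average-pool into an l x l grid of cells
        for i in range(l):
            for j in range(l):
                hs, he = i * H // l, (i + 1) * H // l
                ws, we = j * W // l, (j + 1) * W // l
                bins.append(x[hs:he, ws:we].mean(axis=(0, 1)))
    return np.stack(bins)

def nonlocal_spp(x, levels=(1, 2, 4)):
    """Simplified non-local block with SPP-compressed keys/values.

    Queries are taken at every spatial position, but keys and values are
    the 21 pooled bins, so the similarity matrix is (HW, 21) instead of
    (HW, HW): complexity drops from O((HW)^2) to O(HW * 21).
    """
    H, W, C = x.shape
    q = x.reshape(H * W, C)                # queries: all HW positions
    kv = spatial_pyramid_pool(x, levels)   # keys/values: 21 pooled bins
    sim = q @ kv.T / np.sqrt(C)            # (HW, 21) similarity matrix
    # numerically stable softmax over the pooled positions
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ kv).reshape(H, W, C)    # aggregated features
```

For an 8×8×16 feature map, the similarity matrix is 64×21 rather than 64×64, and the saving grows quadratically with spatial resolution; a learned reweighting of `sim` (in the spirit of the ALS module) would slot in before the softmax.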



Published In

ICIAI '21: Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence
March 2021, 246 pages
ISBN: 9781450388634
DOI: 10.1145/3461353

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. attention-like similarity
        2. non-local operation
        3. parallel spatio-temporal attention
        4. spatial pyramid pooling in non-local structure
