Abstract
The goal of video-based person re-identification is to match pedestrians across image sequences captured by non-overlapping cameras. A critical issue in this task is how to exploit the useful information provided by videos. To address it, we propose a novel feature learning framework for video-based person re-identification. The proposed method aims to capture the most significant information in the spatial and temporal domains and then build a discriminative and robust feature representation for each sequence. More specifically, to learn more effective frame-wise features, we introduce several attributes into the video-based task and build a multi-task network for identity and attribute classification. In the training phase, we present a multi-loss function to minimize intra-class variance and maximize inter-class differences. After that, a feature aggregation network is employed to aggregate frame-wise features and extract temporal information from the video. Furthermore, considering that adjacent frames typically have similar appearance features, we propose the concept of “non-redundant appearance feature extraction” to obtain sequence-level appearance descriptors of pedestrians. Because the temporal feature and the non-redundant appearance feature are complementary, we combine them in the distance learning phase by assigning them different distance-weighted coefficients. Extensive experiments on three video-based datasets demonstrate the superiority and effectiveness of our method.
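The distance-weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the coefficient values, or the data layout, so the Euclidean distance, the default weights of 0.5, and the dictionary keys `"temporal"` and `"appearance"` below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fused_distance(probe, candidate, w_temporal=0.5, w_appearance=0.5):
    """Weighted sum of the temporal and appearance distances.

    `probe` and `candidate` are dicts with the hypothetical keys
    "temporal" and "appearance", each holding one sequence-level
    feature vector. The weights are assumed values, not from the paper.
    """
    d_t = euclidean(probe["temporal"], candidate["temporal"])
    d_a = euclidean(probe["appearance"], candidate["appearance"])
    return w_temporal * d_t + w_appearance * d_a


def rank_gallery(probe, gallery, **weights):
    """Return gallery identities sorted by ascending fused distance."""
    scored = [
        (fused_distance(probe, item, **weights), pid)
        for pid, item in gallery.items()
    ]
    return [pid for _, pid in sorted(scored)]
```

Under these assumptions, changing the two coefficients shifts how much the ranking relies on temporal cues versus appearance cues, which is the role the abstract assigns to the distance-weighted coefficients.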
This work was supported in part by National Natural Science Foundation of China under Grant 61702278, in part by Priority Academic Program Development of Jiangsu Higher Education Institutions and in part by Postgraduate Research & Practice Innovation Program of Jiangsu Province KYCX18_0890.
Cite this article
Song, W., Zheng, J., Wu, Y. et al. Discriminative feature extraction for video person re-identification via multi-task network. Appl Intell 51, 788–803 (2021). https://doi.org/10.1007/s10489-020-01844-8
DOI: https://doi.org/10.1007/s10489-020-01844-8