Correlation Discrepancy Insight Network for Video Re-identification

Published: 17 December 2020

Abstract

Video-based person re-identification (ReID) aims to re-identify a specified person sequence in videos captured by disjoint cameras. Most existing works on this task use all video frames to develop a ReID method and thus ignore the quality discrepancy across frames. Additionally, they adopt only the person's self-characteristic as the representation, which cannot adapt effectively to cross-camera variation. To address these issues, we propose a novel correlation discrepancy insight network for video-based person ReID, which consists of an unsupervised correlation insight model (CIM) for video purification and a discrepancy description network (DDN) for person representation. Concretely, CIM uses kernelized correlation filters to encode person half-parts and evaluates frame quality by the cross-correlation across frames in order to select discriminative video fragments. DDN then exploits the selected video fragments to generate a discrepancy descriptor using a compression network; the descriptor employs the discrepancies with other persons to facilitate the representation of the target person, rather than using only the self-characteristic. Owing to its advantage in handling cross-domain variation, the discrepancy descriptor is expected to provide a new pattern for object representation in cross-camera tasks. Experimental results on three public benchmarks demonstrate that the proposed method outperforms several state-of-the-art methods.
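
The frame-selection idea in the abstract (score each frame by its cross-correlation with the other frames, computed on person half-parts, then keep a high-scoring fragment) can be illustrated with a minimal sketch. This is not the authors' CIM: it swaps the kernelized correlation filters for plain FFT-based normalized cross-correlation, and the function names `ncc_peak`, `frame_quality`, and `select_fragment`, the equal upper/lower split, and the fixed fragment length are all assumptions made for illustration.

```python
import numpy as np

def ncc_peak(a, b):
    """Peak of the circular normalized cross-correlation of two equal-size 2D patches."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    # Correlation theorem: correlation = IFFT(FFT(a) * conj(FFT(b))).
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    return corr.max() / a.size

def frame_quality(frames):
    """Score each frame by its mean correlation peak with every other frame.

    Each frame is split into an upper and a lower half-part (an assumed,
    equal split), and the peaks of both halves are averaged.
    """
    n = len(frames)
    h = frames[0].shape[0] // 2
    scores = np.zeros(n)
    for i in range(n):
        peaks = []
        for j in range(n):
            if i == j:
                continue
            peaks.append(ncc_peak(frames[i][:h], frames[j][:h]))
            peaks.append(ncc_peak(frames[i][h:2 * h], frames[j][h:2 * h]))
        scores[i] = np.mean(peaks)
    return scores

def select_fragment(scores, length):
    """Return (start, end) of the contiguous window with the highest summed score."""
    window_sums = np.convolve(scores, np.ones(length), mode="valid")
    start = int(np.argmax(window_sums))
    return start, start + length
```

Under these assumptions, frames that are occluded or misaligned correlate poorly with the rest of the sequence and receive low scores, so `select_fragment(frame_quality(frames), length)` tends to return a window of mutually consistent frames. Frames are expected as equal-size 2D grayscale arrays.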

    Information & Contributors

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 16, Issue 4
    November 2020
    372 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3444749
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 December 2020
    Accepted: 01 May 2020
    Revised: 01 May 2020
    Received: 01 September 2019
    Published in TOMM Volume 16, Issue 4

    Author Tags

    1. Video re-identification
    2. correlation insight
    3. cross-domain variation
    4. discrepancy description network

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China
    • Natural Science Foundation of Hubei Province

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 15
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 13 Dec 2024

    Cited By

    • (2023) ALFPN. International Journal of Intelligent Systems 2023. DOI: 10.1155/2023/6266209. Online publication date: 1-Jan-2023
    • (2023) A Feature Map is Worth a Video Frame: Rethinking Convolutional Features for Visible-Infrared Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 20:2 (1-20). DOI: 10.1145/3617375. Online publication date: 24-Aug-2023
    • (2023) Context Sensing Attention Network for Video-based Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 19:4 (1-20). DOI: 10.1145/3573203. Online publication date: 27-Feb-2023
    • (2023) Cross-Camera Trajectories Help Person Retrieval in a Camera Network. IEEE Transactions on Image Processing 32 (3806-3820). DOI: 10.1109/TIP.2023.3290515. Online publication date: 1-Jan-2023
    • (2023) Where to look: Multi-granularity occlusion aware for video person re-identification. Neurocomputing 536 (137-151). DOI: 10.1016/j.neucom.2023.03.003. Online publication date: Jun-2023
    • (2023) Generalizable person re-identification with part-based multi-scale network. Multimedia Tools and Applications 82:25 (38639-38666). DOI: 10.1007/s11042-023-14718-1. Online publication date: 1-Oct-2023
    • (2022) Video-based Person re-identification with parallel correction and fusion of pedestrian area features. Mathematical Biosciences and Engineering 20:2 (3504-3527). DOI: 10.3934/mbe.2023164. Online publication date: 2022
    • (2022) Clustering Matters: Sphere Feature for Fully Unsupervised Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 18:4 (1-18). DOI: 10.1145/3501404. Online publication date: 15-Mar-2022
    • (2022) TICNet: A Target-Insight Correlation Network for Object Tracking. IEEE Transactions on Cybernetics 52:11 (12150-12162). DOI: 10.1109/TCYB.2021.3070677. Online publication date: Nov-2022
    • (2022) SANet: Statistic Attention Network for Video-Based Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 32:6 (3866-3879). DOI: 10.1109/TCSVT.2021.3119983. Online publication date: Jun-2022
