
An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification

Published: 04 March 2022

Abstract

The RGB-D cross-modal person re-identification (re-id) task aims to identify a person of interest across the RGB and depth image modalities. The large discrepancy between these two modalities makes the task difficult. Few researchers have addressed it, and the deep networks of existing methods still cannot be trained in an end-to-end manner. This article therefore proposes an end-to-end network for RGB-D cross-modal person re-id. The network introduces a cross-modal relational branch to narrow the gap between the two heterogeneous image types. This branch models the abundant correlations between any cross-modal sample pairs, which are constrained by heterogeneous interactive learning. The proposed network also exploits a dual-modal local branch, which aims to capture the spatial contexts common to the two modalities. This branch adopts shared attentive pooling and mutual contextual graph networks to extract the spatial attention within each local region and the spatial relations between distinct local parts, respectively. Experimental results on two public benchmark datasets, BIWI and RobotPKU, show that the method outperforms state-of-the-art approaches. In addition, thorough experiments demonstrate the effectiveness of each component of the proposed method.
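
As a rough illustration of two ideas named in the abstract, the sketch below computes a pairwise relation matrix between RGB and depth embeddings and pools the position features of one local region with attention weights from a scoring vector shared by both modalities. It is not the authors' implementation: the function names, the dot-product scoring, the cosine similarity, and the toy data are all assumptions made for this example, and the abstract does not specify how either component is actually computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(region, w):
    """Pool the position features of one local region into a single vector.

    `region` is a list of feature vectors (one per spatial position).
    `w` is a scoring vector; using the same `w` for RGB and depth regions
    is what makes the pooling "shared" in this sketch (an assumption).
    """
    scores = [sum(f * wi for f, wi in zip(feat, w)) for feat in region]
    alpha = softmax(scores)
    dim = len(region[0])
    return [sum(a * feat[d] for a, feat in zip(alpha, region)) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relation_matrix(rgb_embs, depth_embs):
    """Similarity of every RGB embedding to every depth embedding."""
    return [[cosine(r, d) for d in depth_embs] for r in rgb_embs]


# Toy data (hypothetical): two positions in one region, two samples per modality.
pooled = attentive_pool([[1.0, 0.0], [3.0, 2.0]], w=[0.0, 0.0])
relations = relation_matrix([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [1.0, 1.0]])
```

With a zero scoring vector every position receives the same weight, so `pooled` reduces to the plain average of the region. In a trained model the scoring vector would be learned, and a loss over `relations` would pull same-identity cross-modal pairs together.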


      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 4
      November 2022
      497 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3514185
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 March 2022
      Accepted: 01 December 2021
      Revised: 01 November 2021
      Received: 01 June 2021
      Published in TOMM Volume 18, Issue 4

      Author Tags

      1. RGB-D cross-modal person re-identification
      2. end-to-end deep network
      3. heterogeneous interactive learning
      4. cross-modal relational branch
      5. mutual contextual graph networks

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China

      Cited By

      • (2024) Transparent Depth Completion Using Segmentation Features. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12 (1-19). DOI: 10.1145/3694978. Online publication date: 9-Sep-2024
      • (2024) Intermediary-Generated Bridge Network for RGB-D Cross-modal Re-identification. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3682066. Online publication date: 29-Jul-2024
      • (2024) Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications 20:5 (1-23). DOI: 10.1145/3638558. Online publication date: 22-Jan-2024
      • (2024) CMAF: Cross-Modal Augmentation via Fusion for Underwater Acoustic Image Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20:5 (1-25). DOI: 10.1145/3636427. Online publication date: 11-Jan-2024
      • (2024) A Multi-scale Feature Embedding Extension Network for RGB-D Cross-modal Person Re-identification. 2024 7th World Conference on Computing and Communication Technologies (WCCCT) (92-97). DOI: 10.1109/WCCCT60665.2024.10541659. Online publication date: 12-Apr-2024
      • (2024) SSRR: Structural Semantic Representation Reconstruction for Visible-Infrared Person Re-Identification. IEEE Transactions on Multimedia 26 (6273-6284). DOI: 10.1109/TMM.2023.3347855. Online publication date: 2024
      • (2024) SYRER: Synergistic Relational Reasoning for RGB-D Cross-Modal Re-Identification. IEEE Transactions on Multimedia 26 (5600-5614). DOI: 10.1109/TMM.2023.3338058. Online publication date: 2024
      • (2024) Detection-Free Cross-Modal Retrieval for Person Identification Using Videos and Radar Spectrograms. IEEE Transactions on Instrumentation and Measurement 73 (1-12). DOI: 10.1109/TIM.2024.3372210. Online publication date: 2024
      • (2023) Recent progress in person re-ID. Journal of Image and Graphics 28:6 (1829-1862). DOI: 10.11834/jig.230022. Online publication date: 2023
      • (2023) Complementary Coarse-to-Fine Matching for Video Object Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-21). DOI: 10.1145/3596496. Online publication date: 12-Jul-2023
