
CLIP-DFGS: A Hard Sample Mining Method for CLIP in Generalizable Person Re-Identification

Published: 23 December 2024

Abstract

Recent advances in pre-trained vision-language models such as CLIP have shown promise for person re-identification (ReID), yet their performance on generalizable person ReID tasks remains suboptimal. Because CLIP is pre-trained on large-scale, diverse image-text pairs, certain fine-grained features may be missing or insufficiently represented. To address these challenges, we propose a hard sample mining method called Depth-First Graph Sampler (DFGS), based on depth-first search, designed to supply sufficiently challenging samples that enhance CLIP’s ability to extract fine-grained features. DFGS can be applied to both the image encoder and the text encoder in CLIP. Leveraging CLIP’s strong cross-modal learning capabilities, DFGS mines challenging samples and forms mini-batches with high discriminative difficulty, providing the image model with hard-to-distinguish samples and thereby enhancing its ability to differentiate between individuals. Our results demonstrate significant improvements over other methods, confirming the effectiveness of DFGS in providing challenging samples that improve CLIP’s performance on generalizable person ReID.
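
To make the sampling idea concrete, the following is a minimal, hypothetical Python sketch of a depth-first graph sampler for hard mini-batch construction. It is not the paper's exact DFGS procedure: the graph construction, the use of identity-level feature centroids (e.g., from CLIP's image encoder), and parameters such as the number of neighbours k and identities per batch are illustrative assumptions.

import numpy as np

def build_identity_graph(centroids, k=4):
    # Hypothetical graph: each identity is linked to its k most similar
    # identities by cosine similarity of their feature centroids.
    feats = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)  # an identity is not its own neighbour
    return [list(np.argsort(-sim[i])[:k]) for i in range(len(centroids))]

def dfs_hard_batches(graph, ids_per_batch=8):
    # Walk the identity graph depth-first so that identities visited
    # consecutively are mutually similar, then chunk the visiting order
    # into mini-batches of hard-to-distinguish identities.
    visited, order = set(), []
    for start in range(len(graph)):
        if start in visited:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            order.append(node)
            # push neighbours so the most similar identity is expanded next
            stack.extend(n for n in reversed(graph[node]) if n not in visited)
    return [order[i:i + ids_per_batch] for i in range(0, len(order), ids_per_batch)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(32, 512))  # e.g., 32 identities, 512-d features
    batches = dfs_hard_batches(build_identity_graph(centroids))
    print(batches[0])  # identities grouped into one hard mini-batch

The intent of the sketch is that identities reached along one depth-first path are mutually similar, so each resulting mini-batch is dominated by identities that are difficult to tell apart, which is the kind of high-discriminative-difficulty batch the abstract describes.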



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 21, Issue 1
January 2025
860 pages
EISSN: 1551-6865
DOI: 10.1145/3703004

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 December 2024
Online AM: 21 October 2024
Accepted: 13 October 2024
Revised: 28 September 2024
Received: 31 July 2024
Published in TOMM Volume 21, Issue 1


Author Tags

  1. Visual language model
  2. Generalizable person re-identification
  3. Depth-first search

Qualifiers

  • Research-article

Funding Sources

  • NSFC
  • China Postdoctoral Science Foundation
  • CPSF
  • Jiangsu Funding Program for Excellent Postdoctoral Talent

