Robust Object Re-identification with Coupled Noisy Labels

International Journal of Computer Vision

Abstract

In this paper, we reveal and study a new challenging problem faced by object Re-IDentification (ReID), i.e., Coupled Noisy Labels (CNL), which refers to Noisy Annotation (NA) and the accompanying Noisy Correspondence (NC). Specifically, NA refers to the wrongly-annotated identities of samples during manual labeling, and NC refers to the mismatched training pairs, including false positives and false negatives, whose correspondences are established based on the NA. Clearly, CNL will limit the success of the object ReID paradigm that simultaneously performs identity-aware discrimination learning on the data samples and pairwise similarity learning on the training pairs. To overcome this practical but largely ignored problem, we propose a robust object ReID method dubbed Learning with Coupled Noisy Labels (LCNL). In brief, LCNL first estimates the annotation confidence of each sample and then uses these confidences to adaptively divide the training pairs into four groups and rectify their correspondences. After that, LCNL employs a novel objective function to achieve robust object ReID with theoretical guarantees. To verify the effectiveness of LCNL, we conduct extensive experiments on five benchmark datasets in single- and cross-modality object ReID tasks, comparing against 14 algorithms. The code is available at https://github.com/XLearning-SCU/2024-IJCV-LCNL.
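
To make the overall pipeline concrete, the following minimal sketch illustrates the confidence-based pair division described above; the thresholding rule, function name, and threshold value are our own simplifications for illustration and are not taken from the paper.

```python
def divide_pair(conf_i, conf_j, labeled_as_match, tau=0.5):
    """Toy version of the confidence-based pair division.

    conf_i / conf_j are the estimated annotation confidences of the two
    samples; labeled_as_match is the (possibly noisy) pairwise label derived
    from the annotations.  The hard threshold tau and this grouping rule are
    simplifications for illustration, not the exact criterion used by LCNL.
    """
    annotations_trusted = conf_i > tau and conf_j > tau
    if labeled_as_match:
        return "TP" if annotations_trusted else "FP"   # keep vs. rectify the positive pair
    return "TN" if annotations_trusted else "FN"       # keep vs. rectify the negative pair

# A pair labeled as a match, but one annotation has low confidence: treated as a false positive.
print(divide_pair(0.9, 0.2, labeled_as_match=True))    # -> FP
```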

Acknowledgements

The authors would like to thank the associate editor and reviewers for the constructive comments and valuable suggestions that remarkably improved this study. This work was supported in part by NSFC under Grants U21B2040 and 62176171, and in part by the Fundamental Research Funds for the Central Universities under Grant CJ202303.

Author information

Corresponding author

Correspondence to Xi Peng.

Additional information

Communicated by Yasuyuki Matsushita.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 2

Theorem 2

For the FP&TN combination, the gradient value of \(\mathcal {L}^{aqdr}\) with \(\sigma _5\) w.r.t. \(d_{ij}\) is greater than that w.r.t. \(d_{is}\) when \(d_{ij} < d_{is}\).

Table 13 Ablation study on the network initialization scheme on SYSU-MM01 with 20% noise

Proof

For the FP&TN combination, \(\widetilde{y}^{p}_{ij}=0\) and \(\widetilde{y}^{p}_{is}=0\), and the gradient of \(\mathcal {L}^{aqdr}\) with \(\sigma _5\) w.r.t. \(d_{ij}\) is in the form of

$$\begin{aligned} \frac{\partial \mathcal {L}^{aqdr}}{\partial d_{ij}} = \frac{\exp {(2d_{is})}+(1+d_{is}-d_{ij})\exp {(d_{ij}+d_{is})}}{(\exp {(d_{ij})}+\exp {(d_{is})})^{2}}, \end{aligned}$$

and the gradient of \(\mathcal {L}^{aqdr}\) with \(\sigma _5\) w.r.t. \(d_{is}\) is in the form of

$$\begin{aligned} \frac{\partial \mathcal {L}^{aqdr}}{\partial d_{is}} = \frac{\exp {(2d_{ij})}+(1+d_{ij}-d_{is})\exp {(d_{ij}+d_{is})}}{(\exp {(d_{ij})}+\exp {(d_{is})})^{2}}. \end{aligned}$$

Let G be the difference between the squared values of \({\partial \mathcal {L}^{aqdr}}/{\partial d_{ij}}\) and \({\partial \mathcal {L}^{aqdr}}/{\partial d_{is}}\). Denoting the two numerators above by \(A=\exp {(2d_{is})}+(1+d_{is}-d_{ij})\exp {(d_{ij}+d_{is})}\) and \(B=\exp {(2d_{ij})}+(1+d_{ij}-d_{is})\exp {(d_{ij}+d_{is})}\), and noting that \(A+B=\exp (2d_{ij})+\exp (2d_{is})+2\exp {(d_{ij}+d_{is})}=(\exp {(d_{ij})}+\exp {(d_{is})})^{2}\), it can be shown that \(G>0, \forall d_{ij} < d_{is}\), by

$$\begin{aligned} \begin{aligned} G&= \left| \frac{\partial \mathcal {L}^{aqdr}}{\partial d_{ij}}\right| ^{2} - \left| \frac{\partial \mathcal {L}^{aqdr}}{\partial d_{is}}\right| ^{2} = \frac{(A-B)(A+B)}{(\exp {(d_{ij})}+\exp {(d_{is})})^{4}} \\&=\frac{2(d_{is}-d_{ij})\exp {(d_{ij}+d_{is})}+\exp (2d_{is})-\exp (2d_{ij})}{(\exp {(d_{ij})}+\exp {(d_{is})})^{2}}\\&> 0, \end{aligned} \end{aligned}$$

where the last inequality holds because every term in the numerator is positive when \(d_{ij} < d_{is}\).

Therefore, the gradient value of \({\partial \mathcal {L}^{aqdr}}/{\partial d_{ij}}\) is greater than that of \({\partial \mathcal {L}^{aqdr}}/{\partial d_{is}}\) when \(d_{ij} < d_{is}\). \(\square \)
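
As a quick numerical sanity check on the inequality above (our own verification sketch, not part of the original proof), the two gradients can be evaluated directly and compared on random distances satisfying \(d_{ij} < d_{is}\):

```python
import numpy as np

def grad_dij(dij, dis):
    """Gradient of L^{aqdr} (FP&TN case, sigma_5) w.r.t. d_ij, as given above."""
    den = (np.exp(dij) + np.exp(dis)) ** 2
    return (np.exp(2 * dis) + (1 + dis - dij) * np.exp(dij + dis)) / den

def grad_dis(dij, dis):
    """Gradient of L^{aqdr} (FP&TN case, sigma_5) w.r.t. d_is, as given above."""
    den = (np.exp(dij) + np.exp(dis)) ** 2
    return (np.exp(2 * dij) + (1 + dij - dis) * np.exp(dij + dis)) / den

# Sample distances with d_ij < d_is: the gradient w.r.t. d_ij should dominate in magnitude.
rng = np.random.default_rng(0)
for _ in range(1000):
    dij, dis = np.sort(rng.uniform(0.0, 4.0, size=2))
    assert abs(grad_dij(dij, dis)) > abs(grad_dis(dij, dis))
print("Theorem 2 inequality verified on 1000 random points.")
```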

1.2 Discussion

Due to the hard mining strategy, the mined pairs are susceptible to noisy correspondence in the presence of noisy annotations, as discussed in Sect. 3.4.2 of the manuscript. Therefore, the numbers of the different triplet combinations are inevitably imbalanced. Fortunately, thanks to the proposed adaptive loss \(\mathcal {L}^{aqdr}\) [Eq. (9)], no additional technique is needed to balance the triplet combinations. Specifically, LCNL adopts \(\mathcal {L}^{aqdr}\) to adaptively transform the noisy combinations (FP&FN, TP&FN, and FP&TN) into the new "clean" combination (TP&TN), so that all combination types carry the same importance. As a result, LCNL achieves robustness against noisy correspondence even under imbalanced triplet combinations.

More Experiment Details

In this appendix, we elaborate on the five datasets used in our experiments; their split statistics are also summarized in the sketch after this list.

  • SYSU-MM01: it is a large-scale VI-ReID dataset in which the images are captured by four visible cameras and two near-infrared cameras in both indoor and outdoor environments on the SYSU campus. In the dataset, 22,258 visible images and 11,909 infrared images from 395 identities are used for training, while 301 randomly sampled visible gallery images and 3,803 infrared query images from another 96 identities are used for single-shot evaluation.

  • RegDB: it is a VI-ReID dataset that consists of 8,240 images from 412 identities. Each identity has 10 visible and 10 infrared images captured by a dual-camera (one visible and one infrared) system. The standard evaluation protocol uses 10 different training/testing splits; in each trial, half of the identities are chosen for training and the rest are used for evaluation.

  • Market-1501: it is a large-scale V-ReID benchmark that consists of 32,668 images of 1,501 identities captured by six different cameras. In the dataset, 751 identities are used for training and the remaining 750 identities are used for testing. In the standard testing protocol, 3,368 query images serve as the probe set to retrieve the correct matches from 15,913 gallery images.

  • DukeMTMC: it is a large-scale V-ReID dataset collected from eight different high-resolution videos. The dataset consists of 16,522 training images from 702 identities, 2,228 query images, and 17,661 gallery images from another 702 identities.

  • VeRi-776: it is a widely-used vehicle ReID dataset collected in a real-world urban surveillance scenario. The dataset consists of 37,715 training images from 576 identities, 11,579 gallery images, and 1,678 query images from another 200 identities.
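
For quick reference, the split statistics listed above can be collected into a small configuration dictionary; this is a convenience sketch of our own, and the field names are not taken from the paper.

```python
# Split statistics of the five benchmarks, as described above.
# Counts are identities (ids) and images (imgs).
DATASETS = {
    "SYSU-MM01": {"task": "VI-ReID", "train_ids": 395,
                  "train_imgs": 22258 + 11909,          # visible + infrared
                  "test_ids": 96, "query_imgs": 3803, "gallery_imgs": 301},
    "RegDB": {"task": "VI-ReID", "ids": 412, "imgs": 8240,
              "protocol": "10 random half/half training-testing splits"},
    "Market-1501": {"task": "V-ReID", "train_ids": 751, "test_ids": 750,
                    "query_imgs": 3368, "gallery_imgs": 15913},
    "DukeMTMC": {"task": "V-ReID", "train_ids": 702, "train_imgs": 16522,
                 "query_imgs": 2228, "gallery_imgs": 17661},
    "VeRi-776": {"task": "Vehicle ReID", "train_ids": 576, "train_imgs": 37715,
                 "test_ids": 200, "query_imgs": 1678, "gallery_imgs": 11579},
}
```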

More Experiment Results

To investigate the impact of network initialization on LCNL, we vary the initialization difference between the two networks by changing the hyper-parameters of the default initialization scheme. Accordingly, we conduct experiments with three settings: the two networks are initialized with (1) the same initialization; (2) different initialization; and (3) relatively different initialization. The results are summarized in Table 13, from which one can observe that moderately varying the initialization between the two networks might benefit the co-modeling scheme and thus slightly improve performance. However, overly changing the hyper-parameters of the default initialization scheme might lead to unstable optimization. Therefore, in the main experiments, we still initialize the networks with the default hyper-parameters.
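
As a minimal illustration of the three settings above (our own sketch; the stand-in backbone, initializer, seeds, and scale values are assumptions rather than the paper's configuration), the two co-trained networks can be given identical or increasingly different initializations by rescaling the standard deviation of a default initializer:

```python
import torch
import torch.nn as nn

def make_net(init_scale=1.0, seed=0):
    """Small stand-in backbone (not the paper's network); init_scale rescales
    the standard deviation of a default Kaiming initialization."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 256))
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight)
            m.weight.data.mul_(init_scale)   # enlarge/shrink the initial weights
            nn.init.zeros_(m.bias)
    return net

# (1) same initialization, (2) different initialization, (3) relatively different initialization
net_a, net_b = make_net(1.0, seed=0), make_net(1.0, seed=0)   # identical
net_a, net_b = make_net(1.0, seed=0), make_net(1.2, seed=1)   # moderately different
net_a, net_b = make_net(1.0, seed=0), make_net(3.0, seed=1)   # strongly different
```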

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, M., Huang, Z. & Peng, X. Robust Object Re-identification with Coupled Noisy Labels. Int J Comput Vis 132, 2511–2529 (2024). https://doi.org/10.1007/s11263-024-01997-w
