Abstract
Unsupervised person re-identification (ReID) aims to match images of the same individual across cameras with non-overlapping fields of view, without using manual labels. With the rise of aerial perspectives enabled by unmanned aerial vehicles (UAVs), this task has recently been extended to the aerial domain. Compared to fixed ground-based cameras, however, UAV cameras move more freely, which leads to more severe pose and viewpoint variations of pedestrians in aerial imagery. These variations produce large intra-class variance during the clustering step of unsupervised methods, yet existing unsupervised ReID methods rarely address the intra-class variations caused by the UAV perspective. To address these issues, we propose a Region Aware Transformer with Intra-Class Compact for Unsupervised aerial person ReID. Recognizing the invariance of local features under severe distortions, our Region Aware Transformer integrates both global and local information to achieve more stable feature representations. Furthermore, to mitigate substantial intra-class disparities among similar samples, we devise a Cluster Compact Loss. This loss penalizes samples that stray too far from their respective cluster centers, encouraging more compact clusters. Our method not only outperforms state-of-the-art (SOTA) methods on UAV ReID datasets under unsupervised conditions but also performs strongly on ground-based ReID datasets.
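The abstract does not give the exact form of the Cluster Compact Loss, so the following is only a minimal sketch of the general idea it describes: penalizing samples that lie far from their cluster center. It assumes L2-normalised features, Euclidean distance to the cluster mean, a hinge with a hypothetical `margin` parameter, and the DBSCAN convention that label `-1` marks unclustered samples. None of these choices are taken from the paper.

```python
import numpy as np


def cluster_compact_loss(features, labels, margin=0.5):
    """Illustrative hinge penalty on samples far from their cluster centroid.

    features: (N, D) array of embeddings.
    labels:   (N,) integer pseudo-labels; -1 marks unclustered samples
              (assumed DBSCAN-style convention).
    margin:   hypothetical distance threshold, not a value from the paper.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so that distances are comparable across samples.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    features = features / np.clip(norms, 1e-12, None)

    penalties = []
    for cluster_id in np.unique(labels):
        if cluster_id < 0:
            continue  # skip samples that were not assigned to any cluster
        members = features[labels == cluster_id]
        center = members.mean(axis=0)
        dists = np.linalg.norm(members - center, axis=1)
        # Only samples farther than `margin` from the centre are penalised.
        penalties.append(np.maximum(dists - margin, 0.0))

    if not penalties:
        return 0.0
    return float(np.concatenate(penalties).mean())


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    print(cluster_compact_loss(feats, [0, 0, 1, 1]))  # two tight clusters
    print(cluster_compact_loss(feats, [0, 0, 0, 0]))  # one spread-out cluster
```

In this sketch, two tight clusters incur no penalty, while merging the same four samples into a single cluster yields a positive loss, which is the behaviour the abstract attributes to a compactness term.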
Acknowledgments
This project was supported in part by the NSFC (U22A2095, 62076258), the Key-Area Research and Development Program of Guangzhou (202206030003), Guangdong Project (No. 2020B1515120085), the Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2023B1212060026), and International Program Fund for Young Talent Scientific Research People, Sun Yat-Sen University.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lu, Z., Chen, H., Lai, JH. (2025). Region Aware Transformer with Intra-Class Compact for Unsupervised Aerial Person Re-identification. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15042. Springer, Singapore. https://doi.org/10.1007/978-981-97-8858-3_17
DOI: https://doi.org/10.1007/978-981-97-8858-3_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8857-6
Online ISBN: 978-981-97-8858-3
eBook Packages: Computer Science, Computer Science (R0)