Abstract
Face clustering has applications in organizing personal photo albums, video understanding, and automatic labeling of data for semi-supervised learning. Many existing methods cannot cluster millions of faces: they are either too slow, too inaccurate, or require too much memory. In this paper, we propose a two-stage unsupervised clustering algorithm that can cluster millions of faces in minutes. A rough clustering pass using a greedy Transitive Closure (TC) algorithm first separates the easy-to-locate clusters; a more precise non-greedy clustering algorithm then splits these clusters into smaller ones. We also develop a set of omni-supervised transformations that produce multiple embeddings from a single trained model, as if multiple models had been trained. These embeddings are combined using simple averaging and normalization. We carry out extensive experiments on multiple datasets of different sizes, comparing against existing state-of-the-art clustering algorithms, and show that our clustering algorithm is robust to differences between datasets, efficient, and outperforms existing methods. We further analyze the number of singleton clusters and variations of our model that use different non-greedy clustering algorithms. Finally, we train a semi-supervised model using the cluster labels and show that our clustering algorithm is effective for semi-supervised learning.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request
References
Radosavovic I, Dollár P, Girshick R, Gkioxari G, He K (2018) Data distillation: towards omni-supervised learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4119–4128
Sarfraz S, Sharma V, Stiefelhagen R (2019) Efficient parameter-free clustering using first neighbor relations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8934–8943
Yang L, Zhan X, Chen D, Yan J, Loy CC, Lin D (2019) Learning to cluster faces on an affinity graph. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2298–2306
Wang Z, Zheng L, Li Y, Wang S (2019) Linkage based face clustering via graph convolution network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1117–1125
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Iscen A, Tolias G, Avrithis Y, Chum O (2019) Label propagation for deep semi-supervised learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5070–5079
Wang S, Meng J, Yuan J, Tan Y-P (2019) Joint representative selection and feature learning: a semi-supervised approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6005–6013
Wu S, Li J, Liu C, Yu Z, Wong H-S (2019) Mutual learning of complementary networks via residual correction for improving semi-supervised classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6500–6509
Li Q, Wu X-M, Liu H, Zhang X, Guan Z (2019) Label efficient semi-supervised learning via graph filtering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9582–9591
Wu S, Deng G, Li J, Li R, Yu Z, Wong H-S (2019) Enhancing TripleGAN for semi-supervised conditional instance synthesis and classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10091–10100
Yu B, Wu J, Ma J, Zhu Z (2019) Tangent-normal adversarial regularization for semi-supervised learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10676–10684
Jiang B, Zhang Z, Lin D, Tang J, Luo B (2019) Semi-supervised learning with graph learning-convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11313–11320
Qiao S, Shen W, Zhang Z, Wang B, Yuille A (2018) Deep co-training for semi-supervised image recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 135–152
Robert T, Thome N, Cord M (2018) Hybridnet: classification and reconstruction cooperation for semi-supervised learning. In: Proceedings of the European conference on computer vision (ECCV), pp 153–169
Chen Y, Zhu X, Gong S (2018) Semi-supervised deep learning with memory. In: Proceedings of the European conference on computer vision (ECCV), pp 268–283
Shi W, Gong Y, Ding C, Ma Z, Tao X, Zheng N (2018) Transductive semi-supervised deep learning using min-max features. In: Proceedings of the European conference on computer vision (ECCV), pp 299–315
Cicek S, Fawzi A, Soatto S (2018) SaaS: speed as a supervisor for semi-supervised learning. In: Proceedings of the European conference on computer vision (ECCV), pp 149–163
Liu Y, Song G, Shao J, Jin X, Wang X (2018) Transductive centroid projection for semi-supervised large-scale recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 70–86
Coelho de Castro D, Nowozin S (2018) From face recognition to models of identity: a Bayesian approach to learning about unknown identities from unsupervised data. In: Proceedings of the European conference on computer vision (ECCV), pp 745–761
Kumar V, Namboodiri A, Jawahar C (2018) Semi-supervised annotation of faces in image collection. Signal Image Video Process 12(1):141–149
Sharma V, Tapaswi M, Sarfraz MS, Stiefelhagen R (2019) Self-supervised learning of face representations for video face clustering. arXiv preprint arXiv:1903.01000
Zhan X, Liu Z, Yan J, Lin D, Change Loy C (2018) Consensus-driven propagation in massive unlabeled data for face recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 568–583
Shen S, Li W, Zhu Z, Huang G, Du D, Lu J, Zhou J (2021) Structure-aware face clustering on a large-scale graph with \(10^7\) nodes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 9085–9094
Nguyen XB, Bui DT, Duong CN, Bui TD, Luu K (2021) Clusformer: a transformer based clustering approach to unsupervised large-scale face and visual landmark recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, IEEE, pp 10847–10856
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
Whitelam C, Taborsky E, Blanton A, Maze B, Adams J, Miller T, Kalka N, Jain AK, Duncan JA, Allen K, et al. (2017) IARPA Janus Benchmark-B face dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 90–98
Guo Y, Zhang L, Hu Y, He X, Gao J (2016) MS-Celeb-1M: a dataset and benchmark for large-scale face recognition. In: European conference on computer vision, Springer, pp 87–102
Amigó E, Gonzalo J, Artiles J, Verdejo F (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retriev 12(4):461–486
Zhan X (2019) Implementation of “Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition” (CDP). GitHub
Yang L (2019) Learning to cluster faces on an affinity graph (CVPR 2019). GitHub
Wang Z (2019) Linkage-based face clustering via graph convolution network. GitHub
Yang L, Chen D, Zhan X, Zhao R, Loy CC, Lin D (2020) Learning to cluster faces via confidence and connectivity estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13369–13378
Liu Y, Zhang G, Wang H, Zhao W, Zhang M, Qin H (2019) An efficient super-resolution network based on aggregated residual transformations. Electronics 8(3):339
Funding
The authors did not receive support from any organization for the submitted work
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare they have no financial interests
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Implementation details of our clustering algorithm
We use the TC clustering algorithm as the backbone of our clustering algorithm. Our method first performs a greedy clustering (TC clustering), then a non-greedy clustering, and lastly propagates cluster labels from labeled embeddings to unlabeled embeddings (a step we developed). The whole algorithm is written entirely in Python, and the code is released at https://github.com/singkuangtan/face-clustering. We used the data from the papers [3] and [4]. All clustering experiments are run on a desktop machine with an Intel Core i7-7700 CPU @ 3.60 GHz, using only one core (without any parallel processing). Our clustering algorithm is simple and can run on most CPUs using only a single core.
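As a rough illustration of the two clustering stages, the following is a minimal sketch in plain Python/NumPy/scikit-learn, not the released code: the function names, the distance threshold, the split size, and the use of k-means as a stand-in for the non-greedy stage are all illustrative assumptions, and the final label-propagation step is omitted.

import numpy as np
from sklearn.cluster import KMeans


def tc_clustering(embeddings, threshold=0.7):
    # Greedy Transitive Closure (TC) clustering: link every pair of embeddings
    # closer than `threshold`, then take connected components via union-find.
    n = len(embeddings)
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        dists = np.linalg.norm(embeddings[i + 1:] - embeddings[i], axis=1)
        for j in np.nonzero(dists < threshold)[0] + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
    return np.array([find(i) for i in range(n)])


def two_stage_clustering(embeddings, threshold=0.7, split_size=200, k_split=2):
    # Stage 1: coarse greedy TC clusters.  Stage 2: split every cluster larger
    # than `split_size` with k-means, standing in for the non-greedy stage.
    labels = tc_clustering(embeddings, threshold)
    next_label = labels.max() + 1
    for c in np.unique(labels):
        idx = np.nonzero(labels == c)[0]
        if len(idx) > split_size:
            sub = KMeans(n_clusters=k_split, n_init=10).fit_predict(embeddings[idx])
            for s in range(1, k_split):
                labels[idx[sub == s]] = next_label
                next_label += 1
    return labels


if __name__ == "__main__":
    # Toy check on two synthetic 2-D blobs; we expect two clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.1, size=(100, 2)) for m in (0.0, 1.0)])
    print(np.unique(two_stage_clustering(X, threshold=0.5)).size)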
For Table 6, we use embeddings trained with a softmax loss function; for semi-supervised learning (Table 12), we use embeddings trained with the ArcFace loss function.
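The omni-supervised embeddings described in the abstract are combined by simple averaging followed by normalization before clustering; a minimal NumPy sketch of that combination step (the array layout and function name are our illustrative assumptions) is:

import numpy as np


def combine_embeddings(per_transform_embeddings):
    # per_transform_embeddings: shape (num_transforms, num_faces, dim),
    # one embedding per face per omni-supervised transformation.
    mean = per_transform_embeddings.mean(axis=0)          # simple averaging
    norms = np.linalg.norm(mean, axis=1, keepdims=True)   # per-face L2 norm
    return mean / np.clip(norms, 1e-12, None)             # normalization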
Appendix 2: Python functions and packages
Table 13 lists the Python functions and packages used in our experiments.
Appendix 3: Relationship of the distances in four cases
We begin by describing a set of properties,
where \(C_0\) is the set of embedding indices for cluster 0 and, likewise, \(C_1\) is the set of embedding indices for cluster 1. \(n(i,j)=1\) if embeddings i and j are neighbors and 0 otherwise. \(e_i\) (respectively \(e_j\)) is the embedding with index i (respectively j).
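For reference in the four cases below, one plausible way to formalize the relevant distances (our illustrative notation; the exact quantities are assumed rather than quoted) is
\[
d(i,j) = \lVert e_i - e_j \rVert, \qquad
d_{\mathrm{intra}}(C_k) = \max_{i,j \in C_k,\, n(i,j)=1} d(i,j), \qquad
d_{\mathrm{inter}}(C_0,C_1) = \min_{i \in C_0,\, j \in C_1} d(i,j),
\]
where \(d_{\mathrm{intra}}(C_k)\) is the largest distance between neighboring embeddings within cluster k and \(d_{\mathrm{inter}}(C_0,C_1)\) is the smallest distance between the two clusters.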
For Case 0, a greedy clustering algorithm with a large threshold can separate the two clusters. Mathematically, it is
where max denotes the maximum of its two arguments and \(\gg\) means much greater than (by a few times).
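In the illustrative notation introduced above, a plausible form of the Case 0 condition (our reconstruction, not necessarily the paper's exact inequality) is
\[
d_{\mathrm{inter}}(C_0,C_1) \gg \max\bigl(d_{\mathrm{intra}}(C_0),\, d_{\mathrm{intra}}(C_1)\bigr),
\]
so any threshold between the two sides separates the clusters.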
For Case 1, a greedy clustering algorithm can still separate the two clusters, but the gap between the clusters is smaller, and therefore a smaller threshold is used. Mathematically, it is
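A plausible form for Case 1, in the same illustrative notation, is the weaker condition
\[
d_{\mathrm{inter}}(C_0,C_1) > \max\bigl(d_{\mathrm{intra}}(C_0),\, d_{\mathrm{intra}}(C_1)\bigr),
\]
with the two sides now close in value, so only a correspondingly smaller threshold still separates the clusters.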
For Case 2, there is a bridge connecting nearest-neighbor embeddings from the two clusters, so no threshold used with a greedy algorithm can separate them. However, the mean interclass distance is still larger than the mean intraclass distance, which enables the clusters to be separated by a non-greedy clustering algorithm such as k-means. Mathematically, it is
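A plausible formalization of Case 2 (our reconstruction) is
\[
d_{\mathrm{inter}}(C_0,C_1) \le \max\bigl(d_{\mathrm{intra}}(C_0),\, d_{\mathrm{intra}}(C_1)\bigr)
\quad \text{and} \quad
\frac{1}{|C_0|\,|C_1|}\sum_{i \in C_0}\sum_{j \in C_1} d(i,j) \;>\; \frac{1}{|C_k|(|C_k|-1)}\sum_{\substack{i,j \in C_k \\ i \neq j}} d(i,j), \quad k = 0, 1,
\]
i.e., the bridge makes the minimum interclass distance as small as an intraclass neighbor distance, yet the mean interclass distance still exceeds the mean intraclass distance.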
For Case 3, although the singleton cluster 0 is separated from the main cluster 1 by a threshold in the greedy clustering algorithm, the ground-truth class of cluster 0 is the same as that of cluster 1 because the singleton is a random outlier. So the singleton cluster 0 should be merged with cluster 1. Mathematically, it is
where \(\gg\) means much greater (a few times greater).
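With \(C_0 = \{i_0\}\) a singleton, a plausible form of the Case 3 condition (our reconstruction) is
\[
\min_{j \in C_1} d(i_0, j) \gg d_{\mathrm{intra}}(C_1)
\quad \text{while} \quad
\mathrm{class}(i_0) = \mathrm{class}(C_1),
\]
so although the thresholded greedy stage isolates \(e_{i_0}\), the singleton should be merged back into cluster 1.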
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tan, S.K., Wang, X. A novel two-stage omni-supervised face clustering algorithm. Pattern Anal Applic 27, 83 (2024). https://doi.org/10.1007/s10044-024-01298-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10044-024-01298-5