Abstract
This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster level. Specifically, we find that when the data is projected into a feature space whose dimensionality equals the target cluster number, the rows and columns of its feature matrix correspond to the instance and cluster representations, respectively. Based on this observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning. As a result, the clustering performance is further improved. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it can independently predict the cluster assignment for each instance, thus effortlessly fitting online scenarios. Extensive experiments on six widely-used image and text benchmarks demonstrate the effectiveness of TCL. The code is released on https://pengxi.me.
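The twin-loss idea in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the released implementation: it assumes a generic InfoNCE-style contrastive loss and applies it once to the rows (instances) and once to the columns (clusters) of the N-by-K soft-assignment matrices of two augmented views; the function names and the temperature value are hypothetical.

```python
import numpy as np

def contrastive_loss(a, b, temperature=0.5):
    """InfoNCE-style loss: a[i] and b[i] form a positive pair,
    while all other rows of b act as negatives for a[i]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # maximize the positive (diagonal) entries

def twin_contrastive_loss(z_a, z_b, temperature=0.5):
    """z_a, z_b: N x K soft cluster assignments of two augmented views.
    Rows are contrasted as instance representations, columns as
    cluster representations (the 'twin' of the row-space loss)."""
    instance_loss = contrastive_loss(z_a, z_b, temperature)       # row space
    cluster_loss = contrastive_loss(z_a.T, z_b.T, temperature)    # column space
    return instance_loss + cluster_loss
```

Because the loss only needs the K-dimensional row of a single sample at inference time, cluster assignment is per-instance, which is what makes the online setting straightforward.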
Acknowledgements
The authors would like to thank the Associate Editor and reviewers for the constructive comments and valuable suggestions that remarkably improved this study. This work was supported in part by the National Key R&D Program of China under Grant 2020YFB1406702; in part by NSFC under Grants 62176171, U21B2040, and U19A2078; and in part by Open Research Projects of Zhejiang Lab under Grant 2021KH0AB02.
Communicated by Frederic Jurie.
Cite this article
Li, Y., Yang, M., Peng, D. et al. Twin Contrastive Learning for Online Clustering. Int J Comput Vis 130, 2205–2221 (2022). https://doi.org/10.1007/s11263-022-01639-z