Abstract
Variational autoencoders (VAEs) have emerged as powerful deep generative models for learning abstract representations in a latent space, which makes them applicable across diverse domains. This paper presents a novel image categorization approach that leverages VAEs with disentangled representations. In VAE-based clustering models, the latent representations learned by the encoder often mix generation and clustering information. To address this issue, our model disentangles the learned latent representations into dedicated clustering and generation modules, thereby improving both the performance and the efficiency of clustering. Specifically, we introduce an extension of the Kullback–Leibler (KL) divergence to promote independence between these two modules. Additionally, we incorporate the von Mises-Fisher (vMF) distribution to improve the clustering model's ability to capture cluster characteristics within the generation module. Extensive experiments confirm the effectiveness of our model for clustering, notably without requiring pre-training. Furthermore, compared with various deep generative clustering models that require pre-training, our model achieves comparable or superior performance across multiple datasets.
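The abstract does not give the model's equations. As a rough illustration of why vMF distributions suit clustering on an L2-normalized latent space, the sketch below scores a unit-norm latent vector against a mixture of vMF components and returns soft cluster assignments. It uses the standard vMF density f_d(x; μ, κ) = C_d(κ) exp(κ μᵀx), with C_d(κ) = κ^(d/2-1) / ((2π)^(d/2) I_(d/2-1)(κ)), where I_v is the modified Bessel function of the first kind. The mixture form (per-component mean direction μ_k, concentration κ_k, weight π_k) and all function names are assumptions made for illustration. This is not the authors' implementation, and it does not model the disentangled modules or the KL extension.

```python
import math
from typing import List, Sequence


def log_bessel_iv(v: float, x: float, terms: int = 200) -> float:
    """Return log I_v(x), the modified Bessel function of the first kind, for x > 0.

    Sums the power series I_v(x) = sum_m (x/2)^(2m+v) / (m! * Gamma(m+v+1))
    in log space. 200 terms is ample for moderate x (roughly x < 100); for
    large concentrations a library routine such as scipy.special.ive is safer.
    """
    log_half_x = math.log(x / 2.0)
    log_terms = [
        (2 * m + v) * log_half_x - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def normalize(v: Sequence[float]) -> List[float]:
    """Project a nonzero vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def vmf_log_density(x: Sequence[float], mu: Sequence[float], kappa: float) -> float:
    """log f_d(x; mu, kappa) for unit vectors x, mu in R^d and kappa > 0."""
    d = len(x)
    nu = d / 2.0 - 1.0
    # log of the normalizing constant C_d(kappa)
    log_c = nu * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - log_bessel_iv(nu, kappa)
    return log_c + kappa * sum(a * b for a, b in zip(mu, x))


def vmf_responsibilities(
    z: Sequence[float],
    means: Sequence[Sequence[float]],
    kappas: Sequence[float],
    weights: Sequence[float],
) -> List[float]:
    """Posterior p(k | z) under a hypothetical vMF mixture, via log-sum-exp."""
    log_joint = [
        math.log(w) + vmf_log_density(z, mu, k)
        for mu, k, w in zip(means, kappas, weights)
    ]
    peak = max(log_joint)
    unnorm = [math.exp(t - peak) for t in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Hypothetical 3-D latent code that points mostly along the first axis.
    z = normalize([1.0, 0.1, 0.0])
    means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(vmf_responsibilities(z, means, kappas=[10.0, 10.0], weights=[0.5, 0.5]))
```

With equal weights and concentrations, the first component receives almost all of the posterior mass because z lies closest to its mean direction; larger κ values sharpen these assignments, while smaller ones spread them out. Working in log space keeps both the Bessel series and the softmax numerically stable.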
Data availability
The datasets used in this work are available at:
– MNIST: http://yann.lecun.com/exdb/mnist/
– USPS: https://www.kaggle.com/datasets/bistaumanga/usps-dataset
– GTSRB: https://benchmark.ini.rub.de/gtsrb_news.html
– YTF: https://www.cs.tau.ac.il/~wolf/ytfaces/
– F-MNIST: https://www.kaggle.com/datasets/zalando-research/fashionmnist
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62276106), the Guangdong Basic and Applied Basic Research Foundation (2024A1515011767), the Guangdong Provincial Key Laboratory IRADS (2022B1212010006, R0400001-22) and the UIC Start-up Research Fund (UICR0700056-23).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, W., Xu, K. Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions. Int. J. Mach. Learn. & Cyber. 16, 611–623 (2025). https://doi.org/10.1007/s13042-024-02265-6
DOI: https://doi.org/10.1007/s13042-024-02265-6