Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions

Abstract

Variational autoencoders (VAEs) have emerged as powerful deep generative models for learning abstract representations in the latent space, making them highly applicable across diverse domains. This paper presents a novel image categorization approach that leverages VAEs with disentangled representations. In VAE-based clustering models, the latent representations learned by encoders often combine both generation and clustering information. To address this concern, our proposed model disentangles the acquired latent representations into dedicated clustering and generation modules, thereby enhancing the performance and efficiency of clustering tasks. Specifically, we introduce an extension of the Kullback–Leibler (KL) divergence to promote independence between these two modules. Additionally, we incorporate the von Mises-Fisher (vMF) distribution to improve the clustering model’s ability to capture cluster characteristics within the generation module. Extensive experimental evaluations confirm the effectiveness of our model in clustering tasks, notably without the requirement for pre-training. Furthermore, when compared to various deep generative clustering models requiring pre-training, our model is able to achieve either comparable or superior performance across multiple datasets.
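As background for the vMF component mentioned in the abstract, the following is a minimal sketch of the von Mises-Fisher log-density and a nearest-cluster assignment based on it. It is not the authors' model: the function names (`vmf_log_density_3d`, `assign_cluster`), the restriction to the unit sphere in three dimensions, and the shared concentration parameter `kappa` are all assumptions chosen for illustration. In three dimensions the vMF normalizing constant has the closed form kappa / (4 * pi * sinh(kappa)), so no Bessel function is needed.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_log_density_3d(x, mu, kappa):
    """Log-density of a vMF distribution on the unit sphere in R^3.

    x and mu are normalized internally; kappa > 0 is the concentration.
    """
    x, mu = normalize(x), normalize(mu)
    dot = sum(a * b for a, b in zip(x, mu))
    # log sinh(kappa), written in a form that does not overflow for large kappa
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    # log of the normalizer kappa / (4 * pi * sinh(kappa))
    log_norm = math.log(kappa) - math.log(4.0 * math.pi) - log_sinh
    return log_norm + kappa * dot


def assign_cluster(x, means, kappa):
    """Index of the mean direction with the highest vMF log-density at x.

    Assumes (for illustration only) that all clusters share one kappa.
    """
    scores = [vmf_log_density_3d(x, mu, kappa) for mu in means]
    return max(range(len(means)), key=scores.__getitem__)


if __name__ == "__main__":
    means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(assign_cluster([0.9, 0.1, 0.0], means, kappa=10.0))
```

With a shared `kappa`, the assignment reduces to picking the mean direction with the largest cosine similarity to the point. As `kappa` approaches zero the density approaches the uniform density on the sphere, 1 / (4 * pi).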

Data availability

The datasets used in this work are available at:

– MNIST: http://yann.lecun.com/exdb/mnist/

– USPS: https://www.kaggle.com/datasets/bistaumanga/usps-dataset

– GTSRB: https://benchmark.ini.rub.de/gtsrb_news.html

– YTF: https://www.cs.tau.ac.il/~wolf/ytfaces/

– F-MNIST: https://www.kaggle.com/datasets/zalando-research/fashionmnist

Acknowledgements

The completion of this work was supported by the National Natural Science Foundation of China (62276106), the Guangdong Basic and Applied Basic Research Foundation (2024A1515011767), the Guangdong Provincial Key Laboratory IRADS (2022B1212010006, R0400001-22) and the UIC Start-up Research Fund (UICR0700056-23).

Author information

Authors and Affiliations

Guangdong Provincial Key Laboratory IRADS and Department of Computer Science, Beijing Normal University-Hong Kong Baptist University United International College (UIC), Zhuhai, Guangdong, China
Wentao Fan
Department of Computer Science and Technology, Huaqiao University, Xiamen, China
Kunxiong Xu

Corresponding author

Correspondence to Wentao Fan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, W., Xu, K. Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions. Int. J. Mach. Learn. & Cyber. 16, 611–623 (2025). https://doi.org/10.1007/s13042-024-02265-6

Received: 27 May 2023
Accepted: 17 June 2024
Published: 23 June 2024
Issue Date: January 2025
DOI: https://doi.org/10.1007/s13042-024-02265-6
