
Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Variational autoencoders (VAEs) have emerged as powerful deep generative models for learning abstract representations in the latent space, making them applicable across diverse domains. This paper presents a novel image categorization approach that leverages VAEs with disentangled representations. In VAE-based clustering models, the latent representations learned by encoders often mix generation and clustering information. To address this, our proposed model disentangles the learned latent representations into dedicated clustering and generation modules, improving both the performance and the efficiency of clustering. Specifically, we introduce an extension of the Kullback–Leibler (KL) divergence to promote independence between these two modules, and we incorporate the von Mises-Fisher (vMF) distribution so that the model better captures cluster characteristics within the generation module. Extensive experimental evaluations confirm the effectiveness of our model in clustering tasks, notably without requiring pre-training. Furthermore, compared with various deep generative clustering models that do require pre-training, our model achieves comparable or superior performance across multiple datasets.
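As background for the vMF component mentioned above: the von Mises-Fisher distribution is a natural fit for unit-normalized latent embeddings, since its density depends only on the angle between a point and a mean direction. The following NumPy sketch is illustrative only, not the paper's model; the function name, the shared-concentration assumption, and the toy data are hypothetical. It shows how soft cluster assignments arise under a vMF mixture when the concentration parameter is shared across components, in which case the vMF normalizing constant cancels in the posterior.

```python
import numpy as np

def vmf_responsibilities(X, mus, kappa, weights):
    """Soft cluster assignments under a mixture of von Mises-Fisher
    distributions with a shared concentration kappa.

    The vMF density on the unit sphere is C_p(kappa) * exp(kappa * mu^T x);
    with kappa shared across components, C_p(kappa) cancels in the posterior,
    so responsibilities are proportional to weights * exp(kappa * mu^T x).

    X:       (n, p) array of unit-norm latent embeddings (rows).
    mus:     (k, p) array of unit-norm cluster mean directions.
    weights: (k,) mixture weights.
    """
    logits = kappa * (X @ mus.T) + np.log(weights)   # (n, k) unnormalized log-posteriors
    logits -= logits.max(axis=1, keepdims=True)      # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # normalize rows to sum to 1

# Toy example: two well-separated mean directions on the unit sphere.
mus = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
x = np.array([[0.9, 0.1, 0.0]])
x /= np.linalg.norm(x)
r = vmf_responsibilities(x, mus, kappa=10.0, weights=np.array([0.5, 0.5]))
# The point aligned with the first mean direction is assigned to cluster 0.
assert r[0, 0] > r[0, 1]
```

Because assignment depends on the inner product of unit vectors (cosine similarity), a vMF mixture clusters by direction rather than Euclidean distance, which is why it pairs well with spherical latent spaces.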



Data availability

The datasets used in this work are available at:

– MNIST: http://yann.lecun.com/exdb/mnist/

– USPS: https://www.kaggle.com/datasets/bistaumanga/usps-dataset

– GTSRB: https://benchmark.ini.rub.de/gtsrb_news.html

– YTF: https://www.cs.tau.ac.il/~wolf/ytfaces/

– F-MNIST: https://www.kaggle.com/datasets/zalando-research/fashionmnist


Acknowledgements

The completion of this work was supported by the National Natural Science Foundation of China (62276106), the Guangdong Basic and Applied Basic Research Foundation (2024A1515011767), the Guangdong Provincial Key Laboratory IRADS (2022B1212010006, R0400001-22) and the UIC Start-up Research Fund (UICR0700056-23).

Author information


Corresponding author

Correspondence to Wentao Fan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fan, W., Xu, K. Unsupervised image categorization based on deep generative models with disentangled representations and von Mises-Fisher distributions. Int. J. Mach. Learn. & Cyber. 16, 611–623 (2025). https://doi.org/10.1007/s13042-024-02265-6

