Abstract
Data clustering is a fundamental unsupervised learning approach that impacts several domains, such as data mining, computer vision, information retrieval, and pattern recognition. In this work, we develop a statistical framework for data clustering based on Dirichlet processes and asymmetric Gaussian distributions. The parameters of this framework are learned using Markov chain Monte Carlo inference. We also integrate a feature selection technique that identifies the most informative features, so as to build a model with high clustering accuracy. We report experimental results on dynamic texture clustering and scene categorization.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Based on the hyperparameter settings chosen in Section 4, we derive the posteriors for all of the parameters. For the parameter \(\alpha \), the posterior depends only on the number of observations N and the number of components M, not on how the observations are distributed among the components.
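As a point of reference, in the standard Dirichlet process mixture construction this conditional takes the form

\[ p(\alpha \mid M, N) \;\propto\; p(\alpha)\, \alpha^{M}\, \frac{\Gamma(\alpha)}{\Gamma(\alpha + N)}, \]

where \(p(\alpha)\) denotes the prior placed on \(\alpha\); the exact expression used in the paper follows from the hyperparameter setting of Section 4. When that prior is a Gamma distribution, the conditional can be sampled exactly with the auxiliary-variable scheme of Escobar and West. The following is a minimal sketch of one such update, assuming a Gamma(a, b) prior with illustrative hyperparameters rather than the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_alpha(alpha, M, N, a=1.0, b=1.0):
    """One auxiliary-variable update of the DP concentration alpha
    (Escobar and West), assuming a Gamma(a, b) prior (shape a, rate b).

    M: number of occupied mixture components
    N: number of observations
    """
    # Auxiliary variable eta ~ Beta(alpha + 1, N)
    eta = rng.beta(alpha + 1.0, N)
    # The conditional is a two-component mixture of Gammas; choose one
    odds = (a + M - 1.0) / (N * (b - np.log(eta)))
    shape = a + M if rng.random() < odds / (1.0 + odds) else a + M - 1.0
    # NumPy parameterizes the Gamma by shape and scale (= 1 / rate)
    return rng.gamma(shape, 1.0 / (b - np.log(eta)))

# Example: resample alpha given 8 occupied components among 500 points
alpha = 1.0
for _ in range(200):
    alpha = sample_alpha(alpha, M=8, N=500)
```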
The complete posteriors for \(\mu \), \(\mu _{irr}\), \(\lambda \) and r are obtained as follows:
The complete posteriors for \(s_{ljk}\), \(s_{rjk}\), \(s_{jk}^{irr}\), \(\beta \) and w are obtained as follows:
\(N_{jk}^{re}\) and \(N_{jk}^{irr}\) denote the numbers of observations allocated to component j for which feature k is considered relevant and irrelevant, respectively.
The complete posterior for the feature saliency \(\phi \), with prior parameters a and b and with \(n_{jk}\) the number of observations in component j for which feature k is relevant, is then obtained analogously.
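A minimal sketch of this update, assuming a conjugate Beta(a, b) prior on each saliency (in which case \(n_{jk} = N_{jk}^{re}\)), is the standard Beta posterior

\[ \phi_{jk} \mid \cdot \;\sim\; \operatorname{Beta}\!\left(a + N_{jk}^{re},\; b + N_{jk}^{irr}\right). \]

The paper's exact expression may differ, but any conjugate choice reduces the update to incrementing the prior parameters by the relevance and irrelevance counts defined above.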
About this article
Cite this article
Song, Z., Ali, S. & Bouguila, N. Bayesian inference for infinite asymmetric Gaussian mixture with feature selection. Soft Comput 25, 6043–6053 (2021). https://doi.org/10.1007/s00500-021-05598-4