Eigenvalues and constraints in mixture modeling: geometric and computational issues

512 Accesses
15 Citations
1 Altmetric
Explore all metrics

Abstract

This paper presents a review about the usage of eigenvalues restrictions for constrained parameter estimation in mixtures of elliptical distributions according to the likelihood approach. The restrictions serve a twofold purpose: to avoid convergence to degenerate solutions and to reduce the onset of non interesting (spurious) local maximizers, related to complex likelihood surfaces. The paper shows how the constraints may play a key role in the theory of Euclidean data clustering. The aim here is to provide a reasoned survey of the constraints and their applications, considering the contributions of many authors and spanning the literature of the last 30 years.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions

Article Open access 27 September 2023

On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

Article 14 January 2022

References

Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate $t$-factor analyzers. Stat Comput 21:361–373
Article MathSciNet MATH Google Scholar
Andrews JL, McNicholas PD, Subedi S (2011) Model-based classification via mixtures of multivariate $t$-distributions. Comput Stat Data Anal 55:520–529
Article MathSciNet MATH Google Scholar
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
Article MathSciNet MATH Google Scholar
Baudry J-P, Celeux G (2015) EM for mixtures. Stat Comput 22(5):1021–1029
MATH Google Scholar
Biernacki C (2004) Initializing EM using the properties of its trajectories in gaussian mixtures. Stat Comput 14:267–279
Article MathSciNet Google Scholar
Biernacki C, Chrétien S (2003) Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Stat Probab Lett 61:373–382
Article MathSciNet MATH Google Scholar
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the em algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575
Article MathSciNet MATH Google Scholar
Boyles RA (1983) On the convergence of the EM algorithm. J R Stat Soc B 45:47–50
MathSciNet MATH Google Scholar
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793
Article Google Scholar
Cerioli A, García-Escudero LA, Mayo-Iscar A, Riani M (2016) Finding the number of groups in model-based clustering via constrained likelihoods. http://uvadoc.uva.es/handle/10324/18093
Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61
Article MathSciNet MATH Google Scholar
Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30:45–59
Article MathSciNet MATH Google Scholar
Cramér H (1946) Math Methods Stat. Princeton University Press, Princeton
Google Scholar
Day N (1969) Estimating the components of a mixture of normal distributions. Biometrika 56(3):463–474
Article MathSciNet MATH Google Scholar
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38
MathSciNet MATH Google Scholar
Dennis JE (1981) Algorithms for non linear fitting. In: Proceedings of the NATO advanced research symposium, Cambridge, England. Cambridge University
Dykstra RL (1983) An algorithm for restricted least squares regression. J Am Stat Assoc 78(384):837–842
Article MathSciNet MATH Google Scholar
Fang K, Anderson T (1990) Statistical inference in elliptically contoured and related distributions. Alberton, New York
MATH Google Scholar
Fraley C, Raftery AE (2006) Mclust version 3: an R package for normal mixture modeling and model-based clustering. Technical report, DTIC Document
Fraley C, Raftery A (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181
Article MathSciNet MATH Google Scholar
Fraley C, Raftery A, Murphy T, Scrucca L (2012) mclust version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. University of Washington, Seattle
Google Scholar
Fritz H, García-Escudero LA, Mayo-Iscar A (2012) tclust: an R package for a trimming approach to cluster analysis. J Stat Softw 47(12):1–26
Article Google Scholar
Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136
Article MathSciNet MATH Google Scholar
Gallegos MT (2002) Maximum likelihood clustering with outliers. In: Jajuga K, Sokolowski A, Bock H-H (eds) Classification, clustering, and data analysis: recent advances and applications. Springer, Berlin, pp 247–255
Gallegos M, Ritter G (2009) Trimmed ML estimation of contaminated mixture. Sankhya (Ser A) 71:164–220
MathSciNet MATH Google Scholar
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3):1324–1345
Article MathSciNet MATH Google Scholar
García-Escudero LA, Gordaliza A, Mayo-Iscar A (2014) A constrained robust proposal for mixture modeling avoiding spurious solutions. Adv Data Anal Classif 8(1):27–43
Article MathSciNet Google Scholar
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2015) Avoiding spurious local maximizers in mixture modelling. Stat Comput 25:619–633
Article MathSciNet MATH Google Scholar
García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2016) The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers. Comput Stat Data Anal 99:131–147
Article MathSciNet MATH Google Scholar
García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2017) Eigenvalues in robust approaches to mixture modeling: a review. Technical report, in preparation
Ghahramani Z, Hinton G (1997) The EM algorithm for factor analyzers. Technical report CRG-TR-96-1, University of Toronto
Greselin F, Ingrassia S (2010) Constrained monotone em algorithms for mixtures of multivariate $t$-distributions. Stat Comput 20:9–22
Article MathSciNet Google Scholar
Greselin F, Ingrassia S (2015) Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers. Stat Comput 25(2):215–226
Article MathSciNet MATH Google Scholar
Greselin F, Ingrassia S, Punzo A (2011) Assessing the pattern of covariance matrices via an augmentation multiple testing procedure. Stat Methods Appl 20:141–170
Article MathSciNet MATH Google Scholar
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800
Article MathSciNet MATH Google Scholar
Hathaway RJ (1996) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230
Article Google Scholar
Hennig C (2004) Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Stat 32:1313–1340
Article MathSciNet MATH Google Scholar
Ingrassia S (1992) A comparison between the simulated annealing and the EM algorithms in normal mixture decompositions. Stat Comput 2:203–211
Article Google Scholar
Ingrassia S (2004) A likelihood-based constrained algorithm for multivariate normal mixture models. Stat Methods Appl 13(4):151–166
MathSciNet Google Scholar
Ingrassia S, Rocci R (2007) Constrained monotone EM algorithms for finite mixture of multivariate gaussians. Comput Stat Data Anal 51(11):5339–5351
Article MathSciNet MATH Google Scholar
Ingrassia S, Rocci R (2011) Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints. Comput Stat Data Anal 55(4):1715–1725
Article MathSciNet MATH Google Scholar
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27(4):887–906
Article MathSciNet MATH Google Scholar
Kiefer NM (1978) Discrete parameter variation: efficient estimation of a switching regression model. Econometrica 46(2):427–434
Article MathSciNet MATH Google Scholar
Lindsay BG (1995) Mixture models: theory, geometry and applications. NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward, CA
McLachlan G, Krishnan T (2008a) The EM algorithm and extensions, 2nd edn, vol 589. Wiley, New York
McLachlan GJ, Krishnan T (2008b) The EM algorithm and its extensions, 2nd edn. Wiley, New York
Book MATH Google Scholar
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Book MATH Google Scholar
McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18:285–296
Article MathSciNet Google Scholar
Meng X-L (1994) On the rate of convergence of the ECM algorithm. Ann Stat 22(1):326–339
Article MathSciNet MATH Google Scholar
Meng X-L, van Dyk D (1997) The EM algorithm. An old folk song sung to a fast new tune. J R Stat Soc B 59(3):511–567
Article MathSciNet MATH Google Scholar
Nettleton D (1999) Convergence properties of the EM algorithm in constrained spaces. Canad J Stat 27(3):639–644
Article MathSciNet MATH Google Scholar
O’Hagan A, White A (2016) Improved model-based clustering performance using bayes initialization averaging. Technical report, arXiv:1504.06870v4
O’Hagan A, Murphy TB, Gormley C (2013) Computational aspects of fitting mixture model via the expectation-maximisation algorithm. Comput Stat Data Anal 56(12):3843–3864
Article MATH Google Scholar
Puntanen S, Styan GP, Isotalo J (2011) Matrix tricks for linear statistical models. Springer, Berlin
Book MATH Google Scholar
Redner RA, Walker HF (1984) Mixture densities maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239
Article MathSciNet MATH Google Scholar
Ritter G (2014) Cluster analysis and variable selection. CRC Press, Boca Raton
MATH Google Scholar
Rocci R, Gattone SA, Di Mari R (2016) A data driven equivariant approach to constrained Gaussian mixture modeling. Adv Data Anal Classif. doi:10.1007/s11634-016-0279-1
Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley, New York
MATH Google Scholar
Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, Tokyo
MATH Google Scholar
Scrucca L, Fop M, Murphy TB, Raftery A (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289–317
Google Scholar
Seo B, Kim D (2012) Root selection in normal mixture models. Comput Stat Data Anal 56:2454–2470
Article MathSciNet MATH Google Scholar
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7:5–40
Article MathSciNet MATH Google Scholar
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted $t$-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649
Tanaka K (2009) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when penalty is imposed on the ratios of the scale parameters. Scand J Stat 36(1):171–184
MathSciNet MATH Google Scholar
Tanaka K, Takemura A (2006) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when the scale parameters are exponentially small. Bernoulli 12(6):1003–1017
Article MathSciNet MATH Google Scholar
Tarone RD, Gruenhage G (1975) A note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 70(352):903–904
Article MathSciNet MATH Google Scholar
Tarone RD, Gruenhage G (1979) Corrigenda: a note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 74(367):744
Article Google Scholar
Theobald C (1975) An inequality with application to multivariate analysis. Biometrika 62(2):461–466
Article MathSciNet MATH Google Scholar
Theobald C (1976) Corrections and amendments: an inequality with application to multivariate analysis. Biometrika 63(3):685
MathSciNet Google Scholar
Tipping M, Bishop CM (1999) Mixtures of probabilistic principal component mixtures of probabilistic principal component analysers. Neural Comput 11(2):443–482
Article MATH Google Scholar
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York
MATH Google Scholar
van Laarhoven PJM, Aarts EHL (1988) Simulated annealing: theory and practice. D. Reidel, Dordecht
Google Scholar
Wu CFJ (1983) On convergence properties of the EM algorithm. Ann Stat 11:95–103
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Statistics and Operations Research and IMUVA, University of Valladolid, Valladolid, Spain
Luis Angel García-Escudero, Alfonso Gordaliza & Agustín Mayo-Iscar
Department of Statistics and Quantitative Methods, Milano-Bicocca University, Milan, Italy
Francesca Greselin
Department of Economics and Business, University of Catania, Corso Italia 55, 95128, Catania, Italy
Salvatore Ingrassia

Authors

Luis Angel García-Escudero
View author publications
You can also search for this author in PubMed Google Scholar
Alfonso Gordaliza
View author publications
You can also search for this author in PubMed Google Scholar
Francesca Greselin
View author publications
You can also search for this author in PubMed Google Scholar
Salvatore Ingrassia
View author publications
You can also search for this author in PubMed Google Scholar
Agustín Mayo-Iscar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Salvatore Ingrassia.

Appendix: Proof of Theorem 4

To maximize $\mathcal {L}(\mathbf {\psi })$ means to jointly maximize $|\varvec{\varSigma }_g|^{-1/2}$ and to minimize the argument of the exponential, i.e., $(\mathbf {x}_n- \varvec{\mu }_g)' \varvec{\varSigma }_g^{-1}(\mathbf {x}_n- \varvec{\mu }_g)$, for each $g=1,\ldots ,G$. Hence, firstly we will show that, for a given $\varvec{\varSigma }_g$, the mean vector $\varvec{\mu }_g$ has to lie in a compact subset in $\mathbb {R}^d$. Let C be the convex hull of $\mathcal {X}$, i.e., the intersection of all convex sets containing the N points, given by

$$\begin{aligned} C(\mathcal {X})=\left\{ \sum _{n=1}^N u_n \mathbf {x}_n \; | \; \sum _{n=1}^N u_n=1, u_n \ge 0\right\} . \end{aligned}$$

(53)

Suppose now that $\bar{\mathbf {\psi }} \in \mathbf {\Psi }_{a,b}$ satisfies $\bar{\varvec{\mu }}_g \notin C(\mathcal {X})$. Then $\mathcal {L}(\bar{\mathbf {\psi }}) \le \mathcal {L}(\mathbf {\psi }^*)$ where $\mathbf {\psi }^* \in \mathbf {\Psi }_{a,b}$ is obtained from $\bar{\mathbf {\psi }}$ by changing the gth mean component to $\varvec{\mu }_g=\alpha \bar{\varvec{\mu }}_g$ for some $\alpha \in (0,1)$ (i.e., along the line joining $\mathbf{0}$ and $\bar{\varvec{\mu }}_g$) such that $\varvec{\mu }_g \in C(\mathcal {X})$.

Let us set $S= \{ \mathbf {\psi }\in \mathbf {\Psi }_{a,b} \, | \, \varvec{\mu }_g\in C(\mathcal {X}); 0< a \le \lambda _i(\varvec{\varSigma }_g)\le b < + \infty \quad g=1, \ldots , G\}$. Then, it follows that

$$\begin{aligned} \sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi }) = \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) . \end{aligned}$$

By the compactness of S and the continuity of $\mathcal {L}(\mathbf {\psi })$, there exists a parameter $\hat{\mathbf {\psi }}\in \mathbf {\Psi }_{a,b}$ satisfying

$$\begin{aligned} \mathcal {L}(\hat{\mathbf {\psi }}) =\sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi })= \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) \end{aligned}$$

by Weierstrass’ theorem, see, e.g., Theorem 4.16 in Rudin (1976). $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

García-Escudero, L.A., Gordaliza, A., Greselin, F. et al. Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv Data Anal Classif 12, 203–233 (2018). https://doi.org/10.1007/s11634-017-0293-y

Download citation

Received: 22 April 2017
Revised: 01 September 2017
Accepted: 30 September 2017
Published: 23 October 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11634-017-0293-y

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions

On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

References

Author information

Authors and Affiliations

Corresponding author

Appendix: Proof of Theorem 4

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Navigation

Eigenvalues and constraints in mixture modeling: geometric and computational issues

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions

On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

References

Author information

Authors and Affiliations

Corresponding author

Appendix: Proof of Theorem 4

Appendix: Proof of Theorem 4

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Search

Navigation