[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Eigenvalues and constraints in mixture modeling: geometric and computational issues

  • Regular Article
  • Published:
Advances in Data Analysis and Classification Aims and scope Submit manuscript

Abstract

This paper presents a review about the usage of eigenvalues restrictions for constrained parameter estimation in mixtures of elliptical distributions according to the likelihood approach. The restrictions serve a twofold purpose: to avoid convergence to degenerate solutions and to reduce the onset of non interesting (spurious) local maximizers, related to complex likelihood surfaces. The paper shows how the constraints may play a key role in the theory of Euclidean data clustering. The aim here is to provide a reasoned survey of the constraints and their applications, considering the contributions of many authors and spanning the literature of the last 30 years.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  • Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate \(t\)-factor analyzers. Stat Comput 21:361–373

    Article  MathSciNet  MATH  Google Scholar 

  • Andrews JL, McNicholas PD, Subedi S (2011) Model-based classification via mixtures of multivariate \(t\)-distributions. Comput Stat Data Anal 55:520–529

    Article  MathSciNet  MATH  Google Scholar 

  • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821

    Article  MathSciNet  MATH  Google Scholar 

  • Baudry J-P, Celeux G (2015) EM for mixtures. Stat Comput 22(5):1021–1029

    MATH  Google Scholar 

  • Biernacki C (2004) Initializing EM using the properties of its trajectories in gaussian mixtures. Stat Comput 14:267–279

    Article  MathSciNet  Google Scholar 

  • Biernacki C, Chrétien S (2003) Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Stat Probab Lett 61:373–382

    Article  MathSciNet  MATH  Google Scholar 

  • Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the em algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575

    Article  MathSciNet  MATH  Google Scholar 

  • Boyles RA (1983) On the convergence of the EM algorithm. J R Stat Soc B 45:47–50

    MathSciNet  MATH  Google Scholar 

  • Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793

    Article  Google Scholar 

  • Cerioli A, García-Escudero LA, Mayo-Iscar A, Riani M (2016) Finding the number of groups in model-based clustering via constrained likelihoods. http://uvadoc.uva.es/handle/10324/18093

  • Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61

    Article  MathSciNet  MATH  Google Scholar 

  • Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30:45–59

    Article  MathSciNet  MATH  Google Scholar 

  • Cramér H (1946) Math Methods Stat. Princeton University Press, Princeton

    Google Scholar 

  • Day N (1969) Estimating the components of a mixture of normal distributions. Biometrika 56(3):463–474

    Article  MathSciNet  MATH  Google Scholar 

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

    MathSciNet  MATH  Google Scholar 

  • Dennis JE (1981) Algorithms for non linear fitting. In: Proceedings of the NATO advanced research symposium, Cambridge, England. Cambridge University

  • Dykstra RL (1983) An algorithm for restricted least squares regression. J Am Stat Assoc 78(384):837–842

    Article  MathSciNet  MATH  Google Scholar 

  • Fang K, Anderson T (1990) Statistical inference in elliptically contoured and related distributions. Alberton, New York

    MATH  Google Scholar 

  • Fraley C, Raftery AE (2006) Mclust version 3: an R package for normal mixture modeling and model-based clustering. Technical report, DTIC Document

  • Fraley C, Raftery A (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181

    Article  MathSciNet  MATH  Google Scholar 

  • Fraley C, Raftery A, Murphy T, Scrucca L (2012) mclust version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. University of Washington, Seattle

    Google Scholar 

  • Fritz H, García-Escudero LA, Mayo-Iscar A (2012) tclust: an R package for a trimming approach to cluster analysis. J Stat Softw 47(12):1–26

    Article  Google Scholar 

  • Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136

    Article  MathSciNet  MATH  Google Scholar 

  • Gallegos MT (2002) Maximum likelihood clustering with outliers. In: Jajuga K, Sokolowski A, Bock H-H (eds) Classification, clustering, and data analysis: recent advances and applications. Springer, Berlin, pp 247–255

  • Gallegos M, Ritter G (2009) Trimmed ML estimation of contaminated mixture. Sankhya (Ser A) 71:164–220

    MathSciNet  MATH  Google Scholar 

  • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3):1324–1345

    Article  MathSciNet  MATH  Google Scholar 

  • García-Escudero LA, Gordaliza A, Mayo-Iscar A (2014) A constrained robust proposal for mixture modeling avoiding spurious solutions. Adv Data Anal Classif 8(1):27–43

    Article  MathSciNet  Google Scholar 

  • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2015) Avoiding spurious local maximizers in mixture modelling. Stat Comput 25:619–633

    Article  MathSciNet  MATH  Google Scholar 

  • García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2016) The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers. Comput Stat Data Anal 99:131–147

    Article  MathSciNet  MATH  Google Scholar 

  • García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2017) Eigenvalues in robust approaches to mixture modeling: a review. Technical report, in preparation

  • Ghahramani Z, Hinton G (1997) The EM algorithm for factor analyzers. Technical report CRG-TR-96-1, University of Toronto

  • Greselin F, Ingrassia S (2010) Constrained monotone em algorithms for mixtures of multivariate \(t\)-distributions. Stat Comput 20:9–22

    Article  MathSciNet  Google Scholar 

  • Greselin F, Ingrassia S (2015) Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers. Stat Comput 25(2):215–226

    Article  MathSciNet  MATH  Google Scholar 

  • Greselin F, Ingrassia S, Punzo A (2011) Assessing the pattern of covariance matrices via an augmentation multiple testing procedure. Stat Methods Appl 20:141–170

    Article  MathSciNet  MATH  Google Scholar 

  • Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800

    Article  MathSciNet  MATH  Google Scholar 

  • Hathaway RJ (1996) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230

    Article  Google Scholar 

  • Hennig C (2004) Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Stat 32:1313–1340

    Article  MathSciNet  MATH  Google Scholar 

  • Ingrassia S (1992) A comparison between the simulated annealing and the EM algorithms in normal mixture decompositions. Stat Comput 2:203–211

    Article  Google Scholar 

  • Ingrassia S (2004) A likelihood-based constrained algorithm for multivariate normal mixture models. Stat Methods Appl 13(4):151–166

    MathSciNet  Google Scholar 

  • Ingrassia S, Rocci R (2007) Constrained monotone EM algorithms for finite mixture of multivariate gaussians. Comput Stat Data Anal 51(11):5339–5351

    Article  MathSciNet  MATH  Google Scholar 

  • Ingrassia S, Rocci R (2011) Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints. Comput Stat Data Anal 55(4):1715–1725

    Article  MathSciNet  MATH  Google Scholar 

  • Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27(4):887–906

    Article  MathSciNet  MATH  Google Scholar 

  • Kiefer NM (1978) Discrete parameter variation: efficient estimation of a switching regression model. Econometrica 46(2):427–434

    Article  MathSciNet  MATH  Google Scholar 

  • Lindsay BG (1995) Mixture models: theory, geometry and applications. NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward, CA

  • McLachlan G, Krishnan T (2008a) The EM algorithm and extensions, 2nd edn, vol 589. Wiley, New York

  • McLachlan GJ, Krishnan T (2008b) The EM algorithm and its extensions, 2nd edn. Wiley, New York

    Book  MATH  Google Scholar 

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

    Book  MATH  Google Scholar 

  • McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18:285–296

    Article  MathSciNet  Google Scholar 

  • Meng X-L (1994) On the rate of convergence of the ECM algorithm. Ann Stat 22(1):326–339

    Article  MathSciNet  MATH  Google Scholar 

  • Meng X-L, van Dyk D (1997) The EM algorithm. An old folk song sung to a fast new tune. J R Stat Soc B 59(3):511–567

    Article  MathSciNet  MATH  Google Scholar 

  • Nettleton D (1999) Convergence properties of the EM algorithm in constrained spaces. Canad J Stat 27(3):639–644

    Article  MathSciNet  MATH  Google Scholar 

  • O’Hagan A, White A (2016) Improved model-based clustering performance using bayes initialization averaging. Technical report, arXiv:1504.06870v4

  • O’Hagan A, Murphy TB, Gormley C (2013) Computational aspects of fitting mixture model via the expectation-maximisation algorithm. Comput Stat Data Anal 56(12):3843–3864

    Article  MATH  Google Scholar 

  • Puntanen S, Styan GP, Isotalo J (2011) Matrix tricks for linear statistical models. Springer, Berlin

    Book  MATH  Google Scholar 

  • Redner RA, Walker HF (1984) Mixture densities maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239

    Article  MathSciNet  MATH  Google Scholar 

  • Ritter G (2014) Cluster analysis and variable selection. CRC Press, Boca Raton

    MATH  Google Scholar 

  • Rocci R, Gattone SA, Di Mari R (2016) A data driven equivariant approach to constrained Gaussian mixture modeling. Adv Data Anal Classif. doi:10.1007/s11634-016-0279-1

  • Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley, New York

    MATH  Google Scholar 

  • Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, Tokyo

    MATH  Google Scholar 

  • Scrucca L, Fop M, Murphy TB, Raftery A (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289–317

    Google Scholar 

  • Seo B, Kim D (2012) Root selection in normal mixture models. Comput Stat Data Anal 56:2454–2470

    Article  MathSciNet  MATH  Google Scholar 

  • Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7:5–40

    Article  MathSciNet  MATH  Google Scholar 

  • Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649

  • Tanaka K (2009) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when penalty is imposed on the ratios of the scale parameters. Scand J Stat 36(1):171–184

    MathSciNet  MATH  Google Scholar 

  • Tanaka K, Takemura A (2006) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when the scale parameters are exponentially small. Bernoulli 12(6):1003–1017

    Article  MathSciNet  MATH  Google Scholar 

  • Tarone RD, Gruenhage G (1975) A note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 70(352):903–904

    Article  MathSciNet  MATH  Google Scholar 

  • Tarone RD, Gruenhage G (1979) Corrigenda: a note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 74(367):744

    Article  Google Scholar 

  • Theobald C (1975) An inequality with application to multivariate analysis. Biometrika 62(2):461–466

    Article  MathSciNet  MATH  Google Scholar 

  • Theobald C (1976) Corrections and amendments: an inequality with application to multivariate analysis. Biometrika 63(3):685

    MathSciNet  Google Scholar 

  • Tipping M, Bishop CM (1999) Mixtures of probabilistic principal component mixtures of probabilistic principal component analysers. Neural Comput 11(2):443–482

    Article  MATH  Google Scholar 

  • Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York

    MATH  Google Scholar 

  • van Laarhoven PJM, Aarts EHL (1988) Simulated annealing: theory and practice. D. Reidel, Dordecht

    Google Scholar 

  • Wu CFJ (1983) On convergence properties of the EM algorithm. Ann Stat 11:95–103

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Salvatore Ingrassia.

Appendix: Proof of Theorem 4

Appendix: Proof of Theorem 4

To maximize \(\mathcal {L}(\mathbf {\psi })\) means to jointly maximize \(|\varvec{\varSigma }_g|^{-1/2}\) and to minimize the argument of the exponential, i.e., \((\mathbf {x}_n- \varvec{\mu }_g)' \varvec{\varSigma }_g^{-1}(\mathbf {x}_n- \varvec{\mu }_g)\), for each \(g=1,\ldots ,G\). Hence, firstly we will show that, for a given \(\varvec{\varSigma }_g\), the mean vector \(\varvec{\mu }_g\) has to lie in a compact subset in \(\mathbb {R}^d\). Let C be the convex hull of \(\mathcal {X}\), i.e., the intersection of all convex sets containing the N points, given by

$$\begin{aligned} C(\mathcal {X})=\left\{ \sum _{n=1}^N u_n \mathbf {x}_n \; | \; \sum _{n=1}^N u_n=1, u_n \ge 0\right\} . \end{aligned}$$
(53)

Suppose now that \(\bar{\mathbf {\psi }} \in \mathbf {\Psi }_{a,b}\) satisfies \(\bar{\varvec{\mu }}_g \notin C(\mathcal {X})\). Then \(\mathcal {L}(\bar{\mathbf {\psi }}) \le \mathcal {L}(\mathbf {\psi }^*)\) where \(\mathbf {\psi }^* \in \mathbf {\Psi }_{a,b}\) is obtained from \(\bar{\mathbf {\psi }}\) by changing the gth mean component to \(\varvec{\mu }_g=\alpha \bar{\varvec{\mu }}_g\) for some \(\alpha \in (0,1)\) (i.e., along the line joining \(\mathbf{0}\) and \(\bar{\varvec{\mu }}_g\)) such that \(\varvec{\mu }_g \in C(\mathcal {X})\).

Let us set \(S= \{ \mathbf {\psi }\in \mathbf {\Psi }_{a,b} \, | \, \varvec{\mu }_g\in C(\mathcal {X}); 0< a \le \lambda _i(\varvec{\varSigma }_g)\le b < + \infty \quad g=1, \ldots , G\}\). Then, it follows that

$$\begin{aligned} \sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi }) = \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) . \end{aligned}$$

By the compactness of S and the continuity of \(\mathcal {L}(\mathbf {\psi })\), there exists a parameter \(\hat{\mathbf {\psi }}\in \mathbf {\Psi }_{a,b}\) satisfying

$$\begin{aligned} \mathcal {L}(\hat{\mathbf {\psi }}) =\sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi })= \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) \end{aligned}$$

by Weierstrass’ theorem, see, e.g., Theorem 4.16 in Rudin (1976). \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

García-Escudero, L.A., Gordaliza, A., Greselin, F. et al. Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv Data Anal Classif 12, 203–233 (2018). https://doi.org/10.1007/s11634-017-0293-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11634-017-0293-y

Keywords

Mathematics Subject Classification

Navigation