Abstract
Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic. This is due to the unboundedness of the likelihood, together with the presence of spurious maximizers. Existing methods to bypass this obstacle are based on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done imposing some constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained approach, where the class conditional covariance matrices are shrunk towards a pre-specified target matrix \(\varvec{\varPsi }.\) Data-driven choices of the matrix \(\varvec{\varPsi },\) when a priori information is not available, and the optimal amount of shrinkage are investigated. Then, constraints based on a data-driven \(\varvec{\varPsi }\) are shown to be equivariant with respect to linear affine transformations, provided that the method used to select the target matrix be also equivariant. The effectiveness of the proposal is evaluated on the basis of a simulation study and an empirical example.
Similar content being viewed by others
References
Anderson TW, Gupta SD (1963) Some inequalities on characteristic roots of matrices. Biometrika 50:522–524
Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
Biernacki C, Chrétien S (2003) Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Stat Probab Lett 61:373–382
Browne RP, Subedi S, McNicholas P (2013) Constrained optimization for a subset of the Gaussian parsimonious clustering models. arXiv:1306.5824
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivar Anal 100:1367–1383
Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Stat Sin 18(2):443
Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30(1):45–59
Day NE (1969) Estimating the components of a mixture of two normal distributions. Biometrika 56:463–474
Dawid AP (1981) Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika 68(1):265–274
Dickey JM (1967) Matricvariate generalizations of the multivariate t distribution and the inverted multivariate t distribution. Ann Math Stat 38(2):511–518
Di Mari R, Oberski DL, Vermunt JK (2016) Bias-adjusted three-step latent Markov modeling with covariates. Struct Equ Model Multidiscip J. doi:10.1080/10705511.2016.1191015
Doherty KAJ, Adams RG (2007) Unsupervised learning with normalised data and non-Euclidean norms. Appl Soft Comput 7:20321
Fraley C, Raftery AE (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181
Fritz H, Garcia-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136
Gallegos MT, Ritter G (2009a) Trimming algorithms for clustering contaminated grouped data and their robustness. Adv Data Anal Classif 3(2):135–167
Gallegos MT, Ritter G (2009b) Trimmed ML estimation of contaminated mixtures. Sankhya Indian J Stat Ser A (2008-) 71(2):164–220
Garcia-Escudero LA, Gordaliza A, Matran C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36:1324–1345
Garcia-Escudero LA, Gordaliza A, Matran C, Mayo-Iscar A (2014) Avoiding spurious local maximizers in mixture modeling. Stat Comput 25(3):619–633
Greselin F, Ingrassia S (2013) Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers. Stat Comput 25(2):215–226
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13:795–800
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Ingrassia S (2004) A likelihood-based constrained algorithm for multivariate normal mixture models. Stat Methods Appl 13:151–166
Ingrassia S, Rocci R (2007) A constrained monotone EM algorithm for finite mixture of multivariate Gaussians. Comput Stat Data Anal 51:5339–5351
Ingrassia S, Rocci R (2011) Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints. Comput Stat Data Anal 55(4):1715–1725
James W, Stein C (1961) Estimation with quadratic loss. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability Vol. 1, No. 1961, pp 361–379
Kearns M (1997) A bound on the error of cross validation using the approximation and estimation rates, with consequences for the training-test split. Neural Comput 9(5):1143–1161
Kiefer NM (1978) Discrete parameter variation: efficient estimation of a switching regression model. Econometrica 46:427–434
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:886906
Kim D, Seo B (2014) Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers. J Multivar Anal 125:100–120
Kleinber J (2002) An impossibility theorem for clustering. In: Advances in neural information processing systems, (NIPS). MIT Press, Cambridge, pp 446–453
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
McLachlan GJ, Peel D (1998) Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin A, Dori D, Pudil P, Freeman H (eds) Lecture Notes in Computer Science, vol 1451. Springer, Berlin, pp 658–666
Milligan GW, Cooper MC (1988) A study of standardization of variables in cluster analysis. J Classif 5:181–204
Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Stat Comput 10(4):339–348
Policello II GE (1981) Conditional maximum likelihood estimation in gaussian mixtures. In: Taillie C, Patil GP, Baldessari BA (eds) Statistical distributions in scientific work. Volume 5–inferential problems and properties proceedings of the NATO advanced study institute held at the Università degli Studi di Trieste, Trieste, Italy, July 10-August 1 1980. NATO advanced study institutes series, vol 79. Springer, Netherlands, pp 111–125
Ridolfi A, Idier J (1999) Penalized maximum likelihood estimation for univariate normal mixture distributions. In: Actes du 17’ colloque GRETSI, Vannes, pp 259–262
Ridolfi A, Idier J (2000) Penalized maximum likelihood estimation for univariate normal mixture distributions. Bayesian inference and maximum entropy methods, MaxEnt workshops. Gif-sur-Yvette, July 2000
Ritter G (2014) Robust cluster analysis and variable selection. CRC Press, Boca Raton
Roth M (2013) On the multivariate \(t\) distribution. Technical report, Linköping university, Division of automatic control
Seo B, Kim D (2012) Root selection in normal mixture models. Comput Stat Data Anal 56:2454–2470
Smyth P (1996) Clustering using Monte-Carlo cross validation. In Proceedings of the second international conference on knowledge discovery and data mining. AAAI Press, Menlo Park, p 126133
Smyth P (2000) Model selection for probabilistic clustering using cross-validated likelihood. Stat Comput 10(1):63–72
Snoussi H, Mohammad-Djafari A (2001) Penalized maximum likelihood for multivariate Gaussian mixture. In: Fry RL (ed) MaxEnt workshops: Bayesian inference and maximum entropy methods, Aug 2001, pp 36–46
Tan X, Chen J, Zhang R (2007) Consistency of the constrained maximum likelihood estimator in finite normal mixture models. In: Proceedings of the American Statistical Association, American Statistical Association, Alexandria, 2007 [CD-ROM], pp 2113–2119
Tanaka K, Takemura A (2006) Strong consistency of the maximum likelihood estimator for finite mixtures of locationscale distributions when the scale parameters are exponentially small. Bernoulli 12(6):1003–1017
van der Laan MJ, Dudoit S, Keles S (2004) Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol 3(1):1–23
Vermunt JK (2010) Latent class modeling with covariates: two improved three-step approaches. Polit Anal 18(4):450–469
Xu J, Tan X, Zhang R (2010) A note on Phillips (1991): a constrained maximum likelihood approach to estimating switching regressions. J Econom 154:35–41
Acknowledgements
The authors are grateful to the associate editor and the two anonymous referees for their useful comments which have lead to a considerable improvement of the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Rocci, R., Gattone, S.A. & Di Mari, R. A data driven equivariant approach to constrained Gaussian mixture modeling. Adv Data Anal Classif 12, 235–260 (2018). https://doi.org/10.1007/s11634-016-0279-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11634-016-0279-1