Abstract
Model selection in a mixture setting has been extensively studied in the literature in order to assess the number of components. Different classes of criteria exist; we focus on those that penalize the log-likelihood with a penalty term accounting for model complexity. However, the full likelihood is not always computationally feasible. To overcome this issue, the likelihood is replaced with a surrogate objective function. A question then arises naturally: how does the use of a surrogate objective function affect the definition of model selection criteria? Model selection and model estimation are distinct issues. Even if no cause-and-effect relationship can be established between them, they are linked to each other through the likelihood. In both cases we need to approximate the likelihood; to this end, it is computationally efficient to use the same surrogate function. The aim of this paper is not to provide an exhaustive survey of model selection, but to present the main criteria used in a standard mixture setting and to show how they can be adapted to a non-standard context. In the last decade, two criteria based on the observed composite likelihood were introduced. Here, we propose some new extensions of the standard criteria based on the expected complete log-likelihood to the non-standard context of a pairwise likelihood approach. The main advantage is a less demanding and more stable estimation. Finally, a simulation study is conducted to test and compare the performance of the proposed criteria with that of the criteria existing in the literature. As discussed in detail in Sect. 7, the novel criteria work very well in all scenarios considered.
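The penalized-likelihood criteria discussed in the abstract share a common form: minus twice the maximized log-likelihood plus a penalty that grows with the number of free parameters. As a minimal illustration of this idea (the fitted values below are hypothetical and not taken from the paper), AIC and BIC can be computed and compared across candidate numbers of components as follows:

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2*logL + 2*d."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + d*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: (number of components G, maximized log-likelihood,
# number of free parameters). In practice these come from an EM fit per G.
fits = [(1, -1250.0, 3), (2, -1180.0, 7), (3, -1175.0, 11)]
n = 500  # hypothetical sample size

# Select the G minimizing the criterion.
best_g_bic = min(fits, key=lambda f: bic(f[1], f[2], n))[0]
best_g_aic = min(fits, key=lambda f: aic(f[1], f[2]))[0]
```

With these hypothetical fits, BIC selects G = 2 while the lighter AIC penalty selects G = 3, illustrating how the choice of penalty term drives the selected number of components.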
Appendix
Maximizing the observed pairwise log-likelihood is equivalent to maximizing the fuzzy classification pairwise log-likelihood. This partially justifies the behaviour of the criteria based on the expected complete pairwise log-likelihood. In this appendix we derive the pairwise EN term. This is useful for two reasons: once the pairwise EN is defined, the criteria based on the expected complete pairwise log-likelihood can be seen as the observed pairwise likelihood penalized by the pairwise EN term; moreover, it gives us an idea of the separation between the mixture components.
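In the standard full-likelihood mixture setting, the EN term is the entropy of the posterior membership probabilities, and it approaches zero when the components are well separated. The sketch below illustrates that standard entropy term computed from generic component log-densities and mixing weights; it illustrates the idea only, not the pairwise derivation given in the paper.

```python
import math

def posterior_probs(log_dens, weights):
    """Posterior membership probabilities tau_g for one observation.

    log_dens: component log-densities evaluated at the observation.
    weights:  mixing proportions (summing to one).
    Uses the log-sum-exp trick for numerical stability.
    """
    logs = [math.log(w) + ld for w, ld in zip(weights, log_dens)]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    s = sum(exps)
    return [e / s for e in exps]

def entropy_term(all_log_dens, weights):
    """EN = -sum_i sum_g tau_ig * log(tau_ig).

    Near zero when every observation is clearly assigned to one
    component, i.e. when the mixture components are well separated.
    """
    en = 0.0
    for log_dens in all_log_dens:
        for t in posterior_probs(log_dens, weights):
            if t > 0.0:
                en -= t * math.log(t)
    return en
```

An observation lying equally under two components contributes log 2 to EN, while a clearly classified observation contributes almost nothing, which is why EN serves as a measure of separation between the mixture components.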
© 2016 Springer International Publishing Switzerland
Cite this paper
Ranalli, M., Rocci, R. (2016). Standard and Novel Model Selection Criteria in the Pairwise Likelihood Estimation of a Mixture Model for Ordinal Data. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_5
DOI: https://doi.org/10.1007/978-3-319-25226-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)