Abstract
Many methods for clustering matrices of data are based on mathematical techniques such as distance-based algorithms or matrix decomposition and eigenvalues. Because these techniques have no underlying probability model, it is generally not possible to draw statistical inferences from them or to assess the appropriateness of a model via information criteria. This article summarizes some recent model-based methodologies for matrices of binary, count, and ordinal data, which are modelled under a unified statistical framework that uses finite mixtures to group the rows and/or columns. The model parameters can be constructed from a linear predictor of parameters and covariates through link functions. This likelihood-based one-mode and two-mode fuzzy clustering provides maximum likelihood estimates of the parameters and the option of using likelihood-based information criteria for model comparison. Additionally, a Bayesian approach is presented in which the parameters and the number of clusters are estimated simultaneously from their joint posterior distribution. Visualization tools are presented that focus on ordinal data, on the fuzziness of the clustering structures, and on analogues of various standard plots used in multivariate analysis. Finally, a set of future extensions is enumerated.
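To make the likelihood-based row clustering described above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes binary data, clusters rows only, and gives each (row cluster, column) pair its own unconstrained Bernoulli probability instead of the linear-predictor parameterization mentioned in the abstract. The function names `e_step` and `fit_row_mixture`, the toy matrix `Y`, and the use of the number of rows as the sample size in BIC are all illustrative assumptions.

```python
import math
import random


def e_step(Y, pi, theta):
    """Return posterior memberships Z[i][r] and the mixture log-likelihood."""
    Z, loglik = [], 0.0
    for row in Y:
        logw = []
        for r in range(len(pi)):
            s = math.log(pi[r])
            for y, t in zip(row, theta[r]):
                s += math.log(t) if y == 1 else math.log(1.0 - t)
            logw.append(s)
        m = max(logw)  # log-sum-exp trick for numerical stability
        w = [math.exp(v - m) for v in logw]
        total = sum(w)
        loglik += m + math.log(total)
        Z.append([v / total for v in w])
    return Z, loglik


def fit_row_mixture(Y, R, n_iter=200, seed=0, eps=1e-9):
    """EM for a finite mixture of independent Bernoullis over the rows of Y."""
    rng = random.Random(seed)
    n, p = len(Y), len(Y[0])
    pi = [1.0 / R] * R
    # Random start so that the clusters are not identical at initialization.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(p)] for _ in range(R)]
    for _ in range(n_iter):
        Z = e_step(Y, pi, theta)[0]
        for r in range(R):
            n_r = max(sum(Z[i][r] for i in range(n)), eps)
            pi[r] = max(n_r / n, eps)
            for j in range(p):
                t = sum(Z[i][r] * Y[i][j] for i in range(n)) / n_r
                # Keep probabilities strictly inside (0, 1) so logs stay finite.
                theta[r][j] = min(max(t, eps), 1.0 - eps)
    Z, loglik = e_step(Y, pi, theta)
    k = (R - 1) + R * p  # free mixing proportions plus Bernoulli parameters
    return {
        "Z": Z,
        "pi": pi,
        "theta": theta,
        "loglik": loglik,
        "aic": -2.0 * loglik + 2.0 * k,
        # Assumption: the sample size in BIC is taken to be the number of rows.
        "bic": -2.0 * loglik + k * math.log(n),
    }


# Toy binary matrix with two obvious row groups (illustrative data only).
Y = [[1, 1, 1, 0, 0, 0]] * 6 + [[0, 0, 0, 1, 1, 1]] * 6
fit1 = fit_row_mixture(Y, R=1)
fit2 = fit_row_mixture(Y, R=2)
# Hard labels from the fuzzy memberships: most probable cluster per row.
labels = [max(range(2), key=lambda r: z[r]) for z in fit2["Z"]]
```

The posterior memberships in `Z` are the fuzzy allocations; comparing `fit1["bic"]` with `fit2["bic"]` illustrates how an information criterion can be used to choose the number of row clusters. Cluster labels are arbitrary, so only the grouping itself is meaningful.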
Acknowledgements
This work was supported by the Marsden Fund on “Dimension reduction for mixed type multivariate data” (Award Number E2987-3648) from New Zealand Government funding, administered by the Royal Society of New Zealand.
Cite this article
Fernández, D., Arnold, R., Pledger, S. et al. Finite mixture biclustering of discrete type multivariate data. Adv Data Anal Classif 13, 117–143 (2019). https://doi.org/10.1007/s11634-018-0324-3
DOI: https://doi.org/10.1007/s11634-018-0324-3