Model-based clustering using copulas with applications

Ioannis Kosmidis¹ & Dimitris Karlis²

Abstract

The majority of model-based clustering techniques is based on multivariate normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.
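
As a rough illustration of the construction described above, the sketch below evaluates a two-component copula-based mixture density for bivariate continuous data and computes the posterior membership probabilities used in an EM E-step. It assumes a bivariate Gaussian copula and exponential marginals for every component. These choices, the function names, and the parameter values are illustrative assumptions, not the authors' implementation. Each component density is the copula density evaluated at the marginal distribution functions, multiplied by the marginal densities.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1)."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def exp_pdf(x, rate):
    """Exponential marginal density (assumed marginal, for illustration)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_cdf(x, rate):
    """Exponential marginal distribution function."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def component_density(x1, x2, rates, rho):
    """Copula density at the marginal CDFs times the marginal densities."""
    u = exp_cdf(x1, rates[0])
    v = exp_cdf(x2, rates[1])
    copula_part = gaussian_copula_density(u, v, rho)
    return copula_part * exp_pdf(x1, rates[0]) * exp_pdf(x2, rates[1])


def mixture_density(x1, x2, weights, params):
    """Finite mixture density: weighted sum of component densities."""
    return sum(
        w * component_density(x1, x2, rates, rho)
        for w, (rates, rho) in zip(weights, params)
    )


def responsibilities(x1, x2, weights, params):
    """E-step quantity: posterior probability of each component given x."""
    weighted = [
        w * component_density(x1, x2, rates, rho)
        for w, (rates, rho) in zip(weights, params)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical parameter values: (marginal rates, copula correlation) per cluster.
weights = [0.4, 0.6]
params = [((1.0, 2.0), 0.5), ((0.5, 0.5), -0.3)]
print(mixture_density(1.0, 0.5, weights, params))
print(responsibilities(1.0, 0.5, weights, params))
```

Setting the correlation to zero reduces the copula density to 1, so a component then has independent marginals. Swapping in a different copula or different marginals changes only `gaussian_copula_density` or the marginal functions, which reflects the separation of dependence and marginal structure that the abstract highlights.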

Author information

Authors and Affiliations

Department of Statistical Science, University College London, Gower Street, London, WC1E 6BT, UK
Ioannis Kosmidis
Department of Statistics, Athens University of Economics and Business, 76 Patision Str, Athens, 10434, Greece
Dimitris Karlis

Corresponding author

Correspondence to Dimitris Karlis.

Electronic supplementary material

Below is the link to the electronic supplementary material.

11222_2015_9590_MOESM1_ESM.pdf

The supplementary material extends Example 4.2 to illustrate that distinct, sensible transformations can lead to different results. R scripts that reproduce the analyses undertaken in this paper are available upon request from the authors. (PDF 96.5KB)

About this article

Cite this article

Kosmidis, I., Karlis, D. Model-based clustering using copulas with applications. Stat Comput 26, 1079–1099 (2016). https://doi.org/10.1007/s11222-015-9590-5

Received: 15 August 2014
Accepted: 23 June 2015
Published: 23 July 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11222-015-9590-5
