Abstract
Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, both for its good statistical properties and for the simplicity of implementation of the Expectation–Maximization (EM) algorithm. Within the context of a railway application, this paper introduces a novel mixture model for time series that are subject to changes in regime. The proposed approach, called ClustSeg, models each cluster by a regression model in which the polynomial coefficients vary according to a discrete hidden process. In particular, logistic functions are used to model the (smooth or abrupt) transitions between regimes. The model parameters are estimated by maximum likelihood using an EM algorithm. The approach can also be regarded as a clustering method that finds groups of time series having common changes in regime. In addition to a partition of the time series, it therefore provides a segmentation of each cluster into regimes. The optimal numbers of clusters and segments are selected by means of the Bayesian Information Criterion. The ClustSeg approach is shown to be efficient using a variety of simulated time series and real-world time series of electrical power consumption from rail switching operations.
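To make the role of the logistic functions concrete, the following is a minimal sketch, not the authors' implementation, of how the mean curve of a single cluster could be computed once its parameters are known. It assumes that the regime probabilities are a multinomial logistic (softmax) function of time with one intercept and one slope per regime, and that each regime is a polynomial in time; the names `logistic_weights`, `cluster_mean_curve`, `w` and `beta`, as well as the numerical values, are illustrative assumptions.

```python
import numpy as np

def logistic_weights(t, w):
    """Regime probabilities at each time point (assumed softmax form).

    t: (m,) time points.
    w: (R, 2) array, one (intercept, slope) pair per regime.
    Returns an (m, R) array whose rows sum to 1.
    """
    scores = w[:, 0][None, :] + np.outer(t, w[:, 1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def cluster_mean_curve(t, w, beta):
    """Mean curve of one cluster: regime polynomials mixed by logistic weights.

    beta: (R, p + 1) polynomial coefficients per regime, lowest degree first.
    """
    design = np.vander(t, beta.shape[1], increasing=True)  # (m, p + 1)
    regime_curves = design @ beta.T                        # (m, R)
    return (logistic_weights(t, w) * regime_curves).sum(axis=1)

# Illustrative values only: two constant regimes (levels 1 and 3)
# with a fairly abrupt transition centred at t = 0.5.
t = np.linspace(0.0, 1.0, 101)
w = np.array([[0.0, 0.0], [-50.0, 100.0]])
beta = np.array([[1.0, 0.0], [3.0, 0.0]])
weights = logistic_weights(t, w)
mean = cluster_mean_curve(t, w, beta)
```

In this sketch, a larger slope in `w` gives a more abrupt transition and a smaller slope a smoother one, while the ratio of intercept to slope sets where the change in regime occurs.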
Cite this article
Samé, A., Chamroukhi, F., Govaert, G. et al. Model-based clustering and segmentation of time series with changes in regime. Adv Data Anal Classif 5, 301–321 (2011). https://doi.org/10.1007/s11634-011-0096-5
DOI: https://doi.org/10.1007/s11634-011-0096-5
Keywords
- Clustering
- Time series
- Change in regime
- Mixture model
- Regression mixture
- Hidden logistic process
- EM algorithm