Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation

Journal of Classification

Abstract

This paper introduces a novel mixture-model-based approach to the simultaneous clustering and optimal segmentation of functional data, here curves presenting regime changes. The proposed model is a finite mixture of piecewise polynomial regression models: each piecewise polynomial regression model is associated with a cluster, and within each cluster, each polynomial piece is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters. The first is an estimation approach that maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm. At convergence, it yields a fuzzy partition of the curves into K clusters, given by the posterior cluster probabilities, from which a hard partition is obtained by assigning each curve to the cluster with the highest posterior probability. The second is a classification approach that optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by dynamic programming. In the classification approach, the curve clustering and the optimal segmentation are performed simultaneously as CEM learning proceeds. We show that the classification approach is a probabilistic generalization of the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated on simulated and real-world curves. Comparisons with alternatives, including regression mixture models and the K-means-like algorithm for piecewise regression, demonstrate the effectiveness of the proposed approach.
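To make the classification approach more concrete, here is a minimal NumPy sketch of a CEM-style loop for a mixture of piecewise polynomial regressions. It is an illustration under stated assumptions, not the author's implementation. The assumptions are that all curves share a common sampling grid `t`, every cluster has the same number of segments, each segment has its own Gaussian noise variance, and each segment spans at least `min_len` grid points. The function names, the `min_len` parameter and the farthest-point initialisation are hypothetical choices. In each M-step, a Bellman-style dynamic program over the pooled curves of a cluster finds that cluster's optimal segmentation. In the C-step, each curve goes to the cluster that maximises its log mixing proportion plus log-likelihood, which is its maximum a posteriori cluster.

```python
import numpy as np


def segment_cost(t, Y, a, b, degree):
    """Negative Gaussian log-likelihood (at the MLE) of fitting one polynomial
    with its own noise variance to grid points a..b-1 of every curve in Y."""
    ts = np.tile(t[a:b], Y.shape[0])          # time grid repeated once per curve
    ys = Y[:, a:b].ravel()                    # observations in matching order
    coef = np.polyfit(ts, ys, degree)
    n = ys.size
    var = max(np.sum((ys - np.polyval(coef, ts)) ** 2) / n, 1e-12)  # floored MLE
    return 0.5 * n * (np.log(2 * np.pi * var) + 1.0), coef, var


def optimal_segmentation(t, Y, n_segments, degree, min_len):
    """Dynamic programming: split the grid into n_segments contiguous segments
    that minimise the summed segment costs. Returns [0, tau_1, ..., m]."""
    m = len(t)
    cost = np.full((m + 1, m + 1), np.inf)
    for a in range(m):
        for b in range(a + min_len, m + 1):
            cost[a, b] = segment_cost(t, Y, a, b, degree)[0]
    D = np.full((n_segments + 1, m + 1), np.inf)  # D[r, b]: best cost of r segments over 0..b-1
    D[0, 0] = 0.0
    arg = np.zeros((n_segments + 1, m + 1), dtype=int)
    for r in range(1, n_segments + 1):
        for b in range(1, m + 1):
            cand = D[r - 1, :b] + cost[:b, b]     # last segment starts at a = 0..b-1
            arg[r, b] = int(np.argmin(cand))
            D[r, b] = cand[arg[r, b]]
    bounds = [m]
    for r in range(n_segments, 0, -1):            # backtrack the segment starts
        bounds.append(int(arg[r, bounds[-1]]))
    return bounds[::-1]


def fit_cluster(t, Y, n_segments, degree, min_len):
    """M-step for one cluster: optimal segmentation, then per-segment fits."""
    bounds = optimal_segmentation(t, Y, n_segments, degree, min_len)
    mean, var = np.empty(len(t)), np.empty(len(t))
    for a, b in zip(bounds[:-1], bounds[1:]):
        _, coef, v = segment_cost(t, Y, a, b, degree)
        mean[a:b], var[a:b] = np.polyval(coef, t[a:b]), v
    return bounds, mean, var


def cem_piecewise_mixture(t, Y, n_clusters, n_segments, degree=1, min_len=5, n_iter=50):
    """Alternate a hard MAP assignment of curves with per-cluster refits."""
    assert min_len > degree, "each segment needs more points than the degree"
    # Hypothetical farthest-point initialisation (not the paper's strategy).
    seeds = [0]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((Y - Y[s]) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(d)))
    labels = np.argmin([np.sum((Y - Y[s]) ** 2, axis=1) for s in seeds], axis=0)
    for _ in range(n_iter):
        score = np.full((Y.shape[0], n_clusters), -np.inf)
        all_bounds = [None] * n_clusters
        for k in range(n_clusters):
            Yk = Y[labels == k]
            if len(Yk) == 0:                      # an empty cluster is never chosen again
                continue
            all_bounds[k], mean, var = fit_cluster(t, Yk, n_segments, degree, min_len)
            loglik = -0.5 * np.sum(np.log(2 * np.pi * var) + (Y - mean) ** 2 / var, axis=1)
            score[:, k] = np.log(len(Yk) / Y.shape[0]) + loglik
        new_labels = np.argmax(score, axis=1)     # log proportion + log-likelihood
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, all_bounds
```

If the mixing proportions are fixed to be equal and one variance is shared by every segment and cluster, the assignment step reduces to picking the nearest piecewise prototype in squared Euclidean distance. That is the deterministic, K-means-like behaviour the abstract relates to Hébrail et al. (2010). An EM variant would, roughly, replace the hard assignment with posterior probabilities and weighted fits.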

References

  • ANDREWS, J., and MCNICHOLAS, P. (2014), “Variable Selection for Clustering and Classification”, Journal of Classification, 31(2), 136–153.

  • BANFIELD, J.D., and RAFTERY, A.E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49(3), 803–821.

  • BELLMAN, R. (1961), “On the Approximation of Curves by Line Segments Using Dynamic Programming”, Communications of the Association for Computing Machinery, 4(6), 284.

  • BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2000), “Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood”, IEEE PAMI, 22(7), 719–725.

  • BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2003), “Choosing Starting Values for the EM Algorithm for Getting the Highest Likelihood in Multivariate Gaussian Mixture Models”, Computational Statistics and Data Analysis, 41, 561–575.

  • BOUVEYRON, C. (2014), “Adaptive Mixture Discriminant Analysis for Supervised Learning with Unobserved Classes”, Journal of Classification, 31(1), 49–84.

  • BOUVEYRON, C., and BRUNET, C. (2014), “Model-Based Clustering of High-Dimensional Data: A Review”, Computational Statistics & Data Analysis, 71, 52–78.

  • BRAILOVSKY, V.L., and KEMPNER, Y. (1992), “Application of Piecewise Regression to Detecting Internal Structure of Signal”, Pattern Recognition, 25(11), 1361–1370.

  • CELEUX, G., and GOVAERT, G. (1992), “A Classification EM Algorithm for Clustering and Two Stochastic Versions”, Computational Statistics and Data Analysis, 14, 315–332.

  • CELEUX, G., and GOVAERT, G. (1993), “Comparison of the Mixture and the Classification Maximum Likelihood in Cluster Analysis”, Journal of Statistical Computation and Simulation, 47, 127–146.

  • CELEUX, G., and GOVAERT, G. (1995), “Gaussian Parsimonious Clustering Models”, Pattern Recognition, 28(5), 781–793.

  • CHAMROUKHI, F. (2010), “Hidden Process Regression for Curve Modeling, Classification and Tracking”, Ph.D. thesis, Université de Technologie de Compiègne, France.

  • CHAMROUKHI, F., SAMÉ, A., GOVAERT, G., and AKNIN, P. (2009a), “A Regression Model with a Hidden Logistic Process for Feature Extraction from Time Series”, Neural Networks, 22(5-6), 593–602.

  • CHAMROUKHI, F., SAMÉ, A., GOVAERT, G., and AKNIN, P. (2009b), “Time Series Modeling by a Regression Approach Based on a Latent Process”, Neural Networks, 22(5-6), 593–602.

  • CHAMROUKHI, F., SAMÉ, A., GOVAERT, G., and AKNIN, P. (2010), “A Hidden Process Regression Model For Functional Data Description. Application to Curve Discrimination”, Neurocomputing, 73(7-9), 1210–1221.

  • CHAMROUKHI, F., SAMÉ, A., AKNIN, P., and GOVAERT, G. (2011), “Model-Based Clustering with Hidden Markov Model Regression for Time Series with Regime Changes”, in International Joint Conference on Neural Networks, pp. 2814–2821.

  • CHAMROUKHI, F., GLOTIN, H., and SAMÉ, A. (2013), “Model-Based Functional Mixture Discriminant Analysis with Hidden Process Regression for Curve Classification”, Neurocomputing, 112, 153–163.

  • DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  • FEARNHEAD, P. (2006), “Exact and Efficient Bayesian Inference for Multiple Changepoint Problems”, Statistics and Computing, 16, 203–213.

  • FEARNHEAD, P., and LIU, Z. (2007), “Online Inference for Multiple Changepoint Problems”, Journal of the Royal Statistical Society, Series B, 69, 589–605.

  • FERRARI-TRECATE, G., and MUSELLI, M. (2002), “A New Learning Method for Piecewise Linear Regression”, in International Conference on Artificial Neural Networks, pp. 28–30.

  • FRALEY, C., and RAFTERY, A.E. (2002), “Model-Based Clustering, Discriminant Analysis, and Density Estimation”, Journal of the American Statistical Association, 97, 611–631.

  • GAFFNEY, S., and SMYTH, P. (1999), “Trajectory Clustering with Mixtures of Regression Models”, in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 63–72.

  • GAFFNEY, S.J. (2004), “Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models”, Ph.D. thesis, University of California, Irvine.

  • GAFFNEY, S.J., and SMYTH, P. (2004), “Joint Probabilistic Curve Clustering and Alignment”, in Advances in Neural Information Processing Systems 17.

  • GANESALINGAM, S., and MCLACHLAN, G.J. (1978), “The Efficiency of a Linear Discriminant Function Based on Unclassified Initial Samples”, Biometrika, 65, 658–662.

  • GANESALINGAM, S., and MCLACHLAN, G.J. (1979), “A Case Study of Two Clustering Methods Based on Maximum Likelihood”, Statistica Neerlandica, 33, 81–90.

  • GOVAERT, G., INGRASSIA, S., and MCLACHLAN, G. (eds) (2015), “Special Issue on ‘New Trends on Model-Based Clustering and Classification’”, Advances in Data Analysis and Classification, 9(4), 367–369.

  • GUI, J., and LI, H. (2003), “Mixture Functional Discriminant Analysis for Gene Function Classification Based on Time Course Gene Expression Data”, in Proceedings of the Joint Statistical Meeting (Biometric Section).

  • HÉBRAIL, G., HUGUENEY, B., LECHEVALLIER, Y., and ROSSI, F. (2010), “Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation”, Neurocomputing, 73(7–9), 1125–1141.

  • HUGUENEY, B., HÉBRAIL, G., LECHEVALLIER, Y., and ROSSI, F. (2009), “Simultaneous Clustering and Segmentation for Functional Data”, in European Symposium on Artificial Neural Networks, pp. 281–286.

  • INGRASSIA, S., MINOTTI, S., and VITTADINI, G. (2012), “Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions”, Journal of Classification, 29(3), 363–401.

  • INGRASSIA, S., PUNZO, A., VITTADINI, G., and MINOTTI, S. (2015), “The Generalized Linear Mixed Cluster-Weighted Model”, Journal of Classification, 32(1), 85–113.

  • JACQUES, J., and PREDA, C. (2014), “Model-Based Clustering for Multivariate Functional Data”, Computational Statistics & Data Analysis, 71, 92–106.

  • JAMES, G.M., and SUGAR, C. (2003), “Clustering for Sparsely Sampled Functional Data”, Journal of the American Statistical Association, 98(462), 397–408.

  • LEE, S.X., and MCLACHLAN, G.J. (2014), “Finite Mixtures of Multivariate Skew t-Distributions: Some Recent and New Results”, Statistics and Computing, 24(2), 181–202.

  • LEE, S.X., and MCLACHLAN, G.J. (2013), “Model-Based Clustering and Classification with Non-Normal Mixture Distributions”, Statistical Methods and Applications, 22(4), 427–454.

  • LEE, S.X., and MCLACHLAN, G.J. (2015), “Finite Mixtures of Canonical Fundamental Skew t-Distributions”, Statistics and Computing, 24(2), 181–202.

  • LIU, X., and YANG, M. (2009), “Simultaneous Curve Registration and Clustering for Functional Data”, Computational Statistics and Data Analysis, 53(4), 1361–1376.

  • MCGEE, V.E., and CARLETON, W.T. (1970), “Piecewise Regression”, Journal of the American Statistical Association, 65, 1109–1124.

  • MCLACHLAN, G., and BASFORD, K. (1988), Mixture Models: Inference and Applications to Clustering, New York: Marcel Dekker.

  • MCLACHLAN, G.J. (1982), “The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis”, in Handbook of Statistics, Vol. 2, eds. P. Krishnaiah and L. Kanal, pp. 199–208.

  • MCLACHLAN, G.J. (1992), Discriminant Analysis and Statistical Pattern Recognition, New York: Wiley.

  • MCLACHLAN, G.J., and KRISHNAN, T. (2008), The EM Algorithm and Extensions (2nd ed.), New York: Wiley.

  • MCLACHLAN, G.J., and PEEL, D. (2000), Finite Mixture Models, New York: Wiley.

  • MELNYKOV, V. (2016), “Model-Based Biclustering of Clickstream Data”, Computational Statistics & Data Analysis, 93, 31–45.

  • MELNYKOV, V., and MAITRA, R. (2010), “Finite Mixture Models and Model-Based Clustering”, Statistics Surveys, 4, 80–116.

  • MURRAY, P.M., BROWNE, R.P., and MCNICHOLAS, P.D. (2014), “Mixtures of Skew-Factor Analyzers”, Computational Statistics & Data Analysis, 77, 326–335.

  • NGUYEN, H.D., MCLACHLAN, G.J., and WOOD, I.A. (2016), “Mixtures of Spatial Spline Regressions for Clustering and Classification”, Computational Statistics and Data Analysis, 93, 76–85.

  • PICARD, F., ROBIN, S., LEBARBIER, E., and DAUDIN, J.J. (2007), “A Segmentation/Clustering Model for the Analysis of Array CGH Data”, Biometrics, 63(3), 758–766.

  • RAMSAY, J.O., and SILVERMAN, B.W. (2005), Functional Data Analysis, Berlin: Springer.

  • SAMÉ, A., CHAMROUKHI, F., GOVAERT, G., and AKNIN, P. (2011), “Model-Based Clustering and Segmentation of Time Series with Changes in Regime”, Advances in Data Analysis and Classification, 5(4), 301–321.

  • SCHWARZ, G. (1978), “Estimating the Dimension of a Model”, Annals of Statistics, 6, 461–464.

  • SCOTT, A.J., and SYMONS, M.J. (1971), “Clustering Methods Based on Likelihood Ratio Criteria”, Biometrics, 27, 387–397.

  • SHI, J.Q., and WANG, B. (2008), “Curve Prediction and Clustering with Mixtures of Gaussian Process Functional Regression Models”, Statistics and Computing, 18(3), 267–283.

  • SMYTH, P. (1996), “Clustering Sequences with Hidden Markov Models”, in Advances in Neural Information Processing Systems 9, NIPS, pp. 648–654.

  • STEINLEY, D., and BRUSCO, M.J. (2007), “Initializing k-Means Batch Clustering: A Critical Evaluation of Several Techniques”, Journal of Classification, 24, 99–121.

  • STONE, H. (1961), “Approximation of Curves by Line Segments”, Mathematics of Computation, 15(73), 40–47.

  • TANG, Y., BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “Model Based Clustering of High-Dimensional Binary Data”, Computational Statistics & Data Analysis, 87, 84–101.

  • TITTERINGTON, D., SMITH, A., and MAKOV, U. (1985), Statistical Analysis of Finite Mixture Distributions, New York: John Wiley & Sons.

  • WOLFE, J.H. (1970), “Pattern Clustering by Multivariate Mixture Analysis”, Multivariate Behavioral Research, 5, 329–359.

  • XIONG, Y., and YEUNG, D.Y. (2004), “Time Series Clustering with ARMA Mixtures”, Pattern Recognition, 37(8), 1675–1689.

Author information

Correspondence to Faicel Chamroukhi.

Additional information

We would like to thank the partners of the FUI-SYCIE Project for their financial support to this work.

About this article

Cite this article

Chamroukhi, F. Piecewise Regression Mixture for Simultaneous Functional Data Clustering and Optimal Segmentation. J Classif 33, 374–411 (2016). https://doi.org/10.1007/s00357-016-9212-8

  • DOI: https://doi.org/10.1007/s00357-016-9212-8
