Abstract
Time-series clustering algorithms have been used in a variety of areas to extract valuable information from complex and massive data sets. However, these algorithms suffer from two shortcomings. First, most of them are designed for equal-length time series, whereas real-world problems often require clustering time series of unequal length. Second, commonly used distance measures for time series cannot fully reveal differences in trend. To overcome these two shortcomings, this paper focuses on the trend of time series and employs the area-based shape distance to measure their similarity. In addition, we present a new hierarchical clustering algorithm for unequal-length time series based on the area-based shape distance measure. A series of experiments illustrates the performance of the proposed clustering algorithm.
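The general idea of hierarchically clustering unequal-length series with a shape-oriented distance can be illustrated with a minimal sketch. The abstract does not define the area-based shape distance, so the function `area_distance` below is a hypothetical stand-in (the approximate area between two mean-centred series resampled onto a shared axis) and should not be read as the paper's measure. The average-linkage merge rule, the function names, and the toy data are likewise assumptions made for illustration.

```python
from itertools import combinations


def resample(series, n_points):
    """Linearly interpolate a series onto n_points evenly spaced positions."""
    if len(series) == 1:
        return [float(series[0])] * n_points
    out = []
    for k in range(n_points):
        pos = k * (len(series) - 1) / (n_points - 1)
        i = min(int(pos), len(series) - 2)
        frac = pos - i
        out.append(series[i] * (1 - frac) + series[i + 1] * frac)
    return out


def area_distance(a, b, n_points=50):
    """Stand-in distance (an assumption, not the paper's measure): approximate
    area between two mean-centred series placed on a shared [0, 1] axis."""
    ra, rb = resample(a, n_points), resample(b, n_points)
    mean_a, mean_b = sum(ra) / n_points, sum(rb) / n_points
    gaps = [abs((x - mean_a) - (y - mean_b)) for x, y in zip(ra, rb)]
    step = 1.0 / (n_points - 1)
    # Trapezoidal rule over the absolute gap between the two curves.
    return sum((gaps[k] + gaps[k + 1]) / 2 * step for k in range(n_points - 1))


def hierarchical_clustering(series_list, n_clusters, distance=area_distance):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    n = len(series_list)
    # Pairwise distances are computed once; lengths may differ per series.
    dist = {(i, j): distance(series_list[i], series_list[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy data: two rising and two falling series of different lengths.
series = [
    [0, 1, 2, 3, 4],
    [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4],
    [4, 3, 2, 1, 0],
    [4, 3.2, 2.4, 1.6, 0.8, 0],
]
print(hierarchical_clustering(series, 2))  # [[0, 1], [2, 3]]
```

Because the distance is passed in as a parameter, any other measure defined for unequal-length series could be substituted without changing the clustering loop.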
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Nos. 11701338, 11571001), the Natural Science Foundation of Shandong Province (No. ZR2016AP12), and a Project of the Shandong Province Higher Educational Science and Technology Program (No. J17KB124).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, X., Yu, F., Pedrycz, W. et al. Hierarchical clustering of unequal-length time series with area-based shape distance. Soft Comput 23, 6331–6343 (2019). https://doi.org/10.1007/s00500-018-3287-6
DOI: https://doi.org/10.1007/s00500-018-3287-6