Abstract
Accurate clustering of time series is a challenging problem for data arising from areas such as financial markets, biomedical studies, and environmental sciences, especially when some or all of the series exhibit nonlinearity and nonstationarity. When a subset of the series exhibits nonlinear characteristics, frequency domain clustering methods based on higher-order spectral properties, such as the bispectra or trispectra, are useful. While these methods address nonlinearity, they rely on the assumption that the series are stationary. We propose the Bispectral Smooth Localized Complex EXponential (BSLEX) approach for clustering nonlinear and nonstationary time series. BSLEX extends the SLEX approach for linear, nonstationary series, and overcomes the challenges of both nonlinearity and nonstationarity by smoothly partitioning each nonstationary time series into stationary subsets in a dyadic fashion. The performance of the BSLEX approach is illustrated via a simulation in which several nonstationary or nonlinear time series are clustered, and via accurate clustering of the records of 16 seismic events, eight of which are earthquakes and eight of which are explosions. We further illustrate the utility of the approach by clustering S&P 100 financial returns.
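The dyadic partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hard, non-overlapping blocks, whereas SLEX uses smooth, overlapping windows, and it shows only the index bookkeeping, with no spectral or bispectral estimation. The function name `dyadic_blocks`, the parameter `max_level`, and the toy series are hypothetical and chosen for illustration.

```python
def dyadic_blocks(n, max_level):
    """Return {level: [(start, stop), ...]} for a series of length n.

    Level j splits the index range [0, n) into 2**j equal, non-overlapping
    blocks. Assumption: n is divisible by 2**max_level.
    """
    if n % (2 ** max_level) != 0:
        raise ValueError("n must be divisible by 2**max_level")
    blocks = {}
    for level in range(max_level + 1):
        size = n // (2 ** level)  # block length at this level
        blocks[level] = [(k * size, (k + 1) * size) for k in range(2 ** level)]
    return blocks


# Hypothetical toy series of length 8, split down to level 2.
series = [0.1, 0.4, -0.2, 0.3, 1.5, -1.7, 2.1, -1.9]
partition = dyadic_blocks(len(series), max_level=2)
for level, spans in partition.items():
    # Print the sub-series covered by each block at this level.
    print(level, [series[a:b] for a, b in spans])
```

Each level halves the blocks of the previous one, so a segmentation of the series corresponds to choosing a set of blocks, possibly from different levels, that covers the index range exactly once. In a method of the kind the abstract describes, a frequency domain summary would then be computed within each chosen block.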
Cite this article
Harvill, J.L., Kohli, P. & Ravishanker, N. Clustering Nonlinear, Nonstationary Time Series Using BSLEX. Methodol Comput Appl Probab 19, 935–955 (2017). https://doi.org/10.1007/s11009-016-9528-1
DOI: https://doi.org/10.1007/s11009-016-9528-1