Abstract
Time-series data analysis is essential in many modern applications, such as financial markets, sensor networks, and data centers, and correlation discovery is a core technique for the analysis. In this paper, we address a novel problem that computes a k-sized time-series dataset where the minimum Pearson correlation of any two time-series in the set is maximized. This problem discovers a group of time-series, which are highly correlated with each other, from a given time-series dataset without any prior knowledge, thus helps many analytical applications. We show that this problem is NP-hard, and design an approximate heuristic solution that provides a high quality result with fast response time. Extensive experiments on real and synthetic datasets verify the efficiency, effectiveness, and scalability of our solution.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Amagata, D., Hara, T.: Mining top-k co-occurrence patterns across multiple streams. TKDE 29(10), 2249–2262 (2017)
Cole, R., Shasha, D., Zhao, X.: Fast window correlations over uncooperative time series. In: KDD, pp. 743–749 (2005)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SoCG, pp. 253–262 (2004)
Drosou, M., Pitoura, E.: Diversity over continuous data. IEEE Data Eng. Bull. 32(4), 49–56 (2009)
Gan, J., Feng, J., Fang, Q., Ng, W.: Locality-sensitive hashing scheme based on dynamic collision counting. In: SIGMOD, pp. 541–552 (2012)
Guo, T., Sathe, S., Aberer, K.: Fast distributed correlation discovery over streaming time-series data. In: CIKM, pp. 1161–1170 (2015)
Huang, Q., Feng, J., Zhang, Y., Fang, Q., Ng, W.: Query-aware locality-sensitive hashing for approximate nearest neighbor search. PVLDB 9(1), 1–12 (2015)
Kato, S., Amagata, D., Nishio, S., Hara, T.: Monitoring range motif on streaming time-series. In: DEXA, pp. 251–266 (2018)
Kim, J., Ruggiero, M., Atienza, D., Lederberger, M.: Correlation-aware virtual machine allocation for energy-efficient datacenters. In: DATE, pp. 1345–1350 (2013)
Li, L., Hong, X., Tang, D., Na, M.: GHG emissions, economic growth and urbanization: a spatial approach. Sustainability 8(5), 462 (2016)
Li, Y., Yiu, M.L., Gong, Z., et al.: Quick-motif: an efficient and scalable framework for exact motif discovery. In: ICDE, pp. 579–590 (2015)
Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.: Matrix profile x: VALMOD-scalable discovery of variable-length motifs in data series. In: SIGMOD, pp. 1053–1066 (2018)
Lucas, D., et al.: Designing optimal greenhouse gas observing networks that consider performance and cost. Geosci. Instrum. Methods Data Syst. 4(1), 121 (2015)
Marti, G., Andler, S., Nielsen, F., Donnat, P.: Clustering financial time series: how long is enough?. In: IJCAI, pp. 2583–2589 (2016)
Mueen, A., Keogh, E., Bigdely-Shamlo, N.: Finding time series motifs in disk-resident data. In: ICDM, pp. 367–376 (2009)
Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: SDM, pp. 473–484 (2009)
Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series data. In: SIGMOD, pp. 171–182 (2010)
Ravi, S.S., Rosenkrantz, D.J., Tayi, G.K.: Facility dispersion problems: heuristics and special cases. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1991. LNCS, vol. 519, pp. 355–366. Springer, Heidelberg (1991). https://doi.org/10.1007/BFb0028275
Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage traces: format+ schema, pp. 1–14. Google Inc., White Paper (2011)
Tsytsarau, M., Amer-Yahia, S., Palpanas, T.: Efficient sentiment correlation for large-scale demographics. In: SIGMOD, pp. 253–264 (2013)
Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., Keogh, E.: Indexing multidimensional time-series. VLDB J. 15(1), 1–20 (2006)
Yankov, D., Keogh, E., Medina, J., Chiu, B., Zordan, V.: Detecting time series motifs under uniform scaling. In: KDD, pp. 844–853 (2007)
Yeh, C.C.M., Kavantzas, N., Keogh, E.: Matrix profile vi: meaningful multidimensional motif discovery. In: ICDM, pp. 565–574 (2017)
Yeh, C.C.M., et al.: Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: ICDM, pp. 1317–1322 (2016)
Yi, X., Zheng, Y., Zhang, J., Li, T.: ST-MVL: filling missing values in geo-sensory time series data. In: IJCAI, pp. 2704–2710 (2016)
Zhu, Y., et al.: Matrix profile ii: exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. In: ICDM, pp. 739–748 (2016)
Zhu, Y., Shasha, D.: Statstream: statistical monitoring of thousands of data streams in real time. In: VLDB, pp. 358–369 (2002)
Acknowledgment
This research is partially supported by JSPS Grant-in-Aid for Scientific Research (A) Grant Number 18H04095, JSPS Grant-in-Aid for Young Scientists (B) Grant Number JP16K16056, and JST CREST Grant Number J181401085.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Amagata, D., Hara, T. (2019). Correlation Set Discovery on Time-Series Data. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_21
Download citation
DOI: https://doi.org/10.1007/978-3-030-27618-8_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27617-1
Online ISBN: 978-3-030-27618-8
eBook Packages: Computer ScienceComputer Science (R0)