research-article

A Framework for Similarity Search in Streaming Time Series based on Spark Streaming

Published: 01 October 2022

Abstract

Similarity search in streaming time series is challenging because the system must both process the stream and return results under tight constraints: for example, it must keep up with a high-speed time-series stream while accurately reporting the matches it finds to a query system. These difficulties motivate a framework that time-series data mining researchers can use to build similarity search systems for streaming time series on a platform specialized for streaming data. In this paper, we introduce such a framework based on Spark Streaming. We then present a prototype system that implements the framework to demonstrate its feasibility for building similarity search systems that work efficiently and effectively in a streaming context. The prototype takes advantage of SUCR-DTW to perform similarity search efficiently in a streaming environment under Dynamic Time Warping. The experimental results obtained from the prototype show that the Spark job for similarity search in streaming time series completes quickly and accurately. Subsequences of the streaming time series that are similar to predefined queries are found in near real time, and they are identical to those obtained by running the same search on a reference system. Furthermore, the prototype scales well and works stably while processing time-series streams at a high, steady rate. These results also underline the value of combining Spark Streaming and SUCR-DTW to handle this challenging problem.
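To make the task in the abstract concrete, the following is a minimal sketch of subsequence similarity search over a stream under Dynamic Time Warping. It is not SUCR-DTW and does not use Spark Streaming: it is a plain banded DTW applied to a sliding window, with no data normalization or pruning. The function names `dtw_distance` and `search_stream`, the `band` and `threshold` parameters, and the squared-error point cost are all assumptions made for illustration.

```python
import math
from collections import deque


def dtw_distance(query, candidate, band):
    """Banded DTW distance between two equal-length sequences.

    `band` is the maximum allowed |i - j| between aligned points
    (a Sakoe-Chiba style constraint); band=0 reduces to Euclidean distance.
    """
    n = len(query)
    if len(candidate) != n:
        raise ValueError("sequences must have equal length")
    inf = math.inf
    # prev[j] holds the accumulated cost of row i-1; index 0 is the border.
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        lo = max(1, i - band)
        hi = min(n, i + band)
        for j in range(lo, hi + 1):
            cost = (query[i - 1] - candidate[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[n])


def search_stream(stream, query, threshold, band=1):
    """Yield (end_index, distance) for every window within `threshold` of `query`."""
    window = deque(maxlen=len(query))  # keeps only the newest len(query) points
    for t, value in enumerate(stream):
        window.append(value)
        if len(window) == len(query):
            d = dtw_distance(query, list(window), band)
            if d <= threshold:
                yield t, d
```

A call such as `list(search_stream(values, query, threshold=0.5))` returns the positions where a window of the stream ends within the threshold of the query. In a streaming platform, the window state and the per-arrival distance computation would be handled by the platform's own operators rather than a Python generator.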

Information

Published In

Mobile Networks and Applications, Volume 27, Issue 5
Oct 2022
436 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 01 October 2022
Accepted: 04 April 2022

Author Tags

1. Similarity search
2. Streaming time series
3. Spark streaming
4. SUCR-DTW

Qualifiers

• Research-article

Funding Sources

• Saigon University
