Abstract
In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present time and space efficient periodicity detection method that efficiently uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out of core. First, we optimize storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what the current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction part to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases that require considerably larger space to keep the data structures compared to the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept on external memory partially when a memory overflow situation occurs. Various pruning strategies are also proposed to allow the proposed approach to process large sequences within reasonable amount of time. Additionally, we introduced the notion of “emulated tree traversal” for fast suffix tree traversal. Due to all these special considerations, we are able to mine much larger sequences compared to other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest reported sequence mined for periodicity detection ever.
Similar content being viewed by others
References
Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117
Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003)
Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420
Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering
Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading
Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002)
Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.1007/s10115-011-0474-5 (January 2012, Online First)
Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105
Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887
Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005)
Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104
Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364
Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316
Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge
Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998)
Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106
Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785
Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000)
Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23
Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171
Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497
Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609
Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal
Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844
Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94
Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123
Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115
Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299
Ukkonen E (1995) Online construction of suffix trees. Algorithmica 14(3):249–260
Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630
Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading
Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354
Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400
Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Rasheed, F., Adnan, M. & Alhajj, R. Out-of-core detection of periodicity from sequence databases. Knowl Inf Syst 36, 277–301 (2013). https://doi.org/10.1007/s10115-012-0546-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-012-0546-1