[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Out-of-core detection of periodicity from sequence databases

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present time and space efficient periodicity detection method that efficiently uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out of core. First, we optimize storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what the current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction part to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases that require considerably larger space to keep the data structures compared to the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept on external memory partially when a memory overflow situation occurs. Various pruning strategies are also proposed to allow the proposed approach to process large sequences within reasonable amount of time. Additionally, we introduced the notion of “emulated tree traversal” for fast suffix tree traversal. Due to all these special considerations, we are able to mine much larger sequences compared to other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest reported sequence mined for periodicity detection ever.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

References

  1. Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O (2005) Robust detection of periodic time series measured from biological systems. BMC Bioinform 6:117

    Article  Google Scholar 

  2. Al-Rawi A, Lansari A, Bouslama F (2003) A new non-recursive algorithm for binary search tree traversal. In: Proceedings of 10th IEEE international conference on electronics, circuits and systems ICECS, vol 2. UAE, pp 770–773 (Dec 2003)

  3. Barsky M, Stege U, Thomo A, Upton C (2009) Suffix trees for very large genomic sequences. In: Proceeding of the 18th ACM conference on information and knowledge management (CIKM ’09). New York, NY, USA, pp 1417–1420

  4. Bedathur SJ, Haritsa JR (2004) Engineering a fast online persistent suffix tree construction. In: Proceedings of the 20th international conference on data, engineering

  5. Bentley J (1999) Programming pearls, 2nd edn. Addison-Wesley Professional, Reading

    Google Scholar 

  6. Berberidis C, Aref W, Atallah M, Vlahavas I, Elmagarmid A (2002) Multiple and partial periodicity mining in time series databases. In: Proceedings of European conference on artificial intelligence (July 2002)

  7. Branch JW, Giannella C, Szymanski B, Wolff R, Kargupta H (2012) In-network outlier detection in wireless sensor networks. Knowl Inform Syst 18. doi:10.1007/s10115-011-0474-5 (January 2012, Online First)

  8. Cheung C-F, Yu JX, Lu H (2005) Constructing suffix tree for Gigabyte sequences with Megabyte memory. IEEE Trans Knowl Data Eng 17(1):90–105

    Article  Google Scholar 

  9. Elfeky MG, Aref WG, Elmagarmid AK (2005) Periodicity detection in time series databases. IEEE Trans Knowl Data Eng 17(7):875–887

    Article  Google Scholar 

  10. Elfeky MG, Aref WG, Elmagarmid AK (2005) WARP: time warping for periodicity detection. In: Proceedings of IEEE international conference of data mining (Nov 2005)

  11. Fayolle J, Ward MD (2005) Analysis of the average depth in a suffix tree under a Markov model. In: Martnez C (ed) 2005 International conference on analysis of algorithms. Discrete mathematics and theoretical computer ccience proceedings AD, pp 95–104

  12. Garcia ACB, Bentes C, de Melo RHC, Zadrozny B, Penna TJP (2011) Sensor data analysis for equipment monitoring. Knowl Inform Syst 28(2):333–364

    Article  Google Scholar 

  13. Glynn EF, Chen J, Mushegian AR (2006) Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics 22(3):310–316

    Article  Google Scholar 

  14. Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  15. Han J, Gong W, Yin Y (1998) Mining segment-wise periodic patterns in time related databases. In: Proceedings of ACM international conference on knowledge discovery and data mining (Aug 1998)

  16. Han J, Yin Y, Dong G (1999) Efficient mining of partial periodic patterns in time series database. In: Proceedings of IEEE international conference on data, engineering, p 106

  17. Huang K-Y, Chang C-H (June 2005) SMCA: a general model for mining asynchronous periodic patterns in temporal databases. IEEE Trans Knowl Data Eng 17(6):774–785

  18. Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proceedings of the international conference on very large databases, VLDB (Sept 2000)

  19. Koknar-Tezel S, Latecki LJ (2011) Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inform Syst 28(1):1–23

    Article  Google Scholar 

  20. Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29(13):1149–1171

    Article  Google Scholar 

  21. Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inform Syst 24(3):467–497

    Article  Google Scholar 

  22. Lee G, Yang W, Lee J-M (2006) A parallel algorithm for mining multiple partial periodic patterns. Elsevier J Inform Sci 176:3591–3609

    Article  MathSciNet  MATH  Google Scholar 

  23. Nelson M (1996) Fast string searching with suffix trees. Dr. Dobb’s Journal

  24. Phoophakdee B, Zaki MJ (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD ’07), pp 833–844

  25. Rasheed F, Alshalalfa M, Alhajj R (2011) Efficient periodicity mining in time series databases using suffix tree. IEEE Trans Knowl Data Eng (TKDE) 23(1):79–94

    Article  Google Scholar 

  26. Reznik YA (2002) On tries, suffix trees, and universal variable-length-to-block codes. In: Proceedings of IEEE international symposium on information theory, p 123

  27. Sheng C, Hsu W, Lee M-L (2012) Mining dense periodic patterns in time series data. In: Proceedings of IEEE international conference on data, engineering, p 115

  28. Tian Y, Tata S, Hankins RA, Patel JM (Sep. 2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299

    Google Scholar 

  29. Ukkonen E (1995) Online construction of suffix trees. Algorithmica 14(3):249–260

    Article  MathSciNet  MATH  Google Scholar 

  30. Valimaki N, Gerlach W, Dixit K, Makinen V (2007) Compressed suffix tree—a basis for genome-scale sequence analysis. Bioinformatics 23:629–630

    Article  Google Scholar 

  31. Weigend A, Gershenfeld N (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading

    Google Scholar 

  32. Wong S-S, Sung W-K, Wong L (2007) CPS-tree: a compact partitioned suffix tree for diskAbased indexing on large genome sequences. In: International conference on data engineering (ICDE), pp 1350–1354

  33. Yang J, Wang W, Philip SY (2001) Infominer: mining surprising periodic patterns. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 395–400

  34. Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inform Syst 17(2):241–262

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Reda Alhajj.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rasheed, F., Adnan, M. & Alhajj, R. Out-of-core detection of periodicity from sequence databases. Knowl Inf Syst 36, 277–301 (2013). https://doi.org/10.1007/s10115-012-0546-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-012-0546-1

Keywords

Navigation