[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Counting the frequency of time-constrained serial episodes in a streaming sequence

Published: 01 December 2019 Publication History

Abstract

As a representative sequential pattern mining problem, counting the frequency of serial episodes from a streaming sequence has drawn continuous attention in academia due to its wide application in practice, e.g., telecommunication alarms, stock market, transaction logs, bioinformatics, etc. Although a number of serial episodes mining algorithms have been developed recently, most of them are neither stream-oriented, as they require processing the whole dataset multiple times, nor time-aware, as they fail to take into account the time constraint of serial episodes. In this paper, we propose two novel one-pass algorithms, ONCE and ONCE+, each of which can respectively compute two popular frequencies of given episodes satisfying predefined time-constraint as signals in a stream arrives one-after-another. ONCE is only used for non-overlapped frequency where the occurrences of a serial episode in sequence are not intersected. ONCE+ is designed for the distinct frequency where the occurrences of a serial episode do not share any event. Theoretical study proves that our algorithm can correctly mine the frequency of target time constraint serial episodes in a given stream. Experimental study over both real-world and synthetic datasets demonstrates that the proposed algorithm can work, with little time and space, in signal-intensive streams where millions of signals arrive within a single second. Moreover, the algorithm has been applied in a real stream processing system, where the efficacy and efficiency of this work are tested in practical applications.

References

[1]
A. Achar, A. Ibrahim, P.S. Sastry, Pattern-growth based frequent serial episode discovery, Data Knowl. Eng. 87 (2013) 91–108.
[2]
A. Achar, S. Laxman, P.S. Sastry, A unified view of the apriori-based algorithms for frequent episode discovery, Knowl. Inf. Syst. 31 (2) (2012) 223–250.
[3]
R. Agrawal, R. Srikant, Mining sequential patterns, Proceedings of the Eleventh International Conference on Data Engineering, March 6–10, 1995, Taipei, Taiwan, 1995, pp. 3–14.
[4]
X. Ao, P. Luo, C. Li, F. Zhuang, Q. He, Online frequent episode mining, 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13–17, 2015, 2015, pp. 891–902.
[5]
X. Ao, P. Luo, J. Wang, F. Zhuang, Q. He, Mining precise-positioning episode rules from event sequences, 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19–22, 2017, 2017, pp. 83–86.
[6]
R. Bertens, J. Vreeken, A. Siebes, Keeping it short and simple: Summarising complex event sequences with multivariate patterns, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, 2016, pp. 735–744.
[7]
B. Cadonna, J. Gamper, M.H. Böhlen, Sequenced event set pattern matching, EDBT 2011, 14th International Conference on Extending Database Technology, Uppsala, Sweden, March 21–24, 2011, Proceedings, 2011, pp. 33–44.
[8]
T. Calders, N. Dexters, B. Goethals, Mining frequent itemsets in a stream, Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28–31, 2007, Omaha, Nebraska, USA, 2007, pp. 83–92.
[9]
J. Cheng, Y. Ke, W. Ng, A survey on algorithms for mining frequent itemsets over data streams, Knowl. Inf. Syst. 16 (1) (2008) 1–27.
[10]
K. Golmohammadi, O.R. Zaïane, Data mining applications for fraud detection in securities market, 2012 European Intelligence and Security Informatics Conference, EISIC 2012, Odense, Denmark, August 22–24, 2012, 2012, pp. 107–114.
[11]
K. Iwanuma, Y. Takano, H. Nabeshima, On anti-monotone frequency measures for extracting sequential patterns from a single very-long data sequence, 2004 IEEE Conference on Cybernetics and Intelligent Systems, 2004, pp. 213–217 vol.1.
[12]
X. Ji, J. Bailey, G. Dong, Mining minimal distinguishing subsequence patterns with gap constraints, Knowl. Inf. Syst. 11 (3) (2007) 259–286.
[13]
M. Joshi, G. Karypis, V. Kumar, A Universal Formulation of Sequential Patterns, Technical Report, Department of Computer Science, University of Minnesota, 1999.
[14]
S. Laxman, Discovering Frequent Episodes: Fast Algorithms, Connections with HMMs and Generalizations, Indian Institute of Science, 2006, Ph.D. thesis.
[15]
S. Laxman, P.S. Sastry, K.P. Unnikrishnan, Discovering frequent episodes and learning hidden markov models: a formal connection, IEEE Trans. Knowl. Data Eng. 17 (11) (2005) 1505–1517.
[16]
S. Laxman, P.S. Sastry, K.P. Unnikrishnan, A fast algorithm for finding frequent episodes in event streams, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12–15, 2007, 2007, pp. 410–419.
[17]
G.S. Manku, R. Motwani, Approximate frequency counts over data streams, PVLDB 5 (12) (2012) 1699.
[18]
H. Mannila, H. Toivonen, A.I. Verkamo, Discovery of frequent episodes in event sequences, Data Min. Knowl. Discov. 1 (3) (1997) 259–289.
[19]
C. Mooney, J.F. Roddick, Sequential pattern mining - approaches and algorithms, ACM Comput. Surv. 45 (2) (2013) 19.
[20]
D. Patnaik, S. Laxman, B. Chandramouli, N. Ramakrishnan, Efficient episode mining of dynamic event streams, 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 10–13, 2012, 2012, pp. 605–614.
[21]
Patnaik, D., Ramakrishnan, N., Laxman, S., Chandramouli, B., 2012b. Streaming algorithms for pattern discovery over dynamically changing event sequences. CoRR abs/1205.4477.
[22]
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M. Hsu, Prefixspan: mining sequential patterns by prefix-projected growth, Proceedings of the 17th International Conference on Data Engineering, April 2–6, 2001, Heidelberg, Germany, 2001, pp. 215–224.
[23]
J. Pei, H. Wang, J. Liu, K. Wang, J. Wang, P.S. Yu, Discovering frequent closed partial orders from strings, IEEE Trans. Knowl. Data Eng. 18 (11) (2006) 1467–1481.
[24]
M.M. Rahman, C.F. Ahmed, C.K. Leung, Mining weighted frequent sequences in uncertain databases, Inf. Sci. 479 (2019) 76–100.
[25]
I. Saha, J. Dewanjee, A web based nucleotide sequencing tool using blast algorithm, Int. J. Biotech. Trends Technol. 18 (2016).
[26]
L. Wan, L. Chen, C. Zhang, Mining frequent serial episodes over uncertain sequence data, Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18–22, 2013, 2013, pp. 215–226.
[27]
C. Wu, Y. Lin, P.S. Yu, V.S. Tseng, Mining high utility episodes in complex event sequences, The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11–14, 2013, 2013, pp. 536–544.
[28]
E. Wu, Y. Diao, S. Rizvi, High-performance complex event processing over streams, Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27–29, 2006, 2006, pp. 407–418.
[29]
H. Yahyaoui, A. Al-Mutairi, A feature-based trust sequence classification algorithm, Inf. Sci. 328 (2016) 455–484.
[30]
X. Yan, J. Han, R. Afshar, Clospan: Mining closed sequential patterns in large datasets, Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May 1–3, 2003, 2003, pp. 166–177.
[31]
H. Zhang, Y. Diao, N. Immerman, On complexity and optimization of expensive queries in complex event processing, International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22–27, 2014, 2014, pp. 217–228.
[32]
M. Zhang, B. Kao, D.W. Cheung, K.Y. Yip, Mining periodic patterns with gap requirement from sequences, ACM Trans. Knowl. Discov. Data 1 (2) (2007).

Cited By

View all
  • (2024)Discovering frequent parallel episodes in complex event sequences by counting distinct occurrencesApplied Intelligence10.1007/s10489-023-05187-y54:1(701-721)Online publication date: 1-Jan-2024
  • (2022)Incremental Mining of Frequent Serial Episodes Considering Multiple OccurrencesComputational Science – ICCS 202210.1007/978-3-031-08751-6_33(460-472)Online publication date: 21-Jun-2022

Index Terms

  1. Counting the frequency of time-constrained serial episodes in a streaming sequence
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Information Sciences: an International Journal
        Information Sciences: an International Journal  Volume 505, Issue C
        Dec 2019
        600 pages

        Publisher

        Elsevier Science Inc.

        United States

        Publication History

        Published: 01 December 2019

        Author Tags

        1. Alarms
        2. Overlap
        3. One-pass algorithm
        4. Occurrence map

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 14 Feb 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)Discovering frequent parallel episodes in complex event sequences by counting distinct occurrencesApplied Intelligence10.1007/s10489-023-05187-y54:1(701-721)Online publication date: 1-Jan-2024
        • (2022)Incremental Mining of Frequent Serial Episodes Considering Multiple OccurrencesComputational Science – ICCS 202210.1007/978-3-031-08751-6_33(460-472)Online publication date: 21-Jun-2022

        View Options

        View options

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media