[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Efficiently mining high utility sequential patterns in static and streaming data

Published: 01 January 2017 Publication History

Abstract

High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come continuously and often at a high speed. To efficiently deal with both problems, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this regard, two efficient data structures named ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree) are proposed to maintain essential information for mining HUSPs in both offline and online fashions. In addition, a novel utility model called Sequence-Suffix Utility is proposed for effectively pruning the search space in HUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to find HUSPs in static databases efficiently. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show that HUSP-Miner outperforms the compared algorithms substantially in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate impressive performance of HUSP-Stream to update the data structures and discover HUSPs over data streams.

References

[1]
Agrawal R. and Srikant R., Mining sequential patterns, In Proc. of the Eleventh International Conference on Data Engineering, 1995, 3-14.
[2]
Ahmed C.F., Tanbeer S.K. and Jeong B., A novel approach for mining high-utility sequential patterns in sequence databases, Electronics and Telecommunications Research Institute Journal 32 (2010), 676-686.
[3]
Ahmed C.F., Tanbeer S.K. and Jeong B.-S., A framework for mining high utility web access sequences, IETE Technical Review 28 (2011), 3-16.
[4]
Ahmed C.F., Tanbeer S.K. and Jeong B.S., Interactive mining of high utility patterns over data streams, Expert Systems With Applications 39 (2012), 11979-11991.
[5]
Ahmed C.F., Tanbeer S.K., Jeong B.S. and Lee Y.K., Efficient tree structures for high-utility pattern mining in incremental databases, IEEE Transactions on Knowledge and Data Engineering 21 (2009), 1708-1721.
[6]
Ayres J., Flannick J., Gehrke J. and Yiu T., Sequential pattern mining using a bitmap representation, In Proc. of ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, 2002, 429-435.
[7]
Rassi P.P.C. and Teisseire M., Speed: Mining maximal sequential patterns over data streams, In Proc. of the IEEE Int'l Conference on Intelligent Systems, 2006, 546-552.
[8]
Chang L., Wang T., Yang D. and Luan H., Seqstream: Mining closed sequential patterns over stream sliding windows, In Proc. of the IEEE International Conference on Data Mining, 2008, 83-92.
[9]
Chen G., Wu X. and Zhu X., Mining Sequential Patterns across Data Streams, PhD thesis, University of Vermont, 2005.
[10]
Chu C., Tseng V.S. and Liang T., An efficient algorithm for mining temporal high utility itemsets from data streams, Journal of Systems and Software 81 (2008), 1105-1117.
[11]
Fournier-Viger P., Gomariz A., Gueniche T., Soltani A., Wu C. and Tseng V.S., Spmf: A java open-source pattern mining library, Journal of Machine Learning Research 15 (2014), 3389-3393.
[12]
Fournier-Viger P., Wu C.-W., Zida S. and Tseng V.S., FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning, 83-92. Springer International Publishing, Cham, 2014.
[13]
Han J., Pei J., Mortazavi-Asl B., Chen Q., Dayal U. and Hsu M., Freespan: Frequent pattern-projected sequential pattern mining, In Proc.of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, 355-359.
[14]
Ho C., Li H., Kuo F. and Lee S., Incremental mining of sequential patterns over a stream sliding window, In Proc. of the ICDM Workshops, 2006, 677-681.
[15]
Li H.F., Huang H.Y., Chen Y.C., Liu Y.J. and Lee S.Y., Fast and memory efficient mining of high utility itemsets in data streams, In Proc. of the 8th IEEE Int'l Conference on Data Mining, 2008, 881-886.
[16]
Liu M. and Qu J., Mining high utility itemsets without candidate generation, In Proc. of the 21st ACM international Conference on Information and Knowledge Management, 2012, 55-64.
[17]
Liu Y., Liao W.k. and Choudhary A., A fast high utility itemsets mining algorithm, In Proc. of the 1st International Workshop on Utility-Based Data Mining, 2005, 90-99.
[18]
Marascu A. and Masseglia F., Mining sequential patterns from temporal streaming data, In Prc. of the ECML/PKDD Workshop on Mining Complex Data, 2005, 355-359.
[19]
Pei J., Han J., Mortazavi-Asl B., Wang J., Pinto H., Chen Q., Dayal U. and Hsu M.-C., Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transactions on Knowledge and Data Engineering 16 (2004), 1424-1440.
[20]
Pisharath J., Liu Y., Ozisikyilmaz B., Narayanan R., Liao W.K., Choudhary A. and Memik G., Nu-minebench version 2.0 dataset and technical report, http://cucis.ece.northwestern.edu/projects/dms/minebench.html, 2012.
[21]
Shie B., Hsiao H. and Tseng V.S., Efficient algorithms for discovering high utility user behavior patterns in mobile commerce environments, Knowledge and Information Systems journal 37 (2013).
[22]
Shie B.E., Yu P.S. and Tseng V.S., Efficient algorithms for mining maximal high utility itemsets from data streams with different models, Expert Systems with Applications 39 (2012), 12947-12960.
[23]
Srikant R. and Agrawal R., Mining sequential patterns: Generalizations and performance improvements, In Proc. of the Intelligent Conference on Extending Database Technology: Advances in Database Technology, 1996, 3-17.
[24]
Tseng V., Shie B., Wu C.-W. and Yu P., Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Transactions on Knowledge and Data Engineering 25 (2013), 1772-1786.
[25]
Yen S., Wu C., Lee Y., Tseng V.S. and Hsieh C., A fast algorithm for mining frequent closed itemsets over stream sliding window, In Proc. of the Int'l Conference on Fuzzy Systems, 2011, 996-1002.
[26]
Yin J., Zheng Z. and Cao L., Uspan: An efficient algorithm for mining high utility sequential patterns, In Proc. of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, 660-668.
[27]
Zaki M.J., Spade: An efficient algorithm for mining frequent sequences, Machine Learning Journal 42 (2001), 31-60.
[28]
Zihayat M. and An A., Mining top-k high utility patterns over data streams, Information Sciences 285 (2014), 138-161.
[29]
Zihayat M., Wu C.-W., An A. and Tseng V.S., Mining high utility sequential patterns from evolving data streams, In Proc. of the ASE BigData & SocialInformatics 2015, 2015, 52:1-52:6.

Cited By

View all
  • (2023)A survey of high utility sequential patterns mining methodsJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23210745:5(8049-8077)Online publication date: 4-Nov-2023
  • (2022)An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scanKnowledge and Information Systems10.1007/s10115-022-01763-965:1(207-240)Online publication date: 24-Sep-2022
  • (2022)Incremental Mining of Frequent Serial Episodes Considering Multiple OccurrencesComputational Science – ICCS 202210.1007/978-3-031-08751-6_33(460-472)Online publication date: 21-Jun-2022

Index Terms

  1. Efficiently mining high utility sequential patterns in static and streaming data
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Intelligent Data Analysis
        Intelligent Data Analysis  Volume 21, Issue S1
        2017
        226 pages

        Publisher

        IOS Press

        Netherlands

        Publication History

        Published: 01 January 2017

        Author Tags

        1. High utility sequential pattern mining
        2. data streams
        3. sliding window

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 17 Jan 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2023)A survey of high utility sequential patterns mining methodsJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23210745:5(8049-8077)Online publication date: 4-Nov-2023
        • (2022)An efficient algorithm for mining closed high utility itemsets over data streams with one dataset scanKnowledge and Information Systems10.1007/s10115-022-01763-965:1(207-240)Online publication date: 24-Sep-2022
        • (2022)Incremental Mining of Frequent Serial Episodes Considering Multiple OccurrencesComputational Science – ICCS 202210.1007/978-3-031-08751-6_33(460-472)Online publication date: 21-Jun-2022

        View Options

        View options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media