
Weighted frequent itemset mining over uncertain databases

Published: 01 January 2016

Abstract

Frequent itemset mining (FIM) is a fundamental research topic that consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in real-life environments is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted itemset mining and uncertain itemset mining have been proposed. In the former, a user assigns weights to items to specify their relative importance; in the latter, existential probabilities are specified to represent uncertainty in transactions. However, no prior work has addressed both issues at the same time. In this paper, we address this research problem by designing a new type of pattern named high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to prune unpromising itemsets early and thereby reduce the search space. Extensive experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted itemset mining and uncertain itemset mining.
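
The abstract names the ingredients of the approach without defining them formally. The Python sketch below shows one way an Apriori-like two-phase search for high expected weighted itemsets could be organized. It is not the authors' HEWI-Uapriori implementation, and its definitions are assumptions made for illustration: (i) an itemset's weight is the average of its item weights; (ii) items in a transaction occur independently, so an itemset's probability in a transaction is the product of its items' existential probabilities; (iii) an itemset counts as a HEWI when weight × expected support is at least min_ratio × |D|; and (iv) the hypothetical function `upper_bound` is a simple anti-monotone stand-in for the paper's HUBEWDC property. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Hypothetical data model: each transaction maps an item to its existential
# probability in (0, 1]; items are assumed to occur independently.
Transaction = Dict[str, float]
Itemset = FrozenSet[str]


def itemset_weight(itemset: Itemset, weights: Dict[str, float]) -> float:
    """Assumed definition: the average of the user-assigned item weights."""
    return sum(weights[i] for i in itemset) / len(itemset)


def itemset_prob(itemset: Itemset, t: Transaction) -> float:
    """Probability that every item of the itemset occurs in t (independence assumed)."""
    p = 1.0
    for i in itemset:
        if i not in t:
            return 0.0
        p *= t[i]
    return p


def expected_weighted_support(itemset: Itemset, db: List[Transaction],
                              weights: Dict[str, float]) -> float:
    """Itemset weight times expected support (sum of per-transaction probabilities)."""
    return itemset_weight(itemset, weights) * sum(itemset_prob(itemset, t) for t in db)


def upper_bound(itemset: Itemset, db: List[Transaction],
                weights: Dict[str, float]) -> float:
    """Hypothetical anti-monotone bound, not necessarily the paper's HUBEWDC measure.

    For each transaction containing the itemset, add (largest item weight in the
    transaction) * (largest item probability in the transaction). Each term is
    >= weight(X) * P(X, t), and a superset occurs in no more transactions than
    its subsets, so the bound never increases as the itemset grows.
    """
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            total += max(weights[i] for i in t) * max(t.values())
    return total


def mine_hewis(db: List[Transaction], weights: Dict[str, float],
               min_ratio: float) -> Dict[Itemset, float]:
    """Two-phase, Apriori-style search; min_ratio * |D| is an assumed threshold form."""
    threshold = min_ratio * len(db)
    items = sorted({i for t in db for i in t})

    # Phase 1: level-wise candidate generation, pruned by the upper bound.
    level = [frozenset([i]) for i in items
             if upper_bound(frozenset([i]), db, weights) >= threshold]
    candidates = list(level)
    k = 2
    while level:
        prev = set(level)
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a k-itemset only if all its (k-1)-subsets survived and its bound passes.
        level = [c for c in joined
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and upper_bound(c, db, weights) >= threshold]
        candidates.extend(level)
        k += 1

    # Phase 2: compute the exact expected weighted support of each candidate.
    result = {}
    for c in candidates:
        ews = expected_weighted_support(c, db, weights)
        if ews >= threshold:
            result[c] = ews
    return result


if __name__ == "__main__":
    weights = {"a": 0.9, "b": 0.5, "c": 0.2}
    db = [{"a": 0.8, "b": 0.5}, {"a": 0.6, "c": 0.9},
          {"b": 0.4, "c": 0.3}, {"b": 0.9}]
    hewis = mine_hewis(db, weights, 0.2)
    for itemset, ews in sorted(hewis.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(ews, 3))
    # prints:
    # ['a'] 1.26
    # ['b'] 0.9
```

Because the bound never underestimates the expected weighted support and never grows as itemsets are extended, phase 1 can discard an itemset together with all of its supersets without losing any HEWI; phase 2 then removes candidates that passed only because the bound was loose. On the toy data, {a, b} and {b, c} are pruned in phase 1, while {c} and {a, c} survive phase 1 but are rejected in phase 2.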



    Information

    Published In

    Applied Intelligence, Volume 44, Issue 1
    January 2016
    250 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Data mining
    2. Two-phase
    3. Uncertain databases
    4. Upper-bound
    5. Weighted frequent itemsets

    Qualifiers

    • Article

    Bibliometrics

    • Downloads (last 12 months): 0
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 23 Jan 2025

    Cited By

    • (2022) Discovering Top-k Profitable Patterns for Smart Manufacturing. Companion Proceedings of the Web Conference 2022, pp 956-964. doi:10.1145/3487553.3524706. Online publication date: 25-Apr-2022
    • (2022) Discovering probabilistically weighted sequential patterns in uncertain databases. Applied Intelligence 53(6):6525-6553. doi:10.1007/s10489-022-03699-7. Online publication date: 9-Jul-2022
    • (2022) Fast Weighted Sequential Pattern Mining. Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence, pp 807-818. doi:10.1007/978-3-031-08530-7_68. Online publication date: 19-Jul-2022
    • (2020) UHUOPM: High Utility Occupancy Pattern Mining in Uncertain Data. 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp 3066-3071. doi:10.1109/SMC42975.2020.9282878. Online publication date: 11-Oct-2020
    • (2020) UWFP-Outlier: an efficient frequent-pattern-based outlier detection method for uncertain weighted data streams. Applied Intelligence 50(10):3452-3470. doi:10.1007/s10489-020-01718-z. Online publication date: 1-Oct-2020
    • (2020) Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream. Neural Computing and Applications 32(11):6619-6639. doi:10.1007/s00521-018-3876-4. Online publication date: 1-Jun-2020
    • (2019) SPPC. Applied Intelligence 49(2):478-495. doi:10.1007/s10489-018-1280-5. Online publication date: 1-Feb-2019
    • (2018) Efficient algorithms for mining top-rank-k erasable patterns using pruning strategies and the subsume concept. Engineering Applications of Artificial Intelligence 68(C):1-9. doi:10.1016/j.engappai.2017.09.010. Online publication date: 1-Feb-2018
    • (2018) Exploiting highly qualified pattern with frequency and weight occupancy. Knowledge and Information Systems 56(1):165-196. doi:10.1007/s10115-017-1103-8. Online publication date: 1-Jul-2018
    • (2017) Extracting recent weighted-based patterns from uncertain temporal databases. Engineering Applications of Artificial Intelligence 61(C):161-172. doi:10.1016/j.engappai.2017.03.004. Online publication date: 1-May-2017