Abstract
Frequent itemset mining (FIM) is a fundamental research topic, which consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted and uncertain itemset mining have been respectively proposed, where a user may respectively assign weights to items to specify their relative importance, and specify existential probabilities to represent uncertainty in transactions. However, no work has addressed both of these issues at the same time. In this paper, we address this important research problem by designing a new type of patterns named high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. The HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to early prune the search space and unpromising itemsets. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted-itemset mining and uncertain itemset mining.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
(2012) Frequent itemset mining dataset repository. Available: http://fimi.ua.ac.be/data/
Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data, ACM SIGKDD international conference on knowledge discovery and data mining, pp 29–38
Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623
Aggarwal CC (2010) Managing and mining uncertain data, managing and mining uncertain data
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases, the international conference on very large data bases, pp 487–499
Agrawal R, Srikant R (1994) Quest synthetic data generator. Available: http://www.Almaden.ibm.com/cs/quest/syndata.html
Bernecker T, Kriegel HP, Renz M, Verhein F, Zuefl A (2009) Probabilistic frequent itemset mining in uncertain databases, ACM SIGKDD international conference on knowledge discovery and data mining, pp 119–128
Cai CH, Fu AWC, Cheng CH, Kwong WW (1998) Mining association rules with weighted items, the international database engineering and applications symposium, pp 68–77
Chen MS, Han J, Yu PS (1996) Data mining: An overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866– 883
Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data, advances in knowledge discovery and data mining
Geng L, Hamilton H J Interestingness measures for data mining: A survey. ACM Comput Surv 38(3):2006
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87
Lan GC, Hong TP, Lee HY, Lin C W (2013) Mining weighted frequent itemsets, the workshop on combinatorial mathematics and computation theory, pp 85–89
Lan GC, Hong TP, Lee HY (2014) An efficient approach for finding weighted sequential patterns from sequence databases. Appl Intell 41(2):439–452
Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data, advances in knowledge discovery and data mining, pp 653–661
Leung CKS, Hao B (2009) Mining of frequent itemsets from streams of uncertain data, IEEE international conference on data engineering, pp 1663–1670
Lin CW, Hong TP, Lu WH (2009) The Pre-FUFP algorithm for incremental mining. Expert Syst Appl 36(5):9498– 9505
Lin CW, Hong TP (2011) Temporal data mining with up-to-date pattern trees. Expert Syst Appl 38 (12):15143–15150
Lin CW, Hong TP (2012) A new mining approach for uncertain databases using cufp trees. Expert Syst Appl 39(4):4084– 4093
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, et al. (2004) Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements, the international conference on extending database technology: advances in database technology, pp 3–17
Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees, the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 273–282
Tang P, Peterson E A (2011) Mining probabilistic frequent closed itemsets in uncertain databases. The Annual Southeast Regional Conference, pp 86–91
Tao F, Murtagh F, Farid M (2003) Weighted association rule mining using weighted support and significance framework, ACM SIGKDD international conference on knowledge discovery and data mining, pp 661–666
Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. The VLDB Endowment 5(11):1650–1661
Vo B, Coenen F, Le B (2013) A new method for mining frequent weighted itemsets based on wit-trees. Expert Syst Appl 40(4):1256–1264
Wang W, Yang J, Yu PS (2000) Efficient mining of weighted association rules (war), ACM SIGKDD international conference on knowledge discovery and data mining, pp 270–274
Yun U, Leggett J (2005) WFIM: Weighted frequent itemset mining with a weight range and a minimum weight, SIAM international conference on data mining, pp 636–640
Yun U, Leggett J (2006) WSpan: Weighted sequential pattern mining in large sequential database, IEEE international conference on intelligent systems, pp 512–517
Yun U (2008) A new framework for detecting weighted sequential patterns in large sequence databases. Knowl-Based Syst 21(2):110–122
Zaki M, Hsiao C (2002) CHARM: An efficient algorithm for closed itemset mining. SIAM Int Conf Data Min 2:457–473
Acknowledgments
This research was partially supported by the Tencent Project under grant CCF-TencentRAGR20140114, by the Shenzhen Peacock Project, China, under grant KQC201109020055A, by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under grant HIT.NSRIF.2014100, and by the Shenzhen Strategic Emerging Industries Program under grant ZDSY20120613125016389.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lin, J.CW., Gan, W., Fournier-Viger, P. et al. Weighted frequent itemset mining over uncertain databases. Appl Intell 44, 232–250 (2016). https://doi.org/10.1007/s10489-015-0703-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-015-0703-9