Abstract
Two research directions have recently attracted attention in data mining: frequent itemset mining (FIM) and high utility itemset mining (HUIM). FIM outputs the itemsets whose items occur together at least as often as a required threshold, but it ignores the benefit (utility) associated with each item. HUIM algorithms were proposed to overcome this disadvantage, but they use a single utility threshold, which is unsuitable for real-world applications that often require different utility thresholds for different items. HUIM algorithms with multiple utility thresholds have also been proposed, but they suffer from long mining times and high memory consumption. This paper therefore presents an efficient method for Mining High Utility Itemsets with Multiple Utility Thresholds (MHUI-MUT). The method applies upper bounds and pruning strategies to reduce database scanning, and proposes a cut-off threshold to minimize the mining time. We also present a method to parallelize the algorithm so that it makes full use of multi-core computers. The experimental results show that MHUI-MUT is faster than the previous algorithm, and that the parallel version in turn outperforms the proposed sequential algorithm.
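The difference between the FIM and HUIM criteria described above can be illustrated with a small brute-force sketch. This is not the MHUI-MUT algorithm: it applies no upper bounds, pruning, or cut-off threshold, and only enumerates every itemset to show the problem being solved. The toy transactions, unit profits, and per-item thresholds are hypothetical, and the rule that an itemset's threshold is the smallest threshold among its items is an assumption borrowed from earlier multi-threshold work, not something stated in this abstract.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}
# Hypothetical minimum utility threshold assigned to each item.
min_util = {"a": 15, "b": 12, "c": 10, "d": 14}


def support(itemset):
    """FIM measure: number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset):
    """HUIM measure: total quantity * profit of the itemset over the
    transactions that contain all of its items."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def itemset_threshold(itemset):
    """ASSUMPTION: an itemset's threshold is the smallest threshold of its items."""
    return min(min_util[i] for i in itemset)


def mine_brute_force():
    """Enumerate all itemsets and keep those whose utility meets their threshold."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo)
            if u >= itemset_threshold(combo):
                result[combo] = u
    return result


if __name__ == "__main__":
    for itemset, u in mine_brute_force().items():
        print(itemset, "utility =", u, "support =", support(itemset))
```

On this toy data the itemset {b, d} meets its threshold although neither {b} nor {d} does. Utility is therefore not anti-monotone the way support is, which is why HUIM methods rely on upper bounds for pruning rather than on the utility value itself.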
Funding
No funding was received for conducting this study.
Author information
Contributions
Bao Huynh: Methodology, Writing - Original Draft Preparation; N.T. Tung: Data Curation, Software, Validation, Writing - Review & Editing; Trinh D.D. Nguyen: Validation, Writing - Review & Editing; Vaclav Snasel: Validation, Writing - Review & Editing; Loan T.T. Nguyen: Methodology, Validation, Writing - Review & Editing.
Ethics declarations
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Ethical and informed consent for data used.
We use public datasets in our experiments.
About this article
Cite this article
Huynh, B., Tung, N.T., Nguyen, T.D.D. et al. New approaches for mining high utility itemsets with multiple utility thresholds. Appl Intell 54, 767–790 (2024). https://doi.org/10.1007/s10489-023-05145-8
DOI: https://doi.org/10.1007/s10489-023-05145-8