New approaches for mining high utility itemsets with multiple utility thresholds

Applied Intelligence

Abstract

Two research directions have attracted recent attention in data mining: frequent itemset mining (FIM) and high utility itemset mining (HUIM). FIM outputs the itemsets whose items occur together at least as many times as a required threshold, but it ignores the benefit (utility) attached to each item. HUIM algorithms were proposed to overcome this disadvantage, but they use only a single threshold, which is unsuitable for real-world applications that often require different utility thresholds. HUIM algorithms with multiple utility thresholds have also been proposed, but they suffer from long mining times and high memory consumption. This paper therefore presents an efficient method for Mining High Utility Itemsets with Multiple Utility Thresholds (MHUI-MUT). The method applies upper bounds and pruning strategies to reduce database scanning, and proposes a cut-off threshold to minimize the mining time. We also present a method to parallelize the algorithm to make the most of multi-core computers. The experimental results show that MHUI-MUT is faster than the previous algorithm, and that the parallel version in turn outperforms the proposed sequential algorithm.
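To make the distinction between support, utility, and per-item thresholds concrete, the following is a minimal brute-force sketch in Python. It is not the MHUI-MUT algorithm: it applies no upper bounds, pruning, or cut-off threshold, and simply enumerates small itemsets. The toy database, the unit profits, the per-item thresholds, and all function names are hypothetical. The rule that an itemset's threshold is the smallest threshold among its items is an assumption borrowed from common multi-threshold formulations; the abstract does not state the rule the paper uses.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}
# Hypothetical per-item minimum utility thresholds.
min_util = {"a": 20, "b": 15, "c": 10, "d": 40}


def support(itemset):
    """Number of transactions containing every item of the itemset (what FIM counts)."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def threshold(itemset):
    """Assumed rule: an itemset's threshold is the smallest threshold of its items."""
    return min(min_util[i] for i in itemset)


def mine(max_size=2):
    """Brute-force enumeration of itemsets up to max_size; no pruning is applied."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            u = utility(itemset)
            if u >= threshold(itemset):
                result[itemset] = u
    return result


if __name__ == "__main__":
    for itemset, u in mine(2).items():
        print(itemset, "support:", support(itemset), "utility:", u)
```

In this toy data, item d appears in only 2 of the 4 transactions, so a support threshold of 3 would discard it, yet its utility of 50 exceeds its own threshold of 40. Enumerating every candidate like this grows exponentially with the number of items, which is the cost that upper bounds and pruning strategies are meant to avoid.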

Data availability

http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

Notes

  1. http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

  2. http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

Funding

No funding was received for conducting this study.

Author information

Contributions

Bao Huynh: Methodology, Writing-Original draft preparation; N.T. Tung: Data curation, Software, Validation, Writing—Review & Editing; Trinh D.D. Nguyen: Validation, Writing—Review & Editing; Vaclav Snasel: Validation, Writing—Review & Editing; Loan T.T. Nguyen: Methodology, Validation, Writing—Review & Editing.

Corresponding author

Correspondence to Loan Nguyen.

Ethics declarations

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Ethical and informed consent for data used.

We use public datasets in our experiments.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huynh, B., Tung, N.T., Nguyen, T.D.D. et al. New approaches for mining high utility itemsets with multiple utility thresholds. Appl Intell 54, 767–790 (2024). https://doi.org/10.1007/s10489-023-05145-8

  • DOI: https://doi.org/10.1007/s10489-023-05145-8
