STARM: STreaming Association Rules Mining in High-Dimensional Data

  • Conference paper
Advanced Information Networking and Applications (AINA 2024)

Abstract

Predictive analytics uses data mining algorithms to discover knowledge in large databases. Association rule (AR) mining is one of the most widely used data mining techniques in this context. In the Big Data setting, the task becomes data stream mining: the process of extracting knowledge from continuous data streams. This paper proposes STARM (STreaming Association Rules Mining), an efficient, distributed algorithm for mining ARs. Using the transaction-sensitive sliding-window model, STARM applies the Apriori algorithm to data streams to extract frequent itemsets (FIs). It then generates ARs from these itemsets via the Spark Streaming framework. A dimensionality reduction (DR) step is applied during data preprocessing, which can reduce the search space. Experiments show that the proposed streaming model achieves state-of-the-art performance.
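The abstract outlines a pipeline: apply Apriori to a transaction-sensitive sliding window, then generate association rules from the resulting frequent itemsets. The sketch below illustrates that pipeline on a single machine in plain Python. It is not STARM's distributed Spark Streaming implementation and does not include the DR step. It rests on several assumptions:

- The window holds the W most recent transactions and slides by one transaction per arrival.
- Apriori is re-run from scratch on the current window contents.
- Rules are filtered by minimum support and confidence thresholds.

The names `SlidingWindowMiner`, `apriori`, `generate_rules`, `window_size`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import deque
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori over a list of frozensets: return {itemset: count}
    for every itemset whose count is at least min_support * len(transactions)."""
    min_count = min_support * len(transactions)
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: drop candidates that have an infrequent (k-1)-subset
        # (the Apriori / downward-closure property).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one scan of the window per level.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for cand in candidates:
                if cand <= t:  # subset test on frozensets
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


def generate_rules(frequent, n_transactions, min_confidence):
    """Derive rules X -> Y from each frequent itemset Z = X | Y, keeping those
    with confidence count(Z) / count(X) >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                antecedent = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so its count is known.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    support = count / n_transactions
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


class SlidingWindowMiner:
    """Hypothetical count-based (transaction-sensitive) window over the W latest
    transactions. Assumption: Apriori is re-run on the whole window at each
    call to mine(); STARM's distributed Spark Streaming execution is not modeled."""

    def __init__(self, window_size, min_support, min_confidence):
        # deque(maxlen=...) evicts the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.min_support = min_support
        self.min_confidence = min_confidence

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def mine(self):
        transactions = list(self.window)
        frequent = apriori(transactions, self.min_support)
        return generate_rules(frequent, len(transactions), self.min_confidence)


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=4, min_support=0.5, min_confidence=0.6)
    stream = [{"a", "b", "d"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
    for t in stream:
        miner.add(t)  # the fifth arrival evicts {"a", "b", "d"}
    for lhs, rhs, support, confidence in miner.mine():
        print(sorted(lhs), "->", sorted(rhs), f"sup={support:.2f} conf={confidence:.2f}")
```

Re-mining the entire window on each call keeps the sketch short and easy to check, at the cost of repeated work as the window slides. The window length and the two thresholds in the demo are arbitrary illustrative values.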

Author information

Correspondence to Rania Mkhinini Gahar.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mkhinini Gahar, R., Arfaoui, O., Hidri, A., Alsaif, S.A., Sassi Hidri, M. (2024). STARM: STreaming Association Rules Mining in High-Dimensional Data. In: Barolli, L. (eds) Advanced Information Networking and Applications. AINA 2024. Lecture Notes on Data Engineering and Communications Technologies, vol 200. Springer, Cham. https://doi.org/10.1007/978-3-031-57853-3_12
