Abstract
Predictive analytics uses data mining algorithms to discover knowledge in large databases. Association Rules (ARs) mining is one of the most widely used data mining techniques in this context. For Big Data, the relevant setting is data stream mining, the process of extracting knowledge from continuous data streams. In this paper, STARM (STreaming Association Rules Mining) is proposed as an efficient, distributed algorithm for mining ARs. Based on the transaction-sensitive sliding-window model, the Apriori algorithm is applied to data streams to extract frequent itemsets (FIs), from which ARs are then generated using the Spark Streaming framework. A Dimensionality Reduction (DR) step is applied as data preprocessing and may reduce the search space. The conducted experiments show that the proposed streaming model achieves state-of-the-art performance.
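The pipeline described in the abstract (sliding window over transactions, Apriori for frequent itemsets, then rule generation) can be illustrated with a minimal single-machine sketch. This is not the authors' STARM implementation: it uses plain Python instead of Spark Streaming, omits the DR preprocessing step, and the function names (`apriori`, `rules`, `mine_stream`), the window size, and the thresholds are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative helper)."""
    n = len(transactions)
    if n == 0:
        return {}
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) triples from frequent itemsets."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, confidence))
    return out


def mine_stream(stream, window_size, min_support, min_confidence):
    """Yield the rules mined from each full window as transactions arrive."""
    # Transaction-sensitive window: it slides by one on every new transaction,
    # and deque(maxlen=...) drops the oldest transaction automatically.
    window = deque(maxlen=window_size)
    for transaction in stream:
        window.append(frozenset(transaction))
        if len(window) == window_size:
            yield rules(apriori(list(window), min_support), min_confidence)


if __name__ == "__main__":
    # Assumed toy stream; real input would arrive continuously.
    stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
    for i, window_rules in enumerate(mine_stream(stream, 3, 0.6, 0.9)):
        for antecedent, consequent, confidence in window_rules:
            print(i, sorted(antecedent), "->", sorted(consequent), confidence)
```

Re-running Apriori from scratch on every window keeps the sketch short; an incremental or distributed variant would instead update counts as transactions enter and leave the window.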
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mkhinini Gahar, R., Arfaoui, O., Hidri, A., Alsaif, S.A., Sassi Hidri, M. (2024). STARM: STreaming Association Rules Mining in High-Dimensional Data. In: Barolli, L. (eds) Advanced Information Networking and Applications. AINA 2024. Lecture Notes on Data Engineering and Communications Technologies, vol 200. Springer, Cham. https://doi.org/10.1007/978-3-031-57853-3_12
DOI: https://doi.org/10.1007/978-3-031-57853-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57852-6
Online ISBN: 978-3-031-57853-3
eBook Packages: Intelligent Technologies and Robotics (R0)