Abstract
The reliability of weather forecast models is a complex issue, since it depends on numerous parameters and on the technical infrastructure that supports them. Further work is therefore needed to better understand these models and to analyze their main associated parameters. Our approach is to study whether extracted association rules can provide a clearer understanding of atmospheric exchanges. The proposed methodology is based on the discovery of interesting, interpretable relationships between meteorological parameters measured at the Atmospheric Research Center of Lannemezan (South-West of France). In the preprocessing step, the proposed method is flexible enough to account for data uncertainties, unlike the majority of classical evaluation methods, which are mainly directed towards reducing the number of variables and data redundancy. In the postprocessing step, the advantage of our approach is that the extracted rules form a metamodel of useful, interpretable knowledge whose representation is clear and concise. Moreover, regarding the processing step, interpretability in data science is a recent concern and still in its infancy. Overall, the generated association rules, together with their statistical and semantic interpretations, highlight the possibilities for explicit analysis of meteorological parameters. The study shows that, among the relevant rules generated, three parameters (temperature, humidity, and wind speed) appear frequently in the antecedents, and that the only consequent is rain. This is useful for identifying potential improvements and gaps in existing models of atmospheric observations, in particular for understanding the parameterizations related to rain production.
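To make the notions of antecedent, consequent, and rule relevance concrete, the following is a minimal sketch of rule extraction with the standard support, confidence, and lift measures. It is not the authors' implementation: the observations are hand-made toy records, the discretized labels (such as `temp=low`) and the thresholds are assumptions chosen for illustration, and the search is a brute-force enumeration rather than an optimized algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical, hand-made observations (NOT data from the study): each record
# is a set of discretized meteorological conditions.
observations = [
    {"temp=low", "humidity=high", "wind=strong", "rain=yes"},
    {"temp=low", "humidity=high", "wind=weak", "rain=yes"},
    {"temp=high", "humidity=low", "wind=weak", "rain=no"},
    {"temp=high", "humidity=high", "wind=strong", "rain=yes"},
    {"temp=high", "humidity=low", "wind=strong", "rain=no"},
    {"temp=low", "humidity=low", "wind=weak", "rain=no"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= record for record in records) / len(records)


def rules_for(consequent, records, min_support=0.3, min_confidence=0.8, max_len=2):
    """Enumerate rules `antecedent -> consequent` by brute force.

    The thresholds are illustrative assumptions, not values from the study.
    """
    # Candidate antecedent items: everything except the rain labels.
    items = sorted(i for i in set().union(*records) if not i.startswith("rain="))
    sup_consequent = support(frozenset({consequent}), records)
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            sup_antecedent = support(antecedent, records)
            sup_rule = support(antecedent | {consequent}, records)
            if sup_antecedent == 0 or sup_rule < min_support:
                continue
            confidence = sup_rule / sup_antecedent  # estimate of P(consequent | antecedent)
            if confidence >= min_confidence:
                lift = confidence / sup_consequent  # > 1 means positive association
                found.append((sorted(antecedent), sup_rule, confidence, lift))
    return found


rules = rules_for("rain=yes", observations)
for antecedent, sup, conf, lift in rules:
    print(f"{antecedent} -> rain=yes  "
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data, every rule that passes the thresholds has `humidity=high` in its antecedent, which illustrates what it means for a parameter to have a high frequency in the antecedents of rules whose consequent is rain.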
Funding
This research is funded by the Malian Ministry of National Education, the Ecole Normale d’Enseignement Technique et Professionnel (ENETP) of the Université de Bamako in Mali, and the French Embassy in Mali through its Cultural and Cooperation Department (SCAC). The grant ID is N° 954749E, with the reference PRISME 0185MLIB190027.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Coulibaly, L., Kamsu-Foguem, B. & Tangara, F. Explainability with Association Rule Learning for Weather Forecast. SN COMPUT. SCI. 2, 116 (2021). https://doi.org/10.1007/s42979-021-00525-8
DOI: https://doi.org/10.1007/s42979-021-00525-8