Search Results (474)

Search Parameters:
Keywords = stacking ensemble learning

27 pages, 7491 KiB  
Article
Recycled Aggregate Concrete Incorporating GGBS and Polypropylene Fibers Using RSM and Machine Learning Techniques
by Anjali Jaglan and Rati Ram Singh
Buildings 2025, 15(1), 66; https://doi.org/10.3390/buildings15010066 (registering DOI) - 28 Dec 2024
Viewed by 114
Abstract
In this study, Response Surface Methodology (RSM) and machine learning models were used to predict the mechanical properties of recycled aggregate concrete (RAC) containing ground granulated blast furnace slag (GGBS) and polypropylene fibers (PPFs). The investigation focused on compressive strength (CS) and split tensile strength (STS) tests at curing periods of 7, 28, 56, and 90 days, with variations in the percentages of GGBS (0–50%), recycled aggregate (RA) (0–100%), and PPF (0–1%). The RSM model showed high accuracy in predicting both CS and STS, with statistically significant results (p-value < 0.0001). Among the machine learning models, the Gradient Boosting Machine (GBM) exhibited the highest performance, achieving an R² value of 0.98961 during the training and testing phases for CS prediction. It also demonstrated strong results for STS prediction, with an MSE of 0.02773, MAPE of 2.69775, and R² value of 0.99404 in the training phase, and an MSE of 0.14141, MAPE of 5.71691, and R² value of 0.96947 during testing. The Stacked Ensemble Learning model performed similarly to GBM, with an R² of 0.99251 during training for STS and 0.96619 during testing. However, GBM consistently outperformed the other models in terms of balancing low error rates and high R² values across both datasets. The Distributed Random Forest model also provided strong performance but slightly higher error rates and lower R² values than GBM. Overall, both GGBS and PPF significantly enhanced the mechanical properties and workability of the concrete, highlighting the importance of these additives in optimizing concrete performance.
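The three metrics quoted in this abstract (MSE, MAPE, and the coefficient of determination) can be computed as in the minimal sketch below. It assumes MAPE is expressed in percent, which the abstract does not state, and the `measured` and `predicted` values are made-up illustrative numbers, not data from the article.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical strength values (MPa), for illustration only.
measured = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]

print(mse(measured, predicted), mape(measured, predicted), r2(measured, predicted))
```

Reporting the metrics separately for training and testing data, as the abstract does, is what makes the gap between the two phases visible.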
(This article belongs to the Special Issue Solid Waste Management in the Construction Sector)
Figure 1. Research methodology.
Figure 2. Materials used in this study: (a) RA; (b) GGBS; (c) PPF.
Figure 3. Gradation curves of (a) fine aggregates; (b) coarse aggregates.
Figure 4. Experimental Setup.
Figure 5. Average Compressive Strength: (a) At 7 and 28 days; (b) At 56 and 90 days.
Figure 6. STS analysis: (a) At 7 and 28 days; (b) At 56 and 90 days.
Figure 7. (a–c) Response surfaces illustrating the impact of factors on CS; (d) the actual vs. predicted plot.
Figure 8. (a–c) Response surfaces illustrating the impact of factors on the STS; (d) the actual vs. predicted plot.
Figure 9. Measured vs. predicted compressive strength using (a,b) Random Forest; (c,d) Gradient Boosting; (e,f) Stacked Ensemble Models.
Figure 10. Measured vs. predicted split tensile strength using (a,b) Random Forest; (c,d) Gradient Boosting; (e,f) Stacked Ensemble Models.
Figure 11. Scatter plot of predicted and measured compressive strengths.
Figure 12. Scatter plot of predicted and measured split tensile strengths.
14 pages, 1424 KiB  
Article
Rice Disease Classification Using a Stacked Ensemble of Deep Convolutional Neural Networks
by Zhibin Wang, Yana Wei, Cuixia Mu, Yunhe Zhang and Xiaojun Qiao
Sustainability 2025, 17(1), 124; https://doi.org/10.3390/su17010124 - 27 Dec 2024
Viewed by 216
Abstract
Rice is a staple food for almost half of the world’s population, and the stability and sustainability of rice production plays a decisive role in food security. Diseases are a major cause of loss in rice crops. The timely discovery and control of diseases are important in reducing the use of pesticides, protecting the agricultural eco-environment, and improving the yield and quality of rice crops. Deep convolutional neural networks (DCNNs) have achieved great success in disease image classification. However, most models have complex network structures that frequently cause problems, such as redundant network parameters, low training efficiency, and high computational costs. To address this issue and improve the accuracy of rice disease classification, a lightweight deep convolutional neural network (DCNN) ensemble method for rice disease classification is proposed. First, a new lightweight DCNN model (called CG-EfficientNet), which is based on an attention mechanism and EfficientNet, was designed as the base learner. Second, CG-EfficientNet models with different optimization algorithms and network parameters were trained on rice disease datasets to generate seven different CG-EfficientNets, and a resampling strategy was used to enhance the diversity of the individual models. Then, the sequential least squares programming algorithm was used to calculate the weight of each base model. Finally, logistic regression was used as the meta-classifier for stacking. To verify the effectiveness, classification experiments were performed on five classes of rice tissue images: rice bacterial blight, rice kernel smut, rice false smut, rice brown spot, and healthy leaves. The accuracy of the proposed method was 96.10%, which is higher than the results of the classic CNN models VGG16, InceptionV3, ResNet101, and DenseNet201 and four integration methods. 
The experimental results show that the proposed method is not only capable of accurately identifying rice diseases but is also computationally efficient.
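The weighting step mentioned in this abstract (sequential least squares programming over the base models) might look roughly like the sketch below. It uses SciPy's `minimize` with `method="SLSQP"`. The log-loss objective, the simplex constraint on the weights, and the synthetic probability arrays are all assumptions made for illustration, since the abstract does not say what the authors optimized, and the sketch does not cover the logistic-regression stacking stage.

```python
import numpy as np
from scipy.optimize import minimize


def fit_ensemble_weights(probs, y):
    """Find non-negative weights summing to 1 that minimize the log loss of
    the weighted average of base-model class probabilities.

    probs: array of shape (n_models, n_samples, n_classes)
    y:     integer labels of shape (n_samples,)
    """
    n_models = probs.shape[0]

    def log_loss(w):
        blended = np.tensordot(w, probs, axes=1)      # (n_samples, n_classes)
        picked = blended[np.arange(len(y)), y]        # probability of true class
        return -np.mean(np.log(np.clip(picked, 1e-12, 1.0)))

    result = minimize(
        log_loss,
        x0=np.full(n_models, 1.0 / n_models),         # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


# Synthetic stand-ins for three base models of decreasing quality.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)


def noisy_probs(strength):
    logits = rng.normal(size=(300, 3))
    logits[np.arange(300), y] += strength             # boost the true class
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


probs = np.stack([noisy_probs(3.0), noisy_probs(1.0), noisy_probs(0.0)])
weights = fit_ensemble_weights(probs, y)
print(weights)
```

In this toy setup the uninformative third model should receive less weight than the strongest one, which is the behaviour a learned weighting is meant to provide over plain averaging.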
Figure 1. Representative rice images used in this study. (a) Rice bacterial blight, (b) rice brown spot, (c) rice kernel smut, (d) rice false smut, and (e) healthy leaves.
Figure 2. Framework of the proposed method.
Figure 3. Architecture of CG-EfficientNet. (a) Overall structure, (b) MBConv1, and (c) MBConv6.
Figure 4. Structure of the stacking ensemble algorithm.
20 pages, 5692 KiB  
Article
Combining UAV Remote Sensing with Ensemble Learning to Monitor Leaf Nitrogen Content in Custard Apple (Annona squamosa L.)
by Xiangtai Jiang, Lutao Gao, Xingang Xu, Wenbiao Wu, Guijun Yang, Yang Meng, Haikuan Feng, Yafeng Li, Hanyu Xue and Tianen Chen
Agronomy 2025, 15(1), 38; https://doi.org/10.3390/agronomy15010038 - 27 Dec 2024
Viewed by 185
Abstract
One of the most important nutrients needed for fruit tree growth is nitrogen. For orchards to receive targeted, well-informed nitrogen fertilization, accurate, large-scale, real-time monitoring and assessment of nitrogen nutrition is essential. This study examines the Leaf Nitrogen Content (LNC) of the custard apple tree, a noteworthy fruit tree that is extensively grown in China’s Yunnan Province. It uses an ensemble learning technique based on multiple machine learning algorithms to monitor the leaf nitrogen content in the tree canopy effectively and precisely, using multispectral canopy imagery of custard apple trees taken by Unmanned Aerial Vehicle (UAV) across different growth phases. First, canopy shadows and background noise from the soil are removed from the UAV imagery by using spectral shadow indices across growth phases. The noise-filtered imagery is then used to extract a number of vegetation indices (VIs) and textural features (TFs). Correlation analysis is then used to determine which features are most pertinent for LNC estimation. A two-layer ensemble model is built to quantitatively estimate leaf nitrogen using the stacking ensemble learning (Stacking) principles. Random Forest (RF), Adaptive Boosting (ADA), Gradient Boosting Decision Trees (GBDT), Linear Regression (LR), and Extremely Randomized Trees (ERT) are among the base estimators that are integrated in the first layer. By detecting and eliminating redundancy among base estimators, the Least Absolute Shrinkage and Selection Operator (Lasso) regression model used in the second layer improves nitrogen estimation. According to the analysis results, Lasso successfully finds redundant base estimators in the suggested ensemble learning approach, which yields the maximum estimation accuracy for the nitrogen content of custard apple trees’ leaves. With a root mean square error (RMSE) of 0.059 and a mean absolute error (MAE) of 0.193, the coefficient of determination (R²) came to 0.661. The significant potential of UAV-based ensemble learning techniques for tracking nitrogen nutrition in custard apple leaves is highlighted by this work. Additionally, the approaches investigated might offer insightful information and a point of reference for UAV remote sensing applications in nitrogen nutrition monitoring for other crops.
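The two-layer design described in this abstract maps naturally onto scikit-learn's `StackingRegressor`. The sketch below is one possible arrangement, not the authors' implementation: the synthetic data from `make_regression` stands in for the vegetation-index and texture features, and every hyperparameter (tree counts, `alpha`, `cv=5`) is an assumption chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spectral and textural features (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First layer: the five base estimators named in the abstract.
base_estimators = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("gbdt", GradientBoostingRegressor(random_state=0)),
    ("lr", LinearRegression()),
    ("ert", ExtraTreesRegressor(n_estimators=50, random_state=0)),
]

# Second layer: Lasso fitted on out-of-fold predictions of the base estimators.
stack = StackingRegressor(
    estimators=base_estimators,
    final_estimator=Lasso(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)

# One Lasso coefficient per base estimator; a zero marks it as redundant.
weights = dict(zip([name for name, _ in base_estimators], stack.final_estimator_.coef_))
r2_test = stack.score(X_test, y_test)
print(weights, r2_test)
```

Because the L1 penalty can shrink a coefficient exactly to zero, inspecting `weights` gives a direct view of which base estimators the meta-model considers redundant.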
(This article belongs to the Section Precision and Digital Agriculture)
Figure 1. Geographical location of the study area.
Figure 2. DJI Mavic 3M.
Figure 3. (a–f) Comparison between the original images and the images with shadows and soil removed across three growth stages.
Figure 4. Design of an Ensemble Learning Workflow for Estimating LNC in Custard Apple.
Figure 5. Heatmap of spectral features with moderate or stronger correlations with the LNC of custard apple.
Figure 6. Heatmap of the correlation between custard apple LNC and the optimal input variables.
Figure 7. Remote sensing estimation of custard apple leaf nitrogen content using different learning methods. (a–e) Fitting curves of the base models RF, GBDT, ADA, ERT, and LR, respectively; (f) fitting curve of the meta-model; (g) Lasso weight graph.
Figure 8. Remote Sensing Monitoring of Custard Apple Leaf Nitrogen Content Based on UAV Multispectral Imagery. (a–c) Remote sensing monitoring images of leaf nitrogen content in May, August, and November, respectively.
23 pages, 16606 KiB  
Article
Method for Evaluating Urban Building Renewal Potential Based on Multimachine Learning Integration: A Case Study of Longgang and Longhua Districts in Shenzhen
by Dengkuo Sun, Yuefeng Lu, Yong Qin, Miao Lu, Zhenqi Song and Ziqi Ding
Land 2025, 14(1), 15; https://doi.org/10.3390/land14010015 - 25 Dec 2024
Viewed by 132
Abstract
With the continuous advancement of urbanization, urban renewal has become a vital means of enhancing urban functionality and improving living environments. Traditional urban renewal research primarily focuses on the macro level, analyzing regions or units, with limited studies targeting individual buildings. Consequently, the unique characteristics and specific requirements of individual buildings during urban renewal have often been overlooked. This study first identified individual buildings undergoing urban renewal in the Longgang and Longhua Districts of Shenzhen, China, from 2018 to 2023 using multisource data such as the 2018 Shenzhen Building Census. A regression analysis based on building characteristics and locational factors was conducted using a stacking ensemble machine learning model. In addition, buildings were categorized into residential, industrial, and commercial types based on their usage, enabling both overall- and category-specific predictions of building renewal. The results show the following: (1) Using the prediction results of multilayer perceptron (MLP) and eXtreme Gradient Boosting (XGBoost) base models as inputs and fusing them with an AdaBoost classifier as the final metamodel, the goodness of fit of the overall building renewal regression model increased by 2.19%. (2) The regression model achieved an overall urban renewal prediction accuracy of 89.41%. Categorizing urban renewal projects improved the goodness of fit for residential and industrial building renewal by 0.14% and 6.13%, respectively. (3) Compared with traditional macro-level evaluation methods, the experimental results of this study improved by 8.41%, and compared with single-model approaches based on planning permit data, the accuracy improved by 29.11%. Full article
Figure 1. Location of the study area.
Figure 2. Data Example: (a) Updated building space distribution. (b) Main roads. (c) Secondary roads. (d) School locations. (e) Hospital locations. (f) Subway locations. (g) Commercial locations. (h) Park locations. (i) Elevation. (j) Vegetation index. (k) Population. (l) Slope. (m) Gross Domestic Product (GDP). (n) Building Price. (o) Longhua and Longgang Buildings.
Figure 3. Three-dimensional map of building distribution in the study area.
Figure 4. Technical approach flowchart.
Figure 5. Heatmap of feature correlation analysis. (Note: x1: building perimeter, x2: building footprint area, x3: building structural type, x4: building height, x5: number of above-ground floors, x6: number of underground floors, x7: building volume, x8: building attribute, x9: distance to main road, x10: distance to secondary road, x11: distance to school, x12: distance to hospital, x13: distance to subway, x14: distance to park, x15: distance to commercial facilities, x16: population in the building area, x17: slope in the building area, x18: vegetation index in the building area, x19: elevation of the building area, x20: Gross Domestic Product (GDP), x21: Building Data.)
Figure 6. Ensemble model construction method.
Figure 7. City renewal level evaluation results.
Figure 8. Spatial Kernel Density Analysis. (Note: (a): number of buildings updated between 2018 and 2023. (b,c1): overall building renewal kernel density. (c2): industrial building renewal kernel density. (c3): residential building renewal kernel density. (c4): industrial, residential, and commercial buildings in Longgang and Longhua. (d1): overall predicted building renewal kernel density. (d2): industrial predicted building renewal kernel density. (d3): residential predicted building renewal kernel density. (d4): spatial distribution of commercial building renewal. (e1): overlay plot of overall building renewal and predicted building kernel density analysis. (e2): overlay plot of industrial building renewal and predicted building kernel density analysis. (e3): overlay plot of residential building renewal and predicted building kernel density analysis. (e4): distribution of industrial, residential, and commercial building renewal.)
Figure 9. (a) SHAP Values. (b) Bee Swarm Plot. (Note: To ensure that the model can recognize textual data, such as building structure and function factors, these text features need to be encoded. “x3-1”, “x3-2”, “x3-3”, “x3-4”, and “x3-5” represent different building structures: brick and tile structure, frame structure, mixed structure, steel structure, and cylindrical structure, respectively. “x8-1”, “x8-2”, “x8-3”, “x8-4”, “x8-5”, “x8-6” represent different building functions: industrial building, private residential building, residential accessory building, commercial building, residential building, and warehouse building, respectively.)
17 pages, 1429 KiB  
Article
Detection of UAV GPS Spoofing Attacks Using a Stacked Ensemble Method
by Ting Ma, Xiaofeng Zhang and Zhexin Miao
Drones 2025, 9(1), 2; https://doi.org/10.3390/drones9010002 - 24 Dec 2024
Viewed by 209
Abstract
Unmanned aerial vehicles (UAVs) are vulnerable to global positioning system (GPS) spoofing attacks, which can mislead their navigation systems and result in unpredictable catastrophic consequences. To address this issue, we propose a detection method based on stacked ensemble learning that combines convolutional neural network (CNN) and extreme gradient boosting (XGBoost) to detect spoofing signals in the GPS data received by UAVs. First, we applied the synthetic minority oversampling (SMOTE) technique to the dataset to address the issue of class imbalance. Then, we used a CNN model to extract high-level features, combined with the original features as input for the stacked model. The stacked model employs XGBoost as the base learner, which is optimized through five-fold cross-validation, and utilizes logistic regression for the final prediction. Furthermore, we incorporated magnetic field data to enhance the system’s robustness, thereby further improving the accuracy and reliability of GPS spoofing attack detection. Experimental results indicate that the proposed model achieved a high accuracy of 99.79% in detecting GPS spoofing attacks, demonstrating its potential effectiveness in enhancing UAV security. Full article
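The stacking stage described in this abstract, a boosted-tree base learner evaluated with five-fold cross-validation and a logistic-regression final predictor, can be sketched as follows. This is an illustration under several assumptions: scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency, the data is synthetic, and the SMOTE resampling and CNN feature-extraction steps are left out entirely.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for the GPS signal features (assumption).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Base learner; a stand-in for XGBoost, not the model used in the article.
base = GradientBoostingClassifier(random_state=0)

# Five-fold out-of-fold probabilities: each training row is scored by a model
# that never saw it, so the meta-learner is not trained on leaked predictions.
oof = cross_val_predict(base, X_train, y_train, cv=5, method="predict_proba")[:, [1]]

# Meta-learner: logistic regression on the base learner's probability.
meta = LogisticRegression().fit(oof, y_train)

# Refit the base learner on all training data before scoring new samples.
base.fit(X_train, y_train)
test_meta_features = base.predict_proba(X_test)[:, [1]]
accuracy = meta.score(test_meta_features, y_test)
print(accuracy)
```

With a single base learner the meta-model mostly recalibrates its probability; the same pattern extends to several base learners by stacking their out-of-fold columns side by side.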
Figure 1. GPS spoofing attack.
Figure 2. CNN architecture diagram.
Figure 3. GBDT architecture diagram.
Figure 4. XGBoost architecture diagram.
Figure 5. CNN-XGBoost model architecture diagram.
Figure 6. Five-fold cross-validation architecture.
28 pages, 2131 KiB  
Article
A Financial Fraud Prediction Framework Based on Stacking Ensemble Learning
by Shanshan Zhu, Haotian Wu, Eric W. T. Ngai, Jifan Ren, Daojing He, Tengyun Ma and Yubin Li
Systems 2024, 12(12), 588; https://doi.org/10.3390/systems12120588 - 23 Dec 2024
Viewed by 422
Abstract
With the rapid development of the capital market, financial fraud cases are becoming increasingly common. The evolving fraud strategies pose significant threats to financial regulation, market order, and the interests of ordinary investors. In order to combine the generalization performance of different machine learning methods and improve the effectiveness of financial fraud prediction, this paper proposes a novel financial fraud prediction framework based on stacking ensemble learning. This framework, based on data from listed companies, comprehensively considers financial ratio indicators and non-financial indicators. It uses the stacking ensemble technique to integrate numerous base models of machine learning algorithms for predicting financial fraud. Furthermore, the proposed framework has high versatility and is suitable for various tasks related to financial fraud prediction, addressing the problem of model selection difficulties in previous research due to different scenarios and data. We also conducted case studies on specific companies and industries, confirming the significant interpretability and practical applicability of the proposed framework. The results show that the recall rate and Area Under Curve (AUC) of our framework reached 0.8246 and 0.8146, respectively, surpassing mainstream machine learning models such as XGBoost and LightGBM in existing studies. This research study is of great significance for predicting the increasing number of financial fraud cases, providing a reliable tool for financial regulatory institutions and investors. Full article
(This article belongs to the Section Systems Practice in Social Science)
Figure 1. Schematic diagram of the proposed financial fraud prediction framework.
Figure 2. Using cross-validation in the training process of the stacking model.
Figure 3. Fraud prediction with different models in stock exchange (SSE).
Figure 4. Fraud prediction with different models in stock exchange (SZSE).
Figure 5. Fraud prediction for future period (next 2 years).
Figure 6. Fraud prediction for future period (next 3 years).
Figure 7. Prediction of specific fraudulent behavior (fabricated profits).
Figure 8. Prediction of specific fraudulent behavior (misreported assets).
Figure 9. Prediction of specific fraudulent behavior (false records (misleading statements)).
Figure 10. Prediction of specific fraudulent behavior (improper general accounting practices).
Figure 11. The PDP for the top five features of the ExtraTrees model (ownership).
Figure 12. The PDP for the top five features of the ExtraTrees model (report reliability index).
Figure 13. The PDP for the top five features of the ExtraTrees model (legal compliance index).
Figure 14. The PDP for the top five features of the ExtraTrees model (Largest Holder Rate).
Figure 15. The PDP for the top five features of the ExtraTrees model (Common Stock Earnings Yield A).
13 pages, 1076 KiB  
Article
BagStacking: An Integrated Ensemble Learning Approach for Freezing of Gait Detection in Parkinson’s Disease
by Seffi Cohen, Nurit Cohen-Inger and Lior Rokach
Information 2024, 15(12), 822; https://doi.org/10.3390/info15120822 - 23 Dec 2024
Viewed by 278
Abstract
This study introduces BagStacking, an innovative ensemble learning framework designed to enhance the detection of freezing of gait (FOG) in Parkinson’s disease (PD) using accelerometer data. By synergistically combining bagging’s variance reduction with stacking’s sophisticated blending mechanisms, BagStacking achieves superior predictive performance. Evaluated on a comprehensive PD dataset provided by the Michael J. Fox Foundation, BagStacking attained a mean average precision (MAP) of 0.306, surpassing standalone LightGBM and traditional stacking methods. Furthermore, BagStacking demonstrated superior area under the curve (AUC) metrics across key FOG event classes. Specifically, it achieved AUCs of 0.88 for start hesitation, 0.90 for turning, and 0.84 for walking events, outperforming multistrategy ensemble, regular stacking, and LightGBM baselines. Additionally, BagStacking exhibited reduced runtime compared to other ensemble approaches, making it suitable for real-time clinical monitoring. These results underscore BagStacking’s effectiveness in addressing the variability inherent in FOG detection, thereby contributing to improved patient care in PD. Full article
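The combination of bagging and stacking described here can be sketched in a few lines: train each base model on a bootstrap sample, apply all base models to the original training set, and fit a meta-learner on those predictions. The code below is a loose illustration of that idea, not the authors' code. The shallow decision trees, the logistic-regression meta-learner, the function names, and the synthetic data are all assumptions (the article compares against LightGBM, which is not used here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


def base_outputs(models, X):
    """Stack each base model's positive-class probability as one column."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


def fit_bagstacking(X, y, n_models=10, seed=0):
    """Hypothetical helper: bootstrap-trained base models plus a meta-learner."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample of the training set
        tree = DecisionTreeClassifier(max_depth=3, random_state=0)
        models.append(tree.fit(X[idx], y[idx]))
    # Apply the base models to the ORIGINAL training set, then train the meta-learner.
    meta = LogisticRegression().fit(base_outputs(models, X), y)
    return models, meta


def predict_bagstacking(models, meta, X):
    """Feed base-model outputs for new instances to the meta-learner."""
    return meta.predict(base_outputs(models, X))


# Synthetic binary data standing in for windowed accelerometer features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

models, meta = fit_bagstacking(X_train, y_train)
accuracy = float(np.mean(predict_bagstacking(models, meta, X_test) == y_test))
print(accuracy)
```

Replacing the simple average of a bagged ensemble with a trained meta-learner is the design choice that distinguishes this scheme from plain bagging.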
(This article belongs to the Special Issue Application of Machine Learning in Human Activity Recognition)
Figure 1. BagStacking method overview: D—Bootstrap sampling the training set S, M—Training the base models, P—Apply the base models on the original training set, M’—Train the meta learner on the base models predictions, $\hat{y}_{\mathrm{bagstacking}}$—Apply base models to new instance; feed outputs to meta-learner for final prediction.
Figure 2. Raw accelerometer data for the vertical (AccV), mediolateral (AccML), and anteroposterior (AccAP) axes over a 5-s window.
Figure 3. Examples of feature transformations: time-domain features (mean, standard deviation), frequency-domain features (PSD mean, PSD median), and wavelet-domain features (wavelet coefficient means at levels 0 and 1) for the first five windows.
Figure 4. AUC comparison of different methods across FOG event classes. BagStacking consistently outperforms other methods in start hesitation, turning, and walking event classes.
21 pages, 4604 KiB  
Article
Integrative Stacking Machine Learning Model for Small Cell Lung Cancer Prediction Using Metabolomics Profiling
by Md. Shaheenur Islam Sumon, Marwan Malluhi, Noushin Anan, Mohannad Natheef AbuHaweeleh, Hubert Krzyslak, Semir Vranic, Muhammad E. H. Chowdhury and Shona Pedersen
Cancers 2024, 16(24), 4225; https://doi.org/10.3390/cancers16244225 - 18 Dec 2024
Viewed by 397
Abstract
Background: Small cell lung cancer (SCLC) is an extremely aggressive form of lung cancer, characterized by rapid progression and poor survival rates. Despite the importance of early diagnosis, the current diagnostic techniques are invasive and restricted. Methods: This study presents a novel stacking-based ensemble machine learning approach for classifying small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) using metabolomics data. The analysis included 191 SCLC cases, 173 NSCLC cases, and 97 healthy controls. Feature selection techniques identified significant metabolites, with positive ions proving more relevant. Results: For multi-class classification (control, SCLC, NSCLC), the stacking ensemble achieved 85.03% accuracy and 92.47 AUC using Support Vector Machine (SVM). Binary classification (SCLC vs. NSCLC) further improved performance, with ExtraTreesClassifier reaching 88.19% accuracy and 92.65 AUC. SHapley Additive exPlanations (SHAP) analysis revealed key metabolites like benzoic acid, DL-lactate, and L-arginine as significant predictors. Conclusions: The stacking ensemble approach effectively leverages multiple classifiers to enhance overall predictive performance. The proposed model effectively captures the complementary strengths of different classifiers, enhancing the detection of SCLC and NSCLC. This work accentuates the potential of combining metabolomics with advanced machine learning for non-invasive early lung cancer subtype detection, offering an alternative to conventional biopsy methods. Full article
(This article belongs to the Collection Diagnosis and Treatment of Primary and Secondary Lung Cancers)
Figure 1. Proposed Stacking Ensemble Model.
Figure 2. Overview of the methodology employed in this study.
Figure 3. Features ranked using the XGBoost feature selection algorithm for multi-class classification.
Figure 4. Features ranked using the Extra Trees feature selection algorithm for binary classification.
Figure 5. Accuracy of top features for multiclass classification.
Figure 6. Top Feature Accuracy for Binary Classifications.
Figure 7. AUC-ROC curve for the stacking-based SVM classifier in multi-class classification.
Figure 8. AUC-ROC curve for the stacking-based ExtraTrees classifier in binary classification.
Figure 9. SHAP summary plot for the multi-class classification model.
Figure 10. SHAP summary plot for the binary classification model.
Figure 11. Local explanations of a representative sample are shown in two forms: (A) a force plot illustrating an SCLC prediction and (B) a waterfall plot displaying the same prediction.
19 pages, 2563 KiB  
Article
Optimization of Cocoa Pods Maturity Classification Using Stacking and Voting with Ensemble Learning Methods in RGB and LAB Spaces
by Kacoutchy Jean Ayikpa, Abou Bakary Ballo, Diarra Mamadou and Pierre Gouton
J. Imaging 2024, 10(12), 327; https://doi.org/10.3390/jimaging10120327 - 18 Dec 2024
Viewed by 471
Abstract
Determining the maturity of cocoa pods early is about more than guaranteeing harvest quality and optimizing yield; it is also about efficient resource management. Rapid identification of the maturity stage helps avoid losses from a premature or late harvest and so improves productivity, since immature or overripe pods cannot produce premium cocoa beans. Our research applies artificial intelligence and computer vision to assess cocoa pod maturity accurately. An objective and rapid assessment enables farmers to make informed decisions about the optimal time to harvest, helping to maximize the yield of their plantations. Automating this process also reduces the margin for human error and improves the management of agricultural resources. With this in mind, our study uses a computer vision method based on the GLCM (gray level co-occurrence matrix) algorithm to extract image features in the RGB (red, green, blue) and LAB (lightness, red-green axis, yellow-blue axis) color spaces. This approach allows for in-depth image analysis, which is essential for capturing the nuances of cocoa pod maturity. Next, we apply classification algorithms to identify the best performers. These algorithms are then combined via stacking and voting, so that the final model draws on the strengths of each method and gives more robust and precise results. The combination of algorithms produced superior performance, especially in the LAB color space, where voting reached an accuracy of 98.49% and stacking 98.71%. In the RGB color space, voting reached 96.59% and stacking 97.06%. These results surpass those generally reported in the literature, showing that combined approaches improve the accuracy of classification models. This highlights the importance of exploring ensemble techniques to maximize performance in complex contexts such as cocoa pod maturity classification.
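To make the GLCM step in the abstract above concrete, here is a minimal sketch of a gray-level co-occurrence matrix and three common texture features computed from it. The abstract does not state the pixel offsets, the number of gray levels, or the features used, so `dx`, `dy`, `levels`, and the feature set are assumptions, and the 4x4 image is invented. In a setting like the one described, a function of this kind would presumably be applied to each quantized channel of the RGB or LAB image.

```python
import math


def glcm(image, dx=1, dy=0, levels=4, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, homogeneity, and energy of a normalized GLCM."""
    idx = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    energy = math.sqrt(sum(p[i][j] ** 2 for i in idx for j in idx))
    return {"contrast": contrast, "homogeneity": homogeneity, "energy": energy}


# Hypothetical single channel, already quantized to 4 gray levels.
channel = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(channel))
```

The resulting feature dictionary is the kind of per-channel vector that could then be passed to the classifiers combined by voting or stacking.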
(This article belongs to the Special Issue Imaging Applications in Agriculture)
Figure 1. Diagram representing the voting process.
Figure 2. Illustration of the stacking process of the algorithms in our study.
Figure 3. The overall architecture of our method.
Figure 4. Histogram of model performance comparison (accuracy) in the RGB space.
Figure 5. Confusion matrix of the best-performing models in the RGB color space.
Figure 6. Histogram of model performance comparison (accuracy) in the LAB space.
Figure 7. Confusion matrix of the best-performing models in the LAB color space.
Figure 8. Histogram of algorithm performance in RGB and LAB color spaces.
16 pages, 361 KiB  
Article
Stroke Dataset Modeling: Comparative Study of Machine Learning Classification Methods
by Kalina Kitova, Ivan Ivanov and Vincent Hooper
Algorithms 2024, 17(12), 571; https://doi.org/10.3390/a17120571 - 13 Dec 2024
Viewed by 430
Abstract
Stroke prediction is a vital research area due to its significant implications for public health. This comparative study offers a detailed evaluation of algorithmic methodologies and outcomes from three recent prominent studies on stroke prediction. Ivanov et al. tackled issues of imbalanced datasets and algorithmic bias using deep learning techniques, achieving notable results with 98% accuracy and a 97% recall rate. They utilized resampling methods to balance the classes and advanced imputation techniques to handle missing data, underscoring the critical role of data preprocessing in enhancing the performance of Support Vector Machines (SVMs). Hassan et al. addressed missing data and class imbalance using multiple imputations and the Synthetic Minority Oversampling Technique (SMOTE). They developed a Dense Stacking Ensemble (DSE) model with over 96% accuracy. Their results underscore the efficiency of ensemble learning techniques and imputation for handling imbalanced datasets in stroke prediction. Bathla et al. employed various classifiers and feature selection techniques, including SMOTE, for class balancing. Their Random Forest (RF) classifier, combined with Feature Importance (FI) selection, achieved an accuracy of 97.17%, illustrating the positive impact of RF and relevant feature selection on model performance. A comparative analysis indicated that Ivanov et al.’s method achieved the highest accuracy rate. However, the studies collectively highlight that the choice of models and techniques for stroke prediction should be tailored to the specific characteristics of the dataset used. This study emphasizes the importance of effective data management and model selection in enhancing predictive performance.
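Since SMOTE appears in two of the studies compared above, a minimal sketch of its core idea may help: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. This is an illustrative simplification, not the implementation used in any of the cited studies; the function name, parameters, and data points are invented.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class samples with two features each.
minority_points = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote(minority_points, n_new=6, k=2)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.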
(This article belongs to the Special Issue Algorithms in Data Classification (2nd Edition))
Figure 1. Sparsity matrix for the dataset.
18 pages, 12816 KiB  
Article
Monthly Runoff Prediction Based on Stochastic Weighted Averaging-Improved Stacking Ensemble Model
by Kaixiang Fu, Xutong Sun, Kai Chen, Li Mo, Wenjing Xiao and Shuangquan Liu
Water 2024, 16(24), 3580; https://doi.org/10.3390/w16243580 (registering DOI) - 12 Dec 2024
Viewed by 427
Abstract
The accuracy of monthly runoff predictions is crucial for decision-making and efficiency in areas such as water resources management, flood control and disaster mitigation, hydraulic engineering scheduling, and agricultural irrigation. The traditional Stacking ensemble method ignores the correlation between base models across different folds during prediction. To address this problem and further improve the accuracy of monthly runoff prediction, this paper proposes a novel Stacking multi-scale ensemble learning model (SWA–FWWS) based on stochastic weighted averaging and a K-fold cross-validation weighted ensemble. The model is evaluated and compared with base models and other multi-model ensemble models on the runoff prediction of two reservoirs, upstream and downstream, on a river. The results show that the proposed model exhibits excellent performance and adaptability in monthly runoff prediction, with an average RMSE reduction of 6.44% compared to traditional Stacking models. This provides a new research direction for the application of ensemble models in reservoir monthly runoff prediction.
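The headline figure in the abstract above is a relative RMSE reduction. As a small sketch of how such a figure is usually computed (the abstract does not give the underlying RMSE values, so the numbers below are invented purely for illustration):

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, improved_rmse):
    """Percentage by which improved_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - improved_rmse) / baseline_rmse


# Hypothetical monthly runoff values and two sets of predictions.
observed = [120.0, 95.0, 210.0, 330.0]
baseline_pred = [100.0, 110.0, 190.0, 300.0]
improved_pred = [110.0, 100.0, 200.0, 315.0]
gain = relative_reduction(rmse(observed, baseline_pred), rmse(observed, improved_pred))
```

Under this definition, a 6.44% reduction would mean, for example, a baseline RMSE of 100 falling to 93.56 in the same units.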
(This article belongs to the Section Hydrology)
Figure 1. The structure of the dilated causal convolution stack.
Figure 2. Diagram of TCN residual connection network.
Figure 3. Cosine annealing learning rate curve.
Figure 4. Flow chart of the SWA–FWWS model for monthly runoff prediction.
Figure 5. Precipitation and runoff data of A and B reservoirs from 1953 to 2012.
Figure 6. Improvements of the FWWS model compared with other models.
Figure 7. Improvements of the SWA–FWWS model compared with other models.
Figure 8. RMSE, $SD_{bias}$, Bias, and DISP plots of the prediction results of each model on the A dataset.
Figure 9. RMSE, $SD_{bias}$, Bias, and DISP plots of the prediction results of each model on the B dataset.
Figure 10. Comparison of the predictions and observations of the ensemble models on the A test set.
Figure 11. Comparison of the predictions and observations of the ensemble models on the B test set.
17 pages, 12137 KiB  
Article
Ensemble Learning for Oat Yield Prediction Using Multi-Growth Stage UAV Images
by Pengpeng Zhang, Bing Lu, Jiali Shang, Xingyu Wang, Zhenwei Hou, Shujian Jin, Yadong Yang, Huadong Zang, Junyong Ge and Zhaohai Zeng
Remote Sens. 2024, 16(23), 4575; https://doi.org/10.3390/rs16234575 - 6 Dec 2024
Viewed by 500
Abstract
Accurate crop yield prediction is crucial for optimizing cultivation practices and informing breeding decisions. Integrating UAV-acquired multispectral datasets with advanced machine learning methodologies has markedly improved the accuracy of crop yield forecasting. This study aimed to construct a robust and versatile yield prediction model for multi-genotyped oat varieties by investigating 14 modeling scenarios that combine multispectral data from four key growth stages. To predict oat yield, an ensemble learning framework, StackReg, was constructed by stacking four base algorithms: ridge regression (RR), support vector machines (SVM), Cubist, and extreme gradient boosting (XGBoost). The results show that, for single growth stages, base models achieved R² values of 0.02 to 0.60 and RMSEs of 391.50 to 620.49 kg/ha. By comparison, StackReg improved performance, with R² values of 0.25 to 0.61 and RMSEs of 385.33 to 542.02 kg/ha. In dual-stage and multi-stage settings, StackReg consistently surpassed the base models, reaching R² values of up to 0.65 and RMSE values as low as 371.77 kg/ha. These findings underscore the potential of combining UAV-derived multispectral imagery with ensemble learning for high-throughput phenotyping and yield forecasting, advancing precision agriculture in oat cultivation.
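The stacked regression idea in the abstract above can be sketched in a few lines: each base learner produces out-of-fold predictions, and a meta-learner is fitted on those predictions rather than on the raw features. The sketch below is not StackReg. It swaps in two toy base learners (a mean predictor and a one-variable least-squares line) for RR, SVM, Cubist, and XGBoost, uses a plain least-squares meta-learner, and runs on invented data; all function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Toy base learner 2: ordinary least-squares line through (x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def out_of_fold(fit, xs, ys, k):
    """Predict every sample with a model that did not see it during training."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):  # held-out indices of this fold
            preds[i] = model(xs[i])
    return preds


def fit_meta(z1, z2, ys):
    """Meta-learner: least-squares weights for y ~ w1*z1 + w2*z2 (no intercept)."""
    a = sum(u * u for u in z1)
    b = sum(u * v for u, v in zip(z1, z2))
    d = sum(v * v for v in z2)
    e = sum(u * y for u, y in zip(z1, ys))
    f = sum(v * y for v, y in zip(z2, ys))
    det = a * d - b * b  # solve the 2x2 normal equations directly
    return (e * d - b * f) / det, (a * f - b * e) / det


def fit_stack(xs, ys, k=3):
    """Fit meta-weights on out-of-fold predictions, then refit bases on all data."""
    w1, w2 = fit_meta(out_of_fold(fit_mean, xs, ys, k),
                      out_of_fold(fit_line, xs, ys, k), ys)
    m1, m2 = fit_mean(xs, ys), fit_line(xs, ys)
    return lambda x: w1 * m1(x) + w2 * m2(x)


# Hypothetical data: one predictor (e.g. a vegetation index) and a yield-like target.
xs = list(range(12))
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 19.1, 20.8, 23.0]
stacked_model = fit_stack(xs, ys)
```

Fitting the meta-learner on out-of-fold predictions is the design choice that matters here: it keeps the meta-learner from simply rewarding whichever base model overfits the training data most.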
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)
Figure 1. Study area and experimental layout with P4M UAV images taken on 25 July 2022 and 25 July 2023.
Figure 2. Model setup and stacked regression framework for oat yield prediction.
Figure 3. Distribution of UAV-captured spectral reflectance at different growth stages (a), and total oat yield distribution (b). Jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4).
Figure 4. Correlation between spectral variables and oat yield across different growth stages: jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4).
Figure 5. The statistical distribution of base and ensemble learning models’ prediction accuracy (R²) for oat yield prediction using UAV imagery from individual growth stages. (a) P1, jointing; (b) P2, heading; (c) P3, early-grain filling; and (d) P4, mid-grain filling. Statistical significance markers (* p < 0.05, ** p < 0.01, *** p < 0.001, and ns p ≥ 0.05) represent differences in prediction performance between the StackReg model and the base models.
Figure 6. The statistical distribution of base and ensemble learning models’ prediction accuracy (RMSE) for oat yield prediction using UAV imagery from individual growth stages. (a) P1, jointing; (b) P2, heading; (c) P3, early-grain filling; and (d) P4, mid-grain filling.
Figure 7. The statistical distribution of base and ensemble learning models’ prediction accuracy (R²) for oat yield prediction using dual-stage UAV imagery. Growth stages include jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4). (a–c) P1 paired with P2, P3, and P4; (d–f) P2 paired with P3 and P4, and P3 paired with P4. Statistical significance markers (** p < 0.01, *** p < 0.001, and ns p ≥ 0.05) represent differences in prediction performance between the StackReg model and the base models.
Figure 8. The statistical distribution of base and ensemble learning models’ prediction accuracy (RMSE) for oat yield prediction using dual-stage UAV imagery. Growth stages include jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4). (a–c) P1 paired with P2, P3, and P4; (d–f) P2 paired with P3 and P4, and P3 paired with P4.
Figure 9. The statistical distribution of base and ensemble learning models’ prediction accuracy (R²) for oat yield prediction using multi-stage UAV imagery. Growth stages include jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4). (a–c) Three-stage combinations (P123, P124, and P234); (d) four-stage combination (P1234). Statistical significance markers (* p < 0.05, ** p < 0.01, *** p < 0.001, and ns p ≥ 0.05) represent differences in prediction performance between the StackReg model and the base models.
Figure 10. The statistical distribution of base and ensemble learning models’ prediction accuracy (RMSE) for oat yield prediction using multi-stage UAV imagery. Growth stages include jointing (P1), heading (P2), early-grain filling (P3), and mid-grain filling (P4). (a–c) Three-stage combinations (P123, P124, and P234); (d) four-stage combination (P1234).
30 pages, 18170 KiB  
Article
Performance Assessment of Individual and Ensemble Learning Models for Gully Erosion Susceptibility Mapping in a Mountainous and Semi-Arid Region
by Meryem El Bouzekraoui, Abdenbi Elaloui, Samira Krimissa, Kamal Abdelrahman, Ali Y. Kahal, Sonia Hajji, Maryem Ismaili, Biraj Kanti Mondal and Mustapha Namous
Land 2024, 13(12), 2110; https://doi.org/10.3390/land13122110 - 6 Dec 2024
Viewed by 711
Abstract
High-accuracy gully erosion susceptibility maps play a crucial role in erosion vulnerability assessment and risk management. The principal purpose of the present research is to evaluate the predictive power of individual machine learning models such as random forest (RF), decision tree (DT), and support vector machine (SVM), and of ensemble machine learning approaches such as stacking, voting, bagging, and boosting, with k-fold cross-validation resampling, for modeling gully erosion susceptibility in the Oued El Abid watershed in the Moroccan High Atlas. A dataset comprising 200 gully points, identified through field observations and high-resolution Google Earth imagery, was used, alongside 21 gully erosion conditioning factors selected based on their importance, information gain, and multi-collinearity analysis. The exploratory results indicate that all derived gully erosion susceptibility maps had good accuracy for both individual and ensemble models. Based on the receiver operating characteristic (ROC), the RF and the SVM models had better predictive performances, with AUC = 0.82, than the DT model. However, ensemble models significantly outperformed individual models. Among the ensembles, the RF-DT-SVM stacking model achieved the highest predictive accuracy, with an AUC value of 0.86, highlighting its robustness and superior predictive capability. The prioritization results also confirmed the RF-DT-SVM ensemble model as the best. These findings highlight the superiority of ensemble learning models over individual ones and underscore their potential for application in similar geo-environmental contexts.
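The AUC values quoted in the abstract above summarize a ROC curve in a single number. One way to read that number: it is the probability that a randomly chosen positive sample (here, a gully point) receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch of that pairwise definition, using invented labels and scores:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores: 1 = gully, 0 = no gully.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

This pairwise form is quadratic in the number of samples, which is fine for a dataset of a few hundred points; rank-based formulas give the same value more efficiently on larger sets.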
(This article belongs to the Special Issue Artificial Intelligence for Soil Erosion Prediction and Modeling)
Figure 1. Geographic situation of the study area (a) at national scale and (b) at regional scale and (c) digital elevation model of the study area.
Figure 2. Geological map of the study area.
Figure 3. Methodological flowchart of this study.
Figure 4. Location of gullies and no gullies in the study area.
Figure 5. Recent field photographs of gully erosion in the study area.
Figure 6. Gully conditioning factors: (a) elevation, (b) slope, (c) aspect, (d) curvature, (e) plan curvature, (f) profile curvature, (g) rainfall, (h) LULC, (i) NDVI, (j) drainage density, (k) distance to river, (l) distance to roads, (m) geomorphons, (n) TWI, (o) SPI, (p) TPI, (q) TRI, (r) LS, (s) convergence, (t) lithology, (u) valley depth.
Figure 7. The correlation matrix of conditioning factors.
Figure 8. Predictive capabilities using the information gain method.
Figure 9. Importance of selected factors using the random forest model.
Figure 10. Gully erosion susceptibility maps predicted by (a) RF, (b) DT, (c) SVM, (d) RF-DT, (e) RF-SVM, (f) DT-SVM, (g) RF-DT-SVM, (h) voting, (i) bagging, (j) AdaBoost, (k) GBoost.
Figure 11. Percentages of gully erosion susceptibility classes.
Figure 12. The receiver operating characteristic (ROC) curves: success rate (training data) and predictive rate (testing data).
Figure 13. Model prioritization using training and testing data.
21 pages, 5660 KiB  
Article
EWAIS: An Ensemble Learning and Explainable AI Approach for Water Quality Classification Toward IoT-Enabled Systems
by Nermeen Gamal Rezk, Samah Alshathri, Amged Sayed and Ezz El-Din Hemdan
Processes 2024, 12(12), 2771; https://doi.org/10.3390/pr12122771 - 5 Dec 2024
Viewed by 597
Abstract
In the context of smart cities with advanced Internet of Things (IoT) systems, ensuring the sustainability and safety of freshwater resources is pivotal for public health and urban resilience. This study introduces EWAIS (Ensemble Learning and Explainable AI System), a novel framework designed for the smart monitoring and assessment of water quality. Leveraging the strengths of Ensemble Learning models and Explainable Artificial Intelligence (XAI), EWAIS not only enhances the prediction accuracy of water quality but also provides transparent insights into the factors influencing these predictions. EWAIS integrates multiple Ensemble Learning models (Extra Trees Classifier (ETC), K-Nearest Neighbors (KNN), AdaBoost Classifier, decision tree (DT), Stacked Ensemble, and Voting Ensemble Learning (VEL)) to classify water as drinkable or non-drinkable. The system incorporates advanced techniques for handling missing data and statistical analysis, ensuring robust performance even in complex urban datasets. To address the opacity of traditional Machine Learning models, EWAIS employs XAI methods such as SHAP and LIME, generating intuitive visual explanations like force plots, summary plots, dependency plots, and decision plots. The system achieves high predictive performance, with the VEL model reaching an accuracy of 0.89 and an F1-Score of 0.85, alongside precision and recall scores of 0.85 and 0.86, respectively. These results demonstrate the proposed framework’s capability to deliver both accurate water quality predictions and actionable insights for decision-makers. By providing a transparent and interpretable monitoring system, EWAIS supports informed water management strategies, contributing to the sustainability and well-being of urban populations. This framework has been validated using controlled datasets, with IoT implementation suggested to enhance water quality monitoring in smart city environments.
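As a rough illustration of a voting ensemble and the metrics quoted in the abstract above, the sketch below combines the class labels of several models by simple majority and then scores the result. The abstract does not say whether VEL uses hard (label) or soft (probability) voting, so hard voting is an assumption here, and the three prediction lists and the ground truth are invented.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: for each sample, return the label most models agree on."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def precision_recall_f1(truth, predicted, positive=1):
    """Precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical per-model labels: 1 = potable, 0 = not potable.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]
truth = [1, 0, 0, 1, 0]

voted = majority_vote([model_a, model_b, model_c])
precision, recall, f1 = precision_recall_f1(truth, voted)
```

Using an odd number of models avoids ties in a two-class problem. F1 is the harmonic mean of precision and recall, so the reported F1-Score of 0.85 is consistent with the reported precision of 0.85 and recall of 0.86.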
Figure 1. Proposed IoT-based Smart water system.
Figure 2. Water quality system using Ensemble Learning and Explainable AI in the context of a smart city.
Figure 3. Features distribution in the water data.
Figure 4. Statistical analysis of the water quality features of the full dataset.
Figure 5. Statistical analysis of water quality features (with label 0).
Figure 6. Statistical analysis of water quality features (with label 1).
Figure 7. Water quality label distribution [1 (potable) and 0 (not potable)].
Figure 8. Comparative study between the proposed model and existing models using mean imputation.
Figure 9. Comparative study between the proposed model and existing models using light boosting imputation.
Figure 10. SHAP feature relevance determined by mean absolute SHAP values.
Figure 11. SHAP summary plot for VEL predictions.
Figure 12. SHAP summary plot for VEL predictions.
Figure 13. SHAP waterfall plots corresponding to water features.
Figure 14. LIME plots of water features.
Figure 15. Suggested IoT system for smart water quality system.
24 pages, 2247 KiB  
Article
Enhancing Intrusion Detection Systems with Dimensionality Reduction and Multi-Stacking Ensemble Techniques
by Ali Mohammed Alsaffar, Mostafa Nouri-Baygi and Hamed Zolbanin
Algorithms 2024, 17(12), 550; https://doi.org/10.3390/a17120550 - 3 Dec 2024
Viewed by 532
Abstract
The deployment of intrusion detection systems (IDSs) is essential for protecting network resources and infrastructure against malicious threats. Despite the wide use of various machine learning methods in IDSs, such systems often struggle to achieve optimal performance. The key challenges include the curse of dimensionality, which significantly impacts IDS efficacy, and the limited effectiveness of single learning classifiers in handling complex, imbalanced, and multi-categorical traffic datasets. To overcome these limitations, this paper presents an innovative approach that integrates dimensionality reduction and stacking ensemble techniques. We employ the LogitBoost algorithm with XGBRegressor for feature selection, complemented by a Residual Network (ResNet) deep learning model for feature extraction. Furthermore, we introduce multi-stacking ensemble (MSE), a novel ensemble method, to enhance attack prediction capabilities. The evaluation on benchmark datasets such as CICIDS2017 and UNSW-NB15 demonstrates that our IDS surpasses current models across various performance metrics.
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)
Figure 1. The proposed method.
Figure 2. Imbalance ratio of categories in D1.
Figure 3. Imbalance ratio of categories in D2.
Figure 4. Structure of ResNet.
Figure 5. Structure of the residual block.
Figure 6. Multiclass confusion matrix for D1.
Figure 7. Multiclass confusion matrix for D2.