1. Introduction
Alzheimer’s disease (AD) is a progressive neurological illness that affects memory, cognition, and behavior in older adults and is the most common cause of dementia [1,2,3]. An accurate and timely AD diagnosis is needed for effective disease management and early intervention, which in turn improves patient care and widens the therapeutic options. For clinicians, diagnosing AD precisely is challenging because the disease is complex and heterogeneous. In recent years, machine learning has emerged as a powerful tool in medical diagnosis, with the potential to augment traditional diagnostic approaches and improve accuracy [4,5].
Figure 1 represents the IoT-based patient monitoring system. This study aims to improve AD diagnosis accuracy by integrating several machine learning algorithms. A multi-algorithm approach seeks to overcome the limitations of individual models by combining the predictive capabilities of multiple algorithms.
The method works with a dataset that contains missing values. Imputation techniques are employed to address them, preserving dataset integrity and analysis quality [6,7,8]. A feature selection algorithm then identifies the dataset characteristics that matter most for accurate AD prediction. Class imbalance, common in medical datasets, is tackled using the Synthetic Minority Oversampling Technique (SMOTE) [9,10], which reduces bias toward the majority class and makes disease prediction more reliable. To further improve AD prediction accuracy, an automated machine learning method is proposed: an ensemble classification algorithm combines multiple predictive models [11], improving the reliability of AD detection by aggregating the results of several algorithms. Feature extraction derives new features from the data, while feature selection picks the significant ones. Appropriate feature selection algorithms are vital for accurate prediction [12]. Algorithms such as Mutual Information Scores, Relief, and Recursive Feature Elimination select the essential features, and a Univariate Analysis assigns a significance score to each feature. Class distribution is balanced with SMOTE [13], which addresses the unbalanced dataset issue by increasing the number of minority class rows, thereby boosting classifier accuracy on the minority class [14].
Other studies have used advanced machine learning methods such as deep learning and convolutional neural networks for AD diagnosis [15,16]. Novel biomarkers and multimodal data integration [17,18] have shown promise in predicting AD. This study uses an AD dataset from the UCI Machine Learning Repository. In the evaluation reported here, the proposed approach outperforms existing methods in disease prediction. Beyond AD diagnosis, the study has broader implications: it offers insights into machine learning applications in medical diagnosis and supports precise early detection in other medical domains. As AD and other neurodegenerative disorders become more prevalent, accurate diagnostic methods are vital for patient outcomes and healthcare management. The remainder of the paper is arranged as follows:
Section 2 explains the materials and methodology,
Section 3 presents the experimentation and performance assessment,
Section 4 presents the experimental configuration, results, and discussion, and
Section 5 concludes with a discussion of future work.
2. Materials and Methods/Methodology
2.1. Literature Review
Ensemble learning enhances AD diagnosis through 3D convolutional neural networks (CNNs) and MRI. Such networks can distinguish between healthy individuals, those with mild cognitive impairment (MCI), and AD patients [1,2]. Transfer learning aids in detecting AD with these networks [3]. Techniques like deep aggregation learning and stacking-based ensemble learning with genetic hyperparameter tuning improve diagnostic accuracy [4,5]. Using the ADNI dataset, MRI-based ensemble learning achieves 95.2% accuracy in distinguishing AD from normal controls (NC) and 77.8% accuracy in distinguishing stable MCI (sMCI) from progressive MCI (pMCI) [11]. However, that study is limited by its sample size and by the lack of comparison with other state-of-the-art techniques. It also does not assess the method’s interpretability, which potentially limits its application [11]. Another study introduces an ensemble learning architecture using 2D CNNs for AD diagnosis [12]. This method trains on grey matter density maps and uses ensemble models to improve prediction accuracy; its limitations include reliance on 2D MRI images and the need for testing on larger datasets [12]. Research on machine learning for AD diagnosis using neuroimaging data has explored techniques like Support Vector Machines and CNNs [13,14]. While some methods achieve high accuracy, they often face challenges with real-world healthcare data or require further testing on more extensive datasets [15].
A study using a stacking-genetic algorithm ensemble learning model reached high accuracy, precision, recall, and F1-score in early AD diagnosis [15]. Nevertheless, issues such as validation across varied datasets and clinical interpretability remain. Combining MRI classifiers offers reliable AD detection, but its applicability requires further exploration [16]. Random Forest, on the other hand, achieves high accuracy in predicting AD from a limited set of MRI-derived features [17]. Deep learning has shown potential in AD diagnosis, especially when studying complex disease pathways [18]. Still, its reliability in predicting AD progression needs rigorous testing across various imaging modalities and larger datasets. The use of a deep CNN for stage-based AD diagnosis shows promise, but a comprehensive comparison of methodologies and an assessment of general applicability are essential [19]. Other methods, such as high-pressure liquid chromatography combined with AI algorithms, offer insights into predicting the properties of Alzheimer’s medications [20]. Deep learning techniques that integrate expert knowledge and multi-source data have outperformed many ensemble methods [21]. However, such a system might need substantial computational resources, and its performance could vary across datasets. Research on ensemble learning with Conformal Predictors indicates improved categorization, but a broader dataset is essential for validation [22]. Hierarchical ensemble learning addresses some deep learning challenges, providing enhanced classification accuracy with pre-trained neural networks [23]. However, this may require substantial training datasets and high-quality MRI scans. Lastly, ensemble learning for regression problems shows potential in predicting medication effects, but needs expansion for broader applications [14,24]. Ensemble learning and advanced algorithms demonstrate significant promise in AD diagnosis [25,26]. However, broader dataset validation, methodology comparisons, and evaluations of real-world applicability are crucial.
2.2. Proposed Work
In the initial phase of the system, categorical attributes are converted into numeric attributes (0s and 1s). Missing values in the dataset are then replaced with the median value. Feature extraction creates new data features, and feature selection then identifies the traits relevant to disease diagnosis, a step that accurate prediction depends on. Several feature selection techniques are examined to choose the most useful characteristics for AD diagnosis. Given a target number of features, Recursive Feature Elimination (RFE) iteratively removes the least important features until that number remains. A Univariate Analysis scores each attribute individually. PCA reduces dimensionality while retaining most of the useful variance. Mutual Information Scores and Relief automatically rank relevant features to speed up the diagnosis pipeline. SMOTE oversamples the minority class in the AD dataset to address class imbalance, yielding a balanced dataset for classification. Ensemble classification combines the predictions of several models to handle textual characteristics efficiently: the label predictions are aggregated and the majority vote is returned, which improves classification accuracy. Together, the model extracts the critical characteristics and works on a balanced dataset by merging several feature selection techniques with SMOTE, with the aim of improving illness prediction accuracy and, in turn, AD detection, patient outcomes, and healthcare management.
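The majority-vote aggregation described above can be illustrated with a minimal sketch. The function name `majority_vote` and the three toy prediction lists are hypothetical and are not taken from the study; the sketch assumes hard (label) voting with equal weight per model.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict every sample")
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count; on a tie,
        # the label seen first (the earliest model in the list) wins.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three base classifiers (1 = AD, 0 = healthy).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

With an odd number of binary classifiers, as here, ties cannot occur, which is one common reason to combine three or five base models.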
- A. Pre-processing
The pre-processing phase prepares the raw AD dataset for analysis. Categorical attributes are transformed to numeric values for compatibility with machine learning algorithms. Median imputation fills in missing values, ensuring data completeness. Feature extraction enriches the dataset, while feature selection pinpoints the most informative attributes. Recursive Feature Elimination (RFE) removes less important features iteratively. A Univariate Analysis ranks features by importance. Principal Component Analysis (PCA) compresses the data while retaining most of its variance. This preparation provides the foundation for the ensemble-based AD diagnosis model.
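As a rough illustration of the first two pre-processing steps, the sketch below encodes a two-valued categorical column as 0/1 and fills missing entries with the column median. The helper names and the toy columns are assumptions made for this example; they do not come from the study's dataset or code.

```python
from statistics import median


def encode_binary(values, positive):
    """Map a two-valued categorical column to 1 (== positive) or 0; None stays missing."""
    return [None if v is None else int(v == positive) for v in values]


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns, for illustration only.
sex = ["M", "F", "F", None, "M"]
score = [27.0, None, 22.0, 30.0, None]

sex_encoded = encode_binary(sex, positive="M")
score_imputed = impute_median(score)
```

The median is preferred over the mean here because it is less sensitive to extreme values in the observed entries.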
- B. Extraction and Selection of Features
In the ensemble-based model for AD diagnosis, feature extraction transforms the raw AD dataset to capture its essential patterns, enriching it for better prediction. Feature selection then identifies the most important characteristics within this dataset. Several algorithms assess which features most influence diagnostic accuracy. Recursive Feature Elimination (RFE) methodically removes less important features to streamline the model, while a Univariate Analysis ranks each feature’s significance for classification. Principal Component Analysis (PCA) compresses the data, retaining most of the variance in a concise representation. Together, these feature extraction and selection methods highlight the key aspects of the AD dataset, improving prediction accuracy and supporting early disease detection.
For PCA, given a dataset X of size n × m (n samples, m features):
Compute the mean of each feature and subtract it from the corresponding feature in X, resulting in a zero-mean dataset.
Calculate the covariance matrix C of the zero-mean dataset.
Compute the eigenvalues (λ) and eigenvectors (v) of the covariance matrix C.
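The PCA steps above can be written out as follows. The symbols $\mu_j$, $\tilde{X}$, $V_d$, and $Z$ are notation introduced here for illustration, the $1/(n-1)$ normalization is an assumed (sample covariance) convention, and the final projection onto the top $d$ eigenvectors is the standard last step of PCA rather than something stated in the text.

```latex
\begin{aligned}
\mu_j &= \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad \tilde{X}_{ij} = X_{ij} - \mu_j, \\
C &= \frac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} \in \mathbb{R}^{m \times m}, \\
C\,v_k &= \lambda_k\,v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0, \\
Z &= \tilde{X}\,V_d, \qquad V_d = [\,v_1 \ \cdots \ v_d\,] \in \mathbb{R}^{m \times d}.
\end{aligned}
```

The fraction of variance retained by the first $d$ components is $\sum_{k=1}^{d}\lambda_k \big/ \sum_{k=1}^{m}\lambda_k$, which is one common way to choose $d$.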
- C. Synthetic Minority Oversampling Technique (SMOTE)
The proposed ensemble-based model for AD diagnosis uses the Synthetic Minority Oversampling Technique (SMOTE) to tackle the class imbalance often found in medical datasets, including AD datasets. Class imbalance can bias learning toward the larger class and reduce accuracy on the smaller one. SMOTE addresses this by creating synthetic samples for the underrepresented class, increasing its presence in the dataset. With these added samples, the model can learn minority class patterns better, which improves AD diagnosis accuracy for both classes.
Algorithm: generate synthetic samples depending on the minority-majority class ratio.
Input
Minority class samples: M; number of nearest neighbors: k
Output
Synthetic samples: S
- 1. Create an empty synthetic sample list: S = []
- 2. Calculate the number of synthetic samples (n_synthetic) depending on the minority-majority class ratio.
- 3. For each minority class sample m in M:
Find the k closest neighbors of m among the minority class samples, omitting m.
Randomly choose one of the k neighbors (nn).
Compute the difference vector diff = nn − m.
Add a random proportion (between 0 and 1) of diff to m to create n_synthetic samples.
- 4. Add all newly synthesized samples to S.
- 5. Return the synthetic sample list S.
- 6. The overall method then proceeds as follows. Healthy and AD patients are first separated into two classes. SMOTE generates synthetic minority class samples to balance the dataset, so that both groups are represented during learning. The balanced dataset is split into training and testing sets in a way that preserves the class distribution.
- 7. Logistic Regression, Random Forest, or SVM is used to predict AD. The chosen model learns from the features and labels of the training set. The testing set is used to evaluate the model’s accuracy, precision, recall, and F1-score.
- 8. A classification model that performs well can then detect AD in new data from the features of each new instance.
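The sample-generation steps (1 to 5) listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: `n_synthetic` is treated as the total number of samples to create, the loop cycles through the minority samples until that total is reached, Euclidean distance is used for the neighbor search, and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples and their neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i in range(n_synthetic):
        idx = i % len(minority)  # cycle through the minority samples
        m = minority[idx]
        # k nearest minority neighbors of m, omitting m itself
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbors = sorted(others, key=lambda p: math.dist(m, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # random proportion in [0, 1)
        # new point = m + gap * (nn - m), on the segment from m to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(m, nn)])
    return synthetic


# Hypothetical 2-D toy data: six majority and three minority samples.
majority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.5], [6.5, 6.0], [7.0, 5.0], [6.0, 7.0]]
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_synthetic = len(majority) - len(minority)  # assumption: balance to a 1:1 ratio
S = smote(minority, n_synthetic, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class.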
SMOTE builds synthetic samples along the line segments linking a minority class sample to its k nearest neighbors, extending the minority class in feature space. Logistic Regression, Random Forest, or a Support Vector Machine is then used to predict AD. The selected model learns from the features and labels of the training set. To measure the model’s efficacy, accuracy, precision, recall, and F1-score are computed on the testing set. If its performance is adequate, the trained classification model can detect AD in new, unlabeled data from the features of each new instance. With careful attention to dataset quality, feature selection, and model selection, this algorithm is a promising strategy for early and accurate AD detection. By resolving class imbalance with SMOTE and applying established classification techniques, it supports timely diagnosis and intervention.
- D. AD Prediction Using SMOTE
The proposed method classifies AD as follows. The dataset, initially divided into healthy and AD patients, is balanced using SMOTE so that both classes are equally represented during learning. The data are then split into training and testing sets with the same class distribution. The model, using Logistic Regression, Random Forest, or a Support Vector Machine, learns from the training set and is evaluated on accuracy, precision, recall, and F1-score. Once trained, the model can predict AD in new data. By addressing class imbalance with SMOTE, this approach aims at early and accurate AD detection.
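A split that keeps the same class distribution in the training and testing sets can be sketched as below. The function name, the 80/20 ratio, and the toy labels are assumptions for illustration; the text does not state the split ratio used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical balanced labels (0 = healthy, 1 = AD).
labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Splitting each class separately is what guarantees that the test set contains both classes in the same ratio as the full dataset.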
- E. Classification Procedure
A Support Vector Machine (SVM) is a key classification tool with significant potential for AD diagnosis. It is a versatile supervised learning algorithm suited to both linear and nonlinear tasks. SVM identifies the maximum-margin hyperplane that separates the classes, which makes it useful for complex medical datasets such as those for AD. After feature refinement, SVM can capture complex patterns and relationships in the dataset. Kernel functions let it handle nonlinear relationships, and a soft margin limits the influence of outliers. When trained on a dataset balanced with SMOTE, SVM can offer high sensitivity and specificity, both vital for early AD detection.
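For reference, the standard form of the kernel SVM decision rule is given below. The symbols ($\alpha_i$, $b$, $\gamma$) follow common textbook notation, and the RBF kernel is shown only as one widely used example; the text does not state which kernel was used.

```latex
\begin{aligned}
f(x) &= \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, K(x_i, x) + b\right), \qquad y_i \in \{-1, +1\}, \\
K_{\text{linear}}(x, x') &= x^{\top} x', \qquad
K_{\text{RBF}}(x, x') = \exp\!\left(-\gamma\,\lVert x - x' \rVert^{2}\right), \quad \gamma > 0.
\end{aligned}
```

Only the training samples with $\alpha_i > 0$ (the support vectors) contribute to the sum, so the decision boundary depends on the points closest to the margin.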
4. Results and Discussion
Feature selection and data resampling are critical for enhancing machine learning model performance, especially with imbalanced datasets. Feature selection chooses relevant features from the initial set, eliminating unimportant or redundant ones. This enhances model efficiency and interpretability, and reduces overfitting. Methods like Recursive Feature Elimination (RFE), Univariate Feature Selection (UFS), and a Principal Component Analysis (PCA) help identify key features for accurate predictions. Data resampling adjusts the dataset’s distribution, particularly when class imbalances exist. Oversampling, like the SMOTE method, creates synthetic samples for the minority class, while under-sampling removes instances from the dominant class. However, under-sampling can lead to information loss. By integrating feature selection and resampling, models are trained on balanced and pertinent datasets, improving accuracy and real-world applicability. These techniques effectively address challenges like class imbalances and high-dimensional feature spaces.
- A. Tuning hyperparameters
Hyperparameter optimization is essential for the ensemble-based AD diagnosis model. This process searches for the best values of key model parameters, such as the learning rate, the depth of decision trees, or the number of neighbors in K-Nearest Neighbors (KNN). Methods like Grid Search or Random Search are used, combined with cross-validation to avoid overfitting. The model’s performance is tested with various hyperparameter combinations using metrics such as accuracy, F1-score, or AUC-ROC. After the optimal hyperparameters are identified, the model is validated on unseen data to confirm its reliability. This tuning aims to maximize the model’s accuracy in AD diagnosis.
- B. Grid-Search tuning
During the model’s hyperparameter tuning, various parameter combinations were explored to optimize performance. We examined the ‘Bootstrap’ parameter using both ‘True’ and ‘False’. The maximum model depth was tested with values of 5 and 7, and the maximum features were assessed with options of 3 and 4. We also evaluated minimum sample leaf values of 3 and 4. The decision trees’ minimum sample split values were tried at 3, 5, and 7, while the number of estimators was tested with 200, 400, and 600. By assessing these combinations, we identified the configuration with the best model performance.
- C. Optimal RF hyperparameters
Hyperparameter tuning selected the following Random Forest configuration. “Bootstrap” was set to “False”, “Maximum depth” to seven levels, and “Maximum features” to four. “Minimum samples leaf” was fixed at three, while “Minimum samples split” was set to seven. The model employed 200 estimators, as indicated by the “n_estimators” value. This configuration gave the best performance among the combinations tested.
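To make the size of the search space concrete, the sketch below enumerates the grid described in the text and checks that the reported optimum is one of its points. The dictionary keys follow the argument names of scikit-learn's `RandomForestClassifier`; only `n_estimators` is named in the text, so the other key names are an assumed mapping from the quoted parameter labels.

```python
from itertools import product

# Values as listed in the text; key names other than "n_estimators" are an
# assumed mapping onto scikit-learn's RandomForestClassifier arguments.
param_grid = {
    "bootstrap": [True, False],
    "max_depth": [5, 7],
    "max_features": [3, 4],
    "min_samples_leaf": [3, 4],
    "min_samples_split": [3, 5, 7],
    "n_estimators": [200, 400, 600],
}

names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Configuration reported as optimal in the text.
reported_best = {
    "bootstrap": False,
    "max_depth": 7,
    "max_features": 4,
    "min_samples_leaf": 3,
    "min_samples_split": 7,
    "n_estimators": 200,
}
```

The grid contains 2 × 2 × 2 × 2 × 3 × 3 = 144 candidate configurations, each of which would be fitted once per cross-validation fold in an exhaustive search.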
- D. Effectiveness Evaluation
To assess AD diagnosis performance, the ensemble-based model is evaluated and compared with different classification methods at several significance levels. The dataset is first prepared, preprocessed, and balanced using SMOTE to correct class imbalance. Cross-validation then divides the balanced dataset into training and testing sets for evaluation.
Figure 2 represents the relationship with various models. The ensemble-based model is trained using optimized hyperparameters and SVM and Logistic Regression classifiers.
- E. Precision
Precision quantifies the proportion of the model’s positive predictions that are True Positives. A high precision score means that the model’s positive predictions are reliable, whereas a low score means it produces many False Positives.
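Written as a formula, with TP and FP denoting the numbers of True Positive and False Positive predictions:

```latex
\text{Precision} = \frac{TP}{TP + FP}
```

As a purely hypothetical example, a classifier that makes 50 positive predictions of which 48 are correct has a precision of $48/50 = 0.96$.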
Figure 3 represents the precision score for different models and illustrates how the precision of the predictive algorithms varies. SVM stands out with 96%, indicating accurate positive predictions. Extra Tree shows a lower 76%, while the decision tree, Logistic Regression, and XGBoost perform moderately at 81%. SVM has the highest precision among the compared models.
- F. Recall
Recall is a performance metric for binary classification models. It measures the model’s ability to identify the positive instances out of all positive instances in the dataset. Recall is also known as sensitivity or the True Positive rate (TPR).
Figure 4 represents the recall score for different models. Recall is computed as the ratio of True Positive predictions (correctly recognized positive cases) to the sum of True Positive and False Negative predictions (positive instances mistakenly predicted as negative).
A high recall score indicates that the model identifies a large proportion of the positive cases, meaning few False Negatives. A low recall score means the model misses many positive cases, resulting in more False Negatives. Recall is critical in medical diagnosis (to detect illnesses) and fraud detection (to detect fraudulent transactions), where positive instances must not be missed. However, optimizing one metric can affect the others in a classification task; therefore, it is important to balance recall against other metrics such as accuracy.
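The trade-off mentioned above can be seen in a small sketch that computes precision and recall at two decision thresholds. The labels, scores, and thresholds are hypothetical values chosen for illustration and are unrelated to the study's results.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = positive) and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.75)   # (1.0, 0.5)
lenient = precision_recall(y_true, scores, threshold=0.35)  # (0.8, 1.0)
```

Lowering the threshold recovers every positive case (recall rises from 0.5 to 1.0) at the cost of one False Positive (precision falls from 1.0 to 0.8).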
The recall scores of the models were evaluated to quantify their ability to identify positive cases in the dataset. The SVM algorithm achieved a recall score of 97%, correctly identifying 97% of the positive cases. The KNN algorithm followed closely with a recall score of 95%. In contrast, the decision tree algorithm achieved a lower recall score of 84%, indicating that it missed a considerable portion of the positive cases. The Logistic Regression model achieved a recall score of 81% and the Naive Bayes model 75%. Overall, on these figures SVM and KNN identify positive cases markedly better than the other three algorithms.
The models’ diagnostic performance on the testing set is assessed using accuracy, precision, recall, F1-score, and AUC-ROC. The confusion matrix additionally breaks the predictions down into True Positives, False Positives, True Negatives, and False Negatives. External validation on a different dataset assesses the model’s capacity to generalize to unseen data; the proposed model’s performance is compared with the baseline classifiers, and statistical tests are used to detect performance differences.
- G. F1-Score
The F1-score is the harmonic mean of precision and recall, so a model with a high F1-score balances the two effectively. Evaluating the F1-scores of the aforementioned models would give a fuller picture of their overall effectiveness and of the trade-offs between precision and recall.
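In terms of precision and recall, or equivalently of the confusion-matrix counts, the F1-score is:

```latex
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

As an arithmetic illustration only (not a result reported by the study), taking the SVM precision of 96% and recall of 97% quoted earlier at face value gives $F_1 = 2 \cdot 0.96 \cdot 0.97 / (0.96 + 0.97) \approx 0.965$.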
Figure 5 represents a confusion matrix on prediction of Alzheimer’s.
Visualization methods like ROC curves and precision–recall curves show the model’s discrimination performance, whereas a feature significance analysis shows feature contributions.
In Figure 5, True Positive (TP) instances are correctly predicted as having the disease, while False Positive (FP) instances are wrongly predicted as positive. True Negative (TN) instances are correctly predicted as negative (not having the disease), while False Negative (FN) instances are wrongly predicted as negative although the disease is present. This study evaluates the proposed ensemble-based model for AD diagnosis, reporting its accuracy and efficacy with the aim of improving medical data analytics and patient care.
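The four confusion-matrix cells can be counted directly from true and predicted labels, as in the sketch below. The function name and the toy label lists are hypothetical; they do not reproduce the counts shown in Figure 5.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, and FN for a binary classification task."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # disease present and predicted
            else:
                fp += 1  # disease absent but predicted
        elif actual == positive:
            fn += 1  # disease present but missed
        else:
            tn += 1  # disease absent and not predicted
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical labels (1 = AD, 0 = healthy).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
```

Accuracy, precision, recall, and the F1-score can all be derived from these four counts, which is why the confusion matrix is usually reported alongside them.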