Enhancing image-based diagnosis of gastrointestinal tract diseases through deep learning with EfficientNet and advanced data augmentation techniques

Abstract

The early detection and diagnosis of gastrointestinal tract diseases, such as ulcerative colitis, polyps, and esophagitis, are crucial for timely treatment. Traditional imaging techniques often rely on manual interpretation, which is subject to variability and may lack precision. Current methodologies leverage conventional deep learning models that, while effective to an extent, often suffer from overfitting and generalization issues on medical image datasets due to the intricate and subtle variations in disease manifestations. These models typically do not fully utilize the potential of transfer learning or advanced data augmentation, leading to less-than-optimal performance, especially in diverse real-world scenarios where data variability is high. This study introduces a robust model using the EfficientNetB5 architecture combined with a sophisticated data augmentation strategy. The model is tailored for the high variability and intricate details present in gastrointestinal tract disease images. By integrating transfer learning with max pooling and extensive regularization, the model aims to enhance diagnostic accuracy and reduce overfitting. The proposed model achieved a test accuracy of 98.89%, surpassing traditional methods by incorporating advanced regularization and augmentation techniques. The application of horizontal flipping and dynamic scaling during training significantly improved the model’s ability to generalize, as evidenced by a low test loss of 0.230 and high precision across all classes. The proposed deep learning framework demonstrates superior performance in the automated classification of gastrointestinal diseases from image data. By addressing key limitations of existing models, this study contributes to the enhancement of diagnostic processes in medical imaging, potentially leading to more accurate and timely disease interventions.

Introduction

The identification and treatment of gastrointestinal (GI) tract disorders, including conditions like ulcerative colitis, polyps, and esophagitis, pose a substantial challenge in the domain of medical diagnostics. These conditions, if not detected and treated early, can lead to severe health complications, including increased risk of cancer and life-threatening emergencies. Traditionally, the diagnosis of these diseases heavily relies on endoscopic examination followed by histopathological analysis of biopsy samples. However, these methods are invasive, resource-intensive, and often subjective, depending on the specialist’s experience and expertise.

Deep learning methods have demonstrated promising outcomes in analyzing medical images, providing opportunities to assist and enhance clinical decision-making. Nevertheless, despite these advancements, applying deep learning in medical imaging, particularly for GI diseases, encounters substantial obstacles. These challenges include the variability in image quality, the subtle differences between disease states, and the generalization of models to new, unseen cases. Figure 1 shows some of the images of different classes of colon disease from the dataset.

Fig. 1 Sample images from the dataset

The limitations of current endoscopic techniques include subjectivity in diagnosis, as traditional methods rely heavily on the visual assessment of gastroenterologists, leading to variability in diagnostic accuracy and potential human error; invasiveness, as existing methods often require biopsies and physical tissue removal for histopathological examination, posing risks to patients; and resource intensity, as these procedures demand significant time and specialized skills, limiting their availability and increasing healthcare costs. In contrast, deep learning offers potential solutions by enhancing diagnostic accuracy through consistent analysis of visual data, reducing subjectivity and variability, and enabling the detection of subtle patterns that may be missed during manual examination. Moreover, improved accuracy in image-based diagnosis could reduce the need for invasive biopsy procedures by facilitating more accurate non-invasive diagnostics. Additionally, automated analysis can expedite the diagnostic process and lower costs, enhancing accessibility across diverse healthcare settings. Our study addresses these issues by utilizing a customized deep learning model based on the EfficientNetB5 architecture, which effectively manages the intricate details in endoscopic images due to its depth and complexity scaling. We also implement targeted data augmentation techniques to train the model to handle various imaging conditions, mimicking real-world variability in endoscopic examinations, and integrate sophisticated regularization methods to prevent overfitting, ensuring that our model remains robust and generalizable across different clinical environments and patient populations.

Objectives of the research

This research seeks to overcome these limitations by creating a more robust and efficient deep learning model that utilizes the cutting-edge EfficientNetB5 architecture. The objectives of this study are twofold:

  a) Enhance the generalization capabilities of deep learning models for GI disease diagnosis through innovative data augmentation techniques and advanced model regularization strategies.

  b) Enhance diagnostic precision and efficiency by combining transfer learning with deep learning models, enabling the swift and accurate classification of GI tract diseases from endoscopic images.

Research design and methodology

To achieve these objectives, the research employs a comprehensive approach involving several key components:

  c) Data Collection and Preprocessing: A curated dataset of endoscopic images labeled with four major GI disease categories is used. This dataset undergoes a series of preprocessing steps, including image resizing, normalization, and augmentation techniques such as horizontal flipping and random scaling to enhance model robustness.

  d) Model Development: Utilizing the EfficientNetB5 architecture, known for its efficiency and effectiveness in handling complex image data, the study explores its application to medical imaging. The model includes several layers of regularization and dropout to combat overfitting, a common challenge in medical image analysis.

  e) Training and Validation: The model is trained using a split dataset approach, ensuring that it learns to generalize well over unseen data.

  f) Evaluation: The final model is evaluated on a separate test set to assess its real-world applicability. Additionally, a detailed analysis of the model’s performance, including confusion matrices and classification reports, provides insights into its diagnostic capabilities.

The main contributions of this study include:

  1. Our study introduces a robust deep learning model using the EfficientNetB5 architecture, optimized for the complex and variable nature of gastrointestinal tract images.

  2. We have developed and applied a series of advanced data augmentation techniques specifically tailored for gastrointestinal imaging.

  3. Our application of multiple regularization methods, including L2 and L1 regularizations, bias regularization, and dropout, helps in significantly reducing the risk of overfitting.

The remainder of this study is organized as follows: The Literature Review discusses prior work in medical image analysis, focusing on deep learning in GI disease diagnosis, highlighting progress and gaps. The Methodology details data collection, model architecture, training procedures, and evaluation metrics. Results and Discussion present the research findings and compare them with existing models, discussing their clinical implications. Finally, the Conclusion and Future Work summarizes key findings and contributions and outlines potential directions for enhancing AI diagnostic capabilities in medicine.

By systematically addressing the challenges associated with AI-driven diagnostics in GI tract diseases, this research contributes significantly to the field of medical imaging, offering potential for more accurate, efficient, and non-invasive diagnostic solutions.

Literature review

The integration of artificial intelligence (AI) and deep learning in medical imaging, particularly in the diagnosis of gastrointestinal (GI) tract diseases, represents a significant evolution in the realm of diagnostic methodologies. Historically, GI diseases such as ulcerative colitis, polyps, and esophagitis have been diagnosed through endoscopic examination and biopsy, processes that not only require substantial medical expertise but are also prone to human error and variability in interpretation [1]. The surge in computational power and data availability over the last decade has catalyzed the exploration of convolutional neural networks (CNNs) and other AI techniques to augment and, in some cases, potentially replace traditional diagnostic practices. Seminal works have demonstrated the utility of standard CNN architectures like AlexNet and VGG in identifying and classifying pathological features in various medical images, setting a foundation for more specialized investigations into GI-specific applications.

Research has progressively moved towards more complex architectures and hybrid models to tackle the nuanced challenges of medical imaging, such as the differentiation of subtle morphological features across disease states and variations in imaging conditions. Notable studies have employed architectures like Inception and ResNet, which introduced deeper and more complex structures capable of capturing intricate patterns in high-resolution images. These models have shown improved accuracy in identifying GI diseases but often require extensive computational resources and large datasets for training, which are not always feasible in medical settings. The introduction of EfficientNet marked a significant advancement by systematically scaling CNN dimensions, offering a balance between model complexity and efficiency, which is crucial for deployment in clinical environments where both accuracy and computational efficiency are valued [2]. Table 1 summarizes some of the works conducted in the field of GI diseases with respect to deep learning.

Table 1 Summary of recent studies in the field of GI diseases

Furthermore, the use of transfer learning, where a model developed for one task is repurposed for another related task, has been particularly transformative in medical imaging. This approach has allowed for leveraging pre-trained networks on large, generic datasets to achieve notable successes in medical fields, thereby mitigating the challenges associated with the scarcity of labeled medical data. However, despite these advancements, the issue of model overfitting remains prevalent, driven by the high variability in medical images due to different imaging technologies and patient-specific factors. This has led researchers to explore advanced data augmentation techniques and regularization strategies to enhance the generalization capabilities of these models.

Moreover, the emerging trend of integrating AI with traditional diagnostic tools, such as combining CNN outputs with endoscopic analysis [13], has started to demonstrate potential in improving diagnostic accuracy and reliability. Recent studies have focused on not just classifying diseases but also on determining disease severity, a critical aspect of medical treatment planning that has traditionally relied heavily on subjective human judgment. The application of deep learning in this multifaceted way underscores its potential to transform the landscape of medical diagnostics.

Despite these technological advances, several challenges persist. The interpretability of AI models, or the lack thereof, remains a significant barrier. Medical practitioners often require transparent decision-making processes, which many deep learning models do not provide due to their ‘black box’ nature [14]. Efforts to integrate explainable AI into medical imaging are underway, aiming to build trust and reliability in AI-assisted diagnostics [15]. Moreover, ethical concerns, including data privacy and the potential for bias in AI models, necessitate rigorous standards and regulatory frameworks to ensure that the deployment of these technologies enhances healthcare outcomes without compromising patient rights.

The literature underscores a dynamic and rapidly evolving field where deep learning continues to push the boundaries of what is possible in medical imaging. As these technologies advance, they promise not only to enhance diagnostic procedures but also to reshape the operational dynamics within the healthcare industry, making diagnostics faster, more accurate, and less invasive. Future research is likely to focus on refining these models for greater accuracy, efficiency, and user-friendliness in real-world clinical settings, bridging the gap between AI potential and practical medical application.

Methodology

This section outlines the detailed methodologies used in the study to improve the diagnosis of gastrointestinal (GI) tract diseases using deep learning techniques. Acknowledging the importance of precise and efficient diagnostic processes, the research combines advanced computational models with conventional medical imaging data. Figure 2 illustrates the workflow of the proposed model.

Fig. 2 Workflow of the proposed model

The chosen methodologies span data collection, preprocessing, model architecture design, training, and rigorous evaluation to ensure the development of a robust and generalizable deep learning model [24]. This approach not only aims to achieve high diagnostic accuracy but also addresses the challenges of overfitting and model generalization in the highly variable domain of medical imaging. Each methodological component is crafted to contribute significantly to the overarching goal of improving GI disease diagnosis, thereby facilitating early and effective treatment interventions.

Dataset overview

The “WCE Curated Colon Disease Dataset” serves as the foundation for our study, representing a comprehensive collection of high-quality images used to analyze gastrointestinal tract conditions through deep learning methodologies [15]. These images are instrumental in training models to accurately identify and differentiate between normal colon tissue and pathological conditions such as Ulcerative Colitis, Polyps, and Esophagitis. The dataset comprises a total of 6,000 images, annotated and verified by medical professionals to ensure accuracy and relevance. The images are categorized into four distinct classes: Normal (0_normal), depicting healthy colon tissue; Ulcerative Colitis (1_ulcerative_colitis), showing the inflamed and ulcerated lining of the colon; Polyps (2_polyps), representing growths on the inner lining of the colon; and Esophagitis (3_esophagitis), depicting inflammation of the esophagus.

In our study, the 6,000 images are divided as follows: 70% (4,200 images) is allocated to the training set, allowing the model to learn from various data complexities; 15% (900 images) forms the validation set, essential for fine-tuning model parameters and early detection of overfitting; and the remaining 15% (900 images) serves as the testing set, providing an unbiased assessment of the model’s performance on unseen data after the training and validation phases.
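The 70/15/15 split above can be sketched as a shuffled index partition. This is only an illustration: the function name `split_indices`, the fixed seed, and the use of a plain (non-stratified) shuffle are assumptions, since the text does not say how images were assigned to each subset.

```python
import numpy as np

def split_indices(n_images, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle image indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary assumption
    order = rng.permutation(n_images)
    n_train = round(n_images * fractions[0])
    n_val = round(n_images * fractions[1])
    # Whatever remains after train and validation becomes the test set.
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(6000)
print(len(train_idx), len(val_idx), len(test_idx))  # 4200 900 900
```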

Preprocessing steps

In our methodology, we employ several preprocessing techniques aimed at standardizing and enhancing the input data. First, all images are resized to a standard dimension of 150 × 150 pixels to ensure uniform input to the neural network, facilitating efficient image processing.

$$I_{\text {resized }}=\operatorname{resize}(I,(150,150))$$
(1)
  • I = Original image.

  • Iresized​ = Resized image (150 × 150 pixels).

Next, image pixel values are normalized to a range of 0 to 1, which helps reduce model training time and enhances the numerical stability of the learning algorithm. Normalization is achieved using Eq. 2.

$$I_{\text {normalized }}=\frac{I_{\text {resized }}}{255}$$
(2)
  • Inormalized = Normalized image with pixel values in the range [0, 1].
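As a rough illustration of Eqs. 1 and 2, the sketch below resizes an image array to 150 × 150 and rescales 8-bit pixel values to [0, 1]. The interpolation method is not stated in the text, so nearest-neighbour sampling is used here purely as an assumption; the helper names `resize_nearest` and `normalize` are hypothetical.

```python
import numpy as np

def resize_nearest(image, size=(150, 150)):
    """Nearest-neighbour resize of an H x W x C array (Eq. 1).

    Nearest-neighbour sampling is an assumption; the paper does not
    specify which interpolation was used.
    """
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return image[rows[:, None], cols]

def normalize(image):
    """Scale 8-bit pixel values into the range [0, 1] (Eq. 2)."""
    return image.astype(np.float32) / 255.0

# Usage on a synthetic 8-bit RGB image standing in for an endoscopic frame.
frame = np.random.default_rng(0).integers(0, 256, size=(336, 336, 3), dtype=np.uint8)
prepared = normalize(resize_nearest(frame))
print(prepared.shape)  # (150, 150, 3)
```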

Figure 3 shows the train, test, and validation images across different classes.

Fig. 3 Training, test, and validation images

Gastrointestinal tract images, particularly for diagnosing diseases like ulcerative colitis, polyps, and esophagitis, present unique challenges that our data augmentation strategies address. Variations in lighting, caused by the depth and angle of the endoscopic camera, are mitigated through brightness adjustments and shadow augmentation, allowing the model to recognize features under diverse conditions. To handle orientation and rotation variability, we include random rotations and flipping (both horizontal and vertical) in our augmentation strategy, ensuring the model remains invariant to input orientation for accurate diagnoses. We also employ scaling and zoom augmentation to account for scale variability, training the model to recognize features at different distances from the tissue. Lastly, we apply elastic transformations to simulate the natural deformation of soft gastrointestinal tissues, enhancing the model’s ability to generalize across various physical presentations of conditions.
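Two of the augmentations named above, horizontal flipping and zoom (scale) augmentation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the flip probability of 0.5, the `max_zoom` range, the centre crop, and the nearest-neighbour resize are all hypothetical choices, not the study's actual settings.

```python
import numpy as np

def random_flip_and_zoom(image, rng, max_zoom=0.2):
    """Randomly flip an H x W x C image left-right and zoom into its centre."""
    if rng.random() < 0.5:                    # flip probability is an assumption
        image = image[:, ::-1]                # horizontal flip (reverse the width axis)
    h, w = image.shape[:2]
    zoom = rng.uniform(0.0, max_zoom)         # fraction of height/width removed
    ch, cw = int(h * (1 - zoom)), int(w * (1 - zoom))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]
    # Resize the crop back to H x W with nearest-neighbour sampling.
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols]

rng = np.random.default_rng(0)
frame = rng.random((150, 150, 3))
augmented = random_flip_and_zoom(frame, rng)
print(augmented.shape)  # (150, 150, 3)
```

Because the output keeps the input shape, the function can be applied per sample inside a training data pipeline without changing the network's input size.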

Model architecture

The architecture of our deep learning model for analyzing the “WCE Curated Colon Disease Dataset” centers around the use of EfficientNetB5 as the base model. In our study, we implement transfer learning using the EfficientNetB5 architecture pre-trained on the ImageNet dataset, which consists of over a million images across 1,000 categories. This pre-training enables the model to learn rich feature representations that are beneficial for medical image analysis. Transfer learning is crucial in this domain for several reasons: it enhances feature extraction by leveraging common visual characteristics found in both medical and general images, reduces the risk of overfitting on smaller medical datasets by starting with a model that has already learned a broad set of features, accelerates training time as the model converges faster with pre-optimized weights, and compensates for scarce data, which is often a challenge in medical imaging due to the difficulty and expense of data collection and expert annotation.

This approach allows the model to converge faster than training from scratch and often results in higher overall performance. Furthermore, EfficientNetB5 incorporates a scaling methodology that optimizes the model’s depth, width, and resolution based on available resources, ensuring that we maximize the efficiency of our computations. This is crucial in medical applications where quick processing times can be vital.

The primary objective of our work is to enhance the accuracy and efficiency of diagnosing gastrointestinal tract diseases using deep learning. To that end, we designed a framework that integrates state-of-the-art computational models with advanced data augmentation strategies to tackle challenges in medical image analysis, such as high variability in image quality and the subtlety of disease manifestations. The framework has four components: data collection and preprocessing, where a curated dataset of high-quality endoscopic images is normalized and augmented to improve training effectiveness; model development, where the EfficientNetB5 architecture is adapted to medical imaging through additional regularization and dropout layers to combat overfitting; training and validation, where a split-dataset regimen is used so that the model generalizes well to unseen data; and evaluation, where the model is tested on a separate held-out test set to assess performance and real-world applicability. Each component is designed to contribute toward a more accurate and efficient diagnostic process, facilitating early and effective intervention for gastrointestinal diseases.

The choice of EfficientNetB5 was driven by several considerations specific to medical image processing. First, model efficiency and scalability: it provides a balance between accuracy and computational efficiency, which is critical in clinical settings where high performance and quick processing times are required. Second, state-of-the-art performance: EfficientNet architectures have achieved strong accuracy on benchmarks like ImageNet, which can translate into more effective learning for complex medical imaging tasks. Third, optimal resource use: the architecture employs compound scaling (jointly scaling the width, depth, and resolution of the network), allowing systematic and resource-efficient improvements in model performance and making it suitable for deployment in diverse medical environments.

To tailor the EfficientNetB5 model to our specific task of classifying colon diseases, we add several layers on top of the base architecture so that it can be fine-tuned on the specific features of our dataset. Table 2 summarizes the model’s parameters by layer.

Table 2 Layer wise arrangement with parameter

Batch Normalization is a method used to make the training of deep neural networks faster and more stable. It normalizes the inputs of each layer by re-centering and re-scaling them, as given in Eqs. 3 and 4.

$$\widehat{x}^{(k)}=\frac{x^{(k)}-\mu^{(k)}}{\sqrt{\left(\sigma^{(k)}\right)^2+\epsilon}}$$
(3)
$$y^{(k)}=\gamma^{(k)} \widehat{x}^{(k)}+\beta^{(k)}$$
(4)
  • x(k) = Input to the k-th neuron.

  • \(\widehat{x}^{(k)}\) = Normalized input to the k-th neuron.

  • µ(k) = Mean of x(k) over the mini-batch.

  • σ(k) = Standard deviation of x(k) over the mini-batch.

  • ϵ = Small constant for numerical stability.

  • γ(k), β(k) = Scale and shift parameters learned during training.

Batch normalization is typically applied after the convolution layers but before activation functions such as ReLU (Eq. 5), and it helps mitigate the problem known as “internal covariate shift.”

$$f(x)=\max (0, x)$$
(5)
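Eqs. 3 to 5 can be illustrated with a short NumPy sketch. It covers only the training-time computation with mini-batch statistics; at inference time, frameworks normally substitute running averages of the mean and variance. The value of `eps` is an assumption, as the text does not give one.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over axis 0 (Eqs. 3 and 4)."""
    mu = x.mean(axis=0)                       # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # Eq. 3
    return gamma * x_hat + beta               # Eq. 4

def relu(x):
    """Rectified linear unit (Eq. 5)."""
    return np.maximum(0.0, x)

# Usage: a batch of 64 feature vectors with 8 features each.
features = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 8))
activated = relu(batch_norm(features, gamma=np.ones(8), beta=np.zeros(8)))
print(activated.shape)  # (64, 8)
```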

In our model, batch normalization is applied right after the base model and before the first Dense layer. This ensures that the activations are scaled and normalized, speeding up the learning process and enhancing overall performance. Dense layers, which are fully connected layers, have each input node connected to every output node. The first Dense layer following the batch normalization has 256 units and is essential for learning non-linear combinations of the high-level features extracted by the base model. To prevent overfitting, we employ L2 and L1 regularization in our Dense layers, which adds a penalty for weight size to the loss function. This encourages the model to maintain smaller weights and thus simpler models.

Dropout is another regularization method used to prevent overfitting in neural networks by randomly dropping units (and their connections) during the training process. This simulates a robust, redundant network that generalizes better to new data. We set the dropout rate to 45% after the first Dense layer to balance between excessive and insufficient regularization.
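The effect of a 45% dropout rate can be sketched as follows. The sketch uses the common "inverted dropout" formulation, in which surviving activations are rescaled by 1 / (1 − rate) during training so that no rescaling is needed at inference; whether the study's framework uses exactly this variant is an assumption.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                               # dropout is disabled at inference
    keep = rng.random(x.shape) >= rate         # True for units that survive
    return x * keep / (1.0 - rate)             # rescale to preserve the expected value

rng = np.random.default_rng(0)
activations = np.ones((4, 256))                # e.g. outputs of the 256-unit Dense layer
dropped = dropout(activations, rate=0.45, rng=rng)
print(dropped.shape)  # (4, 256)
```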

The final layer in our model is a Dense layer with units equal to the number of classes in the dataset (four). It uses the softmax activation function to output a probability distribution over the four classes, making the model’s predictions interpretable as confidence levels for each class. Softmax is defined in Eq. 6.

$$\sigma\left(z_i\right)=\frac{e^{z_i}}{\sum_j e^{z_j}}$$
(6)
  • zi​ = Input to the i-th neuron of the final layer.

This layer is crucial for multi-class classification as it maps the non-linearities learned by previous layers to probabilities that are easy to interpret and evaluate in a clinical setting.
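A minimal NumPy version of Eq. 6 is shown below. Subtracting the maximum logit before exponentiating is a standard numerical-stability step that leaves the result unchanged; the example logits are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis (Eq. 6), shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)      # does not change the output
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Usage: hypothetical logits for the four classes of one image.
logits = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(logits)
print(probs.argmax())  # 0
```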

The entire model is compiled using the Adamax optimizer, an extension of the Adam optimizer that can be more robust to variations in the learning rate. Its update rule is given by Eqs. 7, 8, and 9.

$$m_t=\beta_1 m_{t-1}+\left(1-\beta_1\right) g_t$$
(7)
$$v_t=\max \left(\beta_2 v_{t-1},\left|g_t\right|\right)$$
(8)
$$\theta_{t+1}=\theta_t-\eta \frac{m_t}{v_t}$$
(9)
  • gt = Gradient of the loss with respect to the parameters at step t.

  • mt = Exponential moving average of gradients.

  • vt = Exponentially weighted infinity norm of the gradients (a running maximum of the decayed absolute gradients).

  • θ = Model parameters.

  • η = Learning rate (0.001).

  • β1, β2 = Hyperparameters for exponential decay rates (default: β1 = 0.9, β2 = 0.999).
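One Adamax update following Eqs. 7 to 9 can be sketched as below. Two points are assumptions or departures worth flagging: a small `eps` is added to the denominator to avoid division by zero (it does not appear in Eq. 9), and the bias-correction factor 1 / (1 − β1^t) from the original Adamax formulation is omitted here because Eq. 9 omits it.

```python
import numpy as np

def adamax_step(theta, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adamax update and return the new (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad       # Eq. 7: moving average of gradients
    v = np.maximum(beta2 * v, np.abs(grad))    # Eq. 8: weighted infinity norm
    theta = theta - lr * m / (v + eps)         # Eq. 9 (eps added as a safeguard)
    return theta, m, v

# Usage: a few steps on the toy objective f(theta) = theta^2.
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for _ in range(100):
    grad = 2.0 * theta                         # gradient of theta^2
    theta, m, v = adamax_step(theta, grad, m, v)
print(theta[0] < 1.0)  # True
```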

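We use an initial learning rate of 0.001, which provides a good balance between speed and accuracy in convergence. During training, the learning rate is halved whenever the validation loss shows no improvement for a specified number of epochs, as given in Eq. 10.

$$\eta \leftarrow 0.5 \times \eta \quad \text {(if no improvement in validation loss for the specified number of epochs)}$$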
(10)
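The schedule in Eq. 10 can be sketched as a small function that replays a sequence of validation losses. The `patience` parameter is hypothetical: the text only says "specified epochs" without giving the number, and the loss values in the usage example are invented.

```python
def reduce_lr_on_plateau(val_losses, lr=0.001, factor=0.5, patience=1):
    """Return the learning rate in effect after each epoch (Eq. 10)."""
    best = float("inf")
    wait = 0                                   # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:               # plateau detected: halve the rate
                lr *= factor
                wait = 0
        history.append(lr)
    return history

# Usage with made-up validation losses.
rates = reduce_lr_on_plateau([1.0, 0.8, 0.9, 0.7, 0.75, 0.76])
print(rates)
```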

The loss function used is ‘categorical_crossentropy’, which is appropriate for multi-class classification tasks. The model architecture, thus customized and compiled, represents a robust system tailored to the specific nuances of medical image classification. It leverages both the powerful feature extraction capabilities of EfficientNetB5 and the tailored dense network to address the challenge of accurately classifying diseases from endoscopic images.

Our model architecture is designed not just to perform well in terms of accuracy but also to be efficient and scalable. This approach ensures that it can be deployed effectively in medical settings where both accuracy and computational efficiency are crucial. Through this architecture, we aim to contribute a valuable tool in the field of medical diagnostics, potentially aiding in faster and more accurate diagnosis of colon diseases.

Training process

The training process for our deep learning model, designed to classify gastrointestinal diseases using the “WCE Curated Colon Disease Dataset”, is configured to optimize performance and ensure robust generalization. At the heart of this configuration is the Adamax optimizer, a variant of the widely used Adam optimizer, known for its adaptive learning rate and its suitability for problems that are large in terms of data and/or parameters. Adamax can be more stable than Adam when gradients are sparse, because it scales the learning rates using the infinity norm. This characteristic makes Adamax well suited to medical image analysis, where the input data can vary significantly in terms of visual features and disease markers. The learning rate, a crucial hyperparameter when training deep neural networks, is set to 0.001 at the start of training. This rate is chosen based on empirical evidence suggesting that it offers a good balance between convergence speed and stability. Table 3 lists the hyperparameters and their values.

Table 3 Hyperparameter and value

Our model incorporates multiple regularization strategies to effectively prevent overfitting, a common challenge in deep learning, especially with high-dimensional data such as images. We apply L2 regularization (weight decay) in the Dense layer, adding a penalty equivalent to the square of the magnitude of coefficients to the loss function, which discourages learning overly large weights and simplifies the model, ensuring it focuses on the most relevant patterns critical for identifying subtle features in medical images. Additionally, we employ L1 regularization to promote sparsity, leading to a model where some feature weights are exactly zero, which helps in identifying significant features in complex image data. To further reduce the risk of overfitting, we apply L1 regularization to the bias terms of our Dense layers, an effective but less common approach that penalizes the intercept and reduces model complexity. We also incorporate dropout layers, which randomly set a proportion of input units to zero during training, preventing neurons from co-adapting too much and forcing the network to learn robust features that are useful across various random subsets of other neurons. These regularization techniques are particularly effective in the medical imaging context, ensuring that the model remains generalizable across different patients and imaging conditions, preventing it from memorizing noise and specific details of training images, and helping it focus on the most informative features crucial for accurate disease identification and classification.

The L1 and L2 regularization penalties are given by Eqs. 11 and 12.

$$L 1_{\text {regularization }}=\lambda_1 \sum_i\left|w_i\right|$$
(11)
$$L 2_{\text {regularization }}=\lambda_2 \sum_i w_i^2$$
(12)
  • λ1 = 0.006 for L1 regularization.

  • λ2 = 0.016 for L2 regularization.

  • wi = Weights of the dense layer.
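Eqs. 11 and 12 amount to two weighted sums over the layer's weights, as in the sketch below. The coefficients are the ones stated above; the example weight vector is invented, and how the penalties are attached to kernel versus bias terms in the actual model is not shown here.

```python
import numpy as np

LAMBDA_L1 = 0.006   # lambda_1 from the text
LAMBDA_L2 = 0.016   # lambda_2 from the text

def l1_penalty(w, lam=LAMBDA_L1):
    """L1 penalty: lambda_1 times the sum of absolute weights (Eq. 11)."""
    return lam * np.abs(w).sum()

def l2_penalty(w, lam=LAMBDA_L2):
    """L2 penalty: lambda_2 times the sum of squared weights (Eq. 12)."""
    return lam * np.square(w).sum()

# Usage: the penalties are added to the data loss during training.
weights = np.array([1.0, -2.0, 3.0])           # made-up weights
print(l1_penalty(weights), l2_penalty(weights))
```

For these example weights the sums are 6 and 14, so the penalties are 0.006 × 6 = 0.036 and 0.016 × 14 = 0.224, up to floating-point rounding.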

The categorical cross entropy is calculated using Eq. 13.

$$L=-\sum_{i=1}^N y_i \log \left(\widehat{y}_i\right)$$
(13)
  • N = Number of classes.

  • yi = True label (one-hot encoded).

  • \(\widehat{y}_i\) = Predicted probability for class i.
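Eq. 13 can be checked numerically with the sketch below. Clipping the predictions away from zero is an added safeguard against log(0), and the clipping constant `eps` is an assumption; the example label and prediction are made up.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot labels and predicted probabilities (Eq. 13)."""
    y_pred = np.clip(y_pred, eps, 1.0)         # avoid log(0)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

# Usage: the true class is index 1 and the model assigns it probability 0.7.
y_true = np.array([0.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.7, 0.1, 0.1])
loss = categorical_cross_entropy(y_true, y_pred)
print(round(float(loss), 4))  # 0.3567
```

With a one-hot label only the true class contributes, so the loss reduces to −log(0.7).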

Training is supervised by a custom callback that monitors the validation metrics and adjusts the learning rate as described in Eq. 10. The callback also includes an interactive feature that prompts the user at certain intervals, defined by the ‘ask_epoch’ parameter, to decide whether to continue training beyond the initially set epochs. This feature adds a layer of flexibility, allowing for human oversight in the training process, which can be crucial when training complex models on nuanced datasets.

By leveraging the Adamax optimizer’s robust handling of sparse gradients and incorporating a sophisticated custom callback that closely monitors and adjusts the training process based on real-time data, we ensure that the model is not only trained to high standards of accuracy but also exhibits strong generalizability when applied to new, unseen data. This comprehensive training strategy is designed to harness the full potential of the underlying EfficientNetB5 architecture, and the custom layers added to it, aiming to set a new benchmark in the accuracy and efficiency of medical image analysis models.

Our model training was conducted on Kaggle using an NVIDIA Tesla P100 GPU with 16 GB of GPU memory and 29 GB of system RAM. The Tesla P100 is well suited to deep learning workloads, as it allows rapid processing of large datasets and complex neural network architectures.

Evaluation metrics

The evaluation of model performance in medical image classification, such as in our study with the “WCE Curated Colon Disease Dataset,” employs a comprehensive set of metrics designed to assess various aspects of the model’s predictive capabilities. Each metric offers unique insights into the effectiveness of the model in classifying gastrointestinal diseases, which is critical for ensuring the reliability and utility of the system in clinical settings.

Accuracy: This metric, the proportion of all predictions that are correct, is the most straightforward and commonly used. High accuracy is indicative of a model’s overall effectiveness across all classes. However, in medical imaging, where the cost of misclassification can be high, relying solely on accuracy can be misleading, especially in datasets with imbalanced classes. It is computed using Eq. 14.

$$\text { Accuracy }=\frac{T P+T N}{T P+T N+F P+F N}$$
(14)
  • TP = True positives.

  • TN = True negatives.

  • FP = False positives.

  • FN = False negatives.

Precision: Precision, or the positive predictive value, measures the accuracy of positive predictions. For instance, high precision in detecting polyps is vital, as false positives could lead to unnecessary invasive procedures like biopsies. It is calculated using Eq. 15.

$$\text { Precision }=\frac{T P}{T P+F P}$$
(15)

Recall: Also known as sensitivity or true positive rate, recall quantifies the model’s ability to identify all relevant instances of a class. In medical terms, high recall is essential for conditions such as ulcerative colitis or esophagitis, where failing to detect an actual disease (a false negative) could delay crucial treatment. It is computed using Eq. 16.

$$\text { Recall }=\frac{T P}{T P+F N}$$
(16)

F1-Score: The F1-score is the harmonic mean of precision and recall. By balancing the trade-off between the two, it provides a more holistic view of the model’s performance, reflecting both false positives and false negatives in a single value. It is computed using Eq. 17.

$$F 1=2 \times \frac{\text { Precision } \times \text { Recall }}{\text { Precision }+ \text { Recall }}$$
(17)
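As a minimal sketch of how Eqs. 14 to 17 can be applied per class, the snippet below derives TP, FP, FN, and TN for one class from a multi-class confusion matrix in a one-vs-rest fashion. The function name `per_class_metrics` and the confusion-matrix counts are hypothetical and purely illustrative; they are not the values reported in this study.

```python
def per_class_metrics(cm, k):
    """One-vs-rest metrics (Eqs. 14-17) for class k.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                        # true class k, predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (assumption, not the paper's confusion matrix).
CLASSES = ["normal", "ulcerative colitis", "polyps", "esophagitis"]
cm = [
    [48, 1, 1, 0],
    [2, 47, 1, 0],
    [0, 2, 48, 0],
    [0, 0, 0, 50],
]

for k, name in enumerate(CLASSES):
    m = per_class_metrics(cm, k)
    print(name + ": " + ", ".join(f"{key}={value:.3f}" for key, value in m.items()))
```

Guarding the divisions keeps the function defined for a class that is never predicted or never present, where precision or recall would otherwise divide by zero.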

Loss Metrics: In the training and evaluation phases, we also monitor loss metrics, specifically categorical cross-entropy in this context, which provides a measure of the model’s predictive error. Lower loss values indicate better model predictions that are close to the actual class labels. The loss metric is particularly useful during the training phase to adjust model parameters (like weights) and during validation to gauge the model’s ability to generalize beyond the training data.
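For reference, categorical cross-entropy averaged over a batch can be sketched as follows. This is a plain-Python illustration under the assumption of one-hot labels and softmax-style probability outputs; the function name and the clipping constant `eps` are choices made here, not details taken from the training code.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -sum_c y_true[c] * log(y_pred[c]) over all samples.

    y_true: list of one-hot label vectors.
    y_pred: list of predicted probability vectors (same shape).
    eps:    lower clip on probabilities to avoid log(0) (assumed value).
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


# A confident correct prediction gives a loss near 0;
# a uniform guess over two classes gives ln(2).
print(categorical_cross_entropy([[1, 0]], [[0.99, 0.01]]))
print(categorical_cross_entropy([[1, 0]], [[0.5, 0.5]]))
```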

In our evaluation, these metrics are calculated for each class and aggregated across the dataset to provide both detailed insights per class and a comprehensive overview. This multi-metric approach enables us to finely tune the model’s performance and ensure it meets the high standards required for medical diagnostic applications. It is particularly important in ensuring that the model performs well across all categories of disease, given the varying degrees of severity and the different visual characteristics that each category may present.
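The text does not state which aggregation scheme is used, so the sketch below shows two common options as assumptions: a macro average, which weights every class equally, and a support-weighted average, which weights each class by its number of samples. The function names and example scores are hypothetical.

```python
def macro_average(scores):
    """Unweighted mean of per-class scores: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean of per-class scores weighted by class support (sample count)."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Illustrative per-class F1 scores and supports (not the paper's values).
f1_scores = [0.96, 0.94, 0.96, 1.00]
supports = [50, 50, 50, 50]
print(macro_average(f1_scores))
print(weighted_average(f1_scores, supports))
```

With balanced classes the two averages coincide; they diverge when class sizes differ, which is why reporting per-class values alongside the aggregate is informative.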

This comprehensive assessment strategy helps in identifying any potential biases or weaknesses in the model, guiding further refinement, and ensuring that the final product can be trusted in real-world medical scenarios.

Results and discussion

The model achieved accuracies of 100% in training, 99.17% in validation, and 98.89% in testing. These scores indicate that the model is robust and handles this image classification task effectively. Table 4 shows the epoch-wise training progress, including the loss, accuracy, and time recorded during training.

Table 4 Epoch-wise progress

The loss metrics observed across the training, validation, and test datasets showed minimal variation, underscoring the model’s efficiency in generalizing from the training data to unseen data. Specifically, the training loss was notably low, and the slight increases in validation and test loss did not significantly affect the model’s overall performance. This stability in loss metrics suggests that the model is well-tuned and balanced, avoiding common pitfalls like overfitting or underfitting, thus making it a reliable tool for clinical diagnostics. The loss and accuracy for each dataset split are given in Table 5.

Fig. 4 shows the training and validation loss and accuracy curves.

Table 5 Model performance
Fig. 4 Loss and accuracy curves during model training

In particular, the model’s high accuracy, coupled with its low loss, positions it favorably against existing solutions that often struggle with one or the other under complex real-world conditions. Additionally, the computational efficiency of the EfficientNetB5 backbone supports deployment in clinical settings where quick and accurate diagnosis is essential.

A detailed statistical analysis shows that the model is competitive with or superior to existing approaches. Precision, recall, and F1-score across the different classes (normal, ulcerative colitis, polyps, esophagitis) consistently exceed 98%, highlighting the model’s reliability in classifying various gastrointestinal conditions. These metrics not only support the model’s efficacy but also reinforce its potential to improve diagnostic accuracy in clinical practice, reducing the likelihood of misdiagnosis and helping ensure that appropriate treatment pathways are chosen based on accurate model predictions [25, 26].

The classification report offers an in-depth analysis of the model’s performance across the four classes: normal, ulcerative colitis, polyps, and esophagitis. Figure 5 displays the precision, recall, and F1-score for each class.

Each class showed remarkably high precision, recall, and F1-score metrics, indicative of the model’s robust ability to correctly identify and classify varied gastrointestinal conditions. Figure 6 shows the confusion matrix of the proposed model.

Fig. 5 Classification report

Fig. 6 Confusion matrix

The mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) were all reported at low values, illustrating the minimal error in the model’s predictive capabilities. These low error metrics underscore the precision of the model in clinical predictions [27], minimizing the likelihood of misdiagnosis and ensuring high reliability. Figure 7 shows the error metrics comparison.

Fig. 7 Error metrics
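The three error metrics named above can be sketched as follows. How they were applied to a classification task is not specified in the text, so this example assumes they are computed between integer-encoded true and predicted class labels; the function name and sample labels are hypothetical.

```python
import math


def error_metrics(y_true, y_pred):
    """Return (MSE, RMSE, MAE) between two equal-length numeric sequences."""
    n = len(y_true)
    diffs = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(d * d for d in diffs) / n      # mean squared error
    rmse = math.sqrt(mse)                    # root mean squared error
    mae = sum(abs(d) for d in diffs) / n     # mean absolute error
    return mse, rmse, mae


# Illustrative integer class labels (assumption, not the study's predictions).
y_true = [0, 1, 2, 3]
y_pred = [0, 1, 2, 1]
print(error_metrics(y_true, y_pred))
```

Note that on integer-encoded labels these values depend on the arbitrary ordering of class indices, so they are best read alongside the confusion matrix rather than in isolation.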

Figure 8 shows the ROC curves of the proposed model for each class.

Fig. 8 ROC curve
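For a multi-class model, a per-class ROC curve is typically obtained one-vs-rest: the class of interest is treated as positive and all others as negative. The sketch below computes the area under such a curve using its rank interpretation, namely the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """AUC for one class via pairwise comparison of positive and negative scores.

    scores: predicted probability of the class of interest for each sample.
    labels: 1 if the sample belongs to that class, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative scores for one class (not the model's outputs).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a sort-based implementation is preferable for large test sets.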

Integrating this model into clinical workflows could revolutionize current diagnostic processes by providing gastroenterologists with a powerful tool to improve diagnostic accuracy and efficiency. Figure 9 gives visual insight into the precision-recall curve.

Fig. 9 Precision-recall curve

Its high accuracy and rapid processing capabilities make it an invaluable asset in clinical settings, assisting in the early detection and classification of gastrointestinal diseases. This is crucial for improving patient outcomes and optimizing treatment strategies. Table 6 presents a comparative analysis of the proposed model against state-of-the-art methodologies.

Table 6 Comparative analysis of the proposed model

Given the model’s efficient computational performance and robust accuracy, it holds significant potential for real-time application in medical diagnostics. Its ability to process and classify images swiftly could be particularly beneficial in endoscopic procedures, providing immediate insights that could guide clinical decisions during procedures, enhancing both the efficacy and safety of medical interventions.

Image quality is crucial for the performance of models analyzing medical images, as variations in acquisition parameters like resolution, contrast, and noise can significantly affect diagnostic accuracy. These variations often stem from differences in endoscopic equipment, lighting conditions, and patient movements during procedures. For example, lower resolution can hinder the model’s ability to detect subtle features, while poor contrast and noise can obscure critical details, leading to misclassifications, such as false positives or negatives. To address these challenges, our model incorporates preprocessing steps to standardize image quality, including contrast enhancement, lighting normalization, and noise reduction filters. Furthermore, it is trained on a diverse dataset containing images with various quality issues, enhancing its robustness against such variability.

While the model shows high accuracy and reliability, potential limitations such as data biases and the risk of overfitting must be addressed. The model’s performance might currently be optimized for the dataset it was trained on, which may not fully represent the broader population diversity seen in clinical settings. Figure 10 shows the misclassified images of different classes.

In the next phases of our research, we aim to enhance the interpretability and explainability of our deep learning model through several strategies. We plan to implement Layer-wise Relevance Propagation (LRP) to visually explain the contributions of individual pixels in the input image to the model’s predictions, helping clinicians understand which areas influenced decision-making. Additionally, we will integrate Grad-CAM (Gradient-weighted Class Activation Mapping) to generate heatmaps that highlight significant regions for predictions, thereby validating the model’s analysis against clinician assessments. We also intend to develop decision trees to map features learned by the model, providing an intuitive explanation of how certain features impact the output, making the information more accessible for non-expert stakeholders.

Fig. 10 Misclassified cases

While our model demonstrates high accuracy in diagnosing the specific GI diseases included in our dataset (ulcerative colitis, polyps, and esophagitis), the scalability of this model to other types of GI diseases remains an area for further investigation. GI diseases encompass a wide range of conditions that may differ significantly in their visual manifestations. Diseases such as Crohn’s disease, gastrointestinal cancers, and various infectious diseases present unique challenges that were not covered in the current model training.

The model’s ability to generalize to these conditions without extensive retraining or adaptation is not guaranteed [28]. The training dataset primarily included images of a limited set of diseases, which may not provide the diverse visual patterns needed to recognize other GI conditions effectively. This limitation raises concerns about the model’s performance on images of diseases not represented in the training data. The transferability of the learned features to other GI diseases depends significantly on how similar the new diseases are to the conditions in the training set. Figure 11 shows a heatmap of the model’s misclassifications.

Fig. 11 Misclassified cases heatmap

While transfer learning might offer a pathway to adapt the model to new diseases, the effectiveness of this approach without substantial supplementary data specific to each new condition is uncertain. To enhance the scalability of our model, future work will focus on expanding the types of GI diseases included in the training process. This expansion will involve curating a more comprehensive dataset that captures a broader spectrum of GI conditions. Moreover, implementing advanced machine learning techniques such as few-shot learning or meta-learning could improve the model’s ability to adapt to new diseases with minimal data. Such approaches would support the model’s utility in a broader clinical context, making it a more versatile tool for GI diagnostics.

Conclusion

This study successfully demonstrated the effectiveness of a deep learning model utilizing the EfficientNetB5 architecture with advanced data augmentation to enhance the diagnosis of gastrointestinal tract diseases. Achieving a test accuracy of 98.89%, the model significantly outperformed existing methods, showcasing its ability to adapt to the variability inherent in medical imaging and reduce common issues like overfitting.

The results confirm the model’s robustness and precision in identifying and classifying various GI conditions, which could improve clinical decision-making and patient outcomes. Future research will focus on further enhancing the model’s generalization capabilities by expanding the dataset to encompass a more diverse range of imaging scenarios and patient demographics. Efforts will also be made to optimize the model’s computational efficiency to facilitate broader deployment in clinical settings, ensuring it can be used effectively without extensive resource requirements. These advancements will continue to push the boundaries of AI in medical diagnostics, aiming to deliver more reliable, efficient, and accessible healthcare solutions.

Data availability

The dataset used in this study is publicly available at https://www.kaggle.com/datasets/francismon/curated-colon-dataset-for-deep-learning.

References

  1. Sharma A, Kumar R, Garg P. Deep learning-based prediction model for diagnosing gastrointestinal diseases using endoscopy images. Int J Med Informatics. Sep. 2023;177:105142. https://doi.org/10.1016/j.ijmedinf.2023.105142.

  2. Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, in International Conference on Machine Learning, 2019, pp. 6105–6114.

  3. Raut V, Gunjan R, Shete VV, Eknath UD. Gastrointestinal tract disease segmentation and classification in wireless capsule endoscopy using intelligent deep learning model, Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 11, no. 3, pp. 606–622, Jul. 2022, https://doi.org/10.1080/21681163.2022.2099298

  4. Thomas Abraham JV, Muralidhar A, Sathyarajasekaran K, Ilakiyaselvan N. A Deep-Learning Approach for Identifying and Classifying Digestive Diseases. Symmetry. Jan. 2023;15(2):379. https://doi.org/10.3390/sym15020379.

  5. Obayya M, et al. Modified Salp Swarm Algorithm With Deep Learning Based Gastrointestinal Tract Disease Classification on Endoscopic Images. IEEE Access. 2023;11:25959–67. https://doi.org/10.1109/access.2023.3256084.

  6. Gunasekaran H, Ramalakshmi K, Swaminathan DK, A. J, Mazzara M. GIT-Net: An Ensemble Deep Learning-Based GI Tract Classification of Endoscopic Images. Bioengineering. Jul. 2023;10(7):809. https://doi.org/10.3390/bioengineering10070809.

  7. Noor MN, Nazir M, Ashraf I, Almujally NA, Aslam M, Fizzah Jilani S. GastroNet: A robust attention-based deep learning and cosine similarity feature selection framework for gastrointestinal disease classification from endoscopic images. CAAI Trans Intell Technol Jun. 2023. https://doi.org/10.1049/cit2.12231.

  8. Aliyi S, Dese K, Raj H. Detection of gastrointestinal tract disorders using deep learning methods from colonoscopy images and videos. Sci Afr. Jul. 2023;20:e01628. https://doi.org/10.1016/j.sciaf.2023.e01628.

  9. Nouman Noor M, Nazir M, Khan SA, Song O-Y, Ashraf I. Efficient Gastrointestinal Disease Classification Using Pretrained Deep Convolutional Neural Network. Electronics. Mar. 2023;12(7):1557. https://doi.org/10.3390/electronics12071557.

  10. Sivari E, Bostanci E, Guzel MS, Acici K, Asuroglu T, Ercelebi Ayyildiz T. A New Approach for Gastrointestinal Tract Findings Detection and Classification: Deep Learning-Based Hybrid Stacking Ensemble Models. Diagnostics. Feb. 2023;13(4):720. https://doi.org/10.3390/diagnostics13040720.

  11. Malik H, Naeem A, Sadeghi-Niaraki A, Naqvi RA, Lee S-W. Multi-classification deep learning models for detection of ulcerative colitis, polyps, and dyed-lifted polyps using wireless capsule endoscopy images, Complex & Intelligent Systems, vol. 10, no. 2, pp. 2477–2497, Nov. 2023, https://doi.org/10.1007/s40747-023-01271-5

  12. Bajhaiya D, Unni SN, Koushik AK. Deep learning–powered generation of artificial endoscopic images of GI tract ulcers. iGIE. Dec. 2023;2(4):452–63.e2. https://doi.org/10.1016/j.igie.2023.08.002.

  13. Wu R, et al. Application of the convolution neural network in determining the depth of invasion of gastrointestinal cancer: a systematic review and meta-analysis. J Gastrointest Surg. Apr. 2024;28(4):538–47. https://doi.org/10.1016/j.gassur.2023.12.029.

  14. Albalawi E, et al. Oral squamous cell carcinoma detection using EfficientNet on histopathological images. Front Med. Jan. 2024;10. https://doi.org/10.3389/fmed.2023.1349336.

  15. Alshuhail A, et al. Refining neural network algorithms for accurate brain tumor classification in MRI imagery. BMC Med Imaging. May 2024;24(1). https://doi.org/10.1186/s12880-024-01285-6.

  16. Cai L, Fang H, Xu N, Ren B. Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering. IEEE Trans Med Imaging. 2024. https://doi.org/10.1109/TMI.2024.3425533.

  17. Demirbaş AA, Üzen H, Fırat H. “Spatial-attention ConvMixer architecture for classification and detection of gastrointestinal diseases using the Kvasir dataset,” Health Inf Sci Syst. 2024;12(1). https://doi.org/10.1007/s13755-024-00290-x.

  18. Ahmed IA, Senan EM, Shatnawi HSA. “Hybrid models for endoscopy image analysis for early detection of gastrointestinal diseases based on fused features,” Diagnostics, 2023;13(10):1758. https://doi.org/10.3390/diagnostics13101758.

  19. Bella F, Berrichi A, Moussaoui A. Vision Transformer Model for Gastrointestinal Tract Diseases Classification from WCE Images, 2024 8th International Conference on Image and Signal Processing and their Applications (ISPA), Apr. 2024, https://doi.org/10.1109/ispa59904.2024.10536754

  20. Kumar R, Singh A, Khamparia A. Multiclass Classification of Gastrointestinal Colorectal Cancer Using Deep Learning, Lecture Notes in Networks and Systems, pp. 625–636, Oct. 2023, https://doi.org/10.1007/978-981-99-4071-4_48

  21. Kim H-S, Cho B, Park J-O, Kang B. Color-Transfer-Enhanced Data Construction and Validation for Deep Learning-Based Upper Gastrointestinal Landmark Classification in Wireless Capsule Endoscopy. Diagnostics. Mar. 2024;14(6):591. https://doi.org/10.3390/diagnostics14060591.

  22. Patel V, Patel K, Goel P, Shah M. Classification of Gastrointestinal Diseases from Endoscopic Images Using Convolutional Neural Network with Transfer Learning, 2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Mar. 2024, https://doi.org/10.1109/icicv62344.2024.00085

  23. Varalaxmi G, Baddam SR, Yalamarthi ES, Swaraja K, Madhavi KR, Cn, Sujatha. Diagnosis of Gastrointestinal Diseases Using Modern CNN Techniques, 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Apr. 2023, https://doi.org/10.1109/i2ct57861.2023.10126259

  24. Sujatha R, Helen D, Hemamalini U, Divya V, Priya Dharshini A. Gastrointestinal disease prediction using transfer learning, International Conference on Computer Vision and Internet of Things 2023 (ICCVIoT’23), 2023, https://doi.org/10.1049/icp.2023.2851

  25. Diwakar M, Singh P, Garg D. Edge-guided filtering based CT image denoising using fractional order total variation. Biomed Signal Process Control. 2024;92:106072.

  26. Singh P, Diwakar M. Total variation-based ultrasound image despeckling using method noise thresholding in non‐subsampled contourlet transform. Int J Imaging Syst Technol. 2023;33(3):1073–91.

  27. Singh P, Diwakar M, Singh V, Kadry S, Kim J. A new local structural similarity fusion-based thresholding method for homomorphic ultrasound image despeckling in NSCT domain. J King Saud University-Computer Inform Sci. 2023;35(7):101607.

  28. Khozeymeh F, Ariamanesh M, Roshan NM, Jafarian A, Farzanehfar M, Majd HM, Sedghian A, Dehghani M. Comparison of FNA-based conventional cytology specimens and digital image analysis in assessment of pancreatic lesions. CytoJournal. 2023;20:39. https://doi.org/10.25259/Cytojournal_61_2022.

Acknowledgements

Not Applicable.

Funding

This research received no external funding.

Author information

Contributions

A.M.J.M.Z.R and R.M handled the literature review and methodology. A.M.J.M.Z.R and K.C performed the formal analysis, data collection, and investigation. M.T.R prepared the initial draft and performed the statistical analysis. K.V and T.E.Y supervised the overall project. All authors have read and approved the final article.

Corresponding author

Correspondence to Temesgen Engida Yimer.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zubair Rahman, A.M.J.M., Mythili, R., Chokkanathan, K. et al. Enhancing image-based diagnosis of gastrointestinal tract diseases through deep learning with EfficientNet and advanced data augmentation techniques. BMC Med Imaging 24, 306 (2024). https://doi.org/10.1186/s12880-024-01479-y

  • DOI: https://doi.org/10.1186/s12880-024-01479-y

Keywords