1. Introduction
Detecting brain tumors is critical in medical diagnostics, given the severe implications these anomalies pose to patients’ health and well-being [1]. The human brain, with its extensive network of neurons and supporting structures, is susceptible to a wide range of pathological disorders, among which tumors pose a formidable challenge [2]. These tumors, defined by aberrant cell growths inside the brain tissue, can affect people of any age or demographic and present in various forms, from benign to malignant. Because brain tumors can impede neurological function and cause a wide range of symptoms, including headaches, seizures, cognitive decline, and potentially life-threatening complications, it is critical to identify them accurately and promptly [3,4,5]. Furthermore, the timing of the identification of these abnormalities has a substantial impact on the prognosis and available treatments for individuals with brain tumors. Early detection improves the chances of a successful course of treatment and makes it possible for medical professionals to carry out measures meant to maintain quality of life and cognitive function.
Historically, the detection of brain tumors has relied heavily on conventional imaging modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) [6,7]. While these techniques have revolutionized the field of diagnostic neuroimaging, allowing for the visualization of anatomical structures with unprecedented clarity, their efficacy in discerning subtle or early-stage lesions remains limited. Furthermore, the interpretation of imaging findings often necessitates the expertise of radiologists or neurosurgeons, leading to potential delays in diagnosis and treatment initiation. In recent years, advancements in technology and computational methodologies have spurred the development of innovative approaches for brain tumor detection. Machine learning algorithms, in particular, have emerged as powerful tools for analyzing medical imaging data and extracting clinically relevant information with remarkable accuracy and efficiency [8,9,10]. By leveraging large datasets of annotated images, these algorithms can be trained to discern patterns indicative of brain tumors, enabling automated screening and detection processes that augment the capabilities of healthcare professionals. Convolutional neural networks (CNNs), among the most recent deep learning algorithms developed primarily for image-related tasks, are a prominent example of machine learning applied to brain tumor diagnosis [11,12]. These networks are proficient in capturing both basic shapes and relations and more intricate patterns and features in medical imaging, enabling the distinction between normal and pathological brain regions. Through an iterative process of training on annotated datasets, CNN variants can detect slight changes in image intensity, shape, or texture that are likely signs of a tumor.
Deep learning, a subset of artificial intelligence, has emerged as a powerful tool in medical imaging, particularly for brain segmentation. Brain segmentation is a critical process in medical diagnostics and research, enabling the precise delineation of anatomical structures and pathological regions within brain images. Traditional segmentation methods, often reliant on manual annotation or conventional image processing techniques, are time-consuming and prone to variability [12]. In contrast, deep learning approaches leverage large datasets and advanced neural network architectures to automate and enhance the segmentation process, achieving high accuracy and consistency. Convolutional neural networks (CNNs), in particular, have demonstrated remarkable success in capturing intricate features and patterns within brain images, facilitating the identification of subtle differences between healthy and diseased tissues.
Recent advancements in deep learning have refined brain segmentation techniques, integrating novel architectures such as U-Net, Fully Convolutional Networks (FCNs), and Transformer models. These models are designed to handle brain structures’ complex and heterogeneous nature, offering improved performance over traditional methods. The application of deep learning in brain segmentation enhances the accuracy of diagnosis and treatment planning and accelerates the pace of research in neuroscience and related fields. Additionally, the advent of transfer learning and domain adaptation techniques allows for the effective utilization of pre-trained models, reducing the need for extensive labeled datasets and enabling more efficient deployment in clinical settings. As deep learning evolves, its potential to revolutionize brain segmentation and broader medical imaging applications becomes increasingly evident [13].
In conclusion, the incorporation of machine learning algorithms into neuroimaging processes signifies a revolutionary change in identifying and treating brain tumors. By utilizing artificial intelligence to analyze complex medical images, researchers and healthcare professionals can improve the effectiveness, precision, and accessibility of diagnostic services, ultimately leading to better patient outcomes and advancing our knowledge of neurological disorders. Neuro-oncology is entering a new era of precision medicine, as advances in computational techniques and the early diagnosis of brain tumors are expected to bring about revolutionary changes in individualized treatment plans.
1.1. Literature Review
There is potential for improving resource allocation in healthcare settings and expediting diagnostic procedures by incorporating machine learning algorithms into current neuroimaging workflows [13,14]. By automating the initial screening and triage of imaging studies, these algorithms enable radiologists and clinicians to prioritize cases requiring urgent attention, thereby expediting the diagnostic pathway for patients at risk of brain tumors. Furthermore, the quantitative insights machine learning models provide can aid in risk stratification and treatment planning, guiding healthcare providers in selecting the most appropriate therapeutic interventions for individual patients [15,16,17]. Despite the considerable progress achieved in brain tumor detection through machine learning, several challenges and opportunities for further research persist. Chief among these is the need for large-scale, annotated datasets encompassing diverse populations and tumor subtypes, essential for training robust and generalizable algorithms. Additionally, efforts to enhance the interpretability and transparency of machine learning models are crucial for fostering trust and acceptance within the medical community, where clinical decision-making carries profound implications for patient outcomes.
Brain tumor segmentation, a crucial task in medical image analysis, has seen significant advancements propelled by deep learning techniques. Various approaches have been explored to enhance the accuracy and efficiency of brain tumor segmentation algorithms. Bindu and Sastry [18] introduced a Modified U-Net and ResNet model, addressing the challenge of high-resolution MRI scans. By incorporating skip connections between different parts of the network, their model improves prediction accuracy and detail preservation compared to traditional U-Net architectures. The results from their study showed an Intersection over Union (IoU) of 90.2% and a dice coefficient of 94.8%. However, they did not provide any information regarding the size of the training and testing datasets. Fang and Wang [19] proposed a multi-input U-Net model integrating mask images to provide spatial relationship information, thereby improving segmentation accuracy, especially in regions with fuzzy boundaries.
Additionally, their model tackles the low-resolution issue in sagittal and coronal planes, enhancing memory efficiency; however, only a small dataset was utilized, and their accuracy did not exceed 92%. Vinisha and Boda [20] developed a cascaded, Fully Convolutional Improved DenseNet with an Attention-based Adaptive Swin U-Net-derived segmentation strategy, achieving high precision and accuracy rates in tumor detection and segmentation tasks; their method was evaluated on the Brain MRI Dataset of Multiple Sclerosis, reaching 96.07% accuracy.
Efforts have also been made to optimize brain tumor segmentation models for efficient hardware implementation. Neiso et al. [21] optimized a U-Net model for FPGA-based inference, reducing depth and filter count to enhance efficiency on FPGA hardware. Leveraging High-Level Synthesis for Machine Learning (HLS4ML), their implementation demonstrated significant reductions in resource utilization while maintaining segmentation accuracy comparable to the original model; however, the dice coefficient remained low, not exceeding 74%. Additionally, Shedbalkar and Prabhushetty [22] proposed a deep transfer learning model combining U-Net and chopped VGGNet for brain tumor segmentation and classification. Their two-stage framework utilizes U-Net for segmentation and VGGNet for classification, but no quantitative results are reported in their paper. Innovative architectures have also emerged to address limitations in conventional convolutional networks. Liang et al. [23] introduced BTSwin-U-Net, a 3D U-shaped symmetrical Swin Transformer-based network for brain tumor segmentation. Motivated by the powerful long-range information interaction of the Vision Transformer, their model leverages self-supervised pre-training to achieve advanced segmentation performance, obtaining a dice similarity coefficient not exceeding 86.4%.
Furthermore, Iriawan et al. [24] proposed a YOLO-U-Net architecture combining CNN and FCN methodologies for brain tumor detection and segmentation. By integrating deep learning techniques, their approach provides automatic tumor detection and segmentation with high accuracy and precise localization, reaching an accuracy of 97% with 277 images for training, 69 for validation, and 14 for testing.
Recent advancements in brain tumor segmentation algorithms have leveraged the U-Net architecture, integrating novel modules to improve performance. For instance, Shedbalkar et al. [25] introduced a U-Net and Transformer-based model with a hybrid attention mechanism, achieving high segmentation accuracy across various datasets; their best result on BraTS2021 was a dice coefficient of 95%, with a sample of 1480 patients studied. Similarly, Akbar et al. [26] proposed a DO-U-Net model incorporating residual modules, multi-scale feature fusion, and attention mechanisms to enhance segmentation accuracy while reducing computational complexity. Furthermore, lightweight architectures like Yaru3DFPN and TFFbU have been developed to address the computational demands of traditional CNNs while maintaining segmentation performance [26]. These models integrate feature pyramid networks and attention mechanisms to achieve accurate and efficient segmentation, promising clinical applicability; however, their dice coefficients did not exceed 86.2%.
Moreover, hybrid models combining deep learning with meta-heuristic algorithms, as demonstrated by Siva Kumar et al. [27], showcase improved segmentation accuracy and computational efficiency. By integrating meta-heuristic optimization with the U-Net architecture, these models enhance feature extraction and classification, facilitating precise tumor delineation, although no numerical results are reported. In addition to architectural innovations, researchers have explored integrating advanced techniques such as LSTM [28] and attention mechanisms [29] into U-Net-based models, achieving good dice coefficients; still, these methods suffered from high computation time. These enhancements enable the capture of temporal dependencies and contextual information, leading to more accurate tumor segmentation. Zheng and colleagues [30] presented a serial encoding–decoding structure that improves segmentation performance by combining hybrid dilation convolution (HDC) units and concatenating every unit of two serial networks. They also suggest a novel loss function to help the model focus on samples that are challenging to segment and classify.
Sahoo et al. [31] suggested a two-stage strategy. In the initial stage, an encoder–decoder-oriented U-Net with a residual network is deployed to detect various brain cancers. In the second stage, a YOLO2 (you only look once)-based transfer learning strategy is utilized to identify the obtained tumors.
Utilizing the suggested Enhanced Invasive Bat (IIB)-based Deep Residual network model, Gupta and Bibhu [32] developed an effective brain tumor detection method. In this approach, the enhanced Invasive Weed Optimization (IWO) and Bat Algorithm (BA) are incorporated into the proposed IIB algorithm.
Zoghbi et al. [33] presented Generative Adversarial Networks (GANs), with their ability to synthesize realistic images, as a means to enhance the training of deep learning models. Additionally, they investigated the CycleGAN architecture through hyperparameter adjustments and carried out brain tumor image segmentation.
Encoder–decoder-based convolutional neural networks were proposed by Raghu and Lakshmi [34] as an approach to fully automate brain tumor segmentation.
A new deep learning model for medical image segmentation, called nnU-Net, was proposed by Isensee et al. [35]. This model configures its parameters automatically without human intervention to achieve excellent segmentation results. It was tested on 23 public databases containing various medical images and demonstrated strong performance.
Magadza and Viriri’s [36] model implemented nnU-Net [35], which won BraTS 2020. The model has an encoder–decoder structure with skip connections connecting the two paths. There are five resolution levels in the network. Each step in the encoding pathway doubles the feature maps, from a base of 32 up to a maximum of 320, and uses strided convolution to cut the spatial resolution in half. Each level applies two successive convolution blocks, each carrying out a 3 × 3 × 3 convolution, instance normalization, and a Leaky ReLU non-linearity. Their network achieved average dice scores of 85% with 25.71 M parameters.
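For illustration, the convolution block just described (two 3 × 3 × 3 convolutions, each followed by instance normalization and a Leaky ReLU, with strided convolutions halving the resolution and feature maps doubling from 32 to a cap of 320) could be sketched in PyTorch as follows; this is a minimal sketch of the reported design rather than the authors’ implementation, and the input channel count and negative slope are assumptions.

```python
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """Two successive 3x3x3 convolutions, each followed by instance
    normalization and a Leaky ReLU, as described for the nnU-Net encoder."""
    def __init__(self, in_ch, out_ch, down_stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # A stride of 2 in the first convolution halves the spatial resolution.
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=down_stride, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(0.01, inplace=True),  # negative slope assumed
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(0.01, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Five resolution levels: feature maps double from 32 up to a maximum of 320,
# while strided convolutions halve the spatial resolution at each step.
channels = [32, 64, 128, 256, 320]
encoder, in_ch = nn.ModuleList(), 4  # four input modalities assumed for BraTS-style data
for level, out_ch in enumerate(channels):
    encoder.append(ConvBlock3D(in_ch, out_ch, down_stride=1 if level == 0 else 2))
    in_ch = out_ch
```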
A comparison study of four deep learning models, including nnU-Net, evaluating their effectiveness in brain tumor research was presented by Huang et al. [37]. Using the same structure as in [2] but with a bottleneck feature map size of 4 × 4 × 4, the model was applied to the BraTS dataset, and the resulting dice score was 85%.
These diverse methodologies underscore the ongoing efforts to enhance brain tumor segmentation techniques through deep learning innovations, optimization for hardware efficiency, and novel architectural designs. As advancements continue, the field moves closer to realizing more accurate and efficient solutions for aiding medical diagnosis and treatment planning.
This paper explores deep learning models for brain segmentation and enhances these models to achieve more robust, accurate, and fast brain tumor segmentation. It examines the differences between various models to determine the best one with minimal computation time. The novelty of this work lies in identifying and enhancing the most effective deep-learning model for accurately segmenting brain tumors. This involves refining the model’s structure to achieve high accuracy and speed, making it suitable for medical applications.
2. Materials and Methods
The study’s workflow began with obtaining and preparing image data. The dataset is divided into three subsets: 70% for training, 10% for validation, and 20% for testing. The preprocessing started by resizing images and merging them with their corresponding masks. The prepared dataset was employed to train several deep learning models: different versions of the DeepLabV3 architecture [38] with various backbone networks (ResNet18, ResNet50, and MobileNet), as well as several U-Net variations [39] based on enhancing the existing U-Net model. Several assessment metrics are exploited to differentiate between the best and the worst models in segmenting brain tumors. These metrics are the dice similarity coefficient [40], Global Accuracy (GA), Mean Accuracy (MA), Mean Intersection over Union (Mean IoU) [41], and Weighted IoU (WIoU) [42]. Moreover, Grad-CAM [43] was used to visualize the image regions that most influence the model’s predictions.
Figure 1 outlines the methodology employed in this study, starting from the input image and concluding with the visualization using Grad-CAM.
2.1. Dataset
The dataset used in this study is a 2D brain tumor segmentation dataset. It is publicly available on Kaggle at the following link:
https://www.kaggle.com/datasets/nikhilroxtomar/brain-tumor-segmentation/data (accessed on 20 September 2023) [44]. This dataset comprises 3064 pairs of T1-weighted contrast-enhanced images and their respective binary masks indicating tumor presence. The images are in .png format, each sized 512 × 512 pixels. The dataset consists of imaging data from 233 patients, encompassing various types of brain tumors. The distribution of tumor types within the dataset is as follows: Meningioma—708 slices; Glioma—1426 slices; Pituitary tumor—930 slices. These data provide a diverse representation of brain tumor types commonly encountered in clinical practice, facilitating comprehensive research and development of segmentation algorithms and diagnostic tools [38].
Figure 2 shows an example of a brain MR image alongside its corresponding tumor segments.
2.2. Data Preprocessing
The data preprocessing section of the study involved several steps to prepare the raw image and mask data for tumor segmentation. Initially, images and corresponding pixel-level masks were organized into separate folders. Data partitioning was conducted next, splitting the dataset into training (70%), validation (10%), and testing (20%) sets. Subsequently, images were resized to a standard size of [224 × 224 × 3]. Finally, the preprocessed images and labels were combined into unified data stores for training and evaluation. These combined data stores were then utilized in subsequent steps to train and assess the efficacy of the tumor segmentation model. All models are trained using the following hyperparameters: Adam optimizer, an initial learning rate of 0.0001, a maximum of 100 epochs, and a minibatch size of 32.
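The split-and-resize steps described above can be illustrated with the following rough Python sketch; the folder names (“images”, “masks”) and loading details are assumptions for illustration, not the study’s actual data-store pipeline.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image

# Illustrative folder names; the dataset keeps images and pixel-level masks
# in separate folders, as described above.
image_paths = sorted(Path("images").glob("*.png"))
mask_paths = sorted(Path("masks").glob("*.png"))
pairs = list(zip(image_paths, mask_paths))

# Partition into 70% training, 10% validation, and 20% testing.
random.seed(0)
random.shuffle(pairs)
n = len(pairs)
n_train, n_val = int(0.7 * n), int(0.1 * n)
train = pairs[:n_train]
val = pairs[n_train:n_train + n_val]
test = pairs[n_train + n_val:]

def load_pair(img_path, msk_path, size=(224, 224)):
    """Resize the image to 224 x 224 x 3 and the binary mask to 224 x 224."""
    img = np.asarray(Image.open(img_path).convert("RGB").resize(size),
                     dtype=np.float32) / 255.0
    msk = np.asarray(Image.open(msk_path).resize(size, Image.NEAREST)) > 0
    return img, msk.astype(np.int64)
```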
2.3. Deep Learning Architecture Description
This paper employs two types of deep learning architectures: DeepLabV3 models with various backbone structures (ResNet18, ResNet50, and MobileNet) and U-Net. The U-Net deep learning model consists of an encoder, decoder, and bridge connection. The respective sections provide a detailed explanation of each model’s structure.
2.3.1. DeepLabV3
Various deep learning architectures were examined to train models for semantic segmentation tasks, including the use of DeepLabV3 [38] with different backbone networks such as ResNet18, ResNet50, and MobileNet. A key feature of DeepLabV3 is its reliance on atrous (dilated) convolution, which captures features at different receptive-field sizes without compromising spatial resolution by varying the dilation rate. For example, a rate of 2 means a gap of one between adjacent kernel elements, so the filter covers a larger field of view with fewer parameters than a dense kernel of the same extent, without affecting resolution. Additionally, DeepLabV3 employs atrous spatial pyramid pooling (ASPP) to capture contextual features at multiple scales, combining 1 × 1 and 3 × 3 convolutions with various atrous rates. This study evaluates each DeepLabV3 backbone structure for brain tumor segmentation using multiple metrics, aiming to identify the most effective model for the segmentation process.
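To make these ideas concrete, a minimal PyTorch sketch of an atrous convolution and an ASPP-style module is given below; the channel counts and dilation rates (6, 12, 18) are assumptions for illustration, not the exact DeepLabV3 configuration used in this study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel 1x1 and 3x3 atrous convolutions
    with different dilation rates, concatenated and fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1)] +
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
             for r in rates]
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch keeps the spatial resolution; only the receptive field changes.
        feats = [F.relu(branch(x)) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# A dilation (atrous) rate of 2 inserts one gap between kernel elements,
# enlarging the receptive field without adding parameters or reducing resolution.
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)
```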
2.3.2. U-Net Architecture Description
The U-Net architecture is a deep learning model designed specifically for semantic segmentation in image processing, characterized by an encoder–decoder structure. Input images of size 224 × 224 pixels with three channels are normalized to [0, 1] before processing. The encoder consists of traditional convolutional layers followed by ReLU activations to extract hierarchical features, with max pooling used to reduce spatial dimensions and capture increasingly abstract features. A bridge section connects the encoder to the decoder using convolutional layers, incorporating dropout to prevent overfitting. The decoder employs transposed convolutions for upsampling and ReLU activations to introduce non-linearity and capture complex feature relationships. The network concludes with a softmax-activated convolutional layer for pixel-wise class probabilities and a specialized dice pixel classification layer for precise semantic segmentation.
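A reduced-depth PyTorch sketch of this encoder–bridge–decoder layout is shown below; the real network has more levels, and the dice pixel classification layer is represented here only by a plain 1 × 1 output convolution (softmax/dice handled in the loss), so this is an illustrative sketch under those assumptions rather than the exact model.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k=3, p_drop=0.0):
    """Two convolutions with ReLU activations, optionally followed by dropout."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=k, padding=k // 2), nn.ReLU(inplace=True),
    ]
    if p_drop > 0:
        layers.append(nn.Dropout2d(p_drop))
    return nn.Sequential(*layers)

class UNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=2, base=64, p_drop=0.5):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)                       # reduces spatial dimensions
        self.bridge = conv_block(base * 2, base * 4, p_drop=p_drop)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_classes, kernel_size=1)  # softmax applied in the loss

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bridge(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```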
Different variants of the U-Net architecture were employed to segment brain tumor images, each utilizing distinct parameter configurations.
Table 1 below summarizes the variations in architecture, including kernel size, number of channels, and dropout rate:
These variations allowed for exploring different architectural configurations within the U-Net framework to assess their impact on brain tumor segmentation performance.
Figure 3 depicts the modified version of U-Net, which significantly enhances brain tumor segmentation performance. It shows the various modifications made to the encoder, decoder, and bridge to reconstruct the segmented image to match the original image size. Additionally, the figure visually represents the image sizes at each stage, from encoding through the bridge to decoding. The figure illustrates the changes made to the structure by replacing ReLU with Leaky ReLU, switching max pooling to average pooling, adjusting the kernel size to 5 × 5 with 32 channels, and adding a dropout layer of 0.3. These modifications enhance and optimize the existing U-Net architecture for brain tumor segmentation.
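A minimal sketch of how the listed modifications translate into code (5 × 5 kernels, 32 channels, Leaky ReLU in place of ReLU, average pooling in place of max pooling, and a 0.3 dropout before the bridge) is given below; the negative slope and the exact block arrangement are assumptions for illustration.

```python
import torch.nn as nn

def modified_block(in_ch, out_ch=32, k=5):
    """Encoder block of the modified U-Net: 5x5 kernels with 32 channels,
    Leaky ReLU instead of ReLU, and average pooling instead of max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.LeakyReLU(0.01, inplace=True),   # negative slope assumed
        nn.Conv2d(out_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.LeakyReLU(0.01, inplace=True),
        nn.AvgPool2d(2),                    # average pooling replaces max pooling
    )

bridge_dropout = nn.Dropout2d(0.3)          # dropout ratio reduced from 0.5 to 0.3
```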
2.4. Evaluation Metrics
The performance evaluation in image segmentation tasks requires appropriate metrics to quantify the accuracy and efficacy of segmentation algorithms. In this section, we discuss several key evaluation metrics commonly employed for assessing the quality of segmentation results.
2.4.1. Dice Similarity Coefficient [40]
The dice similarity coefficient, also known as the dice similarity index (DSI), is used to assess each model’s segmentation performance. This assessment is conducted by comparing the segmentation results produced by the model with manually segmented images created by radiologists, which serve as the ground truth.
The formula for calculating the dice coefficient is:
Dice = (2 × |A ∩ B|) / (|A| + |B|),
where
A represents the set of pixels classified as part of the predicted segmentation;
B represents the set of pixels in the ground truth segmentation;
|A ∩ B| denotes the number of pixels common to both sets;
|A| + |B| denotes the total number of pixels in the predicted and ground truth segmentations.
2.4.2. Global Accuracy (GA)
Global Accuracy (GA) represents the proportion of correctly classified pixels over the total number of pixels in all classes. It is computed as:
GA = Σ (TPᵢ + TNᵢ) / Σ (TPᵢ + TNᵢ + FPᵢ + FNᵢ),
where the sums run over the classes i = 1, …, N; TPᵢ, FPᵢ, FNᵢ, and TNᵢ are the number of true positives, false positives, false negatives, and true negatives for class i; and N is the total number of classes.
2.4.3. Mean Boundary F1 Score (MeanBF)
It is defined as the harmonic mean of precision and recall, i.e., BF = 2 × Precision × Recall / (Precision + Recall), where a distance error tolerance is used to determine whether a point on the predicted boundary matches a point on the ground truth boundary.
2.4.4. Mean Accuracy (MA)
MA calculates the average accuracy of pixel-wise classification across all classes, providing a balanced assessment of segmentation performance across different categories.
2.4.5. Mean Intersection over Union (Mean IoU) [40]
Mean IoU computes the average Intersection over Union (IoU) across classes and is used to assess how well segmentation models work in computer vision applications. It measures the overlap between the predicted segmentation and the ground truth segmentation. IoU is computed as the area of intersection divided by the area of union between the ground truth and the predicted segmentation. This yields a number in the range of 0 to 1, where 1 denotes perfect overlap.
2.4.6. Weighted Intersection over Union (WIoU) [41]
WIoU accounts for class imbalance by weighting the IoU of each class based on its frequency in the dataset, offering a more nuanced evaluation of segmentation performance in scenarios with uneven class distributions.
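For reference, the metrics above can be computed from binary prediction and ground truth masks roughly as in the following NumPy sketch; the exact per-image versus per-dataset averaging used in the study is not assumed here.

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def iou(pred, gt):
    """IoU = |A ∩ B| / |A ∪ B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def global_accuracy(pred, gt):
    """Proportion of correctly classified pixels over all pixels."""
    return (pred == gt).mean()

def mean_and_weighted_iou(pred, gt, classes=(0, 1)):
    """Mean IoU averages per-class IoU; Weighted IoU weights each class
    by its pixel frequency in the ground truth."""
    ious = [iou(pred == c, gt == c) for c in classes]
    weights = [(gt == c).mean() for c in classes]
    return float(np.mean(ious)), float(np.dot(ious, weights))
```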
2.5. Visualization
Grad-CAM (Gradient-weighted Class Activation Mapping) [
43] is a visualization technique used to provide visual explanations for the decisions made by convolutional neural networks (CNNs) in image segmentation tasks. It highlights the regions of an image most important for the network’s prediction, making the model’s decision process more interpretable.
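A minimal sketch of how Grad-CAM can be obtained for a segmentation network is shown below; it assumes a 2D CNN model and a user-chosen convolutional target layer, and it aggregates the tumor-class logits before back-propagation, which is one common adaptation of Grad-CAM to segmentation rather than the exact procedure used in this study.

```python
import torch

def grad_cam(model, image, target_layer, target_class=1):
    """Weight the target layer's activations by the spatially averaged gradient
    of the target-class score and sum over channels (image: 1 x C x H x W)."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    model.eval()
    logits = model(image)                     # (1, n_classes, H, W) segmentation output
    score = logits[:, target_class].sum()     # aggregate the tumor-class logits
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = torch.relu((weights * acts["a"]).sum(dim=1))   # weighted sum over channels
    return cam / (cam.max() + 1e-8)           # normalize to [0, 1] for the heat map
```

The resulting normalized map can then be overlaid on the input image as a color-coded heat map to indicate the regions that most influenced the prediction.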
3. Results
The available dataset is divided into three subsets: 70% for training with 2415 images, 20% for testing with 613 images, and 10% for validation with 305 images. The proposed workflow was trained, validated, and tested using hyperparameters including the RMSprop optimizer, a learning rate of 0.0001, 100 epochs, and a mini-batch size of 32. The process began with existing DeepLabV3 models, specifically ResNet-18, MobileNet, and ResNet-50, and the original U-Net. The experiments were carried out on a workstation with two graphics cards, a 13th Gen Intel Core i9-13900K processor, 4 × 32 GB FURY 6000 MHz DDR5 memory, and 2 × Samsung 980 PRO 2 TB M.2 NVMe Gen 4 internal drives. The table below illustrates the performance of each model under these settings.
MobileNet demonstrated the best Global Accuracy, Mean Accuracy, and Weighted IoU, and achieved the second-highest DSI. ResNet-18 outperformed ResNet-50 and nearly matched U-Net in various performance metrics. Owing to the numerous advantages of U-Net’s encoder and decoder structures, this paper focuses on modifying the designed U-Net to create a more accurate, faster, and lighter model. The experiment involved multiple parameter modifications to enhance the existing U-Net’s performance. Each modification is detailed in terms of kernel size, number of channels, and dropout ratio before the connected layer.
Table 2 presents the results of the original U-Net without any modifications, with parameters including a kernel size of 3 × 3, 64 channels, and a dropout ratio of 0.5. As shown in Table 2, its performance is comparable to MobileNet. In contrast, Table 3 highlights the differences between the results obtained from MobileNet and U-Net. The primary difference lies in computation time: MobileNet requires nearly one hour to build the model, whereas U-Net has the lowest computation time among all structures, taking only 27 min. This indicates a more straightforward structure. Therefore, modifications to the existing U-Net structure were made to optimize it for better results while maintaining lower computation time, which is the primary goal of this paper.
Table 2 lists each modification by name and number and the specific parameters altered. Furthermore, Figure 4 depicts the results presented in Table 2. The bold number indicates the lowest computation time.
Each proposed model was trained using the previously mentioned hyperparameters (optimizer, number of epochs, learning rate). The training time for each model was recorded, and the results are summarized in Table 4.
The previous table clearly shows the effect on training time of reducing the kernel size (the field of view of the filter), adjusting the number of depth channels, and changing the dropout ratio. The shortest training time was achieved by U-Net_P3, completing training within 21 min. The performance criteria for each model were computed, and the results are summarized in Table 5.
The results in Table 5 are presented visually in Figure 5, as shown below. Bold indicates the highest result in each column.
The results indicate that modifying the kernel size and dropout rate has enhanced segmentation outcomes in terms of DSI and mean IoU, mainly observed in the first proposed network, U-Net_P1. This modification process has improved results, notably strengthening the dice similarity index. Further improvement was achieved by altering the layers, such as replacing the max pooling layer with an average pooling layer and changing the activation function from ReLU to Leaky ReLU.
The latest modification to the U-Net structure involved adjusting the kernel size to 5 × 5, setting the number of channels to 32, and the dropout ratio to 0.3. This modification significantly improved brain tumor segmentation, as demonstrated in the following table (Table 6):
These superior results, achieved with the lowest training time of 19 min, surpass those of existing methods reported in the literature.
The experiment involved numerous parameter modifications, and heat map images depict the performance of each proposed model in delineating the tumor region. The accuracy of tumor segmentation, whether for small or large tumors, primarily hinges on the efficacy of the designed model in extracting local features and classifying pixels within the kernel structure’s field of view, in addition to the dropout ratio. The results are visually depicted in the color maps of the following figures.
Figure 6 shows the test image and its corresponding label.
Figure 7 displays the segmentation results of various models for the original image shown in Figure 6.
The heat map localization in each case does not precisely indicate the tumor region; it either extends to other areas or fails to cover the entire region. However, the last Figure demonstrates the effectiveness of the Modified U-Net in accurately covering the whole region without compromising the tumor site. This improvement stems from the new modification’s ability to extract the most relevant features using the kernel size and activation function, enhancing existing results and reducing computation time.
The improved deep learning model effectively detects not only large tumors, as demonstrated in Figure 7, but also small tumors, as is evident in the corresponding Figure.
Figure 8 shows the original image, the ground truth tumor region, the network’s predicted label, and a heat map representation to highlight the exact affected area, as indicated by the heat map bar on the far right of the Figure.
The current study is compared with the literature to identify its strengths and weaknesses. Table 7 lists each study alongside the type of database used; the studies included are those that employed the same dataset used to evaluate the enhanced model.
As evident from the previous table, numerous papers have used the same dataset in recent years but achieved results inferior to ours. The only study with a higher MDice is reference [30]. However, that study did not use a validation set and had a smaller test sample size than ours; its authors tested on 289 images, whereas we tested on 613 images, which significantly affects the dice results. Additionally, previous studies in the literature often did not report their results clearly and frequently used different datasets with smaller sample sizes. The key strengths of our study are its simplicity, low computation time, and precise results.
A notable difference emerges when comparing the current study to nnU-Net. The modified version is optimized specifically for brain tumors yet can be applied to other datasets. In contrast, nnU-Net’s automated adjustment of parameters and layer configurations to achieve the best segmentation results requires significant computation time, which is costly in medical diagnostics. The enhanced version of U-Net uses Leaky ReLU instead of ReLU, offering advantages such as improved model robustness and performance by addressing limitations like vanishing gradients for negative inputs. With fixed parameters, the modified version delivers robust and fast results, making it suitable for hospital use and for training radiology residents.
Furthermore, the modified model was compared with the nnU-Net used in [36,37], which was applied to the BraTS 2020 brain tumor dataset. The nnU-Net achieved a dice coefficient not exceeding 85%, whereas the modified version of U-Net reached a dice coefficient of 90.2% with 5.5 M parameters. In contrast, the nnU-Net had 25.71 M parameters, indicating its high complexity and the longer computation time required to optimize hyperparameters. By fixing hyperparameters such as the learning rate, batch size, and optimizer, time is saved, and effort can be focused on adjusting the model weights to achieve the best results.
4. Conclusions
Detecting brain tumors is a critical task in medical diagnostics due to the severe implications these anomalies pose to patients’ health and well-being. The human brain, with its intricate network of neurons and supporting structures, is susceptible to various pathological conditions, with tumors being particularly challenging. This work employs artificial intelligence to detect and segment brain tumor regions accurately. The study evaluated the performance of deep learning models for accurate segmentation of brain tumors using a large dataset. It began by examining existing models and their efficiency in precise segmentation, and then applied various modifications to the U-Net model to achieve the best Enhanced U-Net model for segmentation.
The focus was on specific parameters such as the number of channels, dropout ratio, kernel size, activation function type, and pooling layer type. Training options were standardized across all trials to evaluate the impact of these modifications on performance results. The optimal model was achieved by combining a 5 × 5 kernel size, 32 channels, a 0.3 dropout ratio, average pooling instead of max pooling, and Leaky ReLU instead of ReLU. The results showed significant improvement over previous literature, with training time not exceeding 19 min. The performance metrics revealed a dice similarity of 90.2% and an accuracy of 99.5%. The modified version’s performance was tested on both large and small tumors, which presents a significant challenge for radiologists. In all cases, the model demonstrated notable results in delineating the tumor region and identifying the major affected areas.
This proposed model has the potential to be developed into standalone software for use in hospitals and clinics, serving as a valuable tool for radiologists in identifying tumor locations for further diagnosis. It can also benefit resident radiologists by enhancing their experience in pinpointing tumor locations and reducing errors in identifying suspicious regions.