Article

Automatic Segmentation in 3D CT Images: A Comparative Study of Deep Learning Architectures for the Automatic Segmentation of the Abdominal Aorta

by Christos Mavridis *, Theodoros P. Vagenas, Theodore L. Economopoulos, Ioannis Vezakis, Ourania Petropoulou, Ioannis Kakkos and George K. Matsopoulos *
Biomedical Engineering Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(24), 4919; https://doi.org/10.3390/electronics13244919
Submission received: 31 October 2024 / Revised: 8 December 2024 / Accepted: 11 December 2024 / Published: 13 December 2024

Abstract: Abdominal aortic aneurysm (AAA) is a complex vascular condition associated with high mortality rates. Accurate abdominal aorta segmentation is essential in medical imaging, facilitating diagnosis and treatment for a range of cardiovascular diseases. In this regard, deep learning-based automated segmentation has shown significant promise in the precise delineation of the aorta. However, comparisons across different models remain limited, with most studies performing algorithmic training and testing on the same dataset. Furthermore, due to the variability in AAA presentation, using healthy controls for deep learning AAA segmentation poses a significant challenge. This study provides a detailed comparative analysis of four deep learning architectures—UNet, SegResNet, UNet Transformers (UNETR), and Shifted-Windows UNet Transformers (SwinUNETR)—for full abdominal aorta segmentation. The models were evaluated both qualitatively and quantitatively using private and public 3D computed tomography (CT) datasets. Moreover, despite being trained on healthy aortic imaging data, the models attained high performance in delineating aneurysmal aortas. Our findings indicate that the UNet architecture achieved the highest segmentation accuracy among the models tested.

1. Introduction

Abdominal aortic aneurysm (AAA) is a vascular disease characterized by a localized dilatation of the vessel (where the maximum diameter between the renal and iliac arteries exceeds the normal diameter by more than 50%) which, if left untreated, can lead to catastrophic rupture [1,2]. The timely detection of AAA and other associated conditions like intraluminal thrombus and vascular calcification is essential for improving patient outcomes [3]. As such, knowing the geometry of the abdominal aorta (starting from the diaphragm and bifurcating into the common iliac arteries), including its diameter and length, is critical for the diagnostic evaluation of AAA (since the abdominal aorta plays a crucial role in supplying blood to multiple organs) [4,5]. Medical imaging is a critical component in the diagnosis and management of AAA. Among the various imaging techniques, computed tomography (CT) is particularly valuable for its ability to provide high-resolution views of anatomical structures. Visualization techniques, especially 3D visualization, are essential for allowing medical professionals to assess anatomical regions of interest (ROIs) with precision, beyond what is possible with two-dimensional (2D) imaging modalities [6]. Through 3D visualization, medical experts can qualitatively and quantitatively evaluate geometric properties and abnormalities, providing a reliable means to examine ROIs such as the abdominal aorta [7].
Image segmentation is a necessary step prior to 3D visualization, as it clearly defines and isolates the anatomical ROI from the surrounding data [8,9]. While manual or semi-automatic segmentation methods are commonly employed in clinical practice, their reliance on human intervention can limit their scalability and accuracy. Automated segmentation approaches using machine learning and deep learning architectures offer a promising alternative by eliminating many of the limitations inherent in manual methods [10,11]. In recent years, deep neural networks (DNNs) have proven highly effective in medical image tasks [12,13]. Their ability to extract vital characteristics and patterns from large volumes of imaging data has proven crucial in providing more accurate and effective analysis results [14]. In addition, through quantitative analysis and accurate measurements, DNNs may provide a precise distinction of neighboring organs or other areas of interest, which is useful in categorizing several anatomical features, anomalies, and diseases such as AAA [15]. On this premise, convolutional neural networks (CNNs) have established their effectiveness in medical image segmentation tasks, utilizing different layers for feature extraction and feature mapping [16]. In particular, Lopez-Linares et al. [17] suggested a fully automatic technique, based on CNNs, for the detection and segmentation of intraluminal thrombus (ILT) in post-operative computed tomography angiography (CTA) images. Furthermore, Lareyre et al. [18] presented an automatic threshold-based contour detection method for the detection and evaluation of the main characteristics of AAA.
Among CNN-based architectures, 3D UNet and SegResNet are commonly utilized for 3D medical image segmentation due to their capacity to handle volumetric data [19]. However, the fixed receptive field of CNN layers limits their ability to effectively capture long-range dependencies, which can lead to sub-optimal performance in modeling global context information [20]. This limitation can be addressed by employing transformer-based models, which excel in modeling long-range relationships through self-attention mechanisms, improving segmentation accuracy [21]. As such, UNETR and SwinUNETR are transformer-based DNN architectures that leverage self-attention mechanisms to overcome the limitations of traditional CNNs [22]. UNETR integrates transformers into the UNet structure, while SwinUNETR builds on this approach with an enhanced design specifically tailored for image segmentation [23].
Despite the demonstrated effectiveness of machine learning in segmenting abdominal aorta regions, the plethora of deep learning architectures and differences in input data make a direct comparison of the results challenging. To address this, comparative studies are essential for systematically evaluating the performance of different architectures under standardized conditions and datasets, allowing for more reliable conclusions about their accuracy, robustness, and clinical applicability in abdominal aorta segmentation. From this standpoint, Camara et al. [24] used CNNs for screening and identifying CTA results of infrarenal AAAs, randomizing the medical image data to sets of 60%, 10%, and 30% for model training, validation, and testing, respectively. Their results indicated a 99.1% accuracy of the final customized CNN model and an area under the receiver operating characteristic curve of 0.99. In a similar design, Cao et al. [25] implemented three convolutional neural networks (CNNs) based on the UNet architecture for the segmentation of type B aortic dissection, and with CNN3, achieved the best Dice scores of 0.93 ± 0.01, 0.93 ± 0.01, and 0.91 ± 0.02 for the whole aorta, true lumen, and false lumen, respectively. While the implementation of different machine learning models provides a framework for uniform evaluation metrics on the same dataset, the impact of different types of input data (e.g., CT resolution or noise levels) on the performance of different segmentation deep learning models is largely underexplored. In this regard, comparing the different segmentation algorithms when trained and evaluated on one (limited or homogeneous) dataset restricts their generalizability to diverse patient populations, reducing their potential to be deployed in real-world clinical environments. Moreover, since AAA causes structural and morphological changes in the aorta, such as dilation and irregular wall contours (which differ significantly from the uniform appearance in healthy aortas), deep learning models trained exclusively on healthy aortic images may struggle to generalize to AAA cases [26]. This discrepancy can lead to inaccurate segmentation in aneurysmal aortas, as the model may incorrectly assume standard features, such as consistent vessel diameter and shape (lacking exposure to these pathological variations during training).
Considering all the above, the aim of this study is to conduct a comprehensive evaluation, both qualitative and quantitative, to assess and compare the performance of various deep learning architectures, while incorporating healthy and patient data from different, distinct datasets. Specifically, four specific neural networks (i.e., UNet, SegResNet, UNETR, and SwinUNETR) are implemented for the automatic segmentation of the abdominal aorta, combining and analyzing in detail convolutional and transformer architectures. The pipeline covers all steps of preprocessing, training, and mesh construction, with extensive training and testing on 56 aorta images from healthy subjects from different centers (public dataset), and additional validation with 20 abdominal aorta images from patients with AAA (private dataset). The rest of the paper is organized as follows: Section 2 describes the four deep learning architectures used to train the data and obtain the segmentation predictions for the unseen images. It also analyzes and examines the workflow diagram of the CNN and transformer architectures. In Section 3, the comparative results are presented, and finally, Section 4 examines key aspects of the four deep learning architectures and evaluates their potential usage in clinical practice, including treatment planning and assessment.

2. Materials and Methods

2.1. Data Acquisition

In this study, seventy-six (76) CT images were used for the segmentation of the abdominal aorta. Of these, fifty-six (56) images (public dataset) were obtained from healthy subjects, derived from scans across three distinct datasets: the KiTS19 Grand Challenge, the Rider Lung CT dataset, and cases from the Dongyang Hospital [27,28]. The acquisition protocol involved imaging using computed tomography angiography (CTA), with image dimensions of 512 × 512 pixels for both the KiTS19 Grand Challenge and Rider Lung CT datasets and 512 × 666 pixels for data derived from Dongyang Hospital. The median slice thicknesses were 5 mm, 0.625 mm, and 3 mm, and the median numbers of axial slices were 146, 1008, and 149 for the KiTS19 Grand Challenge, Rider Lung CT, and Dongyang Hospital datasets, respectively. Additionally, twenty (20) CT images were obtained from patients with AAA, provided by the Department of Vascular Surgery, General University Hospital ATTIKON (private dataset). All private data were acquired using a 3D CT scanner (Philips Brilliance 16-slice), with the following size parameters for the digital images: mean array dimensions: 512 × 512 × 553.3 pixels; mean voxel spacing: 0.84 × 0.84 × 2.31 mm; median array dimensions: 512 × 512 × 448 pixels; and median voxel spacing: 0.88 × 0.88 × 1.13 mm.

2.2. Preprocessing

To ensure homogeneity across all datasets and reduce the impact of potential outliers, intensity values were clipped to the range [−275, 1900] [29]. The range was determined through iterative exploratory testing conducted on the dataset, ensuring that essential features of the aortic region were preserved while excluding irrelevant variations. Moreover, this narrower range captures the typical values for soft tissue, including the vascular structures, which are critical for the precise delineation of the aorta. Subsequently, the intensities were normalized to the range [0, 1], further reducing numerical instability during model training by preventing excessively large or small input values that might hinder the learning process. To increase the effective amount of training data and thereby improve model robustness, a data augmentation procedure was implemented, including random flipping, rotation, and intensity-shifting techniques [30]. Specifically, a probability of 25% was applied for all augmentation techniques, increasing the training data's variety and improving the models' ability to generalize. After data augmentation, the images were uniformly resampled to a fixed voxel spacing of [1.0, 1.0, 1.5] mm, ensuring consistency across both datasets.
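A minimal sketch of these preprocessing steps, expressed with MONAI dictionary transforms (the study used the MONAI library, but the specific transform choices, dictionary keys, and the intensity-shift offset below are our illustrative assumptions, not the authors' published code):

from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityRanged,
    RandFlipd, RandRotate90d, RandShiftIntensityd, Spacingd,
)

preprocess = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    # Clip intensities to [-275, 1900] and rescale the result to [0, 1].
    ScaleIntensityRanged(keys=["image"], a_min=-275, a_max=1900,
                         b_min=0.0, b_max=1.0, clip=True),
    # Augmentations, each applied with probability 0.25.
    RandFlipd(keys=["image", "label"], prob=0.25, spatial_axis=0),
    RandRotate90d(keys=["image", "label"], prob=0.25),
    RandShiftIntensityd(keys=["image"], offsets=0.10, prob=0.25),
    # Resample to a fixed voxel spacing of 1.0 x 1.0 x 1.5 mm.
    Spacingd(keys=["image", "label"], pixdim=(1.0, 1.0, 1.5),
             mode=("bilinear", "nearest")),
])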

2.3. Deep Learning Segmentation

To achieve effective segmentation and conduct a comprehensive comparison, we implemented four deep learning architectures: UNet, SegResNet, UNETR, and SwinUNETR. The performance of each model was assessed against the actual aorta for each case (ground truth), which was manually delineated by medical experts specializing in cardiovascular diseases. This was conducted for both datasets. Grounded in our rationale that the deep learning methods should exhibit high performance universally (regardless of the input data), we trained the models using the larger public dataset. In this context, if the models successfully segmented the abdominal aorta in patients (private dataset), it would indicate their efficacy in accurately segmenting both patients and healthy subjects. As such, we employed an 80–20 random split of the public (healthy) dataset for the training and testing phases (i.e., 80% of the total dataset was allocated for training purposes, enabling the models to learn from a substantial portion of the data, and the remaining 20% was designated as the testing set, which was used to evaluate the performance and generalization capabilities of the trained models). For the training procedures, all images were fed into the deep learning models as non-overlapping 64 × 64 × 64 image windows, centered on random points inside the segmentation mask, to minimize GPU memory usage.
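The split and window-based sampling can be sketched as follows (again with MONAI; the split logic, the num_samples value, and the variable names are illustrative assumptions):

import random
from monai.transforms import RandCropByPosNegLabeld

# Hypothetical 80-20 random split of the public (healthy) dataset;
# `cases` is a list of {"image": ..., "label": ...} dictionaries.
random.shuffle(cases)
split = int(0.8 * len(cases))
train_cases, test_cases = cases[:split], cases[split:]

# Sample 64 x 64 x 64 windows centered on random points inside the
# segmentation mask (pos=1, neg=0 keeps all centers in the foreground).
sampler = RandCropByPosNegLabeld(
    keys=["image", "label"], label_key="label",
    spatial_size=(64, 64, 64), pos=1.0, neg=0.0, num_samples=4,
)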
After training and testing on the public dataset, a two-fold cross-validation method was used to assess the models' performance on the private (AAA) dataset. Moreover, we employed the Dice loss function with cross-entropy to quantify the overlap between predicted and ground truth segmentations, due to its effectiveness in capturing smaller vessels [31]. The loss function implemented in all deep learning models was calculated as described below.
For the training steps, a composite loss function combining the cross-entropy loss and the Dice loss was employed. This hybrid loss function was designed to leverage the advantages of both pixel-wise classification accuracy and overlap-based similarity measures, which is particularly beneficial in enhancing the segmentation quality of the aorta structure with branches that are smaller than the main lumen structure. The combined loss function was defined as follows:
\mathrm{Loss} = \alpha L_{CE} + \beta L_{DSC}
where L_{CE} is the cross-entropy loss, L_{DSC} is the Dice loss, and α and β are weighting coefficients, both set to 0.5.
The cross-entropy loss was calculated as follows:
L_{CE} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log p_{ic}
where N is the total number of pixels in each batch, C is the number of classes, and y_{ic} is the ground truth label for pixel i and class c (y_{ic} = 1 if pixel i belongs to class c, and y_{ic} = 0 if it does not). Accordingly, p_{ic} is the predicted probability that pixel i belongs to class c.
Dice loss was calculated as follows:
L_{Dice} = 1 - \frac{2 \sum_{i=1}^{N} y_i p_i + \epsilon}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} p_i + \epsilon}
where y_i is the binary ground truth label for pixel i, p_i is the predicted probability for pixel i, and ϵ is a small constant added to prevent division by zero.
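In practice, this combined objective is available directly in MONAI; a minimal sketch is given below (the two-channel output, the integer-mask label format, and the `model`, `images`, and `labels` placeholders are our assumptions):

import torch
from monai.losses import DiceCELoss

# Equal weighting (alpha = beta = 0.5) of the Dice and cross-entropy
# terms, matching the composite loss defined above.
loss_fn = DiceCELoss(to_onehot_y=True, softmax=True,
                     lambda_dice=0.5, lambda_ce=0.5)

logits = model(images)          # (B, 2, 64, 64, 64) raw network output
loss = loss_fn(logits, labels)  # labels: (B, 1, 64, 64, 64) integer mask
loss.backward()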
Each deep learning architecture is presented briefly below, while the hyperparameters of each deep learning architecture are presented in Table 1. Hyperparameter values were selected based on the following procedures. The learning rate was determined through a grid search on the validation set, exploring values in the range of 10^{-5} to 10^{-1}. Additionally, the dropout rate was included as part of the grid search, with values tested at 0.01, 0.1, and 0.35. The number of training epochs was held constant across models, chosen at the point where all model training and validation losses had stabilized. The batch size was set to the maximum value that would fit in the available GPU memory. For architectural design parameters, such as features per layer, feature size, convolutional kernels, and the number of heads, we adopted the initially proposed values from the respective articles and experimented with slight variations to achieve the best DSC value on the validation set. Regarding the normalization techniques, group normalization was part of the SegResNet architecture, UNETR and SwinUNETR performed better on the validation set with instance normalization, and batch normalization made the UNet training more stable. The optimizer (Adam) and the loss function (the summation of DSC and cross-entropy losses) were kept the same across all models to ensure uniformity, maintain consistency, and emphasize baseline performance comparisons.
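The hyperparameter search described above reduces to a simple grid loop; the sketch below is purely illustrative, and the `train_and_validate` helper is a hypothetical stand-in for one training-plus-validation run:

from itertools import product

learning_rates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
dropout_rates = [0.01, 0.1, 0.35]

best = {"dsc": 0.0}
for lr, dropout in product(learning_rates, dropout_rates):
    # train_and_validate is a hypothetical helper returning the
    # validation Dice score for one hyperparameter configuration.
    dsc = train_and_validate(lr=lr, dropout=dropout)
    if dsc > best["dsc"]:
        best = {"dsc": dsc, "lr": lr, "dropout": dropout}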
For evaluating the segmentations, four major metrics were employed, namely the Dice coefficient (Sorensen–Dice coefficient, DSC), recall, precision, and average symmetric surface distance (ASSD) [32,33].
Specifically, the DSC quantifies the degree of similarity or overlap between the ground truth segmentation and the algorithm’s generated result. DSC was calculated using the following formula:
\mathrm{DSC} = \frac{2 |A \cap B|}{|A| + |B|}
where A and B denote the voxel sets of the two segmentations, and |A| and |B| denote the numbers of voxels in A and B, respectively.
The recall metric is defined as the ratio of correctly identified positive voxels to the total number of real positive cases and is calculated through the following equation:
\mathrm{Recall} = \frac{TP}{TP + FN}
where TP, FN, and FP denote the numbers of true positives, false negatives, and false positives, respectively.
Precision represents the ratio of true positives to the total count of voxels classified as positive and is defined as follows:
\mathrm{Precision} = \frac{TP}{TP + FP}
Similar to recall, TP, FN, and FP indicate the number of true positives, false negatives, and false positives, respectively.
ASSD is the mean of distances from the generated boundary points to the ground truth boundary, and vice versa. ASSD is calculated as follows:
\mathrm{ASSD}(A, B) = \frac{1}{|S(A)| + |S(B)|} \left( \sum_{s_A \in S(A)} d(s_A, S(B)) + \sum_{s_B \in S(B)} d(s_B, S(A)) \right)
where S(A) and S(B) denote the sets of surface voxels of A and B, respectively, and d(s_A, S(B)) denotes the shortest distance from surface voxel s_A to S(B), calculated as follows:
d(s_A, S(B)) = \min_{s_B \in S(B)} \lVert s_A - s_B \rVert
where ‖·‖ denotes the Euclidean distance.
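For reference, the three overlap-based metrics follow directly from the voxel counts defined above; a minimal NumPy sketch (our illustration, not the study's evaluation code):

import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """DSC, recall, and precision for two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dsc = 2 * tp / (2 * tp + fp + fn + eps)   # = 2|A∩B| / (|A| + |B|)
    recall = tp / (tp + fn + eps)
    precision = tp / (tp + fp + eps)
    return dsc, recall, precision

ASSD itself is available off the shelf, e.g., via MONAI's SurfaceDistanceMetric with symmetric=True.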

2.3.1. UNet Architecture

Briefly, the UNet structure consists of an encoder part, which is responsible for the extraction of features with context information, and a decoder part, which reconstructs the segmentation mask at full resolution [34]. The encoder (contractive path) incorporates convolutional blocks including convolutions, the ReLU activation function, and max pooling layers to down-sample the feature maps by 2, while doubling the number of features at each stage. In this work, we additionally included residual connections inside the convolutional blocks of the encoder and decoder to facilitate gradient propagation. In this regard, each encoder block included two convolutional layers, the first with stride 2 to down-sample the feature map, followed by instance normalization and the PReLU activation function. The number of convolutional filters for the 5 layers of the encoder was set to (16, 32, 64, 128, 256). The final layer of the encoder formed the bottleneck between the encoder and decoder paths. At each stage of the decoder, the feature maps from the previous stage were up-sampled by 2 through transpose convolutions and then concatenated with the feature maps from the corresponding encoder stage connected via the skip connection. In detail, in each decoder block, the up-sampled features were concatenated with the features from the skip connection along the channel dimension, then passed through a convolutional layer, instance normalization, and the PReLU activation function. Finally, the residual connection added the concatenated input to the PReLU output to shape the final output of the corresponding decoder layer. The final convolutional layer was followed by a softmax activation function to produce the segmentation mask at full resolution. The architectural design of UNet is presented in Figure 1, while its hyperparameters are listed in Table 1.
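A configuration along these lines can be instantiated with MONAI's 3D UNet; the snippet below is a sketch based on this subsection's description (the input/output channel counts are assumed, and parameters not stated in the text use illustrative values):

from monai.networks.nets import UNet

unet = UNet(
    spatial_dims=3,
    in_channels=1,                    # single-channel CT input (assumed)
    out_channels=2,                   # background + aorta
    channels=(16, 32, 64, 128, 256),  # filters per encoder level
    strides=(2, 2, 2, 2),             # stride-2 down-sampling per stage
    num_res_units=2,                  # residual connections in each block
    act="PRELU",
    norm="INSTANCE",
)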

2.3.2. UNETR Architecture

The UNETR model combines the UNet-like architecture, composed of an encoder for feature extraction and a decoder for segmentation reconstruction, with the transformer network [35]. The conventional convolutional encoder is replaced with a transformer capable of modeling large spatial dependencies. Skip connections are implemented to transfer information from the encoder to different resolutions in the decoder. In this study, the encoder of UNETR was built upon a vision transformer (ViT), which requires splitting the input image into patches of fixed size. These non-overlapping 3D patches were then flattened to create one-dimensional vectors, x_v, to be used as input to the ViT. In the next step, a linear layer was applied to project the patches into vectors of a lower dimension of size K. To these vectors, a 1D learnable positional embedding was added to retain the initial spatial details, leading to the final embeddings z that were fed into the ViT. The generation of embeddings was followed by a sequence of transformer blocks incorporating multi-head self-attention (MSA) and multi-layer perceptrons (MLP). At layers 3, 6, 9, and 12 of the transformer, sequence representations were extracted. These layers correspond to different resolutions in the encoder, capturing different levels of feature abstraction. The sequence representations extracted from the encoder were initially in the form of 1D patch embeddings. The embeddings were then reshaped into 3D tensors of the dimensions required by the decoder. The reshaped tensors from the encoder were fed directly into the decoder through skip connections. At the bottleneck layer, a transposed convolution was applied to increase the spatial resolution of the output of the last transformer layer by a factor of 2. The resized feature map was then concatenated with the one from the previous transformer output. The decoder in UNETR comprised convolutional layers, as well as an instance normalization (IN) process, designed to gradually up-sample the encoded representations back to the original image resolution. The skip connections merged features from different transformer layers with the corresponding levels of the decoder, preserving spatial details. At each stage, the reshaped tensors were processed by 3 × 3 × 3 convolutional layers followed by normalization, ensuring a smooth transition from the embedding space to the input space. In the output layer, a 1 × 1 × 1 convolution produced the segmentation map. The architectural design of UNETR is presented in Figure 2, while its hyperparameters are listed in Table 1.
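A corresponding sketch with MONAI's UNETR implementation (the ViT dimensions below are the library defaults and are assumptions on our part; Table 1 lists the values actually used):

from monai.networks.nets import UNETR

unetr = UNETR(
    in_channels=1,
    out_channels=2,
    img_size=(64, 64, 64),   # matches the 64^3 training windows
    feature_size=16,         # assumed; see Table 1
    hidden_size=768,         # ViT embedding dimension (default, assumed)
    mlp_dim=3072,
    num_heads=12,
    norm_name="instance",    # instance normalization, as described
    res_block=True,
)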

2.3.3. SwinUNETR Architecture

The SwinUNETR architecture also follows a UNet-like encoder-decoder scheme, with the main difference being that the encoder is based on the Shifted-Windows Self-Attention method [36]. This encoder leverages the hierarchical scheme of Swin transformers to efficiently capture both local and global contextual information in 3D medical images. First, the original input is partitioned into patches to create a series of 3D boxes (tokens). These patches are then linearly projected to vectors of dimension C. By utilizing the shifted windows, non-overlapping patches are created to be used for the self-attention layers. A 3D window of size M in each dimension splits the initial 3D boxes into smaller patches. In the subsequent layer, the partition shifts by (M/2, M/2, M/2) voxels, leading to new patches to be used as tokens for self-attention. The Swin transformer encoder extracts features at five different resolutions by utilizing shifted windows for computing self-attention, and is connected to an FCNN-based decoder at each resolution via skip connections. In this study, the encoder included 4 stages with 2 transformer blocks each. At the beginning, a partitioning layer (where the patch size was set to 2 × 2 × 2 with a feature dimension of 32) was applied to convert the input blocks into a sequence of feature embeddings. At each stage, these blocks utilized a combination of Window-based Multi-head Self-Attention (W-MSA) and Shifted-Window Multi-head Self-Attention (SW-MSA) mechanisms to compute self-attention within non-overlapping windows. As the feature maps moved through the encoder stages, a patch merging operation reduced the spatial resolution by a factor of 2, while increasing the feature dimension, thereby generating multi-scale feature representations. The decoder included multiple residual blocks at each stage, with each block containing two convolutional layers (kernel size 3) with instance normalization and ReLU activation to enhance the feature representations. After the residual block, the feature maps were up-sampled by a factor of 2 with a deconvolution layer. The up-sampled feature maps were then concatenated with the skip connections from the encoder. The final segmentation maps were generated by a convolutional layer with a kernel size of 1 and a softmax activation function. The architectural design of SwinUNETR is presented in Figure 3, while its hyperparameters are listed in Table 1.
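A sketch with MONAI's SwinUNETR implementation (note that MONAI constrains feature_size to multiples of 12, so the default of 24 is used here as an assumption, whereas the text reports an initial feature dimension of 32; depths and head counts are the library defaults):

from monai.networks.nets import SwinUNETR

swin_unetr = SwinUNETR(
    img_size=(64, 64, 64),
    in_channels=1,
    out_channels=2,
    depths=(2, 2, 2, 2),       # two transformer blocks per encoder stage
    num_heads=(3, 6, 12, 24),  # default head counts (assumed)
    feature_size=24,           # MONAI default; the text reports 32
    norm_name="instance",
)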

2.3.4. SegResNet Architecture

SegResNet follows the UNet structure with an encoder, a bottleneck, and a decoder part [37]. The variational auto-encoder branch of the original design is omitted to reduce computational cost [38]. The encoder path is composed of convolutional ResNet blocks that include residual connections to mitigate the vanishing gradient problem. Each encoder block comprises two convolutional blocks, each including group normalization, ReLU activation, and a 3D convolution, with a skip connection to aid gradient propagation. In this study, each encoder block doubled the number of filters and down-sampled the input feature maps by 2, using convolutions with stride 2, in order to capture the context of the image at various levels of abstraction. A bottleneck layer consisting of 256 filters was located between the encoder and the decoder. The decoder module reconstructed the segmentation mask by gradually up-scaling the feature representations from the different levels of the encoder. In each decoder block, the feature maps from the previous level of the decoder (up-sampled by 2 using a transpose convolution) were added to the feature maps from the corresponding skip connection and passed through 1 × 1 × 1 convolutions to refine the number of features. In contrast to the encoder blocks, decoder blocks included one convolutional layer per block. The last layer applied a softmax activation function with 2 channels for the final segmentation mask. The architectural design of SegResNet is presented in Figure 4, while its hyperparameters are listed in Table 1.
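A sketch with MONAI's SegResNet implementation (group normalization is the library default; init_filters=32 is our assumption chosen so that the deepest stage reaches the 256-filter bottleneck described above, and the block counts are the MONAI defaults):

from monai.networks.nets import SegResNet

segresnet = SegResNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=2,
    init_filters=32,           # doubles per stage: 32, 64, 128, 256
    blocks_down=(1, 2, 2, 4),  # MONAI defaults (assumed)
    blocks_up=(1, 1, 1),
)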

2.4. Implementation and Computational Cost

The required experiments were conducted on a desktop computer equipped with an Nvidia RTX 4090 GPU featuring 24 GB of memory. All comparative deep learning methods were implemented using PyTorch 2.1.0 and the MONAI library [39]. However, the computational cost of each of the four deep learning models differed significantly due to their different numbers of parameters (representing the sum of all weights and biases within the network layers). This number is a critical factor influencing a model's complexity, memory requirements, and computational demands. In Table 2, the total number of parameters for each model is presented (estimated with PyTorch built-in functions that count the parameters of each layer and of the entire model).
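Parameter counts of this kind can be reproduced with a one-liner over any instantiated model (a sketch using standard PyTorch, applied here to the model objects from the previous subsections):

def count_parameters(model) -> int:
    # Sum of all weights and biases across the network layers.
    return sum(p.numel() for p in model.parameters())

for name, net in [("UNet", unet), ("UNETR", unetr),
                  ("SwinUNETR", swin_unetr), ("SegResNet", segresnet)]:
    print(f"{name}: {count_parameters(net):,} parameters")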

2.5. 3D Mesh Reconstruction

Following the machine learning segmentation methods, the delineated abdominal aorta was further evaluated via surface geometry modeling. In this regard, the marching cubes algorithm [40] was employed to generate the reconstructed surface geometries, serving as a 3D surface model for both visualization and mesh processing tasks. Briefly, the marching cubes algorithm is a widely used technique for extracting a polygonal mesh from a three-dimensional scalar field (utilized in this study to create a surface mesh from the volumetric data of the extracted segmentation mask). As such, the 3D scalar field was partitioned into smaller cubes, with scalar values calculated at each vertex to enable interpolation and facilitate the generation of triangles that accurately approximate the isosurface within each cube. Following that, triangles from adjacent cubes were merged to form a smooth, continuous surface mesh, while a Laplacian smoothing filter was applied to the originally extracted geometry to further minimize the surface roughness of the extracted mesh [41]. The Laplacian smoothing was configured with a lambda (smoothing) factor of 0.5 and a total of 10 iterations. For the 3D mesh reconstruction evaluation, minor manual refinements were applied to the generated mesh to eliminate singularities, self-intersections, and degenerate elements, ensuring that the mesh was watertight (impermeable to fluids). For the procedures of mesh creation and smoothing, the Trimesh library in Python 3.7+ was utilized [42]. The final 3D reconstruction of the abdominal aortic region was based on the Visualization Toolkit (VTK) [43] using OpenGL 3D rendering.
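A minimal sketch of this surface-extraction step, combining scikit-image's marching cubes with Trimesh's Laplacian smoothing (the study cites the Trimesh library and the stated smoothing parameters; the use of scikit-image and the `mask` variable are our assumptions):

import trimesh
from skimage import measure

# Extract an isosurface from the binary segmentation mask (`mask`),
# using the resampled voxel spacing of 1.0 x 1.0 x 1.5 mm.
verts, faces, _, _ = measure.marching_cubes(
    mask.astype(float), level=0.5, spacing=(1.0, 1.0, 1.5))

mesh = trimesh.Trimesh(vertices=verts, faces=faces)

# Laplacian smoothing with lambda = 0.5 over 10 iterations,
# as configured in the study.
trimesh.smoothing.filter_laplacian(mesh, lamb=0.5, iterations=10)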

3. Results

3.1. Quantitative Evaluation

For the quantitative evaluation of the results, the performance was assessed by comparing the estimated region of interest to the ground truth. Metrics included DSC, recall, precision, and ASSD. For ASSD, optimal performance is represented by lower values, as it measures the discrepancy between the estimated segmentation and the ground truth. In contrast, higher values indicate better performance for all of the other metrics. The results of the different deep learning architectures are presented below in Table 3 (for the public dataset) and Table 4 (for the private dataset).
Regarding the model performance when utilizing the public dataset, UNet achieved the highest overall DSC value of 0.89 compared to the other three architectures. In addition, the recall and precision metrics for the UNet method were also superior, attaining 0.90 and 0.89, respectively. Interestingly, UNet, SwinUNETR, and SegResNet performed similarly in terms of ASSD (0.08), slightly better than the UNETR method (0.10).
Similar to the public dataset, when employing the private dataset the highest overall performance was achieved with the UNet architecture. This applies to the DSC, recall, and ASSD values (0.89, 0.89, and 0.04, respectively) when compared to the other three models. On the contrary, SegResNet presented the highest overall precision (0.92) compared to UNet, UNETR, and SwinUNETR (0.89, 0.77, and 0.85, respectively).
To further assess the performance of the segmentation models, a paired t-test was conducted using the Dice coefficient as the evaluation metric across images. This test accounts for the paired nature of the data, reducing variability due to individual differences and focusing solely on the model performance variation. The analysis was performed with a significance threshold of p = 0.05, assessing whether observed differences in Dice scores between model pairs were statistically significant (Table 5).
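This comparison corresponds to a standard paired t-test over per-image Dice scores; a sketch with SciPy (the variable names are illustrative, holding the per-image Dice scores of two models evaluated on the same test images):

from scipy.stats import ttest_rel

# dice_unet and dice_unetr: per-image Dice scores for the same test
# images, produced by two different models (hypothetical arrays).
t_stat, p_value = ttest_rel(dice_unet, dice_unetr)
significant = p_value < 0.05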

3.2. Qualitative Evaluation

For the qualitative evaluation of the four deep learning architectures, the detected aortic structure was superimposed on the original image (and the gold standard) in each case. As mentioned in the previous sections, the gold standard was a mask of the actual abdominal aorta manually defined by experts (utilizing the 3D Slicer application). Representative examples are shown in Figure 5 and Figure 6 (for the private and public datasets, respectively), where the segmented region of interest is highlighted in red for visualization purposes.
As illustrated in Figure 5, the UNet architecture demonstrates noticeably higher accuracy compared to the other three deep learning models. Moreover, in the second case (Figure 5, bottom row), UNet successfully detects most of the smaller regions within the abdominal aortic mask with the other models deviating from the ground truth segmentation.
In the private dataset evaluation (Figure 6), all four comparative deep learning architectures achieved an adequately accurate abdominal aortic mask, although in some cases over- or under-segmented estimates were observed. Particularly, in the first case (Figure 6 upper row), SegResNet achieved the smoothest abdominal aortic mask compared to UNet, UNETR, and SwinUNETR (under-segmentation). Similarly, in the second case, UNet and UNETR architectures resulted in better accuracy in comparison to SwinUNETR and SegResNet (over-segmentation).

3.3. 3D Reconstruction Evaluation

To further evaluate model performance, the 3D reconstruction was reviewed by clinicians to examine the abdominal aorta area from various angles and viewpoints, which would not be possible through 2D medical imaging. The 3D reconstruction models of indicative cases, along with the four deep learning architecture meshes, are presented in Figure 7.
As can be seen in Figure 7, the fused model from the combination of the ground truth and UNet meshes achieved the highest accuracy. In comparison, the fused 3D models generated by UNETR (Figure 7b), SwinUNETR (Figure 7c), and SegResNet (Figure 7d) exhibited varying degrees of under-segmentation when matched against the true abdominal aortic region. Moreover, the combination of the marching cubes algorithm with the supplemental mesh approach and localized corrections yielded high-quality, fluid-impermeable meshes.

4. Discussion

In this study, we implemented four distinct deep learning architectures for the automatic segmentation of the abdominal aorta, utilizing different datasets for a robust comparative analysis. Specifically, the four methods were evaluated using a large amount of 3D CT data, obtained from public and private datasets, following a comprehensive qualitative and quantitative evaluation. Our findings indicate that the UNet model consistently outperformed SegResNet, UNETR, and SwinUNETR in the majority of cases, both in terms of the key performance metrics (used in this study) and from a qualitative perspective.
Specifically, regarding the quantitative performance of the deep learning models, UNet outperformed all other architectures, based on the DSC metric, in terms of segmentation accuracy. This was evident in both the healthy control (public) and patient (private) datasets, as shown in Table 3 and Table 4. Interestingly, although the deep learning models were not exposed to AAA variations during training, their performance in full aorta delineation was comparable to that on the healthy controls. The statistical evaluation of DSC scores (Table 5) also demonstrated a significant performance improvement between UNet and UNETR/SwinUNETR (p < 0.05), confirming that UNet consistently performed better than certain transformer-based models. In contrast, the statistical differences between UNet and SegResNet were not significant (p = 0.0566), which was also the case between SegResNet and SwinUNETR (p = 0.3881). It should be noted that although the best segmentation performance was estimated under the constraints of limited annotated images, finer or more anatomically intricate vascular branches of the aorta were not fully captured. This may be attributed to CNNs' limited ability to capture long-range dependencies, which are crucial for accurately representing the complete anatomical complexity of the aorta [44]. On the other hand, the optimal precision was achieved by the SegResNet architecture. However, the deviation between precision and recall, particularly noticeable in the public dataset, suggests a tendency of the implemented SegResNet model toward under-segmentation, subsequently leading to the under-detection of small or subtle vessels [45].
Another important consideration is the computational cost of each model when compared with algorithmic performance. On this premise, all four architectures rely on a variety of parameters that regulate key aspects of their algorithms (Table 2). In this regard, SegResNet was the most lightweight architecture employed, with approximately 1.18 million parameters, followed by UNet with about 4.8 million parameters. In contrast, transformer-based models like UNETR and SwinUNETR have significantly higher parameter counts of approximately 92.6 million and 62.1 million, respectively. As such, the increased computational demands for training and inference require more powerful hardware with larger memory capacity. The higher computational cost of UNETR and SwinUNETR underscores the trade-off between model complexity and resource feasibility, highlighting UNet's practicality in our study, where computational resources were limited [46]. Despite the fact that, whenever feasible, the same hyperparameter values were utilized for shared characteristics across the four methods, the mentioned values were finalized after multiple trials with images from all available datasets (thus ensuring optimal performance) [47]. Therefore, it can be assumed that different (larger) datasets would alter the hyperparameter values, significantly affecting the overall performance of each model [48]. Similarly, the number of epochs utilized for each method is directly associated with the processing time, which increases as more epochs are employed [49]. The implemented UNet architecture appears to strike a balance between computational efficiency and segmentation accuracy, offering competitive performance without significantly increasing processing time, even as the number of epochs rises [50]. This makes it particularly well-suited for clinical integration applications where both speed and precision are essential [51]. In fact, the robustness of deep learning architectures, particularly UNet, in generalizing from healthy controls to aneurysmal cases highlights the potential of deep learning-based segmentation to improve clinical decision-making, streamline workflows, and enhance patient outcomes, providing a foundation for integrating automated tools into clinical workflows [52]. By significantly reducing the time and variability associated with manual segmentation, deep learning's computational efficiency and high accuracy make it suitable for real-time imaging applications, such as intraoperative 3D imaging or portable diagnostic systems [53]. Automated segmentation can also enhance diagnostic consistency, support personalized treatment planning by providing reliable measurements of aortic morphology, and assist in determining patient suitability for interventions like stent graft placement [54].
Regarding the qualitative evaluation, all deep learning architectures performed adequately, with UNet being visibly more accurate than the other three (Figure 5 and Figure 6). Particularly when observing the results of the 3D reconstruction, UNet effectively balances the risks of over-segmentation and under-segmentation (Figure 7). Nevertheless, all deep learning models had difficulty detecting the very small vessels of the aorta. This relates to the fact that smaller vessels often exhibit under-segmentation due to their reduced surface area, which poses a challenge for the segmentation models [55]. In these cases, the model tends to prioritize the larger, more prominent lumen, while failing to accurately capture the finer details of smaller arteries. This issue is particularly evident in tasks involving complex vascular structures, where smaller branches are more prone to being missed by the model [56]. To address this limitation, we utilized the Dice loss function, which is effective in measuring the overlap between the predicted segmentation and the ground truth [31]. However, due to the imbalanced nature of vessel sizes, relying solely on Dice loss proved insufficient for capturing smaller arteries. To mitigate this under-segmentation problem, we incorporated a cross-entropy term in the loss function, which provided a better balance by focusing on pixel-wise classification accuracy. This combined approach helped improve the model's ability to segment both larger and smaller vessels, resulting in a more accurate representation of the vascular structure. In future work, we plan to expand this study by incorporating larger cohorts and exploring additional models to investigate whether the integration of cross-entropy with the Dice loss function can provide an effective solution for reducing the under-segmentation of smaller vessels (encouraging the model to assign more weight to these smaller structures while maintaining the segmentation accuracy of larger vessels).

5. Conclusions

The present study conducted a comprehensive comparison of four widely utilized deep learning architectures (UNet, SegResNet, UNETR, and SwinUNETR) for the automatic segmentation of the abdominal aorta using 3D CT data. By employing two distinct datasets (one comprising healthy individuals and another including AAA patients) we provided a comprehensive analysis of model performance. Training and testing were conducted on the healthy dataset, with additional validation on the AAA dataset to assess the models’ generalizability. The results demonstrated that deep learning architectures are capable of accurately delineating the abdominal aorta, even when trained exclusively on healthy data. Among the models evaluated, UNet consistently outperformed the others, with an average DSC of 0.89 ± 0.05 on the public dataset and 0.89 ± 0.07 on the private dataset. Statistical analysis revealed significant differences between UNet and transformer-based models UNETR (p = 5.01 × 10−5) and SwinUNETR (p = 0.0318). In contrast, SegResNet achieved the highest precision (0.92 ± 0.04), but its relatively lower recall (0.75 ± 0.18) highlighted a tendency for under-segmentation, particularly in smaller vessels. Despite the absence of AAA-specific training data, all models demonstrated a high level of generalization to aneurysmal cases. Qualitative evaluations further underscored the superior performance of UNet, which effectively balanced over- and under-segmentation, producing smoother and more accurate 3D reconstructions of the aorta. However, smaller vascular structures often posed challenges for all models, with under-segmentation attributed to their reduced surface area. To address this issue, a hybrid loss function combining Dice loss and cross-entropy was implemented, resulting in improved segmentation accuracy for these smaller structures without compromising the accuracy of larger vessels. In addition to its superior performance, UNet also exhibited computational efficiency, with a parameter count significantly lower than transformer-based models such as UNETR and SwinUNETR. This efficiency, combined with its accuracy, underscores UNet’s practicality for real-time clinical applications where computational resources may be limited.

Author Contributions

Conceptualization, T.P.V. and G.K.M.; methodology, C.M., T.L.E. and I.K.; software, T.P.V., T.L.E. and I.V.; validation, I.V., O.P. and I.K.; formal analysis, C.M. and T.P.V.; investigation, I.V. and G.K.M.; resources, T.L.E. and G.K.M.; data curation, O.P. and I.K.; writing—original draft preparation, C.M., T.P.V. and T.L.E.; writing—review and editing, C.M., I.K. and O.P.; visualization, T.L.E. and I.K.; supervision, G.K.M.; project administration, O.P. and G.K.M.; funding acquisition, G.K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was implemented in the framework of the action “Flagship actions in interdisciplinary scientific fields with a special focus on the productive fabric”, which is implemented through the National Recovery and Resilience Fund Greece 2.0 and funded by the European Union—NextGenerationEU (Project ID: TAEDR-0535983).

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the anonymization of the final data used by the proposed algorithm.

Informed Consent Statement

Patient consent was waived as all patient data were analyzed retrospectively after being anonymized.

Data Availability Statement

The public datasets used in this study were provided from three different collections, the KiTS19 Grand Challenge, the Rider Lung CT dataset, and cases from the Dongyang Hospital. For more details of all three public datasets, see https://doi.org/10.1016/j.dib.2022.107801 (accessed on 11 June 2024) and https://doi.org/10.6084/m9.figshare.14806362.v1 (accessed on 11 June 2024). Additionally, the private dataset was acquired from the Department of Vascular Surgery, General University Hospital “Attikon”, Athens, Greece, and is available by contacting the corresponding author.

Acknowledgments

The authors would like to thank Tassos Raptis, Christos Manopoulos, and Ioannis D. Kakisis and his team at General University Hospital “Attikon”, Athens, Greece, for supplying the required 3D datasets to conduct this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rodella, L.F.; Rezzani, R.; Bonomini, F.; Peroni, M.; Cocchi, M.A.; Hirtler, L.; Bonardelli, S. Abdominal Aortic Aneurysm and Histological, Clinical, Radiological Correlation. Acta Histochem. 2016, 118, 256–262. [Google Scholar] [CrossRef]
  2. Shaw, P.M.; Loree, J.; Gibbons, R.C. Abdominal Aortic Aneurysm. In StatPearls; StatPearls Publishing: St. Petersburg, FL, USA, 2024. [Google Scholar]
  3. Sethi, A.; Taylor, D.L.; Ruby, J.G.; Venkataraman, J.; Sorokin, E.; Cule, M.; Melamud, E. Calcification of the Abdominal Aorta Is an Under-Appreciated Cardiovascular Disease Risk Factor in the General Population. Front. Cardiovasc. Med. 2022, 9, 1003246. [Google Scholar] [CrossRef]
  4. Gameraddin, M. Normal Abdominal Aorta Diameter on Abdominal Sonography in Healthy Asymptomatic Adults: Impact of Age and Gender. J. Radiat. Res. Appl. Sci. 2019, 12, 186–191. [Google Scholar] [CrossRef]
  5. Tran, C.T.; Wu, C.Y.; Bordes, S.J.; Lui, F. Anatomy, Abdomen and Pelvis: Abdominal Aorta. In StatPearls; StatPearls Publishing: St. Petersburg, FL, USA, 2024. [Google Scholar]
  6. Zhou, L.; Fan, M.; Hansen, C.; Johnson, C.R.; Weiskopf, D. A Review of Three-Dimensional Medical Image Visualization. Health Data Sci. 2022, 2022, 9840519. [Google Scholar] [CrossRef] [PubMed]
  7. Ma, Y.; Ding, P.; Li, L.; Liu, Y.; Jin, P.; Tang, J.; Yang, J. Three-Dimensional Printing for Heart Diseases: Clinical Application Review. Bio-Des. Manuf. 2021, 4, 675–687. [Google Scholar] [CrossRef]
  8. Shashi, P.; Suchithra, R. Review Study on Digital Image Processing and Segmentation. Am. J. Comput. Sci. Technol. 2019, 2, 68–72. [Google Scholar] [CrossRef]
  9. Fantazzini, A.; Esposito, M.; Finotello, A.; Auricchio, F.; Pane, B.; Basso, C.; Spinella, G.; Conti, M. 3D Automatic Segmentation of Aortic Computed Tomography Angiography Combining Multi-View 2D Convolutional Neural Networks. Cardiovasc. Eng. Technol. 2020, 11, 576–586. [Google Scholar] [CrossRef] [PubMed]
  10. Mavridis, C.; Economopoulos, T.L.; Benetos, G.; Matsopoulos, G.K. Aorta Segmentation in 3D CT Images by Combining Image Processing and Machine Learning Techniques. Cardiovasc. Eng. Technol. 2024, 15, 359–373. [Google Scholar] [CrossRef]
  11. Abdolmanafi, A.; Forneris, A.; Moore, R.D.; Di Martino, E.S. Deep-Learning Method for Fully Automatic Segmentation of the Abdominal Aortic Aneurysm from Computed Tomography Imaging. Front. Cardiovasc. Med. 2023, 9, 1040053. [Google Scholar] [CrossRef]
  12. Vezakis, A.; Vezakis, I.; Vagenas, T.P.; Kakkos, I.; Matsopoulos, G.K. A Multidimensional Framework Incorporating 2D U-Net and 3D Attention U-Net for the Segmentation of Organs from 3D Fluorodeoxyglucose-Positron Emission Tomography Images. Electronics 2024, 13, 3526. [Google Scholar] [CrossRef]
  13. Lyu, T.; Yang, G.; Zhao, X.; Shu, H.; Luo, L.; Chen, D.; Xiong, J.; Yang, J.; Li, S.; Coatrieux, J.-L.; et al. Dissected Aorta Segmentation Using Convolutional Neural Networks. Comput. Methods Programs Biomed. 2021, 211, 106417. [Google Scholar] [CrossRef]
  14. Chang, V.; Bhavani, V.R.; Xu, A.Q.; Hossain, M. An Artificial Intelligence Model for Heart Disease Detection Using Machine Learning Algorithms. Healthc. Anal. 2022, 2, 100016. [Google Scholar] [CrossRef]
  15. Izadikhah, M. A Fuzzy Stochastic Slacks-Based Data Envelopment Analysis Model with Application to Healthcare Efficiency. Healthc. Anal. 2022, 2, 100038. [Google Scholar] [CrossRef]
  16. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional Neural Networks: An Overview and Application in Radiology. Insights Imaging 2018, 9, 611–629. [Google Scholar] [CrossRef] [PubMed]
  17. López-Linares, K.; Aranjuelo, N.; Kabongo, L.; Maclair, G.; Lete, N.; Ceresa, M.; García-Familiar, A.; Macía, I.; González Ballester, M.A. Fully Automatic Detection and Segmentation of Abdominal Aortic Thrombus in Post-Operative CTA Images Using Deep Convolutional Neural Networks. Med. Image Anal. 2018, 46, 202–214. [Google Scholar] [CrossRef] [PubMed]
  18. Lareyre, F.; Adam, C.; Carrier, M.; Dommerc, C.; Mialhe, C.; Raffort, J. A Fully Automated Pipeline for Mining Abdominal Aortic Aneurysm Using Image Segmentation. Sci. Rep. 2019, 9, 13750. [Google Scholar] [CrossRef]
  19. Kalla, M.-P.; Vagenas, T.P.; Economopoulos, T.L.; Matsopoulos, G.K. Deep Learning-Based Registration of Two-Dimensional Dental Images with Edge Specific Loss. J. Med. Imaging 2023, 10, 034002. [Google Scholar] [CrossRef]
  20. Shamshad, F.; Khan, S.; Zamir, S.W.; Khan, M.H.; Hayat, M.; Khan, F.S.; Fu, H. Transformers in Medical Imaging: A Survey. Med. Image Anal. 2023, 88, 102802. [Google Scholar] [CrossRef]
  21. Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Crimi, A., Bakas, S., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 272–284. [Google Scholar]
  22. Yang, T.; Zhu, G.; Cai, L.; Yeo, J.H.; Mao, Y.; Yang, J. A Benchmark Study of Convolutional Neural Networks in Fully Automatic Segmentation of Aortic Root. Front. Bioeng. Biotechnol. 2023, 11, 1171868. [Google Scholar] [CrossRef]
  23. Kim, T.; On, S.; Gwon, J.G.; Kim, N. Computed Tomography-Based Automated Measurement of Abdominal Aortic Aneurysm Using Semantic Segmentation with Active Learning. Sci. Rep. 2024, 14, 8924. [Google Scholar] [CrossRef]
  24. Camara, J.R.; Tomihama, R.T.; Pop, A.; Shedd, M.P.; Dobrowski, B.S.; Knox, C.J.; Abou-Zamzam, A.M.; Kiang, S.C. Development of a Convolutional Neural Network to Detect Abdominal Aortic Aneurysms. J. Vasc. Surg. Cases Innov. Tech. 2022, 8, 305–311. [Google Scholar] [CrossRef] [PubMed]
  25. Cao, L.; Shi, R.; Ge, Y.; Xing, L.; Zuo, P.; Jia, Y.; Liu, J.; He, Y.; Wang, X.; Luan, S.; et al. Fully Automatic Segmentation of Type B Aortic Dissection from CTA Images Enabled by Deep Learning. Eur. J. Radiol. 2019, 121, 108713. [Google Scholar] [CrossRef] [PubMed]
  26. Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding Deep Learning (Still) Requires Rethinking Generalization. Commun. ACM 2021, 64, 107–115. [Google Scholar] [CrossRef]
  27. Radl, L.; Jin, Y.; Pepe, A.; Li, J.; Gsaxner, C.; Zhao, F.; Egger, J. AVT: Multicenter Aortic Vessel Tree CTA Dataset Collection with Ground Truth Segmentation Masks. Data Brief 2022, 40, 107801. [Google Scholar] [CrossRef]
  28. Radl, L.; Jin, Y.; Pepe, A.; Li, J.; Gsaxner, C.; Zhao, F.; Egger, J. Aortic Vessel Tree (AVT) CTA Datasets and Segmentations. Figshare Dataset 2022. [Google Scholar] [CrossRef]
  29. Peyrin, F.; Engelke, K. CT Imaging: Basics and New Trends. In Handbook of Particle Detection and Imaging; Grupen, C., Buvat, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 883–915. ISBN 978-3-642-13271-1. [Google Scholar]
  30. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
Figure 1. The architecture of UNet.
Figure 2. The architecture of UNETR.
Figure 3. The architecture of SwinUNETR.
Figure 4. The architecture of SegResNet.
Figure 5. Detected regions of interest (aorta) superimposed on the original imaging data slices for two clinical cases (rows 1 and 2) of the public dataset for (a) the initial image; (b) ground truth; (c) UNet model; (d) UNETR model; (e) SwinUNETR model; and (f) SegResNet.
Figure 6. Detected regions of interest (aorta) superimposed on the original imaging data slices for two clinical cases (rows 1 and 2) of the private dataset for (a) the initial image; (b) ground truth; (c) UNet model; (d) UNETR model; (e) SwinUNETR model; and (f) SegResNet.
Figure 7. Three-dimensional fused models of the estimated aorta (blue) superimposed on the ground truth (coral) for three cases, using (a) UNet; (b) UNETR; (c) SwinUNETR; and (d) SegResNet.
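Surface models such as those in Figure 7 can be obtained from the binary segmentation masks with the marching cubes algorithm, followed by mesh smoothing. The snippet below is a minimal sketch using the scikit-image implementation of marching cubes; the mask array, voxel spacing, and the mask_to_surface helper are illustrative placeholders, not values or code from the study.

```python
import numpy as np
from skimage import measure  # provides the marching cubes implementation

def mask_to_surface(mask: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Extract a triangle surface mesh from a binary 3D mask.

    mask: 3D array with 1 inside the aorta, 0 elsewhere.
    spacing: voxel size in mm along each axis (placeholder values).
    """
    # Marching cubes at the 0.5 iso-level separates foreground from background.
    verts, faces, normals, _ = measure.marching_cubes(
        mask.astype(np.float32), level=0.5, spacing=spacing
    )
    return verts, faces

# Hypothetical usage on a synthetic mask, standing in for a predicted aorta.
pred_mask = np.zeros((64, 64, 64))
pred_mask[20:40, 20:40, 10:50] = 1
verts, faces = mask_to_surface(pred_mask, spacing=(0.8, 0.8, 1.0))
print(f"{len(verts)} vertices, {len(faces)} triangles")
```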
Table 1. Architectural hyperparameters.

Hyperparameters                | UNet                   | UNETR             | SwinUNETR               | SegResNet
Features per layer             | (16, 32, 64, 128, 256) | (16, 32, 64, 128) | (48, 96, 192, 384, 768) | (8, 16, 32, 64)
Number of residual connections | 2                      | -                 | -                       | 8
Conv kernel size               | 3                      | 3                 | 3                       | 3
Normalization                  | Batch                  | Instance          | Instance                | Group
Dropout rate                   | 0.1                    | 0.01              | 0.01                    | 0.35
Learning rate                  | 0.0001                 | 0.0001            | 0.0001                  | 0.0001
Optimizer                      | Adam                   | Adam              | Adam                    | Adam
Feature size                   | 0                      | 16                | 24                      | -
Hidden size                    | -                      | 768               | -                       | -
Number of heads                | -                      | 12                | (3, 6, 12, 24)          | -
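As a concrete reading of Table 1, the UNet column maps directly onto the constructor of a 3D UNet such as the one provided by the open-source MONAI framework. The sketch below is illustrative only, assuming single-channel CT input, a binary (aorta/background) output, and per-level stride-2 downsampling, none of which are restated in the table; it is not the exact training code of the study.

```python
import torch
from monai.networks.nets import UNet

# 3D UNet configured with the Table 1 hyperparameters (UNet column).
model = UNet(
    spatial_dims=3,                   # volumetric CT input
    in_channels=1,                    # assumption: single-channel CT
    out_channels=2,                   # assumption: aorta vs. background
    channels=(16, 32, 64, 128, 256),  # features per layer
    strides=(2, 2, 2, 2),             # assumption: downsample by 2 per level
    kernel_size=3,                    # conv kernel size
    num_res_units=2,                  # number of residual connections
    norm="batch",                     # batch normalization
    dropout=0.1,                      # dropout rate
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer and learning rate
```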
Table 2. Number of total parameters for each model.

Model     | Parameters
UNet      | 4,808,917
UNETR     | 92,667,106
SwinUNETR | 62,186,708
SegResNet | 1,186,994
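Totals like those in Table 2 can be checked with a standard PyTorch one-liner over a model's parameters; the helper below is a generic sketch, not the authors' script.

```python
def count_parameters(model) -> int:
    """Total number of trainable parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g., count_parameters(model) for a UNet configured as above should be
# on the order of the ~4.8 M parameters reported in Table 2.
```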
Table 3. Performance of the different deep learning models on the public dataset.

Model     | DSC *       | Recall *    | Precision * | ASSD *
UNet      | 0.89 ± 0.05 | 0.90 ± 0.06 | 0.89 ± 0.05 | 0.08 ± 0.04
UNETR     | 0.75 ± 0.16 | 0.77 ± 0.12 | 0.74 ± 0.20 | 0.10 ± 0.06
SwinUNETR | 0.88 ± 0.08 | 0.87 ± 0.08 | 0.90 ± 0.09 | 0.08 ± 0.04
SegResNet | 0.88 ± 0.08 | 0.85 ± 0.10 | 0.91 ± 0.07 | 0.08 ± 0.04

* Each metric is reported as mean ± standard deviation.
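For reference, the overlap metrics in Tables 3 and 4 follow their standard voxel-wise definitions: DSC = 2·TP/(2·TP + FP + FN), recall = TP/(TP + FN), and precision = TP/(TP + FP). A minimal NumPy sketch, assuming binary prediction and ground-truth volumes of the same shape (ASSD, a surface-distance metric, is omitted here for brevity):

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray):
    """Voxel-wise Dice, recall, and precision for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # true positives
    fp = np.logical_and(pred, ~gt).sum()  # false positives
    fn = np.logical_and(~pred, gt).sum()  # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return dsc, recall, precision
```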
Table 4. Performance of the different deep learning models on the private dataset.

Model     | DSC *       | Recall *    | Precision * | ASSD *
UNet      | 0.89 ± 0.07 | 0.89 ± 0.10 | 0.89 ± 0.05 | 0.04 ± 0.02
UNETR     | 0.80 ± 0.13 | 0.84 ± 0.11 | 0.77 ± 0.17 | 0.05 ± 0.02
SwinUNETR | 0.85 ± 0.09 | 0.86 ± 0.12 | 0.85 ± 0.08 | 0.05 ± 0.02
SegResNet | 0.81 ± 0.13 | 0.75 ± 0.18 | 0.92 ± 0.04 | 0.06 ± 0.02

* Each metric is reported as mean ± standard deviation.
Table 5. Statistical evaluation of the segmentation performance (pairwise p-values).

Models    | UNet        | UNETR       | SwinUNETR   | SegResNet
UNet      | -           | 5.01 × 10−5 | 0.0318      | 0.0566
UNETR     | 5.01 × 10−5 | -           | 4.5 × 10−5  | 0.0271
SwinUNETR | 0.0318      | 4.5 × 10−5  | -           | 0.3881
SegResNet | 0.0566      | 0.0271      | 0.3881      | -
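Pairwise p-values such as those in Table 5 can be obtained by comparing the per-case scores of two models with a paired test. The sketch below uses the Wilcoxon signed-rank test from SciPy as one plausible choice; both the choice of test and the per-case DSC values shown are illustrative assumptions, not the study's data or its stated methodology.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-case DSC values for two models on the same test cases.
dsc_unet = np.array([0.91, 0.88, 0.93, 0.85, 0.90])
dsc_unetr = np.array([0.78, 0.74, 0.81, 0.69, 0.76])

# Paired nonparametric test on the per-case differences.
stat, p_value = wilcoxon(dsc_unet, dsc_unetr)
print(f"Wilcoxon p-value: {p_value:.4g}")
```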