Abstract
Automatic computerized segmentation of the fetal head from ultrasound images and head circumference (HC) biometric measurement are still challenging, due to the inherent characteristics of fetal ultrasound images at different trimesters of pregnancy. In this paper, we proposed a new deep learning method for automatic fetal ultrasound image segmentation and HC biometry: deeply supervised attention-gated (DAG) V-Net, which incorporated the attention mechanism and a deep supervision strategy into V-Net models. In addition, a multi-scale loss function was introduced for deep supervision. The training set of the HC18 Challenge was expanded with data augmentation to train the DAG V-Net deep learning models. The trained models were used to automatically segment the fetal head from two-dimensional ultrasound images, followed by morphological processing, edge detection, and ellipse fitting. The fitted ellipses were then used for HC biometric measurement. The proposed DAG V-Net method was evaluated on the testing set of HC18 (n = 355), in terms of four performance indices: Dice similarity coefficient (DSC), Hausdorff distance (HD), HC difference (DF), and HC absolute difference (AD). Experimental results showed that DAG V-Net had a DSC of 97.93%, a DF of 0.09 ± 2.45 mm, an AD of 1.77 ± 1.69 mm, and an HD of 1.29 ± 0.79 mm. The proposed DAG V-Net method ranks fifth among the participants in the HC18 Challenge. By incorporating the attention mechanism and deep supervision, the proposed method yielded better segmentation performance than conventional U-Net and V-Net methods. Compared with published state-of-the-art methods, the proposed DAG V-Net had better or comparable segmentation performance. The proposed DAG V-Net may be used as a new method for fetal ultrasound image segmentation and HC biometry. The code of DAG V-Net will be made publicly available at https://github.com/xiaojinmao-code/.
Keywords: Fetal ultrasound image segmentation, Head circumference, Deep learning, Attention mechanism, Deep supervision
Introduction
Ultrasound imaging has been widely used in prenatal care for pregnant women at different trimesters of pregnancy [1]. Fetal examination mainly monitors fetal development by measuring biometric parameters, such as fetal abdominal circumference (AC), femur length (FL), crown-rump length (CRL), biparietal diameter (BPD), and head circumference (HC) [2]. Fetal HC biometric measurement is an important item of fetal examination, which is generally performed in a specific cross section of the fetal head (called the standard plane). By measuring the fetal HC, obstetricians and gynecologists can estimate the gestational age and fetal weight at the 13th to 25th week of pregnancy, evaluate the development of the fetus, and determine the delivery mode for pregnant women [3]. At present, the fetal HC is generally measured manually by radiologists with ellipse fitting, which is time-consuming and tedious, and may cause inter- and intra-operator differences. Therefore, there is a need for developing computerized automatic fetal HC biometric measurement methods.
Most existing computerized fetal HC measurement methods are based on the assumption that the fetal head contour is approximately elliptical. The detection of the fetal head contour is a key step for measuring HC, but due to the defects of fetal ultrasound imaging, as shown in Fig. 1, there are difficulties in automatic segmentation of the fetal head from ultrasound images, such as artifacts, attenuation, speckle noise, and low signal-to-noise ratio [4]. These difficulties render fetal ultrasound image segmentation quite challenging, in terms of blurred head boundaries, interruption of the fetal skull by normal sutures or ultrasound artifacts, and interference from structures similar to the fetal head in texture, such as large interfaces of the uterine wall and amniotic fluid.
Traditional methods have been used for fetal head segmentation from ultrasound images, including the difference of Gaussians [4], deformable models [5], texture maps [6], multi-level thresholding [7], and morphological operators [8]. Machine-learning methods have also been used. Lu et al. [9] proposed an iterative random Hough transform method for the detection of incomplete ellipses in images with strong noise, but their method may not detect the fetal head in low-contrast ultrasound images. Zhang et al. [10] designed multi-scale and multi-directional filter banks to extract fetal anatomical structure and texture features. Li et al. [11] used prior knowledge of the fetal HC to obtain the region of interest with random forest and detected the fetal head edge with phase symmetry, but their method may perform poorly when fitting partially missing fetal skulls in ultrasound images of late pregnancy. Van den Heuvel et al. [12] provided the HC18 fetal HC biometric measurement database and used random forest with Haar features and the Hough transform to extract the fetal HC.
In recent years, with the development of deep learning technology, the combination of medical imaging and artificial intelligence has become a hot research direction. With the emergence of convolutional neural networks (CNNs), their application in medical image segmentation is booming. Fully convolutional networks (FCNs) [13], U-Net [14], and the three-dimensional V-Net [15] are some of the representative CNN architectures. Jang et al. [16] proposed a CNN method to identify the abdominal region and applied the Hough transform to measure the fetal AC. Wu et al. [17] used a cascaded FCN for the segmentation of the fetal head and abdomen in ultrasound images in combination with context information, but the model repeatedly extracts similar low-level features in the cascading process, leading to excessive and redundant use of computing resources and model parameters. Al-Bander et al. [18] proposed a fetal head boundary detection method based on the combination of fast R-CNN and FCN, combining target localization and segmentation. Sobhaninia et al. [1] proposed a lightweight multi-scale CNN with few parameters and a short training time. However, the fetal head segmentation and HC biometry performance of current deep learning–based methods still needs to be improved, as automatic segmentation is challenging due to the inherent characteristics of fetal ultrasound images at different trimesters of pregnancy (Fig. 1).
In this paper, we proposed a deeply supervised attention-gated (DAG) V-Net method for automatic fetal head segmentation and HC biometric measurement from two-dimensional ultrasound images. We improved the original V-Net model by incorporating attention gates (AGs) and deep supervision. In addition, we introduced multi-scale loss function for deep supervision. Experimental results on the testing set of HC18 showed that the incorporation of the attention mechanism and deeply supervised strategy by the proposed DAG V-Net method improved the segmentation accuracy while accelerating the convergence speed.
In the following sections, the methodology of DAG V-Net and HC measurement is described, followed by experiments and results. Then, fetal head segmentation and HC biometry at different trimesters of pregnancy are discussed, and conclusions are drawn.
Methodology
Figure 2 shows the flow chart of automatic fetal ultrasound segmentation and HC biometry by the DAG V-Net method proposed in this paper. Firstly, we expanded the training set of fetal ultrasound images by data augmentation. Then, we used the augmented set to train a DAG V-Net model. For an input image from the testing set, the output of the model was usually not a standard ellipse, so we further used morphological processing to eliminate small circular structures and noise in the model output. Subsequently, the morphologically processed images were edge-detected, and the least-squares method was used to fit an ellipse to the head according to the detected edge (boundary). Finally, we calculated the final HC with the pixel resolution of each ultrasound image and determined the position of the fetal head according to the ellipse center point, rotation angle, etc. The specific details of the proposed method are described below.
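A minimal sketch of this pipeline in Python is given below, assuming a trained single-output Keras model; the helpers `fit_head_ellipse` and `ellipse_hc_mm` are hypothetical names for the post-processing steps sketched in the Ellipse Fitting section:

```python
import numpy as np

def measure_hc(image: np.ndarray, pixel_size_mm: float, model):
    """Hypothetical end-to-end sketch: segment, post-process, fit, and measure."""
    prob = model.predict(image[None, ..., None])[0, ..., 0]  # foreground probability map
    mask = (prob > 0.5).astype(np.uint8)                     # binary fetal-head mask
    ellipse = fit_head_ellipse(mask)   # morphology + edge detection + least-squares fit
    hc_mm = ellipse_hc_mm(ellipse, pixel_size_mm)            # perimeter in millimeters
    (cx, cy), _, angle = ellipse       # head position (center) and rotation angle
    return hc_mm, (cx, cy), angle
```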
Deeply Supervised V-Net
To design an end-to-end segmentation method for fetal ultrasound images, we were inspired by the V-Net architecture [15]. The V-Net model is an excellent three-dimensional CNN, which is commonly used in biomedical image processing. The high-resolution features of the encoder in V-Net can provide accurate localization of the target. The skip connection structure connects the high-resolution features with the output features sampled on the decoder, making the final prediction results more accurate; in particular, V-Net can overcome the serious imbalance between the number of foreground pixels and background pixels [15]. On this basis, we improved the architecture, dimensions, and details of V-Net, by incorporating attention modules and deeply supervised strategy [19] with hybrid loss function. Figure 3 shows the architecture of the proposed DAG V-Net, consisting of an encoder on the left and a decoder on the right.
Encoder
In each layer of the encoder of DAG V-Net, we extracted the corresponding low-resolution features through 1–3 convolutional blocks. The architecture of each convolutional block is shown in Fig. 4. We used a 3 × 3 convolution kernel. We chose group normalization (GN) [20] instead of batch normalization (BN) [21], because the accuracy of GN is quite stable over a large range of batch sizes, which is more suitable for the data calculation of the network, accelerating training and improving model accuracy. The GN was applied in the channel direction, with 16 groups. The GN was followed by a rectified linear unit (ReLU) activation layer to further improve the speed of convergence. Finally, a dropout layer (dropout ratio 0.5) was set to randomly deactivate part of the weights, which can effectively prevent overfitting and improve the generalization ability of the model. At the end of each stage, the following two points should be noted.
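A minimal Keras sketch of one such convolutional block, assuming the GroupNormalization layer available in TensorFlow/Keras 2.11 and later (earlier setups used tensorflow_addons):

```python
from tensorflow.keras import layers

def conv_block(x, filters):
    """One convolutional block of Fig. 4: 3 x 3 convolution, GN (16 groups), ReLU, dropout."""
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    x = layers.GroupNormalization(groups=16)(x)  # stable over a large range of batch sizes
    x = layers.ReLU()(x)
    return layers.Dropout(0.5)(x)                # mitigates overfitting
```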
Firstly, the residual mechanism between convolutions was used. The purpose was to add the input of each stage to the output of the last convolution layer of that stage, so as to learn the residual function. Our experiments showed that this mechanism makes the residual information semantically meaningful in both the shallow and deep layers of the network. Compared with a network without the residual function, the architecture with the residual mechanism takes less time to converge. It can learn features at different levels of abstraction, from edges (shallow layers) to very complex features (deep layers), so we used the residual mechanism throughout the whole model.
Secondly, pooling layers were replaced with convolution layers. Conventional CNNs use a large number of pooling layers for downsampling. We considered that the extensive use of pooling layers may lose information needed when decoding (up-sampling) for resolution reconstruction. For semantic segmentation of fetal ultrasound images, conventional CNNs such as U-Net may thus face a bottleneck in segmentation accuracy. Therefore, we replaced pooling layers with convolution layers for downsampling, at the expense of a small amount of extra computation. The convolution stride was set at 2 [22].
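Combining the two points above, a sketch of one encoder stage, reusing `conv_block` from the previous sketch (the channel-matching 1 × 1 convolution on the shortcut is an assumption):

```python
from tensorflow.keras import layers

def encoder_stage(x, filters, n_blocks):
    """One encoder stage: 1-3 conv blocks, a residual addition, and strided downsampling."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channels for the residual
    for _ in range(n_blocks):
        x = conv_block(x, filters)
    x = layers.Add()([shortcut, x])                          # learn the residual function
    down = layers.Conv2D(filters, 3, strides=2, padding="same")(x)  # stride-2 conv, no pooling [22]
    return x, down  # x is kept for the attention-gated skip connection
```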
Decoder
In each layer of the decoder of DAG V-Net, we utilized residual connections similar to those in the encoder and then used deconvolution to increase the input size and image resolution. Each layer used 1–3 convolution blocks to extract corresponding features from the data, so as to collect deep image feature information. The difference was that we added attention modules and implemented the deep supervision strategy to further guide network training. The following two points should be noted.
Firstly, the skip connection was replaced with the attention module. Similar to conventional V-Net models, we retained the operation of extracting rich shallow resolution feature information from the left side of CNNs and merging it into the right side. The difference was that we used multiple attention module connections instead of the skip connection in the conventional V-Net architecture. The attention guidance module filtered the low-resolution and high-resolution feature maps, outputted the feature map with attention guidance (also known as the corrected feature map), and then fused the features outputted by attention modules with the deconvolution layer of previous stages. As a result, the spatial information was recovered, and the structure information was fused at different resolution levels. Thus, we could collect the fine-grained details lost in the encoder and improve the quality of final contour prediction through guidance in network training. We also observed that the use of these connections accelerated the convergence of the model.
Secondly, the deeply supervised strategy was used. The number of training images available in this work was limited, which is a common problem in the field of medical image processing. In order to overcome the difficulty of training deep neural networks with limited training data, we incorporated the deeply supervised strategy [23] into the network (indicated by the yellow region enclosed by dotted lines in Fig. 3), so as to improve the segmentation sensitivity by using multi-level features. After learning through the residual blocks, the five deeply supervised layers were up-sampled by factors of 16, 8, 4, 2, and 1, respectively, so that the outputs of different layers matched the input size. Then, a 1 × 1 convolution kernel was used to map them to the same channel dimension. Probability segmentation of foreground and background regions was achieved by the sigmoid activation function (1 and 0 represented fetal-head and non-fetal-head regions, respectively).
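A sketch of the deep supervision branch, assuming `decoder_feats` holds the five decoder feature maps ordered from coarsest (1/16 resolution) to finest (full resolution):

```python
from tensorflow.keras import layers

def deep_supervision_heads(decoder_feats):
    """Upsample each decoder scale to the input size and map it to a probability map."""
    heads = []
    for feat, scale in zip(decoder_feats, (16, 8, 4, 2, 1)):
        if scale > 1:
            feat = layers.UpSampling2D(size=scale, interpolation="bilinear")(feat)
        # 1 x 1 convolution to one channel; sigmoid separates head from non-head pixels
        heads.append(layers.Conv2D(1, kernel_size=1, activation="sigmoid")(feat))
    return heads  # four auxiliary outputs plus the final output
```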
Hybrid Loss Function
Dice [15] was selected as the loss function for training. Dice is a common loss function and quality evaluation index in the field of image segmentation. The Dice loss and the Dice similarity coefficient (DSC) are defined as:

$$\mathrm{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{1}$$

$$\mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|} \tag{2}$$

where $X$ is the region manually annotated by radiologists as the gold standard, and $Y$ is the segmented region predicted by the network model.
Refer to Fig. 3. The output of our proposed network had segmentation results at five different scales. We calculated the Dice losses $\mathrm{Dice}_1$, $\mathrm{Dice}_2$, $\mathrm{Dice}_3$, and $\mathrm{Dice}_4$ from the first four scales and added them as $\mathrm{Dice}_s$. In order to control the influence of these four Dice losses on the final segmentation results, a weight factor $\lambda$ (with an initial value of 1) was multiplied to control their proportion:

$$\mathrm{Dice}_s = \lambda \sum_{i=1}^{4} \mathrm{Dice}_i \tag{3}$$

Our experiments showed that it was best to decay $\lambda$ by a factor of 0.9 for each epoch. As the training progressed, the weight of the low-level feature contribution decreased. It should be noted that the low-level features were not directly output, but were restored through the up-sampling process. Finally, the total loss, combining the Dice loss $\mathrm{Dice}_f$ calculated by the final mixed output layer with $\mathrm{Dice}_s$, was defined as:

$$\mathrm{Dice}_{\mathrm{total}} = \mathrm{Dice}_f + \mathrm{Dice}_s \tag{4}$$
Note that we did not use the loss value as the index to evaluate the segmentation performance of our method, because the loss function of the proposed DAG V-Net was calculated as the sum of the losses at all five scales, which is totally different from the loss functions defined in other related networks.
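A minimal TensorFlow sketch of Eqs. (1), (3), and (4); the per-epoch decay of the weight $\lambda$ (`lam`) is assumed to be applied outside this function:

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss of Eq. (1), computed on probability maps."""
    inter = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def hybrid_loss(y_true, outputs, lam):
    """outputs: five probability maps; outputs[-1] is the final mixed output layer."""
    dice_s = lam * tf.add_n([dice_loss(y_true, o) for o in outputs[:-1]])  # Eq. (3)
    return dice_loss(y_true, outputs[-1]) + dice_s                         # Eq. (4)
```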
Attention-Gated Module
The attention mechanism was initially applied in the field of natural language processing [24] and is now widely used in a variety of tasks, including image segmentation. Rich spatial information is the key for semantic segmentation. U-Net has achieved good results in the field of medical image segmentation, but the skip connection module between the encoder and decoder of U-Net tends to use unnecessary information and computational resources, as similar low-level encoder features are repeatedly used at multiple scales [39], which may hinder further improvement of image detail segmentation. For fetal ultrasound image segmentation, the shallow feature map contains not only the details of the fetal head but also the non-fetal-head region. The deep feature map can capture highly semantic information describing the location, but it may lose the details of the head boundary. Hence, there is a need for a method to guide the network to carry out accurate localization learning.
In order to improve the accuracy of segmentation, some segmentation frameworks [25, 26] added object localization models to feature extraction, which undoubtedly complicated the overall model and consumed a lot of computing resources. Inspired by the work of Schlemper et al. [27] and Zhang et al. [28], we instead added attention-gated modules to the segmentation model for training (Fig. 5), which suppressed features irrelevant to the task while strengthening task-relevant features. In order to refine the features of each layer, we introduced the deep attention module. The proposed DAG V-Net used attention blocks instead of skip connection modules to capture low-level spatial and high-level context features. Through filtering of the low-resolution and high-resolution feature maps, spatial information was recovered and structure information was fused from different resolution levels, while the feature response of unrelated background regions was gradually suppressed. Attention blocks further extended the network performance by fusion with the deeply supervised V-Net to guide the network in learning the most significant features.
Attention modules were very important in our method, further optimizing the performance of the whole network. In particular, the attention block was used to highlight the foreground and reduce the influence of background pixels. This effectively addressed the problem that, when structures similar to the fetal head (such as the interface of the uterine wall and amniotic fluid) occupied a large area of the image, some regions outside the head were mistakenly classified as fetal head. Refer to Fig. 5. Let us take the first attention block as an example. Specifically, it consists of the following six steps.
Step 1. The attention block had two input gates: one was the gating signal g from the low-resolution layer, and the other was the input signal x from the high-resolution layer. Before being input to the attention module, the signal x underwent a channel transform to obtain the same number of feature channels as the signal g, yielding the feature map x'.
Step 2. Feature mapping of x' and g was conducted to bring them to a common size matching the input signal x. A linear transformation with a 1 × 1 convolution kernel was applied, with the stride for the gating signal g set at 2. The feature map g' was then obtained.
Step 3. A residual connection between the feature maps x' and g' was conducted, which is called attention based on vector connection [29], where the connected features were linearly mapped to a latent space. Then, the activated feature map $f$ was obtained by applying ReLU to the connected features:

$$f = \mathrm{ReLU}\big(W_{x}^{T} x' + W_{g}^{T} g' + b\big) \tag{5}$$

where $W_x$ and $W_g$ are convolution weights, and $b$ is a bias term.
Step 4. The feature map $f$ was convolved with a 1 × 1 convolution kernel to reduce the depth of the combined feature maps to 1. Then, the result was normalized by the sigmoid function $\sigma$, rescaling each pixel value of the attention map to [0, 1] and yielding the coefficient $\alpha'$:

$$\alpha' = \sigma\big(\psi^{T} f + b_{\psi}\big) \tag{6}$$

where $\psi$ is a convolution weight and $b_{\psi}$ is a bias term. Next, $\alpha'$ was up-sampled by a factor of 2 to restore the original size, obtaining the attention coefficient $\alpha$.
Step 5. The attention coefficient $\alpha$ was multiplied element-wise with the gating signal g to recalibrate the relative activation of each feature. The channels were then mapped back to the original dimension by a 1 × 1 convolution kernel, followed by GN with 16 groups, further accelerating the training convergence, to obtain the attention-guided feature map $\hat{x}$:

$$\hat{x} = \mathrm{GN}\big(W_{o}^{T}(\alpha \odot g) + b_{o}\big) \tag{7}$$

where $W_o$ is a convolution weight and $b_o$ is a bias term.
Step 6. The last step was to connect the result of the previous steps with the original input signal x. Before that, the input signal needed to be deconvolved to improve the resolution to obtain the signal x″, where a 3 × 3 convolution kernel was used.
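A minimal Keras sketch of the attention gate, following the additive attention-gate formulation of Schlemper et al. [27]; the intermediate channel count is an assumption, and where the text is ambiguous about which signal the coefficient recalibrates, the sketch follows the common choice of gating the high-resolution input:

```python
from tensorflow.keras import layers

def attention_gate(x, g, inter_channels):
    """Additive attention gate: g (low-res gating signal) highlights relevant regions of x."""
    x_t = layers.Conv2D(inter_channels, 1, strides=2, padding="same")(x)  # x' at g's size
    g_t = layers.Conv2D(inter_channels, 1, padding="same")(g)             # g'
    f = layers.ReLU()(layers.Add()([x_t, g_t]))                           # Eq. (5)
    alpha = layers.Conv2D(1, 1, activation="sigmoid")(f)                  # Eq. (6)
    alpha = layers.UpSampling2D(size=2, interpolation="bilinear")(alpha)  # restore size (x2)
    x_att = layers.Multiply()([x, alpha])                 # recalibrate activations, Eq. (7)
    x_att = layers.Conv2D(x.shape[-1], 1, padding="same")(x_att)          # map channels back
    return layers.GroupNormalization(groups=16)(x_att)                    # GN, 16 groups
```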
It is worth noting that our attention modules are very simple and can be directly inserted into the pipeline of similar encoder structures. They do not add too many parameters and can encode abundant low-level spatial information at low computational cost.
Ellipse Fitting
In clinical diagnosis, when the fetal head is scanned, the measurement plane is the transthalamic plane (the same plane used for BPD measurement). There are two commonly used estimation methods for fetal HC, as shown in Fig. 6. Method 1 (Fig. 6a) is based on the occipitofrontal diameter (OFD) and BPD:
$$\mathrm{HC} = \frac{\pi}{2} \times (\mathrm{OFD} + \mathrm{BPD}) \tag{8}$$
Method 1 has few measurement parameters and is easy to calculate, and it is more accurate in the 12th to 28th week of pregnancy. However, in the early stage of pregnancy, there can be large deviations due to factors such as irregular shapes of gestational sacs and blurred skulls. Therefore, at present, the ellipse fitting method, i.e., method 2 (Fig. 6b), is an important method for clinical measurement of fetal HC. The fitting position and size were determined according to the semi-major axis (semi_axis_a), semi-minor axis (semi_axis_b), ellipse center point (center_x, center_y), and rotation angle (angle) of the fitted ellipse, as denoted in Fig. 6b. In method 2, HC was calculated from the perimeter of the fitted ellipse, approximated by Ramanujan's formula:

$$\mathrm{HC} = \pi \left[ 3(a + b) - \sqrt{(3a + b)(a + 3b)} \right] \tag{9}$$

where $a$ denotes semi_axis_a and $b$ denotes semi_axis_b.
In this paper, method 2 was used for HC biometric measurement.
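A minimal OpenCV sketch of this post-processing and measurement chain; the structuring-element size and the use of cv2.fitEllipse (a least-squares fit) are implementation assumptions:

```python
import cv2
import numpy as np

def fit_head_ellipse(mask: np.ndarray):
    """Morphological cleanup, boundary extraction, and least-squares ellipse fitting.

    mask: binary (0/1) uint8 segmentation mask.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    clean = cv2.morphologyEx(mask * 255, cv2.MORPH_OPEN, kernel)  # drop small structures/noise
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    head = max(contours, key=cv2.contourArea)                     # keep the head boundary
    return cv2.fitEllipse(head)  # ((center_x, center_y), (axis_1, axis_2), angle)

def ellipse_hc_mm(ellipse, pixel_size_mm: float) -> float:
    """HC from the fitted semi-axes via Ramanujan's perimeter approximation, Eq. (9)."""
    _, (d1, d2), _ = ellipse
    a, b = d1 / 2.0, d2 / 2.0  # semi_axis_a, semi_axis_b in pixels
    return np.pi * (3 * (a + b) - np.sqrt((3 * a + b) * (a + 3 * b))) * pixel_size_mm
```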
Experiments
Dataset
The dataset used in this work is the open-access HC18 Challenge dataset collected from the obstetric database of Radboud University Nijmegen Medical Center, the Netherlands, without any abnormal growth. The ultrasound images were acquired from 551 pregnant women who received a routine ultrasound screening exam between May 2014 and May 2015. Images were acquired by experienced sonographers using either the Voluson E8 or the Voluson 730 ultrasound device (General Electric, Austria) [12]. As shown in Table 1, there were 1354 two-dimensional ultrasound images collected from the 551 pregnant women at different trimesters of pregnancy: 999 images in the training set and 355 images in the testing set. The training set also contained HC annotations manually drawn by experienced radiologists, which were used as the gold standard for image segmentation; the testing set did not provide a segmentation gold standard. Each image was 800 × 540 pixels in size, and the pixel resolution ranged from 0.052 to 0.326 mm. This large variation in pixel resolution resulted from adjusting the ultrasound scanner (depth setting and zoom) to different fetal sizes. For the HC18 Challenge (https://hc18.grand-challenge.org/), the results of a participant's algorithm on the testing set must be uploaded to HC18 to obtain the final performance.
Table 1. Numbers of ultrasound images in the HC18 training and testing sets for different trimesters of pregnancy

| Trimesters of pregnancy | Training set | Testing set |
|---|---|---|
| First trimester | 165 | 55 |
| Second trimester | 693 | 233 |
| Third trimester | 141 | 47 |
| Total | 999 | 355 |
Augmentation
The number of images in the training set of HC18 (n = 999) is far from sufficient for training a deep network model. In order to improve network robustness, prevent overfitting of the training data, and improve generalization ability, we expanded the dataset by data augmentation (Fig. 2). In particular, during ultrasound fetal scanning, the size, shape, and location of fetuses differ significantly among individuals; data augmentation helps to introduce more of such diversity and to balance the samples. We used rotation and flip transforms to generate 20 images for each image in the training set: each image was flipped horizontally and vertically and rotated by angles ranging from −20° to 20°, with an image scaling ratio in [0.85, 1.15]. We generated a total of 19,980 images for the augmented set. The images in the augmented set were randomly divided into 80% for training and 20% for validation. Note that we did nothing to the testing set provided by HC18, all of which was directly used to test the performance of the deep learning model.
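A minimal sketch of one augmentation draw with OpenCV; the exact sampling scheme used to produce the 20 variants per image is an assumption:

```python
import cv2
import numpy as np

def augment_pair(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Random flips, rotation in [-20, 20] degrees, and scaling in [0.85, 1.15]."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1].copy(), mask[:, ::-1].copy()  # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :].copy(), mask[::-1, :].copy()  # vertical flip
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-20, 20), rng.uniform(0.85, 1.15))
    image = cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR)
    mask = cv2.warpAffine(mask, m, (w, h), flags=cv2.INTER_NEAREST)  # keep the mask binary
    return image, mask
```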
Training
Our experiments were performed on a graphics workstation with an Intel Xeon E5-2620 v4 CPU @ 2.10 GHz, an NVIDIA GeForce GTX 1080 Ti GPU (11 GB), and 64 GB RAM. The popular TensorFlow and Keras were selected as the deep learning framework. We tried several models to test the effectiveness of the attention module and the deep supervision (DS) strategy, namely, U-Net, V-Net, Attention V-Net, DS U-Net, DS V-Net, and DAG V-Net. Some key hyperparameter settings have been described previously in this paper. We initialized the network parameters and trained for 20 epochs each time. The training time of each epoch was about 85 min, and the total training time was about 30 h. The input size for model training was 768 × 512, and we found that training at this scale yielded higher precision. The Adam optimizer [30] was used, with an initial learning rate of 0.001, a batch size of 2, and a dropout ratio of 0.5 in all steps.
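A sketch of this training setup, reusing `dice_loss` from the hybrid-loss sketch; `build_dag_vnet`, `x_train`, and `y_train` are hypothetical, and the five-output model receives the same mask at every supervised scale:

```python
import tensorflow as tf

lam = tf.Variable(1.0, trainable=False)  # weight of the four auxiliary Dice losses
decay = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: lam.assign(lam * 0.9))  # 0.9x decay per epoch

def scaled_dice(aux: bool):
    def loss(y_true, y_pred):
        d = dice_loss(y_true, y_pred)
        return lam * d if aux else d  # auxiliary scales weighted as in Eq. (3)
    return loss

model = build_dag_vnet(input_shape=(512, 768, 1))  # hypothetical model constructor
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              # Keras sums the per-output losses, reproducing Eq. (4)
              loss=[scaled_dice(True)] * 4 + [scaled_dice(False)])
model.fit(x_train, [y_train] * 5, validation_split=0.2,
          epochs=20, batch_size=2, callbacks=[decay])
```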
Evaluation
In order to analyze the model quantitatively and evaluate its performance, we used DSC, Hausdorff distance (HD), HC difference (DF), and HC absolute difference (AD) as the performance indices. DF and AD are defined as:
$$\mathrm{DF} = \mathrm{HC}_{p} - \mathrm{HC}_{g} \tag{10}$$

$$\mathrm{AD} = \left| \mathrm{HC}_{p} - \mathrm{HC}_{g} \right| \tag{11}$$

where $\mathrm{HC}_g$ is the gold standard HC, and $\mathrm{HC}_p$ is the HC predicted by the deep learning model.
HD is defined by:

$$\mathrm{HD}(X, Y) = \max\big(h(X, Y),\ h(Y, X)\big) \tag{12}$$

where

$$h(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert \tag{13}$$

$$h(Y, X) = \max_{y \in Y} \min_{x \in X} \lVert y - x \rVert \tag{14}$$

$$\lVert x - y \rVert = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{15}$$

where $X$ represents the pixels in the segmentation result, $Y$ represents the pixels from the gold standard, and $\lVert \cdot \rVert$ is the Euclidean distance.
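A sketch of these indices in Python; boundary-point extraction and the use of SciPy's directed Hausdorff distance are implementation assumptions:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hc_errors(hc_pred: float, hc_gold: float):
    """DF (Eq. 10) and AD (Eq. 11) in millimeters."""
    df = hc_pred - hc_gold
    return df, abs(df)

def dice_similarity(pred: np.ndarray, gold: np.ndarray) -> float:
    """DSC of Eq. (2) on binary masks."""
    inter = np.logical_and(pred, gold).sum()
    return 2.0 * inter / (pred.sum() + gold.sum())

def hausdorff_mm(pred_pts, gold_pts, pixel_size_mm: float) -> float:
    """HD of Eqs. (12)-(15); pred_pts and gold_pts are (n, 2) boundary coordinates."""
    h_xy = directed_hausdorff(pred_pts, gold_pts)[0]  # h(X, Y), Eq. (13)
    h_yx = directed_hausdorff(gold_pts, pred_pts)[0]  # h(Y, X), Eq. (14)
    return max(h_xy, h_yx) * pixel_size_mm            # Eq. (12)
```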
Results
Table 2 shows the segmentation and biometry performance of different deep network models on the HC18 testing set, including U-Net, V-Net, Attention V-Net, DS U-Net, DS V-Net, and DAG V-Net. Four key indices of DSC, DF, AD, and HD were presented. It can be seen that V-Net had better performance than U-Net. Incorporating the deep supervision strategy improved the performance of U-Net and V-Net (DS U-Net better than U-Net; DS V-Net better than V-Net). The Attention V-Net model produced a DSC of 97.91 ± 1.24%, a DF of − 0.57 ± 2.46 mm, an AD of 1.85 ± 1.71 mm, and an HD of 1.28 ± 0.79 mm. Compared with the non-attention-based U-Net and V-Net, all the four indices were improved by Attention V-Net. Furthermore, the proposed DAG V-Net model yielded the best performance among the six models; the DSC reached 97.93 ± 1.25%, and the AD was reduced to 1.77 ± 1.70 mm, showing that on the basis of incorporating the attention module, the deep supervision strategy further optimized the overall model and improved the segmentation accuracy.
Table 2. Segmentation and biometry performance of different deep learning methods on the testing set of HC18

| Methods | DSC (%) | DF (mm) | AD (mm) | HD (mm) |
|---|---|---|---|---|
| U-Net | 97.72 ± 1.42 | 1.83 ± 2.58 | 2.36 ± 2.11 | 1.35 ± 0.85 |
| V-Net | 97.85 ± 1.32 | 0.95 ± 2.58 | 2.01 ± 1.87 | 1.30 ± 0.75 |
| Attention V-Net | 97.91 ± 1.24 | − 0.57 ± 2.46 | 1.85 ± 1.71 | 1.28 ± 0.79 |
| DS U-Net | 97.85 ± 1.26 | − 1.06 ± 2.48 | 2.00 ± 1.80 | 1.31 ± 0.78 |
| DS V-Net | 97.87 ± 1.27 | − 0.38 ± 2.53 | 1.89 ± 1.73 | 1.29 ± 0.74 |
| **DAG V-Net** | **97.93 ± 1.25** | **0.09 ± 2.45** | **1.77 ± 1.70** | **1.27 ± 0.80** |

Data are presented as mean ± standard deviation. The segmentation and biometry performance data were obtained after uploading the experimental results of the different methods to the official assessment system of the HC18 Challenge. The DAG V-Net proposed in this work achieved the best performance among the six deep learning methods (data indicated in bold)

DSC Dice similarity coefficient, HD Hausdorff distance, DF head circumference difference, AD head circumference absolute difference, DS deeply supervised, DAG deeply supervised attention-gated
Figure 7 shows representative fetal head segmentation results by three deep learning models: U-Net, V-Net, and Attention V-Net, in order to demonstrate the effect of the attention module. The Attention V-Net model produced more accurate segmentation results than U-Net and V-Net and was the closest to the gold standard among the three models. Incorporating the attention mechanism can better capture the specific position of the fetal head, especially in the blurred head edge.
Figure 8 shows the performance of five deep network models on the training set: U-Net, V-Net, DS U-Net, Attention V-Net, and DAG V-Net, demonstrating the effect of the deep supervision strategy. The DSC of DAG V-Net on the training set was 98.29%. While effectively improving the accuracy over Attention V-Net and DS U-Net, DAG V-Net also converged significantly faster. Further, Fig. 9 shows the segmentation results on the testing set (green: DS U-Net; blue: Attention V-Net; red: DAG V-Net). It can be seen that the DAG V-Net model is the best at addressing the problems of blurred head edges, incomplete skull edges, and interference from similar structures.
Table 3 shows the performance of the proposed DAG V-Net model on HC18 testing sets for different trimesters of pregnancy. The proposed method achieved a high accuracy, especially in the second and third trimesters of pregnancy, for which the DSC reached over 98.10%. At the same time, we also found some shortcomings of our method, such as a lower DSC in the first trimester and a relatively larger AD in the third trimester. These will be further discussed in the next section.
Table 3. Performance of the proposed DAG V-Net on the HC18 testing set for different trimesters of pregnancy

| Evaluation indices | First trimester | Second trimester | Third trimester |
|---|---|---|---|
| DSC (%) | 96.82 | 98.16 | 98.11 |
| DF (mm) | 0.306 | − 0.011 | − 0.293 |
| AD (mm) | 1.419 | 1.718 | 2.429 |
| HD (mm) | 0.846 | 1.189 | 2.157 |

DSC Dice similarity coefficient, HD Hausdorff distance, DF head circumference difference, AD head circumference absolute difference. The segmentation and biometry performance data were obtained after uploading the experimental results of the proposed DAG V-Net to the official assessment system of the HC18 Challenge
The average running time for automated fetal HC segmentation and measurement in a testing ultrasound image was 6.3 ms on the graphics workstation used in this study.
Discussion
Significance of This Study
In this paper, a new improved deep learning method was proposed for automatic fetal ultrasound image segmentation and HC biometric measurement. The advanced three-dimensional V-Net was transformed into a two-dimensional network model, which then incorporated the attention-gated module. The attention mechanism adaptively integrated local features and their global dependencies by capturing rich context information. At the same time, a multi-scale loss function was introduced for deep supervision, so as to effectively address the problems of blurred boundaries and missing edges by using multi-level features to refine the segmentation. The problems of missing head boundaries and interference from structures similar to the fetal head (e.g., large interfaces of the uterine wall and amniotic fluid) were effectively solved by the proposed DAG V-Net method. As a result, the fetal head was accurately segmented, and the segmentation and biometry performance was improved. For the HC18 Challenge, the DSC by DAG V-Net on the training set was 98.29%; on the testing set, the DSC was 97.93%, and the AD was 1.77 ± 1.69 mm. Currently, the proposed DAG V-Net method ranks fifth among the participants in the HC18 Challenge.
Contribution of Residual Mechanism
Compared with U-Net, V-Net showed better segmentation performance (Table 2), which is largely due to the residual mechanism introduced in the convolutional blocks of each layer. The residual mechanism makes the residual information semantically significant in both the shallow and deep layers of the network. It allows learning features at different levels of abstraction, from edges (shallow layers) to very complex features (deep layers), optimizing the gradient derivation in backward training.
Contribution of Attention Mechanism
When the attention mechanism was combined with the V-Net model, the convergence speed of the resulting network (Attention V-Net) was significantly improved; the DSC on the testing set was 97.91 ± 1.24%, and the AD was reduced to 1.85 ± 1.71 mm, an improvement over V-Net. Conventional U-Net and V-Net models could not segment the images accurately and effectively enough, especially in the presence of blurred boundaries, resulting in large pixel classification errors that greatly affected the quality of subsequent HC ellipse fitting. After model fusion, however, the attention mechanism can capture the specific position of the fetal head and can further guide the model to learn the edge details with more attention resources. Local features and their global dependencies are adaptively integrated, so as to guide the network to learn how to use multi-level features to refine the segmentation, which is the key to learning. The attention module can effectively solve the problem that the skip connection module in the U-Net encoder architecture cannot effectively use structure information, which may limit image segmentation performance.
Contribution of Deep Supervision
After incorporation of deep supervision, the convergence speed and segmentation accuracy were improved by the proposed DAG V-Net model. The deep supervision strategy can accelerate the training convergence of fetal ultrasound image segmentation based on deep learning, with a high accuracy. We introduced the multi-scale loss function for deep supervision, making full use of the complementary information encoded by each layer of CNNs. Multi-level features were fused to refine the features of each individual layer, gradually suppressing the non-fetal noise in the shallow layer of CNNs, while highlighting the foreground information and putting more fetal details into the deep features. As a result, it is possible to recover spatial information and fuse structure information from different resolution levels.
Comparison Between Different Trimesters of Pregnancy
The fetal skull is not fully developed in the first trimester of pregnancy, so fetal head segmentation from ultrasound images is challenging in this trimester. The proposed method achieved generally better segmentation performance for the second and third trimesters of pregnancy. From the experiments, we found two challenges for the proposed method. First, the DSC of the proposed method in early pregnancy was relatively lower, as shown in Fig. 10a. From the training set images, we found that part of the fetal skull in early pregnancy has not yet developed, and the skull contour is small and noisy, which may affect image segmentation performance. Second, the AD was generally larger in the third trimester of pregnancy, as shown in Fig. 10b. We considered two possible reasons. (i) Because the fetal skull grows larger in late pregnancy, it may be difficult to capture the complete structure in one ultrasound imaging plane, resulting in partial loss of the skull at the two ends, so the fitted ellipse may be significantly larger. (ii) The edge structure of the fetal skull is markedly thicker in the third trimester, meaning that there are more bright pixels at the highly echogenic skull. Some of these pixels on the thick skull may cause misclassification, as the computer-detected fetal head boundary is a thinner ellipse curve (Fig. 10b).
Comparison with Published State-of-the-Art Methods
Table 4 compares the proposed method with published state-of-the-art methods for fetal head segmentation and HC biometry from ultrasound images. The testing ultrasound images can be classified into three groups. The first is the testing set provided by the HC18 Challenge [12], containing 355 testing ultrasound images, with a uniform data distribution over the different trimesters of pregnancy. The second is the dataset of the International Symposium on Biomedical Imaging (ISBI) 2012 [4], containing 90 testing ultrasound images, with a less uniform data distribution over the trimesters of pregnancy; note that ISBI 2012 is a subset of HC18. The third group comprises private datasets. Compared with published state-of-the-art methods, the proposed method yielded better performance in DF and AD. For DSC, Liu et al. [32] had the highest mean DSC (98.05%), but their standard deviation of DSC is large (4.02%); our method (97.93 ± 1.25%) is comparable with theirs in DSC. For HD, both Liu et al. [32] and our method had the lowest mean HD (1.27 mm), while Liu et al. [32] had a slightly smaller standard deviation of HD (0.77 mm vs. 0.80 mm). Overall, our method is better than or comparable with published state-of-the-art methods.
Table 4. Comparison of the proposed method with published state-of-the-art methods for fetal head segmentation and HC biometry from ultrasound images

| Authors | Methods | Year | Number of cases in testing sets | DSC (%) | DF (mm) | AD (mm) | HD (mm) |
|---|---|---|---|---|---|---|---|
| van den Heuvel et al. [12] | Random forest | 2018 | 355^a | 97.10 ± 2.73 | 0.56 ± 4.21 | 2.83 ± 3.16 | 1.83 ± 1.60 |
| Rong et al. [31] | GVF-Net | 2019 | 355^a | 95.53 ± 3.98 | − 0.24 ± 3.23 | 2.42 ± 1.93 | 2.18 ± 2.40 |
| Al-Bander et al. [18] | Mask R-CNN | 2019 | 355^a | 97.73 ± 1.32 | 1.49 ± 2.85 | 2.33 ± 2.21 | 1.39 ± 0.82 |
| Liu et al. [32] | SAF-Net | 2020 | 355^a | 98.05 ± 4.02 | 1.26 ± 2.95 | – | 1.27 ± 0.77 |
| Sobhaninia et al. [1] | Mini Link-Net | 2019 | 355^a | 96.84 ± 2.89 | 1.13 ± 2.69 | 2.12 ± 1.87 | 1.72 ± 1.39 |
| Ni et al. [33] | AdaBoost | 2013 | 175 | – | – | 5.58 ± 1.74 | – |
| Jatmika et al. [2] | AdaBoost | 2015 | 100 | – | – | 8.21 ± 0.00 | – |
| Foi et al. [4] | Difference of Gaussians | 2014 | 90^b | 97.80 ± 1.04 | − 2.01 ± 3.29 | – | 2.16 ± 1.44 |
| Sun et al. [4] | Circular shortest paths | 2014 | 90^b | 96.97 ± 1.07 | 3.83 ± 5.66 | – | 3.02 ± 1.55 |
| Stebbing et al. [4] | Random forest | 2014 | 90^b | 97.23 ± 0.77 | − 3.46 ± 4.06 | – | 2.59 ± 1.14 |
| Ciurte et al. [4] | Semi-supervised patch-based | 2014 | 90^b | 94.45 ± 1.57 | 11.93 ± 5.32 | – | 4.60 ± 1.64 |
| Ponomarev et al. [4] | Multilevel thresholding | 2014 | 90^b | 92.53 ± 10.22 | 16.39 ± 24.88 | – | 6.87 ± 9.82 |
| Zalud et al. [34] | Hierarchical and discriminative | 2009 | 80 | – | – | 5.10 ± 5.40 | – |
| Satwika et al. [35] | Particle swarm optimization | 2014 | 72 | – | – | 14.6 ± 0.00 | – |
| Ryou et al. [36] | U-Net | 2019 | 21 | – | – | 1.98 ± 1.19 | – |
| Carneiro et al. [37] | Probabilistic boosting tree | 2008 | 20 | – | – | 2.76 ± 1.40 | 4.15 ± 2.05 |
| Lu et al. [38] | Iterative random Hough transform | 2008 | 11 | – | – | 3.41 ± 1.74 | – |
| Perez et al. [6] | Texture maps | 2015 | 10 | 97.19 ± 0.97 | − 2.73 ± 2.04 | – | 2.64 ± 0.57 |
| Zhang et al. [10] | Multi-directional filter banks | 2016 | 10 | – | − 0.22 ± 9.53 | – | 3.30 ± 1.09 |
| Our method | DAG V-Net | 2020 | 355^a | 97.93 ± 1.25 | 0.09 ± 2.45 | 1.77 ± 1.70 | 1.27 ± 0.80 |

DSC Dice similarity coefficient, HD Hausdorff distance, DF head circumference difference, AD head circumference absolute difference

^a The testing set of HC18 was used

^b The testing set of ISBI 2012 was used
However, algorithm execution time was rarely reported in the published studies above. We compared the proposed method with several state-of-the-art methods (U-Net [14], SCSE U-Net [40], DeepLabV3+ [41], PSPNet [42]) using the same computer and image datasets, in terms of mean AD, DSC, and algorithm execution time. These comparisons are shown in Table 5. It can be seen that the proposed method outperforms these methods with respect to AD and DSC, while its mean algorithm execution time does not increase much compared with U-Net [14] (6.3 vs. 5.3 ms).
Table 5. Comparison of mean AD, mean DSC, and mean algorithm execution time with state-of-the-art deep learning methods

| Method | Mean AD (mm) | Mean DSC (%) | Mean algorithm execution time (ms) |
|---|---|---|---|
| U-Net [14] | 2.36 | 97.72 | 5.3 |
| SCSE U-Net [40] | 1.97 | 97.69 | 5.8 |
| DeepLabV3+ [41] | 2.64 | 97.16 | 10.2 |
| PSPNet [42] | 2.71 | 96.21 | 9.5 |
| Our method | 1.77 | 97.93 | 6.3 |

AD head circumference absolute difference, DSC Dice similarity coefficient
Potential Applications of the Proposed Method
Fetal sonography is a relatively low-cost examination and can be taught to those whose specialty is not fetal imaging. This would make the proposed automated HC measurement method more valuable for “medically underserved” areas where ultrasound expertise may be lacking or in short supply. Use in point-of-care situations is another potential application of the proposed method in which a person who is not a trained ultrasound technologist can get this necessary measurement in a bedside or remote location.
Limitations and Future Work
This study has some limitations. Firstly, the number of cases is limited. Secondly, the segmentation accuracy for the first trimester of pregnancy is lower. For the third trimester of pregnancy, the AD is larger. These limitations may be overcome in future work.
Conclusions
In this work, we proposed a DAG V-Net deep learning method for fetal head segmentation and HC biometric measurement from two-dimensional ultrasound images. By incorporating the attention mechanism and deep supervision, the proposed method yielded better segmentation performance than conventional U-Net and V-Net methods. Compared with published state-of-the-art methods, the proposed DAG V-Net had better or comparable segmentation performance, with respect to the indices of DSC, DF, AD, and HD. The proposed DAG V-Net may be used as a new method for fetal ultrasound image segmentation and HC biometry.
Acknowledgments
The authors would like to thank the anonymous reviewers for their insightful and valuable comments and suggestions.
Funding
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 11804013, 61871005, 61801312, and 71661167001), the Beijing Natural Science Foundation (Grant No. 4184081), the International Research Cooperation Seed Fund of Beijing University of Technology (Grant No. 2018A15), the Basic Research Fund of Beijing University of Technology, and the Intelligent Physiological Measurement and Clinical Translation, Beijing International Base for Scientific and Technological Cooperation.
Compliance with Ethical Standard
Conflict of Interest
The authors declare that they have no competing interests.
Ethical Approval
This retrospective study used fetal ultrasound images provided by the HC18 Challenge, which were collected from the database of the Department of Obstetrics of the Radboud University Medical Center, Nijmegen, the Netherlands, in accordance with the local ethics committee (CMO Arnhem-Nijmegen). All data were anonymized according to the tenets of the Declaration of Helsinki.
Informed Consent
This study used retrospective data provided by the HC18 Challenge, so informed consent was waived.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yan Zeng and Po-Hsiang Tsui contributed equally to this work.
Contributor Information
Zhuhuang Zhou, Email: zhouzh@bjut.edu.cn.
Shuicai Wu, Email: wushuicai@bjut.edu.cn.
References
- 1. Sobhaninia Z, Emami A, Karimi N, Samavi S: Localization of fetal head in ultrasound images by multiscale view and deep neural networks. arXiv preprint, 2019. https://arxiv.org/abs/1911.00908
- 2. Jatmiko W, Habibie I, Ma'sum MA, Rahmatullah R, Satwika IP: Automated telehealth system for fetal growth detection and approximation of ultrasound images. Int J Smart Sensing Intell Syst 8(1):697–719, 2015
- 3. Schmidt U, Temerinac D, Bildstein K, Tuschy B, Mayer J, Sütterlin M, Siemer J, Kehl S: Finding the most accurate method to measure head circumference for fetal weight estimation. Eur J Obstet Gynecol Reprod Biol 178:153–156, 2014
- 4. Rueda S, Fathima S, Knight CL, Yaqub M, Papageorghiou AT, Rahmatullah B, Foi A, Maggioni M, Pepe A, Tohka J, Stebbing RV, McManigle JE, Ciurte A, Bresson X, Cuadra MB, Sun C, Ponomarev GV, Gelfand MS, Kazanov MD, Wang CW, Chen HC, Peng CW, Hung CM, Noble JA: Evaluation and comparison of current fetal ultrasound image segmentation methods for biometric measurements: A grand challenge. IEEE Trans Med Imaging 33(4):797–813, 2013
- 5. Jardim SMGVB, Figueiredo MAT: Segmentation of fetal ultrasound images. Ultrasound Med Biol 31(2):243–250, 2005
- 6. Perez-Gonzalez JL, Muñoz JCB, Porras MCR, Arámbula-Cosío F, Medina-Bañuelos V: Automatic fetal head measurements from ultrasound images using optimal ellipse detection and texture maps. VI Latin American Congress on Biomedical Engineering CLAIB 2014, Springer, 2015, pp. 329–332
- 7. Ponomarev GV, Gelfand MS, Kazanov MD: A multilevel thresholding combined with edge detection and shape-based recognition for segmentation of fetal ultrasound images. Proceedings of Challenge US: Biometric Measurements from Fetal Ultrasound Images, ISBI 2012, pp. 17–19
- 8. Shrimali V, Anand R, Kumar V: Improved segmentation of ultrasound images for fetal biometry using morphological operators. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2009, pp. 459–462
- 9. Lu W, Tan J, Floyd R: Automated fetal head detection and measurement in ultrasound images by iterative randomized Hough transform. Ultrasound Med Biol 31(7):929–936, 2005
- 10. Zhang L, Ye X, Lambrou T, Duan W, Allinson N, Dudley NJ: A supervised texton based approach for automatic segmentation and measurement of the fetal head and femur in 2D ultrasound images. Phys Med Biol 61(3):1095–1115, 2016
- 11. Li J, Wang Y, Lei B, Cheng JZ, Qin J, Wang T, Li S, Ni D: Automatic fetal head circumference measurement in ultrasound using random forest and fast ellipse fitting. IEEE J Biomed Health Inform 22(1):215–223, 2017
- 12. van den Heuvel TL, de Bruijn D, de Korte CL, van Ginneken B: Automated measurement of fetal head circumference using 2D ultrasound images. PLoS One 13(8):e0200412, 2018
- 13. Long J, Shelhamer E, Darrell T: Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2015, pp. 3431–3440
- 14. Ronneberger O, Fischer P, Brox T: U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015, pp. 234–241
- 15. Milletari F, Navab N, Ahmadi SA: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571
- 16. Jang J, Park Y, Kim B, Lee SM, Kwon JY, Seo JK: Automatic estimation of fetal abdominal circumference from ultrasound images. IEEE J Biomed Health Inform 22(5):1512–1520, 2017
- 17. Wu L, Xin Y, Li S, Wang T, Heng PA, Ni D: Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation. IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), IEEE, 2017, pp. 663–666
- 18. Al-Bander B, Alzahrani T, Alzahrani S, Williams BM, Zheng Y: Improving fetal head contour detection by object localisation with deep learning. Annual Conference on Medical Image Understanding and Analysis, Springer, 2019, pp. 142–150
- 19. Dou Q, Yu L, Chen H, Jin Y, Yang X, Qin J, Heng PA: 3D deeply supervised network for automated segmentation of volumetric medical images. Med Image Anal 41:40–54, 2017
- 20. Wu Y, He K: Group normalization. Proceedings of the European Conference on Computer Vision (ECCV), Springer, 2018, pp. 3–19
- 21. Ioffe S, Szegedy C: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint, 2015. https://arxiv.org/abs/1502.03167
- 22. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M: Striving for simplicity: The all convolutional net. arXiv preprint, 2014. https://arxiv.org/abs/1412.6806
- 23. Zhu Q, Du B, Turkbey B, Choyke PL, Yan P: Deeply-supervised CNN for prostate segmentation. 2017 International Joint Conference on Neural Networks, IEEE, 2017, pp. 178–184
- 24. Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C: DiSAN: Directional self-attention network for RNN/CNN-free language understanding. Thirty-Second AAAI Conference on Artificial Intelligence, AAAI, 2018, pp. 5446–5455
- 25. Roth HR, Lu L, Lay N, Harrison AP, Farag A, Sohn A, Summers RM: Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Med Image Anal 45:94–107, 2018
- 26. Roth HR, Oda H, Hayashi Y, Oda M, Shimizu N, Fujiwara M, Misawa K, Mori K: Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv preprint, 2017. https://arxiv.org/abs/1704.06382
- 27. Schlemper J, Oktay O, Chen L, Matthew J, Knight C, Kainz B, Glocker B, Rueckert D: Attention-gated networks for improving ultrasound scan plane detection. arXiv preprint, 2018. https://arxiv.org/abs/1804.05338
- 28. Zhang S, Fu H, Yan Y, Zhang Y, Wu Q, Yang M, Tan M, Xu Y: Attention guided network for retinal image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2019, pp. 797–805
- 29. Wang X, Girshick R, Gupta A, He K: Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2018, pp. 7794–7803
- 30. Kingma DP, Ba J: Adam: A method for stochastic optimization. arXiv preprint, 2014. https://arxiv.org/abs/1412.6980
- 31. Rong Y, Xiang D, Zhu W, Shi F, Gao E, Fan Z, Chen X: Deriving external forces via convolutional neural networks for biomedical image segmentation. Biomed Opt Express 10(8):3800–3814, 2019
- 32. Liu P, Zhao H, Li P, Cao F: Automated classification and measurement of fetal ultrasound images with attention feature pyramid network. Second Target Recognition and Artificial Intelligence Summit Forum, SPIE, 2020, p. 114272R
- 33. Ni D, Yang Y, Li S, Qin J, Ouyang S, Wang T, Heng PA: Learning based automatic head detection and measurement from fetal ultrasound images via prior knowledge and imaging parameters. IEEE 10th International Symposium on Biomedical Imaging, IEEE, 2013, pp. 772–775
- 34. Zalud I, Good S, Carneiro G, Georgescu B, Aoki K, Green L, Shahrestani F, Okumura R: Fetal biometry: A comparison between experienced sonographers and automated measurements. J Matern Fetal Neonatal Med 22(1):43–50, 2009
- 35. Satwika IP, Habibie I, Ma'sum MA, Febrian A, Budianto E: Particle swarm optimation based 2-dimensional randomized Hough transform for fetal head biometry detection and approximation in ultrasound imaging. 2014 International Conference on Advanced Computer Science and Information System, IEEE, 2014, pp. 468–473
- 36. Ryou H, Yaqub M, Cavallaro A, Papageorghiou AT, Noble JA: Automated 3D ultrasound image analysis for first trimester assessment of fetal health. Phys Med Biol 64(18):185010, 2019
- 37. Carneiro G, Georgescu B, Good S, Comaniciu D: Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree. IEEE Trans Med Imaging 27(9):1342–1355, 2008
- 38. Lu W, Tan J: Detection of incomplete ellipse in images with strong noise by iterative randomized Hough transform (IRHT). Pattern Recognit 41(4):1268–1279, 2008
- 39. Khanh TLB, Dao DP, Ho NH, Yang HJ, Baek ET, Lee G, Kim SH, Yoo SB: Enhancing U-Net with spatial-channel attention gate for abnormal tissue segmentation in medical imaging. Appl Sci 10(17):5729, 2020
- 40. Roy AG, Navab N, Wachinger C: Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2018, pp. 421–429
- 41. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H: Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818
- 42. Zhao H, Shi J, Qi X, Wang X, Jia J: Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2881–2890