
1 Introduction

Prostate cancer is the second leading cause of cancer-related death in the male population [1], and an estimated 1 in 9 men in the US will be diagnosed with prostate cancer in their lifetime [2]. The prostate-specific antigen (PSA) test is mainly used as an upfront screening test for men without symptoms. Men with a PSA level >4 ng/mL have an increased chance of prostate cancer and are followed up with further tests, such as MRI. Multi-parametric MRI (mpMRI) is commonly employed before tissue biopsy, and diagnosis with MRI can reduce unnecessary biopsies by 70%. Daniel et al. [3] reported a 486% increase in the use of mpMRI from Oct 2013 to Dec 2015. Consequently, a large volume of prostate magnetic resonance (MR) images must be interpreted by radiologists in current clinical routine. Recently, the Prostate Imaging Reporting and Data System (PI-RADS) [4] guidelines were established to standardize and minimize the variation in interpreting prostate MRI. Nevertheless, Geoffrey et al. [5] observed variability in cancer yield across radiologists: 24% of men who were assigned a benign score turned out to have clinically significant prostate cancer on biopsy, and this false negative rate varied from 13% to 60% among radiologists. With the advances of deep learning in medical imaging, our goal is to automate the diagnosis process by detecting and classifying prostate lesions with high accuracy in a single framework.

There have been several works on prostate lesion classification with multi-parametric MRI. Karimi et al. [6] proposed to combine hand-crafted features and learned features for classifying the malignancy of prostate lesions in mpMRI images, which led to an area under the curve (AUC) of 0.87. Liu et al. [7] proposed XmasNet, which reformulates the 3D mpMRI volume as a 2D problem. To incorporate 3D information when learning from 2D slices, they employed data augmentation through 3D rotation and slicing, attaining an AUC of 0.84.

However, these methods require labeling of lesion centroids, which must be done by clinicians. In related research on prostate lesion detection and classification in MRI, Kiraly et al. [8] used a deep convolutional encoder-decoder architecture to simultaneously detect and classify prostate cancer, reaching an average classification performance of 0.834 AUC. However, they applied a region of interest (ROI) of roughly the same size to every case to ensure only the prostate and its surrounding areas were considered; such a simplification may lead to imprecision.

In this paper, we propose a novel two-stage framework for fully-automated prostate lesion detection and diagnosis. In the first stage, the prostate zone/contour in MR images is automatically segmented. In the second stage, an analytical framework detects and classifies the malignancy of prostate lesions. The major contributions are: (i) a Mask R-CNN [9] model is trained to segment/pinpoint a smaller ROI that contains all the lesion candidates; (ii) a weakly supervised deep neural network is developed to perform detection and classification in a single run; (iii) detailed validation is conducted on two datasets, namely the PROSTATEx Challenge [10] dataset and our local cohort, and classification performance is compared with state-of-the-art approaches.

2 Methods

Figure 1 illustrates our proposed two-stage automatic framework. We apply an object detection method, Mask R-CNN, to train a prostate segmentation model on T2-weighted images, which show tissues in high contrast and brightness. The model outputs a smaller image that contains the prostate ROI. Through coordinate transformation, the corresponding prostate regions in ADC maps and high b-value diffusion-weighted images are also obtained. Next, the prostate ROI acts as input to a deep neural network (DNN), which outputs lesion areas and classification results in a single feedforward pass. The malignancy of the classified lesions is then determined through ensemble learning over the different input sequences, as sketched below.
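For clarity, the overall flow can be summarized in the following minimal Python sketch. It is illustrative only: the helper `to_seq_coords` and the callable model interfaces are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def crop(img, box):
    """Crop a 2-D slice to an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return img[y0:y1, x0:x1]

def diagnose_case(t2, adc, dwi, seg_model, lesion_models, to_seq_coords):
    """Two-stage inference for one patient case (illustrative sketch)."""
    # Stage 1: localize the prostate ROI on the T2-weighted slice.
    box = seg_model(t2)
    # Map the T2 ROI onto the co-registered ADC / high b-value DWI images.
    rois = {"t2": crop(t2, box),
            "adc": crop(adc, to_seq_coords(box, "adc")),
            "dwi": crop(dwi, to_seq_coords(box, "dwi"))}
    # Stage 2: each per-sequence model returns a lesion mask and a
    # malignancy probability in a single feedforward pass.
    masks, probs = {}, []
    for seq, roi in rois.items():
        mask, p = lesion_models[seq](roi)
        masks[seq] = mask
        probs.append(p)
    # Ensemble: average the per-sequence malignancy probabilities.
    return masks, float(np.mean(probs))
```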

Fig. 1. Schematics of the proposed automated prostate lesion detection/classification.

2.1 Mask R-CNN for Automated Prostate Structures Segmentation

Prostate lesion detection can be difficult, as lesions are small relative to the entire image. Thus, identifying the prostate ROI is crucial for more accurate and effective lesion detection. The Mask R-CNN object detection approach relies on generated region proposals, each of which yields a class, a bounding box, and a mask. Multi-scale features are extracted from various layers, providing richer information for segmenting objects at various scales. This is particularly useful in our case, as prostate size can vary considerably among patients. To train the Mask R-CNN model, we employ an online prostate segmentation dataset with well-labelled masks, released by the Initiative for Collaborative Computer Vision Benchmarking (I2CVB) [11]. The trained model narrows the entire MR image down to the prostate ROI, facilitating the subsequent detection and classification process.
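The paper does not state which implementation was used; a common way to set up such a model in PyTorch is to fine-tune the torchvision Mask R-CNN, replacing its box and mask heads for the target classes (here assumed to be background, prostate and central gland):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_prostate_maskrcnn(num_classes=3):
    """Fine-tunable Mask R-CNN; num_classes=3 assumes background,
    prostate and central gland (an assumption, not stated in the paper)."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Swap the box-classification head for our class count.
    in_box = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_box, num_classes)
    # Swap the mask head likewise (256 hidden channels is the torchvision default).
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model
```

The resulting model can then be trained on the I2CVB slices and mask labels with the standard torchvision detection training loop.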

2.2 Weakly Supervised Deep Neural Network for Prostate Lesion Detection and Classification

In the prostate classification datasets used by existing approaches, only lesion centroids are annotated, not exact lesion outlines. Previous work [8] attempted to resolve this issue by placing an identical Gaussian distribution at each lesion point as the label. However, this approach cannot account for variations in lesion size. To this end, we employ distance regularized level set evolution [12] to generate weak lesion labels: given the lesion centroids, this edge-based active contour model evolves a contour from each centroid to produce a weak lesion label.
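To illustrate the idea, the sketch below grows an edge-based active contour from each annotated centroid. It substitutes scikit-image's morphological geodesic active contour for DRLSE, which has no standard Python implementation; the seed radius and iteration count are assumptions.

```python
import numpy as np
from skimage.segmentation import (inverse_gaussian_gradient,
                                  morphological_geodesic_active_contour)

def weak_lesion_label(image, centroid, radius=4, n_iter=50):
    """Grow an edge-based contour from a (row, col) lesion centroid."""
    # Edge-stopping map: values drop towards zero near strong intensity edges.
    gimage = inverse_gaussian_gradient(image)
    # Seed level set: a small disk around the annotated centroid.
    rr, cc = np.ogrid[:image.shape[0], :image.shape[1]]
    init = ((rr - centroid[0]) ** 2 + (cc - centroid[1]) ** 2
            <= radius ** 2).astype(np.int8)
    # A positive balloon force inflates the contour until it hits an edge,
    # so the weak label adapts to the lesion's actual extent.
    return morphological_geodesic_active_contour(
        gimage, n_iter, init_level_set=init, smoothing=2, balloon=1)
```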

Given the prostate region identified in stage one, we employ our novel weakly supervised DNN for simultaneous lesion detection and classification based on the weak masks and classification labels. The network comprises three key components: an encoder, a decoder and a classifier. The encoder consists of five groups of convolutional and max-pooling layers, while the decoder contains five upsampling layers; the encoder and decoder are linked with skip connections between their corresponding layers. The decoder is trained to produce lesion masks indicating the locations of predicted lesions. The predicted weak lesion mask, together with the prostate region and the features extracted by the encoder, is then reused by the classifier. We hypothesize that the additional lesion and prostate structure masks act as a form of attention map to guide the classifier, leading to improved classification performance. To predict the malignancy of lesions, the classifier itself contains five groups of convolutional and max-pooling layers followed by two fully connected layers. A composite loss function combining the lesion segmentation loss and the classification loss is applied to train the entire network. For each sample, the classification loss \( L_{C} \) is designed as:

$$ L_{C} = -y\log c(x) - (1 - y)\log\left[1 - c(x)\right] $$
(1)

where \( x \in \mathbb{R}^{w \times h} \), \( y \in \mathbb{R} \) and \( c(x) \in \mathbb{R} \) denote the input, label and output of the data sample, respectively; \( w \) and \( h \) are the width and height of the input images. The lesion segmentation loss \( L_{S} \) can be described as:

$$ L_{S} = 1 - \frac{2\sum_{i=1}^{w}\sum_{j=1}^{h} M_{i,j} S_{i,j}(x) + \varepsilon}{\sum_{i=1}^{w}\sum_{j=1}^{h} M_{i,j} + \sum_{i=1}^{w}\sum_{j=1}^{h} S_{i,j}(x) + \varepsilon} $$
(2)

where \( M \in \mathbb{R}^{w \times h} \) and \( S(x) \in \mathbb{R}^{w \times h} \) denote the mask label and the output lesion mask, respectively; \( M_{i,j} \) and \( S_{i,j}(x) \) are the pixel values of the corresponding masks in the \( i \)-th column and \( j \)-th row. The parameter \( \varepsilon \) is a numerical constant that avoids division by zero and ensures numerical stability. The total loss \( L_{T} \) weights the two terms with coefficients \( \lambda_{C} \) and \( \lambda_{S} \) as below. The Adam [13] optimizer is applied to train the network with a learning rate of \( 10^{-6} \).

$$ L_{T} = \lambda_{C} L_{C} + \lambda_{S} L_{S} $$
(3)
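Eqs. (1)-(3) translate directly into PyTorch. The sketch below assumes `mask_pred` has already passed through a sigmoid (so its values lie in [0, 1]) and that masks are shaped (batch, h, w); these are implementation assumptions, not details given in the text.

```python
import torch
import torch.nn.functional as F

def composite_loss(logit, mask_pred, y, mask_true,
                   lam_c=1.0, lam_s=1.0, eps=1e-5):
    """L_T = lam_c * L_C + lam_s * L_S  (Eqs. 1-3)."""
    # Eq. (1): binary cross-entropy on the malignancy output c(x);
    # y must be a float tensor of 0/1 labels.
    l_c = F.binary_cross_entropy_with_logits(logit, y)
    # Eq. (2): soft Dice loss between predicted mask S(x) and weak label M.
    inter = (mask_pred * mask_true).sum(dim=(-2, -1))
    denom = mask_pred.sum(dim=(-2, -1)) + mask_true.sum(dim=(-2, -1))
    l_s = 1.0 - (2.0 * inter + eps) / (denom + eps)
    # Eq. (3): weighted combination, averaging the Dice term over the batch.
    return lam_c * l_c + lam_s * l_s.mean()
```

With \( \lambda_{C} = \lambda_{S} = 1 \) and \( \varepsilon = 10^{-5} \) (the values reported in Sect. 3.2), the defaults above reproduce the training objective.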

3 Experimental Results

3.1 Prostate Structures Segmentation

In the first stage, 646 slices of T2-weighted images from 21 patients scanned with a 1.5-T (GE) scanner and 15 patients scanned with a 3.0-T (Siemens) scanner were extracted to train and validate our prostate segmentation model. We used a 7:2:1 split of the slices extracted from the I2CVB dataset for training, validation and testing. Data augmentation was performed on the training set by random rotation of [±30°, ±60°, ±90°]. All inputs were resized to 512 × 512 pixels before being fed into the Mask R-CNN model. The learning rate was set to \( 10^{-3} \). We trained on two GPUs with a batch size of 4 for 200 epochs; the model with the best Dice coefficient on the validation set was chosen as the final model, with results reported on the test split. Figure 2a shows a sample segmentation result. As shown in Table 1, we compared our results with other methods using mean intersection over union (IoU); Mask R-CNN achieves higher IoU in segmenting both the prostate and the central gland. Our segmentation model can then obtain prostate regions on the T2 sequences of the two lesion datasets (Fig. 2b and c).
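The rotation augmentation can be reproduced with, e.g., scipy.ndimage; the interpolation orders below are assumptions (linear for images, nearest-neighbour to keep masks binary).

```python
from scipy.ndimage import rotate

def augment_rotations(image, mask, angles=(30, 60, 90)):
    """Yield the rotated copies used for training-set augmentation."""
    for a in angles:
        for angle in (a, -a):
            # reshape=False keeps the 512 x 512 canvas fixed;
            # order=0 preserves the binary values of the mask.
            yield (rotate(image, angle, reshape=False, order=1),
                   rotate(mask, angle, reshape=False, order=0))
```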

Fig. 2. Samples of automated prostate segmentation with Mask R-CNN. (a) Results on the test split of the I2CVB dataset; the ground truth is indicated in green and the predicted area in yellow. Predicted prostate areas on the PROSTATEx Challenge dataset (b) and our local cohort (c). (Color figure online)

Table 1. Comparison of mean IoU on the I2CVB dataset with other methods.

3.2 Prostate Lesion Detection and Classification

In the second stage, our proposed weakly supervised network was trained and validated on the PROSTATEx Challenge dataset (330 lesion samples: 76 malignant and 254 benign) and our local cohort (74 lesion samples: 51 malignant and 23 benign). The Gleason score (malignant for values ≥ 7) was used to label lesion malignancy in both datasets. We trained two sets of models, one per dataset; each set consists of models trained on three sequences, namely T2-weighted images, ADC maps and high b-value diffusion-weighted images. All segmented prostate regions for both datasets were prepared in stage one. Data augmentation was applied to the training data with random rotation of [±3°, ±6°, ±9°, ±12°, ±15°], and the images were scaled to 224 × 224 pixels for training. We conducted 5-fold cross-validation experiments on both datasets. Through repeated experimental trials, the loss weights \( \lambda_{C} \) and \( \lambda_{S} \) were both set to 1, with \( \varepsilon \) set to \( 10^{-5} \). The final classification results and AUC were obtained through ensemble learning over all three sequences.
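The ensembling scheme is not detailed in the text; one plausible reading is a simple average of the per-sequence malignancy probabilities, which can be scored with scikit-learn as follows.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_auc(y_true, probs_by_sequence):
    """Average per-sequence malignancy probabilities, then score the ensemble.

    probs_by_sequence: dict like {"t2": [...], "adc": [...], "dwi": [...]},
    each an array of predicted probabilities for the same samples.
    """
    p_ens = np.mean(np.stack(list(probs_by_sequence.values())), axis=0)
    return roc_auc_score(y_true, p_ens)
```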

Figure 3 shows lesion detection results on T2 sequences. Figure 4a and b illustrate the final average receiver operating characteristic (ROC) curves on the two prostate lesion datasets with our proposed approach. For the PROSTATEx Challenge dataset, our model achieves an average AUC of 0.912 with ensembled input sequences of ADC maps and T2-weighted images, outperforming two existing methods: the combination of hand-crafted and learned features (AUC of 0.870) [6] and the encoder-decoder architecture (AUC of 0.834) [8]. For our local cohort, an average AUC of 0.882 is obtained with ensemble learning over the input sequences. Moreover, we compare against results without pre-segmentation: Fig. 4c and d show significantly lower AUC on both datasets when the non-segmented/uncropped image is used as input, demonstrating the crucial role of pre-segmentation in this detection/classification framework.

Fig. 3. Sample lesion detection results on T2-weighted images. Green spots indicate the given lesion centroids. (a) Input raw prostate region. (b) Weak lesion mask obtained from the level set method. (c) Predicted lesion area contoured in yellow. (Color figure online)

Fig. 4. Average AUC results on the two datasets: PROSTATEx Challenge (left column) and our local cohort (right column). ROC curves (a) and (b) use the prostate regions segmented by the Mask R-CNN model as input; lower AUC is observed in (c) and (d) without the segmentation.

4 Conclusion

This paper proposes a novel framework for fully-automated prostate lesion detection and diagnosis in MR images. Experiments on the PROSTATEx Challenge dataset and our local cohort achieve promising average AUCs of 0.912 and 0.882 on their respective validation sets. This efficacy is comparable to the first- (champion) and second-ranked AUCs in the PROSTATEx challenge [7], which reached 0.87 and 0.84 on the test set, respectively. Our proposed method is extensible to other structures demanding similar lesion diagnosis with MRI. For future work, we will extend the framework to 3D MRI. To improve robustness, we will also consider using specific regions, such as the segmented central gland and peripheral prostate zones, for classification.