Abstract
Lung tumors, especially those located close to or surrounded by soft tissues such as the mediastinum, are difficult to segment due to the low soft-tissue contrast of computed tomography images. Magnetic resonance images contain superior soft-tissue contrast information that could be leveraged if both modalities were available for training. Therefore, we developed a cross-modality educed learning approach in which MR information educed from CT is used to hallucinate MRI and improve CT segmentation. Our approach, called cross-modality educed deep learning segmentation (CMEDL), combines CT and pseudo MR produced from CT by aligning their features to obtain segmentation on CT. Features computed in the last two layers of the CT and MR segmentation networks, which are trained in parallel, are aligned. We implemented this approach on U-net and dense fully convolutional networks (dense-FCN). Our networks were trained on unrelated cohorts: CT images from the open-source Cancer Imaging Archive (N = 377) and T2-weighted MR scans from an internal archive (N = 81), and were evaluated using separate validation (N = 304) and testing (N = 333) sets of CT-delineated tumors. With both networks, our approach was significantly more accurate (U-net \(P <0.001\); denseFCN \(P <0.001\)) than CT-only networks and achieved an accuracy (Dice similarity coefficient) of \(0.71\pm 0.15\) (U-net) and \(0.74\pm 0.12\) (denseFCN) on the validation set, and \(0.72\pm 0.14\) (U-net) and \(0.73\pm 0.12\) (denseFCN) on the test set. Our novel approach demonstrated that educing cross-modality information through learned priors enhances CT segmentation performance.
1 Introduction
Precision medical treatments including image-guided radiotherapy require accurate target tumor segmentation [1]. Computed tomography (CT), the standard-of-care imaging modality, lacks sufficient soft-tissue contrast, which makes visualizing tumor boundaries difficult, especially for tumors adjacent to soft-tissue structures. With the advent of new MRI simulator technologies, radiation oncologists can delineate target structures on MRI acquired in simulation position, which then have to be transferred using image registration to the planning CTs acquired at a different time in treatment position for radiation therapy planning [2]. Image registration is itself prone to errors, so accurate segmentation directly on CT is preferable for improving the accuracy of clinical radiation treatment margins. More importantly, driven by the lack of simultaneously acquired CT and MR scans, current methods are restricted to CT alone. Therefore, we developed a novel approach, called cross-modality educed deep learning (CMEDL), that uses unpaired cross-domain adaptation between unrelated CT and MR datasets to hallucinate MR-like, or pseudo MR (pMR), images from CT scans. The pMR image is combined with the CT image to regularize the training of a CT segmentation network. This is accomplished by aligning the features of the CT with the pMR features during training (Fig. 1).
Ours is not a method for data augmentation using cross-domain adaptation [3,4,5]. Our work is also unlike methods that seek to reduce dataset-shift differences within the same imaging modality [6,7,8]. Instead, our goal is to maximize segmentation performance on a single, less informative imaging modality, namely CT, using learned information that models the latent tissue relationships with a more informative modality, namely MRI. The key insight is that features dismissed as uninterpretable on CT can still provide information for inference when learning is guided by a more informative modality such as MRI.
Our approach is most similar in its goal, computing shared representations to improve segmentation, to the work of [9], where several shared representations between CT and MRI were constructed using fully convolutional networks. Our approach, which is based on GANs for cross-modality learning, also shares some similarities with [10], which likewise used a GAN as a backbone framework and implemented dual networks for performing segmentation on both CT and MRI. However, our approach differs substantially from prior works in its use of cross-modality tissue relations as priors to improve inference on the less informative source (CT) domain. Though applied to segmenting lung tumors, this method is generally applicable to other structures and imaging modalities.
Our contributions in this work are as follows: (i) first, we developed a novel approach to generate segmentations on CT by leveraging the more informative MRI through cross-modality priors; (ii) second, we implemented this approach on two different segmentation networks to study the feasibility of segmenting lung tumors located in the mediastinum, an area with diminished contrast between tumor and surrounding soft tissue; and (iii) third, we evaluated our approach on a large dataset of 637 tumors.
2 Methods
We use a supervised cross-modality and CT segmentation approach with a reasonably large number of expert-segmented CT scans (\(\{X_{CT}, y_{CT}\}\)) and a few MR scans with expert segmentations (\(\{X_{MR}, y_{MR}\}\), where \(N_{X_{MR}}\) \(\ll \) \(N_{X_{CT}}\)). The cross-modality educed deep learning (CMEDL) segmentation consists of two sub-networks that are optimized alternately. The first sub-network (Fig. 1A) generates a pMR image given a CT image. The second sub-network (Fig. 1B) trains a CT segmentation network whose features are constrained by those of a parallel network trained on the pMR images. The alternating optimization regularizes both the segmentation and the pMR generation, such that the pMR is specifically tuned to increase segmentation accuracy. In other words, the pMR acts as an informative regularizer for CT segmentation, while the gradients of the segmentation errors serve to constrain the generated pMR images.
2.1 Cross-domain Adaptation for Hallucinating Pseudo MR Images
A pair of conditional GANs [11] is trained with unpaired CT and T2-weighted (T2w) MR images arising from different sets of patients. The first GAN transforms a CT image into a pseudo MR (pMR) image (\(G_{C \rightarrow M}\)), while the second transforms an MR image into its corresponding pseudo CT (pCT) image (\(G_{M \rightarrow C}\)). The GANs are optimized using the standard adversarial loss (\(L_{adv} = L_{adv}^{CT}+L_{adv}^{MR}\)) and cycle consistency losses (\(L_{cycl} = L_{cycl}^{CT} + L_{cycl}^{MR}\)). In addition, we employed a contextual loss, originally introduced for real-world images [12], in order to handle learning from image sets lacking spatial correspondence. The contextual loss facilitates such transformations by treating images as collections of features and computing a global similarity between all pairs of features (\(\{g_{j \in N}\}, \{m_{i \in M}\}\)) from the two images used in the domain adaptation. The contextual similarity is expressed as:
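Following the formulation in [12], the global contextual similarity between the two feature collections can be written as (a restatement of the definition in [12], not necessarily the authors' exact notation):
\[ CX(g, m) = \frac{1}{N}\sum_{j}\max_{i} CX_{ij}, \]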
where N corresponds to the number of features. The pairwise similarity \(CX_{ij}\) is computed by normalizing the inverse of the cosine distances between the features of the two images, as described in [12]. The contextual loss is then computed as:
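A hedged restatement of the contextual loss following [12]; the symmetric application to both translation directions is an assumption, and \(CX(\cdot ,\cdot)\) is evaluated on the feature collections extracted from its two image arguments:
\[ L_{cx} = -\log\Big(CX\big(G_{C \rightarrow M}(x_{CT}),\ x_{MR}\big)\Big) - \log\Big(CX\big(G_{M \rightarrow C}(x_{MR}),\ x_{CT}\big)\Big). \]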
The total loss for the cross-modality adaptation is then expressed as the summation of all the aforementioned losses. The pMR generated from this step is passed as an additional input for training the CT segmentation network.
2.2 Segmentation Combining CT with pMR Images
Our approach for combining the CT with the pMR images is based on the idea of matching only information that is highly predictable from one modality to the other. This typically corresponds to the features closest to the output, as the two images are expected to produce identical segmentations. Therefore, the features computed from the last two layers of the CT and pMR segmentation networks are matched by minimizing the squared difference, or L2 loss, between them. This is expressed as:
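A hedged reconstruction of this feature-matching loss, with \(\phi^{l}\) denoting the feature maps of layer \(l\) and the sum running over the last two layers \(L-1\) and \(L\) (the exact indexing is an assumption):
\[ L_{feat} = \sum_{l \in \{L-1,\,L\}} \Big\| \phi^{l}_{CT}(x_{CT}) - \phi^{l}_{MR}\big(G_{C \rightarrow M}(x_{CT})\big) \Big\|_{F}^{2}, \]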
where \(S_{CT}, S_{MR}\) are the segmentation networks trained using the CT and pMR images, \(\phi _{CT}, \phi _{MR}\) are the features computed from these networks, \(G_{C \rightarrow M}\) is the cross-modality network used to compute the pMR image, and \(F\) denotes the Frobenius norm.
The total loss computed from the cross-modality adaptation and the segmentation networks is expressed as:
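One plausible form of this combined objective, consistent with the weighting coefficients defined in Sect. 2.4 (additional terms in the authors' formulation, such as the \(L_{c}\) term appearing in the update rule below, are not reconstructed here):
\[ L_{total} = L_{adv} + \lambda_{cyc} L_{cyc} + \lambda_{cx} L_{cx} + \lambda_{seg} L_{seg}, \]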
where \(\lambda _{cyc}\), \(\lambda _{cx}\) and \(\lambda _{seg}\) are the weighting coefficients for each loss. During training, we alternately update the cross-domain adaptation network and the segmentation network with the following gradients: \(-\varDelta _{\theta _{G}}(L_{adv}+ \lambda _{cyc}{L_{cyc}}+ \lambda _{c}{L_{c}}+ \lambda _{cx}L_{cx})\), \(-\varDelta _{\theta _{D}}(L_{adv})\), and \(-\varDelta _{\theta _{seg}}L_{seg}\). More concretely, the segmentation network is fixed when updating the cross-modality translation, and vice versa, in each iteration.
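A minimal PyTorch-style sketch of this alternating update scheme. The tiny stand-in networks, variable names, and the use of the output maps as the matched features are illustrative assumptions rather than the authors' implementation; the adversarial, cycle, and contextual terms are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    # Stand-in for the translator G_{C->M} or a segmentation network.
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_ch, 3, padding=1))

    def forward(self, x):
        return self.body(x)

g_ct2mr = TinyNet()                    # hallucinates a pMR image from CT
s_ct, s_mr = TinyNet(), TinyNet()      # parallel CT and pMR segmentation networks

opt_gen = torch.optim.Adam(g_ct2mr.parameters(), lr=1e-4)
opt_seg = torch.optim.Adam(list(s_ct.parameters()) + list(s_mr.parameters()), lr=2e-4)
lambda_feat = 1.0                      # illustrative weight for the feature-matching term

for step in range(2):                  # toy iterations on random data
    ct = torch.rand(2, 1, 64, 64)
    label = torch.randint(0, 2, (2, 1, 64, 64)).float()

    # Update the translation network while the segmentation networks are fixed.
    for p in list(s_ct.parameters()) + list(s_mr.parameters()):
        p.requires_grad_(False)
    pmr = g_ct2mr(ct)
    # Segmentation error on the hallucinated pMR constrains the generator.
    loss_gen = F.binary_cross_entropy_with_logits(s_mr(pmr), label)
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()

    # Update the segmentation networks while the translation network is fixed.
    for p in list(s_ct.parameters()) + list(s_mr.parameters()):
        p.requires_grad_(True)
    pmr = g_ct2mr(ct).detach()
    out_ct, out_mr = s_ct(ct), s_mr(pmr)   # output maps stand in for the matched features
    loss_seg = (F.binary_cross_entropy_with_logits(out_ct, label)
                + F.binary_cross_entropy_with_logits(out_mr, label)
                + lambda_feat * F.mse_loss(out_ct, out_mr))   # L2 feature alignment
    opt_seg.zero_grad()
    loss_seg.backward()
    opt_seg.step()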
2.3 Segmentation Architecture
We implemented the U-net [13] and dense fully convolutional networks (denseFCN) [14] to evaluate the feasibility of combining hallucinated MR with CT to improve CT segmentation accuracy. These networks are briefly described below.
1. U-net was modified using batch normalization after each convolution filter in order to standardize the features computed at the different layers.
2. Fully Convolutional DenseNets (Dense-FCN), based on [14], use dense feature maps computed by a sequence of dense feature blocks, which are concatenated with feature maps from previous computations through residual connections. Specifically, a dense feature block is produced by iterative concatenation of the previous feature maps within that block. Because features computed at every resolution, from the full image resolution down to the lowest, are iteratively concatenated, features at all levels are utilized; this in turn facilitates an implicit dense supervision that stabilizes training (see the sketch after this list).
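A minimal sketch of such a dense feature block in the spirit of [14]; the growth rate, number of layers, and channel widths are illustrative assumptions, not the configuration used in this work.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer receives the concatenation of all feature maps produced so far.
    def __init__(self, in_channels, growth_rate=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1))
            for i in range(n_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))
            features.append(new_features)
        return torch.cat(features[1:], dim=1)   # block output: newly computed maps only

block = DenseBlock(in_channels=8)
print(block(torch.rand(1, 8, 32, 32)).shape)    # -> torch.Size([1, 48, 32, 32])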
2.4 Implementation and Training
All networks were implemented using the PyTorch [15] library and trained end to end on a Tesla V100 GPU with 16 GB memory and a batch size of 2. The ADAM algorithm [16] was used during training with an initial learning rate of 1e-4; the segmentation networks were trained with a learning rate of 2e-4. We set \(\lambda _{adv}=10\), \(\lambda _{cx}=1\), \(\lambda _{cyc}=1\) and \(\lambda _{seg}=5\). For the contextual loss, we used the convolution filters after Conv7, Conv8 and Conv9 due to memory limitations.
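A hedged sketch of the training configuration described above; the dictionary keys, the helper function, and the assignment of the 1e-4 learning rate to the translation networks are assumptions, not the authors' code.

import torch

config = {
    "batch_size": 2,
    "lr_translation": 1e-4,   # ADAM initial learning rate (assumed here for the translation networks)
    "lr_segmentation": 2e-4,  # ADAM learning rate for the segmentation networks
    "lambda_adv": 10.0,
    "lambda_cx": 1.0,
    "lambda_cyc": 1.0,
    "lambda_seg": 5.0,
}

def make_optimizers(translation_params, segmentation_params, cfg=config):
    # Two ADAM optimizers, one per alternating update described in Sect. 2.2.
    opt_translation = torch.optim.Adam(translation_params, lr=cfg["lr_translation"])
    opt_segmentation = torch.optim.Adam(segmentation_params, lr=cfg["lr_segmentation"])
    return opt_translation, opt_segmentation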
3 Datasets and Evaluation
We used patients from three different cohorts: (a) 377 patients with non-small cell lung cancer (NSCLC) [18] from the Cancer Imaging Archive (TCIA) [17] (training); (b) 81 longitudinal T2-weighted MR scans (scanned on a Philips 3T Ingenia) from 21 patients treated with radiation therapy (training); and (c) 637 tumors from contrast-enhanced CT scans of patients treated with immunotherapy at our institution, used for validation (N = 304) and testing (N = 333), with different sets of patients in the validation and test sets. Early stopping was used during training to prevent overfitting, and the best model selected on the validation set was used for testing. Identical CT datasets were used for the CT-only and CMEDL approaches to enable an equitable comparison. Expert segmentations were available for all scans.
Segmentation accuracy was evaluated using the Dice similarity coefficient (DSC) and the Hausdorff distance at the \(95^{th}\) percentile (HD95), as recommended in [19]. In addition, we computed the tumor detection rate, where tumors with at least 50% DSC overlap with the expert segmentation were considered detected.
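For reference, these metrics follow their standard definitions (a restatement, with \(A\) the predicted and \(B\) the expert segmentation, \(\partial\) the corresponding contour, \(d(\cdot ,\cdot)\) the minimum Euclidean distance to a contour, and \(P_{95}\) the 95th percentile):
\[ DSC(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad HD_{95}(A,B) = \max\Big\{ P_{95}\big\{d(a,\partial B)\big\}_{a \in \partial A},\ P_{95}\big\{d(b,\partial A)\big\}_{b \in \partial B} \Big\}. \]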
4 Results
4.1 Tumor Detection Rate
Our CMEDL method achieved the most accurate detection with both the U-net and denseFCN networks on the validation and test sets. In comparison, the CT-only method resulted in much lower detection rates for both networks (Table 1).
4.2 Segmentation Accuracies
The CMEDL approach resulted in more accurate segmentations than CT-only segmentation (Table 1). Both the U-net and denseFCN networks trained with the CMEDL approach were significantly more accurate than their CT-only counterparts when evaluated with both the DSC (\(P < 0.001\)) and HD95 (\(P < 0.001\)) metrics. Figure 2 shows box plots for the validation and test sets using the two metrics and the two networks; P-values computed using two-sided paired Wilcoxon tests are also shown.
4.3 Visual Comparisons
Figure 3 shows segmentations produced by the different networks for representative cases when trained using CT only and with the CMEDL approach. For both networks, the CMEDL segmentations closely follow the expert segmentation in regions that are missed by the CT-only networks. Figure 4 shows the feature map activations produced by the U-net trained with CT only and with CMEDL. The feature activations are minimal when using CT only but show a clear preferential activation along the tumor boundary when MR information is incorporated. Figure 4(b) also shows a pseudo MR produced from the CT in Fig. 4(a).
5 Discussion
We developed a novel approach for segmenting lung tumors located in areas with low soft-tissue contrast by leveraging learned prior information from the more informative MR modality. These cross-modality priors are learned from unrelated patients and are used to hallucinate MR images to inform CT segmentation. Through extensive experiments on two different network architectures, we showed that leveraging a more informative modality (MRI) to inform inference in a less informative modality (CT) improves segmentation. Our work is limited by the lack of a sufficiently large MR dataset, which could potentially improve the accuracy of the cross-domain adaptation models. Nevertheless, to our knowledge this is the first approach that uses cross-modality information in this way to generate CT segmentations.
6 Conclusions
We introduced a novel approach for segmentation on CT that leverages the more informative MR modality through cross-modality learning. Implemented on two different segmentation architectures, our approach shows improved performance over CT-only methods.
References
Njeh, C.: Tumor delineation: the weakest link in the search for accuracy in radiotherapy. J. Med. Phys./Assoc. Med. Phys. India 33(4), 136 (2008)
Devic, S.: MRI simulation for radiotherapy treatment planning. Med. Phys. 39(11), 6701–6711 (2012)
Nie, D., et al.: Medical image synthesis with context-aware generative adversarial networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 417–425. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_48
Chartsias, A., Joyce, T., Dharmakumar, R., Tsaftaris, S.A.: Adversarial image synthesis for unpaired multi-modal cardiac data. In: Tsaftaris, S.A., Gooya, A., Frangi, A.F., Prince, J.L. (eds.) SASHIMI 2017. LNCS, vol. 10557, pp. 3–13. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68127-6_1
Jiang, J., et al.: Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 777–785. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2_86
Zhu, J.Y., Park, T., Isola, P., Efros, A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International Conference Computer Vision (ICCV), pp. 2223–2232 (2017)
Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2208–2217. JMLR.org (2017)
Kamnitsas, K., et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
Valindria, V.V., et al.: Multi-modal learning from unpaired images: application to multi-organ segmentation in CT and MRI. In: 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018, pp. 547–556 (2018)
Cai, J., Zhang, Z., Cui, L., Zheng, Y., Yang, L.: Towards cross-modal organ translation and segmentation: a cycle-and shape-consistent generative adversarial network. Med. Image Anal. 52, 174–184 (2019)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (NIPS), pp. 2672–2680 (2014)
Mechrez, R., Talmi, I., Zelnik-Manor, L.: The contextual loss for image transformation with non-aligned data. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 800–815. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_47
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., Bengio, Y.: The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1175–1183. IEEE (2017)
Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR) (2014)
Clark, K., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013)
Aerts, H.J., et al.: Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006 (2014)
Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
Acknowledgements
This work was supported by the MSK Cancer Center support grant/core grant P30 CA008748, and NCI R01 CA198121-03.