
1 Introduction

Within the last three and a half decades, Magnetic Resonance Imaging (MRI) has evolved from a promising idea into a primary diagnostic tool for many clinical and research problems [1]. The reasons for this enormous growth are its non-invasive nature, its ability to generate distinct contrasts of the same anatomical structure, and the absence of exposure to ionizing radiation [2]. Different deep learning methods utilize these multi-contrast MR images (T\(_{1}\)-weighted, T\(_{2}\)-weighted, etc.) for brain tumor segmentation [3] and white/gray matter segmentation [4]. However, these deep neural networks rely heavily on large datasets for training. The availability of such datasets in the medical imaging domain is quite limited, and the problem becomes even more difficult when the required data is multi-contrast. Therefore, to enhance the performance of deep learning methods, synthetic generation of images for data augmentation is of great importance [5].

Since the introduction of generative adversarial networks (GANs), there has been remarkable development in the direction of image synthesis [6]. GANs have been widely adopted in medical imaging: [5] uses Wasserstein GANs to generate T\(_{1}\)-weighted, T\(_{2}\)-weighted and FLAIR brain images, and [3] uses Progressively Growing GANs to generate retinal fundus and brain images. Cross-modality image synthesis methods based on Cycle-GAN [7], cGAN [8] and Pix2Pix [9] have also been presented for generating missing-modality data. However, all of these methods are limited to generating synthetic data for one or two contrasts only. For generation of multi-contrast data, existing methods require training a separate model for each contrast, which is time consuming and computationally expensive. This also limits the potential of the generator network to learn common features from all available data samples, which is crucial when the training dataset is small.

To alleviate this issue, we propose a new method which leverages the power of Star-GAN [6] and U-NET [10] for synthetic generation of multi-contrast MR images (T\(_{1}\)-weighted, T\(_{2}\)-weighted, PD-weighted and MRA) using only one generator and one discriminator network. Our method eliminates the requirement of training separate models for each mapping, thus reducing the training time significantly. In addition, our approach allows us to utilize images from all contrasts for training in an unsupervised manner, which helps the generator learn geometric properties common to all contrasts. The unsupervised training eliminates the requirement of paired data, hence broadening the scope of our method.

A new generation loss is proposed which preserves the small anatomical structural details of the given input image using the structural similarity index (SSIM) [11]. It also employs the recently proposed Learned Perceptual Image Patch Similarity (LPIPS) metric [12], which forces the generator to learn the reverse mapping for reconstructing the real image from the fake image while prioritizing perceptual similarity between the reconstructed and real images. For stable training of our model, we add a regularization term to the adversarial loss [13]. The model is trained to generate images for all four contrasts using only one image of any contrast as input. We provide qualitative and quantitative results for synthetic generation of multi-contrast MR images, which show the superiority of our approach over existing methods.

2 Method

The proposed method efficiently and effectively learns the mappings among four MRI contrasts [T\(_1\)-weighted, T\(_2\)-weighted, Proton Density (PD)-weighted and Magnetic Resonance Angiography (MRA)] to generate a fake image of a target contrast given a real image and its original contrast. For example, given an input image of T\(_{1}\)-weighted contrast, our model can generate fake T\(_{2}\)-weighted, PD-weighted and MRA images using only one generator. The working of the model is illustrated in Fig. 1, and the loss functions are described next.

Fig. 1. The U-NET generator performs two synthesis steps: (i) generating a fake image given the real image depth-wise concatenated with the target contrast; (ii) reconstructing the real image given the fake image depth-wise concatenated with the original contrast. The fake image is used to measure two losses via the PatchGAN discriminator: (i) the adversarial loss and (ii) the contrast classification loss. The reconstructed and real images are used to measure the reconstruction loss, which quantifies how close the reconstructed image is to the real image in terms of structural (SSIM), perceptual (LPIPS) and global (L\(_1\)) similarity

2.1 Loss Functions

Adversarial Loss: Instead of using the adversarial loss proposed by [14], which is reported to suffer from various training problems including mode collapse, vanishing gradients and sensitivity to hyper-parameters, we use the regularized Wasserstein GAN with gradient penalty (WGAN-GP). This not only provides stable learning for deep generator and discriminator networks but also increases the quality of the generated images. It is defined as

$$\begin{aligned} \begin{aligned} \mathcal {L}_{adv} = \mathcal {L}_{WGAN_{gp}} + \lambda _{ct} CT|_{x', x''} \end{aligned} \end{aligned}$$
(1)

Here, the first term gives us the WGAN-GP loss and the second term regularizes this loss using a consistency term. The WGAN-GP loss is given as

$$\begin{aligned} \begin{aligned} \mathcal {L}_{WGAN_{gp}} = \mathbb {E}_{x}\big [D_{src}(x)\big ] - \mathbb {E}_{x,c}\big [D_{src}(G(x,c))\big ] \\ - \lambda _{gp} \mathbb {E}_{\hat{x}} \big [(\parallel \bigtriangledown _{\hat{x}} D_{src}(\hat{x}) \parallel _2 - 1)^2\big ] \end{aligned} \end{aligned}$$
(2)

In the above equation, the generator G takes an input image x and a target label c to generate a fake image of the target contrast, while the discriminator D is responsible for determining whether a given image is real (from the training set) or fake (generated by G). The consistency term of Eq. 1 is given as

$$\begin{aligned} \begin{aligned} CT|_{x', x''} = \mathbb {E}_{x\sim \mathbb {P}} \big [\max (0, d(D(x'), D(x'')) \\ \quad + 0.1 \cdot d(D_{-}(x'), D_{-}(x'')) - M')\big ] \end{aligned} \end{aligned}$$
(3)

Here, \(x'\) and \(x''\) correspond to virtual data points close to x, and \(D_{-}\) denotes the output of the second-to-last layer of the discriminator.

For our experiments, we use \(\lambda _{gp}=10\), \(\lambda _{ct}=1\) and \(M'=0.\)
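
For concreteness, the following sketch shows one way the gradient-penalty term of Eq. 2 could be computed in PyTorch. The paper does not provide code, so the function and argument names (e.g. `D_src`, `gradient_penalty`) are ours, and the interpolation scheme simply follows the standard WGAN-GP recipe; the consistency term of Eq. 3 would additionally require two stochastic (dropout) forward passes through the discriminator and is not shown here.

```python
import torch

def gradient_penalty(D_src, real, fake, lambda_gp=10.0):
    """Gradient-penalty term of Eq. 2: penalize ||grad_x_hat D_src(x_hat)||_2 deviating from 1.

    D_src is assumed to return the real/fake score of the discriminator (a sketch, not the
    authors' implementation).
    """
    batch = real.size(0)
    # Sample x_hat uniformly along straight lines between real and fake images.
    alpha = torch.rand(batch, 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

    scores = D_src(x_hat)
    grads = torch.autograd.grad(outputs=scores, inputs=x_hat,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.reshape(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```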

Contrast Classification Loss: This loss forces the generator to produce an image of the correct contrast and allows the discriminator to perform contrast classification for real and fake images [6]. For real images it is defined as

$$\begin{aligned} \mathcal {L}^r_{cls} = \mathbb {E}_{x, c'} \big [-\log D_{cls}(c'|x)\big ] \end{aligned}$$
(4)

and for fake images as

$$\begin{aligned} \mathcal {L}^f_{cls} = \mathbb {E}_{x, c} \big [-\log D_{cls}(c|G(x,c))\big ] \end{aligned}$$
(5)

Here, x and \(c'\) represent the real image and its original contrast label, while G(x, c) and c correspond to the fake image and the target contrast.
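
As a minimal sketch, the contrast classification terms of Eqs. 4 and 5 reduce to a cross-entropy over the four contrast classes; the classification head producing the logits is our naming assumption rather than anything specified in the paper.

```python
import torch.nn.functional as F

def classification_loss(cls_logits, labels):
    # Cross-entropy over the four contrast classes (T1, T2, PD, MRA).
    # For Eq. 4 the logits come from a real image x and the labels are its original contrast c';
    # for Eq. 5 the logits come from the fake image G(x, c) and the labels are the target contrast c.
    return F.cross_entropy(cls_logits, labels)
```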

Generation Loss: If the model generates a fake image T\(_1'\) of the T\(_1\) contrast from a real T\(_2\)-weighted image, then by using the reverse mapping it should be able to reconstruct the real T\(_2\)-weighted image. For this, [7] uses a cycle consistency loss:

$$\begin{aligned} \mathcal {L}_{cyc} = \mathbb {E}_{x, c, c'} \big [ \parallel x - G(G(x, c), c') \parallel _1 \big ] \end{aligned}$$
(6)

However, this L\(_1\) loss operates on the entire image, ignoring patch-level dissimilarity between images and thus providing less information for the generator to work with. Therefore, to impose a patch-wise dissimilarity measure between the real and reconstructed images, we augment the generation loss with two additional terms. (i) Inspired by the strength of the structural similarity index (SSIM) [11] for measuring structural similarity between two images in a patch-wise manner, we employ a structural dissimilarity loss (DSSIM), an extension of SSIM:

$$\begin{aligned} \mathcal {L}_{DSSIM} = \mathbb {E}_{x, c, c'} \left[ \frac{1 - SSIM\big (x, G(G(x, c), c')\big )}{2}\right] \end{aligned}$$
(7)

(ii) Secondly, to force the generator to produce images perceptually closer to the target contrast, we utilize the recently proposed Learned Perceptual Image Patch Similarity (LPIPS) metric [12]:

$$\begin{aligned} \mathcal {L}_{LPIPS} = \mathbb {E}_{x, c, c'} \left[ LPIPS\big (x, G(G(x, c), c')\big ) \right] \end{aligned}$$
(8)

Both additional terms calculate differences between the real and reconstructed images in a patch-wise manner. This allows our generator to focus on small anatomical regions and preserve structure while changing only contrast-related properties during image synthesis. Our final reconstruction loss combines all three terms:

$$\begin{aligned} \mathcal {L}_{rec} = \lambda _{cyc} \, \mathcal {L}_{cyc} + \lambda _{DSSIM} \, \mathcal {L}_{DSSIM} + \lambda _{LPIPS} \, \mathcal {L}_{LPIPS} \end{aligned}$$
(9)

We use \(\lambda _{cyc} = \lambda _{DSSIM} = \lambda _{LPIPS} = 10\) for training.
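
A sketch of how the combined reconstruction loss of Eq. 9 might be implemented is given below. The paper does not name specific libraries, so the use of the `pytorch_msssim` and `lpips` packages (and the AlexNet LPIPS backbone) is our assumption, as is the channel replication needed to feed single-channel MR slices to LPIPS.

```python
import torch
from pytorch_msssim import ssim   # assumed SSIM implementation (not named in the paper)
import lpips                      # assumed LPIPS implementation (not named in the paper)

lpips_fn = lpips.LPIPS(net='alex')   # backbone choice is our assumption

def reconstruction_loss(real, recon,
                        lambda_cyc=10.0, lambda_dssim=10.0, lambda_lpips=10.0):
    """Eq. 9 for image batches in [-1, 1] (the generator ends with tanh, Sect. 2.2)."""
    # L1 cycle-consistency term (Eq. 6).
    l_cyc = torch.mean(torch.abs(real - recon))
    # DSSIM term (Eq. 7); SSIM is computed on images rescaled to [0, 1].
    l_dssim = (1.0 - ssim((real + 1) / 2, (recon + 1) / 2, data_range=1.0)) / 2.0
    # LPIPS term (Eq. 8); the backbone expects 3-channel inputs, so the single
    # MR channel is replicated.
    l_lpips = lpips_fn(real.repeat(1, 3, 1, 1), recon.repeat(1, 3, 1, 1)).mean()
    return lambda_cyc * l_cyc + lambda_dssim * l_dssim + lambda_lpips * l_lpips
```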

Full Objective: Finally, the full objective for our discriminator network is to minimize the loss \(\mathcal {L}_D\), which is defined as

$$\begin{aligned} \mathcal {L}_{D} = - \mathcal {L}_{adv} + \mathcal {L}^r_{cls} \end{aligned}$$
(10)

while the generator tries to minimize \(\mathcal {L}_G\) given as

$$\begin{aligned} \mathcal {L}_{G} = \mathcal {L}_{adv} + \mathcal {L}^f_{cls} + \mathcal {L}_{rec} \end{aligned}$$
(11)

2.2 Network Architecture

Owing to the exceptional performance of U-Net [10] on medical images, we use a U-Net based generator for our model, adapted from [7]. The generator contains 7 down-sampling layers with strided convolutions of stride 2, followed by 7 up-sampling layers with fractional strides. Each convolutional layer is followed by instance normalization and ReLU activation, except for the final layer, which uses a tanh activation after the convolution. Similar to [6, 7, 15], we use a PatchGAN-based discriminator, which classifies local patches as real or fake and is more efficient than a full-image classifier. No normalization is applied to the discriminator.
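
The exact layer configuration is not listed in the paper, so the blocks below are only a sketch of the described pattern (stride-2 convolutions with instance normalization and ReLU on the way down, fractionally-strided convolutions on the way up); the kernel size of 4 and the channel counts are our assumptions.

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    # Stride-2 convolution -> instance normalization -> ReLU, as described in Sect. 2.2.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def up_block(in_ch, out_ch):
    # Fractionally-strided (transposed) convolution -> instance normalization -> ReLU.
    # in_ch typically doubles in a U-Net because of the skip-connection concatenation.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```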

3 Experiments and Results

3.1 Dataset

We use the IXI dataset\(^{1}\) for all of our experiments, which provides scans of almost 600 subjects for all four contrasts. Images of the IXI dataset were acquired using three different scanners; however, information is available for only two (Philips Medical Systems Gyroscan Intera 1.5T \(\rightarrow S1\), Philips Medical Systems Intera 3T \(\rightarrow S2\)), which is provided in Table 1. Since the provided images were not registered, we used the ANTsPy\(^{2}\) package to register all images to a common template using an affine transformation. This provides us with 568 images of the same size and position, from which 68 were randomly selected for testing while the remaining 500 were used for training. Since the MRA images of the IXI dataset have better resolution in the axial plane, axial slices of all images were taken.
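
The registration step can be reproduced with ANTsPy roughly as follows; the file names and the choice to propagate one affine transform to a subject's remaining contrasts are placeholders and assumptions on our part, since the paper only states that an affine registration to a common template was used.

```python
import ants

# Register a subject scan to a common template with an affine transform.
# File names are placeholders; the actual template and preprocessing are not specified.
template = ants.image_read('template.nii.gz')
moving = ants.image_read('subject_T1.nii.gz')

reg = ants.registration(fixed=template, moving=moving, type_of_transform='Affine')
aligned = reg['warpedmovout']            # moving image resampled into template space
ants.image_write(aligned, 'subject_T1_affine.nii.gz')

# The same forward transform can be applied to the subject's other contrasts so that
# all four contrasts share the same size and position.
t2 = ants.image_read('subject_T2.nii.gz')
t2_aligned = ants.apply_transforms(fixed=template, moving=t2,
                                   transformlist=reg['fwdtransforms'])
```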

Table 1. Image acquisition parameters

3.2 Implementation Details

All of our experiments are implemented in PyTorch, and the image slices were center-cropped and resized to \(256 \times 256\) due to computational limitations. The input image and target contrast are selected randomly in an unpaired manner during training. For a fair comparison, both models (the default Star-GAN and the proposed one) use the same hyperparameter values. Both models are trained for 200,000 iterations with a batch size of 10, using the Adam optimizer with a momentum of 0.9.
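
A sketch of the corresponding preprocessing and optimizer setup is shown below; the crop size, learning rate and second Adam moment are not stated in the paper and are placeholders, and the two modules stand in for the generator and discriminator of Sect. 2.2.

```python
import torch
from torchvision import transforms

CROP_SIZE = 200   # placeholder: the crop size before resizing is not stated in the paper

# Slice preprocessing: center crop followed by resizing to 256 x 256.
preprocess = transforms.Compose([
    transforms.CenterCrop(CROP_SIZE),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# Placeholder modules standing in for the U-NET generator and PatchGAN discriminator.
G = torch.nn.Conv2d(1, 1, 3)
D = torch.nn.Conv2d(1, 1, 3)

# Adam with a momentum (beta1) of 0.9; the learning rate is an assumption.
g_optimizer = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.9, 0.999))
d_optimizer = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.9, 0.999))
```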

3.3 Quantitative Results

To evaluate the performance of our model against Star-GAN, we utilize the commonly used metrics of peak signal-to-noise ratio (PSNR), SSIM [11] and LPIPS [12]. The results, averaged over 4129 slices for each meaningful mapping, are shown in Table 2. Higher PSNR and SSIM and lower LPIPS indicate better quality of the generated images. Our method clearly outperforms Star-GAN for all mappings.
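
For reference, the evaluation metrics could be computed per slice roughly as follows, assuming `scikit-image` for PSNR/SSIM and the `lpips` package for LPIPS (the paper does not state which implementations were used).

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')   # backbone choice is an assumption

def evaluate_slice(real, fake):
    """Compute PSNR, SSIM and LPIPS for one real/fake slice pair (2D arrays in [0, 1])."""
    psnr_val = peak_signal_noise_ratio(real, fake, data_range=1.0)
    ssim_val = structural_similarity(real, fake, data_range=1.0)
    # LPIPS expects 3-channel tensors scaled to [-1, 1].
    to_t = lambda a: torch.from_numpy(a).float()[None, None].repeat(1, 3, 1, 1) * 2 - 1
    lpips_val = lpips_fn(to_t(real), to_t(fake)).item()
    return psnr_val, ssim_val, lpips_val
```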

Table 2. Synthesis: Real Image \(\rightarrow \) Fake Image (generated by network).
Fig. 2. Synthesis of MRA, PD-weighted and T\(_1\)-weighted images using a single T\(_2\)-weighted image as input.

Fig. 3. Synthesis of MRA, T\(_2\)-weighted and T\(_1\)-weighted images using a single PD-weighted image as input.

Fig. 4. Synthesis of PD-weighted, T\(_2\)-weighted and T\(_1\)-weighted images using a single MRA image as input.

3.4 Qualitative Results

Figures 2, 3 and 4 show the qualitative comparison of our method against Star-GAN for multi-contrast synthesis. It can be seen that the images generated by Star-GAN lack structural and perceptual similarity in small anatomical regions, which are captured by our method. For synthesis of MRA from a T\(_2\)-weighted image (Fig. 2), Star-GAN failed to capture the overall intensity appearance of the image, while our method generated an image much closer to the real one. Similarly, Figs. 3 and 4 show the superiority of our method.

4 Conclusion

In this paper, we proposed a Star-GAN based method with a U-NET generator and a new generation loss for multi-contrast MR image synthesis using only one generator and one discriminator. The qualitative and quantitative results show the superiority of our method over the default Star-GAN. Our solution also removes the limitation of training multiple networks for multi-contrast image synthesis, which is extremely important for the many deep learning methods that depend on multi-contrast data for training. In future work, we would like to extend our experiments to include more modalities and learn mappings among all of them using only a single generator and discriminator.