Joint self-supervised and supervised contrastive learning for multimodal MRI data: Towards predicting abnormal neurodevelopment

Published: 01 November 2024

Abstract

The integration of different imaging modalities, such as structural, diffusion tensor, and functional magnetic resonance imaging (MRI), with deep learning models has yielded promising outcomes in discerning phenotypic characteristics and enhancing disease diagnosis. Such techniques hinge on the efficient fusion of heterogeneous multimodal features, which initially reside in distinct representation spaces. Naively fusing multimodal features fails to adequately capture their complementary information and can even introduce redundancy. In this work, we present a novel joint self-supervised and supervised contrastive learning method that learns robust latent feature representations from multimodal MRI data by projecting heterogeneous features into a shared common space, thereby amalgamating both complementary information across modalities and analogous information among similar subjects. We compared the proposed method against alternative deep multimodal learning approaches through extensive experiments on two independent datasets; the results demonstrate that our method significantly outperforms several other deep multimodal learning methods in predicting abnormal neurodevelopment. Our method can facilitate computer-aided diagnosis in clinical practice by harnessing the power of multimodal data. The source code of the proposed model is publicly available on GitHub: https://github.com/leonzyzy/Contrastive-Network.
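
To make the idea of projecting heterogeneous modality features into a shared space concrete, below is a minimal PyTorch-style sketch of a modality-specific projection head paired with a cross-modal self-supervised contrastive (InfoNCE) loss. The layer sizes, temperature, and function names are illustrative assumptions, not the authors' implementation; their actual code is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Projects modality-specific features into a shared embedding space.
    Layer sizes are illustrative assumptions, not the paper's values."""
    def __init__(self, in_dim: int, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so the dot products below are cosine similarities.
        return F.normalize(self.net(x), dim=-1)


def cross_modal_infonce(z_a: torch.Tensor, z_b: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """Self-supervised contrastive (InfoNCE) loss between two modalities:
    embeddings of the same subject are positives, all others negatives."""
    logits = z_a @ z_b.t() / temperature                  # (N, N) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetrize over both matching directions (a -> b and b -> a).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```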

Highlights

Deep learning on multimodal MRI data to predict neurodevelopmental deficits.
Self-supervised contrastive learning to fuse heterogeneous multimodal features.
Supervised contrastive learning to capture shared information among similar subjects.
A joint contrastive objective to learn robust feature representations (a minimal sketch follows).
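
The supervised term and the joint objective might then be sketched as follows, reusing cross_modal_infonce from the snippet above. The SupCon-style formulation and the weighting hyperparameter lambda_sup are assumptions made for illustration; the paper's exact loss may differ.

```python
def supervised_contrastive(z: torch.Tensor, labels: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """SupCon-style loss: subjects sharing a label are mutual positives.
    A sketch of the standard formulation, not the paper's exact term."""
    sim = z @ z.t() / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye   # positive mask
    # Pairwise log-probabilities; self-similarity is excluded from the denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')),
                                     dim=1, keepdim=True)
    pos_counts = pos.sum(1).clamp(min=1)                       # avoid div by zero
    return -((log_prob * pos).sum(1) / pos_counts).mean()


def joint_loss(z_a, z_b, labels, lambda_sup: float = 0.5) -> torch.Tensor:
    """Hypothetical joint objective: lambda_sup is an assumed hyperparameter."""
    ssl = cross_modal_infonce(z_a, z_b)                        # defined above
    sup = supervised_contrastive(torch.cat([z_a, z_b]), labels.repeat(2))
    return ssl + lambda_sup * sup
```

Concatenating the embeddings of both modalities before the supervised term lets same-label subjects attract each other both within and across modalities, which is one natural way to combine the two objectives.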


              Published In

Artificial Intelligence in Medicine, Volume 157, Issue C, November 2024, 404 pages

Publisher

Elsevier Science Publishers Ltd., United Kingdom


              Author Tags

              1. Deep multimodal learning
              2. Joint contrastive learning
              3. Multimodal MRI
              4. Disease diagnosis

              Qualifiers

              • Research-article
