DOI: 10.5555/3666122.3666901

INSPECT: a multimodal dataset for pulmonary embolism diagnosis and prognosis

Published: 10 December 2023 Publication History

Abstract

Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground-truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e., demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research. https://inspect.stanford.edu



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited
