DOI: 10.1145/3535508.3545555
research-article
Open access

Using natural language processing on free-text clinical notes to identify patients with long-term COVID effects

Published: 07 August 2022

Abstract

As of May 15, 2022, the novel coronavirus SARS-CoV-2 had infected 517 million people and caused more than 6.2 million deaths around the world. An estimated 40% to 87% of patients experience persistent symptoms weeks or months after their original infection. Despite remarkable progress in preventing and treating acute COVID-19, the clinical diagnosis of long-term COVID remains difficult. In this work, we use free-text clinical notes and natural language processing (NLP) techniques to explore long-term COVID effects. We first obtain free-text clinical notes from 719 outpatient encounters, representing patients treated by physicians at Emory Clinic, to detect patterns in patients with long-term COVID symptoms. We then apply state-of-the-art NLP frameworks to automatically identify patients with long-term COVID effects, achieving a recall (sensitivity) of 0.881 for note-level prediction. We further interpret the prediction outcomes and discuss potential phenotypes. Our work aims to provide a data-driven solution for identifying patients who have developed persistent symptoms after acute COVID infection. With this work, clinicians may be able to identify patients with long-term COVID symptoms and optimize their treatment.
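
For readers unfamiliar with the metric reported above, recall (sensitivity) is the fraction of truly positive notes that a classifier flags as positive: TP / (TP + FN). The sketch below is only an illustration of that formula. The `recall` helper and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
def recall(y_true, y_pred, positive=1):
    """Return recall (sensitivity): TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: positive notes that the model also labels positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: positive notes that the model misses.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return tp / (tp + fn)


# Hypothetical note-level labels (1 = long-term COVID effects, 0 = none).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

toy_recall = recall(y_true, y_pred)
print(f"toy recall: {toy_recall:.3f}")
```

High recall matters in a screening setting like this one because a false negative is a patient with persistent symptoms who goes unflagged. Recall alone does not reflect false positives, so it is usually read alongside precision or specificity.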

    Published In

    BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
    August 2022
    549 pages
    ISBN: 9781450393867
    DOI: 10.1145/3535508
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. PASC
    2. clinical notes
    3. explainable AI
    4. long COVID
    5. natural language processing

    Qualifiers

    • Research-article

    Funding Sources

    • Wallace H. Coulter Distinguished Faculty Fellowship
    • Petit Institute Faculty Fellowship
    • Microsoft Research

    Conference

    BCB '22
    Acceptance Rates

    Overall Acceptance Rate 254 of 885 submissions, 29%

    Cited By

    • (2024) A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish. BMC Medical Informatics and Decision Making 24:1. DOI: 10.1186/s12911-024-02609-w. Online publication date: 24-Jul-2024
    • (2024) An ensemble model for predicting dispositions of emergency department patients. BMC Medical Informatics and Decision Making 24:1. DOI: 10.1186/s12911-024-02503-5. Online publication date: 22-Apr-2024
    • (2024) Providing Context to the "Unknown": Patient and Provider Reflections on Connecting Personal Tracking, Patient-Reported Insights, and EHR Data within a Post-COVID Clinic. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-34). DOI: 10.1145/3686988. Online publication date: 8-Nov-2024
    • (2024) Charting the COVID Long Haul Experience - A Longitudinal Exploration of Symptoms, Activity, and Clinical Adherence. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-21). DOI: 10.1145/3613904.3642827. Online publication date: 11-May-2024
    • (2023) An NLP System for COVID/PASC: A Case Demonstration of the OHNLP Toolkit from the National COVID Cohort Collaborative and the RECOVER programs (Preprint). JMIR Medical Informatics. DOI: 10.2196/49997. Online publication date: 15-Jun-2023
    • (2023) A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study. Journal of Medical Internet Research 25 (e49944). DOI: 10.2196/49944. Online publication date: 4-Oct-2023
    • (2023) RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes. Cancer Control 30. DOI: 10.1177/10732748231209892. Online publication date: 1-Nov-2023
    • (2023) Latent Topic Extraction as a Source of Labeling in Natural Language Processing. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (4312-4319). DOI: 10.1109/BIBM58861.2023.10385618. Online publication date: 5-Dec-2023
    • (2023) Automated Seizure Detection using Transformer Models on Multi-Channel EEGs. 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) (1-6). DOI: 10.1109/BHI58575.2023.10313440. Online publication date: 15-Oct-2023
    • (2023) Heart disease risk factors detection from electronic health records using advanced NLP and deep learning techniques. Scientific Reports 13:1. DOI: 10.1038/s41598-023-34294-6. Online publication date: 3-May-2023
