research-article

Mining for equitable health: Assessing the impact of missing data in electronic health records

Published: 01 March 2023

Abstract

Electronic health records (EHR) are collected as a routine part of healthcare delivery and have great potential to improve patient health outcomes. They contain multiple years of health information that can be leveraged for risk prediction, disease detection, and treatment evaluation. However, they do not have a consistent, standardized format across institutions, particularly in the United States, and they present significant analytical challenges: they contain multi-scale data from heterogeneous domains and include both structured and unstructured data. Data for individual patients are collected at irregular time intervals and with varying frequencies. In addition to the analytical challenges, EHR can reflect inequity: patients belonging to different groups will have differing amounts of data in their health records. Many of these issues can contribute to biased data collection. As a consequence, the data for under-served groups may be less informative, partly due to more fragmented care, which can be viewed as a type of missing data problem. For EHR data in this complex form, there is currently no framework for introducing realistic missing values, and there has been little to no work assessing the impact of missing data in EHR. In this work, we first introduce a terminology to define three levels of EHR data and then propose a novel framework for simulating realistic missing data scenarios in EHR to adequately assess their impact on predictive modeling. To make the simulated missingness more realistic, we incorporate a medical knowledge graph that captures dependencies between medical events. In an intensive care unit setting, we found that missing data have a greater negative impact on the performance of disease prediction models in groups that tend to have less access to healthcare or seek less healthcare. We also found that the impact of missing data on disease prediction models is stronger when using the knowledge graph framework to introduce realistic missing values as opposed to random event removal.
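
The contrast between random event removal and knowledge-graph-guided removal can be illustrated with a minimal sketch. The abstract does not specify the actual procedure, so everything here is an assumption made for illustration: the event codes, the toy graph, and the function names `remove_random` and `remove_with_graph` are hypothetical, and the graph-guided rule (drop a randomly chosen seed event together with events linked to it) is only one plausible way dependencies between medical events might be used.

```python
import random


def remove_random(events, frac, rng):
    """Baseline: drop a fraction of events uniformly at random."""
    n_drop = round(frac * len(events))
    drop_idx = set(rng.sample(range(len(events)), n_drop))
    return [e for i, e in enumerate(events) if i not in drop_idx]


def remove_with_graph(events, frac, graph, rng):
    """Drop seed events plus events linked to them in a knowledge graph.

    `graph` maps an event code to the set of codes assumed to depend on it
    (a hypothetical stand-in for a medical knowledge graph). Removal stops
    once the same number of events as the random baseline has been dropped.
    """
    target = round(frac * len(events))
    order = list(range(len(events)))
    rng.shuffle(order)
    dropped = set()
    for i in order:
        if len(dropped) >= target:
            break
        if i in dropped:
            continue
        dropped.add(i)  # the seed event goes missing
        related = graph.get(events[i], set())
        for j, code in enumerate(events):
            if len(dropped) >= target:
                break
            if code in related:
                dropped.add(j)  # linked events go missing with it
    return [e for i, e in enumerate(events) if i not in dropped]


# Hypothetical patient record and toy graph, for illustration only.
record = ["diabetes_dx", "insulin_rx", "glucose_lab",
          "fracture_dx", "xray", "cast"]
toy_graph = {
    "diabetes_dx": {"insulin_rx", "glucose_lab"},
    "fracture_dx": {"xray", "cast"},
}

rng = random.Random(0)
random_kept = remove_random(record, 0.5, rng)
graph_kept = remove_with_graph(record, 0.5, toy_graph, rng)
```

Both functions remove the same number of events, so any difference in downstream model performance would come from which events are removed, not how many. With the graph-guided rule, related events such as a diagnosis and its follow-up tests tend to disappear together, which mimics a missed episode of care more closely than independent deletions do.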

Cited By

  • (2024) Participant flow diagrams for health equity in AI. Journal of Biomedical Informatics 152:C. doi:10.1016/j.jbi.2024.104631. Online publication date: 9-Jul-2024
  • (2023) Informative missingness. Journal of Biomedical Informatics 139:C. doi:10.1016/j.jbi.2023.104306. Online publication date: 1-Mar-2023
  • (2023) Future Opportunities for Systematic AI Support in Healthcare. Bridging the Gap Between AI and Reality (203-224). doi:10.1007/978-3-031-73741-1_13. Online publication date: 23-Oct-2023

Published In

Journal of Biomedical Informatics, Volume 139, Issue C
Mar 2023
337 pages

Publisher

Elsevier Science

San Diego, CA, United States

Author Tags

  1. Missing data
  2. Electronic health records
  3. Knowledge graph
  4. Fairness
  5. Health disparities

Qualifiers

  • Research-article
