rapid-communication

Converting OMOP CDM to phenopackets: A model alignment and patient data representation evaluation

Published: 24 July 2024

Abstract

Objective

This study aims to promote interoperability in precision medicine and translational research by aligning the Observational Medical Outcomes Partnership (OMOP) and Phenopackets data models. Phenopackets is an expert knowledge-driven schema designed to facilitate the storage and exchange of multimodal patient data and to support downstream analysis. The first goal of this paper is to explore model alignment by characterizing the common data models using a newly developed data transformation process and evaluation method. The second goal is to evaluate the mapping of real-world patient data to Phenopackets, using OMOP-normalized clinical data, and thereby assess the suitability of Phenopackets as a patient data representation for real-world clinical cases.

Methods

We identified mappings between OMOP and Phenopackets and applied them to a real patient dataset to assess the transformation’s success. We analyzed gaps between the models and identified key considerations for transforming data between them. Further, to improve ambiguous alignment, we incorporated Unified Medical Language System (UMLS) semantic type-based filtering to direct individual concepts to their most appropriate domain and conducted a domain-expert evaluation of the mapping’s clinical utility.
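
The idea of semantic type-based filtering described above can be sketched as a two-stage lookup: a concept is first routed by its OMOP domain, and a UMLS semantic type, when one is recognized, overrides that default. The following is a minimal sketch only. The table contents and the names `DOMAIN_DEFAULT`, `SEMANTIC_TYPE_OVERRIDE`, and `route_concept` are assumptions made for illustration and are not the mapping used in the study.

```python
from typing import Iterable, Optional

# Assumed default routing from an OMOP domain to a Phenopacket element.
# These pairings are illustrative, not the study's published mapping.
DOMAIN_DEFAULT = {
    "Condition": "diseases",
    "Measurement": "measurements",
    "Drug": "medical_actions",
    "Procedure": "medical_actions",
    "Observation": "phenotypic_features",
}

# Assumed overrides keyed by UMLS semantic type name.
# The semantic type names exist in UMLS; the targets are hypothetical.
SEMANTIC_TYPE_OVERRIDE = {
    "Sign or Symptom": "phenotypic_features",
    "Finding": "phenotypic_features",
    "Disease or Syndrome": "diseases",
}


def route_concept(domain: str, semantic_types: Iterable[str]) -> Optional[str]:
    """Return the Phenopacket element for a concept, or None if unmapped.

    The first recognized semantic type wins; otherwise the domain default applies.
    """
    for semantic_type in semantic_types:
        target = SEMANTIC_TYPE_OVERRIDE.get(semantic_type)
        if target is not None:
            return target
    return DOMAIN_DEFAULT.get(domain)


# A symptom stored in the OMOP Condition domain is redirected by its semantic type.
symptom_target = route_concept("Condition", ["Sign or Symptom"])
# With no semantic type available, the domain default is used.
fallback_target = route_concept("Condition", [])
```

In this sketch a concept with several semantic types is routed by the first one that has an override, which is a simplification; a real pipeline would need an explicit rule for conflicting types.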

Results

The OMOP to Phenopacket transformation pipeline was executed for 1,000 Alzheimer’s disease patients and successfully mapped all required entities. However, because OMOP lacked values for some required Phenopacket attributes, 10.2% of records were lost. Using UMLS semantic type filtering to resolve the ambiguous alignment of individual concepts yielded 96% agreement with clinical thinking, up from 68% when mapping exclusively by domain correspondence.
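
Record loss of this kind can be illustrated with a completeness check that sets aside any record missing a value for a required attribute. This is a hedged sketch under stated assumptions: the field names in `REQUIRED`, the helper names, and the toy records are hypothetical, and the attributes that actually caused the loss reported above are not specified here.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]

# Hypothetical required keys; the real set depends on the Phenopacket element.
REQUIRED: Tuple[str, ...] = ("concept_id", "concept_label")


def split_by_completeness(
    records: Sequence[Record], required: Sequence[str] = REQUIRED
) -> Tuple[List[Record], List[Record]]:
    """Separate records that have every required value from those that do not."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for record in records:
        # A key counts as missing when it is absent, None, or an empty string.
        missing = [key for key in required if record.get(key) in (None, "")]
        (dropped if missing else kept).append(record)
    return kept, dropped


def loss_rate(kept: Sequence[Record], dropped: Sequence[Record]) -> float:
    """Fraction of records dropped; 0.0 when there are no records at all."""
    total = len(kept) + len(dropped)
    return len(dropped) / total if total else 0.0


# Toy data for illustration only; not drawn from the study's dataset.
toy_records = [
    {"concept_id": 101, "concept_label": "example finding"},
    {"concept_id": None, "concept_label": "unlabelled source value"},
    {"concept_id": 102, "concept_label": ""},
    {"concept_id": 103, "concept_label": "example measurement"},
]
kept_records, dropped_records = split_by_completeness(toy_records)
toy_loss = loss_rate(kept_records, dropped_records)
```

Keeping the dropped records in a separate list, rather than discarding them silently, makes it possible to report a loss rate and to inspect which attributes were missing.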

Conclusion

This study presents a pipeline to transform data from OMOP to Phenopackets. We identified considerations for ensuring data quality during the transformation, including handling the restrictions required for successful Phenopacket validation and reconciling discrepant data formats. We identified unmappable Phenopacket attributes that focus on specialty use cases, such as genomics or oncology, which OMOP does not currently support. We introduce UMLS semantic type filtering to resolve ambiguous alignments, so that each concept maps to the Phenopacket entity most appropriate for real-world interpretation. We provide a systematic approach to align the OMOP and Phenopackets schemas. Our work facilitates future use of Phenopackets in clinical applications by addressing key barriers to interoperability when deriving a Phenopacket from real-world patient data.

Published In

Journal of Biomedical Informatics  Volume 155, Issue C
Jul 2024
73 pages

Publisher

Elsevier Science

San Diego, CA, United States

Author Tags

  1. Phenopackets schema
  2. OMOP-CDM
  3. Health data standards
  4. Interoperability
  5. Phenotyping
  6. Data model

Qualifiers

  • Rapid-communication
