research-article

Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: A Netherlands consortium of dementia cohorts case study

Published: 24 July 2024

Abstract

Background

Collaboration between cohort studies has been fundamental to progress in health research. However, such collaborations are hampered by heterogeneous data representations across cohorts and by legal constraints on data sharing. The first problem arises from a lack of consensus on standards for data collection and representation across cohort studies and is usually tackled by applying data harmonization processes. The second is increasingly important because of growing awareness of privacy protection and stricter regulations, such as the GDPR. Federated learning has emerged as a privacy-preserving alternative to transferring data between institutions: the data are analyzed in a decentralized manner and remain at the institution that holds them.
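To make the idea of decentralized analysis concrete, the following is a minimal sketch of a federated mean, assuming each site shares only a count and a sum. It is an illustration of the general principle, not the infrastructure or algorithms used in this study; the class and function names and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalSummary:
    """Aggregate statistics a site shares instead of patient-level rows."""
    n: int
    total: float


def local_summary(values: Sequence[float]) -> LocalSummary:
    # Runs inside each institution; only the count and the sum leave the site.
    return LocalSummary(n=len(values), total=float(sum(values)))


def pooled_mean(summaries: List[LocalSummary]) -> float:
    # Runs at the coordinating party, which never sees individual records.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical, made-up measurements held by three separate sites.
site_a = [70.0, 72.0]
site_b = [65.0]
site_c = [80.0, 81.0, 79.0]

summaries = [local_summary(v) for v in (site_a, site_b, site_c)]
result = pooled_mean(summaries)
```

The pooled result equals the mean that would be obtained on the combined data, although no site has revealed its individual values. Real deployments typically add safeguards such as minimum cell counts, which this sketch omits.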

Methods

In this study, we set up a federated learning infrastructure for a consortium of nine Dutch cohorts with data relevant to the etiology of dementia, including an extract, transform, and load (ETL) pipeline for data harmonization. Additionally, we assessed the challenges of transforming and standardizing cohort data using the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and evaluated our tool in one of the cohorts using federated algorithms.
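As a rough illustration of the transform step in such an ETL pipeline, the sketch below maps one cohort-specific source row to a record shaped like the OMOP CDM measurement table. It is not the tool described in this study: the source column names, the mapping table, and the concept IDs are hypothetical placeholders, and only the target field names follow the OMOP CDM.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical mapping from cohort variable names to standard concept IDs.
# The IDs are placeholders, not real OMOP vocabulary concept IDs.
CONCEPT_MAP: Dict[str, int] = {"mmse_total": 1001, "sbp_mmhg": 1002}


@dataclass
class Measurement:
    """Subset of the fields of the OMOP CDM measurement table."""
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    value_as_number: float
    measurement_source_value: str


def transform(row: Dict[str, str], variable: str) -> Optional[Measurement]:
    """Convert one variable of one source row; skip missing values."""
    raw = row.get(variable, "").strip()
    if raw == "":
        return None  # nothing was recorded for this variable
    return Measurement(
        person_id=int(row["participant_id"]),
        # Assumption: unmapped variables get concept ID 0 and keep
        # their original name in the source value field.
        measurement_concept_id=CONCEPT_MAP.get(variable, 0),
        measurement_date=date.fromisoformat(row["visit_date"]),
        value_as_number=float(raw),
        measurement_source_value=variable,
    )


# Hypothetical source row as it might be read from a cohort CSV export.
row = {
    "participant_id": "7",
    "visit_date": "2020-01-15",
    "mmse_total": "28",
    "sbp_mmhg": "",
    "local_score": "3.5",
}
mapped = transform(row, "mmse_total")
missing = transform(row, "sbp_mmhg")
unmapped = transform(row, "local_score")
```

Keeping the original variable name in `measurement_source_value` preserves cohort-specific fields that have no standard concept, so they can still be traced back to the source data.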

Results

We successfully applied our ETL tool and observed complete coverage of the cohorts’ data by the OMOP CDM. The OMOP CDM facilitated data representation and standardization, but we identified limitations for cohort-specific data fields and in the scope of the available vocabularies. A multi-cohort federated collaboration also poses specific challenges arising from technical constraints in local environments, data heterogeneity, and the lack of direct access to the data.

Conclusion

In this article, we describe our solutions to the challenges and limitations encountered in our study. Our study shows the potential of federated learning as a privacy-preserving solution for multi-cohort studies, one that enhances the reproducibility and reuse of both data and analyses.

Published In

Journal of Biomedical Informatics, Volume 155, Issue C
Jul 2024
73 pages

Publisher

Elsevier Science

San Diego, CA, United States

Author Tags

  1. Data harmonization
  2. Cohort studies
  3. ETL
  4. OMOP
  5. CDM
  6. Federated learning
  7. NCDC
  8. OHDSI
  9. FAIR

Qualifiers

  • Research-article
