Data Descriptor
Open access
Published: 05 February 2025

Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9M Hospital Patients (1997–2014)

Elma Dervić ORCID: orcid.org/0000-0001-7168-3310^1,2,3,
Katharina Ledebur^1,2,3,
Stefan Thurner^1,2,4 &
…
Peter Klimek ORCID: orcid.org/0000-0003-1187-6713^1,2,3,5

Scientific Data volume 12, Article number: 215 (2025)

Abstract

Comorbidity networks have become a valuable tool to support data-driven biomedical research. Yet studies are often severely hindered by the limited availability of the necessary comprehensive data, largely due to the sensitivity of health care information. This study presents a population-wide comorbidity network dataset derived from 45 million hospital stays of 8.9 million patients over 17 years in Austria. We present co-occurrence networks of hospital diagnoses, stratified by age, sex, and observation period into a total of 96 subgroups. For each of these groups we report a range of association measures (e.g., count data and odds ratios) for all pairs of diagnoses. The dataset allows researchers to create their own tailor-made comorbidity networks from real patient data, which can be used as a starting point for quantitative and machine learning methods. This data platform is intended to lead to deeper insights into a wide range of epidemiological, public health, and biomedical research questions.

Background & Summary

Integrating emerging technologies and vast data resources has significantly enhanced various fields, from healthcare to finance, by enabling more accurate predictions, personalized services, and data-driven decision-making. In particular, the combination of large data sets with recent machine learning techniques has made this potential obvious. However, the vast amount of patient data generated in healthcare systems is largely inaccessible to the broader research community due to its sensitive nature¹. This presents a significant challenge to the advancement of research on observational health data and to the reproducibility of the corresponding results. On the other hand, uncovering new knowledge and improving healthcare systems using medical data hold the potential to tackle some of the challenges that 100-year lifespans and accelerated aging of global populations bring about^2,3,4. Research on chronic diseases, which are the leading causes of avoidable premature deaths^5,6,7, could tackle some of these challenges with new data-driven strategies for disease prevention^8,9.

Network medicine (NM) adopts a series of principles and approaches from network theory with the aim of preventing and treating diseases by leveraging their interconnected nature¹⁰. For instance, comorbidity networks represent phenomenological disease relations that provide insights into chronic disease progression patterns across life and gender^11,12. It was found that the likelihood of developing specific diseases often depends on their proximity in the comorbidity network to diseases that are already present. This fact highlights the potential predictive value of these networks. Typically, a patient’s health is not characterized by a single disease but by multiple coexisting medical conditions. Identifying complex structures and patterns within high-dimensional medical data sets may allow for a better understanding of comorbidities and how they affect each other, of gender differences in disease progression, or lead to the discovery of new disease predictors. Data-driven methods based on comorbidity networks have also opened a novel way into epidemiology on a population-wide scale by analyzing individual patient trajectories^13,14, their diagnosis progression patterns¹⁵, and typical clusters of such trajectories^16,17. To construct the disease networks, most studies have used data on in-hospital stays¹⁸.

This project aims to provide a comprehensive dataset for a wide range of comorbidity networks to foster further research in this direction. Networks derived from medical claims databases from all Austrian hospitals reflect information about 44,619,964 hospital stays and their interrelations in the Austrian population (N = 8,996,916). The underlying database, maintained by the Austrian Ministry of Health, includes data on patients’ age and gender, primary and secondary diagnoses, entry and release dates, release type, hospital region, and patient’s residential region. It covers the years 1997 through 2014. Level-3 ICD-10 codes are used to represent primary and secondary diagnoses¹⁹.

Here we present this dataset, from which different types of comorbidity networks can be constructed, for instance networks of ICD10 diagnoses, ICD blocks, and chronic condition blocks for different sexes, age groups, and time periods. We combine aggregated hospital data, exported networks, and scripts in the GitHub and FigShare repositories²⁰. The workflow of the research presented in this article is shown in Fig. 1.

Using this network data, one can validate whether comorbidities predicted by shared pathobiological processes indeed do occur in the population and thereby validate potential disease etiologies²¹. This network data can also be used to find leverage points for targeted prevention efforts in specific at-risk cohorts using disease trajectories¹⁴, in particular to understand type 2 diabetes progression²², to explore and compare multimorbidity profiles in different populations²³, as well as to achieve more accurate predictions of the length of hospital stays²⁴. Network data of the kind presented here has been used in interactive tools to analyze population-level disease progression over time¹³.

We aim to present a dataset from which many different types of diagnosis co-occurrence statistics can be derived, providing a flexible platform for the construction of comorbidity networks. Here, we present one way of doing this and discuss how the data can be used to achieve some of the objectives mentioned. Different questions may require different definitions of comorbidity networks, many of which can be explored with the data presented.

Methods

The original dataset comprises highly sensitive medical information from the Austrian Federal Ministry for Health. As part of the collaboration agreement, we can only share the aggregated datasets. In our projects, we make secondary use of hospital claims data collected for billing reasons.

Comorbidity networks, also known as disease-disease networks, express the relationships between various individual diseases. These networks are typically constructed from extensive longitudinal health datasets. Numerous statistical approaches exist to construct and derive these networks from raw data; tools from network science are frequently employed to analyze (temporal) disease correlations. Several methods for network reconstruction are reviewed in ref. ²⁵. Typically, nodes represent diagnoses, and links represent statistically significant correlations (of various types) between two diagnoses.

Odds Ratio calculation and statistical significance testing

We employ odds ratios (ORs) to quantify the strength of association between diseases. The OR is a straightforward metric for network construction. In addition, when controlling for potential confounding variables such as age, sex and time, the Cochran-Mantel-Haenszel (CMH) method allows for a more accurate and less biased estimate of disease associations by stratifying the analysis and calculating weighted averages of ORs across strata.

We start from contingency tables (two-way tables) for each disease pair to assess statistically significant correlations. These tables are used in statistics to summarize relations between categorical variables. An example is shown in Table 1.

Table 1 Example of contingency table: letters a-d are the respective counts for the various combinations.

For the case shown in Table 1, the OR is calculated as \(OR=\frac{a/c}{b/d}\). The lower limit of the OR is zero, while it has no upper bound. An OR of one means that the odds of the outcome are the same in both groups, i.e., there is no association. The logarithmic odds ratio, or log-odds ratio, is defined as \(log(OR)=log(\frac{a/c}{b/d})\) and ranges from \(-\infty \) to \(+\infty \). The standard error (SE) of the log (OR) is \(S{E}_{log(OR)}=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\). A 95% confidence interval for the log (OR) is obtained as 1.96 SE on either side of the estimate²⁶.
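The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `odds_ratio_ci` and the example counts are hypothetical, and the only assumption about the cell layout is the one implied by the formula \(OR=\frac{a/c}{b/d}\), i.e. OR = ad/(bc).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, log OR, SE of log OR, CI lower, CI upper) for one 2x2 table.

    Uses OR = (a/c)/(b/d) = a*d/(b*c) and SE = sqrt(1/a + 1/b + 1/c + 1/d).
    The confidence interval is built on the log scale and exponentiated.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")
    or_ = (a * d) / (b * c)
    log_or = math.log(or_)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return or_, log_or, se, lower, upper

# Hypothetical counts, for illustration only (not taken from the dataset).
or_, log_or, se, lower, upper = odds_ratio_ci(a=120, b=880, c=300, d=8700)
```

Exponentiating the interval limits gives a confidence interval for the OR itself; the interval is symmetric on the log scale but not on the OR scale.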

To enable researchers to study disease associations with measures other than ORs, our dataset provides a contingency table for each pair of the 1,080 diagnoses (ICD 3-digit codes) in each of the 96 strata.

Most of the association measures used in the comorbidity network literature to date can be readily derived from these counts, compare²⁵. For instance, instead of the OR one might choose the relative risk (RR), see for instance^27,28. The RR is the ratio of the probability of an event occurring in the exposed group (e.g., with a diagnosis of obesity) to that in the unexposed group (without a diagnosis of obesity), \(RR=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}\). It can be readily computed from the provided data. An RR of one means there is no difference between the compared groups.
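As a small sketch of how the RR follows from the same four counts, assuming the layout implied by the formula (a and b are the exposed patients with and without the event, c and d the unexposed ones); the function name `relative_risk` is hypothetical:

```python
def relative_risk(a, b, c, d):
    """Relative risk RR = (a / (a + b)) / (c / (c + d)) from one 2x2 table."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("RR is undefined for empty groups or zero unexposed events")
    risk_exposed = a / (a + b)      # probability of the event among the exposed
    risk_unexposed = c / (c + d)    # probability of the event among the unexposed
    return risk_exposed / risk_unexposed

# Hypothetical counts: 10 of 100 exposed vs. 5 of 100 unexposed have the event.
rr_example = relative_risk(10, 90, 5, 95)
```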

Stratified analysis

The stratified analysis accounts for confounding variables such as age and observation period. For each sex, 48 strata were created by splitting the dataset into 10-year age groups and six 2-year intervals from 2003 to 2014 (2003–2004, 2005–2006, etc.). A contingency table was created for every pair of diagnoses in every stratum. Odds ratios (ORs) and p-values, which test the null hypothesis that the co-occurrence of two diagnoses is statistically independent, were only computed for contingency tables with sufficient patient numbers in each grouping (more than 4).
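The stratification can be enumerated as follows. This is a sketch under stated assumptions: the number of age groups (eight) follows from 48 strata per sex divided by six periods, but the bin boundaries used here (0–9 up to 70–79) are an assumption, as is the reading of "more than 4" as a minimum count for each of the four cells. All names are hypothetical.

```python
from itertools import product

SEXES = ("female", "male")

# Six 2-year observation periods: (2003, 2004), (2005, 2006), ..., (2013, 2014).
PERIODS = [(start, start + 1) for start in range(2003, 2015, 2)]

# ASSUMPTION: eight 10-year age bins starting at age 0; the source only
# states that 10-year age groups were used.
AGE_GROUPS = [(low, low + 9) for low in range(0, 80, 10)]

# One stratum per (sex, age group, period) combination.
STRATA = list(product(SEXES, AGE_GROUPS, PERIODS))

def is_testable(a, b, c, d, min_count=5):
    """True if every cell holds more than 4 patients (assumed interpretation)."""
    return min(a, b, c, d) >= min_count
```

With these settings `len(STRATA)` is 96, i.e. 48 strata for each sex.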

We calculated a weighted average of the odds ratio (OR) estimates across the stratified data using the Cochran–Mantel–Haenszel technique (see details below)²⁹. Filtering the resulting correlation matrices by statistical significance alone is not advisable, since this could bias the resulting network towards links between very frequent diseases with low but still significant correlations. Hence, we only include comorbidities with an OR greater than 1.5, a p-value less than 0.05 and at least 100 patients with the analysed comorbidity.
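The three inclusion criteria translate into a simple link filter. The sketch below assumes that the OR and p-value are already computed for a diagnosis pair and that `n_both` is the number of patients with both diagnoses; the function and parameter names are hypothetical.

```python
def keep_link(odds_ratio, p_value, n_both,
              min_or=1.5, alpha=0.05, min_patients=100):
    """Apply the thresholds from the text: OR > 1.5, p < 0.05, >= 100 patients."""
    return odds_ratio > min_or and p_value < alpha and n_both >= min_patients

# Hypothetical candidate links: (diagnosis pair) -> (OR, p-value, co-occurrences).
candidates = {
    ("E11", "I10"): (2.4, 0.001, 5400),   # kept
    ("J45", "K21"): (1.2, 0.0001, 9000),  # significant, but OR too low
    ("G35", "M05"): (3.1, 0.20, 150),     # strong, but not significant
    ("D86", "H20"): (4.0, 0.01, 40),      # too few patients
}
links = {pair for pair, stats in candidates.items() if keep_link(*stats)}
```

The second candidate illustrates the point made above: a very frequent pair can be highly significant and still be excluded because its OR is low.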

The Cochran-Mantel-Haenszel method

To account for confounding factors in the analysis, we perform a stratified analysis by constructing two-by-two tables for each stratum (or category) of the confounding variable, as illustrated in Fig. 2. The Cochran-Mantel-Haenszel method estimates the OR as a weighted average of the odds ratios of the different strata, \({OR}_{cmh}=\frac{\sum \frac{{a}_{i}{d}_{i}}{{n}_{i}}}{\sum \frac{{b}_{i}{c}_{i}}{{n}_{i}}}\), and the RR as a weighted average of the risk ratios, \({RR}_{cmh}=\frac{\sum \frac{{a}_{i}({c}_{i}+{d}_{i})}{{n}_{i}}}{\sum \frac{{c}_{i}({a}_{i}+{b}_{i})}{{n}_{i}}}\)²⁹. Here, n_i is the total number of patients in the i-th stratum, n_i = a_i + b_i + c_i + d_i, and a_i, b_i, c_i, d_i refer to the corresponding entries of the contingency table of the i-th stratum.
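Both pooled estimators can be written directly from the two sums. The sketch below takes n_i to be the total count a_i + b_i + c_i + d_i of stratum i, as in the standard Mantel-Haenszel estimator; the function name `cmh_estimates` and the example tables are hypothetical.

```python
def cmh_estimates(tables):
    """Pooled Mantel-Haenszel OR and RR over strata.

    tables: iterable of (a, b, c, d) tuples, one 2x2 table per stratum.
    Returns (OR_cmh, RR_cmh).
    """
    or_num = or_den = rr_num = rr_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d          # stratum size n_i
        if n == 0:
            continue               # an empty stratum contributes nothing
        or_num += a * d / n
        or_den += b * c / n
        rr_num += a * (c + d) / n
        rr_den += c * (a + b) / n
    if or_den == 0 or rr_den == 0:
        raise ValueError("pooled estimate undefined (zero denominator)")
    return or_num / or_den, rr_num / rr_den

# Two hypothetical strata that share the same underlying association.
or_cmh, rr_cmh = cmh_estimates([(10, 90, 5, 95), (20, 180, 10, 190)])
```

When every stratum has the same OR, as in this toy example, the pooled estimate equals that common value; the weighting only matters when the strata disagree.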

Data Records

The dataset is available at³⁰ and is organised in four groups:

(i) Prevalence
(ii) Contingency Tables
(iii) Adjacency Matrices
(iv) Graphs (GEXF files).

Prevalence data is provided in CSV format, and contingency tables are organized as lists and stored in RDS format. Adjacency Matrices are published in both CSV and RDS formats. Graphs are available in GEXF format. We also provide R and Python scripts for the analysis of available variables.

An overview of the data sources and of how the datasets are analysed and organised is shown in Fig. 3.

Hospital claims data

The dataset under analysis comprises 44,619,964 hospital admissions between 1997 and 2014, covering roughly the entire Austrian population (N = 8,996,916). As a result of a collaboration, the dataset is provided by the Austrian Ministry of Health to the Complexity Science Hub and the Medical University of Vienna. The database contains a patient ID, sex (male/female), age group (five-year resolution), primary and secondary diagnoses, admission and discharge dates, the type of discharge (routine discharge, transfer to another facility, etc.), the region of residence of the patient (32 NUTS3 regions), the region of the hospital, and the hospital department^17,31,32. The primary diagnosis (one per hospital stay) refers to the primary reason for hospitalization; secondary diagnoses (one or more per hospital stay) specify additional diseases.

Level-3 ICD-10 codes as provided by the WHO are used to represent primary and secondary diagnoses¹⁹. We limited the study to codes between A00 and N99, reducing the number of 3-digit ICD codes from 1,699 to 1,080 diagnoses. We excluded diagnosis codes that cannot be directly related to diseases but encode other reasons for hospitalization, such as O00-O99 (pregnancy, childbirth and the puerperium) and S00-T88 (injury, poisoning and certain other consequences of external causes).
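The restriction to the range A00 to N99 can be expressed as a simple string comparison on 3-character codes. This is a sketch, assuming codes have the form of one letter followed by two digits and that longer codes are truncated to their first three characters; the function name `in_study_range` is hypothetical.

```python
import re

# One capital letter followed by two digits, e.g. "E11" or "I21".
_ICD3_PATTERN = re.compile(r"^[A-Z][0-9]{2}$")

def in_study_range(code):
    """True if the 3-character ICD-10 code lies between A00 and N99."""
    code3 = code.strip().upper()[:3]   # ASSUMPTION: truncate e.g. "I21.0" to "I21"
    if not _ICD3_PATTERN.match(code3):
        return False
    # Lexicographic comparison is valid because all codes share one format.
    return "A00" <= code3 <= "N99"

codes = ["A00", "E11", "I21.0", "N99", "O80", "S72", "Z00"]
kept = [c for c in codes if in_study_range(c)]
```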

Technical Validation

The goal of this study is to facilitate the secondary use of a population-wide in-hospital database (originally collected for billing purposes³³). The LKF framework (Leistungsorientierte Krankenanstaltenfinanzierung) is Austria’s performance-oriented hospital financing system. It was introduced to ensure that hospitals are funded based on the services they provide. While primarily used for billing purposes, this data is also highly valuable for research, offering reliable insights into healthcare utilization, patient outcomes, and disease patterns.

Data collection under the LKF framework adheres to a rigorous standardized process and validation. Hospitals must collect and report detailed structured data, which includes patient demographics, admission and discharge dates, and diagnostic information (ICD codes). The data collection process is subject to regular external audits to ensure that hospitals are reporting accurately³⁴. These audits are critical to identifying and correcting discrepancies, such as missing or inaccurate diagnoses.

Non-systematic errors, such as sporadic missing diagnoses, have been evaluated and their impact on the results of the analyses is minimal due to the large volume of data. To account for these limitations, sensitivity analyses are often performed to assess the robustness of the results, especially when analyzing rare conditions or specific comorbidities.

We performed filtering to prepare the dataset for comorbidity analysis. We limited the scope of our investigation to information collected between 2003 and 2014. We excluded any patient who had at least one hospital visit between 1997 and 2002 to ensure the comparability of the health state of the study population. Hence, we can assume that our cohort is “healthy” at the beginning of the observation period in the sense that they had no hospital stays during this time period. In the early 2000s, the Austrian diagnosis coding system was changed. By restricting the comorbidity network analysis to times from 2003 onwards, we avoid inaccuracies stemming from changes in diagnosis coding within the hospitals.
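The cohort selection described above amounts to a washout filter. The following sketch works on a toy list of (patient ID, admission year) pairs; the record layout and the function name `select_cohort` are assumptions for illustration, since the raw patient-level data are not part of the published dataset.

```python
def select_cohort(stays, washout=(1997, 2002), study=(2003, 2014)):
    """Keep study-period stays of patients with no stay in the washout period.

    stays: iterable of (patient_id, admission_year) pairs.
    """
    stays = list(stays)
    # Patients with at least one hospital stay in 1997-2002 are excluded entirely.
    excluded = {pid for pid, year in stays if washout[0] <= year <= washout[1]}
    return [
        (pid, year)
        for pid, year in stays
        if pid not in excluded and study[0] <= year <= study[1]
    ]

# Hypothetical toy records.
toy_stays = [("p1", 1999), ("p1", 2005), ("p2", 2003), ("p2", 2010), ("p3", 2014)]
cohort_stays = select_cohort(toy_stays)
```

Patient `p1` is dropped completely, including the 2005 stay, because of the earlier stay in 1999.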

This database has been used in studies to analyze gender differences among diabetic patients^35,36,37, gender differences in cardiovascular diseases³⁸, comorbidities of obesity³⁹, clusters of patients¹⁷, and disease trajectories³². These studies have validated the reliability of the LKF dataset in addressing a wide range of research questions, highlighting its robustness despite the known limitations.

Despite the robust structure and auditing, certain limitations remain in the LKF dataset. Diagnoses that do not lead directly to financial compensation, such as alcohol-related disorders or nicotine dependence, may be underreported. In addition, the database lacks outpatient visits, detailed socioeconomic indicators, and medication information. This may prevent the impact of these aspects on comorbidity from being uncovered. These limitations are acknowledged in studies using the dataset and are addressed through careful interpretation of results and, where possible, complementary data sources.

Usage Notes

Table 2 shows baseline characteristics of the hospital claims data set, which contains 3,378,906 patients (females: 1,688,467, males: 1,690,439) after filtering. Their mean age is 44.30 ± 24.89 years. Figure 4 shows the age distribution.

Table 2 Baseline table of the analyzed database, after filtering.

Prevalence of diagnoses

The most prevalent ICD chapters (based on the first letter of each code) for females and males over all time periods are cardiovascular diseases (I–Circulatory System) and cancers and neoplasms (C–D–Neoplasms). In males, the third most prevalent chapter is digestive diseases (K–Digestive System), followed by mental disorders (F–Mental and Behavioral Disorders), while in females it is musculoskeletal disorders (M–Musculoskeletal, Connective Tissue), followed by digestive diseases (K–Digestive System). Cardiovascular diagnoses were consistently the most prevalent in males and remained the most common in females until 2006; after 2006, cancer diagnoses became the most prevalent among females. The prevalence of all ICD chapters over time is presented in Fig. 5 (a: males, b: females).

Comorbidity networks

We constructed three versions of networks with different types of nodes:

1. ICD10 3-digit codes¹⁹, Fig. 6a,b
2. ICD10 Blocks¹⁹, Fig. 6c,d
3. Chronic conditions⁴⁰, Fig. 6e,f.

A comprehensive analysis of the network properties of the (undirected, weighted) ICD10 code comorbidity networks for each age group is shown in Fig. 7. Link weights were normalized to range from 0 to 1 by dividing each link’s weight by the sum of all links of a target node.
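One possible reading of this normalization is sketched below: for each link, the weight is divided by the total weight of all links attached to the target node, so that the normalized weights pointing at any node sum to one. This interpretation, the input format, and the function name `normalize_by_target` are assumptions; the published scripts may implement the step differently.

```python
from collections import defaultdict

def normalize_by_target(weights):
    """Normalize link weights by the total link weight of the target node.

    weights: {(u, v): w} for an undirected network, one entry per link, w > 0.
    Returns {(source, target): normalized weight} with both directions listed.
    """
    strength = defaultdict(float)      # sum of link weights attached to a node
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
    normalized = {}
    for (u, v), w in weights.items():
        normalized[(u, v)] = w / strength[v]   # v is the target
        normalized[(v, u)] = w / strength[u]   # u is the target
    return normalized

# Hypothetical weights (e.g. odds ratios) on two links.
norm = normalize_by_target({("E11", "I10"): 3.0, ("E11", "E66"): 1.0})
```

Note that under this reading the normalized weight depends on the direction in which a link is viewed, even though the underlying network is undirected.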

These properties reveal a massive topological restructuring of the networks as the underlying patient cohorts age. Figure 7a shows the total number of nodes with at least one connection in the network. Both the number of these nodes and the average degree (the average number of connections, or edges, per node; Fig. 7b) increase with age. For both genders, the average path length decreases with age, indicating that the network gets denser with age (Fig. 7c). This indicates that diseases become more correlated.
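To make the two quantities concrete, the sketch below computes the average degree and the average shortest path length of a small unweighted, undirected toy graph with breadth-first search. It ignores link weights and averages only over connected node pairs, both of which are simplifying assumptions; all names are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(adj):
    """Mean shortest path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")

# Toy example: adding one link to a chain raises the degree and shortens paths.
chain = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
ring = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
```

In the toy example the denser graph has both a higher average degree and a shorter average path length, which is the pattern described for older age groups.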

Betweenness centrality measures the influence of a node in “connecting” other nodes. The mean betweenness of the whole network fluctuates for both genders, with an increase starting around 40–49 years for both females and males (Fig. 7d). This indicates that some diseases in males are critical “bridges” between other diseases in this age range. The networks become increasingly dense with age (except for the youngest age group). This is associated with an increase in betweenness centrality (Fig. 7d) and a decrease in the average path length (Fig. 7c).
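For readers who want to see what betweenness counts, here is a compact sketch of Brandes' algorithm for an unweighted, undirected graph. It returns unnormalized values (the number of shortest paths between other node pairs that pass through a node, with ties split). Whether the published values are weighted or normalized is not stated in the text, so this is an illustration only; names are hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes' algorithm); adj maps node -> neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}        # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)     # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: value / 2 for v, value in bc.items()}

# Toy star graph: the hub bridges all three pairs of leaves.
star = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
bc_star = betweenness_centrality(star)
mean_bc = sum(bc_star.values()) / len(bc_star)
```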

Closeness centrality measures how quickly a node can reach other nodes in the network (Fig. 7f). The spike in closeness for males in the younger age group (10–19 years) suggests that diseases in young males are more densely connected through a few diseases serving as hubs compared to the situation in other age groups. However, the values decline with age for both genders, suggesting a reduced influence of individual diseases as the network becomes denser. Both males and females show a decline in modularity with age, meaning that diseases are less likely to form separate, distinct clusters as individuals age (Fig. 7g). Males start with higher modularity but converge to the levels of females at older ages.
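A common definition of closeness is the number of reachable nodes divided by the sum of shortest path lengths to them. The sketch below uses this definition on an unweighted, undirected toy graph; the exact variant used for Fig. 7f is not specified in the text, so the definition and the names are assumptions.

```python
from collections import deque

def closeness_centrality(adj):
    """Closeness = (reachable nodes) / (sum of hop distances to them)."""
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                        # breadth-first search from source
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        reachable = len(dist) - 1
        closeness[source] = reachable / total if total > 0 else 0.0
    return closeness

# Toy star graph: the hub reaches every node in one step.
star_graph = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
closeness_star = closeness_centrality(star_graph)
```

In the star, the hub has closeness 1.0 while each leaf has 0.6, which illustrates how a few hub diseases can dominate the closeness values of a sparse network.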

In summary, to the best of our knowledge, this dataset on comorbidities is the only one of its kind that spans 17 years and covers 9 million individuals, and it is publicly available to the research community. Research of these comorbidity networks and aggregated hospital claims data can enhance the understanding of comorbidities by identifying disease co-occurrence patterns. This enables more accurate patient classification based on risk profiles and disease trajectory prediction by analyzing comorbidities’ progression. The data also supports medication studies, assessing drug interactions in patients with multiple conditions. It can be used to test hypotheses about disease relationships across age groups, gender differences in comorbidities, and population-specific patterns.

Here we present a series of network measures that quantify properties of the networks and characterize their topology and structure. In particular, we employ the degree (the number of diseases to which a disease is significantly connected), betweenness centrality (which captures which diseases connect many others), average path length (which quantifies how close diseases are on average in terms of network distance), modularity (reflecting how easily the network can be partitioned into distinct clusters or communities), as well as closeness centrality, which captures how quickly a node can access other nodes in the network.
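Of these measures, modularity is the only one that requires a partition of the nodes into communities. The sketch below evaluates Newman's modularity Q for a given partition of an unweighted, undirected graph; how the partition was obtained for Fig. 7g is not stated in the text, so the partition here is simply supplied by hand. Names and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ].

    edges: list of (u, v) pairs, each undirected link listed once, no self-loops.
    community: dict mapping every node to its community label.
    L_c is the number of links inside community c, d_c the sum of its degrees.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without links")
    links_inside = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            links_inside[cu] = links_inside.get(cu, 0) + 1
    return sum(
        links_inside.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Toy graph: two triangles joined by a single link.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("x", "z"),
             ("c", "x")]
two_clusters = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
one_cluster = {node: 0 for node in two_clusters}
q_two = modularity(toy_edges, two_clusters)
q_one = modularity(toy_edges, one_cluster)
```

Splitting the toy graph into its two triangles yields a clearly positive Q, whereas placing all nodes in one community yields zero; a decline in modularity therefore means that such clean splits become harder to find.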

Code availability

This project is accessible on GitHub at: https://github.com/elmadervic/Comorbidity-Networks-From-Population-Wide-Health-Data. The code used to describe and explore the dataset is written in the programming languages R and Python. Please refer to the READ.ME file in the code release for further instructions.

References

1. Goncalves, A. et al. Generation and evaluation of synthetic patient data. BMC Medical Research Methodology 20, 1–40, https://doi.org/10.1186/s12874-020-00977-1 (2020).
2. World Health Organization - Ageing and health, from https://www.who.int/news-room/fact-sheets/detail/ageing-and-health (2024).
3. Ageing Europe: LOOKING AT THE LIVES OF OLDER PEOPLE IN THE EU, 2019 edition from https://ec.europa.eu/eurostat/documents/3217494/10166544/KS-02-19.
4. World Health Organization, World report on ageing and health. (2015).
5. Overview non-communicable-diseases, from https://ec.europa.eu/health/non-communicable-diseases/overview_en (2012).
6. Hajat, C. & Stein, E. The global burden of multiple chronic conditions: a narrative review. Preventive Medicine Reports 12, 284–293, https://doi.org/10.1016/j.pmedr.2018.10.008 (2018).
7. He, Z. et al. Prevalence of multiple chronic conditions among older adults in Florida and the United States: comparative analysis of the OneFlorida data trust and national inpatient sample. Journal Of Medical Internet Research 20, e8961, https://doi.org/10.2196/jmir.8961 (2018).
8. Struckmann, V. et al. Caring for people with multiple chronic conditions in Europe. Eurohealth 20, 35–40 (2014).
9. Kudesia, P. et al. The incidence of multimorbidity and patterns in accumulation of chronic conditions: A systematic review. Journal Of Multimorbidity And Comorbidity 11, 26335565211032880, https://doi.org/10.1177/26335565211032880 (2021).
10. Barabási, A., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nature Reviews Genetics 12, 56–68, https://doi.org/10.1038/nrg2918 (2011).
11. Chmiel, A., Klimek, P. & Thurner, S. Spreading of diseases through comorbidity networks across life and gender. New Journal Of Physics 16, 115013, https://doi.org/10.1088/1367-2630/16/11/115013 (2014).
12. Roque, F. et al. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Computational Biology 7, e1002141, https://doi.org/10.1371/journal.pcbi.1002141 (2011).
13. Siggaard, T. et al. Disease trajectory browser for exploring temporal, population-wide disease progression patterns in 7.2 million Danish patients. Nature Communications 11, 1–10, https://doi.org/10.1038/s41467-020-18682-4 (2020).
14. Jensen, A. et al. Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications 5, 1–10, https://doi.org/10.1038/ncomms5022 (2014).
15. Jeong, E., Ko, K., Oh, S. & Han, H. Network-based analysis of diagnosis progression patterns using claims data. Scientific Reports 7, 1–12, https://doi.org/10.1038/s41598-017-15647-4 (2017).
16. Giannoula, A., Gutierrez-Sacristán, A., Bravo, Á., Sanz, F. & Furlong, L. Identifying temporal patterns in patient disease trajectories using dynamic time warping: a population-based study. Scientific Reports 8, 1–14, https://doi.org/10.1038/s41598-018-22578-1 (2018).
17. Haug, N. et al. High-risk multimorbidity patterns on the road to cardiovascular mortality. BMC Medicine 18, 1–12, https://doi.org/10.1186/s12916-020-1508-1 (2020).
18. Jørgensen, I., Haue, A., Placido, D., Hjaltelin, J. & Brunak, S. Disease Trajectories from Healthcare Data: Methodologies, Key Results, and Future Perspectives. Annual Review Of Biomedical Data Science 7, 251–276, https://doi.org/10.1146/annurev-biodatasci-110123-041001 (2024).
19. International Classification of Diseases (ICD), from https://www.who.int/standards/classifications/classification-of-diseases (2022).
20. Gephi: An Open Graph Visualization Platform, from https://github.com/gephi/gephi (2024).
21. Klimek, P., Aichberger, S. & Thurner, S. Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Scientific Reports 6, 39658, https://doi.org/10.1038/srep39658 (2016).
22. Khan, A., Uddin, S. & Srinivasan, U. Comorbidity network for chronic disease: A novel approach to understand type 2 diabetes progression. International Journal Of Medical Informatics 115, 1–9, https://doi.org/10.1016/j.ijmedinf.2018.04.001 (2018).
23. Bao, Y. et al. Exploring multimorbidity profiles in middle-aged inpatients: a network-based comparative study of China and the United Kingdom. BMC Medicine 21, 495, https://doi.org/10.1186/s12916-023-03204-y (2023).
24. Kalgotra, P. & Sharda, R. When will I get out of the hospital? Modeling length of stay using comorbidity networks. Journal Of Management Information Systems 38, 1150–1184, https://doi.org/10.1080/07421222.2021.1990618 (2021).
25. Fotouhi, B., Momeni, N., Riolo, M. & Buckeridge, D. Statistical methods for constructing disease comorbidity networks from longitudinal inpatient data. Applied Network Science 3, 1–34, https://doi.org/10.1007/s41109-018-0101-4 (2018).
26. Bland, J. & Altman, D. The odds ratio. BMJ 320, 1468, https://doi.org/10.1136/bmj.320.7247.1468 (2000).
27. Hidalgo, C., Blumm, N., Barabási, A. & Christakis, N. A dynamic network approach for the study of human phenotypes. PLoS Computational Biology 5, e1000353, https://doi.org/10.1371/journal.pcbi.1000353 (2009).
28. Folino, F., Pizzuti, C. & Ventura, M. A comorbidity network approach to predict disease risk. International Conference On Information Technology In Bio-and Medical Informatics, pp. 102–109, https://doi.org/10.1007/978-3-642-15020-3_10 (2010).
29. Kuritz, S. A general overview of Mantel-Haenszel methods: applications and recent developments. Annu Rev Public Health 9, 123–160, https://doi.org/10.1146/annurev.pu.09.050188.001011 (1988).
30. Dervic, E., Ledebur, K., Thurner, S. & Klimek, P. Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9 M Hospital Patients (1997–2014). figshare https://doi.org/10.6084/m9.figshare.27102553 (2024).
31. Strauss, M., Niederkrotenthaler, T., Thurner, S., Kautzky-Willer, A. & Klimek, P. Data-driven identification of complex disease phenotypes. Journal Of The Royal Society Interface 18, 20201040, https://doi.org/10.1098/rsif.2020.1040 (2021).
32. Dervic, E. et al. Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks. npj Digital Medicine 7, 56, https://doi.org/10.1038/s41746-024-01015-w (2024).
33. Leistungsorientierte Krankenanstaltenfinanzierung (LKF) - German, from https://www.sozialministerium.at/Themen/Gesundheit/Gesundheitssystem/Krankenanstalten/Leistungsorientierte-Krankenanstaltenfinanzierung-(LKF).html (2024).
34. Kobel, C. & Pfeiffer, K. Austria: Inpatient care and the LKF framework. Diagnosis-Related Groups In Europe: Moving Towards Transparency, Efficiency And Quality In Hospitals, pp. 175–196, https://doi.org/10.1136/bmj.f3197 (2011).
35. Deischinger, C. et al. Diabetes mellitus is associated with a higher risk for major depressive disorder in women than in men. BMJ Open Diabetes Research And Care 8, e001430, https://doi.org/10.1136/bmjdrc-2020-001430 (2020).
36. Deischinger, C., Dervic, E., Kaleta, M., Klimek, P. & Kautzky-Willer, A. Diabetes mellitus is associated with a higher relative risk for Parkinson’s disease in women than in men. Journal Of Parkinson’s Disease 11, 793–800, https://doi.org/10.3233/JPD-202486 (2021).
37. Deischinger, C. et al. Diabetes mellitus is associated with a higher relative risk for venous thromboembolism in females than in males. Diabetes Research And Clinical Practice 194, 110190, https://doi.org/10.1016/j.diabres.2022.110190 (2022).
38. Dervic, E. et al. The Effect of Cardiovascular Comorbidities on Women Compared to Men: Longitudinal Retrospective Analysis. JMIR Cardio 5, e28015, https://doi.org/10.2196/28015 (2021).
39. Leutner, M. et al. Risk of Typical Diabetes-Associated Complications in Different Clusters of Diabetic Patients: Analysis of Nine Risk Factors. Journal Of Personalized Medicine 11, 328, https://doi.org/10.3390/jpm11050328 (2021).
40. Koller, D. et al. Multimorbidity and long-term care dependency—a five-year follow-up. BMC Geriatrics 14, 1–9, https://doi.org/10.1186/1471-2318-14-70 (2014).

Author information

Authors and Affiliations

Institute of the Science of Complex Systems, Center for Medical Data Science, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria
Elma Dervić, Katharina Ledebur, Stefan Thurner & Peter Klimek
Complexity Science Hub, Metternichgasse 8, 1030, Vienna, Austria
Elma Dervić, Katharina Ledebur, Stefan Thurner & Peter Klimek
Supply Chain Intelligence Institute Austria (ASCII), Metternichgasse 8, 1030, Vienna, Austria
Elma Dervić, Katharina Ledebur & Peter Klimek
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA
Stefan Thurner
Division of Insurance Medicine, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
Peter Klimek

Authors

Elma Dervić
View author publications
You can also search for this author in PubMed Google Scholar
Katharina Ledebur
View author publications
You can also search for this author in PubMed Google Scholar
Stefan Thurner
Peter Klimek

Contributions

E.D., K.L., S.T., and P.K. conceived the study; E.D. carried out the analysis, produced the plots and graphics, and drafted the manuscript; E.D., K.L., S.T., and P.K. analyzed the results. All authors wrote the manuscript.

Corresponding author

Correspondence to Elma Dervić.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dervić, E., Ledebur, K., Thurner, S. et al. Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9M Hospital Patients (1997–2014). Sci Data 12, 215 (2025). https://doi.org/10.1038/s41597-025-04508-9

Received: 30 October 2024
Accepted: 20 January 2025
Published: 05 February 2025
DOI: https://doi.org/10.1038/s41597-025-04508-9
