Abstract
Comorbidity networks have become a valuable tool to support data-driven biomedical research. Yet studies are often severely hindered by the limited availability of the necessary comprehensive data, largely because of the sensitivity of health care information. This study presents a population-wide comorbidity network dataset derived from 45 million hospital stays of 8.9 million patients over 17 years in Austria. We present co-occurrence networks of hospital diagnoses, stratified by age, sex, and observation period, in a total of 96 different subgroups. For each of these groups we report a range of association measures (e.g., count data and odds ratios) for all pairs of diagnoses. The dataset allows researchers to create their own tailor-made comorbidity networks from real patient data, which can serve as a starting point for quantitative and machine learning methods. This data platform is intended to lead to deeper insights into a wide range of epidemiological, public health, and biomedical research questions.
Background & Summary
Integrating emerging technologies and vast data resources has significantly enhanced various fields, from healthcare to finance, by enabling more accurate predictions, personalized services, and data-driven decision-making. In particular, combining large datasets with recent machine learning techniques has made this potential evident. However, the vast amount of patient data generated in healthcare systems is largely inaccessible to the broader research community because of its sensitive nature1. This poses a significant challenge to the advancement of research on observational health data and to the reproducibility of its results. At the same time, using medical data to uncover new knowledge and improve healthcare systems holds the potential to tackle some of the challenges brought about by 100-year lifespans and the accelerated aging of global populations2,3,4. Research on chronic diseases, which are the leading causes of avoidable premature deaths5,6,7, could address some of these challenges with new data-driven strategies for disease prevention8,9.
Network medicine (NM) adopts a series of principles and approaches from network theory with the aim of preventing and treating diseases by leveraging their interconnected nature10. For instance, comorbidity networks represent phenomenological disease relations that provide insights into chronic disease progression patterns across life and gender11,12. It has been found that the likelihood of developing a specific disease often depends on its proximity in the comorbidity network to diseases that are already present, which highlights the potential predictive value of these networks. Typically, a patient's health is characterized not by a single disease but by multiple coexisting medical conditions. Identifying complex structures and patterns within high-dimensional medical datasets may lead to a better understanding of comorbidities and how they affect each other and of gender differences in disease progression, or to the discovery of new disease predictors. Data-driven methods based on comorbidity networks have also opened a novel way into epidemiology on a population-wide scale by analyzing individual patient trajectories13,14, their diagnosis progression patterns15, and typical clusters of such trajectories16,17. To construct disease networks, most studies have used data on in-hospital stays18.
This project aims to provide a comprehensive dataset for a wide range of comorbidity networks to foster further research in this direction. The networks are derived from a medical claims database covering all Austrian hospitals and reflect 44,619,964 hospital stays and their interrelations in the Austrian population (N = 8,996,916). The underlying database, maintained by the Austrian Ministry of Health, includes data on patients' age and gender, primary and secondary diagnoses, admission and discharge dates, discharge type, hospital region, and the patient's region of residence. It covers the years 1997 through 2014. Level-3 ICD-10 codes are used to represent primary and secondary diagnoses19.
Here we present this dataset, which can be used to construct different types of comorbidity networks, for instance, networks of ICD-10 diagnoses, ICD blocks, and chronic condition blocks for different sexes, age groups, and time periods. The aggregated hospital data, the exported networks, and the scripts are provided in the GitHub and figshare repositories20. The workflow of the research presented in this article is shown in Fig. 1.
Using this network data, one can test whether comorbidities predicted by shared pathobiological processes indeed occur in the population and thereby validate potential disease etiologies21. The network data can also be used to find leverage points for targeted prevention efforts in specific at-risk cohorts using disease trajectories14, in particular to understand type 2 diabetes progression22, to explore and compare multimorbidity profiles in different populations23, and to achieve more accurate predictions of the length of hospital stays24. Network data of the kind presented here has also been used in interactive tools for analyzing population-level disease progression over time13.
We aim to present a dataset from which many different types of diagnosis co-occurrence statistics can be derived, providing a flexible platform for the construction of comorbidity networks. Here, we present one way of doing this and discuss how the data can be used to achieve some of the objectives mentioned. Different questions may require different definitions of comorbidity networks, many of which can be explored with the data presented.
Methods
The original dataset comprises highly sensitive medical information from the Austrian Federal Ministry for Health. Under the collaboration agreement, we can share only aggregated datasets. In our projects, we make secondary use of hospital claims data collected for billing purposes.
Comorbidity networks, also known as disease-disease networks, express the relationships between individual diseases. These networks are typically constructed from extensive longitudinal health datasets. Numerous statistical methods exist to construct these networks from raw data, and tools from network science are frequently employed to analyze (temporal) disease correlations. Fotouhi et al.25 review several methods for network reconstruction. Typically, nodes represent diagnoses, and links represent statistically significant correlations (of various types) between two diagnoses.
Odds Ratio calculation and statistical significance testing
We use odds ratios (ORs) to quantify the strength of association between diseases; the OR is a straightforward metric for network construction. In addition, when controlling for potential confounding variables such as age, sex, and time period, the Cochran-Mantel-Haenszel (CMH) method yields a more accurate, less biased estimate of disease associations by stratifying the analysis and calculating a weighted average of the ORs across strata.
We start from contingency tables (two-way tables) for each disease pair to assess statistically significant correlations. These tables are used in statistics to summarize relations between categorical variables. An example is shown in Table 1.
For the case shown in Table 1, the OR is calculated as \(OR=\frac{a/c}{b/d}=\frac{ad}{bc}\). The OR is bounded below by zero but has no upper bound. An OR of one means that the odds of the outcome are the same in both groups, i.e., there is no association between the two diagnoses. The logarithm of the odds ratio, log(OR), is defined as \(\log(OR)=\log\left(\frac{a/c}{b/d}\right)\) and ranges from \(-\infty\) to \(+\infty\). The standard error of the log(OR) is \(SE_{\log(OR)}=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}\). A 95% confidence interval for the log(OR) is obtained as 1.96 SE on either side of the estimate; exponentiating its limits gives the confidence interval for the OR26.
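As a minimal sketch of these calculations, the Python function below computes the OR, log(OR), its standard error, and the confidence interval from the four cells of a 2×2 table. The cell layout (a: both diagnoses; b: exposure diagnosis only; c: outcome diagnosis only; d: neither) is an assumption inferred from the RR formula in this section, since Table 1 is not reproduced here, and the function name is hypothetical. The OR itself does not depend on this layout, because it equals ad/bc either way.

```python
import math
from typing import Tuple


def odds_ratio_summary(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float, Tuple[float, float]]:
    """Return OR, log(OR), SE of log(OR) and a confidence interval for the OR.

    Assumed 2x2 layout (hypothetical, Table 1 is not reproduced here):
    a = both diagnoses, b = exposure diagnosis only,
    c = outcome diagnosis only, d = neither diagnosis.
    """
    if min(a, b, c, d) <= 0:
        # A zero cell makes the OR or its standard error undefined.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a / c) / (b / d)  # algebraically equal to a*d / (b*c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # z * SE on either side of log(OR) (95% CI for z = 1.96),
    # then back-transform the limits to the OR scale.
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return odds_ratio, log_or, se, ci


# Made-up counts for illustration only.
or_, log_or, se, ci = odds_ratio_summary(40, 60, 20, 80)
print(f"OR={or_:.3f}  log(OR)={log_or:.3f}  SE={se:.3f}  95% CI={ci[0]:.2f}-{ci[1]:.2f}")
```

The interval is symmetric on the log scale but asymmetric around the OR itself, which is why the calculation is done on log(OR) first.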
To enable researchers to study disease associations with measures other than ORs, our dataset provides, for each pair of the 1,080 diagnoses (3-digit ICD codes), one contingency table per stratum, i.e., 96 contingency tables per pair.
From these tables, most of the association measures used in the comorbidity network literature to date can be readily derived25. For instance, instead of the OR one might choose the relative risk (RR); see, for instance,27,28. The RR is the ratio of the probability of an outcome in the exposed group (e.g., patients diagnosed with obesity) to that in the unexposed group (patients without an obesity diagnosis), \(RR=\frac{\frac{a}{a+b}}{\frac{c}{c+d}}\). It can be readily computed from the provided data. An RR of one means there is no difference in risk between the compared groups.
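To illustrate how other measures follow from the same four cells, here is a minimal Python sketch of the RR formula above, together with the φ (phi) coefficient, another association measure found in the comorbidity literature. The function names are hypothetical, and the row/column layout (rows = exposure, columns = outcome) is the one implied by the RR formula.

```python
import math


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a / (a + b)] / [c / (c + d)].

    Rows are exposure (a, b: exposed; c, d: unexposed), columns are outcome.
    """
    risk_exposed = a / (a + b)      # e.g. outcome risk among patients with obesity
    risk_unexposed = c / (c + d)    # outcome risk among patients without obesity
    return risk_exposed / risk_unexposed


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Pearson (phi) correlation of two binary diagnoses from the same 2x2 table."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Made-up counts for illustration only.
print(relative_risk(40, 60, 20, 80))    # -> 2.0
print(phi_coefficient(40, 60, 20, 80))
```

Unlike the OR, the RR depends on which diagnosis is treated as the exposure, so swapping the roles of the two diagnoses generally changes its value.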
Stratified analysis
The stratified analysis accounts for confounding variables such as age and observation period. For each sex, 48 strata were created by splitting the dataset into eight 10-year age groups and six 2-year intervals from 2003 to 2014 (2003-2004, 2005-2006, etc.), giving 96 strata in total. A contingency table was created for every pair of diagnoses in every stratum. ORs and p-values, which test the null hypothesis that the two diagnoses occur independently of each other, were computed only from contingency tables with adequate patient numbers (more than four in each cell).
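The stratification scheme can be sketched in a few lines of Python. The age-group and period labels below are hypothetical (the dataset's own labels may differ, e.g. with an open-ended oldest age group); the sketch only shows how 2 sexes × 8 age groups × 6 periods yield the 96 strata, and how the "more than four patients per cell" rule can be checked.

```python
from itertools import product

SEXES = ("female", "male")
# Eight 10-year age groups (48 strata per sex / 6 periods = 8).
# Labels are hypothetical; the oldest group in the data may be open-ended.
AGE_GROUPS = tuple(f"{lo}-{lo + 9}" for lo in range(0, 80, 10))
# Six 2-year periods: 2003-2004, 2005-2006, ..., 2013-2014.
PERIODS = tuple(f"{year}-{year + 1}" for year in range(2003, 2015, 2))

# Every (sex, age group, period) combination is one stratum.
STRATA = list(product(SEXES, AGE_GROUPS, PERIODS))


def has_adequate_counts(a: int, b: int, c: int, d: int, threshold: int = 4) -> bool:
    """True if every cell of a stratum's 2x2 table holds more than `threshold` patients."""
    return min(a, b, c, d) > threshold


print(len(STRATA))  # -> 96
```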
We calculated a weighted average of the OR estimates across strata using the Cochran–Mantel–Haenszel technique (see details below)29. Filtering the resulting correlation matrices by statistical significance alone is not advisable: with population-scale sample sizes, even weak associations reach significance, so the resulting network would be biased towards links between very frequent diseases with low but still significant correlations. Hence, we include only comorbidities with an OR greater than 1.5, a p-value below 0.05, and at least 100 patients with the analysed comorbidity.
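The three inclusion criteria can be expressed as a simple filter. The `Link` record below is a hypothetical container, not a format from the published files, and the candidate values are made up; the second candidate shows the case the criteria are meant to exclude, a highly significant but weak association.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Link:
    source: str          # ICD-10 code of the first diagnosis
    target: str          # ICD-10 code of the second diagnosis
    odds_ratio: float    # e.g. the CMH-pooled OR
    p_value: float
    n_patients: int      # patients with both diagnoses


def keep_link(link: Link, min_or: float = 1.5, alpha: float = 0.05,
              min_patients: int = 100) -> bool:
    """Apply the three thresholds: OR > 1.5, p < 0.05, at least 100 patients."""
    return (link.odds_ratio > min_or
            and link.p_value < alpha
            and link.n_patients >= min_patients)


def filter_links(links: Iterable[Link]) -> List[Link]:
    return [link for link in links if keep_link(link)]


# Made-up values: the second link is highly significant but too weak (OR 1.2),
# the third is strong but supported by too few patients.
candidates = [
    Link("E11", "I10", 2.3, 1e-10, 5000),
    Link("E11", "K21", 1.2, 1e-30, 20000),
    Link("A01", "B02", 4.0, 0.01, 50),
]
print([(link.source, link.target) for link in filter_links(candidates)])
```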
The Cochran-Mantel-Haenszel method
To account for confounding factors in the analysis, we perform a stratified analysis by constructing a two-by-two table for each stratum (or category) of the confounding variables, as illustrated in Fig. 2. The Cochran-Mantel-Haenszel method estimates the OR as a weighted average of the odds ratios of the individual strata, \({OR}_{cmh}=\frac{\sum_{i}\frac{{a}_{i}{d}_{i}}{{n}_{i}}}{\sum_{i}\frac{{b}_{i}{c}_{i}}{{n}_{i}}}\), and the RR as a weighted average of the stratum-specific risk ratios, \({RR}_{cmh}=\frac{\sum_{i}\frac{{a}_{i}({c}_{i}+{d}_{i})}{{n}_{i}}}{\sum_{i}\frac{{c}_{i}({a}_{i}+{b}_{i})}{{n}_{i}}}\)29, where \({a}_{i}\), \({b}_{i}\), \({c}_{i}\), \({d}_{i}\) are the cells of the contingency table of the i-th stratum and \({n}_{i}={a}_{i}+{b}_{i}+{c}_{i}+{d}_{i}\) is the total number of patients in that stratum.
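A direct, dependency-free Python transcription of the two pooled estimators is sketched below; the function names are hypothetical and the stratum tables are made up. Skipping empty strata is an added safeguard, not something stated in the text. For cross-checking, statsmodels offers `statsmodels.stats.contingency_tables.StratifiedTable`, whose `oddsratio_pooled` attribute gives the pooled OR and whose `test_null_odds()` method performs a CMH test of no association; whether this matches the authors' exact significance test is not stated here.

```python
from typing import Iterable, Tuple

# One stratum: (a_i, b_i, c_i, d_i), laid out as in the formulas above.
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """OR_cmh = sum(a_i d_i / n_i) / sum(b_i c_i / n_i)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d          # patients in stratum i
        if n == 0:
            continue               # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mantel_haenszel_rr(strata: Iterable[Stratum]) -> float:
    """RR_cmh = sum(a_i (c_i + d_i) / n_i) / sum(c_i (a_i + b_i) / n_i)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    return numerator / denominator


# Made-up tables for two strata (e.g. two age groups).
tables = [(10, 20, 30, 40), (5, 5, 5, 5)]
print(mantel_haenszel_or(tables), mantel_haenszel_rr(tables))
```

With a single stratum, both functions reduce to the crude OR and RR, which is a quick sanity check of an implementation.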
Data Records
The dataset is available on figshare30 and is organised into four groups:
- (i) Prevalence
- (ii) Contingency tables
- (iii) Adjacency matrices
- (iv) Graphs (GEXF files)
Prevalence data is provided in CSV format, and contingency tables are organized as lists and stored in RDS format. Adjacency Matrices are published in both CSV and RDS formats. Graphs are available in GEXF format. We also provide R and Python scripts for the analysis of available variables.
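The exact layout of the published files should be checked against the repository's documentation. The sketch below assumes, hypothetically, that an adjacency-matrix CSV is a square, symmetric matrix whose first row and first column hold the ICD-10 codes in the same order, with 0 or NA marking absent links; under that assumption it converts the matrix into an undirected edge list using only the Python standard library. The function names are illustrative.

```python
import csv
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, float]


def parse_adjacency_rows(rows: Sequence[Sequence[str]]) -> List[Edge]:
    """Turn a labelled, symmetric adjacency matrix into (source, target, weight) edges."""
    labels = rows[0][1:]  # column labels, skipping the corner cell
    edges: List[Edge] = []
    for i, row in enumerate(rows[1:]):
        source = row[0]
        for j, cell in enumerate(row[1:]):
            if j <= i:  # symmetric matrix: read the upper triangle only
                continue
            weight = 0.0 if cell in ("", "NA") else float(cell)
            if weight != 0.0:
                edges.append((source, labels[j], weight))
    return edges


def read_adjacency_csv(path: str) -> List[Edge]:
    """Read a CSV file with the assumed layout and return its edge list."""
    with open(path, newline="") as handle:
        return parse_adjacency_rows(list(csv.reader(handle)))
```

The GEXF graphs need no such conversion: they can be opened directly in Gephi or loaded in Python with `networkx.read_gexf`.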
An overview of the data sources and of how the datasets are analysed and organised is shown in Fig. 3.
Hospital claims data
The dataset under analysis comprises 44,619,964 hospital admissions covering roughly the entire Austrian population (N = 8,996,916) between 1997 and 2014. Under a collaboration agreement, the dataset was provided by the Austrian Ministry of Health to the Complexity Science Hub and the Medical University of Vienna. The database contains a patient ID, sex (male/female), age group (five-year resolution), primary and secondary diagnoses, admission and discharge dates, the type of discharge (routine discharge, transfer to another facility, etc.), the patient's region of residence (32 NUTS-3 regions), the region of the hospital, and the hospital department17,31,32. The primary diagnosis (one per hospital stay) refers to the main reason for hospitalization; secondary diagnoses (one or more per hospital stay) specify additional diseases.
Level-3 ICD-10 codes, as provided by the WHO, are used to represent primary and secondary diagnoses19. We limited the study to codes between A00 and N99, reducing the number of 3-digit ICD codes from 1,699 to 1,080. This excludes diagnosis codes that cannot be directly related to diseases but encode other reasons for hospitalization, such as O00-O99 (pregnancy, childbirth and the puerperium) and S00-T88 (injury, poisoning and certain other consequences of external causes).
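Because every level-3 code consists of one letter followed by two digits, the A00–N99 restriction reduces to a plain string comparison. A minimal Python sketch follows; the function name is hypothetical, and truncating longer codes such as E11.9 to three characters is an assumption for inputs that are not already at level 3.

```python
import re

# A level-3 ICD-10 code: one letter followed by two digits, e.g. "E11".
LEVEL3 = re.compile(r"[A-Z]\d{2}")


def in_study_range(code: str, first: str = "A00", last: str = "N99") -> bool:
    """True if a diagnosis code falls within the A00-N99 study range.

    Longer codes such as "E11.9" are truncated to their level-3 prefix
    (an assumption; the source data already use level-3 codes).
    """
    level3 = code.strip().upper()[:3]
    # All level-3 codes share one fixed format, so string order equals code order.
    return LEVEL3.fullmatch(level3) is not None and first <= level3 <= last


print([c for c in ("E11", "I10", "O80", "S72", "Z00") if in_study_range(c)])
```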
Technical Validation
The goal of this study is to facilitate the secondary use of a population-wide in-hospital database (originally collected for billing purposes33). The LKF framework (Leistungsorientierte Krankenanstaltenfinanzierung) is Austria’s performance-oriented hospital financing system. It was introduced to ensure that hospitals are funded based on the services they provide. While primarily used for billing purposes, this data is also highly valuable for research, offering reliable insights into healthcare utilization, patient outcomes, and disease patterns.
Data collection under the LKF framework follows a rigorous, standardized process that includes validation. Hospitals must collect and report detailed structured data, which includes patient demographics, admission and discharge dates, and diagnostic information (ICD codes). The data collection process is subject to regular external audits to ensure that hospitals are reporting accurately34. These audits are critical to identifying and correcting discrepancies, such as missing or inaccurate diagnoses.
Non-systematic errors, such as sporadically missing diagnoses, have been evaluated; owing to the large volume of data, their impact on the results of the analyses is minimal. To account for such limitations, sensitivity analyses are often performed to assess the robustness of the results, especially when analyzing rare conditions or specific comorbidities.
We filtered the dataset to prepare it for comorbidity analysis. We limited the scope of our investigation to data collected between 2003 and 2014 and excluded any patient who had at least one hospital stay between 1997 and 2002, to ensure that the health states of the study population were comparable. Hence, we can assume that our cohort is “healthy” at the beginning of the observation period, in the sense that none of its members had a hospital stay during the preceding six years (1997-2002). In the early 2000s, the Austrian diagnosis coding system was changed; by restricting the comorbidity network analysis to 2003 onwards, we avoid inaccuracies stemming from changes in diagnosis coding within the hospitals.
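The raw records are not shared, but the cohort logic itself is simple. The sketch below applies it to hypothetical (patient_id, admission_date) records; using the admission date to assign a stay to a year is an assumption, as are the record layout and function name.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical record layout: (patient_id, admission_date)
Stay = Tuple[str, date]


def apply_washout(stays: List[Stay],
                  washout: Tuple[int, int] = (1997, 2002),
                  study: Tuple[int, int] = (2003, 2014)) -> List[Stay]:
    """Drop patients with any stay in the washout years, then keep study-period stays."""
    excluded = {pid for pid, admitted in stays
                if washout[0] <= admitted.year <= washout[1]}
    return [(pid, admitted) for pid, admitted in stays
            if pid not in excluded and study[0] <= admitted.year <= study[1]]


records = [
    ("p1", date(1999, 5, 1)),    # stay in the washout years: p1 is excluded entirely
    ("p1", date(2005, 1, 1)),
    ("p2", date(2004, 3, 3)),
    ("p3", date(2014, 12, 31)),
    ("p4", date(2015, 1, 1)),    # outside the study period
]
print(apply_washout(records))
```

Note that the exclusion works at the patient level: a single stay in 1997-2002 removes all of that patient's later stays, not just the early one.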
This database has been used in studies to analyze gender differences among diabetic patients35,36,37, gender differences in cardiovascular diseases38, comorbidities of obesity39, clusters of patients17, and disease trajectories32. These studies have validated the reliability of the LKF dataset in addressing a wide range of research questions, highlighting its robustness despite the known limitations.
Despite the robust structure and auditing, certain limitations remain in the LKF dataset. Diagnoses that do not lead directly to financial compensation, such as alcohol-related disorders or nicotine dependence, may be underreported. In addition, the database lacks outpatient visits, detailed socioeconomic indicators, and medication information, so the influence of these factors on comorbidity cannot be assessed with these data. These limitations are acknowledged in studies using the dataset and are addressed through careful interpretation of results and, where possible, complementary data sources.
Usage Notes
Table 2 summarizes the baseline characteristics of the hospital claims dataset after filtering, which contains 3,378,906 patients (1,688,467 females and 1,690,439 males) with a mean age of 44.30 ± 24.89 years. Figure 4 shows the age distribution.
Prevalence of diagnoses
The most prevalent ICD chapters (based on the first letter of each code) for both females and males over all time periods are cardiovascular diseases (I, Circulatory System) and neoplasms, including cancers (C–D, Neoplasms). In males, the third most prevalent chapter is digestive diseases (K, Digestive System), followed by mental disorders (F, Mental and Behavioral Disorders), whereas in females it is musculoskeletal disorders (M, Musculoskeletal System and Connective Tissue), followed by digestive diseases (K, Digestive System). Interestingly, cardiovascular diagnoses were consistently the most prevalent in males and remained the most common in females until 2006; after 2006, cancer diagnoses became the most prevalent among females. The prevalence of all ICD chapters over time is presented in Fig. 5 (a, males; b, females).
Comorbidity networks
We constructed three versions of networks with different node types:
- 1. ICD-10 diagnoses (3-digit codes)
- 2. ICD blocks
- 3. Chronic condition blocks
Examples of different comorbidity networks. Node size represents disease prevalence; colors indicate the ICD chapter (first letter of the ICD-10 code). Link weights are proportional to the odds ratios. An online dynamic version is available at https://vis.csh.ac.at/comorbidity_networks/.
Fig. 7 shows a comprehensive analysis of the network properties of the ICD-10 comorbidity networks (undirected, weighted) for each age group. Link weights are normalized to the range 0 to 1 by dividing each link's weight by the sum of the weights of all links of the target node.
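The normalization rule can be sketched as follows, assuming links are given as hypothetical (u, v, weight) triples. Because dividing by the target node's total link weight makes the result depend on which endpoint is the target, this sketch returns a value for both orientations of each undirected link; how the published networks resolve this should be checked against the released scripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def normalize_by_target(
    edges: Iterable[Tuple[str, str, float]]
) -> Dict[Tuple[str, str], float]:
    """Return {(source, target): w / strength(target)} for both orientations.

    strength(node) is the sum of the weights of all links attached to the node,
    so the values flowing into each target sum to 1 and lie in [0, 1].
    """
    edges = list(edges)
    strength: Dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    normalized: Dict[Tuple[str, str], float] = {}
    for u, v, w in edges:
        normalized[(u, v)] = w / strength[v]
        normalized[(v, u)] = w / strength[u]
    return normalized


# Made-up weights for a three-node network.
print(normalize_by_target([("A", "B", 2.0), ("A", "C", 6.0), ("B", "C", 2.0)]))
```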
These properties reveal a substantial topological restructuring of the networks as the underlying patient cohorts age. Figure 7a shows the total number of nodes with at least one connection in the network. Both the number of these nodes and the average degree (the average number of links per node; Fig. 7b) increase with age. For both genders, the average path length decreases with age (Fig. 7c), indicating that the network becomes denser and that diseases become more strongly correlated with age.
Betweenness centrality measures the influence of a node in “connecting” other nodes. The mean betweenness of the whole network fluctuates for both genders, with an increase starting around 40–49 years for both females and males (Fig. 7d). This indicates that, in this age range, some diseases in males act as critical “bridges” between other diseases. The networks become increasingly dense with age (except for the youngest age group), which is associated with an increase in betweenness centrality (Fig. 7d) and a decrease in average path length (Fig. 7c).
Closeness centrality measures how quickly a node can reach other nodes in the network (Fig. 7f). The spike in closeness for males in younger age groups (10–19 years) suggests that, in young males, diseases are more densely connected through a few diseases serving as hubs than in other age groups. However, closeness declines with age for both genders, suggesting a reduced influence of individual diseases as the network becomes denser. Both males and females show a decline in modularity with age (Fig. 7g), meaning that diseases are less likely to form separate, distinct clusters as individuals age. Males start with higher modularity but converge to the levels observed in females at older ages.
In summary, to the best of our knowledge, this comorbidity dataset is the only one of its kind that spans 17 years, covers nearly 9 million individuals, and is publicly available to the research community. Research using these comorbidity networks and the aggregated hospital claims data can enhance the understanding of comorbidities by identifying disease co-occurrence patterns. This enables more accurate patient classification based on risk profiles and disease trajectory prediction based on how comorbidities progress. Although the database itself contains no medication information, the data may also support medication studies, for instance in assessing drug interactions in patients with multiple conditions. It can be used to test hypotheses about disease relationships across age groups, gender differences in comorbidities, and population-specific patterns.
Here we present a series of network measures that quantify properties of the networks and characterize their topology and structure. In particular, we use the degree (the number of diseases to which a disease is significantly connected), betweenness centrality (which captures which diseases connect many others), the average path length (which quantifies how close diseases are to one another, on average, in terms of network distance), modularity (which reflects how easily the network can be partitioned into distinct clusters or communities), and closeness centrality (which captures how quickly a node can reach other nodes in the network).
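To make three of these definitions concrete, here is a minimal, dependency-free Python sketch of degree, average path length, and closeness on an unweighted graph, using breadth-first search. It ignores link weights, and its closeness variant (number of reachable nodes minus one, divided by the sum of distances within the node's component) is one common definition among several; it is not necessarily the variant behind Fig. 7. All function names are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency sets; only nodes with at least one link appear."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj: Adjacency, source: str) -> Dict[str, int]:
    """Hop distances from `source` to every node in its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_degree(adj: Adjacency) -> float:
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def average_path_length(adj: Adjacency) -> float:
    """Mean shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def closeness(adj: Adjacency, node: str) -> float:
    """(reachable nodes - 1) / sum of distances, within the node's component."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

For weighted variants, betweenness, and modularity, the networkx library provides `average_shortest_path_length`, `betweenness_centrality`, `closeness_centrality`, and `community.modularity`. Note that networkx treats edge weights as distances in shortest-path computations, so OR-based link strengths would need to be converted (for example, inverted) before being passed as weights.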
Code availability
This project is accessible on GitHub at: https://github.com/elmadervic/Comorbidity-Networks-From-Population-Wide-Health-Data. The code used to describe and explore the dataset is written in R and Python. Please refer to the README file in the code release for further instructions.
References
Goncalves, A. et al. Generation and evaluation of synthetic patient data. BMC Medical Research Methodology. 20, 1–40, https://doi.org/10.1186/s12874-020-00977-1 (2020).
World Health Organization - Ageing and health, from https://www.who.int/news-room/fact-sheets/detail/ageing-and-health (2024).
Ageing Europe: Looking at the Lives of Older People in the EU, 2019 edition, from https://ec.europa.eu/eurostat/documents/3217494/10166544/KS-02-19.
World Health Organization, World report on ageing and health. (2015).
Overview non-communicable-diseases, from https://ec.europa.eu/health/non-communicable-diseases/overview_en (2012).
Hajat, C. & Stein, E. The global burden of multiple chronic conditions: a narrative review. Preventive Medicine Reports. 12, 284–293, https://doi.org/10.1016/j.pmedr.2018.10.008 (2018).
He, Z. et al. Prevalence of multiple chronic conditions among older adults in Florida and the United States: comparative analysis of the OneFlorida data trust and national inpatient sample. Journal Of Medical Internet Research 20, e8961, https://doi.org/10.2196/jmir.8961 (2018).
Struckmann, V. et al. Caring for people with multiple chronic conditions in Europe. Eurohealth 20, 35–40 (2014).
Kudesia, P. et al. The incidence of multimorbidity and patterns in accumulation of chronic conditions: A systematic review. Journal Of Multimorbidity And Comorbidity. 11, 26335565211032880, https://doi.org/10.1177/26335565211032880 (2021).
Barabási, A., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nature Reviews Genetics 12, 56–68, https://doi.org/10.1038/nrg2918 (2011).
Chmiel, A., Klimek, P. & Thurner, S. Spreading of diseases through comorbidity networks across life and gender. New Journal Of Physics 16, 115013, https://doi.org/10.1088/1367-2630/16/11/115013 (2014).
Roque, F. et al. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Computational Biology 7, e1002141, https://doi.org/10.1371/journal.pcbi.1002141 (2011).
Siggaard, T. et al. Disease trajectory browser for exploring temporal, population-wide disease progression patterns in 7.2 million Danish patients. Nature Communications 11, 1–10, https://doi.org/10.1038/s41467-020-18682-4 (2020).
Jensen, A. et al. Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nature Communications 5, 1–10, https://doi.org/10.1038/ncomms5022 (2014).
Jeong, E., Ko, K., Oh, S. & Han, H. Network-based analysis of diagnosis progression patterns using claims data. Scientific Reports 7, 1–12, https://doi.org/10.1038/s41598-017-15647-4 (2017).
Giannoula, A., Gutierrez-Sacristán, A., Bravo, Á., Sanz, F. & Furlong, L. Identifying temporal patterns in patient disease trajectories using dynamic time warping: a population-based study. Scientific Reports 8, 1–14, https://doi.org/10.1038/s41598-018-22578-1 (2018).
Haug, N. et al. High-risk multimorbidity patterns on the road to cardiovascular mortality. BMC Medicine. 18, 1–12, https://doi.org/10.1186/s12916-020-1508-1 (2020).
Jørgensen, I., Haue, A., Placido, D., Hjaltelin, J. & Brunak, S. Disease Trajectories from Healthcare Data: Methodologies, Key Results, and Future Perspectives. Annual Review Of Biomedical Data Science 7, 251–276, https://doi.org/10.1146/annurev-biodatasci-110123-041001 (2024).
International Classification of Diseases (ICD), from https://www.who.int/standards/classifications/classification-of-diseases (2022).
Gephi: An Open Graph Visualization Platform, from https://github.com/gephi/gephi (2024).
Klimek, P., Aichberger, S. & Thurner, S. Disentangling genetic and environmental risk factors for individual diseases from multiplex comorbidity networks. Scientific Reports. 6, 39658, https://doi.org/10.1038/srep03689 (2016).
Khan, A., Uddin, S. & Srinivasan, U. Comorbidity network for chronic disease: A novel approach to understand type 2 diabetes progression. International Journal Of Medical Informatics. 115, 1–9, https://doi.org/10.1016/j.ijmedinf.2018.04.001 (2018).
Bao, Y. et al. Exploring multimorbidity profiles in middle-aged inpatients: a network-based comparative study of China and the United Kingdom. BMC Medicine 21, 495, https://doi.org/10.1186/s12916-023-03204-y (2023).
Kalgotra, P. & Sharda, R. When will I get out of the hospital? Modeling length of stay using comorbidity networks. Journal Of Management Information Systems 38, 1150–1184, https://doi.org/10.1080/07421222.2021.1990618 (2021).
Fotouhi, B., Momeni, N., Riolo, M. & Buckeridge, D. Statistical methods for constructing disease comorbidity networks from longitudinal inpatient data. Applied Network Science 3, 1–34, https://doi.org/10.1007/s41109-018-0101-4 (2018).
Bland, J. & Altman, D. The odds ratio. BMJ 320, 1468, https://doi.org/10.1136/bmj.320.7247.1468 (2000).
Hidalgo, C., Blumm, N., Barabási, A. & Christakis, N. A dynamic network approach for the study of human phenotypes. PLoS Computational Biology 5, e1000353, https://doi.org/10.1371/journal.pcbi.1000353 (2009).
Folino, F., Pizzuti, C. & Ventura, M. A comorbidity network approach to predict disease risk. International Conference On Information Technology In Bio-and Medical Informatics. pp. 102-109 https://doi.org/10.1007/978-3-642-15020-3_10 (2010).
Kuritz, S. A general overview of Mantel-Haenszel methods: applications and recent developments. Annu Rev Public Health. 9, 123–160, https://doi.org/10.1146/annurev.pu.09.050188.001011 (1988).
Dervic, E., Ledebur, K., Thurner, S. & Klimek, P. Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9 M Hospital Patients (1997–2014). figshare https://doi.org/10.6084/m9.figshare.27102553 (2024).
Strauss, M., Niederkrotenthaler, T., Thurner, S., Kautzky-Willer, A. & Klimek, P. Data-driven identification of complex disease phenotypes. Journal Of The Royal Society Interface 18, 20201040, https://doi.org/10.1098/rsif.2020.1040 (2021).
Dervic, E. et al. Unraveling cradle-to-grave disease trajectories from multilayer comorbidity networks. Npj Digital Medicine 7, 56, https://doi.org/10.1038/s41746-024-01015-w (2024).
Leistungsorientierte Krankenanstaltenfinanzierung (LKF) - German, from https://www.sozialministerium.at/Themen/Gesundheit/Gesundheitssystem/Krankenanstalten/Leistungsorientierte-Krankenanstaltenfinanzierung-(LKF).html (2024).
Kobel, C. & Pfeiffer, K. Austria: Inpatient care and the LKF framework. Diagnosis-Related Groups In Europe: Moving Towards Transparency, Efficiency And Quality In Hospitals. pp. 175-196, https://doi.org/10.1136/bmj.f3197 (2011).
Deischinger, C. et al. Diabetes mellitus is associated with a higher risk for major depressive disorder in women than in men. BMJ Open Diabetes Research And Care 8, e001430, https://doi.org/10.1136/bmjdrc-2020-001430 (2020).
Deischinger, C., Dervic, E., Kaleta, M., Klimek, P. & Kautzky-Willer, A. Diabetes mellitus is associated with a higher relative risk for Parkinson’s disease in women than in men. Journal Of Parkinson’s Disease 11, 793–800, https://doi.org/10.3233/JPD-202486 (2021).
Deischinger, C. et al. Diabetes mellitus is associated with a higher relative risk for venous thromboembolism in females than in males. Diabetes Research And Clinical Practice. 194, 110190, https://doi.org/10.1016/j.diabres.2022.110190 (2022).
Dervic, E. et al. The Effect of Cardiovascular Comorbidities on Women Compared to Men: Longitudinal Retrospective Analysis. JMIR Cardio 5, e28015, https://doi.org/10.2196/28015 (2021).
Leutner, M. et al. Risk of Typical Diabetes-Associated Complications in Different Clusters of Diabetic Patients: Analysis of Nine Risk Factors. Journal Of Personalized Medicine 11, 328, https://doi.org/10.3390/jpm11050328 (2021).
Koller, D. et al. Multimorbidity and long-term care dependency—a five-year follow-up. BMC Geriatrics 14, 1–9, https://doi.org/10.1186/1471-2318-14-70 (2014).
Author information
Contributions
E.D., K.L., S.T. and P.K. conceived the study; E.D. carried out the analysis, produced the plots and graphics, and drafted the manuscript; E.D., K.L., S.T. and P.K. analyzed the results. All authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dervić, E., Ledebur, K., Thurner, S. et al. Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9M Hospital Patients (1997–2014). Sci Data 12, 215 (2025). https://doi.org/10.1038/s41597-025-04508-9
DOI: https://doi.org/10.1038/s41597-025-04508-9