research-article

A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus

Published: 01 February 2022

Abstract

In recent years, the prevalence of chronic diseases such as type 2 diabetes mellitus (T2DM) has increased, placing a heavy burden on healthcare systems. Because regular monitoring of patients is expensive and impractical, understanding chronic disease progression and identifying patients at risk of developing comorbidities are crucial. This research uses a real-world administrative claims dataset of T2DM to develop a disease prediction approach that combines a novel patient network with machine learning. The healthcare data of 1,028 T2DM patients and 1,028 non-T2DM patients are extracted from the de-identified dataset to predict the risk of T2DM. The proposed model is based on the ‘patient network’, which uses graph theory to represent the underlying relationships among health conditions for a group of patients diagnosed with the same disease. In addition to patients’ socio-demographic and behavioural characteristics, the attributes of the ‘patient network’ (e.g., centrality measures) reveal latent patient features that are effective in risk prediction. We apply eight machine learning models (Logistic Regression, K-Nearest Neighbours, Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, XGBoost and Artificial Neural Network) to the extracted features to predict the chronic disease risk. Extensive experiments show that the machine learning classifiers in the proposed framework achieve an Area Under Curve (AUC) ranging from 0.79 to 0.91. The Random Forest model outperformed the other models, and the eigenvector centrality and closeness centrality of the network, together with patient age, are the most important features for the model. This performance suggests promising applications in healthcare services. We also provide strong evidence that the extracted latent features are essential in disease risk prediction. The proposed approach offers insight into chronic disease risk prediction that could benefit healthcare service providers and their stakeholders.
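To make the idea of network-derived latent features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the network links two health conditions whenever they co-occur in the same patient, and that a patient's feature is the mean centrality of the conditions in that patient's record. The toy records, the function names, and the averaging rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy data: patient id -> set of recorded health conditions.
records = {
    "p1": {"hypertension", "obesity"},
    "p2": {"hypertension", "retinopathy"},
    "p3": {"hypertension", "obesity", "neuropathy"},
    "p4": {"retinopathy", "nephropathy"},
}


def build_network(records):
    """Link two conditions that co-occur in a patient; weight = number of such patients."""
    adj = defaultdict(lambda: defaultdict(int))
    for conditions in records.values():
        for a, b in combinations(sorted(conditions), 2):
            adj[a][b] += 1
            adj[b][a] += 1
    return {node: dict(neighbours) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """Unweighted closeness: reachable nodes divided by the sum of hop distances to them."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration on the weighted adjacency matrix, L2-normalised.

    Iterating with A + I instead of A keeps the same eigenvectors but
    avoids oscillation on bipartite-like graphs.
    """
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {n: score[n] + sum(w * score[m] for m, w in adj[n].items()) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values())) or 1.0
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in adj) < tol
        score = new
        if converged:
            break
    return score


def patient_features(records, **measures):
    """Average each centrality measure over the conditions in a patient's record (an assumption)."""
    features = {}
    for pid, conditions in records.items():
        row = {}
        for name, scores in measures.items():
            values = [scores[c] for c in conditions if c in scores]
            row[name] = sum(values) / len(values) if values else 0.0
        features[pid] = row
    return features


network = build_network(records)
closeness = closeness_centrality(network)
eigen = eigenvector_centrality(network)
features = patient_features(records, closeness=closeness, eigenvector=eigen)

for pid, row in sorted(features.items()):
    print(pid, row)
```

In a pipeline of the kind the abstract describes, columns like these would be joined with socio-demographic attributes such as age before fitting a classifier; how the original study aggregates network attributes per patient may differ from the simple mean used here.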



        Information & Contributors

        Information

        Published In

        Applied Intelligence, Volume 52, Issue 3
        Feb 2022
        1148 pages

        Publisher

        Kluwer Academic Publishers

        United States

        Publication History

        Published: 01 February 2022
        Accepted: 13 May 2021

        Author Tags

        1. Disease prediction
        2. Type 2 Diabetes
        3. Administrative data
        4. Network analysis
        5. Machine learning

        Qualifiers

        • Research-article

        Cited By

        • (2024) Predictive health monitoring. Computers in Biology and Medicine 174:C. doi:10.1016/j.compbiomed.2024.108469. Online publication date: 1-May-2024
        • (2024) Machine learning applications in preventive healthcare. Artificial Intelligence in Medicine 156:C. doi:10.1016/j.artmed.2024.102950. Online publication date: 1-Oct-2024
        • (2024) Ensemble learning-based early detection of influenza disease. Multimedia Tools and Applications 83:2 (5723-5743). doi:10.1007/s11042-023-15848-2. Online publication date: 1-Jan-2024
        • (2024) iDP: ML-driven diabetes prediction framework using deep-ensemble modeling. Neural Computing and Applications 36:5 (2525-2548). doi:10.1007/s00521-023-09184-7. Online publication date: 1-Feb-2024
        • (2023) Cluster-based Discovering of Disease Risk Factors: A COVID-19 Case Study. Proceedings of the 2023 4th International Symposium on Artificial Intelligence for Medicine Science (706-712). doi:10.1145/3644116.3644234. Online publication date: 20-Oct-2023
        • (2023) Integrating Cyber-Physical System with Federated-Edge Computing for Diabetes Detection and Management. Proceedings of the 2023 5th International Conference on Big-data Service and Intelligent Computation (16-22). doi:10.1145/3633624.3633627. Online publication date: 20-Oct-2023
        • (2023) HealthEdge. Procedia Computer Science 220:C (331-338). doi:10.1016/j.procs.2023.03.043. Online publication date: 10-May-2023
        • (2023) Optimal deep learning control for modernized microgrids. Applied Intelligence 53:12 (15638-15655). doi:10.1007/s10489-022-04298-2. Online publication date: 1-Jun-2023
        • (2023) KNN-Based Patient Network and Ensemble Machine Learning for Disease Prediction. Health Information Science (296-305). doi:10.1007/978-981-99-7108-4_25. Online publication date: 23-Oct-2023
        • (2022) Predictive risk modelling in mental health issues using machine learning on graphs. Proceedings of the 2022 Australasian Computer Science Week (168-175). doi:10.1145/3511616.3513112. Online publication date: 14-Feb-2022
