A Catalogue of Machine Learning Algorithms for Healthcare Risk Predictions †
List of Figures
Figure 1. Indicative example of BNB steps.
Figure 2. Indicative example of KNN steps.
Figure 3. Indicative example of DT steps.
Figure 4. Indicative example of RF steps.
Figure 5. Indicative example of LR steps.
Figure 6. Indicative example of MLP steps.
Figure 7. Indicative example of SGD steps.
Figure 8. Overall mechanism architecture.
Figure 9. Confusion matrix.
Figure 10. Example of stroke probability form.
Figure 11. Precision results of ML models for each use case.
Figure 12. Recall results of ML models for each use case.
Figure 13. F1-score results of ML models for each use case.
Figure 14. Specificity results of ML models for each use case.
Figure 15. Train–validation–test score for diabetes use case.
Figure 16. Confusion matrix of prediction results for diabetes use case.
Figure 17. Performance comparison in the diabetes use case.
Figure 18. Train–validation–test score for stroke use case.
Figure 19. Confusion matrix of prediction results for stroke use case.
Figure 20. Performance comparison in the stroke use case.
Figure 21. Train–validation–test score for heart failure use case.
Figure 22. Confusion matrix of prediction results for heart failure use case.
Figure 23. Performance comparison in the heart failure use case.
Figure 24. Train–validation–test score for COVID-19 use case.
Figure 25. Confusion matrix of prediction results for COVID-19 use case.
Figure 26. Performance comparison in the COVID-19 use case.
Figure 27. Train–validation–test score for breast cancer use case.
Figure 28. Confusion matrix of prediction results for breast cancer use case.
Figure 29. Performance comparison in the breast cancer use case.
Figure 30. Train–validation–test score for kidney disease use case.
Figure 31. Confusion matrix of prediction results for kidney disease use case.
Figure 32. Performance comparison in the kidney disease use case.
Figure 33. Training performance comparison for each algorithm per dataset.
Abstract
1. Introduction
2. Materials and Methods
2.1. Machine Learning Principles and Algorithms
2.1.1. Bernoulli Naïve Bayes (BNB)
2.1.2. K-Nearest Neighbors (KNN)
2.1.3. Decision Tree (DT)
2.1.4. Random Forest (RF)
2.1.5. Logistic Regression (LR)
2.1.6. Neural Networks (NN)
2.1.7. Stochastic Gradient Descent (SGD)
2.2. Proposed Machine Learning Approach
3. Results
3.1. Datasets Description
3.2. Evaluation Environment
3.3. Evaluation Results
3.3.1. Diabetes Use Case
3.3.2. Stroke Use Case
3.3.3. Heart Failure Use Case
3.3.4. COVID-19 Use Case
3.3.5. Breast Cancer Use Case
3.3.6. Kidney Disease Use Case
3.3.7. Training Performance
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Power, D.J.; Sharda, R.; Burstein, F. Decision Support Systems; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015. [Google Scholar]
- Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Pan, L.; Liu, G.; Lin, F.; Zhong, S.; Xia, H.; Sun, X.; Liang, H. Machine learning applications for prediction of relapse in childhood acute lymphoblastic leukemia. Sci. Rep. 2017, 7, 7402. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A review of machine learning and IoT in smart transportation. Future Internet 2019, 11, 94. [Google Scholar] [CrossRef] [Green Version]
- Dixon, M.F.; Halperin, I.; Bilokon, P. Machine Learning in Finance; Springer: New York, NY, USA, 2020; Volume 1406. [Google Scholar]
- Luan, H.; Tsai, C.C. A review of using machine learning approaches for precision education. Educ. Technol. Soc. 2021, 24, 250–266. [Google Scholar]
- Ullah, Z.; Al-Turjman, F.; Mostarda, L.; Gagliardi, R. Applications of artificial intelligence and machine learning in smart cities. Comput. Commun. 2020, 154, 313–323. [Google Scholar] [CrossRef]
- Assaf, D.; Gutman, Y.A.; Neuman, Y.; Segal, G.; Amit, S.; Gefen-Halevi, S.; Shilo, N.; Epstein, A.; Mor-Cohen, R.; Biber, A.; et al. Utilization of machine-learning models to accurately predict the risk for critical COVID-19. Intern. Emerg. Med. 2020, 15, 1435–1443. [Google Scholar] [CrossRef]
- Yu, J.; Park, S.; Kwon, S.H.; Ho, C.M.B.; Pyo, C.S.; Lee, H. AI-based stroke disease prediction system using real-time electromyography signals. Appl. Sci. 2020, 10, 6791. [Google Scholar] [CrossRef]
- Lisboa, P.J.; Taktak, A.F.G. The use of artificial neural networks in decision support in cancer: A systematic review. Neural Netw. 2006, 19, 408–415. [Google Scholar] [CrossRef]
- Esteban, C.; Arostegui, I.; Moraza, J.; Aburto, M.; Quintana, J.M.; Perez-Izquierdo, J.; Aizpiri, S.; Capelastegui, A. Development of a decision tree to assess the severity and prognosis of stable COPD. Eur. Respir. J. 2011, 38, 1294–1300. [Google Scholar] [CrossRef] [Green Version]
- Verduijn, M.; Peek, N.; Rosseel, P.M.J.; de Jonge, E.; de Mol, B.A.J.M. Prognostic Bayesian networks I: Rationale, learning procedure, and clinical use. J. Biomed. Inform. 2007, 40, 609–618. [Google Scholar] [CrossRef] [Green Version]
- Barakat, N.H.; Bradley, A.P.; Barakat, M.N.H. Intelligible support vector machines for diagnosis of diabetes mellitus. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 1114–1120. [Google Scholar] [CrossRef] [PubMed]
- Jiang, F.; Jiang, Y.; Zhi, H.; Dong, Y.; Li, H.; Ma, S.; Wang, Y.; Dong, Q.; Shen, H.; Wang, Y. Artificial intelligence in healthcare: Past, present and future. Stroke Vasc. Neurol. 2017, 2, 230–243. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Tran, B.X.; Vu, G.T.; Ha, G.H.; Vuong, Q.H.; Ho, M.T.; Vuong, T.T.; La, V.P.; Ho, M.T.; Nghiem, K.P.; Nguyen, H.L.T.; et al. Global evolution of research in artificial intelligence in health and medicine: A bibliometric study. J. Clin. Med. 2019, 8, 60. [Google Scholar] [CrossRef] [Green Version]
- Ferdous, M.; Debnath, J.; Chakraborty, N.R. Machine learning algorithms in healthcare: A literature survey. In Proceedings of the 2020 11th International conference on computing, communication and networking technologies (ICCCNT), Kharagpur, India, 1–3 July 2020. [Google Scholar]
- Arora, Y.K.; Tandon, A.; Nijhawan, R. Hybrid computational intelligence technique: Eczema detection. In Proceedings of the TENCON 2019-2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17–20 October 2019; pp. 2472–2474. [Google Scholar]
- Tong, Y.; Messinger, A.I.; Wilcox, A.B.; Mooney, S.D.; Davidson, G.H.; Suri, P.; Luo, G. Forecasting future asthma hospital encounters of patients with asthma in an academic health care system: Predictive model development and secondary analysis study. J. Med. Internet Res. 2021, 23, e22796. [Google Scholar] [CrossRef]
- Wang, L.; Wang, X.; Chen, A.; Jin, X.; Che, H. Prediction of type 2 diabetes risk and its effect evaluation based on the XGBoost model. Healthcare 2020, 8, 247. [Google Scholar] [CrossRef] [PubMed]
- Lin, H.R.; Fujiwara, K.; Sasaki, M.; Ishiyama, K.; Ikeda-Sonoda, S.; Takahashi, A.; Miyata, H. Development and validation of gradient boosting decision tree models for predicting care needs using a long-term care database in Japan. medRxiv 2021. [Google Scholar] [CrossRef]
- Garg, V.; Jaiswal, A. A Review on Parkinson’s Disease Prediction using Machine Learning. Int. J. Eng. Res. Technol. 2021, 9, 330–334. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Ravi, C.; Manoharan, R. Malware detection using windows api sequence and machine learning. Int. J. Comput. Appl. 2012, 43, 12–16. [Google Scholar] [CrossRef]
- Vembandasamy, K.; Sasipriya, R.; Deepa, E. Heart diseases detection using Naive Bayes algorithm. Int. J. Innov. Sci. Eng. Technol. 2015, 2, 441–444. [Google Scholar]
- Bahramirad, S.; Mustapha, A.; Eshraghi, M. Classification of liver disease diagnosis: A comparative study. In Proceedings of the 2013 Second International Conference on Informatics & Applications (ICIA), Lodz, Poland, 23–25 September 2013; pp. 42–46. [Google Scholar]
- Marucci-Wellman, H.R.; Lehto, M.R.; Corns, H.L. A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms. Accid. Anal. Prev. 2015, 84, 165–176. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Liu, X.; Lu, R.; Ma, J.; Chen, L.; Qin, B. Privacy-preserving patient-centric clinical decision support system on naive Bayesian classification. IEEE J. Biomed. Health Inform. 2015, 20, 655–668. [Google Scholar] [CrossRef] [PubMed]
- Al-Aidaroos, K.M.; Bakar, A.A.; Othman, Z. Medical data classification with Naive Bayes approach. Inf. Technol. J. 2012, 11, 1166. [Google Scholar] [CrossRef] [Green Version]
- Mahima, S.; Mathu, T.; Raimond, K. COVID-19 Symptom Analysis and Prediction Using Machine Learning Techniques. In Disruptive Technologies for Big Data and Cloud Applications; Springer: Singapore, 2022; pp. 847–857. [Google Scholar]
- Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256. [Google Scholar] [CrossRef] [PubMed]
- Islam, R.; Debnath, S.; Palash, T.I. Predictive Analysis for Risk of Stroke Using Machine Learning Techniques. In Proceedings of the 2021 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 26–27 December 2021. [Google Scholar]
- Romadhon, M.R.; Kurniawan, F. A comparison of naive Bayes methods, logistic regression and KNN for predicting healing of Covid-19 patients in Indonesia. In Proceedings of the 2021 3rd East Indonesia Conference on Computer and Information Technology (EICONCIT), Surabaya, Indonesia, 9–11 April 2021. [Google Scholar]
- Zamiri, M.; Ferreira, J.; Sarraipa, J.; Sassanelli, C.; Gusmeroli, S.; Jardim-Goncalves, R. Towards a conceptual framework for developing sustainable digital innovation hubs. In Proceedings of the 2021 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC), Cardiff, UK, 21–23 June 2021. [Google Scholar]
- Devika, R.; Avilala, S.V.; Subramaniyaswamy, V. Comparative study of classifier for chronic kidney disease prediction using naive bayes, KNN and random forest. In Proceedings of the 2019 3rd International conference on computing methodologies and communication (ICCMC), Erode, India, 27–29 March 2019. [Google Scholar]
- Assegie, T.A.; Sushma, S.J.; Bhavya, B.G.; Padmashree, S. Correlation analysis for determining effective data in machine learning: Detection of heart failure. SN Comput. Sci. 2021, 2, 213. [Google Scholar] [CrossRef]
- Rajani Kumari, L.V.; Padma Sai, Y. Classification of arrhythmia beats using optimized K-nearest neighbor classifier. In Intelligent Systems; Springer: Singapore, 2021; pp. 349–359. [Google Scholar]
- Khateeb, N.; Usman, M. Efficient heart disease prediction system using K-nearest neighbor classification technique. In Proceedings of the International Conference on Big Data and Internet of Thing, London, UK, 20–22 December 2017; pp. 21–26. [Google Scholar]
- Chandel, K.; Kunwar, V.; Sabitha, S.; Choudhury, T.; Mukherjee, S. A comparative study on thyroid disease detection using K-nearest neighbor and Naive Bayes classification techniques. CSI Trans. ICT 2016, 4, 313–319. [Google Scholar] [CrossRef]
- Ahmad, P.; Qamar, S.; Rizvi, S.Q.A. Techniques of data mining in healthcare: A review. Int. J. Comput. Appl. 2015, 120, 38–50. [Google Scholar] [CrossRef]
- Lin, L.; Wu, Y.; Ye, M. Experimental Comparisons of Multi-class Classifiers. Informatica 2015, 39, 71–85. [Google Scholar]
- Vaghela, C.; Bhatt, N.; Mistry, D. A Survey on Various Classification Techniques for Clinical Decision Support System. Int. J. Comput. Appl. 2015, 116, 11–17. [Google Scholar] [CrossRef]
- Biswas, N.; Uddin, K.M.M.; Rikta, S.T.; Dey, S.K. A comparative analysis of machine learning classifiers for stroke prediction: A predictive analytics approach. Healthc. Anal. 2022, 2, 100116. [Google Scholar] [CrossRef]
- Elhazmi, A.; Al-Omari, A.; Sallam, H.; Mufti, H.N.; Rabie, A.A.; Alshahrani, M.; Arabi, Y.M. Machine learning decision tree algorithm role for predicting mortality in critically ill adult COVID-19 patients admitted to the ICU. J. Infect. Public Health 2022, 15, 826–834. [Google Scholar] [CrossRef] [PubMed]
- Singh, A.; Dhillon, A.; Kumar, N.; Hossain, M.S.; Muhammad, G.; Kumar, M. eDiaPredict: An Ensemble-based framework for diabetes prediction. ACM Trans. Multimid. Comput. Commun. Appl. 2021, 17, 1–26. [Google Scholar] [CrossRef]
- Naji, M.A.; El Filali, S.; Aarika, K.; Benlahmar, E.H.; Abdelouhahid, R.A.; Debauche, O. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Comput. Sci. 2021, 191, 487–492. [Google Scholar] [CrossRef]
- Senan, E.M.; Al-Adhaileh, M.H.; Alsaade, F.W.; Aldhyani, T.H.; Alqarni, A.A.; Alsharif, N.; Alzahrani, M.Y. Diagnosis of chronic kidney disease using effective classification algorithms and recursive feature elimination techniques. J. Healthc. Eng. 2021, 2021, 1004767. [Google Scholar] [CrossRef]
- Arumugam, K.; Naved, M.; Shinde, P.P.; Leiva-Chauca, O.; Huaman-Osorio, A.; Gonzales-Yanac, T. Multiple disease prediction using Machine learning algorithms. Mater. Today Proc. 2021. [Google Scholar] [CrossRef]
- Shaik, A.B.; Srinivasan, S. A brief survey on random forest ensembles in classification model. In Proceedings of the International Conference on Innovative Computing and Communications, Ostrava, Czech Republic, 21–22 March 2019; pp. 253–260. [Google Scholar]
- Fernandez-Lozano, C.; Hervella, P.; Mato-Abad, V.; Rodríguez-Yáñez, M.; Suárez-Garaboa, S.; López-Dequidt, I.; Estany-Gestal, A.; Sobrino, T.; Campos, F.; Castillo, J. Random forest-based prediction of stroke outcome. Sci. Rep. 2021, 11, 10071. [Google Scholar] [CrossRef]
- Khan, I.U.; Aslam, N.; Aljabri, M.; Aljameel, S.S.; Kamaleldin, M.M.A.; Alshamrani, F.M.; Chrouf, S.M.B. Computational intelligence-based model for mortality rate prediction in COVID-19 patients. Int. J. Environ. Res. Public Health 2021, 18, 6429. [Google Scholar] [CrossRef]
- Sivaranjani, S.; Ananya, S.; Aravinth, J.; Karthika, R. Diabetes prediction using machine learning algorithms with feature selection and dimensionality reduction. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021. [Google Scholar]
- Williamson, S.; Vijayakumar, K.; Kadam, V.J. Predicting breast cancer biopsy outcomes from BI-RADS findings using random forests with chi-square and MI features. Multimed. Tools Appl. 2022, 81, 36869–36889. [Google Scholar] [CrossRef]
- Lee, C.L.; Liu, W.J.; Tsai, S.F. Development and validation of an insulin resistance model for a population with chronic kidney disease using a machine learning approach. Nutrients 2022, 14, 2832. [Google Scholar] [CrossRef]
- Ishaq, A.; Sadiq, S.; Umer, M.; Ullah, S.; Mirjalili, S.; Rupapara, V.; Nappi, M. Improving the prediction of heart failure patients’ survival using SMOTE and effective data mining techniques. IEEE Access 2021, 9, 39707–39716. [Google Scholar] [CrossRef]
- Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009; p. 128. [Google Scholar]
- Choi, Y.; Boo, Y. Comparing logistic regression models with alternative machine learning methods to predict the risk of drug intoxication mortality. Int. J. Environ. Res. Public Health 2020, 17, 897. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Rustam, Z.; Zhafarina, F.; Saragih, G.S.; Hartini, S. Pancreatic cancer classification using logistic regression and random forest. IAES Int. J. Artif. Intell. 2021, 10, 476. [Google Scholar] [CrossRef]
- Selim, A.; Kandeel, S.; Alshaya, D.S.; Attia, K.A.; AlKahtani, M.D.; Albohairy, F.M.; Megahed, A. A Comparison of logistic regression and classification tree to assess brucellosis associated risk factors in dairy cattle. Prev. Vet. Med. 2022, 203, 105664. [Google Scholar]
- Kim, J.K.; Choo, Y.J.; Chang, M.C. Prediction of motor function in stroke patients using machine learning algorithm: Development of practical models. J. Stroke Cerebrovasc. Dis. 2021, 30, 105856. [Google Scholar] [CrossRef] [PubMed]
- Khanam, J.J.; Foo, S.Y. A comparison of machine learning algorithms for diabetes prediction. ICT Express 2021, 7, 432–439. [Google Scholar] [CrossRef]
- Chittora, P.; Chaurasia, S.; Chakrabarti, P.; Kumawat, G.; Chakrabarti, T.; Leonowicz, Z.; Bolshev, V. Prediction of chronic kidney disease-a machine learning perspective. IEEE Access 2021, 9, 17312–17334. [Google Scholar] [CrossRef]
- Du, K.L.; Swamy, M.N. Neural Networks and Statistical Learning; Springer Science & Business Media: London, UK, 2013. [Google Scholar]
- Taud, H.; Mas, J.F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Cham, Switzerland, 2018; pp. 451–455. [Google Scholar]
- Raad, A.; Kalakech, A.; Ayache, M. Breast cancer classification using neural network approach: MLP and RBF. Ali Mohsen Kabalan 2012, 7, 105. [Google Scholar]
- Savalia, S.; Emamian, V. Cardiac arrhythmia classification by multi-layer perceptron and convolution neural networks. Bioengineering 2018, 5, 35. [Google Scholar] [CrossRef] [Green Version]
- Li, X.D.; Wang, J.S.; Hao, W.K.; Wang, M.; Zhang, M. Multi-layer perceptron classification method of medical data based on biogeography-based optimization algorithm with probability distributions. Appl. Soft Comput. 2022, 121, 108766. [Google Scholar] [CrossRef]
- Xie, Y.; Yang, H.; Yuan, X.; He, Q.; Zhang, R.; Zhu, Q.; Yan, C. Stroke prediction from electrocardiograms by deep neural network. Multimed. Tools Appl. 2021, 80, 17291–17297. [Google Scholar] [CrossRef]
- Namasudra, S.; Dhamodharavadhani, S.; Rathipriya, R. Nonlinear neural network based forecasting model for predicting COVID-19 cases. Neural Process. Lett. 2021, 1–21. Available online: https://link.springer.com/article/10.1007/s11063-021-10495-w (accessed on 2 November 2022). [CrossRef] [PubMed]
- Bukhari, M.M.; Alkhamees, B.F.; Hussain, S.; Gumaei, A.; Assiri, A.; Ullah, S.S. An improved artificial neural network model for effective diabetes prediction. Complexity 2021, 2021, 5525271. [Google Scholar] [CrossRef]
- Desai, M.; Shah, M. An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and Convolutional neural network (CNN). Clin. e-Health 2021, 4, 1–11. [Google Scholar] [CrossRef]
- Bottou, L.; Bousquet, O. The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. 2007, 20, 1–17. [Google Scholar]
- Ketkar, N. Stochastic gradient descent. In Deep Learning with Python; Manning Publications: Shelter Island, NY, USA, 2017; pp. 113–132. [Google Scholar]
- Langer, D.L.; Van der Kwast, T.H.; Evans, A.J.; Trachtenberg, J.; Wilson, B.C.; Haider, M.A. Prostate cancer detection with multi-parametric MRI: Logistic regression analysis of quantitative T2, diffusion-weighted imaging, and dynamic contrast-enhanced MRI. J. Magn. Reson. Imaging Off. J. Int. Soc. Magn. Reson. Med. 2009, 30, 327–334. [Google Scholar] [CrossRef]
- Devaki, A.; Rao, C.G. An Ensemble Framework for Improving Brain Stroke Prediction Performance. In Proceedings of the 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 16–18 February 2022. [Google Scholar]
- Ali, H.A.; Hariri, W.; Zghal, N.S.; Aissa, D.B. A Comparison of Machine Learning Methods for best Accuracy COVID-19 Diagnosis Using Chest X-ray Images. In Proceedings of the 2022 IEEE 9th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Hammamet, Tunisia, 28–30 May 2022. [Google Scholar]
- Mutlu, G.; Acı, Ç.İ. SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent. Parallel Comput. 2022, 113, 102955. [Google Scholar] [CrossRef]
- Nanglia, S.; Ahmad, M.; Khan, F.A.; Jhanjhi, N.Z. An enhanced Predictive heterogeneous ensemble model for breast cancer prediction. Biomed. Signal Process. Control 2022, 72, 103279. [Google Scholar] [CrossRef]
- Emon, M.U.; Islam, R.; Keya, M.S.; Zannat, R. Performance Analysis of Chronic Kidney Disease through Machine Learning Approaches. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021. [Google Scholar]
- Mavrogiorgou, A.; Kleftakis, S.; Mavrogiorgos, K.; Zafeiropoulos, N.; Menychtas, A.; Kiourtis, A.; Maglogiannis, I.; Kyriazis, D. beHEALTHIER: A microservices platform for analyzing and exploiting healthcare data. In Proceedings of the 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), Aveiro, Portugal, 7–9 June 2021; pp. 283–288. [Google Scholar]
- Perakis, K.; Miltiadou, D.; De Nigro, A.; Torelli, F.; Montandon, L.; Magdalinou, A.; Mavrogiorgou, A.; Kyriazis, D. Data Sources and Gateways: Design and Open Specification. Acta Inform. Med. 2019, 27, 341. [Google Scholar] [CrossRef]
- Mavrogiorgou, A.; Kiourtis, A.; Kyriazis, D. A plug ‘n’play approach for dynamic data acquisition from heterogeneous IoT medical devices of unknown nature. Evol. Syst. 2020, 11, 269–289. [Google Scholar] [CrossRef]
- Jalal, A.A.; Jasim, A.A.; Mahawish, A.A. A web content mining application for detecting relevant pages using Jaccard similarity. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 6461–6471. [Google Scholar] [CrossRef]
- Henderi, H.; Winarno, W. Text Mining an Automatic Short Answer Grading (ASAG), Comparison of Three Methods of Cosine Similarity, Jaccard Similarity and Dice’s Coefficient. J. Appl. Data Sci. 2021, 2, 45–54. [Google Scholar]
- Ormerod, M.; Del Rincón, J.M.; Devereux, B. Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis. JMIR Med. Inform. 2021, 9, e23099. [Google Scholar] [CrossRef] [PubMed]
- Mavrogiorgos, K.; Mavrogiorgou, A.; Kiourtis, A.; Kleftakis, S.; Zafeiropoulos, N.; Kyriazis, D. Automated Rule-Based Data Cleaning Using NLP. In Proceedings of the 32nd Conference of Open Innovations Association FRUCT (FRUCT), Tampere, Finland, 9–11 November 2022. [Google Scholar]
- Elhassan, A.; Abu-Soud, S.M.; Alghanim, F.; Salameh, W. ILA4: Overcoming missing values in machine learning datasets–An inductive learning approach. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 4284–4295. [Google Scholar] [CrossRef]
- Morgenthaler, S. Exploratory data analysis. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 33–44. [Google Scholar] [CrossRef]
- Probst, P.; Bischl, B.; Boulesteix, A.-L. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. arXiv 2018, arXiv:1802.09596. [Google Scholar]
- Weka 3: Machine Learning Software in Java. Available online: https://www.cs.waikato.ac.nz/ml/weka/ (accessed on 2 November 2022).
- Singh, V.K.; Joshi, K. Automated Machine Learning (AutoML): An overview of opportunities for application and research. J. Inf. Technol. Case Appl. Res. 2022, 24, 75–85. [Google Scholar] [CrossRef]
- Kaggle. Diabetes Dataset. Available online: https://www.kaggle.com/smit1212/diabetic-data-cleaning (accessed on 2 November 2022).
- Kaggle. Stroke Dataset. Available online: https://www.kaggle.com/fedesoriano/stroke-prediction-dataset (accessed on 2 November 2022).
- Kaggle. Heart Failure Dataset. Available online: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data (accessed on 2 November 2022).
- GitHub. COVID 19 Dataset. Available online: https://github.com/burakalakuss/COVID-19-Clinical/tree/master/Clinical%20Data (accessed on 2 November 2022).
- Kaggle. Breast Cancer Dataset. Available online: https://www.kaggle.com/code/buddhiniw/breast-cancer-prediction/data (accessed on 2 November 2022).
- Kaggle. Kidney Disease Dataset. Available online: https://www.kaggle.com/mansoordaku/ckdisease (accessed on 2 November 2022).
- Bytyçi, I.; Bajraktari, G. Mortality in heart failure patients. Anatol. J. Cardiol. 2015, 15, 63–68. [Google Scholar] [CrossRef] [Green Version]
- World Health Organization (WHO). Noncommunicable Diseases. Available online: https://www.who.int/news-room/fact-sheets/detail/noncommunicable-diseases (accessed on 2 November 2022).
- JMeter. Available online: https://jmeter.apache.org/ (accessed on 2 November 2022).
- Bisong, E. Batch vs. Online Learning. In Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress: Berkeley, CA, USA, 2019; pp. 199–201. [Google Scholar]
- Qayyum, A.; Qadir, J.; Bilal, M.; Al-Fuqaha, A. Secure and robust machine learning for healthcare: A survey. IEEE Rev. Biomed. Eng. 2020, 14, 156–180. [Google Scholar] [CrossRef]
- Abdullah, T.A.; Zahid, M.S.M.; Ali, W. A review of interpretable ml in healthcare: Taxonomy, applications, challenges, and future directions. Symmetry 2021, 13, 2439. [Google Scholar] [CrossRef]
- Karthik, S.; Bhadoria, R.S.; Lee, J.G.; Sivaraman, A.K.; Samanta, S.; Balasundaram, A.; Ashokkumar, S. Prognostic Kalman Filter Based Bayesian Learning Model for Data Accuracy Prediction. Comput. Mater. Contin. 2022, 72, 243–259. [Google Scholar] [CrossRef]
- Mogaveera, D.; Mathur, V.; Waghela, S. e-Health Monitoring System with Diet and Fitness Recommendation using Machine Learning. In Proceedings of the 6th International Conference on Inventive Computation Technologies, Coimbatore, India, 20–22 January 2021. [Google Scholar]
- Wu, Y.; Zhang, Q.; Hu, Y.; Sun-Woo, K.; Zhang, X.; Zhu, H.; Li, S. Novel binary logistic regression model based on feature transformation of XGBoost for type 2 Diabetes Mellitus prediction in healthcare systems. Future Gener. Comput. Syst. 2022, 129, 1–12. [Google Scholar] [CrossRef]
- Xing, Y.; Wang, J.; Zhao, Z.; Gao, A. Combination Data Mining Methods with New Medical Data to Predicting Outcome of Coronary Heart Disease. In Proceedings of the 2007 International Conference on Convergence Information Technology (ICCIT 2007), Gwangju, Korea, 21–23 November 2007. [Google Scholar]
- Oza, A.; Bokhare, A. Diabetes Prediction Using Logistic Regression and K-Nearest Neighbor. In Proceedings of the Congress on Intelligent Systems, Bengaluru, India, 4–5 September 2021. [Google Scholar]
- Palimkar, P.; Shaw, R.N.; Ghosh, A. Machine learning technique to prognosis diabetes disease: Random forest classifier approach. In Advanced Computing and Intelligent Technologies; Springer: Singapore, 2022; pp. 219–244. [Google Scholar]
- Komal Kumar, N.; Vigneswari, D.; Vamsi Krishna, M.; Phanindra Reddy, G.V. An optimized random forest classifier for diabetes mellitus. In Emerging Technologies in Data Mining and Information Security; Springer: Singapore, 2019; pp. 765–773. [Google Scholar]
- Ahmad, M.A.; Eckert, C.; Teredesai, A. Interpretable machine learning in healthcare. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Washington, DC, USA, 29 August–1 September 2018. [Google Scholar]
- Ho, T.T.; Tran, K.D.; Huang, Y. FedSGDCOVID: Federated SGD COVID-19 Detection under Local Differential Privacy Using Chest X-ray Images and Symptom Information. Sensors 2022, 22, 3728. [Google Scholar] [CrossRef] [PubMed]
- Oyelade, O.N.; Ezugwu, A.E.S.; Chiroma, H. CovFrameNet: An enhanced deep learning framework for COVID-19 detection. IEEE Access 2021, 9, 77905–77919. [Google Scholar] [CrossRef]
- Hassan Yaseen, H.; Alibraheemi, K.H. Classification Covid-19 disease based on CNN and Hybrid Models. NeuroQuantology 2022, 20, 8039–8054. [Google Scholar]
- Shaban, W.M.; Rabie, A.H.; Saleh, A.I.; Abo-Elsoud, M.A. A new COVID-19 Patients Detection Strategy (CPDS) based on hybrid feature selection and enhanced KNN classifier. Knowl. Based Syst. 2020, 205, 106270. [Google Scholar] [CrossRef] [PubMed]
- Yoo, S.H.; Geng, H.; Chiu, T.L.; Yu, S.K.; Cho, D.C.; Heo, J.; Choi, M.S.; Choi, H.I.; Van, C.C.; Nhung, N.V. Deep learning-based decision-tree classifier for COVID-19 diagnosis from chest X-ray imaging. Front. Med. 2020, 7, 427. [Google Scholar] [CrossRef] [PubMed]
- Akbulut, A.; Ertugrul, E.; Topcu, V. Fetal health status prediction based on maternal clinical history using machine learning techniques. Comput. Methods Programs Biomed. 2018, 163, 87–100. [Google Scholar] [CrossRef]
- Peter, T.J.; Somasundaram, K. An empirical study on prediction of heart disease using classification data mining techniques. In Proceedings of the IEEE-International Conference On Advances In Engineering, Science And Management (ICAESM-2012), Nagapattinam, India, 30–31 March 2012. [Google Scholar]
- Morgenstern, J.D.; Rosella, L.C.; Costa, A.P.; Anderson, L.N. Development of machine learning prediction models to explore nutrients predictive of cardiovascular disease using Canadian linked population-based data. Appl. Physiol. Nutr. Metab. 2022, 47, 529–546. [Google Scholar] [CrossRef]
- Qian, X.; Li, Y.; Zhang, X.; Guo, H.; He, J.; Wang, X.; Yan, Y.; Ma, J.; Ma, R.; Guo, S. A Cardiovascular Disease Prediction Model Based on Routine Physical Examination Indicators Using Machine Learning Methods: A Cohort Study. Front. Cardiovasc. Med. 2022, 9, 854287. [Google Scholar] [CrossRef]
- Çinar, A.; Tuncer, S.A. Classification of normal sinus rhythm, abnormal arrhythmia and congestive heart failure ECG signals using LSTM and hybrid CNN-SVM deep neural networks. Comput. Methods Biomech. Biomed. Eng. 2021, 24, 203–214. [Google Scholar] [CrossRef]
- Ponciano-Rodríguez, G.; Reynales-Shigematsu, L.M.; Rodríguez-Bolaños, R.; Pruñonosa-Santana, J.; Cartujano-Barrera, F.; Cupertino, A.P. Enhancing smoking cessation in Mexico using an e-Health tool in primary healthcare. Salud Pública México 2019, 60, 549–558. [Google Scholar] [CrossRef] [PubMed]
- Santos, L.I.; Osorio Camargos, M.; Silveira VasconcelosD’Angelo, M.F.; Batista Mendes, J.; de Medeiros, E.E.C.; Guimarães, A.L.S.; MartínezPalhares, R. Decision tree and artificial immune systems for stroke prediction in imbalanced data. Expert Syst. Appl. 2022, 191, 116221. [Google Scholar] [CrossRef]
- Dev, S.; Wang, H.; Nwosu, C.S.; Jain, N.; Veeravalli, B.; John, D. A predictive analytics approach for stroke prediction using machine learning and neural networks. Healthc. Anal. 2022, 2, 100032. [Google Scholar] [CrossRef]
- Paikaray, D.; Mehta, A.K. An extensive approach towards heart stroke prediction using machine learning with ensemble classifier. In Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences, Jaipur, India, 5–7 July 2022. [Google Scholar]
- Iosa, M.; Morone, G.; Antonucci, G.; Paolucci, S. Prognostic factors in neurorehabilitation of stroke: A comparison among regression, neural network, and cluster analyses. Brain Sci. 2021, 11, 1147. [Google Scholar] [CrossRef]
- Pal, S. Chronic Kidney Disease Prediction Using Machine Learning Techniques. Biomed. Mater. Devices 2022, 1–7. [Google Scholar] [CrossRef]
- Revathy, S.; Bharathi, B.; Jeyanthi, P.; Ramesh, M. Chronic kidney disease prediction using machine learning models. Int. J. Eng. Adv. Technol. 2019, 9, 6364–6367. [Google Scholar] [CrossRef]
- Sinha, P.; Sinha, P. Comparative study of chronic kidney disease prediction using KNN and SVM. Int. J. Eng. Res. Technol. 2015, 4, 608–612. [Google Scholar]
- Almustafa, K.M. Prediction of chronic kidney disease using different classification algorithms. Inform. Med. Unlocked 2021, 24, 100631. [Google Scholar] [CrossRef]
- Singh, V.; Asari, V.K.; Rajasekaran, R. A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease. Diagnostics 2022, 12, 116. [Google Scholar] [CrossRef]
- Kim, D.H.; Ye, S.Y. Classification of chronic kidney disease in sonography using the GLCM and artificial neural network. Diagnostics 2021, 11, 864. [Google Scholar] [CrossRef]
- Mittal, D.; Gaurav, D.; Roy, S.S. An effective hybridized classifier for breast cancer diagnosis. In Proceedings of the 2015 IEEE international conference on advanced intelligent mechatronics (AIM), Busan, Korea, 7–11 July 2015. [Google Scholar]
- Tran, T.; Le, U.; Shi, Y. An effective up-sampling approach for breast cancer prediction with imbalanced data: A machine learning model-based comparative analysis. PLoS ONE 2022, 17, e0269135. [Google Scholar] [CrossRef] [PubMed]
- Pfob, A.; Mehrara, B.J.; Nelson, J.A.; Wilkins, E.G.; Pusic, A.L.; Sidey-Gibbons, C. Towards patient-centered decision-making in breast cancer surgery: Machine learning to predict individual patient-reported outcomes at 1-year follow-up. Ann. Surg. 2022. [Google Scholar] [CrossRef]
- Rasool, A.; Bunterngchit, C.; Tiejian, L.; Islam, M.R.; Qu, Q.; Jiang, Q. Improved machine learning-based predictive models for breast cancer diagnosis. Int. J. Environ. Res. Public Health 2022, 19, 3211. [Google Scholar] [CrossRef] [PubMed]
- Naseem, U.; Rashid, J.; Ali, L.; Kim, J.; Haq, Q.E.U.; Awan, M.J.; Imran, M. An automatic detection of breast cancer diagnosis and prognosis based on machine learning using ensemble of classifiers. IEEE Access 2022, 10, 78242–78252. [Google Scholar] [CrossRef]
- Allugunti, V.R. Breast cancer detection based on thermographic images using machine learning and deep learning algorithms. Int. J. Eng. Comput. Sci. 2022, 4, 49–56. [Google Scholar]
- Marcus, G. Deep learning: A critical appraisal. arXiv 2018, arXiv:1801.00631. [Google Scholar]
- Bologna, G.; Hayashi, Y. Characterization of symbolic rules embedded in deep DIMLP networks: A challenge to transparency of deep learning. J. Artif. Intell. Soft Comput. Res. 2017, 7, 265. [Google Scholar] [CrossRef]
- Lacave, C.; Díez, F.J. A review of explanation methods for Bayesian networks. Knowl. Eng. Rev. 2002, 17, 107–127. [Google Scholar] [CrossRef]
- Kiourtis, A.; Karamolegkos, P.; Karabetian, A.; Voulgaris, K.; Poulakis, Y.; Mavrogiorgou, A.; Kyriazis, D. An Autoscaling Platform Supporting Graph Data Modelling Big Data Analytics. Stud. Health Technol. Inform. 2022, 295, 376–379. [Google Scholar]
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | Race | Race of patient | “Caucasian”, “Asian”, “African”, “American”, “Hispanic”, “Other” |
2 | Gender | Gender of patient | “Male”, “Female”, “Unknown/Invalid” |
3 | Age | Age of patient | (0–10), …, (90–100) |
4 | Admission type | Type of admission | (1–8) |
5 | Discharge disposition | Disposition of discharge | (1–28) |
6 | Admission source | Source of admission | (1–20) |
7 | Time in hospital | Number of days between admission and discharge | (1–14) |
8 | Number of procedures | Number of operations conducted during the encounter | (0–6) |
9 | Number of medications | Number of distinct generic medication names administered during the encounter | (1–81) |
10 | Number of inpatient visits | Number of inpatient visits in the year preceding the encounter | (0–21) |
11 | Number of diagnoses | Number of diagnoses that have been entered into the system | (1–16) |
12 | Glucose serum test result | Range of result/Test not taken | “>200”, “>300”, “Normal”, “None” |
13 | A1c test result | Range of result/Test not taken | “>8”, “>7”, “Normal”, “None” |
14 | Change of medications | Change in diabetic medications (either dosage or generic name) | “Change”, “No change” |
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | id | Unique identifier | (67–72,940) |
2 | gender | Patient’s gender | “Male”, “Female”, “Other” |
3 | age | Patient’s age | (0.08–82) |
4 | hypertension | Patient has hypertension or not | (0–1) 0: Does not have hypertension, 1: Has hypertension |
5 | heart_disease | Patient has heart_disease or not | (0–1) 0: Does not have any heart diseases, 1: Has heart diseases |
6 | ever_married | Patient is or not married | “No”, “Yes” |
7 | work_type | Patient’s type of work | “Children”, “Govt_job”, “Private”, “Never_worked”, “Self-employed” |
8 | residence_type | Residence type of patient | “Rural”, “Urban” |
9 | avg_glucose_level | Average blood glucose level | (55.12–271.74) |
10 | bmi | Body mass index | (10.3–97.6) |
11 | smoking_status | Patient smoking status | “Formerly smoked”, “Never smoked”, “Smokes”, “Unknown” |
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | age | Age of the patient | (40–95) |
2 | anemia | Decrease of red blood cells or hemoglobin (anemia) | (0–1) 0: patient does not have anemia, 1: patient has anemia |
3 | creatinine_phosphokinase | Blood’s CPK enzyme (mcg/L) | (23–7861) |
4 | diabetes | If the patient has diabetes | (0–1) 0: patient does not have diabetes, 1: patient has diabetes |
5 | ejection_fraction | Percentage of blood that leaves the heart with each contraction | (14–80) |
6 | high_blood_pressure | If the patient has hypertension | (0–1) 0: patient does not have hypertension, 1: patient has hypertension |
7 | platelets | Platelets found in the blood (kiloplatelets/mL) | (25,100–850,000) |
8 | serum_creatinine | Blood’s serum creatinine (mg/dL) | (0.5–9.4) |
9 | serum_sodium | Blood’s serum sodium (mEq/L) | (113–148) |
10 | sex | Woman or man | (0–1) 0: woman, 1: man |
11 | smoking | Whether or not the patient smokes | (0–1) 0: patient does not smoke, 1: patient smokes |
12 | time | Follow-up period (days) | (4–285) |
13 | death_event | If the patient deceased during the follow-up period | (0–1) 0: patient survived the follow-up period, 1: patient deceased during the follow-up period |
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | Patient age quantile | Age of the patient | (0–19) |
2 | Hematocrit | Quantity of hematocrit | (−4.50–2.66) |
3 | Hemoglobin | Quantity of hemoglobin | (−4.34–2.67) |
4 | Platelets | Quantity of platelets | (−2.55–9.53) |
5 | Red blood Cells | Quantity of red blood cells | (−3.97–3.64) |
6 | Lymphocytes | Quantity of lymphocytes | (−1.86–3.76) |
7 | Leukocytes | Quantity of leukocytes | (−2.02–4.52) |
8 | Basophils | Quantity of basophils | (−1.14–11.07) |
9 | Eosinophils | Quantity of eosinophils | (−0.83–8.35) |
10 | Monocytes | Quantity of monocytes | (−2.16–4.53) |
11 | Serum Glucose | Quantity of serum glucose | (−1.10–7.00) |
12 | Neutrophils | Quantity of neutrophils | (−3.33–2.53) |
13 | Urea | Quantity of urea | (−1.63–11.24) |
14 | Proteina C reativa mg/dL | Quantity of C-reactive protein (mg/dL) | (−0.53–8.02) |
15 | Creatinine | Quantity of creatinine | (−2.38–5.05) |
16 | Potassium | Quantity of potassium | (−2.28–3.40) |
17 | Sodium | Quantity of sodium | (−5.24–4.09) |
18 | Alanine transaminase | Quantity of alanine transaminase | (−0.64–7.93) |
19 | Aspartate transaminase | Quantity of aspartate transaminase | (−0.70–7.23) |
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | radius_mean | Radius of lobes | (6.98–28.1) |
2 | texture_mean | Mean of surface texture | (9.71–39.3) |
3 | perimeter_mean | Outer perimeter of lobes | (43.8–189) |
4 | area_mean | Mean area of lobes | (144–2501) |
5 | smoothness_mean | Mean of smoothness levels | (0.05–0.16) |
6 | compactness_mean | Mean of compactness | (0.02–0.35) |
7 | concavity_mean | Mean of concavity | (0–0.43) |
8 | concave points_mean | Mean of concave points | (0–0.2) |
9 | symmetry_mean | Mean of symmetry | (0.11–0.3) |
10 | fractal_dimension_mean | Mean of fractal dimension | (0.05–0.1) |
11 | radius_se | SE of radius | (0.11–2.87) |
12 | texture_se | SE of texture | (0.36–4.88) |
13 | perimeter_se | SE of perimeter | (0.76–22) |
14 | area_se | SE of area | (6.8–542) |
15 | smoothness_se | SE of smoothness | (0–0.03) |
16 | compactness_se | SE of compactness | (0–0.14) |
17 | concavity_se | SE of concavity | (0–0.4) |
18 | concave points_se | SE of concave points | (0–0.05) |
19 | symmetry_se | SE of symmetry | (0.01–0.08) |
20 | fractal_dimension_se | SE of fractal dimension | (0–0.03) |
21 | radius_worst | Worst radius | (7.93–36) |
22 | texture_worst | Worst texture | (12–49.5) |
23 | perimeter_worst | Worst perimeter | (50.4–251) |
24 | area_worst | Worst area | (185–4250) |
25 | smoothness_worst | Worst smoothness | (0.07–0.22) |
26 | compactness_worst | Worst compactness | (0.03–1.06) |
27 | concavity_worst | Worst concavity | (0–1.25) |
28 | concave points_worst | Worst concave Points | (0–0.29) |
29 | symmetry_worst | Worst symmetry | (0.16–0.66) |
30 | fractal_dimension_worst | Worst fractal dimension | (0.06–0.21) |
No. | Attribute Name | Attribute Information | Range of Values |
---|---|---|---|
1 | Id | Unique ID | (1–399) |
2 | Age | Age of the patient | (2–90) |
3 | Bp | Blood pressure | (50–180) |
4 | Sg | Specific gravity | (1–1.02) |
5 | Al | Albumin | (0–5) |
6 | Su | Sugar | (0–5) |
7 | Rbc | Red blood cells | “Normal”, “Unknown” |
8 | Pc | Pus cell | “Normal”, “Unknown” |
9 | Pcc | Pus cell clumps | “Not present”, “Present”, “Unknown” |
10 | Ba | Bacteria | “Not present”, “Present”, “Unknown” |
11 | Bgr | Blood glucose random | (70–490) |
12 | Bu | Blood urea | (10–391) |
13 | Sc | Serum creatinine | (0.4–76) |
14 | Sod | Sodium | (4.5–163) |
15 | Pot | Potassium | (2.7–47) |
16 | Hemo | Hemoglobin | (3.1–17.8) |
17 | Pcv | Packed cell volume | (9–54) |
18 | Wc | White blood cell count | (0–9600) |
19 | Rc | Red blood cell count | (0–4.5) |
20 | Htn | Hypertension | “True”, “False” |
21 | Dm | Diabetes mellitus | “No”, “Yes”, “Other” |
22 | Cad | Coronary artery disease | “No”, “Yes”, “Other” |
23 | Appet | Appetite | “Good”, “Poor” |
24 | Pe | Pedal edema | “True”, “False” |
25 | Ane | Anemia | “True”, “False” |
Dataset | Number of Records | Cleaning Metrics | ||
---|---|---|---|---|
Missing Values | Outliers Values | Duplicate Values | ||
Diabetes | 101,766 | 100,723 | 0 | 0 |
Stroke | 5110 | 201 | 0 | 1 |
Heart Failure | 299 | 0 | 19 | 0 |
COVID-19 | 600 | 0 | 0 | 0 |
Breast Cancer | 569 | 0 | 5 | 0 |
Kidney Disease | 400 | 684 | 0 | 0 |
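Assuming the datasets are loaded as pandas DataFrames, the three cleaning metrics above (missing, outlier, and duplicate counts) can be sketched as follows. The toy frame, column names, and the 1.5 × IQR outlier rule are illustrative assumptions, not the exact procedure used in this work.

```python
import pandas as pd

def cleaning_metrics(df: pd.DataFrame, numeric_col: str):
    """Count missing cells, IQR-based outliers in one numeric column, and duplicate rows."""
    missing = int(df.isna().sum().sum())       # total missing cells in the frame
    duplicates = int(df.duplicated().sum())    # fully duplicated rows (after the first)
    q1, q3 = df[numeric_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[numeric_col] < q1 - 1.5 * iqr) | (df[numeric_col] > q3 + 1.5 * iqr)
    outliers = int(mask.sum())
    return missing, outliers, duplicates

# Tiny illustrative frame (not one of the study datasets)
df = pd.DataFrame({"age": [40, 45, 50, 52, 300],
                   "bmi": [22.0, None, 27.5, 27.5, 24.1]})
print(cleaning_metrics(df, "age"))  # age = 300 is flagged as an outlier
```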
Parameter | Set Value | Description |
---|---|---|
alpha | 1.0 | Additive parameter (Laplace/Lidstone) used for smoothing |
binarize | 0.0 | Threshold for binarizing (mapping to booleans) sample features |
fit_prior | True | Learn class prior probabilities |
class_prior | None | Prior probabilities of the classes |
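The parameter names above match scikit-learn’s `BernoulliNB`; assuming that implementation, the table’s settings (which are the estimator’s defaults) can be sketched as follows, with purely illustrative toy data:

```python
from sklearn.naive_bayes import BernoulliNB

# All four values in the table are scikit-learn's defaults for BernoulliNB.
bnb = BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

# Toy binary features; the label here happens to follow the second feature.
X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = [1, 0, 1, 0]
bnb.fit(X, y)
print(bnb.predict([[0, 1]]))
```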
Parameter | Set Value | Description |
---|---|---|
n_neighbors | 5 | Integer number corresponding to the neighbors |
weights | Uniform | All the points in each neighborhood are equally weighted |
algorithm | Auto | The most proper algorithm is chosen based on the values that are passed to the fit method |
leaf_size | 30 | Leaf size that is passed to BallTree or KDTree |
metric | minkowski (p = 2) | Minkowski distance with p = 2 (equivalent to the standard Euclidean metric) |
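Assuming scikit-learn’s `KNeighborsClassifier` (whose parameter names the table follows), the configuration corresponds to the sketch below; note that the Minkowski metric with p = 2 is the plain Euclidean distance.

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, weights="uniform", algorithm="auto",
                           leaf_size=30, metric="minkowski", p=2)

# Illustrative 1-D data: points 0-2 belong to class 0, points 3-5 to class 1.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
knn.fit(X, y)
print(knn.predict([[4.5]]))  # 3 of the 5 nearest neighbors are class 1
```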
Parameter | Set Value | Description |
---|---|---|
criterion | Gini | Function to review the quality of a split |
splitter | Best | Strategy to choose the splitting method at each node |
max_depth | None | Maximum depth of the tree (if None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples) |
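Assuming scikit-learn’s `DecisionTreeClassifier`, the three settings above (all defaults) can be instantiated as below; the toy data are illustrative only.

```python
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(criterion="gini", splitter="best", max_depth=None)

# Toy data where the label depends only on the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
dt.fit(X, y)
print(dt.predict([[1, 0]]))  # -> [1]
```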
Parameter | Set Value | Description |
---|---|---|
n_estimators | 100 | Integer number corresponding to the trees in the forest |
criterion | Gini | Function to review a split |
max_depth | None | Maximum depth of the tree |
min_samples_split | 2 | Minimum number of samples required to split an internal node |
min_samples_leaf | 1 | Minimum number of samples required to be at a leaf node |
min_weight_fraction_leaf | 0.0 | Minimum weighted fraction of the total of weights (of all the input samples) required to be at a leaf node |
max_features | sqrt | Number of features to consider when looking for the best split |
max_leaf_nodes | None | Grow trees with max_leaf_nodes in best-first fashion, where best nodes are defined as relative reduction in impurity |
min_impurity_decrease | 0.0 | A node will be split if this split induces a decrease of the impurity greater than or equal to this value |
bootstrap | True | Use bootstrap samples when building trees |
oob_score | False | Use out-of-bag samples to estimate the generalization score |
n_jobs | None | Number of jobs to run in parallel |
random_state | None | Randomness of samples’ bootstrapping when building trees and sampling of features when looking for the best node’s split |
verbose | 0 | Verbosity when fitting and predicting |
warm_start | False | Fit a whole new forest |
class_weight | None | Weight of each class |
ccp_alpha | 0.0 | Parameter used for Minimal Cost-Complexity Pruning |
max_samples | None | If None, draw X.shape[0] samples to train each base estimator |
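The table mirrors scikit-learn’s `RandomForestClassifier` defaults; assuming that implementation, the key settings can be sketched as follows (`random_state` is pinned here only to make the sketch reproducible, and the toy data are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, criterion="gini", max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            max_features="sqrt", bootstrap=True,
                            oob_score=False, ccp_alpha=0.0, random_state=0)

# Toy data where the label depends only on the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
y = [0, 0, 1, 1] * 5
rf.fit(X, y)
print(rf.predict([[1, 1]]))
```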
Parameter | Set Value | Description |
---|---|---|
solver | liblinear | Algorithm to use in the optimization problem |
penalty | l2 | Additive penalty term (L2) |
dual | True | Dual or primal formulation (Dual formulation is only implemented for L2 penalty with liblinear solver) |
tol | 10⁻⁴ | Tolerance for stopping criteria |
C | 1.0 | Inverse of regularization strength, where smaller values specify stronger regularization |
fit_intercept | True | A constant should be added to the decision function |
intercept_scaling | 1 | Used only when solver = ‘liblinear’ and fit_intercept = True |
class_weight | None | No class weights (all classes have weight one) |
random_state | None | Controls shuffling of the data (used when solver = ‘liblinear’) |
max_iter | 100 | Maximum number of iterations for solvers to converge |
multi_class | Auto | Selects ‘ovr’ if data is binary or if solver = ‘liblinear’, otherwise selects ‘multinomial’ |
verbose | 0 | Verbosity level |
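Assuming scikit-learn’s `LogisticRegression`, the table’s settings can be sketched as below; `dual=True` is valid here precisely because the L2 penalty is combined with the liblinear solver, as the table notes.

```python
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver="liblinear", penalty="l2", dual=True,
                        tol=1e-4, C=1.0, fit_intercept=True,
                        intercept_scaling=1, class_weight=None, max_iter=100)

# Illustrative 1-D data with classes separated around x = 1.25.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]]
y = [0, 0, 0, 1, 1, 1]
lr.fit(X, y)
print(lr.predict([[2.2]]))
```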
Parameter | Set Value | Description |
---|---|---|
hidden_layer_sizes | (5000, 10) | Number of neurons in the ith hidden layer |
activation | relu | Activation function for the hidden layer |
solver | lbfgs | Optimizer in the family of quasi-Newton methods |
alpha | 10⁻⁵ | Strength of the L2 regularization term |
batch_size | Auto | Size of minibatches for stochastic optimizers |
learning_rate | Constant | Learning rate schedule for weight updates |
learning_rate_init | 0.001 | Initial learning rate for step-size in updating the weights |
power_t | 0.5 | Exponent for inverse scaling learning rate |
max_iter | 200 | Maximum number of iterations |
shuffle | True | Whether to shuffle samples in each iteration |
random_state | None | Random number generation for weights and bias initialization |
tol | 10⁻⁴ | Tolerance for the optimization |
verbose | False | Print progress messages to stdout |
warm_start | False | Erase the previous solution |
momentum | 0.9 | Momentum for gradient descent update |
nesterovs_momentum | True | Use Nesterov’s momentum |
early_stopping | False | Use early stopping to terminate training when validation score is not improving |
validation_fraction | 0.1 | Proportion of training data to set aside as validation set for early stopping |
max_fun | 15000 | Maximum number of loss function calls |
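Assuming scikit-learn’s `MLPClassifier`, the configuration above can be sketched as follows. A much smaller network than the table’s (5000, 10) is used here so the sketch runs quickly; the toy data are illustrative only.

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    solver="lbfgs", alpha=1e-5, max_iter=200,
                    tol=1e-4, random_state=0)

# Illustrative 1-D data with a wide gap between the two classes.
X = [[0.0], [0.2], [0.4], [1.6], [1.8], [2.0]]
y = [0, 0, 0, 1, 1, 1]
mlp.fit(X, y)
print(mlp.predict([[1.9]]))
```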
Parameter | Set Value | Description |
---|---|---|
loss | hinge | Loss function |
penalty | l2 | Penalty (regularization) to be used |
alpha | 0.0001 | Constant that multiplies the regularization term |
fit_intercept | True | Intercept should be estimated |
max_iter | 5000 | Maximum number of passes over training data (epochs) |
tol | 10⁻³ | Stopping criterion |
shuffle | True | Training data should be shuffled after each epoch |
verbose | 0 | Verbosity level |
epsilon | 0.1 | Epsilon in the epsilon-insensitive loss functions |
n_jobs | None | Number of CPUs for One Versus All (OVA) computation |
random_state | None | Shuffling the data |
learning_rate | optimal | eta = 1.0 / (alpha * (t + t0)), where t0 is chosen by a heuristic |
power_t | 0.5 | Exponent for inverse scaling learning rate |
early_stopping | False | Use of early stopping to terminate training when validation score is not improving |
validation_fraction | 0.1 | Proportion of training data to set aside as validation set for early stopping |
n_iter_no_change | 5 | Number of iterations with no improvement to wait before stopping fitting |
class_weight | None | Preset for the class_weight fit parameter |
warm_start | False | Erase the previous solution |
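Assuming scikit-learn’s `SGDClassifier`, the settings above can be sketched as follows; with `loss="hinge"` this is a linear SVM fitted by stochastic gradient descent, and the “optimal” schedule uses eta = 1.0 / (alpha * (t + t0)). The toy data are illustrative only.

```python
from sklearn.linear_model import SGDClassifier

sgd = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001,
                    fit_intercept=True, max_iter=5000, tol=1e-3,
                    shuffle=True, learning_rate="optimal",
                    early_stopping=False, n_iter_no_change=5,
                    random_state=0)

# Illustrative 1-D data with a large margin between the classes.
X = [[0.0], [0.3], [0.6], [2.4], [2.7], [3.0]]
y = [0, 0, 0, 1, 1, 1]
sgd.fit(X, y)
print(sgd.predict([[2.8]]))
```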
Dataset | ML Algorithms | ||||||
---|---|---|---|---|---|---|---|
BNB | KNN | DT | LR | RF | SGD | NN | |
Diabetes | Y(76%) | N(68%) | Y(70%) | Y(77%) | Y(76%) | N(54%) | N(77%) |
Stroke | N(95%) | N(96%) | Y(91%) | Y(95%) | N(95%) | N(96%) | N(92%) |
Heart Failure | Y(80%) | N(76%) | N(72%) | Y(82%) | Y(86%) | N(80%) | N(68%) |
COVID-19 | N(85%) | N(88%) | N(79%) | N(89%) | N(92%) | N(87%) | N(86%) |
Breast Cancer | B(93%) | B(96%) | B(94%) | B(100%) | B(94%) | B(98%) | B(98%) |
Kidney Disease | N(90%) | N(90%) | N(100%) | N(90%) | N(100%) | N(97%) | N(100%) |
Disease | Author | Methods | ||||||
---|---|---|---|---|---|---|---|---|
BNB | KNN | NN | SGD | DT | LR | RF | ||
Diabetes | Mogaveera et al. (2021) [104] | * | ||||||
Wu et al. (2022) [105] | * | |||||||
Xing et al. (2007) [106] | * | * | ||||||
Oza et al. (2022) [107] | * | * | ||||||
Palimkar et al. (2022) [108] | * | * | * | * | ||||
Komal et al. (2019) [109] | * | |||||||
COVID-19 | Ahmad et al. (2018) [110] | * | ||||||
Ho et al. (2022) [111] | * | |||||||
Oyelade et al. (2021) [112] | * | |||||||
Hassan Yaseen et al. (2022) [113] | * | * | ||||||
Shaban et al. (2020) [114] | * | |||||||
Yoo et al. (2020) [115] | * | |||||||
Heart Failure | Akbulut et al. (2018) [116] | * | * | * | * | |||
Peter et al. (2012) [117] | * | * | * | * | ||||
Morgenstern et al. (2022) [118] | * | |||||||
Qian et al. (2022) [119] | * | * | ||||||
Çınar et al. (2021) [120] | * | |||||||
Stroke | Ponciano-Rodríguez et al. (2019) [121] | * | ||||||
Santos et al. (2022) [122] | * | |||||||
Dev et al. (2022) [123] | * | * | * | |||||
Paikaray et al. (2022) [124] | * | * | ||||||
Iosa et al. (2021) [125] | * | |||||||
Kidney Disease | Pal et al. (2022) [126] | * | * | |||||
Revathy et al. (2022) [127] | * | * | ||||||
Sinha et al. (2015) [128] | * | |||||||
Almustafa et al. (2015) [129] | * | * | * | |||||
Singh et al. (2022) [130] | * | |||||||
Kim et al. (2021) [131] | * | |||||||
Breast Cancer | Mittal et al. (2015) [132] | * | ||||||
Tran et al. (2022) [133] | * | * | ||||||
Pfob et al. (2022) [134] | * | * | ||||||
Rasool et al. (2022) [135] | * | * | ||||||
Naseem et al. (2022) [136] | * | * | * | * | ||||
Allugunti et al. (2022) [137] | * | * |
Author | Components | ||||
---|---|---|---|---|---|
Gateway | Data Reliability | Hyperparameters Tuning | Data Storage | Model Evaluation | |
Proposed Mechanism | * | * | * | * | * |
Mogaveera et al. (2021) [104] | * | * | |||
Wu et al. (2022) [105] | * | * | |||
Xing et al. (2007) [106] | * | * | |||
Oza et al. (2022) [107] | * | * | |||
Palimkar et al. (2022) [108] | * | ||||
Komal et al. (2019) [109] | * | * | |||
Ahmad et al. (2018) [110] | * | ||||
Ho et al. (2022) [111] | * | * | * | ||
Oyelade et al. (2021) [112] | * | * | |||
Hassan Yaseen et al. (2022) [113] | * | * | |||
Shaban et al. (2020) [114] | * | * | * | ||
Yoo et al. (2020) [115] | * | ||||
Akbulut et al. (2018) [116] | * | * | * | ||
Peter et al. (2012) [117] | * | ||||
Morgenstern et al. (2022) [118] | * | ||||
Qian et al. (2022) [119] | * | * | * | ||
Çınar et al. (2021) [120] | * | * | |||
Ponciano-Rodríguez et al. (2019) [121] | * | ||||
Santos et al. (2022) [122] | * | * | |||
Dev et al. (2022) [123] | * | * | |||
Paikaray et al. (2022) [124] | * | ||||
Iosa et al. (2021) [125] | * | ||||
Pal et al. (2022) [126] | * | * | |||
Revathy et al. (2022) [127] | * | ||||
Sinha et al. (2015) [128] | * | ||||
Almustafa et al. (2015) [129] | * | ||||
Singh et al. (2022) [130] | * | * | * | ||
Kim et al. (2021) [131] | * | * | |||
Mittal et al. (2015) [132] | * | * | |||
Tran et al. (2022) [133] | * | ||||
Pfob et al. (2022) [134] | * | * | |||
Rasool et al. (2022) [135] | * | * | |||
Naseem et al. (2022) [136] | * | * | |||
Allugunti et al. (2022) [137] | * | * |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mavrogiorgou, A.; Kiourtis, A.; Kleftakis, S.; Mavrogiorgos, K.; Zafeiropoulos, N.; Kyriazis, D. A Catalogue of Machine Learning Algorithms for Healthcare Risk Predictions. Sensors 2022, 22, 8615. https://doi.org/10.3390/s22228615