DOI: 10.1145/3075564.3075576
research-article

An Ensemble Model for Diabetes Diagnosis in Large-scale and Imbalanced Dataset

Published: 15 May 2017

Abstract

Diabetes is an increasingly serious health challenge worldwide, with prevalence rising every year, especially in developing countries. The vast majority of cases are type 2 diabetes, and it has been indicated that about 80% of type 2 diabetes complications can be prevented or delayed by timely detection. In this paper, we propose an ensemble model to accurately diagnose diabetes on a large-scale, imbalanced dataset. The dataset used in our work covers millions of people from one province in China from 2009 to 2015 and is highly skewed. Results on this real-world dataset show that our method is promising for diabetes diagnosis, achieving high sensitivity, F3, and G-mean of 91.00%, 58.24%, and 86.69%, respectively.
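The three reported metrics can be illustrated with a minimal sketch. It assumes the usual textbook definitions: sensitivity is recall on the positive class, F3 is the F-beta score with beta = 3, and G-mean is the geometric mean of sensitivity and specificity. The abstract does not spell out the exact formulas used in the paper, and the confusion-matrix counts below are hypothetical, not taken from the paper's dataset.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Recall on the positive (diabetic) class: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Recall on the negative class: TN / (TN + FP)."""
    return tn / (tn + fp)


def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """F-beta score; beta = 3 weights recall more heavily than precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))


# Hypothetical counts for a skewed test set (1% positives); illustration only.
tp, fn, fp, tn = 90, 10, 200, 9700

print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"F3          = {f_beta(tp, fp, fn):.4f}")
print(f"G-mean      = {g_mean(tp, fp, tn, fn):.4f}")
```

Because F3 weights recall nine times as heavily as precision in the denominator, a model on skewed data can score high sensitivity and G-mean while F3 stays moderate when false positives outnumber true positives, which matches the pattern of the figures reported in the abstract.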

Information & Contributors

Information

Published In

CF'17: Proceedings of the Computing Frontiers Conference
May 2017
450 pages
ISBN: 9781450344876
DOI: 10.1145/3075564
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Diabetes diagnosis
  2. Ensemble model
  3. Imbalanced data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '17: Computing Frontiers Conference
May 15 - 17, 2017
Siena, Italy

Acceptance Rates

CF'17 Paper Acceptance Rate 43 of 87 submissions, 49%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 1
Reflects downloads up to 11 Dec 2024.

Citations

Cited By

  • (2024) Handling imbalanced medical datasets: review of a decade of research. Artificial Intelligence Review 57:10. DOI: 10.1007/s10462-024-10884-2. Online publication date: 2-Sep-2024
  • (2023) Automated Detection of Type 2 Diabetes with Imbalanced and Machine Learning Methods. Machine Learning, Image Processing, Network Security and Data Sciences (29-40). DOI: 10.1007/978-981-19-5868-7_3. Online publication date: 1-Jan-2023
  • (2022) Predicting diabetes in imbalanced datasets using neural networks. Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (1-10). DOI: 10.1145/3535508.3545540. Online publication date: 7-Aug-2022
  • (2022) Improved inpatient deterioration detection in general wards by using time-series vital signs. Scientific Reports 12:1. DOI: 10.1038/s41598-022-16195-2. Online publication date: 13-Jul-2022
  • (2022) Design and implementation of intelligent patient in-house monitoring system based on efficient XGBoost-CNN approach. Cognitive Neurodynamics 16:5 (1135-1149). DOI: 10.1007/s11571-021-09754-2. Online publication date: 12-Jan-2022
  • (2021) Current-Visit and Next-Visit Prediction for Fatty Liver Disease With a Large-Scale Dataset: Model Development and Performance Comparison. JMIR Medical Informatics 9:8 (e26398). DOI: 10.2196/26398. Online publication date: 12-Aug-2021
  • (2021) MLP-DTP: Performance Evaluation of Diabetes Class Prediction. 2021 10th International Conference on Internet of Everything, Microwave Engineering, Communication and Networks (IEMECON) (1-6). DOI: 10.1109/IEMECON53809.2021.9689183. Online publication date: 1-Dec-2021
  • (2021) Emotion-infused deep neural network for emotionally resonant conversation. Applied Soft Computing (107861). DOI: 10.1016/j.asoc.2021.107861. Online publication date: Sep-2021
  • (2020) Predicting autism spectrum disorder from associative genetic markers of phenotypic groups using machine learning. Journal of Ambient Intelligence and Humanized Computing 12:3 (3257-3270). DOI: 10.1007/s12652-020-02155-z. Online publication date: 31-May-2020
  • (2019) A Study on Machine Vision Techniques for the Inspection of Health Personnels’ Protective Suits for the Treatment of Patients in Extreme Isolation. Electronics 8:7 (743). DOI: 10.3390/electronics8070743. Online publication date: 30-Jun-2019
