WebMAC: A web based clinical expert system

Saba Bashir¹,
Usman Qamar¹ &
Farhan Hassan Khan¹

Abstract

Diagnosing a disease at an early stage enables physicians to prevent complications and treat the disease properly. The diagnosis method plays an important role in disease diagnosis and in the accuracy of treatment. A diagnostic expert system can help a great deal in identifying diseases and describing the treatment to be carried out, taking into account the user's ability to interact with the expert system easily and clearly. A good way to improve the diagnostic accuracy of expert systems is to use ensemble classifiers. This research presents an expert system that uses multi-layer classification with enhanced bagging and optimized weighting. The proposed method, named “M2-BagWeight”, overcomes the limitations of individual classifiers as well as other ensemble classifiers. The proposed model is evaluated on two liver disease datasets, a chronic kidney disease dataset, a heart disease dataset, the Diabetic Retinopathy Debrecen dataset, a breast cancer dataset and a primary tumor dataset, all obtained from the UCI public repository. The analysis of results shows that the proposed expert system achieves high classification and prediction accuracy compared with individual as well as ensemble classifiers. Moreover, an application named “WebMAC” has been developed for practical implementation of the proposed model in hospitals for diagnostic advice.

Notes

http://archive.ics.uci.edu/ml/datasets.html [Last Accessed 25 Sep. 2015]

Author information

Authors and Affiliations

Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan
Saba Bashir, Usman Qamar & Farhan Hassan Khan

Corresponding author

Correspondence to Usman Qamar.

Appendix: Details of the datasets

1.1 Datasets

1.1.1 Liver disease datasets

Two liver disease datasets, the Bupa liver disease dataset and the ILPD liver disease dataset, are used for evaluation. Both are obtained from the UCI machine learning repository. Each dataset contains a diverse set of attributes and instances that are used to determine the presence or absence of liver disease in patients. The class labels are 0 and 1, where 0 indicates the absence of disease and 1 indicates its presence. A complete description of each dataset is given below:

a) Bupa Liver Disease Dataset

The Bupa liver disease dataset was originally collected by BUPA Medical Research Ltd. It contains 345 instances representing both healthy subjects and liver disease patients. There are seven attributes and no missing values, so the dataset is complete; the attributes are of categorical, real and integer type. The first five variables are blood tests thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. A sample of the Bupa liver disease dataset is shown in Table 28.

b) Indian Liver Patient Dataset (ILPD)

The Indian liver patient dataset was collected from the north east of Andhra Pradesh, India. It contains 583 instances: 416 liver patient records and 167 non-liver patient records. Of these, 441 records are from male patients and 142 from female patients. The 10 attributes are age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos. The dataset contains no missing values, and the attributes are of integer and real type. A sample of the ILPD dataset is shown in Table 29.
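The counts quoted for ILPD can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated above; the variable names are illustrative and are not taken from the paper.

```python
# Counts quoted in the ILPD description (not read from the data file).
total_records = 583
liver_records, non_liver_records = 416, 167
male_records, female_records = 441, 142

# Both breakdowns should account for every record.
class_counts_consistent = liver_records + non_liver_records == total_records
gender_counts_consistent = male_records + female_records == total_records

# Share of the positive (liver patient) class, useful for judging class balance.
liver_share = liver_records / total_records

print(class_counts_consistent, gender_counts_consistent)
print(f"liver patient share: {liver_share:.1%}")
```

Both breakdowns sum to 583, and liver patients make up roughly 71% of the records, so the two classes are not evenly represented.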

1.1.2 Chronic kidney disease dataset

The chronic kidney disease dataset is used to determine the presence of chronic kidney disease in patients. There are two class labels: CKD (chronic kidney disease) and NotCKD. The dataset contains 24 diagnostic attributes and one class label attribute. There are 400 instances, some with missing values; 250 are CKD patients and 150 are NotCKD. The class labels are replaced with 0 and 1, where 0 indicates NotCKD and 1 indicates CKD. A sample of the CKD dataset is given in Table 30.
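The label replacement described above can be sketched as a simple lookup. This is an illustration only: the raw label spellings ("ckd", "notckd"), the helper name `encode_ckd_label` and the stand-in label list are assumptions, and the list merely reproduces the class counts quoted in the text rather than real records.

```python
from collections import Counter

# Assumed raw spellings of the two class labels; 0 = NotCKD, 1 = CKD as in the text.
LABEL_MAP = {"notckd": 0, "ckd": 1}

def encode_ckd_label(raw: str) -> int:
    """Map a raw class string to 0 (NotCKD) or 1 (CKD), ignoring case and spaces."""
    return LABEL_MAP[raw.strip().lower()]

# Stand-in labels reproducing the quoted distribution (250 CKD, 150 NotCKD).
raw_labels = ["ckd"] * 250 + ["notckd"] * 150
encoded = [encode_ckd_label(label) for label in raw_labels]
class_counts = Counter(encoded)

print(class_counts[1], class_counts[0])
```

An unexpected label raises a `KeyError`, which is usually preferable to silently assigning a class in a diagnostic setting.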

1.1.3 Cleveland heart disease dataset

The Cleveland heart disease dataset contains 303 records in total. The training set is composed of 272 instances, whereas the test set consists of 31 instances. The feature space contains 14 attributes: 13 attributes represent vital signs and one attribute is the goal class (0 or 1), where 0 represents the absence of heart disease and 1 represents its presence. A sample of the heart disease dataset from the UCI repository is shown in Table 31.
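A 272/31 partition of 303 records can be produced as follows. The text does not say how the records were assigned to each set, so the seeded random shuffle here is an assumption, as are the function name `split_indices` and the seed value.

```python
import random
from typing import List, Tuple

def split_indices(n_records: int, n_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices reproducibly and hold out n_test of them for testing."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # local generator, leaves global state alone
    return indices[n_test:], indices[:n_test]

# Sizes quoted in the text: 303 records, 31 of them held out.
train_idx, test_idx = split_indices(303, 31)

print(len(train_idx), len(test_idx))
```

Holding out 31 of 303 records corresponds to a test share of roughly 10%.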

1.1.4 Diabetic retinopathy debrecen dataset

This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy or not. The dataset contains 121 instances and 20 attributes, with no missing values. Class label 1 indicates signs of disease, whereas 0 indicates the absence of disease. A sample of the Diabetic Retinopathy Debrecen dataset is shown in Table 32.

1.1.5 Wisconsin breast cancer dataset (WBC)

The Wisconsin breast cancer dataset consists of 699 instances and 11 attributes. Ten attributes carry feature information, whereas one attribute contains the class, where 2 = benign and 4 = malignant. There are 16 missing values in the dataset, denoted by “?”. The class distribution consists of 458 benign instances and 241 malignant instances, so the dataset is unbalanced. A sample of the WBC dataset is shown in Table 33.
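The “?” marker and the 2/4 class coding both need handling before a classifier can use the data. The sketch below assumes comma-separated integer fields with the class in the last column; the two sample rows are made up for illustration, and recoding 2 and 4 to 0 and 1 is an assumption made for consistency with the other datasets, not something the text states for WBC.

```python
from typing import List, Optional, Tuple

MISSING = "?"            # missing-value marker named in the text
CLASS_MAP = {2: 0, 4: 1}  # assumed recoding: benign -> 0, malignant -> 1

def parse_wbc_row(line: str) -> Tuple[List[Optional[int]], int]:
    """Split one row into its 10 feature fields (None where missing) and a 0/1 class."""
    *features, label = line.strip().split(",")
    values = [None if field == MISSING else int(field) for field in features]
    return values, CLASS_MAP[int(label)]

# Hypothetical rows in the assumed layout, not taken from the dataset.
sample_rows = ["101,5,1,1,1,2,?,3,1,1,2", "102,8,10,10,8,7,10,9,7,1,4"]
parsed = [parse_wbc_row(row) for row in sample_rows]

# Imbalance implied by the class counts quoted above.
benign, malignant = 458, 241
imbalance_ratio = benign / malignant
print(f"benign:malignant = {imbalance_ratio:.2f}:1")
```

Keeping missing entries as `None` defers the choice between imputation and row removal to a later step; the ratio of about 1.9 benign instances per malignant one quantifies the imbalance noted in the text.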

1.1.6 Primary tumor dataset

The primary tumor dataset was originally obtained from the Ljubljana Oncology Institute. The dataset contains 339 instances and 17 attributes. The class label is replaced with either 0 or 1, where 0 indicates the absence of disease and 1 indicates its presence. A sample of the primary tumor dataset is given in Table 34.

About this article

Cite this article

Bashir, S., Qamar, U. & Khan, F.H. WebMAC: A web based clinical expert system. Inf Syst Front 20, 1135–1151 (2018). https://doi.org/10.1007/s10796-016-9718-y

Published: 29 October 2016
Issue Date: October 2018
DOI: https://doi.org/10.1007/s10796-016-9718-y
