Abstract
Most prior research has concluded that machine learning classifiers perform better on cleaned datasets than on dirty ones. In this paper, we evaluate three weak (base) machine learning classifiers, Decision Table, Naive Bayes and k-Nearest Neighbor, to see how they perform on a real-world, noisy and messy clinical trial dataset rather than on a carefully curated one. A clinical trial data scientist guided our data analysis exploration and the evaluation of the performance results. Classifier performance was analysed using accuracy and the Receiver Operating Characteristic (ROC) measure, supported by sensitivity, specificity and precision values, and the results contradict the conclusions drawn by previous research. We applied pre-processing techniques, namely the interquartile range technique to remove outliers and mean imputation to handle missing values; across these treatments, all three classifiers performed better on the dirty dataset than on the imputed and cleaned datasets, showing the highest accuracy and ROC measures. Decision Table turned out to be the best classifier for handling real-world noisy clinical trial data.
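For readers who want to see the preprocessing and evaluation steps named above in concrete form, the sketch below illustrates interquartile-range outlier removal, mean imputation, and 10-fold evaluation by accuracy and ROC. It is a minimal sketch under stated assumptions, not the authors' code: the study was run in WEKA, so scikit-learn stands in here, a shallow decision tree substitutes for WEKA's Decision Table (which has no direct scikit-learn equivalent), and the synthetic DataFrame `df` with binary label column `outcome` is a hypothetical placeholder for the non-public clinical trial data.

```python
# Minimal sketch of the pipeline described in the abstract: IQR outlier
# removal, mean imputation, and 10-fold accuracy / ROC AUC for three weak
# classifiers. scikit-learn stands in for WEKA; a shallow decision tree
# substitutes for WEKA's Decision Table. All data below is synthetic.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def remove_outliers_iqr(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Drop rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] on any listed column."""
    mask = pd.Series(True, index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[mask]


def impute_mean(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Replace missing values in each listed column with the column mean."""
    out = df.copy()
    out[cols] = out[cols].fillna(out[cols].mean())
    return out


def evaluate(df: pd.DataFrame, label: str = "outcome") -> None:
    """Report mean 10-fold accuracy and ROC AUC for the three classifiers."""
    X, y = df.drop(columns=label).to_numpy(), df[label].to_numpy()
    models = {
        "Decision Table (tree stand-in)": DecisionTreeClassifier(max_depth=3),
        "Naive Bayes": GaussianNB(),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
        auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
        print(f"{name}: accuracy={acc:.3f}, ROC AUC={auc:.3f}")


if __name__ == "__main__":
    # Tiny synthetic stand-in for the clinical trial data, with some values
    # made missing to mimic a dirty table. Unlike WEKA's classifiers, the
    # scikit-learn estimators reject NaNs, so the "dirty" run uses complete
    # cases rather than rows with missing values.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))
    df["outcome"] = (df["a"] + rng.normal(size=200) > 0).astype(int)
    df.loc[rng.choice(200, 20, replace=False), "b"] = np.nan

    features = list("abcd")
    evaluate(df.dropna())                # "dirty" dataset (complete cases)
    evaluate(impute_mean(df, features))  # imputed dataset
    evaluate(remove_outliers_iqr(impute_mean(df, features), features))  # clean
```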
Acknowledgement
This paper forms part of a Master's dissertation written at the University of Manchester, UK. We would like to thank the data scientists from the Advanced Analytics Centre, AstraZeneca, Alderley Park, Cheshire, UK for their review, support and suggestions on this study.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kamaru-Zaman, E.A., Brass, A., Weatherall, J., Rahman, S.A. (2016). Weak Classifiers Performance Measure in Handling Noisy Clinical Trial Data. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_13
DOI: https://doi.org/10.1007/978-981-10-2777-2_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2776-5
Online ISBN: 978-981-10-2777-2
eBook Packages: Computer Science (R0)