Abstract
Missing data is a common data quality problem. When missing values are abundant, handling them effectively could improve the performance of machine learning algorithms. One treatment is imputation, which replaces missing values with values estimated from the available data. This paper presents Corai, an imputation algorithm adapted from Co-training, a multi-view semi-supervised learning algorithm. Corai is compared with other imputation methods from the literature on three UCI data sets, with different levels of missingness inserted into up to three attributes. The results show that Corai tends to perform well on data sets with higher percentages of missingness and a larger number of attributes with missing values.
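To make the experimental setup concrete, the following is a minimal sketch of inserting missingness into one attribute and then imputing it. It is not Corai: it uses mean imputation as a stand-in baseline, and the function names (`insert_missing`, `impute_mean`), the toy column, and the 40% missingness level are illustrative assumptions rather than details taken from the paper.

```python
import random
from statistics import mean

def insert_missing(values, fraction, rng):
    """Return a copy of `values` with round(len * fraction) entries set to None."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def impute_mean(values):
    """Replace each None with the mean of the observed (non-missing) values.

    Mean imputation is only a simple baseline used here for illustration;
    it is not the Corai algorithm described in the abstract.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy attribute with 20 values.
rng = random.Random(0)
column = [float(x) for x in range(1, 21)]

# Remove 40% of the values at random, then estimate them from the rest.
with_gaps = insert_missing(column, 0.4, rng)
imputed = impute_mean(with_gaps)

# Mean absolute error on the entries that were removed.
errors = [abs(imputed[i] - column[i]) for i, v in enumerate(with_gaps) if v is None]
imputation_error = mean(errors)
```

Because the original values are known before removal, the error on the removed entries can be measured directly, which is what makes artificially inserted missingness useful for comparing imputation methods.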
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matsubara, E.T., Prati, R.C., Batista, G.E.A.P.A., Monard, M.C. (2008). Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_27
DOI: https://doi.org/10.1007/978-3-540-88190-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science (R0)