Abstract
In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.
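The filtering step can be illustrated with a minimal sketch. It assumes each classifier has already tagged the corpus (for example, after being trained on other parts of it) and that an item is flagged when the classifiers disagree with its annotation. The abstract does not say which voting scheme is used, so both majority and consensus voting are shown as assumptions. The function name `flag_noise` and the toy tags are hypothetical.

```python
from typing import List, Sequence


def flag_noise(gold: Sequence[str],
               predictions: Sequence[Sequence[str]],
               consensus: bool = False) -> List[int]:
    """Return indices of items the ensemble considers mislabelled.

    gold        -- annotated tags, one per token
    predictions -- one tag sequence per classifier, aligned with gold
    consensus   -- True: flag only if every classifier disagrees with gold
                   False: flag if a strict majority disagrees
    """
    n_classifiers = len(predictions)
    if n_classifiers == 0:
        raise ValueError("at least one classifier output is required")

    flagged = []
    for i, tag in enumerate(gold):
        # Count how many classifiers contradict the annotated tag.
        disagreements = sum(1 for p in predictions if p[i] != tag)
        if consensus:
            is_noise = disagreements == n_classifiers
        else:
            is_noise = disagreements > n_classifiers / 2
        if is_noise:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): four tokens, three classifier outputs.
gold = ["DT", "NN", "VB", "NN"]
predictions = [
    ["DT", "NN", "VB", "VB"],  # stands in for a memory-based tagger
    ["DT", "NN", "NN", "VB"],  # stands in for a decision-tree tagger
    ["DT", "JJ", "NN", "VB"],  # stands in for a transformation-based tagger
]

majority_flags = flag_noise(gold, predictions)
consensus_flags = flag_noise(gold, predictions, consensus=True)
```

Consensus voting flags fewer items than majority voting, so it is less likely to discard correct annotations but more likely to leave some noise in place.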
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berthelsen, H., Megyesi, B. (2000). Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_5
DOI: https://doi.org/10.1007/3-540-45323-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive