Abstract
Automated part-of-speech (POS) tagging has been an active research area for many years and is a foundation of natural language processing systems. The Natural Language Toolkit (NLTK) library for Python provides the necessary tools for tagging, but it does not indicate which methods work best. This work therefore analyzes the performance of several NLTK part-of-speech taggers, namely the Default tagger, the Regex tagger and the N-gram taggers (Unigram, Bigram and Trigram), on three corpora: Brown, Penn Treebank and CoNLL2000. We applied every tagger to each of the three corpora. The results show that the Unigram tagger is the most accurate single tagger on all three corpora, and that a combination of taggers does better still, provided the taggers are correctly ordered.
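To illustrate what an ordered combination of taggers means, the following is a minimal sketch of a backoff chain in plain Python. It does not use NLTK and is not the authors' experimental setup: the classes `DefaultTagger`, `ContextTagger`, `UnigramTagger` and `BigramTagger`, the `accuracy` helper, and the tiny `TRAIN` and `TEST` sentences are all hypothetical and invented for illustration. Each tagger answers from its own table when it has seen the context and otherwise hands the word to the next tagger in the chain.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration only
# (not taken from Brown, Penn Treebank or CoNLL2000).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
TEST = [
    [("a", "DT"), ("cat", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]


class DefaultTagger:
    """Assigns the same tag to every word."""

    def __init__(self, tag):
        self.tag = tag

    def choose(self, word, prev_tag):
        return self.tag


class ContextTagger:
    """Picks the most frequent tag seen for a context, else asks the backoff tagger."""

    def __init__(self, train, backoff=None):
        counts = defaultdict(Counter)
        for sent in train:
            prev_tag = None
            for word, tag in sent:
                counts[self.context(word, prev_tag)][tag] += 1
                prev_tag = tag
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}
        self.backoff = backoff

    def choose(self, word, prev_tag):
        ctx = self.context(word, prev_tag)
        if ctx in self.table:
            return self.table[ctx]
        # Unseen context: defer to the next tagger in the chain, if any.
        return self.backoff.choose(word, prev_tag) if self.backoff else None


class UnigramTagger(ContextTagger):
    def context(self, word, prev_tag):
        return word  # the word alone


class BigramTagger(ContextTagger):
    def context(self, word, prev_tag):
        return (prev_tag, word)  # the word plus the previous tag


def accuracy(tagger, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold:
        prev_tag = None
        for word, gold_tag in sent:
            prev_tag = tagger.choose(word, prev_tag)
            correct += prev_tag == gold_tag
            total += 1
    return correct / total


default = DefaultTagger("NN")
unigram_only = UnigramTagger(TRAIN)
bigram_only = BigramTagger(TRAIN)
# Most specific tagger first, most general last.
chain = BigramTagger(TRAIN, backoff=UnigramTagger(TRAIN, backoff=default))

for name, tagger in [("default", default), ("unigram", unigram_only),
                     ("bigram", bigram_only), ("chain", chain)]:
    print(f"{name}: {accuracy(tagger, TEST):.2f}")
```

On this toy data the chain scores higher than any single tagger because unseen words fall through to the default tag instead of being left untagged. The numbers say nothing about the accuracies reported in the paper; they only show the mechanism.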
© 2016 Springer International Publishing Switzerland
Cite this paper
Khin, N.P.P., Aung, T.N. (2016). Analyzing Tagging Accuracy of Part-of-Speech Taggers. In: Zin, T., Lin, JW., Pan, JS., Tin, P., Yokota, M. (eds) Genetic and Evolutionary Computing. GEC 2015. Advances in Intelligent Systems and Computing, vol 388. Springer, Cham. https://doi.org/10.1007/978-3-319-23207-2_35
DOI: https://doi.org/10.1007/978-3-319-23207-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23206-5
Online ISBN: 978-3-319-23207-2
eBook Packages: Engineering, Engineering (R0)