DOI: 10.3115/974147.974178

TnT: a statistical part-of-speech tagger

Published: 29 April 2000

Abstract

Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT and the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
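
The abstract names the ingredients (a Markov model over tags, smoothing, unknown-word handling) without giving details. As a rough illustration only, a second-order (trigram) Markov model tagger with linearly interpolated transition probabilities and Viterbi decoding might be sketched as follows. The fixed interpolation weights, the constant probability floor for unknown words, the toy corpus, and all function names are assumptions made for this sketch; they are not TnT's actual estimates, unknown-word method, or code.

```python
import math
from collections import Counter, defaultdict

START = "<s>"      # padding tag for sentence starts (hypothetical name)
UNK_PROB = 1e-6    # crude floor for unknown words; an assumption of this sketch

# Toy training data, invented for illustration.
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

def train(tagged_sents):
    """Collect tag n-gram counts, their context counts, and emission counts."""
    m = {"uni": Counter(), "bi": Counter(), "tri": Counter(),
         "bi_ctx": Counter(), "tri_ctx": Counter(),
         "emit": defaultdict(Counter)}
    for sent in tagged_sents:
        tags = [START, START] + [t for _, t in sent]
        for word, t in sent:
            m["emit"][t][word] += 1
        for i in range(2, len(tags)):
            t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
            m["uni"][t3] += 1
            m["bi"][(t2, t3)] += 1
            m["tri"][(t1, t2, t3)] += 1
            m["bi_ctx"][t2] += 1
            m["tri_ctx"][(t1, t2)] += 1
    return m

def trans_prob(m, t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """P(t3 | t1, t2) as a weighted sum of unigram, bigram, trigram estimates."""
    l1, l2, l3 = lambdas  # fixed weights assumed here; they must sum to 1
    p1 = m["uni"][t3] / sum(m["uni"].values())
    p2 = m["bi"][(t2, t3)] / m["bi_ctx"][t2] if m["bi_ctx"][t2] else 0.0
    p3 = (m["tri"][(t1, t2, t3)] / m["tri_ctx"][(t1, t2)]
          if m["tri_ctx"][(t1, t2)] else 0.0)
    return l1 * p1 + l2 * p2 + l3 * p3

def tag(m, words):
    """Viterbi search over states (previous tag, current tag)."""
    tags = list(m["uni"])
    best = {(START, START): (0.0, [])}  # state -> (log probability, tag sequence)
    for word in words:
        known = any(m["emit"][t][word] for t in tags)
        step = {}
        for (t1, t2), (logp, seq) in best.items():
            for t3 in tags:
                # Emission P(word | tag); unknown words get the same floor for every tag.
                e = m["emit"][t3][word] / m["uni"][t3] if known else UNK_PROB
                if e == 0.0:
                    continue
                cand = logp + math.log(trans_prob(m, t1, t2, t3) * e)
                if (t2, t3) not in step or cand > step[(t2, t3)][0]:
                    step[(t2, t3)] = (cand, seq + [t3])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]
```

With this toy corpus, `tag(train(CORPUS), ["the", "bird", "sleeps"])` returns `['DT', 'NN', 'VBZ']`: the unseen word "bird" receives the same emission floor under every tag, so the interpolated tag-transition probabilities alone decide its tag.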

Published In

ANLC '00: Proceedings of the sixth conference on Applied natural language processing
April 2000
344 pages

Publisher

Association for Computational Linguistics, United States
