DOI: 10.3115/974499.974526

A simple rule-based part of speech tagger

Published: 31 March 1992

Abstract

Automatic part of speech tagging is an area of natural language processing where statistical techniques have been more successful than rule-based methods. In this paper, we present a simple rule-based part of speech tagger which automatically acquires its rules and tags with accuracy comparable to stochastic taggers. The rule-based tagger has many advantages over these taggers, including: a vast reduction in stored information required, the perspicuity of a small set of meaningful rules, ease of finding and implementing improvements to the tagger, and better portability from one tag set, corpus genre or language to another. Perhaps the biggest contribution of this work is in demonstrating that the stochastic method is not the only viable method for part of speech tagging. The fact that a simple rule-based tagger that automatically learns its rules can perform so well should offer encouragement for researchers to further explore rule-based tagging, searching for a better and more expressive set of rule templates and other variations on the simple but effective theme described below.
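The abstract does not spell out the learning procedure, so the following is only a minimal sketch of how a tagger might acquire rules automatically, not the paper's implementation. It assumes (1) each word starts with its most frequent tag from a training corpus, (2) a single rule template, "change tag A to B when the previous tag is Z", and (3) a greedy loop that keeps whichever rule removes the most errors on a second tagged corpus. All function names, the `"NN"` default for unknown words, and the tag labels are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def train_lexicon(tagged):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words, lexicon, default="NN"):
    """Tag every word with its most frequent tag (assumed default for unknowns)."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): retag from_tag as to_tag after prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked against the tags as they were before this rule fired.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def learn_rules(words, gold, lexicon, max_rules=10):
    """Greedily collect the rules that most reduce tagging errors against gold."""
    tags = initial_tags(words, lexicon)
    rules = []
    for _ in range(max_rules):
        errors = sum(t != g for t, g in zip(tags, gold))
        # Propose one candidate rule per position that is currently mistagged.
        candidates = {(tags[i], gold[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold[i]}
        best, best_gain = None, 0
        for rule in sorted(candidates):
            retagged = apply_rule(tags, rule)
            gain = errors - sum(t != g for t, g in zip(retagged, gold))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate improves accuracy; stop learning
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules


# Toy data, invented for this example.
lexicon = train_lexicon([("the", "DT"), ("race", "NN"), ("to", "TO"),
                         ("run", "VB"), ("the", "DT"), ("race", "NN")])
words = ["to", "race", "the", "race", "to", "race"]
gold = ["TO", "VB", "DT", "NN", "TO", "VB"]
print(learn_rules(words, gold, lexicon))  # [('NN', 'VB', 'TO')]
```

On this toy data the learned model is one human-readable rule plus a small lexicon, which illustrates the abstract's point about the small amount of stored information and the perspicuity of the rules.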


Published In

ANLC '92: Proceedings of the third conference on Applied natural language processing
March 1992
273 pages

Publisher

Association for Computational Linguistics

United States
