DOI: 10.3115/974358.974391

Tagging and morphological disambiguation of Turkish text

Published: 13 October 1994

Abstract

Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial step in tagging, because the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text built on a full-scale two-level specification of Turkish morphology, which in turn uses a lexicon of about 24,000 root words. The morphological analyzer is augmented with a recognizer for multiword and idiomatic constructs and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics, and a limited amount of statistical information. The tagger also provides functionality for compiling statistics and fine-tuning the morphological analyzer, such as logging erroneous morphological parses and commonly used roots. Preliminary results indicate that the tagger can tag about 98-99% of the texts accurately with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses and parses almost 2.5 times faster. The tagging functionality is not specific to Turkish and can be applied to any language with a proper morphological analysis interface.
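
To make the idea of disambiguation by local neighborhood constraints concrete, the following is a minimal Python sketch. It is not the tagger described in the paper: the toy lexicon, the tag names (`Gen`, `P2sg`, `P3sg`), the function names, and the single rule are all assumptions chosen for illustration. The rule keeps the genitive reading of an ambiguous word such as "evin" ("of the house" or "your house") when the next word can be a third-person possessed noun, as in "evin kapısı" ("the door of the house").

```python
from typing import Dict, List, Optional, Tuple

# A parse is a root followed by simplified tags, e.g. ("ev", "Noun", "Gen").
Parse = Tuple[str, ...]

# Hypothetical hand-written analyses standing in for a real two-level
# morphological analyzer; the tag names are illustrative assumptions.
TOY_LEXICON: Dict[str, List[Parse]] = {
    "evin": [("ev", "Noun", "Gen"), ("ev", "Noun", "P2sg")],
    "kapısı": [("kapı", "Noun", "P3sg")],
    "güzel": [("güzel", "Adj")],
}


def analyze(token: str) -> List[Parse]:
    """Return every candidate parse for a token (toy lookup)."""
    return list(TOY_LEXICON.get(token, [(token, "Unknown")]))


def prefer_genitive_before_possessed(
    parses: List[Parse], next_parses: Optional[List[Parse]]
) -> List[Parse]:
    """Illustrative local constraint: if the next word has a 3sg-possessive
    reading, keep only the genitive readings of the current word."""
    if next_parses is None:
        return parses
    if not any("P3sg" in p for p in next_parses):
        return parses
    kept = [p for p in parses if "Gen" in p]
    # Never discard every reading: fall back to the full set.
    return kept or parses


def disambiguate(tokens: List[str]) -> List[List[Parse]]:
    """Apply the constraint to each token using its right neighbor."""
    all_parses = [analyze(t) for t in tokens]
    result = []
    for i, parses in enumerate(all_parses):
        nxt = all_parses[i + 1] if i + 1 < len(all_parses) else None
        result.append(prefer_genitive_before_possessed(parses, nxt))
    return result
```

Calling `disambiguate(["evin", "kapısı"])` leaves a single genitive reading for "evin", while `disambiguate(["evin", "güzel"])` leaves both readings in place because the rule does not apply. A real system would combine many such constraints with heuristics and statistics, as the abstract describes.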

Published In

ANLC '94: Proceedings of the fourth conference on Applied natural language processing
October 1994
226 pages

Sponsors

  • ACL: Association for Computational Linguistics
  • Gesellschaft für Informatik

Publisher

Association for Computational Linguistics

United States
