Abstract
In this paper we describe work being carried out cooperatively in Portugal and Brazil on statistical methods for Natural Language Processing, focusing on the problem of Part-of-Speech (POS) tagging. POS tagging is a recent and successful technique for assigning each word in a sentence its correct POS tag. It can achieve more than 96% accuracy, even on previously unseen, untagged texts. We describe all the steps involved in this process, as well as the problems we faced. We also present the stochastic approach to POS tagging, which treats the generation of tag alignments as a probabilistic problem. Finally, we report the results obtained by applying these techniques to Portuguese texts.
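The stochastic approach is commonly realized as a bigram hidden Markov model whose parameters are estimated from a tagged corpus and which is decoded with the Viterbi algorithm. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the toy Portuguese corpus, the tag names (DET, N, V) and the add-one smoothing are hypothetical choices made only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, tag) pairs; the tag names
# are illustrative and are NOT the tagset used in the paper.
TRAIN = [
    [("o", "DET"), ("gato", "N"), ("come", "V")],
    [("a", "DET"), ("casa", "N"), ("caiu", "V")],
    [("o", "DET"), ("cão", "N"), ("come", "V"), ("a", "DET"), ("carne", "N")],
]


def train_hmm(sentences):
    """Count tag bigrams (transitions) and tag/word pairs (emissions)."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability, so unseen events keep some mass."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    n_words = len({w for t in tags for w in emit[t]}) + 1  # +1 for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = log_prob(trans["<s>"], tag, n_tags) + log_prob(emit[tag], words[0], n_words)
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], tag, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][tag] = (score + log_prob(emit[tag], words[i], n_words), prev_tag)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the reference tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, after `trans, emit = train_hmm(TRAIN)`, the call `viterbi(["o", "gato", "caiu"], trans, emit)` returns `["DET", "N", "V"]`. Working in log space avoids numerical underflow on long sentences, and the smoothing lets the sketch assign a tag to words absent from the training data; a real tagger would use a much larger corpus and a better estimator for unknown words.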
Work partially supported by a PhD scholarship from JNICT-PRAXIS XXI/BD/2909/94.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Villavicencio, A., Lopes, J.G.P., Marques, N.M.C., Villavicencio, F. (1995). Part-of-speech tagging for Portuguese texts. In: Wainer, J., Carvalho, A. (eds) Advances in Artificial Intelligence. SBIA 1995. Lecture Notes in Computer Science, vol 991. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0034825
DOI: https://doi.org/10.1007/BFb0034825
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60436-5
Online ISBN: 978-3-540-47467-8
eBook Packages: Springer Book Archive