A Machine Learning Approach to POS Tagging

Lluís Màrquez¹,
Lluís Padró² &
Horacio Rodríguez³

Abstract

We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented toward resolving POS ambiguities, consisting of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labeling-based tagger. In this direction, we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5-million-word Spanish corpus from scratch.
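
The abstract does not state the update rule used by the relaxation labeling tagger, so the following is only a minimal sketch of a commonly used relaxation labeling update applied to tag disambiguation, not the authors' implementation. The function name `relax`, the constraint format, the tags, and all compatibility values are assumptions chosen for illustration. Each word holds a weight distribution over its candidate tags; constraints add or subtract support for a tag depending on the tags of neighboring words, and the weights are renormalized after every iteration.

```python
from typing import Dict, List, Tuple

# Hypothetical constraint format: (offset to neighbor, neighbor tag, compatibility).
# Compatibility is assumed to lie in [-1, 1]: positive supports, negative penalizes.
Constraint = Tuple[int, str, float]


def relax(
    weights: List[Dict[str, float]],
    constraints: Dict[str, List[Constraint]],
    iterations: int = 10,
) -> List[Dict[str, float]]:
    """Iteratively re-weight the candidate tags of every word."""
    for _ in range(iterations):
        updated = []
        for i, dist in enumerate(weights):
            unnormalized = {}
            for tag, weight in dist.items():
                # Support: sum of compatibilities weighted by neighbor tag weights.
                support = 0.0
                for offset, neighbor_tag, compat in constraints.get(tag, []):
                    j = i + offset
                    if 0 <= j < len(weights):
                        support += compat * weights[j].get(neighbor_tag, 0.0)
                # Clip so that (1 + support) never becomes negative.
                support = max(-1.0, min(1.0, support))
                unnormalized[tag] = weight * (1.0 + support)
            total = sum(unnormalized.values())
            if total > 0:
                updated.append({t: v / total for t, v in unnormalized.items()})
            else:
                # Every tag was fully penalized: keep the previous weights.
                updated.append(dict(dist))
        weights = updated
    return weights


# Toy input (invented values): "the can", where "can" is ambiguous.
initial = [
    {"DT": 1.0},
    {"MD": 0.6, "NN": 0.4},
]
# Toy constraints: a noun is supported after a determiner, a modal is penalized.
toy_constraints = {
    "NN": [(-1, "DT", 0.8)],
    "MD": [(-1, "DT", -0.8)],
}

result = relax(initial, toy_constraints)
best_tags = [max(dist, key=dist.get) for dist in result]
print(best_tags)
```

In this toy setup the weight of the noun reading grows at each iteration because the preceding determiner supports it, so the initially less likely tag ends up selected. In a full system the constraints could come from n-grams, hand-written rules, or rules derived from decision trees, as the abstract describes.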

Author information

Authors and Affiliations

Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, c/ Jordi Girona 1–3, Barcelona, 08034, Catalonia
Lluís Màrquez
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, c/ Jordi Girona 1–3, Barcelona, 08034, Catalonia
Lluís Padró
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, c/ Jordi Girona 1–3, Barcelona, 08034, Catalonia
Horacio Rodríguez

About this article

Cite this article

Màrquez, L., Padró, L. & Rodríguez, H. A Machine Learning Approach to POS Tagging. Machine Learning 39, 59–91 (2000). https://doi.org/10.1023/A:1007673816718

Issue Date: April 2000
DOI: https://doi.org/10.1023/A:1007673816718