DOI: 10.5555/1620754.1620767

Adding more languages improves unsupervised multilingual part-of-speech tagging: a Bayesian non-parametric approach

Published: 31 May 2009

Abstract

We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingual latent variables. Our experiments show that performance improves steadily as the number of languages increases.

      Information

      Published In

      NAACL '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
      May 2009
      716 pages
      ISBN: 9781932432411

      Publisher

      Association for Computational Linguistics

      United States

      Qualifiers

      • Research-article

      Acceptance Rates

      Overall Acceptance Rate 21 of 29 submissions, 72%

      Article Metrics

      • Downloads (last 12 months): 37
      • Downloads (last 6 weeks): 4
      Reflects downloads up to 14 Dec 2024

      Cited By

      • (2012) Universal grapheme-to-phoneme prediction over Latin alphabets. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 332-343. 10.5555/2390948.2390990. Online publication date: 12-Jul-2012.
      • (2012) Nudging the envelope of direct transfer methods for multilingual named entity recognition. Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure, pp. 55-63. 10.5555/2390426.2390435. Online publication date: 7-Jun-2012.
      • (2012) Leveraging supplemental representations for sequential transduction. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 396-406. 10.5555/2382029.2382085. Online publication date: 3-Jun-2012.
      • (2011) Universal morphological analysis using structured nearest neighbor prediction. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 322-332. 10.5555/2145432.2145470. Online publication date: 27-Jul-2011.
      • (2011) Multi-source transfer of delexicalized dependency parsers. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 62-72. 10.5555/2145432.2145440. Online publication date: 27-Jul-2011.
      • (2010) Climbing the tower of babel. Proceedings of the 27th International Conference on Machine Learning, pp. 29-36. 10.5555/3104322.3104328. Online publication date: 21-Jun-2010.
      • (2010) Improved unsupervised POS induction using intrinsic clustering quality and a Zipfian constraint. Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 57-66. 10.5555/1870568.1870577. Online publication date: 15-Jul-2010.
      • (2010) Cross-lingual variation of light verb constructions. Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pp. 52-60. 10.5555/1870166.1870174. Online publication date: 16-Jul-2010.
      • (2010) Phylogenetic grammar induction. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1288-1297. 10.5555/1858681.1858812. Online publication date: 11-Jul-2010.
      • (2010) Posterior Regularization for Structured Latent Variable Models. The Journal of Machine Learning Research, 11, pp. 2001-2049. 10.5555/1756006.1859918. Online publication date: 1-Aug-2010.
