DOI: 10.5555/1613715.1613828

Two languages are better than one (for syntactic parsing)

Published: 25 October 2008

Abstract

We show that jointly parsing a bitext can substantially improve parse quality on both sides. In a maximum entropy bitext parsing model, we define a distribution over source trees, target trees, and node-to-node alignments between them. Features include monolingual parse scores and various measures of syntactic divergence. Using the translated portion of the Chinese Treebank, our model is trained iteratively to maximize the marginal likelihood of training tree pairs, with alignments treated as latent variables. The resulting bitext parser outperforms state-of-the-art monolingual parser baselines by 2.5 F1 at predicting English-side trees and 1.8 F1 at predicting Chinese-side trees (the highest published numbers on these corpora). Moreover, these improved trees yield a 2.4 BLEU increase when used in a downstream MT evaluation.
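
The model and training objective summarized in the abstract can be written in a generic maximum entropy form. This is a sketch under assumed notation, not necessarily the paper's own equations: $s, s'$ stand for the source and target sentences, $t, t'$ for their trees, $a$ for a node-to-node alignment, $\phi$ for a feature vector (monolingual parse scores, divergence measures), and $w$ for its weights. All of these symbols are introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   s, s'   source / target sentence        t, t'  their parse trees
%   a       node-to-node alignment          phi    feature vector, w its weights
\begin{align}
  % Log-linear distribution over tree pairs and alignments for one sentence pair;
  % the denominator sums over all candidate trees of s and s' and alignments between them.
  P_w(t, a, t' \mid s, s')
    &= \frac{\exp\!\big(w^\top \phi(t, a, t')\big)}
            {\sum_{\bar t,\, \bar a,\, \bar t'} \exp\!\big(w^\top \phi(\bar t, \bar a, \bar t')\big)} \\[4pt]
  % Marginal log-likelihood of the gold tree pairs (t_i, t'_i):
  % the alignment a is summed out, i.e. treated as a latent variable.
  \mathcal{L}(w)
    &= \sum_{i} \log \sum_{a} P_w(t_i, a, t'_i \mid s_i, s'_i)
\end{align}
```

The inner sum over $a$ in the second line is what "alignments treated as latent variables" amounts to: only the tree pairs are observed during training, so the objective marginalizes over every alignment consistent with them.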

Published In

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%

Article Metrics

  • Downloads (Last 12 months): 37
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 14 Dec 2024

Cited By

  • (2015) Joint learning of constituency and dependency grammars by decomposed cross-lingual induction. Proceedings of the 24th International Conference on Artificial Intelligence, pp. 953-959. DOI: 10.5555/2832249.2832381. Online publication date: 25-Jul-2015.
  • (2015) SMPLearner. Automated Software Engineering 22:1, pp. 111-141. DOI: 10.1007/s10515-014-0161-3. Online publication date: 1-Mar-2015.
  • (2012) Learning to map into a universal POS tagset. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1368-1378. DOI: 10.5555/2390948.2391104. Online publication date: 12-Jul-2012.
  • (2012) Re-training monolingual parser bilingually for syntactic SMT. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 854-862. DOI: 10.5555/2390948.2391040. Online publication date: 12-Jul-2012.
  • (2012) An exploration of forest-to-string translation. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, pp. 317-321. DOI: 10.5555/2390665.2390738. Online publication date: 8-Jul-2012.
  • (2012) Cross-lingual parse disambiguation based on semantic correspondence. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, pp. 125-129. DOI: 10.5555/2390665.2390697. Online publication date: 8-Jul-2012.
  • (2012) Higher-order constituent parsing and parser combination. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, pp. 1-5. DOI: 10.5555/2390665.2390667. Online publication date: 8-Jul-2012.
  • (2012) Exploiting multiple treebanks for parsing with quasi-synchronous grammars. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pp. 675-684. DOI: 10.5555/2390524.2390619. Online publication date: 8-Jul-2012.
  • (2012) Selective sharing for multilingual dependency parsing. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pp. 629-637. DOI: 10.5555/2390524.2390613. Online publication date: 8-Jul-2012.
  • (2011) Relaxed cross-lingual projection of constituent syntax. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1192-1201. DOI: 10.5555/2145432.2145558. Online publication date: 27-Jul-2011.
