
Improve syntax-based translation using deep syntactic structures

Published: 01 June 2010

Abstract

This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained description of syntactic properties but also a semantic representation. Given the abundant information in these structures, we investigate whether they improve on traditional syntax-based translation models built on PCFG parsers. To use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree-string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, we define a minimum covering tree that takes a predicate word and its lexical/phrasal arguments as the frontier nodes. Starting from this definition, we propose a linear-time algorithm that extracts translation rules in a single traversal of the leaf nodes of an HPSG tree. Extensive experiments on a tree-to-string translation system demonstrate the effectiveness of our proposal.
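
The idea of a minimum covering tree can be illustrated with a small sketch. The code below is not the paper's linear-time extraction algorithm; it is a simplified illustration that assumes a plain constituency tree with parent pointers and finds the lowest node dominating a predicate word and its arguments. The `Node` class, the function names, and the toy sentence "John wants to eat" are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in sets
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Optional[Node]) -> List[Node]:
    """Path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def minimum_covering_root(frontier: List[Node]) -> Node:
    """Lowest node that dominates every node in frontier."""
    common = ancestors(frontier[0])
    for node in frontier[1:]:
        on_path = set(ancestors(node))
        common = [n for n in common if n in on_path]
    return common[0]


def fragment(node: Node, frontier: Set[Node]) -> str:
    """Bracketed tree fragment that stops expanding at frontier nodes."""
    if node in frontier or not node.children:
        return node.label
    inner = " ".join(fragment(child, frontier) for child in node.children)
    return f"({node.label} {inner})"


# Toy tree for "John wants to eat" (hypothetical, simplified labels).
s = Node("S")
np = s.add(Node("NP"))
np.add(Node("John"))
vp1 = s.add(Node("VP"))
vp1.add(Node("wants"))
vp2 = vp1.add(Node("VP"))
vp2.add(Node("to"))
eat = vp2.add(Node("eat"))

# "eat" takes the NP "John" as its subject argument, a non-local relation here.
root = minimum_covering_root([eat, np])
print(fragment(root, {eat, np}))  # (S NP (VP wants (VP to eat)))
```

In this toy case the predicate "eat" and its argument NP are only connected through the S node, so the covering fragment also absorbs the intervening words "wants" and "to". This is one way to picture how a non-local dependency becomes local to a single rule's source side.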

Published In

Machine Translation, Volume 24, Issue 2
June 2010
105 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Head-driven Phrase Structure Grammar
  2. Predicate-argument structures
  3. Syntax-based translation
  4. Typed feature structures

Qualifiers

  • Article
