DOI: 10.3115/1219840.1219906
Article

Clause restructuring for statistical machine translation

Published: 25 June 2005

Abstract

We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source-language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source-language side of the translation system. The goal of this step is to recover an underlying word order that is closer to target-language word order than that of the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing a statistically significant improvement in BLEU score from 25.2% for a baseline system to 26.8% for the system with reordering.
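The abstract does not list the actual transformation rules, so the following is only a minimal Python sketch of the general idea: rewrite the source-side parse tree, then read off the reordered surface string. It assumes a toy tree class, STTS part-of-speech tags (`KOUS`, `PPER`, `ART`, `NN`, `VVFIN`), hypothetical phrase labels (`S-SUB` for a subordinate clause, `NP-SB` for the subject, `NP-OA` for the object), and a single hypothetical rule that moves a clause-final finite verb to directly after the subject. For the German clause "weil er das Buch las" ("because he read the book"), the rule produces the English-like order "weil er las das Buch". In a pipeline of the kind described above, the same `reorder` step would be run on every source sentence before training and again before decoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy parse-tree node: leaves carry a word, internal nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

    def leaves(self) -> List[str]:
        """Return the surface string (yield) of this subtree, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


def move_final_verb_after_subject(clause: Node) -> Node:
    """Hypothetical rule (not taken from the paper): if the clause ends in a
    finite verb (VVFIN), move it so it directly follows the subject NP (NP-SB)."""
    kids = list(clause.children)  # copy so the input tree is not mutated
    if not kids or kids[-1].label != "VVFIN":
        return clause  # rule does not apply
    subj_idx = next((i for i, k in enumerate(kids) if k.label == "NP-SB"), None)
    if subj_idx is None:
        return clause  # no subject found, leave the clause unchanged
    verb = kids.pop()                # take the verb out of the clause-final slot
    kids.insert(subj_idx + 1, verb)  # re-insert it right after the subject
    return Node(clause.label, kids)


def reorder(tree: Node) -> Node:
    """Rebuild the tree bottom-up, applying the rule to each S-SUB clause."""
    if tree.word is not None:
        return tree
    new = Node(tree.label, [reorder(c) for c in tree.children])
    if new.label == "S-SUB":
        return move_final_verb_after_subject(new)
    return new


# Hand-built parse of "weil er das Buch las" (hypothetical labels as above).
sent = Node("S-SUB", [
    Node("KOUS", word="weil"),
    Node("NP-SB", [Node("PPER", word="er")]),
    Node("NP-OA", [Node("ART", word="das"), Node("NN", word="Buch")]),
    Node("VVFIN", word="las"),
])
print(" ".join(reorder(sent).leaves()))  # weil er las das Buch
```

Because `reorder` builds new nodes instead of editing the input, the original parse remains available, for example to compare systems trained with and without reordering. A realistic rule set would have to cover many more constructions than this single illustrative rule.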

Cited By

  • (2024) Enhancing Lyrics Rewriting with Weak Supervision from Grammatical Error Correction Pre-training and Reference Knowledge Fusion. ACM Transactions on Asian and Low-Resource Language Information Processing 23:11, pp. 1-26. DOI: 10.1145/3687126. Online publication date: 21-Nov-2024.
  • (2023) Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 326-336. DOI: 10.1145/3583780.3615093. Online publication date: 21-Oct-2023.
  • (2023) DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 337-347. DOI: 10.1145/3583780.3614832. Online publication date: 21-Oct-2023.

Information

Published In

ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
June 2005
657 pages
  • General Chair: Kevin Knight

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

ACL '05 paper acceptance rate: 77 of 423 submissions (18%)
Overall acceptance rate: 85 of 443 submissions (19%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 82
  • Downloads (last 6 weeks): 5
Reflects downloads up to 24 Dec 2024.

Citations

  • (2022) Low-resource Neural Machine Translation: Methods and Trends. ACM Transactions on Asian and Low-Resource Language Information Processing 21:5, pp. 1-22. DOI: 10.1145/3524300. Online publication date: 15-Nov-2022.
  • (2022) Data-Driven Fuzzy Target-Side Representation for Intelligent Translation System. IEEE Transactions on Fuzzy Systems 30:11, pp. 4568-4577. DOI: 10.1109/TFUZZ.2022.3167129. Online publication date: 1-Nov-2022.
  • (2021) Unsupervised Neural Machine Translation for Similar and Distant Language Pairs. ACM Transactions on Asian and Low-Resource Language Information Processing 20:1, pp. 1-17. DOI: 10.1145/3418059. Online publication date: 31-Mar-2021.
  • (2021) Integrating Prior Translation Knowledge Into Neural Machine Translation. IEEE/ACM Transactions on Audio, Speech and Language Processing 30, pp. 330-339. DOI: 10.1109/TASLP.2021.3138714. Online publication date: 28-Dec-2021.
  • (2021) Syntax-Aware Multi-Spans Generation for Reading Comprehension. IEEE/ACM Transactions on Audio, Speech and Language Processing 30, pp. 260-268. DOI: 10.1109/TASLP.2021.3138679. Online publication date: 28-Dec-2021.
  • (2021) Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles. IEEE/ACM Transactions on Audio, Speech and Language Processing 30, pp. 49-59. DOI: 10.1109/TASLP.2021.3130972. Online publication date: 28-Dec-2021.
  • (2021) Modeling Future Cost for Neural Machine Translation. IEEE/ACM Transactions on Audio, Speech and Language Processing 29, pp. 770-781. DOI: 10.1109/TASLP.2020.3042006. Online publication date: 21-Jan-2021.