More Web Proxy on the site http://driver.im/

research-article

Free access

Direct error rate minimization for statistical machine translation

Authors:

Tagyoung Chung,

Michel GalleyAuthors Info & Claims

WMT '12: Proceedings of the Seventh Workshop on Statistical Machine Translation

Pages 468 - 479

Published: 07 June 2012 Publication History

Abstract

Minimum error rate training is often the preferred method for optimizing parameters of statistical machine translation systems. MERT minimizes error rate by using a surrogate representation of the search space, such as N-best lists or hypergraphs, which only offer an incomplete view of the search space. In our work, we instead minimize error rate directly by integrating the decoder into the minimizer. This approach yields two benefits. First, the function being optimized is the true error rate. Second, it lets us optimize parameters of translations systems other than standard linear model features, such as distortion limit. Since integrating the decoder into the minimizer is often too slow to be practical, we also exploit statistical significance tests to accelerate the search by quickly discarding unpromising models. Experiments with a phrase-based system show that our approach is scalable, and that optimizing the parameters that MERT cannot handle brings improvements to translation results.

References

[1]

Oliver Bender, Richard Zens, Evgeny Matusov, and Hermann Ney. 2004. Alignment templates: the RWTH SMT system. In Proc. of the International Workshop on Spoken Language Translation, pages 79--84, Kyoto, Japan.

[2]

Yin-Wen Chang and Michael Collins. 2011. Exact decoding of phrase-based translation models through Lagrangian relaxation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 26--37, Edinburgh, Scotland, UK., July. Association for Computational Linguistics.

Digital Library

[3]

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Conference of the Association for Computational Linguistics (ACL-05), pages 263--270, Ann Arbor, MI.

Digital Library

[4]

Hal Daumé, III. 2006. Practical Structured Learning Techniques for Natural Language Processing. Ph.D. thesis, University of Southern California, Los Angeles, CA, USA.

Digital Library

[5]

B. Efron and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.

[6]

Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 848--856, Honolulu, Hawaii, October. Association for Computational Linguistics.

Digital Library

[7]

Michel Galley and Chris Quirk. 2011. Optimal search for minimum error rate training. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 38--49, Edinburgh, Scotland, UK., July. Association for Computational Linguistics.

Digital Library

[8]

Spence Green, Michel Galley, and Christopher D. Manning. 2010. Improved models of distortion cost for statistical machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 867--875, Los Angeles, California, June. Association for Computational Linguistics.

Digital Library

[9]

Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301): 13--30.

[10]

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-03), Edmonton, Alberta.

Digital Library

[11]

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL, Demonstration Session, pages 177--180.

Digital Library

[12]

Shankar Kumar, Wolfgang Macherey, Chris Dyer, and Franz Och. 2009. Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 163--171, Suntec, Singapore, August. Association for Computational Linguistics.

Digital Library

[13]

Wolfgang Macherey, Franz Och, Ignacio Thayer, and Jakob Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 725--734, Honolulu, Hawaii, October. Association for Computational Linguistics.

Digital Library

[14]

Oded Maron and Andrew W. Moore. 1994. Hoeffding races: Accelerating model selection search for classification and function approximation. In Advances in neural information processing systems 6, pages 59--66. Morgan Kaufmann.

[15]

Andrew Moore and Mary Soon Lee. 1994. Efficient algorithms for minimizing cross validation error. In W. W. Cohen and H. Hirsh, editors, Proceedings of the 11th International Confonference on Machine Learning, pages 190--198. Morgan Kaufmann.

[16]

Robert C. Moore and Chris Quirk. 2008. Random restarts in minimum error rate training for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 585--592, Manchester, UK, August. Coling 2008 Organizing Committee.

Digital Library

[17]

J. A. Nelder and R. Mead. 1965. A simplex method for function minimization. Computer Journal, 7: 308--313.

[18]

Eric W. Noreen. 1989. Computer-Intensive Methods for Testing Hypotheses: An Introduction. Wiley-Interscience.

[19]

Franz Josef Och, Christoph Tillmann, Hermann Ney, and Lehrstuhl Fiir Informatik. 1999. Improved alignment models for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20--28.

[20]

Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of the 41th Annual Conference of the Association for Computational Linguistics (ACL-03).

Digital Library

[21]

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Conference of the Association for Computational Linguistics (ACL-02).

Digital Library

[22]

M. J. D. Powell. 1964. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7: 155--162.

[23]

William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 1992. Numerical recipes in C (2nd ed.): the art of scientific computing. Cambridge University Press, New York, NY, USA.

Digital Library

[24]

Stefan Riezler and John T. Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57--64, June.

[25]

Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas, pages 223--231.

[26]

M H Wright. 1995. Direct search methods: Once scorned, now respectable. Numerical Analysis, 344: 191--208.

[27]

Richard Zens, Sasa Hasan, and Hermann Ney. 2007. A systematic comparison of training criteria for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 524--532.

[28]

Bing Zhao and Shengyuan Chen. 2009. A simplex Armijo downhill algorithm for optimizing statistical machine translation decoding parameters. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 21--24, Boulder, Colorado, June. Association for Computational Linguistics.

Digital Library

Direct error rate minimization for statistical machine translation
1. Hardware
  1. Power and energy
    1. Power estimation and optimization

Recommendations

N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven ...
Linguistically annotated BTG for statistical machine translation
COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys ...
Lattice-based minimum error rate training for statistical machine translation
EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing

Minimum Error Rate Training (MERT) is an effective means to estimate the feature function weights of a linear model such that an automated evaluation criterion for measuring system performance can directly be optimized in training. To accomplish this, ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image DL Hosted proceedings

WMT '12: Proceedings of the Seventh Workshop on Statistical Machine Translation

June 2012

509 pages

Publisher

Association for Computational Linguistics

United States

Publication History

Published: 07 June 2012

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
95
Total Downloads

Downloads (Last 12 months)50
Downloads (Last 6 weeks)8

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents