DOI: 10.5555/2393015.2393081

Direct error rate minimization for statistical machine translation

Published: 07 June 2012

Abstract

Minimum error rate training (MERT) is often the preferred method for optimizing parameters of statistical machine translation systems. MERT minimizes error rate by using a surrogate representation of the search space, such as N-best lists or hypergraphs, which offer only an incomplete view of the search space. In our work, we instead minimize error rate directly by integrating the decoder into the minimizer. This approach yields two benefits. First, the function being optimized is the true error rate. Second, it lets us optimize parameters of translation systems other than standard linear model features, such as the distortion limit. Since integrating the decoder into the minimizer is often too slow to be practical, we also exploit statistical significance tests to accelerate the search by quickly discarding unpromising models. Experiments with a phrase-based system show that our approach is scalable, and that optimizing the parameters that MERT cannot handle brings improvements to translation results.
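
The idea of discarding unpromising models early can be illustrated with a small racing loop. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not say which significance test is used, so a Hoeffding confidence bound stands in for it here, and `toy_error` is a hypothetical replacement for decoding a sentence with a given configuration and scoring the output. The candidate values (distortion limits 0, 6, and 12) and all function names are invented for illustration.

```python
import math
import random


def hoeffding_radius(n, delta=0.05, value_range=1.0):
    """Half-width of a (1 - delta) Hoeffding confidence interval after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def race(candidates, sentence_error, sentences, delta=0.05, batch=50):
    """Evaluate candidates batch by batch and drop clearly worse ones early.

    sentence_error(candidate, sentence) must return an error in [0, 1].
    In a real system it would run the decoder; here it is an assumption.
    Returns a dict mapping each surviving candidate to its mean error.
    """
    alive = list(candidates)
    totals = {c: 0.0 for c in alive}
    seen = 0
    for start in range(0, len(sentences), batch):
        chunk = sentences[start:start + batch]
        for c in alive:
            totals[c] += sum(sentence_error(c, s) for s in chunk)
        seen += len(chunk)
        radius = hoeffding_radius(seen, delta)
        means = {c: totals[c] / seen for c in alive}
        best_upper = min(means.values()) + radius
        # Keep a candidate only while its optimistic bound is not worse
        # than the pessimistic bound of the current best candidate.
        alive = [c for c in alive if means[c] - radius <= best_upper]
        if len(alive) == 1:
            break
    return {c: totals[c] / seen for c in alive}


def toy_error(limit, sentence):
    """Hypothetical per-sentence error, lowest on average at limit 6."""
    rng = random.Random(sentence * 1000 + limit)
    base = 0.3 + 0.05 * abs(limit - 6)
    return min(1.0, max(0.0, base + rng.uniform(-0.1, 0.1)))


# Race three hypothetical distortion limits over 2000 toy "sentences".
survivors = race([0, 6, 12], toy_error, list(range(2000)))
print(survivors)
```

In this toy setting the two weaker candidates are dropped after a few batches, so most of the simulated decoding effort goes to the remaining one. A real system would pay the decoding cost inside `sentence_error`, which is where early elimination saves time.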


Published In

WMT '12: Proceedings of the Seventh Workshop on Statistical Machine Translation
June 2012
509 pages

Publisher

Association for Computational Linguistics
United States
