A survey of domain adaptation for statistical machine translation

Published in: Machine Translation

Abstract

Differences in the domain of language use between training data and test data have often been reported to degrade the performance of phrase-based machine translation models. Over roughly the past decade, a large body of work has explored domain-adaptation methods that improve system performance in the face of such domain differences. This paper provides a systematic survey of domain-adaptation methods for phrase-based machine translation systems. The survey begins by outlining the sources of error that domain change introduces into the various components of phrase-based models, including lexical selection, reordering and optimization. It then outlines the different lines of research on domain adaptation in the literature and surveys the existing work within each, discussing how these approaches differ and how they relate to one another.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. See Ozdowska and Way (2009) for a clear demonstration that building MT systems with more Europarl data does not always lead to better translation results.

  2. http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz.

  3. http://www.statmt.org/wmt13/training-parallel-un.tgz.

  4. http://www.statmt.org/wmt15/training-parallel-nc-v10.tgz.

  5. Readers may refer to Lopez (2008) or Koehn (2010) for a comprehensive survey of SMT in general.

  6. https://www.bing.com/translator/.

  7. https://translate.google.com.

  8. As a side note, the size of the N-best list does not seem to have a significant impact on adaptation [cf. Bertoldi and Federico (2009)].

  9. In principle, search errors caused by a decoding algorithm can be a factor. The contribution of this factor to degradation of lexical translation quality, however, is minor, as shown in Irvine et al. (2013a).

  10. In Su et al. (2012), an interpolation model is computed for \(P_{IN}(z_{\tilde{f}_{IN}} \mid \tilde{f})\), which is decomposed into the topic posterior distribution at word level for smoothing.

  11. Joint inference of topic models on a concatenation of \(\mathcal {S}_{IN}\) and \(\mathcal {S}_{OUT}\) would remove the need to compute the topic-mapping probability distribution [cf. Gong et al. (2011) and Hewavitharana et al. (2013)]. To the best of our knowledge, however, the two approaches have yet to be thoroughly compared empirically.

  12. There has not been any attempt at such an implementation for combining multiple sub-models, as far as we are aware.

  13. In practice, the EM algorithm is often more efficient and more stable [cf. Razmara et al. (2012)].

  14. Zhang et al. (2014a) improve upon the binary fill-up model of Bisazza et al. (2011) with a probability distribution over phrase pairs to signify the extent to which a phrase pair is considered in-domain or out-of-domain.
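Note 13 refers to setting the weights of a linear mixture of models with the EM algorithm. The following is a minimal Python sketch of that standard estimation step, assuming each component model has already assigned a probability to every event in a small held-out in-domain sample; the function names `em_mixture_weights` and `mixture_likelihood`, the input layout and the toy numbers are illustrative assumptions, not taken from Razmara et al. (2012) or from any toolkit.

```python
from typing import List, Sequence


def em_mixture_weights(component_probs: Sequence[Sequence[float]],
                       iterations: int = 50) -> List[float]:
    """Estimate linear-interpolation weights for K component models by EM.

    component_probs[i][k] is the probability that component k assigns to
    held-out event i (hypothetically, a phrase pair or n-gram observed in a
    small in-domain development sample).
    """
    num_components = len(component_probs[0])
    weights = [1.0 / num_components] * num_components  # uniform initialisation
    for _ in range(iterations):
        expected = [0.0] * num_components
        for probs in component_probs:
            # E-step: posterior responsibility of each component for event i.
            joint = [w * p for w, p in zip(weights, probs)]
            total = sum(joint)
            if total == 0.0:
                continue  # no component explains this event; it carries no signal
            for k in range(num_components):
                expected[k] += joint[k] / total
        norm = sum(expected)
        if norm == 0.0:
            break  # nothing was explained; keep the current weights
        # M-step: the new weights are the normalised expected counts.
        weights = [count / norm for count in expected]
    return weights


def mixture_likelihood(weights: Sequence[float],
                       component_probs: Sequence[Sequence[float]]) -> float:
    """Likelihood of the held-out events under the interpolated model."""
    likelihood = 1.0
    for probs in component_probs:
        likelihood *= sum(w * p for w, p in zip(weights, probs))
    return likelihood


# Toy example (invented numbers): component 0 fits the held-out events better.
dev = [[0.4, 0.1], [0.3, 0.2], [0.5, 0.05]]
learned = em_mixture_weights(dev)
print([round(w, 3) for w in learned])
print(mixture_likelihood(learned, dev) >= mixture_likelihood([0.5, 0.5], dev))
```

Each EM iteration cannot decrease the held-out likelihood, which is one reason the procedure tends to behave stably; the weights shift towards whichever component better explains the in-domain sample.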
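Note 14 mentions the binary fill-up model of Bisazza et al. (2011). Below is a minimal Python sketch of the idea under stated assumptions: phrase tables are plain dictionaries keyed by (source, target) pairs, back-off to the out-of-domain table happens at the phrase-pair level, and the binary provenance indicator is stored as a raw 0/1 value. The function name `fill_up`, the table layout and the toy scores are hypothetical, and the exact feature encoding used by Bisazza et al. or by any particular decoder is not reproduced here.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]   # (source phrase, target phrase)
Features = Tuple[float, ...]   # translation-model scores for one phrase pair


def fill_up(in_domain: Dict[PhrasePair, Features],
            out_of_domain: Dict[PhrasePair, Features]) -> Dict[PhrasePair, Features]:
    """Fill up an in-domain phrase table with out-of-domain entries.

    In-domain entries are kept unchanged; out-of-domain entries are added only
    for phrase pairs the in-domain table lacks. A binary provenance feature is
    appended: 0.0 for in-domain entries, 1.0 for filled-up (back-off) entries.
    """
    combined = {pair: feats + (0.0,) for pair, feats in in_domain.items()}
    for pair, feats in out_of_domain.items():
        if pair not in combined:
            combined[pair] = feats + (1.0,)
    return combined


# Toy tables with two invented translation scores per phrase pair.
in_dom = {("bank", "Ufer"): (0.7, 0.6)}
out_dom = {("bank", "Ufer"): (0.1, 0.2), ("bank", "Bank"): (0.8, 0.9)}
for pair, feats in sorted(fill_up(in_dom, out_dom).items()):
    print(pair, feats)
```

If the provenance value is exposed to a log-linear model with its own tuned weight, the optimizer can learn how strongly to penalize back-off entries; Zhang et al. (2014a) replace this binary indicator with a graded in-domain/out-of-domain probability.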

References

  • Axelrod A, He X, Gao J (2011) Domain adaptation via pseudo in-domain data selection. In: EMNLP ’11: proceedings of the conference on empirical methods in natural language processing, Edinburgh, UK, pp 355–362

  • Aziz W, Dymetman M, Specia L (2014) Exact decoding for phrase-based statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 1237–1249

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the international conference on learning representations, San Diego, CA

  • Bertoldi N, Federico M (2009) Domain adaptation for statistical machine translation with monolingual resources. In: Proceedings of the fourth workshop on statistical machine translation, Athens, Greece, pp 182–189

  • Besling S, Meier HG (1995) Language model speaker adaptation. In: Fourth European conference on speech communication and technology (EUROSPEECH ’95), Madrid, Spain, pp 1755–1758

  • Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhya Indian J Stat 7(4):401–406

  • Birch A, Osborne M, Koehn P (2007) CCG supertags in factored statistical machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 9–16

  • Bisazza A, Ruiz N, Federico M (2011) Fill-up versus interpolation methods for phrase-based SMT adaptation. In: 2011 international workshop on spoken language translation, IWSLT, San Francisco, CA, USA, pp 136–143

  • Biçici E, Yuret D (2011) Instance selection for machine translation using feature decay algorithms. In: WMT 2011: Proceedings of the 6th workshop on statistical machine translation, Edinburgh, Scotland, UK, pp 272–283

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Blunsom P, Osborne M (2008) Probabilistic inference for machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 215–223

  • Bod R, Scha R, Sima’an K (2003) Data-oriented parsing. Center for the Study of Language and Information—Lecture Notes, Amsterdam, The Netherlands

  • Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Lafferty JD, Mercer RL, Roossin PS (1990) A statistical approach to machine translation. Comput Linguist 16(2):79–85

  • Brown PF, Della Pietra VJ, Della Pietra SA, Mercer RL (1993) The mathematics of statistical machine translation: Parameter estimation. Comput Linguist 19(2):263–311

  • Carl M, Way A (eds) (2003) Recent advances in example-based machine translation. Kluwer Academic Publishers, Dordrecht

  • Carpuat M, Goutte C, Foster G (2014) Linear mixture models for robust machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, Maryland, USA, pp 499–509

  • Chang YW, Collins M (2011) Exact decoding of phrase-based translation models through Lagrangian relaxation. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 26–37

  • Chang YW, Rush AM, DeNero J, Collins M (2014) A constrained Viterbi relaxation for bidirectional word alignment. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1481–1490

  • Chen B, Foster G, Kuhn R (2013a) Adaptation of reordering models for statistical machine translation. In: 2013 conference of the North American Chapter of the Association for computational linguistics: human language technologies, Atlanta, Georgia, pp 938–946

  • Chen B, Kuhn R, Foster GF (2013b) Vector space model for adaptation in statistical machine translation. In: Proceedings of the 51st annual meeting of the association for computational linguistics, volume 1: long papers, Sofia, Bulgaria, pp 1285–1293

  • Chen B, Kuhn R, Foster G, Cherry C, Huang F (2016) Bilingual methods for adaptive training data selection for machine translation. In: Conference of the association for machine translation in the Americas, the twelfth conference of the association for machine translation in the Americas, Austin, Texas, pp 93–106

  • Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies, Montreal, Canada, pp 427–436

  • Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 263–270

  • Chiang D (2007) Hierarchical phrase-based translation. Comput Linguist 33(2):202–228

  • Chiang D, Marton Y, Resnik P (2008) Online large-margin training of syntactic and structural translation features. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 224–233

  • Chiang D, Knight K, Wang W (2009) 11,001 new features for statistical machine translation. In: Proceedings of Human Language Technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics, Boulder, Colorado, pp 218–226

  • Chiang D, DeNeefe S, Pust M (2011) Two easy improvements to lexical weighting. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, Portland, Oregon, vol 2, pp 455–460

  • Clark J, Dyer C, Lavie A (2014) Locally non-linear learning for statistical machine translation via discretization and structured regularization. Trans Assoc Comput Linguist 2:393–404

  • Clarkson P, Robinson A (1997) Language model adaptation using mixtures and an exponentially decaying cache. In: IEEE international conference on acoustics, speech, and signal processing, ICASSP-97, Munich, Germany, pp 799–802

  • Cui L, Chen X, Zhang D, Liu S, Li M, Zhou M (2013) Multi-domain adaptation for SMT using multi-task learning. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, USA, pp 1055–1065

  • Cuong H, Sima’an K (2014a) Latent domain phrase-based models for adaptation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 566–576

  • Cuong H, Sima’an K (2014b) Latent domain translation models in mix-of-domains haystack. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland, pp 1928–1939

  • Cuong H, Sima’an K (2015) Latent domain word alignment for heterogeneous corpora. In: Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies, Denver, Colorado, USA, pp 398–408

  • Cuong H, Frank S, Sima’an K (2016a) ILLC-UvA adaptation system (scorpio) at WMT’16 IT-DOMAIN Task. In: Proceedings of the first conference on machine translation, shared task papers, Berlin, Germany, vol 2, pp 423–427

  • Cuong H, Sima’an K, Titov I (2016b) Adapting to all domains at once: rewarding domain invariance in SMT. Trans Assoc Comput Linguist 4:99–112

  • Daumé H III, Jagarlamudi J (2011) Domain adaptation for machine translation by mining unseen words. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, Portland, Oregon, vol 2, pp 407–412

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  • Devlin J, Zbib R, Huang Z, Lamar T, Schwartz R, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1370–1380

  • Dong M, Cheng Y, Liu Y, Xu J, Sun M, Izuha T, Hao J (2014) Query lattice for translation retrieval. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland, pp 2031–2041

  • Duh K, Sudoh K, Tsukada H (2010) Analysis of translation model adaptation in statistical machine translation. In: 2010 international workshop on spoken language translation, IWSLT 2010, Paris, France, pp 243–250

  • Duh K, Neubig G, Sudoh K, Tsukada H (2013) Adaptation data selection using neural language models: experiments in machine translation. In: 51st annual meeting of the association for computational linguistics (short papers), Sofia, Bulgaria, vol 2, pp 678–683

  • Durrani N, Sajjad H, Joty S, Abdelali A, Vogel S (2015) Using joint models for domain adaptation in statistical machine translation. In: Proceedings of the MT summit XV, MT researchers’ track, Miami, Florida, USA, vol 1, pp 117–130

  • Eck M, Vogel S, Waibel A (2005) Low cost portability for statistical machine translation based on n-gram coverage. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 227–234

  • Eidelman V, Boyd-Graber J, Resnik P (2012) Topic models for dynamic translation model adaptation. In: Proceedings of the 50th annual meeting of the association for computational linguistics: short papers, Jeju Island, Korea, vol 2, pp 115–119

  • Federico M, Cettolo M, Bentivogli L, Paul M, Stüker S (2012) Overview of the IWSLT 2012 evaluation campaign. In: 2012 international workshop on spoken language translation, Hong Kong, pp 12–33

  • Foster G, Kuhn R (2007) Mixture-model adaptation for SMT. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 128–135

  • Foster G, Goutte C, Kuhn R (2010) Discriminative instance weighting for domain adaptation in statistical machine translation. In: 2010 conference on empirical methods in natural language processing, Cambridge, Massachusetts, pp 451–459

  • Foster G, Chen B, Kuhn R (2013) Simulating discriminative training for linear mixture adaptation in statistical machine translation. In: Proceedings of the XIV machine translation summit, Nice, France, pp 183–190

  • Galley M, Manning CD (2008) A simple and effective hierarchical phrase reordering model. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 848–856

  • Gao Q, Lewis W, Quirk C, Hwang MY (2011) Incremental training and intentional over-fitting of word alignment. In: Proceedings of the 13th machine translation summit (MT summit XIII), Xiamen, China, pp 106–113

  • Gong Z, Zhang M, Zhou G (2011) Cache-based document-level statistical machine translation. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 909–919

  • Goodman JT (1998) Parsing inside-out. PhD thesis, Harvard University, Cambridge, MA

  • Green S, Wang S, Cer D, Manning CD (2013) Fast and adaptive online training of feature-rich translation models. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers), Sofia, Bulgaria, pp 311–321

  • Green S, Cer DM, Manning CD (2014) An empirical comparison of features and tuning for phrase-based machine translation. In: Proceedings of the ninth workshop on statistical machine translation, WMT@ACL 2014, Baltimore, Maryland, USA, pp 466–476

  • Gruber A, Weiss Y, Rosen-Zvi M (2007) Hidden topic Markov models. In: Proceedings of the eleventh international conference on artificial intelligence and statistics, San Juan, Puerto Rico, pp 163–170

  • Haddow B (2013) Applying pairwise ranked optimisation to improve the interpolation of translation models. In: Proceedings of the human language technologies: conference of the North American chapter of the association of computational linguistics, Atlanta, Georgia, USA, pp 342–347

  • Haghighi A, Liang P, Berg-Kirkpatrick T, Klein D (2008) Learning bilingual lexicons from monolingual corpora. In: Proceedings of ACL-08: HLT, Columbus, Ohio, pp 771–779

  • Hasler E, Haddow B, Koehn P (2012) Sparse lexicalised features and topic adaptation for SMT. In: 2012 international workshop on spoken language translation, IWSLT, Hong Kong, pp 268–275

  • Hasler E, Blunsom P, Koehn P, Haddow B (2014) Dynamic topic adaptation for phrase-based MT. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Gothenburg, Sweden, pp 328–337

  • Hewavitharana S, Mehay D, Ananthakrishnan S, Natarajan P (2013) Incremental topic-based translation model adaptation for conversational spoken language translation. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers), Sofia, Bulgaria, pp 697–701

  • Hieber F, Riezler S (2015) Bag-of-words forced decoding for cross-lingual information retrieval. In: NAACL HLT 2015, the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies, Denver, Colorado, USA, pp 1172–1182

  • Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 289–296

  • Hopkins M, May J (2011) Tuning as ranking. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 1352–1362

  • Hu Y, Zhai K, Eidelman V, Boyd-Graber J (2014) Polylingual tree-based topic models for translation domain adaptation. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1166–1176

  • Irvine A, Morgan J, Carpuat M, Daumé H III, Munteanu D (2013a) Measuring machine translation errors in new domains. Trans Assoc Comput Linguist 1:429–440

  • Irvine A, Quirk C, Daumé III H (2013b) Monolingual marginal matching for translation model adaptation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, USA, pp 1077–1088

  • Jeblee S, Feely W, Bouamor H, Lavie A, Habash N, Oflazer K (2014) Domain and dialect adaptation for machine translation into Egyptian Arabic. In: Proceedings of the EMNLP 2014 workshop on Arabic natural language processing (ANLP), Doha, Qatar, pp 196–206

  • Joty S, Sajjad H, Durrani N, Al-Mannai K, Abdelali A, Vogel S (2015) How to avoid unwanted pregnancies: Domain adaptation using neural network models. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1259–1270

  • Kettunen K (2009) Choosing the Best MT Programs for CLIR purposes—can MT metrics be helpful? In: Proceedings of the 31st European conference on information retrieval research: advances in information retrieval, Springer International Publishing, Heidelberg/Berlin, Germany. Lecture Notes in Computer Science, vol 5478, pp 706–712

  • Kirchhoff K, Bilmes J (2014) Submodularity for data selection in machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 131–141

  • Koehn P (2004) Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In: Proceedings of the machine translation: from real users to research: 6th conference of the association for machine translation in the Americas, Springer, Berlin/Heidelberg, Germany, pp 115–124

  • Koehn P (2005) Europarl: A parallel corpus for statistical machine translation. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 79–86

  • Koehn P (2010) Statistical machine translation. Cambridge University Press, New York, NY, USA

  • Koehn P, Knight K (2002) Learning a translation lexicon from monolingual corpora. In: Proceedings of the ACL-02 workshop on unsupervised lexical acquisition, Philadelphia, Pennsylvania, pp 9–16

  • Koehn P, Schroeder J (2007) Experiments in domain adaptation for statistical machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 224–227

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1, Edmonton, Canada, pp 48–54

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, Prague, Czech Republic, pp 177–180

  • Kuhn R, De Mori R (1992) Corrections to “a cache-based language model for speech recognition”. IEEE Trans Pattern Anal Mach Intell 14(6):691–692

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  • Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics: HLT-NAACL 2004, Boston, Massachusetts, USA, pp 169–176

  • Lambert P, Schwenk H, Servan C, Abdul-Rauf S (2011) Investigations on translation model adaptation using monolingual data. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, Scotland, pp 284–293

  • Lewis W, Eetemadi S (2013) Dramatically reducing training data size through vocabulary saturation. In: Proceedings of the eighth workshop on statistical machine translation, Sofia, Bulgaria, pp 281–291

  • Liu C, Liu Y, Sun M, Luan H, Yu H (2015) Generalized agreement for bidirectional word alignment. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1828–1836

  • Liu L, Watanabe T, Sumita E, Zhao T (2013) Additive neural networks for statistical machine translation. In: 51st annual meeting of the association for computational linguistics (long papers), Sofia, Bulgaria, vol 1, pp 791–801

  • Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49

  • Louis A, Webber B (2014) Structured and unstructured cache models for SMT domain adaptation. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Gothenburg, Sweden, pp 155–163

  • Macherey W, Och FJ, Thayer I, Uszkoreit J (2008) Lattice-based minimum error rate training for statistical machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 725–734

  • Mansour S, Ney H (2014) Unsupervised adaptation for statistical machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, Maryland, USA, pp 457–465

  • Mansour S, Wuebker J, Ney H (2011) Combining translation and language model scoring for domain-specific data filtering. In: International workshop on spoken language translation, San Francisco, CA, USA, pp 222–229

  • Marton Y, Resnik P (2008) Soft syntactic constraints for hierarchical phrased-based translation. In: Proceedings of ACL-08: HLT, Columbus, Ohio, pp 1003–1011

  • Matsoukas S, Rosti AVI, Zhang B (2009) Discriminative corpus weight estimation for machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, vol 2, pp 708–717

  • Mimno D, Wallach HM, Naradowsky J, Smith DA, McCallum A (2009) Polylingual topic models. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, vol 2, pp 880–889

  • Moore RC, Lewis W (2010) Intelligent selection of language model training data. In: Proceedings of the ACL 2010 conference short papers, Uppsala, Sweden, pp 220–224

  • Nagao M (1984) A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn A, Banerji R (eds) Artif Hum Intell. North-Holland, Amsterdam, pp 173–180

  • Nakov P (2008) Improving English-Spanish statistical machine translation: Experiments in domain adaptation, sentence paraphrasing, tokenization, and recasing. In: Proceedings of the third workshop on statistical machine translation, Columbus, Ohio, pp 147–150

  • Neubig G, Watanabe T (2016) Optimization for statistical machine translation: a survey. Comput Linguist 42(1):1–54

  • Nikoulina V, Kovachev B, Lagos N, Monz C (2012) Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context. In: Proceedings of the 13th conference of the European chapter of the association for computational linguistics, Avignon, France, pp 109–119

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting on association for computational linguistics, Sapporo, Japan, vol 1, pp 160–167

  • Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 295–302

  • Ozdowska S, Way A (2009) Optimal bilingual data for French-English PB-SMT. In: Proceedings of the 13th annual meeting of the European association for machine translation, Barcelona, Spain, pp 96–103

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 311–318

  • Pauls A, DeNero J, Klein D (2009) Consensus training for consensus decoding in machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, vol 3, Singapore, pp 1418–1427

  • Pecina P, Toral A, van Genabith J (2012) Simple and effective parameter tuning for domain adaptation of statistical machine translation. In: Proceedings of the 24th international conference on computational linguistics, Mumbai, India, pp 2209–2224

  • Poncelas A, Maillette de Buy Wenniger G, Way A (2017) Applying n-gram alignment entropy to improve feature decay algorithms. Prague Bull Math Linguist 108:245–256

  • Quirk C, Menezes A (2006) Dependency treelet translation: the convergence of statistical and example-based machine-translation? Mach Transl 20(1):43–65

  • Quirk C, Menezes A, Cherry C (2005) Dependency treelet translation: syntactically informed phrasal SMT. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 271–279

  • Razmara M, Foster G, Sankaran B, Sarkar A (2012) Mixing multiple translation models in statistical machine translation. In: Proceedings of the 50th annual meeting of the association for computational linguistics, long papers, Jeju Island, Korea, vol 1, pp 940–949

  • Schwenk H (2008) Investigations on large-scale lightly-supervised training for statistical machine translation. In: 2008 international workshop on spoken language translation, Honolulu, Hawaii, USA, pp 182–189

  • Schwenk H, Senellart J (2009) Translation model adaptation for an Arabic/French news translation system by lightly-supervised training. In: MT Summit XII: proceedings of the twelfth machine translation summit, Ottawa, Ontario, Canada, pp 308–315

  • Sennrich R (2012) Perplexity minimization for translation model domain adaptation in statistical machine translation. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp 539–549

  • Shah K, Barrault L, Schwenk H (2010) Translation model adaptation by resampling. In: Proceedings of the joint fifth workshop on statistical machine translation and MetricsMATR, Uppsala, Sweden, pp 392–399

  • Shah K, Barrault L, Schwenk H (2012) A general framework to weight heterogeneous parallel data for model adaptation in statistical machine translation. In: Proceedings of the AMTA-2012: the tenth biennial conference of the association for machine translation in the Americas, San Diego, CA, 10pp

  • Shen S, Liu Y, Sun M, Luan H (2015) Consistency-aware search for word alignment. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1228–1237

  • Sima’an K (2003) On maximizing metrics for syntactic disambiguation. In: Proceedings of the 8th international workshop on parsing technologies (IWPT), Nancy, France, pp 183–194

  • Simianer P, Riezler S, Dyer C (2012) Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT. In: Proceedings of the 50th annual meeting of the association for computational linguistics: long papers, vol 1, Jeju Island, Korea, pp 11–21

  • Simion A, Collins M, Stein C (2013) A convex alternative to IBM model 2. In: Proceedings of the 2013 conference on empirical methods in natural language processing, EMNLP 2013, Seattle, Washington, USA, pp 1574–1583

  • Smith DA, Eisner J (2006) Minimum risk annealing for training log-linear models. In: Proceedings of the COLING/ACL 2006 main conference poster sessions, Sydney, Australia, pp 787–794

  • Snover M, Dorr B, Schwartz R (2008) Language and translation model adaptation using comparable corpora. In: Proceedings of the conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 857–866

  • Su J, Wu H, Wang H, Chen Y, Shi X, Dong H, Liu Q (2012) Translation model adaptation for statistical machine translation with monolingual topic information. In: Proceedings of the 50th annual meeting of the association for computational linguistics (long papers), Jeju Island, Korea, vol 1, pp 459–468

  • Tamura A, Watanabe T, Sumita E (2012) Bilingual lexicon extraction from comparable corpora using label propagation. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, Jeju Island, Korea, pp 24–36

  • Tamura A, Watanabe T, Sumita E (2014) Recurrent neural networks for word alignment model. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (long papers), Baltimore, Maryland, vol 1, pp 1470–1480

  • Tang J, Meng Z, Nguyen X, Mei Q, Zhang M (2014) Understanding the limiting factors of topic modeling via posterior contraction analysis. In: Proceedings of the 31st international conference on machine learning (ICML-14), Beijing, China, pp 190–198

  • Tiedemann J (2010) Context adaptation in statistical machine translation using models with exponentially decaying cache. In: Proceedings of the 2010 workshop on domain adaptation for natural language processing, Uppsala, Sweden, pp 8–15

  • Tillmann C (2004) A unigram orientation model for statistical machine translation. In: Proceedings of HLT-NAACL 2004: short papers, Boston, Massachusetts, pp 101–104

  • Tsuruoka Y, Tsujii J, Ananiadou S (2009) Stochastic gradient descent training for l1-regularized log-linear models with cumulative penalty. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, Singapore, vol 1, pp 477–485

  • Vogel S, Ney H, Tillmann C (1996) HMM-based word alignment in statistical translation. In: Proceedings of COLING 1996: the 16th international conference on computational linguistics, Copenhagen, Denmark, pp 836–841

  • Waite A, Byrne B (2015) The geometry of statistical machine translation. In: Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies, Denver, Colorado, pp 376–386

  • Wang W, Macherey K, Macherey W, Och F, Xu P (2012) Improved domain adaptation for statistical machine translation. In: Proceedings of the AMTA-2012: the tenth biennial conference of the association for machine translation in the Americas, San Diego, CA

  • Wang X, Utiyama M, Finch A, Watanabe T, Sumita E (2015) Leave-one-out word alignment without garbage collector effects. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1817–1827

  • Wäschle K, Riezler S (2012) Structural and topical dimensions in multi-task patent translation. In: Proceedings of the 13th conference of the European chapter of the association for computational linguistics, Avignon, France, pp 818–828

  • Watanabe T, Suzuki J, Tsukada H, Isozaki H (2007) Online large-margin training for statistical machine translation. In: EMNLP-CoNLL 2007, Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, Prague, Czech Republic, pp 764–773

  • Van Der Wees M, Bisazza A, Weerkamp W, Monz C (2015) What’s in a domain? Analyzing genre and topic differences in statistical machine translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the Asian federation of natural language processing (short papers), Beijing, China, vol 2, pp 560–566

  • Wu H, Wang H, Liu Z (2005) Alignment model adaptation for domain-specific word alignment. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 467–474

  • Wu H, Wang H, Zong C (2008) Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora. In: Proceedings of the 22nd international conference on computational linguistics, Manchester, United Kingdom, vol 1, pp 993–1000

  • Yamada K, Knight K (2001) A syntax-based statistical translation model. In: Proceedings of the 39th annual meeting on association for computational linguistics, Toulouse, France, pp 523–530

  • Yu H, Huang L, Mi H, Zhao K (2013) Max-violation perceptron and forced decoding for scalable MT training. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, pp 1112–1123

  • Zhang B, Su J, Xiong D, Duan H, Yao J (2015) Discriminative reordering model adaptation via structural learning. In: Proceedings of the 24th international conference on artificial intelligence, Buenos Aires, Argentina, pp 1040–1046

  • Zhang H, Chiang D (2014) Kneser-Ney smoothing on expected counts. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (long papers), Baltimore, Maryland, vol 1, pp 765–774

  • Zhang J, Li L, Way A, Liu Q (2014a) A probabilistic feature-based fill-up for SMT. In: Proceedings of the 11th conference of the association for machine translation in the Americas, MT Researchers Track, Vancouver, Canada, vol 1, pp 96–109

  • Zhang M, Xiao X, Xiong D, Liu Q (2014b) Topic-based dissimilarity and sensitivity models for translation rule selection. J Artif Intell Res 50(1):1–30

Acknowledgements

We thank the editor, the anonymous reviewers and Ivan Titov for their input. This work was performed at ILLC, University of Amsterdam. The authors are supported by the EU FP7 Marie Curie ITN Project (nr. 317471) and the QT21 Project (H2020 nr. 645452).

Funding

Funding was provided by VICI (Grant No. 277-89-002).

Author information

Corresponding author

Correspondence to Hoang Cuong.

About this article

Cite this article

Cuong, H., Sima’an, K. A survey of domain adaptation for statistical machine translation. Machine Translation 31, 187–224 (2017). https://doi.org/10.1007/s10590-018-9216-8

  • DOI: https://doi.org/10.1007/s10590-018-9216-8