Abstract
Differences in the domain of language use between training data and test data have often been reported to degrade the performance of phrase-based machine translation models. Over the past decade or so, a large body of work has explored domain-adaptation methods to improve system performance in the face of such domain differences. This paper provides a systematic survey of domain-adaptation methods for phrase-based machine translation systems. The survey begins by outlining the sources of error that domain change introduces in various components of phrase-based models, including lexical selection, reordering and optimization. It then outlines the different lines of research on domain adaptation in the literature and surveys the existing work within them, discussing how the approaches differ and how they relate to each other.
Notes
See Ozdowska and Way (2009) for a clear demonstration that building MT systems with more EuroParl data does not always lead to better translation results.
As a side note, the size of the N-best list does not seem to have a significant impact on adaptation [cf. Bertoldi and Federico (2009)].
In principle, search errors caused by a decoding algorithm can be a factor. The contribution of this factor to degradation of lexical translation quality, however, is minor, as shown in Irvine et al. (2013a).
In Su et al. (2012), an interpolation model is computed for \(P_{IN}(z_{{\tilde{f}}_{IN}}|\ \tilde{f})\), which is decomposed into the topic posterior distribution at word level for smoothing.
Joint inference of topic models on a concatenation of \(\mathcal {S}_{IN}\) and \(\mathcal {S}_{OUT}\) would drop the requirement of computing the topic-mapping probability distribution [cf. Gong et al. (2011) and Hewavitharana et al. (2013)]. An empirical comparison of the approaches, however, has yet to be thoroughly conducted, to the best of our knowledge.
There has not been any attempt at such an implementation for combining multiple sub-models, as far as we are aware.
In practice, the EM algorithm is often more efficient and more stable [cf. Razmara et al. (2012)].
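As a rough illustration of the EM procedure mentioned in the last note, the sketch below estimates linear-interpolation weights for a mixture of sub-models on held-out data. It assumes that the note concerns mixture-weight estimation. The function name `em_mixture_weights` and the input layout are hypothetical and are not taken from Razmara et al. (2012).

```python
def em_mixture_weights(component_probs, iterations=50):
    """Estimate linear-mixture weights with EM (illustrative sketch).

    component_probs[i][k] is the probability that sub-model k assigns to
    held-out item i (for example, a phrase pair). This layout is an
    assumption made for the example.
    """
    n_models = len(component_probs[0])
    # Start from uniform weights.
    weights = [1.0 / n_models] * n_models

    for _ in range(iterations):
        expected = [0.0] * n_models
        for probs in component_probs:
            # Mixture probability of this item under the current weights.
            mix = sum(w * p for w, p in zip(weights, probs))
            if mix == 0.0:
                continue  # no sub-model explains this item; skip it
            # E-step: posterior responsibility of each sub-model.
            for k in range(n_models):
                expected[k] += weights[k] * probs[k] / mix
        total = sum(expected)
        if total == 0.0:
            break
        # M-step: renormalise the expected counts into new weights.
        weights = [e / total for e in expected]
    return weights


if __name__ == "__main__":
    # Toy held-out data: sub-model 0 fits every item better than sub-model 1.
    held_out = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1]]
    print(em_mixture_weights(held_out))
```

Each iteration requires only one pass over the held-out items and keeps the weights on the probability simplex by construction, which is one plausible reason for the stability noted above.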
References
Axelrod A, He X, Gao J (2011) Domain adaptation via pseudo in-domain data selection. In: EMNLP ’11: proceedings of the conference on empirical methods in natural language processing, Edinburgh, UK, pp 355–362
Aziz W, Dymetman M, Specia L (2014) Exact decoding for phrase-based statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 1237–1249
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the international conference on learning representations, San Diego, CA
Bertoldi N, Federico M (2009) Domain adaptation for statistical machine translation with monolingual resources. In: Proceedings of the fourth workshop on statistical machine translation, Athens, Greece, pp 182–189
Besling S, Meier HG (1995) Language model speaker adaptation. In: Fourth European conference on speech communication and technology (EUROSPEECH ’95), Madrid, Spain, pp 1755–1758
Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhya Indian J Stat 7(4):401–406
Birch A, Osborne M, Koehn P (2007) CCG supertags in factored statistical machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 9–16
Bisazza A, Ruiz N, Federico M (2011) Fill-up versus interpolation methods for phrase-based SMT adaptation. In: 2011 international workshop on spoken language translation, IWSLT, San Francisco, CA, USA, pp 136–143
Biçici E, Yuret D (2011) Instance selection for machine translation using feature decay algorithms. In: WMT 2011: Proceedings of the 6th workshop on statistical machine translation, Edinburgh, Scotland, UK, pp 272–283
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Blunsom P, Osborne M (2008) Probabilistic inference for machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 215–223
Bod R, Scha R, Sima’an K (2003) Data-oriented parsing. Center for the Study of Language and Information—Lecture Notes, Amsterdam, The Netherlands
Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Lafferty JD, Mercer RL, Roossin PS (1990) A statistical approach to machine translation. Comput Linguist 16(2):79–85
Brown PF, Della Pietra VJ, Della Pietra SA, Mercer RL (1993) The mathematics of statistical machine translation: Parameter estimation. Comput Linguist 19(2):263–311
Carl M, Way A (eds) (2003) Recent advances in example-based machine translation. Kluwer Academic Publishers, Dordrecht
Carpuat M, Goutte C, Foster G (2014) Linear mixture models for robust machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, Maryland, USA, pp 499–509
Chang YW, Collins M (2011) Exact decoding of phrase-based translation models through lagrangian relaxation. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 26–37
Chang YW, Rush AM, DeNero J, Collins M (2014) A constrained viterbi relaxation for bidirectional word alignment. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1481–1490
Chen B, Foster G, Kuhn R (2013a) Adaptation of reordering models for statistical machine translation. In: 2013 conference of the North American Chapter of the Association for computational linguistics: human language technologies. Atlanta, Georgia, pp 938–946
Chen B, Kuhn R, Foster GF (2013b) Vector space model for adaptation in statistical machine translation. In: Proceedings of the 51st annual meeting of the association for computational linguistics, volume 1: long papers, Sofia, Bulgaria, pp 1285–1293
Chen B, Kuhn R, Foster G, Cherry C, Huang F (2016) Bilingual methods for adaptive training data selection for machine translation. In: Conference of the association for machine translation in the Americas, the twelfth conference of the association for machine translation in the Americas, Austin, Texas, pp 93–106
Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies, Montreal, Canada, pp 427–436
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 263–270
Chiang D (2007) Hierarchical phrase-based translation. Comput Linguist 33(2):202–228
Chiang D, Marton Y, Resnik P (2008) Online large-margin training of syntactic and structural translation features. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 224–233
Chiang D, Knight K, Wang W (2009) 11,001 new features for statistical machine translation. In: Proceedings of Human Language Technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics, Boulder, Colorado, pp 218–226
Chiang D, DeNeefe S, Pust M (2011) Two easy improvements to lexical weighting. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, Portland, Oregon, vol 2, pp 455–460
Clark J, Dyer C, Lavie A (2014) Locally non-linear learning for statistical machine translation via discretization and structured regularization. Trans Assoc Comput Linguist 2:393–404
Clarkson P, Robinson A (1997) Language model adaptation using mixtures and an exponentially decaying cache. In: IEEE international conference on acoustics, speech, and signal processing, ICASSP-97. Munich, Germany, pp 799–802
Cui L, Chen X, Zhang D, Liu S, Li M, Zhou M (2013) Multi-domain adaptation for SMT using multi-task learning. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, USA, pp 1055–1065
Cuong H, Sima’an K (2014a) Latent domain phrase-based models for adaptation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 566–576
Cuong H, Sima’an K (2014b) Latent domain translation models in mix-of-domains haystack. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland, pp 1928–1939
Cuong H, Sima’an K (2015) Latent domain word alignment for heterogeneous corpora. In: Proceedings of the 2015 conference of the north american chapter of the association for computational linguistics: human language technologies, Denver, Colorado, USA, pp 398–408
Cuong H, Frank S, Sima’an K (2016a) ILLC-UvA adaptation system (scorpio) at WMT’16 IT-DOMAIN Task. In: Proceedings of the first conference on machine translation, shared task papers, Berlin, Germany, vol 2, pp 423–427
Cuong H, Sima’an K, Titov I (2016b) Adapting to all domains at once: rewarding domain invariance in SMT. Trans Assoc Comput Linguist 4:99–112
Daumé H III, Jagarlamudi J (2011) Domain adaptation for machine translation by mining unseen words. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, Portland, Oregon, vol 2, pp 407–412
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Devlin J, Zbib R, Huang Z, Lamar T, Schwartz R, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1370–1380
Dong M, Cheng Y, Liu Y, Xu J, Sun M, Izuha T, Hao J (2014) Query lattice for translation retrieval. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, Dublin, Ireland, pp 2031–2041
Duh K, Sudoh K, Tsukada H (2010) Analysis of translation model adaptation in statistical machine translation. In: 2010 international workshop on spoken language translation, IWSLT 2010, Paris, France, pp 243–250
Duh K, Neubig G, Sudoh K, Tsukada H (2013) Adaptation data selection using neural language models: experiments in machine translation. In: 51st annual meeting of the association for computational linguistics (short papers). Sofia, Bulgaria, vol 2, pp 678–683
Durrani N, Sajjad H, Joty S, Abdelali A, Vogel S (2015) Using joint models for domain adaptation in statistical machine translation. In: Proceedings of the MT summit XV, MT researchers’ track, Miami, Florida, USA, vol. 1, pp 117–130
Eck M, Vogel S, Waibel A (2005) Low cost portability for statistical machine translation based on n-gram coverage. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 227–234
Eidelman V, Boyd-Graber J, Resnik P (2012) Topic models for dynamic translation model adaptation. In: Proceedings of the 50th annual meeting of the association for computational linguistics: short papers, Jeju Island, Korea, vol 2, pp 115–119
Federico M, Cettolo M, Bentivogli L, Paul M, Stüker S (2012) Overview of the IWSLT 2012 evaluation campaign. In: 2012 international workshop on spoken language translation, Hong Kong, pp 12–33
Foster G, Kuhn R (2007) Mixture-model adaptation for SMT. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 128–135
Foster G, Goutte C, Kuhn R (2010) Discriminative instance weighting for domain adaptation in statistical machine translation. In: 2010 conference on empirical methods in natural language processing, Cambridge, Massachusetts, pp 451–459
Foster G, Chen B, Kuhn R (2013) Simulating discriminative training for linear mixture adaptation in statistical machine translation. In: Proceedings of the XIV machine translation summit, Nice, France, pp 183–190
Galley M, Manning CD (2008) A simple and effective hierarchical phrase reordering model. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 848–856
Gao Q, Lewis W, Quirk C, Hwang MY (2011) Incremental training and intentional over-fitting of word alignment. In: Proceedings of the 13th machine translation summit (MT summit XIII), Xiamen, China, pp 106–113
Gong Z, Zhang M, Zhou G (2011) Cache-based document-level statistical machine translation. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 909–919
Goodman JT (1998) Parsing inside-out. PhD thesis, Harvard University, Cambridge, MA
Green S, Wang S, Cer D, Manning CD (2013) Fast and adaptive online training of feature-rich translation models. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers), Sofia, Bulgaria, pp 311–321
Green S, Cer DM, Manning CD (2014) An empirical comparison of features and tuning for phrase-based machine translation. In: Proceedings of the ninth workshop on statistical machine translation, WMT@ACL 2014, Baltimore, Maryland, USA, pp 466–476
Gruber A, Weiss Y, Rosen-Zvi M (2007) Hidden topic markov models. In: Proceedings of the eleventh international conference on artificial intelligence and statistics, San Juan, Puerto Rico, pp 163–170
Haddow B (2013) Applying pairwise ranked optimisation to improve the interpolation of translation models. In: Proceedings of the human language technologies: conference of the North American chapter of the association of computational linguistics, Atlanta, Georgia, USA, pp 342–347
Haghighi A, Liang P, Berg-Kirkpatrick T, Klein D (2008) Learning bilingual lexicons from monolingual corpora. In: Proceedings of ACL-08: HLT, Columbus, Ohio, pp 771–779
Hasler E, Haddow B, Koehn P (2012) Sparse lexicalised features and topic adaptation for SMT. In: 2012 international workshop on spoken language translation. IWSLT, Hong Kong, pp 268–275
Hasler E, Blunsom P, Koehn P, Haddow B (2014) Dynamic topic adaptation for phrase-based MT. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Gothenburg, Sweden, pp 328–337
Hewavitharana S, Mehay D, Ananthakrishnan S, Natarajan P (2013) Incremental topic-based translation model adaptation for conversational spoken language translation. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers), Sofia, Bulgaria, pp 697–701
Hieber F, Riezler S (2015) Bag-of-words forced decoding for cross-lingual information retrieval. In: NAACL HLT 2015, the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies. Denver, Colorado, USA, pp 1172–1182
Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 289–296
Hopkins M, May J (2011) Tuning as ranking. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 1352–1362
Hu Y, Zhai K, Eidelman V, Boyd-Graber J (2014) Polylingual tree-based topic models for translation domain adaptation. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), Baltimore, Maryland, pp 1166–1176
Irvine A, Morgan J, Carpuat M, Daumé H III, Munteanu D (2013a) Measuring machine translation errors in new domains. Trans Assoc Comput Linguist 1:429–440
Irvine A, Quirk C, Daumé H III (2013b) Monolingual marginal matching for translation model adaptation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, USA, pp 1077–1088
Jeblee S, Feely W, Bouamor H, Lavie A, Habash N, Oflazer K (2014) Domain and dialect adaptation for machine translation into egyptian arabic. In: Proceedings of the EMNLP 2014 workshop on arabic natural language processing (ANLP), Doha, Qatar, pp 196–206
Joty S, Sajjad H, Durrani N, Al-Mannai K, Abdelali A, Vogel S (2015) How to avoid unwanted pregnancies: Domain adaptation using neural network models. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1259–1270
Kettunen K (2009) Choosing the Best MT Programs for CLIR purposes—can MT metrics be helpful? In: Proceedings of the 31st European conference on information retrieval research: advances in information retrieval, Springer International Publishing, Heidelberg/Berlin, Germany. Lecture Notes in Computer Science, vol 5478, pp 706–712
Kirchhoff K, Bilmes J (2014) Submodularity for data selection in machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 131–141
Koehn P (2004) Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In: Proceedings of the machine translation: from real users to research: 6th conference of the association for machine translation in the Americas, Springer, Berlin/Heidelberg, Germany, pp 115–124
Koehn P (2005) Europarl: A parallel corpus for statistical machine translation. In: MT Summit X, conference proceedings: the tenth machine translation summit, Phuket, Thailand, pp 79–86
Koehn P (2010) Statistical machine translation. Cambridge University Press, New York, NY, USA
Koehn P, Knight K (2002) Learning a translation lexicon from monolingual corpora. In: Proceedings of the ACL-02 workshop on unsupervised lexical acquisition, Philadelphia, Pennsylvania, pp 9–16
Koehn P, Schroeder J (2007) Experiments in domain adaptation for statistical machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 224–227
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1, Edmonton, Canada, pp 48–54
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, Prague, Czech Republic, pp 177–180
Kuhn R, De Mori R (1992) Corrections to “a cache-based language model for speech recognition”. IEEE Trans Pattern Anal Mach Intell 14(6):691–692
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics: HLT-NAACL 2004, Boston, Massachusetts, USA, pp 169–176
Lambert P, Schwenk H, Servan C, Abdul-Rauf S (2011) Investigations on translation model adaptation using monolingual data. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, Scotland, pp 284–293
Lewis W, Eetemadi S (2013) Dramatically reducing training data size through vocabulary saturation. In: Proceedings of the eighth workshop on statistical machine translation, Sofia, Bulgaria, pp 281–291
Liu C, Liu Y, Sun M, Luan H, Yu H (2015) Generalized agreement for bidirectional word alignment. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1828–1836
Liu L, Watanabe T, Sumita E, Zhao T (2013) Additive neural networks for statistical machine translation. In: 51st annual meeting of the association for computational linguistics (long papers), Sofia, Bulgaria, vol 1, pp 791–801
Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49
Louis A, Webber B (2014) Structured and unstructured cache models for SMT domain adaptation. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Gothenburg, Sweden, pp 155–163
Macherey W, Och FJ, Thayer I, Uszkoreit J (2008) Lattice-based minimum error rate training for statistical machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 725–734
Mansour S, Ney H (2014) Unsupervised adaptation for statistical machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, Maryland, USA, pp 457–465
Mansour S, Wuebker J, Ney H (2011) Combining translation and language model scoring for domain-specific data filtering. In: International workshop on spoken language translation, San Francisco, CA, USA, pp 222–229
Marton Y, Resnik P (2008) Soft syntactic constraints for hierarchical phrased-based translation. In: Proceedings of ACL-08: HLT, Columbus, Ohio, pp 1003–1011
Matsoukas S, Rosti AVI, Zhang B (2009) Discriminative corpus weight estimation for machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, vol 2, pp 708–717
Mimno D, Wallach HM, Naradowsky J, Smith DA, McCallum A (2009) Polylingual topic models. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Singapore, vol 2, pp 880–889
Moore RC, Lewis W (2010) Intelligent selection of language model training data. In: Proceedings of the ACL 2010 conference short papers, Uppsala, Sweden, pp 220–224
Nagao M (1984) A framework of a mechanical translation between japanese and english by analogy principle. In: Elithorn A, Banerji R (eds) Artif Hum Intell. North-Holland, Amsterdam, pp 173–180
Nakov P (2008) Improving english-spanish statistical machine translation: Experiments in domain adaptation, sentence paraphrasing, tokenization, and recasing. In: Proceedings of the third workshop on statistical machine translation, Columbus, Ohio, pp 147–150
Neubig G, Watanabe T (2016) Optimization for statistical machine translation: a survey. Comput Linguist 42(1):1–54
Nikoulina V, Kovachev B, Lagos N, Monz C (2012) Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context. In: Proceedings of the 13th conference of the European chapter of the association for computational linguistics, Avignon, France, pp 109–119
Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting on association for computational linguistics, Sapporo, Japan, vol 1, pp 160–167
Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 295–302
Ozdowska S, Way A (2009) Optimal bilingual data for French-English PB-SMT. In: Proceedings of the 13th annual meeting of the European association for machine translation, Barcelona, Spain, pp 96–103
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 311–318
Pauls A, DeNero J, Klein D (2009) Consensus training for consensus decoding in machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, vol 3, Singapore, pp 1418–1427
Pecina P, Toral A, van Genabith J (2012) Simple and effective parameter tuning for domain adaptation of statistical machine translation. In: Proceedings of the 24th international conference on computational linguistics, Mumbai, India, pp 2209–2224
Poncelas A, Maillette de Buy Wenniger G, Way A (2017) Applying n-gram alignment entropy to improve feature decay algorithms. Prague Bull Math Linguist 108:245–256
Quirk C, Menezes A (2006) Dependency treelet translation: the convergence of statistical and example-based machine-translation? Mach Transl 20(1):43–65
Quirk C, Menezes A, Cherry C (2005) Dependency treelet translation: syntactically informed phrasal SMT. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 271–279
Razmara M, Foster G, Sankaran B, Sarkar A (2012) Mixing multiple translation models in statistical machine translation. In: Proceedings of the 50th annual meeting of the association for computational linguistics, long papers, Jeju Island, Korea, vol 1, pp 940–949
Schwenk H (2008) Investigations on large-scale lightly-supervised training for statistical machine translation. In: 2008 international workshop on spoken language translation, Honolulu, Hawaii, USA, pp 182–189
Schwenk H, Senellart J (2009) Translation model adaptation for an Arabic/French news translation system by lightly-supervised training. In: MT Summit XII: proceedings of the twelfth machine translation summit, Ottawa, Ontario, Canada, pp 308–315
Sennrich R (2012) Perplexity minimization for translation model domain adaptation in statistical machine translation. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp 539–549
Shah K, Barrault L, Schwenk H (2010) Translation model adaptation by resampling. In: Proceedings of the joint fifth workshop on statistical machine translation and MetricsMATR, Uppsala, Sweden, pp 392–399
Shah K, Barrault L, Schwenk H (2012) A general framework to weight heterogeneous parallel data for model adaptation in statistical machine translation. In: Proceedings of the AMTA-2012: the tenth biennial conference of the association for machine translation in the Americas, San Diego, CA, 10pp
Shen S, Liu Y, Sun M, Luan H (2015) Consistency-aware search for word alignment. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1228–1237
Sima’an K (2003) On maximizing metrics for syntactic disambiguation. In: Proceedings of the 8th international workshop on parsing technologies (IWPT), Nancy, France, pp 183–194
Simianer P, Riezler S, Dyer C (2012) Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT. In: Proceedings of the 50th annual meeting of the association for computational linguistics: long papers, vol 1, Jeju Island, Korea, pp 11–21
Simion A, Collins M, Stein C (2013) A convex alternative to IBM model 2. In: Proceedings of the 2013 conference on empirical methods in natural language processing, EMNLP 2013, Seattle, Washington, USA, pp 1574–1583
Smith DA, Eisner J (2006) Minimum risk annealing for training log-linear models. In: Proceedings of the COLING/ACL 2006 main conference poster sessions, Sydney, Australia, pp 787–794
Snover M, Dorr B, Schwartz R (2008) Language and translation model adaptation using comparable corpora. In: Proceedings of the conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 857–866
Su J, Wu H, Wang H, Chen Y, Shi X, Dong H, Liu Q (2012) Translation model adaptation for statistical machine translation with monolingual topic information. In: Proceedings of the 50th annual meeting of the association for computational linguistics (long papers), Jeju Island, Korea, vol 1, pp 459–468
Tamura A, Watanabe T, Sumita E (2012) Bilingual lexicon extraction from comparable corpora using label propagation. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, Jeju Island, Korea, pp 24–36
Tamura A, Watanabe T, Sumita E (2014) Recurrent neural networks for word alignment model. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (long papers), Baltimore, Maryland, vol 1, pp 1470–1480
Tang J, Meng Z, Nguyen X, Mei Q, Zhang M (2014) Understanding the limiting factors of topic modeling via posterior contraction analysis. In: Proceedings of the 31st international conference on machine learning (ICML-14), Beijing, China, pp 190–198
Tiedemann J (2010) Context adaptation in statistical machine translation using models with exponentially decaying cache. In: Proceedings of the 2010 workshop on domain adaptation for natural language processing, Uppsala, Sweden, pp 8–15
Tillmann C (2004) A unigram orientation model for statistical machine translation. In: Proceedings of HLT-NAACL 2004: short papers, Boston, Massachusetts, pp 101–104
Tsuruoka Y, Tsujii J, Ananiadou S (2009) Stochastic gradient descent training for l1-regularized log-linear models with cumulative penalty. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, Singapore, vol 1, pp 477–485
Vogel S, Ney H, Tillmann C (1996) HMM-based word alignment in statistical translation. In: Proceedings of COLING 1996: the 16th international conference on computational linguistics, Copenhagen, Denmark, pp 836–841
Waite A, Byrne B (2015) The geometry of statistical machine translation. In: Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies, Denver, Colorado, pp 376–386
Wang W, Macherey K, Macherey W, Och F, Xu P (2012) Improved domain adaptation for statistical machine translation. In: Proceedings of the AMTA-2012: the tenth biennial conference of the association for machine translation in the Americas, San Diego, CA
Wang X, Utiyama M, Finch A, Watanabe T, Sumita E (2015) Leave-one-out word alignment without garbage collector effects. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, pp 1817–1827
Wäschle K, Riezler S (2012) Structural and topical dimensions in multi-task patent translation. In: Proceedings of the 13th conference of the European chapter of the association for computational linguistics, Avignon, France, pp 818–828
Watanabe T, Suzuki J, Tsukada H, Isozaki H (2007) Online large-margin training for statistical machine translation. In: EMNLP-CoNLL 2007, Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, Prague, Czech Republic, pp 764–773
Van Der Wees M, Bisazza A, Weerkamp W, Monz C (2015) What’s in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the asian federation of natural language processing, Short Papers, Beijing, China, vol 2, pp 560–566
Wu H, Wang H, Liu Z (2005) Alignment model adaptation for domain-specific word alignment. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 467–474
Wu H, Wang H, Zong C (2008) Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora. In: Proceedings of the 22nd international conference on computational linguistics, Manchester, United Kingdom, vol 1, pp 993–1000
Yamada K, Knight K (2001) A syntax-based statistical translation model. In: Proceedings of the 39th annual meeting on association for computational linguistics, Toulouse, France, pp 523–530
Yu H, Huang L, Mi H, Zhao K (2013) Max-violation perceptron and forced decoding for scalable MT training. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, Washington, USA, pp 1112–1123
Zhang B, Su J, Xiong D, Duan H, Yao J (2015) Discriminative reordering model adaptation via structural learning. In: Proceedings of the 24th international conference on artificial intelligence, Buenos Aires, Argentina, pp 1040–1046
Zhang H, Chiang D (2014) Kneser-Ney smoothing on expected counts. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (long papers), Baltimore, Maryland, vol 1, pp 765–774
Zhang J, Li L, Way A, Liu Q (2014a) A probabilistic feature-based fill-up for SMT. In: Proceedings of the 11th conference of the association for machine translation in the Americas, MT Researchers Track, Vancouver, Canada, vol 1, pp 96–109
Zhang M, Xiao X, Xiong D, Liu Q (2014b) Topic-based dissimilarity and sensitivity models for translation rule selection. J Artif Intell Res 50(1):1–30
Acknowledgements
We thank the editor, the anonymous reviewers and Ivan Titov for their input. The work was performed at ILLC, University of Amsterdam. The authors are supported by the EU FP7 Marie Curie ITN Project (nr. 317471) and the QT21 Project (H2020 nr. 645452).
Funding
Funding was provided by VICI (Grant No. 277-89-002).
Cite this article
Cuong, H., Sima’an, K. A survey of domain adaptation for statistical machine translation. Machine Translation 31, 187–224 (2017). https://doi.org/10.1007/s10590-018-9216-8
DOI: https://doi.org/10.1007/s10590-018-9216-8