
Optimizing segmentation granularity for neural machine translation

Machine Translation

Abstract

In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.
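
The mechanism the abstract describes can be made concrete with a short sketch. The PyTorch code below is a minimal illustration, not the authors' implementation: it shows (1) an embedding table that grows when new BPE merges are admitted, initializing each merged unit's embedding from those of its constituents, and (2) a plateau rule that admits the next block of merges when held-out validation loss stops improving. The names GrowingEmbedding and maybe_grow, the averaging initializer, and the patience/block-size schedule are all assumptions of this sketch.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GrowingEmbedding(nn.Module):
        """Embedding table that can grow as new BPE merges are admitted online.

        Each newly merged unit's row is built from its two constituents (here,
        their average), so the larger unit starts from what its pieces learned.
        """

        def __init__(self, init_vocab, dim):
            super().__init__()
            self.tok2id = {tok: i for i, tok in enumerate(init_vocab)}
            self.weight = nn.Parameter(0.02 * torch.randn(len(init_vocab), dim))

        def forward(self, ids):
            return F.embedding(ids, self.weight)

        def admit_merge(self, left, right):
            # Add the merged unit left+right, initialized from its parts.
            new_tok = left + right
            if new_tok in self.tok2id:
                return
            with torch.no_grad():
                row = 0.5 * (self.weight[self.tok2id[left]]
                             + self.weight[self.tok2id[right]])
            # Swapping in a larger Parameter invalidates optimizer state for
            # this tensor; a full system would extend the optimizer state too.
            self.weight = nn.Parameter(
                torch.cat([self.weight.data, row.unsqueeze(0)], dim=0))
            self.tok2id[new_tok] = self.weight.size(0) - 1

    def maybe_grow(emb, merges, state, dev_loss, patience=2, block=1000):
        """Admit the next block of BPE merges when held-out loss plateaus.

        state = (best_loss, stale_evals, next_merge_index). The patience and
        block-size policy here is illustrative, not the authors' schedule.
        """
        best, stale, nxt = state
        if dev_loss < best:
            return (dev_loss, 0, nxt)
        stale += 1
        if stale >= patience and nxt < len(merges):
            for a, b in merges[nxt:nxt + block]:
                emb.admit_merge(a, b)
            return (best, 0, nxt + block)  # reset patience after growing
        return (best, stale, nxt)

    # Toy usage: start from characters, grow when the (faked) dev loss stalls.
    emb = GrowingEmbedding(list("abcdefgh"), dim=16)
    merges = [("a", "b"), ("ab", "c")]  # ordered BPE merge list
    state = (float("inf"), 0, 0)
    for dev_loss in [3.0, 2.5, 2.6, 2.7]:  # stand-in for per-epoch validation
        state = maybe_grow(emb, merges, state, dev_loss, patience=2, block=1)
    print(len(emb.tok2id))  # 9: "ab" was admitted after the plateau

Initializing a merged unit from its pieces is what lets the larger, rarer units benefit from the training their smaller constituents have already received, which is the source of the rare-word gains the abstract reports.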

Author information

Correspondence to Elizabeth Salesky.

About this article

Cite this article

Salesky, E., Runge, A., Coda, A. et al. Optimizing segmentation granularity for neural machine translation. Machine Translation 34, 41–59 (2020). https://doi.org/10.1007/s10590-019-09243-8
