Abstract
In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.
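The abstract's key mechanism is that when a new, larger BPE unit is introduced mid-training, its embedding is built from the embeddings of the smaller units it merges. The sketch below illustrates that idea only; the function name, the mean-combination rule, and the vocabulary layout are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def init_merged_embedding(embeddings, vocab, left, right):
    """Extend the embedding matrix with a row for the merged unit
    `left + right`, initialized by combining (here: averaging) the
    embeddings of its constituent subwords."""
    new_vec = (embeddings[vocab[left]] + embeddings[vocab[right]]) / 2.0
    vocab[left + right] = len(vocab)  # assign the next free index
    return np.vstack([embeddings, new_vec[None, :]])

rng = np.random.default_rng(0)
vocab = {"un": 0, "happy": 1}
emb = rng.normal(size=(2, 4)).astype(np.float32)

# Online vocabulary growth step: merge "un" + "happy" -> "unhappy".
emb = init_merged_embedding(emb, vocab, "un", "happy")
```

In a full training loop, such a step would be triggered by the held-out validation loss, growing the vocabulary from small, general subwords toward larger, more specific units.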
Cite this article
Salesky, E., Runge, A., Coda, A. et al. Optimizing segmentation granularity for neural machine translation. Machine Translation 34, 41–59 (2020). https://doi.org/10.1007/s10590-019-09243-8