Abstract
In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.
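The abstract's key mechanism is that when a new, larger BPE unit is introduced mid-training, its embedding is built from the embeddings of the smaller units it merges. The sketch below illustrates that idea only; the function name, the mean-combination rule, and the vocabulary layout are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def init_merged_embedding(embeddings, vocab, left, right):
    """Extend the embedding matrix with a row for the merged unit
    `left + right`, initialized by combining (here: averaging) the
    embeddings of its constituent subwords."""
    new_vec = (embeddings[vocab[left]] + embeddings[vocab[right]]) / 2.0
    vocab[left + right] = len(vocab)  # assign the next free index
    return np.vstack([embeddings, new_vec[None, :]])

rng = np.random.default_rng(0)
vocab = {"un": 0, "happy": 1}
emb = rng.normal(size=(2, 4)).astype(np.float32)

# Online vocabulary growth step: merge "un" + "happy" -> "unhappy".
emb = init_merged_embedding(emb, vocab, "un", "happy")
```

In a full training loop, such a step would be triggered by the held-out validation loss, growing the vocabulary from small, general subwords toward larger, more specific units.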
Cite this article
Salesky, E., Runge, A., Coda, A. et al. Optimizing segmentation granularity for neural machine translation. Machine Translation 34, 41–59 (2020). https://doi.org/10.1007/s10590-019-09243-8