Abstract
In recent years, significant progress has been made in non-autoregressive machine translation. However, the accuracy of non-autoregressive models still lags behind that of their autoregressive counterparts. This gap can largely be attributed to the abundance of repetitive tokens in the target sequences generated by non-autoregressive models. In this study, we investigate this phenomenon and propose a novel approach that trains a non-autoregressive model with an unlikelihood loss. We evaluate our method on three widely used benchmark tasks. The experimental results demonstrate that our proposed approach significantly reduces the number of repetitive tokens while improving the overall performance of non-autoregressive machine translation. Compared to the baseline model "Mask-Predict", the average number of repetitions on the IWSLT14 DE→EN validation set is reduced from 0.48 to 0.17, a remarkable 62% decrease.
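To make the idea concrete, the following is a minimal, illustrative PyTorch sketch of combining cross-entropy with a token-level unlikelihood term (in the spirit of Welleck et al. 2019) for a non-autoregressive decoder. It is not the authors' released implementation: the function name, the weight alpha, and the choice of treating gold tokens at other positions of the same sentence as negative candidates are assumptions made here for illustration.

import torch
import torch.nn.functional as F

def nat_unlikelihood_loss(logits, targets, alpha=1.0, pad_id=0):
    # logits:  (batch, length, vocab) outputs of the NAT decoder
    # targets: (batch, length) reference token ids
    log_probs = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)

    # Negative candidates (an assumption of this sketch): for each position,
    # the gold tokens appearing at the other positions of the same sentence
    # are the tokens whose probability should be pushed down.
    B, T, V = log_probs.shape
    cand = torch.zeros(B, T, V, device=logits.device)
    cand.scatter_(2, targets.unsqueeze(1).expand(B, T, T), 1.0)
    cand.scatter_(2, targets.unsqueeze(-1), 0.0)  # never penalize the gold token itself
    cand[:, :, pad_id] = 0.0

    # Unlikelihood term: -log(1 - p(token)) for every negative candidate,
    # averaged over non-padding positions.
    probs = log_probs.exp()
    ul = -(torch.log((1.0 - probs).clamp(min=1e-6)) * cand).sum(-1)
    pos_mask = (targets != pad_id).float()
    ul = (ul * pos_mask).sum() / pos_mask.sum().clamp(min=1.0)

    return nll + alpha * ul

In such a setup, this loss would replace the plain cross-entropy objective of the conditional masked language model, with alpha trading off translation accuracy against repetition reduction.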
Data availability
Enquiries about data availability should be directed to the authors.
References
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Bahuleyan H, El Asri L (2020) Diverse keyphrase generation with neural unlikelihood training. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 5271–5287
Bao Y, Zhou H, Huang S, Wang D, Qian L, Dai X, Chen J, Li L (2022) Latent-glat: glancing at latent variables for parallel text generation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 8398–8409
Clark K, Luong M-T, Le QV, Manning CD (2019) Electra: pre-training text encoders as discriminators rather than generators. In: International Conference on Learning Representations
Geng X, Feng X, Qin B (2021) Learning to rewrite for non-autoregressive neural machine translation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 3297–3308
Ghazvininejad M, Levy O, Liu Y, Zettlemoyer L (2019) Mask-predict: parallel decoding of conditional masked language models. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 6114–6123
Gu J, Bradbury J, Xiong C, Li VOK, Socher R (2018) Non-autoregressive neural machine translation. In: International Conference on Learning Representations
Guo J, Tan X, He D, Qin T, Xu L, Liu T-Y (2019) Non-autoregressive neural machine translation with enhanced decoder input. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 3723–3730
Guo J, Xu L, Chen E (2020) Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 376–385
Huang F, Zhou H, Liu Y, Li H, Huang M (2022) Directed acyclic transformer for non-autoregressive machine translation. In: International Conference on Machine Learning, PMLR, pp 9410–9428
Huang C, Zhou H, Zaïane OR, Mou L, Li L (2022) Non-autoregressive translation with layer-wise prediction and deep supervision. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp 10776–10784
Kaiser L, Bengio S, Roy A, Vaswani A, Parmar N, Uszkoreit J, Shazeer N (2018) Fast decoding in sequence models using discrete latent variables. In: ICML
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Lee J, Mansimov E, Cho K (2018) Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp 1173–1182
Li Z, Lin Z, He D, Tian F, Qin T, Wang L, Liu T-Y (2019) Hint-based training for non-autoregressive machine translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 5712–5717
Ma X, Zhou C, Li X, Neubig G, Hovy E (2019) Flowseq: non-autoregressive conditional sequence generation with generative flow. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 4273–4283
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp 311–318
Qian L, Zhou H, Bao Y, Wang M, Qiu L, Zhang W, Yu Y, Li L (2021) Glancing transformer for non-autoregressive neural machine translation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp 1993–2003
Ren Y, Liu J, Tan X, Zhao S, Zhao Z, Liu T-Y (2020) A study of non-autoregressive model for sequence generation. arXiv preprint arXiv:2004.10454
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1715–1725
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Wang Y, Tian F, He D, Qin T, Zhai CX, Liu T-Y (2019) Non-autoregressive machine translation with auxiliary regularization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 5377–5384
Wang S, Shi S, Huang H (2021) Improving non-autoregressive machine translation with soft-masking. In: CCF International Conference on Natural Language Processing and Chinese Computing, Springer, pp 141–152
Wang S, Huang H, Shi S (2023) Incorporating history and future into non-autoregressive machine translation. Comput Speech Lang 77:101439
Welleck S, Kulikov I, Roller S, Dinan E, Cho K, Weston J (2019) Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Shuheng Wang. The first draft of the manuscript was written by Shuheng Wang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was not required for this type of study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, S., Shi, S. & Huang, H. Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training. Soft Comput 28, 4681–4688 (2024). https://doi.org/10.1007/s00500-023-09490-1