Abstract
In recent years, significant progress has been made in non-autoregressive machine translation. However, the accuracy of non-autoregressive models still lags behind that of their autoregressive counterparts. This gap can largely be attributed to the abundance of repetitive tokens in the target sequences generated by non-autoregressive models. In this study, we investigate this phenomenon and propose a novel approach that trains a non-autoregressive model with an unlikelihood loss. We evaluate our method on three widely used benchmark tasks. The experimental results demonstrate that our proposed approach significantly reduces the number of repetitive tokens while improving the overall performance of non-autoregressive machine translation. Compared to the baseline model "Mask-Predict", the average number of repetitions on the IWSLT14 DE→EN validation set is reduced from 0.48 to 0.17, a remarkable 62% decrease.
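To make the idea concrete, the following is a minimal, illustrative PyTorch sketch of combining cross-entropy with a token-level unlikelihood term (in the spirit of Welleck et al. 2019) for a non-autoregressive decoder. It is not the authors' released implementation: the function name, the weight alpha, and the choice of treating gold tokens at other positions of the same sentence as negative candidates are assumptions made here for illustration.

import torch
import torch.nn.functional as F

def nat_unlikelihood_loss(logits, targets, alpha=1.0, pad_id=0):
    # logits:  (batch, length, vocab) outputs of the NAT decoder
    # targets: (batch, length) reference token ids
    log_probs = F.log_softmax(logits, dim=-1)
    nll = F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)

    # Negative candidates (an assumption of this sketch): for each position,
    # the gold tokens appearing at the other positions of the same sentence
    # are the tokens whose probability should be pushed down.
    B, T, V = log_probs.shape
    cand = torch.zeros(B, T, V, device=logits.device)
    cand.scatter_(2, targets.unsqueeze(1).expand(B, T, T), 1.0)
    cand.scatter_(2, targets.unsqueeze(-1), 0.0)  # never penalize the gold token itself
    cand[:, :, pad_id] = 0.0

    # Unlikelihood term: -log(1 - p(token)) for every negative candidate,
    # averaged over non-padding positions.
    probs = log_probs.exp()
    ul = -(torch.log((1.0 - probs).clamp(min=1e-6)) * cand).sum(-1)
    pos_mask = (targets != pad_id).float()
    ul = (ul * pos_mask).sum() / pos_mask.sum().clamp(min=1.0)

    return nll + alpha * ul

In such a setup, this loss would replace the plain cross-entropy objective of the conditional masked language model, with alpha trading off translation accuracy against repetition reduction.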
Data availability
Enquiries about data availability should be directed to the authors.
References
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Bahuleyan H, El Asri L (2020) Diverse keyphrase generation with neural unlikelihood training. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 5271–5287
Bao Y, Zhou H, Huang S, Wang D, Qian L, Dai X, Chen J, Li L (2022) Latent-glat: glancing at latent variables for parallel text generation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 8398–8409
Clark K, Luong M-T, Le QV, Manning CD (2019) Electra: pre-training text encoders as discriminators rather than generators. In: International Conference on Learning Representations
Geng X, Feng X, Qin B (2021) Learning to rewrite for non-autoregressive neural machine translation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 3297–3308
Ghazvininejad M, Levy O, Liu Y, Zettlemoyer L (2019) Mask-predict: parallel decoding of conditional masked language models. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 6114–6123
Gu J, Bradbury J, Xiong C, Li VOK, Socher R (2018) Non-autoregressive neural machine translation. In: International Conference on Learning Representations
Guo J, Tan X, He D, Qin T, Xu L, Liu T-Y (2019) Non-autoregressive neural machine translation with enhanced decoder input. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 3723–3730
Guo J, Xu L, Chen E (2020) Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 376–385
Huang F, Zhou H, Liu Y, Li H, Huang M (2022) Directed acyclic transformer for non-autoregressive machine translation. In: International Conference on Machine Learning, PMLR, pp 9410–9428
Huang C, Zhou H, Zaïane OR, Mou L, Li L (2022) Non-autoregressive translation with layer-wise prediction and deep supervision. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp 10776–10784
Kaiser L, Bengio S, Roy A, Vaswani A, Parmar N, Uszkoreit J, Shazeer N (2018) Fast decoding in sequence models using discrete latent variables. In: ICML
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Lee J, Mansimov E, Cho K (2018) Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp 1173–1182
Li Z, Lin Z, He D, Tian F, Qin T, Wang L, Liu T-Y (2019) Hint-based training for non-autoregressive machine translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 5712–5717
Ma X, Zhou C, Li X, Neubig G, Hovy E (2019) Flowseq: non-autoregressive conditional sequence generation with generative flow. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 4273–4283
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp 311–318
Qian L, Zhou H, Bao Y, Wang M, Qiu L, Zhang W, Yu Y, Li L (2021) Glancing transformer for non-autoregressive neural machine translation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp 1993–2003
Ren Y, Liu J, Tan X, Zhao S, Zhao Z, Liu T-Y (2020) A study of non-autoregressive model for sequence generation. arXiv preprint arXiv:2004.10454
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1715–1725
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Wang Y, Tian F, He D, Qin T, Zhai CX, Liu T-Y (2019) Non-autoregressive machine translation with auxiliary regularization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 5377–5384
Wang S, Shi S, Huang H (2021) Improving non-autoregressive machine translation with soft-masking. In: CCF International Conference on Natural Language Processing and Chinese Computing, Springer, pp 141–152
Wang S, Huang H, Shi S (2023) Incorporating history and future into non-autoregressive machine translation. Comput Speech Lang 77:101439
Welleck S, Kulikov I, Roller S, Dinan E, Cho K, Weston J (2019) Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Shuheng Wang. The first draft of the manuscript was written by Shuheng Wang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was not required for this type of study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, S., Shi, S. & Huang, H. Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training. Soft Comput 28, 4681–4688 (2024). https://doi.org/10.1007/s00500-023-09490-1