arXiv:2310.14262 (cs)

[Submitted on 22 Oct 2023]

Title: Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Abstract: Even with the latest developments in deep learning and large-scale language modeling, machine translation (MT) of low-resource languages remains a challenge. Neural MT systems can be trained in an unsupervised way without any translation resources, but their quality lags behind, especially in truly low-resource conditions. We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic sentence pairs back-translated from monolingual corpora. We experiment with different training schedules and reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only.

Comments:	MT Summit 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.14262 [cs.CL]
	(or arXiv:2310.14262v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.14262
Journal reference:	Ivana Kvapilíková, Ondřej Bojar (2023): Boosting Unsupervised Machine Translation with Pseudo-Parallel Data. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 135-147, AAMT, Kyoto, Japan

Submission history

From: Ivana Kvapilikova
[v1] Sun, 22 Oct 2023 10:57:12 UTC (533 KB)
