Computer Science > Computation and Language

arXiv:2208.00463 (cs)

[Submitted on 31 Jul 2022 (v1), last revised 3 Mar 2024 (this version, v3)]

Title: Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages

Authors: Fatemeh Azadi, Heshaam Faili, Mohammad Javad Dousti

Abstract: Translation Quality Estimation (QE) is the task of predicting the quality of machine translation (MT) output without any reference translation. The task has gained increasing attention as an important component of practical MT applications. In this paper, we first propose XLMRScore, a cross-lingual counterpart of BERTScore computed with the XLM-RoBERTa (XLMR) model. This metric can serve as a simple unsupervised QE method, but it faces two issues: first, untranslated tokens lead to unexpectedly high translation scores; second, greedy matching in XLMRScore produces mismatching errors between source and hypothesis tokens. To mitigate these issues, we suggest, respectively, replacing untranslated words with the unknown token and cross-lingually aligning the pre-trained model so that aligned words are represented closer to each other. We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task, as well as on a new English→Persian (En-Fa) test dataset introduced in this paper. Experiments show that our method achieves results comparable to the supervised baseline in two zero-shot scenarios, i.e., with less than 0.01 difference in Pearson correlation, while outperforming unsupervised rivals on all the low-resource language pairs by more than 8% on average.
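
The two ideas named in the abstract, BERTScore-style greedy matching between source and hypothesis tokens and the replacement of untranslated words with the unknown token, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for XLM-RoBERTa contextual embeddings, the function names `greedy_match_f1` and `mask_untranslated` are hypothetical, and the rule "a hypothesis token identical to a source token counts as untranslated" is an assumption made for illustration (it would also mask legitimately copied items such as numbers or names).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_match_f1(src_vecs, hyp_vecs):
    """BERTScore-style F1 with the source sentence in place of a reference.

    Each token is greedily matched to its most similar token on the other
    side; the inputs are assumed to be non-empty lists of embedding vectors.
    """
    # Recall: how well each source token is covered by the hypothesis.
    recall = sum(max(cosine(s, h) for h in hyp_vecs) for s in src_vecs) / len(src_vecs)
    # Precision: how well each hypothesis token is supported by the source.
    precision = sum(max(cosine(h, s) for s in src_vecs) for h in hyp_vecs) / len(hyp_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mask_untranslated(src_tokens, hyp_tokens, unk="<unk>"):
    """Replace hypothesis tokens copied verbatim from the source with `unk`.

    Assumption for illustration: exact string identity marks a token as
    untranslated.
    """
    src_set = set(src_tokens)
    return [unk if tok in src_set else tok for tok in hyp_tokens]


# Toy usage: two-dimensional vectors instead of real model embeddings.
src = [[1.0, 0.0], [0.0, 1.0]]
hyp = [[0.0, 1.0], [1.0, 0.0]]
score = greedy_match_f1(src, hyp)
masked = mask_untranslated(["the", "cat"], ["le", "cat"])
```

In the toy usage, every token has an exact counterpart on the other side, so the greedy match is perfect. The masking step shows why the first issue arises: a copied token would otherwise match its own source embedding almost perfectly and inflate the score.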

Comments:	Submitted to Language Resources and Evaluation
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2208.00463 [cs.CL]
	(or arXiv:2208.00463v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2208.00463

Submission history

From: Fatemeh Azadi
[v1] Sun, 31 Jul 2022 16:23:23 UTC (1,372 KB)
[v2] Sat, 12 Aug 2023 10:59:56 UTC (341 KB)
[v3] Sun, 3 Mar 2024 11:27:26 UTC (343 KB)
