KLU-COAT: Vol 35, No 4

In recent years, neural network-based machine translation (MT) approaches have steadily superseded the statistical MT (SMT) methods, and represent the current state-of-the-art in MT research. Neural MT (NMT) is a data-driven end-to-end learning ...

research-article

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Pages 475–502 https://doi.org/10.1007/s10590-021-09260-6

Abstract

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are ...

research-article

Improving bilingual word embeddings mapping with monolingual context information

Pages 503–518 https://doi.org/10.1007/s10590-021-09274-0

Abstract

Bilingual word embeddings (BWEs) play a very important role in many natural language processing (NLP) tasks, especially cross-lingual tasks such as machine translation (MT) and cross-language information retrieval. Most existing methods to train ...

research-article

Tag-less back-translation

Pages 519–549 https://doi.org/10.1007/s10590-021-09284-y

Abstract

An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of the back-translations of the target-side monolingual data. The standard back-translation method has ...

research-article

Jointly learning bilingual word embeddings and alignments

Pages 551–569 https://doi.org/10.1007/s10590-021-09283-z

Abstract

Learning bilingual word embeddings can be much easier if the parallel corpora are available with their words well aligned explicitly. However, in most cases, the parallel corpora only provide a set of pairs that are semantically equivalent to each ...

research-article

Dual contextual module for neural machine translation

Pages 571–593 https://doi.org/10.1007/s10590-021-09282-0

Abstract

Self-attention-based encoder-decoder frameworks have drawn increasing attention in recent years. The self-attention mechanism generates contextual representations by attending to all tokens in the sentence. Despite improvements in performance, ...

research-article

Enhanced encoder for non-autoregressive machine translation

Pages 595–609 https://doi.org/10.1007/s10590-021-09285-x

Abstract

Non-autoregressive machine translation aims to speed up the decoding procedure by discarding the autoregressive model and generating the target words independently. Because non-autoregressive machine translation fails to exploit target-side ...

research-article

Word reordering on multiple pivots for the Japanese and Indonesian language pair

Pages 611–636 https://doi.org/10.1007/s10590-021-09288-8

Abstract

We investigated multiple pivot approaches for the Japanese and Indonesian (Ja–Id) language pair in phrase-based statistical machine translation (SMT). We used four languages as pivots, viz., English, Malaysian, Filipino, and the Myanmar language. ...

research-article

Joint source–target encoding with pervasive attention

Pages 637–659 https://doi.org/10.1007/s10590-021-09289-7

Abstract

The pervasive attention model is a sequence-to-sequence model that addresses the issue of source–target interaction in encoder–decoder models by jointly encoding the two sequences with a two-dimensional convolutional neural network. We investigate ...

research-article

Augmenting training data with syntactic phrasal-segments in low-resource neural machine translation

Pages 661–685 https://doi.org/10.1007/s10590-021-09290-0

Abstract

Neural machine translation (NMT) has emerged as a preferred alternative to the previous mainstream statistical machine translation (SMT) approaches largely due to its ability to produce better translations. The NMT training is often characterized ...

research-article

Investigating the roles of sentiment in machine translation

Pages 687–709 https://doi.org/10.1007/s10590-021-09291-z

Abstract

Parallel corpora are central to translation studies and contrastive linguistics. However, training machine translation (MT) systems by barely using the semantic aspects of a parallel corpus leads to unsatisfactory results, as then the trained MT ...

research-article

Simple measures of bridging lexical divergence help unsupervised neural machine translation for low-resource languages

Pages 711–744 https://doi.org/10.1007/s10590-021-09292-y

Abstract

Unsupervised Neural Machine Translation (UNMT) approaches have gained widespread popularity in recent times. Though these approaches show impressive translation performance using only monolingual corpora of the languages involved, these approaches ...