A Mongolian-Chinese neural machine translation method based on a triangular framework
Technical field
The invention belongs to the field of machine translation methods, and in particular relates to a Mongolian-Chinese neural machine translation method based on a triangular framework.
Background Art
Machine translation uses a computer to automatically translate one language into another, and is one of the most powerful means of overcoming language barriers. In recent years, many large search enterprises and service providers such as Google and Baidu have carried out large-scale research on machine translation and made significant contributions. As a result, translation quality between major languages is already close to the level of human translation, and millions of people use online translation systems and mobile translation applications to communicate across language barriers. In the recent wave of deep learning, machine translation has become a top priority and an important component in promoting global communication.
As a data-driven method, the performance of neural machine translation depends heavily on the scale, quality and domain coverage of the parallel corpus. However, apart from resource-rich languages such as Chinese and English, most languages in the world lack large-scale, high-quality, wide-coverage parallel corpora, and Mongolian is a typical example. Therefore, how to make full use of existing data to alleviate the problem of resource scarcity has become an important research direction in neural machine translation.
At present, end-to-end neural machine translation has developed rapidly, its translation quality has improved significantly over traditional machine translation methods, and it has become the core technology of commercial online machine translation systems. However, translation for low-resource languages with scarce parallel corpora still lags considerably behind translation between major languages.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a Mongolian-Chinese neural machine translation method based on a triangular framework. Aimed mainly at the problem of limited parallel corpora for rare languages, and in particular the scarcity of Mongolian-Chinese parallel corpora, the method treats Mongolian (z) as an intermediate latent variable and introduces it into the translation between English (x) and Chinese (y), decomposing English-Chinese translation into two steps via Mongolian.
To achieve the above goals, the technical solution adopted by the present invention is as follows:
A Mongolian-Chinese neural machine translation method based on a triangular framework, characterized in that Mongolian is introduced as an intermediate latent variable into the translation between a major language x (such as English, French or Japanese) and Chinese, so that translation between the major language x and Chinese is decomposed into two steps via Mongolian. Under the objective of maximizing the translation likelihood between the major language x and Chinese, a unified bidirectional EM algorithm jointly optimizes the two Mongolian translation models, improving Mongolian-Chinese translation quality; translation between any pair of languages still uses an end-to-end encoder-decoder structure.
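As a minimal illustration of the two-step decomposition (the function and the toy stand-in models below are hypothetical, not the actual trained translation models):

```python
def translate_via_pivot(x, model_xz, model_zy):
    """Translate a source sentence x into the target language via the pivot
    language z: step 1 applies the x -> z model, step 2 the z -> y model."""
    z = model_xz(x)      # potential pivot (Mongolian) translation
    return model_zy(z)   # final target translation

# toy stand-ins so the sketch runs; a real system would plug in NMT models
toy_xz = lambda s: s.upper()
toy_zy = lambda s: s[::-1]
print(translate_via_pivot("abc", toy_xz, toy_zy))  # "CBA"
```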
Mongolian is denoted z and Chinese is denoted y. The unified bidirectional EM algorithm proceeds as follows:
The x → y direction
E-step: optimize θ_{z|x}:

θ_{z|x} = argmin_{θ_{z|x}} KL(p(z|x) || p(z|y))

where θ_{z|x} denotes the parameter values at which the accuracy of translating Mongolian z from the major language x reaches the set threshold; p(z|x) denotes the accuracy of translating Mongolian z from the major language x and is the true distribution; p(z|y) denotes the accuracy of translating Mongolian z from Chinese y and is the fitting distribution of p(z|x); KL(·) is the Kullback-Leibler divergence, and KL(p(z|x) || p(z|y)) denotes the information loss incurred when p(z|y) is used to fit the true distribution p(z|x).
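The KL term minimized in this E-step can be illustrated on small discrete distributions (the probability values below are made-up toy numbers, not real translation distributions):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# toy p(z|x): the "true" distribution over candidate pivot translations z
p_z_given_x = [0.7, 0.2, 0.1]
# toy p(z|y): the fitting distribution; the E-step drives this divergence down
p_z_given_y = [0.6, 0.3, 0.1]
print(kl_divergence(p_z_given_x, p_z_given_y))  # small positive number
print(kl_divergence(p_z_given_x, p_z_given_x))  # 0.0 when the fit is exact
```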
M-step: optimize θ_{y|z}:

θ_{y|z} = argmax_{θ_{y|z}} Σ_{(x,y)∈D} E_{z~p(z|x)}[log p(y|z)]

where θ_{y|z} denotes the parameter values at which the accuracy of translating Chinese y from Mongolian z reaches the set threshold; E_{z~p(z|x)} denotes the mathematical expectation of z when Mongolian z is translated from the major language x; p(y|z) denotes the accuracy of translating Chinese y from Mongolian z; and D denotes the entire training set.
The y → x direction
E-step: optimize θ_{z|y}:

θ_{z|y} = argmin_{θ_{z|y}} KL(p(z|y) || p(z|x))

where θ_{z|y} denotes the parameter values at which the accuracy of translating Mongolian z from Chinese y reaches the set threshold.
M-step: optimize θ_{x|z}:

θ_{x|z} = argmax_{θ_{x|z}} Σ_{(x,y)∈D} E_{z~p(z|y)}[log p(x|z)]

where θ_{x|z} denotes the parameter values at which the accuracy of translating the major language x from Mongolian z reaches the set threshold; E_{z~p(z|y)} denotes the mathematical expectation of z when Mongolian z is translated from Chinese y; and p(x|z) denotes the accuracy of translating the major language x from Mongolian z.
p(z|x) and p(y|z) are jointly trained with the help of p(z|y), and p(z|y) and p(x|z) are jointly trained with the help of p(z|x).
p(z|x), p(z|y), p(y|z) and p(x|z) are each trained on samples that they themselves generate.
The training of the x → y translation is decomposed into two stages with two translation models: the first model, x → z, generates a potential Mongolian translation z from an input sentence in the major language x, and the second model, z → y, generates the final Chinese translation y from that potential translation. Following the steps of the standard EM algorithm and Jensen's inequality, the lower bound of p(y|x) over the entire training set D is:

L(θ; D) = Σ_{(x,y)∈D} log p(y|x) = Σ_{(x,y)∈D} log Σ_z p(z|x) p(y|z) ≥ Σ_{(x,y)∈D} Σ_z Q(z) log [p(z|x) p(y|z) / Q(z)] = L(Q)

where L(Q) is the lower bound of L(θ; D); L(θ; D) is the likelihood function; θ collects the parameters of p(z|x) and p(y|z) at which the translation accuracy reaches the set threshold; p(y|x) denotes the accuracy of translating Chinese y from the major language x; and Q(z) is an arbitrary posterior distribution over z, with Q(z) = p(z|x).
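Written out in full, the Jensen step behind this bound (with Q(z) any distribution over z) is:

```latex
\log p(y\mid x)
  = \log \sum_{z} p(z\mid x)\, p(y\mid z)
  = \log \sum_{z} Q(z)\,\frac{p(z\mid x)\, p(y\mid z)}{Q(z)}
  \ge \sum_{z} Q(z) \log \frac{p(z\mid x)\, p(y\mid z)}{Q(z)}
```

The inequality is Jensen's inequality applied to the concave logarithm; it is tight when Q(z) is proportional to p(z|x) p(y|z), and summing over (x, y) ∈ D yields L(Q).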
The generated translations are weighted and scored with the IBM model, and translation probabilities are calculated from the given bilingual data, where the bilingual data refers to the low-resource pairs (x; z) or (y; z).
Pseudo samples generated by the model p(z|x) or p(z|y) and real bilingual samples are mixed in the same minibatch at a ratio of 1:1 to stabilize the training process.
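A minimal sketch of this 1:1 mixing (the function name and data layout are illustrative assumptions):

```python
import random

def build_minibatch(real_pairs, pseudo_pairs, batch_size, seed=0):
    """Fill half the minibatch with real bilingual pairs and half with
    model-generated pseudo pairs, then shuffle them together."""
    rng = random.Random(seed)
    half = batch_size // 2
    batch = rng.sample(real_pairs, half) + rng.sample(pseudo_pairs, half)
    rng.shuffle(batch)
    return batch

real = [("x%d" % i, "z%d" % i) for i in range(10)]      # real (x; z) pairs
pseudo = [("x%d" % i, "z'%d" % i) for i in range(10)]   # pseudo pairs from p(z|x)
batch = build_minibatch(real, pseudo, 8)
print(len(batch))  # 8: four real pairs and four pseudo pairs
```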
The entire training process of the present invention is as follows:
Input: resource-rich bilingual data (x; y); low-resource bilingual data (x; z) and (y; z)
Output: parameters θ_{z|x}, θ_{y|z}, θ_{z|y} and θ_{x|z}
1: pre-train p(z|x), p(z|y), p(x|z), p(y|z)
2: while not converged do
3:   take a parallel corpus (x, y) ∈ D between the major language x and Chinese y, a parallel corpus (x*, z*) ∈ D between the major language x and Mongolian z, and a parallel corpus (y*, z*) ∈ D between Chinese y and Mongolian z
4:   x → y direction: optimize θ_{z|x}, θ_{y|z}
5:   generate z′ from p(z′|x) and build the training batches B1 = (x, z′) ∪ (x*, z*) and B2 = (y, z′) ∪ (y*, z*), where B1 is the (x; z) parallel corpus after the trained pseudo parallel corpus is added to the (x; z) samples, B2 is the (y; z) parallel corpus after the trained pseudo parallel corpus is added to the (y; z) samples, and z′ denotes the newly generated Mongolian corpus
6:   E-step: update θ_{z|x} with B1
7:   M-step: update θ_{y|z} with B2
8:   y → x direction: optimize θ_{z|y}, θ_{x|z}
9:   generate z′ from p(z′|y) and build the training batches B3 = (y, z′) ∪ (y*, z*) and B4 = (x, z′) ∪ (x*, z*)
10:  E-step: update θ_{z|y} with B3
11:  M-step: update θ_{x|z} with B4
12: end while
13: return θ_{z|x}, θ_{y|z}, θ_{z|y}, θ_{x|z}
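The control flow of the algorithm above can be sketched as a skeleton (the `update` callback is a hypothetical stand-in for one gradient update of the named model; batch construction is elided):

```python
def em_training_loop(rounds, update):
    """Skeleton of the unified bidirectional EM loop: each round runs an
    E-step and an M-step in the x -> y direction, then in y -> x."""
    for _ in range(rounds):
        # x -> y direction
        update("theta_z|x")  # E-step on B1 = (x, z') U (x*, z*)
        update("theta_y|z")  # M-step on B2 = (y, z') U (y*, z*)
        # y -> x direction
        update("theta_z|y")  # E-step on B3 = (y, z') U (y*, z*)
        update("theta_x|z")  # M-step on B4 = (x, z') U (x*, z*)

calls = []
em_training_loop(1, calls.append)
print(calls)  # the four parameter groups, updated in order
```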
Compared with existing end-to-end neural machine translation methods, the present invention fully considers the problem of limited parallel corpora for rare languages, in particular the scarcity of Mongolian-Chinese parallel corpora, and improves Mongolian-Chinese translation quality under the premise of corpus scarcity. Secondly, a unified bidirectional EM algorithm jointly optimizes the two Mongolian translation models. Finally, pseudo samples generated by the model x → z or z → y and real bilingual samples are mixed in the same minibatch at a ratio of 1:1 to stabilize the training process.
Brief Description of the Drawings
Fig. 1 is the triangular learning architecture diagram for low-resource NMT.
Fig. 2 is the end-to-end encoder-decoder structure.
Specific Embodiments
The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
Problem description: the Mongolian-Chinese neural machine translation method based on the triangular framework jointly optimizes the Mongolian translation models with a unified bidirectional EM algorithm.
Mongolian is denoted z, Chinese is denoted y, and English is denoted x. The unified bidirectional generalized EM process is as follows:
The training of the x → y translation is decomposed into two stages with two translation models: the first model, x → z, generates a potential translation z from an input sentence in x, and the second model, z → y, generates the final translation in language y from that potential translation; both processes use an end-to-end encoder-decoder structure. In addition, following the steps of the standard EM algorithm and Jensen's inequality, the lower bound of p(y|x) over the entire training set D is:

L(θ; D) = Σ_{(x,y)∈D} log p(y|x) = Σ_{(x,y)∈D} log Σ_z p(z|x) p(y|z) ≥ Σ_{(x,y)∈D} Σ_z Q(z) log [p(z|x) p(y|z) / Q(z)] = L(Q)

where L(Q) is the lower bound of L(θ; D); L(θ; D) is the likelihood function; θ collects the parameters of p(z|x) and p(y|z) at which the translation accuracy reaches the set threshold; p(z|x) denotes the accuracy of translating language z from language x; p(y|z) denotes the accuracy of translating language y from language z; p(y|x) denotes the accuracy of translating language y from language x; D denotes the entire training set; and Q(z) is an arbitrary posterior distribution over z, with Q(z) = p(z|x).
The x → y direction
E-step: optimize θ_{z|x}. To minimize the gap between L(Q) and L(θ; D), the following formula is used:

θ_{z|x} = argmin_{θ_{z|x}} KL(p(z|x) || p(z|y))

where θ_{z|x} denotes the parameter values at which the accuracy of translating Mongolian z from the major language x reaches the set threshold; p(z|x) denotes the accuracy of translating Mongolian z from the major language x and is the true distribution; p(z|y) denotes the accuracy of translating Mongolian z from Chinese y and is the fitting distribution of p(z|x); KL(·) is the Kullback-Leibler divergence, and KL(p(z|x) || p(z|y)) denotes the information loss incurred when p(z|y) is used to fit the true distribution p(z|x).
M-step: optimize θ_{y|z}:

θ_{y|z} = argmax_{θ_{y|z}} Σ_{(x,y)∈D} E_{z~p(z|x)}[log p(y|z)]

where θ_{y|z} denotes the parameter values at which the accuracy of translating language y from language z reaches the set threshold, and E_{z~p(z|x)} denotes the mathematical expectation of z when z is translated from x.
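The expectation in this M-step can be approximated by sampling from p(z|x); a toy Monte Carlo sketch (the two-point distribution and the scores below are invented for illustration, not real translation models):

```python
import random

def mc_expectation(sample_z, log_p_y_given_z, n=10000, seed=0):
    """Monte Carlo estimate of E_{z ~ p(z|x)}[ log p(y|z) ]."""
    rng = random.Random(seed)
    return sum(log_p_y_given_z(sample_z(rng)) for _ in range(n)) / n

# toy p(z|x): two equally likely candidate pivot translations
sample_z = lambda rng: rng.choice([0, 1])
# toy log p(y|z): pivot 0 explains the target sentence better than pivot 1
log_p = lambda z: -1.0 if z == 0 else -2.0
estimate = mc_expectation(sample_z, log_p)
print(estimate)  # close to the true expectation -1.5
```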
The y → x direction
E-step: optimize θ_{z|y}:

θ_{z|y} = argmin_{θ_{z|y}} KL(p(z|y) || p(z|x))

where θ_{z|y} denotes the parameter values at which the accuracy of translating language z from language y reaches the set threshold, and p(z|y) denotes the accuracy of translating language z from language y.
M-step: optimize θ_{x|z}:

θ_{x|z} = argmax_{θ_{x|z}} Σ_{(x,y)∈D} E_{z~p(z|y)}[log p(x|z)]

where θ_{x|z} denotes the parameter values at which the accuracy of translating language x from language z reaches the set threshold; E_{z~p(z|y)} denotes the mathematical expectation of z when z is translated from y; p(x|z) denotes the accuracy of translating language x from language z; and p(x|y) denotes the accuracy of translating language x from language y.
On the basis of the above derivation, the overall system structure is analyzed, as shown in Fig. 1: the dotted arrows indicate the p(y|x) direction, in which p(z|x) and p(y|z) are jointly trained with the help of p(z|y); the solid arrows indicate the p(x|y) direction, in which p(z|y) and p(x|z) are jointly trained with the help of p(z|x). Similar to reinforcement learning, the models p(z|x), p(z|y), p(y|z) and p(x|z) are trained on samples that they themselves generate.
In the above bidirectional training process, each E-step is trained by gradient descent. In the x → y direction, the gradient of the E-step objective with respect to θ_{z|x} follows from the definition of the KL divergence:

∇_{θ_{z|x}} KL(p(z|x) || p(z|y)) = E_{z~p(z|x)}[(log p(z|x) − log p(z|y)) ∇_{θ_{z|x}} log p(z|x)]
The training process algorithm is as follows:
Input: resource-rich bilingual data (x; y); low-resource bilingual data (x; z) and (y; z)
Output: parameters θ_{z|x}, θ_{y|z}, θ_{z|y} and θ_{x|z}
1: pre-train p(z|x), p(z|y), p(x|z), p(y|z)
2: while not converged do
3:   take a parallel corpus (x, y) ∈ D between language x and language y, a parallel corpus (x*, z*) ∈ D between language x and language z, and a parallel corpus (y*, z*) ∈ D between language y and language z
4:   x → y direction: optimize θ_{z|x}, θ_{y|z}
5:   generate z′ from p(z′|x) and build the training batches B1 = (x, z′) ∪ (x*, z*) and B2 = (y, z′) ∪ (y*, z*), where B1 is the (x; z) parallel corpus after the trained pseudo parallel corpus is added, B2 is the (y; z) parallel corpus after the trained pseudo parallel corpus is added, and z′ denotes the newly generated corpus in language z
6:   E-step: update θ_{z|x} with B1
7:   M-step: update θ_{y|z} with B2
8:   y → x direction: optimize θ_{z|y}, θ_{x|z}
9:   generate z′ from p(z′|y) and build the training batches B3 = (y, z′) ∪ (y*, z*) and B4 = (x, z′) ∪ (x*, z*)
10:  E-step: update θ_{z|y} with B3
11:  M-step: update θ_{x|z} with B4
12: end while
13: return θ_{z|x}, θ_{y|z}, θ_{z|y}, θ_{x|z}
Guaranteeing the stability of the training process: to ensure stability, the pseudo samples generated by the model x → z or z → y and the real bilingual samples are mixed in the same minibatch at a ratio of 1:1.
The following is the process of translating any bilingual pair among English-Mongolian, Mongolian-English, Mongolian-Chinese and Chinese-Mongolian using the end-to-end encoder-decoder structure:
Referring to Fig. 2: first (upper half of Fig. 2), the encoder encodes the source-language sentence to generate a group of context semantic vectors, which then serve as the encoding of the user's intent. During generation (lower half of Fig. 2), the decoder, combined with an attention mechanism, generates each word of the target language; while generating each word, it attends to the context semantic vectors corresponding to the current input, so that the generated content stays consistent with the meaning of the source language.
Specific translation steps are as follows:
1. The encoder reads the input source-language sentence;
2. The encoder encodes the read sentence into hidden-layer states using a recurrent neural network, forming a group of context semantic vectors;
3. The decoder, combined with the attention mechanism, sequentially generates each word of the target language.
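Step 3 above, in which attention weights the context semantic vectors, can be sketched with simple dot-product attention (a simplified stand-in; real systems use learned score functions, and all values here are toy):

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Weight each encoder hidden state by its similarity to the current
    decoder state (softmax over dot products) and return the context vector."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax: weights sum to 1
    return weights @ encoder_states           # convex combination of states

# toy context semantic vector group (3 source positions, hidden dimension 2)
enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dec = np.array([1.0, 0.0])                    # current decoder hidden state
ctx = attention_context(dec, enc)
print(ctx.shape)  # (2,)
```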