CN111444730A - Data enhancement Weihan machine translation system training method and device based on Transformer model - Google Patents
- Publication number
- CN111444730A (application CN202010226101.6A)
- Authority
- CN
- China
- Prior art keywords
- phrase
- word
- phrases
- chinese
- uygur
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model. The Transformer model consists of an encoder and a decoder: the left half of the model is the encoder side, composed of 6 identical layers of two sub-layers each; the right half is the decoder side, composed of 6 identical layers of three sub-layers each. The invention greatly alleviates the poor translation performance of neural machine translation models under resource-scarce conditions and improves the generalization ability of the model. Experimental results show that generating pseudo-parallel data from 170,000 Uyghur-Chinese parallel sentence pairs and training the translation model on it yields a measurable improvement in translation quality.
Description
Technical Field
The invention relates to the technical field of translation, and in particular to a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model.
Background
Machine translation is the process of converting one natural language into another by machine. Since the concept of machine translation was proposed, the field has gone through roughly four stages: rule-based machine translation, example-based machine translation, statistical machine translation, and neural machine translation. Traditional machine translation methods need manually designed translation rules and wide-coverage parallel corpora, and suffer from high cost and long development cycles. After the concept of neural machine translation was proposed it attracted the attention of many researchers, and its translation performance now exceeds that of traditional machine translation methods.
Neural machine translation differs in approach from statistical machine translation: the main idea of statistical machine translation is to build a statistical translation model from counts over a large parallel corpus, whereas neural machine translation first converts text into numbers and then operates on those numbers to build the translation model. Text can be converted into numbers with either discrete or distributed representations. With a one-hot word vector, the vocabulary size is taken as the vector length, one dimension has the value 1 and all other dimensions are 0, but such a vector cannot effectively represent the meaning of a word at the semantic level. Google released the Word2vec word vector training tool in 2013; Word2vec trains word vector models quickly and efficiently on given text data. The model represents words as vectors at the semantic level, so the similarity of two words can be computed conveniently. Word2vec is a milestone in the field of natural language processing and has advanced many natural language processing tasks.
A neural machine translation system consists mainly of an encoder and a decoder: the encoder encodes a source-language sentence of arbitrary length, and the decoder takes the fixed-length vector output by the encoder as input and decodes it into a target-language sentence. The structure is modeled end to end, with all model parameters trained against one objective function. Fig. 1 shows the structure of the encoder-decoder model.
Different neural machine translation systems implement the encoder and decoder with a recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), the Transformer, and other architectures.
Existing machine translation depends on large-scale, high-quality parallel corpora: millions or even tens of millions of parallel sentence pairs are needed to train a model to a reasonable level. For a low-resource language such as Uyghur, large-scale parallel corpora cannot be obtained, and even where larger corpora exist, the quality of long-sentence translation from statistical machine translation and LSTM-based machine translation is not high.
disclosure of Invention
The invention aims to provide a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, so as to solve the problems described in the background section.
To achieve this aim, the invention provides the following technical scheme: a training device for a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, the model comprising an encoder and a decoder. The left half of the model is the encoder side and consists of 6 identical layers, each layer consisting of two sub-layers. The right half is the decoder side, which consists of 6 identical layers, each layer consisting of three sub-layers.
Preferably, the first sub-layer of the encoder is a self-attention layer and the second sub-layer is a feed-forward neural network; each word first passes through the self-attention layer, where it is encoded and its position information is obtained from a positional encoder; query and key-value pair vectors are created from the input vector, and these three vectors are trained with the scaled dot-product attention algorithm.
Preferably, the training method comprises the following steps:
A. preprocessing the corpus;
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
Preferably, the preprocessing in step A includes preprocessing Chinese and preprocessing Uyghur: an Uyghur preprocessing tool and word segmentation tool are used to convert the Uyghur text from the Unicode extended (presentation-form) area to the basic area and to segment it into words, while the Chinese corpus is converted from full-width to half-width characters and segmented with the Harbin Institute of Technology (HIT) Chinese word segmentation tool.
Preferably, in step B phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules:
a. filtering phrase pairs containing punctuation marks;
b. filtering pairs of phrases containing numbers;
c. filtering phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uygur phrase contains non-Uygur characters;
d. filtering phrase pairs with too large or too small length proportion;
e. filtering single words and non-noun phrases, after which 3.24 million phrase pairs remain;
Noun phrase extraction: the Chinese sentence is syntactically parsed with the Harbin Institute of Technology (HIT) syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
Preferably, step C includes:
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model of word2vec;
b. calculating phrase similarity: phrase vectors are first computed from the word vectors, and the similarity of two phrases is then computed with cosine similarity. The vector of a phrase is obtained by summing the vectors of its words and averaging; the similarity of each phrase with every phrase in the phrase table is then computed, using cosine similarity. The phrase vector and phrase similarity are computed as:

$$p = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert\,\lVert p_j \rVert}$$

where p is a phrase vector, w_i is the vector of the i-th word, n is the number of words in the phrase, and p_i and p_j are the two phrase vectors whose similarity is to be computed;
c. sentence generation: the noun phrases in the original sentence pair are replaced with the most similar phrases from the phrase table; the similarity is computed over the Uyghur phrases, and when a Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus: language models for Uyghur and Chinese are trained with SRILM on monolingual data of 3.59 million Uyghur sentences and 3.54 million Chinese sentences respectively; the perplexity of every newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 above that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a language model (a probability model) predicts a sample; the lower the perplexity, the better. Given a text corpus w_1, w_2, ..., w_n containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity on the corpus is

$$\mathrm{PPL} = P_{LM}(w_1, w_2, \ldots, w_n)^{-\frac{1}{n}} = \exp\left(-\frac{1}{n}\sum_{i=1}^{n}\log P_{LM}(w_i \mid w_1, \ldots, w_{i-1})\right)$$
compared with the prior art, the invention has the beneficial effects that: the invention greatly improves the problem of poor translation performance of the neural machine translation model under the condition of resource shortage, and improves the generalization capability of the model. Experimental results show that data are forged by 17 ten thousand pairs of Wei-Han parallel linguistic data and a translation model is trained, and finally the translation quality is improved to a certain extent.
Drawings
FIG. 1 is a schematic diagram of a prior art encoder-decoder model;
FIG. 2 is a diagram of a prior art system architecture;
FIG. 3 is a model block diagram of the present invention;
FIG. 4 is a vector diagram of a target sentence corresponding to a query vector according to the present invention;
FIG. 5 is a diagram of a data query architecture in accordance with the present invention;
FIG. 6 is a schematic illustration of the present invention in position embedding;
FIG. 7 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to figs. 1-7, the invention provides the following technical scheme: a training device for a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, the model comprising an encoder and a decoder. The left half of the model is the encoder side and consists of 6 identical layers, each layer consisting of two sub-layers. The right half is the decoder side, which consists of 6 identical layers, each layer consisting of three sub-layers.
In the invention, the first sub-layer of the encoder is a self-attention layer and the second sub-layer is a feed-forward neural network. Each word first passes through the self-attention layer, where it is encoded and its position information is obtained from a positional encoder; query and key-value pair vectors are created from the input vector, and these three vectors are trained with the scaled dot-product attention algorithm. In this algorithm the key k and the value v are assumed to be the same vector and, as shown in fig. 4, the query vector q corresponds to the vector of the target sentence.
The specific operation comprises three steps:
1. Compute the dot product of each query vector q with the keys.
2. Normalize the dot-product results with softmax.
3. Multiply the normalized weights by the values v to obtain the attention vector.
The calculation is expressed mathematically as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
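As a minimal illustration of these three steps, the numpy sketch below implements scaled dot-product attention; the array shapes and toy inputs are assumptions made for the example, not values fixed by the invention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v) -> (n_q, d_v)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # step 1: dot product of queries and keys, scaled
    weights = softmax(scores, axis=-1)   # step 2: normalize with softmax
    return weights @ v                   # step 3: weighted sum of the values

# toy usage, with key and value taken as the same vectors, as assumed above
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))              # 3 query positions, width 8
k = v = rng.normal(size=(4, 8))          # 4 source positions
context = scaled_dot_product_attention(q, k, v)   # shape (3, 8)
```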
attention of multiple heads: the multi-head attention repeats the zooming dot product attention process h times in order to acquire more semantic information in sentences as much as possible, and a plurality of query values q1 are obtained; the final results of parallel computations performed on n { q1, q 2.., qn } are combined into a matrix, and the architecture is shown in fig. 5.
The invention discloses a method for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, characterized in that the training method comprises the following steps:
A. preprocessing the corpus;
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
The preprocessing in step A includes preprocessing Chinese and preprocessing Uyghur: an Uyghur preprocessing tool and word segmentation tool are used to convert the Uyghur text from the Unicode extended (presentation-form) area to the basic area and to segment it into words, while the Chinese corpus is converted from full-width to half-width characters and segmented with the Harbin Institute of Technology (HIT) Chinese word segmentation tool.
In step B, phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules (a code sketch of these rules is given after step B below):
a. filtering phrase pairs containing punctuation marks;
b. filtering pairs of phrases containing numbers;
c. filtering phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uygur phrase contains non-Uygur characters;
d. filtering phrase pairs with too large or too small length proportion;
e. filtering single words and non-noun phrases, after which 3.24 million phrase pairs remain;
Noun phrase extraction: the Chinese sentence is syntactically parsed with the Harbin Institute of Technology (HIT) syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
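The following Python sketch illustrates filtering rules a-d; the character-class regular expressions and length-ratio bounds are illustrative assumptions rather than values fixed by the invention, and the noun-phrase part of rule e is omitted because it requires the syntactic parser described above.

```python
import re

PUNCT = re.compile(r"[,.!?;:，。！？；：、\"'()（）]")    # assumed punctuation set
DIGIT = re.compile(r"[0-9０-９]")                         # ASCII and full-width digits
HAN   = re.compile(r"^[\u4e00-\u9fff]+$")                 # Chinese characters only
UG    = re.compile(r"^[\u0600-\u06ff]+$")                 # Arabic-script (Uyghur) letters only

def keep_phrase_pair(zh, ug, ratio_low=0.5, ratio_high=2.0):
    """Return True if the (Chinese, Uyghur) phrase pair passes rules a-e."""
    zh_toks, ug_toks = zh.split(), ug.split()
    if PUNCT.search(zh) or PUNCT.search(ug):              # rule a: punctuation marks
        return False
    if DIGIT.search(zh) or DIGIT.search(ug):              # rule b: numbers
        return False
    if not HAN.match(zh.replace(" ", "")) or not UG.match(ug.replace(" ", "")):
        return False                                      # rule c: non-Chinese / non-Uyghur characters
    ratio = len(zh_toks) / max(len(ug_toks), 1)
    if ratio < ratio_low or ratio > ratio_high:           # rule d: length ratio too large or too small
        return False
    if len(zh_toks) < 2 or len(ug_toks) < 2:              # rule e (partial): drop single words
        return False
    return True
```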
Step C comprises the following steps (a code sketch of steps a, b and d is given after this list):
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model of word2vec;
b. calculating phrase similarity: phrase vectors are first computed from the word vectors, and the similarity of two phrases is then computed with cosine similarity. The vector of a phrase is obtained by summing the vectors of its words and averaging; the similarity of each phrase with every phrase in the phrase table is then computed, using cosine similarity. The phrase vector and phrase similarity are computed as:

$$p = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert\,\lVert p_j \rVert}$$

where p is a phrase vector, w_i is the vector of the i-th word, n is the number of words in the phrase, and p_i and p_j are the two phrase vectors whose similarity is to be computed;
c. sentence generation: the noun phrases in the original sentence pair are replaced with the most similar phrases from the phrase table; the similarity is computed over the Uyghur phrases, and when a Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus: language models for Uyghur and Chinese are trained with SRILM on monolingual data of 3.59 million Uyghur sentences and 3.54 million Chinese sentences respectively; the perplexity of every newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 above that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a language model (a probability model) predicts a sample; the lower the perplexity, the better. Given a text corpus w_1, w_2, ..., w_n containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity on the corpus is

$$\mathrm{PPL} = P_{LM}(w_1, w_2, \ldots, w_n)^{-\frac{1}{n}} = \exp\left(-\frac{1}{n}\sum_{i=1}^{n}\log P_{LM}(w_i \mid w_1, \ldots, w_{i-1})\right)$$
a good language model will assign a higher probability to the samples in the corpus and will also have a lower confusion value.
The invention uses the Transformer model to train the Uyghur-Chinese machine translation model. The Transformer follows the encoder-decoder structure and, by using attention layers and fully connected layers, effectively alleviates the long-range dependence problem of neural networks, achieving better results. Fig. 3 shows the structure of the Transformer model: the left half is the encoder side, composed of multi-head attention layers and fully connected layers; the right half is the decoder side, likewise composed of multi-head attention layers and fully connected layers, except that, unlike the encoder side, its multi-head attention consists of a self-attention layer and an encoder-decoder attention layer. The concrete structure is as follows:
1. Grouped (multi-head) attention network
The attention network can be viewed as mapping a query Q onto key (K)-value (V) pairs and producing a weighted output. Unlike the conventional attention mechanism, which uses a single attention network to generate one context vector, the grouped attention network concatenates several attention networks: given (Q, K, V), Q, K and V are first mapped into different spaces with different linear mappings, context vectors are then computed in those spaces with different attention networks, and the context vectors are concatenated into the final output. The calculation is
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\,W^{O}$$

where $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$; $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the linear projections of group i, and $W^{O}$ is the linear mapping parameter that produces the final context after concatenation.
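A compact numpy sketch of this grouped (multi-head) computation is given below; the per-head projection matrices are random placeholders standing in for learned parameters, and the dimensions are illustrative assumptions.

```python
import numpy as np

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def _attention(q, k, v):
    # scaled dot-product attention, as in the earlier sketch
    return _softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    """W_q, W_k, W_v: lists of per-head projection matrices; W_o: output projection."""
    heads = [_attention(Q @ W_q[i], K @ W_k[i], V @ W_v[i]) for i in range(len(W_q))]
    return np.concatenate(heads, axis=-1) @ W_o     # Concat(head_1, ..., head_h) W^O

# toy usage: d_model = 8, h = 2 heads, 5 positions, self-attention (Q = K = V = X)
rng = np.random.default_rng(1)
d_model, h = 8, 2
d_head = d_model // h
W_q = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_head)) for _ in range(h)]
W_o = rng.normal(size=(h * d_head, d_model))
X = rng.normal(size=(5, d_model))
out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)   # shape (5, d_model)
```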
Position coding: an encoder and decoder built from attention networks cannot by themselves take position information into account, yet position is important for language understanding and generation. To solve this problem, position embedding is applied to the attention-based encoder and decoder. As shown in fig. 6, the position embedding vector is added element-wise to the word embedding vector, so the position embedding has the same length as the word embedding.
Unlike learned position embeddings, which require additional parameters, the invention uses a fixed position embedding defined with trigonometric functions that needs no learning. It is defined as:

$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where pos is the index of the word, 2i and 2i+1 are dimensions of the position code, and d_model is the length of the position code. PE(pos, 2i) gives the value of the 2i-th dimension of the position code at position pos; similarly, PE(pos, 2i+1) gives its (2i+1)-th dimension. The advantage of this design is that the position code of a position (pos + k), separated from pos by k words, is a linear transformation of the position code of pos determined only by k; this follows from the angle-addition identities sin(α+β) = sinα·cosβ + cosα·sinβ and cos(α+β) = cosα·cosβ - sinα·sinβ.
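The sinusoidal definition above can be computed in a few lines; this sketch assumes an even d_model and uses illustrative sizes.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) matrix of fixed sinusoidal position codes (d_model even)."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]                 # word position "pos"
    i = np.arange(0, d_model, 2)[None, :]             # even dimension index 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe[:, 0::2] = np.sin(angle)                       # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                       # PE(pos, 2i+1)
    return pe

# added element-wise to the word embeddings of a sentence of length n:
# x = word_embeddings + positional_encoding(n, d_model)
pe = positional_encoding(50, 8)                       # toy sizes
```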
self-attention network performance: in both the encoder and decoder, a self-attention network is used to model Uygur sentences and Chinese sentences. The temporal complexity of the self-attention network is lower than for constructing encoders and decoders using a recurrent neural network and a convolutional neural network to model sentences. Assuming that the sentence length is n, the length of the hidden state vector is d, and the convolution kernel size of the convolutional neural network is k, the respective computational complexity is shown in table 1.
Per-layer computational complexity refers to the total computation when a single layer of such a network is used. When the sentence length n is smaller than the hidden-vector length d, the total computation of the self-attention network is smaller; in most cases the sentence length is indeed much smaller than the hidden-vector length. The number of sequential operations is the number of operations that must be executed one after another to produce the hidden state of every word in the sentence (for example, a recurrent neural network can only generate the hidden state of the next word after the hidden state of the previous word has been produced, so hidden-state generation cannot be parallelized); the larger this number, the weaker the parallelization capability. The maximum path length is the maximum number of operations needed to guarantee that every word in the sentence can influence every other word. In a recurrent network, for the first word of a sentence to influence the last one, the information must be propagated through all (n - 1) subsequent hidden states, so the maximum path length is O(n); each layer of a convolutional network only lets the k words inside a convolution kernel influence each other, so log_k(n) convolutional layers must be stacked; the self-attention network can directly connect any two words in a sentence through the attention mechanism.
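Table 1 is not reproduced in the text above; the per-layer comparison below follows the standard values from the Transformer literature and matches the description in the preceding paragraph (n: sentence length, d: hidden-vector length, k: convolution kernel size):

| Layer type | Per-layer complexity | Sequential operations | Maximum path length |
|---|---|---|---|
| Self-attention | O(n²·d) | O(1) | O(1) |
| Recurrent | O(n·d²) | O(n) | O(n) |
| Convolutional | O(k·n·d²) | O(1) | O(log_k n) |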
The invention uses 170,000 Uyghur-Chinese parallel sentence pairs as the initial training corpus for its experiments and verifies the effect of the invention. To avoid the generated data being too similar, the 30,000 and 80,000 best-quality sentence pairs are screened out of the 2 million generated sentences and used as pseudo-parallel sentence pairs in separate tests; the test results are shown in Table 2:
from Table 2, it can be seen that in the case of using only the original corpus, the model using the Transformer model is improved by 9.35B L EU values compared with the model using RNN, and is improved by 4.52B L EU. compared with the model using statistical machine translation Moses, the model of the original corpus running under the Transformer model is used as a pre-training model, and then the training corpus consisting of the original corpus and newly generated 3 ten thousand sentence pairs is used to continue training in the Transformer model, and the final result is improved by 0.7B L EU values compared with the result of using only the original corpus and the Transformer model, and the training corpus consisting of the original corpus and the newly generated 13 ten thousand sentence pairs is improved by 1.05B L EU values on the Transformer model.
In conclusion, the invention greatly alleviates the poor translation performance of neural machine translation models under resource-scarce conditions and improves the generalization ability of the model. Experimental results show that generating pseudo-parallel data from 170,000 Uyghur-Chinese parallel sentence pairs and training the translation model on it yields a measurable improvement in translation quality.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (6)
1. A training device for a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, the Transformer model being composed of an encoder and a decoder, characterized in that: the left half of the model is the encoder side and consists of 6 identical layers, each layer consisting of two sub-layers; the right half is the decoder side, which consists of 6 identical layers, each layer consisting of three sub-layers.
2. The device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 1, wherein: the first sub-layer is a self-attention layer and the second sub-layer is a feed-forward neural network; each word first passes through the self-attention layer, where it is encoded and its position information is obtained from a positional encoder; query vectors and key-value pair vectors are created from the input vector, and these three vectors are trained with the scaled dot-product attention algorithm.
3. A method for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, characterized in that the training method comprises the following steps:
A. preprocessing the corpus;
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
4. The method for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein: the preprocessing in step A includes preprocessing Chinese and preprocessing Uyghur: an Uyghur preprocessing tool and word segmentation tool are used to convert the Uyghur text from the Unicode extended (presentation-form) area to the basic area and to segment it into words, while the Chinese corpus is converted from full-width to half-width characters and segmented with the Harbin Institute of Technology (HIT) Chinese word segmentation tool.
5. The method for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein: in step B, phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules:
a. filtering phrase pairs containing punctuation marks;
b. filtering pairs of phrases containing numbers;
c. filtering phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uygur phrase contains non-Uygur characters;
d. filtering phrase pairs with too large or too small length proportion;
e. filtering single words and non-noun phrases, after which 3.24 million phrase pairs remain;
Noun phrase extraction: the Chinese sentence is syntactically parsed with the Harbin Institute of Technology (HIT) syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
6. The method for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein: step C comprises the following steps:
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model of word2vec;
b. calculating phrase similarity: phrase vectors are first computed from the word vectors, and the similarity of two phrases is then computed with cosine similarity. The vector of a phrase is obtained by summing the vectors of its words and averaging; the similarity of each phrase with every phrase in the phrase table is then computed, using cosine similarity. The phrase vector and phrase similarity are computed as:

$$p = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert\,\lVert p_j \rVert}$$

where p is a phrase vector, w_i is the vector of the i-th word, n is the number of words in the phrase, and p_i and p_j are the two phrase vectors whose similarity is to be computed;
c. sentence generation: the noun phrases in the original sentence pair are replaced with the most similar phrases from the phrase table; the similarity is computed over the Uyghur phrases, and when a Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus: language models for Uyghur and Chinese are trained with SRILM on monolingual data of 3.59 million Uyghur sentences and 3.54 million Chinese sentences respectively; the perplexity of every newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 above that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a language model (a probability model) predicts a sample; the lower the perplexity, the better. Given a text corpus w_1, w_2, ..., w_n containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity on the corpus is

$$\mathrm{PPL} = P_{LM}(w_1, w_2, \ldots, w_n)^{-\frac{1}{n}} = \exp\left(-\frac{1}{n}\sum_{i=1}^{n}\log P_{LM}(w_i \mid w_1, \ldots, w_{i-1})\right)$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010226101.6A CN111444730A (en) | 2020-03-27 | 2020-03-27 | Data enhancement Weihan machine translation system training method and device based on Transformer model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010226101.6A CN111444730A (en) | 2020-03-27 | 2020-03-27 | Data enhancement Weihan machine translation system training method and device based on Transformer model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111444730A true CN111444730A (en) | 2020-07-24 |
Family
ID=71652486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010226101.6A Pending CN111444730A (en) | 2020-03-27 | 2020-03-27 | Data enhancement Weihan machine translation system training method and device based on Transformer model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111444730A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507734A (en) * | 2020-11-19 | 2021-03-16 | 南京大学 | Roman Uygur language-based neural machine translation system |
CN112633018A (en) * | 2020-12-28 | 2021-04-09 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on data enhancement |
CN113742467A (en) * | 2021-09-02 | 2021-12-03 | 新疆大学 | Dialog state generation method and device for hierarchically selecting slot-position-related context |
CN116562311A (en) * | 2023-07-07 | 2023-08-08 | 中铁四局集团有限公司 | Operation and maintenance method and system based on natural language machine translation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102654867A (en) * | 2011-03-02 | 2012-09-05 | 北京百度网讯科技有限公司 | Webpage sorting method and system in cross-language search |
CN104050160A (en) * | 2014-03-12 | 2014-09-17 | 北京紫冬锐意语音科技有限公司 | Machine and human translation combined spoken language translation method and device |
CN104391842A (en) * | 2014-12-18 | 2015-03-04 | 苏州大学 | Translation model establishing method and system |
CN105808530A (en) * | 2016-03-23 | 2016-07-27 | 苏州大学 | Translation method and device in statistical machine translation |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Based on the machine translation method of statistics, device and electronic equipment |
-
2020
- 2020-03-27 CN CN202010226101.6A patent/CN111444730A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102654867A (en) * | 2011-03-02 | 2012-09-05 | 北京百度网讯科技有限公司 | Webpage sorting method and system in cross-language search |
CN104050160A (en) * | 2014-03-12 | 2014-09-17 | 北京紫冬锐意语音科技有限公司 | Machine and human translation combined spoken language translation method and device |
CN104391842A (en) * | 2014-12-18 | 2015-03-04 | 苏州大学 | Translation model establishing method and system |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Based on the machine translation method of statistics, device and electronic equipment |
CN105808530A (en) * | 2016-03-23 | 2016-07-27 | 苏州大学 | Translation method and device in statistical machine translation |
Non-Patent Citations (3)
Title |
---|
张金超 (Zhang Jinchao) et al.: "Large-scale Uyghur-Chinese neural network machine translation model based on multiple encoders and decoders", Journal of Chinese Information Processing (《中文信息学报》) * |
杨洋 (Yang Yang): "Research and implementation of Uyghur-Chinese translation based on neural networks", China Master's Theses Electronic Journal Network (《中国优秀硕士论文电子期刊网》) * |
祁青山 (Qi Qingshan) et al.: "Uyghur noun anaphora resolution based on ATT-IndRNN-CNN", Journal of Chinese Information Processing (《中文信息学报》) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507734A (en) * | 2020-11-19 | 2021-03-16 | 南京大学 | Roman Uygur language-based neural machine translation system |
CN112507734B (en) * | 2020-11-19 | 2024-03-19 | 南京大学 | Neural machine translation system based on romanized Uygur language |
CN112633018A (en) * | 2020-12-28 | 2021-04-09 | 内蒙古工业大学 | Mongolian Chinese neural machine translation method based on data enhancement |
CN113742467A (en) * | 2021-09-02 | 2021-12-03 | 新疆大学 | Dialog state generation method and device for hierarchically selecting slot-position-related context |
CN113742467B (en) * | 2021-09-02 | 2023-08-08 | 新疆大学 | Method and device for generating dialogue state of hierarchical selection slot phase context |
CN116562311A (en) * | 2023-07-07 | 2023-08-08 | 中铁四局集团有限公司 | Operation and maintenance method and system based on natural language machine translation |
CN116562311B (en) * | 2023-07-07 | 2023-12-01 | 中铁四局集团有限公司 | Operation and maintenance method and system based on natural language machine translation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111382582B (en) | Neural machine translation decoding acceleration method based on non-autoregressive | |
Zhang et al. | Understanding subtitles by character-level sequence-to-sequence learning | |
CN111444730A (en) | Data enhancement Weihan machine translation system training method and device based on Transformer model | |
CN109062907B (en) | Neural machine translation method integrating dependency relationship | |
CN110059324B (en) | Neural network machine translation method and device based on dependency information supervision | |
CN110598221A (en) | Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network | |
CN109635124A (en) | A kind of remote supervisory Relation extraction method of combination background knowledge | |
CN111414476A (en) | Attribute-level emotion analysis method based on multi-task learning | |
CN111382574B (en) | Semantic parsing system combining syntax under virtual reality and augmented reality scenes | |
CN113569562B (en) | Method and system for reducing cross-modal and cross-language barriers of end-to-end voice translation | |
CN111401079A (en) | Training method and device of neural network machine translation model and storage medium | |
Kesavan et al. | Deep learning based automatic image caption generation | |
Leng et al. | Using recurrent neural network structure with enhanced multi-head self-attention for sentiment analysis | |
CN113657123A (en) | Mongolian aspect level emotion analysis method based on target template guidance and relation head coding | |
Hong et al. | Interpretable sequence classification via prototype trajectory | |
CN113887251A (en) | Mongolian Chinese machine translation method combining Meta-KD framework and fine-grained compression | |
CN113947072A (en) | Text error correction method and text error correction device | |
CN118262874A (en) | Knowledge-graph-based traditional Chinese medicine diagnosis and treatment model data expansion system and method | |
CN116681090A (en) | BestTransformer Haematococcus conversion method and system | |
CN116821326A (en) | Text abstract generation method and device based on self-attention and relative position coding | |
Hujon et al. | Neural machine translation systems for English to Khasi: A case study of an Austroasiatic language | |
CN115169285A (en) | Event extraction method and system based on graph analysis | |
CN114580376A (en) | Chinese abstract generating method based on component sentence method analysis | |
CN116048613A (en) | Attention mechanism-based code abstracting method for graph sequence association | |
CN116681087B (en) | Automatic problem generation method based on multi-stage time sequence and semantic information enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200724 |
|
RJ01 | Rejection of invention patent application after publication |