
CN111444730A - Data enhancement Weihan machine translation system training method and device based on Transformer model - Google Patents

Data enhancement Weihan machine translation system training method and device based on Transformer model

Info

Publication number
CN111444730A
Authority
CN
China
Prior art keywords
phrase
word
phrases
chinese
uygur
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010226101.6A
Other languages
Chinese (zh)
Inventor
艾山·吾买尔
西热艾力·海热拉
刘文其
盛嘉宝
早克热·卡德尔
郑炅
徐翠云
斯拉吉艾合麦提·如则麦麦提
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang University
Original Assignee
Xinjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang University filed Critical Xinjiang University
Priority to CN202010226101.6A priority Critical patent/CN111444730A/en
Publication of CN111444730A publication Critical patent/CN111444730A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model. The Transformer model consists of an encoder and a decoder: the left half of the model is the encoder side, composed of 6 identical layers, each layer consisting of two sub-layers; the right half is the decoder side, composed of 6 identical layers, each layer consisting of three sub-layers. The invention greatly alleviates the poor translation performance of neural machine translation models under low-resource conditions and improves the generalization ability of the model. Experimental results show that generating pseudo data from 170,000 Uyghur-Chinese parallel sentence pairs and training the translation model on them yields a measurable improvement in translation quality.

Description

Data enhancement Weihan machine translation system training method and device based on Transformer model
Technical Field
The invention relates to the technical field of translation, in particular to a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on a Transformer model.
Background
Machine translation is the process of automatically converting text in one natural language into another. Since the concept was first proposed, machine translation has gone through roughly four stages: rule-based, example-based, statistical, and neural machine translation. Traditional machine translation methods require manually crafted translation rules and wide-coverage parallel corpora, which makes them costly and slow to develop. Since the idea of neural machine translation was put forward it has attracted the attention of many researchers, and its translation performance now exceeds that of traditional machine translation methods.
Neural machine translation follows a different line of thought from statistical machine translation. Statistical machine translation builds a statistical translation model by counting co-occurrences over a large parallel corpus, whereas neural machine translation first converts text into numbers and then operates on those numbers to build a neural translation model. Text can be converted into numbers with either a discrete or a distributed representation. In a one-hot word representation, the vocabulary size is taken as the vector length, one dimension of the vector is set to 1 and all other dimensions to 0; such a vector cannot effectively represent the meaning of a word at the semantic level. Google released the Word2vec word-vector training tool in 2013; Word2vec trains word vector models quickly and efficiently from a given text corpus. The resulting model represents words as vectors at the semantic level, so the similarity of two words can be computed conveniently. Word2vec is a milestone in natural language processing and has benefited many tasks in the field.
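As a minimal illustration of the difference described above (the vocabulary and the dense vectors below are hypothetical, not taken from the patent), the sketch contrasts one-hot vectors, which are mutually orthogonal and therefore carry no similarity information, with dense distributed vectors, whose cosine similarity reflects semantic relatedness:

```python
import numpy as np

vocab = ["apple", "orange", "train"]          # hypothetical 3-word vocabulary
one_hot = np.eye(len(vocab))                  # one dimension is 1, the rest are 0

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Any two distinct one-hot vectors are orthogonal: their similarity is always 0.
print(cosine(one_hot[0], one_hot[1]))         # 0.0 (apple vs orange)
print(cosine(one_hot[0], one_hot[2]))         # 0.0 (apple vs train)

# Dense (distributed) vectors, e.g. from a word2vec model, encode semantics.
dense = {                                     # made-up 4-dimensional vectors
    "apple":  np.array([0.9, 0.1, 0.0, 0.2]),
    "orange": np.array([0.8, 0.2, 0.1, 0.3]),
    "train":  np.array([0.0, 0.9, 0.8, 0.1]),
}
print(cosine(dense["apple"], dense["orange"]))  # high: semantically related
print(cosine(dense["apple"], dense["train"]))   # low: unrelated
```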
A neural machine translation system mainly comprises an encoder and a decoder: the encoder encodes a source-language sentence of arbitrary length, and the decoder takes the fixed-length vector output by the encoder as input and decodes it into a target-language sentence. The structure is modeled end to end, with all parameters of the model trained against a single objective function. Fig. 1 shows the structure of an encoder-decoder model.
Different neural machine translation systems build their encoders and decoders from recurrent neural networks (RNN), long short-term memory networks (LSTM), gated recurrent units (GRU), the Transformer, and similar architectures.
Existing machine translation depends on large-scale, high-quality parallel corpora: millions or even tens of millions of parallel sentence pairs are needed to train a system to a usable level. For a low-resource language such as Uyghur, however, such large-scale parallel corpora are not available, and even where larger corpora exist, statistical machine translation and LSTM-based machine translation translate long sentences poorly.
disclosure of Invention
The invention aims to provide a method and a device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, so as to solve the problems identified in the background above.
To achieve this aim, the invention provides the following technical scheme. The training device for the data-enhanced Uyghur-Chinese machine translation system based on the Transformer model comprises an encoder and a decoder: the left half of the model is the encoder side, composed of 6 identical layers, each layer consisting of two sub-layers; the right half is the decoder side, composed of 6 identical layers, each layer consisting of three sub-layers.
Preferably, the first sub-layer of each encoder layer is a self-attention layer and the second sub-layer is a feed-forward neural network. Each word first passes through the self-attention layer, where it is encoded; the position information of the word is obtained through a positional encoder; query, key, and value vectors are created from the input vector; and these three vectors are processed by the scaled dot-product attention algorithm.
Preferably, the training method comprises the following steps:
A. preprocessing the corpus:
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
Preferably, the preprocessing in step A includes preprocessing the Chinese and the Uyghur sides: an Uyghur preprocessing tool and segmentation tool are used to convert Uyghur text from the Unicode extended area to the basic area and to segment it into words, the Chinese corpus is converted from full-width to half-width characters, and the Chinese corpus is segmented with the HIT (Harbin Institute of Technology) Chinese word segmentation tool.
Preferably, in step B phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules:
a. filtering out phrase pairs containing punctuation marks;
b. filtering out phrase pairs containing numbers;
c. filtering out phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uyghur phrase contains non-Uyghur characters;
d. filtering out phrase pairs whose length ratio is too large or too small;
e. filtering out single words and non-noun phrases, after which 3.24 million phrase pairs remain;
To extract noun phrases, each Chinese sentence is parsed with the HIT syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
Preferably, step C includes:
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model in word2vec;
b. calculating phrase similarity: phrase vectors are first computed from word vectors, by summing the vectors of the words in a phrase and averaging them; the similarity of two phrases is then computed as the cosine similarity of their phrase vectors, and each phrase is compared in this way against all phrases in the phrase table; the phrase vector and the phrase similarity are computed as follows:
$$p = \frac{1}{n}\sum_{i=1}^{n} w_i$$

$$\mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert \, \lVert p_j \rVert}$$

where $p$ is a phrase vector, $w_i$ is the vector of the i-th word, $n$ is the number of words in the phrase, and $p_i$ and $p_j$ are the two phrase vectors whose similarity is to be computed;
c. generating sentences: the noun phrases in the original sentence pair are replaced by the most similar phrases in the phrase table; the similarity is computed on the Uyghur phrases, and when an Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus and filtering out sentences that violate the rules: SRILM is used to train an Uyghur language model and a Chinese language model on 3.59 million Uyghur and 3.54 million Chinese monolingual sentences respectively; the perplexity of each newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 higher than that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a probability model predicts a sample: the lower the perplexity, the better the model. Given a text corpus $w_1, w_2, \dots, w_n$ containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity of the corpus is:

$$\mathrm{PPL} = \left( \prod_{i=1}^{n} \frac{1}{\mathrm{LM}(w_i \mid w_1, \dots, w_{i-1})} \right)^{\frac{1}{n}}$$
Compared with the prior art, the invention has the following beneficial effects: it greatly alleviates the poor translation performance of neural machine translation models under low-resource conditions and improves the generalization ability of the model. Experimental results show that generating pseudo data from 170,000 Uyghur-Chinese parallel sentence pairs and training the translation model on them yields a measurable improvement in translation quality.
Drawings
FIG. 1 is a schematic diagram of a prior art encoder-decoder model;
FIG. 2 is a diagram of a prior art system architecture;
FIG. 3 is a model block diagram of the present invention;
FIG. 4 is a vector diagram of a target sentence corresponding to a query vector according to the present invention;
FIG. 5 is a diagram of a data query architecture in accordance with the present invention;
FIG. 6 is a schematic illustration of position embedding according to the present invention;
FIG. 7 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to figs. 1-7, the present invention provides the following technical solution. The training device for the data-enhanced Uyghur-Chinese machine translation system based on the Transformer model comprises an encoder and a decoder: the left half of the model is the encoder side, composed of 6 identical layers, each layer consisting of two sub-layers; the right half is the decoder side, composed of 6 identical layers, each layer consisting of three sub-layers.
In the invention, the first sub-layer of each encoder layer is a self-attention layer and the second sub-layer is a feed-forward neural network. Each word first passes through the self-attention layer, where it is encoded; the position information of the word is obtained through a positional encoder; query, key, and value vectors are created from the input vector; and these three vectors are processed by the scaled dot-product attention algorithm. In this algorithm the key "k" and the value "v" are assumed to be the same vector, and, as shown in fig. 4, the query vector q corresponds to the vector of the target sentence.
The specific operation comprises three steps:
1. Compute the dot product of each query vector q with the keys.
2. Normalize the dot-product results with softmax.
3. Multiply the normalized weights by the values "v" to obtain the attention vector.
The mathematical formula for the calculation is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V$$
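The following minimal numpy sketch illustrates the scaled dot-product attention computation described above; the dimensions and random inputs are illustrative assumptions, not values from the patent:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # step 1: dot product of queries and keys, scaled
    weights = softmax(scores, axis=-1)         # step 2: normalize with softmax
    return weights @ V, weights                # step 3: weighted sum of the values

# Illustrative example: 4 query positions, 6 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)               # (4, 8) (4, 6)
```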
Multi-head attention: multi-head attention repeats the scaled dot-product attention process h times in order to capture as much semantic information in the sentence as possible, producing multiple query projections q1, q2, ..., qn; the results of these parallel computations are concatenated into a single matrix, and the architecture is shown in fig. 5.
The invention further discloses a data-enhanced Uyghur-Chinese machine translation system training method based on the Transformer model, comprising the following steps:
A. preprocessing the corpus;
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
The preprocessing in step A includes preprocessing the Chinese and the Uyghur sides: an Uyghur preprocessing tool and segmentation tool are used to convert Uyghur text from the Unicode extended area to the basic area and to segment it into words, the Chinese corpus is converted from full-width to half-width characters, and the Chinese corpus is segmented with the HIT (Harbin Institute of Technology) Chinese word segmentation tool.
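As a minimal sketch of the Chinese-side normalization step, the function below converts full-width characters to their half-width equivalents using the fixed Unicode offset between the two ranges; it illustrates the full-width/half-width conversion mentioned above and is not the specific tool used in the patent:

```python
def fullwidth_to_halfwidth(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic space -> ASCII space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:     # full-width punctuation, digits, letters
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

print(fullwidth_to_halfwidth("ＡＢＣ１２３，测试！"))  # -> "ABC123,测试!"
```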
In step B, phrases are aligned and extracted: phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules (a code sketch of these filters follows the list):
a. filtering out phrase pairs containing punctuation marks;
b. filtering out phrase pairs containing numbers;
c. filtering out phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uyghur phrase contains non-Uyghur characters;
d. filtering out phrase pairs whose length ratio is too large or too small;
e. filtering out single words and non-noun phrases, after which 3.24 million phrase pairs remain;
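The sketch below expresses filtering rules a-e as a predicate over a (Chinese phrase, Uyghur phrase) pair; the character ranges, the length-ratio threshold, and the noun-phrase test are illustrative assumptions rather than the exact criteria used in the patent:

```python
import re

PUNCT = set("，。！？；：、,.!?;:()（）[]【】\"'“”‘’")
CHINESE_RE = re.compile(r"^[\u4e00-\u9fff]+$")                 # CJK Unified Ideographs
UYGHUR_RE  = re.compile(r"^[\u0600-\u06ff\u0750-\u077f ]+$")   # Arabic-script blocks used by Uyghur

def keep_phrase_pair(zh: str, ug: str,
                     max_ratio: float = 3.0,
                     is_noun_phrase=lambda p: True) -> bool:
    """Return True if the phrase pair survives filtering rules a-e."""
    zh_tokens, ug_tokens = zh.split(), ug.split()
    if any(ch in PUNCT for ch in zh + ug):                 # rule a: punctuation
        return False
    if any(ch.isdigit() for ch in zh + ug):                # rule b: numbers
        return False
    if not CHINESE_RE.match(zh.replace(" ", "")):          # rule c: non-Chinese characters on the Chinese side
        return False
    if not UYGHUR_RE.match(ug):                            # rule c: non-Uyghur characters on the Uyghur side
        return False
    ratio = max(len(zh_tokens), len(ug_tokens)) / max(1, min(len(zh_tokens), len(ug_tokens)))
    if ratio > max_ratio:                                  # rule d: length ratio too skewed
        return False
    if len(zh_tokens) < 2 or not is_noun_phrase(zh):       # rule e: single words / non-noun phrases
        return False
    return True
```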
To extract noun phrases, each Chinese sentence is parsed with the HIT syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
The step C comprises the following steps:
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model in word2vec;
b. calculating phrase similarity: phrase vectors are first computed from word vectors, by summing the vectors of the words in a phrase and averaging them; the similarity of two phrases is then computed as the cosine similarity of their phrase vectors, and each phrase is compared in this way against all phrases in the phrase table (a small code sketch of steps a, b, and d is given after this list); the phrase vector and the phrase similarity are computed as follows:
$$p = \frac{1}{n}\sum_{i=1}^{n} w_i$$

$$\mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert \, \lVert p_j \rVert}$$

where $p$ is a phrase vector, $w_i$ is the vector of the i-th word, $n$ is the number of words in the phrase, and $p_i$ and $p_j$ are the two phrase vectors whose similarity is to be computed;
c. generating sentences: the noun phrases in the original sentence pair are replaced by the most similar phrases in the phrase table; the similarity is computed on the Uyghur phrases, and when an Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus: SRILM is used to train an Uyghur language model and a Chinese language model on 3.59 million Uyghur and 3.54 million Chinese monolingual sentences respectively; the perplexity of each newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 higher than that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a probability model predicts a sample: the lower the perplexity, the better the model. Given a text corpus $w_1, w_2, \dots, w_n$ containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity of the corpus is:

$$\mathrm{PPL} = \left( \prod_{i=1}^{n} \frac{1}{\mathrm{LM}(w_i \mid w_1, \dots, w_{i-1})} \right)^{\frac{1}{n}}$$
a good language model will assign a higher probability to the samples in the corpus and will also have a lower confusion value.
The invention uses the Transformer model to train the Uyghur-Chinese machine translation model. The Transformer follows the encoder-decoder structure, and by using attention layers and fully-connected layers it effectively solves the long-range dependency problem of sequence models, achieving better results. Fig. 3 shows the structure of the Transformer model: the left half is the encoder side, composed of a multi-head attention layer and a fully-connected layer, and the right half is the decoder side, also composed of multi-head attention and fully-connected layers; the decoder's attention, unlike the encoder's, consists of both a self-attention layer and an encoder-decoder attention layer. The concrete structure is as follows:
1. Grouped (multi-head) attention network
The attention network can be viewed as mapping a query Q onto key (K)-value (V) pairs and producing a weighted output. Unlike a conventional attention mechanism, which uses a single attention network to produce a single context vector, the grouped attention network concatenates several attention networks: given (Q, K, V), Q, K, and V are first mapped into different subspaces with different linear mappings, then context vectors are computed in each subspace with separate attention networks and concatenated into the final output. It is calculated by the formula
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \dots, \mathrm{head}_n) W^{O}$$

where $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$; $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the linear mapping parameters of the i-th group, and $W^{O}$ is the linear mapping parameter that produces the final context after concatenation.
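A compact numpy sketch of the grouped (multi-head) attention computation above, reusing the scaled dot-product attention from the earlier sketch; the number of heads, dimensions, and random projection matrices are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k), axis=-1) @ V

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    """MultiHead(Q,K,V) = Concat(head_1..head_n) W_o,
    where head_i = Attention(Q W_q[i], K W_k[i], V W_v[i])."""
    heads = [attention(Q @ W_q[i], K @ W_k[i], V @ W_v[i]) for i in range(len(W_q))]
    return np.concatenate(heads, axis=-1) @ W_o

# Illustrative dimensions: model size 16, 4 heads of size 4, sequence length 5.
rng = np.random.default_rng(1)
d_model, n_heads, d_head, seq = 16, 4, 4, 5
X = rng.normal(size=(seq, d_model))                        # token representations
W_q = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
W_k = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
W_v = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d_model))
out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)    # self-attention: Q = K = V = X
print(out.shape)                                            # (5, 16)
```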
Positional encoding: an attention-based encoder and decoder cannot by themselves take word-position information into account, yet this information is important for language understanding and generation. To solve this problem, position embeddings are added to the attention-based encoder and decoder. As shown in fig. 6, the position embedding vector is added element-wise to the word embedding vector, so the two vectors have the same length.
Unlike learned position embeddings, which require additional parameters, the fixed position embedding used here is defined with trigonometric functions and does not need to be learned; it is defined as follows:
$$PE(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$

$$PE(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$
where pos is the position index of the word, 2i and 2i+1 are dimensions of the position code, and $d_{\mathrm{model}}$ is the length of the position code. PE(pos, 2i) defines the value of the 2i-th dimension of the position code at position pos, and likewise PE(pos, 2i+1) defines its (2i+1)-th dimension. The advantage of this positional encoding is that the encoding of a position pos+k, separated from pos by k words, is a linear transformation of the encoding of pos defined by the encoding of k, as the following derivation shows:
$$PE(pos+k, 2i) = \sin\!\left(\frac{pos+k}{10000^{2i/d_{\mathrm{model}}}}\right) = PE(pos, 2i)\,PE(k, 2i+1) + PE(pos, 2i+1)\,PE(k, 2i)$$

$$PE(pos+k, 2i+1) = \cos\!\left(\frac{pos+k}{10000^{2i/d_{\mathrm{model}}}}\right) = PE(pos, 2i+1)\,PE(k, 2i+1) - PE(pos, 2i)\,PE(k, 2i)$$
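A short sketch of the sinusoidal position encoding defined above; the sequence length and model dimension are illustrative values:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]                        # column of positions
    i = np.arange(0, d_model, 2)[None, :]                    # even dimensions 0, 2, 4, ...
    angle = pos / np.power(10000.0, i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(max_len=50, d_model=16)
# The position embedding is added element-wise to the word embeddings,
# so both must have the same dimension d_model.
word_embeddings = np.zeros((50, 16))
encoder_input = word_embeddings + pe
print(encoder_input.shape)                                    # (50, 16)
```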
Self-attention network performance: in both the encoder and the decoder, a self-attention network is used to model the Uyghur and Chinese sentences. The time complexity of the self-attention network compares favourably with encoders and decoders built from recurrent or convolutional neural networks. Assuming the sentence length is n, the hidden-state vector length is d, and the convolution kernel size of the convolutional network is k, the respective computational complexities are shown in Table 1.
Table 1. Computational complexity comparison (n: sentence length, d: hidden-vector length, k: convolution kernel size)
Layer type     | Single-layer complexity | Sequential operations | Maximum path length
Self-attention | O(n^2 * d)              | O(1)                  | O(1)
Recurrent      | O(n * d^2)              | O(n)                  | O(n)
Convolutional  | O(k * n * d^2)          | O(1)                  | O(log_k n)
Single-layer computational complexity is the total amount of computation when only one layer of the network is used. When the sentence length n is smaller than the hidden-vector length d, the self-attention network requires less total computation, and in most cases the sentence length is indeed significantly smaller than the hidden-vector length. The number of sequential operations is the number of operations that must be executed one after another to produce the hidden state of every word in the sentence (for example, a recurrent neural network can only produce the hidden state of the next word after the hidden state of the previous word has been produced, so hidden-state generation cannot be parallelized across words); the larger this number, the weaker the parallelization capability. The maximum path length is the largest number of operations needed to guarantee that every word in the sentence can influence every other word. For example, in a recurrent neural network, for the first word of a sentence to influence the last word, its information must be passed through all the subsequent hidden states, (n-1) nodes in total, so the maximum path length is O(n); each layer of a convolutional neural network only lets the k words inside a convolution kernel influence one another, so log_k(n) convolutional layers must be stacked; the self-attention network, by contrast, can directly connect any two words in a sentence through the attention mechanism.
The invention conducts experiments with 170,000 Uyghur-Chinese parallel sentence pairs as the initial training corpus to verify its effect. To keep the generated data from being too similar to one another, the 30,000 and 80,000 best-quality sentence pairs are screened out of the 2,000,000 generated sentences and used as pseudo-parallel sentence pairs for separate tests; the test results are shown in Table 2:
[Table 2: BLEU scores of the Moses, RNN, and Transformer systems trained on the original corpus and on the original corpus augmented with the generated pseudo-parallel sentence pairs]
Table 2 shows that, using only the original corpus, the Transformer model improves over the RNN model by 9.35 BLEU and over the statistical machine translation system Moses by 4.52 BLEU. Taking the Transformer model trained on the original corpus as a pre-trained model and continuing training on a corpus consisting of the original data plus 30,000 newly generated sentence pairs improves the final result by a further 0.7 BLEU over the Transformer trained on the original corpus alone, and the corpus consisting of the original data plus 130,000 newly generated sentence pairs improves it by 1.05 BLEU on the Transformer model.
In conclusion, the invention greatly alleviates the poor translation performance of neural machine translation models under low-resource conditions and improves the generalization ability of the model. Experimental results show that generating pseudo data from 170,000 Uyghur-Chinese parallel sentence pairs and training the translation model on them yields a measurable improvement in translation quality.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (6)

1. A training device for a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model, the Transformer model being composed of an encoder and a decoder, characterized in that: the left half of the model is the encoder side and consists of 6 identical layers, each layer consisting of two sub-layers; the right half is the decoder side, which consists of 6 identical layers, each layer consisting of three sub-layers.
2. The device for training a data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 1, wherein: the first sub-layer is a self-attention layer and the second sub-layer is a feed-forward neural network; each word first passes through the self-attention layer, where it is encoded; the position information of the word is obtained through a positional encoder; query, key, and value vectors are created from the input vector; and these three vectors are processed by the scaled dot-product attention algorithm.
3. A data-enhanced Uyghur-Chinese machine translation system training method based on the Transformer model, characterized in that the training method comprises the following steps:
A. preprocessing the corpus:
B. phrase alignment, extraction and filtering, and extracting noun phrases;
C. pseudo parallel sentence pairs are generated.
4. The method for training the data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein: the preprocessing in step A comprises preprocessing the Chinese and the Uyghur sides; an Uyghur preprocessing tool and segmentation tool are used to convert Uyghur text from the Unicode extended area to the basic area and to segment it into words, the Chinese corpus is converted from full-width to half-width characters, and the Chinese corpus is segmented with the HIT (Harbin Institute of Technology) Chinese word segmentation tool.
5. The method for training the data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein: in step B, phrase alignment and phrase-pair extraction are performed with the statistical machine translation tool Moses, yielding about ten million phrase pairs; the extracted phrase pairs are then filtered with the following simple rules:
a. filtering out phrase pairs containing punctuation marks;
b. filtering out phrase pairs containing numbers;
c. filtering out phrase pairs in which the Chinese phrase contains non-Chinese characters or the Uyghur phrase contains non-Uyghur characters;
d. filtering out phrase pairs whose length ratio is too large or too small;
e. filtering out single words and non-noun phrases, after which 3.24 million phrase pairs remain;
To extract noun phrases, each Chinese sentence is parsed with the HIT syntactic parser and all noun phrases in the sentence are extracted; since no Uyghur syntactic parser is available, the phrase alignment table is used to find the Uyghur noun phrases corresponding to the Chinese noun phrases.
6. The method for training the data-enhanced Uyghur-Chinese machine translation system based on the Transformer model according to claim 3, wherein step C comprises the following steps:
a. training word vectors: word vector models are trained on Chinese and Uyghur monolingual corpora using the skip-gram model in word2vec;
b. calculating phrase similarity: phrase vectors are first computed from word vectors, by summing the vectors of the words in a phrase and averaging them; the similarity of two phrases is then computed as the cosine similarity of their phrase vectors, and each phrase is compared in this way against all phrases in the phrase table; the phrase vector and the phrase similarity are computed as follows:
$$p = \frac{1}{n}\sum_{i=1}^{n} w_i$$

$$\mathrm{sim}(p_i, p_j) = \frac{p_i \cdot p_j}{\lVert p_i \rVert \, \lVert p_j \rVert}$$

where $p$ is a phrase vector, $w_i$ is the vector of the i-th word, $n$ is the number of words in the phrase, and $p_i$ and $p_j$ are the two phrase vectors whose similarity is to be computed;
c. generating sentences: the noun phrases in the original sentence pair are replaced by the most similar phrases in the phrase table; the similarity is computed on the Uyghur phrases, and when an Uyghur phrase is replaced, the corresponding phrase in the Chinese sentence is replaced at the same time.
d. screening the pseudo-parallel corpus: SRILM is used to train an Uyghur language model and a Chinese language model on 3.59 million Uyghur and 3.54 million Chinese monolingual sentences respectively; the perplexity of each newly generated sentence is computed with the trained language models, and sentences whose perplexity is more than 5 higher than that of the original sentence are filtered out. Perplexity is an information-theoretic measure used to evaluate how well a probability model predicts a sample: the lower the perplexity, the better the model. Given a text corpus $w_1, w_2, \dots, w_n$ containing n words and a language model LM that assigns a probability to each word based on its history, the perplexity of the corpus is:

$$\mathrm{PPL} = \left( \prod_{i=1}^{n} \frac{1}{\mathrm{LM}(w_i \mid w_1, \dots, w_{i-1})} \right)^{\frac{1}{n}}$$
CN202010226101.6A 2020-03-27 2020-03-27 Data enhancement Weihan machine translation system training method and device based on Transformer model Pending CN111444730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010226101.6A CN111444730A (en) 2020-03-27 2020-03-27 Data enhancement Weihan machine translation system training method and device based on Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010226101.6A CN111444730A (en) 2020-03-27 2020-03-27 Data enhancement Weihan machine translation system training method and device based on Transformer model

Publications (1)

Publication Number Publication Date
CN111444730A true CN111444730A (en) 2020-07-24

Family

ID=71652486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010226101.6A Pending CN111444730A (en) 2020-03-27 2020-03-27 Data enhancement Weihan machine translation system training method and device based on Transformer model

Country Status (1)

Country Link
CN (1) CN111444730A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507734A (en) * 2020-11-19 2021-03-16 南京大学 Roman Uygur language-based neural machine translation system
CN112633018A (en) * 2020-12-28 2021-04-09 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN113742467A (en) * 2021-09-02 2021-12-03 新疆大学 Dialog state generation method and device for hierarchically selecting slot-position-related context
CN116562311A (en) * 2023-07-07 2023-08-08 中铁四局集团有限公司 Operation and maintenance method and system based on natural language machine translation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102654867A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Webpage sorting method and system in cross-language search
CN104050160A (en) * 2014-03-12 2014-09-17 北京紫冬锐意语音科技有限公司 Machine and human translation combined spoken language translation method and device
CN104391842A (en) * 2014-12-18 2015-03-04 苏州大学 Translation model establishing method and system
CN105808530A (en) * 2016-03-23 2016-07-27 苏州大学 Translation method and device in statistical machine translation
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102654867A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Webpage sorting method and system in cross-language search
CN104050160A (en) * 2014-03-12 2014-09-17 北京紫冬锐意语音科技有限公司 Machine and human translation combined spoken language translation method and device
CN104391842A (en) * 2014-12-18 2015-03-04 苏州大学 Translation model establishing method and system
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment
CN105808530A (en) * 2016-03-23 2016-07-27 苏州大学 Translation method and device in statistical machine translation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang Jinchao et al.: "Large-scale Uyghur-Chinese neural network machine translation model based on multiple encoders and decoders", Journal of Chinese Information Processing *
Yang Yang: "Research and implementation of Uyghur-Chinese translation based on neural networks", China Excellent Master's Theses Electronic Journal Network *
Qi Qingshan et al.: "Uyghur noun anaphora resolution based on ATT-IndRNN-CNN", Journal of Chinese Information Processing *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507734A (en) * 2020-11-19 2021-03-16 南京大学 Roman Uygur language-based neural machine translation system
CN112507734B (en) * 2020-11-19 2024-03-19 南京大学 Neural machine translation system based on romanized Uygur language
CN112633018A (en) * 2020-12-28 2021-04-09 内蒙古工业大学 Mongolian Chinese neural machine translation method based on data enhancement
CN113742467A (en) * 2021-09-02 2021-12-03 新疆大学 Dialog state generation method and device for hierarchically selecting slot-position-related context
CN113742467B (en) * 2021-09-02 2023-08-08 新疆大学 Method and device for generating dialogue state of hierarchical selection slot phase context
CN116562311A (en) * 2023-07-07 2023-08-08 中铁四局集团有限公司 Operation and maintenance method and system based on natural language machine translation
CN116562311B (en) * 2023-07-07 2023-12-01 中铁四局集团有限公司 Operation and maintenance method and system based on natural language machine translation

Similar Documents

Publication Publication Date Title
CN111382582B (en) Neural machine translation decoding acceleration method based on non-autoregressive
Zhang et al. Understanding subtitles by character-level sequence-to-sequence learning
CN111444730A (en) Data enhancement Weihan machine translation system training method and device based on Transformer model
CN109062907B (en) Neural machine translation method integrating dependency relationship
CN110059324B (en) Neural network machine translation method and device based on dependency information supervision
CN110598221A (en) Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network
CN109635124A (en) A kind of remote supervisory Relation extraction method of combination background knowledge
CN111414476A (en) Attribute-level emotion analysis method based on multi-task learning
CN111382574B (en) Semantic parsing system combining syntax under virtual reality and augmented reality scenes
CN113569562B (en) Method and system for reducing cross-modal and cross-language barriers of end-to-end voice translation
CN111401079A (en) Training method and device of neural network machine translation model and storage medium
Kesavan et al. Deep learning based automatic image caption generation
Leng et al. Using recurrent neural network structure with enhanced multi-head self-attention for sentiment analysis
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
Hong et al. Interpretable sequence classification via prototype trajectory
CN113887251A (en) Mongolian Chinese machine translation method combining Meta-KD framework and fine-grained compression
CN113947072A (en) Text error correction method and text error correction device
CN118262874A (en) Knowledge-graph-based traditional Chinese medicine diagnosis and treatment model data expansion system and method
CN116681090A (en) BestTransformer Haematococcus conversion method and system
CN116821326A (en) Text abstract generation method and device based on self-attention and relative position coding
Hujon et al. Neural machine translation systems for English to Khasi: A case study of an Austroasiatic language
CN115169285A (en) Event extraction method and system based on graph analysis
CN114580376A (en) Chinese abstract generating method based on component sentence method analysis
CN116048613A (en) Attention mechanism-based code abstracting method for graph sequence association
CN116681087B (en) Automatic problem generation method based on multi-stage time sequence and semantic information enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200724

RJ01 Rejection of invention patent application after publication