
CN113420571A - Text translation method, device and equipment based on deep learning and storage medium - Google Patents

Text translation method, device and equipment based on deep learning and storage medium

Info

Publication number
CN113420571A
CN113420571A
Authority
CN
China
Prior art keywords
text
vector
target
preset
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110691222.2A
Other languages
Chinese (zh)
Inventor
付亚州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd filed Critical Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202110691222.2A priority Critical patent/CN113420571A/en
Publication of CN113420571A publication Critical patent/CN113420571A/en
Priority to PCT/CN2022/088789 priority patent/WO2022267674A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/42 Data-driven translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a text translation method, device, equipment and storage medium based on deep learning, which perform word embedding and position coding on a text to be translated to obtain a position vector and call a preset encoder and a preset decoder to encode and decode the position vector, improving the accuracy of the translated text. The text translation method based on deep learning comprises the following steps: acquiring a text to be translated, and performing word embedding and position coding on it to obtain a position vector; calling a preset encoder to encode the position vector into an initial feature vector; calling a preset decoder to decode the initial feature vector into a target feature vector; and performing a linear transformation on the target feature vector, mapping the linear transformation result to a preset log-probability vector, and determining a target text. The invention also relates to blockchain technology: the target text can be stored in a blockchain node.

Description

Text translation method, device and equipment based on deep learning and storage medium
Technical Field
The invention relates to the field of deep learning, and in particular to a text translation method, device, equipment and storage medium based on deep learning.
Background
Machine translation refers to using a computer to translate a source language into a target language and is one of the long-standing goals of artificial intelligence. Its development has gone through several distinct stages: early rule-based methods, then statistics-based methods, and most recently deep-learning-based methods, which have greatly improved the quality of machine translation.
A rule-based translation system requires linguists to write conversion rules between two languages, which are then entered into a computer. This approach places very high demands on the linguists; because each language has different rules, the written rules cannot be shared across languages and are relatively complex. In statistics-based machine translation, the conversion rules are learned automatically from large-scale corpora rather than supplied manually, but such systems still cannot cover all languages, have difficulty exploiting global features, and suffer high error rates in many data-preprocessing steps, so the accuracy of the translated text is low.
Disclosure of Invention
The invention provides a text translation method, device, equipment and storage medium based on deep learning, which perform word embedding and position coding on a text to be translated to obtain a position vector and call a preset encoder and a preset decoder to encode and decode the position vector, improving the accuracy of the translated text.
A first aspect of the invention provides a text translation method based on deep learning, comprising the following steps: acquiring a text to be translated, and performing word embedding and position coding on it to obtain a position vector; calling a preset encoder to encode the position vector into an initial feature vector, wherein the encoder is an encoder of a deep learning network based on a self-attention mechanism; calling a preset decoder to decode the initial feature vector into a target feature vector, wherein the decoder is a decoder of a deep learning network based on a self-attention mechanism; and performing a linear transformation on the target feature vector to obtain a linear transformation result, mapping the result to a preset log-probability vector, and determining the target text according to the log-probability vector.
Optionally, in a first implementation of the first aspect of the invention, acquiring the text to be translated and performing word embedding and position coding on it to obtain the position vector comprises: acquiring the text to be translated, one-hot encoding it to obtain a target one-hot vector set, calling a preset embedding model, and mapping the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set; and calling a preset position coding function to determine the position information of each word in the target dimension matrix vector set, obtaining the position vector.
Optionally, in a second implementation of the first aspect of the invention, acquiring the text to be translated, one-hot encoding it to obtain the target one-hot vector set, calling the preset embedding model, and mapping the target one-hot vector set into a matrix of preset dimensions to obtain the target dimension matrix vector set comprises: acquiring the text to be translated, identifying its target punctuation marks, and splitting it into a plurality of sub-texts according to those punctuation marks; one-hot encoding each word in each sub-text to obtain the one-hot vector set of each sub-text, and determining the one-hot vector sets of all sub-texts as the target one-hot vector set; and calling the preset embedding model to construct a matrix of preset dimensions, obtaining a target dimension matrix, and mapping the target one-hot vector set through the target dimension matrix to obtain the target dimension matrix vector set.
Optionally, in a third implementation of the first aspect of the invention, one-hot encoding each word in each sub-text to obtain the one-hot vector set of each sub-text and determining the one-hot vector sets of all sub-texts as the target one-hot vector set comprises: calling a preset counting function to count the number of words in each sub-text; encoding each word in each sub-text with preset values to obtain an initial vector set for each sub-text; verifying each sub-text's initial vector set against its word count, obtaining a one-hot vector set for each sub-text that passes verification; and determining the one-hot vector sets of all verified sub-texts as the target one-hot vector set.
Optionally, in a fourth implementation of the first aspect of the invention, calling the preset encoder to encode the position vector into the initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism, comprises: performing an attention operation and then a bias operation on the position vector through the first stack layer of the preset encoder to obtain an intermediate encoding vector, wherein the encoder comprises a plurality of sequentially connected stack layers, each containing a multi-head attention sublayer and a feedforward neural network sublayer; and performing the attention and bias operations on the intermediate encoding vector through the target layers of the encoder, i.e. the stack layers other than the first, to obtain the initial feature vector.
Optionally, in a fifth implementation of the first aspect of the invention, calling the preset decoder to decode the initial feature vector into the target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism, comprises: performing an attention operation and then a bias operation on the initial feature vector through the first stack layer of the preset decoder to obtain an intermediate decoding vector, wherein the decoder comprises a plurality of sequentially connected stack layers, each containing two sequentially connected multi-head attention sublayers and a feedforward neural network sublayer; and performing the attention and bias operations on the intermediate decoding vector through the target layers of the decoder, i.e. the stack layers other than the first, to obtain the target feature vector.
Optionally, in a sixth implementation of the first aspect of the invention, performing the linear transformation on the target feature vector to obtain the linear transformation result, mapping the result to the preset log-probability vector, and determining the target text according to the log-probability vector comprises: performing the linear transformation on the target feature vector and mapping the result to the preset log-probability vector; and obtaining the probability values of the log-probability vectors, sorting them in descending order of probability value, and determining the first-ranked log-probability vector as the target text.
A second aspect of the invention provides a text translation apparatus based on deep learning, comprising: an acquisition module, configured to acquire a text to be translated and perform word embedding and position coding on it to obtain a position vector; an encoding module, configured to call a preset encoder to encode the position vector into an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism; a decoding module, configured to call a preset decoder to decode the initial feature vector into a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism; and a linear transformation module, configured to perform a linear transformation on the target feature vector to obtain a linear transformation result, map the result to a preset log-probability vector, and determine the target text according to the log-probability vector.
Optionally, in a first implementation of the second aspect of the invention, the acquisition module comprises: a mapping unit, configured to acquire the text to be translated, one-hot encode it to obtain a target one-hot vector set, call a preset embedding model, and map the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set; and a determining unit, configured to call a preset position coding function and determine the position information of each word in the target dimension matrix vector set, obtaining the position vector.
Optionally, in a second implementation of the second aspect of the invention, the mapping unit comprises: an identifying subunit, configured to acquire the text to be translated, identify its target punctuation marks, and split it into a plurality of sub-texts according to those punctuation marks; a one-hot encoding subunit, configured to one-hot encode each word in each sub-text to obtain the one-hot vector set of each sub-text, and determine the one-hot vector sets of all sub-texts as the target one-hot vector set; and a mapping subunit, configured to call the preset embedding model to construct a matrix of preset dimensions, obtaining a target dimension matrix, and map the target one-hot vector set through the target dimension matrix to obtain the target dimension matrix vector set.
Optionally, in a third implementation of the second aspect of the invention, the one-hot encoding subunit is specifically configured to: call a preset counting function to count the number of words in each sub-text; encode each word in each sub-text with preset values to obtain an initial vector set for each sub-text; verify each sub-text's initial vector set against its word count, obtaining a one-hot vector set for each sub-text that passes verification; and determine the one-hot vector sets of all verified sub-texts as the target one-hot vector set.
Optionally, in a fourth implementation of the second aspect of the invention, the encoding module comprises: a first encoding unit, configured to perform an attention operation and then a bias operation on the position vector through the first stack layer of the preset encoder to obtain an intermediate encoding vector, wherein the encoder comprises a plurality of sequentially connected stack layers, each containing a multi-head attention sublayer and a feedforward neural network sublayer, the encoder being an encoder of a deep learning network based on a self-attention mechanism; and a second encoding unit, configured to perform the attention and bias operations on the intermediate encoding vector through the target layers of the encoder, i.e. the stack layers other than the first, to obtain the initial feature vector.
Optionally, in a fifth implementation of the second aspect of the invention, the decoding module comprises: a first decoding unit, configured to perform an attention operation and then a bias operation on the initial feature vector through the first stack layer of the preset decoder to obtain an intermediate decoding vector, wherein the decoder comprises a plurality of sequentially connected stack layers, each containing two multi-head attention sublayers and a feedforward neural network sublayer, the decoder being a decoder of a deep learning network based on a self-attention mechanism; and a second decoding unit, configured to perform the attention and bias operations on the intermediate decoding vector through the target layers of the decoder, i.e. the stack layers other than the first, to obtain the target feature vector.
Optionally, in a sixth implementation of the second aspect of the invention, the linear transformation module comprises: a linear transformation unit, configured to perform the linear transformation on the target feature vector to obtain the linear transformation result and map it to the preset log-probability vector; and a sorting unit, configured to obtain the probability values of the log-probability vectors, sort them in descending order of probability value, and determine the first-ranked log-probability vector as the target text.
A third aspect of the invention provides text translation equipment based on deep learning, comprising a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the equipment to perform the text translation method based on deep learning described above.
A fourth aspect of the invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the text translation method based on deep learning described above.
According to the technical solution provided by the invention, a text to be translated is acquired and subjected to word embedding and position coding to obtain a position vector; a preset encoder, an encoder of a deep learning network based on a self-attention mechanism, is called to encode the position vector into an initial feature vector; a preset decoder, a decoder of a deep learning network based on a self-attention mechanism, is called to decode the initial feature vector into a target feature vector; and a linear transformation is performed on the target feature vector, its result is mapped to a preset log-probability vector, and the target text is determined according to the log-probability vector. In the embodiment of the invention, performing word embedding and position coding on the text to be translated to obtain the position vector, and calling a preset encoder and a preset decoder to encode and decode it, improves the accuracy of the translated text.
Drawings
FIG. 1 is a diagram of an embodiment of a text translation method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a text translation method based on deep learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a deep learning based text translation apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a deep learning based text translation apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a text translation device based on deep learning according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a text translation method, device, equipment and storage medium based on deep learning, which perform word embedding and position coding on a text to be translated to obtain a position vector and call a preset encoder and a preset decoder to encode and decode the position vector, improving the accuracy of the translated text.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the invention is described below. Referring to fig. 1, an embodiment of the text translation method based on deep learning in the embodiments of the invention comprises:
101. Acquire a text to be translated and perform word embedding and position coding on it to obtain a position vector.
It is to be understood that the execution subject of the present invention may be a text translation apparatus based on deep learning, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
The server acquires the text to be translated and performs word embedding and position coding on it to obtain a position vector. The text to be translated may be text from the medical field: the server receives it, with the user's authorization, through the information input field of a medical text translation apparatus, and it may include medical text data such as user inquiry information, a user-written description of a drug formula, or spoken symptom descriptions entered by the user. The server splits the text to be translated into several sub-texts and one-hot encodes each sub-text to obtain a target one-hot vector set. One-hot encoding, also called one-bit-effective encoding, mainly uses a preset number of state register bits to encode the same number of states; each state has its own register bit, and only one bit is active at any time. The server then calls the preset embedding model word2vec to map the target one-hot vector set into a target dimension matrix vector set, and determines the position information of each word in that set through the sine and cosine functions of a preset position coding function, obtaining the position vector.
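To make this step concrete, the sketch below one-hot encodes a tokenized sub-text and maps it into a preset dimension, with a trainable linear map standing in for the word2vec lookup the patent names. It is a minimal illustration, not the patent's implementation: the toy vocabulary, the 512 dimension, and all names are assumptions.
```python
import torch
import torch.nn as nn

# Toy vocabulary; in practice it would be learned from the training corpus.
vocab = {"I": 0, "after meals": 1, "feel bloated": 2}
d_model = 512  # preset embedding dimension (assumed; the patent fixes no value)

# One-hot encoding: one register bit per word, only one bit active at a time.
tokens = ["I", "after meals", "feel bloated"]
one_hot = torch.eye(len(vocab))[[vocab[t] for t in tokens]]  # shape (3, |V|)

# Map the one-hot vectors through a preset-dimension matrix
# (a word2vec-style embedding lookup) to obtain dense vectors.
embedding = nn.Linear(len(vocab), d_model, bias=False)
dense_vectors = embedding(one_hot)  # shape (3, d_model)
```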
102. Call a preset encoder to encode the position vector into an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism.
The server calls the preset encoder to encode the position vector into an initial feature vector. The server inputs the position vector into the preset encoder; in this embodiment the encoder is an encoder of a deep learning network based on a self-attention mechanism, formed by stacking several identical layers. Each layer contains two sequentially connected sublayers, a multi-head attention sublayer and a feedforward neural network sublayer, and each sublayer is followed by an add-and-normalize layer, which promotes gradient propagation and model convergence and also speeds up training.
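One possible rendering of a single encoder stack layer as just described, written in PyTorch; the head count and feedforward width follow common Transformer defaults and, like the class name, are assumptions rather than values from the patent.
```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder stack layer: a multi-head self-attention sublayer and a
    feedforward sublayer, each followed by an add-and-normalize step."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                 # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)  # the attention operation
        x = self.norm1(x + attn_out)      # add & normalize (residual)
        x = self.norm2(x + self.ffn(x))   # feedforward, i.e. the bias operation
        return x
```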
103. Call a preset decoder to decode the initial feature vector into a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism.
The server calls the preset decoder to decode the initial feature vector into a target feature vector. The server inputs the initial feature vector into the preset decoder; in this embodiment the decoder is a decoder of a deep learning network based on a self-attention mechanism. Compared with the decoder structure of existing deep learning networks, the decoder in this embodiment merely adds one multi-head attention sublayer to each of its stack layers, so that each stack layer comprises two connected multi-head attention sublayers and one feedforward neural network sublayer.
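A matching sketch of one decoder stack layer with its two multi-head attention sublayers. Wiring the second sublayer to attend over the encoder output, and omitting the causal mask, follow the standard Transformer design; the patent states only that a second multi-head attention sublayer is added, so treat these details as assumptions.
```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """One decoder stack layer: two multi-head attention sublayers followed
    by a feedforward sublayer, each with an add-and-normalize step."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, y, enc_out):                   # y: target-side vectors
        a, _ = self.self_attn(y, y, y)               # first attention sublayer
        y = self.norms[0](y + a)
        a, _ = self.cross_attn(y, enc_out, enc_out)  # second attention sublayer,
        y = self.norms[1](y + a)                     # reading the initial feature vector
        return self.norms[2](y + self.ffn(y))        # feedforward sublayer
```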
104. Perform a linear transformation on the target feature vector to obtain a linear transformation result, map the result to a preset log-probability vector, and determine the target text according to the log-probability vector.
The server performs a linear transformation on the target feature vector to obtain a linear transformation result, maps the result to a preset log-probability vector, and determines the target text according to the log-probability vector. After the linear transformation, the target feature vector is mapped into a vector of log probabilities (the log-probability vector). Each log-probability vector corresponds to a word and carries a probability value: if ten thousand words were learned from the training set, the log-probability vector has ten thousand cells, each holding the probability value of one word. The server sorts the probability values of the log-probability vectors and finally determines the first-ranked log-probability vector as the target text.
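The final projection can be sketched as follows. Realizing the log-probability vector with log_softmax, and taking the first-ranked cell with argmax, are assumptions about the intended computation; the vocabulary size merely echoes the ten-thousand-word example above.
```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 10000  # e.g. ten thousand words learned from the training set

projection = nn.Linear(d_model, vocab_size)  # the linear transformation
target_feature = torch.randn(1, d_model)     # decoder output for one position

logits = projection(target_feature)            # one cell per vocabulary word
log_probs = torch.log_softmax(logits, dim=-1)  # the preset log-probability vector

# Sorting by probability value and taking the first-ranked cell is
# equivalent to an argmax over the log probabilities.
best_word_id = log_probs.argmax(dim=-1)
```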
In the embodiment of the invention, word embedding and position coding are performed on the text to be translated to obtain the position vector, and a preset encoder and a preset decoder are called to encode and decode the position vector, improving the accuracy of the translated text.
Referring to fig. 2, another embodiment of the text translation method based on deep learning according to the embodiment of the present invention includes:
201. Acquire a text to be translated, one-hot encode it to obtain a target one-hot vector set, call a preset embedding model, and map the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set.
The server acquires the text to be translated, one-hot encodes it to obtain a target one-hot vector set, calls a preset embedding model, and maps the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set. Specifically, the server acquires the text to be translated, identifies its target punctuation marks, and splits it into a plurality of sub-texts according to those punctuation marks; the server one-hot encodes each word in each sub-text to obtain the one-hot vector set of each sub-text, and determines the one-hot vector sets of all sub-texts as the target one-hot vector set; the server then calls the preset embedding model to construct a matrix of preset dimensions, obtaining a target dimension matrix, and maps the target one-hot vector set through the target dimension matrix to obtain the target dimension matrix vector set.
More specifically, one-hot encoding each word in each sub-text to obtain the one-hot vector set of each sub-text and determining the one-hot vector sets of all sub-texts as the target one-hot vector set further comprises: the server calls a preset counting function to count the number of words in each sub-text; the server encodes each word in each sub-text with preset values to obtain an initial vector set for each sub-text; the server verifies each sub-text's initial vector set against its word count, obtaining a one-hot vector set for each sub-text that passes verification; and the server determines the one-hot vector sets of all verified sub-texts as the target one-hot vector set.
After obtaining the text to be translated, the server splits it into several sub-texts by identifying its target punctuation marks, one-hot encodes each word in each sub-text, and calls a preset counting function (a counter) to count the number of words in each sub-text. For example, suppose the text to be translated reads "I feel bloated after meals, I have paroxysmal pain in the lower abdomen". According to the target punctuation marks, which include but are not limited to commas, semicolons and periods, it is split into the two sub-texts "I feel bloated after meals" and "I have paroxysmal pain in the lower abdomen". The first sub-text contains the three words "I", "after meals" and "feel bloated", whose one-hot vectors can be expressed as [1, 0, 0], [0, 1, 0] and [0, 0, 1] respectively, giving the initial vector set of that sub-text. The counting function counts 3 words in the sub-text, and the server checks whether the initial vector set also has 3 elements. If the counts agree, the sub-text passes verification and its one-hot vector set is obtained, and the one-hot vector sets of all sub-texts that pass verification are determined as the target one-hot vector set; if verification fails, one-hot encoding is performed again until a verified one-hot vector set is generated for each sub-text. The server then calls the preset embedding model word2vec to map the target one-hot vector set into a target dimension matrix vector set. The basic idea of word2vec is to represent every word in a natural language as a short vector of uniform meaning and uniform dimension; in this embodiment, the target one-hot vector set is mapped through a matrix of preset dimensions into the target dimension matrix vector set.
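The split-count-verify routine in this example can be sketched as below. The punctuation set, the use of Python's len as the preset counting function, and the pre-segmented word list are illustrative assumptions; the check is trivially satisfied here but mirrors the verification step described above.
```python
import re

def split_subtexts(text):
    """Split the text to be translated on target punctuation marks."""
    return [s.strip() for s in re.split(r"[，。；,;.]", text) if s.strip()]

def one_hot_with_check(words):
    """One-hot encode one sub-text, then verify the vector count against the
    word count; on failure the caller would re-encode until verification passes."""
    n = len(words)  # stands in for the preset counting function
    vectors = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    if len(vectors) != n:
        raise ValueError("verification failed: re-encode this sub-text")
    return vectors

subtexts = split_subtexts(
    "I feel bloated after meals, I have paroxysmal pain in the lower abdomen")
words = ["I", "after meals", "feel bloated"]  # word segmentation is assumed
print(one_hot_with_check(words))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```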
202. Call a preset position coding function and determine the position information of each word in the target dimension matrix vector set to obtain a position vector.
The server calls the preset position coding function to determine the position information of each word in the target dimension matrix vector set, obtaining the position vector. The position vector is obtained through the sine and cosine formulas p_{k,2i} = sin(k / 10000^{2i/d}) and p_{k,2i+1} = cos(k / 10000^{2i/d}), where p_{k,2i} is the 2i-th component and p_{k,2i+1} the (2i+1)-th component of the encoding vector for position k, and d is the vector dimension.
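The formulas translate directly into code: the sketch below fills the even components with the sine term and the odd components with the cosine term. The function name and the even d_model are assumptions made for the illustration.
```python
import torch

def positional_encoding(seq_len, d_model):
    """p_{k,2i} = sin(k / 10000^(2i/d)) and p_{k,2i+1} = cos(k / 10000^(2i/d));
    assumes d_model is even."""
    k = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # positions k
    i = torch.arange(d_model // 2, dtype=torch.float32)          # component index i
    angle = k / (10000.0 ** (2 * i / d_model))                   # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # even components p_{k,2i}
    pe[:, 1::2] = torch.cos(angle)  # odd components p_{k,2i+1}
    return pe

position_vector = positional_encoding(seq_len=3, d_model=512)
```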
203. Call the preset encoder to encode the position vector into an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism.
The server calls the preset encoder to encode the position vector into an initial feature vector. Specifically, the server performs an attention operation and then a bias operation on the position vector through the first stack layer of the preset encoder to obtain an intermediate encoding vector; the encoder comprises a plurality of sequentially connected stack layers, each containing a multi-head attention sublayer and a feedforward neural network sublayer. The server then performs the attention and bias operations on the intermediate encoding vector through the target layers of the encoder, i.e. the stack layers other than the first, to obtain the initial feature vector. The multi-head attention sublayer mainly performs the attention operation, while the feedforward neural network sublayer realizes the mapping from input to output, i.e. the bias operation; calling the preset encoder to encode the position vector thus finally yields the initial feature vector.
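Stacking the layers as described, with a first stack layer followed by the remaining target layers, might look like the sketch below; nn.TransformerEncoderLayer bundles the two sublayers with add-and-normalize. The layer count of 6 is borrowed from common Transformer configurations and is an assumption, as is every name here.
```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sequentially connected stack layers: the first produces the intermediate
    encoding vector, and the remaining target layers refine it into the
    initial feature vector."""
    def __init__(self, n_layers=6, d_model=512, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))

    def forward(self, position_vector):                 # (batch, seq_len, d_model)
        intermediate = self.layers[0](position_vector)  # first stack layer
        for layer in self.layers[1:]:                   # target layers
            intermediate = layer(intermediate)
        return intermediate                             # initial feature vector

encoder = Encoder()
initial_feature = encoder(torch.randn(1, 3, 512))  # a batch of one 3-word sentence
```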
204. Call the preset decoder to decode the initial feature vector into a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism.
The server calls the preset decoder to decode the initial feature vector into a target feature vector. Specifically, the server performs an attention operation and then a bias operation on the initial feature vector through the first stack layer of the preset decoder to obtain an intermediate decoding vector; the decoder comprises a plurality of sequentially connected stack layers, each containing two multi-head attention sublayers and one feedforward neural network sublayer. The server then performs the attention and bias operations on the intermediate decoding vector through the target layers of the decoder, i.e. the stack layers other than the first, to obtain the target feature vector. Compared with the decoder structure of existing deep learning networks, the decoder in this embodiment merely adds one multi-head attention sublayer to each stack layer, so that each stack layer comprises two connected multi-head attention sublayers and one feedforward neural network sublayer.
205. Perform a linear transformation on the target feature vector to obtain a linear transformation result, map the result to a preset log-probability vector, and determine the target text according to the log-probability vector.
The server performs a linear transformation on the target feature vector to obtain a linear transformation result, maps the result to a preset log-probability vector, and determines the target text according to the log-probability vector. Specifically, the server performs the linear transformation and maps its result to the preset log-probability vector; the server then obtains the probability values of the log-probability vectors, sorts them in descending order of probability value, and determines the first-ranked log-probability vector as the target text. After the linear transformation, the target feature vector is mapped into a vector of log probabilities (the log-probability vector). Each log-probability vector corresponds to a word and carries a probability value: if ten thousand words were learned from the training set, the log-probability vector has ten thousand cells, each holding the probability value of one word.
In the embodiment of the invention, word embedding and position coding are performed on the text to be translated to obtain the position vector, and a preset encoder and a preset decoder are called to encode and decode the position vector, improving the accuracy of the translated text.
The text translation method based on deep learning in the embodiments of the invention has been described above; the text translation apparatus based on deep learning in the embodiments of the invention is described below with reference to fig. 3. An embodiment of the apparatus comprises:
the acquisition module 301, configured to acquire a text to be translated and perform word embedding and position coding on it to obtain a position vector;
the encoding module 302, configured to call a preset encoder to encode the position vector into an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism;
the decoding module 303, configured to call a preset decoder to decode the initial feature vector into a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism;
and the linear transformation module 304, configured to perform a linear transformation on the target feature vector to obtain a linear transformation result, map the result to a preset log-probability vector, and determine the target text according to the log-probability vector.
In the embodiment of the invention, word embedding and position coding are performed on the text to be translated to obtain the position vector, and a preset encoder and a preset decoder are called to encode and decode the position vector, improving the accuracy of the translated text.
Referring to fig. 4, another embodiment of the text translation apparatus based on deep learning according to the embodiment of the present invention includes:
the acquisition module 301, configured to acquire a text to be translated and perform word embedding and position coding on it to obtain a position vector;
the acquisition module 301 specifically comprising:
the mapping unit 3011, configured to acquire the text to be translated, one-hot encode it to obtain a target one-hot vector set, call a preset embedding model, and map the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set;
and the determining unit 3012, configured to call a preset position coding function and determine the position information of each word in the target dimension matrix vector set, obtaining the position vector;
the encoding module 302, configured to call a preset encoder to encode the position vector into an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism;
the decoding module 303, configured to call a preset decoder to decode the initial feature vector into a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism;
and the linear transformation module 304, configured to perform a linear transformation on the target feature vector to obtain a linear transformation result, map the result to a preset log-probability vector, and determine the target text according to the log-probability vector.
Optionally, the mapping unit 3011 comprises:
the identifying subunit 30111, configured to acquire the text to be translated, identify its target punctuation marks, and split it into a plurality of sub-texts according to those punctuation marks;
the one-hot encoding subunit 30112, configured to one-hot encode each word in each sub-text to obtain the one-hot vector set of each sub-text, and determine the one-hot vector sets of all sub-texts as the target one-hot vector set;
and the mapping subunit 30113, configured to call a preset embedding model to construct a matrix of preset dimensions, obtaining a target dimension matrix, and map the target one-hot vector set through the target dimension matrix to obtain the target dimension matrix vector set.
Optionally, the one-hot encoding subunit 30112 may further be specifically configured to:
call a preset counting function to count the number of words in each sub-text; encode each word in each sub-text with preset values to obtain an initial vector set for each sub-text; verify each sub-text's initial vector set against its word count, obtaining a one-hot vector set for each sub-text that passes verification; and determine the one-hot vector sets of all verified sub-texts as the target one-hot vector set.
Optionally, the encoding module 302 comprises:
the first encoding unit 3021, configured to perform an attention operation and then a bias operation on the position vector through the first stack layer of the preset encoder to obtain an intermediate encoding vector, wherein the encoder comprises a plurality of sequentially connected stack layers, each containing a multi-head attention sublayer and a feedforward neural network sublayer, the encoder being an encoder of a deep learning network based on a self-attention mechanism;
and the second encoding unit 3022, configured to perform the attention and bias operations on the intermediate encoding vector through the target layers of the encoder, i.e. the stack layers other than the first, to obtain the initial feature vector.
Optionally, the decoding module 303 comprises:
the first decoding unit 3031, configured to perform an attention operation and then a bias operation on the initial feature vector through the first stack layer of the preset decoder to obtain an intermediate decoding vector, wherein the decoder comprises a plurality of sequentially connected stack layers, each containing two multi-head attention sublayers and one feedforward neural network sublayer, the decoder being a decoder of a deep learning network based on a self-attention mechanism;
and the second decoding unit 3032, configured to perform the attention and bias operations on the intermediate decoding vector through the target layers of the decoder, i.e. the stack layers other than the first, to obtain the target feature vector.
Optionally, the linear transformation module 304 comprises:
the linear transformation unit 3041, configured to perform the linear transformation on the target feature vector to obtain the linear transformation result and map it to the preset log-probability vector;
and the sorting unit 3042, configured to obtain the probability values of the log-probability vectors, sort them in descending order of probability value, and determine the first-ranked log-probability vector as the target text.
In the embodiment of the invention, word embedding and position coding are performed on the text to be translated to obtain the position vector, and a preset encoder and a preset decoder are called to encode and decode the position vector, improving the accuracy of the translated text.
Figs. 3 and 4 describe the text translation apparatus based on deep learning in the embodiments of the invention in detail from the perspective of modular functional entities; the text translation apparatus based on deep learning in the embodiments of the invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a deep learning based text translation apparatus 500 according to an embodiment of the present invention. The apparatus 500 may vary considerably with configuration or performance and may include one or more processors (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may be transient or persistent storage. A program stored on a storage medium 530 may include one or more modules (not shown), each of which may comprise a series of instruction operations for the apparatus 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and execute the series of instruction operations in the storage medium 530 on the apparatus 500.
The deep learning based text translation apparatus 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will appreciate that the structure shown in fig. 5 does not limit the deep learning based text translation apparatus, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The invention also provides a text translation device based on deep learning, comprising a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the text translation method based on deep learning in the above embodiments.
The invention also provides a computer-readable storage medium, which may be non-volatile or volatile, having instructions stored therein which, when run on a computer, cause the computer to perform the steps of the text translation method based on deep learning.
Blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced, and such modifications or replacements do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the invention.

Claims (10)

1. A text translation method based on deep learning is characterized in that the text translation method based on deep learning comprises the following steps:
acquiring a text to be translated, and performing word embedding and position coding processing on the text to be translated to obtain a position vector;
calling a preset encoder, and encoding the position vector to obtain an initial feature vector, wherein the encoder is an encoder of a deep learning network based on a self-attention mechanism;
calling a preset decoder, and decoding the initial feature vector to obtain a target feature vector, wherein the decoder is a decoder of a deep learning network based on a self-attention mechanism;
and performing linear transformation on the target feature vector to obtain a linear transformation result, mapping the linear transformation result to a preset log-probability vector, and determining a target text according to the log-probability vector.
2. The text translation method based on deep learning of claim 1, wherein the acquiring of the text to be translated and performing word embedding and position coding on it to obtain the position vector comprises:
acquiring a text to be translated, one-hot encoding it to obtain a target one-hot vector set, calling a preset embedding model, and mapping the target one-hot vector set into a matrix of preset dimensions to obtain a target dimension matrix vector set;
and calling a preset position coding function, and determining the position information of each word in the target dimension matrix vector set to obtain a position vector.
3. The text translation method based on deep learning of claim 2, wherein acquiring the text to be translated, one-hot encoding it to obtain the target one-hot vector set, calling the preset embedding model, and mapping the target one-hot vector set into a matrix of preset dimensions to obtain the target dimension matrix vector set comprises:
acquiring a text to be translated, identifying a target punctuation mark in the text to be translated, and performing sentence division processing on the text to be translated according to the target punctuation mark to obtain a plurality of sub-texts;
performing one-hot coding on each word in each sub-text to obtain one-hot coding vector sets corresponding to each sub-text, and determining the one-hot coding vector sets corresponding to all the sub-texts as target one-hot coding vector sets;
and calling a preset embedded model to construct a matrix with preset dimensions to obtain a target dimension matrix, and mapping the target unique hot coding vector set through the target dimension matrix to obtain a target dimension matrix vector set.
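A compact sketch of claim 3's three steps: splitting on sentence-final punctuation, one-hot encoding each word, and projecting through a matrix of preset dimensions. The punctuation set, the vocabulary format, and the random matrix initialisation are all assumptions.

```python
import re
import numpy as np

def embed_sub_texts(text, vocab, d_model=8, seed=0):
    """Claim 3 sketch. `vocab` is assumed to map each word to an integer
    index, and every word of `text` is assumed to appear in `vocab`."""
    # Step 1: split into sub-texts on target punctuation marks.
    sub_texts = [s.strip() for s in re.split(r"[.!?;]", text) if s.strip()]
    # Step 3 (first half): construct the preset-dimension target matrix.
    rng = np.random.default_rng(seed)
    target_matrix = rng.normal(size=(len(vocab), d_model))
    identity = np.eye(len(vocab))
    embedded = []
    for sub in sub_texts:
        # Step 2: one-hot encode each word of the sub-text.
        one_hot_set = np.stack([identity[vocab[w]] for w in sub.split()])
        # Step 3 (second half): map the one-hot vectors through the matrix.
        embedded.append(one_hot_set @ target_matrix)
    return embedded  # the target dimension matrix vector set
```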
4. The text translation method based on deep learning of claim 3, wherein the performing one-hot encoding on each word in each sub-text to obtain a one-hot encoding vector set corresponding to each sub-text, and determining the one-hot encoding vector sets corresponding to all the sub-texts as the target one-hot encoding vector set comprises:
calling a preset counting function to count the number of words in each sub-text;
padding each word in each sub-text with a preset value to obtain an initial vector set corresponding to each sub-text;
verifying the initial vector set corresponding to each sub-text against the number of words in that sub-text to obtain a one-hot encoding vector set corresponding to each sub-text that passes verification;
and determining the one-hot encoding vector sets corresponding to all the sub-texts that pass verification as the target one-hot encoding vector set.
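Claim 4 does not spell out the verification rule; the sketch below assumes the check compares each padded vector's length and non-padding prefix against the counted word total.

```python
def build_target_one_hot_sets(sub_texts, pad_value=0, max_len=16):
    """Claim 4 sketch: count words per sub-text, pad with a preset value,
    and keep only the vector sets that pass the (assumed) length check."""
    target_sets = []
    for sub in sub_texts:
        words = sub.split()
        word_count = len(words)            # the preset counting function
        if word_count > max_len:           # cannot pad; fails verification
            continue
        padded = words + [pad_value] * (max_len - word_count)
        # Verification: exact padded length and an intact word prefix.
        if len(padded) == max_len and padded[:word_count] == words:
            target_sets.append(padded)
    return target_sets
```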
5. The text translation method based on deep learning of claim 1, wherein the calling a preset encoder, and encoding the position vector to obtain an initial feature vector, the encoder being an encoder of a deep learning network based on a self-attention mechanism, comprises:
sequentially performing an attention operation and a bias operation on the position vector through a first stacked layer in a preset encoder to obtain an intermediate encoding vector, wherein the encoder comprises a plurality of sequentially connected stacked layers, each stacked layer comprises a multi-head attention sublayer and a feedforward neural network sublayer, and the encoder is an encoder of a deep learning network based on a self-attention mechanism;
and sequentially performing an attention operation and a bias operation on the intermediate encoding vector through a target layer in the encoder to obtain an initial feature vector, wherein the target layer denotes the stacked layers other than the first stacked layer among the plurality of stacked layers.
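In PyTorch terms, the stack of identical layers in claim 5, each a multi-head attention sublayer followed by a feed-forward sublayer, can be sketched as follows; the sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_encoder(d_model=512, n_heads=8, n_layers=6):
    """Claim 5 sketch: sequentially connected stacked layers, each with a
    multi-head attention sublayer and a feedforward neural network sublayer."""
    layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)

# The position vector passes through the first stacked layer and then through
# the remaining ("target") layers in turn, yielding the initial feature vector.
encoder = build_encoder()
position_vectors = torch.randn(1, 10, 512)      # (batch, words, d_model)
initial_features = encoder(position_vectors)
```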
6. The text translation method based on deep learning of claim 1, wherein the calling a preset decoder, and decoding the initial feature vector to obtain a target feature vector, the decoder being a decoder of a deep learning network based on a self-attention mechanism, comprises:
sequentially performing an attention operation and a bias operation on the initial feature vector through a first stacked layer in a preset decoder to obtain an intermediate decoding vector, wherein the decoder comprises a plurality of sequentially connected stacked layers, each stacked layer comprises two multi-head attention sublayers and a feedforward neural network sublayer connected in sequence, and the decoder is a decoder of a deep learning network based on a self-attention mechanism;
and sequentially performing an attention operation and a bias operation on the intermediate decoding vector through a target layer in the decoder to obtain a target feature vector, wherein the target layer denotes the stacked layers other than the first stacked layer among the plurality of stacked layers.
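The decoder of claim 6 differs from the encoder only in its second attention sublayer, which attends over the encoder output. A matching sketch, again with illustrative sizes:

```python
import torch
import torch.nn as nn

def build_decoder(d_model=512, n_heads=8, n_layers=6):
    """Claim 6 sketch: each stacked layer holds two multi-head attention
    sublayers (masked self-attention, then attention over the encoder output)
    followed by a feedforward neural network sublayer."""
    layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
    return nn.TransformerDecoder(layer, num_layers=n_layers)

# The initial feature vector from the encoder is consumed as "memory" while
# the target-side embeddings pass through the stacked layers in turn.
decoder = build_decoder()
tgt_embeddings = torch.randn(1, 5, 512)
initial_features = torch.randn(1, 10, 512)
target_features = decoder(tgt_embeddings, initial_features)
```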
7. The text translation method based on deep learning of any one of claims 1-6, wherein the performing a linear transformation on the target feature vector to obtain a linear transformation result, mapping the linear transformation result to a preset log-probability vector, and determining the target text according to the log-probability vector comprises:
performing a linear transformation on the target feature vector to obtain a linear transformation result, and mapping the linear transformation result to a preset log-probability vector;
and obtaining the probability values of the log-probability vectors, sorting the log-probability vectors in descending order of probability value, and determining the top-ranked log-probability vector as the target text.
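Sorting a log-probability vector in descending order and taking the top entry is equivalent to an argmax over the vocabulary. A short sketch, with an untrained and purely illustrative projection:

```python
import torch
import torch.nn as nn

def pick_target_tokens(target_features, d_model=512, vocab_size=32000):
    """Claim 7 sketch: linear transformation, log-probability mapping, then
    selection of the highest-probability entry at each position."""
    projection = nn.Linear(d_model, vocab_size)   # untrained, illustrative
    log_probs = torch.log_softmax(projection(target_features), dim=-1)
    # Descending sort by probability; the first-ranked entry is the target.
    return log_probs.argmax(dim=-1)               # token ids of the target text
```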
8. A text translation apparatus based on deep learning, characterized in that the text translation apparatus based on deep learning comprises:
an acquisition module, configured to acquire a text to be translated, and perform word embedding and positional encoding on the text to be translated to obtain a position vector;
an encoding module, configured to call a preset encoder and encode the position vector to obtain an initial feature vector, wherein the encoder is an encoder of a deep learning network based on a self-attention mechanism;
a decoding module, configured to call a preset decoder and decode the initial feature vector to obtain a target feature vector, wherein the decoder is a decoder of a deep learning network based on a self-attention mechanism;
and a linear transformation module, configured to perform a linear transformation on the target feature vector to obtain a linear transformation result, map the linear transformation result to a preset log-probability vector, and determine a target text according to the log-probability vector.
9. A text translation device based on deep learning, characterized in that the text translation device based on deep learning comprises:
a memory and at least one processor, the memory having instructions stored therein;
wherein the at least one processor invokes the instructions in the memory to cause the text translation device based on deep learning to perform the text translation method based on deep learning according to any one of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the text translation method based on deep learning according to any one of claims 1-7.
CN202110691222.2A 2021-06-22 2021-06-22 Text translation method, device and equipment based on deep learning and storage medium Pending CN113420571A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110691222.2A CN113420571A (en) 2021-06-22 2021-06-22 Text translation method, device and equipment based on deep learning and storage medium
PCT/CN2022/088789 WO2022267674A1 (en) 2021-06-22 2022-04-24 Deep learning-based text translation method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110691222.2A CN113420571A (en) 2021-06-22 2021-06-22 Text translation method, device and equipment based on deep learning and storage medium

Publications (1)

Publication Number Publication Date
CN113420571A true CN113420571A (en) 2021-09-21

Family

ID=77789831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110691222.2A Pending CN113420571A (en) 2021-06-22 2021-06-22 Text translation method, device and equipment based on deep learning and storage medium

Country Status (2)

Country Link
CN (1) CN113420571A (en)
WO (1) WO2022267674A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420571A (en) * 2021-06-22 2021-09-21 康键信息技术(深圳)有限公司 Text translation method, device and equipment based on deep learning and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377686A (en) * 2019-07-04 2019-10-25 浙江大学 A kind of address information Feature Extraction Method based on deep neural network model
CN111581985A (en) * 2020-05-14 2020-08-25 兰州大学 A Transformer-based Chinese-Blind Translation Method and System
CN111859991A (en) * 2020-07-29 2020-10-30 中国平安财产保险股份有限公司 Language translation processing model training method and language translation processing method
CN112084794A (en) * 2020-09-18 2020-12-15 西藏大学 A Tibetan-Chinese translation method and device
CN112699693A (en) * 2021-01-18 2021-04-23 上海明略人工智能(集团)有限公司 Machine translation method and machine translation device
CN112668347A (en) * 2021-03-17 2021-04-16 腾讯科技(深圳)有限公司 Text translation method, device, equipment and computer readable storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022267674A1 (en) * 2021-06-22 2022-12-29 康键信息技术(深圳)有限公司 Deep learning-based text translation method and apparatus, device and storage medium

Also Published As

Publication number Publication date
WO2022267674A1 (en) 2022-12-29

Similar Documents

Publication Publication Date Title
CN111814466B (en) Information extraction method based on machine reading understanding and related equipment thereof
CN113051371B (en) Chinese machine reading understanding method and device, electronic equipment and storage medium
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN113901797B (en) Text error correction method, device, equipment and storage medium
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN111310441A Text correction method, device, terminal and medium based on BERT voice recognition
CN113656547B (en) Text matching method, device, equipment and storage medium
EP3912042B1 (en) A deep learning model for learning program embeddings
CN113407660B (en) Unstructured text event extraction method
CN111859911A (en) Image description text generation method and device, computer equipment and storage medium
CN116738985B (en) Standardized processing method and device for medical text
CN113591093A (en) Industrial software vulnerability detection method based on self-attention mechanism
CN113836341B (en) Remote sensing image retrieval method based on unsupervised converter balanced hash
CN112527986A (en) Multi-round dialog text generation method, device, equipment and storage medium
CN113886550A Question-answer matching method, apparatus, device and storage medium based on attention mechanism
CN113870826A (en) Pronunciation duration prediction method based on duration prediction model and related equipment
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN111814479B (en) Method and device for generating enterprise abbreviations and training model thereof
CN117972033A (en) Large model illusion detection method, device, computer equipment and storage medium
CN113420571A (en) Text translation method, device and equipment based on deep learning and storage medium
Desai et al. Lightweight convolutional representations for on-device natural language processing
CN114757154B (en) Job generation method, device and equipment based on deep learning and storage medium
CN115994524A (en) Training method, device, equipment and medium for form pre-training model
CN109190091B (en) Encoding and decoding method and device
CN111723582A (en) Intelligent semantic classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination