CN116701961B

CN116701961B - Method and system for automatically evaluating machine translation results of classical Chinese

Info

Publication number: CN116701961B
Application number: CN202310973916.4A
Authority: CN
Inventors: 李炜; 邵艳秋; 董立成; 申资卓; 杜彦融
Original assignee: BEIJING LANGUAGE AND CULTURE UNIVERSITY
Current assignee: BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2023-10-20
Anticipated expiration: 2043-08-04
Also published as: CN116701961A

Abstract

The application relates to the technical field of natural language processing and discloses a method and a system for automatically evaluating machine translation results of classical Chinese. The method comprises the following steps: constructing a data set to be evaluated; performing preprocessing and word segmentation on the data set to be evaluated; acquiring an original text and a reference translation of a training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation, and constructing a test data set in the form of (original text, reference translation, target translation, manual score) entries; calculating the similarity between the reference translation and the target translation, calculating scores for the three dimensions of faithfulness, fluency and colloquiality of the target translation according to the test data set, calculating the average score, error and correlation of the target translation, and automatically scoring the target translation; and calculating the total score by assigning weights to faithfulness and fluency.

Description

Method and system for automatically evaluating machine translation results of classical Chinese

Technical Field

The application relates to the technical field of natural language processing, and in particular to a method and a system for automatically evaluating machine translation results of classical Chinese.

Background

The earliest translation was done manually. Because manual translation is inefficient and labor-intensive, and with the advent of computers and the development of computer technology, machine translation has been increasingly favored; its higher efficiency and lower cost compared with manual translation have driven its rapid development. China has a vast body of ancient documents, and because the grammar and word senses of ancient Chinese differ from those of modern Chinese, most modern readers find it difficult to study ancient Chinese in depth. Translating ancient texts into the familiar modern language meets the needs of most people, and machine translation can be used to translate classical Chinese texts.

However, constructing an automatic evaluation method for ancient Chinese machine translation still faces many problems. Existing machine translation and automatic evaluation methods mainly target translation between two different languages, while machine translation and evaluation methods from ancient Chinese to modern Chinese are relatively lacking. In addition, translation from ancient Chinese to modern Chinese is not quite the same as other cross-language translation. Modern Chinese evolved from ancient Chinese, so the two are similar, and some information in ancient Chinese can be carried over directly into modern Chinese. Nevertheless, compared with translation between modern languages, ancient Chinese translation has three distinct features: first, words for concepts that have since disappeared need to be output as they are, so the accuracy of word segmentation is very important; second, the boundary between ancient text and modern text is somewhat fuzzy, but the application expects the translation model to translate the ancient text fully and to output as little untranslated ancient text as possible; third, ancient text is highly condensed and its sentences contain a great deal of omission, so the reference translation is often supplemented, and the supplemented background words need to be taken into account when evaluating the machine translation result.

Common evaluation methods for ancient text machine translation include manual evaluation and automatic evaluation. Manual evaluation can reflect the quality of machine translation more accurately, but it has a number of drawbacks: low efficiency, significant time and economic cost, non-reusable results, susceptibility to subjective factors, and so on. Automatic evaluation can evaluate a large number of translation results in a short time using only a fixed algorithm, and can give timely feedback on the translation quality of the translation model, so it has advantages over manual evaluation. Therefore, how to optimize the automatic evaluation method in light of the characteristics of ancient text translation, so that it can be better applied to the task, is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In order to better guide the iterative optimization of ancient text machine translation models and to help people judge the strengths and weaknesses of a current model, the application constructs a data set of (ancient text, reference translation, target translation, manual score) entries based on the three dimensions of faithfulness, fluency and colloquiality, and identifies a more suitable evaluation method within each dimension, so as to evaluate the quality of ancient text machine translation results intuitively and to optimize the translation model iteratively. To this end, the application provides a method for automatically evaluating machine translation results of classical Chinese, comprising:

Step S101: screening out a plurality of sentences from an ancient text and modern text parallel corpus by random and manual selection to construct a data set to be evaluated;

Step S102: performing preprocessing and word segmentation on the data set to be evaluated;

Step S103: acquiring an original text and a reference translation of a training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation to judge the quality of the translation produced by the model, and constructing a test data set in the form of (original text, reference translation, target translation, manual score) entries;

Step S104: calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the three dimensions of faithfulness, fluency and colloquiality of the target translation according to the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation.
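The average score, error and correlation named in step S104 can be illustrated with a short sketch. The source does not say which error or correlation measure is used, so mean absolute error and the Pearson coefficient are assumptions here, and the two score lists are hypothetical.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_absolute_error(auto_scores, human_scores):
    """Average absolute gap between automatic and human scores (assumed error measure)."""
    return mean([abs(a - h) for a, h in zip(auto_scores, human_scores)])


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed correlation measure)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant list
    return cov / sqrt(var_x * var_y)


# Hypothetical automatic and human scores for three sentences.
auto_scores = [0.2, 0.5, 0.9]
human_scores = [0.1, 0.6, 0.8]
print(mean(auto_scores), mean_absolute_error(auto_scores, human_scores),
      pearson(auto_scores, human_scores))
```

A rank correlation could be substituted for `pearson` without changing the rest of the sketch.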

In some embodiments, step S102 further includes:

counting the segmented words; segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, respectively; and comparing the differences in effect among the three.

In some embodiments, step S103 further includes:

treating the components of the reference translation that were supplemented from discourse-level context as background words, which are not used as a reference when evaluating the target translation.
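One way to keep supplemented words out of the comparison is to flag them during annotation and drop them before scoring. The sketch below assumes a hypothetical representation in which each reference token carries a boolean background flag; the source does not specify how background words are marked.

```python
from typing import List, Tuple


def strip_background(reference_tokens: List[Tuple[str, bool]]) -> List[str]:
    """Return only the reference words that take part in evaluation.

    Each item is (word, is_background); this flag format is an assumption.
    """
    return [word for word, is_background in reference_tokens if not is_background]


# Hypothetical reference translation in which the first word was supplemented.
reference = [("however", True), ("master", False), ("treat", False), ("him", False)]
print(strip_background(reference))
```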

In some embodiments, step S104 further includes:

Step S1041: calculating the pairwise similarity between the words of the reference translation and the words of the target translation, setting a threshold for judging near-synonyms, and treating word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculating the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and, using recall as the basis for judging whether information of the original text has been lost, calculating the faithfulness according to the following formula:

$$score_{ref} = \sum_{i=1}^{N_r} sim_{max}(w_i^{ref}), \qquad score_{tar} = \sum_{j=1}^{N_t} sim_{max}(w_j^{tar})$$

$$score_{match} = \min(score_{ref},\ score_{tar}), \qquad score_{zhong} = \frac{score_{match}}{N_r}$$

wherein $w_i^{ref}$ and $w_j^{tar}$ denote the $i$-th word of the reference translation and the $j$-th word of the target translation, $sim_{max}$ is the maximum similarity obtained for a word of the reference translation or the target translation, calculated from its sememes, $N_r$ is the number of words in the reference translation, $N_t$ is the number of words in the target translation, $score_{ref}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, $score_{tar}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, $score_{match}$ is the similarity score of the reference translation and the target translation, and $score_{zhong}$ is the faithfulness score;

Step S1042: calculating the fluency of the classical Chinese translation through the longest continuous common subsequence, according to the following formula:

$$Pen = \beta \cdot \left( \frac{\#\,\text{chunks in target sentence}}{\#\,\text{words in target sentence}} \right)^{\gamma}, \qquad score_{liu} = 1 - Pen$$

wherein $Pen$ is the penalty term used in calculating fluency, # chunks in target sentence is the number of longest continuous subsequences found by dynamic programming, # words in target sentence is the number of words in the sentence, $\beta$ is the penalty coefficient, $\gamma$ is the penalty exponent, and $score_{liu}$ is the fluency score;

Step S1043: calculating the colloquiality according to the following formula:

$$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$$

$$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$$

$$score_{con} = \frac{edit_{tar}}{edit_{ref}}$$

wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the number of edits for the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the corresponding numbers of words to be replaced, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the corresponding numbers of words to be deleted, $edit_{tar}$ is the number of edits between the target translation and the original text, $edit_{ref}$ is the number of edits between the reference translation and the original text, $score_{con}$ is the ratio between the number of edits of the target translation and that of the reference translation, and $score_{tong}$ is the colloquiality score, which is determined from $score_{con}$.

In some embodiments, step S104 further includes determining the similarity according to the following formula:

$$Sim_s(S_1, S_2) = \beta_{struct} \cdot StructSim(S_1, S_2) + \beta_{DEF} \cdot Sim_{DEF}(S_1, S_2)$$

wherein $Sim_s$ represents the similarity score of the target translation and the reference translation, $S_1$ and $S_2$ represent the two concepts to be compared, $StructSim$ represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the sememe similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the $StructSim$ function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.

In order to achieve the above object, the application further provides a system for automatically evaluating machine translation results of classical Chinese, comprising:

a data set construction module, configured to screen out a plurality of sentences from an ancient text and modern text parallel corpus by random and manual selection to construct a data set to be evaluated;

a preprocessing module, configured to perform preprocessing and word segmentation on the data set to be evaluated;

a test set construction module, configured to acquire an original text and a reference translation of a training corpus from the data set to be evaluated, translate the original text with the classical Chinese machine translation model to be evaluated, compare the resulting target translation with the reference translation and the original text, manually analyze the target translation to judge the quality of the translation produced by the model, and construct a test data set in the form of (original text, reference translation, target translation, manual score) entries;

a scoring module, configured to calculate the similarity between the reference translation and the target translation by comparing them, calculate scores for the three dimensions of faithfulness, fluency and colloquiality of the target translation according to the test data set, calculate the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically score the target translation.

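In some embodiments, the preprocessing module is further configured to count the segmented words, to segment the text by manual word segmentation, by automatic word segmentation and at character granularity respectively, and to compare the differences in effect among the three.

In some embodiments, the test set construction module is further configured to treat the components of the reference translation that were supplemented from discourse-level context as background words, which are not used as a reference when evaluating the target translation.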

In some embodiments, the scoring module further comprises:

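a faithfulness calculation unit, configured to calculate the pairwise similarity between the words of the reference translation and the words of the target translation, set a threshold for judging near-synonyms, and treat word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculate the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and, using recall as the basis for judging whether information of the original text has been lost, calculate the faithfulness according to the following formula:

$$score_{ref} = \sum_{i=1}^{N_r} sim_{max}(w_i^{ref}), \qquad score_{tar} = \sum_{j=1}^{N_t} sim_{max}(w_j^{tar})$$

$$score_{match} = \min(score_{ref},\ score_{tar}), \qquad score_{zhong} = \frac{score_{match}}{N_r}$$

wherein $sim_{max}$ is the maximum similarity obtained for a word, $N_r$ and $N_t$ are the numbers of words in the reference translation and the target translation, and $score_{zhong}$ is the faithfulness score;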

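a fluency calculation unit, configured to calculate the fluency of the classical Chinese translation through the longest continuous common subsequence, according to the following formula:

$$Pen = \beta \cdot \left( \frac{\#\,\text{chunks in target sentence}}{\#\,\text{words in target sentence}} \right)^{\gamma}, \qquad score_{liu} = 1 - Pen$$

wherein $Pen$ is the penalty term, $\beta$ is the penalty coefficient, $\gamma$ is the penalty exponent, and $score_{liu}$ is the fluency score;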

a colloquiality calculation unit, configured to calculate the colloquiality according to the following formula:

$$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$$

$$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$$

$$score_{con} = \frac{edit_{tar}}{edit_{ref}}$$

wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the number of edits for the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the corresponding numbers of words to be replaced, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the corresponding numbers of words to be deleted, $edit_{tar}$ is the number of edits between the target translation and the original text, $edit_{ref}$ is the number of edits between the reference translation and the original text, $score_{con}$ is the ratio between the number of edits of the target translation and that of the reference translation, and $score_{tong}$ is the colloquiality score, which is determined from $score_{con}$.

In some embodiments, the similarity is determined according to the following formula:

$$Sim_s(S_1, S_2) = \beta_{struct} \cdot StructSim(S_1, S_2) + \beta_{DEF} \cdot Sim_{DEF}(S_1, S_2)$$

wherein $Sim_s$ represents the similarity score of the target translation and the reference translation, $S_1$ and $S_2$ represent the concepts to be compared, $StructSim$ represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the sememe similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the $StructSim$ function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.

The beneficial effects of the technical scheme are that:

(1) For the target translation of ancient text output by machine translation, the application incorporates word sense information into the evaluation, and adds the dimension of colloquiality (whether the translation fully translates the original text) to the traditional dimensions of faithfulness (whether the translation result is faithful to the information of the original text) and fluency (whether the target translation is fluent and natural and conforms to the expression habits of the target language), thereby improving the effectiveness of the evaluation method.

(2) Compared with manual word segmentation, automatic word segmentation is finer-grained, so that many units that could be treated as one word are split into single-character words, which greatly reduces the proportion of words not registered in HowNet. Automatic word segmentation therefore yields more word sense information: most words have a word sense available when faithfulness is calculated, which makes comparison easier, so the evaluation results obtained by the method correlate more highly with human scores.

(3) Compared with using characters as the basic granularity for faithfulness, using words as the basic granularity yields results that correlate more highly with human scores, whether evaluation is performed on the faithfulness dimension alone or together with the two dimensions of fluency and colloquiality. Moreover, once the colloquiality dimension is added to the evaluation method, the correlation between the evaluation results and human scores improves greatly regardless of which method is used to evaluate fluency.
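Point (2) turns on the share of tokens that are missing from HowNet under different segmentations. A minimal sketch of that comparison follows; the vocabulary set stands in for HowNet, and both it and the token lists are hypothetical placeholders rather than data from the application.

```python
def char_segment(sentence):
    """Character-granularity segmentation: one token per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def oov_ratio(tokens, vocab):
    """Fraction of tokens that are not found in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocab)
    return missing / len(tokens)


# Hypothetical vocabulary and two segmentations of the same four-character text.
vocab = {"ab", "a", "b", "c"}
coarse_tokens = ["ab", "cd"]          # e.g. a coarser segmentation
fine_tokens = char_segment("ab cd")   # character granularity
print(oov_ratio(coarse_tokens, vocab), oov_ratio(fine_tokens, vocab))
```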

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for automatically evaluating machine translation results of classical Chinese according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a system for automatically evaluating machine translation results of classical Chinese according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.

Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar symbols indicate like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.

Example 1

One embodiment of the present application provides a method for automatically evaluating machine translation results of classical Chinese which, referring to FIG. 1, includes:

Step S101: screening out a plurality of sentences from an ancient text and modern text parallel corpus by random and manual selection to construct a data set to be evaluated.

Step S102: performing preprocessing and word segmentation on the data set to be evaluated.

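In a specific embodiment of the present application, step S102 further includes: counting the segmented words; segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, respectively; and comparing the differences in effect among the three.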

Step S103: acquiring an original text and a reference translation of the training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation to judge the quality of the translation produced by the model, and constructing a test data set in the form of (original text, reference translation, target translation, manual score) entries.
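The four-part entries of the test data set can be pictured as a simple record type. The sketch below is illustrative only: the class name `EvalRecord`, the `translate` callable standing in for the model under evaluation, and the sample strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class EvalRecord:
    original: str       # classical Chinese source sentence
    reference: str      # human reference translation
    target: str         # output of the model under evaluation
    human_score: float  # manual score assigned after analysis


def build_test_set(pairs: Sequence[Tuple[str, str]],
                   translate: Callable[[str], str],
                   human_scores: Sequence[float]) -> List[EvalRecord]:
    """Translate each original text and attach its reference and manual score."""
    return [
        EvalRecord(original, reference, translate(original), score)
        for (original, reference), score in zip(pairs, human_scores)
    ]


# Hypothetical usage with a dummy "model" that upper-cases its input.
records = build_test_set([("src text", "ref text")], lambda s: s.upper(), [0.8])
print(records[0])
```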

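In a specific embodiment of the present application, step S103 further includes: treating the components of the reference translation that were supplemented from discourse-level context as background words, which are not used as a reference when evaluating the target translation.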

In a specific embodiment of the present application, step S104 further includes:

calculating the similarity score $Sim_s$ between the sememes corresponding to the target translation and the reference translation, and determining the maximum similarity, according to the following formula:

$$Sim_s(S_1, S_2) = \beta_{struct} \cdot StructSim(S_1, S_2) + \beta_{DEF} \cdot Sim_{DEF}(S_1, S_2)$$

wherein $Sim_s$ represents the similarity score of the target translation and the reference translation, and $S_1$ and $S_2$ represent the concepts to be compared: each word may correspond to several senses, a concept is one of those senses, and each concept is described by decomposing it into one or more sememes. $StructSim$ represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the sememe similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the $StructSim$ function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.

Specifically, in actual calculation, the HowNet-based similarity of two words can be obtained with the open-source project OpenHowNet.

For example, for the reference translation "the master must certainly treat him well" and the target translation "the teacher must treat him well", HowNet may give a similarity of 0.82 between "teacher" and "master".
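The idea of combining two concept-level similarities with weights and then taking the maximum over all concept pairs of two words might be sketched as follows. This is only an illustration of the weighted-sum reading of the description: it does not call OpenHowNet, the component functions are passed in as plain callables, and the Jaccard stand-in, the weights of 0.5 and the toy sememe sets are all hypothetical.

```python
from itertools import product


def concept_similarity(s1, s2, struct_sim, def_sim, beta_struct=0.5, beta_def=0.5):
    """Weighted combination of two concept-level similarity functions."""
    return beta_struct * struct_sim(s1, s2) + beta_def * def_sim(s1, s2)


def word_similarity(concepts_a, concepts_b, struct_sim, def_sim,
                    beta_struct=0.5, beta_def=0.5):
    """Maximum concept similarity over all concept pairs of two words."""
    if not concepts_a or not concepts_b:
        return 0.0  # a word with no known concept cannot be matched
    return max(
        concept_similarity(a, b, struct_sim, def_sim, beta_struct, beta_def)
        for a, b in product(concepts_a, concepts_b)
    )


def jaccard(a, b):
    """Toy stand-in for both component functions: overlap of sememe sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical concepts, each represented as a set of sememe labels.
teacher = [frozenset({"human", "teach"})]
master = [frozenset({"human", "teach", "skill"}), frozenset({"human", "religion"})]
print(word_similarity(teacher, master, jaccard, jaccard))
```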

Step S1041: calculating the pairwise similarity between the words of the reference translation and the words of the target translation, setting a threshold for judging near-synonyms, and treating word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculating the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and, using recall as the basis for judging whether information of the original text has been lost, calculating the faithfulness according to the following formula:

$$score_{ref} = \sum_{i=1}^{N_r} sim_{max}(w_i^{ref}), \qquad score_{tar} = \sum_{j=1}^{N_t} sim_{max}(w_j^{tar})$$

$$score_{match} = \min(score_{ref},\ score_{tar}), \qquad score_{zhong} = \frac{score_{match}}{N_r}$$

wherein $w_i^{ref}$ and $w_j^{tar}$ denote the $i$-th word of the reference translation and the $j$-th word of the target translation, $sim_{max}$ is the maximum similarity obtained for a word of the reference translation or the target translation, $N_r$ is the number of words in the reference translation, $N_t$ is the number of words in the target translation, $score_{ref}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, $score_{tar}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, $score_{match}$ is the similarity score of the reference translation and the target translation, and $score_{zhong}$ is the faithfulness score.

Specifically, for the reference translation "the master must certainly treat him well" and the target translation "the teacher must treat him well", the similarity is calculated between every pair of words across the two sentences, for example:

similarity between "master" and "teacher": 0.82;

similarity between "master" and "must": 0.30;

similarity between "master" and "well": 0.25;

similarity between "master" and "treat": 0.24;

similarity between "master" and "him": 0.73;

Using this calculation, the maximum similarity scores of the words of the reference translation and of the target translation are respectively:

reference translation: master: 0.82, certainly: 1, must: 1, treat well: 0.69, him: 1;

target translation: teacher: 0.82, must: 1, well: 0.56, treat: 0.69, him: 1;

A threshold for near-synonyms is set, for example 0.44. If the highest similarity of a word is greater than or equal to the threshold, that similarity is taken as the matching score of the word; otherwise the matching score of the word is 0. Assuming that all words are weighted equally:

the matching score of the reference translation is 0.82 + 1 + 1 + 0.69 + 1 = 4.51;

the matching score of the target translation is 0.82 + 1 + 0.56 + 0.69 + 1 = 4.07;

the matching score of the two sentences is min(4.51, 4.07) = 4.07, and the faithfulness score is 4.07 / 5 = 0.81.
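The arithmetic of this worked example can be reproduced with a few lines of Python. The sketch assumes that the faithfulness (loyalty) score is the smaller of the two thresholded sums divided by the number of reference words, which is what the figures 4.51, 4.07 and 0.81 imply; the helper that derives per-word maxima from a similarity matrix is an added convenience, and the function names are hypothetical.

```python
def thresholded_sum(max_sims, threshold):
    """Sum per-word maximum similarities, counting values below the threshold as 0."""
    return sum(s for s in max_sims if s >= threshold)


def faithfulness(ref_max_sims, tar_max_sims, threshold=0.44):
    """min(score_ref, score_tar) divided by the number of reference words."""
    if not ref_max_sims:
        return 0.0
    score_ref = thresholded_sum(ref_max_sims, threshold)
    score_tar = thresholded_sum(tar_max_sims, threshold)
    return min(score_ref, score_tar) / len(ref_max_sims)


def max_sims_from_matrix(sim):
    """Rows index reference words, columns index target words (non-empty matrix)."""
    ref_max = [max(row) for row in sim]
    tar_max = [max(col) for col in zip(*sim)]
    return ref_max, tar_max


# Per-word maxima taken from the worked example above.
ref_max = [0.82, 1, 1, 0.69, 1]
tar_max = [0.82, 1, 0.56, 0.69, 1]
print(round(faithfulness(ref_max, tar_max), 2))
```

Dividing by the reference length makes the score recall-oriented: words of the reference that find no near-synonym in the target lower the result.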

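Step S1042: calculating the fluency of the classical Chinese translation through the longest continuous common subsequence, according to the following formula:

$$Pen = \beta \cdot \left( \frac{\#\,\text{chunks in target sentence}}{\#\,\text{words in target sentence}} \right)^{\gamma}, \qquad score_{liu} = 1 - Pen$$

Specifically, in the above example the number of chunks (# chunks in target sentence) is 2 and the number of words is 5; assuming the hyperparameters $\beta = 1$ and $\gamma = 0.5$, the fluency score is $1 - 1 \cdot (2/5)^{0.5} \approx 0.37$.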
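The fluency figure of this example can be checked numerically. The sketch assumes a penalty of the form Pen = beta * (chunks / words) ** gamma with fluency equal to 1 - Pen, which with beta = 1 and gamma = 0.5 reproduces the value 0.37 for 2 chunks and 5 words. The chunk counter is a further assumption: it takes a hypothetical alignment that lists, for each matched target word in order, the position of its counterpart in the reference.

```python
def count_chunks(aligned_ref_positions):
    """Count maximal runs of matched words that are contiguous in the reference."""
    if not aligned_ref_positions:
        return 0
    chunks = 1
    for prev, cur in zip(aligned_ref_positions, aligned_ref_positions[1:]):
        if cur != prev + 1:  # a gap or reordering starts a new chunk
            chunks += 1
    return chunks


def fluency(num_chunks, num_words, beta=1.0, gamma=0.5):
    """Fluency score as 1 minus the assumed fragmentation penalty."""
    if num_words == 0:
        return 0.0
    penalty = beta * (num_chunks / num_words) ** gamma
    return 1.0 - penalty


print(count_chunks([0, 1, 3, 4, 5]))  # hypothetical alignment with one gap
print(round(fluency(2, 5), 2))        # values from the example above
```

Fewer, longer chunks mean the target follows the word order of the reference more closely, so the penalty shrinks as the chunk count falls.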

Step S1043: calculating the colloquiality according to the following formula:

$$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$$

$$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$$

$$score_{con} = \frac{edit_{tar}}{edit_{ref}}$$

wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the number of edits for the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the corresponding numbers of words to be replaced, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the corresponding numbers of words to be deleted, $edit_{tar}$ is the number of edits between the target translation and the original text, $edit_{ref}$ is the number of edits between the reference translation and the original text, $score_{con}$ is the ratio between the number of edits of the target translation and that of the reference translation, and $score_{tong}$ is the colloquiality score, which is determined from $score_{con}$.

Specifically, suppose for example that the classical text is "is still more so, the artist is at his best.", the reference translation is "but he comes again, the teacher and the father must stay on his own.", and the target translation obtained by machine translation of the ancient text is "he is still better treated by the teacher." The number of edits from the ancient text to the reference translation is 11 and the number of edits from the ancient text to the target translation is 10, so $score_{con} = 10/11$; the colloquiality score of this sentence is therefore 0, that is, no penalty for insufficient translation is applied.
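The number of edits in this example is an edit distance built from insertions, substitutions and deletions. The sketch below uses the standard unit-cost Levenshtein distance as an assumed counting rule and forms the ratio 10/11 from the example; how that ratio is mapped to the final colloquiality (popularity) score is not recoverable from the text, so that step is deliberately left out.

```python
def edit_distance(source, target):
    """Unit-cost Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        cur = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            cur.append(min(prev[j] + 1,          # delete s
                           cur[j - 1] + 1,       # insert t
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def score_con(edit_tar, edit_ref):
    """Ratio of target-translation edits to reference-translation edits."""
    return edit_tar / edit_ref if edit_ref else 0.0


print(edit_distance("kitten", "sitting"))
print(score_con(10, 11))  # edit counts from the example above
```

A target translation that copies most of the ancient text needs few edits, so a small ratio signals under-translation, whereas a ratio near 1 (as in the example) does not.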

In one embodiment of the application, the total score is calculated by assigning weights to faithfulness and fluency. Based on $score_{zhong}$, $score_{liu}$ and $score_{con}$, weights are set for $score_{zhong}$ and $score_{liu}$ respectively so that the weights sum to 1, the colloquiality penalty derived from $score_{con}$ is applied as a subtractive term, and the total score $Score$ is determined; if the result is negative, the total score is set to 0.

Specifically, the calculation of the total score should take the scores of all three dimensions into account. Weights are set for faithfulness and fluency, with a total weight of 1, and colloquiality takes part in the scoring as a penalty term. To prevent the total score from taking a negative value, if the final result after combining the three dimensions is negative, the total score is set to 0. For example, with faithfulness and fluency weights of 0.8 and 0.2, the calculation formula is:

$$Score = \max\left(0,\ 0.8 \cdot score_{zhong} + 0.2 \cdot score_{liu} - score_{tong}\right)$$

where $Score$ is the final total score.
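The combination rule described here (weights summing to 1, a subtractive penalty, and a floor at 0) can be sketched in a few lines. Treating the colloquiality (popularity) score as the subtracted penalty is an interpretation of the description, and the input values below simply reuse the figures of the earlier worked examples.

```python
def total_score(score_zhong, score_liu, score_tong, w_zhong=0.8, w_liu=0.2):
    """Weighted faithfulness and fluency minus the colloquiality penalty, floored at 0."""
    raw = w_zhong * score_zhong + w_liu * score_liu - score_tong
    return max(0.0, raw)


# Faithfulness 0.814, fluency 0.3675 and no penalty, as in the examples above.
print(total_score(0.814, 0.3675, 0.0))
# A large penalty drives the raw value negative, so the result is clamped to 0.
print(total_score(0.1, 0.1, 0.5))
```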

Testing showed that unprocessed background words interfere negatively with the evaluation of faithfulness and reduce its correlation with human scores, so the correlation with human scores also drops when all dimensions are evaluated together. Removing background words is thus shown to be effective for evaluating ancient text machine translation. See Tables 1 to 3 below:

Table 1: Correlation between the evaluation methods using word granularity and human evaluation results

Table 2: Statistics of the test corpus after word segmentation

Table 3: Effect of the evaluation method of the application on the corpus with unprocessed background words

Remarks: naming of the evaluation methods in Tables 1 to 3:

TMH (Traditional evaluation Method based on HowNet) denotes the evaluation method comprising HowNet-based faithfulness, fluency and colloquiality. The method adopted within each evaluation dimension is indicated by a suffix, as follows:

Methods for faithfulness:

_a: evaluation based on word segmentation by an automatic word segmentation tool.

_h: evaluation based on manual word segmentation.

_w: evaluation at character granularity.

Methods for fluency:

_l: fluency evaluation using the longest common subsequence.

_1: fluency evaluation based on 1-grams.

_2: fluency evaluation based on 2-grams.

_3: fluency evaluation based on 3-grams.

_4: fluency evaluation based on 4-grams.

Method for colloquiality:

_E: colloquiality evaluation using the number of edits.

Selection of corpus:

No additional suffix: the corpus in which background words have been processed is used.

_B: the unprocessed test corpus containing background words is used.

For example, TMH_w means that faithfulness is calculated at character granularity and that the final evaluation method includes only the faithfulness dimension. TMH_a_3_E means that faithfulness is calculated at word granularity after segmentation by an automatic word segmentation tool, fluency is evaluated with the 3-gram based method, colloquiality is evaluated with the number of edits, and the final evaluation method includes the three dimensions of faithfulness, fluency and colloquiality.
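The suffix scheme lends itself to a small parser, which can be handy when tabulating results per configuration. The sketch below assumes that suffixes are joined with underscores exactly as in TMH_a_3_E and that the corpus marker appears as a final B; the function name and the returned dictionary layout are hypothetical.

```python
def parse_method_name(name):
    """Split a TMH method name into its per-dimension settings."""
    parts = name.split("_")
    if parts[0] != "TMH":
        raise ValueError(f"not a TMH method name: {name!r}")
    config = {
        "faithfulness": None,       # 'a', 'h' or 'w'
        "fluency": None,            # 'l' or '1'..'4'; None if the dimension is unused
        "colloquiality": False,     # True when the edit-count method (E) is used
        "background_words": False,  # True when the unprocessed corpus (B) is used
    }
    for suffix in parts[1:]:
        if suffix in ("a", "h", "w"):
            config["faithfulness"] = suffix
        elif suffix in ("l", "1", "2", "3", "4"):
            config["fluency"] = suffix
        elif suffix == "E":
            config["colloquiality"] = True
        elif suffix == "B":
            config["background_words"] = True
        else:
            raise ValueError(f"unknown suffix {suffix!r} in {name!r}")
    return config


print(parse_method_name("TMH_a_3_E"))
print(parse_method_name("TMH_w"))
```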

Compared with using characters as the basic granularity for faithfulness, using words as the basic granularity yields results that correlate more highly with human scores, whether evaluation is performed on the faithfulness dimension alone or together with the two dimensions of fluency and colloquiality. Once the colloquiality dimension is added to the evaluation method, the correlation between the evaluation results and human scores improves greatly regardless of which method is used to evaluate fluency.

Example 2

One embodiment of the present application provides a system for automatically evaluating machine translation results of classical Chinese which, as shown in FIG. 2, comprises:

Data set construction module 10: configured to screen out a plurality of sentences from an ancient text and modern text parallel corpus by random and manual selection to construct a data set to be evaluated.

Preprocessing module 20: configured to perform preprocessing and word segmentation on the data set to be evaluated.

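In one embodiment of the present application, the preprocessing module 20 is further configured to count the segmented words, to segment the text by manual word segmentation, by automatic word segmentation and at character granularity respectively, and to compare the differences in effect among the three.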

Test set construction module 30: configured to acquire the original text and the reference translation of the training corpus from the data set to be evaluated, translate the original text with the classical Chinese machine translation model to be evaluated, compare the resulting target translation with the reference translation and the original text, manually analyze the target translation to judge the quality of the translation produced by the model, and construct a test data set in the form of (original text, reference translation, target translation, manual score) entries.

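In one embodiment of the present application, the test set construction module 30 is further configured to treat the components of the reference translation that were supplemented from discourse-level context as background words, which are not used as a reference when evaluating the target translation.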

Scoring module 40: configured to calculate the similarity between the reference translation and the target translation by comparing them, calculate scores for the three dimensions of faithfulness, fluency and colloquiality of the target translation according to the test data set, calculate the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically score the target translation.

In one embodiment of the present application, scoring module 40 further comprises:

calculating the similarity score $\mathrm{Sim}_{s}$ of the corresponding sense origins (sememes) of the target translation and the reference translation, and determining the maximum similarity according to the following formula:

；
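A minimal sketch of the similarity step, assuming that Sim_s is a weighted combination of a structural similarity and a DEF similarity (as the weight parameters β_struct and β_DEF in the claims suggest) and that the maximum similarity is the best score of one concept against all concepts on the other side. The two toy similarity functions, the default weights and all function names are hypothetical stand-ins; no real OpenHowNet call is used.

```python
def combined_similarity(s1, s2, struct_sim, def_sim, beta_struct=0.5, beta_def=0.5):
    """Weighted combination of two similarity functions (assumed form of Sim_s)."""
    return beta_struct * struct_sim(s1, s2) + beta_def * def_sim(s1, s2)


def max_similarity(concept, candidates, sim):
    """Best similarity of one concept against every candidate concept (0.0 if none)."""
    return max((sim(concept, c) for c in candidates), default=0.0)


# Toy stand-ins for the two similarity functions (hypothetical, for illustration only).
def toy_struct_sim(a, b):
    return 1.0 if a == b else 0.0


def toy_def_sim(a, b):
    # Character-level Jaccard overlap as a placeholder similarity.
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def sim_s(a, b):
    return combined_similarity(a, b, toy_struct_sim, toy_def_sim)


print(max_similarity("ab", ["ac", "ab"], sim_s))
```

Passing the similarity functions in as parameters keeps the sketch independent of any particular lexical resource.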

Loyalty calculation unit: calculating the pairwise similarity between the words of the reference translation and the words of the target translation; setting a threshold for judging synonyms, and treating word pairs whose similarity is greater than or equal to the threshold as synonyms; then calculating the matching score $\mathrm{score}_{\mathrm{ref}}$ of the reference translation and the matching score $\mathrm{score}_{\mathrm{tar}}$ of the target translation respectively; and using the recall rate as the basis for judging whether information from the original text has been lost and for calculating the loyalty, where the loyalty is calculated by the following formula:

；
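The loyalty steps described above can be sketched as follows. The claims give score_match = min(score_ref, score_tar); dividing by the number of reference words to obtain a recall-style loyalty value is an assumption, as are the threshold of 0.8, the exact-match similarity function and all function names.

```python
def directional_score(words_a, words_b, sim, threshold):
    """Sum, over words_a, of each word's best similarity in words_b,
    counting only matches at or above the synonym threshold."""
    total = 0.0
    for word in words_a:
        best = max((sim(word, other) for other in words_b), default=0.0)
        if best >= threshold:
            total += best
    return total


def loyalty(reference, target, sim, threshold=0.8):
    """Recall-style loyalty: matched mass divided by reference length (assumed normalization)."""
    if not reference:
        return 0.0
    score_ref = directional_score(reference, target, sim, threshold)
    score_tar = directional_score(target, reference, sim, threshold)
    score_match = min(score_ref, score_tar)  # as stated in the claims
    return score_match / len(reference)


def exact_match(a, b):
    # Hypothetical similarity function: 1.0 for identical words, else 0.0.
    return 1.0 if a == b else 0.0


reference = ["a", "b", "c", "d"]
target = ["a", "b", "x"]
print(loyalty(reference, target, exact_match))
```

Under this reading, words present in the reference but missing from the target lower the score, which matches the stated purpose of detecting lost source information.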

wherein Pen is the penalty term used in calculating fluency, "#chunks in target sentence" is the number of longest continuous subsequences calculated by dynamic programming, "#words in target sentence" is the number of sense origins in the sentence, $\beta$ is the penalty coefficient, and $\gamma$ is the penalty exponent.
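A minimal sketch of the fluency penalty, using the claimed form score_liu = 1 - β·Pen^γ. Two points are assumptions: Pen is taken to be the ratio of chunks to words, and chunks are counted with a simple left-to-right scan over a given alignment rather than the dynamic programming the text mentions. The default values of β and γ and the function names are hypothetical.

```python
def count_chunks(alignment):
    """Count maximal runs of target words aligned to consecutive reference positions.

    alignment[i] is the reference position matched by target word i, or None if unmatched.
    """
    chunks = 0
    prev = None
    for pos in alignment:
        if pos is None:
            prev = None  # an unmatched word breaks the current run
            continue
        if prev is None or pos != prev + 1:
            chunks += 1  # a new run starts here
        prev = pos
    return chunks


def fluency(alignment, beta=0.5, gamma=3.0):
    """score_liu = 1 - beta * Pen ** gamma, with Pen = chunks / words (assumed)."""
    n_words = len(alignment)
    if n_words == 0:
        return 0.0
    pen = count_chunks(alignment) / n_words
    return 1.0 - beta * pen ** gamma


# Six target words: three aligned in order, one unmatched, two aligned in order.
alignment = [0, 1, 2, None, 5, 6]
print(count_chunks(alignment), fluency(alignment))
```

Fewer, longer chunks mean the target follows the reference word order more closely, so the penalty shrinks and the fluency score approaches 1.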

；

where Score is the final total score.

In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," "one particular embodiment," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present application and do not limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, and that some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method for automatically evaluating the results of machine translation of a document, comprising:

step S104: calculating the similarity of the reference translation and the target translation by comparing the reference translation with the target translation, calculating scores of three dimensions of loyalty, fluency and popularity of the target translation according to the test data set, calculating average scores, errors and correlations of the target translation according to the scores of the three dimensions, and automatically scoring the target translation;

$$\mathrm{score}_{\mathrm{match}} = \min(\mathrm{score}_{\mathrm{ref}},\ \mathrm{score}_{\mathrm{tar}})$$

wherein $\mathrm{sim}_{\max}$ is the maximum similarity, calculated from the sense origins, taken in the reference translation or the target translation; $S_{i}$ and $S_{j}$ respectively represent the concepts to be evaluated; $N_{r}$ is the number of words in the reference translation; $N_{t}$ is the number of words in the target translation; $\mathrm{score}_{\mathrm{ref}}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation; $\mathrm{score}_{\mathrm{tar}}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation; $\mathrm{score}_{\mathrm{match}}$ is the similarity score of the reference translation and the target translation; and $\mathrm{score}_{\mathrm{zhong}}$ is the loyalty score;

$$\mathrm{score}_{\mathrm{liu}} = 1 - \beta\,\mathrm{Pen}^{\gamma}$$

wherein Pen is the penalty term used in calculating fluency, "#chunks in target sentence" is the number of longest continuous subsequences calculated by dynamic programming, "#words in target sentence" is the number of sense origins in the sentence, $\beta$ is the penalty coefficient, and $\gamma$ is the penalty exponent;

step S1043: the popularity is calculated by the following formula:

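$$\mathrm{edit}_{\mathrm{tar}} = \#\mathrm{insertion}_{\mathrm{tar}} + \#\mathrm{substitution}_{\mathrm{tar}} + \#\mathrm{deletion}_{\mathrm{tar}}$$

$$\mathrm{edit}_{\mathrm{ref}} = \#\mathrm{insertion}_{\mathrm{ref}} + \#\mathrm{substitution}_{\mathrm{ref}} + \#\mathrm{deletion}_{\mathrm{ref}}$$

$$\mathrm{score}_{\mathrm{tong}} = \max(0,\ 0.4 - \mathrm{score}_{\mathrm{con}})$$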

2. The method for automatically evaluating the machine translation result of a text according to claim 1, wherein said step S102 further comprises:

3. The method for automatically evaluating the machine translation result of a text according to claim 1, wherein said step S103 further comprises:

4. The method for automatically evaluating machine translation results of a text according to claim 1, wherein said similarity is determined according to the following formula:

wherein $\mathrm{Sim}_{s}$ represents the similarity score of the target translation and the reference translation; $S_{1}$ and $S_{2}$ respectively represent the two concepts to be evaluated; StructSim represents the structural similarity calculation function called by OpenHowNet; $\mathrm{Sim}_{\mathrm{DEF}}$ represents the semblance similarity calculation function called by OpenHowNet; $\beta_{\mathrm{struct}}$ is the weight parameter of the StructSim function; and $\beta_{\mathrm{DEF}}$ is the weight parameter of the $\mathrm{Sim}_{\mathrm{DEF}}$ function.

5. A system for automatically evaluating the results of machine translation of a document, comprising:

a data set construction module: used for selecting a plurality of sentences, through a combination of random and manual selection, from a parallel corpus pairing classical Chinese with modern Chinese, so as to construct the data set to be evaluated;

and a scoring module: used for calculating the similarity between the reference translation and the target translation by comparing the two; calculating scores for the target translation in three dimensions (loyalty, fluency and popularity) according to the test data set; calculating the average score, error and correlation of the target translation from the three dimension scores; and automatically scoring the target translation;

$$\mathrm{score}_{\mathrm{match}} = \min(\mathrm{score}_{\mathrm{ref}},\ \mathrm{score}_{\mathrm{tar}})$$

wherein $\mathrm{sim}_{\max}$ is the maximum similarity, calculated from the sense origins, taken in the reference translation or the target translation; $S_{i}$ and $S_{j}$ respectively represent the concepts to be evaluated; $N_{r}$ is the number of words in the reference translation; $N_{t}$ is the number of words in the target translation; $\mathrm{score}_{\mathrm{ref}}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation; $\mathrm{score}_{\mathrm{tar}}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation; $\mathrm{score}_{\mathrm{match}}$ is the similarity score of the reference translation and the target translation; and $\mathrm{score}_{\mathrm{zhong}}$ is the loyalty score;

$$\mathrm{score}_{\mathrm{liu}} = 1 - \beta\,\mathrm{Pen}^{\gamma}$$

wherein Pen is the penalty term used in calculating fluency, "#chunks in target sentence" is the number of longest continuous subsequences calculated by dynamic programming, "#words in target sentence" is the number of sense origins in the sentence, $\beta$ is the penalty coefficient, and $\gamma$ is the penalty exponent;

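$$\mathrm{edit}_{\mathrm{tar}} = \#\mathrm{insertion}_{\mathrm{tar}} + \#\mathrm{substitution}_{\mathrm{tar}} + \#\mathrm{deletion}_{\mathrm{tar}}$$

$$\mathrm{edit}_{\mathrm{ref}} = \#\mathrm{insertion}_{\mathrm{ref}} + \#\mathrm{substitution}_{\mathrm{ref}} + \#\mathrm{deletion}_{\mathrm{ref}}$$

$$\mathrm{score}_{\mathrm{tong}} = \max(0,\ 0.4 - \mathrm{score}_{\mathrm{con}})$$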

6. The system for automatically evaluating machine translation results of a document according to claim 5, wherein said preprocessing module further comprises:

7. The system for automatically evaluating machine translation results of a document according to claim 5, wherein said test set construction module further comprises:

8. The system for automatically evaluating machine translation results of a document according to claim 5, wherein said similarity is determined according to the following formula:

wherein $\mathrm{Sim}_{s}$ represents the similarity score of the target translation and the reference translation; $S_{1}$ and $S_{2}$ respectively represent the concepts to be compared; StructSim represents the structural similarity calculation function called by OpenHowNet; $\mathrm{Sim}_{\mathrm{DEF}}$ represents the semblance similarity calculation function called by OpenHowNet; $\beta_{\mathrm{struct}}$ is the weight parameter of the StructSim function; and $\beta_{\mathrm{DEF}}$ is the weight parameter of the $\mathrm{Sim}_{\mathrm{DEF}}$ function.