
CN116701961B - Method and system for automatically evaluating machine translation result of cultural relics - Google Patents


Info

Publication number
CN116701961B
Authority
CN
China
Prior art keywords
translation
score
target
similarity
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310973916.4A
Other languages
Chinese (zh)
Other versions
CN116701961A (en)
Inventor
李炜
邵艳秋
董立成
申资卓
杜彦融
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Original Assignee
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority to CN202310973916.4A
Publication of CN116701961A
Application granted
Publication of CN116701961B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of natural language processing and discloses a method and a system for automatically evaluating classical Chinese machine translation results. The method comprises the following steps: constructing a data set to be evaluated; performing preprocessing and word segmentation on the data set to be evaluated; acquiring the original text and the reference translation of a training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation, and constructing a test data set in the form original text - reference translation - target translation - manual score; calculating the similarity between the reference translation and the target translation, calculating scores for the target translation in three dimensions (faithfulness, fluency and colloquiality) on the test data set, calculating the average score, error and correlation of the target translation, and automatically scoring the target translation; and calculating the total score by assigning weights to faithfulness and fluency.

Description

Method and system for automatically evaluating machine translation result of cultural relics
Technical Field
The application relates to the technical field of natural language processing, and in particular to a method and a system for automatically evaluating classical Chinese machine translation results.
Background
The earliest translation was done manually, but because manual translation is inefficient and labor-intensive, and with the advent of computers and the development of computer technology, machine translation has been increasingly favored. Its high efficiency and low cost compared with manual translation have led to its rapid development. China has a vast body of ancient documents, and because the grammar and word senses of ancient Chinese differ from those of modern Chinese, most modern readers have difficulty studying ancient Chinese in depth. Translating ancient texts into modern Chinese that is widely understood meets the needs of most people, and machine translation can be used to translate classical Chinese texts.
However, constructing automatic evaluation methods for ancient Chinese machine translation still faces many problems. Existing machine translation and automatic evaluation methods mainly target translation between two different languages, while methods for translating ancient Chinese into modern Chinese and for evaluating the results are relatively lacking. In addition, translation from ancient Chinese to modern Chinese is not the same as other cross-language translation. Modern Chinese evolved from ancient Chinese, so the two are similar, and some information in ancient Chinese can be carried over directly into modern Chinese. Nevertheless, compared with translation between modern languages, ancient Chinese translation has three distinct features: first, words for concepts that have disappeared need to be output as they are, so the accuracy of word segmentation is very important; second, the boundary between ancient text and modern text is somewhat fuzzy, but the application expects the translation model to translate the ancient text fully and to output as little untranslated ancient text as possible; third, ancient text is highly condensed and its sentences contain many omissions, so reference translations are often supplemented with additional content, and this supplemented background-word information needs to be taken into account when evaluating machine translation results.
Evaluation methods for ancient text machine translation include manual evaluation and automatic evaluation. Manual evaluation reflects the quality of machine translation more accurately, but it has several drawbacks: it is inefficient, it incurs significant time and economic costs, its results cannot be reused, and it is susceptible to subjective factors. Automatic evaluation can assess a large number of translation results in a short time using only a fixed algorithm and can give timely feedback on the translation quality of the model, so it has advantages over manual evaluation. Therefore, how to optimize automatic evaluation by taking the characteristics of classical Chinese translation into account, so that it can be better applied to this task, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
To better guide the iterative optimization of ancient text machine translation models and to help people judge the strengths and weaknesses of the current model, the application constructs an ancient text - reference translation - target translation - manual score data set based on the three dimensions of faithfulness, fluency and colloquiality, and identifies a more suitable evaluation method for each dimension, so that the quality of ancient text machine translation results can be evaluated intuitively and the translation model can be optimized iteratively. The method comprises:
Step S101: screening out a plurality of sentences from an ancient-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated;
Step S102: performing preprocessing and word segmentation on the data set to be evaluated;
Step S103: acquiring the original text and the reference translation of a training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation, judging the quality of the translation produced by the model, and constructing a test data set in the form original text - reference translation - target translation - manual score;
Step S104: calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the target translation in the three dimensions of faithfulness, fluency and colloquiality on the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation.
In some embodiments, step S102 further includes:
counting the words obtained by word segmentation, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three segmentation approaches.
In some embodiments, step S103 further includes:
processing the components that the reference translation supplements from discourse-level (chapter) context as background words, which are not used as references for evaluating the target translation.
In some embodiments, step S104 further includes:
Step S1041: calculating the pairwise similarity of the words in the reference translation and the target translation, setting a threshold for judging near-synonyms, and treating word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculating the matching score score_ref of the reference translation and the matching score score_tar of the target translation respectively; and using recall as the basis for judging whether information from the original text has been lost, and calculating the faithfulness, wherein the faithfulness is calculated according to the following formula:
where sim_max is the maximum similarity, calculated from sememes, obtained for a word in the reference translation or the target translation, N_r is the number of words in the reference translation, N_t is the number of words in the target translation, score_ref is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, score_tar is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, score_match is the similarity score of the reference translation and the target translation, and score_zhong is the faithfulness score;
Step S1042: calculating the fluency of the classical Chinese translation from the longest continuous common subsequences, wherein the fluency is calculated according to the following formula:
where Pen is the penalty term used in calculating fluency, "# chunks in target sentence" is the number of longest continuous subsequences found by dynamic programming, "# words in target sentence" is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent;
Step S1043: calculating the colloquiality according to the following formula:
where #insertion_tar and #insertion_ref are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, #substitution_tar and #substitution_ref are the corresponding numbers of words to be replaced, #deletion_tar and #deletion_ref are the corresponding numbers of words to be deleted, edit_tar is the number of edits between the target translation and the original text, edit_ref is the number of edits between the reference translation and the original text, score_con is the ratio between the edit counts of the reference translation and the target translation, and score_tong is the colloquiality score.
In some embodiments, step S104 further includes: the similarity is determined according to the following formula:
where Sim_s represents the similarity score of the target translation and the reference translation, S_1 and S_2 represent the two concepts to be compared, StructSim is the structural similarity function called via OpenHowNet, Sim_DEF is the sememe similarity function called via OpenHowNet, β_struct is the weight parameter of the StructSim function, and β_DEF is the weight parameter of the Sim_DEF function.
To achieve the above object, the present application further provides a system for automatically evaluating classical Chinese machine translation results, comprising:
a data construction module, configured to screen out a plurality of sentences from an ancient-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated;
a preprocessing module, configured to perform preprocessing and word segmentation on the data set to be evaluated;
a test set construction module, configured to acquire the original text and the reference translation of a training corpus from the data set to be evaluated, translate the original text with the classical Chinese machine translation model to be evaluated, compare the resulting target translation with the reference translation and the original text, manually analyze the target translation, judge the quality of the translation produced by the model, and construct a test data set in the form original text - reference translation - target translation - manual score;
a scoring module, configured to calculate the similarity between the reference translation and the target translation by comparing them, calculate scores for the target translation in the three dimensions of faithfulness, fluency and colloquiality on the test data set, calculate the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically score the target translation.
In some embodiments, the preprocessing module further comprises:
counting the words obtained by word segmentation, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three segmentation approaches.
In some embodiments, the test set construction module further comprises:
processing the components that the reference translation supplements from discourse-level (chapter) context as background words, which are not used as references for evaluating the target translation.
In some embodiments, the scoring module further comprises:
a faithfulness calculation unit, configured to calculate the pairwise similarity of the words in the reference translation and the target translation, set a threshold for judging near-synonyms, and treat word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculate the matching score score_ref of the reference translation and the matching score score_tar of the target translation respectively; and use recall as the basis for judging whether information from the original text has been lost, and calculate the faithfulness, wherein the faithfulness is calculated according to the following formula:
where sim_max is the maximum similarity, calculated from sememes, obtained for a word in the reference translation or the target translation, N_r is the number of words in the reference translation, N_t is the number of words in the target translation, score_ref is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, score_tar is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, score_match is the similarity score of the reference translation and the target translation, and score_zhong is the faithfulness score;
a fluency calculation unit, configured to calculate the fluency of the classical Chinese translation from the longest continuous common subsequences, wherein the fluency is calculated according to the following formula:
where Pen is the penalty term used in calculating fluency, "# chunks in target sentence" is the number of longest continuous subsequences found by dynamic programming, "# words in target sentence" is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent;
a colloquiality calculation unit, configured to calculate the colloquiality according to the following formula:
where #insertion_tar and #insertion_ref are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, #substitution_tar and #substitution_ref are the corresponding numbers of words to be replaced, #deletion_tar and #deletion_ref are the corresponding numbers of words to be deleted, edit_tar is the number of edits between the target translation and the original text, edit_ref is the number of edits between the reference translation and the original text, score_con is the ratio between the edit counts of the reference translation and the target translation, and score_tong is the colloquiality score.
In some embodiments, the similarity is determined according to the following formula:
where Sim_s represents the similarity score of the target translation and the reference translation, S_1 and S_2 represent the two concepts to be compared, StructSim is the structural similarity function called via OpenHowNet, Sim_DEF is the sememe similarity function called via OpenHowNet, β_struct is the weight parameter of the StructSim function, and β_DEF is the weight parameter of the Sim_DEF function.
The beneficial effects of the above technical solution are as follows:
(1) For the target translations of ancient text output by machine translation, the application integrates word sense information into the evaluation and, beyond the traditional evaluation dimensions of faithfulness (whether the translation result is faithful to the information in the original text) and fluency (whether the target translation is fluent, natural and consistent with the expression habits of the target language), adds the dimension of colloquiality (whether the translation fully translates the original text), thereby improving the effectiveness of the evaluation method.
(2) Compared with manual word segmentation, automatic word segmentation is finer-grained, so that many items that could form words are split into single characters, and the proportion of words not registered in HowNet is greatly reduced. Automatic word segmentation therefore yields more word sense information, so that most words have a word sense available for comparison when faithfulness is calculated, and the evaluation results correlate more highly with human scores.
(3) Compared with using words as the basic granularity for faithfulness, using characters as the basic granularity yields results that correlate more highly with human scores, whether faithfulness is evaluated as a single dimension or together with the fluency and colloquiality dimensions. In addition, once the colloquiality dimension is added to the evaluation method, the correlation between the evaluation results and human scores improves greatly regardless of which method is used to evaluate fluency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for automatically evaluating classical Chinese machine translation results according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system for automatically evaluating classical Chinese machine translation results according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar symbols indicate like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
Example 1
One embodiment of the present application provides a method for automatically evaluating classical Chinese machine translation results, referring to FIG. 1, including:
Step S101: screening out a plurality of sentences from an ancient-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated.
Step S102: performing preprocessing and word segmentation on the data set to be evaluated.
In a specific embodiment of the present application, step S102 further includes:
counting the words obtained by word segmentation, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three segmentation approaches.
Step S103: acquiring the original text and the reference translation of a training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation, judging the quality of the translation produced by the model, and constructing a test data set in the form original text - reference translation - target translation - manual score.
In a specific embodiment of the present application, step S103 further includes:
processing the components that the reference translation supplements from discourse-level (chapter) context as background words, which are not used as references for evaluating the target translation.
Step S104: calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the target translation in the three dimensions of faithfulness, fluency and colloquiality on the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation.
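The average score, error and correlation named in step S104 can be illustrated with a short sketch. The text does not state which error measure or correlation coefficient is used, so the mean absolute error and the Pearson coefficient below are assumptions, and the function name `compare_with_human` is hypothetical.

```python
import math


def compare_with_human(auto_scores, human_scores):
    """Return (mean automatic score, mean absolute error, Pearson r).

    Assumption: "error" is the mean absolute error and "correlation" is
    the Pearson coefficient; the source text does not specify either.
    """
    n = len(auto_scores)
    if n == 0 or n != len(human_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    mean_auto = sum(auto_scores) / n
    mean_human = sum(human_scores) / n
    # Average absolute gap between automatic and human scores.
    mae = sum(abs(a - h) for a, h in zip(auto_scores, human_scores)) / n
    # Pearson correlation computed from centered sums.
    cov = sum((a - mean_auto) * (h - mean_human)
              for a, h in zip(auto_scores, human_scores))
    var_auto = sum((a - mean_auto) ** 2 for a in auto_scores)
    var_human = sum((h - mean_human) ** 2 for h in human_scores)
    denom = math.sqrt(var_auto * var_human)
    r = cov / denom if denom else 0.0
    return mean_auto, mae, r
```

A rank correlation such as Spearman's could be substituted if human scores are ordinal; the choice here is only for illustration.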
In a specific embodiment of the present application, step S104 further includes:
calculating the similarity score Sim_s of the sememes corresponding to the target translation and the reference translation, and determining the maximum similarity according to the following formula:
where Sim_s represents the similarity score of the target translation and the reference translation, and S_1 and S_2 represent the concepts to be compared. Each word may correspond to several senses; a concept is one of these senses, and each concept is decomposed into one or more sememes. StructSim is the structural similarity function called via OpenHowNet, Sim_DEF is the sememe similarity function called via OpenHowNet, β_struct is the weight parameter of the StructSim function, and β_DEF is the weight parameter of the Sim_DEF function.
Specifically, in actual calculation, the HowNet-based similarity of two words can be obtained with the open-source project OpenHowNet.
For example, for "The master must treat him well" (reference translation) and "The teacher should treat him well" (target translation), the words "teacher" and "master" obtain a similarity of 0.82 in HowNet.
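The similarity formula itself is not reproduced in this text. A minimal sketch, assuming it is a weighted sum of the two component functions and that the word-level score is the maximum over all concept pairs, might look as follows. The callables `struct_sim` and `def_sim` are hypothetical stand-ins supplied by the caller; they are not OpenHowNet API calls, and the weights have no default because the source gives no values.

```python
def concept_similarity(s1, s2, struct_sim, def_sim, beta_struct, beta_def):
    """Assumed form: Sim_s = beta_struct * StructSim + beta_DEF * Sim_DEF."""
    return beta_struct * struct_sim(s1, s2) + beta_def * def_sim(s1, s2)


def word_similarity(concepts_a, concepts_b, struct_sim, def_sim,
                    beta_struct, beta_def):
    """Maximum concept-pair similarity between two words.

    concepts_a / concepts_b: the candidate concepts (senses) of each word.
    Returns 0.0 when either word has no known concept.
    """
    best = 0.0
    for s1 in concepts_a:
        for s2 in concepts_b:
            score = concept_similarity(s1, s2, struct_sim, def_sim,
                                       beta_struct, beta_def)
            best = max(best, score)
    return best
```

Passing the component functions in as parameters keeps the sketch independent of any particular HowNet toolkit.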
In a specific embodiment of the present application, step S104 further includes:
Step S1041: calculating the pairwise similarity of the words in the reference translation and the target translation, setting a threshold for judging near-synonyms, and treating word pairs whose similarity is greater than or equal to the threshold as near-synonyms; then calculating the matching score score_ref of the reference translation and the matching score score_tar of the target translation respectively; and using recall as the basis for judging whether information from the original text has been lost, and calculating the faithfulness, wherein the faithfulness is calculated according to the following formula:
where sim_max is the maximum similarity obtained for a word in the reference translation or the target translation, N_r is the number of words in the reference translation, N_t is the number of words in the target translation, score_ref is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, score_tar is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, score_match is the similarity score of the reference translation and the target translation, and score_zhong is the faithfulness score.
Specifically, for "The master must treat him well" (reference translation) and "The teacher should treat him well" (target translation), the pairwise similarities of the words in the two sentences are calculated, for example:
the similarity of "master" and "teacher" is 0.82;
the similarity of "master" and "should" is 0.30;
the similarity of "master" and "well" is 0.25;
the similarity of "master" and "treat" is 0.24;
the similarity of "master" and "him" is 0.73;
According to this calculation, the highest similarity scores of the words in the reference translation and the target translation are respectively:
master: 0.82, certainly: 1, must: 1, treat well: 0.69, him: 1;
teacher: 0.82, should: 1, well: 0.56, treat: 0.69, him: 1;
A threshold for near-synonyms is set, for example 0.44. If a word's highest similarity score is greater than or equal to the threshold, that score is taken as the word's matching score; otherwise the matching score is 0. Assuming that every word has equal weight, for example:
the matching score of "The master must treat him well" is 0.82 + 1 + 1 + 0.69 + 1 = 4.51;
the matching score of "The teacher should treat him well" is 0.82 + 1 + 0.56 + 0.69 + 1 = 4.07;
the matching score of the two sentences is min(4.51, 4.07) = 4.07, and the faithfulness score is 4.07 / 5 = 0.81.
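The worked example above can be reproduced with a few lines of code. This is a minimal sketch of the faithfulness (loyalty) calculation, assuming the missing formula divides the smaller matching score by the number of reference words N_r (the "recall" reading; the example has five words on both sides, so it cannot confirm the denominator). The function names are hypothetical.

```python
def match_score(best_similarities, threshold):
    """Sum the per-word maximum similarities that reach the threshold.

    Scores below the near-synonym threshold contribute 0.
    """
    return sum(s for s in best_similarities if s >= threshold)


def faithfulness(ref_best, tar_best, threshold):
    """score_zhong = min(score_ref, score_tar) / N_r (assumed denominator)."""
    if not ref_best:
        raise ValueError("reference translation has no words")
    score_ref = match_score(ref_best, threshold)
    score_tar = match_score(tar_best, threshold)
    score_match = min(score_ref, score_tar)
    return score_match / len(ref_best)


# Per-word maximum similarities taken from the example in the text.
ref_best = [0.82, 1, 1, 0.69, 1]
tar_best = [0.82, 1, 0.56, 0.69, 1]
print(round(faithfulness(ref_best, tar_best, threshold=0.44), 2))
```

With the threshold 0.44 used in the text, every word counts as matched; a stricter threshold would zero out the 0.56 entry and lower the target-side matching score.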
Step S1042: calculating the fluency of the classical Chinese translation from the longest continuous common subsequences, wherein the fluency is calculated according to the following formula:
where Pen is the penalty term used in calculating fluency, "# chunks in target sentence" is the number of longest continuous subsequences found by dynamic programming, "# words in target sentence" is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent;
Specifically, in the above example, the number of chunks ("# chunks in target sentence") is 2 and the number of words is 5; assuming the hyperparameters β = 1 and γ = 0.5, the fluency score is calculated to be 0.37.
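The fluency example can be checked numerically. The sketch below assumes the penalty has the form Pen = β · (#chunks / #words)^γ and that the fluency score is 1 − Pen; with 2 chunks, 5 words, β = 1 and γ = 0.5 this gives 0.37, matching the figure in the text. The chunk counter is an additional assumption: it takes a hypothetical list of reference positions matched by the target words, in target order, and counts maximal runs of consecutive positions.

```python
def count_chunks(aligned_positions):
    """Count maximal runs of consecutive positions (assumed chunk definition)."""
    chunks = 0
    previous = None
    for pos in aligned_positions:
        # A new chunk starts whenever the position does not continue the run.
        if previous is None or pos != previous + 1:
            chunks += 1
        previous = pos
    return chunks


def fluency(num_chunks, num_words, beta=1.0, gamma=0.5):
    """Assumed form: score = 1 - beta * (num_chunks / num_words) ** gamma."""
    if num_words == 0:
        return 0.0
    penalty = beta * (num_chunks / num_words) ** gamma
    return 1.0 - penalty


print(round(fluency(2, 5), 2))
```

Fewer, longer chunks mean the target translation follows the reference word order more closely, which lowers the penalty.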
Step S1043: the popularity is calculated by the following formulas:
$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$
$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$
$score_{con} = \frac{edit_{tar}}{edit_{ref}}$
$score_{tong} = \max(0, 0.4 - score_{con})$
wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the numbers of words to be replaced when calculating the edit counts of the target translation and the reference translation respectively, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the numbers of words to be deleted when calculating the edit counts of the target translation and the reference translation respectively, $edit_{tar}$ is the number of edits between the original text and the target translation, $edit_{ref}$ is the number of edits between the original text and the reference translation, $score_{con}$ is the ratio of the target translation's edit count to the reference translation's edit count, and $score_{tong}$ is the popularity score.
Specifically, for example, suppose the classical Chinese original is "is still more so, the artist is at his best.", the reference translation is "but he comes again, the teacher and the father must stay on his own.", and the target translation obtained by machine translation of the classical text is "he is still better treated by the teacher." The number of edits from the classical text to the reference translation is 11 and the number of edits from the classical text to the target translation is 10, so $score_{con}$ = 10/11 ≈ 0.91. Because 0.4 − 10/11 is negative, the popularity score of this sentence is 0, and no under-translation penalty is deducted.
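The popularity calculation can be sketched as below. The sketch assumes that the edit count is the standard Levenshtein distance over sequences of tokens (the minimum total of insertions, substitutions and deletions) and that score_con is the target edit count divided by the reference edit count, as in the 10/11 example above. All function names are illustrative.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        cur = [i]
        for j, t in enumerate(target, start=1):
            cur.append(min(
                prev[j] + 1,             # delete s
                cur[j - 1] + 1,          # insert t
                prev[j - 1] + (s != t),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def popularity_score(edit_tar, edit_ref, cap=0.4):
    """score_tong = max(0, cap - edit_tar / edit_ref)."""
    if edit_ref <= 0:
        raise ValueError("reference edit count must be positive")
    score_con = edit_tar / edit_ref
    return max(0.0, cap - score_con)


print(edit_distance("kitten", "sitting"))  # 3
print(popularity_score(10, 11))            # 0.0, as in the worked example
```

Under this reading the score is non-zero only when the target translation needed fewer than 40% of the edits that the reference needed, that is, when the machine output stays very close to the original text.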
In one embodiment of the application, the total score is calculated by assigning weights to loyalty and fluency. Based on $score_{zhong}$, $score_{liu}$ and $score_{con}$, weights are set for $score_{zhong}$ and $score_{liu}$ respectively, with the weights summing to 1, and $score_{con}$ enters the total score Score as a subtractive term; if the result is negative, the total score is set to 0.
Specifically, the total score should take the scores of all three dimensions into account. Weights are set for loyalty and fluency respectively, summing to 1, and popularity participates in the scoring as a penalty term. To prevent the total score from taking a negative value, if the final score after combining the three dimensions is negative, the total score is set to 0. For example, with loyalty and fluency weights of 0.8 and 0.2, the calculation formula is as follows:
where Score is the final total score.
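A hedged sketch of the weighted total follows. The text names $score_{con}$ as the subtracted term while also describing popularity as the penalty, so the sketch leaves that choice to the caller through a generic `penalty` argument. The function name and defaults are illustrative; only the weights 0.8 and 0.2 and the clamp at 0 come from the text.

```python
def total_score(score_zhong, score_liu, penalty, w_zhong=0.8, w_liu=0.2):
    """Weighted loyalty and fluency minus a penalty, clamped at 0."""
    if abs(w_zhong + w_liu - 1.0) > 1e-9:
        raise ValueError("loyalty and fluency weights must sum to 1")
    raw = w_zhong * score_zhong + w_liu * score_liu - penalty
    return max(0.0, raw)


# Loyalty 0.81 and fluency 0.37 are the values from the worked examples;
# the penalty of 0 corresponds to a popularity score of 0.
print(round(total_score(0.81, 0.37, penalty=0.0), 3))  # 0.722
print(total_score(0.1, 0.1, penalty=0.5))              # 0.0 (clamped)
```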
Testing showed that unprocessed background words interfere with the assessment of loyalty and reduce its correlation with human scores, which in turn lowers the correlation with human scores when all dimensions are evaluated together. Removing background words therefore proves effective when evaluating machine translation of classical Chinese. The results are shown in Table 1, Table 2 and Table 3 below:
Table 1. Correlation between evaluation methods using word granularity and human evaluation results
Table 2. Statistics of the test corpus after word segmentation
Table 3. Effect of the evaluation method of the application on the corpus with unprocessed background words
Remarks: naming of the evaluation methods in Tables 1-3:
TMH (Traditional evaluation Method based on HowNet) refers to the evaluation method comprising HowNet-based loyalty, fluency and popularity. The different methods adopted within each evaluation dimension are indicated by suffixes, as follows:
Methods within loyalty:
_a: evaluation performed on word segmentation produced by an automatic word segmentation tool.
_h: evaluation performed on manual word segmentation.
_w: evaluation performed at character granularity.
Methods within fluency:
_l: fluency evaluation using the longest common subsequence.
_1: fluency evaluation based on 1-grams.
_2: fluency evaluation based on 2-grams.
_3: fluency evaluation based on 3-grams.
_4: fluency evaluation based on 4-grams.
Method for popularity:
_E: the number of edits is used to evaluate popularity.
Choice of corpus:
no additional suffix: the corpus in which background words have been processed is used.
_B: the unprocessed test corpus containing background words is used.
For example: TMH_w computes loyalty at character granularity, and the final evaluation method includes only the loyalty dimension. TMH_a_3_E computes loyalty at word granularity after segmentation with an automatic word segmentation tool, evaluates fluency with the 3-gram based method, and evaluates popularity with the number of edits; the final evaluation method includes the three dimensions of loyalty, fluency and popularity.
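The naming scheme above can be decoded mechanically. The sketch below is illustrative only: it assumes every suffix is introduced by an underscore, as in TMH_a_3_E, and it reads the loyalty suffix w as character granularity, which is an assumed reading. The dictionary keys and the function name are hypothetical.

```python
LOYALTY = {
    "a": "automatic word segmentation",
    "h": "manual word segmentation",
    "w": "character granularity",  # assumed reading of the 'w' suffix
}
FLUENCY = {
    "l": "longest common subsequence",
    "1": "1-gram", "2": "2-gram", "3": "3-gram", "4": "4-gram",
}


def parse_method_name(name):
    """Decode a name such as 'TMH_a_3_E' into its evaluation settings."""
    parts = name.split("_")
    if parts[0] != "TMH":
        raise ValueError(f"not a TMH method name: {name!r}")
    info = {
        "loyalty": None,
        "fluency": None,
        "popularity": None,
        "background_words_processed": True,  # default when there is no _B suffix
    }
    for suffix in parts[1:]:
        if suffix in LOYALTY:
            info["loyalty"] = LOYALTY[suffix]
        elif suffix in FLUENCY:
            info["fluency"] = FLUENCY[suffix]
        elif suffix == "E":
            info["popularity"] = "number of edits"
        elif suffix == "B":
            info["background_words_processed"] = False
        else:
            raise ValueError(f"unknown suffix {suffix!r} in {name!r}")
    return info


print(parse_method_name("TMH_a_3_E"))
```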
Compared with using single characters as the basic granularity for loyalty, using words as the basic granularity yields results that correlate more highly with human scoring, whether the evaluation covers the loyalty dimension alone or also includes the two dimensions of fluency and popularity. Once the popularity dimension is added to the evaluation method, the correlation between the evaluation results and the human scores improves greatly, regardless of which method is used to evaluate fluency.
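The comparison with human scoring described above could be computed along the following lines. This is a sketch under assumptions: the text does not say which correlation coefficient or error measure is used, so Pearson correlation and mean absolute error are chosen here for illustration, and the score lists are made-up placeholders rather than data from the patent.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_absolute_error(auto_scores, human_scores):
    """Average absolute gap between automatic and human scores."""
    return mean([abs(a - h) for a, h in zip(auto_scores, human_scores)])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for four sentences (hypothetical values).
auto = [0.72, 0.40, 0.90, 0.15]
human = [0.80, 0.35, 0.95, 0.10]
print(mean(auto), mean_absolute_error(auto, human), pearson(auto, human))
```

If human scores are given on a different scale from the automatic scores, they would need to be rescaled before the error is meaningful; the correlation is unaffected by such rescaling.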
Example two
One embodiment of the present application provides a system for automatically evaluating machine translation results of classical Chinese, as shown in Fig. 2, comprising:
Data set construction module 10: used for selecting a plurality of sentences from a classical-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated.
Preprocessing module 20: used for performing preprocessing and word segmentation on the data set to be evaluated.
In one embodiment of the present application, the preprocessing module 20 is further used for:
collecting statistics on the segmented words, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three.
Test set construction module 30: used for acquiring the original text and the reference translation of the training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation to judge the quality of the translation produced by the classical Chinese machine translation model, and constructing a test data set in the format original text - reference translation - target translation - manual score.
In one embodiment of the present application, the test set construction module 30 is further used for:
treating the components of the reference translation that were supplemented from discourse-level information as background words, which are not used as references for evaluating the target translation.
Scoring module 40: used for calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the target translation in the three dimensions of loyalty, fluency and popularity based on the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation.
In one embodiment of the present application, the scoring module 40 is further used for:
calculating the similarity score $Sim_s$ of the corresponding sememes of the target translation and the reference translation, and determining the maximum similarity according to the following formula:
wherein $Sim_s$ represents the similarity score between the target translation and the reference translation, $S_1$ and $S_2$ respectively represent the concepts to be compared, StructSim represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the DEF-based similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the StructSim function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.
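Since the description above speaks of two similarity functions, each with its own weight parameter, one plausible reading is a weighted sum. The sketch below illustrates that reading only; the combination rule is an assumption, and `toy_struct_sim` and `toy_def_sim` are made-up stand-ins, not OpenHowNet calls. A real system would pass its own HowNet-based functions in their place.

```python
def combined_similarity(s1, s2, struct_sim, def_sim, beta_struct, beta_def):
    """Weighted sum of two similarity functions (assumed combination rule)."""
    return beta_struct * struct_sim(s1, s2) + beta_def * def_sim(s1, s2)


def max_similarity(word, other_words, sim):
    """Highest similarity between `word` and any word of the other sentence."""
    return max((sim(word, other) for other in other_words), default=0.0)


# Toy stand-ins for the two similarity functions (hypothetical).
def toy_struct_sim(a, b):
    return 1.0 if a == b else 0.0


def toy_def_sim(a, b):
    return 1.0 if a[0] == b[0] else 0.0


def sim(a, b):
    # Equal weights of 0.5 are placeholders, not values from the patent.
    return combined_similarity(a, b, toy_struct_sim, toy_def_sim, 0.5, 0.5)


print(max_similarity("teacher", ["teach", "father"], sim))  # 0.5
```

The per-word maxima returned by `max_similarity` are the quantities that the loyalty calculation sums over each sentence.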
In one embodiment of the present application, the scoring module 40 further comprises:
Loyalty calculation unit: calculating the pairwise similarity between the words of the reference translation and the target translation, setting a threshold for judging synonyms, and treating word pairs whose similarity is higher than or equal to the threshold as near-synonyms; then calculating the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and using recall as the basis for judging whether information in the original text is lost and for calculating the loyalty, wherein the loyalty is calculated according to the following formulas:
$score_{ref} = \sum_{i=1}^{N_r} sim_{max}(S_i)$
$score_{tar} = \sum_{j=1}^{N_t} sim_{max}(S_j)$
$score_{match} = \min(score_{ref}, score_{tar})$
$score_{zhong} = \frac{score_{match}}{N_r}$
wherein $sim_{max}$ is the maximum similarity calculated for a word in the reference translation or the target translation, $S_i$ and $S_j$ denote words of the reference translation and the target translation respectively, $N_r$ is the number of words in the reference translation, $N_t$ is the number of words in the target translation, $score_{ref}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, $score_{tar}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, $score_{match}$ is the similarity score between the reference translation and the target translation, and $score_{zhong}$ is the loyalty score.
Fluency calculation unit: calculating the fluency of the classical Chinese translation through the longest continuous common subsequence, wherein the fluency is calculated according to the following formulas:
$Pen = \frac{\#\text{chunks in target sentence}}{\#\text{words in target sentence}}$
$score_{liu} = 1 - \beta \, Pen^{\gamma}$
wherein Pen is the penalty term in the fluency calculation, #chunks in target sentence is the number of longest continuous common subsequences found by dynamic programming, #words in target sentence is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent.
Popularity calculation unit: the popularity is calculated by the following formulas:
$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$
$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$
$score_{con} = \frac{edit_{tar}}{edit_{ref}}$
$score_{tong} = \max(0, 0.4 - score_{con})$
wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the numbers of words to be replaced when calculating the edit counts of the target translation and the reference translation respectively, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the numbers of words to be deleted when calculating the edit counts of the target translation and the reference translation respectively, $edit_{tar}$ is the number of edits between the original text and the target translation, $edit_{ref}$ is the number of edits between the original text and the reference translation, $score_{con}$ is the ratio of the target translation's edit count to the reference translation's edit count, and $score_{tong}$ is the popularity score.
In one embodiment of the application, the total score is calculated by assigning weights to loyalty and fluency. Based on $score_{zhong}$, $score_{liu}$ and $score_{con}$, weights are set for $score_{zhong}$ and $score_{liu}$ respectively, with the weights summing to 1, and $score_{con}$ enters the total score Score as a subtractive term; if the result is negative, the total score is set to 0.
Specifically, the total score should take the scores of all three dimensions into account. Weights are set for loyalty and fluency respectively, summing to 1, and popularity participates in the scoring as a penalty term. To prevent the total score from taking a negative value, if the final score after combining the three dimensions is negative, the total score is set to 0. For example, with loyalty and fluency weights of 0.8 and 0.2, the calculation formula is as follows:
where Score is the final total score.
Compared with using single characters as the basic granularity for loyalty, using words as the basic granularity yields results that correlate more highly with human scoring, whether the evaluation covers the loyalty dimension alone or also includes the two dimensions of fluency and popularity. Once the popularity dimension is added to the evaluation method, the correlation between the evaluation results and the human scores improves greatly, regardless of which method is used to evaluate fluency.
In the description of this specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," "one particular embodiment," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the application. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (8)

1. A method for automatically evaluating machine translation results of classical Chinese, comprising:
step S101: selecting a plurality of sentences from a classical-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated;
step S102: performing preprocessing and word segmentation on the data set to be evaluated;
step S103: acquiring the original text and the reference translation of the training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation to judge the quality of the translation produced by the classical Chinese machine translation model, and constructing a test data set in the format original text - reference translation - target translation - manual score;
step S104: calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the target translation in the three dimensions of loyalty, fluency and popularity based on the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation;
step S1041: calculating the pairwise similarity between the words of the reference translation and the target translation, setting a threshold for judging synonyms, and treating word pairs whose similarity is higher than or equal to the threshold as near-synonyms; then calculating the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and using recall as the basis for judging whether information in the original text is lost and for calculating the loyalty, wherein the loyalty is calculated according to the following formula:
$score_{match} = \min(score_{ref}, score_{tar})$
wherein $sim_{max}$ is the maximum similarity calculated for a sememe in the reference translation or the target translation, $S_i$ and $S_j$ respectively represent the concepts to be evaluated, $N_r$ is the number of words in the reference translation, $N_t$ is the number of words in the target translation, $score_{ref}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, $score_{tar}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, $score_{match}$ is the similarity score between the reference translation and the target translation, and $score_{zhong}$ is the loyalty score;
step S1042: calculating the fluency of the classical Chinese translation through the longest continuous common subsequence, wherein the fluency is calculated according to the following formula:
$score_{liu} = 1 - \beta \, Pen^{\gamma}$
wherein Pen is the penalty term in the fluency calculation, #chunks in target sentence is the number of longest continuous common subsequences found by dynamic programming, #words in target sentence is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent;
step S1043: the popularity is calculated by the following formulas:
$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$
$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$
$score_{tong} = \max(0, 0.4 - score_{con})$
wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the numbers of words to be replaced when calculating the edit counts of the target translation and the reference translation respectively, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the numbers of words to be deleted when calculating the edit counts of the target translation and the reference translation respectively, $edit_{tar}$ is the number of edits between the original text and the target translation, $edit_{ref}$ is the number of edits between the original text and the reference translation, $score_{con}$ is the ratio of the target translation's edit count to the reference translation's edit count, and $score_{tong}$ is the popularity score.
2. The method for automatically evaluating machine translation results of classical Chinese according to claim 1, wherein said step S102 further comprises:
collecting statistics on the segmented words, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three.
3. The method for automatically evaluating machine translation results of classical Chinese according to claim 1, wherein said step S103 further comprises:
treating the components of the reference translation that were supplemented from discourse-level information as background words, which are not used as references for evaluating the target translation.
4. The method for automatically evaluating machine translation results of classical Chinese according to claim 1, wherein said similarity is determined according to the following formula:
wherein $Sim_s$ represents the similarity score between the target translation and the reference translation, $S_1$ and $S_2$ respectively represent the two concepts to be evaluated, StructSim represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the DEF-based similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the StructSim function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.
5. A system for automatically evaluating machine translation results of classical Chinese, comprising:
a data set construction module: used for selecting a plurality of sentences from a classical-modern Chinese parallel corpus, by random and manual selection, to construct a data set to be evaluated;
a preprocessing module: used for performing preprocessing and word segmentation on the data set to be evaluated;
a test set construction module: used for acquiring the original text and the reference translation of the training corpus from the data set to be evaluated, translating the original text with the classical Chinese machine translation model to be evaluated, comparing the resulting target translation with the reference translation and the original text, manually analyzing the target translation to judge the quality of the translation produced by the classical Chinese machine translation model, and constructing a test data set in the format original text - reference translation - target translation - manual score;
a scoring module: used for calculating the similarity between the reference translation and the target translation by comparing them, calculating scores for the target translation in the three dimensions of loyalty, fluency and popularity based on the test data set, calculating the average score, error and correlation of the target translation from the scores of the three dimensions, and automatically scoring the target translation;
a loyalty calculation unit: calculating the pairwise similarity between the words of the reference translation and the target translation, setting a threshold for judging synonyms, and treating word pairs whose similarity is higher than or equal to the threshold as near-synonyms; then calculating the matching score $score_{ref}$ of the reference translation and the matching score $score_{tar}$ of the target translation respectively; and using recall as the basis for judging whether information in the original text is lost and for calculating the loyalty, wherein the loyalty is calculated according to the following formula:
$score_{match} = \min(score_{ref}, score_{tar})$
wherein $sim_{max}$ is the maximum similarity calculated for a sememe in the reference translation or the target translation, $S_i$ and $S_j$ respectively represent the concepts to be evaluated, $N_r$ is the number of words in the reference translation, $N_t$ is the number of words in the target translation, $score_{ref}$ is the sentence similarity obtained by summing the maximum similarity of each word of the reference translation, $score_{tar}$ is the sentence similarity obtained by summing the maximum similarity of each word of the target translation, $score_{match}$ is the similarity score between the reference translation and the target translation, and $score_{zhong}$ is the loyalty score;
a fluency calculation unit: calculating the fluency of the classical Chinese translation through the longest continuous common subsequence, wherein the fluency is calculated according to the following formula:
$score_{liu} = 1 - \beta \, Pen^{\gamma}$
wherein Pen is the penalty term in the fluency calculation, #chunks in target sentence is the number of longest continuous common subsequences found by dynamic programming, #words in target sentence is the number of sememes in the sentence, β is the penalty coefficient, and γ is the penalty exponent;
a popularity calculation unit: the popularity is calculated by the following formulas:
$edit_{tar} = \#insertion_{tar} + \#substitution_{tar} + \#deletion_{tar}$
$edit_{ref} = \#insertion_{ref} + \#substitution_{ref} + \#deletion_{ref}$
$score_{tong} = \max(0, 0.4 - score_{con})$
wherein $\#insertion_{tar}$ and $\#insertion_{ref}$ are the numbers of words to be inserted when calculating the edit counts of the target translation and the reference translation respectively, $\#substitution_{tar}$ and $\#substitution_{ref}$ are the numbers of words to be replaced when calculating the edit counts of the target translation and the reference translation respectively, $\#deletion_{tar}$ and $\#deletion_{ref}$ are the numbers of words to be deleted when calculating the edit counts of the target translation and the reference translation respectively, $edit_{tar}$ is the number of edits between the original text and the target translation, $edit_{ref}$ is the number of edits between the original text and the reference translation, $score_{con}$ is the ratio of the target translation's edit count to the reference translation's edit count, and $score_{tong}$ is the popularity score.
6. The system for automatically evaluating machine translation results of classical Chinese according to claim 5, wherein said preprocessing module is further used for:
collecting statistics on the segmented words, segmenting the text by manual word segmentation, by automatic word segmentation and at character granularity, and comparing the differences in effect among the three.
7. The system for automatically evaluating machine translation results of classical Chinese according to claim 5, wherein said test set construction module is further used for:
treating the components of the reference translation that were supplemented from discourse-level information as background words, which are not used as references for evaluating the target translation.
8. The system for automatically evaluating machine translation results of classical Chinese according to claim 5, wherein said similarity is determined according to the following formula:
wherein $Sim_s$ represents the similarity score between the target translation and the reference translation, $S_1$ and $S_2$ respectively represent the concepts to be compared, StructSim represents the structural similarity calculation function called through OpenHowNet, $Sim_{DEF}$ represents the DEF-based similarity calculation function called through OpenHowNet, $\beta_{struct}$ is the weight parameter of the StructSim function, and $\beta_{DEF}$ is the weight parameter of the $Sim_{DEF}$ function.
CN202310973916.4A 2023-08-04 2023-08-04 Method and system for automatically evaluating machine translation result of cultural relics Active CN116701961B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310973916.4A CN116701961B (en) 2023-08-04 2023-08-04 Method and system for automatically evaluating machine translation result of cultural relics

Publications (2)

Publication Number Publication Date
CN116701961A 2023-09-05
CN116701961B 2023-10-20

Family

ID=87824300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310973916.4A Active CN116701961B (en) 2023-08-04 2023-08-04 Method and system for automatically evaluating machine translation result of cultural relics

Country Status (1)

Country Link
CN (1) CN116701961B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693222A (en) * 2012-05-25 2012-09-26 熊晶 Carapace bone script explanation machine translation method based on example
CN107480147A (en) * 2017-08-15 2017-12-15 中译语通科技(北京)有限公司 A kind of method and system of comparative evaluation's machine translation system
CN109344408A (en) * 2018-08-24 2019-02-15 腾讯科技(深圳)有限公司 A kind of translation detection method, device and electronic equipment
CN109359294A (en) * 2018-09-18 2019-02-19 湖北文理学院 A kind of archaic Chinese interpretation method based on neural machine translation
CN109783825A (en) * 2019-01-07 2019-05-21 四川大学 A kind of ancient Chinese prose interpretation method neural network based
CN110674646A (en) * 2019-09-06 2020-01-10 内蒙古工业大学 Mongolian Chinese machine translation system based on byte pair encoding technology

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7587307B2 (en) * 2003-12-18 2009-09-08 Xerox Corporation Method and apparatus for evaluating machine translation quality

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
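An automatic evaluation metric for Ancient-Modern Chinese translation; Kexin Yang et al.; Neural Computing and Applications; pp. 3855-3867 *
Research on automatic evaluation of machine translation based on extended reference translations (基于扩展参考译文的机器翻译自动评价研究); Li Na (李娜); China Master's Theses Full-text Database, Information Science and Technology; Chapters 2-4 *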

Also Published As

Publication number Publication date
CN116701961A (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant