CN108710616A - Voice translation method and device - Google Patents
Voice translation method and device
- Publication number
- CN108710616A (application number CN201810503163.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- translated text
- translation
- voice data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013519 translation Methods 0.000 title claims abstract description 277
- 238000000034 method Methods 0.000 title claims abstract description 92
- 230000003993 interaction Effects 0.000 claims description 23
- 238000003860 storage Methods 0.000 claims description 18
- 230000014616 translation Effects 0.000 description 239
- 238000010586 diagram Methods 0.000 description 12
- 238000012545 processing Methods 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 11
- 230000015572 biosynthetic process Effects 0.000 description 9
- 238000003786 synthesis reaction Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 7
- 238000004364 calculation method Methods 0.000 description 3
- 238000012937 correction Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 230000002452 interceptive effect Effects 0.000 description 3
- 238000002715 modification method Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 2
- 238000012790 confirmation Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000007306 turnover Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a voice translation method and device. The method includes: translating a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data; and then, by interacting with the user, determining whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is explicitly judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a voice translation method and device.
Background technology
Speech translation refers to the process of automatically translating voice data in a source language into voice data in a target language, where the source language and the target language are different languages. In existing speech translation technology, the translation result is obtained by directly translating the source-language voice data, but that result may be inaccurate.
For example, suppose the source-language voice data is the Chinese utterance "Does the luggage have to pass through the security check?" and the target-language voice data produced is the English utterance "Does Lee have to go through security?". The Chinese meaning of that English sentence is actually "Does Mr. Li pass through the security check?". The original Chinese utterance "Does the luggage have to pass through the security check?" and the translated English utterance therefore carry different meanings, that is, the translation result is inaccurate.
Summary of the invention
The main purpose of the embodiments of this application is to provide a voice translation method and device that can improve the accuracy of speech translation results.
An embodiment of this application provides a voice translation method, including:
translating a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
determining, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, after determining whether the first translated text is correct as the translation result of the source voice data, the method further includes:
if it is determined that the first translated text is wrong as the translation result of the source voice data, correcting the first translated text and taking the corrected text as the translation result of the source voice data.
Optionally, before interacting with the user, the method further includes:
determining whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data;
if not, performing the step of interacting with the user.
Optionally, determining whether the translation quality of the first translated text exceeds the preset quality threshold includes:
translating the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
determining, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, determining, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold includes:
determining, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, determining, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data includes:
interacting with the user by using the second translated text, and determining whether the first translated text is correct as the translation result of the source voice data.
Optionally, interacting with the user by using the second translated text and determining whether the first translated text is correct as the translation result of the source voice data includes:
outputting a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
if an affirmative response from the user to the first query speech is received, the first translated text is correct as the translation result of the source voice data;
if a negative response from the user to the first query speech is received, the first translated text is wrong as the translation result of the source voice data.
Optionally, correcting the first translated text includes:
correcting the first translated text by means of text matching.
Optionally, correcting the first translated text by means of text matching includes:
performing a matching operation between the recognition text of the source voice data and the text data in a database, where the database stores at least one sentence pair, each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text is the same as the language of the source voice data, and the language of the second sample text is the same as the language of the first translated text;
obtaining, through the matching operation, the first sample text most similar to the recognition text of the source voice data;
correcting the first translated text according to the most similar first sample text.
Optionally, correcting the first translated text according to the most similar first sample text includes:
interacting with the user by using the most similar first sample text to correct the first translated text.
Optionally, interacting with the user by using the most similar first sample text to correct the first translated text includes:
outputting a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative response from the user to the second query speech is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, and taking it as the successfully corrected version of the first translated text.
Optionally, the method further includes:
if a negative response from the user to the second query speech is received, outputting a prompt speech, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
An embodiment of this application further provides a speech translation device, including:
a speech translation unit, configured to translate a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
a user interaction unit, configured to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, the device further includes:
a text correction unit, configured to, if it is determined that the first translated text is wrong as the translation result of the source voice data, correct the first translated text and take the corrected text as the translation result of the source voice data.
Optionally, the device further includes:
a quality judging unit, configured to determine whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; and if not, trigger the user interaction unit to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, the quality judging unit includes:
a reverse translation subunit, configured to translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
a quality judging subunit, configured to determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, the quality judging subunit is specifically configured to determine, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, the user interaction unit is specifically configured to interact with the user by using the second translated text and determine whether the first translated text is correct as the translation result of the source voice data.
Optionally, the user interaction unit includes:
a first query subunit, configured to output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
a result determination subunit, configured to determine that the first translated text is correct as the translation result of the source voice data if an affirmative response from the user to the first query speech is received, and that the first translated text is wrong as the translation result of the source voice data if a negative response from the user to the first query speech is received.
Optionally, the text correction unit is specifically configured to correct the first translated text by means of text matching.
Optionally, the text correction unit includes:
a text matching subunit, configured to perform a matching operation between the recognition text of the source voice data and the text data in a database, where the database stores at least one sentence pair, each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text is the same as the language of the source voice data, and the language of the second sample text is the same as the language of the first translated text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognition text of the source voice data;
a text correction subunit, configured to correct the first translated text according to the most similar first sample text.
Optionally, the text correction subunit is specifically configured to interact with the user by using the most similar first sample text to correct the first translated text.
Optionally, the text correction subunit includes:
a second query subunit, configured to output a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completion subunit, configured to, if an affirmative response from the user to the second query speech is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs and take it as the successfully corrected version of the first translated text.
Optionally, the text correction subunit further includes:
a voice prompt subunit, configured to output a prompt speech if a negative response from the user to the second query speech is received, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
An embodiment of this application further provides a speech translation device, including a processor, a memory and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform any of the implementations of the voice translation method described above.
An embodiment of this application further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform any of the implementations of the voice translation method described above.
In the voice translation method and device provided by the embodiments of this application, a user's source voice data is translated to obtain a first translated text whose language differs from that of the source voice data; then, by interacting with the user, it is determined whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice translation method according to an embodiment of this application;
Fig. 2 is a schematic flowchart of a translation quality determination method according to an embodiment of this application;
Fig. 3 is a schematic flowchart of a method for determining whether a translation result is credible according to an embodiment of this application;
Fig. 4 is a schematic flowchart of a translated-text correction method according to an embodiment of this application;
Fig. 5 is a schematic structural diagram of a speech translation device according to an embodiment of this application;
Fig. 6 is a schematic diagram of the hardware structure of a speech translation device according to an embodiment of this application.
Detailed description of the embodiments
Speech translation refers to the process of automatically translating the voice data of a source language (the voice data before translation) into the voice data of a target language (the voice data after translation). Speech translation technology usually involves three main components: speech recognition, machine translation and speech synthesis. Speech recognition uses speech recognition technology to recognize the source-language voice data and generate source-language text; machine translation uses machine translation technology to translate the source-language text into target-language text; speech synthesis uses speech synthesis technology to synthesize the target-language text into target-language voice data.
As speech translation technology is applied more and more widely, people's requirements on the accuracy of translation results are also increasing. One voice translation method realizes speech translation through a single round of human-machine dialogue, that is, through one input and one output: the input is source-language voice data and the output is target-language voice data. Specifically, the user inputs the source-language voice data to be translated into a speech translation device, and the device automatically translates it into target-language voice data through speech recognition, machine translation and speech synthesis, and feeds the result back to the user. However, in this process the results of speech recognition and machine translation may deviate, so that the finally output target-language voice data is inaccurate. In other words, the user can only passively accept the one-shot translation result of the speech translation device, and when the translation result is wrong the device cannot correct it in time, which reduces the accuracy of the translation result.
For this reason, an embodiment of this application provides a voice translation method that adds an error-detection-and-correction function for the translation result. That is, the accuracy of the above one-shot translation result can be assessed, and when the assessment indicates that the accuracy of the translation result is low, the translation result can be corrected; specifically, it can be corrected according to the result of interacting with the user, thereby improving the accuracy of the translation result.
It should be noted that the voice translation method provided by the embodiments of this application is not limited to any particular application scene; for example, it can be used in scenes where translation is needed, such as a user travelling abroad or going through entry-exit security checks.
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
First embodiment
Referring to Fig. 1, which is a schematic flowchart of the voice translation method provided in this embodiment, the method includes the following steps:
S101: Translate the user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data.
In this embodiment, the voice data before translation (namely the speech to be translated) is called the source voice data. This embodiment does not limit the language of the source voice data; for example, it may be Chinese speech or English speech.
The text data after translation is called the first translated text. This embodiment does not limit the language of the first translated text either, as long as the first translated text and the source voice data belong to different languages. For example, the source voice data may be Chinese speech while the first translated text is English text, or the source voice data may be English speech while the first translated text is Chinese text.
In this embodiment, speech recognition may be performed on the source voice data by speech recognition technology to obtain a recognition text A1 of the source voice data, and then machine translation may be performed on the recognition text A1 by machine translation technology to obtain the first translated text B1. The speech recognition technology used here may be any existing or future speech recognition technology, and likewise the machine translation technology may be any existing or future machine translation technology.
For example, at an entry-exit security check, the user wishes to talk with the security staff through the speech translation device. Suppose the Chinese source voice data spoken by the user is "Does the luggage have to pass through the security check?". After the speech translation device performs speech recognition on it, the obtained recognition text A1 is "Does Lee have to pass through the security check?", and translating the recognition text A1 from Chinese into English yields the first translated text B1 "Does Lee have to go through security?". It can be seen that a recognition error occurred in the recognition text A1 when the source voice data was recognized.
S102: Determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In this embodiment, the speech translation device may interact with the user, for example through voice interaction or text interaction, and determine according to the interaction result whether the first translated text is correct as the translation result of the source voice data. If it is determined that the first translated text is correct as the translation result of the source voice data, the first translated text B1 may be taken as the translation result of the source voice data.
At this point, speech synthesis may further be performed on the first translated text B1 to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the first translated text B1 is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
It should be noted that if it is determined that the first translated text is wrong as the translation result of the source voice data, the first translated text B1 may be corrected as described in the fourth embodiment below, or the user may be asked to repeat the source voice data or to use another, semantically similar, phrasing of the source voice data, thereby starting a new round of translation interaction.
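For illustration only, a minimal sketch of the S101/S102 flow is given below; confirm_with_user and correct_translation are hypothetical helpers standing in for the interaction of step S102 and the correction of the fourth embodiment, and are not interfaces defined by this application:

```python
# Sketch of the S101/S102 flow: translate, then confirm with the user before
# accepting the result. All callables are placeholders.

def translate_with_confirmation(source_audio, asr, machine_translate, synthesize,
                                confirm_with_user, correct_translation):
    recognition_text_a1 = asr(source_audio)
    first_translation_b1 = machine_translate(recognition_text_a1)    # S101
    if confirm_with_user(first_translation_b1):                      # S102: interaction-based judgement
        result_text = first_translation_b1
    else:
        result_text = correct_translation(first_translation_b1)      # or ask the user to rephrase
    return synthesize(result_text)                                   # target voice data
```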
In summary, in the voice translation method provided in this embodiment, the user's source voice data is translated to obtain a first translated text whose language differs from that of the source voice data, and then, by interacting with the user, it is determined whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Second embodiment
In this embodiment, before the judgement step S102 of the first embodiment, that is, before it is determined through human-machine interaction whether the first translated text is correct as the translation result of the source voice data, the machine (i.e. the speech translation device) may first judge whether the first translated text is correct as the translation result of the source voice data.
Therefore, before the judgement step S102 of the first embodiment, the method may further include: determining whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; if not, performing the judgement step S102 of the first embodiment.
In this embodiment, the translation quality of the first translated text B1 may be assessed. If its translation quality does not exceed a quality threshold set in advance, called the preset quality threshold here, the first translated text B1 is considered not credible as the translation result of the source voice data, i.e. the first translated text B1 is wrong as the translation result of the source voice data; in that case, step S102 may be carried out to further judge the correctness of the first translated text B1 as the translation result.
Conversely, if the translation quality of the first translated text B1 exceeds the preset quality threshold, the first translated text B1 is considered credible as the translation result of the source voice data, i.e. the first translated text is correct as the translation result of the source voice data. In that case, the first translated text B1 may be taken as the translation result of the source voice data, and speech synthesis may further be performed on it to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the first translated text B1 is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
The specific implementation of the above translation quality judgement step ("determining whether the translation quality of the first translated text exceeds the preset quality threshold") is described below.
Referring to Fig. 2, which is a schematic flowchart of the translation quality determination method provided in this embodiment, the method includes the following steps:
S201: Translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data.
In this embodiment, reverse translation may be performed on the first translated text B1 to obtain a second translated text A2. The language of the first translated text B1 is the post-translation language, for example English; the language of the second translated text A2 is the pre-translation language, for example Chinese.
Continuing the example above, suppose the first translated text B1 is "Does Lee have to go through security?"; the second translated text A2 obtained by reverse translation is "Does Mr. Li pass through the security check?".
S202: Determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In this embodiment, the translation quality of the first translated text B1 may be judged on the basis of the second translated text A2. In one implementation, step S202 may specifically include: determining, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In a specific implementation of step S202, the BLEU (Bilingual Evaluation Understudy) algorithm may be used to determine whether the translation quality of the first translated text exceeds the preset quality threshold.
Specifically, BLEU is an evaluation algorithm for machine translation results, used to assess the quality of a translation from one natural language into another. The algorithm is as follows:
First, in order to consider the translation effect of the first translated text B1 comprehensively, matched base units between the recognition text A1 and the second translated text A2 are counted at several granularities in turn, from a single character as the base unit (1-gram) up to several characters as the base unit (n-gram); the position of each base unit within the text need not be considered during counting. Then, according to the number of matched base units, the matching precision of the second translated text A2 at each order of base unit is calculated.
The matching precision of the second translated text A2 at each order of base unit i-gram (i = 1, 2, ..., n) can be calculated according to the following formula:
precision = correct / output_length (1)
where correct is the number of base units of that order in the second translated text A2 that correctly match the recognition text A1, and output_length is the total number of base units of that order in the second translated text A2.
For example, continuing the example above, suppose the recognition text A1 is "Does Lee have to pass through the security check?" and the second translated text A2 is "Does Mr. Li pass through the security check?". The matching precisions are then as shown in Table 1 below.
Table 1
| Base unit | Correctly matched base units | Matching precision |
| 1-gram | "Lee", "pass", "safety", "check", ... | 6/10 = 0.6 |
| 2-gram | "pass safety", "safety check", ... | 3/9 = 0.33 |
| 3-gram | "pass safety check" | 1/8 = 0.125 |
| 4-gram | (none) | 0/7 = 0 |
Next, redundant words in the second translated text A2 also need to be penalized, so a length penalty factor is introduced to handle this: the longer the second translated text A2 is, the larger the penalty. The length penalty factor is calculated as follows:
C = min(1, L1/L2) (2)
where L1 is the length of the recognition text A1 and L2 is the length of the second translated text A2.
In formula (2), if the recognition text A1 and the second translated text A2 are Chinese texts, the text length can be calculated in characters. For example, the recognition text A1 "Does Lee have to pass through the security check?" has a length of 9, and the second translated text A2 "Does Mr. Li pass through the security check?" has a length of 10.
Finally, after the matching precisions and the length penalty factor C have been calculated according to formulas (1) and (2), the BLEU score of the second translated text A2 can be computed. The BLEU score corresponding to a certain order of base unit may be selected, for example the score corresponding to 4-gram, calculated as follows:
bleu_4-gram = C * f(4-gram) (3)
where bleu_4-gram is the BLEU score of the second translated text A2, C is the length penalty factor, and f is a function that processes the matching precisions corresponding to 1-gram, 2-gram, 3-gram and 4-gram.
For example, when the recognition text A1 is "Does Lee have to pass through the security check?" and the second translated text A2 is "Does Mr. Li pass through the security check?", substituting the matching precisions calculated by formula (1) (as shown in Table 1) and the length penalty factor calculated by formula (2) into formula (3) gives a BLEU score of 20.56 for the second translated text A2.
In this embodiment, a translation score threshold may be set in advance and used as the preset quality threshold, for example 50. Since the score 20.56 calculated above is below the threshold 50, the first translated text B1 can be judged not credible as the translation result of the source voice data; that is, the first translated text B1 "Does Lee have to go through security?" is not credible. Conversely, when the BLEU score is greater than or equal to the threshold 50, the first translated text B1 is judged credible as the translation result of the source voice data.
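For illustration only, the following sketch mirrors formulas (1) to (3): character-level n-gram matching precision, the length penalty factor C, and a BLEU-style score. Since the combining function f is not fully specified above, a standard smoothed geometric mean is used here, so the resulting numbers are illustrative and will not reproduce the 20.56 of the example exactly:

```python
import math
from collections import Counter

def ngram_precision(reference: str, candidate: str, n: int) -> float:
    """Formula (1): matched same-order base units over total base units in the candidate."""
    ref_counts = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
    cand_grams = [candidate[i:i + n] for i in range(len(candidate) - n + 1)]
    if not cand_grams:
        return 0.0
    # Clip each n-gram count by its count in the reference (standard BLEU refinement).
    cand_counts = Counter(cand_grams)
    correct = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return correct / len(cand_grams)

def quality_score(recognition_text_a1: str, back_translation_a2: str, max_n: int = 4) -> float:
    """Length penalty C (formula (2)) times a smoothed geometric mean of the
    1..max_n precisions standing in for f() in formula (3), scaled to 0-100."""
    c = min(1.0, len(recognition_text_a1) / max(len(back_translation_a2), 1))
    precisions = [max(ngram_precision(recognition_text_a1, back_translation_a2, n), 1e-9)
                  for n in range(1, max_n + 1)]
    return 100.0 * c * math.exp(sum(math.log(p) for p in precisions) / max_n)

# The first translated text is treated as unreliable when the score falls below
# the preset quality threshold (50 in the running example).
PRESET_QUALITY_THRESHOLD = 50.0
```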
In summary, in the translation quality determination method provided in this embodiment, the first translated text can be reverse-translated to obtain a second translated text, and the second translated text can be scored with the BLEU algorithm on the basis of the recognition text of the source voice data and the second translated text, so that the translation quality of the first translated text can be judged according to the score, realizing the evaluation of translation quality.
Third embodiment
In this embodiment, if the second embodiment above determines that the translation quality of the first translated text does not exceed the preset quality threshold, that is, after the first translated text B1 is judged not credible as the translation result of the source voice data, the judgement of the speech translation device may still be inaccurate. Therefore, the speech translation device can interact with the user through step S102 of the first embodiment and, based on the user's feedback, determine whether the first translated text B1 is the correct translation result of the source voice data.
In one implementation of this embodiment, step S102 of the first embodiment may specifically include: interacting with the user by using the second translated text, and determining whether the first translated text is correct as the translation result of the source voice data. In this embodiment, the second translated text A2 may be used as the content of the interaction with the user, and the judgement is made according to the user's feedback.
This judgement step may be implemented as follows.
As shown in Fig. 3, which is a schematic flowchart of the method provided in this embodiment for determining whether the translation result is credible, the method may include the following steps:
S301: Output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar.
In this embodiment, the second translated text A2 may be synthesized into speech and used to interact with the user. The purpose of the interaction is to ask the user whether the sentence they want translated is the second translated text A2 (i.e. whether the source voice data and the second translated text A2 are semantically similar). For ease of description, the speech that asks the user is called the first query speech; it may specifically be "Is what you want to translate: <second translated text A2>?".
For example, suppose the first translated text B1 is "Does Lee have to go through security?", and reverse translation in step S201 of the second embodiment yields the second translated text A2 "Does Mr. Li pass through the security check?". When the second translated text A2 is scored with the BLEU algorithm, for example obtaining a score of 20.56, which is below the preset quality threshold of 50, the reverse-translated second translated text A2 is synthesized into the first query speech, for example "May I ask, is what you want to translate: 'Does Mr. Li pass through the security check?'?".
At this point, the speech translation device feeds the first query speech back to the user and waits for the user's answer.
S302: If an affirmative response from the user to the first query speech is received, the first translated text is correct as the translation result of the source voice data.
The user can give an affirmative response to the first query speech by voice, by button or in another way; for example, the user may speak "yes" or "OK" to the speech translation device, or press a "confirm" key on the device. In this case the speech translation device considers the first translated text B1 credible as the translation result of the source voice data, that is, it considers the first translated text B1 correct as the translation result of the source voice data, and therefore the first translated text B1 can be taken as the translation result of the source voice data.
S303: If a negative response from the user to the first query speech is received, the first translated text is wrong as the translation result of the source voice data.
The user can give a negative response to the first query speech by voice, by button or in another way; for example, the user may speak "no" to the speech translation device, or press a "NO" key on the device. In this case the speech translation device considers the first translated text B1 not credible as the translation result of the source voice data, that is, it considers the first translated text B1 wrong as the translation result of the source voice data.
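For illustration only, the S301 to S303 decision can be sketched as follows; speak and get_user_reply are placeholders for the device's speech output and its voice or button input, and are not interfaces defined by this application:

```python
# Sketch of the S301-S303 check: play a first query speech built from the
# back-translation A2 and interpret the user's reply.

AFFIRMATIVE = {"yes", "ok", "confirm"}

def is_first_translation_accepted(back_translation_a2: str, speak, get_user_reply) -> bool:
    speak(f'Is what you want to translate: "{back_translation_a2}"?')  # first query speech
    reply = get_user_reply().strip().lower()                           # e.g. "yes", "no", or a key press
    return reply in AFFIRMATIVE     # affirmative -> B1 is accepted as the translation result
```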
In summary, in the method provided in this embodiment for determining whether the translation result is credible, a first query speech is output to the user asking whether the source voice data and the second translated text are semantically similar. If an affirmative response is received, the first translated text is considered credible as the translation result of the source voice data; conversely, if a negative response is received, the first translated text is considered not credible as the translation result of the source voice data. It can be seen that, through human-machine interaction with the user, whether the first translated text is correct can be confirmed, which ensures the accuracy of the translation result.
Fourth embodiment
In this embodiment, when step S102 of the first embodiment determines that the first translated text is wrong as the translation result of the source voice data, the first translated text may be further corrected, and the corrected text is taken as the translation result of the source voice data.
After the correction succeeds, the corrected text data can be taken as the text translation result of the source voice data. At this point, speech synthesis may further be performed on the corrected text data to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the successfully corrected text data is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
It can be seen that this embodiment adds an error-detection-and-correction function for the translation result: the translation quality of the first translated text can be assessed, and when the assessment indicates that the translation quality of the first translated text as the translation result is low, the translation result can be corrected, thereby improving the accuracy of the translation result.
It should be noted that, on the basis of any of the embodiments above, the first translated text B1 may be corrected according to the correction method provided in this embodiment.
In one implementation of this embodiment, the first translated text B1 may be corrected by means of text matching. The specific implementation of this correction step is described next.
Referring to Fig. 4, which is a schematic flowchart of the translated-text correction method provided in this embodiment, the method includes the following steps:
S401: Perform a matching operation between the recognition text of the source voice data and the text data in a database.
In this embodiment, a database may be built in advance, in which at least one sentence pair is stored. Each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text; the language of the first sample text is the same as the language of the source voice data (the pre-translation language), and the language of the second sample text is the same as the language of the first translated text (the post-translation language).
Specifically, a large number of first sample texts and the second sample texts obtained by correctly translating them can be collected in advance, the corresponding first and second sample texts are formed into sentence pairs, and the database is built from these sentence pairs. The database may be a local database of the speech translation device, or a database on a cloud server with which the speech translation device communicates.
In this embodiment, the database can be built according to the specific application demand; that is, the database may store only sentence pairs relevant to a concrete application scene. For example, if the user needs to use the speech translation device at an entry-exit security check, sentence pairs commonly used at entry-exit security checks can be stored in the database in advance. Of course, the database may also store sentence pairs relevant to multiple application scenes; in practical applications, the application scene can be determined automatically from the user's source voice data, and the sentence-pair set of the corresponding application scene is then selected.
It should be noted that this embodiment does not limit the number of sentence pairs under a given application scene, for example roughly 10,000 to 40,000 sentence pairs, but in order to achieve a good correction effect the pairs should cover, as far as possible, the common and less common sentences related to the application scene.
Taking the entry-exit security check scene as an example, a sentence pair is stored in the database in the following format:
{"cn": "Does the luggage have to pass through the security check?", "update_time": "20171018T173941", "en": "Must the luggage be checked by security?", "create_time": "20171018T173941", "id": "00000001"}
where:
cn: the Chinese sentence;
en: the corresponding English sentence;
update_time: the time the record was uploaded to the database;
create_time: the time the sentence pair was created;
id: the unique identifier of the record in the database.
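For illustration only, such a record can be modelled as follows; the field names are taken from the storage format above, while the class itself is an assumption of this sketch:

```python
# A sentence-pair record mirroring the storage format shown above.
from dataclasses import dataclass

@dataclass
class SentencePair:
    id: str            # unique identifier of the record in the database
    cn: str            # first sample text (language of the source voice data)
    en: str            # second sample text (its correct translation)
    create_time: str   # when the sentence pair was created
    update_time: str   # when the record was uploaded to the database

security_check_pairs = [
    SentencePair(id="00000001",
                 cn="Does the luggage have to pass through the security check?",
                 en="Must the luggage be checked by security?",
                 create_time="20171018T173941",
                 update_time="20171018T173941"),
]
```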
In this embodiment, the recognition text A1 of the source voice data is matched against the text data in the database. For example, the Doc2Vec algorithm, also known as paragraph2vec or sentence embeddings, which is an unsupervised algorithm, may be used for the matching.
S402: Obtain, through the matching operation, the first sample text most similar to the recognition text of the source voice data.
By matching the recognition text A1 against the text data in the database, the first sample text in the database that is most similar to the recognition text A1 is obtained, referred to here as sample text A3 for short. During matching, the recognition text A1 may first be vectorized to obtain its sentence vector; then, for every first sample text in the database that is in the same language as the recognition text A1, the distance between the sentence vector of the recognition text A1 and the sentence vector of that first sample text is calculated, and the first sample text with the smallest distance is selected as the sample text A3 most similar to the recognition text A1.
For example, when matching with the Doc2Vec algorithm, suppose the recognition text A1 is "Does Lee have to pass through the security check?" and it is matched against the database. If it is determined that the sentence vector of the first sample text with id "00000001", "Does the luggage have to pass through the security check?", is closest to the sentence vector of "Does Lee have to pass through the security check?", then that first sample text is taken as the sample text A3 most similar to the recognition text A1.
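For illustration only, a nearest-sentence lookup in the spirit of step S402 is sketched below, assuming gensim's Doc2Vec (4.x API); the tokenizer and the hyperparameters are placeholders:

```python
# Illustrative nearest-sentence lookup with gensim's Doc2Vec (4.x API assumed).
# tokenize() is a placeholder; for Chinese it would be a word or character segmenter.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def build_doc2vec_index(first_sample_texts, tokenize):
    corpus = [TaggedDocument(tokenize(text), [i])
              for i, text in enumerate(first_sample_texts)]
    return Doc2Vec(corpus, vector_size=100, min_count=1, epochs=40)

def most_similar_sample_text(model, first_sample_texts, recognition_text_a1, tokenize):
    vector = model.infer_vector(tokenize(recognition_text_a1))        # sentence vector of A1
    best_index, _similarity = model.dv.most_similar([vector], topn=1)[0]
    return first_sample_texts[best_index]                             # sample text A3
```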
S403: Correct the first translated text according to the most similar first sample text.
In this embodiment, once the first sample text most similar to the recognition text A1 of the source voice data, namely sample text A3, has been obtained, the first translated text B1 can be corrected using sample text A3.
In one implementation of this embodiment, the second sample text of the sentence pair to which sample text A3 belongs may be directly taken as the successfully corrected version of the first translated text.
In another implementation of this embodiment, step S403 may specifically interact with the user by using the most similar first sample text to correct the first translated text. In this implementation, the most similar first sample text, namely sample text A3, is used as the content of the interaction with the user, and the first translated text is corrected according to the user's feedback.
A specific implementation of step S403 may include the following steps A and B:
Step A: Output a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar.
The sample text A3 matched from the database may be synthesized into speech and used to interact with the user. The purpose of the interaction is to ask the user whether the sentence they want translated is the sample text A3 (i.e. whether the source voice data and the sample text A3 are semantically similar). For ease of description, this query speech is called the second query speech; it may specifically be "Is what you want to translate: <sample text A3>?".
For example, suppose the sample text A3 is "Does the luggage have to pass through the security check?"; the second query speech may then be "May I ask, is what you want to translate: 'Does the luggage have to pass through the security check?'?".
At this point, the speech translation device feeds the second query speech back to the user and waits for the user's answer.
Step B: If an affirmative response from the user to the second query speech is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, and take it as the successfully corrected version of the first translated text.
The user can give an affirmative response to the second query speech by voice, by button or in another way; for example, the user may speak "yes" or "OK" to the speech translation device, or press a "confirm" key on the device. In this case, the second sample text, referred to here as sample text B3, can be obtained by querying the database for the sentence pair to which sample text A3 belongs, and sample text B3 is taken as the successfully corrected version of the first translated text.
For example, the user hears the speech translation device output the second query speech "May I ask, is what you want to translate: 'Does the luggage have to pass through the security check?'?". If the answer is "yes", the speech translation device considers that what the user wants to translate is the sample text A3 "Does the luggage have to pass through the security check?", and the corresponding second sample text B3 in the sentence pair, "Must the luggage be checked by security?", is taken as the successfully corrected version of the first translated text B1; the correction succeeds.
Further, the user may also give a negative response to the second query speech; therefore, this embodiment may further include:
Step C: If a negative response from the user to the second query speech is received, output a prompt speech, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
The user can give a negative response to the second query speech by voice, by button or in another way; for example, the user may speak "no" to the speech translation device, or press a "NO" key on the device. In this case the correction is considered to have failed, and the speech translation device may ask the user, by voice, to repeat the source voice data or to use another, semantically similar, phrasing of the source voice data, thereby starting a new round of translation interaction.
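For illustration only, steps A to C can be sketched as follows; speak and get_user_reply are the same placeholders as in the earlier confirmation sketch, not interfaces defined by this application:

```python
# Sketch of correction steps A-C: ask about the closest database sentence (A3);
# on an affirmative response return its paired translation (B3), otherwise
# prompt the user to repeat or rephrase.

def correct_by_sentence_pair(sample_text_a3: str, sample_text_b3: str,
                             speak, get_user_reply):
    speak(f'Is what you want to translate: "{sample_text_a3}"?')      # second query speech
    if get_user_reply().strip().lower() in {"yes", "ok", "confirm"}:  # affirmative response
        return sample_text_b3           # successfully corrected translation result
    speak("Please repeat the sentence or say it in another way.")     # prompt speech
    return None                         # correction failed; start a new translation round
```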
In summary, in the translated-text correction method provided in this embodiment, a matching operation is performed between the recognition text of the source voice data and the text data in the database to obtain the sentence most similar to the recognition text, and the first translated text is then corrected according to that most similar sentence. It can be seen that this embodiment can accumulate sentence pairs for each translation direction and each application scene in advance and store them in the database; the sentence most similar to the recognition text can be found in the database through the matching algorithm, and the translation of that sentence is used as the corrected text, thereby realizing text correction.
Fifth embodiment
This embodiment introduces a speech translation device; for related content, refer to the method embodiments above. It should be noted that this speech translation device may be the speech translation device described above, or a part of it.
Referring to Fig. 5, which is a schematic structural diagram of the speech translation device provided in this embodiment, the device 500 includes:
a speech translation unit 501, configured to translate a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
a user interaction unit 502, configured to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the device 500 may further include:
a text correction unit, configured to, if it is determined that the first translated text is wrong as the translation result of the source voice data, correct the first translated text and take the corrected text as the translation result of the source voice data.
In one implementation of this embodiment, the device 500 may further include:
a quality judging unit, configured to determine whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; and if not, trigger the user interaction unit 502 to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the quality judging unit includes:
a reverse translation subunit, configured to translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
a quality judging subunit, configured to determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In one implementation of this embodiment, the quality judging subunit is specifically configured to determine, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In one implementation of this embodiment, the user interaction unit 502 may specifically be configured to interact with the user by using the second translated text and determine whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the user interaction unit 502 may include:
a first query subunit, configured to output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
a result determination subunit, configured to determine that the first translated text is correct as the translation result of the source voice data if an affirmative response from the user to the first query speech is received, and that the first translated text is wrong as the translation result of the source voice data if a negative response from the user to the first query speech is received.
In an implementation of this embodiment, the text correction unit may be specifically configured to correct the first translation text by means of text matching.
In an implementation of this embodiment, the text correction unit may include:
a text matching subunit, configured to perform a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
a text correction subunit, configured to correct the first translation text according to the most similar first sample text.
In an implementation of this embodiment, the text correction subunit may be specifically configured to interact with the user by means of the most similar first sample text, so as to correct the first translation text.
In an implementation of this embodiment, the text correction subunit may include:
a second inquiry subunit, configured to output a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completing subunit, configured to, if an affirmative reply of the user to the second inquiry voice is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
In an implementation of this embodiment, the text correction subunit may further include:
a voice prompt subunit, configured to output a prompt voice if a negative reply of the user to the second inquiry voice is received, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
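The correction flow built on the sentence-pair database could look roughly like the sketch below; the in-memory list of sentence pairs, the similarity measure and the ask_user callback are illustrative assumptions, and a real database lookup would replace the linear scan.

```python
from difflib import SequenceMatcher

def correct_by_matching(recognized_text: str,
                        sentence_pairs: list[tuple[str, str]],
                        ask_user):
    """Find the first sample text most similar to the recognized text; if the
    user confirms it via the second inquiry, return the paired second sample
    text as the corrected translation, otherwise return None so the caller
    can prompt the user to repeat or rephrase the source voice data."""
    best_source, best_target = max(
        sentence_pairs,
        key=lambda pair: SequenceMatcher(None, recognized_text, pair[0]).ratio(),
    )
    reply = ask_user(f'Did you mean: "{best_source}"?').strip().lower()
    if reply.startswith("y"):
        return best_target        # second sample text of the matched sentence pair
    return None                   # negative reply: caller outputs the prompt voice
```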
Sixth embodiment
Another speech translation apparatus is introduced in this embodiment; for related content, refer to the above method embodiments.
Referring to Fig. 6, which is a hardware architecture diagram of the speech translation apparatus provided in this embodiment, the speech translation apparatus 600 includes a memory 601, a receiver 602, and a processor 603 respectively connected with the memory 601 and the receiver 602. The memory 601 is configured to store a set of program instructions, and the processor 603 is configured to call the program instructions stored in the memory 601 to perform the following operations:
translating source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
judging, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
if it is judged that the first translation text is wrong as the translation result of the source voice data, correcting the first translation text and using the corrected text as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
judging whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data;
if not, executing the step of interacting with the user.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
translating the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
judging, according to the recognized text of the source voice data and the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
outputting a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
if an affirmative reply of the user to the first inquiry voice is received, determining that the first translation text is correct as the translation result of the source voice data;
if a negative reply of the user to the first inquiry voice is received, determining that the first translation text is wrong as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
correcting the first translation text by means of text matching.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
performing a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
obtaining, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
correcting the first translation text according to the most similar first sample text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
interacting with the user by means of the most similar first sample text, so as to correct the first translation text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
outputting a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative reply of the user to the second inquiry voice is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
if a negative reply of the user to the second inquiry voice is received, outputting a prompt voice, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
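Taken together, the operations above amount to the following end-to-end flow. The sketch reuses the helpers outlined earlier (confirm_first_translation, correct_by_matching), and the asr, mt, reverse_mt and ask_user callbacks are assumed interfaces rather than components named by this embodiment.

```python
from difflib import SequenceMatcher

def translate_speech(source_audio, asr, mt, reverse_mt, ask_user,
                     sentence_pairs, threshold: float = 0.6):
    """Recognize -> translate -> back-translation quality check -> user
    confirmation -> database-based correction (illustrative wiring only)."""
    recognized = asr(source_audio)                        # recognized text
    first_translation = mt(recognized)                    # first translation text
    second_translation = reverse_mt(first_translation)    # second translation text

    quality = SequenceMatcher(None, recognized, second_translation).ratio()
    if quality > threshold:
        return first_translation          # quality exceeds threshold: accept directly

    if confirm_first_translation(second_translation, ask_user):
        return first_translation          # user confirmed via the first inquiry voice
    # Negative reply: try to correct via the sentence-pair database.
    return correct_by_matching(recognized, sentence_pairs, ask_user)
```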
In some embodiments, the processor 603 may be a central processing unit (CPU), the memory 601 may be an internal memory of the random access memory (RAM) type, and the receiver 602 may include a general physical interface, which may be an Ethernet interface or an asynchronous transfer mode (ATM) interface. The processor 603, the receiver 602 and the memory 601 may be integrated into one or more independent circuits or pieces of hardware, for example an application-specific integrated circuit (ASIC).
Further, this embodiment also provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute any one of the implementations of the above speech translation method.
It can be seen from the above description of the embodiments that all or part of the steps in the methods of the above embodiments may be implemented by software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present application, or the part thereof that contributes to the prior art, may essentially be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the apparatuses disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief, and relevant details can be found in the description of the method parts.
It should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (24)
1. A speech translation method, characterized by comprising:
translating source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
judging, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
2. The method according to claim 1, characterized in that, after judging whether the first translation text is correct as the translation result of the source voice data, the method further comprises:
if it is judged that the first translation text is wrong as the translation result of the source voice data, correcting the first translation text and using the corrected text as the translation result of the source voice data.
3. The method according to claim 1, characterized in that, before the interaction with the user, the method further comprises:
judging whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data;
if not, executing the step of interacting with the user.
4. The method according to claim 3, characterized in that judging whether the translation quality of the first translation text exceeds the preset quality threshold comprises:
translating the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
5. The method according to claim 4, characterized in that judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold comprises:
judging, according to the recognized text of the source voice data and the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
6. The method according to claim 4, characterized in that judging, through interaction with the user, whether the first translation text is correct as the translation result of the source voice data comprises:
interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
7. The method according to claim 6, characterized in that interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data, comprises:
outputting a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
if an affirmative reply of the user to the first inquiry voice is received, determining that the first translation text is correct as the translation result of the source voice data;
if a negative reply of the user to the first inquiry voice is received, determining that the first translation text is wrong as the translation result of the source voice data.
8. The method according to any one of claims 2 to 7, characterized in that correcting the first translation text comprises:
correcting the first translation text by means of text matching.
9. The method according to claim 8, characterized in that correcting the first translation text by means of text matching comprises:
performing a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
obtaining, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
correcting the first translation text according to the most similar first sample text.
10. The method according to claim 9, characterized in that correcting the first translation text according to the most similar first sample text comprises:
interacting with the user by means of the most similar first sample text, so as to correct the first translation text.
11. The method according to claim 10, characterized in that interacting with the user by means of the most similar first sample text, so as to correct the first translation text, comprises:
outputting a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative reply of the user to the second inquiry voice is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
12. The method according to claim 11, characterized in that the method further comprises:
if a negative reply of the user to the second inquiry voice is received, outputting a prompt voice, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
13. A speech translation apparatus, characterized by comprising:
a speech translation unit, configured to translate source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
a user interaction unit, configured to judge, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a text correction unit, configured to, if it is judged that the first translation text is wrong as the translation result of the source voice data, correct the first translation text and use the corrected text as the translation result of the source voice data.
15. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a quality judging unit, configured to judge whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data; if not, the user interaction unit is triggered to judge, through interaction with the user, whether the first translation text is correct as the translation result of the source voice data.
16. The apparatus according to claim 15, characterized in that the quality judging unit comprises:
a reverse translation subunit, configured to translate the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
a quality judging subunit, configured to judge, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
17. The apparatus according to claim 16, characterized in that the user interaction unit is specifically configured to interact with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
18. The apparatus according to claim 17, characterized in that the user interaction unit comprises:
a first inquiry subunit, configured to output a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
a result determining subunit, configured to determine that the first translation text is correct as the translation result of the source voice data if an affirmative reply of the user to the first inquiry voice is received, and to determine that the first translation text is wrong as the translation result of the source voice data if a negative reply of the user to the first inquiry voice is received.
19. The apparatus according to any one of claims 14 to 18, characterized in that the text correction unit is specifically configured to correct the first translation text by means of text matching.
20. The apparatus according to claim 19, characterized in that the text correction unit comprises:
a text matching subunit, configured to perform a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
a text correction subunit, configured to correct the first translation text according to the most similar first sample text.
21. The apparatus according to claim 20, characterized in that the text correction subunit is specifically configured to interact with the user by means of the most similar first sample text, so as to correct the first translation text.
22. The apparatus according to claim 21, characterized in that the text correction subunit comprises:
a second inquiry subunit, configured to output a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completing subunit, configured to, if an affirmative reply of the user to the second inquiry voice is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
23. A speech translation apparatus, characterized by comprising: a processor, a memory, and a system bus;
wherein the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions which, when executed by the processor, cause the processor to execute the method according to any one of claims 1 to 12.
24. A computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810503163.XA CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
PCT/CN2019/082040 WO2019223437A1 (en) | 2018-05-23 | 2019-04-10 | Speech translation method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810503163.XA CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108710616A true CN108710616A (en) | 2018-10-26 |
Family
ID=63869422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810503163.XA Pending CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108710616A (en) |
WO (1) | WO2019223437A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110047488A (en) * | 2019-03-01 | 2019-07-23 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
WO2019223437A1 (en) * | 2018-05-23 | 2019-11-28 | 科大讯飞股份有限公司 | Speech translation method and apparatus |
CN111245460A (en) * | 2020-03-25 | 2020-06-05 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN111507113A (en) * | 2020-03-18 | 2020-08-07 | 北京捷通华声科技股份有限公司 | Method and device for machine-assisted manual translation |
CN111508484A (en) * | 2019-01-31 | 2020-08-07 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN112215015A (en) * | 2020-09-02 | 2021-01-12 | 文思海辉智科科技有限公司 | Translation text revision method, translation text revision device, computer equipment and storage medium |
CN112818702A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-user multi-language collaborative speech translation system and method |
CN112818703A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-language consensus translation system and method based on multi-thread communication |
CN113362818A (en) * | 2021-05-08 | 2021-09-07 | 山西三友和智慧信息技术股份有限公司 | Voice interaction guidance system and method based on artificial intelligence |
CN114727161A (en) * | 2022-04-19 | 2022-07-08 | 中国工商银行股份有限公司 | Intercommunication terminal and intercommunication method |
CN114783437A (en) * | 2022-06-15 | 2022-07-22 | 湖南正宇软件技术开发有限公司 | Man-machine voice interaction realization method and system and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102043774A (en) * | 2011-01-13 | 2011-05-04 | 北京交通大学 | Machine translation evaluation device and method |
CN102662934A (en) * | 2012-04-01 | 2012-09-12 | 百度在线网络技术(北京)有限公司 | Method and device for proofing translated texts in inter-lingual communication |
CN103744843A (en) * | 2013-12-25 | 2014-04-23 | 北京百度网讯科技有限公司 | Online voice translation method and device |
CN103810158A (en) * | 2012-11-07 | 2014-05-21 | 中国移动通信集团公司 | Speech-to-speech translation method and device |
US20150254238A1 (en) * | 2007-10-26 | 2015-09-10 | Facebook, Inc. | System and Methods for Maintaining Speech-To-Speech Translation in the Field |
CN107844470A (en) * | 2016-09-18 | 2018-03-27 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method and its equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9626968B2 (en) * | 2008-06-25 | 2017-04-18 | Verint Systems Ltd. | System and method for context sensitive inference in a speech processing system |
CN108710616A (en) * | 2018-05-23 | 2018-10-26 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
- 2018-05-23: CN CN201810503163.XA patent/CN108710616A/en (active, Pending)
- 2019-04-10: WO PCT/CN2019/082040 patent/WO2019223437A1/en (active, Application Filing)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150254238A1 (en) * | 2007-10-26 | 2015-09-10 | Facebook, Inc. | System and Methods for Maintaining Speech-To-Speech Translation in the Field |
CN102043774A (en) * | 2011-01-13 | 2011-05-04 | 北京交通大学 | Machine translation evaluation device and method |
CN102662934A (en) * | 2012-04-01 | 2012-09-12 | 百度在线网络技术(北京)有限公司 | Method and device for proofing translated texts in inter-lingual communication |
CN103810158A (en) * | 2012-11-07 | 2014-05-21 | 中国移动通信集团公司 | Speech-to-speech translation method and device |
CN103744843A (en) * | 2013-12-25 | 2014-04-23 | 北京百度网讯科技有限公司 | Online voice translation method and device |
CN107844470A (en) * | 2016-09-18 | 2018-03-27 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method and its equipment |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019223437A1 (en) * | 2018-05-23 | 2019-11-28 | 科大讯飞股份有限公司 | Speech translation method and apparatus |
CN111508484A (en) * | 2019-01-31 | 2020-08-07 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN111508484B (en) * | 2019-01-31 | 2024-04-19 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN110047488B (en) * | 2019-03-01 | 2022-04-12 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
CN110047488A (en) * | 2019-03-01 | 2019-07-23 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
CN111507113A (en) * | 2020-03-18 | 2020-08-07 | 北京捷通华声科技股份有限公司 | Method and device for machine-assisted manual translation |
CN111245460A (en) * | 2020-03-25 | 2020-06-05 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN111245460B (en) * | 2020-03-25 | 2020-10-27 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN112215015A (en) * | 2020-09-02 | 2021-01-12 | 文思海辉智科科技有限公司 | Translation text revision method, translation text revision device, computer equipment and storage medium |
CN112818703A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-language consensus translation system and method based on multi-thread communication |
CN112818703B (en) * | 2021-01-19 | 2024-02-27 | 传神语联网网络科技股份有限公司 | Multilingual consensus translation system and method based on multithread communication |
CN112818702B (en) * | 2021-01-19 | 2024-02-27 | 传神语联网网络科技股份有限公司 | Multi-user multi-language cooperative speech translation system and method |
CN112818702A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-user multi-language collaborative speech translation system and method |
CN113362818A (en) * | 2021-05-08 | 2021-09-07 | 山西三友和智慧信息技术股份有限公司 | Voice interaction guidance system and method based on artificial intelligence |
CN114727161A (en) * | 2022-04-19 | 2022-07-08 | 中国工商银行股份有限公司 | Intercommunication terminal and intercommunication method |
CN114783437A (en) * | 2022-06-15 | 2022-07-22 | 湖南正宇软件技术开发有限公司 | Man-machine voice interaction realization method and system and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
WO2019223437A1 (en) | 2019-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710616A (en) | A kind of voice translation method and device | |
CN108984529B (en) | Real-time court trial voice recognition automatic error correction method, storage medium and computing device | |
JP6997781B2 (en) | Error correction method and device for search terms | |
AU2020298542B2 (en) | Deriving multiple meaning representations for an utterance in a natural language understanding framework | |
CN111507088B (en) | Sentence completion method, equipment and readable storage medium | |
CN111310440B (en) | Text error correction method, device and system | |
WO2022121251A1 (en) | Method and apparatus for training text processing model, computer device and storage medium | |
CN110083819B (en) | Spelling error correction method, device, medium and electronic equipment | |
US20220261545A1 (en) | Systems and methods for producing a semantic representation of a document | |
US9311299B1 (en) | Weakly supervised part-of-speech tagging with coupled token and type constraints | |
CN111613214A (en) | Language model error correction method for improving voice recognition capability | |
CN111178064B (en) | Information pushing method and device based on field word segmentation processing and computer equipment | |
CN111695361A (en) | Method for constructing Chinese-English bilingual corpus and related equipment thereof | |
CN114678027A (en) | Error correction method and device for voice recognition result, terminal equipment and storage medium | |
CN111651961A (en) | Voice-based input method and device | |
CN108304389B (en) | Interactive voice translation method and device | |
CN111161730B (en) | Voice instruction matching method, device, equipment and storage medium | |
CN109614624B (en) | English sentence recognition method and electronic equipment | |
CN111723583A (en) | Statement processing method, device, equipment and storage medium based on intention role | |
Yu et al. | Recurrent neural network based rule sequence model for statistical machine translation | |
WO2022242535A1 (en) | Translation method, translation apparatus, translation device and storage medium | |
US20170270917A1 (en) | Word score calculation device, word score calculation method, and computer program product | |
CN114117021A (en) | Method and device for determining reply content and electronic equipment | |
CN108829657B (en) | Smoothing method and system | |
CN112560511A (en) | Method and device for translating lines and method and device for training translation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181026 |