CN108710616A - Voice translation method and device - Google Patents
Voice translation method and device
- Publication number
- CN108710616A (application number CN201810503163.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- translated text
- translation
- voice data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013519 translation Methods 0.000 title claims abstract description 277
- 238000000034 method Methods 0.000 title claims abstract description 92
- 230000003993 interaction Effects 0.000 claims description 23
- 238000003860 storage Methods 0.000 claims description 18
- 230000014616 translation Effects 0.000 description 239
- 238000010586 diagram Methods 0.000 description 12
- 238000012545 processing Methods 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 11
- 230000015572 biosynthetic process Effects 0.000 description 9
- 238000003786 synthesis reaction Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 7
- 238000004364 calculation method Methods 0.000 description 3
- 238000012937 correction Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 230000002452 interceptive effect Effects 0.000 description 3
- 238000002715 modification method Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 2
- 238000012790 confirmation Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000007306 turnover Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a voice translation method and device. The method includes: translating a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data; and then, by interacting with the user, determining whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is explicitly judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a voice translation method and device.
Background technology
Speech translation refers to the process of automatically translating voice data in a source language into voice data in a target language, where the source language and the target language are different languages. In existing speech translation technology, the translation result is obtained by directly translating the source-language voice data, but that result may be inaccurate.
For example, suppose the source-language voice data is the Chinese utterance "Does the luggage have to pass through the security check?" and the target-language voice data produced is the English utterance "Does Lee have to go through security?". The Chinese meaning of that English sentence is actually "Does Mr. Li pass through the security check?". The original Chinese utterance "Does the luggage have to pass through the security check?" and the translated English utterance therefore carry different meanings, that is, the translation result is inaccurate.
Summary of the invention
The main purpose of the embodiments of this application is to provide a voice translation method and device that can improve the accuracy of speech translation results.
An embodiment of this application provides a voice translation method, including:
translating a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
determining, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, after determining whether the first translated text is correct as the translation result of the source voice data, the method further includes:
if it is determined that the first translated text is wrong as the translation result of the source voice data, correcting the first translated text and taking the corrected text as the translation result of the source voice data.
Optionally, before interacting with the user, the method further includes:
determining whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data;
if not, performing the step of interacting with the user.
Optionally, determining whether the translation quality of the first translated text exceeds the preset quality threshold includes:
translating the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
determining, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, determining, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold includes:
determining, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, determining, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data includes:
interacting with the user by using the second translated text, and determining whether the first translated text is correct as the translation result of the source voice data.
Optionally, interacting with the user by using the second translated text and determining whether the first translated text is correct as the translation result of the source voice data includes:
outputting a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
if an affirmative response from the user to the first query speech is received, the first translated text is correct as the translation result of the source voice data;
if a negative response from the user to the first query speech is received, the first translated text is wrong as the translation result of the source voice data.
Optionally, correcting the first translated text includes:
correcting the first translated text by means of text matching.
Optionally, correcting the first translated text by means of text matching includes:
performing a matching operation between the recognition text of the source voice data and the text data in a database, where the database stores at least one sentence pair, each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text is the same as the language of the source voice data, and the language of the second sample text is the same as the language of the first translated text;
obtaining, through the matching operation, the first sample text most similar to the recognition text of the source voice data;
correcting the first translated text according to the most similar first sample text.
Optionally, correcting the first translated text according to the most similar first sample text includes:
interacting with the user by using the most similar first sample text to correct the first translated text.
Optionally, interacting with the user by using the most similar first sample text to correct the first translated text includes:
outputting a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative response from the user to the second query speech is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, and taking it as the successfully corrected version of the first translated text.
Optionally, the method further includes:
if a negative response from the user to the second query speech is received, outputting a prompt speech, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
An embodiment of this application further provides a speech translation device, including:
a speech translation unit, configured to translate a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
a user interaction unit, configured to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, the device further includes:
a text correction unit, configured to, if it is determined that the first translated text is wrong as the translation result of the source voice data, correct the first translated text and take the corrected text as the translation result of the source voice data.
Optionally, the device further includes:
a quality judging unit, configured to determine whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; and if not, trigger the user interaction unit to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
Optionally, the quality judging unit includes:
a reverse translation subunit, configured to translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
a quality judging subunit, configured to determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, the quality judging subunit is specifically configured to determine, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
Optionally, the user interaction unit is specifically configured to interact with the user by using the second translated text and determine whether the first translated text is correct as the translation result of the source voice data.
Optionally, the user interaction unit includes:
a first query subunit, configured to output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
a result determination subunit, configured to determine that the first translated text is correct as the translation result of the source voice data if an affirmative response from the user to the first query speech is received, and that the first translated text is wrong as the translation result of the source voice data if a negative response from the user to the first query speech is received.
Optionally, the text correction unit is specifically configured to correct the first translated text by means of text matching.
Optionally, the text correction unit includes:
a text matching subunit, configured to perform a matching operation between the recognition text of the source voice data and the text data in a database, where the database stores at least one sentence pair, each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text is the same as the language of the source voice data, and the language of the second sample text is the same as the language of the first translated text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognition text of the source voice data;
a text correction subunit, configured to correct the first translated text according to the most similar first sample text.
Optionally, the text correction subunit is specifically configured to interact with the user by using the most similar first sample text to correct the first translated text.
Optionally, the text correction subunit includes:
a second query subunit, configured to output a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completion subunit, configured to, if an affirmative response from the user to the second query speech is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs and take it as the successfully corrected version of the first translated text.
Optionally, the text correction subunit further includes:
a voice prompt subunit, configured to output a prompt speech if a negative response from the user to the second query speech is received, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
An embodiment of this application further provides a speech translation device, including a processor, a memory and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform any of the implementations of the voice translation method described above.
An embodiment of this application further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform any of the implementations of the voice translation method described above.
In the voice translation method and device provided by the embodiments of this application, a user's source voice data is translated to obtain a first translated text whose language differs from that of the source voice data; then, by interacting with the user, it is determined whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice translation method according to an embodiment of this application;
Fig. 2 is a schematic flowchart of a translation quality determination method according to an embodiment of this application;
Fig. 3 is a schematic flowchart of a method for determining whether a translation result is credible according to an embodiment of this application;
Fig. 4 is a schematic flowchart of a translated-text correction method according to an embodiment of this application;
Fig. 5 is a schematic structural diagram of a speech translation device according to an embodiment of this application;
Fig. 6 is a schematic diagram of the hardware structure of a speech translation device according to an embodiment of this application.
Detailed description of the embodiments
Speech translation refers to the process of automatically translating the voice data of a source language (the voice data before translation) into the voice data of a target language (the voice data after translation). Speech translation technology usually involves three main components: speech recognition, machine translation and speech synthesis. Speech recognition uses speech recognition technology to recognize the source-language voice data and generate source-language text; machine translation uses machine translation technology to translate the source-language text into target-language text; speech synthesis uses speech synthesis technology to synthesize the target-language text into target-language voice data.
As speech translation technology is applied more and more widely, people's requirements on the accuracy of translation results are also increasing. One voice translation method realizes speech translation through a single round of human-machine dialogue, that is, through one input and one output: the input is source-language voice data and the output is target-language voice data. Specifically, the user inputs the source-language voice data to be translated into a speech translation device, and the device automatically translates it into target-language voice data through speech recognition, machine translation and speech synthesis, and feeds the result back to the user. However, in this process the results of speech recognition and machine translation may deviate, so that the finally output target-language voice data is inaccurate. In other words, the user can only passively accept the one-shot translation result of the speech translation device, and when the translation result is wrong the device cannot correct it in time, which reduces the accuracy of the translation result.
For this reason, an embodiment of this application provides a voice translation method that adds an error-detection-and-correction function for the translation result. That is, the accuracy of the above one-shot translation result can be assessed, and when the assessment indicates that the accuracy of the translation result is low, the translation result can be corrected; specifically, it can be corrected according to the result of interacting with the user, thereby improving the accuracy of the translation result.
It should be noted that the voice translation method provided by the embodiments of this application is not limited to any particular application scene; for example, it can be used in scenes where translation is needed, such as a user travelling abroad or going through entry-exit security checks.
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
First embodiment
Referring to Fig. 1, which is a schematic flowchart of the voice translation method provided in this embodiment, the method includes the following steps:
S101: Translate the user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data.
In this embodiment, the voice data before translation (namely the speech to be translated) is called the source voice data. This embodiment does not limit the language of the source voice data; for example, it may be Chinese speech or English speech.
The text data after translation is called the first translated text. This embodiment does not limit the language of the first translated text either, as long as the first translated text and the source voice data belong to different languages. For example, the source voice data may be Chinese speech while the first translated text is English text, or the source voice data may be English speech while the first translated text is Chinese text.
In this embodiment, speech recognition may be performed on the source voice data by speech recognition technology to obtain a recognition text A1 of the source voice data, and then machine translation may be performed on the recognition text A1 by machine translation technology to obtain the first translated text B1. The speech recognition technology used here may be any existing or future speech recognition technology, and likewise the machine translation technology may be any existing or future machine translation technology.
For example, at an entry-exit security check, the user wishes to talk with the security staff through the speech translation device. Suppose the Chinese source voice data spoken by the user is "Does the luggage have to pass through the security check?". After the speech translation device performs speech recognition on it, the obtained recognition text A1 is "Does Lee have to pass through the security check?", and translating the recognition text A1 from Chinese into English yields the first translated text B1 "Does Lee have to go through security?". It can be seen that a recognition error occurred in the recognition text A1 when the source voice data was recognized.
S102: Determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In this embodiment, the speech translation device may interact with the user, for example through voice interaction or text interaction, and determine according to the interaction result whether the first translated text is correct as the translation result of the source voice data. If it is determined that the first translated text is correct as the translation result of the source voice data, the first translated text B1 may be taken as the translation result of the source voice data.
At this point, speech synthesis may further be performed on the first translated text B1 to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the first translated text B1 is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
It should be noted that if it is determined that the first translated text is wrong as the translation result of the source voice data, the first translated text B1 may be corrected as described in the fourth embodiment below, or the user may be asked to repeat the source voice data or to use another, semantically similar, phrasing of the source voice data, thereby starting a new round of translation interaction.
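For illustration only, a minimal sketch of the S101/S102 flow is given below; confirm_with_user and correct_translation are hypothetical helpers standing in for the interaction of step S102 and the correction of the fourth embodiment, and are not interfaces defined by this application:

```python
# Sketch of the S101/S102 flow: translate, then confirm with the user before
# accepting the result. All callables are placeholders.

def translate_with_confirmation(source_audio, asr, machine_translate, synthesize,
                                confirm_with_user, correct_translation):
    recognition_text_a1 = asr(source_audio)
    first_translation_b1 = machine_translate(recognition_text_a1)    # S101
    if confirm_with_user(first_translation_b1):                      # S102: interaction-based judgement
        result_text = first_translation_b1
    else:
        result_text = correct_translation(first_translation_b1)      # or ask the user to rephrase
    return synthesize(result_text)                                   # target voice data
```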
In summary, in the voice translation method provided in this embodiment, the user's source voice data is translated to obtain a first translated text whose language differs from that of the source voice data, and then, by interacting with the user, it is determined whether the first translated text is correct as the translation result of the source voice data. Because the first translated text is judged for correctness as the translation result of the source voice data, it can be further processed according to the judgement, which improves the accuracy of the translation result.
Second embodiment
In this embodiment, before the judgement step S102 of the first embodiment, that is, before it is determined through human-machine interaction whether the first translated text is correct as the translation result of the source voice data, the machine (i.e. the speech translation device) may first judge whether the first translated text is correct as the translation result of the source voice data.
Therefore, before the judgement step S102 of the first embodiment, the method may further include: determining whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; if not, performing the judgement step S102 of the first embodiment.
In this embodiment, the translation quality of the first translated text B1 may be assessed. If its translation quality does not exceed a quality threshold set in advance, called the preset quality threshold here, the first translated text B1 is considered not credible as the translation result of the source voice data, i.e. the first translated text B1 is wrong as the translation result of the source voice data; in that case, step S102 may be carried out to further judge the correctness of the first translated text B1 as the translation result.
Conversely, if the translation quality of the first translated text B1 exceeds the preset quality threshold, the first translated text B1 is considered credible as the translation result of the source voice data, i.e. the first translated text is correct as the translation result of the source voice data. In that case, the first translated text B1 may be taken as the translation result of the source voice data, and speech synthesis may further be performed on it to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the first translated text B1 is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
The specific implementation of the above translation quality judgement step ("determining whether the translation quality of the first translated text exceeds the preset quality threshold") is described below.
Referring to Fig. 2, which is a schematic flowchart of the translation quality determination method provided in this embodiment, the method includes the following steps:
S201: Translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data.
In this embodiment, reverse translation may be performed on the first translated text B1 to obtain a second translated text A2. The language of the first translated text B1 is the post-translation language, for example English; the language of the second translated text A2 is the pre-translation language, for example Chinese.
Continuing the example above, suppose the first translated text B1 is "Does Lee have to go through security?"; the second translated text A2 obtained by reverse translation is "Does Mr. Li pass through the security check?".
S202: Determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In this embodiment, the translation quality of the first translated text B1 may be judged on the basis of the second translated text A2. In one implementation, step S202 may specifically include: determining, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In a specific implementation of step S202, the BLEU (Bilingual Evaluation Understudy) algorithm may be used to determine whether the translation quality of the first translated text exceeds the preset quality threshold.
Specifically, BLEU is an evaluation algorithm for machine translation results, used to assess the quality of a translation from one natural language into another. The algorithm is as follows:
First, in order to consider the translation effect of the first translated text B1 comprehensively, matched base units between the recognition text A1 and the second translated text A2 are counted at several granularities in turn, from a single character as the base unit (1-gram) up to several characters as the base unit (n-gram); the position of each base unit within the text need not be considered during counting. Then, according to the number of matched base units, the matching precision of the second translated text A2 at each order of base unit is calculated.
The matching precision of the second translated text A2 at each order of base unit i-gram (i = 1, 2, ..., n) can be calculated according to the following formula:
precision = correct / output_length (1)
where correct is the number of base units of that order in the second translated text A2 that correctly match the recognition text A1, and output_length is the total number of base units of that order in the second translated text A2.
For example, continuing the example above, suppose the recognition text A1 is "Does Lee have to pass through the security check?" and the second translated text A2 is "Does Mr. Li pass through the security check?". The matching precisions are then as shown in Table 1 below.
Table 1
| Base unit | Correctly matched base units | Matching precision |
| 1-gram | "Lee", "pass", "safety", "check", ... | 6/10 = 0.6 |
| 2-gram | "pass safety", "safety check", ... | 3/9 = 0.33 |
| 3-gram | "pass safety check" | 1/8 = 0.125 |
| 4-gram | (none) | 0/7 = 0 |
Next, redundant words in the second translated text A2 also need to be penalized, so a length penalty factor is introduced to handle this: the longer the second translated text A2 is, the larger the penalty. The length penalty factor is calculated as follows:
C = min(1, L1/L2) (2)
where L1 is the length of the recognition text A1 and L2 is the length of the second translated text A2.
In formula (2), if the recognition text A1 and the second translated text A2 are Chinese texts, the text length can be calculated in characters. For example, the recognition text A1 "Does Lee have to pass through the security check?" has a length of 9, and the second translated text A2 "Does Mr. Li pass through the security check?" has a length of 10.
Finally, after the matching precisions and the length penalty factor C have been calculated according to formulas (1) and (2), the BLEU score of the second translated text A2 can be computed. The BLEU score corresponding to a certain order of base unit may be selected, for example the score corresponding to 4-gram, calculated as follows:
bleu_4-gram = C * f(4-gram) (3)
where bleu_4-gram is the BLEU score of the second translated text A2, C is the length penalty factor, and f is a function that processes the matching precisions corresponding to 1-gram, 2-gram, 3-gram and 4-gram.
For example, when the recognition text A1 is "Does Lee have to pass through the security check?" and the second translated text A2 is "Does Mr. Li pass through the security check?", substituting the matching precisions calculated by formula (1) (as shown in Table 1) and the length penalty factor calculated by formula (2) into formula (3) gives a BLEU score of 20.56 for the second translated text A2.
In this embodiment, a translation score threshold may be set in advance and used as the preset quality threshold, for example 50. Since the score 20.56 calculated above is below the threshold 50, the first translated text B1 can be judged not credible as the translation result of the source voice data; that is, the first translated text B1 "Does Lee have to go through security?" is not credible. Conversely, when the BLEU score is greater than or equal to the threshold 50, the first translated text B1 is judged credible as the translation result of the source voice data.
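For illustration only, the following sketch mirrors formulas (1) to (3): character-level n-gram matching precision, the length penalty factor C, and a BLEU-style score. Since the combining function f is not fully specified above, a standard smoothed geometric mean is used here, so the resulting numbers are illustrative and will not reproduce the 20.56 of the example exactly:

```python
import math
from collections import Counter

def ngram_precision(reference: str, candidate: str, n: int) -> float:
    """Formula (1): matched same-order base units over total base units in the candidate."""
    ref_counts = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
    cand_grams = [candidate[i:i + n] for i in range(len(candidate) - n + 1)]
    if not cand_grams:
        return 0.0
    # Clip each n-gram count by its count in the reference (standard BLEU refinement).
    cand_counts = Counter(cand_grams)
    correct = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return correct / len(cand_grams)

def quality_score(recognition_text_a1: str, back_translation_a2: str, max_n: int = 4) -> float:
    """Length penalty C (formula (2)) times a smoothed geometric mean of the
    1..max_n precisions standing in for f() in formula (3), scaled to 0-100."""
    c = min(1.0, len(recognition_text_a1) / max(len(back_translation_a2), 1))
    precisions = [max(ngram_precision(recognition_text_a1, back_translation_a2, n), 1e-9)
                  for n in range(1, max_n + 1)]
    return 100.0 * c * math.exp(sum(math.log(p) for p in precisions) / max_n)

# The first translated text is treated as unreliable when the score falls below
# the preset quality threshold (50 in the running example).
PRESET_QUALITY_THRESHOLD = 50.0
```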
In summary, in the translation quality determination method provided in this embodiment, the first translated text can be reverse-translated to obtain a second translated text, and the second translated text can be scored with the BLEU algorithm on the basis of the recognition text of the source voice data and the second translated text, so that the translation quality of the first translated text can be judged according to the score, realizing the evaluation of translation quality.
Third embodiment
In this embodiment, if the second embodiment above determines that the translation quality of the first translated text does not exceed the preset quality threshold, that is, after the first translated text B1 is judged not credible as the translation result of the source voice data, the judgement of the speech translation device may still be inaccurate. Therefore, the speech translation device can interact with the user through step S102 of the first embodiment and, based on the user's feedback, determine whether the first translated text B1 is the correct translation result of the source voice data.
In one implementation of this embodiment, step S102 of the first embodiment may specifically include: interacting with the user by using the second translated text, and determining whether the first translated text is correct as the translation result of the source voice data. In this embodiment, the second translated text A2 may be used as the content of the interaction with the user, and the judgement is made according to the user's feedback.
This judgement step may be implemented as follows.
As shown in Fig. 3, which is a schematic flowchart of the method provided in this embodiment for determining whether the translation result is credible, the method may include the following steps:
S301: Output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar.
In this embodiment, the second translated text A2 may be synthesized into speech and used to interact with the user. The purpose of the interaction is to ask the user whether the sentence they want translated is the second translated text A2 (i.e. whether the source voice data and the second translated text A2 are semantically similar). For ease of description, the speech that asks the user is called the first query speech; it may specifically be "Is what you want to translate: <second translated text A2>?".
For example, suppose the first translated text B1 is "Does Lee have to go through security?", and reverse translation in step S201 of the second embodiment yields the second translated text A2 "Does Mr. Li pass through the security check?". When the second translated text A2 is scored with the BLEU algorithm, for example obtaining a score of 20.56, which is below the preset quality threshold of 50, the reverse-translated second translated text A2 is synthesized into the first query speech, for example "May I ask, is what you want to translate: 'Does Mr. Li pass through the security check?'?".
At this point, the speech translation device feeds the first query speech back to the user and waits for the user's answer.
S302: If an affirmative response from the user to the first query speech is received, the first translated text is correct as the translation result of the source voice data.
The user can give an affirmative response to the first query speech by voice, by button or in another way; for example, the user may speak "yes" or "OK" to the speech translation device, or press a "confirm" key on the device. In this case the speech translation device considers the first translated text B1 credible as the translation result of the source voice data, that is, it considers the first translated text B1 correct as the translation result of the source voice data, and therefore the first translated text B1 can be taken as the translation result of the source voice data.
S303: If a negative response from the user to the first query speech is received, the first translated text is wrong as the translation result of the source voice data.
The user can give a negative response to the first query speech by voice, by button or in another way; for example, the user may speak "no" to the speech translation device, or press a "NO" key on the device. In this case the speech translation device considers the first translated text B1 not credible as the translation result of the source voice data, that is, it considers the first translated text B1 wrong as the translation result of the source voice data.
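For illustration only, the S301 to S303 decision can be sketched as follows; speak and get_user_reply are placeholders for the device's speech output and its voice or button input, and are not interfaces defined by this application:

```python
# Sketch of the S301-S303 check: play a first query speech built from the
# back-translation A2 and interpret the user's reply.

AFFIRMATIVE = {"yes", "ok", "confirm"}

def is_first_translation_accepted(back_translation_a2: str, speak, get_user_reply) -> bool:
    speak(f'Is what you want to translate: "{back_translation_a2}"?')  # first query speech
    reply = get_user_reply().strip().lower()                           # e.g. "yes", "no", or a key press
    return reply in AFFIRMATIVE     # affirmative -> B1 is accepted as the translation result
```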
In summary, in the method provided in this embodiment for determining whether the translation result is credible, a first query speech is output to the user asking whether the source voice data and the second translated text are semantically similar. If an affirmative response is received, the first translated text is considered credible as the translation result of the source voice data; conversely, if a negative response is received, the first translated text is considered not credible as the translation result of the source voice data. It can be seen that, through human-machine interaction with the user, whether the first translated text is correct can be confirmed, which ensures the accuracy of the translation result.
Fourth embodiment
In this embodiment, when step S102 of the first embodiment determines that the first translated text is wrong as the translation result of the source voice data, the first translated text may be further corrected, and the corrected text is taken as the translation result of the source voice data.
After the correction succeeds, the corrected text data can be taken as the text translation result of the source voice data. At this point, speech synthesis may further be performed on the corrected text data to obtain target voice data, which is fed back directly to the user to end the current round of translation. Of course, after the successfully corrected text data is taken as the text translation result of the source voice data, other processing may also be performed on it; this embodiment does not limit the subsequent processing.
It can be seen that this embodiment adds an error-detection-and-correction function for the translation result: the translation quality of the first translated text can be assessed, and when the assessment indicates that the translation quality of the first translated text as the translation result is low, the translation result can be corrected, thereby improving the accuracy of the translation result.
It should be noted that, on the basis of any of the embodiments above, the first translated text B1 may be corrected according to the correction method provided in this embodiment.
In one implementation of this embodiment, the first translated text B1 may be corrected by means of text matching. The specific implementation of this correction step is described next.
Referring to Fig. 4, which is a schematic flowchart of the translated-text correction method provided in this embodiment, the method includes the following steps:
S401: Perform a matching operation between the recognition text of the source voice data and the text data in a database.
In this embodiment, a database may be built in advance, in which at least one sentence pair is stored. Each sentence pair includes a first sample text and a second sample text obtained by correctly translating the first sample text; the language of the first sample text is the same as the language of the source voice data (the pre-translation language), and the language of the second sample text is the same as the language of the first translated text (the post-translation language).
Specifically, a large number of first sample texts and the second sample texts obtained by correctly translating them can be collected in advance, the corresponding first and second sample texts are formed into sentence pairs, and the database is built from these sentence pairs. The database may be a local database of the speech translation device, or a database on a cloud server with which the speech translation device communicates.
In this embodiment, the database can be built according to the specific application demand; that is, the database may store only sentence pairs relevant to a concrete application scene. For example, if the user needs to use the speech translation device at an entry-exit security check, sentence pairs commonly used at entry-exit security checks can be stored in the database in advance. Of course, the database may also store sentence pairs relevant to multiple application scenes; in practical applications, the application scene can be determined automatically from the user's source voice data, and the sentence-pair set of the corresponding application scene is then selected.
It should be noted that this embodiment does not limit the number of sentence pairs under a given application scene, for example roughly 10,000 to 40,000 sentence pairs, but in order to achieve a good correction effect the pairs should cover, as far as possible, the common and less common sentences related to the application scene.
Taking the entry-exit security check scene as an example, a sentence pair is stored in the database in the following format:
{"cn": "Does the luggage have to pass through the security check?", "update_time": "20171018T173941", "en": "Must the luggage be checked by security?", "create_time": "20171018T173941", "id": "00000001"}
where:
cn: the Chinese sentence;
en: the corresponding English sentence;
update_time: the time the record was uploaded to the database;
create_time: the time the sentence pair was created;
id: the unique identifier of the record in the database.
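For illustration only, such a record can be modelled as follows; the field names are taken from the storage format above, while the class itself is an assumption of this sketch:

```python
# A sentence-pair record mirroring the storage format shown above.
from dataclasses import dataclass

@dataclass
class SentencePair:
    id: str            # unique identifier of the record in the database
    cn: str            # first sample text (language of the source voice data)
    en: str            # second sample text (its correct translation)
    create_time: str   # when the sentence pair was created
    update_time: str   # when the record was uploaded to the database

security_check_pairs = [
    SentencePair(id="00000001",
                 cn="Does the luggage have to pass through the security check?",
                 en="Must the luggage be checked by security?",
                 create_time="20171018T173941",
                 update_time="20171018T173941"),
]
```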
In this embodiment, the recognition text A1 of the source voice data is matched against the text data in the database. For example, the Doc2Vec algorithm, also known as paragraph2vec or sentence embeddings, which is an unsupervised algorithm, may be used for the matching.
S402: Obtain, through the matching operation, the first sample text most similar to the recognition text of the source voice data.
By matching the recognition text A1 against the text data in the database, the first sample text in the database that is most similar to the recognition text A1 is obtained, referred to here as sample text A3 for short. During matching, the recognition text A1 may first be vectorized to obtain its sentence vector; then, for every first sample text in the database that is in the same language as the recognition text A1, the distance between the sentence vector of the recognition text A1 and the sentence vector of that first sample text is calculated, and the first sample text with the smallest distance is selected as the sample text A3 most similar to the recognition text A1.
For example, when matching with the Doc2Vec algorithm, suppose the recognition text A1 is "Does Lee have to pass through the security check?" and it is matched against the database. If it is determined that the sentence vector of the first sample text with id "00000001", "Does the luggage have to pass through the security check?", is closest to the sentence vector of "Does Lee have to pass through the security check?", then that first sample text is taken as the sample text A3 most similar to the recognition text A1.
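For illustration only, a nearest-sentence lookup in the spirit of step S402 is sketched below, assuming gensim's Doc2Vec (4.x API); the tokenizer and the hyperparameters are placeholders:

```python
# Illustrative nearest-sentence lookup with gensim's Doc2Vec (4.x API assumed).
# tokenize() is a placeholder; for Chinese it would be a word or character segmenter.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def build_doc2vec_index(first_sample_texts, tokenize):
    corpus = [TaggedDocument(tokenize(text), [i])
              for i, text in enumerate(first_sample_texts)]
    return Doc2Vec(corpus, vector_size=100, min_count=1, epochs=40)

def most_similar_sample_text(model, first_sample_texts, recognition_text_a1, tokenize):
    vector = model.infer_vector(tokenize(recognition_text_a1))        # sentence vector of A1
    best_index, _similarity = model.dv.most_similar([vector], topn=1)[0]
    return first_sample_texts[best_index]                             # sample text A3
```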
S403: Correct the first translated text according to the most similar first sample text.
In this embodiment, once the first sample text most similar to the recognition text A1 of the source voice data, namely sample text A3, has been obtained, the first translated text B1 can be corrected using sample text A3.
In one implementation of this embodiment, the second sample text of the sentence pair to which sample text A3 belongs may be directly taken as the successfully corrected version of the first translated text.
In another implementation of this embodiment, step S403 may specifically interact with the user by using the most similar first sample text to correct the first translated text. In this implementation, the most similar first sample text, namely sample text A3, is used as the content of the interaction with the user, and the first translated text is corrected according to the user's feedback.
A specific implementation of step S403 may include the following steps A and B:
Step A: Output a second query speech to the user, where the second query speech asks whether the source voice data and the most similar first sample text are semantically similar.
The sample text A3 matched from the database may be synthesized into speech and used to interact with the user. The purpose of the interaction is to ask the user whether the sentence they want translated is the sample text A3 (i.e. whether the source voice data and the sample text A3 are semantically similar). For ease of description, this query speech is called the second query speech; it may specifically be "Is what you want to translate: <sample text A3>?".
For example, suppose the sample text A3 is "Does the luggage have to pass through the security check?"; the second query speech may then be "May I ask, is what you want to translate: 'Does the luggage have to pass through the security check?'?".
At this point, the speech translation device feeds the second query speech back to the user and waits for the user's answer.
Step B: If an affirmative response from the user to the second query speech is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, and take it as the successfully corrected version of the first translated text.
The user can give an affirmative response to the second query speech by voice, by button or in another way; for example, the user may speak "yes" or "OK" to the speech translation device, or press a "confirm" key on the device. In this case, the second sample text, referred to here as sample text B3, can be obtained by querying the database for the sentence pair to which sample text A3 belongs, and sample text B3 is taken as the successfully corrected version of the first translated text.
For example, the user hears the speech translation device output the second query speech "May I ask, is what you want to translate: 'Does the luggage have to pass through the security check?'?". If the answer is "yes", the speech translation device considers that what the user wants to translate is the sample text A3 "Does the luggage have to pass through the security check?", and the corresponding second sample text B3 in the sentence pair, "Must the luggage be checked by security?", is taken as the successfully corrected version of the first translated text B1; the correction succeeds.
Further, the user may also give a negative response to the second query speech; therefore, this embodiment may further include:
Step C: If a negative response from the user to the second query speech is received, output a prompt speech, where the prompt speech prompts the user to repeat the source voice data or to rephrase the source voice data.
The user can give a negative response to the second query speech by voice, by button or in another way; for example, the user may speak "no" to the speech translation device, or press a "NO" key on the device. In this case the correction is considered to have failed, and the speech translation device may ask the user, by voice, to repeat the source voice data or to use another, semantically similar, phrasing of the source voice data, thereby starting a new round of translation interaction.
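For illustration only, steps A to C can be sketched as follows; speak and get_user_reply are the same placeholders as in the earlier confirmation sketch, not interfaces defined by this application:

```python
# Sketch of correction steps A-C: ask about the closest database sentence (A3);
# on an affirmative response return its paired translation (B3), otherwise
# prompt the user to repeat or rephrase.

def correct_by_sentence_pair(sample_text_a3: str, sample_text_b3: str,
                             speak, get_user_reply):
    speak(f'Is what you want to translate: "{sample_text_a3}"?')      # second query speech
    if get_user_reply().strip().lower() in {"yes", "ok", "confirm"}:  # affirmative response
        return sample_text_b3           # successfully corrected translation result
    speak("Please repeat the sentence or say it in another way.")     # prompt speech
    return None                         # correction failed; start a new translation round
```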
In summary, in the translated-text correction method provided in this embodiment, a matching operation is performed between the recognition text of the source voice data and the text data in the database to obtain the sentence most similar to the recognition text, and the first translated text is then corrected according to that most similar sentence. It can be seen that this embodiment can accumulate sentence pairs for each translation direction and each application scene in advance and store them in the database; the sentence most similar to the recognition text can be found in the database through the matching algorithm, and the translation of that sentence is used as the corrected text, thereby realizing text correction.
Fifth embodiment
This embodiment introduces a speech translation device; for related content, refer to the method embodiments above. It should be noted that this speech translation device may be the speech translation device described above, or a part of it.
Referring to Fig. 5, which is a schematic structural diagram of the speech translation device provided in this embodiment, the device 500 includes:
a speech translation unit 501, configured to translate a user's source voice data to obtain a first translated text, where the language of the first translated text differs from the language of the source voice data;
a user interaction unit 502, configured to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the device 500 may further include:
a text correction unit, configured to, if it is determined that the first translated text is wrong as the translation result of the source voice data, correct the first translated text and take the corrected text as the translation result of the source voice data.
In one implementation of this embodiment, the device 500 may further include:
a quality judging unit, configured to determine whether the translation quality of the first translated text exceeds a preset quality threshold, where the translation quality of the first translated text characterizes the correctness of the first translated text as the translation result of the source voice data; and if not, trigger the user interaction unit 502 to determine, by interacting with the user, whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the quality judging unit includes:
a reverse translation subunit, configured to translate the first translated text to obtain a second translated text, where the language of the second translated text is the same as the language of the source voice data;
a quality judging subunit, configured to determine, according to the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In one implementation of this embodiment, the quality judging subunit is specifically configured to determine, according to the recognition text of the source voice data and the second translated text, whether the translation quality of the first translated text exceeds the preset quality threshold.
In one implementation of this embodiment, the user interaction unit 502 may specifically be configured to interact with the user by using the second translated text and determine whether the first translated text is correct as the translation result of the source voice data.
In one implementation of this embodiment, the user interaction unit 502 may include:
a first query subunit, configured to output a first query speech to the user, where the first query speech asks whether the source voice data and the second translated text are semantically similar;
a result determination subunit, configured to determine that the first translated text is correct as the translation result of the source voice data if an affirmative response from the user to the first query speech is received, and that the first translated text is wrong as the translation result of the source voice data if a negative response from the user to the first query speech is received.
In an implementation of this embodiment, the text correction unit may be specifically configured to correct the first translation text by means of text matching.
In an implementation of this embodiment, the text correction unit may include:
a text matching subunit, configured to perform a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
a text correction subunit, configured to correct the first translation text according to the most similar first sample text.
In an implementation of this embodiment, the text correction subunit may be specifically configured to interact with the user by means of the most similar first sample text, so as to correct the first translation text.
In an implementation of this embodiment, the text correction subunit may include:
a second inquiry subunit, configured to output a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completing subunit, configured to, if an affirmative reply of the user to the second inquiry voice is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
In an implementation of this embodiment, the text correction subunit may further include:
a voice prompt subunit, configured to output a prompt voice if a negative reply of the user to the second inquiry voice is received, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
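The correction flow built on the sentence-pair database could look roughly like the sketch below; the in-memory list of sentence pairs, the similarity measure and the ask_user callback are illustrative assumptions, and a real database lookup would replace the linear scan.

```python
from difflib import SequenceMatcher

def correct_by_matching(recognized_text: str,
                        sentence_pairs: list[tuple[str, str]],
                        ask_user):
    """Find the first sample text most similar to the recognized text; if the
    user confirms it via the second inquiry, return the paired second sample
    text as the corrected translation, otherwise return None so the caller
    can prompt the user to repeat or rephrase the source voice data."""
    best_source, best_target = max(
        sentence_pairs,
        key=lambda pair: SequenceMatcher(None, recognized_text, pair[0]).ratio(),
    )
    reply = ask_user(f'Did you mean: "{best_source}"?').strip().lower()
    if reply.startswith("y"):
        return best_target        # second sample text of the matched sentence pair
    return None                   # negative reply: caller outputs the prompt voice
```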
Sixth embodiment
Another speech translation apparatus is introduced in this embodiment; for related content, refer to the above method embodiments.
Referring to Fig. 6, which is a hardware architecture diagram of the speech translation apparatus provided in this embodiment, the speech translation apparatus 600 includes a memory 601, a receiver 602, and a processor 603 respectively connected with the memory 601 and the receiver 602. The memory 601 is configured to store a set of program instructions, and the processor 603 is configured to call the program instructions stored in the memory 601 to perform the following operations:
translating source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
judging, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
if it is judged that the first translation text is wrong as the translation result of the source voice data, correcting the first translation text and using the corrected text as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
judging whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data;
if not, executing the step of interacting with the user.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
translating the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
judging, according to the recognized text of the source voice data and the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
outputting a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
if an affirmative reply of the user to the first inquiry voice is received, determining that the first translation text is correct as the translation result of the source voice data;
if a negative reply of the user to the first inquiry voice is received, determining that the first translation text is wrong as the translation result of the source voice data.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
correcting the first translation text by means of text matching.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
performing a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
obtaining, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
correcting the first translation text according to the most similar first sample text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
interacting with the user by means of the most similar first sample text, so as to correct the first translation text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
outputting a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative reply of the user to the second inquiry voice is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
In an implementation of this embodiment, the processor 603 is further configured to call the program instructions stored in the memory 601 to perform the following operations:
if a negative reply of the user to the second inquiry voice is received, outputting a prompt voice, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
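Taken together, the operations above amount to the following end-to-end flow. The sketch reuses the helpers outlined earlier (confirm_first_translation, correct_by_matching), and the asr, mt, reverse_mt and ask_user callbacks are assumed interfaces rather than components named by this embodiment.

```python
from difflib import SequenceMatcher

def translate_speech(source_audio, asr, mt, reverse_mt, ask_user,
                     sentence_pairs, threshold: float = 0.6):
    """Recognize -> translate -> back-translation quality check -> user
    confirmation -> database-based correction (illustrative wiring only)."""
    recognized = asr(source_audio)                        # recognized text
    first_translation = mt(recognized)                    # first translation text
    second_translation = reverse_mt(first_translation)    # second translation text

    quality = SequenceMatcher(None, recognized, second_translation).ratio()
    if quality > threshold:
        return first_translation          # quality exceeds threshold: accept directly

    if confirm_first_translation(second_translation, ask_user):
        return first_translation          # user confirmed via the first inquiry voice
    # Negative reply: try to correct via the sentence-pair database.
    return correct_by_matching(recognized, sentence_pairs, ask_user)
```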
In some embodiments, the processor 603 may be a central processing unit (CPU), the memory 601 may be an internal memory of the random access memory (RAM) type, and the receiver 602 may include a general physical interface, which may be an Ethernet interface or an asynchronous transfer mode (ATM) interface. The processor 603, the receiver 602 and the memory 601 may be integrated into one or more independent circuits or pieces of hardware, for example an application-specific integrated circuit (ASIC).
Further, this embodiment also provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute any one of the implementations of the above speech translation method.
It can be seen from the above description of the embodiments that all or part of the steps in the methods of the above embodiments may be implemented by software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present application, or the part thereof that contributes to the prior art, may essentially be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the apparatuses disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief, and relevant details can be found in the description of the method parts.
It should also be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (24)
1. A speech translation method, characterized by comprising:
translating source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
judging, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
2. The method according to claim 1, characterized in that, after judging whether the first translation text is correct as the translation result of the source voice data, the method further comprises:
if it is judged that the first translation text is wrong as the translation result of the source voice data, correcting the first translation text and using the corrected text as the translation result of the source voice data.
3. The method according to claim 1, characterized in that, before the interaction with the user, the method further comprises:
judging whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data;
if not, executing the step of interacting with the user.
4. The method according to claim 3, characterized in that judging whether the translation quality of the first translation text exceeds the preset quality threshold comprises:
translating the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
5. The method according to claim 4, characterized in that judging, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold comprises:
judging, according to the recognized text of the source voice data and the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
6. The method according to claim 4, characterized in that judging, through interaction with the user, whether the first translation text is correct as the translation result of the source voice data comprises:
interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
7. The method according to claim 6, characterized in that interacting with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data, comprises:
outputting a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
if an affirmative reply of the user to the first inquiry voice is received, determining that the first translation text is correct as the translation result of the source voice data;
if a negative reply of the user to the first inquiry voice is received, determining that the first translation text is wrong as the translation result of the source voice data.
8. The method according to any one of claims 2 to 7, characterized in that correcting the first translation text comprises:
correcting the first translation text by means of text matching.
9. The method according to claim 8, characterized in that correcting the first translation text by means of text matching comprises:
performing a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
obtaining, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
correcting the first translation text according to the most similar first sample text.
10. The method according to claim 9, characterized in that correcting the first translation text according to the most similar first sample text comprises:
interacting with the user by means of the most similar first sample text, so as to correct the first translation text.
11. The method according to claim 10, characterized in that interacting with the user by means of the most similar first sample text, so as to correct the first translation text, comprises:
outputting a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
if an affirmative reply of the user to the second inquiry voice is received, obtaining the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
12. The method according to claim 11, characterized in that the method further comprises:
if a negative reply of the user to the second inquiry voice is received, outputting a prompt voice, wherein the prompt voice prompts the user to repeat the source voice data or to rephrase the source voice data.
13. A speech translation apparatus, characterized by comprising:
a speech translation unit, configured to translate source voice data of a user to obtain a first translation text, wherein the language of the first translation text is different from the language of the source voice data;
a user interaction unit, configured to judge, through interaction with the user, whether the first translation text is correct as a translation result of the source voice data.
14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a text correction unit, configured to, if it is judged that the first translation text is wrong as the translation result of the source voice data, correct the first translation text and use the corrected text as the translation result of the source voice data.
15. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a quality judging unit, configured to judge whether the translation quality of the first translation text exceeds a preset quality threshold, wherein the translation quality of the first translation text characterizes the correctness of the first translation text as the translation result of the source voice data; if not, the user interaction unit is triggered to judge, through interaction with the user, whether the first translation text is correct as the translation result of the source voice data.
16. The apparatus according to claim 15, characterized in that the quality judging unit comprises:
a reverse translation subunit, configured to translate the first translation text to obtain a second translation text, wherein the language of the second translation text is the same as the language of the source voice data;
a quality judging subunit, configured to judge, according to the second translation text, whether the translation quality of the first translation text exceeds the preset quality threshold.
17. The apparatus according to claim 16, characterized in that the user interaction unit is specifically configured to interact with the user by means of the second translation text, so as to judge whether the first translation text is correct as the translation result of the source voice data.
18. The apparatus according to claim 17, characterized in that the user interaction unit comprises:
a first inquiry subunit, configured to output a first inquiry voice to the user, wherein the first inquiry voice asks whether the source voice data and the second translation text are semantically similar;
a result determining subunit, configured to determine that the first translation text is correct as the translation result of the source voice data if an affirmative reply of the user to the first inquiry voice is received, and to determine that the first translation text is wrong as the translation result of the source voice data if a negative reply of the user to the first inquiry voice is received.
19. The apparatus according to any one of claims 14 to 18, characterized in that the text correction unit is specifically configured to correct the first translation text by means of text matching.
20. The apparatus according to claim 19, characterized in that the text correction unit comprises:
a text matching subunit, configured to perform a matching operation between the recognized text of the source voice data and the text data in a database, wherein the database stores at least one group of sentence pairs, each sentence pair including a first sample text and a second sample text obtained by correctly translating the first sample text, the language of the first sample text being the same as the language of the source voice data, and the language of the second sample text being the same as the language of the first translation text;
a text obtaining subunit, configured to obtain, through the matching operation, the first sample text most similar to the recognized text of the source voice data;
a text correction subunit, configured to correct the first translation text according to the most similar first sample text.
21. The apparatus according to claim 20, characterized in that the text correction subunit is specifically configured to interact with the user by means of the most similar first sample text, so as to correct the first translation text.
22. The apparatus according to claim 21, characterized in that the text correction subunit comprises:
a second inquiry subunit, configured to output a second inquiry voice to the user, wherein the second inquiry voice asks whether the source voice data and the most similar first sample text are semantically similar;
a correction completing subunit, configured to, if an affirmative reply of the user to the second inquiry voice is received, obtain the second sample text from the sentence pair to which the most similar first sample text belongs, as the text obtained by successfully correcting the first translation text.
23. A speech translation apparatus, characterized by comprising: a processor, a memory, and a system bus;
wherein the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs comprising instructions which, when executed by the processor, cause the processor to execute the method according to any one of claims 1 to 12.
24. A computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810503163.XA CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
PCT/CN2019/082040 WO2019223437A1 (en) | 2018-05-23 | 2019-04-10 | Speech translation method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810503163.XA CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108710616A true CN108710616A (en) | 2018-10-26 |
Family
ID=63869422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810503163.XA Pending CN108710616A (en) | 2018-05-23 | 2018-05-23 | A kind of voice translation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108710616A (en) |
WO (1) | WO2019223437A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110047488A (en) * | 2019-03-01 | 2019-07-23 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
WO2019223437A1 (en) * | 2018-05-23 | 2019-11-28 | 科大讯飞股份有限公司 | Speech translation method and apparatus |
CN111245460A (en) * | 2020-03-25 | 2020-06-05 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN111507113A (en) * | 2020-03-18 | 2020-08-07 | 北京捷通华声科技股份有限公司 | Method and device for machine-assisted manual translation |
CN111508484A (en) * | 2019-01-31 | 2020-08-07 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN112215015A (en) * | 2020-09-02 | 2021-01-12 | 文思海辉智科科技有限公司 | Translation text revision method, translation text revision device, computer equipment and storage medium |
CN112818702A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-user multi-language collaborative speech translation system and method |
CN112818703A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-language consensus translation system and method based on multi-thread communication |
CN113362818A (en) * | 2021-05-08 | 2021-09-07 | 山西三友和智慧信息技术股份有限公司 | Voice interaction guidance system and method based on artificial intelligence |
CN114727161A (en) * | 2022-04-19 | 2022-07-08 | 中国工商银行股份有限公司 | Intercommunication terminal and intercommunication method |
CN114783437A (en) * | 2022-06-15 | 2022-07-22 | 湖南正宇软件技术开发有限公司 | Man-machine voice interaction realization method and system and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102043774A (en) * | 2011-01-13 | 2011-05-04 | 北京交通大学 | Machine translation evaluation device and method |
CN102662934A (en) * | 2012-04-01 | 2012-09-12 | 百度在线网络技术(北京)有限公司 | Method and device for proofing translated texts in inter-lingual communication |
CN103744843A (en) * | 2013-12-25 | 2014-04-23 | 北京百度网讯科技有限公司 | Online voice translation method and device |
CN103810158A (en) * | 2012-11-07 | 2014-05-21 | 中国移动通信集团公司 | Speech-to-speech translation method and device |
US20150254238A1 (en) * | 2007-10-26 | 2015-09-10 | Facebook, Inc. | System and Methods for Maintaining Speech-To-Speech Translation in the Field |
CN107844470A (en) * | 2016-09-18 | 2018-03-27 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method and its equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9626968B2 (en) * | 2008-06-25 | 2017-04-18 | Verint Systems Ltd. | System and method for context sensitive inference in a speech processing system |
CN108710616A (en) * | 2018-05-23 | 2018-10-26 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
- 2018-05-23: CN CN201810503163.XA patent/CN108710616A/en (active, Pending)
- 2019-04-10: WO PCT/CN2019/082040 patent/WO2019223437A1/en (active, Application Filing)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150254238A1 (en) * | 2007-10-26 | 2015-09-10 | Facebook, Inc. | System and Methods for Maintaining Speech-To-Speech Translation in the Field |
CN102043774A (en) * | 2011-01-13 | 2011-05-04 | 北京交通大学 | Machine translation evaluation device and method |
CN102662934A (en) * | 2012-04-01 | 2012-09-12 | 百度在线网络技术(北京)有限公司 | Method and device for proofing translated texts in inter-lingual communication |
CN103810158A (en) * | 2012-11-07 | 2014-05-21 | 中国移动通信集团公司 | Speech-to-speech translation method and device |
CN103744843A (en) * | 2013-12-25 | 2014-04-23 | 北京百度网讯科技有限公司 | Online voice translation method and device |
CN107844470A (en) * | 2016-09-18 | 2018-03-27 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method and its equipment |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019223437A1 (en) * | 2018-05-23 | 2019-11-28 | 科大讯飞股份有限公司 | Speech translation method and apparatus |
CN111508484A (en) * | 2019-01-31 | 2020-08-07 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN111508484B (en) * | 2019-01-31 | 2024-04-19 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN110047488B (en) * | 2019-03-01 | 2022-04-12 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
CN110047488A (en) * | 2019-03-01 | 2019-07-23 | 北京彩云环太平洋科技有限公司 | Voice translation method, device, equipment and control equipment |
CN111507113A (en) * | 2020-03-18 | 2020-08-07 | 北京捷通华声科技股份有限公司 | Method and device for machine-assisted manual translation |
CN111245460A (en) * | 2020-03-25 | 2020-06-05 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN111245460B (en) * | 2020-03-25 | 2020-10-27 | 广州锐格信息技术科技有限公司 | Wireless interphone with artificial intelligence translation |
CN112215015A (en) * | 2020-09-02 | 2021-01-12 | 文思海辉智科科技有限公司 | Translation text revision method, translation text revision device, computer equipment and storage medium |
CN112818703A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-language consensus translation system and method based on multi-thread communication |
CN112818703B (en) * | 2021-01-19 | 2024-02-27 | 传神语联网网络科技股份有限公司 | Multilingual consensus translation system and method based on multithread communication |
CN112818702B (en) * | 2021-01-19 | 2024-02-27 | 传神语联网网络科技股份有限公司 | Multi-user multi-language cooperative speech translation system and method |
CN112818702A (en) * | 2021-01-19 | 2021-05-18 | 传神语联网网络科技股份有限公司 | Multi-user multi-language collaborative speech translation system and method |
CN113362818A (en) * | 2021-05-08 | 2021-09-07 | 山西三友和智慧信息技术股份有限公司 | Voice interaction guidance system and method based on artificial intelligence |
CN114727161A (en) * | 2022-04-19 | 2022-07-08 | 中国工商银行股份有限公司 | Intercommunication terminal and intercommunication method |
CN114783437A (en) * | 2022-06-15 | 2022-07-22 | 湖南正宇软件技术开发有限公司 | Man-machine voice interaction realization method and system and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
WO2019223437A1 (en) | 2019-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710616A (en) | A kind of voice translation method and device | |
CN108984529B (en) | Real-time court trial voice recognition automatic error correction method, storage medium and computing device | |
JP6997781B2 (en) | Error correction method and device for search terms | |
AU2020298542B2 (en) | Deriving multiple meaning representations for an utterance in a natural language understanding framework | |
CN111507088B (en) | Sentence completion method, equipment and readable storage medium | |
CN111310440B (en) | Text error correction method, device and system | |
WO2022121251A1 (en) | Method and apparatus for training text processing model, computer device and storage medium | |
CN110083819B (en) | Spelling error correction method, device, medium and electronic equipment | |
US20220261545A1 (en) | Systems and methods for producing a semantic representation of a document | |
US9311299B1 (en) | Weakly supervised part-of-speech tagging with coupled token and type constraints | |
CN111613214A (en) | Language model error correction method for improving voice recognition capability | |
CN111178064B (en) | Information pushing method and device based on field word segmentation processing and computer equipment | |
CN111695361A (en) | Method for constructing Chinese-English bilingual corpus and related equipment thereof | |
CN114678027A (en) | Error correction method and device for voice recognition result, terminal equipment and storage medium | |
CN111651961A (en) | Voice-based input method and device | |
CN108304389B (en) | Interactive voice translation method and device | |
CN111161730B (en) | Voice instruction matching method, device, equipment and storage medium | |
CN109614624B (en) | English sentence recognition method and electronic equipment | |
CN111723583A (en) | Statement processing method, device, equipment and storage medium based on intention role | |
Yu et al. | Recurrent neural network based rule sequence model for statistical machine translation | |
WO2022242535A1 (en) | Translation method, translation apparatus, translation device and storage medium | |
US20170270917A1 (en) | Word score calculation device, word score calculation method, and computer program product | |
CN114117021A (en) | Method and device for determining reply content and electronic equipment | |
CN108829657B (en) | Smoothing method and system | |
CN112560511A (en) | Method and device for translating lines and method and device for training translation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181026 |