CN104978971B - Spoken language evaluation method and system - Google Patents
Spoken language evaluation method and system
- Publication number
- CN104978971B (application CN201410139305.0A)
- Authority
- CN
- China
- Prior art keywords
- scoring result
- module
- fusion
- result
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention relates to the field of speech signal processing and discloses a spoken language evaluation method and system. The method comprises: receiving voice data to be evaluated; scoring the voice data with a first system to obtain a first scoring result; if the first scoring result satisfies a first condition, outputting the first scoring result; otherwise, scoring the voice data with a second system to obtain a second scoring result; if the second scoring result satisfies a second condition, fusing the first and second scoring results into a first fused scoring result and outputting it; otherwise, scoring the voice data with a third system to obtain a third scoring result, fusing the first, second and third scoring results into a second fused scoring result, and outputting it. The invention greatly improves the operating efficiency of the system while guaranteeing evaluation accuracy.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a spoken language evaluation method and system.
Background art
As an important medium of interpersonal communication, spoken language occupies an extremely important position in daily life. With continuous social and economic development and the trend of globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness and scale of language assessment. Traditional manual evaluation of spoken language proficiency severely constrains teachers and students in teaching time and space, and also suffers from gaps and imbalances in teaching staff, venues and hardware facilities. Manual evaluation cannot avoid the individual bias of the evaluator, so uniform grading standards cannot be guaranteed and the true level of the test taker sometimes cannot be accurately reflected. Large-scale oral examinations, moreover, require substantial human, material and financial resources, which restricts regular, large-scale assessment. For these reasons, the industry has successively developed a number of language teaching and evaluation systems.
Because manual scoring is unstable, and sometimes even unfair, under heavy workloads, automatic spoken language assessment has become the general trend. Conventional systems build different language teaching and scoring systems on speech recognition technology, and each scoring system has its own strengths depending on the application environment. To reduce the risk of machine scoring, existing spoken language evaluation systems often adopt a multi-system fusion approach: several different scoring systems evaluate the speech independently, and the scores produced by the multiple systems are then fused, for example by taking the highest or the average score as the final score. By combining multiple system scores, conventional systems effectively reduce the risk of under-scoring high-quality speech, but they also multiply the computing resources consumed and the evaluation time: in a fusion scheme using N systems, the total evaluation time is N times that of a single system. Especially in application environments with limited computing resources, evaluation efficiency is in urgent need of improvement.
Summary of the invention
The embodiments of the present invention provide a spoken language evaluation method and system to improve the operating efficiency of the system while guaranteeing evaluation performance.
To this end, the invention provides the following technical solution:
A spoken language evaluation method, comprising:
receiving voice data to be evaluated;
scoring the voice data with a first system to obtain a first scoring result;
if the first scoring result satisfies a first condition, outputting the first scoring result;
otherwise, scoring the voice data with a second system to obtain a second scoring result;
if the second scoring result satisfies a second condition, fusing the first scoring result and the second scoring result to obtain a first fused scoring result, and then outputting the first fused scoring result;
otherwise, scoring the voice data with a third system to obtain a third scoring result;
fusing the first scoring result, the second scoring result and the third scoring result to obtain a second fused scoring result, and then outputting the second fused scoring result.
Preferably, the method further comprises:
presetting the first system, the second system and the third system; or
selecting the first system, the second system and the third system in real time.
Preferably, presetting the first system, the second system and the third system comprises:
selecting, according to recognition confidence and evaluation results on a relevant test set, the best-performing system as the first system; and
selecting a system complementary to the first system as the second system.
Preferably, selecting the first system, the second system and the third system in real time comprises:
selecting the first system in real time according to the system running environment or the characteristics of the voice data; and
selecting a system complementary to the first system as the second system.
Preferably, the method further comprises:
determining the first condition and the second condition according to different application scenarios and requirements.
Preferably, the first condition is that the first scoring result is higher than a set score value; or the first condition is that the recognition confidence of the first system is higher than a set threshold.
Preferably, the second condition is that the difference between the first scoring result and the second scoring result is less than a set difference.
A spoken language evaluation system, comprising:
a receiving module, configured to receive voice data to be evaluated;
a first system module, configured to score the voice data to obtain a first scoring result;
a judgment module, configured to judge whether the first scoring result satisfies a first condition and, if so, to pass the first scoring result to an output module for output, and otherwise to notify a second system module to score the voice data;
the second system module, configured to score the voice data to obtain a second scoring result;
the judgment module being further configured to judge whether the second scoring result satisfies a second condition and, if so, to notify a first fusion module to fuse the first and second scoring results, and otherwise to notify a third system module to score the voice data;
the first fusion module, configured to fuse the first and second scoring results and to pass the resulting first fused scoring result to the output module for output;
the third system module, configured to score the voice data to obtain a third scoring result;
a second fusion module, configured to fuse the first, second and third scoring results and to pass the resulting second fused scoring result to the output module for output; and
the output module, configured to output the first scoring result, the first fused scoring result or the second fused scoring result.
Preferably, the system further comprises:
a setup module, configured to preset the first system, the second system and the third system; or
a selection module, configured to select the first system, the second system and the third system in real time.
Preferably, the setup module is specifically configured to select, according to recognition confidence and evaluation results on a relevant test set, the best-performing system as the first system, and to select a system complementary to the first system as the second system.
Preferably, the selection module is specifically configured to select the first system in real time according to the system running environment or the characteristics of the voice data, and to select a system complementary to the first system as the second system.
Preferably, the system further comprises:
a condition determining module, configured to determine the first condition and the second condition according to different application scenarios and requirements.
The spoken language evaluation method and system provided by the embodiments of the present invention add a scoring-result arbitration function on top of multi-system fusion: only when a scoring result fails to meet the requirement is a further evaluation system selected to score again, and the several scoring results obtained are then fused into the final result. Evaluation efficiency is therefore greatly improved while evaluation performance is guaranteed.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a spoken language evaluation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a spoken language evaluation system according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and specific implementations.
To address the low evaluation efficiency of multi-system fusion spoken language evaluation systems in the prior art, the embodiments of the present invention provide a spoken language evaluation method and system. The voice data to be evaluated is first scored with a first system to obtain a first scoring result. If the first scoring result satisfies a first condition, the first scoring result is output; otherwise the voice data is scored with a second system to obtain a second scoring result. If the second scoring result satisfies a second condition, the first and second scoring results are fused into a first fused scoring result, which is output as the final result; otherwise the voice data is scored with a third system to obtain a third scoring result. Finally, the first, second and third scoring results are fused into a second fused scoring result, which is output as the final result.
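For illustration only, this cascaded flow can be summarised in the following minimal Python sketch. It assumes hypothetical scoring-system objects exposing a score() method that returns a score together with a recognition confidence, and placeholder condition and fusion functions; it is a sketch of the idea, not the claimed implementation.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    score: float        # evaluation score, e.g. on a 100-point scale
    confidence: float   # recognition confidence reported by the recogniser

def cascade_evaluate(audio, system1, system2, system3,
                     first_condition, second_condition,
                     fuse_two, fuse_three):
    """Cascaded multi-system spoken-language scoring (illustrative sketch).

    system1/2/3      -- objects with a .score(audio) -> Assessment method
    first_condition  -- predicate on the first result (e.g. score or confidence)
    second_condition -- predicate on the first and second results
    fuse_two/three   -- functions fusing two / three results into one score
    """
    r1 = system1.score(audio)                # step 102
    if first_condition(r1):
        return r1.score                      # step 103 -> step 109

    r2 = system2.score(audio)                # step 104
    if second_condition(r1, r2):             # step 105
        return fuse_two(r1, r2)              # step 106: first fused score

    r3 = system3.score(audio)                # step 107
    return fuse_three(r1, r2, r3)            # step 108: second fused score
```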
Fig. 1 is a flowchart of the spoken language evaluation method of an embodiment of the present invention, which comprises the following steps:
Step 101: receive voice data to be evaluated.
Step 102: score the voice data with the first system to obtain a first scoring result.
An existing multi-system fusion evaluation system fuses the scoring results of all of its single systems, so the order in which the single systems run does not matter. The present method, however, aims to improve operating efficiency: to produce an evaluation result as quickly as possible, the order in which the single systems run matters greatly, and the first system should be able to solve most evaluation problems accurately on its own.
Specifically, the first system can be chosen in either of the following two ways:
1) Presetting: the first system is fixed before the system runs and is not changed during operation. Among the evaluation systems configured in advance, any one may be chosen as the first system; alternatively, according to recognition confidence and evaluation results on a relevant test set, the system with the best performance or the most robust behaviour may be selected as the first system.
2) Real-time selection: according to the current running environment or the characteristics of the voice data to be evaluated, the system expected to perform best is selected as the first system at run time. For example, a DNN (Deep Neural Network) has strong modelling capacity for its training data, and a DNN recognition system trained with additional low-SNR data has a certain robustness to noise. If the current running environment is poor or the signal-to-noise ratio of the data to be evaluated is relatively low, the DNN-based recognition system can be selected in real time as the first system; otherwise a system based on BN (Bottle-Neck) features or on GMM-HMM (Gaussian Mixture Model-Hidden Markov Model) recognition can be selected as the first system.
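As a hedged illustration of such real-time selection, the sketch below chooses the first system from an estimated signal-to-noise ratio and a flag describing the running environment; the SNR threshold and the system labels ('dnn', 'bn', 'gmm-hmm') are assumptions made for the example, not values fixed by this embodiment.

```python
def select_first_system(systems, snr_db, noisy_env=False, snr_threshold_db=10.0):
    """Pick the first scoring system at run time (illustrative sketch).

    systems          -- dict mapping a label to a scoring system, e.g.
                        {'dnn': ..., 'bn': ..., 'gmm-hmm': ...}
    snr_db           -- estimated SNR of the speech to be evaluated
    noisy_env        -- True when the current running environment is poor
    snr_threshold_db -- assumed cut-off below which the DNN system is preferred
    """
    if noisy_env or snr_db < snr_threshold_db:
        # a DNN recogniser trained with low-SNR data is more robust to noise
        return systems['dnn']
    # otherwise fall back to a Bottle-Neck or GMM-HMM based recogniser
    return systems.get('bn', systems.get('gmm-hmm'))
```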
It should be noted that presetting and real-time selection choose the first system on different grounds, so the scenarios they suit also differ. When some characteristics of the speech to be evaluated and the current running state of the system are known, selecting the first system in real time can deliver the final evaluation result quickly while still guaranteeing evaluation quality. When evaluation results on a relevant test set are already available, presetting the first system (for example a Bottle-Neck-based recognition system) is the more advantageous approach.
Step 103: judge whether the first scoring result satisfies the first condition.
The first condition can be determined according to the application scenario and its requirements; for example, it may be that the first scoring result is higher than a set score value, or that the recognition confidence of the first system is higher than a set threshold.
Judging whether the first scoring result satisfies the first condition effectively avoids under-scoring high-quality speech and thus improves evaluation accuracy.
In this embodiment, to reduce the risk of under-scoring good speech, the first condition may be that the first scoring result exceeds 80 points (on a 100-point scale); to improve the efficiency of multi-system fusion, the first condition may simply be that the recognition confidence of the first system exceeds a set threshold.
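A minimal sketch of the two first-condition variants mentioned here, assuming the Assessment structure of the earlier sketch: the 80-point threshold is the example given in this embodiment, while the 0.9 confidence threshold is only an assumed illustrative value.

```python
def first_condition_by_score(result, min_score=80.0):
    """First condition: the first scoring result exceeds a set value
    (80 points on a 100-point scale in this embodiment's example)."""
    return result.score > min_score

def first_condition_by_confidence(result, min_confidence=0.9):
    """First condition: the first system's recognition confidence exceeds a
    set threshold (0.9 here is an assumed value, not one from the text)."""
    return result.confidence > min_confidence
```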
Further, if the first scoring result satisfies the first condition, step 109 is executed and the first scoring result is output; otherwise step 104 is executed.
Step 104: score the voice data with the second system to obtain a second scoring result.
Different speech recognition systems are based on different acoustic features, such as acoustic models built on PLP (Perceptual Linear Predictive) features, or use different acoustic models, such as DNN-based neural network acoustic models, or even decode the voice data with different search procedures such as Viterbi decoding. Because different recognition systems have different decoding strengths, their recognition results are usually complementary to some extent; the second system should therefore complement the first system so as to improve evaluation accuracy.
The second system, like the first, can be chosen either by presetting or by real-time selection; the details are the same as for the first system and are not repeated here.
Step 105: judge whether the second scoring result satisfies the second condition.
The second condition can likewise be determined according to the application scenario and its requirements; for example, it may be that the difference between the first scoring result and the second scoring result is less than a set difference.
Judging whether the second scoring result satisfies the second condition reduces, to a certain extent, abnormal scores caused by an anomaly in an individual evaluation system or in the evaluated speech, and thus improves evaluation accuracy.
In this embodiment, to reduce such abnormal scores, the second condition may be set so that the difference between the first scoring result and the second scoring result is less than a set margin (for example 4%).
It should be noted that, in practical applications, a speech evaluation system suited to a particular examination type can also be loaded for that type. By comparing the evaluation results that systems of the same type produce for different speakers, the running state of a system can be judged in advance and abnormal scores caused by system anomalies can be removed, further improving both operating efficiency and evaluation accuracy.
Further, if the second scoring result satisfies the second condition, step 106 is executed; otherwise step 107 is executed.
Step 106: fuse the first scoring result and the second scoring result to obtain a first fused scoring result, then execute step 109 and output the first fused scoring result.
In this embodiment, to effectively reduce the abnormal results caused by under-scoring good speech, various fusion methods may be used, for example taking the higher of the first and second scoring results as the first fused scoring result; other fusion methods may of course also be used, and the embodiment is not limited in this respect.
In practice, when the pre-evaluation data has already been run through a strongly complementary first system (for example Bottle-Neck based) and second system (for example GMM-HMM based), and the scoring results of the two systems differ only slightly (for example by no more than 4%), the third system need not be run at all: the first and second scoring results are fused directly into the first fused scoring result, which is output as the final result.
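For illustration, the second condition and the first fusion might be sketched as follows; interpreting the 4% example as a relative difference is an assumption, and taking the higher score is only one of the fusion options mentioned above.

```python
def second_condition(r1, r2, max_relative_diff=0.04):
    """Second condition: the first and second scoring results differ by no
    more than a set margin (here read as a relative difference, e.g. 4%)."""
    return abs(r1.score - r2.score) <= max_relative_diff * max(r1.score, r2.score)

def fuse_two(r1, r2):
    """First fusion: take the higher of the two scores, reducing the risk of
    under-scoring good speech (weighted averaging is another possibility)."""
    return max(r1.score, r2.score)
```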
Step 107: score the voice data with the third system to obtain a third scoring result.
The third system has a certain complementarity to the first and second systems (in completeness, accuracy, fluency and so on). Because different speech evaluation systems use different recognition algorithms or acoustic models, they usually produce different recognition results, their evaluation scores differ accordingly, and the scoring results are therefore also complementary to some extent.
The third system can likewise be chosen either by presetting or by real-time selection; the details are the same as for the first system and are not repeated here.
Step 108: fuse the first scoring result, the second scoring result and the third scoring result to obtain a second fused scoring result.
To effectively reduce abnormal results caused by under-scoring good speech, the highest of the first, second and third scoring results may be taken as the second fused scoring result.
It should be noted that, in practice, different fusion methods can be chosen for different application scenarios. For a formal examination, to reduce the risk of under-scoring good speech, the fusion takes the maximum of the first, second and third scoring results. For other scenarios, such as mock examinations or machine-assisted manual scoring, where a conservative estimate of the candidate's overall level is wanted, the fusion may instead take the average of the first, second and third scoring results as the second fused scoring result.
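The scenario-dependent second fusion can be sketched in the same spirit: maximum for a formal examination, mean for mock examinations or machine-assisted manual scoring. The scenario label is an assumed parameter introduced only for this illustration.

```python
def fuse_three(r1, r2, r3, scenario='formal_exam'):
    """Second fusion of three scoring results (illustrative sketch).

    'formal_exam' -- take the maximum, reducing the risk of under-scoring
                     high-quality speech.
    otherwise     -- take the mean, a conservative estimate of overall level
                     suited to mock exams or machine-assisted manual scoring.
    """
    scores = [r1.score, r2.score, r3.score]
    if scenario == 'formal_exam':
        return max(scores)
    return sum(scores) / len(scores)
```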
Step 109: output the scoring result.
The spoken language evaluation method provided by this embodiment adds a scoring-result arbitration function on top of multi-system fusion: only when a scoring result fails to meet the requirement is another evaluation system selected to score again, and the scoring results obtained are then fused into the final result. This not only guarantees the accuracy of the evaluation result but also greatly improves evaluation efficiency.
Correspondingly, an embodiment of the present invention also provides a spoken language evaluation system, a schematic structural diagram of which is shown in Fig. 2.
In this embodiment, the system comprises a receiving module 201, a first system module 202, a judgment module 203, a second system module 204, a first fusion module 205, a third system module 206, a second fusion module 207 and an output module 208. Wherein:
The receiving module 201 is configured to receive voice data to be evaluated.
The first system module 202 is configured to score the voice data to obtain a first scoring result.
The judgment module 203 is configured to judge whether the first scoring result satisfies the first condition; if so, it passes the first scoring result to the output module 208 so that the output module 208 outputs it; otherwise it notifies the second system module 204 to score the voice data.
As above, the first condition can be determined according to the application scenario and its requirements; for example, it may be that the first scoring result is higher than a set score value, or that the recognition confidence of the first system is higher than a set threshold.
By judging whether the first scoring result satisfies the first condition, the judgment module 203 effectively avoids under-scoring high-quality speech and thus improves evaluation accuracy.
In this embodiment, to reduce the risk of under-scoring good speech, the first condition may be that the first scoring result exceeds 80 points (on a 100-point scale); to improve the efficiency of multi-system fusion, the first condition may simply be that the recognition confidence of the first system module exceeds a set threshold.
Further, if the first scoring result satisfies the first condition, the judgment module 203 notifies the output module 208 to output the first scoring result; otherwise it notifies the second system module 204 to evaluate the voice data.
The second system module 204 is configured to score the voice data to obtain a second scoring result.
Different speech recognition systems are based on different acoustic features, such as acoustic models built on PLP (Perceptual Linear Predictive) features, or use different acoustic models, such as DNN-based neural network acoustic models, or even decode the voice data with different search procedures such as Viterbi decoding.
Further, the judgment module 203 is also configured to judge whether the second scoring result satisfies the second condition; if so, it notifies the first fusion module 205 to fuse the first and second scoring results; otherwise it notifies the third system module 206 to score the voice data.
Likewise, the second condition can be determined according to the application scenario and its requirements; for example, it may be that the difference between the first scoring result and the second scoring result is less than a set difference.
The first fusion module 205 is configured to fuse the first and second scoring results and to pass the resulting first fused scoring result to the output module 208 so that the output module 208 outputs it.
The first fusion module 205 can fuse the first and second scoring results in various ways, for example by taking the higher of the two as the first fused scoring result, or by weighted averaging. In practice, when the pre-evaluation data has already been run through a strongly complementary first system module 202 (for example a Bottle-Neck based recognition system) and second system module 204 (for example a GMM-HMM based recognition system), and the two scoring results differ only slightly (for example by no more than 4%), the third system module 206 is no longer needed: the first fusion module 205 fuses the first and second scoring results directly into the first fused scoring result and passes it to the output module 208 as the final result.
The third system module 206 is configured to score the voice data to obtain a third scoring result.
The third system module 206 has a certain complementarity to the first system module 202 and the second system module 204 (in completeness, accuracy, fluency and so on). When the second scoring result does not satisfy the second condition, the third system module evaluates the data. It should be noted that, because different speech evaluation systems use different recognition algorithms or acoustic models, they usually produce different recognition results, their evaluation scores differ accordingly, and the scoring results are therefore also complementary to some extent.
The second fusion module 207 is configured to fuse the first, second and third scoring results and to pass the resulting second fused scoring result to the output module 208 so that the output module 208 outputs it.
The second fusion module 207 can likewise fuse the first, second and third scoring results in various ways. For example, to effectively reduce abnormal results caused by under-scoring good speech, it may take the highest of the three scoring results as the second fused scoring result, or it may take a weighted average of them. Naturally, the second fusion module 207 can choose different fusion methods for different application scenarios: for a formal examination, to reduce the risk of under-scoring good speech, it may take the maximum of the three scoring results as the second fused scoring result; for other scenarios, such as mock examinations or machine-assisted manual scoring, where a conservative estimate of the candidate's overall level is wanted, it may take the average of the three scoring results as the second fused scoring result.
The output module 208 is configured to output the first scoring result, the first fused scoring result or the second fused scoring result.
As noted above, the first and second conditions can be determined according to the application scenario and its requirements. Accordingly, to make the system easier to apply, the evaluation system of this embodiment may also be provided with a condition determining module (not shown), configured to determine the first condition and the second condition according to different application scenarios and requirements. In this way, the appropriate first and second conditions can easily be set for different application environments through the condition determining module.
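A small sketch of how such a condition determining module might map an application scenario to the two conditions is given below; the scenario names and the numeric presets are assumptions made purely for illustration.

```python
# Assumed illustrative presets; real values would be set per deployment.
CONDITION_PRESETS = {
    'formal_exam': {'min_first_score': 80.0, 'max_relative_diff': 0.04},
    'mock_exam':   {'min_first_score': 75.0, 'max_relative_diff': 0.06},
}

def determine_conditions(scenario):
    """Return (first_condition, second_condition) predicates for a scenario."""
    preset = CONDITION_PRESETS[scenario]

    def first_condition(r1):
        # the first scoring result must exceed the preset score value
        return r1.score > preset['min_first_score']

    def second_condition(r1, r2):
        # the first and second scoring results must agree within the preset margin
        tolerance = preset['max_relative_diff'] * max(r1.score, r2.score)
        return abs(r1.score - r2.score) <= tolerance

    return first_condition, second_condition
```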
In addition, in practical applications, the first system module 202, the second system module 204 and the third system module 206 are each loaded with a different type of spoken language evaluation system, referred to for convenience as the first system, the second system and the third system respectively. The first, second and third systems can be determined according to the application environment and the characteristics of the speech to be evaluated. To this end, the spoken language evaluation system of this embodiment may further comprise a setup module or a selection module (not shown). Wherein:
The setup module is configured to preset the first, second and third systems; for example, according to recognition confidence and evaluation results on a relevant test set, it selects the best-performing system as the first system and a system complementary to the first system as the second system.
The selection module is configured to select the first, second and third systems in real time; for example, according to the system running environment or the characteristics of the voice data, it selects the first system in real time and a system complementary to the first system as the second system.
It should also be noted that, in practice, several single evaluation systems of different types can be preset in the spoken language evaluation system of this embodiment, and the user loads the system needed by each system module through the setup module described above. Of course, following the idea of this embodiment, the scoring results of more than three single systems can also be fused to obtain a more accurate result, and the embodiment is not limited in this respect.
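One possible generalisation, sketched under the assumption of an ordered list of scoring systems and a matching list of early-stopping conditions, cascades through the systems and fuses whatever results have been collected; it is an illustrative extension rather than a claimed embodiment.

```python
def cascade_evaluate_n(audio, systems, conditions, fuse):
    """Generalised cascade over N scoring systems (illustrative sketch).

    systems    -- ordered list of N scoring systems, most capable first
    conditions -- list of N-1 predicates; conditions[i] inspects the results
                  collected so far and decides whether to stop early
    fuse       -- fusion function applied to all collected results
    """
    results = [systems[0].score(audio)]
    for system, condition in zip(systems[1:], conditions):
        if condition(results):            # results so far already suffice
            break
        results.append(system.score(audio))
    return results[0].score if len(results) == 1 else fuse(results)
```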
The spoken language evaluation system provided by this embodiment adds a scoring-result arbitration function on top of multi-system fusion: only when a scoring result fails to meet the requirement is another evaluation system selected to score again, and the scoring results obtained are then fused into the final result. This not only guarantees the accuracy of the evaluation result but also greatly improves evaluation efficiency.
The embodiments in this specification may refer to one another for identical or similar parts. Since the system embodiment is essentially similar to the method embodiment, it is described more briefly; for related details, refer to the description of the method embodiment. The system embodiment described above is merely schematic: the modules described as separate components may or may not be physically separate, and a component shown as a module may or may not be a physical unit, i.e. it may be located in one place or distributed over several network units. Some or all of the modules may be selected as actually needed to achieve the purpose of the solution of this embodiment, and those of ordinary skill in the art can understand and implement it without inventive effort.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to realise some or all of the functions of some or all of the components of the spoken language evaluation system according to the embodiments of the present invention. The present invention may also be implemented as an apparatus or device program (for example, a computer program or a computer program product) for executing part or all of the method described herein.
The embodiments of the present invention have been described in detail above; specific examples are used herein to explain the method and apparatus of the present invention, and the above description of the embodiments is only intended to help understand them. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (12)
1. A spoken language evaluation method, characterized by comprising:
receiving voice data to be evaluated;
scoring the voice data with a first system to obtain a first scoring result;
if the first scoring result satisfies a first condition, outputting the first scoring result;
otherwise, scoring the voice data with a second system to obtain a second scoring result, wherein the second system is complementary to the first system;
if the second scoring result satisfies a second condition, fusing the first scoring result and the second scoring result to obtain a first fused scoring result, and then outputting the first fused scoring result;
otherwise, scoring the voice data with a third system to obtain a third scoring result, wherein the third system is complementary to the first system and the second system; and
fusing the first scoring result, the second scoring result and the third scoring result to obtain a second fused scoring result, and then outputting the second fused scoring result.
2. The method according to claim 1, characterized in that the method further comprises:
presetting the first system, the second system and the third system; or
selecting the first system, the second system and the third system in real time.
3. The method according to claim 2, characterized in that presetting the first system, the second system and the third system comprises:
selecting, according to recognition confidence and evaluation results on a relevant test set, the best-performing system as the first system; and
selecting a system complementary to the first system as the second system.
4. The method according to claim 2, characterized in that selecting the first system, the second system and the third system in real time comprises:
selecting the first system in real time according to the system running environment or the characteristics of the voice data; and
selecting a system complementary to the first system as the second system.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
determining the first condition and the second condition according to different application scenarios and requirements.
6. The method according to claim 5, characterized in that the first condition is that the first scoring result is higher than a set score value, or that the recognition confidence of the first system is higher than a set threshold.
7. The method according to claim 5, characterized in that the second condition is that the difference between the first scoring result and the second scoring result is less than a set difference.
8. A spoken language evaluation system, characterized by comprising:
a receiving module, configured to receive voice data to be evaluated;
a first system module, configured to score the voice data to obtain a first scoring result;
a judgment module, configured to judge whether the first scoring result satisfies a first condition and, if so, to pass the first scoring result to an output module for output, and otherwise to notify a second system module to score the voice data;
the second system module, configured to score the voice data to obtain a second scoring result, wherein the second system is complementary to the first system;
the judgment module being further configured to judge whether the second scoring result satisfies a second condition and, if so, to notify a first fusion module to fuse the first and second scoring results, and otherwise to notify a third system module to score the voice data, wherein the third system is complementary to the first system and the second system;
the first fusion module, configured to fuse the first and second scoring results and to pass the resulting first fused scoring result to the output module for output;
the third system module, configured to score the voice data to obtain a third scoring result;
a second fusion module, configured to fuse the first, second and third scoring results and to pass the resulting second fused scoring result to the output module for output; and
the output module, configured to output the first scoring result, the first fused scoring result or the second fused scoring result.
9. The system according to claim 8, characterized in that the system further comprises:
a setup module, configured to preset the first system, the second system and the third system; or
a selection module, configured to select the first system, the second system and the third system in real time.
10. The system according to claim 9, characterized in that the setup module is specifically configured to select, according to recognition confidence and evaluation results on a relevant test set, the best-performing system as the first system, and to select a system complementary to the first system as the second system.
11. The system according to claim 9, characterized in that the selection module is specifically configured to select the first system in real time according to the system running environment or the characteristics of the voice data, and to select a system complementary to the first system as the second system.
12. The system according to any one of claims 8 to 11, characterized in that the system further comprises:
a condition determining module, configured to determine the first condition and the second condition according to different application scenarios and requirements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410139305.0A CN104978971B (en) | 2014-04-08 | 2014-04-08 | Spoken language evaluation method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410139305.0A CN104978971B (en) | 2014-04-08 | 2014-04-08 | Spoken language evaluation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104978971A CN104978971A (en) | 2015-10-14 |
CN104978971B true CN104978971B (en) | 2019-04-05 |
Family
ID=54275425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410139305.0A Active CN104978971B (en) | 2014-04-08 | 2014-04-08 | Spoken language evaluation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104978971B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105741831B (en) * | 2016-01-27 | 2019-07-16 | 广东外语外贸大学 | A kind of oral evaluation method and system based on syntactic analysis |
CN109426553A (en) | 2017-08-21 | 2019-03-05 | 上海寒武纪信息科技有限公司 | Task cutting device and method, Task Processing Unit and method, multi-core processor |
CN109214616B (en) * | 2017-06-29 | 2023-04-07 | 上海寒武纪信息科技有限公司 | Information processing device, system and method |
EP3637272A4 (en) | 2017-06-26 | 2020-09-02 | Shanghai Cambricon Information Technology Co., Ltd | Data sharing system and data sharing method therefor |
CN110413551B (en) | 2018-04-28 | 2021-12-10 | 上海寒武纪信息科技有限公司 | Information processing apparatus, method and device |
CN109273023B (en) * | 2018-09-20 | 2022-05-17 | 科大讯飞股份有限公司 | Data evaluation method, device and equipment and readable storage medium |
CN109740625A (en) * | 2018-11-22 | 2019-05-10 | 深圳市三诺数字科技有限公司 | A kind of safe driving method of discrimination, system and vehicle based on Fusion Features |
CN111128238B (en) * | 2019-12-31 | 2022-06-24 | 云知声智能科技股份有限公司 | Mandarin assessment method and device |
CN114329040B (en) * | 2021-10-28 | 2024-02-20 | 腾讯科技(深圳)有限公司 | Audio data processing method, device, storage medium, equipment and program product |
CN115798519B (en) * | 2023-02-10 | 2023-05-05 | 山东山大鸥玛软件股份有限公司 | English multi-question type spoken language pronunciation assessment method and system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2494017B1 (en) * | 1980-11-07 | 1985-10-25 | Thomson Csf | METHOD FOR DETECTING THE MELODY FREQUENCY IN A SPEECH SIGNAL AND DEVICE FOR CARRYING OUT SAID METHOD |
US8195453B2 (en) * | 2007-09-13 | 2012-06-05 | Qnx Software Systems Limited | Distributed intelligibility testing system |
CN103559894B (en) * | 2013-11-08 | 2016-04-20 | 科大讯飞股份有限公司 | Oral evaluation method and system |
- 2014-04-08: application CN201410139305.0A filed in China (CN); patent CN104978971B (en), legal status Active
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1240316A (en) * | 1998-03-30 | 2000-01-05 | 日本电气株式会社 | Portable terminal equipment for controlling receiving/transmitting phonetic electric level |
US6760453B1 (en) * | 1998-03-30 | 2004-07-06 | Nec Corporation | Portable terminal device for controlling received voice level and transmitted voice level |
CN1411310A (en) * | 2001-10-05 | 2003-04-16 | 株式会社东芝 | Mobile terminal apparatus and system selecting method |
CN1622122A (en) * | 2003-11-28 | 2005-06-01 | 佳能株式会社 | Method, device and storage medium for character recognition |
CN101266792A (en) * | 2007-03-16 | 2008-09-17 | 富士通株式会社 | Speech recognition system and method for speech recognition |
CN101706937A (en) * | 2009-12-01 | 2010-05-12 | 中国建设银行股份有限公司 | Method and system for monitoring electronic bank risks |
CN101901355A (en) * | 2010-06-29 | 2010-12-01 | 北京捷通华声语音技术有限公司 | Character recognition method and device based on maximum entropy |
CN102135985A (en) * | 2011-01-28 | 2011-07-27 | 百度在线网络技术(北京)有限公司 | Method and system for searching by calling search result of third-party search engine |
CN102664016A (en) * | 2012-04-23 | 2012-09-12 | 安徽科大讯飞信息科技股份有限公司 | Singing evaluation method and system |
CN102708865A (en) * | 2012-04-25 | 2012-10-03 | 北京车音网科技有限公司 | Method, device and system for voice recognition |
CN103247291A (en) * | 2013-05-07 | 2013-08-14 | 华为终端有限公司 | Updating method, device, and system of voice recognition device |
CN103559892A (en) * | 2013-11-08 | 2014-02-05 | 安徽科大讯飞信息科技股份有限公司 | Method and system for evaluating spoken language |
CN103698489A (en) * | 2013-12-30 | 2014-04-02 | 力合科技(湖南)股份有限公司 | Verification method and device of test data |
Also Published As
Publication number | Publication date |
---|---|
CN104978971A (en) | 2015-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104978971B (en) | Spoken language evaluation method and system | |
CN103559894B (en) | Oral evaluation method and system | |
CN103559892B (en) | Oral evaluation method and system | |
CN105488055B (en) | The generation method and system of Individualized computer study and evaluation and test product | |
CN110465947B (en) | Multi-mode fusion man-machine interaction method, device, storage medium, terminal and system | |
CN102682760B (en) | Overlapped voice detection method and system | |
CN109740446A (en) | Classroom students ' behavior analysis method and device | |
CN110147726A (en) | Business quality detecting method and device, storage medium and electronic device | |
CN104464757B (en) | Speech evaluating method and speech evaluating device | |
CN110600033B (en) | Learning condition evaluation method and device, storage medium and electronic equipment | |
CN103594087B (en) | Improve the method and system of oral evaluation performance | |
CN105448152A (en) | On-line teaching system | |
CN106328166A (en) | Man-machine dialogue anomaly detection system and method | |
CN103218924A (en) | Audio and video dual mode-based spoken language learning monitoring method | |
CN112885376A (en) | Method and device for improving voice call quality inspection effect | |
CN109273023A (en) | A kind of data evaluating method, device, equipment and readable storage medium storing program for executing | |
CN110490428A (en) | Job of air traffic control method for evaluating quality and relevant apparatus | |
CN106874185A (en) | A kind of automated testing method driven based on voiced keyword and system | |
CN108305619A (en) | Voice data collection training method and apparatus | |
CN109657799A (en) | A kind of model tuning method and apparatus based on scene adaptation | |
CN105225658A (en) | The determination method and apparatus of rhythm pause information | |
CN108320732A (en) | The method and apparatus for generating target speaker's speech recognition computation model | |
CN105354831A (en) | Multi-defect detection method based on image block variance-weighting eigenvalues | |
CN111901627A (en) | Video processing method and device, storage medium and electronic equipment | |
CN109979486A (en) | A kind of speech quality assessment method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui 230088. Applicant after: Iflytek Co., Ltd. Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui 230088. Applicant before: Anhui USTC iFLYTEK Co., Ltd. |
COR | Change of bibliographic data | ||
GR01 | Patent grant | ||
GR01 | Patent grant |