CN110176225B - Method and device for evaluating prosody prediction effect

Info

Publication number: CN110176225B
Application number: CN201910461506.5A
Authority: CN
Inventors: 杨勤英; 吴陈成; 宋明
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2019-05-30
Filing date: 2019-05-30
Publication date: 2021-08-13
Anticipated expiration: 2039-05-30
Also published as: CN110176225A

Abstract

The application provides a method and a device for evaluating a prosody prediction effect. The method comprises: obtaining prosody prediction results respectively corresponding to a plurality of test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by a prosody prediction engine to be evaluated; determining weights of the prosody prediction results respectively corresponding to the plurality of test cases based on the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, wherein the weight of any artificial prosody labeling result represents how reasonable that artificial prosody labeling result is; and determining an evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the plurality of test cases and the weights of those prosody prediction results. The method can evaluate the prediction effect of a prosody prediction engine automatically, efficiently and objectively.

Description

Method and device for evaluating prosody prediction effect

Technical Field

The present application relates to the field of speech synthesis technologies, and in particular, to a method and an apparatus for evaluating a prosody prediction effect.

Background

Prosody prediction is an indispensable part of a speech synthesis system. It belongs to the front-end processing of the speech synthesis system and predicts prosodic boundary positions in text data; the back-end processing of the speech synthesis system then inserts audio pauses according to these prosodic boundary positions.

The prosody prediction is realized by a prosody prediction engine, the quality of the prosody prediction effect of the prosody prediction engine directly affects the overall quality of speech synthesis, and in order to obtain higher speech synthesis quality, the prosody prediction effect of the prosody prediction engine needs to be evaluated.

Currently, a method for evaluating a prosody prediction effect of a prosody prediction engine is a manual evaluation method, that is, an evaluator evaluates a prosody prediction result of the prosody prediction engine. However, the manual evaluation method is susceptible to subjective factors (such as experience, status, and the like of an evaluator), so that the reliability of the evaluation result is not high, and the labor cost and time cost of the manual evaluation method are high.

Disclosure of Invention

In view of the above, the present application provides a method and an apparatus for evaluating a prosody prediction effect, so as to solve the problems in the prior art that the manual evaluation method is susceptible to subjective factors, which makes the evaluation result unreliable, and that the manual evaluation method has high labor and time costs. The technical scheme is as follows:

a method for evaluating the effect of prosody prediction, comprising:

obtaining prosody prediction results respectively corresponding to a plurality of test cases in a test case set, wherein the prosody prediction result corresponding to each test case is predicted by a prosody prediction engine to be evaluated;

determining weights of the prosody prediction results respectively corresponding to the plurality of test cases based on a pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, wherein the artificial prosody labeling result set corresponding to any test case comprises at least one artificial prosody labeling result corresponding to the test case, and the weight of any artificial prosody labeling result represents how reasonable that artificial prosody labeling result is;

and determining an evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the plurality of test cases and the weights of the prosody prediction results respectively corresponding to the plurality of test cases.

Optionally, the obtaining a weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to the plurality of test cases respectively includes:

acquiring artificial prosody labeling result sets respectively corresponding to the plurality of test cases;

performing audio synthesis on each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases to obtain synthesized audio sets respectively corresponding to the plurality of test cases;

determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the optimal artificial prosody labeling result selected by each audiometrist for each test case; wherein the optimal artificial prosody labeling result selected by any audiometrist for any test case is the artificial prosody labeling result that the audiometrist selects from the artificial prosody labeling result set corresponding to the test case by listening to each synthesized audio in the synthesized audio set corresponding to the test case.

Optionally, the determining, according to the optimal artificial prosody labeling result selected by each of a plurality of audiometrists for each test case, a weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases includes:

dividing the test cases in the test case set into a plurality of groups, wherein each group of test cases forms a test case subset, to obtain a plurality of test case subsets;

obtaining a test case subset that has not yet been obtained from the plurality of test case subsets as a target test case subset;

determining target weights respectively corresponding to the plurality of audiometrists based on initial weights respectively corresponding to the plurality of audiometrists and the optimal artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset;

determining, from the target weights respectively corresponding to the plurality of audiometrists, the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset;

and taking the target weights respectively corresponding to the plurality of audiometrists as the initial weights respectively corresponding to the plurality of audiometrists, and then returning to the step of obtaining a test case subset that has not yet been obtained from the plurality of test case subsets as a target test case subset, until no unobtained test case subset remains, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases.
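The subset-by-subset loop described above can be sketched as follows. This is only an illustrative sketch: the function name `weight_all_subsets`, the data layout, and the `update_weights` callable are hypothetical, and the rule that turns initial weights into target weights is deliberately left as a parameter because it is not specified at this point in the text. The sketch assumes that each round's target weights are carried over as the next round's initial weights, and that a labeling result's weight is the sum of the target weights of the audiometrists (listeners) who selected it.

```python
def weight_all_subsets(subsets, choices, initial_weights, update_weights):
    """Process every test case subset exactly once.

    subsets:         list of lists of test case ids
    choices:         listener -> {test case id -> id of the labeling picked as best}
    initial_weights: listener -> initial weight
    update_weights:  hypothetical callable (weights, choices, subset) -> target weights
    """
    listener_weights = dict(initial_weights)
    labeling_weights = {}
    for subset in subsets:  # take a subset that has not been handled yet
        target = update_weights(listener_weights, choices, subset)
        for case_id in subset:
            per_labeling = {}
            for listener, picked in choices.items():
                labeling_id = picked[case_id]
                # Assumed rule: add up the target weights of the listeners
                # who picked this labeling result as the best one.
                per_labeling[labeling_id] = (
                    per_labeling.get(labeling_id, 0.0) + target[listener]
                )
            # Labeling results nobody picked are simply absent (weight 0).
            labeling_weights[case_id] = per_labeling
        listener_weights = target  # target weights seed the next round
    return labeling_weights, listener_weights


# Toy data with a placeholder update rule that leaves the weights unchanged.
subsets = [["c1"], ["c2"]]
choices = {"l1": {"c1": "A", "c2": "B"}, "l2": {"c1": "A", "c2": "A"}}
result, final = weight_all_subsets(
    subsets, choices, {"l1": 0.5, "l2": 0.5}, lambda w, c, s: dict(w)
)
```

Passing the update rule in as a callable keeps the loop structure separate from whichever weighting formula is actually used.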

Optionally, the determining the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the optimal artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset includes:

determining a test case number ratio for any two audiometrists according to the optimal artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset, wherein the test case number ratio for any two audiometrists is: the ratio of the number of test cases in the target test case subset for which the two audiometrists selected the same optimal artificial prosody labeling result to the total number of test cases in the target test case subset;

and determining the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the test case number ratio for any two audiometrists.
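The test case number ratio for a pair of audiometrists can be illustrated with a short sketch. The function name `agreement_ratios` and the dictionary layout are assumptions made for illustration; only the ratio itself (cases with the same selection divided by all cases in the subset) comes from the text.

```python
from itertools import combinations


def agreement_ratios(choices, case_ids):
    """Return {(listener_a, listener_b): ratio} for every pair of listeners.

    choices:  listener -> {test case id -> id of the labeling picked as best}
    case_ids: test case ids in the target test case subset
    """
    total = len(case_ids)
    ratios = {}
    for a, b in combinations(sorted(choices), 2):
        # Count the test cases where both listeners picked the same labeling.
        same = sum(1 for cid in case_ids if choices[a][cid] == choices[b][cid])
        ratios[(a, b)] = same / total
    return ratios


choices = {
    "listener_1": {"case_1": "A", "case_2": "B", "case_3": "A", "case_4": "C"},
    "listener_2": {"case_1": "A", "case_2": "B", "case_3": "B", "case_4": "C"},
    "listener_3": {"case_1": "B", "case_2": "B", "case_3": "A", "case_4": "A"},
}
ratios = agreement_ratios(choices, ["case_1", "case_2", "case_3", "case_4"])
# listener_1 and listener_2 agree on 3 of 4 cases, so their ratio is 0.75.
```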

Optionally, the determining, from the target weights respectively corresponding to the plurality of audiometrists, the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset includes:

for any artificial prosody labeling result in the artificial prosody labeling result set corresponding to any test case in the target test case subset:

determining the weight of the artificial prosody labeling result from the target weights corresponding to the audiometrists who selected the artificial prosody labeling result as the optimal artificial prosody labeling result;

and thereby obtaining the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset.

Optionally, the determining the weight of the artificial prosody labeling result from the target weights corresponding to the audiometrists who selected the artificial prosody labeling result as the optimal artificial prosody labeling result includes:

summing the target weights corresponding to the audiometrists who selected the artificial prosody labeling result as the optimal artificial prosody labeling result, and taking the resulting sum as the weight of the artificial prosody labeling result.
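As a minimal sketch of this summation for a single test case (the function name `labeling_weights` and the example weights are hypothetical), a labeling result that no audiometrist selected is given weight 0 here, which is an assumption:

```python
def labeling_weights(best_choice, target_weight, labeling_ids):
    """Weight each labeling result of one test case.

    best_choice:   listener -> id of the labeling that listener picked as best
    target_weight: listener -> target weight of that listener
    labeling_ids:  ids of all labeling results in the test case's result set
    """
    weights = {labeling_id: 0.0 for labeling_id in labeling_ids}
    for listener, labeling_id in best_choice.items():
        # Each listener contributes their target weight to the labeling they chose.
        weights[labeling_id] += target_weight[listener]
    return weights


weights = labeling_weights(
    best_choice={"listener_1": "A", "listener_2": "A", "listener_3": "B"},
    target_weight={"listener_1": 0.5, "listener_2": 0.25, "listener_3": 0.25},
    labeling_ids=["A", "B", "C"],
)
# "A" was chosen by listener_1 and listener_2, so its weight is 0.5 + 0.25.
```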

Optionally, the determining the weights of the prosody prediction results corresponding to the plurality of test cases based on the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to the plurality of test cases, which is obtained in advance, includes:

for the prosody prediction result corresponding to any test case:

determining an artificial prosody labeling result which is consistent with a prosody prediction result corresponding to the test case from an artificial prosody labeling result set corresponding to the test case, and taking the weight of the determined artificial prosody labeling result as the weight of the prosody prediction result corresponding to the test case;

and obtaining the weights of the prosody prediction results corresponding to the plurality of test cases respectively.
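The lookup of a prediction result's weight might look like the sketch below. Two points are assumptions rather than statements from the text: a prosody result is represented as a tuple of boundary positions, and a prediction that matches no labeling result in the set receives weight 0.

```python
def prediction_weight(prediction, manual_weights):
    """Return the weight of the labeling result identical to the prediction.

    prediction:     tuple of predicted prosodic boundary positions
    manual_weights: {tuple of boundary positions -> weight} for one test case
    """
    # Assumption: a prediction that matches no labeling result scores 0.
    return manual_weights.get(prediction, 0.0)


manual_weights = {(2, 5): 0.75, (2, 7): 0.25}
matched = prediction_weight((2, 5), manual_weights)
unmatched = prediction_weight((3,), manual_weights)
```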

Optionally, the determining, according to the respective prosody prediction results of the multiple test cases and the respective weights of the respective prosody prediction results of the multiple test cases, an evaluation result of the prosody prediction effect of the to-be-evaluated prosody prediction engine includes:

determining the score of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the test cases and the weights of the prosody prediction results respectively corresponding to the test cases;

determining the ratio of the prosody prediction effect score of the to-be-evaluated prosody prediction engine to the artificial highest score as an evaluation result of the prosody prediction effect of the to-be-evaluated prosody prediction engine;

the artificial highest score is obtained by summing the maximum weights corresponding to the test cases, and the maximum weight corresponding to any test case is the largest of the weights of the artificial prosody labeling results in the artificial prosody labeling result set corresponding to the test case.
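A small sketch of this ratio follows. It assumes that the engine's score is the sum of the weights of its prediction results, which this passage does not state explicitly; the function name `evaluate` and the sample numbers are hypothetical.

```python
def evaluate(prediction_weights, manual_weights_per_case):
    """Return score / artificial highest score.

    prediction_weights:      test case id -> weight of the engine's prediction
    manual_weights_per_case: test case id -> {labeling id -> weight}
    """
    # Assumption: the engine's score is the sum of its prediction weights.
    score = sum(prediction_weights.values())
    # Artificial highest score: sum of the largest labeling weight per test case.
    highest = sum(max(w.values()) for w in manual_weights_per_case.values())
    return score / highest


prediction_weights = {"case_1": 0.75, "case_2": 0.0, "case_3": 0.25}
manual_weights_per_case = {
    "case_1": {"A": 0.75, "B": 0.25},
    "case_2": {"A": 1.0},
    "case_3": {"A": 0.75, "B": 0.25},
}
result = evaluate(prediction_weights, manual_weights_per_case)
# score = 1.0, artificial highest score = 2.5
```

Dividing by the artificial highest score keeps the result between 0 and 1 when every prediction weight comes from the same labeling result sets, so engines can be compared on the same test case set.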

An apparatus for evaluating a prosody prediction effect, comprising: a prosody prediction result acquisition module, a prosody prediction result weight determination module and a prosody prediction effect evaluation module;

the prosody prediction result acquisition module is used for acquiring prosody prediction results corresponding to a plurality of test cases in the test case set respectively, wherein the prosody prediction result corresponding to each test case is obtained by predicting through a prosody prediction engine to be evaluated;

the prosody prediction result weight determining module is used for determining the weights of the prosody prediction results corresponding to the plurality of test cases respectively based on the weights of each artificial prosody labeling result in the artificial prosody labeling result sets corresponding to the plurality of test cases respectively obtained in advance, wherein the artificial prosody labeling result set corresponding to any test case comprises at least one artificial prosody labeling result corresponding to the test case, and the weight of any artificial prosody labeling result can represent the reasonable degree of the artificial prosody labeling result;

and the prosody prediction effect evaluation module is used for determining an evaluation result of the prosody prediction effect of the to-be-evaluated prosody prediction engine according to the respective prosody prediction results of the plurality of test cases and the weights of the respective prosody prediction results of the plurality of test cases.

The apparatus for evaluating the prosody prediction effect further includes: an artificial prosody labeling result set acquisition module, an audio synthesis module and an artificial prosody labeling result weight determination module;

the artificial prosody labeling result set acquisition module is used for acquiring artificial prosody labeling result sets respectively corresponding to the plurality of test cases;

the audio synthesis module is used for performing audio synthesis on each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases to obtain synthesized audio sets respectively corresponding to the plurality of test cases;

the artificial prosody labeling result weight determination module is used for determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the optimal artificial prosody labeling result selected by each audiometrist for each test case; wherein the optimal artificial prosody labeling result selected by any audiometrist for any test case is the artificial prosody labeling result that the audiometrist selects from the artificial prosody labeling result set corresponding to the test case by listening to each synthesized audio in the synthesized audio set corresponding to the test case.

Optionally, the artificial prosody labeling result weight determination module includes: a grouping submodule, an obtaining submodule, a first weight determining submodule, a second weight determining submodule and an initial weight determining submodule;

the grouping submodule is used for dividing the test cases in the test case set into a plurality of groups, and each group of test cases form a test case subset to obtain a plurality of test case subsets;

the obtaining submodule is used for obtaining a test case subset that has not yet been obtained from the plurality of test case subsets as a target test case subset;

the first weight determining submodule is used for determining target weights respectively corresponding to the plurality of audiometrists based on initial weights respectively corresponding to the plurality of audiometrists and the optimal artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset;

the second weight determining submodule is used for determining, from the target weights respectively corresponding to the plurality of audiometrists, the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset;

the initial weight determining submodule is used for taking the target weights respectively corresponding to the plurality of audiometrists as the initial weights respectively corresponding to the plurality of audiometrists, and then triggering the obtaining submodule to obtain a test case subset that has not yet been obtained from the plurality of test case subsets as a target test case subset, until no unobtained test case subset remains, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases.

Optionally, the first weight determining submodule is specifically configured to determine a test case number ratio for any two audiometrists according to the optimal artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset, and to determine the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the test case number ratio for any two audiometrists;

the test case number ratio for any two audiometrists is: the ratio of the number of test cases in the target test case subset for which the two audiometrists selected the same optimal artificial prosody labeling result to the total number of test cases in the target test case subset.

Optionally, the second weight determining submodule is specifically configured to, for any artificial prosody labeling result in the artificial prosody labeling result set corresponding to any test case in the target test case subset, determine the weight of the artificial prosody labeling result from the target weights corresponding to the audiometrists who selected the artificial prosody labeling result as the optimal artificial prosody labeling result, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset.

Optionally, the prosody prediction result weight determining module is specifically configured to, for a prosody prediction result corresponding to any test case: determining an artificial prosody labeling result which is consistent with a prosody prediction result corresponding to the test case from an artificial prosody labeling result set corresponding to the test case, and taking the weight of the determined artificial prosody labeling result as the weight of the prosody prediction result corresponding to the test case; and obtaining the weights of the prosody prediction results corresponding to the plurality of test cases respectively.

Optionally, the prosody prediction effect evaluation module includes: a prosodic prediction effect score determining submodule and an evaluation result determining submodule;

the prosody prediction effect score determining submodule is used for determining the score of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the test cases and the weights of the prosody prediction results respectively corresponding to the test cases;

the evaluation result determining submodule is used for determining the ratio of the prosody prediction effect score of the prosody prediction engine to be evaluated to the artificial highest score, and taking the ratio as the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated; the artificial highest score is obtained by summing the maximum weights corresponding to the test cases, and the maximum weight corresponding to any test case is the largest of the weights of the artificial prosody labeling results in the artificial prosody labeling result set corresponding to the test case.

An evaluation apparatus of a prosody prediction effect, comprising: a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program to implement the steps of the method for evaluating the prosody prediction effect.

A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of assessing the effect of prosody prediction.

According to the above scheme, the method and the device for evaluating the prosody prediction effect provided by the application determine the weights of the prosody prediction results respectively corresponding to the plurality of test cases in the test case set based on the weights of the artificial prosody labeling results in the artificial prosody labeling result sets respectively corresponding to those test cases, and then determine the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the weights of the prosody prediction results. The prosody prediction effect of the engine is thus evaluated automatically. Compared with the existing manual evaluation mode, this avoids the influence of subjective factors on the evaluation result, saves labor, and reduces the evaluation time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart illustrating a method for evaluating prosody prediction according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart illustrating a process of obtaining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to a plurality of test cases according to an embodiment of the present application;

FIG. 3 is a schematic flow chart illustrating a process of determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to a plurality of test cases according to the optimal artificial prosody labeling result selected by each of a plurality of audiometrists for each test case according to an embodiment of the present application;

FIG. 4 is a schematic flow chart illustrating a process of determining an evaluation result of a prosody prediction effect of a prosody prediction engine to be evaluated according to weights of prosody prediction results respectively corresponding to a plurality of test cases according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an apparatus for evaluating prosody prediction according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an apparatus for evaluating prosody prediction effects according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The conventional manual evaluation method evaluates the prosody prediction results (namely, the prosody prediction results of the test case set by the prosody prediction engine to be evaluated) corresponding to each test case in the test case set by a plurality of evaluators, and specifically, the evaluators perform prosody labeling and proofreading on the prosody prediction results or audio synthesized by the prosody prediction results, so that unacceptable prosody prediction results are selected from the prosody prediction results corresponding to each test case, and further, the unacceptable rate is counted.

The inventor found that the above manual evaluation method is highly subjective. For example, when different evaluators listen to audio synthesized from a prosody prediction result, their audiometry results can differ by as much as 25%, and even the same evaluator's audiometry results for the same audio can differ considerably at different times, so the reliability of the evaluation result is low. The manual evaluation method also requires considerable manpower and a long evaluation time, that is, its labor cost and time cost are high.

In addition, in the above manual evaluation method the manually labeled data cannot be reused. Both before and after the prosody prediction engine is optimized, evaluators need to perform prosody labeling and proofreading on the prosody prediction results of the engine, or on the audio synthesized from them, to select unacceptable prosody prediction results. In other words, the data labeled before the engine was optimized is of no use for evaluating the prosody prediction results of the optimized engine.

In view of the problems of the manual evaluation mode, the inventor conducted in-depth research and finally arrived at an effective method for evaluating the prediction effect of a prosody prediction engine. The evaluation method is suitable for application scenarios in which the prediction effect of a prosody prediction engine needs to be evaluated, and it can evaluate the prediction effect of the prosody prediction engine to be evaluated automatically, efficiently and objectively. The evaluation method can be applied to a terminal and can also be applied to a server. Next, the method for evaluating the prosody prediction effect provided by the present application will be described through the following embodiments.

Referring to fig. 1, a flow chart of a method for evaluating prosody prediction effect according to an embodiment of the present application is shown, where the method may include:

Step S101: acquiring prosody prediction results respectively corresponding to a plurality of test cases in the test case set.

The test case set comprises a plurality of test cases selected from different domains of user scenarios according to a preset big-data proportion, and the prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be evaluated.

Step S102: determining the weights of the prosody prediction results respectively corresponding to the plurality of test cases based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases.

Each test case corresponds to one artificial prosody labeling result set, which comprises at least one artificial prosody labeling result corresponding to the test case.

Considering that the same test case may have multiple reasonable prosody labeling results, the method can, for any test case, acquire multiple reasonable artificial prosody labeling results to form the artificial prosody labeling result set corresponding to the test case. These artificial prosody labeling results are produced manually, by having multiple annotators label the prosodic boundary positions of the test case.
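One possible in-memory representation of a test case and its labeling result set is sketched below. The class names, the representation of a labeling result as a tuple of boundary positions, and the merging of identical labelings from different annotators are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProsodyLabeling:
    # Positions in the text after which a prosodic boundary is placed (assumed encoding).
    boundaries: tuple


@dataclass
class LabeledCase:
    text: str
    labelings: list = field(default_factory=list)

    def add_labeling(self, boundaries):
        """Add one annotator's labeling; identical labelings are stored once."""
        labeling = ProsodyLabeling(tuple(sorted(boundaries)))
        if labeling not in self.labelings:
            self.labelings.append(labeling)
        return labeling


case = LabeledCase("example sentence used as a test case")
case.add_labeling([2, 5])   # annotator 1
case.add_labeling([5, 2])   # annotator 2, same boundaries in another order
case.add_labeling([3])      # annotator 3
```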

The weight of any artificial prosody labeling result represents how reasonable that artificial prosody labeling result is.

The process of obtaining in advance the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set is described in the subsequent embodiments.

Step S103: determining an evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the plurality of test cases and the weights of the prosody prediction results respectively corresponding to the plurality of test cases.

Specifically, the score of the prosody prediction effect of the prosody prediction engine to be evaluated can be determined according to the prosody prediction results respectively corresponding to the plurality of test cases in the test case set and the weights of the prosody prediction results respectively corresponding to the plurality of test cases, and then the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated is determined based on the score of the prosody prediction effect of the prosody prediction engine to be evaluated.

The method for evaluating the prosody prediction effect provided by the embodiment of the application first obtains the prosody prediction results respectively corresponding to a plurality of test cases in a test case set, then determines the weights of those prosody prediction results based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, and finally determines the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the weights of the prosody prediction results respectively corresponding to the plurality of test cases. The method can therefore evaluate the prosody prediction effect of the prosody prediction engine to be evaluated automatically, based on the weights of the artificial prosody labeling results in the artificial prosody labeling result sets corresponding to the test cases in the test case set. Compared with the existing manual evaluation mode, this avoids the influence of subjective factors on the evaluation result, saves labor, and reduces the evaluation time.

In addition, in this embodiment, the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set, and the weight of each artificial prosody labeling result in those sets, only need to be obtained once. They can then be reused to evaluate the prediction effect of every version of the prosody prediction engine.

Next, the process of obtaining in advance the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set is described.

Referring to fig. 2, a schematic flow chart illustrating a process of obtaining a weight of each artificial prosody labeling result in an artificial prosody labeling result set corresponding to a plurality of test cases in a test case set respectively is shown, which may include:

Step S201: Acquire the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set.

The artificial prosody labeling results in the set corresponding to any test case are obtained by a plurality of annotators each labeling the prosodic phrase boundary positions of that test case.

Step S202: Perform audio synthesis on each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, to obtain synthesized audio sets respectively corresponding to the plurality of test cases.

Specifically, for any test case, audio synthesis is performed on each artificial prosody labeling result in the artificial prosody labeling result set corresponding to the test case, and a set formed by the synthesized audio is used as a synthesized audio set corresponding to the test case, so that synthesized audio sets corresponding to a plurality of test cases are obtained.

Step S203: Determine the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, according to the best artificial prosody labeling result selected by each of a plurality of audiometrists for each test case.

The best artificial prosody labeling result selected by any audiometrist for any test case is the one the audiometrist selects from the artificial prosody labeling result set corresponding to the test case by listening to each synthesized audio in the synthesized audio set corresponding to the test case. It should be noted that, when listening to the synthesized audio set corresponding to a test case, the audiometrist selects the best audio from that set, and the artificial prosody labeling result corresponding to the best audio is the best artificial prosody labeling result.

The following describes "step S203: determine the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the best artificial prosody labeling result selected by each audiometrist for each test case" in detail.

Referring to fig. 3, a schematic flow chart is shown of determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the best artificial prosody labeling result selected by each of the plurality of audiometrists for each test case. The process may include:

Step S301: Divide the test cases in the test case set into a plurality of groups, each group of test cases forming one test case subset, to obtain a plurality of test case subsets.

Illustratively, if the test case set includes 100 test cases, the 100 test cases may be divided into 5 groups of 20 test cases each, so that 5 test case subsets are obtained.

Step S302: Obtain a test case subset that has not yet been obtained from the plurality of test case subsets as the target test case subset.

Step S303: Determine the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the best artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset.

Specifically, the implementation process of step S303 may include:

Step S3031: Determine the test case number ratio corresponding to any two audiometrists according to the best artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset.

The test case number ratio corresponding to any two audiometrists is the ratio of the number of test cases in the target test case subset for which the two audiometrists selected the same best artificial prosody labeling result to the total number of test cases in the target test case subset.

Illustratively, the target test case subset includes three test cases A, B and C. The artificial prosody labeling result set corresponding to test case A is {a1, a2, a3}, that corresponding to test case B is {B1, B2, B3}, and that corresponding to test case C is {C1, C2, C3}. Suppose audiometrist x selects a2 as the best artificial prosody labeling result for test case A, B1 for test case B, and C3 for test case C, while audiometrist y selects a1 for test case A, B1 for test case B, and C3 for test case C. The two audiometrists selected the same best result for 2 test cases (B and C), and the total number of test cases in the target test case subset is 3, so the test case number ratio corresponding to audiometrist x and audiometrist y is 2/3.
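The ratio in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `agreement_ratio` and the dictionary layout (test case mapped to the selected best result) are assumptions, not part of the described method.

```python
from fractions import Fraction


def agreement_ratio(choices_x, choices_y):
    """Share of test cases for which two listeners picked the same best result."""
    cases = list(choices_x)
    same = sum(1 for case in cases if choices_x[case] == choices_y[case])
    return Fraction(same, len(cases))


# Selections from the example: test case -> best labeling result chosen.
choices_x = {"A": "a2", "B": "B1", "C": "C3"}
choices_y = {"A": "a1", "B": "B1", "C": "C3"}

ratio_xy = agreement_ratio(choices_x, choices_y)
print(ratio_xy)  # 2/3
```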

Step S3032: Determine the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the test case number ratio corresponding to any two audiometrists.

The higher the target weight corresponding to an audiometrist, the richer that audiometrist's listening-test experience.

Suppose the number of audiometrists is $N_p$. In this embodiment, the initial weights respectively corresponding to the $N_p$ audiometrists may form an $N_p$-dimensional column vector, which is used as the initial probability distribution corresponding to the target test case subset, and the test case number ratios of any two audiometrists may form an $N_p \times N_p$ matrix, which is used as the transfer matrix corresponding to the target test case subset. It should be noted that, for the first target test case subset, the value of each element in the corresponding initial probability distribution is $1/N_p$.

For a target test case subset $S_i$, after the corresponding initial probability distribution $V_{i-1}$ and transfer matrix $M_i$ are obtained, a plurality of iterations may be performed according to $\lim_{n\to\infty} M_i^{n} V_{i-1}$ until the calculation result tends to be stable. The final calculation result is the probability distribution $V_i$ corresponding to the target test case subset $S_i$, and the values of the elements of $V_i$ are the target weights respectively corresponding to the plurality of audiometrists, where $i$ ranges from 1 to $Q$ and $Q$ is the number of test case subsets.
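The iteration described here behaves like a power iteration. A minimal sketch is given below under two assumptions that the text does not state: the vector is rescaled to sum to 1 after every multiplication (otherwise repeated multiplication by a matrix of ratios would not settle), and each audiometrist's ratio with himself or herself is taken as 1 on the diagonal. The names `stationary_weights`, `M` and `V0` and the numeric values are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def stationary_weights(matrix, initial, tol=1e-10, max_iter=1000):
    """Repeat v <- M v, rescaled to sum to 1 (an assumption), until stable."""
    v = list(initial)
    for _ in range(max_iter):
        nxt = mat_vec(matrix, v)
        total = sum(nxt)
        if total == 0:
            raise ValueError("transfer matrix maps the distribution to zero")
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Hypothetical pairwise ratios for three audiometrists (diagonal assumed 1).
M = [
    [1.0, 2 / 3, 1 / 3],
    [2 / 3, 1.0, 1 / 3],
    [1 / 3, 1 / 3, 1.0],
]
V0 = [1 / 3, 1 / 3, 1 / 3]  # uniform initial distribution, 1/N_p each
V1 = stationary_weights(M, V0)
print(V1)
```

In this hypothetical matrix the first two audiometrists agree with each other most often, so they end up with equal weights that are larger than the third audiometrist's weight.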

Step S304: Determine the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset, according to the target weights respectively corresponding to the audiometrists.

Specifically, for any artificial prosody labeling result in the artificial prosody labeling result set corresponding to any test case in the target test case subset, the weight of the artificial prosody labeling result is determined from the target weights of the audiometrists who selected it as the best artificial prosody labeling result. In this way, the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset is obtained.

Further, the target weights of the audiometrists who selected the artificial prosody labeling result as the best artificial prosody labeling result may be summed, and the sum is used as the weight of that artificial prosody labeling result.
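The summation can be sketched as follows for a single test case. The function name `labeling_weights`, the listener names and the weight values are hypothetical, and giving a weight of 0 to a result that no audiometrist selected is an assumption (an empty sum).

```python
from collections import defaultdict


def labeling_weights(best_choices, audiometrist_weights):
    """Sum, per labeling result, the target weights of the listeners who chose it."""
    weights = defaultdict(float)
    for audiometrist, chosen in best_choices.items():
        weights[chosen] += audiometrist_weights[audiometrist]
    return dict(weights)


# Hypothetical data for one test case with candidate results a1, a2, a3.
candidates = ["a1", "a2", "a3"]
best_choices = {"x": "a2", "y": "a1", "z": "a2"}
audiometrist_weights = {"x": 0.5, "y": 0.25, "z": 0.25}

summed = labeling_weights(best_choices, audiometrist_weights)
weights = {c: summed.get(c, 0.0) for c in candidates}  # unchosen -> 0 (assumed)
print(weights)  # {'a1': 0.25, 'a2': 0.75, 'a3': 0.0}
```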

Step S305: Judge whether any test case subset among the plurality of test case subsets has not yet been obtained; if so, execute step S306; if not, end the weight determination process.

Step S306: Use the target weights respectively corresponding to the audiometrists as the initial weights respectively corresponding to the audiometrists, and then return to step S302.

In this embodiment, when calculating the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the first target test case subset, $[1/N_p, 1/N_p, \dots, 1/N_p]^T$ is used as the initial probability distribution $V_0$ corresponding to the first target test case subset, and the probability distribution $V_1$ corresponding to the first target test case subset is determined from $V_0$ and the transfer matrix $M_1$ corresponding to that subset. When calculating the weights for the second target test case subset, $V_1$ is used as the initial probability distribution corresponding to the second target test case subset, and the probability distribution $V_2$ corresponding to the second target test case subset is determined from $V_1$ and the transfer matrix $M_2$ corresponding to that subset, and so on.

Through the process, the weight of each artificial prosody marking result in the artificial prosody marking result set corresponding to the plurality of test cases in the test case set can be obtained.
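The loop over steps S302 to S306 can be sketched end to end as below. It is a simplified illustration rather than the claimed implementation: the per-step rescaling of the vector, the diagonal value of 1 in the transfer matrix, and all names and data (`weights_for_all_subsets`, `choices`, `subsets`) are assumptions.

```python
def agreement_matrix(subset, choices, people):
    """Pairwise test case number ratios within one subset (diagonal is 1)."""
    return [[sum(choices[p][c] == choices[q][c] for c in subset) / len(subset)
             for q in people] for p in people]


def iterate(matrix, v, tol=1e-10, max_iter=1000):
    """Repeat v <- M v, rescaled to sum to 1 (an assumption), until stable."""
    for _ in range(max_iter):
        nxt = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


def weights_for_all_subsets(subsets, choices):
    people = sorted(choices)
    v = [1 / len(people)] * len(people)        # V_0: each element is 1/N_p
    result = {}
    for subset in subsets:                     # S302 and S305: next subset
        v = iterate(agreement_matrix(subset, choices, people), v)   # S303
        for case in subset:                    # S304: sum listener weights
            per_result = {}
            for p, w in zip(people, v):
                chosen = choices[p][case]
                per_result[chosen] = per_result.get(chosen, 0.0) + w
            result[case] = per_result
        # S306: v is carried over as the next initial distribution
    return result


# Hypothetical selections: audiometrist -> test case -> best labeling result.
choices = {
    "x": {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
    "y": {"A": "a1", "B": "b1", "C": "c3", "D": "d1"},
    "z": {"A": "a2", "B": "b2", "C": "c3", "D": "d2"},
}
subsets = [["A", "B"], ["C", "D"]]
all_weights = weights_for_all_subsets(subsets, choices)
print(all_weights)
```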

It should be noted that, in this embodiment, the test cases in the test case set are grouped and the weights are determined group by group. Because each operation involves a relatively small amount of data, the operation efficiency is high. In addition, using the probability distribution corresponding to the previous target test case subset as the initial probability distribution corresponding to the next target test case subset reduces the number of iterations, so that the calculation result stabilizes quickly and the operation efficiency is further improved.

In addition, it should be noted that steps S301 to S306 are a preferred implementation of step S203. This embodiment does not limit step S203 to being implemented only by steps S301 to S306; it may also be implemented in other manners. For example, without grouping the test case set, an initial probability distribution $V_0$ may be set for the whole test case set ($V_0$ is an $N_p$-dimensional column vector in which the value of each element is $1/N_p$). The test case number ratios corresponding to any two audiometrists are then combined into an $N_p \times N_p$ transfer matrix $M$ corresponding to the whole test case set, and a plurality of iterations are performed according to $\lim_{n\to\infty} M^{n} V_0$ until the calculation result tends to be stable. The final calculation result is the probability distribution $V$ corresponding to the whole test case set, and the values of the elements of $V$ are the target weights respectively corresponding to the plurality of audiometrists. The weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the whole test case set is then determined from the target weights respectively corresponding to the audiometrists.

The foregoing describes the process of obtaining in advance the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set. Next, "step S102: determine the weights of the prosody prediction results respectively corresponding to the plurality of test cases based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases" is described.

The process of determining the weights of the prosody prediction results respectively corresponding to the plurality of test cases, based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, may include: for the prosody prediction result corresponding to any test case, determining, from the artificial prosody labeling result set corresponding to the test case, the artificial prosody labeling result that is consistent with the prosody prediction result corresponding to the test case, and using the weight of the determined artificial prosody labeling result as the weight of the prosody prediction result corresponding to the test case, so as to obtain the weights of the prosody prediction results respectively corresponding to the plurality of test cases in the test case set.

Illustratively, the prosody prediction result corresponding to a test case is x, and the artificial prosody labeling result set corresponding to the test case is {a1, a2, a3}. If the prosody prediction result x is consistent with the artificial prosody labeling result a2, the weight of a2 is used as the weight of x. It should be noted that the prosody prediction result x being consistent with the artificial prosody labeling result a2 may mean that the prosodic phrase pause positions in x and in a2 are the same.

It should be noted that, for a prosody prediction result corresponding to any test case, if there is no artificial prosody labeling result in the set of artificial prosody labeling results corresponding to the test case that is consistent with the prosody prediction result corresponding to the test case, the weight of the prosody prediction result corresponding to the test case is set to 0.
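The matching rule, including the fallback weight of 0, can be sketched as follows. Representing a result as a tuple of pause positions (word indices after which a prosodic phrase pause falls) is an assumption made for illustration, as are the function name `prediction_weight` and the sample values.

```python
def prediction_weight(predicted_boundaries, labeling_results):
    """Return the weight of the labeling result whose pause positions match.

    labeling_results is a list of (pause_positions, weight) pairs for one
    test case; the weight is 0 when no labeling result matches.
    """
    for boundaries, weight in labeling_results:
        if set(boundaries) == set(predicted_boundaries):
            return weight
    return 0.0


# Hypothetical labeling results for one test case.
labeling_results = [((2, 5), 0.25), ((3, 5), 0.75), ((5,), 0.0)]

print(prediction_weight((3, 5), labeling_results))  # 0.75
print(prediction_weight((1, 4), labeling_results))  # 0.0
```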

After the weights of the prosody prediction results corresponding to the multiple test cases in the test case set are obtained, the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated can be determined according to the weights of the prosody prediction results corresponding to the multiple test cases.

Referring to fig. 4, a schematic flow chart of determining an evaluation result of a prosody prediction effect of a prosody prediction engine to be evaluated according to respective prosody prediction results corresponding to a plurality of test cases and weights of the respective prosody prediction results corresponding to the plurality of test cases is shown, where the flow chart may include:

Step S401: Determine the score of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the plurality of test cases and the weights of the prosody prediction results respectively corresponding to the plurality of test cases.

Specifically, the weights of the prosody prediction results respectively corresponding to the plurality of test cases may be summed, and the sum, after normalization, is used as the score of the prosody prediction effect of the prosody prediction engine to be evaluated.

Step S402: Determine the ratio of the score of the prosody prediction effect of the prosody prediction engine to be evaluated to the artificial highest score, as the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated.

The artificial highest score is obtained by summing the maximum weights corresponding to the plurality of test cases, and the maximum weight corresponding to any test case is the maximum weight in the weights of the artificial prosody marking results in the artificial prosody marking result set corresponding to the test case.

In this embodiment, the ratio of the score of the prosody prediction effect of the prosody prediction engine to be evaluated to the artificial highest score may reflect how satisfied most users are with the prosody prediction effect of the engine. It can be understood that the larger this ratio is, the higher the satisfaction with the prosody prediction effect of the prosody prediction engine to be evaluated.
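Steps S401 and S402 can be sketched as below. The normalization of the summed weights is not specified in the text, so this sketch omits it and divides the plain sum by the artificial highest score; the function name `evaluation_ratio` and all values are hypothetical.

```python
def evaluation_ratio(prediction_weights, labeling_weights_per_case):
    """Ratio of the engine's score to the artificial highest score."""
    score = sum(prediction_weights.values())
    highest = sum(max(w.values()) for w in labeling_weights_per_case.values())
    return score / highest


# Hypothetical weights of the labeling results for two test cases.
labeling_weights_per_case = {
    "A": {"a1": 0.25, "a2": 0.75, "a3": 0.0},
    "B": {"b1": 0.5, "b2": 0.5},
}
# The prediction for A matches a2; the prediction for B matches nothing.
prediction_weights = {"A": 0.75, "B": 0.0}

ratio = evaluation_ratio(prediction_weights, labeling_weights_per_case)
print(ratio)  # 0.6
```

Here the artificial highest score is 0.75 + 0.5 = 1.25, so an engine that always matched the highest-weighted labeling result would obtain a ratio of 1.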

The method for evaluating the prosody prediction effect provided by the embodiment of the application can automatically evaluate the prosody prediction effect of the prosody prediction engine to be evaluated. Compared with the existing manual evaluation mode, it avoids the influence of subjective factors on the evaluation result, saves labor, and reduces evaluation time; that is, the evaluation is automatic, efficient, and objective. In addition, in this embodiment, the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set, and the weight of each artificial prosody labeling result in those sets, can be reused.

The following describes the evaluation device provided in the embodiments of the present application, and the evaluation device described below and the evaluation method described above may be referred to in correspondence with each other.

Referring to fig. 5, a schematic structural diagram of an apparatus for evaluating prosody prediction according to an embodiment of the present application is shown, where the apparatus may include: a prosody prediction result obtaining module 501, a prosody prediction result weight determining module 502 and a prosody prediction effect evaluating module 503.

The prosody prediction result obtaining module 501 is configured to obtain prosody prediction results corresponding to a plurality of test cases in the test case set.

The prosody prediction result corresponding to each test case is predicted by the prosody prediction engine to be evaluated.

A prosody prediction result weight determining module 502, configured to determine, based on a weight of each artificial prosody labeling result in an artificial prosody labeling result set corresponding to a plurality of test cases respectively obtained in advance, a weight of a prosody prediction result corresponding to each of the plurality of test cases.

The artificial prosody marking result set corresponding to any test case comprises at least one artificial prosody marking result corresponding to the test case, and the weight of any artificial prosody marking result can represent the reasonable degree of the artificial prosody marking result.

The prosody prediction effect evaluation module 503 is configured to determine an evaluation result of the prosody prediction effect of the to-be-evaluated prosody prediction engine according to the respective prosody prediction results of the multiple test cases and the weights of the respective prosody prediction results of the multiple test cases.

The apparatus for evaluating the prosody prediction effect provided in this embodiment of the present application first obtains the prosody prediction results respectively corresponding to a plurality of test cases in a test case set, then determines the weights of those prosody prediction results based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, and finally determines the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results and their weights. The apparatus can therefore evaluate the prosody prediction effect of the engine automatically. Compared with the existing manual evaluation mode, it avoids the influence of subjective factors on the evaluation result, saves labor, and reduces evaluation time.

In a possible implementation manner, the apparatus for evaluating the prosody prediction effect provided by the foregoing embodiment further includes: an artificial prosody labeling result set acquisition module, an audio synthesis module, and an artificial prosody labeling result weight determining module.

The artificial prosody labeling result set acquisition module is configured to acquire the artificial prosody labeling result sets respectively corresponding to the plurality of test cases.

The audio synthesis module is configured to perform audio synthesis on each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, to obtain synthesized audio sets respectively corresponding to the plurality of test cases.

The artificial prosody labeling result weight determining module is configured to determine the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, according to the best artificial prosody labeling result selected by each of a plurality of audiometrists for each test case.

The best artificial prosody labeling result selected by any audiometrist for any test case is the one the audiometrist selects from the artificial prosody labeling result set corresponding to the test case by listening to each synthesized audio in the synthesized audio set corresponding to the test case.

In a possible implementation manner, the module for determining a weight of an artificial prosody labeling result includes: the system comprises a grouping submodule, an obtaining submodule, a first weight determining submodule, a second weight determining submodule and an initial weight determining submodule;

the initial weight determining submodule is configured to use the target weights respectively corresponding to the plurality of audiometrists as the initial weights respectively corresponding to the plurality of audiometrists, and then trigger the obtaining submodule to obtain a test case subset that has not yet been obtained from the plurality of test case subsets as the target test case subset, until no unobtained test case subset remains, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases in the test case set.

In a possible implementation manner, the first weight determining submodule is specifically configured to determine the test case number ratio corresponding to any two audiometrists according to the best artificial prosody labeling result selected by each audiometrist for each test case in the target test case subset, and to determine the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the test case number ratio corresponding to any two audiometrists.

In a possible implementation manner, the second weight determining submodule is specifically configured to determine, for any artificial prosody labeling result in the artificial prosody labeling result set corresponding to any test case in the target test case subset, the weight of the artificial prosody labeling result from the target weights of the audiometrists who selected it as the best artificial prosody labeling result, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case subset.

In a possible implementation manner, when determining the weight of the artificial prosody labeling result from the target weights of the audiometrists who selected it as the best artificial prosody labeling result, the second weight determining submodule is specifically configured to sum those target weights and use the sum as the weight of the artificial prosody labeling result.

In a possible implementation manner, in the apparatus for evaluating a prosody prediction effect provided in the foregoing embodiment, the prosody prediction result weight determining module 502 is specifically configured to, for a prosody prediction result corresponding to any test case: determining an artificial prosody labeling result which is consistent with a prosody prediction result corresponding to the test case from an artificial prosody labeling result set corresponding to the test case, and taking the weight of the determined artificial prosody labeling result as the weight of the prosody prediction result corresponding to the test case; and obtaining the weights of the prosody prediction results corresponding to the plurality of test cases respectively.

In a possible implementation manner, in the apparatus for evaluating a prosody prediction effect provided in the foregoing embodiment, the prosody prediction effect evaluation module 503 may include: a prosody prediction effect score determining submodule and an evaluation result determining submodule.

The prosody prediction effect score determining submodule is configured to determine the score of the prosody prediction effect of the prosody prediction engine to be evaluated according to the prosody prediction results respectively corresponding to the plurality of test cases and the weights of the prosody prediction results respectively corresponding to the plurality of test cases.

The evaluation result determining submodule is configured to determine the ratio of the score of the prosody prediction effect of the prosody prediction engine to be evaluated to the artificial highest score, as the evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated.

The artificial highest score is obtained by summing maximum weights corresponding to the plurality of test cases respectively, and the maximum weight corresponding to any test case is the maximum weight in the weights of the artificial prosody marking results in the artificial prosody marking result set corresponding to the test case.

An embodiment of the present application further provides an evaluation apparatus for prosody prediction effect, please refer to fig. 6, which shows a schematic structural diagram of the evaluation apparatus, and the evaluation apparatus may include: at least one processor 601, at least one communication interface 602, at least one memory 603, and at least one communication bus 604;

in the embodiment of the present application, the number of the processor 601, the communication interface 602, the memory 603, and the communication bus 604 is at least one, and the processor 601, the communication interface 602, and the memory 603 complete communication with each other through the communication bus 604;

the processor 601 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, or the like;

the memory 603 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), etc., such as at least one disk memory;

wherein the memory stores a program and the processor can call the program stored in the memory, the program for:

Optionally, for the detailed functions and extended functions of the program, reference may be made to the description above.

Embodiments of the present application further provide a readable storage medium, where a program suitable for being executed by a processor may be stored, where the program is configured to:

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for evaluating the effect of prosody prediction, comprising:

determining an evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the respective prosody prediction results of the test cases and the weights of the respective prosody prediction results of the test cases;

wherein, the process of obtaining the weight of each artificial prosody labeling result in the artificial prosody labeling result set respectively corresponding to the plurality of test cases comprises:

and determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the best artificial prosody labeling result that each of a plurality of audiometrists selects, from the artificial prosody labeling result set corresponding to each test case, based on the synthesized audio set corresponding to that test case.

2. The method for evaluating a prosody prediction effect according to claim 1, wherein the determining the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases, according to the best artificial prosody labeling result selected by each audiometrist for each test case, comprises:

3. The method for evaluating a prosody prediction effect according to claim 2, wherein the determining the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the best artificial prosody labeling result selected by each audiometrist for each test case in the target test case set comprises:

4. The method for evaluating prosody prediction effect according to claim 2, wherein the determining the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case set according to the target weights respectively corresponding to the audiometrists comprises:

5. The method of claim 4, wherein the determining the weight of the artificial prosody labeling result from the target weight corresponding to each audiometrist who selected that artificial prosody labeling result as the best artificial prosody labeling result comprises:

6. The method for evaluating a prosody prediction effect according to any one of claims 1 to 5, wherein the determining the weights of the prosody prediction results corresponding to the plurality of test cases based on the pre-obtained weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to the plurality of test cases respectively comprises:

for the prosody prediction result corresponding to any test case:

7. The method for evaluating a prosody prediction effect according to any one of claims 1 to 5, wherein the determining an evaluation result of the prosody prediction effect of the prosody prediction engine to be evaluated according to the respective prosody prediction results of the plurality of test cases and the respective weights of the respective prosody prediction results of the plurality of test cases comprises:

8. An apparatus for evaluating a prosody prediction effect, comprising: a prosody prediction result acquisition module, an artificial prosody labeling result set acquisition module, an audio synthesis module, an artificial prosody labeling result weight determination module, a prosody prediction result weight determination module, and a prosody prediction effect evaluation module;

the artificial prosody labeling result weight determination module is configured to determine the weight of each artificial prosody labeling result in the artificial prosody labeling result sets respectively corresponding to the plurality of test cases according to the best artificial prosody labeling result that each of a plurality of audiometrists selects, from the artificial prosody labeling result set corresponding to each test case, based on the synthesized audio set corresponding to that test case;

9. The apparatus for evaluating a prosody prediction effect according to claim 8, wherein the artificial prosody labeling result weight determination module comprises: a grouping submodule, an obtaining submodule, a first weight determination submodule, a second weight determination submodule, and an initial weight determination submodule;

10. The apparatus for evaluating a prosody prediction effect according to claim 9, wherein the first weight determination submodule is specifically configured to: determine a ratio of the number of test cases corresponding to any two audiometrists according to the best artificial prosody labeling result selected by each audiometrist for each test case in the target test case set; and determine the target weights respectively corresponding to the plurality of audiometrists based on the initial weights respectively corresponding to the plurality of audiometrists and the ratio of the number of test cases corresponding to any two audiometrists;

11. The apparatus for evaluating a prosody prediction effect according to claim 9, wherein the second weight determination submodule is specifically configured to determine, for any artificial prosody labeling result in the artificial prosody labeling result set corresponding to any test case in the target test case set, the weight of the artificial prosody labeling result from the target weight corresponding to each audiometrist who selected that artificial prosody labeling result as the best artificial prosody labeling result, so as to obtain the weight of each artificial prosody labeling result in the artificial prosody labeling result set corresponding to each test case in the target test case set.
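
As an illustration only, the weighting described in claim 11 might be sketched as follows for a single test case. The claim does not state how the target weights of several audiometrists are combined; summation is an assumption made here, and the names `labeling_weights`, `best_choice`, and `target_weight` are hypothetical.

```python
def labeling_weights(labelings, best_choice, target_weight):
    """Weight each artificial prosody labeling result of one test case.

    labelings:     iterable of labeling-result identifiers for the test case
    best_choice:   {audiometrist_id: labeling_id selected as the best one}
    target_weight: {audiometrist_id: target weight of that audiometrist}

    Assumption (not stated in the claim): the weight of a labeling result
    is the SUM of the target weights of the audiometrists who selected it.
    """
    # Labeling results that nobody selected keep a weight of 0.0.
    weights = {labeling: 0.0 for labeling in labelings}
    for audiometrist, labeling in best_choice.items():
        weights[labeling] += target_weight[audiometrist]
    return weights


# Hypothetical example: three audiometrists, three candidate labelings.
example = labeling_weights(
    labelings=["L1", "L2", "L3"],
    best_choice={"a": "L1", "b": "L1", "c": "L2"},
    target_weight={"a": 0.5, "b": 0.25, "c": 0.25},
)
```

Under this assumption, a labeling result preferred by more, or by more heavily weighted, audiometrists receives a larger weight, which matches the stated intent that the weight reflect how reasonable the labeling result is.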

12. The apparatus for evaluating a prosody prediction effect according to any one of claims 8 to 11, wherein the prosody prediction result weight determination module is specifically configured to, for the prosody prediction result corresponding to any test case: determine, from the artificial prosody labeling result set corresponding to the test case, the artificial prosody labeling result that is consistent with the prosody prediction result corresponding to the test case, and take the weight of the determined artificial prosody labeling result as the weight of the prosody prediction result corresponding to the test case, so as to obtain the weights of the prosody prediction results respectively corresponding to the plurality of test cases.
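
The lookup in claim 12 can be sketched as below. This is a minimal sketch under stated assumptions: a prosody result is represented as a tuple of boundary positions, "consistent" is taken to mean exact equality, and a prediction that matches no labeling result receives a default weight of 0.0. None of these choices is specified by the claim, and all function and parameter names are hypothetical.

```python
def prediction_weight(prediction, labeling_weights, default=0.0):
    """Return the weight of the labeling result that equals the prediction.

    prediction:       hashable prosody result, e.g. a tuple of boundary positions
    labeling_weights: {labeling_result: weight} for the same test case
    default:          weight used when no labeling result matches (assumption)
    """
    return labeling_weights.get(prediction, default)


def prediction_weights(predictions, labeling_weights_by_case, default=0.0):
    """Apply the lookup to every test case: {case_id: weight}."""
    return {
        case_id: prediction_weight(
            prediction, labeling_weights_by_case.get(case_id, {}), default
        )
        for case_id, prediction in predictions.items()
    }


# Hypothetical data: boundary positions predicted for two test cases.
weights = prediction_weights(
    predictions={"case1": (2, 5), "case2": (3,)},
    labeling_weights_by_case={
        "case1": {(2, 5): 0.75, (2,): 0.25},
        "case2": {(1, 4): 1.0},
    },
)
```

Here `case1` inherits the weight of the matching labeling result, while `case2` matches nothing and falls back to the assumed default.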

13. The apparatus for evaluating a prosody prediction effect according to any one of claims 8 to 11, wherein the prosody prediction effect evaluation module comprises: a prosody prediction effect score determination submodule and an evaluation result determination submodule;