Summary of the invention
The invention provides a voice fuzzy retrieval method and device, to solve the problem that retrieval based on existing speech recognition technology is inaccurate.
To this end, the embodiments of the invention adopt the following technical solution:
A voice fuzzy retrieval method, comprising:
performing speech recognition on an acquired voice signal using a preset acoustic model and a preset language model, to obtain a recognition result;
searching a preset text entry library according to the recognition result using a preset index table, to obtain preliminary entries;
performing fuzzy string matching between the preliminary entries and the recognition result, choosing as selected entries those whose matching degree lies within a preset matching-degree threshold range, and recording the matched positions;
calculating the posterior probability between the matched part of each selected entry and the voice signal, and selecting several entries as the retrieval result of the voice signal using the posterior probability and the matching ratio obtained from the matched positions.
The method further comprises:
building the index table from the text entries to be retrieved, using syllables, characters or words as index units, so as to perform one or more levels of indexing.
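As an illustration only, the following Python sketch builds index tables at two unit granularities and chains them as a two-level index. The syllable level is omitted because it would need a pinyin converter. Entries are assumed to be pre-segmented into words with spaces. The refinement rule (intersect the character-level and word-level hits) and all function names are hypothetical; the description does not fix how the levels are combined.

```python
from collections import defaultdict

def build_index(entries, units):
    """Inverted index: index unit -> ids of the entries containing that unit."""
    index = defaultdict(set)
    for entry_id, entry in enumerate(entries):
        for unit in units(entry):
            index[unit].add(entry_id)
    return dict(index)

def char_units(entry):
    # Level-1 units: single characters (spaces only mark word boundaries).
    return set(entry.replace(" ", ""))

def word_units(entry):
    # Level-2 units: words; assumes entries were pre-segmented with spaces.
    return set(entry.split())

def two_level_lookup(char_index, word_index, query_chars, query_words):
    """Level 1 gathers entries sharing any character; level 2 keeps only those
    that also share a word (a hypothetical refinement rule)."""
    level1 = set().union(*(char_index.get(c, set()) for c in query_chars))
    level2 = set().union(*(word_index.get(w, set()) for w in query_words))
    return level1 & level2
```

With the toy entries "中国 共产党", "中华 人民 共和国" and "人民 日报", a query for the word 中国 survives both levels only for the first entry.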
The method further comprises:
training all or part of the language model on the preset text entry library.
Wherein:
The recognition result may take the form of the most probable text string corresponding to the voice signal, the N most probable text strings corresponding to the voice signal, or the word lattice corresponding to the voice signal.
Searching the preset text entry library with the preset index table according to the recognition result to obtain the preliminary entries specifically comprises:
voting for each character/word in the recognition result using the preset index table, and choosing the entries whose vote count is higher than a preset vote threshold as the preliminary entries;
wherein voting means looking up the index item of the index table with a character/word in the recognition result and, once the index item is found, adding 1 to the vote count of every entry included under that index item.
The fuzzy matching algorithm computes the edit distance between texts by dynamic programming based on a confusion matrix. The confusion matrix is obtained by training or is preset, and is used to optimize the substitution, insertion and deletion costs.
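Below is a minimal sketch of a dynamic-programming edit distance whose substitution costs are read from a confusion table. The table values in the example are invented. Per the description, they would be obtained by training or preset. Insertion and deletion costs are kept as flat parameters for brevity, and the function name is hypothetical.

```python
def weighted_edit_distance(hyp, ref, confusion, ins_cost=1.0, del_cost=1.0):
    """Minimum cost of turning `hyp` (recognition result) into `ref` (entry text).

    confusion maps (hyp_char, ref_char) -> substitution cost; identical
    characters cost 0.0 and unlisted pairs default to 1.0.
    """
    n, m = len(hyp), len(ref)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost      # delete hyp characters
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost      # insert ref characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = hyp[i - 1], ref[j - 1]
            sub = 0.0 if a == b else confusion.get((a, b), 1.0)
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or substitution
                          d[i - 1][j] + del_cost,  # deletion
                          d[i][j - 1] + ins_cost)  # insertion
    return d[n][m]
```

A low cost for an acoustically confusable pair means that a recognition error on that pair is penalised far less than an arbitrary substitution. For example, `weighted_edit_distance("pat", "bat", {("p", "b"): 0.2})` gives 0.2 rather than 1.0.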
A voice fuzzy retrieval device, comprising:
a voice signal acquisition unit, configured to acquire a voice signal;
a recognition unit, configured to perform speech recognition on the acquired voice signal using a preset acoustic model and a preset language model, to obtain a recognition result;
a retrieval unit, configured to search a preset text entry library according to the recognition result using a preset index table, to obtain preliminary entries;
a fuzzy matching unit, configured to perform fuzzy string matching between the preliminary entries and the recognition result, choose as selected entries those whose matching degree lies within a preset matching-degree threshold range, and record the matched positions;
a result determining unit, configured to calculate the posterior probability between the matched part of each selected entry and the voice signal, and to select several entries as the retrieval result of the voice signal using the posterior probability and the matching ratio obtained from the matched positions.
The device further comprises:
an index table building unit, configured to build the index table from the preset text entry library to be retrieved, using syllables, characters or words as index units, the index table being used to perform one or more levels of indexing.
The device further comprises:
a language model building unit, configured to train part or all of the language model on the preset text entry library.
The retrieval unit comprises:
an index voting subunit, configured to vote for each character/word in the recognition result using the preset index table, wherein voting means looking up the index item of the index table with a character/word in the recognition result and, once the index item is found, adding 1 to the vote count of every entry included under that index item;
a preliminary entry selection subunit, configured to choose the entries whose vote count is higher than the preset vote threshold as the preliminary entries.
It can be seen that the invention proposes a new voice fuzzy retrieval approach. It combines a task-relevant language model, index voting, fuzzy string matching, and computation of the posterior probability between the selected entries and the voice signal. This overcomes the adverse effect that partially incorrect speech recognition results have on text-library retrieval, and achieves fast and accurate retrieval of voice signals over a massive text entry library.
Embodiment
The voice fuzzy retrieval scheme provided by the invention works in three ways. It adds a suitable language model during recognition to improve accuracy. It performs fuzzy string matching when the recognition result is used for text retrieval, to reduce the influence of recognition errors. It verifies each candidate keyword by calculating the posterior probability that the keyword is the audio content. Together these substantially improve the accuracy and reliability of retrieval.
Referring to Fig. 1, which is a flowchart of the voice fuzzy retrieval method of the invention, the method comprises the following steps:
S101: performing speech recognition on the acquired voice signal using the preset acoustic model and language model, to obtain a recognition result;
S102: searching the preset text entry library according to the recognition result using the preset index table, to obtain preliminary entries;
The preset text entry library is generally a massive library containing a large amount of text entry information.
S103: performing fuzzy string matching between the preliminary entries and the recognition result, choosing as selected entries those whose matching degree lies within the preset matching-degree threshold range, and recording the matched positions;
S104: calculating the posterior probability between the matched part of each selected entry and the voice signal, and selecting several entries as the retrieval result of the voice signal using the posterior probability and the matching ratio obtained from the matched positions.
The invention is described in detail below with reference to a specific example.
Referring to Fig. 2, which is a flowchart of a specific embodiment in which the voice fuzzy retrieval technique is used to search a massive text entry library by voice, the method comprises:
S201: acquiring the voice signal input by the user;
S202: performing speech recognition on the acquired voice signal using the pre-built acoustic model and language model, to obtain a recognition result;
S203: quickly searching the preset text entry library according to the recognition result using the preset index table, to obtain preliminary entries;
Before the voice fuzzy retrieval system is built, a suitable language model and the index table of the massive text entry library need to be established in advance.
The task is to find, in the massive text entry library, the text that contains the spoken content. That content very likely exists in the library, either as a whole entry or as part of one. A language model trained on the massive text entry library as its corpus is therefore a task-relevant language model, and it adapts better to the retrieval task.
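To make "training the language model on the entry library" concrete, here is a minimal character-bigram model with add-one smoothing, trained directly on the entry texts. A production recogniser would use a far stronger model integrated into the decoder. The smoothing choice, the boundary symbols, the function names and the toy Latin-letter entries (standing in for Chinese characters) are all assumptions of this sketch.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def train_bigram_lm(entries):
    """Count character bigrams over the text entry library."""
    history, bigrams, vocab = Counter(), Counter(), {EOS}
    for entry in entries:
        tokens = [BOS] + list(entry) + [EOS]
        vocab.update(entry)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab

def log_prob(text, model):
    """Add-one smoothed log P(text) under the character bigram model."""
    history, bigrams, vocab = model
    v = len(vocab)
    tokens = [BOS] + list(text) + [EOS]
    return sum(math.log((bigrams[(prev, cur)] + 1) / (history[prev] + v))
               for prev, cur in zip(tokens, tokens[1:]))
```

Strings that follow the character order seen in the library score higher than the same characters in an unseen order. This is the bias the task-relevant language model is meant to give the recogniser.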
The preset index table consists of two parts: index items and the content corresponding to each index item. In the invention the index items are characters or words. The content corresponding to an index item is the set of texts in the massive text entry library that contain that character or word, and one index item usually corresponds to many texts. For example, the content corresponding to the index item "中" includes "Chinese Communist Party", "People's Republic of China" and "our great China", each of whose Chinese text contains 中.
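The two-part structure of the index table maps directly onto an inverted index. The sketch below uses a hypothetical function name and a toy entry list; a third entry is added so that not every text shares 中. It stores entry ids rather than full texts, which is an implementation choice.

```python
from collections import defaultdict

def build_char_index(entries):
    """Index table: index item (a character) -> ids of the entries containing it.

    The ids point back into the entry library instead of copying its texts.
    """
    index = defaultdict(set)
    for entry_id, text in enumerate(entries):
        for ch in set(text):
            index[ch].add(entry_id)
    return dict(index)
```

For the entries 中国共产党, 中华人民共和国 and 人民日报, the index item 中 lists the first two entries and 人 lists the last two.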
Thus, when speech recognition is performed on the input speech in S202, adding the task-relevant language model trained as described above markedly improves recognition accuracy, so that S202 yields a highly accurate recognition result.
The recognition result is the textual representation of the voice signal after decoding. It commonly takes one of three forms:
- the single most probable text string for the input voice signal, i.e. only one recognition result, for example "People's Republic of China";
- the N most probable text strings, i.e. several recognition results, for example the three results "Chinese Communist Party", "People's Republic of China" and "our great China";
- the word lattice of the voice signal, which represents all possible text strings as a directed acyclic graph.
The word lattice is the most efficient form of recognition result and also carries the most information.
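The three forms can be held in simple structures: a string for the 1-best result, a list of strings for the N-best list, and an adjacency list for the word lattice. The sketch below stores a lattice as a DAG whose nodes are numbered in topological order. It recovers the best path by dynamic programming and enumerates every string the lattice encodes. The function names and the arc scores in the example are made up for illustration.

```python
# A word lattice as a DAG: node -> list of (next_node, word, log_score).
# Nodes are integers numbered in topological order.

def best_path(lattice, start, end):
    """Return (log_score, words) of the highest-scoring path from start to end."""
    best = {start: (0.0, [])}
    for node in sorted(lattice):           # topological order by construction
        if node not in best:
            continue
        score, words = best[node]
        for nxt, word, arc_score in lattice[node]:
            cand = (score + arc_score, words + [word])
            if nxt not in best or cand[0] > best[nxt][0]:
                best[nxt] = cand
    return best[end]

def all_paths(lattice, node, end):
    """Enumerate every word string the lattice encodes (exponential in general)."""
    if node == end:
        return [[]]
    return [[word] + rest
            for nxt, word, _ in lattice.get(node, [])
            for rest in all_paths(lattice, nxt, end)]
```

Enumeration grows exponentially with lattice size, which is why the lattice is the compact form. The 1-best and N-best results correspond to the top one or top N paths.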
In S203, index voting is performed with the preset index table for each character/word in the recognition result obtained in S202. Voting means looking up the index item of the index table with each character/word in the recognition result and, once the index item is found, adding 1 to the vote count of each corresponding text. For example, if the recognition result contains the character "中", the vote count of every text containing "中", such as "Chinese Communist Party", "People's Republic of China" and "our great China", is increased by 1. The higher a text's vote count, the better it matches the recognition result. Texts whose vote count is higher than the threshold are kept as the preliminary entries.
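The voting step can be sketched as follows. The index construction is repeated so the block stands alone. Counting one vote per occurrence of a character in the recognition result is an assumption, as are the function names. The example recognition result contains a homophone error (合 recognised in place of 和).

```python
from collections import Counter, defaultdict

def build_char_index(entries):
    """Inverted index: character -> ids of entries containing it."""
    index = defaultdict(set)
    for entry_id, text in enumerate(entries):
        for ch in set(text):
            index[ch].add(entry_id)
    return dict(index)

def vote(recognition_result, index, threshold):
    """Each character of the recognition result adds one vote to every entry
    listed under its index item; returns ids of entries with more than
    `threshold` votes, highest vote count first."""
    votes = Counter()
    for ch in recognition_result:
        for entry_id in index.get(ch, ()):
            votes[entry_id] += 1
    return [eid for eid, n in votes.most_common() if n > threshold]
```

Take the entries 中国共产党, 中华人民共和国 and 人民日报 with the result 中华人民共合国 and a threshold of 2. The intended entry collects 6 votes despite the misrecognised character. 中国共产党 collects 3 votes and is kept. 人民日报 shares only 人 and 民, reaches exactly 2 votes, and is dropped.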
S204: performing fuzzy string matching between the preliminary entries and the recognition result, sorting the matched entries by matching degree from high to low, and keeping only the selected entries whose matching degree lies within the matching-degree threshold range;
Speech recognition cannot guarantee complete accuracy, so the recognition result contains some errors. Moreover, the index table only records which characters/words a text contains, not where they occur. The preliminary entries found through the index therefore cannot be used directly as the retrieval result.
Therefore, fuzzy string matching is used to obtain the matching degree between each preliminary entry and the recognition result. In contrast to exact string matching, fuzzy matching allows the substring to differ from the main string. The two main current approaches to fuzzy string matching are bit-vector methods and filtering methods, and the invention can use existing methods.

The simplest fuzzy matching algorithm is edit distance computed by dynamic programming. Matching admits three kinds of errors: deletion, insertion and substitution. Each kind can be given its own error cost according to the application, and the cost of a correctly matched part is usually defined as zero.

In the invention, both the recognition result and the texts in the massive text entry library can be regarded as character sequences. The substring is the recognition result and the main string is an entry in the library. The matching degree varies inversely with the error cost. Because the voice signal input by the user may be only a fragment of a text in the library, fuzzy string matching gives not only the matching degree but also the most likely matched position.
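Because the recognition result may match only a fragment of an entry, the edit distance has to be semi-global: the whole recognition result is aligned, but the match may start and end anywhere in the entry. The sketch below implements this with unit costs and tracks the start position so that the matched position falls out of the same computation. The function names and the `matching_degree` formula are hypothetical. A confusion-matrix cost could replace the flat `sub` cost.

```python
def fuzzy_substring_match(pattern, text, sub=1, ins=1, dele=1):
    """Semi-global edit distance: align all of `pattern` (the recognition result)
    against the best-matching substring of `text` (an entry).

    Returns (cost, start, end) such that text[start:end] is the matched span.
    """
    n = len(text)
    # prev[j] = (cost, start) of the best alignment of the pattern prefix
    # processed so far that ends at text position j. Row 0 is free everywhere,
    # so a match may begin at any position of the entry.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, len(pattern) + 1):
        cur = [(prev[0][0] + dele, 0)] + [None] * n
        for j in range(1, n + 1):
            same = pattern[i - 1] == text[j - 1]
            cur[j] = min(
                (prev[j - 1][0] + (0 if same else sub), prev[j - 1][1]),  # match/substitute
                (prev[j][0] + dele, prev[j][1]),        # pattern char left unmatched
                (cur[j - 1][0] + ins, cur[j - 1][1]),   # extra entry char inside the match
            )
        prev = cur
    cost, start, end = min((prev[j][0], prev[j][1], j) for j in range(n + 1))
    return cost, start, end


def matching_degree(cost, pattern):
    """Hypothetical score that falls as the error cost grows; 1.0 means exact."""
    return max(0.0, 1.0 - cost / max(len(pattern), 1))
```

Consider the recognition result 人民共合国 (with 合 misrecognised for 和) against the entry 中华人民共和国. The sketch returns cost 1 and the span text[2:7], i.e. the fragment 人民共和国. This span is the matched position that S204 records.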
S205: calculating, for each qualifying selected entry, its posterior probability given the input audio content, and recording the matched position;
The selected entries obtained in step S204 come from a character-level comparison with the recognition result, and the recognition result itself contains some errors. A high matching degree therefore does not necessarily mean a high likelihood that the entry is the actual spoken content. For this reason, S205 calculates the posterior probability of each selected entry given the voice signal. This posterior probability is a value between 0 and 1, and the posterior probabilities of all selected entries sum to 1. The larger the posterior probability, the more likely the corresponding entry really is the spoken content. A posterior probability is a probability revised after information about the "result" has been obtained; in Bayes' formula it corresponds to the "cause" in an "infer the cause from the effect" problem. Prior and posterior probabilities are inseparably linked: the posterior probability is calculated from the prior probability. Methods for calculating posterior probabilities are mature prior art and are not described further here.
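The description leaves the posterior computation to prior art. One standard way, shown here only as an assumption, applies Bayes' rule restricted to the selected entries. It takes an acoustic log-likelihood log p(X|e) for each entry and a log prior log P(e), then normalises their products so they sum to 1, matching the property stated above. The log-likelihood could come, for instance, from forced alignment of the entry text against the signal (not shown), and the prior from the language model. The function name is hypothetical.

```python
import math

def entry_posteriors(log_likelihoods, log_priors):
    """Posterior P(e | X) of each selected entry e given the voice signal X.

    Bayes' rule restricted to the selected entries:
        P(e | X) = p(X | e) P(e) / sum_e' p(X | e') P(e')
    Inputs are per-entry log p(X | e) and log P(e); log-sum-exp keeps the
    normalisation numerically stable for very negative acoustic scores.
    """
    joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    top = max(joint)
    log_norm = top + math.log(sum(math.exp(x - top) for x in joint))
    return [math.exp(x - log_norm) for x in joint]
```

Restricting the normalisation to the selected entries is what makes the posteriors sum to 1 over that set. The posteriors are relative confidences among the candidates, not absolute probabilities over all possible utterances.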
S206: selecting several entries as the retrieval result of the voice signal using the posterior probability and the matching ratio obtained from the matched position, and ending the process.
Specifically, the posterior probability and the matching ratio can be combined by weighting. The entries with relatively high combined scores are finally selected as the retrieval result.
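A minimal sketch of the weighted selection follows. The description does not define the matching ratio precisely, so it is assumed here to be the matched length divided by the entry length. The weight, the `top_k` cut-off and the function name are likewise hypothetical.

```python
def rank_results(candidates, weight=0.5, top_k=3):
    """Combine posterior probability and matching ratio by a weighted sum.

    candidates: (entry_text, posterior, match_start, match_end) tuples.
    The matching ratio is assumed to be matched length / entry length.
    """
    scored = []
    for text, posterior, start, end in candidates:
        ratio = (end - start) / len(text) if text else 0.0
        scored.append((weight * posterior + (1.0 - weight) * ratio, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

With `weight=1.0` the ranking reduces to posterior probability alone, and with `weight=0.0` to matching ratio alone. Intermediate weights favour entries that are both acoustically likely and largely covered by the spoken query.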
Corresponding to the above method, the invention provides a voice fuzzy retrieval device, which can be implemented in software, hardware, or a combination of software and hardware.
Referring to Fig. 3, which is a schematic diagram of the internal structure of the device, the device comprises a voice signal acquisition unit 300, a recognition unit 301, a retrieval unit 302, a fuzzy matching unit 303 and a result determining unit 304, wherein:
the voice signal acquisition unit 300 is configured to acquire a voice signal;
the recognition unit 301 is configured to perform speech recognition on the voice signal acquired by the voice signal acquisition unit 300, using the preset acoustic model and language model, to obtain a recognition result;
the retrieval unit 302 is configured to search the preset text entry library according to the recognition result obtained by the recognition unit 301, using the preset index table, to obtain preliminary entries;
the fuzzy matching unit 303 is configured to perform fuzzy string matching between the preliminary entries obtained by the retrieval unit 302 and the recognition result obtained by the recognition unit 301, choose as selected entries those whose matching degree lies within the preset matching-degree threshold range, and record the matched positions;
the result determining unit 304 is configured to calculate the posterior probability between each selected entry matched by the fuzzy matching unit 303 and the voice signal, and to select several entries as the retrieval result of the voice signal using the posterior probability and the matching ratio obtained from the matched positions.
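Since the device may be implemented in software, one way to mirror Fig. 3 is dependency injection. Each unit is a callable, and the device only fixes the data flow between them. The class name, the field names and the stub units in the usage check are hypothetical. The concrete units can be any implementation of the corresponding method steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (entry_id, matching_degree, match_start, match_end) as produced by unit 303.
Match = Tuple[int, float, int, int]

@dataclass
class VoiceFuzzyRetrievalDevice:
    """Hypothetical wiring of units 300-304; every unit is an injected callable."""
    acquire: Callable[[], bytes]                              # unit 300
    recognize: Callable[[bytes], str]                         # unit 301
    retrieve: Callable[[str], List[int]]                      # unit 302
    fuzzy_match: Callable[[str, List[int]], List[Match]]      # unit 303
    posterior: Callable[[bytes, List[Match]], List[float]]    # unit 304, step 1
    rank: Callable[[List[Match], List[float]], List[int]]     # unit 304, step 2
    entries: Sequence[str]

    def run(self) -> List[str]:
        signal = self.acquire()
        result = self.recognize(signal)
        preliminary = self.retrieve(result)
        selected = self.fuzzy_match(result, preliminary)
        posteriors = self.posterior(signal, selected)
        return [self.entries[eid] for eid in self.rank(selected, posteriors)]
```

Keeping the units behind plain callables lets the recogniser, index, matcher and scorer be swapped independently. This reflects the description's statement that the device can be realised in software, hardware or a combination of the two.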
Preferably, the device further comprises:
an index table building unit 305, configured to build the index table from the preset text entries, using syllables, characters or words as index units.
Preferably, the device further comprises:
a language model building unit 306, configured to train the language model on the preset text entry library.
Preferably, the retrieval unit 302 further comprises:
an index voting subunit (not shown), configured to vote for each character/word in the recognition result using the preset index table, wherein voting means looking up the index item of the index table with a character/word in the recognition result and, once the index item is found, adding 1 to the vote count of every entry included under that index item;
a preliminary entry selection subunit (not shown), configured to choose the entries whose vote count is higher than the preset vote threshold as the preliminary entries.
For implementation details of the device provided by the invention, refer to the method embodiments; they are not repeated here.
It can be seen that the invention proposes a new voice fuzzy retrieval scheme. It combines a task-relevant language model, index voting, fuzzy string matching, and computation of the posterior probability between candidate texts and the voice signal. This overcomes the adverse effect that partially incorrect speech recognition results have on text-library retrieval, and achieves fast and accurate retrieval of voice signals over a massive text entry library.
Those of ordinary skill in the art will appreciate that the process of implementing the above method embodiments can be completed by program instructions directing the relevant hardware. The program can be stored in a readable storage medium and, when executed, performs the corresponding steps of the above method. The storage medium can be, for example, ROM/RAM, a magnetic disk or an optical disc.
The above is only a preferred embodiment of the invention. Those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention.