CN110765763A

CN110765763A - Error correction method and device for speech recognition text, computer equipment and storage medium

Info

Publication number: CN110765763A
Application number: CN201910903618.1A
Authority: CN
Inventors: 宁义双; 张良杰; 闵刚
Original assignee: Kingdee Software China Co Ltd
Current assignee: Kingdee Software China Co Ltd
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2020-02-07
Anticipated expiration: 2039-09-24
Also published as: CN110765763B

Abstract

The application relates to a method and a device for correcting errors of a speech recognition text, a computer device and a storage medium. The method comprises the following steps: acquiring fluency of a voice recognition text by using a preset language model, wherein the preset language model is obtained by using corpus training of a first corpus and a second corpus, the first corpus comprises a corpus of a general scene, and the second corpus comprises a corpus of a preset scene; if the fluency of the voice recognition text is smaller than a fluency threshold value, acquiring words to be corrected in the voice recognition text; and determining a correction word corresponding to the word to be corrected from an error correction database, and obtaining a corrected voice recognition text according to the correction word. The method and the device improve the accuracy of user intention identification.

Description

Error correction method and device for speech recognition text, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for correcting a speech recognition text, a computer device, and a storage medium.

Background

For enterprise applications, correctly understanding the user's intent is key to improving user satisfaction. In a voice interaction system, user concept representation is performed on the speech recognition result to obtain the user's intent. User concept representation means expressing the essential characteristics of perceived things by processing the input information.

However, traditional speech recognition technology models only pronunciation and grammar, so the speech recognition result can be inaccurate, which reduces the accuracy of user intent recognition.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide a method and an apparatus for correcting a speech recognition text, a computer device, and a storage medium that can improve the accuracy of user intent recognition.

A method of error correction for speech recognized text, the method comprising:

acquiring fluency of a voice recognition text by using a preset language model, wherein the preset language model is obtained by using corpus training of a first corpus and a second corpus, the first corpus comprises a corpus of a general scene, and the second corpus comprises a corpus of a preset scene;

if the fluency of the voice recognition text is smaller than a fluency threshold value, acquiring words to be corrected in the voice recognition text;

and determining a correction word corresponding to the word to be corrected from an error correction database, and obtaining a corrected voice recognition text according to the correction word.

In one embodiment, the error correction database is constructed in a manner including:

obtaining the corpus of the second corpus;

performing word segmentation on the corpus of the second corpus by using a word segmentation dictionary to obtain candidate words;

and constructing the error correction database according to the candidate words and the pinyin of the candidate words.

In one embodiment, the method further comprises:

obtaining confusion words corresponding to the candidate words;

and adding the confusion word into the word segmentation dictionary.

In one embodiment, the obtaining of the word to be corrected in the speech recognition text includes:

performing word segmentation on the voice recognition text by using the word segmentation dictionary to obtain text words;

calculating the average absolute deviation value of each text word;

and if the average absolute deviation value of the text word is greater than the deviation threshold value, determining that the text word is the word to be corrected.

In one embodiment, the determining, from the error correction database, a correction word corresponding to the word to be corrected includes:

determining an error correction candidate word corresponding to the word to be corrected from the error correction database;

and determining the correction word among the error correction candidate words.

In one embodiment, the determining, from the error correction database, an error correction candidate word corresponding to the word to be corrected includes:

obtaining the pinyin of the word to be corrected;

acquiring the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database;

and taking the candidate word with the similarity larger than a similarity threshold as the error correction candidate word.

In one embodiment, the obtaining the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database includes:

and acquiring the edit distance between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database, and representing the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database by using the edit distance.

In one embodiment, the determining the corrected word from the error correction candidate words includes:

replacing words to be corrected in the voice recognition text by the error correction candidate words, and calculating fluency of the replaced voice recognition text by the preset language model;

and taking the error correction candidate word with the fluency meeting the preset condition as the correction word.

In one embodiment, the preset language model is a binary language model and a ternary language model;

the calculating the fluency of the replaced voice recognition text by using the preset language model comprises the following steps:

respectively inputting the replaced voice recognition texts into the binary language model and the ternary language model to obtain fluency output by the binary language model and fluency output by the ternary language model;

and taking the maximum value of the fluency output by the binary language model and the fluency output by the ternary language model as the fluency of the voice recognition text.

An apparatus for error correction of speech recognized text, the apparatus comprising:

an obtaining module, configured to obtain fluency of a speech recognition text by using a preset language model, wherein the preset language model is obtained by training on corpora of a first corpus and a second corpus, the first corpus comprises corpora of a general scene, and the second corpus comprises corpora of a preset scene;

the obtaining module is further configured to obtain a word to be corrected in the speech recognition text if the fluency of the speech recognition text is less than a fluency threshold;

and the determining module is used for determining a corrected word corresponding to the word to be corrected from an error correction database and obtaining a corrected voice recognition text according to the corrected word.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

According to the method, the device, the computer equipment and the storage medium for correcting the voice recognition text, the fluency of the voice recognition text is obtained by using the preset language model, if the fluency of the voice recognition text is smaller than a fluency threshold value, words to be corrected in the voice recognition text are obtained, the correction words corresponding to the words to be corrected are determined from the correction database, and the corrected voice recognition text is obtained according to the correction words.

Drawings

FIG. 1 is a diagram illustrating an exemplary embodiment of a method for error correction of speech recognition text;

FIG. 2 is a flowchart illustrating a method for error correction of speech recognition text according to one embodiment;

FIG. 3 is a diagram illustrating the operation of a method for error correction of speech recognition text according to one embodiment;

FIG. 4 is a schematic diagram of an error correction database in one embodiment;

FIG. 5 is a flowchart illustrating a method for correcting errors in speech recognition text according to another embodiment;

FIG. 6 is a block diagram showing the structure of an apparatus for correcting a speech-recognized text in one embodiment;

FIG. 7 is a block diagram showing the construction of an apparatus for correcting a speech recognition text in another embodiment;

FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The method for correcting a speech recognition text provided by the present application can be applied to the application environment shown in fig. 1. The terminal 102 or the server 104 obtains the fluency of a speech recognition text by using a preset language model, wherein the preset language model is obtained by training on corpora of a first corpus and a second corpus, the first corpus comprises corpora of a general scene, and the second corpus comprises corpora of a preset scene; if the fluency of the speech recognition text is smaller than a fluency threshold, the words to be corrected in the speech recognition text are acquired; and a correction word corresponding to the word to be corrected is determined from an error correction database, and a corrected speech recognition text is obtained according to the correction word.

The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.

In one embodiment, as shown in fig. 2, a method for correcting a speech recognition text is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:

step 202, obtaining fluency of a speech recognition text by using a preset language model, wherein the preset language model is obtained by utilizing corpus training of a first corpus and a second corpus, the first corpus comprises corpus of a general scene, and the second corpus comprises corpus of a preset scene.

In one embodiment, the first corpus may be a Wikipedia dataset that includes 50 million correct expressions conforming to the general scene.

The corpus of the preset scene refers to a corpus applied to a specific scene in each field, and the specific scene can be a working scene, such as finance (financial index query, expense reimbursement, enterprise operation data query), approval (business trip approval, leave approval), purchase (commodity purchase), management (human resource management) and the like. The second corpus includes corpora of predetermined scenes, and in one embodiment, the second corpus can select interactive corpora of working scenes in various fields. Since each domain corresponds to professional knowledge which plays an important role in characterizing the user concept, the interpretation of the user concept can be enhanced by the second corpus.

The speech recognition text is text data recognized based on input speech. Due to the diversity, complexity, and dialect habits of natural language, different users may express the same thing differently, and thus the text data obtained by recognition may also be different. For example, the input speech may be "how much stock of the warehouse is left", the speech recognition text may be "how much stock of the warehouse is left", and may also be "how much stock of the warehouse is saved".

A preset language model is a mathematical model of the contextual relationship between the words in a sentence. It considers the relationship between at least two words, i.e., the occurrence of the next word depends only on the one or more words preceding it. The preset language model includes at least one of a binary (2-gram) language model, a ternary (3-gram) language model, …, and an n-gram language model.

As shown in fig. 3, the preset language model is obtained by training on the corpora in the first corpus and the second corpus. Specifically, a language model training tool is used to train the preset language model on the corpora in the first corpus and the second corpus. The language model training tool may be SRILM, IRSTLM, BerkeleyLM, KenLM, or the like.

Taking training a binary language model as an example, the probability that two adjacent words in the first corpus and the second corpus occur together is counted, and the statistical result is stored. To simplify the calculation, the probability may take a base-10 logarithmic value, e.g., "our company" may be stored as "our company-1.25". To improve storage efficiency, the storage file may be converted into a binary file.
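The counting step described above can be sketched as follows. This is a minimal illustration and not the training procedure of SRILM, KenLM or the other tools named above: it assumes the corpora are already segmented into word lists, estimates the conditional probability of the second word given the first by simple relative frequency, applies no smoothing, and uses the hypothetical function name `train_bigram`.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Return {(w1, w2): log10 P(w2 | w1)} estimated from tokenized sentences.

    Assumption: plain relative-frequency estimate without smoothing.
    """
    history_counts = Counter()  # how often w1 appears as the first word of a pair
    bigram_counts = Counter()   # how often the pair (w1, w2) appears
    for tokens in sentences:
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        pair: math.log10(count / history_counts[pair[0]])
        for pair, count in bigram_counts.items()
    }


# Toy corpus standing in for the first and second corpora.
corpus = [
    ["our", "company", "account"],
    ["our", "company", "stock"],
    ["our", "team"],
]
bigram_logprobs = train_bigram(corpus)
```

Storing base-10 logarithms, as the description suggests, lets a sentence score be accumulated by addition rather than by multiplying many small probabilities.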

Specifically, the preset language model is first used to detect errors in the speech recognition text. The speech recognition text is input into the preset language model to obtain its fluency, and the fluency is used to judge whether the text contains errors: if the fluency is smaller than a fluency threshold, the speech recognition text is judged to contain errors and is corrected.

In one embodiment, the preset language model comprises a binary language model and a ternary language model, both obtained by training on the corpora in the first corpus and the second corpus. The speech recognition text is input into the preset language model to obtain two fluency values, and if the maximum of the two is smaller than the fluency threshold, the speech recognition text is judged to contain errors.

Step 204, if the fluency of the speech recognition text is smaller than a fluency threshold, acquiring the words to be corrected in the speech recognition text.

The fluency threshold is used for judging whether errors exist in the speech recognition text, and can be set according to practical application. If the fluency of the voice recognition text is greater than or equal to the fluency threshold, judging that the voice recognition text is correct; and if the fluency is smaller than the fluency threshold value, judging that the voice recognition text has errors, and correcting the voice recognition text.
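The threshold decision can be illustrated with a short sketch. The text does not define how fluency is computed from the n-gram statistics, so the code below assumes, purely for illustration, that fluency is the average base-10 log probability of the n-grams in the segmented text, with a fixed penalty for unseen n-grams. The names `sentence_fluency`, `needs_correction` and `unseen_logprob` are hypothetical.

```python
def sentence_fluency(tokens, ngram_logprobs, n, unseen_logprob=-5.0):
    """Average log10 probability of the n-grams in a token list (assumed definition)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Text shorter than n words: fall back to the unseen penalty.
        return unseen_logprob
    total = sum(ngram_logprobs.get(gram, unseen_logprob) for gram in ngrams)
    return total / len(ngrams)


def needs_correction(tokens, bigram_logprobs, trigram_logprobs, fluency_threshold):
    """Judge the text erroneous if the larger of the two fluency values is below the threshold."""
    fluency = max(
        sentence_fluency(tokens, bigram_logprobs, 2),
        sentence_fluency(tokens, trigram_logprobs, 3),
    )
    return fluency < fluency_threshold


# Toy statistics, for illustration only.
bigrams = {("our", "company"): -0.2, ("company", "account"): -0.3}
trigrams = {("our", "company", "account"): -0.4}
flagged = needs_correction(["our", "line", "account"], bigrams, trigrams, -1.0)
```

Taking the maximum of the two values follows the embodiment that combines a binary and a ternary language model; the threshold itself would be tuned for the application.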

The words to be corrected refer to erroneous text words in the speech recognition text. In one embodiment, the speech recognition text is segmented (using a word segmentation tool, such as the jieba word segmentation tool) to obtain text words. The average absolute deviation value of each text word is then calculated. If the average absolute deviation of a text word is greater than a deviation threshold, the text word is judged to be wrong and is taken as a word to be corrected; if the average absolute deviation of a text word is less than or equal to the deviation threshold, the text word is judged to be correct.

Step 206, determining a correction word corresponding to the word to be corrected from the error correction database, and obtaining a corrected voice recognition text according to the correction word.

The error correction database supplies the words used to replace the words to be corrected in the speech recognition text.

In one embodiment, as shown in fig. 3, the error correction database may be constructed from the second corpus: the corpus of the second corpus is acquired and segmented by using a word segmentation dictionary to obtain candidate words, and the error correction database is constructed according to the candidate words and the pinyin of the candidate words.

In another embodiment, the error correction database may be constructed from the first corpus and the second corpus: the corpora of the first corpus and the second corpus are acquired and segmented by using a word segmentation dictionary to obtain candidate words, and the error correction database is constructed according to the candidate words and the pinyin of the candidate words.

The word segmentation dictionary stores a large number of words used for the word segmentation operation. When the candidate words are obtained, the confusion words corresponding to the candidate words are also obtained and added to the word segmentation dictionary to enrich it.

Specifically, error correction candidate words corresponding to the words to be corrected are determined from the error correction database, and further, correction words are determined from the error correction candidate words.

The error correction candidate word corresponding to the word to be corrected may be determined from the error correction database as follows: the pinyin of the word to be corrected is obtained, the similarity between the pinyin of the word to be corrected and the pinyin of each candidate word in the error correction database is obtained, and the candidate word with a similarity greater than a similarity threshold is taken as an error correction candidate word.

The correction word may be determined among the error correction candidate words as follows: the word to be corrected in the speech recognition text is replaced by each error correction candidate word, the fluency of the replaced speech recognition text is calculated by using the preset language model, and the error correction candidate word whose fluency meets the preset condition is taken as the correction word. In one embodiment, the error correction candidate word corresponding to the maximum fluency is taken as the correction word.

Specifically, the words to be corrected in the speech recognition text are replaced by the correction words, so that the corrected speech recognition text is obtained.

According to the method for correcting the voice recognition text, the fluency of the voice recognition text is obtained by using the preset language model, if the fluency of the voice recognition text is smaller than the fluency threshold, words to be corrected in the voice recognition text are obtained, correction words corresponding to the words to be corrected are determined from the correction database, and the corrected voice recognition text is obtained according to the correction words.

In one embodiment, the error correction database is constructed in a manner including: obtaining the corpus of the second corpus; performing word segmentation on the corpus of the second corpus by using a word segmentation dictionary to obtain candidate words; and constructing the error correction database according to the candidate words and the pinyin of the candidate words.

The candidate words refer to words included in the corpus of the second corpus.

Specifically, the corpus of the second corpus is segmented by using a word segmentation tool (such as the jieba word segmentation tool) and a word segmentation dictionary to obtain candidate words. For example, the sentence "how much cash can be withdrawn from the account of our company" is segmented by the word segmentation tool into "we", "company", "account", "can", "withdraw", "how much", "cash".

The pinyin of each candidate word is then acquired, and the candidate words and their pinyin are stored in association. In one embodiment, as shown in FIG. 4, words and pinyin are stored as key-value pairs.
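The key-value storage can be sketched as a plain dictionary that maps each candidate word to its pinyin. The sketch assumes the candidate words have already been produced by segmentation and uses a tiny hand-written character-to-pinyin table so that it runs without dependencies; the table, the sample Chinese words, and the names `CHAR_PINYIN`, `to_pinyin` and `build_error_correction_db` are all hypothetical. In practice a pinyin conversion library would replace the table.

```python
# Hypothetical character-to-pinyin table, for illustration only.
CHAR_PINYIN = {
    "现": "xian", "金": "jin",
    "公": "gong", "司": "si",
    "账": "zhang", "户": "hu",
}


def to_pinyin(word):
    """Concatenate the toneless pinyin of each character in the word."""
    return "".join(CHAR_PINYIN[ch] for ch in word)


def build_error_correction_db(candidate_words):
    """Store each candidate word with its pinyin as a key-value pair."""
    return {word: to_pinyin(word) for word in candidate_words}


# Assumed candidate words: "company", "account", "cash".
error_correction_db = build_error_correction_db(["公司", "账户", "现金"])
```

Keying the database by word keeps every candidate retrievable even when several candidates share the same pinyin.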

In the error correction method for the speech recognition text, the error correction database is constructed according to the second corpus, which strengthens the characterization of user concepts in the preset scene.

In one embodiment, the method further comprises: obtaining confusion words corresponding to the candidate words; and adding the confusion word into the word segmentation dictionary.

A confusion word refers to a word whose pronunciation is close to or the same as that of the candidate word.

A segmentation dictionary stores a large number of words, which are used for the segmentation operation. When the candidate word is obtained, the confusion word corresponding to the candidate word is obtained, and the confusion word is added into the word segmentation dictionary, so that the resources of the word segmentation dictionary are enriched, and the accuracy of word segmentation of the voice recognition text is improved.

Specifically, each character in each candidate word is replaced to obtain the confusion words corresponding to the candidate word. In one embodiment, each character in the candidate word is replaced by using a character-level confusion set. For example, the confusion words corresponding to "cash" may be "advanced", "line-in", "current time", etc.
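Character replacement with a confusion set can be sketched as follows. The confusion set entries and the sample word are hypothetical, and the sketch assumes that any combination of replaced characters counts as a confusion word; the text does not say whether characters are replaced one at a time or in combination.

```python
from itertools import product

# Hypothetical character-level confusion sets (characters with the same or similar pronunciation).
CONFUSION_SETS = {
    "现": ["先", "线"],
    "金": ["进", "今"],
}


def confusion_words(candidate):
    """Return every variant of the candidate obtained by swapping characters for confusable ones."""
    options = [[ch] + CONFUSION_SETS.get(ch, []) for ch in candidate]
    variants = {"".join(chars) for chars in product(*options)}
    variants.discard(candidate)  # the candidate itself is not a confusion word
    return sorted(variants)


# Add the confusion words to a (toy) word segmentation dictionary.
segmentation_dictionary = {"公司", "账户", "现金"}
segmentation_dictionary.update(confusion_words("现金"))
```

With the confusion words in the dictionary, a misrecognized word is more likely to be segmented as a single text word, so it can later be flagged and replaced as a unit.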

In the error correction method for the voice recognition text, the confusion words are added into the word segmentation dictionary, so that the resources of the word segmentation dictionary are enriched, and the accuracy of the word segmentation of the voice recognition text is improved.

In one embodiment, the obtaining of the word to be corrected in the speech recognition text includes: performing word segmentation on the voice recognition text by using the word segmentation dictionary to obtain text words; calculating the average absolute deviation value of each text word; and if the average absolute deviation value of the text word is greater than the deviation threshold value, determining that the text word is the word to be corrected.

Wherein, the word to be corrected refers to the wrong word in the speech recognition text; text words refer to words in speech recognition text; the deviation threshold is used for judging whether the text word is wrong or not, and can be set according to practical application.

Specifically, the speech recognition text is segmented by using a segmentation tool (such as a Chinese word segmentation tool) and a segmentation dictionary to obtain text words. Calculating the average absolute deviation value of each text word, if the average absolute deviation of one text word is greater than a deviation threshold value, judging that the text word is wrong, and taking the text word as a word to be corrected; and if the average absolute deviation of one text word is less than or equal to the deviation threshold value, judging that the text word is correct.
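The text does not state which quantity the average absolute deviation is computed over. The sketch below therefore rests on an assumption: each text word is given a numeric score (for example, a language model log probability of the n-grams containing it), and a word is flagged when its absolute deviation from the mean score, divided by the mean absolute deviation of all scores, exceeds the deviation threshold. The function name `words_to_correct` and the sample scores are hypothetical.

```python
def words_to_correct(words, scores, deviation_threshold):
    """Flag words whose score deviates from the mean by more than the threshold.

    Assumption: the deviation of a word is |score - mean| / MAD, where MAD is
    the mean absolute deviation of all scores in the sentence.
    """
    mean = sum(scores) / len(scores)
    mad = sum(abs(s - mean) for s in scores) / len(scores)
    if mad == 0:
        # All words score identically, so none stands out.
        return []
    return [
        word
        for word, score in zip(words, scores)
        if abs(score - mean) / mad > deviation_threshold
    ]


text_words = ["our", "company", "account", "line-in"]
word_scores = [-1.0, -1.0, -1.0, -5.0]  # hypothetical per-word scores
suspects = words_to_correct(text_words, word_scores, deviation_threshold=1.5)
```

Note that this criterion flags outliers in both directions; a variant that flags only unusually low scores would be an equally plausible reading.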

In the error correction method for the voice recognition text, whether the text word has errors or not is judged according to the average absolute deviation value of the text word, so that the accuracy of error correction is improved.

In one embodiment, the determining, from the error correction database, a correction word corresponding to the word to be corrected includes: determining an error correction candidate word corresponding to the word to be corrected from the error correction database; determining the corrected word among the error corrected candidate words.

The error correction candidate words are a set of words that may be used to correct the word to be corrected. For example, if the speech recognition text is "how many lines can be drawn by our company account", and "line in" is the word to be corrected, then the error correction candidate words may be "advanced", "cash", and so on.

Specifically, the error correction candidate word corresponding to the word to be corrected may be determined from the error correction database by pinyin similarity. The pinyin similarity can be determined by the edit distance of the pinyin.

Specifically, the correction word may be determined among the error correction candidate words as follows: the word to be corrected in the speech recognition text is replaced by each error correction candidate word, the fluency of the replaced speech recognition text is calculated by using the preset language model, and the error correction candidate word whose fluency meets the preset condition is taken as the correction word.

In the method for correcting the speech recognition text, the speech recognition text can be corrected by combining the edit distance of the pinyin with the preset language model, so that the accuracy of selecting the correction word is further improved.

In one embodiment, the determining, from the error correction database, an error correction candidate word corresponding to the word to be corrected includes: obtaining the pinyin of the word to be corrected; acquiring the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database; and taking the candidate word with the similarity larger than a similarity threshold as the error correction candidate word.

Because the candidate word and the pinyin of the candidate word are stored in the error correction database, the error correction candidate word can be determined by comparing the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database.

Specifically, the similarity between the pinyin of the word to be corrected and the pinyin of a candidate word in the error correction database may be determined by calculating the edit distance between the two, where the edit distance is a measure of the similarity between two sequences. For pinyin, the edit distance is the minimum number of character edit operations required to convert one pinyin string into the other. The smaller the edit distance between the pinyin of the word to be corrected and the pinyin of the candidate word, the greater the similarity between the word to be corrected and the candidate word; therefore, the candidate word with a similarity greater than the similarity threshold is taken as the error correction candidate word.
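The idea of screening candidates by pinyin edit distance can be sketched with the standard Levenshtein distance (insertions, deletions and substitutions each cost one). The conversion from distance to similarity, one minus the distance divided by the longer length, is an assumption chosen for illustration and is not the formula of the embodiment; the names `edit_distance`, `pinyin_similarity` and `error_correction_candidates` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        prev = curr
    return prev[-1]


def pinyin_similarity(p1, p2):
    """Assumed normalization: 1 minus the distance divided by the longer length."""
    longest = max(len(p1), len(p2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(p1, p2) / longest


def error_correction_candidates(pinyin_to_correct, db, similarity_threshold):
    """Return the database words whose pinyin is similar enough to the given pinyin."""
    return [
        word
        for word, pinyin in db.items()
        if pinyin_similarity(pinyin_to_correct, pinyin) > similarity_threshold
    ]


# Toy database mapping words to pinyin.
toy_db = {"cash": "xianjin", "company": "gongsi"}
matches = error_correction_candidates("xianjing", toy_db, similarity_threshold=0.8)
```

A smaller edit distance gives a larger similarity, which matches the relationship stated above.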

In one embodiment, the edit distance of a pinyin is calculated as follows:

where $t_0$ and $t_i$ are, respectively, the word to be corrected and a candidate word in the error correction database, $\mathrm{len}(x)$ is the number of characters included in the word $x$, and $\mathrm{len}_p(x)$ is the number of characters contained in the pinyin of the word $x$.

In the error correction method for the speech recognition text, the error correction candidate words are determined by using the edit distance of the pinyin, so that the accuracy of selecting the error correction candidate words is improved.

In one embodiment, the determining the corrected word among the error correction candidate words includes: replacing words to be corrected in the voice recognition text by the error correction candidate words, and calculating fluency of the replaced voice recognition text by the preset language model; and taking the error correction candidate word with the fluency meeting the preset condition as the correction word.

The preset language model is used for calculating the probability that a sentence is fluent. The preset language model may be an N-gram language model, where N may be one, two, three, four, etc. An N-gram language model means that, for a position in a sentence, the probability that the sentence is fluent when each candidate word occupies that position is calculated according to the N-1 words preceding the position. The preset language model may also be a combination of at least two N-gram language models; for example, the preset language model may be a combination of a binary language model and a ternary language model.

Specifically, the word to be corrected in the speech recognition text is replaced by each error correction candidate word, the fluency of the replaced speech recognition text is calculated by using the preset language model, and the correction word is determined according to the fluency. The fluency of the replaced speech recognition text is calculated by inputting the replaced speech recognition text into the preset language model and taking the fluency that the model outputs.

The preset conditions are used for screening the correction words from the error correction candidate words and can be set according to practical application. In one embodiment, the error correction candidate word corresponding to the maximum fluency in fluency of the replaced speech recognition text output by the preset language model is used as the correction word.
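The replace-and-rescore selection can be sketched as follows, using the maximum-fluency embodiment as the preset condition. The fluency function is passed in as a parameter because its exact definition is left to the preset language model; `choose_correction` and `toy_fluency` are hypothetical names, and the toy scorer is a stand-in for a real model.

```python
def choose_correction(tokens, index, candidates, fluency):
    """Try each candidate at position `index` and keep the one with the highest fluency."""
    best_word, best_score = None, float("-inf")
    for candidate in candidates:
        replaced = tokens[:index] + [candidate] + tokens[index + 1:]
        score = fluency(replaced)
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word, best_score


def toy_fluency(tokens):
    """Stand-in for the preset language model: favors sentences containing "cash"."""
    return 0.0 if "cash" in tokens else -3.0


text_words = ["withdraw", "how much", "line-in"]
correction, score = choose_correction(text_words, 2, ["advanced", "cash"], toy_fluency)
```

Other preset conditions, such as requiring the fluency to exceed the fluency threshold as well, could be added as an extra check on `best_score`.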

In the error correction method for the voice recognition text, the correction words are determined through the preset language model, and the accuracy rate of selecting the correction words is improved.

In one embodiment, the preset language model is a binary language model and a ternary language model; the calculating the fluency of the replaced voice recognition text by using the preset language model comprises the following steps: respectively inputting the replaced voice recognition texts into the binary language model and the ternary language model to obtain fluency output by the binary language model and fluency output by the ternary language model; and taking the maximum value of the fluency output by the binary language model and the fluency output by the ternary language model as the fluency of the voice recognition text.

The binary language model and the ternary language model are obtained through corpus training in the first corpus and the second corpus.

Specifically, the replaced speech recognition text is input into a preset language model, the fluency output by the binary language model and the fluency output by the ternary language model are obtained, and the maximum value of the two fluency is used as the fluency of the speech recognition text.

In the error correction method for the voice recognition text, the correction words are determined through the binary language model and the ternary language model, and the accuracy rate of selecting the correction words is improved.

The method for correcting the speech recognition text in one embodiment is described in detail with reference to fig. 5:

step 502, acquiring fluency of a speech recognition text by using a preset language model;

step 504, if the fluency of the voice recognition text is smaller than a fluency threshold, performing word segmentation on the voice recognition text by using a word segmentation dictionary to obtain text words;

step 506, calculating the average absolute deviation value of each text word;

step 508, if the average absolute deviation value of the text word is greater than the deviation threshold, determining that the text word is a word to be corrected;

step 510, obtaining the pinyin of the word to be corrected;

step 512, obtaining the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database;

step 514, using the candidate word with the similarity larger than the similarity threshold as the error correction candidate word;

step 516, replacing the words to be corrected in the voice recognition text with the error correction candidate words, and calculating the fluency of the replaced voice recognition text by using a preset language model;

step 518, taking the error correction candidate word with the fluency meeting the preset condition as a correction word of the word to be corrected.
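The control flow of steps 502 to 518 can be sketched as below. This is only an outline under stated assumptions: every component (fluency scoring, word segmentation, deviation calculation, pinyin conversion, similarity) is passed in as a callable, the preset condition is taken to be maximum fluency, and the function name `correct_text`, the `joiner` parameter, and the sample data are hypothetical.

```python
from typing import Callable, Dict, List


def correct_text(text: str, *,
                 fluency: Callable[[str], float],
                 segment: Callable[[str], List[str]],
                 deviation: Callable[[List[str]], List[float]],
                 to_pinyin: Callable[[str], str],
                 similarity: Callable[[str, str], float],
                 database: Dict[str, str],   # candidate word -> its pinyin
                 fluency_threshold: float,
                 deviation_threshold: float,
                 similarity_threshold: float,
                 joiner: str = "") -> str:
    if fluency(text) >= fluency_threshold:        # step 502
        return text
    words = segment(text)                         # step 504
    deviations = deviation(words)                 # step 506
    for i, dev in enumerate(deviations):
        if dev <= deviation_threshold:            # step 508
            continue
        pinyin = to_pinyin(words[i])              # step 510
        candidates = [cand for cand, cand_pinyin in database.items()
                      if similarity(pinyin, cand_pinyin)
                      > similarity_threshold]     # steps 512 and 514
        if not candidates:
            continue

        def replaced_fluency(cand: str) -> float:  # step 516
            return fluency(joiner.join(words[:i] + [cand] + words[i + 1:]))

        # Step 518, assuming the preset condition is maximum fluency.
        words[i] = max(candidates, key=replaced_fluency)
    return joiner.join(words)


# Toy stand-ins for every component (hypothetical data).
PINYIN = {"打开": "dakai", "星孔": "xingkong"}
result = correct_text(
    "打开星孔",
    fluency=lambda s: 1.0 if "星空" in s else 0.2,
    segment=lambda s: ["打开", "星孔"],
    deviation=lambda ws: [0.1 if w == "打开" else 0.9 for w in ws],
    to_pinyin=lambda w: PINYIN[w],
    similarity=lambda a, b: 1.0 if a == b else 0.0,
    database={"星空": "xingkong", "报表": "baobiao"},
    fluency_threshold=0.5,
    deviation_threshold=0.5,
    similarity_threshold=0.8,
)
```

Passing the components as callables keeps the outline independent of any particular language model, segmenter, or pinyin converter.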

It should be understood that although the steps in the flowcharts of fig. 2 and fig. 5 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not limited to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 5 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 6, there is provided an apparatus 600 for correcting a speech recognition text, including: an obtaining module 602 and a determining module 604, wherein:

an obtaining module 602, configured to obtain fluency of a speech recognition text by using a preset language model, where the preset language model is obtained by using corpus training of a first corpus and a second corpus, the first corpus includes corpora of a general scene, and the second corpus includes corpora of a preset scene;

the obtaining module 602 is further configured to obtain a word to be corrected in the speech recognition text if the fluency of the speech recognition text is less than a fluency threshold;

a determining module 604, configured to determine a correction word corresponding to the word to be corrected from an error correction database, and obtain a corrected speech recognition text according to the correction word.

The error correction device 600 for speech recognition text obtains the fluency of the speech recognition text by using the preset language model. If the fluency of the speech recognition text is smaller than the fluency threshold, the device obtains the words to be corrected in the speech recognition text, determines the correction words corresponding to the words to be corrected from the error correction database, and obtains the corrected speech recognition text according to the correction words. Wrong words in the speech recognition text are thereby detected and corrected, which improves the accuracy of the speech recognition text. Because the preset language model is trained with the second corpus and the error correction database is constructed, the user concept representations are distinguished and enhanced, which improves the accuracy of user intention recognition.

In an embodiment, as shown in fig. 7, the apparatus 600 for correcting a speech recognition text further includes a word segmentation module 606 and a construction module 608, where the obtaining module 602 is further configured to obtain the corpus of the second corpus; the word segmentation module 606 is configured to perform word segmentation on the corpus of the second corpus by using a word segmentation dictionary to obtain candidate words; and the construction module 608 is configured to construct the error correction database according to the candidate words and the pinyin of the candidate words.

In an embodiment, the apparatus 600 for correcting a speech recognition text further includes an adding module, and the obtaining module 602 is further configured to obtain a confusion word corresponding to the candidate word; the adding module is configured to add the confusion word into the word segmentation dictionary.
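The construction of the error correction database from the second corpus might look like the following sketch. The details are assumptions for illustration only: word segmentation uses forward maximum matching against the word segmentation dictionary, the pinyin comes from a small per-character lookup table, and the database maps a pinyin string to the set of candidate words sharing it. The names `segment`, `build_database`, and `CHAR_PINYIN` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def segment(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word each time."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def build_database(corpus: Iterable[str], dictionary: Set[str],
                   char_pinyin: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each pinyin string to the candidate words that share it."""
    database: Dict[str, Set[str]] = defaultdict(set)
    for sentence in corpus:
        for word in segment(sentence, dictionary):
            # Skip words that contain characters missing from the toy table.
            if all(ch in char_pinyin for ch in word):
                key = " ".join(char_pinyin[ch] for ch in word)
                database[key].add(word)
    return dict(database)


# Hypothetical preset-scene corpus, dictionary, and pinyin table.
DICTIONARY = {"打开", "星空", "报表"}
CHAR_PINYIN = {"打": "da", "开": "kai", "星": "xing",
               "空": "kong", "报": "bao", "表": "biao"}
db = build_database(["打开星空", "打开报表"], DICTIONARY, CHAR_PINYIN)
```

Keying the database by pinyin makes it cheap to look up candidate words that sound like a word to be corrected.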

In an embodiment, the obtaining module 602 is further configured to perform word segmentation on the voice recognition text by using the word segmentation dictionary to obtain text words; calculating the average absolute deviation value of each text word; and if the average absolute deviation value of the text word is greater than the deviation threshold value, determining that the text word is the word to be corrected.
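The deviation test could be sketched as follows. This passage does not spell out how the value is computed, so the sketch makes explicit assumptions: each text word already has a language-model score, and a word's deviation value is its absolute distance from the mean score divided by the mean absolute deviation of all scores. The function names and the sample scores are hypothetical.

```python
from typing import List, Sequence


def deviation_values(scores: Sequence[float]) -> List[float]:
    """Absolute deviation of each score from the mean, scaled by the
    mean absolute deviation (assumed definition, see the text above)."""
    mean = sum(scores) / len(scores)
    mad = sum(abs(s - mean) for s in scores) / len(scores)
    if mad == 0:
        # All scores are identical, so no word stands out.
        return [0.0] * len(scores)
    return [abs(s - mean) / mad for s in scores]


def words_to_correct(words: Sequence[str], scores: Sequence[float],
                     deviation_threshold: float) -> List[str]:
    """Keep the words whose deviation value exceeds the threshold."""
    deviations = deviation_values(scores)
    return [word for word, dev in zip(words, deviations)
            if dev > deviation_threshold]


# Hypothetical per-word log-probability scores from a language model.
words = ["打开", "星孔", "报表"]
scores = [-1.0, -4.0, -1.0]
flagged = words_to_correct(words, scores, deviation_threshold=1.0)
```

Scaling by the mean absolute deviation keeps the threshold independent of the absolute magnitude of the scores; an unscaled deviation would work equally well with a threshold chosen for the model in use.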

In an embodiment, the determining module 604 is further configured to determine, from the error correction database, an error correction candidate word corresponding to the word to be corrected; determining the corrected word among the error corrected candidate words.

In an embodiment, the determining module 604 is further configured to obtain a pinyin of the word to be corrected; acquiring the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database; and taking the candidate word with the similarity larger than a similarity threshold as the error correction candidate word.
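The candidate filtering can be sketched with an edit-distance-based similarity. The Levenshtein distance itself is standard; converting it into a similarity in the range 0 to 1 by dividing by the longer pinyin length is an assumption of this sketch, as are the function names and the sample database.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def pinyin_similarity(a: str, b: str) -> float:
    """1.0 for identical pinyin, approaching 0.0 as the distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def error_correction_candidates(pinyin: str, database: Dict[str, str],
                                similarity_threshold: float) -> List[str]:
    """Candidate words whose pinyin is similar enough to the given pinyin."""
    return [word for word, cand_pinyin in database.items()
            if pinyin_similarity(pinyin, cand_pinyin) > similarity_threshold]


# Hypothetical error correction database: candidate word -> pinyin.
DATABASE = {"星空": "xingkong", "新款": "xinkuan", "报表": "baobiao"}
candidates = error_correction_candidates("xinkong", DATABASE, 0.8)
```

In this example only the candidate one edit away from the query pinyin passes the 0.8 threshold.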

In an embodiment, the determining module 604 is further configured to obtain an edit distance between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database, and to represent the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database by using the edit distance.

In an embodiment, the determining module 604 is further configured to replace the word to be corrected in the speech recognition text with the error correction candidate word, and calculate the fluency of the replaced speech recognition text by using the preset language model; and to take the error correction candidate word with the fluency meeting the preset condition as the correction word.

In an embodiment, the determining module 604 is further configured to input the replaced speech recognition text into the binary language model and the ternary language model respectively, so as to obtain the fluency output by the binary language model and the fluency output by the ternary language model; and to take the maximum of the fluency output by the binary language model and the fluency output by the ternary language model as the fluency of the replaced speech recognition text.

For the specific definition of the error correction device for the speech recognition text, reference may be made to the above definition of the error correction method for the speech recognition text, and details are not repeated here. The modules in the device for correcting the speech recognition text may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server or a terminal, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing error correction data for speech recognition text. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an error correction method for speech recognition text.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

obtaining the corpus of the second corpus;

obtaining confusion words corresponding to the candidate words;

and adding the confusion word into the word segmentation dictionary.

calculating the average absolute deviation value of each text word;

determining the corrected word among the error corrected candidate words.

obtaining the pinyin of the word to be corrected;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

obtaining the corpus of the second corpus;

obtaining confusion words corresponding to the candidate words;

and adding the confusion word into the word segmentation dictionary.

calculating the average absolute deviation value of each text word;

determining the corrected word among the error corrected candidate words.

obtaining the pinyin of the word to be corrected;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present specification.

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of error correction for speech recognized text, the method comprising:

2. The method of claim 1, wherein the error correction database is constructed in a manner comprising:

obtaining the corpus of the second corpus;

3. The method of claim 2, further comprising:

obtaining confusion words corresponding to the candidate words;

and adding the confusion word into the word segmentation dictionary.

4. The method according to claim 3, wherein the obtaining of the word to be corrected in the speech recognition text comprises:

calculating the average absolute deviation value of each text word;

5. The method according to claim 2, wherein the determining, from the error correction database, a corrected word corresponding to the word to be corrected comprises:

determining the corrected word among the error corrected candidate words.

6. The method according to claim 5, wherein the determining, from the error correction database, the error correction candidate word corresponding to the word to be corrected comprises:

obtaining the pinyin of the word to be corrected;

7. The method according to claim 6, wherein the obtaining the similarity between the pinyin of the word to be corrected and the pinyin of the candidate word in the error correction database comprises:

8. The method of claim 5, wherein determining the corrected word among the error corrected candidate words comprises:

9. The method according to claim 8, wherein the preset language model is a binary language model and a ternary language model;

10. An apparatus for correcting a speech-recognized text, the apparatus comprising:

11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.