
CN111243593A - Speech recognition error correction method, mobile terminal and computer-readable storage medium - Google Patents

Speech recognition error correction method, mobile terminal and computer-readable storage medium

Info

Publication number
CN111243593A
CN111243593A (application CN201811333544.4A)
Authority
CN
China
Prior art keywords
recognition result
error correction
voice
user
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811333544.4A
Other languages
Chinese (zh)
Inventor
林柏青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qiku Internet Technology Shenzhen Co Ltd
Original Assignee
Qiku Internet Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qiku Internet Technology Shenzhen Co Ltd filed Critical Qiku Internet Technology Shenzhen Co Ltd
Priority to CN201811333544.4A priority Critical patent/CN111243593A/en
Publication of CN111243593A publication Critical patent/CN111243593A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a speech recognition error correction method, a mobile terminal and a computer-readable storage medium, wherein the method comprises the following steps: recognizing a first voice input by a user and generating a first voice recognition result corresponding to the first voice; displaying the first voice recognition result to the user so that the user can confirm it; receiving a second voice input by the user and generating a second voice recognition result corresponding to the second voice; judging whether the second voice recognition result carries an error correction intention of the user; and if so, correcting the first voice recognition result according to the second voice recognition result to obtain an error-corrected first voice recognition result. The method and the device can intelligently correct the voice recognition result according to a voice carrying an error correction intention input by the user, thereby achieving accurate error correction of the voice recognition result and ensuring the accuracy of the corrected result.

Description

Speech recognition error correction method, mobile terminal and computer-readable storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a speech recognition error correction method, a mobile terminal, and a computer-readable storage medium.
Background
With the increasing maturity of artificial intelligence, more and more intelligent devices have entered users' lives, and human-machine interaction has become common. Voice interaction, the most frequently used form of such interaction, is generally realized on the basis of speech recognition technology. Speech recognition technology recognizes a voice signal input by a user and converts it into text, i.e., the recognition result is a character string, which facilitates natural, humanized human-machine interaction. Taking a mobile terminal that adopts speech recognition as an example: with the support of this technology, as long as the user speaks to the mobile terminal, the speech recognition system recognizes the speech and automatically forms text, greatly improving the user's input efficiency.
However, in existing application environments, speech recognition technology still cannot achieve a one-hundred-percent recognition rate, and errors are likely to occur in the recognition result obtained from the voice information input by the user. When the recognition result is wrong, the user can only modify and edit it manually, which is unintelligent; in particular, on a touch-screen device with a small screen, the limited screen size makes character input highly inconvenient, resulting in a poor user experience.
Disclosure of Invention
The main purpose of the application is to provide a speech recognition error correction method, a mobile terminal and a computer-readable storage medium, so as to solve the technical problem that, when a speech recognition result is wrong, the user can only manually modify and edit it, which is unintelligent and leads to a poor user experience.
The application provides a voice recognition error correction method, which is applied to a mobile terminal and comprises the following steps:
recognizing a first voice input by a user, and generating a first voice recognition result corresponding to the first voice;
displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result;
receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice;
judging whether the second voice recognition result carries the error correction intention of the user or not;
if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction.
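By way of illustration only, the claimed flow of steps can be sketched as follows (a minimal Python sketch, not part of the application; `recognize`, `has_correction_intent` and `apply_correction` are hypothetical stand-ins for the recognition engine and the intent-judging and correction logic described in the embodiments below):

```python
def correction_session(recognize, has_correction_intent, apply_correction,
                       first_voice, second_voice):
    """Hypothetical sketch of the claimed method."""
    # Recognize the first voice and produce a first recognition result
    first_result = recognize(first_voice)
    # Display the first result so that the user can confirm it
    print(first_result)
    # Recognize the second voice input by the user
    second_result = recognize(second_voice)
    # Judge whether the second result carries an error correction intention
    if has_correction_intent(second_result):
        # Correct the first result according to the second result
        first_result = apply_correction(first_result, second_result)
    return first_result
```

The three callbacks can be swapped for any concrete implementation, e.g. the keyword-matching or neural-network variants the application goes on to describe.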
Preferably, the step of recognizing the first voice input by the user and generating the first voice recognition result corresponding to the first voice is preceded by the steps of:
receiving original voice input by the user;
judging whether the original voice meets a preset condition or not;
if the original voice does not meet the preset condition, sending the user a reminder to repeat the original voice;
receiving a repeated voice input by the user by repeating the original voice, wherein the repeated voice serves as the first voice.
Preferably, the step of judging whether the second speech recognition result carries the error correction intention of the user includes:
extracting all words in the second voice recognition result;
matching each word with all keywords in a preset database one by one;
if the matching is successful, judging that the second voice recognition result carries the error correction intention of the user;
and if the matching is unsuccessful, judging that the second voice recognition result does not carry the error correction intention of the user.
Preferably, the step of judging whether the second speech recognition result carries the error correction intention of the user includes:
inputting the second voice recognition result into a preset first neural network model so as to obtain a classification result after the second voice recognition result is subjected to intention classification through the first neural network model, wherein the classification result comprises an intention of error correction or no intention of error correction;
receiving the classification result returned by the first neural network model, and judging whether the classification result has an error correction intention;
if the classification result is that the user has the error correction intention, judging that the second voice recognition result carries the error correction intention of the user;
and if the classification result does not have the error correction intention, judging that the second voice recognition result does not carry the error correction intention of the user.
Preferably, the step of performing error correction on the first speech recognition result according to the second speech recognition result to obtain an error-corrected first speech recognition result includes:
inputting the second voice recognition result into a preset second neural network model so that the second neural network model can analyze and process the second voice recognition result and then output error correction information corresponding to the second voice recognition result;
receiving the error correction information returned by the second neural network model;
and correcting the error of the first voice recognition result according to the error correction information to obtain the corrected first voice recognition result.
Preferably, the step of performing error correction on the first speech recognition result according to the error correction information to obtain the error-corrected first speech recognition result includes:
matching the error correction information with a plurality of preset error correction templates one by one, and screening out a first error correction template matched with the error correction information;
respectively extracting error words, error correction words and error correction type words of the second voice recognition result according to the format of the first error correction template;
acquiring the position of the wrong word in the first voice recognition result;
and performing corresponding error correction at the position according to the error correction words and the error correction type words to obtain the first voice recognition result after error correction.
Preferably, after the step of performing error correction on the first speech recognition result according to the second speech recognition result to obtain an error-corrected first speech recognition result, the method includes:
displaying the first voice recognition result after error correction to the user so that the user can confirm the first voice recognition result after error correction;
judging whether the confirmation information of the user is received or not;
if so, performing intention analysis on the first voice recognition result after error correction, and extracting a user intention corresponding to the first voice recognition result after error correction;
and executing the operation corresponding to the user intention.
Preferably, after the step of performing the operation corresponding to the user intention, the method includes:
judging whether a third voice is detected within a first preset time period;
and entering a standby state if the third voice is not detected within the first preset time period.
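A minimal sketch of the timeout step above, assuming a hypothetical polling callback `detect_voice` for the third voice (not part of the application):

```python
import time

def wait_or_standby(detect_voice, timeout_s, poll_s=0.1):
    """If no third voice is detected within the first preset time period
    (timeout_s), enter the standby state. `detect_voice` is a hypothetical
    callback returning True when speech is detected."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if detect_voice():
            return "active"    # third voice detected in time
        time.sleep(poll_s)
    return "standby"           # preset time period elapsed without speech
```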
The application also provides a mobile terminal comprising a memory and a processor, wherein a computer program is stored in the memory and the processor, when executing the computer program, implements the steps of the above method.
The present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above-mentioned method are implemented.
The voice recognition error correction method, the mobile terminal and the computer readable storage medium provided by the application have the following beneficial effects:
the voice recognition error correction method, the mobile terminal and the computer readable storage medium provided by the application recognize a first voice input by a user and generate a first voice recognition result corresponding to the first voice; displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result; receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice; judging whether the second voice recognition result carries the error correction intention of the user or not; if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction. According to the method and the device, the voice recognition result can be intelligently corrected according to the voice with the error correction intention input by the user, so that accurate error correction of the voice recognition result is realized, the accuracy of the voice recognition result after error correction is ensured, in addition, the user only needs to send the voice with the error correction intention, manual modification is not needed, and the use experience of the user is improved.
Drawings
FIG. 1 is a schematic flow chart of a speech recognition error correction method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a speech recognition error correction method according to another embodiment of the present application;
FIG. 3 is a schematic flow chart of a speech recognition error correction method according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that all directional indicators (such as upper, lower, left, right, front and rear, etc.) in the embodiments of the present application are only used to explain the relative positional relationship, movement, and so on between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly. In addition, a connection may be a direct connection or an indirect connection.
Referring to fig. 1, a speech recognition error correction method according to an embodiment of the present application includes:
S1: recognizing a first voice input by a user, and generating a first voice recognition result corresponding to the first voice;
S2: displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result;
S3: receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice;
S4: judging whether the second voice recognition result carries the error correction intention of the user or not;
S5: if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction.
In this embodiment, the speech recognition error correction method provided by the present application is applied to a device having speech recognition and speech input/output functions, without being limited thereto. Generally, the device realizes speech input and output through a human-machine voice interaction interface: the specific speech input interface may be equipment such as a microphone, and the speech output interface may be equipment such as a loudspeaker. Specifically, the device having the speech recognition and speech input/output functions may be a voice assistant in a mobile terminal, and the following embodiments are described with the voice assistant as the execution subject of the speech recognition error correction method provided by the present application. Because speech recognition technology still cannot achieve one-hundred-percent correct recognition, when a first voice input by a user is received, the first voice recognition result obtained by the voice assistant from recognizing the first voice is likely to contain an error; since an existing voice assistant has no function for correcting the first voice recognition result, when that result is wrong the user has to manually input the correct content to modify it, giving a poor user experience. In this embodiment, after the first voice recognition result is displayed to the user, if the user finds that it is incorrect, the user may interact with the voice assistant, so that the voice assistant corrects the first voice recognition result according to the content of that interaction.
Specifically, after the first voice recognition result is displayed to the user, if a second voice sent by the user is received, the second voice may be an error correction voice requiring correction of the content of the first voice recognition result. The voice assistant first recognizes the second voice to obtain a corresponding second voice recognition result, and then determines whether the second voice recognition result carries an error correction intention of the user. If it does, the user intends to correct the first voice recognition result, and the voice assistant may modify the first voice recognition result according to the content of the second voice recognition result to obtain a modified first voice recognition result. The step of determining whether the second voice recognition result carries the error correction intention of the user may specifically be: judging whether the second voice recognition result contains a word matching any keyword in a preset database, and if so, judging that it carries the error correction intention of the user; or inputting the second voice recognition result into a preset first neural network model, whose output indicates either the presence or the absence of an error correction intention, and judging that the second voice recognition result carries the error correction intention of the user if the model's output indicates that the intention is present.
In this embodiment, the first voice recognition result is intelligently corrected according to the second voice carrying the user's error correction intention, thereby realizing accurate error correction of the voice recognition result and ensuring the accuracy of the corrected result. In addition, the user only needs to utter a voice carrying the error correction intention, without any manual modification, which improves the user experience.
Further, in an embodiment of the present application, before the step S1, the method includes:
S100: receiving original voice input by the user;
S101: judging whether the original voice meets a preset condition or not;
S102: if the original voice does not meet the preset condition, sending the user a reminder to repeat the original voice;
S103: receiving a repeated voice input by the user by repeating the original voice, wherein the repeated voice serves as the first voice.
In this embodiment, when the user needs to interact with the voice assistant by voice, the user may input voice information through the human-machine voice interaction interface; the voice assistant then recognizes the voice information to obtain a voice recognition result and performs a corresponding operation according to that result. In practical applications, however, various factors such as network quality, environmental noise, and the rate or volume of the user's speech can interfere: for example, particularly loud background noise causes strong interference, or the volume of the voice input by the user is too low, so that the voice information is distorted and the voice assistant fails to recognize it. In this embodiment, to avoid outputting a failed recognition result to the user and thereby degrading the experience, when the voice input by the user cannot be recognized, the voice assistant intelligently reminds the user to re-input the voice, then recognizes it and returns the corresponding recognition result, ensuring a good user experience. Specifically, a preset condition corresponding to the voice information is first set. When an original voice input by the user is received, the voice assistant evaluates it: if the original voice does not satisfy the preset condition, it is judged to be distorted, the voice assistant cannot recognize it, and recognition of the original voice fails; if the original voice satisfies the preset condition, it is judged to be undistorted, and the voice assistant can complete recognition of the original voice.
Satisfying the preset condition means that the parameters of the original voice satisfy corresponding parameter conditions; the categories and number of these parameter conditions are not limited. Specifically, the parameter conditions may include a frequency condition, a speech rate condition, and a noise ratio condition: the frequency condition may be limited to a frequency variation range, the speech rate condition to a speech rate range or speech rate variation range, and the noise ratio condition to a noise ratio range.
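A minimal sketch of such a parameter check (the numeric ranges are illustrative placeholders only; the application does not specify concrete thresholds):

```python
def meets_preset_condition(freq_hz, speech_rate_wpm, noise_ratio,
                           freq_range=(85.0, 255.0),
                           rate_range=(80.0, 260.0),
                           max_noise_ratio=0.3):
    """Check whether an original voice satisfies the parameter conditions:
    a frequency condition, a speech rate condition and a noise ratio
    condition. All default ranges are hypothetical example values."""
    freq_ok = freq_range[0] <= freq_hz <= freq_range[1]
    rate_ok = rate_range[0] <= speech_rate_wpm <= rate_range[1]
    noise_ok = noise_ratio <= max_noise_ratio
    return freq_ok and rate_ok and noise_ok
```

When this check fails, the flow above would prompt the user to repeat the original voice instead of attempting recognition.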
Further, in an embodiment of the present application, the step S4 includes:
S400: extracting all words in the second voice recognition result;
S401: matching each word with all keywords in a preset database one by one;
S402: if the matching is successful, judging that the second voice recognition result carries the error correction intention of the user;
S403: and if the matching is unsuccessful, judging that the second voice recognition result does not carry the error correction intention of the user.
In this embodiment, the above-mentioned keywords are special words corresponding to the error correction intention, such as "delete", "replace", "change", "add" and "insert". The keywords may be set automatically by the voice assistant after analyzing the user's historical habit data, or may be input to the voice assistant by the user according to the user's actual usage habits. After the keywords are obtained, a database is set up and all the keywords are stored in it. The database may reside in the mobile terminal corresponding to the voice assistant or in the cloud; the database and the voice assistant are communicatively connected, so the voice assistant can retrieve all keywords in the database at any time. In addition, the keywords in the database may be updated, for example by adding new keywords. When a second voice sent by the user is received, the user may have found that the first voice recognition result displayed by the voice assistant is wrong and may need to correct it; therefore, after the second voice is recognized to obtain the second voice recognition result, it is analyzed whether the second voice recognition result carries the user's error correction intention. Specifically: first, all words in the second voice recognition result are extracted through specified programs or algorithms, or through word segmentation processing; then each word is matched one by one against all keywords stored in the preset database, the keywords having a corresponding relationship with the error correction intention. If at least one of the extracted words matches a keyword, it is judged that the second voice recognition result carries the error correction intention of the user; if none of the extracted words matches a keyword, it is judged that the second voice recognition result does not carry the error correction intention of the user.
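The keyword-matching judgment can be sketched as follows (plain whitespace splitting stands in for the word-segmentation step, and the keyword set is merely an example mirroring the keywords listed above):

```python
# Hypothetical keyword database corresponding to the error correction intention.
CORRECTION_KEYWORDS = {"delete", "replace", "change", "add", "insert"}

def carries_correction_intent(second_result, keywords=CORRECTION_KEYWORDS):
    """Extract the words of the second recognition result and match each
    one against the keyword database; a single successful match means the
    result carries the user's error correction intention."""
    words = second_result.lower().split()
    return any(word in keywords for word in words)
```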
Further, in an embodiment of the present application, the step S4 includes:
S410: inputting the second voice recognition result into a preset first neural network model so as to obtain a classification result after the second voice recognition result is subjected to intention classification through the first neural network model, wherein the classification result comprises an intention of error correction or no intention of error correction;
S411: receiving the classification result returned by the first neural network model, and judging whether the classification result has an error correction intention;
S412: if the classification result is that the user has the error correction intention, judging that the second voice recognition result carries the error correction intention of the user;
S413: and if the classification result does not have the error correction intention, judging that the second voice recognition result does not carry the error correction intention of the user.
In this embodiment, since the second voice recognition result corresponding to the second voice input by the user does not necessarily contain a keyword from the preset database, in order to avoid the situation where the user's intention to correct the first voice recognition result goes undetected, a preset first neural network model may be used to accurately determine whether the second voice recognition result carries the error correction intention. The specific type of the first neural network model is not limited: it may be any one of a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, or a deep neural network (DNN) model, or a combination of several of them, without being limited to the above. Specifically, after the second voice recognition result is input into the first neural network model, the model obtains the corresponding classification result by combining word-vector features and statistical features: the first neural network model extracts the word-vector features of each word in the second voice recognition result using an embedding layer and a long short-term memory (LSTM) network, applies feature engineering to extract the statistical features of each word, and finally performs a comprehensive analysis of the word-vector features and statistical features to obtain the classification result. Whether the second voice recognition result carries the user's error correction intention is then known from the specific content of the classification result.
The statistical features can be selected according to actual needs; for example, they can include pinyin features, pronunciation distance features, rule features, and the like. The classification result either contains the error correction intention or does not contain it.
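A minimal sketch of the "word-vector feature plus statistical feature" fusion described above, with the embedding layer and LSTM replaced by a deterministic hashing stub (everything here is illustrative; the application does not define these functions):

```python
import hashlib

def word_vector_feature(word, dim=8):
    """Stub for an embedding lookup: a deterministic pseudo-vector."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def statistical_features(words):
    """Illustrative statistical features: sentence length and the share
    of words that look like correction commands."""
    commands = {"delete", "replace", "change", "add", "insert"}
    hits = sum(1 for w in words if w in commands)
    return [len(words), hits / max(len(words), 1)]

def build_feature_vector(sentence, dim=8):
    """Combine averaged word-vector features with statistical features,
    mirroring the feature fusion the first model is described as using.
    The combined vector would then be fed to a classifier."""
    words = sentence.lower().split()
    avg = [0.0] * dim
    for w in words:
        vec = word_vector_feature(w, dim)
        avg = [a + v / len(words) for a, v in zip(avg, vec)]
    return avg + statistical_features(words)
```

In a real implementation the hashing stub would be a trained embedding plus LSTM encoder, and the statistical features could include the pinyin and pronunciation-distance features mentioned above.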
Referring to fig. 2, further, in an embodiment of the application, the step S5 includes:
S500: inputting the second voice recognition result into a preset second neural network model so that the second neural network model can analyze and process the second voice recognition result and then output error correction information corresponding to the second voice recognition result;
S501: receiving the error correction information returned by the second neural network model;
S502: and correcting the error of the first voice recognition result according to the error correction information to obtain the corrected first voice recognition result.
In this embodiment, after it is determined that the second voice recognition result carries the error correction intention of the user, the second voice recognition result cannot be used directly to correct the first voice recognition result: besides error correction information, it also contains other noise information, so it is necessary to extract the useful error correction information and remove the irrelevant information before the subsequent error correction processing can be completed. To this end, a second neural network model is preset; the second voice recognition result is input into this model, which analyzes and processes it and outputs the error correction information it contains. The voice assistant can then correct the first voice recognition result according to the error correction information returned by the second neural network model, obtaining the corrected first voice recognition result. Further, before the second voice recognition result is input into the second neural network model, the second neural network model must be created, specifically: collecting a specified amount of error-correction text data, labeling the error correction information in that data, and inputting the data as training samples into a specified neural network model for training until the model converges, i.e., until the error between the model's output for a given training sample and the error correction information labeled on that sample is smaller than a preset threshold, for example smaller than 2%; the converged model is finally determined as the second neural network model.
The second neural network model can therefore analyze any error correction text it receives in practice and accurately output the error correction information that text contains.
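The annotate-and-train-until-convergence procedure above can be sketched in code. The patent does not specify the second model's architecture, so the following minimal pure-Python per-token perceptron stands in for it; the training samples, token labels, and the 2% threshold are illustrative assumptions, not data from the patent:

```python
import random

random.seed(0)

# Toy annotated samples: (tokens, labels); label 1 marks tokens that belong
# to the error correction information, label 0 marks irrelevant noise.
SAMPLES = [
    (["please", "change", "name", "to", "ming"], [0, 1, 1, 1, 1]),
    (["um", "delete", "the"], [0, 1, 1]),
    (["well", "add", "ming", "after", "xiao"], [0, 1, 1, 1, 1]),
]

VOCAB = sorted({t for toks, _ in SAMPLES for t in toks})
W = {t: random.uniform(-0.1, 0.1) for t in VOCAB}  # one weight per token

def predict(token):
    # Unseen tokens default to weight 0.0 and are treated as noise.
    return 1 if W.get(token, 0.0) > 0 else 0

def sample_error(tokens, labels):
    wrong = sum(predict(t) != y for t, y in zip(tokens, labels))
    return wrong / len(tokens)

# Train until every sample's error is below the preset threshold (e.g. 2%),
# mirroring the convergence criterion described in the text.
THRESHOLD = 0.02
for epoch in range(1000):
    for tokens, labels in SAMPLES:
        for t, y in zip(tokens, labels):
            W[t] += 0.1 * (y - predict(t))  # perceptron update
    if max(sample_error(toks, ys) for toks, ys in SAMPLES) < THRESHOLD:
        break  # converged: accept this as the "second model"

def extract_correction_info(tokens):
    """Keep only the tokens tagged as error correction information."""
    return [t for t in tokens if predict(t) == 1]
```

A production system would use an actual neural sequence model (for example a BiLSTM or transformer tagger); the perceptron here only illustrates the annotate-train-until-convergence loop.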
Referring to fig. 3, further, in an embodiment of the application, the step S502 includes:
S5020: matching the error correction information one by one against a plurality of preset error correction templates, and screening out a first error correction template that matches the error correction information;
S5021: respectively extracting the error word, the error correction word and the error correction type word of the second voice recognition result according to the format of the first error correction template;
S5022: acquiring the position of the error word in the first voice recognition result;
S5023: performing the corresponding error correction at that position according to the error correction word and the error correction type word to obtain the error-corrected first voice recognition result.
In this embodiment, a plurality of error correction templates for speech recognition error correction are preset. The specific number of templates is set according to actual requirements, and the templates may be generated automatically by the voice assistant from the user's historical usage habits, or set by the user as needed. For example, the templates may include a replacement template: "replace/change A to B"; a deletion template: "delete A"; and an addition template: "add B before/after A". After the error correction information is extracted from the second speech recognition result, the first speech recognition result is corrected according to that information, yielding the error-corrected first speech recognition result. The step of correcting the first speech recognition result according to the error correction information specifically proceeds as follows. First, the error correction information is matched against all preset error correction templates to screen out a first error correction template that matches it. Then, the error word, the error correction word and the error correction type word are extracted from the second speech recognition result according to the format of the first template; for example, if the first template is the replacement template "replace/change A to B", then A is the error word, B is the error correction word, and "replace/change" is the error correction type word. Next, the position of the error word in the first speech recognition result is obtained, and finally the corresponding correction is performed at that position according to the extracted error correction word and error correction type word, giving the error-corrected first speech recognition result.
For example, if the first speech recognition result is "help me make a phone call to 小名" (the homophone 名 having been recognized in place of 明 in the name 小明, Xiao Ming), and the second speech recognition result is "change 名 to 明", then the matched template is the replacement template, the error word is "名", the error correction word is "明", and the error correction type word is "change"; the error-corrected first speech recognition result is "help me make a phone call to 小明".
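Steps S5020–S5023 can be sketched as follows, assuming the three template shapes named in the text ("replace/change A to B", "delete A", "add B before/after A"); the regular expressions and the whitespace-delimited English tokens are illustrative assumptions:

```python
import re

# One (type word, pattern) pair per preset error correction template.
TEMPLATES = [
    ("replace", re.compile(r"(?:replace|change)\s+(?P<err>\S+)\s+(?:to|with)\s+(?P<corr>\S+)")),
    ("delete",  re.compile(r"delete\s+(?P<err>\S+)")),
    ("add",     re.compile(r"add\s+(?P<corr>\S+)\s+(?P<where>before|after)\s+(?P<err>\S+)")),
]

def match_template(correction_info):
    """S5020: screen out the first template that matches the correction info."""
    for kind, pattern in TEMPLATES:
        m = pattern.search(correction_info)
        if m:
            return kind, m
    return None, None

def correct(first_result, correction_info):
    kind, m = match_template(correction_info)
    if m is None:
        return first_result                    # no template matched
    err = m.group("err")                       # S5021: error word
    pos = first_result.find(err)               # S5022: locate the error word
    if pos < 0:
        return first_result
    if kind == "replace":                      # S5023: apply by type word
        return first_result[:pos] + m.group("corr") + first_result[pos + len(err):]
    if kind == "delete":
        return (first_result[:pos] + first_result[pos + len(err):]).replace("  ", " ")
    corr = m.group("corr")
    if m.group("where") == "before":
        return first_result[:pos] + corr + " " + first_result[pos:]
    return first_result[:pos + len(err)] + " " + corr + first_result[pos + len(err):]

print(correct("help me call xiao min", "change min to ming"))  # → help me call xiao ming
```

The first template whose pattern matches wins, so template order acts as a simple priority scheme; a real system would likely score all candidate templates instead.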
Further, in an embodiment of the present application, after the step S5, the method includes:
S6: displaying the first voice recognition result after error correction to the user so that the user can confirm the first voice recognition result after error correction;
S7: judging whether the confirmation information of the user is received or not;
S8: if so, performing intention analysis on the first voice recognition result after error correction, and extracting a user intention corresponding to the first voice recognition result after error correction;
S9: executing the operation corresponding to the user intention.
In this embodiment, after the second speech recognition result corresponding to the second speech input by the user is obtained, the error correction information extracted from it is used to correct the first speech recognition result, generating the error-corrected first speech recognition result. To guard against errors remaining in this newly generated result, the voice assistant presents the error-corrected first speech recognition result to the user, that is, displays it on the screen of the mobile terminal, so that the user can confirm its accuracy. If the user returns confirmation information, the error-corrected first speech recognition result is correct; intent analysis is then performed on it to obtain the corresponding user intent, and the corresponding operation is executed according to that intent. For example, if the error-corrected first speech recognition result is "help me make a telephone call to Xiao Ming", the voice assistant, after analyzing the user's intent, looks up Xiao Ming's telephone number in the contacts and dials it, completing the operation the user wanted performed. If the user does not return confirmation information, the error-corrected first speech recognition result may still contain an error; in that case it must be corrected again through further interaction with the user, until the user confirms that the corrected result is correct.
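The confirm-then-execute flow (S6–S9) can be sketched as below; `parse_intent`, the toy contact list, and the returned status strings are illustrative stand-ins, since the patent does not specify how intent analysis or dialing is implemented:

```python
CONTACTS = {"xiao ming": "138-0000-0000"}  # assumed toy contact list

def parse_intent(text):
    """Stand-in intent analysis: detect a 'call <name>' intent."""
    if "call" in text:
        name = text.split("call", 1)[1].strip()
        return {"intent": "call", "contact": name}
    return {"intent": "unknown"}

def execute(intent):
    """S9: perform the operation corresponding to the user intent."""
    if intent["intent"] == "call":
        number = CONTACTS.get(intent["contact"])
        if number:
            return f"dialing {number}"
    return "no action"

def confirm_and_execute(corrected_result, user_confirmed):
    # S6/S7: the corrected result is displayed and confirmation awaited.
    if not user_confirmed:
        # No confirmation: correct again via further interaction.
        return "re-enter correction loop"
    intent = parse_intent(corrected_result)    # S8: intent analysis
    return execute(intent)
```

The key design point mirrored here is that execution is gated on confirmation, so an unconfirmed result loops back into correction rather than triggering an action.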
Further, in an embodiment of the present application, after the step S9, the method includes:
S10: judging whether a third voice is detected within a first preset time period;
S11: entering a standby state if the third voice is not detected within the first preset time period.
In this embodiment, once the user intent corresponding to the error-corrected first speech recognition result has been extracted and the action corresponding to that intent has been executed, the voice assistant has completed the operation the user wanted performed. A first preset time period is preset; if no third voice from the user is detected within this period, the voice assistant automatically switches from the active state to the standby state, reducing the power consumption of the mobile terminal. The first preset time period may be set according to actual needs, for example to 1-10 minutes; with a 5-minute setting, the voice assistant automatically enters the standby state when no voice from the user is detected within 5 minutes. Alternatively, the voice assistant may enter the standby state immediately after completing the action corresponding to the user intent. Once the voice assistant is in the standby state, the user must wake it before interacting with it again, for example by a voice wake-up instruction, and may then resume voice interaction with the successfully woken assistant.
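The standby logic (S10–S11) can be sketched as a timeout check against a monotonic clock; the 5-minute value is one choice within the 1-10 minute range mentioned above, and the class interface is an illustrative assumption:

```python
import time

STANDBY_TIMEOUT = 5 * 60  # seconds; assumed value within the 1-10 min range

class VoiceAssistant:
    def __init__(self, timeout=STANDBY_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable clock, eases testing
        self.state = "active"
        self.last_activity = clock()

    def on_voice_detected(self):
        """A voice was detected: record activity and (re)wake if needed."""
        self.last_activity = self.clock()
        if self.state == "standby":
            # In standby, a wake-up (e.g. a voice wake-up instruction)
            # must precede further interaction.
            self.state = "active"

    def tick(self):
        """S10/S11: poll the timer and switch to standby on timeout."""
        if self.state == "active" and self.clock() - self.last_activity >= self.timeout:
            self.state = "standby"
        return self.state
```

Injecting the clock keeps the timeout behavior deterministic under test; a real assistant would drive `tick()` from a timer or event loop.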
Referring to fig. 4, an embodiment of the present application further provides a mobile terminal, where the mobile terminal may be a server, and the internal structure of the mobile terminal may be as shown in fig. 4. The mobile terminal includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the mobile terminal provides computing and control capabilities. The memory of the mobile terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the mobile terminal stores data used for speech recognition error correction. The network interface of the mobile terminal is used to connect to and communicate with an external terminal through a network. The computer program, when executed by the processor, implements a speech recognition error correction method.
The processor, when executing the computer program, implements the following steps of the speech recognition error correction method:
recognizing a first voice input by a user, and generating a first voice recognition result corresponding to the first voice;
displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result;
receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice;
judging whether the second voice recognition result carries the error correction intention of the user or not;
if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction.
Those skilled in the art will appreciate that the structure shown in fig. 4 is only a block diagram of a part of the structure related to the present application, and does not constitute a limitation to the mobile terminal to which the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a speech recognition error correction method, and specifically:
recognizing a first voice input by a user, and generating a first voice recognition result corresponding to the first voice;
displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result;
receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice;
judging whether the second voice recognition result carries the error correction intention of the user or not;
if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction.
To sum up, the speech recognition error correction method, mobile terminal and computer-readable storage medium provided in the embodiments of the present application recognize a first voice input by a user and generate a first voice recognition result corresponding to the first voice; display the first voice recognition result to the user so that the user can confirm it; receive a second voice input by the user and generate a second voice recognition result corresponding to the second voice; judge whether the second voice recognition result carries the user's error correction intention; and, if so, correct the first voice recognition result according to the second voice recognition result to obtain an error-corrected first voice recognition result. In this way, the voice recognition result can be corrected intelligently according to a voice carrying the user's error correction intention, achieving accurate correction of the voice recognition result and ensuring the accuracy of the corrected result. Moreover, the user only needs to utter a voice carrying the error correction intention, with no manual modification required, which improves the user experience.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for speech recognition error correction, comprising:
recognizing a first voice input by a user, and generating a first voice recognition result corresponding to the first voice;
displaying the first voice recognition result to the user so that the user can confirm the first voice recognition result;
receiving second voice input by the user, and generating a second voice recognition result corresponding to the second voice;
judging whether the second voice recognition result carries the error correction intention of the user or not;
if so, correcting the error of the first voice recognition result according to the second voice recognition result to obtain a first voice recognition result after error correction.
2. The speech recognition error correction method according to claim 1, wherein the step of recognizing the first speech input by the user and generating the first speech recognition result corresponding to the first speech is preceded by:
receiving original voice input by the user;
judging whether the original voice meets a preset condition or not;
if the original voice does not meet the preset condition, sending, to the user, reminder information prompting the user to repeat the original voice;
receiving a repeat voice input by the user and obtained by repeating the original voice, wherein the repeat voice is the first voice.
3. The method according to claim 1, wherein the step of determining whether the second speech recognition result carries the user's intention to correct the error comprises:
extracting all words in the second voice recognition result;
matching each word with all keywords in a preset database one by one;
if the matching is successful, judging that the second voice recognition result carries the error correction intention of the user;
and if the matching is unsuccessful, judging that the second voice recognition result does not carry the error correction intention of the user.
4. The method according to claim 1, wherein the step of determining whether the second speech recognition result carries the user's intention to correct the error comprises:
inputting the second voice recognition result into a preset first neural network model so as to obtain a classification result after the second voice recognition result is subjected to intention classification through the first neural network model, wherein the classification result comprises an intention of error correction or no intention of error correction;
receiving the classification result returned by the first neural network model, and judging whether the classification result has an error correction intention:
if the classification result is that the user has the error correction intention, judging that the second voice recognition result carries the error correction intention of the user;
and if the classification result does not have the error correction intention, judging that the second voice recognition result does not carry the error correction intention of the user.
5. The method for correcting the voice recognition error according to claim 1, wherein the step of correcting the first voice recognition result according to the second voice recognition result to obtain the corrected first voice recognition result comprises:
inputting the second voice recognition result into a preset second neural network model so that the second neural network model can analyze and process the second voice recognition result and then output error correction information corresponding to the second voice recognition result;
receiving the error correction information returned by the second neural network model;
and correcting the error of the first voice recognition result according to the error correction information to obtain the corrected first voice recognition result.
6. The method according to claim 5, wherein the step of performing error correction on the first speech recognition result according to the error correction information to obtain the error-corrected first speech recognition result comprises:
matching the error correction information with a plurality of preset error correction templates one by one, and screening out a first error correction template matched with the error correction information;
respectively extracting error words, error correction words and error correction type words of the second voice recognition result according to the format of the first error correction template;
acquiring the position of the wrong word in the first voice recognition result;
and performing corresponding error correction at the position according to the error correction words and the error correction type words to obtain the first voice recognition result after error correction.
7. The method according to claim 5, wherein the step of correcting the error of the first speech recognition result according to the second speech recognition result to obtain the corrected first speech recognition result is followed by:
displaying the first voice recognition result after error correction to the user so that the user can confirm the first voice recognition result after error correction;
judging whether the confirmation information of the user is received or not;
if so, performing intention analysis on the first voice recognition result after error correction, and extracting a user intention corresponding to the first voice recognition result after error correction;
and executing the operation corresponding to the user intention.
8. The speech recognition error correction method according to claim 7, wherein the step of performing the operation corresponding to the user's intention is followed by:
judging whether a third voice is detected within a first preset time period;
and entering a standby state if the third voice is not detected within the first preset time period.
9. A mobile terminal comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN201811333544.4A 2018-11-09 2018-11-09 Speech recognition error correction method, mobile terminal and computer-readable storage medium Withdrawn CN111243593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811333544.4A CN111243593A (en) 2018-11-09 2018-11-09 Speech recognition error correction method, mobile terminal and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111243593A true CN111243593A (en) 2020-06-05

Family

ID=70863656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811333544.4A Withdrawn CN111243593A (en) 2018-11-09 2018-11-09 Speech recognition error correction method, mobile terminal and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111243593A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016023317A1 (en) * 2014-08-15 2016-02-18 中兴通讯股份有限公司 Voice information processing method and terminal
CN107220235A (en) * 2017-05-23 2017-09-29 北京百度网讯科技有限公司 Speech recognition error correction method, device and storage medium based on artificial intelligence
JP2018040904A (en) * 2016-09-06 2018-03-15 トヨタ自動車株式会社 Voice recognition device and voice recognition method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112331194A (en) * 2019-07-31 2021-02-05 北京搜狗科技发展有限公司 Input method and device and electronic equipment
CN112331194B (en) * 2019-07-31 2024-06-18 北京搜狗科技发展有限公司 Input method and device and electronic equipment
CN112509581A (en) * 2020-11-20 2021-03-16 北京有竹居网络技术有限公司 Method and device for correcting text after speech recognition, readable medium and electronic equipment
CN112509581B (en) * 2020-11-20 2024-03-01 北京有竹居网络技术有限公司 Error correction method and device for text after voice recognition, readable medium and electronic equipment
CN116229975A (en) * 2023-03-17 2023-06-06 杭州盈禾嘉田科技有限公司 System and method for voice reporting of field diseases and insect pests in intelligent interaction scene
CN116229975B (en) * 2023-03-17 2023-08-18 杭州盈禾嘉田科技有限公司 System and method for voice reporting of field diseases and insect pests in intelligent interaction scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200605