CN106098066B - Voice recognition method and device - Google Patents
- Publication number
- CN106098066B (application number CN201610389407.7A)
- Authority
- CN
- China
- Prior art keywords
- voice information
- division
- keyword
- keywords
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses a voice recognition method and device. Different operation instruction templates are set for different service interfaces. Taking the operation instruction template corresponding to the current service interface as the reference, the method judges whether received voice information matches that template, and executes the operation indicated by the voice information only if the match succeeds. This prevents approximate voice input from being treated as an operation instruction when multiple sounds are present, which would interrupt the service currently being provided, so the operation instruction content in the voice information is recognized accurately.
Description
Technical Field
The present invention relates to the field of speech technology, and in particular, to a speech recognition method and apparatus.
Background
With the development of multimedia technology, the services offered by multimedia systems have expanded to include music, video, pictures, real-time traffic information, destination map navigation, voice navigation, and the like. The widespread use of intelligent terminals provides broad room for these services to develop.
Whether a terminal is equipped with keys or a touch screen, these services can only be used through manual operation, which is cumbersome and can be dangerous; for example, a driver manually operating an in-vehicle device while driving creates a hazard. The development of speech recognition technology offers a new direction for such operations. However, when such services are used via voice recognition in a narrow enclosed space such as a car, multiple sounds may coexist, and accurately recognizing the operation instruction content in the voice information becomes a problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a voice recognition method, which can accurately recognize the operation instruction content in voice information when multiple sounds exist.
The embodiment of the invention also provides a voice recognition device which can accurately recognize the operation instruction content in the voice information when multiple sounds exist.
The voice recognition method provided by the embodiment of the invention comprises the following steps:
receiving voice information;
judging whether the voice information is matched with an operation instruction template corresponding to the current service interface;
and if the voice information matches the operation instruction template, executing the operation indicated by the voice information; if the voice information does not match the operation instruction template, executing no operation.
Thus, in the embodiment of the invention, different operation instruction templates are set for different service interfaces. Taking the operation instruction template corresponding to the current service interface as the reference, the terminal judges whether the received voice information matches that template, and executes the operation indicated by the voice information only if the match succeeds. This prevents approximate voice input from being treated as an operation instruction when multiple sounds are present, which would interrupt the service currently being provided, so the operation instruction content in the voice information is recognized accurately.
As an optional implementation, the operation instruction template includes: a keyword arrangement order and a keyword lexicon.
Thus, the operation instruction template in this embodiment includes not only a keyword lexicon but also a keyword arrangement order, which raises the bar for matching and enables more accurate recognition of the operation instruction content in the voice information.
As an optional implementation, determining whether the voice information matches the operation instruction template includes:
performing phrase segmentation on the voice information;
judging, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords are contained in the keyword lexicon;
if the keywords obtained by segmentation are contained in the keyword lexicon, judging whether they match the keyword arrangement order; if they match the keyword arrangement order, determining that the voice information matches the operation instruction template; if they do not, determining that the voice information does not match the operation instruction template;
and if the keywords obtained by segmentation are not contained in the keyword lexicon, determining that the voice information does not match the operation instruction template.
Thus, this embodiment adopts a voice information segmentation technique to segment the received voice information into phrases, enabling accurate recognition of the voice information.
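The matching steps above can be sketched in code. This is a hypothetical illustration, not the patent's implementation: `segment` stands in for the unspecified voice information segmentation technology (greedy longest match against the lexicon), and the template format is an assumption.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation (a stand-in for the patent's
    unspecified voice information segmentation technology)."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character: keep as a single token
            i += 1
    return words


def match_template(text, template):
    """template holds a keyword lexicon {word: category} and a required
    category order; both checks from the description above are applied."""
    words = segment(text, template["lexicon"])
    if any(w not in template["lexicon"] for w in words):
        return False  # some segmented keyword is not in the keyword lexicon
    categories = [template["lexicon"][w] for w in words]
    return categories == template["order"]  # keyword arrangement order check


# Example template for a navigation interface (illustrative values).
NAV = {"order": ["command", "place"],
       "lexicon": {"导航到": "command", "天安门": "place"}}
```

With this template, "导航到天安门" ("navigate to Tiananmen") matches, while a bare place name or unrelated chatter does not.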
As an optional implementation, the method further comprises:
if the keywords obtained by segmentation are not contained in the keyword lexicon, displaying those keywords;
after receiving a confirmation instruction, continuing with the step of judging whether the keywords obtained by segmentation match the keyword arrangement order; and upon receiving a negative instruction, determining that the voice information does not match the operation instruction template.
Thus, in this embodiment, keywords that are not contained in the keyword lexicon can be displayed to the user; if a confirmation instruction is received, the step of judging whether the keywords match the keyword arrangement order continues. This avoids misjudging some segmented keywords when the keyword lexicon is incomplete.
As an optional implementation, the method further comprises:
before judging whether the keywords obtained by segmentation are contained in the keyword lexicon, judging whether they include an instruction keyword;
if the keywords obtained by segmentation include an instruction keyword, continuing with the step of judging whether they are contained in the keyword lexicon; and if they do not include an instruction keyword, determining that the voice information does not match the operation instruction template.
Thus, this embodiment first judges whether the voice information includes an instruction keyword, and only then further judges whether its keywords are contained in the keyword lexicon, which improves processing efficiency.
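The pre-check can be sketched as follows. This is a hedged illustration with assumed names: `INSTRUCTION_KEYWORDS` and `match_with_precheck` are not from the patent, and the instruction keywords shown are just the examples given in the text.

```python
# Reject utterances whose segmented keywords contain no instruction keyword
# (typically a verb such as "navigate to" or "play") before consulting the
# full keyword lexicon.
INSTRUCTION_KEYWORDS = {"navigate to", "play"}  # illustrative examples


def has_instruction_keyword(words):
    """True if any segmented keyword is an instruction keyword."""
    return any(w in INSTRUCTION_KEYWORDS for w in words)


def match_with_precheck(words, lexicon_check):
    """Run the (comparatively expensive) lexicon check only after the
    cheap instruction-keyword pre-check succeeds."""
    if not has_instruction_keyword(words):
        return False  # no instruction keyword: cannot match any template
    return lexicon_check(words)
```

The design choice is simply short-circuiting: most conversational speech contains no instruction verb, so it is rejected without touching the lexicon.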
An embodiment of the present invention provides a speech recognition apparatus, including:
the voice information receiving module is used for receiving voice information;
the judging module is used for judging whether the voice information is matched with an operation instruction template corresponding to the current service interface;
and the voice information response module, used for executing the operation indicated by the voice information when the voice information matches the operation instruction template, and for executing no operation when it does not match the operation instruction template.
As an optional implementation, the operation instruction template includes: a keyword arrangement order and a keyword lexicon.
As an optional implementation manner, the determining module includes:
the voice information analysis sub-module is used for performing phrase segmentation on the voice information; the first judging sub-module is used for judging, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords are contained in the keyword lexicon, triggering the second judging sub-module to operate when they are contained in the keyword lexicon, and determining that the voice information does not match the operation instruction template when they are not;
the second judging sub-module is used for judging whether the keywords obtained by segmentation match the keyword arrangement order, determining that the voice information matches the operation instruction template when they do, and determining that the voice information does not match the operation instruction template when they do not.
As an optional implementation manner, the first determining sub-module includes:
the first judgment execution sub-module is used for judging, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords are contained in the keyword lexicon, triggering the second judging sub-module to operate when they are contained in the keyword lexicon, and triggering the display sub-module to operate when they are not;
the display sub-module is used for displaying the keywords obtained by segmentation that are not contained in the keyword lexicon;
the trigger module is used for triggering the second judging sub-module to operate after a confirmation instruction is received, and for determining that the voice information does not match the operation instruction template after a negative instruction is received.
As an optional implementation manner, the first determining sub-module includes:
the second judgment execution sub-module is used for judging, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords include an instruction keyword, triggering the third judgment execution sub-module to operate when they do, and determining that the voice information does not match the operation instruction template when they do not;
and the third judgment execution sub-module is used for judging whether the keywords obtained by segmentation are contained in the keyword lexicon, triggering the second judging sub-module to operate when they are, and determining that the voice information does not match the operation instruction template when they are not.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 2A is a schematic diagram of a system interface in an embodiment of the invention;
FIG. 3 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;
FIG. 4 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;
FIG. 5A is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;
fig. 6 is a block diagram illustrating an apparatus 600 for speech recognition according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flowchart of a speech recognition method in an embodiment of the present invention, which can be applied to a terminal.
In step 11, voice information is received.
In step 12, it is judged whether the voice information matches the operation instruction template corresponding to the current service interface. If it matches, step 13 is executed; otherwise, no operation is executed.
In step 13, the operation indicated by the voice information is executed.
The operation instruction template in the embodiment of the present invention may include: a keyword arrangement order and a keyword lexicon. Different service interfaces correspond to different operation instruction templates; for example, the navigation service interface corresponds to one operation instruction template and the music service interface corresponds to another.
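One possible representation of these per-interface templates is sketched below. This structure is an assumption for illustration, not the patent's specification; the interface names, categories, and entries are invented examples.

```python
# Each service interface maps to a keyword arrangement order plus a
# keyword lexicon (illustrative values only).
OPERATION_TEMPLATES = {
    "navigation": {
        "order": ["command", "place"],  # e.g. "navigate to" + a place name
        "lexicon": {"navigate to": "command", "Tiananmen": "place"},
    },
    "music": {
        "order": ["command", "song"],   # e.g. "play" + a song title
        "lexicon": {"play": "command", "song 1": "song"},
    },
}


def template_for_interface(current_interface):
    """Look up the stored correspondence between a service interface and
    its operation instruction template; None if no template is stored."""
    return OPERATION_TEMPLATES.get(current_interface)
```

This mirrors the stored correspondence described later in the flow: the terminal keeps a mapping from service interface to template and resolves the active template from the current interface.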
Taking the navigation service as an example, the operation instruction template corresponding to the navigation service interface is shown in Table 1.
Table 1
Taking the music service as an example, the operation instruction template corresponding to the music service interface is shown in Table 2.
Table 2
In the keyword templates shown in Table 1 and Table 2 above, there is a class of instruction keywords, such as "navigate to" in the navigation operation instruction template and "play" in the music operation instruction template. As can be seen, instruction keywords are typically verbs.
Fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention, where the method may be applied to a terminal.
In step 21, voice information is received.
In step 22, an operation instruction template corresponding to the current service interface is determined.
As an optional implementation, when the terminal user wants to use a service, a voice interface wakeup instruction can be input on the system interface shown in fig. 2A, which displays the services available to the current user in the form of service channels. For example, to use the music service the user speaks "open the music interface", and to use the navigation service the user speaks "open the navigation interface". After receiving the interface wakeup instruction, the terminal opens the corresponding current service interface, and subsequent operations are executed based on that interface. The terminal stores the correspondence between service interfaces and operation instruction templates, so the operation instruction template corresponding to the current service interface can be determined from the current service interface.
In step 23, phrase segmentation is performed on the received voice information.
As an optional implementation, a voice information segmentation technique is used to segment the received voice information, obtaining the splitting and combination of its keywords.
In step 24, it is judged whether the keywords obtained by segmentation are in the keyword lexicon. If they are, step 25 is executed; if they are not, no operation is executed.
As an alternative, in step 24, when a keyword obtained by segmentation is not in the keyword lexicon, the keyword may be displayed together with confirm and deny options. If the user confirms the keyword, the terminal receives a confirmation instruction and continues with step 25; if the user denies it, the terminal receives a negative instruction and performs no operation. This avoids the situation where some keywords cannot be recognized because the keyword lexicon is incomplete. Further, after the user confirms a keyword, it may be added to the keyword lexicon. The user may also input the confirmation or negative instruction by voice.
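The confirm-or-deny fallback can be sketched as follows. Function names are illustrative; `ask_user` stands in for the display plus confirm/deny interaction, which per the text may equally be driven by voice.

```python
def handle_unknown_keyword(word, lexicon, category, ask_user):
    """Fallback for a segmented keyword absent from the lexicon.
    Returns True if matching should continue with the order check."""
    if ask_user(word):            # user confirmed the displayed keyword
        lexicon[word] = category  # optionally update the keyword lexicon
        return True               # continue to the arrangement-order check
    return False                  # negative instruction: perform no operation
```

Updating the lexicon on confirmation means the same keyword is recognized directly next time, which is the incompleteness-repair behavior the paragraph describes.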
As another alternative, before judging whether the keywords obtained by segmentation are in the keyword lexicon, it is first judged whether they include an instruction keyword. Only when they include an instruction keyword is the lexicon check performed; if they do not, it can be directly determined that the received voice information does not match the operation instruction template. Confirming that the received voice information includes an instruction keyword before matching it against the keyword lexicon improves processing efficiency.
In step 25, it is judged whether the keywords obtained by segmentation match the keyword arrangement order. If they match, the received voice information is determined to match the operation instruction template and step 26 is executed; if they do not match, no operation is executed.
In step 26, the operation indicated by the voice information is performed.
Based on the method shown in fig. 1 or fig. 2, several specific application scenarios are given below, taking an in-vehicle device as the terminal.
When the driver wants to use the navigation service, he or she speaks the interface wakeup instruction "open the navigation interface", and the in-vehicle device opens the navigation service interface upon receiving it. The driver can then input the voice information "navigate to Tiananmen"; the in-vehicle device judges that this voice information matches the operation instruction template corresponding to the navigation service interface and executes the corresponding navigation operation. Suppose that, while the navigation service is being provided, the passengers and the driver talk about tourist attractions and mention several place names. As long as the voice information received by the in-vehicle device does not conform to the format "navigate to + place name", no operation is executed. This prevents place names picked up in the narrow space inside the vehicle from being mistaken for new navigation instructions, which would interrupt the navigation service in progress.
When the driver wants to use the music service, he or she speaks the interface wakeup instruction "open the music interface", and the in-vehicle device opens the music service interface upon receiving it. The driver can then input the voice information "play song 1"; the in-vehicle device judges that this voice information matches the operation instruction template corresponding to the music service interface and executes the corresponding playback operation. Suppose that, while music is playing, the passengers and the driver talk about popular songs and mention several song titles. As long as the voice information received by the in-vehicle device does not conform to the format "play + song title", no operation is executed. This prevents song titles picked up in the narrow space inside the vehicle from being mistaken for new playback instructions, which would interrupt the music playback in progress.
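The two scenarios above can be put in code form. This is a sketch under stated assumptions: the recognizer, the template contents, and the function names are invented; only the match-or-ignore behavior comes from the patent.

```python
# The wakeup instruction selects the active template; a later utterance
# yields a command only when it matches that template's lexicon and keyword
# arrangement order, so chatter about place names or songs is ignored.
TEMPLATES = {
    "navigation": {"order": ["command", "place"],
                   "lexicon": {"navigate to": "command",
                               "Tiananmen": "place",
                               "the Great Wall": "place"}},
    "music": {"order": ["command", "song"],
              "lexicon": {"play": "command", "song 1": "song"}},
}


def handle_utterance(words, active_interface):
    """Return the command string to execute, or None for no operation."""
    template = TEMPLATES[active_interface]
    if any(w not in template["lexicon"] for w in words):
        return None  # unrecognized word: chatter, current service continues
    if [template["lexicon"][w] for w in words] != template["order"]:
        return None  # keywords present but not in the required order
    return " ".join(words)
```

Note that "the Great Wall" spoken alone is ignored on the navigation interface, and "navigate to Tiananmen" is ignored on the music interface, matching the interrupt-avoidance behavior described above.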
Examples of speech recognition apparatuses according to embodiments of the present invention, which can implement the speech recognition methods described above, are given below. The functions of the modules and sub-modules in these apparatuses correspond to the steps of the method flows and have been explained in detail above, so they are not repeated below.
Fig. 3 is a block diagram of a speech recognition apparatus in an embodiment of the present invention, where the speech recognition apparatus may be located in a terminal, and includes: a voice message receiving module 31, a judging module 32 and a voice message responding module 33.
And the voice information receiving module 31 is configured to receive voice information.
And the judging module 32 is configured to judge whether the voice information matches with the operation instruction template corresponding to the current service interface, and send a judgment result to the voice information response module 33.
The voice information response module 33 is configured to execute the operation indicated by the voice information when the voice information matches the operation instruction template, and to execute no operation when it does not match the operation instruction template.
Fig. 4 is a block diagram of a speech recognition apparatus in an embodiment of the present invention, where the speech recognition apparatus may be located in a terminal, and includes: a voice message receiving module 31, a judging module 32, a voice message responding module 33 and a waking module 34.
The operation instruction template in the embodiment of the present invention may include: a keyword arrangement order and a keyword lexicon.
And the voice information receiving module 31 is configured to receive voice information.
The determination module 32 may include a voice information analysis sub-module 321, a first determination sub-module 322, and a second determination sub-module 323.
The voice information analysis sub-module 321 is configured to perform word segmentation on the voice information.
The first judging sub-module 322 is configured to judge, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords are contained in the keyword lexicon, to trigger the second judging sub-module 323 to operate when they are contained in the keyword lexicon, and to determine that the voice information does not match the operation instruction template when they are not.
As an alternative, to handle cases where the keyword lexicon is incomplete, when the first judging sub-module 322 determines that the keywords obtained by segmentation are not contained in the keyword lexicon, the user may be offered a display-and-confirm option. In this case, the first judging sub-module 322 may further include: a first judgment execution sub-module 3221, a display sub-module 3222, and a trigger module 3223. A block diagram of an apparatus including this part is shown in fig. 5.
The first judgment execution sub-module 3221 is configured to judge, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords are contained in the keyword lexicon, to trigger the second judging sub-module 323 to operate when they are contained in the keyword lexicon, and to trigger the display sub-module 3222 to operate when they are not.
The display sub-module 3222 is configured to display the keywords obtained by segmentation that are not contained in the keyword lexicon.
The trigger module 3223 is configured to trigger the second judging sub-module 323 to operate after a confirmation instruction is received, and to determine that the voice information does not match the operation instruction template after a negative instruction is received.
As another optional implementation manner, in order to improve the processing efficiency, the first determining sub-module 322 may further include: a second judgment execution sub-module 3224 and a third judgment execution sub-module 3225. A block diagram of an apparatus including this portion is shown in fig. 5A.
The second judgment execution sub-module 3224 is configured to judge, based on the splitting and combination of the keywords obtained by segmentation, whether those keywords include an instruction keyword, to trigger the third judgment execution sub-module 3225 to operate when they do, and to determine that the voice information does not match the operation instruction template when they do not.
The third judgment execution sub-module 3225 is configured to judge whether the keywords obtained by segmentation are contained in the keyword lexicon, to trigger the second judging sub-module 323 to operate when they are, and to determine that the voice information does not match the operation instruction template when they are not.
The second judging sub-module 323 is configured to judge whether the keywords obtained by segmentation match the keyword arrangement order, to determine that the voice information matches the operation instruction template when they do, and to determine that the voice information does not match the operation instruction template when they do not.
The voice information response module 33 is configured to execute the operation indicated by the voice information when the voice information matches the operation instruction template, and to execute no operation when it does not match the operation instruction template.
And the wakeup module 34 is configured to receive an interface wakeup instruction and open the current service interface corresponding to the interface wakeup instruction.
Fig. 6 is a block diagram illustrating an apparatus 600 for speech recognition according to an example embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the speech recognition method described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors that provide status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as its display and keypad. It may also detect a change in position of the apparatus 600 or of one of its components, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in its temperature. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The description is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. The embodiments are to be considered merely as illustrative, with the true scope and spirit of the invention being indicated by the following claims.
Claims (6)
1. A method of speech recognition, the method comprising:
receiving voice information;
judging whether the voice information is matched with an operation instruction template corresponding to the current service interface, which comprises: performing word group division on the voice information; judging, according to the splitting and combination of the keywords obtained after word group division, whether those keywords are contained in the keyword lexicon; if the keywords obtained after word group division are contained in the keyword lexicon, judging whether they are matched with the keyword arrangement sequence; if they are matched with the keyword arrangement sequence, determining that the voice information is matched with the operation instruction template; if they are not matched with the keyword arrangement sequence, determining that the voice information is not matched with the operation instruction template; and if the keywords obtained after word group division are not contained in the keyword lexicon, determining that the voice information is not matched with the operation instruction template;
if the voice information is matched with the operation instruction template, executing the operation indicated by the voice information, and if the voice information is not matched with the operation instruction template, not executing the operation;
the operation instruction template comprises a keyword arrangement sequence and a keyword word library.
2. The method of claim 1, wherein the method further comprises:
if the keywords obtained after word group division are not contained in the keyword lexicon, displaying the keywords obtained by word group division that are not contained in the keyword lexicon;
after receiving a confirmation instruction, continuing to execute the step of judging whether the keywords obtained after word grouping and division are matched with the keyword arrangement sequence; and when a negative instruction is received, determining that the voice information is not matched with the operation instruction template.
3. The method of claim 1, wherein the method further comprises:
before judging whether the keywords obtained after word group division are contained in the keyword lexicon, judging whether those keywords include an instruction keyword;
if the keywords obtained after word group division include an instruction keyword, continuing to execute the step of judging whether they are contained in the keyword lexicon; and if they do not include an instruction keyword, determining that the voice information is not matched with the operation instruction template.
4. A speech recognition apparatus, characterized in that the apparatus comprises:
the voice information receiving module is used for receiving voice information;
the judging module is used for judging whether the voice information is matched with an operation instruction template corresponding to the current service interface, and comprises: a voice information analysis sub-module, used for performing word group division on the voice information; a first judgment sub-module, used for judging, according to the splitting and combination of the keywords obtained after word group division, whether those keywords are contained in the keyword lexicon, triggering the second judgment sub-module to execute an operation when they are contained in the keyword lexicon, and determining that the voice information is not matched with the operation instruction template when they are not; and a second judgment sub-module, used for judging whether the keywords obtained after word group division are matched with the keyword arrangement sequence, determining that the voice information is matched with the operation instruction template when they are, and determining that the voice information is not matched with the operation instruction template when they are not;
the voice information response module is used for executing the operation indicated by the voice information when the voice information is matched with the operation instruction template, and for not executing the operation when the voice information is not matched with the operation instruction template;
the operation instruction template comprises a keyword arrangement sequence and a keyword word library.
5. The apparatus of claim 4, wherein the first determining submodule comprises:
the first judgment execution sub-module, used for judging, according to the splitting and combination of the keywords obtained after word group division, whether those keywords are contained in the keyword lexicon, triggering the second judgment sub-module to execute an operation when they are contained in the keyword lexicon, and triggering the display sub-module to execute an operation when they are not;
the display sub-module, used for displaying the keywords obtained by word group division that are not contained in the keyword lexicon; and
the triggering module, used for triggering the second judgment sub-module to execute an operation after receiving a confirmation instruction, and for determining that the voice information is not matched with the operation instruction template after receiving a negative instruction.
6. The apparatus of claim 4, wherein the first determining submodule comprises:
the second judgment execution sub-module, used for judging, according to the splitting and combination of the keywords obtained after word group division, whether those keywords include an instruction keyword, triggering the third judgment execution sub-module to execute an operation when they include an instruction keyword, and determining that the voice information is not matched with the operation instruction template when they do not; and
the third judgment execution sub-module, used for judging whether the keywords obtained after word group division are contained in the keyword lexicon, triggering the second judgment sub-module to execute an operation when they are contained in the keyword lexicon, and determining that the voice information is not matched with the operation instruction template when they are not.
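The instruction-keyword pre-check described in claims 3 and 6 — verifying, before the lexicon lookup, that the segmented keywords contain at least one instruction keyword — can be sketched as follows. The keyword sets and function name here are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of the claims 3/6 pre-check. The instruction-keyword
# set is an assumed example (action words a command would start with).

INSTRUCTION_KEYWORDS = {"open", "close", "call", "play"}

def contains_instruction(keywords, instruction_keywords=INSTRUCTION_KEYWORDS):
    """Return True if any segmented keyword is an instruction keyword.

    If none is present, the voice information cannot match any
    operation instruction template, so the lexicon check is skipped
    and a non-match is reported immediately.
    """
    return any(k in instruction_keywords for k in keywords)
```

Running this gate first lets ordinary conversation (no instruction keyword at all) be rejected cheaply before the full lexicon and arrangement-order checks.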
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610389407.7A CN106098066B (en) | 2016-06-02 | 2016-06-02 | Voice recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106098066A CN106098066A (en) | 2016-11-09 |
CN106098066B true CN106098066B (en) | 2020-01-17 |
Family
ID=57447219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610389407.7A Active CN106098066B (en) | 2016-06-02 | 2016-06-02 | Voice recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106098066B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452382A (en) * | 2017-07-19 | 2017-12-08 | 珠海市魅族科技有限公司 | Voice operating method and device, computer installation and computer-readable recording medium |
CN108873713A (en) * | 2018-06-25 | 2018-11-23 | 广州市锐尚展柜制作有限公司 | A kind of man-machine interaction method and system applied in smart home |
CN114299960A (en) * | 2021-12-20 | 2022-04-08 | 北京声智科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104219388A (en) * | 2014-08-28 | 2014-12-17 | 小米科技有限责任公司 | Voice control method and device |
CN105529025A (en) * | 2014-09-28 | 2016-04-27 | 联想(北京)有限公司 | Voice operation input method and electronic device |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10129005B4 (en) * | 2001-06-15 | 2005-11-03 | Harman Becker Automotive Systems Gmbh | Method for speech recognition and speech recognition system |
JP4667138B2 (en) * | 2005-06-30 | 2011-04-06 | キヤノン株式会社 | Speech recognition method and speech recognition apparatus |
CN102750949B (en) * | 2012-07-16 | 2015-04-01 | 深圳市车音网科技有限公司 | Voice recognition method and device |
CN203157896U (en) * | 2013-04-08 | 2013-08-28 | 郑州宇通客车股份有限公司 | Car voice control system and passenger car adopting same |
CN103219005B (en) * | 2013-04-28 | 2016-01-20 | 北京云知声信息技术有限公司 | A kind of audio recognition method and device |
CN103280217B (en) * | 2013-05-02 | 2016-05-04 | 锤子科技(北京)有限公司 | A kind of audio recognition method of mobile terminal and device thereof |
CN103646646B (en) * | 2013-11-27 | 2018-08-31 | 联想(北京)有限公司 | A kind of sound control method and electronic equipment |
CN105632487B (en) * | 2015-12-31 | 2020-04-21 | 北京奇艺世纪科技有限公司 | Voice recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832036B (en) | Voice control method, device and computer readable storage medium | |
CN104978868A (en) | Stop arrival reminding method and stop arrival reminding device | |
KR102334299B1 (en) | Voice information processing method, apparatus, program and storage medium | |
CN111696553B (en) | Voice processing method, device and readable medium | |
CN105404401A (en) | Input processing method, apparatus and device | |
CN105426094B (en) | Information pasting method and device | |
US11335348B2 (en) | Input method, device, apparatus, and storage medium | |
CN107229403B (en) | Information content selection method and device | |
CN106098066B (en) | Voice recognition method and device | |
CN107181849A (en) | The way of recording and device | |
CN106384586A (en) | Method and device for reading text information | |
CN106657543B (en) | Voice information processing method and device | |
CN111061452A (en) | Voice control method and device of user interface | |
CN108766427B (en) | Voice control method and device | |
CN105763734A (en) | Emergency communication method, device, and apparatus | |
WO2017206133A1 (en) | Speech recognition method and device | |
CN110213062B (en) | Method and device for processing message | |
CN109461461B (en) | Audio playing method and device, electronic equipment and storage medium | |
CN106060253B (en) | Information presentation method and device | |
CN111667827B (en) | Voice control method and device for application program and storage medium | |
CN104991779A (en) | Presentation method and device | |
CN113127613B (en) | Chat information processing method and device | |
US11308197B2 (en) | Intelligent device recognition using prompt frequency information | |
CN111611339B (en) | Recommendation method and related device for inputting related users | |
CN113286218B (en) | Translation method and device and earphone equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |