
CN109389978B - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN109389978B
Authority
CN
China
Prior art keywords
voice
user
information
voice command
location
Prior art date
Legal status
Active
Application number
CN201811306260.6A
Other languages
Chinese (zh)
Other versions
CN109389978A (en)
Inventor
韩雪
王慧君
毛跃辉
陶梦春
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201811306260.6A
Publication of CN109389978A
Application granted
Publication of CN109389978B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G10L15/26: Speech to text systems
    • G10L17/00: Speaker identification or verification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Telephone Function (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a voice recognition method and device. The method includes: receiving voice that contains a first voice command of a first user and noise, according to pre-stored information about a first location of the first user; and recognizing the first voice command from the voice according to pre-stored voiceprint information of the first user. Because the voice is received according to the pre-stored information about the first user's location, the voice command matching the first user's voiceprint information is recognized from the received voice, and the function corresponding to that command is executed, the accuracy of voice recognition in a noisy environment can be improved.

Description

Voice recognition method and device
Technical Field
The present application relates to the field of speech control technologies, and in particular, to a speech recognition method and apparatus.
Background
A voice-controlled smart device can receive a user's speech, analyze it to obtain a voice command, and then execute the corresponding function according to that command.
When an existing voice-controlled smart device is used in a noisy environment, the voice command issued by the user is subject to interference from the surroundings, and the device may fail to recover the command from the user's speech, or may recover an incorrect command.
Disclosure of Invention
The present application provides a voice recognition method and apparatus for improving the voice recognition accuracy of a voice-controlled smart device in a noisy environment.
In a first aspect, the present application provides a speech recognition method, including: receiving voice that contains a first voice command of a first user and noise, according to pre-stored information about a first location of the first user; and recognizing the first voice command from the voice according to pre-stored voiceprint information of the first user. In this scheme, the voice is received according to the pre-stored information about the first user's location, the voice command matching the first user's voiceprint information is recognized from the received voice using the pre-stored voiceprint information, and the function corresponding to that command is executed, so the accuracy of voice recognition in a noisy environment can be improved.
In a possible implementation manner, receiving voice according to the pre-stored information about the first location of the first user includes: determining a voice acquisition strategy according to the pre-stored information about the first location of the first user, and then receiving voice according to that strategy. The voice acquisition strategy is: the voice reception intensity at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first location, and the voice reception range includes the first location of the first user. With this strategy, different positions are received with different intensities, reception near the pre-stored first location of the first user is stronger, and the first user's voice command can be received more reliably.
In a possible implementation manner, before receiving the voice, the method may further include: receiving a second voice command of the first user, and determining and storing the voiceprint information of the first user and/or the information about the first location of the first user according to the second voice command. The stored location information is used to determine the voice acquisition strategy, and the stored voiceprint information is used to recognize the first voice command from the received voice.
In a possible implementation manner, the method may further include: determining, according to the first voice command, information about the position corresponding to the first voice command, and updating the information about the first location accordingly. In this way, after the first voice command is received, the stored location information of the first user is updated and the voice acquisition strategy is adjusted, so that subsequent voice commands of the first user can be received more reliably.
In a possible implementation manner, recognizing the first voice command from the voice according to the pre-stored voiceprint information of the first user includes: recognizing the first voice command from the voice according to both the pre-stored voiceprint information of the first user and the information about the first location of the first user. Because the recognized voice command is matched against both the voiceprint information and the location information of the first user, the recognition accuracy is higher.
In a second aspect, the present application provides a speech recognition apparatus, including a voice receiving unit and a voice recognition unit. The voice receiving unit is configured to receive voice according to pre-stored information about a first location of a first user, where the voice includes a first voice command of the first user and noise. The voice recognition unit is configured to recognize the first voice command from the voice according to pre-stored voiceprint information of the first user. In this scheme, the voice command matching the first user's voiceprint information is recognized from the received voice using the pre-stored voiceprint information, and the function corresponding to that command is executed, so the accuracy of voice recognition in a noisy environment can be improved.
In a possible implementation manner, the apparatus may further include a determining unit, configured to determine a voice acquisition strategy according to the pre-stored information about the first location of the first user, where the voice acquisition strategy is: the voice reception intensity at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first location, and the voice reception range includes the first location of the first user. The voice receiving unit is specifically configured to receive voice according to the voice acquisition strategy. With this strategy, different positions are received with different intensities, reception near the pre-stored first location of the first user is stronger, and the first user's voice command can be received more reliably.
In a possible implementation manner, the voice receiving unit may be further configured to receive a second voice command of the first user, and the apparatus may further include a voiceprint recognition unit, a sound source localization unit, and a storage unit. The voiceprint recognition unit is configured to determine the voiceprint information of the first user according to the second voice command; the sound source localization unit is configured to determine the information about the first location of the first user according to the second voice command; the storage unit is configured to store the voiceprint information and/or the information about the first location of the first user. The stored location information is used to determine the voice acquisition strategy, and the stored voiceprint information is used to recognize the first voice command from the received voice.
In a possible implementation manner, the sound source localization unit may be further configured to determine, according to the first voice command, information about the position corresponding to the first voice command, and the storage unit may be further configured to update the information about the first location accordingly. In this way, after the first voice command is received, the stored location information is updated and the voice acquisition strategy is adjusted, so that subsequent voice commands of the first user can be received more reliably.
In a possible implementation manner, the voice recognition unit is specifically configured to recognize the first voice command from the voice according to both the pre-stored voiceprint information of the first user and the information about the first location of the first user. Because the recognized voice command is matched against both the voiceprint information and the location information of the first user, the recognition accuracy is higher.
In a third aspect, an embodiment of the present invention provides a network device, including:
a memory for storing program instructions;
a processor, configured to call the program instructions stored in the memory and execute, according to the obtained program, the method of the first aspect or any implementation of the first aspect described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method according to the first aspect or any one of the embodiments of the first aspect.
Drawings
FIG. 1 is a schematic flow chart of a speech recognition method provided in the present application;
FIG. 2 is a schematic diagram of a speech recognition application scenario provided in the present application;
FIG. 3 is a schematic diagram of a speech recognition apparatus provided in the present application;
FIG. 4 is a schematic structural diagram of a network device provided in the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to the apparatus embodiments or system embodiments. In the description of the present application, the term "plurality" means two or more unless otherwise specified.
FIG. 1 is a flow chart illustrating an exemplary speech recognition method provided in the present application. The speech recognition method may be performed by a speech recognition apparatus. The speech recognition apparatus may be a smart device that can be controlled by voice, such as a voice-controlled television, a voice-controlled toy, or a mobile phone; it may also be a chip in such a smart device, or a functional module with a speech recognition function in such a smart device.
The method comprises the following steps:
Step 105: receive voice according to the pre-stored information about the first location of the first user.
The received voice includes a first voice command of the first user and noise, where the noise may include sound made by a person other than the first user (who may be referred to as a second user) or sound in the external environment that interferes with the first voice command, such as a car horn or wind noise.
Step 106: recognize the first voice command of the first user from the voice according to the pre-stored voiceprint information of the first user.
Regarding step 106: because the first voice command contained in the voice is uttered by the first user, its voiceprint information matches the pre-stored voiceprint information of the first user, so the first voice command of the first user can be recognized from the received voice by means of the pre-stored voiceprint information.
Through steps 105 and 106, the voice is received according to the pre-stored information about the first user's location, the voice command matching the first user's voiceprint information is recognized from the received voice using the pre-stored voiceprint information, and the function corresponding to that command is executed, so the accuracy of voice recognition in a noisy environment can be improved.
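As a purely illustrative sketch (not part of the patent), step 106 can be pictured as matching segments of the received audio against the stored voiceprint; the embedding function, the cosine-similarity measure, and the threshold below are assumptions, since the patent does not specify a matching algorithm:

# Minimal sketch of step 106: pick the audio segment whose voiceprint best matches
# the stored voiceprint of the first user. Voiceprints are assumed to be fixed-length
# embedding vectors; the embedding function `embed` and the threshold are illustrative.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def recognize_first_voice_command(segments, stored_voiceprint, embed, threshold=0.75):
    best_segment, best_score = None, threshold
    for segment in segments:
        score = cosine_similarity(embed(segment), stored_voiceprint)
        if score >= best_score:
            best_segment, best_score = segment, score
    return best_segment  # None means no segment matched the first user's voiceprint

A segment returned by such a function would then be parsed as the first voice command, while unmatched segments (the second user's speech or environmental noise) are ignored.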
In a possible implementation manner, before step 105, the method may further include:
Step 104: determine a voice acquisition strategy according to the pre-stored information about the first location of the first user.
The voice acquisition strategy is: the voice reception intensity at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first location, and the voice reception range includes the first location of the first user. The information about the first location is pre-stored in the speech recognition apparatus and refers to the location where the first user is, such as the first user's satellite coordinates or the relative position between the first user and the speech recognition apparatus.
Step 105 may then be implemented as: receive the voice according to the voice acquisition strategy.
For example, suppose the first location corresponds to coordinate A, and there are two other positions, coordinate B and coordinate C, where the distance from B to A is smaller than the distance from C to A. Under the determined voice acquisition strategy, the speech recognition apparatus receives sound emitted at coordinate B with greater intensity than sound emitted at coordinate C.
With this strategy, different positions are received with different intensities, reception near the pre-stored first location of the first user is strongest, and the first user's voice command can be received more reliably.
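The inverse-distance relationship can be sketched as follows; the coordinate representation, gain constant, and minimum distance are illustrative assumptions, not details given in the patent:

# Sketch of the voice acquisition strategy of step 104: reception intensity at a position
# is inversely proportional to its distance from the stored first location.
import math

def reception_gain(position, first_location, k=1.0, min_distance=0.1):
    # Higher gain for positions closer to the stored first location.
    distance = math.dist(position, first_location)
    return k / max(distance, min_distance)  # clamp to avoid division by zero at the first location itself

# Worked check matching coordinates A, B, C above: B is closer to A than C is,
# so sound from B is received with the larger gain.
A, B, C = (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)
assert reception_gain(B, A) > reception_gain(C, A)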
In a possible implementation manner, before step 105, the method may further include:
Step 101: receive a second voice command of the first user.
The second voice command may be a wake-up command, which is used to bring the speech recognition apparatus into the working state. The wake-up command may be a specific phrase, such as "open the voice system": when the apparatus receives the speech "open the voice system", it determines that this is the second voice command and enters the working state. The second voice command comes from the first user.
After step 101, the method may further include:
Step 102: determine and store the voiceprint information of the first user according to the second voice command.
The voiceprint information stored in step 102 is used in step 106 to identify the first user's voice command from the received speech. Voiceprint information characterizes the voice features of the first user; because different users have different voiceprints, it can be used to distinguish the voices of different users.
Steps 101 and 102 thus pre-store the voiceprint information of the first user, so that the first voice command can later be recognized, based on the pre-stored voiceprint information, from received voice that contains the command and noise. Of course, the way the voiceprint information is pre-stored is not limited to this; for example, it may also be recorded when the speech recognition apparatus is first started.
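A minimal enrollment sketch, assuming the wake-up phrase is matched against a transcript and the voiceprint is computed by a hypothetical `embed` function (both are assumptions, since the patent leaves the enrollment mechanism open):

# Sketch of steps 101-102: when the wake-up command is recognised, compute the
# speaker's voiceprint and store it as the first user's profile.
stored_profile = {}

def handle_wake_up(audio, transcript, embed, wake_phrase="open the voice system"):
    if wake_phrase not in transcript.lower():
        return False                                 # not the second voice command; stay idle
    stored_profile["voiceprint"] = embed(audio)      # step 102: determine and store the voiceprint
    return True                                      # apparatus enters the working state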
In a possible implementation manner, after step 101, the method may further include:
Step 103: determine and store the information about the first location of the first user according to the second voice command.
The second voice command is the one received in step 101, and the information about the first location refers to the location from which the first user issued the second voice command. The location information stored in step 103 is used in step 104 to determine the voice acquisition strategy.
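The patent does not say how the first location is estimated; one common approach, shown here only as an assumed sketch, is to estimate the direction of arrival from the time delay between two microphones under a far-field assumption (microphone spacing, sample rate, and the cross-correlation delay estimate are all illustrative):

# Sketch of step 103: estimate the wake-up command's direction of arrival from two
# microphone signals, which can then be stored as the first location information.
import numpy as np

def estimate_direction(mic_left, mic_right, sample_rate=16000, mic_spacing=0.1, speed_of_sound=343.0):
    correlation = np.correlate(mic_left, mic_right, mode="full")
    delay_samples = int(np.argmax(correlation)) - (len(mic_right) - 1)
    delay_seconds = delay_samples / sample_rate
    sin_theta = np.clip(delay_seconds * speed_of_sound / mic_spacing, -1.0, 1.0)  # clamp to the valid range
    return float(np.degrees(np.arcsin(sin_theta)))   # angle in degrees, 0 = directly in front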
Note that there is no strict execution order between step 102 and step 103: step 102 may be executed before step 103, step 103 before step 102, or the two may be executed together.
In a possible implementation manner, after step 106, the method may further include:
Determine, according to the first voice command, the information about the position corresponding to the first voice command, and update the information about the first location accordingly. In this way, after the first voice command is received, the stored location information of the first user is updated and the voice acquisition strategy is adjusted, so that subsequent voice commands of the first user can be received more reliably.
In a possible implementation manner, step 106 may specifically be: recognize the first voice command from the voice according to both the pre-stored voiceprint information of the first user and the information about the first location of the first user. Because the recognized voice command is matched against both the voiceprint information and the location information of the first user, the recognition accuracy is higher.
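One way to picture this combined matching, purely as an assumed sketch (the patent does not specify how the two pieces of information are weighted), is a weighted score over voiceprint similarity and closeness to the stored location:

# Sketch of combined matching: mix a segment's voiceprint similarity with how close its
# estimated source position is to the stored first location. The weight alpha is illustrative.
import math

def combined_score(voiceprint_score, segment_position, first_location, alpha=0.7):
    distance = math.dist(segment_position, first_location)
    location_score = 1.0 / (1.0 + distance)          # 1.0 at the stored location, decaying with distance
    return alpha * voiceprint_score + (1.0 - alpha) * location_score

The segment with the highest combined score would be treated as the first voice command, which is why using both sources of information yields higher accuracy than the voiceprint alone.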
In a possible implementation manner, if the received voice contains only a voice command and no noise, it is determined whether the voiceprint information of that voice command matches the pre-stored voiceprint information of the first user. If it matches, the corresponding function is executed according to the voice command; if it does not match, the function is not executed.
A specific example is given below to further describe the speech recognition method. FIG. 2 is a schematic diagram of a speech recognition application scenario provided in the present application.
The speech recognition apparatus may be, for example, a smart device that can be controlled by voice, such as a voice-controlled television, a voice-controlled toy, or a mobile phone, or a chip or a functional module with a speech recognition function in such a smart device. Taking a voice-controlled television as an example: the first user issues a wake-up command (equivalent to the second voice command) to the apparatus at the first location; the wake-up command may be, for example, the spoken phrase "power on". After receiving the wake-up command, the television turns on and determines and stores the voiceprint information of the first user and the information about the first location of the first user according to the wake-up command. Once the voiceprint information of the first user is stored, the television is subsequently controlled by the first user, that is, it executes the corresponding operations according to the first user's received voice commands. Voice commands issued by other users, such as a second user, are treated as noise by the television, because the second user's voiceprint information does not match the voiceprint information of the first user stored in the television.
Further, the voice-controlled television can adjust the voice acquisition strategy so that positions closer to the first location are received with greater intensity.
As an example, suppose the first user moves from the first location to a second location, and second user A and second user B are positioned as shown in FIG. 2. Ordered from nearest to farthest from the first location, the positions are: the location of second user A, the second location of the first user, the location of second user B. According to the voice acquisition strategy, the television's acquisition intensity for these positions is therefore, from largest to smallest: the location of second user A, the second location of the first user, the location of second user B.
Now suppose the first user issues the voice command "change channel" to the television from the second location, second user A issues the voice command "increase volume", second user B issues the voice command "decrease volume", and at the same time there is a car horn at the position shown in FIG. 2. According to the voice acquisition strategy, because the position of the car horn is far from the first location, its acquisition intensity is lower than that of the commands "change channel", "increase volume", and "decrease volume", so its interference with the voice commands is greatly reduced, and the three commands are received clearly by the television. The voice received by the television thus includes the commands "change channel", "increase volume", and "decrease volume". Using the stored voiceprint information of the first user, the television determines that "change channel" is the command issued by the first user, and therefore changes the channel according to that command.
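The scenario can be condensed into a toy simulation; the coordinates, the gain threshold, and the use of plain labels in place of real voiceprints are all illustrative assumptions:

# Toy walk-through of the FIG. 2 scenario: reception gain favours positions near the
# stored first location, and only the command whose voiceprint matches the first user
# is executed.
import math

stored = {"first_location": (1.0, 0.0), "voiceprint": "first_user"}

received = [
    {"command": "change channel",  "voiceprint": "first_user",    "position": (1.2, 0.0)},
    {"command": "increase volume", "voiceprint": "second_user_A", "position": (0.5, 0.0)},
    {"command": "decrease volume", "voiceprint": "second_user_B", "position": (3.0, 0.0)},
    {"command": "<car horn>",      "voiceprint": "none",          "position": (9.0, 0.0)},
]

def gain(position):
    return 1.0 / max(math.dist(position, stored["first_location"]), 0.1)

audible = [r for r in received if gain(r["position"]) > 0.2]   # the distant car horn falls below this threshold
executed = [r["command"] for r in audible if r["voiceprint"] == stored["voiceprint"]]
print(executed)   # ['change channel'] -- only the first user's command is acted on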
Based on this scheme, the voice is received according to the pre-stored information about the first location of the first user, the voice command matching the first user's voiceprint information is recognized from the received voice using the pre-stored voiceprint information, and the function corresponding to that command is executed, so the accuracy of voice recognition in a noisy environment can be improved.
Based on the same inventive concept, FIG. 3 illustrates an exemplary speech recognition apparatus provided in the present application, which can execute the flow of the speech recognition method described above. The apparatus includes:
a voice receiving unit 301, configured to receive voice according to pre-stored information about a first location of a first user, where the voice includes a first voice command of the first user and noise, and the noise includes sound made by persons other than the first user and sound in the external environment that interferes with the first voice command, such as a car horn or wind noise; and
a voice recognition unit 302, configured to recognize the first voice command from the voice according to pre-stored voiceprint information of the first user.
In a possible implementation manner, the apparatus may further include a determining unit 303, configured to determine a voice acquisition strategy according to the pre-stored information about the first location of the first user, where the voice acquisition strategy is: the voice reception intensity at any position within the voice reception range is inversely proportional to a first distance, where the first distance is the distance between that position and the first location, and the voice reception range includes the first location of the first user. The voice receiving unit 301 is specifically configured to receive voice according to the voice acquisition strategy.
In a possible implementation manner, the voice receiving unit 301 is further configured to receive a second voice command of the first user. The apparatus may further comprise a voiceprint recognition unit 304, a sound source localization unit 305 and a storage unit 306, wherein the voiceprint recognition unit 304 is configured to determine voiceprint information of the first user based on the second voice command. The sound source localization unit 305 is configured to determine information of a first location of the first user based on the second voice command. The storage unit 306 is configured to store voiceprint information and/or information of the first location of the first user.
In a possible implementation manner, the sound source positioning unit 305 may be further configured to determine, according to the first voice command, information of a position corresponding to the first voice command. The storage unit 306 may be further configured to update the information of the first location according to the information of the location corresponding to the first voice command.
In a possible implementation manner, the voice recognition unit 302 is specifically configured to recognize the first voice command from a voice according to pre-stored voiceprint information of the first user and information of the first location of the first user.
In a possible implementation manner, if the voice received by the voice receiving unit 301 only includes a voice command and does not include noise, the voice recognition unit 302 determines whether the voiceprint information of the voice command matches with the pre-stored voiceprint information of the first user.
For the concepts, explanations, detailed descriptions, and other steps related to the above apparatus and the technical solutions provided in the present application, refer to the description of the speech recognition method above or the related descriptions in other embodiments; details are not repeated here.
Based on the same concept as the above embodiments, the present application also provides a network device.
FIG. 4 is a schematic structural diagram of a network device provided in the present application. As shown in FIG. 4, the network device 400 includes:
a memory 401 for storing program instructions;
a processor 402 for calling the program instructions stored in the memory and executing the speech recognition method in any of the foregoing embodiments according to the obtained program.
Based on the same concept as the above embodiments, the present application also provides a computer storage medium storing computer-executable instructions for causing a computer to perform the speech recognition method described in any of the foregoing embodiments.
It should be noted that the division of the units in the present application is schematic, and is only one division of logic functions, and there may be another division manner in actual implementation. In the present application, each functional unit may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
As will be appreciated by one skilled in the art, the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (8)

1. A speech recognition method, comprising:
receiving voice according to pre-stored information of a first location of a first user, wherein the voice comprises a first voice command of the first user and noise;
recognizing the first voice command from the voice according to prestored voiceprint information of the first user;
the receiving voice according to the pre-stored information of the first location of the first user includes:
determining a voice acquisition strategy according to the pre-stored information of the first location of the first user, wherein the voice acquisition strategy is: the voice reception intensity at any location within a voice reception range is inversely proportional to a first distance between that location and the first location of the first user, and the voice reception range includes the first location of the first user;
and receiving voice according to the voice acquisition strategy.
2. The method of claim 1, further comprising, prior to receiving the voice:
receiving a second voice command of the first user;
and determining and storing the voiceprint information of the first user and/or the information of the first position of the first user according to the second voice command.
3. The method of claim 2, wherein the method further comprises:
determining information of a position corresponding to the first voice command;
and updating the information of the first position according to the information of the position corresponding to the first voice command.
4. The method of claim 1, wherein the recognizing the first voice command from the voice according to the pre-stored voiceprint information of the first user comprises:
and recognizing the first voice command from the voice according to pre-stored voiceprint information of the first user and information of a first position of the first user.
5. A speech recognition apparatus, comprising:
the voice receiving unit is used for receiving voice according to pre-stored information of a first location of a first user, wherein the voice comprises a first voice command of the first user and noise;
the voice recognition unit is used for recognizing the first voice command from the voice according to pre-stored voiceprint information of the first user;
the apparatus further comprises a determining unit, wherein the determining unit is used for determining a voice acquisition strategy according to the pre-stored information of the first location of the first user, and the voice acquisition strategy is: the voice reception intensity at any location within a voice reception range is inversely proportional to a first distance between that location and the first location of the first user, and the voice reception range includes the first location of the first user;
the voice receiving unit is specifically configured to receive a voice according to the voice acquisition policy.
6. The apparatus of claim 5, wherein the voice receiving unit is further configured to receive a second voice command of the first user;
the apparatus further comprises a voiceprint recognition unit, wherein the voiceprint recognition unit is used for determining the voiceprint information of the first user according to the second voice command;
the apparatus further comprises a sound source localization unit, which is used for determining the information of the first location of the first user according to the second voice command;
the apparatus further comprises a storage unit, which is used for storing the voiceprint information and/or the information of the first location of the first user.
7. The apparatus of claim 6, wherein the sound source localization unit is further configured to determine information of a location corresponding to the first voice command;
the storage unit is further configured to update the information of the first location according to the information of the location corresponding to the first voice command.
8. The apparatus according to claim 5, wherein the speech recognition unit is specifically configured to recognize the first voice command from the speech according to pre-stored voiceprint information of the first user and information of the first location of the first user.
CN201811306260.6A 2018-11-05 2018-11-05 Voice recognition method and device Active CN109389978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811306260.6A CN109389978B (en) 2018-11-05 2018-11-05 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811306260.6A CN109389978B (en) 2018-11-05 2018-11-05 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN109389978A CN109389978A (en) 2019-02-26
CN109389978B (en) 2020-11-03

Family

ID=65428519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811306260.6A Active CN109389978B (en) 2018-11-05 2018-11-05 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN109389978B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110082726B (en) * 2019-04-10 2021-08-10 北京梧桐车联科技有限责任公司 Sound source positioning method and device, positioning equipment and storage medium
CN110515449B (en) * 2019-08-30 2021-06-04 北京安云世纪科技有限公司 Method and device for awakening intelligent equipment
CN115943689A (en) * 2020-06-22 2023-04-07 高通股份有限公司 Speech or speech recognition in noisy environments
CN111899733A (en) * 2020-07-02 2020-11-06 北京如影智能科技有限公司 Method and device for determining volume

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510425A (en) * 2008-02-15 2009-08-19 株式会社东芝 Voice recognition apparatus and method for performing voice recognition
CN103941686A (en) * 2014-04-14 2014-07-23 美的集团股份有限公司 Voice control method and system
CN104978958A (en) * 2014-04-14 2015-10-14 美的集团股份有限公司 Voice control method and system
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN106773742A (en) * 2015-11-23 2017-05-31 宏碁股份有限公司 Sound control method and speech control system
CN108074583A (en) * 2016-11-14 2018-05-25 株式会社日立制作所 sound signal processing system and device
CN106448658A (en) * 2016-11-17 2017-02-22 海信集团有限公司 Voice control method of intelligent home equipment, as well as intelligent home gateway
JP2018133661A (en) * 2017-02-14 2018-08-23 株式会社Nttドコモ Terminal device, program, communication system, and analyzer
CN108682414A (en) * 2018-04-20 2018-10-19 深圳小祺智能科技有限公司 Sound control method, voice system, equipment and storage medium

Also Published As

Publication number Publication date
CN109389978A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109389978B (en) Voice recognition method and device
US11393472B2 (en) Method and apparatus for executing voice command in electronic device
US11114099B2 (en) Method of providing voice command and electronic device supporting the same
CN105210146B (en) Method and apparatus for controlling voice activation
CN107408386B (en) Electronic device is controlled based on voice direction
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US9601107B2 (en) Speech recognition system, recognition dictionary registration system, and acoustic model identifier series generation apparatus
CN104978958A (en) Voice control method and system
KR20160055839A (en) Method and apparatus for controlling access to applications
TW201218023A (en) Efficient gesture processing
CN110310657B (en) Audio data processing method and device
CN205508398U (en) Intelligent robot with high in clouds interactive function
CN109712610A (en) The method and apparatus of voice for identification
US12131738B2 (en) Electronic apparatus and method for controlling thereof
CN111344781A (en) Audio processing
WO2018071123A1 (en) Device to perform secure biometric authentication
CN114005436A (en) Method, device and storage medium for determining voice endpoint
CN105282658A (en) Control method, system and device for audio playing equipment
US20240071392A1 (en) Upgrade method, upgrade apparatus, and electronic device
US20230080895A1 (en) Dynamic operation of a voice controlled device
US20200168225A1 (en) Information processing apparatus and information processing method
CN109671444B (en) Voice processing method and device
CN117198282A (en) Voice control method, device, robot and computer readable storage medium
CN114648987A (en) Speech recognition method, device, equipment and computer readable storage medium
CN106971716A (en) A kind of robot noise database updates and speech recognition equipment, method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant