CN112133296B

CN112133296B - Full duplex voice control method and device, storage medium and voice equipment

Info

Publication number: CN112133296B
Application number: CN202010881215.4A
Authority: CN
Inventors: 陈士勇
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2024-05-21
Anticipated expiration: 2040-08-27
Also published as: CN112133296A

Abstract

The disclosure relates to a full duplex voice control method and device, a storage medium and voice equipment, which address the technical problem that, in the related art, full duplex voice interaction is easily affected by environmental factors, causing false recognition and false execution. The method comprises the following steps: when the voice equipment is in a sound reception state, collecting biometric information of a target object in response to a voice instruction issued by the target object; when the biometric information matches preset feature information, acquiring pronunciation direction information of the target object; determining, according to the pronunciation direction information, whether the pronunciation direction of the target object faces the voice equipment; when the pronunciation direction of the target object faces the voice equipment, executing the operation corresponding to the voice instruction and extending the sound reception duration of the voice equipment; and when the pronunciation direction of the target object does not face the voice equipment, discarding the operation corresponding to the voice instruction and shortening the sound reception duration of the voice equipment.

Description

Full duplex voice control method and device, storage medium and voice equipment

Technical Field

The disclosure relates to the technical field of voice interaction, and in particular relates to a full duplex voice control method, a full duplex voice control device, a storage medium and voice equipment.

Background

Voice interaction has become an indispensable mode of interaction in the home: with a single sentence, a user can turn a device on, change the television channel, and so on. Improving the experience and making voice interaction more natural is therefore a topic that users care about, and full duplex voice is one direction toward more natural voice interaction.

In the related art, full duplex voice works by keeping the microphone open at all times to receive sound, or by extending the sound reception time within a certain sound reception period. Both approaches are easily affected by environmental factors, which leads to false recognition and false execution.

Disclosure of Invention

In order to overcome the technical problems in the related art, the present disclosure provides a full duplex voice control method, a device, a storage medium and a voice apparatus.

According to a first aspect of an embodiment of the present disclosure, there is provided a full duplex voice control method, including:

when the voice equipment is in a sound reception state, collecting biometric information of a target object in response to a voice instruction issued by the target object;

acquiring pronunciation direction information of the target object when the biometric information matches preset feature information;

determining, according to the pronunciation direction information, whether the pronunciation direction of the target object faces the voice equipment;

executing the operation corresponding to the voice instruction and extending the sound reception duration of the voice equipment when the pronunciation direction of the target object faces the voice equipment;

and discarding the operation corresponding to the voice instruction and shortening the sound reception duration of the voice equipment when the pronunciation direction of the target object does not face the voice equipment.

Optionally, the acquiring the biometric information of the target object includes:

and collecting voiceprint information of the target object according to the voice command.

Optionally, the acquiring the pronunciation direction information of the target object includes:

acquiring image information of the target object through a camera, and determining face characteristic information and mouth shape characteristic information of the target object according to the image information;

and determining the face orientation of the target object according to the image information, wherein the pronunciation direction information comprises the face orientation.

And collecting the face characteristic information and the mouth shape characteristic information of the target object.

acquiring the collected face feature information and mouth shape feature information of the target object;

Optionally, the extending of the sound reception duration of the voice equipment includes:

extending the sound reception duration according to a preset growth gradient, wherein the growth gradient comprises a plurality of growth ratios, and each growth ratio is larger than the one applied before it;

the shortening of the sound reception duration of the voice equipment includes:

shortening the sound reception duration according to a preset shortening gradient, wherein the shortening gradient comprises a plurality of shortening ratios, and each shortening ratio is larger than the one applied before it.

Optionally, after shortening the sound reception duration of the voice equipment, the method further includes:

controlling the voice equipment to stop receiving sound when the shortened sound reception duration is less than a preset shortest sound reception duration threshold.

According to a second aspect of embodiments of the present disclosure, there is provided a full duplex voice control apparatus, comprising:

The first information acquisition module is configured to collect biometric information of a target object in response to receiving a voice instruction issued by the target object when the voice equipment is in a sound reception state;

The second information acquisition module is configured to acquire the pronunciation direction information of the target object under the condition that the biological characteristic information is matched with the preset characteristic information;

a judging module configured to determine whether a sound emitting direction of the target object is toward the voice device according to the sound emitting direction information;

The first execution module is configured to execute the operation corresponding to the voice instruction and extend the sound reception duration of the voice equipment when the pronunciation direction of the target object faces the voice equipment;

and the second execution module is configured to discard the operation corresponding to the voice instruction and shorten the sound receiving time of the voice equipment under the condition that the sound emitting direction of the target object is not towards the voice equipment.

Optionally, the first information acquisition module is configured to acquire voiceprint information of the target object according to the voice instruction.

Optionally, the second information acquisition module is configured to acquire image information of the target object through a camera, and determine face feature information and mouth shape feature information of the target object according to the image information;

Optionally, the first information acquisition module is configured to acquire image information of the target object through a camera, and determine face feature information and mouth shape feature information of the target object according to the image information;

Optionally, the second information acquisition module is configured to obtain the collected face feature information and mouth shape feature information of the target object;

Optionally, the first execution module is configured to extend the sound reception duration according to a preset growth gradient, the growth gradient including a plurality of growth ratios, each growth ratio being greater than the one applied before it;

The second execution module is configured to shorten the sound reception duration according to a preset shortening gradient, the shortening gradient including a plurality of shortening ratios, each shortening ratio being greater than the one applied before it.

Optionally, the apparatus further includes a first sound reception control module configured to control the voice device to stop sound reception if the shortened sound reception duration is less than a preset shortest sound reception duration threshold.

According to a third aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the full duplex speech control method provided by the first aspect of the present disclosure.

According to a fourth aspect of embodiments of the present disclosure, there is provided a full duplex voice control apparatus, comprising:

A processor;

a memory for storing processor-executable instructions;

Wherein the processor is configured to:

According to a fifth aspect of embodiments of the present disclosure, there is provided a voice device comprising the full duplex voice control apparatus provided in the second aspect of the present disclosure.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. By identifying the biometric information of the target object that issues the voice instruction, the voice equipment responds, during a continuous dialogue, only to voice instructions from the user designated by the preset feature information (for example, the person who woke up the voice equipment), which reduces the probability of false recognition and false execution. In addition, when a voice instruction is issued by a user other than the one designated by the preset feature information, the voice equipment can shorten the sound reception duration. This reduces how long the voice equipment receives sound when there is excessive environmental noise and further reduces the probability of false recognition and false execution.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flow chart illustrating a full duplex voice control method according to an exemplary embodiment.

Fig. 2 is another flow chart illustrating a full duplex voice control method according to an exemplary embodiment.

Fig. 3 is another flow chart illustrating a full duplex voice control method according to an exemplary embodiment.

Fig. 4 is a block diagram illustrating a full duplex voice control apparatus according to an exemplary embodiment.

Fig. 5 is another block diagram of a full duplex voice control apparatus according to an exemplary embodiment.

Fig. 6 is another block diagram of a full duplex voice control apparatus according to an exemplary embodiment.

Fig. 7 is a block diagram illustrating a full duplex voice control apparatus according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

Fig. 1 is a flowchart illustrating a full duplex voice control method according to an exemplary embodiment, where the full duplex voice control method may be used in, for example, a voice device, such as a mobile terminal, a smart speaker, a voice tv, etc., which is not limited in this disclosure. As shown in fig. 1, the method comprises the steps of:

In step S110, when the voice device is in a sound reception state, biometric information of a target object is collected in response to receiving a voice instruction issued by the target object;

in step S120, when the biometric information matches preset feature information, pronunciation direction information of the target object is acquired;

in step S130, whether the pronunciation direction of the target object faces the voice device is determined according to the pronunciation direction information;

in step S140, when the pronunciation direction of the target object faces the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration of the voice device is extended;

in step S150, when the pronunciation direction of the target object does not face the voice device, the operation corresponding to the voice instruction is discarded and the sound reception duration of the voice device is shortened.

Alternatively, the biometric information may include different types of feature information, such as voiceprint information, facial information, mouth shape feature information, and the like;

The preset feature information may include preset voiceprint information, preset face information, preset mouth shape feature information, etc. corresponding to the biometric information.

In this embodiment, voiceprint recognition, face recognition and mouth shape recognition of the target object are used to determine that the target object is the user designated by the preset feature information (for example, the wake-up person of the voice device), and the pronunciation direction of the target object is then used to determine whether the target object is issuing an instruction to the voice device. When the pronunciation direction of the target object faces the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration of the voice device is extended. When the pronunciation direction does not face the voice device (for example, when the target object is talking with other users), the operation corresponding to the voice instruction is discarded and the sound reception duration is shortened. The voice device thus receives sound intelligently: it receives sound for less time when there is excessive environmental noise, responds only to instructions initiated by the user designated by the preset feature information, and is less likely to misrecognize and wrongly execute instructions during the interaction. Combining voiceprint recognition, face recognition and mouth shape recognition with the judgment of the sound reception time window allows the voice device, once woken up, to hold a continuous dialogue with the user designated by the preset feature information, which improves the voice interaction experience.
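The decision flow of steps S110 to S150 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `Action`, `Observation` and `decide` are hypothetical, the recognizers that produce the two boolean inputs are out of scope, and the handling of a non-matching speaker is left to the caller because the steps above do not spell it out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NOT_RECEIVING = "not_receiving"              # device is not in the sound reception state
    NOT_DESIGNATED_USER = "not_designated_user"  # biometric mismatch; handling left to caller
    EXECUTE_AND_EXTEND = "execute_and_extend"    # S140
    DISCARD_AND_SHORTEN = "discard_and_shorten"  # S150


@dataclass
class Observation:
    """Hypothetical per-instruction inputs produced by upstream recognizers."""
    biometric_matches_preset: bool  # voiceprint / face / mouth-shape match result
    facing_device: bool             # pronunciation-direction check result


def decide(receiving: bool, obs: Observation) -> Action:
    # S110: instructions are only considered in the sound reception state.
    if not receiving:
        return Action.NOT_RECEIVING
    # S120: continue only if the biometric information matches the preset.
    if not obs.biometric_matches_preset:
        return Action.NOT_DESIGNATED_USER
    # S130: branch on whether the speaker faces the device.
    if obs.facing_device:
        return Action.EXECUTE_AND_EXTEND   # S140
    return Action.DISCARD_AND_SHORTEN      # S150
```

Keeping the decision separate from the recognizers means the same branch logic applies whichever biometric mode produced the match.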

Alternatively, the collection of the biometric information of the target object in step S110 may be achieved by:

Optionally, the collecting of the biometric information of the target object in step S110 may be achieved by:

The preset feature information comprises preset voiceprint information, face feature information and mouth shape feature information of the wake-up person of the voice equipment.

Optionally, in step S120, when the biometric information matches with the preset feature information, the acquiring the pronunciation direction information of the target object may be implemented by the following steps:

Optionally, in step S120, when the biometric information matches with the preset feature information, the acquiring the pronunciation direction information of the target object may further be implemented by the following steps:

In this embodiment, when the target object is outside the acquisition range of the camera, voiceprint recognition is used to determine that the target object is the wake-up person of the voice equipment; when the target object is within the acquisition range of the camera, face recognition and mouth shape recognition are used instead. In either case the voice device responds only to instructions initiated by the wake-up person. Because the identity of the target object is verified in the manner that suits the actual situation of the voice equipment and the target object in the current scene, identity verification of the wake-up person is more flexible, and other objects are prevented from interfering with the interaction between the voice equipment and the wake-up person.
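The choice of verification mode described above can be sketched as a single function. The function name and its boolean parameters are hypothetical; each flag stands for the result of a recognizer that is assumed to exist elsewhere.

```python
def is_wake_up_person(
    in_camera_range: bool,
    voiceprint_ok: bool,
    face_ok: bool,
    mouth_shape_ok: bool,
) -> bool:
    """Pick the verification mode from the scene, as the paragraph above describes."""
    if in_camera_range:
        # Within camera range: rely on face and mouth-shape recognition.
        return face_ok and mouth_shape_ok
    # Outside camera range: fall back to voiceprint recognition.
    return voiceprint_ok
```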

Further, when the target object is issuing an instruction to the voice equipment, the acquired pronunciation direction of the target object faces the voice equipment; the voice equipment then executes the operation corresponding to the voice instruction currently issued by the target object and extends its sound reception duration. When the target object is talking with other objects, the acquired face orientation of the target object does not face the voice equipment; the voice equipment then discards the operation corresponding to the voice instruction currently issued by the target object and shortens its sound reception duration. The voice equipment can thus identify whether the target object is actually giving it an instruction, which avoids misrecognition and misoperation based on what the target object says to other objects.

Optionally, in step S140, the sound reception duration of the voice device may be extended according to a preset growth gradient, where the growth gradient includes a plurality of growth ratios and each growth ratio is greater than the one applied before it;

in step S150, the sound reception duration of the voice device may be shortened according to a preset shortening gradient, where the shortening gradient includes a plurality of shortening ratios and each shortening ratio is greater than the one applied before it.

The preset growth gradient and the preset shortening gradient can be set in advance according to the specific circumstances of the human-computer interaction, and are not specifically limited in this disclosure.

For example, in this embodiment, the preset growth gradient is 5%, 10%, 15%, 30%, the preset shortening gradient is 5%, 10%, 15%, 30%, and the initial sound reception duration of the voice device is 10 s.

When an instruction from the target object is received for the first time and the pronunciation direction of the target object is determined to face the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration of the voice device is extended to 10.5 s. When an instruction from the target object is received for the second time and the pronunciation direction is again determined to face the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration is extended to 11.5 s. When an instruction from the target object is received for the third time and the pronunciation direction does not face the voice device, the operation corresponding to the voice instruction is discarded and the sound reception duration is shortened to 11 s.
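The figures in this example (10 s, 10.5 s, 11.5 s, 11 s) can be reproduced with the sketch below. It rests on assumptions that the text does not state: each ratio is taken as a percentage of the initial duration, growth and shortening keep separate step counters, and the last ratio is reused once a gradient is exhausted. The class name `ReceptionWindow` is hypothetical, and durations are held in integer milliseconds to avoid rounding.

```python
GROWTH_PERCENT = (5, 10, 15, 30)   # preset growth gradient from the example
SHORTEN_PERCENT = (5, 10, 15, 30)  # preset shortening gradient from the example


class ReceptionWindow:
    """Tracks the sound reception duration in integer milliseconds."""

    def __init__(self, initial_ms: int = 10_000) -> None:
        self.initial_ms = initial_ms
        self.duration_ms = initial_ms
        self._growth_step = 0
        self._shorten_step = 0

    @staticmethod
    def _ratio(gradient, step):
        # Assumption: reuse the last ratio once the gradient is exhausted.
        return gradient[min(step, len(gradient) - 1)]

    def extend(self) -> int:
        pct = self._ratio(GROWTH_PERCENT, self._growth_step)
        self._growth_step += 1
        # Assumption: the ratio is applied to the initial duration.
        self.duration_ms += self.initial_ms * pct // 100
        return self.duration_ms

    def shorten(self) -> int:
        pct = self._ratio(SHORTEN_PERCENT, self._shorten_step)
        self._shorten_step += 1
        self.duration_ms = max(0, self.duration_ms - self.initial_ms * pct // 100)
        return self.duration_ms
```

Calling `extend()`, `extend()`, `shorten()` on a fresh `ReceptionWindow()` returns 10500, 11500 and 11000 ms in turn. If the ratios were instead compounded on the current duration, the second step would give 11.55 s rather than the 11.5 s stated in the example.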

In step S140, the sound reception duration of the voice device may alternatively be extended to a first preset sound reception duration;

in step S150, the sound reception duration of the voice device may alternatively be shortened to a second preset sound reception duration;

the first preset sound reception duration is longer than the second preset sound reception duration.

The first preset sound reception duration and the second preset sound reception duration can be set in advance according to the specific circumstances of the human-computer interaction, and are not specifically limited in this disclosure.

For example, in this embodiment, the first preset sound reception duration is 30s, the second preset sound reception duration is 8s, and the initial sound reception duration of the voice device is 10s.

When the pronunciation direction of the target object is determined to face the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration of the voice device is extended to 30 s; when the pronunciation direction of the target object does not face the voice device, the operation corresponding to the voice instruction is discarded and the sound reception duration of the voice device is shortened to 8 s.
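The fixed-duration variant reduces to a two-way selection. The sketch below uses the 30 s and 8 s values from the example; the constant and function names are hypothetical.

```python
FIRST_PRESET_S = 30   # duration used when the speaker faces the device
SECOND_PRESET_S = 8   # duration used when the speaker does not face the device


def next_reception_duration_s(facing_device: bool) -> int:
    """Return the new sound reception duration in seconds."""
    return FIRST_PRESET_S if facing_device else SECOND_PRESET_S
```

Unlike the gradient variant, the result here does not depend on how many instructions have already been received.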

According to the voice interaction method and device, the sound receiving duration of the voice device is adjusted according to whether the sound emitting direction of the target object faces the voice device or not, so that the voice device can more intelligently receive sound, a voice command sent by the target object can be responded more flexibly, and the voice interaction experience is improved.

Fig. 2 is another flowchart of a full duplex voice control method according to an exemplary embodiment. The method may be used in a voice device such as a mobile terminal, a smart speaker or a voice-enabled television, which is not limited in this disclosure. As shown in fig. 2, the method includes the following steps:

In step S160, if the shortened sound reception duration is less than the preset shortest sound reception duration threshold, the voice device is controlled to stop sound reception.

In step S160, the shortest sound reception duration threshold may be the shortest duration in which the voice device can collect a valid voice instruction. It may be set according to the capability of the voice device, or preset according to the specific circumstances of the human-computer interaction, which is not specifically limited in this disclosure.

For example, if the sound reception duration is shortened to 1.5 s after step S150 is executed and this value is below the shortest sound reception duration threshold, the voice device may be directly controlled to stop receiving sound.

In this embodiment, defining the shortest sound reception duration threshold prevents the voice device from continuing to receive sound when the sound reception duration has become too short to capture a complete instruction from the target object, which saves power.
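The check in step S160 can be sketched as a strict comparison. The 2000 ms threshold below is a purely hypothetical value chosen for illustration, since the text gives only the shortened duration (1.5 s) and not the threshold itself.

```python
MIN_RECEPTION_MS = 2_000  # hypothetical threshold; the disclosure does not fix a value


def should_stop_receiving(shortened_ms: int, min_ms: int = MIN_RECEPTION_MS) -> bool:
    """S160: stop when the shortened duration is strictly below the threshold."""
    return shortened_ms < min_ms
```

With this assumed threshold, `should_stop_receiving(1500)` is true, while a duration exactly equal to the threshold keeps the device receiving.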

Fig. 3 is another flowchart of a full duplex voice control method according to an exemplary embodiment. The method may be used in a voice device such as a mobile terminal, a smart speaker or a voice-enabled television, which is not limited in this disclosure. As shown in fig. 3, the method includes the following steps:

In step S170, at the end time of the sound reception duration, the voice device is controlled to stop sound reception.

For example, in this embodiment, the sound reception duration of the voice device may be 10 s; when the remaining time reaches 0 s, the voice device is controlled to stop receiving sound.

In this embodiment, at the end time of the sound reception duration, the voice device is controlled to stop sound reception, so that the voice device can be prevented from still receiving sound after the end time of the sound reception duration, and the power consumption of the voice device is saved.
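One way to realize step S170 is a deadline that is compared against a clock. The sketch below is an assumption about how such a timer might look, not the disclosed mechanism; `ReceptionTimer` is a hypothetical name, and the current time is passed in explicitly so the logic can be exercised without waiting.

```python
class ReceptionTimer:
    """Deadline-based timer for the sound reception window (times in seconds)."""

    def __init__(self, duration_s: float, now: float) -> None:
        self.deadline = now + duration_s

    def remaining(self, now: float) -> float:
        # Never report a negative remaining time.
        return max(0.0, self.deadline - now)

    def expired(self, now: float) -> bool:
        # S170: at or after the end time, the device should stop receiving sound.
        return now >= self.deadline
```

On a device, the `now` argument would typically come from a monotonic clock such as Python's `time.monotonic()`, so that wall-clock adjustments do not move the deadline.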

Fig. 4 is a block diagram illustrating a full duplex voice control apparatus according to an exemplary embodiment. The apparatus may be implemented as part or all of a voice device through software, hardware, or a combination of both. As shown in fig. 4, the full duplex voice control apparatus 400 includes:

A first information acquisition module 401 configured to acquire biometric information of a target object in response to receiving a voice instruction sent by the target object in a case where the voice device is in a sound reception state;

A second information obtaining module 402 configured to obtain pronunciation direction information of the target object if the biometric information matches with preset feature information;

A judging module 403 configured to determine whether the pronunciation direction of the target object is toward the speech device according to the pronunciation direction information;

The first execution module 404 is configured to execute the operation corresponding to the voice instruction and extend the sound reception duration of the voice device when the pronunciation direction of the target object faces the voice device;

The second execution module 405 is configured to discard the operation corresponding to the voice instruction and shorten the sound reception duration of the voice device when the sound emitting direction of the target object is not towards the voice device.

In this embodiment, the full duplex voice control apparatus determines, according to the biometric information of the target object, that the target object is the user designated by the preset feature information (for example, the wake-up person of the voice device), and then determines whether the target object is issuing an instruction to the voice device according to the pronunciation direction of the target object. When the pronunciation direction of the target object faces the voice device, the operation corresponding to the voice instruction is executed and the sound reception duration of the voice device is extended. When the pronunciation direction does not face the voice device (for example, when the target object is talking with other users), the operation corresponding to the voice instruction is discarded and the sound reception duration is shortened. The voice device thus receives sound intelligently: it receives sound for less time when there is excessive environmental noise and responds only to instructions initiated by the user designated by the preset feature information, which avoids false recognition and false execution and improves the voice interaction experience during a conversation.

Optionally, the first information obtaining module 401 may specifically collect voiceprint information of the target object according to the voice command.

Optionally, the first information obtaining module 401 may be further specifically configured to obtain image information of the target object through a camera, and determine facial feature information and mouth shape feature information of the target object according to the image information;

Optionally, the second information obtaining module 402 may specifically be configured to collect image information of the target object through a camera, and determine face feature information and mouth shape feature information of the target object according to the image information;

Optionally, the second information obtaining module 402 may be further specifically configured to obtain the collected face feature information and mouth shape feature information of the target object;

Optionally, the first execution module 404 may specifically extend the sound reception duration according to a preset growth gradient, where the growth gradient includes a plurality of growth ratios and each growth ratio is greater than the one applied before it.

Optionally, the second execution module 405 may specifically shorten the sound reception duration according to a preset shortening gradient, where the shortening gradient includes a plurality of shortening ratios and each shortening ratio is greater than the one applied before it.

Fig. 5 is another block diagram of a full duplex voice control apparatus according to an exemplary embodiment. The apparatus may be implemented as part or all of a voice device through software, hardware, or a combination of both. As shown in fig. 5, the full duplex voice control apparatus 400 may further include:

The first sound reception control module 406 is configured to control the voice device to stop sound reception if the shortened sound reception duration is less than a preset shortest sound reception duration threshold.

By applying the shortest sound reception duration threshold, the first sound reception control module 406 stops sound reception once the sound reception duration of the voice device has become too short to receive a complete instruction from the target object.

Fig. 6 is another block diagram of a full duplex voice control apparatus according to an exemplary embodiment. The apparatus may be implemented as part or all of a voice device through software, hardware, or a combination of both. As shown in fig. 6, the full duplex voice control apparatus 400 may further include:

The second sound reception control module 407 is configured to control the voice device to stop sound reception at the end time of the sound reception duration.

The second sound reception control module 407 controls the voice device to stop receiving sound at the time when the sound reception time is ended, so that the voice device can be prevented from receiving sound after the sound reception time is ended.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the full duplex speech control method provided by the present disclosure.

In particular, the computer readable storage medium may be a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, etc.

With respect to the computer-readable storage medium in the above-described embodiments, the steps of the method when the computer program stored thereon is executed will be described in detail in the embodiments regarding the method, and will not be described in detail here.

The present disclosure also provides a full duplex voice control apparatus, which may be a computer, a platform device, etc., including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

The full duplex voice control apparatus recognizes the biometric information of the target object to determine that the target object is the user designated by the preset feature information (for example, the person who woke up the voice device), and then determines from the pronunciation direction of the target object whether the target object is issuing an instruction to the voice device. When the pronunciation direction faces the voice device, the apparatus executes the operation corresponding to the voice instruction and extends the sound reception duration of the voice device. When the pronunciation direction does not face the voice device (for example, when the target object is talking with other users), the apparatus discards the operation corresponding to the voice instruction and shortens the sound reception duration of the voice device. The voice device thereby receives sound intelligently: the sound reception duration is reduced when environmental noise is excessive, only instructions issued by the user designated by the preset feature information are responded to, false recognition and execution are avoided, and the voice interaction experience is improved.
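The decision flow described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the disclosure: the names `Decision` and `handle_instruction`, the fixed 50% adjustment ratios, and the choice to leave the duration unchanged when the biometric information does not match are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    execute: bool        # whether the operation for the voice instruction is executed
    duration_s: float    # sound reception duration after the decision


def handle_instruction(biometric_matches: bool, facing_device: bool,
                       duration_s: float,
                       extend_ratio: float = 0.5,
                       shorten_ratio: float = 0.5) -> Decision:
    if not biometric_matches:
        # Assumption: a non-matching speaker is ignored and the duration is kept.
        return Decision(execute=False, duration_s=duration_s)
    if facing_device:
        # Pronunciation direction faces the device: execute and extend.
        return Decision(execute=True, duration_s=duration_s * (1 + extend_ratio))
    # Pronunciation direction does not face the device: discard and shorten.
    return Decision(execute=False, duration_s=duration_s * (1 - shorten_ratio))
```

Keeping the biometric check ahead of the direction check mirrors the order in the text, where the pronunciation direction is only evaluated for the designated user.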

Fig. 7 is a block diagram illustrating a full duplex voice control apparatus 800 according to an exemplary embodiment. As shown in Fig. 7, the full duplex voice control apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls overall operations of the apparatus 800, such as operations associated with camera operations and interactive recording operations, among others. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the full duplex voice control method described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, as well as the voiceprint information, facial feature information, mouth shape feature information, etc. of the wake-up person. The memory 804 may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power component 806 provides power to the various components of the apparatus 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 800.

In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may acquire image information of the target object. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive voice instructions from a target object when the apparatus 800 is in an operational mode, such as a voice recognition mode. The received voice instructions may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals in response to voice instructions.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, buttons, etc. These buttons may include, but are not limited to: volume button, start button and lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the relative positioning of the apparatus 800 and the target object. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.

The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the full duplex voice control method described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the full duplex voice control method described above. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

In another exemplary embodiment, there is also provided a voice device including the above-described full duplex voice control apparatus.

The voice device recognizes the biometric information of the target object to determine that the target object is the user designated by the preset feature information (for example, the person who woke up the voice device), and then determines from the pronunciation direction of the target object whether the target object is issuing an instruction to the voice device. When the pronunciation direction faces the voice device, the voice device executes the operation corresponding to the voice instruction and extends its sound reception duration. When the pronunciation direction does not face the voice device (for example, when the target object is talking with other users), the voice device discards the operation corresponding to the voice instruction and shortens its sound reception duration. The voice device thereby receives sound intelligently: the sound reception duration is reduced when environmental noise is excessive, only instructions issued by the user designated by the preset feature information are responded to, false recognition and execution are avoided, and the voice interaction experience during the conversation is improved.
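The disclosure further specifies that the duration is extended according to a preset growth gradient and shortened according to a preset shortening gradient, each comprising several proportions in which a later proportion is larger than the previous one. A minimal sketch of such a schedule follows. The proportion values, the names `extend` and `shorten`, the multiplicative application of each proportion, and the reuse of the last proportion once the gradient is exhausted are all assumptions made for this example.

```python
# Hypothetical gradients: each proportion is larger than the one before it.
GROWTH_GRADIENT = (0.10, 0.20, 0.40)
SHORTEN_GRADIENT = (0.10, 0.20, 0.40)


def _ratio(gradient, step: int) -> float:
    """Pick the proportion for the given step, reusing the last one when exhausted."""
    return gradient[min(step, len(gradient) - 1)]


def extend(duration_s: float, step: int) -> float:
    """Extend the duration by the growth proportion for this step (0-based)."""
    return duration_s * (1 + _ratio(GROWTH_GRADIENT, step))


def shorten(duration_s: float, step: int) -> float:
    """Shorten the duration by the shortening proportion for this step (0-based)."""
    return duration_s * (1 - _ratio(SHORTEN_GRADIENT, step))
```

Here `step` counts how many times the duration has already been adjusted in the same direction, so repeated confirmations that the user is addressing the device lengthen the window progressively faster, and repeated indications to the contrary shrink it progressively faster.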

Optionally, the voice device may be a speaker, an air conditioner, a television, or the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of full duplex voice control, the method comprising:

discarding the operation corresponding to the voice instruction and shortening the radio duration of the voice equipment under the condition that the pronunciation direction of the target object does not face the voice equipment;

the prolonging of the radio duration of the voice equipment comprises: extending the radio duration according to a preset growth gradient, wherein the growth gradient comprises a plurality of growth proportions, and each later growth proportion is larger than the preceding one;

the shortening of the radio duration of the voice equipment comprises: shortening the radio duration according to a preset shortening gradient, wherein the shortening gradient comprises a plurality of shortening ratios, and each later shortening ratio is larger than the preceding one.

2. The method of claim 1, wherein the acquiring biometric information of the target object comprises:

3. The method of claim 1, wherein the obtaining the pronunciation direction information of the target object comprises:

4. The method of claim 1, wherein the acquiring biometric information of the target object comprises:

5. The method of claim 4, wherein the obtaining the pronunciation direction information of the target object comprises:

6. The method of any of claims 1-5, wherein after shortening the radio duration of the voice device, the method further comprises:

7. A full duplex voice control apparatus, the apparatus comprising:

the second execution module is configured to discard the operation corresponding to the voice instruction and shorten the radio duration of the voice equipment under the condition that the pronunciation direction of the target object is not towards the voice equipment;

the first execution module is configured to extend the radio duration according to a preset growth gradient, the growth gradient including a plurality of growth proportions, and each later growth proportion being greater than the preceding one;

8. The apparatus of claim 7, wherein the first information acquisition module is configured to acquire voiceprint information of the target object in accordance with the voice instruction.

9. The apparatus of claim 7, wherein the second information acquisition module is configured to acquire image information of the target object through a camera, and determine facial feature information and mouth shape feature information of the target object according to the image information;

10. The apparatus of claim 7, wherein the first information acquisition module is configured to acquire image information of the target object through a camera, and determine facial feature information and mouth shape feature information of the target object according to the image information;

11. The apparatus of claim 10, wherein the second information acquisition module is configured to obtain the already acquired facial feature information and mouth shape feature information of the target object;

12. The apparatus of any of claims 7-11, further comprising a first sound pickup control module configured to control the voice device to stop picking up sound if the shortened sound pickup time period is less than a preset minimum sound pickup time period threshold.

13. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the full duplex voice control method of any of claims 1-6.

14. A full duplex voice control apparatus, the apparatus comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

15. A voice device, comprising the full duplex voice control apparatus of claim 14.