WO2019196312A1

WO2019196312A1 - Method and apparatus for adjusting sound volume by robot, computer device and storage medium

Info

Publication number: WO2019196312A1
Application number: PCT/CN2018/102853
Authority: WO
Inventors: 周宸; 周宝; 王健宗; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-04-10
Filing date: 2018-08-29
Publication date: 2019-10-17
Also published as: CN108628572A; CN108628572B

Abstract

The present application provides a method and an apparatus for automatically adjusting sound volume by a robot. The robot has a camera, a speaker and an ambient microphone for acquiring an ambient sound. It is predefined that the corresponding sound volume of the speaker is V when a first user with a height of H has a distance D from the robot. The method comprises the following steps: acquiring an image by means of a camera and detecting a second user image feature in the image; calculating according to the second user image feature the height h of the second user and a distance d relative to a robot; determining a height gain k_h according to a relationship between the h and the H, and determining a distance gain k_d according to a relationship between the d and the D; acquiring an ambient sound volume by means of an ambient microphone to obtain an ambient noise value v_e, and determining a corresponding ambient gain k_e according to the v_e and a preset correlation; determining the sound volume of a speaker (I) according to the k_h, k_d, k_e and V, so that the robot can intelligently adjust the sound volume of the speaker according to the actual situation, improving interaction efficiency and user experience. Further provided are a computer device and a storage medium.

Description

Method, device, computer device and storage medium for robot to adjust volume

This application claims the priority of the Chinese patent application filed on April 10, 2018, the Chinese Patent Office, Application No. 201101314093.3, entitled "Method, Apparatus, Computer Equipment and Storage Medium for Robots to Adjust Volume", the entire contents of which are The citations are incorporated herein by reference.

Technical field

The present application relates to the field of robotics, and in particular, to a method and apparatus for automatically adjusting a volume by a robot, and a computer device and a storage medium storing computer readable instructions.

Background technique

The inventor realized that current service robots generally use a fixed volume for voice dialogue and video playback, and may have a large decibel value of environmental noise due to various factors, such as the sound of people/other audio equipment, which makes it difficult for users. When you hear the sound of the robot, the interaction efficiency is poor and the user experience is poor.

Summary of the invention

The purpose of the present application is to solve at least one of the above technical drawbacks, in particular, a technical defect of poor interaction efficiency.

The present application provides a method for a robot to automatically adjust a volume. The robot has a camera, a speaker, and an ambient microphone for collecting ambient sound. The first user with a predefined height H is corresponding to the speaker volume when the distance from the robot is D. V, the method includes the steps of: acquiring an image by the camera and detecting an image feature of a second user in the image, calculating a height h of the second user according to the second user image feature, and comparing the a distance d of the robot, determining a height gain k _h according to the relationship between the h and the H, and determining a distance gain k _d by the relationship between the d and the D; obtaining an environmental noise value by collecting the ambient volume through the ambient microphone _e , determining a corresponding environmental gain k _e according to the v _e and a preset correspondence relationship; determining a speaker volume according to the k _h , the k _d , the k _{e ,} and the V

The present application also provides a device for automatically adjusting the volume of a robot, the robot having a camera, a speaker and an environment microphone for collecting ambient sound, and a speaker volume corresponding to a first user of a predefined height H at a distance D from the robot For V, the apparatus includes: a first calculation module, configured to acquire an image by the camera and detect a second user image feature in the image, and calculate a height h of the second user according to the second user image feature a distance d from the robot, a height gain k _{h is} determined according to the relationship between the h and the H, and a relationship between the d and the D determines a distance gain k _d ; a second calculation module is configured to pass the ambient microphone Collecting an ambient volume to obtain an environmental noise value v _e , determining and corresponding a corresponding environmental gain k _e according to the v _e and a preset correspondence relationship; a volume calculation module, configured to _perform , according to the k _h , the k _d , the k _e and the V determine the speaker volume

The application also provides a computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor, such that the processor executes a robot automatically a method for adjusting a volume, the robot having a camera, a speaker, and an ambient microphone for collecting ambient sound, the first user of the predefined height H is corresponding to a speaker volume of V when the distance from the robot is D, and the robot automatically adjusts The method of volume includes the steps of: acquiring an image by the camera and detecting an image feature of a second user in the image, calculating a height h of the second user and a relative robot according to the second user image feature a distance d, a height gain k _{h is} determined according to the relationship between the h and the H, and a relationship between the d and the D determines a distance gain k _d ; and an ambient noise value v _{e is} obtained by collecting the ambient volume through the ambient microphone; determining a gain corresponding to the environment according to corresponding relationship between the k _e v _e and a preset; based on the k _h, the k _d, the k _e Determining the speaker volume V

The present application also provides a non-volatile storage medium storing computer readable instructions that, when executed by one or more processors, cause one or more processors to perform a robot to automatically adjust the volume The method has a camera, a speaker and an environment microphone for collecting ambient sound, and the first user of the predefined height H is corresponding to a speaker volume of V when the distance from the robot is D, and the robot automatically adjusts the volume. The method comprises the steps of: acquiring an image by the camera and detecting an image feature of a second user in the image, calculating a height h of the second user and a distance d relative to the robot according to the second user image feature, according to The relationship between the h and the H determines a height gain k _h , and the relationship between the d and the D determines a distance gain k _d ; the ambient microphone generates an ambient noise value v _e according to the ambient microphone, according to the v Corresponding relationship between _e and preset determines a corresponding environmental gain k _e ; determining speaker volume according to the k _h , the k _d , the k _{e ,} and the V

The above method, device, computer device and storage medium for automatically adjusting the volume of the robot determine the speaker volume V _m by determining the distance h between the user's h and the user relative to the robot and the environmental noise value v _e measured by the environmental microphone, so that the robot can Intelligently adjust the speaker volume according to the actual situation, so that the user can give the most appropriate volume level in any environment, which improves the interaction efficiency and user experience.

DRAWINGS

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from

1 is a schematic diagram showing the internal structure of a computer device in an embodiment;

2 is a schematic flow chart of a method for automatically adjusting a volume of a robot according to an embodiment;

Figure 3 is a top plan view of the spatial position between the robot and the user of one embodiment;

4 is a schematic diagram of a device module for automatically adjusting a volume of a robot according to an embodiment.

detailed description

FIG. 1 is a schematic diagram showing the internal structure of a computer device in an embodiment. As shown in FIG. 1, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer readable instructions. The database may store a sequence of control information. When the computer readable instructions are executed by the processor, the processor may implement a processor. A method in which the robot automatically adjusts the volume. The processor of the computer device is used to provide computing and control capabilities to support the operation of the entire computer device. Computer readable instructions may be stored in the memory of the computer device, the computer readable instructions being executable by the processor to cause the processor to perform a method of automatically adjusting the volume by the robot. The network interface of the computer device is used to communicate with the terminal connection. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation of the computer device to which the solution of the present application is applied. The specific computer device may It includes more or fewer components than those shown in the figures, or some components are combined, or have different component arrangements.

The method of automatically adjusting the volume of the robot described below can be applied to an intelligent robot such as a customer service robot, a child education robot, and the like.

FIG. 2 is a schematic flow chart of a method for automatically adjusting a volume of a robot according to an embodiment.

The present application provides a method for automatically adjusting a volume of a robot, the robot having a camera, a speaker, and an environment microphone for collecting ambient sound (also having a microphone for collecting user sound), and the first user with a predefined height H is away from the robot When the distance is D, the corresponding speaker volume is V, and the method includes the following steps:

Step S100: acquiring an image by the camera and detecting an image feature of the second user in the image, and calculating a height h of the second user and a distance d of the relative robot according to the second user image feature, according to the h and The relationship of H determines the height gain k _h , and the relationship of d to D determines the distance gain k _d .

The face detection method can be used for face detection to detect the second user in the image.

Since the robot's camera may capture multiple faces, some of whom are just background characters, and do not interact with the robot, such as a conversation, it is only necessary to consider the person who is facing the camera and the robot. The camera is usually placed in the direction of the robot facing the user. For example, if the robot has a head, it can be placed at the position of the forehead or face of the head; if the robot has a torso, it can also be placed at the position of the front torso. Here, the setting position of the camera is not limited, and it is only necessary to ensure that the second user can capture the second user when talking to the robot.

For the camera, the captured image (picture or video frame) is fixed in size, and the preset rectangular position can be defined as the face recognition area at the center of the picture, and the face detection is performed only in this face recognition area. . For example, assuming a 1920×1080 size picture, a 1000×1000 rectangular position can be delineated at the center of the picture as a face recognition area.

The face detection technology, which is a technique for detecting a face existing in an image by image analysis and accurately arranging the position of the face with a rectangular frame, is the basis of face feature point detection and face recognition. Commonly used face detection data sets, including FDDB (Face Detection Data Set and Benchmark). With the rapid development of deep learning in recent years, many excellent face detection methods have emerged.

For example, the FDDB database has submitted many excellent face detection methods, such as the face detection method using Convolutional Neural Network (Convolutional Neural Network): A Convolutioanal Neural Network Cascade, improved fast rcnn for face detection. :Face Detection using Deep Learning: An Improved Faster RCNN Approach, and Finding tiny faces that are very successful for small face detection. In addition, databases such as opencv, dlib, and libfacedetect also provide interfaces for face detection.

Commonly used face detection methods are as follows:

1. Single CNN face detection method

2. Cascading CNN face detection method

3.OpenCV face detection method

4.Dlib face detection method

5.libfacedetect face detection method

6.Seetaface face detection method

The following is a brief introduction to the single CNN face detection method.

First, train a second classifier that judges faces and non-faces. For example, using the convolutional neural network caffenet for two classifications, models that can be trained in the imagenet dataset can be fine-tuned using their own face datasets. It is also possible to customize the convolutional network for training. In order to detect smaller face targets, we generally use a smaller convolutional neural network as the two-class model to reduce the image input size and speed up the prediction.

Then change the fully connected layer of the trained face judgment classification network to the convolution layer, so that the network becomes a full convolution network, which can accept any input image size, and the image will obtain the feature map through the full convolution network. The probability that each "point" corresponding position maps to the receptive field on the original image belongs to the face, and the face probability that the face probability is greater than the set threshold is regarded as a face candidate box.

The size of the face on the image changes. In order to adapt to this change, the best way is to use the image pyramid to scale the image to be detected to different sizes for multi-scale face detection. For all face candidate frames detected at multiple scales, the non-maximum value suppression NMS is obtained, and the result of the final face detection is obtained.

If the method of automatically adjusting the volume of the robot in the embodiment is applied to the Android system, the FaceDetector type may be applied to determine whether the image captured by the camera has a face image. Android has a built-in face recognition API: FaceDetector, which can perform face recognition with a small amount of code, but this recognition is the most basic recognition, that is, only the face in the image can be recognized.

The face recognition technology in Android requires the underlying library: android/external/neven/, architecture layer: frameworks/base/media/java/android/media/FaceDetector.java. Java layer restrictions: 1, can only accept data in Bitmap format; 2, can only recognize face images with binocular distance greater than 20 pixels (can be modified in the framework layer); C, can only detect the position of the face (both eyes) The center point and distance) cannot match the face (find the specified face).

The main methods provided by the Neven library:

A, android.media.FaceDetector.FaceDetector(int width, int height, int maxFaces); B, int android.media.FaceDetector.findFaces(Bitmap bitmap, Face[]faces).

In the Android system, the two-eye position of the face image can be obtained by the FaceDetector class, and the position of the face image on the desktop is determined according to the position of the two eyes. The specific step may be: acquiring the center point of the two-eye position of the face image, acquiring the distance between the two eyes of the face image (pitch distance), and drawing a rectangular area (rectangular frame) according to the center point of the two-eye position and the distance between the two eyes, and the rectangular area As the location of the face image on the desktop. The center point of the two-eye position of the face image can be obtained by the following code:

mFace[i].getMidPoint(eyeMidPoint);

The two-eye spacing of the face image can be obtained by the following code:

eyesDistance=mFace[i].eyesDistance();

You can draw a rectangular area with the following code:

myEyesDistance=face.eyesDistance();//Get the center point and the two eye spacing parameters of the two eyes, and frame each face

Regarding the face detection method, it will not be described here. In the present embodiment, the face detection method is not limited. After detecting the face, the detected height h of the second user and the distance d relative to the robot are calculated, the height gain k _{h is} determined according to the relationship between h and H, and the relationship between d and D determines the distance gain k _d .

The distance d of the second user relative to the robot can be calculated and determined by the sensed values of the associated ranging sensors, such as infrared ranging by infrared sensors, laser ranging using laser sensors, and the like. In the present embodiment, the determination d is calculated by an image analysis method.

When calculating the determination d, it is based on two assumptions. First, for most people, the gap between people's distances is not much different (±2cm). Second, the user who talks to the robot, the distance between the user and the robot is limited to a relatively small range. The principle is to estimate the distance by comparing the ratio of the face distance in the captured picture to the face distance in the calibration picture. The closer the user's face is to the camera, the larger the size of the face, and the relationship is approximately linear.

In this embodiment, the second user image feature includes a portrait distance, and the portrait distance in the image of the first user when the distance from the robot is D1 is A1 (actual interpupillary distance), and the distance between the first user and the robot is When the portrait distance in the image at D2 is A2 (actual interpupillary distance), the distance d between the second user and the robot is calculated by the following formula:

d=k(a-A1)+D1

among them

a is the second user portrait distance detected when the face detection method is applied, that is, the portrait distance in the image.

Of course, other image analysis methods may be used to calculate and determine d. For example, other calculation formulas may be used, and only the above two assumptions and principles need to be followed, and are not described herein.

After determining d, the gain is determined from the relationship K _d and D d. The distance gains k _d and d have a positive relationship, such as a proportional relationship. In the present embodiment, the distance gain k _d = d / D. Of course, in other embodiments, other calculation manners may be used, k _d =d/D+m (m is a preset coefficient), and the like, and details are not described herein.

In this embodiment, when the first user distance is C (the actual lay length) and the portrait distance in the corresponding image is c (the portrait distance in the image), the second user height h is calculated by the following formula:

Among them, H1 is the camera height, and Δh is the pixel difference between the center of the face rectangle and the center of the image detected when the face detection method is applied for face detection.

Similarly, other image analysis methods may be used to calculate and determine h. For example, other calculation formulas may be used, and only the above two assumptions and principles need to be followed, and are not described herein.

After h is determined, the height gain k _{h is} determined according to the relationship between h and H. Similarly, h and H are positively related, such as a proportional relationship. In the present embodiment, the height gain k _h = (h - Δ) / (H - Δ), where Δ is the speaker height. Of course, in some embodiments, the speaker height can be ignored, ie Δ=0.

Step S200: The ambient noise value v _{e is} obtained by collecting the ambient volume by the environment microphone, and the corresponding environment gain k _{e is} determined according to the v _e and the preset correspondence relationship.

Specifically, the environmental noise value v _{e is} obtained by collecting the ambient volume by the environmental microphone, and the environmental gain k _e corresponding to the interval range is determined according to the range of the interval in which the v _e is located.

A plurality of interval ranges can be preset, and each interval range has a corresponding preset environment gain. That is, the (v ₁ , v ₂ ) interval range corresponds to the environmental gains k ₁ , (v ₂ , v ₃ ), the interval ranges correspond to the environmental gains k ₂ , ..., (v _n-1 , v _n ), and the interval ranges correspond to the environmental gains k _{n -1} .

For example, set the noise standard to 70dB.

For quiet environments (v _e <40dB), k _e =0.8;

For normal environments (40dB < v _e <70dB), k _e =1;

For noisy environments (70dB<v _e <90dB), k _e =1+(v _e -70)/100;

For extremely noisy cases (v _e >90dB), k _e =∞;

Of course, in some embodiments, the environmental gain k _e can be determined by v _e and a preset calculation formula.

In one embodiment, the ambient microphone includes at least a first microphone and a second microphone, the first microphone and the second microphone being located on both sides of the robot (baseline in front of the robot), such as on both sides of the robot head, or two robot torso Side, see FIG. 3; the process of collecting the ambient volume by the ambient microphone to obtain the ambient noise value v _e includes:

The first ambient noise value v _{1 is} obtained by collecting the ambient volume by the first microphone, the second ambient noise value v _{2 is} obtained by the second microphone collecting the ambient volume, and the largest one of v ₁ and v ₂ is determined as the environmental noise value v _e That is, v _e =max(v ₁ , v ₂ ).

After determining v _e , the interval range of v _e can be obtained by querying the range of v _e in the data table, and then the environmental gain k _e corresponding to the range of the range described by v _{e is} obtained.

Step S300: determining the speaker volume according to k _h , k _d , k _{e ,} and V

k _h , k _d , k _e are all in a positive relationship (for example, a proportional relationship) with the speaker volume V _m ,

Equivalent to the source gain,

Corresponding to the total gain, any appropriate deformation of the calculation formula of the above-mentioned speaker volume V _m based on the forward relationship can be considered to be reasonable, and will not be described herein.

Of course, it is also possible to preset the maximum volume V _max and the minimum volume V _min , if V _m <V _min , then V _m =V _min ; if V _m >V _max , then V _m =V _max .

The above method for automatically adjusting the volume of the robot determines the speaker volume V _m by determining the distance d between the second user and the second user relative to the robot and in combination with the ambient noise value v _e measured by the environmental microphone, so that the robot can intelligently adjust according to the actual situation. The speaker volume, which gives the user the optimum volume level regardless of the environment, improves the interaction efficiency and user experience.

4 is a schematic diagram of a device module for automatically adjusting a volume of a robot according to an embodiment. Corresponding to the method for automatically adjusting the volume of the robot, the present application also provides a device for automatically adjusting the volume of the robot, the robot having a camera, a speaker and an environment microphone for collecting ambient sound (also having a microphone for collecting user sound), a predefined height The first user of H is at a speaker volume of V when the distance from the robot is D, and the device includes a first calculation module 100, a second calculation module 200, and a volume calculation module 300.

The first calculation module 100 is configured to acquire an image by the camera and detect an image feature of a second user in the image, and calculate a height h of the second user and a distance d of the relative robot according to the second user image feature, according to The relationship between h and H determines the height gain k _h , and the relationship between d and D determines the distance gain k _d ; the second calculation module 200 is configured to obtain the ambient noise value v _e by collecting the ambient volume through the ambient microphone, according to v _e and pre The corresponding relationship is determined to determine the corresponding environment gain k _e ; the volume calculation module 300 is configured to determine the speaker volume according to k _h , k _d , k _e and V

The first calculation module 100 acquires an image by the camera and detects a second user image feature in the image, and calculates a height h of the second user and a distance d relative to the robot according to the second user image feature, according to h and H The relationship determines the height gain k _h , and the relationship between d and D determines the distance gain k _d .

The first calculation module 100 may perform face detection using a face detection method to detect a second user in the image.

For the camera, the captured image (picture or video frame) is fixed in size, and the preset rectangular position can be defined as the face recognition area at the center of the picture, and the face detection is performed only in this face recognition area. . For example, if a picture of 1920×1080 size is used, a rectangular position of 1000×1000 can be delimited at the center of the picture as a face recognition area.

Commonly used face detection methods are as follows:

1. Single CNN face detection method

2. Cascading CNN face detection method

3.OpenCV face detection method

4.Dlib face detection method

5.libfacedetect face detection method

6.Seetaface face detection method

The following is a brief introduction to the single CNN face detection method.

If the apparatus for automatically adjusting the volume of the robot in the embodiment is applied to the Android system, the first calculation module 100 may apply the FaceDetector type to determine whether the image captured by the camera has a face image. Android has a built-in face recognition API: FaceDetector, which can perform face recognition with a small amount of code, but this recognition is the most basic recognition, that is, only the face in the image can be recognized.

The main methods provided by the Neven library:

In the Android system, the first calculation module 100 can acquire the two-eye position of the face image through the FaceDetector class, and determine the location of the face image on the desktop according to the two-eye position. The specific step may be: acquiring the center point of the two-eye position of the face image, acquiring the distance between the two eyes of the face image (pitch distance), and drawing a rectangular area (rectangular frame) according to the center point of the two-eye position and the distance between the two eyes, and the rectangular area As the location of the face image on the desktop. The center point of the two-eye position of the face image can be obtained by the following code:

mFace[i].getMidPoint(eyeMidPoint);

The two-eye spacing of the face image can be obtained by the following code:

eyesDistance=mFace[i].eyesDistance();

You can draw a rectangular area with the following code:

Regarding the face detection method, it will not be described here. In the present embodiment, the face detection method is not limited. After detecting the human face, the first calculation module 100 calculates the detected height h of the second user and the distance d of the relative robot, determines the height gain k _h according to the relationship between h and H, and determines the distance gain by the relationship between d and D. k _d .

The distance d of the second user relative to the robot can be calculated and determined by the sensed values of the associated ranging sensors, such as infrared ranging by infrared sensors, laser ranging using laser sensors, and the like. In the present embodiment, the first calculation module 100 calculates the determination d by an image analysis method.

In this embodiment, the second user image feature includes a portrait distance, and the portrait distance in the image of the first user when the distance from the robot is D1 is A1 (actual interpupillary distance), and the distance between the first user and the robot is When the portrait distance in the image at D2 is A2 (actual interpupillary distance), the first calculation module 100 calculates the distance d between the second user and the robot by the following formula:

d=k(a-A1)+D1

among them

Of course, the first calculation module 100 can also use other image analysis methods to calculate the determination d. For example, other calculation formulas are used, and only the above two assumptions and principles need to be followed, and details are not described herein.

After the first calculation module 100 determines d, the distance gain k _{d is} determined according to the relationship between d and D. The distance gains k _d and d have a positive relationship, such as a proportional relationship. In the present embodiment, the distance gain k _d = d / D. Of course, in other embodiments, other calculation manners may be used, k _d =d/D+m (m is a preset coefficient), and the like, and details are not described herein.

In this embodiment, when the first user distance is C (the actual lay length), the portrait distance in the corresponding image is c (the portrait distance in the image), and the first calculation module 100 calculates the first formula by the following formula. Two user height h:

Similarly, the first calculation module 100 may also use other image analysis methods to calculate and determine h. For example, other calculation formulas may be used, and only the above two assumptions and principles need to be followed, and details are not described herein.

After the first calculation module 100 determines h, the height gain k _{h is} determined according to the relationship between h and H. Similarly, h and H are positively related, such as a proportional relationship. In the present embodiment, the height gain k _h = (h - Δ) / (H - Δ), where Δ is the speaker height. Of course, in some embodiments, the speaker height can be ignored, ie Δ=0.

The second computing module 200 obtains the ambient noise value v _e by collecting the ambient volume through the ambient microphone, and determines the corresponding environmental gain k _e according to the v _e and the preset correspondence. Specifically, the environmental noise value v _{e is} obtained by collecting the ambient volume by the environmental microphone, and the environmental gain k _e corresponding to the interval range is determined according to the range of the interval in which the v _e is located.

For example, set the noise standard to 70dB.

For quiet environments (v _e <40dB), k _e =0.8;

For normal environments (40dB < v _e <70dB), k _e =1;

For noisy environments (70dB<v _e <90dB), k _e =1+(v _e -70)/100;

For extremely noisy cases (v _e >90dB), k _e =∞;

Of course, in some embodiments, the second calculation module 200 can determine the environmental gain k _e by using v _e and a preset calculation formula.

In one embodiment, the ambient microphone includes at least a first microphone and a second microphone, the first microphone and the second microphone being located on both sides of the robot (baseline in front of the robot), such as on both sides of the robot head, or two robot torso Side, see FIG. 3; the process of the second computing module 200 collecting the ambient volume through the ambient microphone to obtain the ambient noise value v _e includes:

After the second calculation module 200 determines v _e, by the data table can query v _e the interval range, and to give the section v _e environmental gain corresponding to the range k _e.

The volume calculation module 300 determines the speaker volume based on k _h , k _d , k _{e ,} and V

Equivalent to the source gain,

The device for automatically adjusting the volume by the robot determines the speaker volume V _m by determining the distance d between the second user and the second user relative to the robot and the ambient noise value v _e measured by the ambient microphone, so that the robot can intelligently adjust according to the actual situation. The speaker volume, which gives the user the optimum volume level regardless of the environment, improves the interaction efficiency and user experience.

The application also provides a computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor, causing the processor to perform any of the above The steps of the method for automatically adjusting the volume of the robot by the embodiment.

The present application also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause one or more processors to perform the robotic automatic described in any of the above embodiments The steps of the method of adjusting the volume.

The robot has a camera, a speaker and an environmental microphone for collecting ambient sound, and the volume of the speaker corresponding to the first household when the distance from the robot is D is V, the method includes the following steps: The camera acquires an image and detects a second user image feature in the image, and calculates a height h of the second user and a distance d relative to the robot according to the second user image feature, according to the h and the H The relationship determines the height gain k _h , and the relationship between the d and the D determines the distance gain k _d ; the ambient noise is obtained by the ambient microphone to obtain the ambient noise value v _e , which is determined according to the v _e and the preset correspondence relationship Corresponding environmental gain k _e ; determining speaker volume according to the k _h , the k _d , the k _{e ,} and the V

The speaker volume V _m is determined by judging the distance h between the user's h and the user relative to the robot and the ambient noise value v _e measured by the ambient microphone, so that the robot can intelligently adjust the speaker volume according to the actual situation, so that the user can be given the environment regardless of the environment. The most appropriate volume level improves interaction efficiency and user experience.

A person skilled in the art can understand that all or part of the process of implementing the above embodiment method can be completed by a computer program to instruct related hardware, and the computer program can be stored in a computer readable storage medium. When executed, the flow of an embodiment of the methods as described above may be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Claims

A method for automatically adjusting a volume by a robot, the robot having a camera, a speaker, and an environment microphone for collecting ambient sound, and the first user having a predefined height H is corresponding to a speaker volume of V when the distance from the robot is D. The method includes the following steps:

Acquiring an image by the camera and detecting an image feature of a second user in the image, calculating a height h of the second user and a distance d relative to the robot according to the second user image feature, according to the h The relationship with the H determines a height gain k h , and the relationship between the d and the D determines a distance gain k d ;

Obtaining an ambient noise value v e by collecting the ambient volume by the ambient microphone, and determining a corresponding environmental gain k e according to the v e and the preset correspondence relationship;

Determining the speaker volume according to the k h , the k d , the k e , and the V
The method of automatically adjusting a volume of a robot according to claim 1, wherein the second user image feature comprises a portrait pupil distance, and predefining a portrait distance of the first user in the image when the distance from the robot is D1 For A1, when the first user is at a distance A2 from the robot when the robot is at a distance D2, the distance d of the second user relative to the robot is calculated by the following formula: d=k ( a-A1)+D1,

among them
a is the portrait distance of the second user in the image.
The method of automatically adjusting a volume of a robot according to claim 1, wherein the second user image feature comprises a portrait distance, and a pre-defined vertical distance of the first user is C, corresponding to a portrait distance in the image. c, the second user height h is calculated by the following formula:

Where H1 is the camera height, and Δh is the pixel difference between the detected center of the rectangular frame and the center of the image.
The method of automatically adjusting a volume of a robot according to claim 1, wherein the ambient microphone comprises at least a first microphone and a second microphone, the first microphone and the second microphone being located on both sides of the robot; and the ambient volume is collected by the ambient microphone Obtaining an environmental noise value v e includes:

Acquiring the ambient volume by the first microphone to obtain a first ambient noise value v 1 , collecting the ambient volume by the second microphone to obtain a second ambient noise value v 2 , determining the largest of the v 1 and the v 2 as the environment The noise value v e .
A method of automatically adjusting a volume by a robot according to claim 1, a height gain k h = (h - Δ) / (H - Δ), wherein Δ is a speaker height.
The method of automatically adjusting the volume of the robot according to claim 1, wherein the distance gain k d =d/D.
A device for automatically adjusting a volume of a robot, the robot having a camera, a speaker and an environment microphone for collecting ambient sound, and the first user having a predefined height H is corresponding to a speaker volume of V when the distance from the robot is D, The device includes:

a first calculation module, configured to acquire an image by the camera and detect a second user image feature in the image, and calculate a height h of the second user and a distance d of the relative robot according to the second user image feature, according to the The relationship between h and the H determines a height gain k h , and the relationship between the d and the D determines a distance gain k d ;

a second calculation module, configured to obtain an ambient noise value v e by collecting the ambient volume through the environment microphone, and determining and corresponding environment gain k e according to the v e and the preset correspondence relationship;

a volume calculation module, configured to determine a speaker volume according to the k h , the k d , the k e , and the V
The apparatus for automatically adjusting a volume of a robot according to claim 7, wherein the second user image feature comprises a portrait pupil distance, and predefining a portrait distance of the first user in the image when the distance from the robot is D1 For A1, the first user calculates the distance d of the second user relative to the robot by the following formula when the distance of the portrait in the image is A2 when the distance from the robot is D2:

d=k(a-A1)+D1,

among them
a is the portrait distance of the second user in the image.
A computer device comprising a memory and a processor, the memory storing computer readable instructions, the computer readable instructions being executed by the processor, causing the processor to perform a method for automatically adjusting a volume by a robot The robot has a camera, a speaker and an environment microphone for collecting ambient sound. The first user with a predefined height H is corresponding to a speaker volume of V when the distance from the robot is D. The method for the robot to automatically adjust the volume includes The following steps:

Acquiring an image by the camera and detecting an image feature of the second user in the image, calculating a height h of the second user and a distance d relative to the robot according to the second user image feature, according to the h and the The relationship of H determines the height gain k h , and the relationship between d and the D determines the distance gain k d ;

Obtaining an ambient noise value v e by collecting the ambient volume by the ambient microphone, and determining a corresponding environmental gain k e according to the v e and the preset correspondence relationship;

Determining the speaker volume according to the k h , the k d , the k e , and the V
The computer device according to claim 9, wherein the second user image feature comprises a portrait distance, and the portrait distance in the image of the first user when the distance from the robot is D1 is predefined to be A1. When the distance of the portrait in the image is A2 when the first user is at a distance D2 from the robot, the distance d between the second user and the robot is calculated by the following formula:

d=k(a-A1)+D1,

among them
a is the portrait distance of the second user in the image.
The computer device according to claim 9, wherein the second user image feature comprises a portrait distance, and if the real user's distance in the image is c when the real user's real distance is C, the The following formula calculates the second user height h:

Where H1 is the camera height, and Δh is the pixel difference between the detected center of the rectangular frame and the center of the image.
The computer device according to claim 9, wherein the ambient microphone comprises at least a first microphone and a second microphone, the first microphone and the second microphone being located on both sides of the robot; and the ambient sound is collected by the ambient microphone to obtain an environmental noise value v e includes:

Acquiring the ambient volume by the first microphone to obtain a first ambient noise value v 1 , collecting the ambient volume by the second microphone to obtain a second ambient noise value v 2 , determining the largest of the v 1 and the v 2 as the environment The noise value v e .
The computer apparatus according to claim 9, wherein the height gain k h = (h - Δ) / (H - Δ), wherein Δ is the speaker height.
The computer apparatus according to claim 9, the distance gain k d = d / D.
A non-volatile storage medium having stored therein computer readable instructions that, when executed by one or more processors, cause one or more processors to perform an automatic robot adjustment The method of volume, the robot has a camera, a speaker and an environment microphone for collecting ambient sound, and the first user with a predefined height H is corresponding to a speaker volume of V when the distance from the robot is D, and the robot automatically adjusts the volume. The method includes the following steps:

Acquiring an image by the camera and detecting an image feature of the second user in the image, calculating a height h of the second user and a distance d relative to the robot according to the second user image feature, according to the h and the The relationship of H determines the height gain k h , and the relationship between d and the D determines the distance gain k d ;

Obtaining an ambient noise value v e by collecting the ambient volume by the ambient microphone, and determining a corresponding environmental gain k e according to the v e and the preset correspondence relationship;

Determining the speaker volume according to the k h , the k d , the k e , and the V
The non-volatile storage medium of claim 15, wherein the second user image feature comprises a portrait distance, predefining a portrait distance of the first user in the image when the distance from the robot is D1 For A1, when the distance of the portrait in the image is A2 when the first user is at a distance D2 from the robot, the distance d between the second user and the robot is calculated by the following formula:

d=k(a-A1)+D1,

among them
a is the portrait distance of the second user in the image.
The non-volatile storage medium according to claim 15, wherein the second user image feature comprises a portrait distance, and a pre-defined vertical distance of the first user is C, corresponding to a portrait distance in the image. c, the second user height h is calculated by the following formula:

Where H1 is the camera height, and Δh is the pixel difference between the detected center of the rectangular frame and the center of the image.
The non-volatile storage medium of claim 15, the ambient microphone comprising at least a first microphone and a second microphone, the first microphone and the second microphone being located on both sides of the robot; the collecting the ambient volume through the ambient microphone Obtaining an environmental noise value v e includes:

Acquiring the ambient volume by the first microphone to obtain a first ambient noise value v 1 , collecting the ambient volume by the second microphone to obtain a second ambient noise value v 2 , determining the largest of the v 1 and the v 2 as the environment The noise value v e .
The nonvolatile storage medium according to claim 15, wherein the height gain k h = (h - Δ) / (H - Δ), wherein Δ is the speaker height.
The nonvolatile storage medium according to claim 15, wherein the distance gain k d =d/D.