
CN108197618B - Method and device for generating human face detection model

Info

Publication number: CN108197618B
Authority: CN (China)
Prior art keywords: face, sample, weight, head, loss value
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201810307313.XA
Other languages: Chinese (zh)
Other versions: CN108197618A (en)
Inventor: 何泽强
Current assignee: Baidu Online Network Technology Beijing Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Baidu Online Network Technology Beijing Co Ltd
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority application: CN201810307313.XA (the priority date is an assumption and is not a legal conclusion)
Publication of application: CN108197618A
Application granted; publication of grant: CN108197618B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the application disclose a method and a device for generating a face detection model. One embodiment of the method comprises: acquiring a sample set; selecting samples from the sample set, and performing the following training steps: inputting the sample face image of a selected sample into an initial model to obtain facial feature information and head feature information for the sample; respectively determining a facial feature loss value and a head feature loss value; taking the weighted result of the sample's facial feature loss value and head feature loss value, under preset face and head weights, as the sample's total loss value, and comparing that total loss value with a target value; determining from the comparison whether training of the initial model is complete; and, in response to determining that training is complete, taking the initial model as the face detection model. This embodiment yields a model that can be used for face detection, and it enriches the ways in which such models can be generated.

Description

Method and device for generating human face detection model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating a face detection model.
Background
Face detection (Face Detection) is a key link in automatic face recognition systems. It generally refers to searching any given image, according to some strategy, to determine whether the image contains a human face; if it does, the position, size, pose, and so on of the face can be returned.
With the rapid development of artificial intelligence, the existing face detection technology usually inputs images into a trained neural network, so as to obtain the face detection result of the images.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating a human face detection model, and a method and a device for detecting a human face.
In a first aspect, an embodiment of the present application provides a method for generating a face detection model, including: acquiring a sample set, wherein samples in the sample set comprise sample face images, sample face characteristic information and sample head characteristic information corresponding to the sample face images; selecting samples from the sample set, and performing the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; according to the preset face weight and the head weight, taking the weighting result of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining whether the initial model is trained according to the comparison result; in response to determining that the initial model training is complete, the initial model is treated as a face detection model.
In some embodiments, the preset face weight and head weight are obtained by: detecting skin color areas of five sense organs and/or face regions in the sample face image; determining the shielding proportion of the face according to the detection result; based on the shielding proportion, adjusting the preset initial face weight and the initial head weight to respectively obtain the face weight and the head weight, wherein the face weight is in negative correlation with the shielding proportion, and the head weight is in positive correlation with the shielding proportion.
In some embodiments, when the occlusion ratio is smaller than the preset ratio value, the face weight is the maximum value of the weight range, and the head weight is the minimum value of the weight range.
In some embodiments, the method further comprises: in response to determining that the initial model is untrained, adjusting relevant parameters in the initial model, and reselecting samples from the sample set, continuing to perform the training step.
In a second aspect, an embodiment of the present application provides an apparatus for generating a face detection model, including: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire a sample set, and samples in the sample set comprise sample face images and sample face characteristic information and sample head characteristic information corresponding to the sample face images; a training unit configured to select samples from the sample set and to perform the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; according to the preset face weight and the head weight, taking the weighting result of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining whether the initial model is trained according to the comparison result; in response to determining that the initial model training is complete, the initial model is treated as a face detection model.
In some embodiments, the apparatus further comprises a weight obtaining unit configured to: detecting skin color areas of five sense organs and/or face regions in the sample face image; determining the shielding proportion of the face according to the detection result; based on the shielding proportion, adjusting the preset initial face weight and the initial head weight to respectively obtain the face weight and the head weight, wherein the face weight is in negative correlation with the shielding proportion, and the head weight is in positive correlation with the shielding proportion.
In some embodiments, when the occlusion ratio is smaller than the preset ratio value, the face weight is the maximum value of the weight range, and the head weight is the minimum value of the weight range.
In some embodiments, the apparatus further comprises: and the adjusting unit is configured to adjust the relevant parameters in the initial model in response to the fact that the initial model is determined not to be trained, reselect samples from the sample set and continue to execute the training step.
In a third aspect, an embodiment of the present application provides a method for detecting a human face, including: acquiring a face image of a detection object; the face image is input into the face detection model generated by the method described in any of the embodiments of the first aspect, and a face detection result of the detection object is generated.
In a fourth aspect, an embodiment of the present application provides an apparatus for detecting a human face, including: an acquisition unit configured to acquire a face image of a detection object; a generating unit, configured to input a face image into a face detection model generated by adopting the method described in any one of the embodiments of the first aspect, and generate a face detection result of the detection object.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means for storing one or more programs; when executed by one or more processors, cause the one or more processors to implement a method as described in any of the embodiments of the first and third aspects above.
In a sixth aspect, the present application provides a computer-readable medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method as described in any of the first and third aspects above.
According to the method and the device for generating a face detection model provided by the embodiments of the application, a sample set is obtained, and samples can be selected from it to train an initial model. The samples in the sample set may include sample face images, together with sample facial feature information and sample head feature information corresponding to those images. The facial feature information and head feature information of a sample can therefore be obtained by inputting its sample face image into the initial model. The obtained facial feature information and head feature information may then be analyzed against the corresponding sample facial feature information and sample head feature information, respectively, to determine a facial feature loss value and a head feature loss value. Next, according to the preset face weight and head weight, the weighted result of the sample's facial feature loss value and head feature loss value may be taken as the sample's total loss value, and that total loss value may be compared with a target value. Finally, whether the initial model has finished training may be determined from the comparison, and if so, the trained initial model can be used as the face detection model. A model usable for face detection is thus obtained, and the ways in which such models can be generated are enriched.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a face detection model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a face detection model according to the present application;
FIG. 4 is a schematic diagram illustrating an embodiment of an apparatus for generating a face detection model according to the present application;
FIG. 5 is a flow diagram of one embodiment of a method for detecting faces according to the present application;
FIG. 6 is a schematic diagram of an embodiment of an apparatus for detecting human faces according to the application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 of a method for generating a face detection model, an apparatus for generating a face detection model, a method for detecting a face, or an apparatus for detecting a face to which embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a face detection and recognition application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
When the terminals 101, 102 are hardware, an image capturing device may be further mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user 110 may use the image capturing device on the terminal 101, 102 to capture the facial image of himself or another person.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. The sample may include a sample face image, and sample facial feature information and sample head feature information corresponding to the sample face image. In this way, the user 110 may also select samples from a set of samples stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send a training result (e.g., a generated face detection model) to the terminals 101 and 102. In this way, the user can apply the generated face detection model to perform face detection.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for generating a face detection model or the method for detecting a face provided in the embodiment of the present application is generally performed by the server 105. Accordingly, the means for generating a face detection model or the means for detecting a face are also typically provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a face detection model according to the present application is shown. The method for generating a face detection model may comprise the steps of:
step 201, a sample set is obtained.
In the present embodiment, the execution subject of the method for generating a face detection model (e.g., the server 105 shown in fig. 1) may acquire a sample set in a variety of ways. For example, it may obtain an existing sample set stored on a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. As another example, a user may collect samples via a terminal (e.g., the terminals 101 and 102 shown in fig. 1); the execution subject may then receive the samples collected by the terminal and store them locally, thereby generating a sample set.
Here, the sample set may include at least one sample. A sample may include a sample face image, and sample facial feature information and sample head feature information corresponding to that sample face image. The sample facial feature information is information characterizing facial features in the image, and the sample head feature information is information characterizing head features in the image. For example, the sample facial feature information may include position information of the face in the image or face contour keypoint information, such as a face bounding box (x, y, w, h), where x is the abscissa of the box's center point, y is the ordinate of the center point, w is the width of the box, and h is the height of the box.
It is understood that the sample facial feature information and the sample head feature information may be set manually in advance, or may be obtained by the execution subject or other equipment running a certain setting program. As an example, where the location of the face bounding box is known, the execution subject may determine the center point of the face frame; a head frame may then be determined centered on that point. Specifically, the execution subject may enlarge the face frame by a certain factor about the center point, thereby obtaining the head frame. Alternatively, to allow for factors such as hairstyle and face pose, the execution subject may shift the enlarged frame upward (toward the top of the head), leftward, and/or rightward by a certain distance relative to the center point, thereby obtaining the head frame.
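By way of illustration, the following Python sketch implements the head-frame derivation just described, assuming (x, y, w, h) frames whose (x, y) is the center point, as in the example above. The 1.6x enlargement and 15% upward shift are hypothetical values; the patent does not fix them.

```python
def head_frame_from_face_frame(face_frame, scale=1.6, up_shift_ratio=0.15):
    """Derive a head frame from a face frame given as (x, y, w, h) with
    (x, y) the center point. scale and up_shift_ratio are illustrative
    assumptions, not values taken from the patent."""
    x, y, w, h = face_frame
    head_w, head_h = w * scale, h * scale
    # Shift the center toward the top of the head (smaller y in image
    # coordinates) to allow for hairstyle and face pose.
    head_y = y - up_shift_ratio * head_h
    return (x, head_y, head_w, head_h)

# Example: an 80x100 face frame centered at (120, 160).
print(head_frame_from_face_frame((120.0, 160.0, 80.0, 100.0)))
```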
In the present embodiment, a sample face image generally refers to an image containing a face. It may be a planar face image or a stereo face image (i.e., a face image containing depth information). The sample face image may be a color image (e.g., an RGB (Red-Green-Blue) photograph) and/or a grayscale image. The format of the image is not limited in the present application; it may be, for example, JPG (Joint Photographic Experts Group), BMP (Bitmap), or RAW (RAW Image Format), as long as the execution subject can read and recognize it.
At step 202, a sample is selected from a sample set.
In this embodiment, the execution subject may select a sample from the sample set obtained in step 201 and perform the training steps of steps 203 to 208. The selection manner and the number of samples are not limited in the present application. For example, at least one sample may be selected at random, or samples whose face images have better sharpness (i.e., higher resolution) may be selected.
Step 203, inputting the sample face image of the selected sample into the initial model to obtain the face characteristic information and the head characteristic information of the sample.
In this embodiment, the executing subject may input a sample face image of the sample selected in step 202 into the initial model. By detecting and analyzing the face region in the sample face image, the face feature information can be obtained. Meanwhile, head feature information can be obtained by detecting and analyzing the head region of the sample face image. The facial feature information may be information for characterizing facial features in the image. The head feature information may be information for characterizing a head feature in the image.
It is understood that the head region often includes a face region, and therefore the head feature information often includes face feature information. In addition, in order to implement the training of the model, the face feature information and the head feature information obtained here generally have the same representation as the sample face feature information. For example, the facial feature information may be a facial frame (x, y, w, h). The header feature information may be a header frame (x, y, w, h).
In the present embodiment, the initial model may be various existing neural network models created based on machine learning techniques. The neural network model may have various existing neural network structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). The storage location of the initial model is likewise not limited in this application.
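As a hedged sketch of what such an initial model could look like, the PyTorch module below pairs a small shared convolutional backbone with two regression branches, one producing the face frame and one the head frame. The backbone depth, layer sizes, and the choice of PyTorch are assumptions for illustration; the patent admits any neural network structure that yields both kinds of feature information, and, as noted later, the two detections may equally be performed by independent submodels.

```python
import torch.nn as nn

class InitialModel(nn.Module):
    """Toy two-branch model: a shared backbone plus one branch regressing
    the face frame and one regressing the head frame, each as (x, y, w, h)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.face_branch = nn.Linear(32, 4)  # facial feature information
        self.head_branch = nn.Linear(32, 4)  # head feature information

    def forward(self, images):
        shared = self.backbone(images)
        return self.face_branch(shared), self.head_branch(shared)
```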
And 204, analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value.
In this embodiment, the execution subject may analyze the facial feature information of the sample face image against the sample facial feature information corresponding to that image, so that a facial feature loss value may be determined. For example, the facial feature information and the corresponding sample facial feature information may be input as parameters to a specified loss function, and the loss value between the two may be calculated.
In this embodiment, the loss function is generally used to measure the degree of inconsistency between the predicted value (e.g., facial feature information) and the true value (e.g., sample facial feature information) of the model. It is a non-negative real-valued function. In general, the smaller the value of the loss function, the more robust the model. The loss function may be set according to actual requirements.
And step 205, analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value.
In this embodiment, the execution subject may further analyze the head feature information of the sample face image and the sample head feature information corresponding to the sample face image, so that a head feature loss value may be determined. For example, reference may be made to the related method described in step 204, which is not described herein again.
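Steps 204 and 205 can be sketched as follows, assuming frame-style feature information held in PyTorch tensors; the choice of smooth L1 is an assumption, since the patent leaves the concrete loss function to actual requirements.

```python
import torch
import torch.nn.functional as F

def feature_loss(predicted_frame, sample_frame):
    # Smooth L1 between a predicted (x, y, w, h) frame and the annotated
    # sample frame; usable both for the facial feature loss value
    # (step 204) and for the head feature loss value (step 205).
    return F.smooth_l1_loss(predicted_frame, sample_frame)

face_loss = feature_loss(torch.tensor([118., 158., 82., 98.]),
                         torch.tensor([120., 160., 80., 100.]))
```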
And step 206, taking the weighting result of the facial feature loss value of the sample and the head feature loss value of the sample as the total loss value of the sample according to the preset facial weight and head weight, and comparing the total loss value of the sample with the target value.
In this embodiment, the execution subject may perform weighting processing on the face feature loss value and the head feature loss value of the same sample according to a preset face weight and a preset head weight. Namely, the preset face weight is the weight of the loss value of the face feature. The preset head weight is the weight of the head characteristic loss value. The execution subject may then use the above-mentioned weighting result for the same sample as the total loss value for that sample. And the total loss value of the selected sample may be compared to a target value.
In the present embodiment, the preset face weight and the head weight may be set according to actual circumstances. While the target value may generally be used to represent an ideal case of a degree of inconsistency between the predicted value (i.e., facial feature information, head feature information) and the true value (sample facial feature information, sample head feature information). That is, when the total loss value is less than the target value, the predicted value may be considered to be close to or approximate the true value. The target value may be set according to actual demand.
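To make step 206 concrete, here is a minimal sketch of the weighted fusion and the comparison with the target value. The 0.8/0.2 weights mirror the fixed-weight example given below, and the target value shown is arbitrary; both are set according to actual requirements.

```python
def total_loss(face_loss, head_loss, face_weight=0.8, head_weight=0.2):
    # Weighted result of the two loss values for one sample (step 206).
    return face_weight * face_loss + head_weight * head_loss

target_value = 0.05  # illustrative only
sample_done = total_loss(0.03, 0.08) < target_value  # 0.040 < 0.05 -> True
```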
It should be noted that if a plurality of (at least two) samples were selected in step 202, the execution subject may compare the total loss value of each sample with the target value respectively, and can thereby determine whether the total loss value of each sample is less than the target value.
In some optional implementations of this embodiment, the preset face weight and the preset head weight may be a fixed weight value respectively. Also, since the main purpose is to detect a human face, the face weight may be preset to be relatively large, such as 80%. Meanwhile, the weight of the head may be preset to be relatively small, such as 20%.
Optionally, in order to improve the accuracy of the detection result, the execution subject may also dynamically adjust the face weight and/or the head weight for different sample face images. That is, the preset face weight and/or the preset head weight may be non-fixed weight values. As an example, the preset face weight and head weight may be obtained as follows:
first, the executive principal may detect the skin tone area of the five sense organs and/or the facial region in the sample face image. Then, the occlusion ratio of the face (i.e., the ratio of the occlusion area of the face to the total area of the face) can be determined from the detection result. Then, based on the occlusion ratio, the executive subject may adjust the preset initial face weight and initial head weight, resulting in the face weight and the head weight, respectively. Wherein the face weight is negatively correlated with the occlusion proportion, and the head weight is positively correlated with the occlusion proportion. That is, the higher the occlusion ratio is, the smaller the face weight becomes and the larger the head weight becomes.
For example, when both eyes, the nose, and the mouth of a face are detected in the sample face image, the occlusion ratio may be taken as 0. When no mouth is detected, the occlusion ratio may be taken as 30%; when neither the nose nor the mouth is detected, the occlusion ratio may be taken as 60%. Meanwhile, the initial face weight may be set to 1 and the initial head weight to 0, with a corresponding initial occlusion ratio of 0. In this way, the execution subject may lower the initial face weight and raise the initial head weight according to the currently determined occlusion ratio (e.g., 30%), obtaining a face weight (e.g., 0.7) and a head weight (e.g., 0.3).
For another example, different occlusion ratio ranges, with corresponding face weights and head weights, may be set in advance: when the occlusion ratio is between 10% and 30%, the face weight is 0.9 and the head weight is 0.1; when the occlusion ratio is between 30% and 50%, the face weight is 0.7 and the head weight is 0.3; and so on. The execution subject may first determine which range the occlusion ratio of the sample face image falls in, and then adjust the initial face weight and initial head weight to the face weight and head weight corresponding to that range. That is, the face weight and head weight corresponding to the range containing the occlusion ratio are used as the face weight and head weight for that sample face image.
Further, when the occlusion ratio is less than a preset ratio value, the face weight may take the maximum value of the weight range and the head weight the minimum value. For example, when the occlusion ratio is less than 10%, the face weight may be 1 and the head weight 0. That is, when the occlusion ratio is below the preset ratio value, the face in the sample face image can be considered essentially unoccluded, or occluded only over a small area, meaning that ample facial feature data can be acquired; the comparatively scarce head feature data may then be disregarded, or fused with only a small weight.
The preset ratio value and the weight range can likewise be set according to actual requirements. For example, the face weight and the head weight may share the same weight range, or each may be given its own weight range.
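The banded scheme just described can be sketched as follows. The band boundaries and weights copy the examples in the text; the 0.5/0.5 fallback for heavier occlusion is an assumption, since the text fixes only the monotonic trend.

```python
def weights_from_occlusion(occlusion_ratio, preset_ratio=0.10):
    """Map a face occlusion ratio to (face_weight, head_weight): the face
    weight falls and the head weight rises as occlusion grows."""
    if occlusion_ratio < preset_ratio:
        # Face essentially unoccluded: extremes of the weight range.
        return 1.0, 0.0
    if occlusion_ratio < 0.30:
        return 0.9, 0.1
    if occlusion_ratio < 0.50:
        return 0.7, 0.3
    return 0.5, 0.5  # assumed fallback for heavy occlusion
```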
Here, the face weight and the head weight are dynamically adjusted according to the circumstances of different sample face images. This helps improve the flexibility and range of application of the training method, and a model trained in this way can produce more accurate detection results.
It can be understood that fusing the face feature loss value and the head feature loss value by weighting serves to adjust and optimize the model. A face detection model trained in this way can effectively improve the robustness of face detection; in particular, under severe face occlusion, poor illumination, and similar conditions, combining head information can improve the accuracy of face detection and recognition.
And step 207, determining whether the initial model is trained completely according to the comparison result.
In this embodiment, the execution subject may determine whether the initial model has finished training based on the comparison in step 206. As an example, if multiple samples were selected in step 202, the execution subject may determine that training of the initial model is complete when the total loss value of every sample is less than the target value. As another example, the execution subject may count the proportion of the selected samples whose total loss value is less than the target value, and may determine that training of the initial model is complete when that proportion reaches a preset sample ratio (e.g., 95%).
In this embodiment, if the execution subject determines that the initial model has finished training, it may continue to step 208. If it determines that the initial model is not yet trained, it may adjust the relevant parameters in the initial model, for example by using back propagation to modify the weights in each convolutional layer, and may then return to step 202 to reselect samples from the sample set so that the training steps described above can be continued.
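The sketch below strings steps 203 through 207 together for one round of training, folding the completion check and the back-propagation update into a single loop for brevity. The one-image batches, fixed 0.8/0.2 weights, and the 95% threshold follow the examples in the text but are otherwise assumptions; InitialModel refers to the earlier model sketch, and any PyTorch optimizer may be passed in.

```python
import torch
import torch.nn.functional as F

def run_training_round(model, optimizer, samples, target_value,
                       preset_sample_ratio=0.95):
    """samples: sequence of (image_tensor, sample_face_frame,
    sample_head_frame) triples. Returns True when training counts
    as complete under the proportion criterion of step 207."""
    below_target = 0
    for image, sample_face, sample_head in samples:
        face_pred, head_pred = model(image.unsqueeze(0))          # step 203
        loss = (0.8 * F.smooth_l1_loss(face_pred.squeeze(0), sample_face)
                + 0.2 * F.smooth_l1_loss(head_pred.squeeze(0), sample_head))
        if loss.item() < target_value:                            # step 206
            below_target += 1
        optimizer.zero_grad()
        loss.backward()  # back propagation adjusts the relevant parameters
        optimizer.step()
    return below_target / len(samples) >= preset_sample_ratio     # step 207
```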
It should be noted that the selection manner is not limited in the present application. For example, in the case where there are a large number of samples in the sample set, the execution subject may select a non-selected sample from the sample set.
And step 208, in response to determining that the training of the initial model is finished, taking the initial model as a face detection model.
In this embodiment, if the execution subject determines that the initial model has finished training, the initial model (i.e., the trained initial model) may be used as the face detection model.
In addition, the detection of the face region and the head region may be performed by two independent submodels in the initial model; or by different parts of the initial model that are related to each other. That is, the initial model may be composed of a plurality of submodels, or may be a complete model including a plurality of submodel functions.
Alternatively, the execution subject may store the generated face detection model locally, or may send it to a terminal or a database server.
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a face detection model according to the present embodiment. In the application scenario of fig. 3, a model training application may be installed on the terminal 31 used by the user. After the user opens the application and uploads the sample set or the storage path of the sample set, the server 32 providing background support for the application may run a method for generating a face detection model, including:
first, a sample set may be obtained. Wherein the samples in the sample set may include a sample face image 321, and sample face feature information 322 and sample header feature information 323 corresponding to the sample face image. Thereafter, samples may be selected from the sample set, and the following training steps performed: inputting a sample face image 321 of the selected sample into the initial model 320 to obtain face characteristic information 322 'and head characteristic information 323' of the sample; analyzing the facial feature information 322' with the corresponding sample facial feature information 322 to determine a facial feature loss value 324; analyzing the head characteristic information 323' and the corresponding sample head characteristic information 323 to determine a head characteristic loss value 325; taking the weighted result of the facial feature loss value 324 of the sample and the head feature loss value 325 of the sample as a total loss value 326 of the sample according to the preset facial weight and head weight, and comparing the total loss value 326 of the sample with a target value; determining whether the initial model 320 is trained according to the comparison result; in response to determining that the initial model 320 training is complete, the initial model 320 is considered a face detection model 320'.
At this time, the server 32 may also transmit prompt information indicating that the model training is completed to the terminal 31. The prompt message may be a voice and/or text message. In this way, the user can acquire the face detection model at a preset storage location.
In the method for generating a face detection model in this embodiment, a sample set is obtained, and samples can be selected from it to train an initial model. The samples in the sample set may include sample face images, together with sample facial feature information and sample head feature information corresponding to those images. The facial feature information and head feature information of a sample can therefore be obtained by inputting its sample face image into the initial model. The obtained information may then be analyzed against the corresponding sample facial feature information and sample head feature information to determine a facial feature loss value and a head feature loss value, respectively. Next, according to the preset face weight and head weight, the weighted result of the sample's facial feature loss value and head feature loss value may be taken as the sample's total loss value, and that total loss value may be compared with a target value. Finally, whether the initial model has finished training may be determined from the comparison, and if so, the trained initial model can be used as the face detection model. A model usable for face detection is thus obtained, and the ways in which such models can be generated are enriched.
With continuing reference to FIG. 4, as an implementation of the methods illustrated in the above figures, the present application provides one embodiment of an apparatus for generating a face detection model. The embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices.
As shown in fig. 4, the apparatus 400 for generating a face detection model according to the present embodiment may include: an obtaining unit 401 configured to obtain a sample set, where samples in the sample set include a sample face image, and sample face feature information and sample head feature information corresponding to the sample face image; a training unit 402 configured to select samples from the sample set and to perform the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; according to the preset face weight and the head weight, taking the weighting result of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining whether the initial model is trained according to the comparison result; in response to determining that the initial model training is complete, the initial model is treated as a face detection model.
In some optional implementations of this embodiment, the apparatus 400 may further include a weight obtaining unit (not shown in fig. 4) configured to: detecting skin color areas of five sense organs and/or face regions in the sample face image; determining the shielding proportion of the face according to the detection result; based on the shielding proportion, adjusting the preset initial face weight and the initial head weight to respectively obtain the face weight and the head weight, wherein the face weight is in negative correlation with the shielding proportion, and the head weight is in positive correlation with the shielding proportion.
Further, when the shielding ratio is smaller than the preset ratio value, the face weight is the maximum value of the weight range, and the head weight is the minimum value of the weight range.
Optionally, the apparatus 400 may further include: an adjusting unit 403, configured to adjust relevant parameters in the initial model and to reselect samples from the sample set in response to determining that the initial model is not trained completely, and to continue to perform the training step.
It will be understood that the elements described in the apparatus 400 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and will not be described herein again.
Referring to fig. 5, a flowchart 500 of an embodiment of a method for detecting a human face provided by the present application is shown. The method for detecting a human face may include the steps of:
step 501, acquiring a face image of a detection object.
In the present embodiment, the execution subject of the method for detecting a face (e.g., the server 105 shown in fig. 1) may acquire the face image of a detection object in various ways. For example, the execution subject may obtain a stored face image from a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. As another example, the execution subject may receive a face image captured by a terminal (e.g., the terminals 101 and 102 shown in fig. 1) or another device.
In the present embodiment, the detection object may be any user, such as a user using a terminal, or another user who appears in the image capturing range, or the like. The face image may also be a color image and/or a grayscale image, etc. And the format of the face image is not limited in the present application.
Step 502, inputting the face image into the face detection model, and generating the face detection result of the detection object.
In this embodiment, the execution subject may input the face image acquired in step 501 into the face detection model, thereby generating a face detection result of the detection object. The face detection result may be information for describing a face in the image. For example, the face detection result may include whether or not a face is detected in the image, and facial feature information in the case where a face is detected, and the like.
In this embodiment, the face detection model may be generated by the method described in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
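A hedged sketch of steps 501 and 502 follows, reusing the frame-regressing InitialModel from the earlier sketches as a stand-in for a trained face detection model; a real detection result might also carry a confidence score or a no-face indicator, which this embodiment leaves open.

```python
import torch

def detect_face(face_detection_model, image_tensor):
    # Run the face detection model on one image (step 502).
    face_detection_model.eval()
    with torch.no_grad():
        face_frame, head_frame = face_detection_model(image_tensor.unsqueeze(0))
    return {"face_frame": face_frame.squeeze(0).tolist(),
            "head_frame": head_frame.squeeze(0).tolist()}

# A random 3x128x128 tensor stands in for the acquired face image (step 501).
result = detect_face(InitialModel(), torch.rand(3, 128, 128))
```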
It should be noted that the method for detecting a face in this embodiment may be used to test the face detection models generated by the foregoing embodiments, and the face detection model can then be further optimized according to the test results. The method may also be a practical way of applying the face detection model generated by the above embodiments: using that model to detect faces helps improve face detection performance, for example by finding more faces and producing more accurate face information.
With continuing reference to fig. 6, as an implementation of the method illustrated in fig. 5 described above, the present application provides one embodiment of an apparatus for detecting a human face. The embodiment of the device corresponds to the embodiment of the method shown in fig. 5, and the device can be applied to various electronic devices.
As shown in fig. 6, the apparatus 600 for detecting a human face of the present embodiment may include: an acquisition unit 601 configured to acquire a face image of a detection object; a generating unit 602, configured to input the face image into the face detection model generated by the method as described in the embodiment of fig. 2, and generate a face detection result of the detection object.
It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 5. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a touch panel, a keyboard, a mouse, a camera, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a training unit. As another example, it can also be described as: a processor includes an acquisition unit and a generation unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a sample set".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a sample set, wherein samples in the sample set comprise sample face images, sample face characteristic information and sample head characteristic information corresponding to the sample face images; selecting samples from the sample set, and performing the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; according to the preset face weight and the head weight, taking the weighting result of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining whether the initial model is trained according to the comparison result; in response to determining that the initial model training is complete, the initial model is treated as a face detection model.
Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquiring a face image of a detection object; and inputting the face image into a face detection model to generate a face detection result of the detection object. The face detection model may be generated by using the method for generating the face detection model described in the above embodiments.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for generating a face detection model, comprising:
acquiring a sample set, wherein samples in the sample set comprise sample face images, sample face characteristic information and sample head characteristic information corresponding to the sample face images;
selecting samples from the sample set, and performing the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained facial feature information and the corresponding sample facial feature information to determine a facial feature loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; according to the preset face weight and the head weight, taking the weighting result of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining whether the initial model is trained according to the comparison result; in response to determining that the initial model training is completed, using the initial model as a face detection model, wherein the preset face weight is not less than the head weight, the preset face weight and the head weight are determined by an occlusion proportion of a face region in the sample face image, the face weight is inversely related to the occlusion proportion, and the head weight is positively related to the occlusion proportion.
2. The method according to claim 1, wherein determining the preset face weight and the preset head weight by the occlusion proportion of the face region in the sample face image comprises:
detecting facial organs and/or skin color areas of the face in the sample face image;
determining the occlusion proportion of the face according to the detection result; and
adjusting a preset initial face weight and a preset initial head weight based on the occlusion proportion to obtain the face weight and the head weight, respectively.
3. The method according to claim 2, wherein, when the occlusion proportion is less than a preset proportion threshold, the face weight is a maximum value of a preset weight range and the head weight is a minimum value of the weight range (see the illustrative sketch following the claims).
4. The method according to any one of claims 1-3, wherein the method further comprises:
in response to determining that the initial model is not trained completely, adjusting relevant parameters in the initial model, reselecting samples from the sample set, and continuing to perform the training steps.
5. An apparatus for generating a face detection model, comprising:
an acquisition unit configured to acquire a sample set, wherein samples in the sample set comprise sample face images, and sample face characteristic information and sample head characteristic information corresponding to the sample face images;
a training unit configured to select samples from the sample set and to perform the following training steps: inputting a sample face image of a selected sample into an initial model to obtain face characteristic information and head characteristic information of the sample; analyzing the obtained face characteristic information and the corresponding sample face characteristic information to determine a face characteristic loss value; analyzing the obtained head characteristic information and the corresponding sample head characteristic information to determine a head characteristic loss value; taking, according to a preset face weight and a preset head weight, the weighted sum of the face characteristic loss value of the sample and the head characteristic loss value of the sample as the total loss value of the sample, and comparing the total loss value of the sample with a target value; determining, according to the comparison result, whether training of the initial model is completed; and, in response to determining that training of the initial model is completed, using the initial model as the face detection model, wherein the preset face weight is not less than the preset head weight, the preset face weight and the preset head weight are determined by an occlusion proportion of a face region in the sample face image, the face weight is inversely related to the occlusion proportion, and the head weight is positively related to the occlusion proportion.
6. The apparatus according to claim 5, further comprising a weight obtaining unit configured to:
detect facial organs and/or skin color areas of the face in the sample face image;
determine the occlusion proportion of the face according to the detection result; and
adjust a preset initial face weight and a preset initial head weight based on the occlusion proportion to obtain the face weight and the head weight, respectively.
7. The apparatus according to claim 6, wherein, when the occlusion proportion is less than a preset proportion threshold, the face weight is a maximum value of a preset weight range and the head weight is a minimum value of the weight range.
8. The apparatus according to any one of claims 5-7, wherein the apparatus further comprises:
an adjusting unit configured to adjust relevant parameters in the initial model in response to determining that the initial model is not trained completely, and to reselect samples from the sample set to continue the training steps.
9. A method for detecting a human face, comprising:
acquiring a face image of a detection object;
inputting the face image into a face detection model generated by the method according to any one of claims 1 to 4, and generating a face detection result of the detection object.
10. An apparatus for detecting a human face, comprising:
an acquisition unit configured to acquire a face image of a detection object;
a generating unit configured to input the face image into a face detection model generated by the method according to any one of claims 1 to 4, and generate a face detection result of the detection object.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-4 and 9.
12. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-4 and 9.
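Claims 2 and 3 leave the exact mapping from occlusion proportion to weights open: they require only that the face weight fall, and the head weight rise, with increasing occlusion, that the face weight never drop below the head weight (claim 1), and that a nearly unoccluded face take the extremes of a weight range. The Python sketch below (referenced from claim 3 above) realizes these constraints with an assumed weight range of [0, 1], an assumed threshold of 0.1, and a linear interpolation that is purely illustrative:

from typing import Tuple

def determine_weights(occlusion_proportion: float,
                      w_min: float = 0.0,
                      w_max: float = 1.0,
                      threshold: float = 0.1) -> Tuple[float, float]:
    # Claim 3: below the preset proportion threshold, the face weight takes the
    # maximum and the head weight the minimum of the weight range.
    if occlusion_proportion < threshold:
        return w_max, w_min
    # Linear interpolation (an assumption): face weight inversely related,
    # head weight positively related to the occlusion proportion.
    face_weight = w_max - (w_max - w_min) * occlusion_proportion
    head_weight = w_min + (w_max - w_min) * occlusion_proportion
    # Claim 1 additionally requires that the face weight is not less than
    # the head weight, so the head weight is clamped under heavy occlusion.
    return face_weight, min(head_weight, face_weight)

For example, determine_weights(0.05) yields (1.0, 0.0), while determine_weights(0.8) yields approximately (0.2, 0.2) after clamping.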
CN201810307313.XA 2018-04-08 2018-04-08 Method and device for generating human face detection model Active CN108197618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810307313.XA CN108197618B (en) 2018-04-08 2018-04-08 Method and device for generating human face detection model

Publications (2)

Publication Number Publication Date
CN108197618A (en) 2018-06-22
CN108197618B (en) 2021-10-22

Family

ID=62596413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810307313.XA Active CN108197618B (en) 2018-04-08 2018-04-08 Method and device for generating human face detection model

Country Status (1)

Country Link
CN (1) CN108197618B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145828B (en) * 2018-08-24 2020-12-25 北京字节跳动网络技术有限公司 Method and apparatus for generating video category detection model
CN109242043A (en) * 2018-09-29 2019-01-18 北京京东金融科技控股有限公司 Method and apparatus for generating information prediction model
CN109344908B (en) * 2018-10-30 2020-04-28 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN109376267B (en) * 2018-10-30 2020-11-13 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN109472264B (en) * 2018-11-09 2020-10-27 北京字节跳动网络技术有限公司 Method and apparatus for generating an object detection model
CN109871814B (en) * 2019-02-22 2022-06-21 成都旷视金智科技有限公司 Age estimation method and device, electronic equipment and computer storage medium
CN111985281B (en) * 2019-05-24 2022-12-09 内蒙古工业大学 Image generation model generation method and device and image generation method and device
CN110287807A (en) * 2019-05-31 2019-09-27 上海亿童科技有限公司 A kind of human body information acquisition method, apparatus and system
CN111292253B (en) * 2019-12-11 2023-04-18 新绎健康科技有限公司 Human body glow image calibration method and device
CN111738213B (en) * 2020-07-20 2021-02-09 平安国际智慧城市科技股份有限公司 Person attribute identification method and device, computer equipment and storage medium
CN114338241B (en) * 2022-03-10 2023-01-24 成都网讯优速信息技术有限公司 Data encryption and decryption method and device and network router adopting device
CN114743223A (en) * 2022-05-19 2022-07-12 澜途集思生态科技集团有限公司 Ecological organism recognition method based on Fitness-NMS algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400110A (en) * 2013-07-10 2013-11-20 上海交通大学 Abnormal face detection method in front of ATM (automatic teller machine)
CN103942539A (en) * 2014-04-09 2014-07-23 上海交通大学 Method for accurately and efficiently extracting human head ellipse and detecting shielded human face
CN104424721A (en) * 2013-08-22 2015-03-18 辽宁科大聚龙集团投资有限公司 Face shield identification method combined with ATM

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617426B (en) * 2013-12-04 2017-02-01 东北大学 Pedestrian target detection method under interference by natural environment and shelter
US10223612B2 (en) * 2016-09-01 2019-03-05 Microsoft Technology Licensing, Llc Frame aggregation network for scalable video face recognition
CN106485215B (en) * 2016-09-29 2020-03-06 西交利物浦大学 Face shielding detection method based on deep convolutional neural network
CN106485230B (en) * 2016-10-18 2019-10-25 中国科学院重庆绿色智能技术研究院 Training, method for detecting human face and the system of Face datection model neural network based
CN107798228A (en) * 2017-09-27 2018-03-13 维沃移动通信有限公司 A kind of face identification method and mobile terminal
CN107679490B (en) * 2017-09-29 2019-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for detection image quality
CN107844781A (en) * 2017-11-28 2018-03-27 腾讯科技(深圳)有限公司 Face character recognition methods and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108197618A (en) 2018-06-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant