CN111860362A - Method and device for generating human face image correction model and correcting human face image - Google Patents
- Publication number
- CN111860362A (CN202010720935.2A)
- Authority
- CN
- China
- Prior art keywords
- face image
- face
- image
- sample
- generative adversarial network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The application discloses a method and a device for generating a face image correction model and a method and a device for correcting a face image, and relates to the technical field of face recognition. The specific implementation scheme is as follows: acquiring a sample set, wherein each sample in the sample set comprises a front face image and a side face image at an arbitrary pose angle of the same person; selecting samples from the sample set, and performing the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a composite image; analyzing the composite image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is finished, using the generative adversarial network as the face image correction model. The embodiment can generate a face image correction model, so that face images at different angles are converted into front face images before expression recognition is performed, which improves the accuracy and robustness of expression recognition.
Description
Technical Field
The application relates to the technical field of computers, in particular to the technical field of face recognition.
Background
Most laboratory samples are collected with the face directly facing the camera and the head pose upright, so they do not cover the variety of face poses. In a real scene, by contrast, facial expressions arise spontaneously and the head pose deviates widely, which increases the difficulty of recognition.
At present, facial expression recognition generally uses a traditional method or a single-model convolutional neural network: a facial expression image after face correction is taken as input, expression features are extracted by the convolutional neural network or manually, and the expression classification result is then output by a classifier. When the face pose deviation in a real scene is too large, robustness is poor and misrecognition occurs easily, which reduces the accuracy of the algorithm.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a device and a storage medium for generating a face image correction model, and a method, an apparatus, a device and a storage medium for correcting a face image.
According to a first aspect of the present disclosure, there is provided a method for generating a face image correction model, including: acquiring a sample set, wherein each sample in the sample set comprises a front face image and a side face image at an arbitrary pose angle of the same person; selecting samples from the sample set, and performing the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a composite image; analyzing the composite image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is finished, using the generative adversarial network as the face image correction model.
According to a second aspect of the present disclosure, there is provided a method of correcting a face image, comprising: inputting a head portrait to be recognized into a face detection model to obtain a face image; inputting a face image into a face key point detection model to obtain an aligned face comprising key points; inputting the aligned face into the face image correction model generated by the method in the first aspect, and obtaining a pose-corrected front face image.
According to a third aspect of the present disclosure, there is provided an apparatus for generating a face image correction model, comprising: an acquisition unit configured to acquire a sample set, wherein each sample in the sample set comprises a front face image and a side face image at an arbitrary pose angle of the same person; and a training unit configured to select samples from the sample set and perform the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a composite image; analyzing the composite image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is finished, using the generative adversarial network as the face image correction model.
According to a fourth aspect of the present disclosure, there is provided an apparatus for correcting a face image, comprising: the face detection unit is configured to input a head portrait to be recognized into the face detection model to obtain a face image; a key point detection unit configured to input a face image into a face key point detection model to obtain an aligned face including key points; a correction unit configured to input the aligned face into a face image correction model generated using the apparatus according to one of the first aspects, resulting in a pose-corrected frontal face image.
According to a fifth aspect of the present disclosure, there is provided an electronic apparatus, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first and second aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions, characterized in that the computer instructions are for causing a computer to perform the method of any one of claims 1-7.
According to the technical scheme of the application, a generative adversarial network (GAN) is used to convert a face image in any pose into a frontal-pose facial expression image, which can greatly improve the accuracy and robustness of facial expression recognition under large poses in complex environments.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a correction model for a face image according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a face image correction model according to the present application;
FIG. 4 is a flow chart of one embodiment of a method for correcting a face image according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating a face image correction model according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for correcting a face image according to the present application;
FIG. 7 is a block diagram of an electronic device for implementing the method for generating a face image correction model and the method for correcting a face image according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of generating a face image correction model, an apparatus for generating a face image correction model, a method of correcting a face image, or an apparatus for correcting a face image according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a face detection and recognition application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
When the terminals 101, 102 are hardware, an image capturing device may be further mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user 110 may use the image capturing device on the terminal 101, 102 to capture the facial image of himself or another person.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send a training result (such as the generated face image correction model) to the terminals 101 and 102. In this way, the user can apply the generated face image correction model to perform face detection.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the method for generating a face image correction model or the method for correcting a face image provided in the embodiment of the present application is generally performed by the server 105. Accordingly, a device for generating a face image correction model or a device for correcting a face image is also generally provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a face image correction model according to the present application is shown. The method for generating the face image correction model may comprise the following steps:
Step 201, a sample set is acquired.
In the present embodiment, the implementation subject (e.g., the server 105 shown in fig. 1) of the method of generating a face image correction model may acquire a sample set in various ways. For example, the executing entity may obtain the existing sample set stored therein from a database server (e.g., database server 104 shown in fig. 1) via a wired connection or a wireless connection. As another example, a user may collect a sample via a terminal (e.g., terminals 101, 102 shown in FIG. 1). In this way, the executing entity may receive samples collected by the terminal and store the samples locally, thereby generating a sample set.
Here, the sample set may include at least one sample. Each sample may comprise a front face image and a side face image at an arbitrary pose angle of the same person. When the GAN (Generative Adversarial Network) is trained, a facial expression image with a frontal pose angle of 0 degrees and a facial expression image at an arbitrary pose angle of the same person are input, so that the GAN learns to generate the 0-degree frontal facial expression image from the facial expression image at an arbitrary pose angle, while ensuring that the person identity (id) features and the expression features do not change. Multi-PIE can be used as the training set of the GAN; taking the frontal face pose angle as 0°, it provides facial expression images at different pose angles (from -90° to 90°). In addition, facial expressions are classified according to changes in facial muscles into 7 basic types: Angry, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral. Meanwhile, the face is defined to contain 72 key points, namely (x1, y1) … (x72, y72); a schematic diagram is shown in fig. 3.
The samples are generated as follows. Images acquired by a camera at different angles are preprocessed. First, an image containing a human face is obtained, and the face is detected by a detection model to obtain the approximate position area of the face; the detection model may be an existing face detection model capable of detecting the face position. Second, according to the detected face area, the key points of the face are detected by a face key point detection model to obtain the key point coordinate values of the face; the face key point detection model is an existing model, which is called with the image of the detected face as input to obtain 72 face key point coordinates, namely (x1, y1) … (x72, y72). Then, the target face is aligned according to the key point coordinate values; at the same time, the region containing only the face is cropped through an affine transformation and resized to a common size, for example 224x224, and the face key point coordinates are remapped to new coordinates according to the affine transformation matrix. The front face image and the side face image in each sample include the face key point coordinates.
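The remapping of key points under the cropping transform can be illustrated with a minimal sketch. It assumes a simplified scale-and-translate crop of a detected face box; an actual alignment step would typically also include a rotation derived from the key points. The names `crop_affine` and `remap_keypoints` and the box values are hypothetical and do not come from the application.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def crop_affine(box: Tuple[float, float, float, float], size: int = 224) -> Affine:
    """Build a 2x3 affine matrix mapping a face box (x0, y0, x1, y1) onto a size x size crop."""
    x0, y0, x1, y1 = box
    sx = size / (x1 - x0)
    sy = size / (y1 - y0)
    # Scale, then translate so that the box corner (x0, y0) lands on the crop origin.
    return ((sx, 0.0, -sx * x0), (0.0, sy, -sy * y0))


def remap_keypoints(points: List[Point], m: Affine) -> List[Point]:
    """Apply the same affine matrix to every key point."""
    return [
        (m[0][0] * x + m[0][1] * y + m[0][2],
         m[1][0] * x + m[1][1] * y + m[1][2])
        for x, y in points
    ]


# Hypothetical detector output: a 448x448 face box in the source image.
box = (100.0, 50.0, 548.0, 498.0)
keypoints = [(100.0, 50.0), (324.0, 274.0), (548.0, 498.0)]
print(remap_keypoints(keypoints, crop_affine(box)))
```

Because the image and the key points pass through the same matrix, the key point coordinates stay consistent with the cropped 224x224 face.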
Optionally, the obtained region containing the face image may be subjected to image normalization. In this embodiment, normalization is applied to each pixel in the image in turn, and may be performed as follows: 128 is subtracted from the pixel value of each pixel and the result is divided by 256, so that the pixel value of each pixel lies between -0.5 and 0.5. This reduces the complexity of image processing and improves the processing speed.
Optionally, random data augmentation, such as rotation, scaling, cropping, and flipping, may be applied to the normalized image. This increases the number of training samples and makes the trained model more robust.
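The normalization described above can be written directly as a small sketch. It assumes 8-bit pixel values (0 to 255) stored in nested lists; the function names are illustrative only.

```python
from typing import List


def normalize_pixel(value: int) -> float:
    """Map an 8-bit pixel value to the range [-0.5, 0.5) as (value - 128) / 256."""
    return (value - 128) / 256


def normalize_image(image: List[List[int]]) -> List[List[float]]:
    """Normalize every pixel of a single-channel image given as rows of integers."""
    return [[normalize_pixel(v) for v in row] for row in image]


print(normalize_image([[0, 128, 255]]))
```

With this formula a value of 0 maps to -0.5 and a value of 255 maps to slightly below 0.5.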
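As one illustration of random augmentation, the following sketch performs a random horizontal flip and moves the key points with the image. It is an assumption-laden example: the function names and the flip probability are hypothetical, and a real key point layout would usually also require swapping left and right key point indices, which the application does not describe.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def hflip(image: List[List[float]], keypoints: List[Point]):
    """Mirror a single-channel image left to right and move the key points with it."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A pixel in column x moves to column (width - 1 - x); y is unchanged.
    moved = [(width - 1 - x, y) for x, y in keypoints]
    return flipped, moved


def random_hflip(image, keypoints, p: float = 0.5, rng=random):
    """Flip with probability p (hypothetical default), otherwise return the inputs unchanged."""
    if rng.random() < p:
        return hflip(image, keypoints)
    return image, keypoints


print(hflip([[1, 2, 3]], [(0, 0)]))
```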
Step 202, a sample is selected from the sample set.
In this embodiment, the execution subject may select a sample from the sample set obtained in step 201, and perform the training steps from step 203 to step 206. The selection manner and the number of samples are not limited in the present application. For example, at least one sample may be selected at random, or samples whose face images are sharper (i.e., of higher resolution) may be selected.
Step 203, the side face image of the selected sample is input into a generative adversarial network to obtain a composite image.
In this embodiment, the generative adversarial network may be a deep convolutional generative adversarial network (DCGAN). The generative adversarial network may include a generation network configured to perform pose adjustment on the input image and output the adjusted image, and a discrimination network configured to determine whether an input image is an image output by the generation network. The generation network may be a convolutional neural network for image processing (for example, any of various convolutional neural network structures including convolutional layers, pooling layers, unpooling layers, and deconvolution layers, which perform down-sampling and then up-sampling in sequence); the discrimination network may be a convolutional neural network (for example, any of various convolutional neural network structures that include a fully-connected layer, where the fully-connected layer performs the classification function). In addition, the discrimination network may be another model structure that can implement the classification function, such as a Support Vector Machine (SVM). It should be noted that the image output by the generation network can be expressed as a matrix with three RGB channels. Here, the discrimination network may output 1 if it determines that the input image is an image output by the generation network (i.e., generated data), and may output 0 if it determines that the input image is not an image output by the generation network (i.e., real data, namely the front face image). The discrimination network may also output other preset values and is not limited to 1 and 0.
Based on a machine learning method, the side face image in a sample is used as the input of the generation network, the image output by the generation network and the front face image in the sample are used as inputs of the discrimination network, the generation network and the discrimination network are trained, and the trained generative adversarial network is determined as the face image correction model. Specifically, the parameters of either one of the generation network and the discrimination network (which may be referred to as the first network) may be fixed first while the network whose parameters are not fixed (which may be referred to as the second network) is optimized; then the parameters of the second network are fixed and the first network is optimized. This iteration continues until final convergence, at which point the discrimination network cannot tell whether an input image was generated by the generation network. At this time, the image generated by the generation network is close to the front face image, and the discrimination network cannot accurately distinguish real data from generated data (i.e., its accuracy is 50%), so the generation network at this time can be determined as the face image correction model.
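The convergence condition can be made concrete with a small numeric sketch. It follows the labeling convention given above (1 for generated data, 0 for real data) and assumes a binary cross-entropy loss for the discrimination network, which the application does not specify; the function names are hypothetical.

```python
import math
from typing import Sequence


def discriminator_loss(on_generated: Sequence[float], on_real: Sequence[float]) -> float:
    """Mean binary cross-entropy (assumed loss).

    Each value is the discriminator's estimated probability that its input was
    generated: the target is 1 for generated images and 0 for real front faces.
    """
    eps = 1e-12  # guards against log(0)
    terms = [-math.log(max(p, eps)) for p in on_generated]
    terms += [-math.log(max(1.0 - p, eps)) for p in on_real]
    return sum(terms) / len(terms)


def discriminator_accuracy(on_generated, on_real, threshold: float = 0.5) -> float:
    """Fraction of inputs classified correctly; an output equal to the threshold counts as 'real'."""
    correct = sum(p > threshold for p in on_generated)
    correct += sum(p <= threshold for p in on_real)
    return correct / (len(on_generated) + len(on_real))


# An undecided discriminator outputs 0.5 for every input.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
print(discriminator_accuracy([0.5, 0.5], [0.5, 0.5]))
```

Under this assumed loss, a discrimination network that outputs 0.5 for every input has a loss of ln 2 per sample and, on a balanced batch, an accuracy of 50%, which corresponds to the converged state described above.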
Step 204, the composite image and the corresponding front face image are analyzed to determine a face feature loss value.
In this embodiment, the execution subject may analyze the composite image generated from the side face image of a sample together with the front face image of that sample, so that a face feature loss value can be determined. For example, the coordinates of the key points in the composite image and the coordinates of the key points in the front face image may be used as parameters and input into a predetermined loss function, so that a loss value between the two can be calculated.
In this embodiment, the loss function is generally used to measure the degree of disparity between the predicted value of the model (e.g., the composite image) and the actual value (e.g., the front face image). It is a non-negative real-valued function. In general, the smaller the loss value, the better the robustness of the model. The loss function may be set according to actual requirements.
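One possible key point loss is sketched below. The application leaves the loss function open, so the mean squared Euclidean distance used here is only an assumed choice, and `keypoint_loss` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_loss(predicted: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between corresponding key points (assumed loss)."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("both key point lists must be non-empty and of equal length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(predicted, target))
    return total / len(predicted)


# Key points of the composite image versus those of the front face image.
print(keypoint_loss([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
```

The value is non-negative and equals zero only when every key point of the composite image coincides with the corresponding key point of the front face image.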
Step 205, if it is determined according to the face feature loss value that training of the generative adversarial network is finished, the generative adversarial network is used as the face image correction model.
In this embodiment, if a plurality of samples are selected in step 202, the execution subject may determine that training of the generative adversarial network is finished when the face feature loss value of every sample reaches the target value. As another example, the execution subject may count the proportion of the selected samples whose face feature loss values reach the target value, and when this proportion reaches a preset sample ratio (such as 95%), it may determine that training of the generative adversarial network is finished.
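The ratio-based stopping rule can be sketched as follows. The sketch assumes that a loss value "reaches" the target when it is less than or equal to it; `training_finished` and its parameter names are hypothetical.

```python
from typing import Sequence


def training_finished(losses: Sequence[float], target: float, min_ratio: float = 0.95) -> bool:
    """Return True when enough samples have a loss at or below the target value."""
    if not losses:
        raise ValueError("at least one loss value is required")
    reached = sum(1 for loss in losses if loss <= target)
    return reached / len(losses) >= min_ratio


# 96 of 100 selected samples reach the target, so the 95% ratio is met.
print(training_finished([0.1] * 96 + [0.9] * 4, target=0.2))
```

Setting `min_ratio` to 1.0 gives the stricter variant in which every selected sample must reach the target value.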
In this embodiment, in response to determining that the face feature loss value does not reach the target value, the electronic device may adjust the parameters of the generative adversarial network, i.e., update the generation network and/or the discrimination network, and then re-execute the training steps using the updated generation network and discrimination network. In this way, the parameters of the face image correction model obtained by training the generative adversarial network are derived from the training samples and can be determined based on back propagation from the discrimination network, so the generation model can be trained without relying on a large number of labeled samples, which reduces labor cost and further improves the flexibility of image processing.
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a face image correction model according to the present embodiment. In the application scenario of fig. 3, a terminal used by a user may have a model training application installed thereon. When a user opens the application and uploads a sample set or a storage path of the sample set, a server providing background support for the application can run a method for generating a face image correction model, and the method comprises the following steps:
First, a sample set may be obtained. The samples in the sample set may include a front face image and a side face image at an arbitrary pose angle. Thereafter, samples may be selected from the sample set, and the following training steps performed: inputting the side face image of the selected sample into a generative adversarial network to obtain a composite image; analyzing the composite image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is finished, using the generative adversarial network as the face image correction model.
At this time, the server may also send prompt information indicating that the model training is completed to the terminal. The prompt message may be a voice and/or text message. In this way, the user can acquire the face image correction model at a preset storage position.
The method provided by the above embodiment of the present disclosure can generate a face image correction model. A side face image can thus be converted into a front face image, which can improve the accuracy of face recognition. Using a generative adversarial network (GAN), a face image in any pose can be converted into a frontal-pose facial expression image, which can greatly improve the accuracy and robustness of facial expression recognition under large poses in complex environments.
With further reference to fig. 4, a flow 400 of an embodiment of a method of correcting a face image is shown. The process 400 of the method for correcting a face image includes the following steps:
In the present embodiment, the execution subject (e.g., the server 105 shown in fig. 1) of the method of correcting a face image may acquire an avatar of a detection target in various ways, and the avatar may include other parts such as a neck in addition to a face. For example, the execution subject may obtain the image including the human face stored therein from a database server (e.g., database server 104 shown in fig. 1) through a wired connection or a wireless connection. As another example, the executing entity may also receive an avatar captured by a terminal (e.g., terminals 101, 102 shown in fig. 1) or other device.
In the present embodiment, the detection object may be any user, such as a user using a terminal, or another user who appears in the image capturing range, or the like. The avatar may also be a color image and/or a grayscale image, etc. And the format of the avatar is not limited in this application.
Detecting the head portrait to be recognized through a detection model to obtain an approximate position area of the face; the detection model is an existing face detection model and can detect the face position.
In this embodiment, face key points are detected in the detected face region by a face key point detection model to obtain the key point coordinate values of the face. The face key point detection model is an existing model: it is called with the image of the detected face as input, and it outputs 72 face key point coordinates, (x1, y1) … (x72, y72). The target face is then aligned according to these key point coordinate values; at the same time, only the face region is cropped out through an affine transformation and resized to a uniform size of 224x224, and the face key point coordinates are remapped to new coordinates according to the affine transformation matrix.
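The key point remapping described above can be illustrated with a minimal sketch. It assumes a similarity transform (rotation, uniform scale, translation) computed from two eye centers, and the target eye positions inside the 224x224 crop are hypothetical values chosen for illustration; the source does not state which of the 72 key points drive the alignment or which form of affine matrix is used. The function names are also illustrative only.

```python
def similarity_from_eyes(left_eye, right_eye,
                         dst_left=(74.0, 90.0), dst_right=(150.0, 90.0)):
    """Return a 2x3 affine matrix mapping the two eye centers onto
    fixed positions in a 224x224 crop (target positions are assumed)."""
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*dst_left), complex(*dst_right)
    if p1 == p2:
        raise ValueError("eye centers must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and uniform scale as one complex factor
    b = q1 - a * p1             # translation
    return [[a.real, -a.imag, b.real],
            [a.imag,  a.real, b.imag]]


def remap_keypoints(matrix, keypoints):
    """Apply the 2x3 affine matrix to every (x, y) key point."""
    (m00, m01, m02), (m10, m11, m12) = matrix
    return [(m00 * x + m01 * y + m02, m10 * x + m11 * y + m12)
            for x, y in keypoints]


# Example: a tilted face whose eyes sit at (100, 120) and (180, 140).
matrix = similarity_from_eyes((100.0, 120.0), (180.0, 140.0))
remapped = remap_keypoints(
    matrix, [(100.0, 120.0), (180.0, 140.0), (140.0, 130.0)])
```

After remapping, the two eye centers land on the assumed target positions and lie on a horizontal line. Warping the pixels themselves with the same matrix would be a separate step (for instance with an image library's affine warp) and is not shown here.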
In this embodiment, the aligned face is input to the GAN trained in steps 201-206, and a front face image whose pose has been corrected by the GAN is obtained.
Optionally, the front face image may be input into a pre-trained expression detection model, which outputs expression information. The expression detection model is a convolutional neural network used to extract expression features; it comprises 8 convolutional layers and 5 max pooling layers, and a final fully connected layer yields 7-class facial expression information. According to changes in facial muscles, facial expressions are classified into 7 basic expressions: anger (Angry), disgust (Disgust), fear (Fear), happiness (Happiness), sadness (Sadness), surprise (Surprise), and neutral (Neutral).
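As a rough illustration of the 7-class output, the sketch below decodes a vector of seven scores from the fully connected layer into an expression label. Several points are assumptions not stated in the source: applying a softmax to the scores, the order of the labels, and, for the small size helper, that each of the 5 pooling layers halves the spatial size while the convolutions preserve it.

```python
import math

# Label order is an assumption made for this sketch.
EXPRESSIONS = ["Angry", "Disgust", "Fear", "Happiness",
               "Sadness", "Surprise", "Neutral"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_expression(scores):
    """Map the 7 fully connected outputs to (label, probability)."""
    if len(scores) != len(EXPRESSIONS):
        raise ValueError("expected one score per expression class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EXPRESSIONS[best], probs[best]


def pooled_size(input_size=224, num_pools=5):
    """Spatial size after repeated 2x2, stride-2 pooling (assumed)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size


label, prob = decode_expression([0.1, 0.2, 0.0, 3.0, 0.1, 0.5, 1.0])
```

Under these assumptions, a 224x224 input would reach the fully connected layer as a 7x7 feature map.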
Optionally, the method may further acquire scene information for the currently captured avatar and then perform service quality evaluation according to the expression information and the scene information. Auxiliary functions such as information recommendation can also be implemented according to the service quality evaluation result. Improved recognition accuracy helps improve the service quality of many applications. For example, in advertising, it helps recommend search results that better match user needs and target advertisements accurately; in distance education, recognizing students' emotions helps improve teaching content and the quality of distance education; in driver monitoring, the driver's emotion can be recognized and the driver prompted accordingly, helping to ensure the driver's safety.
It should be noted that the method for correcting a face image of this embodiment can be used to test the face image correction model generated by the above embodiments, and the model can then be continuously optimized according to the test results. The method may also be a practical application of the face image correction model generated by the above embodiments. Correcting face images with that model helps improve face recognition performance; for example, more faces are found and the face information found is more accurate.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a face image correction model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a face image correction model according to the present embodiment includes: an acquisition unit 501 and a training unit 502. The acquisition unit 501 is configured to acquire a sample set, where each sample in the sample set includes a front face image and a side face image at an arbitrary pose angle of the same person; the training unit 502 is configured to select samples from the sample set and to perform the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and, if it is determined according to the face feature loss value that training of the generative adversarial network is complete, using the generative adversarial network as the face image correction model.
In some optional implementations of this embodiment, the apparatus 500 further includes an adjusting unit 503 configured to: if it is determined according to the face feature loss value that training of the generative adversarial network is not complete, adjust relevant parameters of the generative adversarial network, reselect samples from the sample set, and continue to perform the training steps using the adjusted generative adversarial network as the generative adversarial network.
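The control flow shared by the training unit and the adjusting unit can be sketched as a simple loop. This is only an illustration of the select, synthesize, evaluate, and adjust cycle: the completion criterion (a loss threshold), the function names, and the toy one-parameter "generator" are all assumptions, and a real generative adversarial network would also involve a discriminator and an adversarial loss that this sketch leaves out.

```python
import random


def train_until_done(samples, generator, feature_loss, adjust, params,
                     loss_threshold=0.05, max_steps=1000, seed=0):
    """Repeat the training step until the face feature loss is small.

    samples      : list of (side_face, front_face) pairs
    generator    : callable(params, side_face) -> synthesized image
    feature_loss : callable(synthesized, front_face) -> float
    adjust       : callable(params, side_face, front_face) -> new params
    """
    rng = random.Random(seed)
    for step in range(max_steps):
        side, front = rng.choice(samples)        # select a sample
        synthesized = generator(params, side)    # synthesize an image
        loss = feature_loss(synthesized, front)  # face feature loss value
        if loss < loss_threshold:                # training judged complete
            return params, step, loss
        params = adjust(params, side, front)     # adjust, then reselect
    raise RuntimeError("loss threshold not reached within max_steps")


# Toy stand-in: "images" are floats and the front view is side + 2.0.
samples = [(float(s), float(s) + 2.0) for s in range(5)]
params, steps, loss = train_until_done(
    samples,
    generator=lambda p, side: side + p,
    feature_loss=lambda synth, front: abs(synth - front),
    adjust=lambda p, side, front: p + 0.5 * (front - (side + p)),
    params=0.0,
)
```

In the toy run, the single parameter moves toward 2.0 and the loop stops once the loss falls below the assumed threshold.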
In some optional implementations of the present embodiment, the front face image and the side face image in each sample in the sample set are subjected to normalization processing.
In some optional implementations of this embodiment, the samples in the sample set undergo random data augmentation by at least one of: rotation, scaling, cropping, and flipping.
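Two of the listed augmentations, random horizontal flipping and random cropping, can be sketched on an image stored as a list of pixel rows. The flip probability, the crop size, and the function names are assumptions for illustration; rotation and scaling are omitted, and the source does not say whether the front and side images of a sample receive the same transform.

```python
import random


def hflip(image):
    """Mirror each pixel row left to right."""
    return [row[::-1] for row in image]


def random_crop(image, out_h, out_w, rng):
    """Cut an out_h x out_w window at a random position."""
    height, width = len(image), len(image[0])
    if out_h > height or out_w > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - out_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - out_w)
    return [row[left:left + out_w] for row in image[top:top + out_h]]


def augment(image, out_h, out_w, rng, flip_prob=0.5):
    """Randomly flip, then randomly crop (flip_prob is an assumed value)."""
    if rng.random() < flip_prob:
        image = hflip(image)
    return random_crop(image, out_h, out_w, rng)


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test pattern
augmented = augment(image, 3, 3, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible across runs.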
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for correcting a face image, which corresponds to the method embodiment shown in fig. 4, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for correcting a face image of the present embodiment includes: a face detection unit 601, a key point detection unit 602, and a correction unit 603. The face detection unit 601 is configured to input a head portrait to be recognized into a face detection model, so as to obtain a face image; a key point detection unit 602 configured to input a face image into a face key point detection model, resulting in an aligned face including key points; a correction unit 603 configured to input the aligned faces into a face image correction model generated using the apparatus according to any one of claims 1 to 4, resulting in a pose-corrected frontal face image.
In some optional implementations of this embodiment, the apparatus 600 further includes an expression detection unit 604 configured to: and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
In some optional implementations of this embodiment, the apparatus 600 further comprises an evaluation unit (not shown in the drawings) configured to: acquiring scene information; and performing service quality evaluation according to the expression information and the scene information.
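The way the units of apparatus 600 feed one another can be sketched as a small pipeline object. The class and field names are hypothetical, and each stage is passed in as a callable standing in for the corresponding pre-trained model; nothing here reflects the actual model interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FaceCorrectionPipeline:
    """Chains the stages in the order described for apparatus 600."""
    detect_face: Callable[[Any], Any]     # avatar -> face image
    align_face: Callable[[Any], Any]      # face image -> aligned face
    correct_pose: Callable[[Any], Any]    # aligned face -> frontal face
    detect_expression: Optional[Callable[[Any], Any]] = None  # optional stage

    def run(self, avatar):
        face = self.detect_face(avatar)
        aligned = self.align_face(face)
        frontal = self.correct_pose(aligned)
        expression = None
        if self.detect_expression is not None:
            expression = self.detect_expression(frontal)
        return frontal, expression


# String stubs make the order of the stages visible.
pipeline = FaceCorrectionPipeline(
    detect_face=lambda a: f"face({a})",
    align_face=lambda f: f"aligned({f})",
    correct_pose=lambda x: f"frontal({x})",
)
result = pipeline.run("img")
```

Keeping the expression stage optional mirrors the text, where expression detection is described as an optional implementation.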
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device for implementing the method of generating a face image correction model and the method of correcting a face image according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for generating a correction model of a facial image and the method for correcting a facial image provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of generating a correction model for a face image and the method of correcting a face image provided by the present application.
The memory 702 serves as a non-transitory computer readable storage medium and may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., xx module X01, xx module X02, and xx module X03 shown in fig. 5) corresponding to the method for generating a face image correction model and the method for correcting a face image in the embodiments of the present application. The processor 701 executes various functional applications of the server and data processing, namely, a method of generating a face image correction model and a method of correcting a face image in the above-described method embodiments, by running a non-transitory software program, instructions and modules stored in the memory 702.
The memory 702 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for the method of generating a face image correction model and the method of correcting a face image, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, and such remote memory may be connected through a network to the electronic device for the method of generating a face image correction model and the method of correcting a face image. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of generating a face image correction model and the method of correcting a face image may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of generating the face image correction model and the method of correcting the face image. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the present application, the method based on a generative adversarial network (GAN) can convert a face image in any pose into a facial expression image in the frontal pose, and can greatly improve the accuracy and robustness of facial expression recognition under large poses in complex environments.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (16)
1. A method of generating a correction model for a face image, comprising:
acquiring a sample set, wherein each sample in the sample set comprises a front face image of the same person and a side face image of any posture angle;
selecting samples from the sample set, and performing the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is complete, taking the generative adversarial network as a face image correction model.
2. The method of claim 1, wherein the method further comprises:
and if it is determined according to the face feature loss value that training of the generative adversarial network is not complete, adjusting relevant parameters in the generative adversarial network, reselecting a sample from the sample set, using the adjusted generative adversarial network as the generative adversarial network, and continuing to execute the training steps.
3. The method of claim 1, wherein the front face image and the side face image in each sample in the sample set are normalized.
4. The method of claims 1-3, wherein the samples in the sample set undergo random data augmentation by at least one of:
rotation, scaling, cropping, and flipping.
5. A method of correcting a face image, comprising:
inputting a head portrait to be recognized into a face detection model to obtain a face image;
inputting the face image into a face key point detection model to obtain an aligned face comprising key points;
inputting the aligned face into a face image correction model generated by the method according to any one of claims 1 to 4, and obtaining a posture-corrected front face image.
6. The method of claim 5, wherein the method further comprises:
and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
7. The method of claim 6, wherein the method further comprises:
acquiring scene information;
and performing service quality evaluation according to the expression information and the scene information.
8. An apparatus for generating a correction model of a face image, comprising:
an acquisition unit configured to acquire a sample set, wherein each sample in the sample set includes a front face image of the same person and a side face image of an arbitrary pose angle;
a training unit configured to select samples from the sample set and to perform the following training steps: inputting the side face image of the selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that training of the generative adversarial network is complete, taking the generative adversarial network as a face image correction model.
9. The apparatus of claim 8, wherein the apparatus further comprises an adjustment unit configured to:
and if it is determined according to the face feature loss value that training of the generative adversarial network is not complete, adjusting relevant parameters in the generative adversarial network, reselecting a sample from the sample set, using the adjusted generative adversarial network as the generative adversarial network, and continuing to execute the training steps.
10. The apparatus of claim 8, wherein the front and side face images in each sample in the sample set are normalized.
11. The apparatus of claims 8-10, wherein the samples in the sample set undergo random data augmentation by at least one of:
rotation, scaling, cropping, and flipping.
12. An apparatus for correcting a face image, comprising:
the face detection unit is configured to input a head portrait to be recognized into the face detection model to obtain a face image;
a key point detection unit configured to input the face image into a face key point detection model to obtain an aligned face including key points;
a correction unit configured to input the aligned faces into a face image correction model generated using the apparatus according to any one of claims 1 to 4, resulting in a pose-corrected frontal face image.
13. The apparatus of claim 12, further comprising an expression detection unit configured to:
and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
14. The apparatus of claim 13, wherein the apparatus further comprises an evaluation unit configured to:
acquiring scene information;
and performing service quality evaluation according to the expression information and the scene information.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010720935.2A CN111860362A (en) | 2020-07-24 | 2020-07-24 | Method and device for generating human face image correction model and correcting human face image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860362A true CN111860362A (en) | 2020-10-30 |
Family
ID=72951135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010720935.2A Pending CN111860362A (en) | 2020-07-24 | 2020-07-24 | Method and device for generating human face image correction model and correcting human face image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860362A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446609A (en) * | 2018-03-02 | 2018-08-24 | 南京邮电大学 | A kind of multi-angle human facial expression recognition method based on generation confrontation network |
CN110222668A (en) * | 2019-06-17 | 2019-09-10 | 苏州大学 | Based on the multi-pose human facial expression recognition method for generating confrontation network |
CN110738161A (en) * | 2019-10-12 | 2020-01-31 | 电子科技大学 | face image correction method based on improved generation type confrontation network |
Non-Patent Citations (3)
Title |
---|
RUI HUANG等人: "Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis", ARXIV:1704.04086V2, 4 August 2017 (2017-08-04), pages 1 - 11 * |
范雪,杨鸿波,李永: "基于深度学习的人脸图像扭正算法", 《信息通信》, 31 July 2017 (2017-07-31), pages 5 - 9 * |
陈宗海主编: "《系统仿真技术及其应用 第18卷》", 31 August 2017, pages: 280 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183492A (en) * | 2020-11-05 | 2021-01-05 | 厦门市美亚柏科信息股份有限公司 | Face model precision correction method, device and storage medium |
CN112183492B (en) * | 2020-11-05 | 2022-07-15 | 厦门市美亚柏科信息股份有限公司 | Face model precision correction method, device and storage medium |
CN112395979A (en) * | 2020-11-17 | 2021-02-23 | 平安科技(深圳)有限公司 | Image-based health state identification method, device, equipment and storage medium |
CN112395979B (en) * | 2020-11-17 | 2024-05-10 | 平安科技(深圳)有限公司 | Image-based health state identification method, device, equipment and storage medium |
CN112330781A (en) * | 2020-11-24 | 2021-02-05 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating model and generating human face animation |
CN113033288B (en) * | 2021-01-29 | 2022-06-24 | 浙江大学 | Method for generating front face picture based on side face picture for generating confrontation network |
CN113033288A (en) * | 2021-01-29 | 2021-06-25 | 浙江大学 | Method for generating front face picture based on side face picture for generating confrontation network |
CN113191197A (en) * | 2021-04-01 | 2021-07-30 | 杭州海康威视系统技术有限公司 | Image restoration method and device |
CN113191197B (en) * | 2021-04-01 | 2024-02-09 | 杭州海康威视系统技术有限公司 | Image restoration method and device |
CN113033476B (en) * | 2021-04-19 | 2022-08-12 | 清华大学 | Cross-posture face recognition method |
CN113033476A (en) * | 2021-04-19 | 2021-06-25 | 清华大学 | Cross-posture face recognition method |
CN113255788A (en) * | 2021-05-31 | 2021-08-13 | 西安电子科技大学 | Method and system for generating confrontation network face correction based on two-stage mask guidance |
CN113822790A (en) * | 2021-06-03 | 2021-12-21 | 腾讯云计算(北京)有限责任公司 | Image processing method, device, equipment and computer readable storage medium |
WO2022252372A1 (en) * | 2021-06-03 | 2022-12-08 | 腾讯云计算(北京)有限责任公司 | Image processing method, apparatus and device, and computer-readable storage medium |
CN113822790B (en) * | 2021-06-03 | 2023-04-21 | 腾讯云计算(北京)有限责任公司 | Image processing method, device, equipment and computer readable storage medium |
CN113343931A (en) * | 2021-07-05 | 2021-09-03 | Oppo广东移动通信有限公司 | Training method for generating countermeasure network, image sight correction method and device |
CN113781540A (en) * | 2021-09-15 | 2021-12-10 | 京东鲲鹏(江苏)科技有限公司 | Network generation method and device, electronic equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||