CN111428661A - Method for processing face image based on intelligent human-computer interaction - Google Patents
Method for processing face image based on intelligent human-computer interaction
- Publication number
- CN111428661A (application number CN202010232764.9A)
- Authority
- CN
- China
- Prior art keywords
- face
- facial
- predicted
- image
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A method for processing face images based on intelligent human-computer interaction, belonging to the field of artificial intelligence. The facial features related to user information are divided into three aspects — overall facial features, expression features and visual tracking — and three-dimensional face information is obtained by analyzing the three in combination. The invention binds specific actions in the user's facial features to behavior information, so that human-computer interaction is realized through facial feature information. The captured face is subjected to facial region analysis to locate the basic facial features, visual tracking is performed on the eye region, and facial expressions are bound to behavior information to realize human-computer interaction, which reduces the selection error of facial recognition points and improves the accuracy of face recognition.
Description
Technical Field
The invention belongs to the field of artificial intelligence and relates to a method for processing face images, in particular to a method that acquires face images and video information in real time and performs face point detection, expression recognition and visual tracking.
Background
Human-computer interaction is the process of information exchange between humans and systems, and it can take place in many different types of systems. The earliest human-computer interaction was realized by manually entering machine-language instructions, with computer language as the interaction medium. With the development of graphics processing, the interaction medium gradually shifted to graphical interfaces, which offer a better interaction experience, make interaction more convenient for the user, and feed more accurate information back to the user. With the development of related technologies such as pervasive computing and deep learning, the variety of human-computer interaction has been continuously enriched: the interaction media have diversified, the main methods have grown to include speech recognition, gesture recognition, tracking and the like, the interaction forms have become diverse, and the amount of information that can be transmitted has greatly increased.
As face analysis technology matures, facial features are being widely applied in various settings, bringing the problem of face image processing to the forefront. The invention performs face point detection, expression recognition and visual tracking by locating facial feature regions, and applies the three methods in combination, overcoming the individual limitations of each method and realizing an intelligent human-computer interaction process.
Face point detection and recognition is one of the important applications in the field of deep learning. Face point recognition refers to locating key points on a face after the face has been detected: once the facial feature region is located, the data are preprocessed, features are extracted with a recognition algorithm, and face recognition is completed; the process is shown in Fig. 1. With the rapid development of deep learning, face point recognition technology has steadily matured.
Expression recognition is an important direction for computers to understand human emotion and an important aspect of human-computer interaction: by analyzing expressions, the system can capture user information and make decisions, directly judging the user's emotional and psychological state. Convolutional neural networks are commonly used for this analysis; through their multiple convolutional and pooling layers, high-level, multi-level features of the whole face or of local regions can be extracted, yielding good expression image features. Experience shows that convolutional neural networks outperform other types of neural networks in image recognition, so the best expression recognition results can be achieved with a convolutional neural network.
Visual tracking is another important aspect of the human-computer interaction process. It makes it convenient to observe the user's focus, facilitates analysis of the user's region of interest, and allows the user's selections and preferences to be analyzed. The human eye serves as an input source for the computer: by tracking the user's line of sight, the gaze range is determined and the corresponding human-computer interaction is completed.
Taken together, the above analysis shows that each of the three methods addresses only one aspect of face recognition, so applying any one of them alone has shortcomings that lead to recognition errors. The invention therefore proposes a new face recognition method combining the three: facial region analysis is performed on the captured face to locate the basic facial features, visual tracking is performed on the eye region, and facial expressions are bound to behavior information to realize human-computer interaction, reducing the selection error of facial recognition points and improving the accuracy of face recognition.
Disclosure of Invention
The invention aims to solve the problem of processing a face image, and provides a method for realizing face image processing by acquiring face images and video information in real time, and performing face point detection, expression recognition and visual tracking.
The method for implementing the invention is described as follows:
the invention discloses a face image processing technology based on convolutional neural networks to realize intelligent human-computer interaction; the specific implementation comprises the following three processes:
(I) Method for detecting face points
The method adopts a three-layer convolutional neural network structure: the first-layer convolutional neural network is established with an absolute-value rectification and parameter-sharing mechanism, and the second-layer and third-layer convolutional neural networks are obtained with a multi-level regression idea. Because the face pose changes greatly, detection is unstable and the relative positions of face points and detection points may vary over a large range, producing a large relative error; the input area of the first-level network is therefore chosen large, so as to cover as many predicted positions as possible. The output of the first-level network constrains the selection of subsequent detection areas, so the second-level and third-level detection areas shrink correspondingly. The selection condition for a detection area is: a circular area containing 75% of all predicted positions obtained by the previous network, centered at the point where the density of the previous network's predicted positions is highest.
The predicted positions are obtained again in the new detection area, and the process is repeated many times until the detection area shrinks to 1% of the first-level detection area; the predicted positions so obtained are the predicted positions of each point, and several networks with different input areas are thereby obtained at each level.
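For illustration only, the following is a minimal numpy sketch of this region-selection rule; the k-nearest-neighbor density estimate used to locate the densest point is an assumption, since the patent does not specify how that point is found.

```python
import numpy as np

def next_detection_region(preds, keep_frac=0.75, k=10):
    """Shrink the detection area from the previous network's predictions.

    preds: (N, 2) array of predicted (x, y) face-point positions.
    Returns (center, radius) of the circular area containing keep_frac of
    the predictions, centered where the prediction density is highest.
    """
    # Pairwise distances; density ~ inverse of distance to the k-th neighbor.
    d = np.linalg.norm(preds[:, None, :] - preds[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, min(k, len(preds) - 1)]
    center = preds[np.argmin(kth)]          # densest predicted position
    # Smallest radius whose circle contains keep_frac of all predictions.
    r = np.sort(np.linalg.norm(preds - center, axis=1))
    radius = r[int(np.ceil(keep_frac * len(preds))) - 1]
    return center, radius
```

The driving loop would call this repeatedly, re-running the networks inside each new region, until the region's area falls to 1% of the first-level detection area.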
The final predicted position x of a face point can be formally expressed as a cascade of n levels; the mathematical expression of the predicted position x is as follows:

$$x = \frac{1}{l_1}\sum_{k=1}^{l_1} x_k^{(1)} \;+\; \sum_{i=2}^{n} \frac{1}{l_i}\sum_{k=1}^{l_i} \Delta x_k^{(i)}$$

where x is the predicted position, $l_i$ is the number of predicted positions at level i, and the predicted positions at level i are denoted $x_1^{(i)}, \ldots, x_{l_i}^{(i)}$, i.e. $x_1^{(1)}$ is the first predicted position at level 1; $\Delta x_k^{(i)}$ denotes the change of the k-th of the $l_i$ predicted positions at level i relative to the corresponding prediction at level i−1.
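As a small sketch of how the cascade terms combine (assuming, as a data-layout convention, that level 1 supplies absolute positions and levels 2..n supply the corrections Δx):

```python
import numpy as np

def cascade_position(level1_preds, corrections):
    """Combine cascade predictions into the final face-point position x.

    level1_preds: (l_1, 2) absolute predictions from the level-1 networks.
    corrections:  list over levels i = 2..n of (l_i, 2) arrays holding the
                  changes Δx_k^(i) predicted at level i.
    """
    x = level1_preds.mean(axis=0)       # first term: average over level 1
    for delta in corrections:           # second term: levels 2..n
        x = x + delta.mean(axis=0)      # average correction per level
    return x
```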
The method adopts a three-layer convolutional neural network design. The first layer comprises three deep convolutional networks with different detection areas: the F1 network's detection area covers the whole face, the EN1 network's covers only the eyes and nose, and the NM1 network's covers only the nose and mouth. The three networks predict different areas of the same face simultaneously using the prediction method above; averaging their predicted values reduces the variance and yields the first-layer predicted positions of the facial feature points, avoiding prediction results that deviate from reality because a local feature is too prominent. Following the regression idea used for the first-layer predicted positions, corresponding second-layer and third-layer predicted positions are obtained for each of the three networks F1, EN1 and NM1. Because the input areas of the second and third levels are strictly limited by the first level's prediction results, the predicted positions of the second and third levels can reach very high precision, but are also strictly constrained.
(II) Facial expression recognition method
In the facial expression recognition method, an end-to-end learning model is provided: the model synthesizes face images from the two angles of pose and expression, and performs facial expression recognition with the pose held fixed. The model consists of a generator, two discriminators and a classifier. Before an image enters the model it is preprocessed, applying a face detection algorithm with a base library containing 68 landmark points. After preprocessing, the face image is input to the generator G to produce an identity representation: there exists a rule f(x) such that each input image has a definite and unique identity representation, which is then concatenated with an expression code e and a pose code p to represent the changes of the face. By applying a min-max (adversarial) algorithm between the generator G and the discriminator D, and adding the corresponding labels at the decoder input, newly labeled face images with different poses and expressions are obtained. Two discriminator structures are used, denoted Datt and Di: the discriminator Datt evaluates the disentanglement of the identity representation, and Di improves the quality of the generated images. After the face image is synthesized, the classifier Cexp completes the facial expression recognition task; specifically, a deep learning algorithm is applied in the classifier, keeping the feature information of each facial expression while the key classification factors gradually stabilize across the representation layers.
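A minimal PyTorch-style skeleton of this generator / two-discriminator / classifier layout is sketched below; every layer size, the code widths of the identity, expression and pose vectors, and the seven-class output of Cexp are assumptions for illustration, as the patent does not specify the architecture.

```python
import torch
import torch.nn as nn

IMG, IDC, EXPC, POSEC = 64, 64, 8, 8   # assumed image size and code widths

class Generator(nn.Module):
    """Encoder f(x) -> identity code; decoder([id, e, p]) -> face image."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32 -> 16
            nn.Flatten(), nn.Linear(64 * 16 * 16, IDC))
        self.dec = nn.Sequential(
            nn.Linear(IDC + EXPC + POSEC, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh())

    def forward(self, x, e, p):
        identity = self.enc(x)                  # unique identity code f(x)
        z = torch.cat([identity, e, p], dim=1)  # concatenate id with e and p
        return self.dec(z), identity

def conv_head(out_dim):
    return nn.Sequential(
        nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Flatten(), nn.Linear(64 * 16 * 16, out_dim))

G = Generator()
D_att = conv_head(EXPC + POSEC)  # judges pose/expression attributes (disentanglement)
D_i = conv_head(1)               # real/fake score, improves image quality
C_exp = conv_head(7)             # expression classifier (7 classes assumed)

x = torch.randn(4, 1, IMG, IMG)  # a batch of preprocessed face crops
e, p = torch.randn(4, EXPC), torch.randn(4, POSEC)
fake, idc = G(x, e, p)
print(fake.shape, D_i(fake).shape, C_exp(fake).shape)
```

Training would alternate min-max updates between G and the two discriminators, with Cexp trained on the synthesized, pose-normalized faces.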
(III) Visual tracking method
The invention adopts a detection-based tracking algorithm: the image gradient vector field of the captured face is analyzed, and the relation between a possible center and the directions of the image gradients is described by a mathematical method. A possible center c is set; at each pixel position $x_j$ a gradient vector $g_j$ is given, and through normalization the displacement $d_j$ from c to $x_j$ has the same direction as the gradient when c is the true center. By evaluating the normalized displacements $d_j$ relative to the center position together with the gradient vectors $g_j$, the optimal center $c^*$ of the eye region in the face image, i.e. the pupil center position, is obtained as:

$$c^{*} = \arg\max_{c}\; \frac{1}{N}\sum_{j=1}^{N}\left(d_j^{\top} g_j\right)^{2}, \qquad d_j = \frac{x_j - c}{\lVert x_j - c\rVert_{2}}$$

For a possible center c, N different gradients are selected, corresponding to the normalized displacements $d_1,\ldots,d_N$ and gradient vectors $g_1,\ldots,g_N$; that is, the normalized displacement corresponding to the j-th gradient is $d_j$ and its gradient vector is $g_j$. When the objective function attains its maximum, the corresponding position variable is the optimal center position $c^*$. The displacement $d_j$ is scaled to unit length so that different positions in the face image receive the same weight; to improve the robustness of the method to linear changes of illumination and contrast, the gradient vector $g_j$ is also scaled to unit length. The objective function then attains its maximum at the pupil center position. In addition, considering only gradient vectors with significant magnitude reduces the complexity of the algorithm.
The method provided by the invention has the following advantages when being applied to the field of face image processing:
the invention realizes man-machine interaction by identifying facial features.
The facial features related to the user information are divided into overall facial features, expression features and visual tracking, and three-dimensional face information can be obtained through combined analysis of the overall facial features, the expression features and the visual tracking.
And thirdly, binding specific actions in the facial features of the user into the behavior information, realizing man-machine interaction through the facial feature information and reducing the selection error of the facial recognition points.
Drawings
FIG. 1 is a schematic diagram of a human-computer interaction processing process based on face detection;
FIG. 2 is a block diagram of the model;
Detailed Description
The method for processing face images based on intelligent human-computer interaction according to the present invention will now be described in further detail with reference to the model block diagram shown in FIG. 2 and an implementation example.
The invention discloses a method for processing face images based on intelligent human-computer interaction, realized by acquiring face images and video information in real time and performing face point detection, expression recognition and visual tracking. The general working framework of the invention as applied to the field of face recognition is as follows: a face is captured; its key points are located by the three-layer convolutional neural network; facial expressions are recognized with the end-to-end deep learning model, using different poses and expressions for face image synthesis and expression recognition; and the eye center is located from image gradients to realize visual tracking. After these three features are obtained, they are combined in a three-layer neural network for training, so that the machine responds reasonably according to the combined features.
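Purely as an illustration of this data flow, the sketch below strings the three components together; face_points, expression, eye_centers and combine_net are hypothetical stand-ins for the cascade CNN, the GAN-based recognizer, the gradient-based tracker and the final combining network, none of which are named in the patent.

```python
def process_frame(frame, face_points, expression, eye_centers, combine_net):
    """One frame of the overall pipeline: detect, recognize, track, combine."""
    landmarks = face_points(frame)             # (I) cascade CNN face points
    expr = expression(frame, landmarks)        # (II) expression recognition
    gaze = eye_centers(frame, landmarks)       # (III) visual tracking
    return combine_net(landmarks, expr, gaze)  # fused response to the user
```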
The technical scheme and implementation steps of the invention — (I) the face point detection method, (II) the facial expression recognition method and (III) the visual tracking method — are as set forth in the Disclosure of Invention above.
Claims (2)
1. A method for processing face images based on intelligent human-computer interaction is characterized in that the specific implementation method comprises the following three processes:
(I) Method for detecting face points
Adopting a three-layer convolutional neural network structure: establishing the first-layer convolutional neural network with an absolute-value rectification and parameter-sharing mechanism, and obtaining the second-layer and third-layer convolutional neural networks with a multi-level regression idea;
the input area of the first-level network is selected to be large so as to cover as many predicted positions as possible; the second-level and third-level detection areas are correspondingly reduced, the selection condition of a detection area being: a circular area containing 75% of all predicted positions obtained by the previous network, centered at the point where the density of the previous network's predicted positions is highest;
the prediction position is obtained again in the new prediction area, the process is repeated for many times until the detection area is reduced to 1% of the first-stage detection area, the obtained prediction position is the prediction position of each point, and then a plurality of networks of different input areas of each level are obtained;
the final predicted position x of the face point is represented as a cascade of n levels, the mathematical representation of the predicted position x being:

$$x = \frac{1}{l_1}\sum_{k=1}^{l_1} x_k^{(1)} \;+\; \sum_{i=2}^{n} \frac{1}{l_i}\sum_{k=1}^{l_i} \Delta x_k^{(i)}$$

where x is the predicted position, $l_i$ is the number of predicted positions at level i, the predicted positions at level i are denoted $x_1^{(i)},\ldots,x_{l_i}^{(i)}$, i.e. $x_1^{(1)}$ is the first predicted position at level 1, and $\Delta x_k^{(i)}$ denotes the change of the k-th of the $l_i$ predicted positions at level i relative to the corresponding prediction at level i−1;
(II) Facial expression recognition method
in the facial expression recognition method, an end-to-end learning model is provided: the model synthesizes face images from the two angles of pose and expression, and performs facial expression recognition with the pose held fixed; the model consists of a generator, two discriminators and a classifier;
preprocessing is carried out before the image is transmitted into the model, and a face detection algorithm is applied to a basic library containing 68 mark points for face detection; after preprocessing, inputting facial images into a generator G to generate identity marks, wherein each input image has a determined and unique identity mark, and then cascading the identity marks with an expression code e and a posture code p to express the change of the face; applying a maximum and minimum algorithm between the generator G and the discriminator D, and adding corresponding labels at the input end of a decoder to obtain new labels of face images with different postures and expressions;
two discriminator structures are respectively represented by Datt and Di, wherein the discriminator Datt is used for identifying and representing the entanglement degree of the mark, and the Di is used for improving the quality of the generated image; after the face image is synthesized, the classifier Cexp is used for completing the facial expression recognition task of the face image;
(III) Visual tracking method
adopting a detection-based tracking algorithm: analyzing the image gradient vector field of the captured face and describing by a mathematical method the relation between a possible center and the directions of the image gradients; a possible center c is set, a gradient vector is given at each position, and through normalization the displacement has the same direction as the gradient;
by evaluating the normalized displacements $d_j$ relative to the center position together with the gradient vectors $g_j$, the optimal center $c^*$ of the eye region in the face image, i.e. the pupil center position, is obtained as

$$c^{*} = \arg\max_{c}\; \frac{1}{N}\sum_{j=1}^{N}\left(d_j^{\top} g_j\right)^{2}, \qquad d_j = \frac{x_j - c}{\lVert x_j - c\rVert_{2}};$$

for a possible center c, N different gradients are selected, corresponding to the normalized displacements $d_1,\ldots,d_N$ and gradient vectors $g_1,\ldots,g_N$, i.e. the normalized displacement corresponding to the j-th gradient is $d_j$ and its gradient vector is $g_j$; when the objective function attains its maximum, the corresponding position variable is the optimal center position $c^*$; the displacement $d_j$ is scaled to unit length so that different positions in the face image receive the same weight, and the gradient vector $g_j$ is also scaled to unit length, whereby the objective function attains its maximum at the pupil center position.
2. The method of claim 1, wherein:
the first layer of convolutional neural network comprises three deep convolutional networks with different detection areas, wherein F1 network detection areas cover the whole face, EN1 network detection areas only cover the eyes and nose areas, and NM1 network detection areas only cover the nose and mouth areas; the three networks simultaneously predict different areas of the same face, the obtained predicted values of the three networks are averaged, and the corresponding second-layer predicted positions and the corresponding third-layer predicted positions are obtained respectively according to the three networks F1, FN1 and NM1 by adopting a regression idea according to the first-layer predicted positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232764.9A CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010232764.9A CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111428661A true CN111428661A (en) | 2020-07-17 |
Family
ID=71549134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010232764.9A Pending CN111428661A (en) | 2020-03-28 | 2020-03-28 | Method for processing face image based on intelligent human-computer interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428661A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150169938A1 (en) * | 2013-12-13 | 2015-06-18 | Intel Corporation | Efficient facial landmark tracking using online shape regression method |
CN105868689A (en) * | 2016-02-16 | 2016-08-17 | 杭州景联文科技有限公司 | Cascaded convolutional neural network based human face occlusion detection method |
CN108875624A (en) * | 2018-06-13 | 2018-11-23 | 华南理工大学 | Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade |
Non-Patent Citations (3)
Title |
---|
FABIAN TIMM等: "Accurate Eye Centre Localisation by Means of Gradients", 《HTTPS://WWW.RESEARCHGATE.NET/PUBLICATION/221415814》 * |
FEIFEI ZHANG等: "Joint Pose and Expression Modeling for Facial Expression Recognition", 《CVPR》 * |
YI SUN等: "Deep Convolutional Network Cascade for Facial Point Detection", 《HTTP://MMLAB.IE.CUHK.EDU.HK/CNN FACEPOINT.HTM》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Boulahia et al. | Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition | |
Song et al. | Constructing stronger and faster baselines for skeleton-based action recognition | |
EP3559860B1 (en) | Compact language-free facial expression embedding and novel triplet training scheme | |
Du et al. | Representation learning of temporal dynamics for skeleton-based action recognition | |
Várkonyi-Kóczy et al. | Human–computer interaction for smart environment applications using fuzzy hand posture and gesture models | |
Geetha et al. | A vision based dynamic gesture recognition of indian sign language on kinect based depth images | |
Shi et al. | Improving CNN performance accuracies with min–max objective | |
CN108804453A (en) | A kind of video and audio recognition methods and device | |
Kollias et al. | On line emotion detection using retrainable deep neural networks | |
Yang et al. | Facial expression recognition based on dual-feature fusion and improved random forest classifier | |
Tur et al. | Evaluation of hidden markov models using deep cnn features in isolated sign recognition | |
Li et al. | Visual object tracking via multi-stream deep similarity learning networks | |
Al Farid et al. | Single Shot Detector CNN and Deep Dilated Masks for Vision-Based Hand Gesture Recognition From Video Sequences | |
Ghaleb et al. | Multimodal fusion based on information gain for emotion recognition in the wild | |
CN113076916A (en) | Dynamic facial expression recognition method and system based on geometric feature weighted fusion | |
CN111814604A (en) | Pedestrian tracking method based on twin neural network | |
Xu et al. | Emotion recognition research based on integration of facial expression and voice | |
CN111428661A (en) | Method for processing face image based on intelligent human-computer interaction | |
Xie et al. | Towards Hardware-Friendly and Robust Facial Landmark Detection Method | |
CN113887509B (en) | Rapid multi-modal video face recognition method based on image set | |
Deramgozin et al. | Attention-enabled lightweight neural network architecture for detection of action unit activation | |
Zhou et al. | ULME-GAN: a generative adversarial network for micro-expression sequence generation | |
Srinivas et al. | E-CNN-FFE: An Enhanced Convolutional Neural Network for Facial Feature Extraction and Its Comparative Analysis with FaceNet, DeepID, and LBPH Methods | |
CN116682168B (en) | Multi-modal expression recognition method, medium and system | |
Ameer et al. | Deep Transfer Learning for Lip Reading Based on NASNetMobile Pretrained Model in Wild Dataset |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200717 |