CN112836566A - Multitask neural network face key point detection method for edge equipment
- Publication number: CN112836566A
- Application number: CN202011386983.9A
- Authority
- CN
- China
- Prior art keywords
- face
- neural network
- key point
- convolutional neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/161 — Human faces: Detection; Localisation; Normalisation (G06V — Image or video recognition or understanding; G06V40/16 — Human faces, e.g. facial parts, sketches or expressions)
- G06N3/045 — Combinations of networks (G06N — Computing arrangements based on specific computational models; G06N3/04 — Architecture, e.g. interconnection topology)
- G06V40/168 — Human faces: Feature extraction; Face representation (G06V — Image or video recognition or understanding; G06V40/16 — Human faces, e.g. facial parts, sketches or expressions)
Abstract
The invention relates to the fields of deep learning, face recognition, and face key point detection, and provides a multitask neural network face key point detection method for edge equipment, which realizes face key point calibration and accurate face recognition on mobile devices. To this end, the technical scheme adopted by the invention is: a face image to be detected is input into a convolutional neural network, which outputs the detected face key point coordinates; the convolutional neural network loss function is then defined. The method applies mainly to face recognition and face key point detection.
Description
Technical Field
The invention relates to the fields of deep learning, face recognition, and face key point detection, and in particular to a multitask neural network face key point detection method for edge equipment.
Background
Face key point detection, also known as face localization or face alignment, aims to automatically locate a set of predefined fiducial points on a face (e.g., eye corners, nose tip, mouth corners). As a fundamental component of various face applications such as face recognition [1, 2], face verification [3], face morphing [4], and face editing [5], this problem has long interested the computer vision community and has made great progress over the past few years. However, owing to constraints on detection accuracy, processing speed, and model size, developing a practical face key point detection technique remains challenging.
The technical difficulty is that high-quality face images are hard to acquire in real scenes; that is, the state of a face in a natural environment is uncontrolled and unconstrained. Under different illumination conditions, the pose, expression, and appearance of a face vary greatly, and local occlusion sometimes occurs, as shown in Figure 1. The challenges in face calibration detection therefore fall mainly into the following four categories:
1. Local variation: facial expressions, local extreme illumination (e.g., highlights and shadows), occlusion, and the like locally disturb the face image, so some key points may be invisible or abnormally located.
2. Global variation: pose and image quality are two key factors that globally affect the appearance of the face in an image; when the global structure of the face is misestimated, most key points are localized inaccurately.
3. Data imbalance: uneven distribution across face types and attributes is quite common in the datasets available for training. Such imbalance is likely to prevent the algorithm or model from correctly characterizing the data, reducing detection accuracy.
4. Model efficiency: model size and computational cost also limit an algorithm's practicality. Because the computing power and memory of mobile phones and other embedded devices are limited, the detection algorithm must have low complexity and high processing speed.
In recent years, face key point localization has received extensive attention, and many classical algorithms have been developed. One family of methods first builds a face shape model that describes the facial feature points with low-dimensional parameters, then builds a face appearance model and updates the feature point positions according to how well the reconstructed face appearance matches the model. Active Appearance Models (AAMs) and Constrained Local Models (CLMs), proposed by Cootes et al. [6], are representative, making full use of facial position information. Active appearance models and their follow-up studies [7, 8, 9] attempt to jointly model overall appearance and shape, while CLMs and related algorithms [10, 11] learn local information by applying various shape constraints. Furthermore, the Tree-Structured Part Model (TSPM) [12] uses deformable part-based models for simultaneous detection, pose estimation, and key point localization. Another family, including Explicit Shape Regression (ESR) [13] and the Supervised Descent Method (SDM) [14], attempts to solve the problem in a regression manner. The main limitations of these methods are poor robustness in complex scenes, large computational cost, or high model complexity.
Deep learning is a newer research direction in machine learning that studies multilayer neural networks. The convolutional neural network (CNN), a deep learning model widely applied to image and audio signal processing, has achieved good results in face key point detection in recent years. Zhang et al. [15] established a multitask learning network (TCDCN) for jointly learning key point locations and pose attributes, but the multitask nature of TCDCN makes it hard to train in practical applications. Trigeorgis et al. [16] proposed the Mnemonic Descent Method (MDM), a coarse-to-fine recurrent convolutional model. Lv et al. [17] proposed a deep regression architecture with two-stage re-initialization (TSR), which segments a face into several parts to improve detection accuracy. The method of [18] builds a network with the pose angles (yaw, pitch, and roll) as attributes and estimates these three angles directly to aid key point detection, but its complexity makes its key point detection less than ideal. The pose-invariant face alignment algorithm (PIFA) proposed by Jourabloo et al. [19] estimates a three-dimensional-to-two-dimensional projection matrix via deep cascaded regression. The algorithm of [20] first models face depth with a Z-buffer and then fits a three-dimensional model to the two-dimensional image.
More recently, Kumar and Chellappa designed a single dendritic CNN, the Pose Conditioned Dendritic Convolutional Neural Network (PCD-CNN) [21], which combines a modular classification network on top of a base classification network to improve detection accuracy. Honari et al. [22] designed a sequential multitasking (SeqMT) network with an equivariant landmark transformation (ELT) loss term. The method of [23] proposes face calibration via a deeply-initialized coarse-to-fine ensemble of regression trees (ERT). To make face key point detection robust to intrinsic variation in image style, Dong et al. [24] developed a Style Aggregated Network (SAN) that combines original face images with style-aggregated images to train the key point detector. Wu et al. [25] proposed a boundary-aware face alignment algorithm (LAB) that treats boundary information as the geometry of a face to improve detection accuracy; extracting facial key points from boundary lines largely avoids ambiguity in the definition of face key points. Although deep learning algorithms have advanced considerably, many shortcomings remain, and in practical applications there is still much room to improve the accuracy, efficiency, and simplicity of detection algorithms [28].
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a multitask neural network face key point detection method for edge equipment, which realizes face key point calibration and accurate face recognition on mobile devices. To this end, the technical scheme adopted by the invention is: a face image to be detected is input into a convolutional neural network, and the convolutional neural network outputs the detected face key point coordinates; the convolutional neural network loss function is:

$$\mathcal{L}=\frac{1}{M}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(\sum_{c=1}^{C}\omega_{n}^{c}\sum_{k=1}^{K}\left(1-\cos\theta_{k}^{m}\right)\right)\left\|d_{n}^{m}\right\|_{2}^{2}$$

where $d_n^m$ denotes the distance of the nth key point for the mth input; N denotes the preset number of key points to be detected per face; M denotes the total number of samples in the training picture set; $\theta_1$, $\theta_2$, and $\theta_3$ (k = 1, 2, 3) denote the deviations between the actual and predicted values of the yaw, pitch, and roll angles; c indexes the different face categories, including frontal face, profile, head-up, head-down, expression, and occlusion; and the weight $\omega_n^c$ is adjusted according to the fraction of each sample category, taking the reciprocal of the category fraction as the weight.
The convolutional neural network is based on a MobileNet convolutional neural network.
The training data is subjected to data enhancement; the specific steps are:
1) flip each face picture, and rotate it in 5-degree steps between −30 and 30 degrees;
2) randomly occlude 20% of the face area in each picture.
A sub-network is introduced in the process of training the convolutional neural network to supervise model training; the sub-network is used only in the training stage, its input is the output of the fourth layer of the convolutional neural network, and its output is the three Euler angles of yaw, pitch, and roll, used to compute the loss function.
The invention has the characteristics and beneficial effects that:
1. The network design is very lightweight and supports multiple tasks; after a face image is input, the key points and face angles are obtained simultaneously.
2. The model is very small, saving memory, and is well suited to running on mobile platforms such as mobile phones; it runs fast, reaching a frame rate of 140 fps on mobile platforms.
3. Targeting the problems of geometric constraint and data imbalance, the invention designs a new loss function that solves both.
4. To enlarge the receptive field and better capture the global structure of the face, the invention designs a multi-scale fully connected layer for accurately locating key points in face images.
5. Compared with other face key point detection algorithms, the method couples three-dimensional pose estimation with two-dimensional distance measurement; the network structure is simple and intuitive, making forward computation and backpropagation easy; and it uses a single-stage network structure rather than a cascaded form, which improves computational efficiency and performance.
6. The algorithm is highly accurate under unconstrained pose, expression, illumination, occlusion, and other complex conditions. It surpasses other advanced methods (such as TSR [17], SAN [24], and LAB [25]) on the 300W (300 Faces in-the-Wild Challenge) and AFLW (Annotated Facial Landmarks in the Wild) face key point datasets. Figures 2-5 show examples of single faces from 300W and AFLW and examples of multi-face pictures, where green points mark the detected face key points.
Description of the drawings:
Fig. 1 is the overall model structure diagram of the method of the present invention, illustrating the architecture of the backbone network and the auxiliary network.
Fig. 2 shows example faces under different poses, expressions, illumination, occlusions, and image qualities.
Fig. 3 shows face key point detection results under extreme illumination, expression, occlusion, and blur disturbance.
Fig. 4 shows multi-face key point detection results against a complex background.
Fig. 5 shows further multi-face key point detection results against a complex background.
Detailed Description
The invention provides a practical deep-learning-based face key point detection method that can effectively calibrate face key points on a mobile terminal. The scheme is realized mainly with a convolutional neural network, so a network model must be designed first, consisting chiefly of the convolutional network structure and the loss function. The model's input is the face image to be detected, and its output is the coordinates of the detected face key points. The core of the method is therefore the model design, which we introduce through the loss function, the backbone network, the auxiliary network, and other implementation details.
Part 1: Loss function
When the amount of data is small, the accuracy of the algorithm depends mainly on the design of the loss function, and taking geometric information into account in the loss function helps address training quality. Since local expression changes hardly affect the projection, the degrees of freedom for scaling and two-dimensional translation can be dropped, and only three Euler angles need to be estimated: pitch, yaw, and roll.

Furthermore, in deep learning, data imbalance is another problem that often harms detection accuracy. Penalizing the loss values of rare training samples more heavily therefore helps handle the data imbalance problem.
In view of the above, we design the loss function as follows:

$$\mathcal{L}=\frac{1}{M}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(\sum_{c=1}^{C}\omega_{n}^{c}\sum_{k=1}^{K}\left(1-\cos\theta_{k}^{m}\right)\right)\left\|d_{n}^{m}\right\|_{2}^{2}$$

where $d_n^m$ denotes the distance of the nth key point for the mth input; N denotes the preset number of key points to be detected per face; M denotes the total number of samples in the training picture set; $\theta_1$, $\theta_2$, and $\theta_3$ (k = 1, 2, 3) denote the deviations between the actual and predicted values of the yaw, pitch, and roll angles — obviously, as the angular deviation grows, so does the penalty; c indexes the different face categories, such as frontal face, profile, head-up, head-down, expression, and occlusion; and the weight $\omega_n^c$ is adjusted according to the fraction of each sample category, the invention taking the reciprocal of the category fraction as the weight.
With this loss function, whether training is affected by three-dimensional pose variation or by data imbalance, our loss handles it, while local variation is dealt with through the distance measurement.
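Read concretely, the loss can be sketched in a few lines of PyTorch. This is a minimal sketch under stated assumptions: the function and tensor names are ours for illustration, and the category weights are assumed precomputed as the reciprocal of each sample category's fraction, as described above.

```python
import torch

def multitask_loss(pred_pts, gt_pts, pred_angles, gt_angles, class_weight):
    """Weighted keypoint loss: each sample's keypoint distances are scaled by
    its category weight times the angular penalty sum_k (1 - cos(theta_k)).

    pred_pts, gt_pts:       (M, N, 2) predicted / ground-truth keypoints
    pred_angles, gt_angles: (M, 3)    yaw, pitch, roll in radians
    class_weight:           (M,)      reciprocal of the sample's category fraction
    """
    # ||d_n^m||^2: squared distance of the n-th keypoint of the m-th sample
    dist = ((pred_pts - gt_pts) ** 2).sum(dim=2)                       # (M, N)
    # sum_k (1 - cos(theta_k)): penalty grows with the angular deviation
    angle_pen = (1.0 - torch.cos(pred_angles - gt_angles)).sum(dim=1)  # (M,)
    weight = class_weight * angle_pen                                  # (M,)
    return (weight.unsqueeze(1) * dist).sum() / pred_pts.shape[0]
```

A sample whose category is rare (small fraction, hence large reciprocal weight) or whose pose is badly estimated contributes more to the loss, matching the imbalance and geometry arguments above.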
Part 2: Backbone network
The backbone network uses a convolutional neural network from deep learning to extract features and predict key points (lower branch in Fig. 1). Because a human face has strong global structure, such as the symmetric spatial relationships among the eyes, mouth, and nose, exploiting this global structure helps localize key points more accurately. We use multi-scale feature maps, performing the convolution operations with different strides, to enlarge the receptive field. To map the abstract information learned by the preceding convolutional layers at different receptive-field sizes into a larger space and increase the representational capacity of the model, the final prediction is made by connecting the last three multi-scale feature maps through a fully connected layer. Detailed parameters of the backbone network are given in Table 1. A picture is converted into a 112 × 112 × 3 tensor as input, where 112 × 112 is the pixel size of the input image and 3 is the number of RGB channels; the output layer is our multi-scale fully connected layer, connected to the outputs of three convolutional layers. The input of each layer of the network is the output of the previous layer; the first two dimensions give the image size and the third gives the number of channels. Taking the second layer, 56 × 56 × 64, as an example: 56 is the previous layer's pixel size divided by the stride, i.e., 112 / 2 = 56, and the third dimension is the number of channels of the previous convolutional layer, i.e., 64.
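As an illustration of the multi-scale fully connected layer, the following PyTorch sketch convolves one backbone feature map at three strides and regresses the key point coordinates from the concatenation. The 14 × 14 × 64 feature size and the 32-channel branches are assumptions for illustration; the actual parameters are those of Table 1.

```python
import torch
import torch.nn as nn

class MultiScaleFC(nn.Module):
    """Convolve the backbone feature map at three strides, flatten the three
    resulting maps, and regress 2*N keypoint coordinates with one fully
    connected layer over their concatenation."""
    def __init__(self, in_ch=64, num_keypoints=68):
        super().__init__()
        self.conv_s1 = nn.Conv2d(in_ch, 32, 3, stride=1, padding=1)  # 14x14 out
        self.conv_s2 = nn.Conv2d(in_ch, 32, 3, stride=2, padding=1)  # 7x7 out
        self.conv_s3 = nn.Conv2d(in_ch, 32, 3, stride=4, padding=1)  # 4x4 out
        flat = 32 * (14 * 14 + 7 * 7 + 4 * 4)
        self.fc = nn.Linear(flat, 2 * num_keypoints)

    def forward(self, x):                        # x: (B, 64, 14, 14)
        f1 = self.conv_s1(x).flatten(1)          # fine scale, small receptive field
        f2 = self.conv_s2(x).flatten(1)          # medium scale
        f3 = self.conv_s3(x).flatten(1)          # coarse scale, large receptive field
        return self.fc(torch.cat([f1, f2, f3], dim=1))  # (B, 2N) coordinates
```

Larger strides see wider context, so the concatenation lets one linear layer weigh fine local evidence against the global face structure.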
Since the backbone network is the bottleneck in processing speed and model size, MobileNet [26, 27] blocks are used instead of conventional convolution operations. MobileNet is a lightweight convolutional neural network intended mainly for mobile and embedded vision applications; using it greatly reduces the computation of the backbone network and thus speeds up detection. In addition, the backbone can compress the network by adjusting MobileNet's width multiplier to meet different requirements, making the model smaller and faster; in the invention, the model still achieves good detection accuracy after being compressed by 80%.
Table 1: detailed parameters of the backbone network
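To make the MobileNet substitution concrete, here is a minimal sketch of one depthwise separable block with the width multiplier described above; the channel counts are placeholders for illustration, not the values of Table 1.

```python
import torch.nn as nn

def mobilenet_block(in_ch: int, out_ch: int, stride: int = 1,
                    width_mult: float = 1.0) -> nn.Sequential:
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.
    width_mult thins both channel counts; applying the same width_mult to
    every block is the compression knob mentioned in the text."""
    in_ch = max(1, int(in_ch * width_mult))
    out_ch = max(1, int(out_ch * width_mult))
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),      # depthwise: one filter per channel
        nn.BatchNorm2d(in_ch),
        nn.ReLU6(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise: mix channels
        nn.BatchNorm2d(out_ch),
        nn.ReLU6(inplace=True),
    )
```

Splitting a standard convolution into depthwise and pointwise steps cuts its multiply-accumulate count roughly by a factor of the kernel area, which is where the speedup on edge devices comes from.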
Part 3: Auxiliary network
In the process of training the backbone network, a sub-network is introduced to supervise model training (upper branch in Fig. 1). This network is used only in the training phase; its input is the output of the fourth layer of the backbone network. The auxiliary network estimates three-dimensional rotation information, i.e., the three Euler angles of yaw, pitch, and roll, for each input face sample, thereby determining the head pose. It effectively improves the stability and robustness of key point detection. Its specific structure is shown in Table 2: the input is the three-dimensional array taken from the backbone, and the output is the three Euler angles of yaw, pitch, and roll, used to compute the loss function of Part 1, i.e., the loss function of the overall network.
Table 2: detailed parameters of the auxiliary network
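The following is a hedged PyTorch sketch of such an auxiliary head; the intermediate channel sizes are illustrative assumptions rather than the Table 2 parameters.

```python
import torch.nn as nn

class AuxiliaryNet(nn.Module):
    """Training-only branch: maps the backbone's 4th-layer feature map to the
    three Euler angles (yaw, pitch, roll) consumed by the loss function."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(128, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # collapse spatial dimensions
        )
        self.fc = nn.Linear(32, 3)                # -> yaw, pitch, roll

    def forward(self, feat):                      # feat: (B, in_ch, H, W)
        return self.fc(self.features(feat).flatten(1))
```

Because the branch exists only for training supervision, it is simply dropped at inference time and adds no cost on the edge device.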
Part 4: Other details
To make the deep neural network model perform well, the hyperparameters of the network are tuned; the values in Table 3 can be used as a reference:

Table 3: hyperparameters for network training
In addition, to address the data imbalance problem, we also use a data enhancement strategy. Data enhancement, also called data augmentation, means having limited data produce value equivalent to more data without substantially increasing the amount of data. We mainly adopt the following two operations:
1) flip each face picture, and rotate it in 5-degree steps between −30 and 30 degrees;
2) randomly occlude 20% of the face area in each picture.
Adopting this data enhancement strategy expands the training dataset and thereby yields better detection results.
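A minimal Pillow sketch of the two operations follows, under stated assumptions: the shape and placement of the occluding patch are ours, since the text only specifies that 20% of the face area is randomly occluded.

```python
import random
from PIL import Image, ImageDraw

def augment(img: Image.Image) -> list:
    """Operation 1: flip, then rotate both versions in 5-degree steps
    over [-30, 30], yielding 26 variants per input picture."""
    flipped = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
    return [base.rotate(angle)
            for base in (img, flipped)
            for angle in range(-30, 31, 5)]

def occlude(img: Image.Image) -> Image.Image:
    """Operation 2: cover a random rectangle amounting to 20% of the
    image area (assumes an RGB image)."""
    out = img.copy()
    w, h = out.size
    rw = max(1, int(0.5 * w))             # rectangle width: half the image
    rh = max(1, int(0.2 * w * h / rw))    # height chosen so area is 20%
    x, y = random.randint(0, w - rw), random.randint(0, h - rh)
    ImageDraw.Draw(out).rectangle([x, y, x + rw, y + rh], fill=(127, 127, 127))
    return out
```

Flipping plus thirteen rotation angles multiplies each picture into 26 samples, while random occlusion teaches the network to localize key points it cannot see directly.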
The invention provides a face key point detection method based mainly on a convolutional neural network algorithm from deep learning. The neural network model consists of a backbone network and an auxiliary network: the backbone takes MobileNet blocks as its main structure and introduces a multi-scale fully connected layer to enlarge the receptive field and strengthen the representation of facial structural features, while the auxiliary network effectively estimates rotation information to improve key point localization.
The invention addresses the problems of geometric constraint and data imbalance by proposing a new loss function, and the overall algorithm surpasses the most advanced methods in accuracy, model size, and running speed. From the detection results in Figs. 2-5, it can be observed that the invention still obtains satisfactory visual results even under extreme illumination, expression, occlusion, and blur interference.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Primary references
[1] Y. Liu, F. Wei, J. Shao, L. Sheng, J. Yan, and X. Wang. Exploring disentangled feature representation beyond face identification. In CVPR, 2018.
[2] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li. High-fidelity pose and expression normalization for face recognition in the wild. In CVPR, 2015.
[3] Y. Sun, X. Wang, and X. Tang. Hybrid deep learning for face verification. IEEE TPAMI, 38(10):1997-2009, 2016.
[4] T. Hassner, S. Harel, E. Paz, and R. Enbar. Effective face frontalization in unconstrained images. In CVPR, 2015.
[5] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner. Face2Face: Real-time face capture and reenactment of RGB videos. In CVPR, 2016.
[6] T. Cootes, G. Edwards, and C. Taylor. Active appearance models. IEEE TPAMI, 23(6):681-685, 2001.
[7] I. Matthews and S. Baker. Active appearance models revisited. IJCV, 60(2):135-164, 2004.
[8] F. Kahraman, M. Gökmen, S. Darkner, and R. Larsen. An active illumination and appearance (AIA) model for face alignment. In CVPR, 2007.
[9] L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In ECCV, 2008.
[10] P. Belhumeur, D. Jacobs, D. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In CVPR, 2011.
[11] M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In CVPR, 2010.
[12] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.
[13] X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. IJCV, 107(2):177-190, 2014.
[14] X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In CVPR, 2013.
[15] Z. Zhang, P. Luo, C. Loy, and X. Tang. Facial landmark detection via deep multi-task learning. In ECCV, 2014.
[16] G. Trigeorgis, P. Snape, M. Nicolaou, E. Antonakos, and S. Zafeiriou. Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In CVPR, 2016.
[17] J. Lv, X. Shao, J. Xing, C. Cheng, and X. Zhou. A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In CVPR, 2017.
[18] H. Yang, W. Mou, Y. Zhang, I. Patras, H. Gunes, and P. Robinson. Face alignment assisted by head pose estimation. In BMVC, 2015.
[19] A. Jourabloo and X. Liu. Pose-invariant 3D face alignment. In ICCV, 2015.
[20] X. Zhu, Z. Lei, X. Liu, H. Shi, and S. Z. Li. Face alignment across large poses: A 3D solution. In CVPR, 2016.
[21] A. Kumar and R. Chellappa. Disentangling 3D pose in a dendritic CNN for unconstrained 2D face alignment. In CVPR, 2018.
[22] S. Honari, P. Molchanov, S. Tyree, P. Vincent, C. Pal, and J. Kautz. Improving landmark localization with semi-supervised learning. In CVPR, 2018.
[23] R. Valle, J. Buenaposada, A. Valdés, and L. Baumela. A deeply-initialized coarse-to-fine ensemble of regression trees for face alignment. In ECCV, 2018.
[24] X. Dong, Y. Yan, W. Ouyang, and Y. Yang. Style aggregated network for facial landmark detection. In CVPR, 2018.
[25] W. Wu, C. Qian, S. Yang, Q. Wang, Y. Cai, and Q. Zhou. Look at boundary: A boundary-aware face alignment algorithm. In CVPR, 2018.
[26] A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.
[27] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. CoRR, abs/1801.04381, 2018.
[28] X. Guo, S. Li, J. Zhang, J. Ma, L. Ma, W. Liu, and H. Ling. PFLD: A practical facial landmark detector. CoRR, abs/1902.10859, 2019.
Claims (3)
1. A multitask neural network face key point detection method for edge equipment, characterized in that a face image to be detected is input into a convolutional neural network, and the convolutional neural network outputs the detected face key point coordinates; the convolutional neural network loss function is:

$$\mathcal{L}=\frac{1}{M}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(\sum_{c=1}^{C}\omega_{n}^{c}\sum_{k=1}^{K}\left(1-\cos\theta_{k}^{m}\right)\right)\left\|d_{n}^{m}\right\|_{2}^{2}$$

where $d_n^m$ denotes the distance of the nth key point for the mth input; N denotes the preset number of key points to be detected per face; M denotes the total number of samples in the training picture set; $\theta_1$, $\theta_2$, and $\theta_3$ (k = 1, 2, 3) denote the deviations between the actual and predicted values of the yaw, pitch, and roll angles; c indexes the different face categories, including frontal face, profile, head-up, head-down, expression, and occlusion; and the weight $\omega_n^c$ is adjusted according to the fraction of each sample category, taking the reciprocal of the category fraction as the weight.
2. The method of claim 1, wherein the convolutional neural network is based on a MobileNet convolutional neural network.
3. The multitask neural network face key point detection method for edge equipment according to claim 1, characterized in that data enhancement processing is carried out on the training data, with the following specific steps:
1) flip each face picture, and rotate it in 5-degree steps between −30 and 30 degrees;
2) randomly occlude 20% of the face area in each picture.
A sub-network is introduced in the process of training the convolutional neural network to supervise model training; the sub-network is used only in the training stage, its input is the output of the fourth layer of the convolutional neural network, and its output is the three Euler angles of yaw, pitch, and roll, used to compute the loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011386983.9A CN112836566A (en) | 2020-12-01 | 2020-12-01 | Multitask neural network face key point detection method for edge equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011386983.9A CN112836566A (en) | 2020-12-01 | 2020-12-01 | Multitask neural network face key point detection method for edge equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112836566A true CN112836566A (en) | 2021-05-25 |
Family
ID=75923432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011386983.9A Pending CN112836566A (en) | 2020-12-01 | 2020-12-01 | Multitask neural network face key point detection method for edge equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836566A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239736A (en) * | 2017-04-28 | 2017-10-10 | 北京智慧眼科技股份有限公司 | Method for detecting human face and detection means based on multitask concatenated convolutional neutral net |
WO2019109526A1 (en) * | 2017-12-06 | 2019-06-13 | 平安科技(深圳)有限公司 | Method and device for age recognition of face image, storage medium |
WO2019128367A1 (en) * | 2017-12-26 | 2019-07-04 | 广州广电运通金融电子股份有限公司 | Face verification method and apparatus based on triplet loss, and computer device and storage medium |
CN108805977A (en) * | 2018-06-06 | 2018-11-13 | 浙江大学 | A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks |
CN110263774A (en) * | 2019-08-19 | 2019-09-20 | 珠海亿智电子科技有限公司 | A kind of method for detecting human face |
CN111160269A (en) * | 2019-12-30 | 2020-05-15 | 广东工业大学 | Face key point detection method and device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022257456A1 (en) * | 2021-06-10 | 2022-12-15 | 平安科技(深圳)有限公司 | Hair information recognition method, apparatus and device, and storage medium |
CN113782184A (en) * | 2021-08-11 | 2021-12-10 | 杭州电子科技大学 | Cerebral apoplexy auxiliary evaluation system based on facial key point and feature pre-learning |
CN114399803A (en) * | 2021-11-30 | 2022-04-26 | 际络科技(上海)有限公司 | Face key point detection method and device |
CN115984461A (en) * | 2022-12-12 | 2023-04-18 | 广州紫为云科技有限公司 | Face three-dimensional key point detection method based on RGBD camera |
CN115984461B (en) * | 2022-12-12 | 2024-10-25 | 广州紫为云科技有限公司 | Face three-dimensional key point detection method based on RGBD camera |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210525