
CN112115783A - Human face characteristic point detection method, device and equipment based on deep knowledge migration - Google Patents


Info

Publication number
CN112115783A
CN112115783A
Authority
CN
China
Prior art keywords
face
network
training
feature
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010809064.1A
Other languages
Chinese (zh)
Other versions
CN112115783B (en)
Inventor
吕科
高鹏程
薛健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chinese Academy of Sciences filed Critical University of Chinese Academy of Sciences
Priority to CN202010809064.1A priority Critical patent/CN112115783B/en
Publication of CN112115783A publication Critical patent/CN112115783A/en
Application granted granted Critical
Publication of CN112115783B publication Critical patent/CN112115783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention discloses a method, device and equipment for detecting human face characteristic points based on deep knowledge migration, wherein the method comprises the following steps: providing a face data set, and cutting face images according to the face detection frames or the bounding boxes of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set; inputting a test sample and a training sample into an initial face alignment network framework; training the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model; freezing the model parameters of the teacher network, and extracting the deep dark knowledge learned by the teacher network and transmitting it to the student network to generate the final face alignment network model; and inputting an RGB face image from a natural scene into the final face alignment network model and outputting the face characteristic point detection result. The invention achieves high face feature point detection accuracy with a low model parameter count and low computational complexity.

Description

Human face characteristic point detection method, device and equipment based on deep knowledge migration
Technical Field
The embodiment of the invention relates to the field of computer vision and digital image processing, in particular to a method, a device and equipment for detecting human face characteristic points based on deep knowledge migration.
Background
Existing methods for detecting human face characteristic points cannot effectively solve face characteristic point localization in natural scenes. Complex methods have huge model parameter counts and high computational complexity and cannot meet running-speed requirements, while simple methods cannot cope with interference from factors such as extreme poses, variable illumination and severe occlusion in natural scenes, so their accuracy cannot meet application requirements.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device and equipment for detecting human face characteristic points based on deep knowledge migration, which are used for solving the problems of higher computational complexity, low running speed and low precision of the existing human face characteristic point detection.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides a method for detecting a human face feature point based on depth knowledge migration, including:
s1: providing a face data set containing face characteristic point labels, and cutting a face image according to a face detection frame or an enclosing frame of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set;
s2: acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework;
s3: setting parameters of a convolutional neural network, and training a teacher network and a student network in the initial face alignment network framework with PyTorch until a loss function and the maximum iteration number meet preset conditions, generating a training model;
s4: freezing model parameters of a teacher network, extracting deep dark knowledge learned by the teacher network, transmitting the deep dark knowledge to the student network, and supervising a training process of the student network to generate a final face alignment network model;
s5: and inputting the RGB face image in the natural scene into the final face alignment network model, and outputting a face characteristic point detection result.
In one embodiment of the present invention, step S1 includes:
s1-1: providing a WFLW data set, wherein the WFLW data set comprises N training pictures and M test pictures, each picture is provided with a picture label, the picture label comprises face frame information, face characteristic point position information and several attribute annotations, and N and M are positive integers greater than zero;
s1-2: and cutting the face image according to the face detection frame provided by the face data set, perturbing the face detection frame, and applying random rotation, scaling and flipping to the face image for data enhancement, to obtain the training set, the verification set and the test set.
In one embodiment of the present invention, the initial face alignment network framework is generated by:
generating the teacher network by adopting an encoder-decoder network structure, wherein the teacher network encoder is used for performing feature extraction and encoding on the input image, retaining the feature extraction part of the original network while removing the final average pooling layer, the fully connected layer used for classification, and the final dimension-raising 1 × 1 convolutional layer;
adding the decoder, comprising three combinations of upsampling and convolutional layers, after the encoder, performing spatial upsampling on the image features extracted by the encoder to obtain a feature map, converting the channel dimension of the feature map into the number of face characteristic points, and calculating the expected face characteristic point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face characteristic point detection, wherein EfficientNet-B0 is used as the backbone of the student network encoder, the final average pooling layer of EfficientNet-B0, the fully connected layer used for classification, and the final dimension-raising 1 × 1 convolutional layer are removed, and the student network decoder likewise comprises three combinations of upsampling and convolutional layers;
adding a 1 × 1 convolutional layer after the student network decoder, converting the channel number of the feature map obtained by the decoder's upsampling into the number of face characteristic points, and calculating the face characteristic point coordinates on the converted feature map with a spatial softargmax operation.
In one embodiment of the present invention, step S3 includes:
training the teacher network and the student network separately, using a feature point loss function L_P to optimize the network parameters, wherein the feature point loss function L_P is calculated with the Wing loss function, expressed as follows:

    wing(x) = ω · ln(1 + |x|/ε),  if |x| < ω
    wing(x) = |x| − C,            otherwise

    L_P = Σ_{i=1}^{2N} wing(P_i − G_i)

wherein P ∈ R^{1×2N} is the predicted face characteristic point coordinate vector, G ∈ R^{1×2N} is the real face characteristic point coordinate vector, N is the number of face characteristic points, ω and ε are preset parameters of f(x), and C = ω − ω · ln(1 + ω/ε) is a constant.
In one embodiment of the present invention, in step S4, extracting deep dark knowledge learned by the teacher network includes:
extracting pixel distribution information on the feature map with a knowledge distillation method based on feature alignment, aligning the pixel distribution of the teacher network feature map and that of the student network feature map, wherein the knowledge distillation loss function of feature alignment is as follows:

    L_FA = ‖A − Φ(B)‖²₂

wherein A and B are the feature maps of the teacher network and the student network at the same stage, and Φ is a 1 × 1 convolutional layer for aligning the channel dimensions of the A and B feature maps.
In one embodiment of the present invention, in step S4, the passing the deep dark knowledge to the student network includes:
and extracting face structure information under different scales by a knowledge distillation method based on block similarity, and transmitting the structural information of the face image to the student network from the teacher network.
In a second aspect, an embodiment of the present invention further provides a face feature point detection apparatus based on depth knowledge migration, including:
the providing module, which is used for providing a face data set containing face characteristic point labels, and cutting face images according to the face detection frames or the bounding boxes of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set;
an output module;
the control processing module is used for acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework; the control processing module is further used for setting parameters of a convolutional neural network, and training a teacher network and a student network in the initial face alignment network framework with PyTorch until a loss function and the maximum iteration number meet preset conditions, generating a training model; the control processing module is also used for freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network, transmitting it to the student network, and supervising the training process of the student network to generate the final face alignment network model; the control processing module is further used for inputting an RGB face image from a natural scene into the final face alignment network model and outputting the face characteristic point detection result through the output module.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is configured to store one or more program instructions; the processor is configured to execute the one or more program instructions to perform the method for detecting human face characteristic points based on deep knowledge migration according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions, the one or more program instructions being executed to perform the method for detecting human face characteristic points based on deep knowledge migration according to the first aspect.
The technical scheme provided by the embodiment of the invention at least has the following advantages:
according to the method, the device and the equipment for detecting the human face feature points based on the deep knowledge migration, provided by the embodiment of the invention, EfficientFAN is used as a simple and effective lightweight model, the up-sampling recovery process of the feature map is rapidly realized based on the decoder structure of up-sampling and deep separable convolution, and the spatial information of the feature map is effectively saved.
Compared with current advanced large complex models, the method achieves comparable face characteristic point detection accuracy while significantly reducing the model parameter count and computational complexity.
The invention uses knowledge distillation and a knowledge migration module to improve the feature point localization accuracy of the student network EfficientFAN. It proposes a block similarity knowledge distillation method for learning multi-scale structural information of the face, combines it with feature alignment knowledge distillation, which learns the pixel distribution information on the feature maps, and uses the two together to supervise and guide the training of EfficientFAN. Without changing the network structure or increasing the model parameters, EfficientFAN obtains more accurate face characteristic point detection results through this knowledge migration method. Experimental results on public data sets show that EfficientFAN is a simple and effective face characteristic point detection network, and that the knowledge distillation method effectively improves detection accuracy. In combination, EfficientFAN delivers excellent performance in both accuracy and speed.
Drawings
Fig. 1 is a flowchart of a method for detecting human face feature points based on depth knowledge migration according to the present invention.
Fig. 2 is a block diagram of the structure of the apparatus for detecting human face feature points based on deep knowledge migration according to the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, terms such as "connected" are to be interpreted broadly, e.g., as directly connected or indirectly connected through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Fig. 1 is a flowchart of a method for detecting human face feature points based on depth knowledge migration according to the present invention. As shown in fig. 1, the method for detecting human face feature points based on depth knowledge migration of the present invention includes:
s1: providing a face data set containing the marks of the face characteristic points, and cutting a face image according to a face detection frame or an enclosing frame of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set.
Specifically, step S1 includes:
s1-1: a WFLW dataset is provided. The data set originates from IEEE Conference on Computer Vision and Pattern Recognition 2018, and comprises 10000 pictures (7500 training pictures and 2500 test pictures). Each picture label provides face frame information, 98 person face feature point location information, and 6 attribute information (pose, expression, lighting, makeup, occlusion, blur), and divides the entire data set into 6 classes of subsets according to the image attribute information.
S1-2: and cutting the face image according to a face detection frame provided by the face data set, disturbing the face detection frame, and applying random rotation, size scaling and overturning to the face image so as to perform data enhancement to obtain a training set, a verification set and a test set.
S2: and acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework.
Specifically, the teacher network employs an encoder-decoder architecture, using EfficientNet-B7 as the backbone of its encoder. The encoder performs feature extraction and encoding on the input image: only the feature extraction part of the original network is retained, the last average pooling layer and the fully connected layer used for classification are removed, and the last dimension-raising 1 × 1 convolutional layer is also removed so that features are extracted from the last inverted residual module. This gives the feature map extracted by the teacher network fewer channels than it would have after the 1 × 1 convolutional layer (640 vs. 2048), preserves more of the original feature information without the dilution caused by the dimension increase, and makes it better suited for analysis by the decoder.
A decoder is added after the last inverted residual module of EfficientNet-B7 to spatially upsample the image features extracted by the encoder. A more natural upsampling method is used to increase the spatial dimensions of the feature map: instead of deconvolution, a combination of an upsampling layer and a convolutional layer is used, first spatially upsampling the feature map with a generic upsampling method and then applying a convolution on the upsampled feature map to enrich its transformation.
The present invention uses a combination of three upsampling layers and convolutional layers as a decoder for a face alignment network, added after the encoder. In the network model, the traditional convolution operation is replaced by the deep separable convolution, so that the calculation amount in the up-sampling process is reduced.
Specifically, the scale factor of each upsampling layer is set to 2, i.e., the length and width of each upsampled feature map are twice those of the input feature map, and the upsampling is realized with the nearest-neighbor interpolation algorithm. After the decoder, a spatial heatmap is generated using a 1 × 1 convolutional layer, converting the channel dimension of the feature map to the number of face characteristic points. The expected face characteristic point coordinates on each transformed feature map are then calculated with a spatial softargmax operation.
The spatial softargmax operation can be divided into two steps. In the first step, the output feature map A is normalized with a softmax operation:

    M_{x,y} = exp(A_{x,y}) / Σ_{x′,y′} exp(A_{x′,y′})

where x and y are pixel indices, exp denotes the exponential function, and the resulting M is the normalized feature map. In the second step, the coordinate P_l of feature point l is the probability-weighted expectation of the pixel coordinates:

    P_l = Σ_{x,y} M_{x,y} · (x, y)
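The two-step spatial softargmax described above can be sketched in a few lines of NumPy (a minimal illustration, not the patent's implementation); `heatmaps` is assumed to hold one H × W map per feature point:

```python
import numpy as np

def spatial_softargmax(heatmaps):
    """heatmaps: (L, H, W) array, one map per feature point.
    Returns an (L, 2) array of expected (x, y) coordinates."""
    L, H, W = heatmaps.shape
    flat = heatmaps.reshape(L, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numeric stability
    M = np.exp(flat)
    M /= M.sum(axis=1, keepdims=True)              # softmax over all pixels
    M = M.reshape(L, H, W)
    xs, ys = np.arange(W), np.arange(H)
    # Expected coordinate = probability-weighted sum of pixel indices.
    x = (M.sum(axis=1) * xs).sum(axis=1)
    y = (M.sum(axis=2) * ys).sum(axis=1)
    return np.stack([x, y], axis=1)
```

Because the output is an expectation rather than an argmax, it is differentiable, which is what allows the coordinate loss to backpropagate through the heatmaps.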
a small and light student Network called Efficient Face Alignment Network (Efficient FAN) has a Network structure similar to that of a teacher Network and is used for final Face feature point detection. EfficientNet-B0 was used as the backbone for the student network EfficientFAN encoder. Like the teacher's network, the encoder of the student's network also removed the last average pooling layer and the fully connected layer for classification in EfficientNet-B0, as well as the 1 × 1 convolutional layer for the last ascending dimension.
Likewise, a combination of three upsampling and convolutional layers is used as the decoder of the student network, added after the encoder. The scale factor of each upsampling layer is 2, and the number of output channels of each convolutional layer is 128. A 1 × 1 convolutional layer is added after the decoder of the student network, converting the number of channels of the upsampled feature map from 128 to the number of face characteristic points.
And finally, calculating the coordinates of the human face characteristic points on the converted characteristic graph by using a spatial softargmax operation.
Table 1: Concrete structure of the student network, where MBConv denotes the mobile inverted bottleneck module used by EfficientNet, DSConv denotes depthwise separable convolution, and k denotes the convolution kernel size.
The teacher network above and the student network below are organically linked together by a Knowledge Transfer module.
Two knowledge distillation methods are used in the efficient face alignment network based on deep knowledge migration, so that different types of dark knowledge are migrated to a student network EfficientFAN from a teacher network.
The knowledge distillation method of feature alignment extracts pixel distribution information on the feature graph, aligns pixel distribution of the teacher network and the student network feature graph, and enables the feature graph distribution of the student network to be close to the distribution of the teacher network.
Correspondingly, the knowledge distillation method of block similarity extracts the face structure information under different scales, and transmits the structural information of the face image to a student network from a teacher network, so that the simple student network can learn the face structure information of the current image.
Feature alignment distillation aligns the channel dimensions of feature maps at the same stage of teacher and student networks and directly compares the difference between the teacher and student network feature maps as the supervisory information in the student network training process.
S3: Parameters of the convolutional neural network are set, and the teacher network and the student network in the initial face alignment network framework are trained with PyTorch until the loss function and the maximum iteration number meet the preset conditions, generating the training model.
Specifically, the teacher network and the student network are first trained separately, using only the feature point loss function L_P to optimize the network parameters. L_P is calculated with the Wing loss function, which can be expressed as follows:

    wing(x) = ω · ln(1 + |x|/ε),  if |x| < ω
    wing(x) = |x| − C,            otherwise

    L_P = Σ_{i=1}^{2N} wing(P_i − G_i)

where P ∈ R^{1×2N} is the predicted face characteristic point coordinate vector, G ∈ R^{1×2N} is the real face characteristic point coordinate vector, and N is the number of face characteristic points. The Wing loss is a specially designed loss function: for smaller errors it behaves as a logarithmic loss with an offset, and for larger errors it behaves as an L1 loss. ω and ε are preset parameters, and C = ω − ω · ln(1 + ω/ε) is a constant.
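A minimal NumPy sketch of the Wing loss above (the defaults ω = 10 and ε = 2 follow the original Wing loss paper and are assumptions, not values stated in this patent):

```python
import numpy as np

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    """Wing loss: logarithmic near zero, L1-like for large errors.
    pred, target: coordinate vectors of equal shape (e.g. length 2N)."""
    x = np.abs(pred - target)
    # C makes the two pieces meet continuously at |x| = omega.
    C = omega - omega * np.log(1.0 + omega / epsilon)
    loss = np.where(x < omega,
                    omega * np.log(1.0 + x / epsilon),  # small errors
                    x - C)                              # large errors
    return loss.sum()
```

The constant C ensures the piecewise function is continuous at |x| = ω, so the loss has no jump when an error crosses the threshold.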
S4: model parameters of the teacher network are frozen, deep dark knowledge learned by the teacher network is extracted, the deep dark knowledge is transmitted to the student network, and a final face alignment network model is generated in the training process of the student network.
Specifically, the knowledge distillation method of feature alignment extracts pixel distribution information on the feature map and aligns the pixel distributions of the teacher and student network feature maps, so that the feature map distribution of the student network approaches that of the teacher network. The knowledge distillation loss function for feature alignment can be defined as follows:

    L_FA = ‖A − Φ(B)‖²₂

where A and B are respectively the feature maps of the teacher network and the student network at the same stage, and Φ is a 1 × 1 convolutional layer for aligning the channel dimensions of the A and B feature maps.
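A minimal NumPy sketch of this feature alignment loss, writing the 1 × 1 convolution Φ as a per-pixel linear map over channels (an equivalent formulation; the weight matrix `W` stands in for the learned convolution, and the mean normalization is an assumption):

```python
import numpy as np

def feature_align_loss(A, B, W):
    """Mean squared difference between a teacher feature map A (C, H, W)
    and a channel-aligned student feature map B (C', H, W).
    W is the (C, C') weight of a 1x1 convolution: for spatial maps a
    1x1 conv is exactly a linear map applied independently per pixel."""
    B_aligned = np.tensordot(W, B, axes=([1], [0]))  # -> (C, H, W)
    return ((A - B_aligned) ** 2).mean()
```

In training, W would be learned jointly with the student so that the student is not penalized merely for having a different channel count.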
The knowledge distillation method of block similarity extracts face structure information under different scales, and transmits the structural information of the face image to a student network from a teacher network, so that the simple student network can learn the face structure information of the current image.
And constructing relationship graphs with different scales for the input feature graph, and calculating a similarity matrix based on the constructed relationship graphs. For a feature map of size H × W, the feature map region may be divided into local blocks of different sizes. The size of the characteristic diagram is usually H-W-2nThe whole feature graph is used as a connected domain, the relation graph is constructed by taking local blocks with different sizes as nodes, and the nodes in the relation graph can be set to be 2k×2kK is a partial block of size 0, 1, …, k-1. A width 2n×2nThe feature map of (2) constructs a node size of 2k×2kContains 2n-k×2n-kA local block or a relational node. For simplicity, 2 will be used with an average pooling operationk×2kThe local blocks of (a) are aggregated into a 1 × 1 relationship graph node. For a feature graph with the number of channels being, the vectorization of the first node in the constructed relationship graph can be expressed as fi∈RC. Calculating similarity between nodes in a relational graph by using cosine similarity of vectors, i-th node vector fiAnd the jth node vector fjThe similarity between aijThe calculation is as follows.
Figure BDA0002630256640000111
Specifically, the intermediate feature maps of the teacher network and the student network at the same stage have the same resolution but different channel numbers. Suppose the teacher network's feature map is A ∈ R^{C×H×W} and the student network's feature map is B ∈ R^{C′×H×W}, where the feature map size satisfies H = W = 2^n. In a relation graph built on a feature map with 2^k × 2^k local blocks as nodes, the number of nodes is 4^{n-k}, so computing the pairwise similarities yields a similarity matrix of size 4^{n-k} × 4^{n-k}. Let a_{ij}^{T,k} denote the cosine similarity between the i-th and j-th nodes of the relation graph built from 2^k × 2^k blocks of the teacher network's feature map, and a_{ij}^{S,k} the corresponding cosine similarity on the student network's feature map. The loss function of the block-similarity knowledge distillation method can then be summarized as:

$$L_{BS} = \sum_{k=0}^{n-1} \frac{1}{4^{2(n-k)}} \sum_{i=1}^{4^{n-k}} \sum_{j=1}^{4^{n-k}} \left( a_{ij}^{T,k} - a_{ij}^{S,k} \right)^2$$
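A sketch of the multi-scale block-similarity loss, comparing teacher and student similarity matrices at every block scale; using a per-scale mean squared error is an assumption on our part, consistent with the 4^{n-k} × 4^{n-k} matrix sizes above:

```python
import torch
import torch.nn.functional as F

def node_similarity(feat: torch.Tensor, k: int) -> torch.Tensor:
    """Pairwise cosine similarity between relation-graph nodes obtained by
    average-pooling the 2^k x 2^k blocks of a (B, C, H, W) feature map."""
    b, c = feat.shape[0], feat.shape[1]
    nodes = F.avg_pool2d(feat, 2 ** k).reshape(b, c, -1).transpose(1, 2)
    nodes = F.normalize(nodes, dim=2)              # unit node vectors
    return nodes @ nodes.transpose(1, 2)           # (B, nodes, nodes)

def block_similarity_loss(feat_t: torch.Tensor, feat_s: torch.Tensor, n: int) -> torch.Tensor:
    """Sum over block scales k = 0..n-1 of the (mean) squared difference
    between the teacher's and the student's node-similarity matrices."""
    loss = feat_t.new_zeros(())
    for k in range(n):
        loss = loss + F.mse_loss(node_similarity(feat_s, k),
                                 node_similarity(feat_t, k))
    return loss

teacher = torch.randn(2, 64, 8, 8)                 # H = W = 2^3, so n = 3
student = torch.randn(2, 32, 8, 8)                 # channel counts may differ
loss = block_similarity_loss(teacher, student, n=3)
```

The similarity matrices have the same size for both networks even though C ≠ C′, which is what lets the two matrices be compared directly without any channel projection.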
Combining the feature-alignment and block-similarity knowledge distillation methods, a knowledge transfer loss function L_{KT} is introduced as part of the overall training loss to supervise the training of the student network. The student network thus learns not only the ground-truth label information provided by the annotated face feature point coordinates, but also the more refined face structure knowledge and data distribution knowledge extracted from the teacher network. The knowledge transfer module and the knowledge distillation methods optimize the performance of the student network EfficientFAN: the parameters of the pre-trained teacher network are kept frozen, the knowledge transfer loss L_{KT} is added to the training loss, and during the training of EfficientFAN the dark knowledge learned by the teacher network is distilled into the student network, improving the student network's face feature point localization accuracy. The loss function ultimately used to optimize the student network EfficientFAN combines the feature point loss L_P with L_{KT}, where λ is an adjustable weighting parameter balancing the two loss terms, and L_{BS}^{d} and L_{FA}^{d} are the block-similarity and feature-alignment knowledge distillation losses at stage d of the decoder, respectively:

$$L = L_P + \lambda \sum_{d} \left( L_{BS}^{d} + L_{FA}^{d} \right)$$
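The frozen-teacher training scheme described above can be sketched as follows; the toy linear networks and the stand-in loss terms (an L1 loss for the Wing-based L_P, a single MSE for the per-stage L_BS + L_FA sum) are placeholders of our own, not the patent's models:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(student, teacher, images, targets, optimizer, lam=0.5):
    """One optimization step of L = L_P + lambda * L_KT with a frozen teacher."""
    teacher.eval()
    with torch.no_grad():                     # frozen teacher: no gradient flow
        out_t = teacher(images)
    out_s = student(images)
    l_p = F.l1_loss(out_s, targets)           # stand-in for the feature-point loss
    l_kt = F.mse_loss(out_s, out_t)           # stand-in for the transfer loss
    loss = l_p + lam * l_kt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher = nn.Linear(8, 4)
student = nn.Linear(8, 4)
for p in teacher.parameters():
    p.requires_grad_(False)                   # teacher parameters stay frozen
opt = torch.optim.SGD(student.parameters(), lr=0.1)
images, targets = torch.randn(16, 8), torch.randn(16, 4)
loss_val = train_step(student, teacher, images, targets, opt)
```

Only the student's parameters receive gradients; the teacher merely supplies targets for the knowledge-transfer term, which mirrors the frozen-teacher setup in the text.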
S5: and inputting the RGB face image in the natural scene into a final face alignment network model, and outputting a face characteristic point detection result.
In the face feature point detection method based on deep knowledge transfer provided by the embodiment of the invention, EfficientFAN serves as a simple and effective lightweight model: its decoder, built on upsampling and depthwise separable convolutions, quickly performs the upsampling recovery of the feature map and effectively preserves its spatial information.

Compared with current state-of-the-art large complex models, the method achieves comparable face feature point detection accuracy while substantially reducing the number of model parameters and the computational complexity.

The invention uses knowledge distillation and a knowledge transfer module to improve the feature point localization accuracy of the student network EfficientFAN. A block-similarity knowledge distillation method is proposed to learn multi-scale structural information of the face, and is combined with feature-alignment knowledge distillation, which learns the pixel distribution of the feature maps, to jointly supervise and guide the training of EfficientFAN. Without changing the network structure or adding model parameters, EfficientFAN obtains more accurate face feature point detection results through knowledge transfer. Experimental results on public datasets show that EfficientFAN is a simple and effective face feature point detection network, and that the knowledge distillation method effectively improves detection accuracy. Overall, EfficientFAN delivers excellent performance in both accuracy and speed.
Fig. 2 is a block diagram of the apparatus for detecting face feature points based on deep knowledge transfer according to the present invention. As shown in Fig. 2, the apparatus includes a providing module 100, an output module 200, and a control processing module 300.

The providing module 100 is configured to provide a face data set containing face feature point annotations, and to crop face images according to the face detection boxes provided by the data set or the bounding boxes of the feature points to obtain a training set, a validation set, and a test set. The control processing module 300 is configured to obtain training samples from the training set and test samples from the test set, and to input the test samples and the training samples into an initial face alignment network framework. The control processing module 300 is further configured to set the parameters of the convolutional neural network and to train the teacher network and the student network in the initial face alignment network framework with PyTorch, generating a training model once the loss function and the maximum number of iterations satisfy predetermined conditions. The control processing module 300 is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer it to the student network, and supervise the student network's training to generate the final face alignment network model. The control processing module 300 is further configured to input RGB face images captured in natural scenes into the final face alignment network model and output the face feature point detection results through the output module 200.
It should be noted that the specific implementation of the face feature point detection apparatus based on deep knowledge transfer in the embodiment of the present invention is similar to that of the corresponding method; for details, refer to the description of the method, which is not repeated here to reduce redundancy.

In addition, other configurations and functions of the apparatus according to the embodiments of the present invention are known to those skilled in the art and are not described in detail here.
An embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method for detecting human face feature points based on deep knowledge migration according to the first aspect.
The disclosed embodiments of the present invention provide a computer-readable storage medium having stored therein computer program instructions, which, when run on a computer, cause the computer to execute the above-mentioned face feature point detection method based on deep knowledge migration.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by such a processor. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented with a combination of hardware and software in one or more of the examples described above. When implemented in software, the corresponding functionality may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (9)

1. A face feature point detection method based on depth knowledge migration is characterized by comprising the following steps:
S1: providing a face data set containing face feature point annotations, and cropping face images according to the face detection boxes provided by the face data set or the bounding boxes of the face feature points to obtain a training set, a validation set, and a test set;

S2: obtaining training samples from the training set and test samples from the test set, and inputting the test samples and the training samples into an initial face alignment network framework;

S3: setting parameters of the convolutional neural network, training the teacher network and the student network in the initial face alignment network framework using PyTorch, and generating a training model when the loss function and the maximum number of iterations satisfy predetermined conditions;

S4: freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network, transferring the deep dark knowledge to the student network, and supervising the training process of the student network to generate a final face alignment network model;

S5: inputting an RGB face image in a natural scene into the final face alignment network model, and outputting a face feature point detection result.
2. The method for detecting human face feature points based on deep knowledge migration according to claim 1, wherein step S1 includes:
S1-1: providing a WFLW data set, wherein the WFLW data set comprises N training pictures and M test pictures, each picture carries a label comprising face box information, face feature point position information, and several attribute annotations, and N and M are positive integers greater than zero;

S1-2: cropping the face images according to the face detection boxes provided by the face data set, perturbing the detection boxes, and applying random rotation, scaling, and flipping to the face images for data augmentation, thereby obtaining the training set, the validation set, and the test set.
3. The method of claim 1, wherein the initial face alignment network framework is generated by:

generating the teacher network by adopting an encoder-decoder network structure, wherein the teacher network encoder comprises three upsampling layers and convolutional layers and is used for feature extraction and encoding of the input image, retaining the feature extraction information of the original network while removing the final average pooling layer, the fully connected layer for classification, and the last 1 × 1 dimension-raising convolutional layer;

adding the decoder after the encoder, spatially upsampling the image features extracted by the encoder to obtain feature maps, converting the channel dimension of the feature maps into the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial soft-argmax operation;

providing a student network with the EfficientFAN structure, wherein the student network encoder comprises three upsampling layers and convolutional layers, the student network is used for the final face feature point detection, EfficientNet-B0 serves as the backbone of the student network encoder, and the final average pooling layer of EfficientNet-B0, the fully connected layer for classification, and the last 1 × 1 dimension-raising convolutional layer are removed;

adding a 1 × 1 convolutional layer after the student network encoder, converting the number of channels of the feature map obtained by the student network encoder's upsampling into the number of face feature points, and computing the face feature point coordinates on the converted feature map with a spatial soft-argmax operation.
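The spatial soft-argmax named in the claim is commonly implemented as a softmax over each H × W map followed by a coordinate expectation; this sketch assumes that form, and the temperature parameter `beta` is our own addition:

```python
import torch
import torch.nn.functional as F

def spatial_soft_argmax(heatmaps: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Turn (B, N, H, W) heatmaps into (B, N, 2) expected (x, y) coordinates:
    softmax each map into a probability distribution, then take the mean of
    the pixel coordinates under that distribution."""
    b, n, h, w = heatmaps.shape
    probs = F.softmax(beta * heatmaps.reshape(b, n, -1), dim=-1).reshape(b, n, h, w)
    xs = torch.linspace(0, w - 1, w)
    ys = torch.linspace(0, h - 1, h)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # marginal over rows, expectation in x
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # marginal over cols, expectation in y
    return torch.stack([x, y], dim=-1)

maps = torch.zeros(1, 1, 8, 8)
maps[0, 0, 2, 5] = 20.0                       # sharp peak at row 2, col 5
coords = spatial_soft_argmax(maps)            # approximately (x=5, y=2)
```

Unlike a hard argmax, this expectation is differentiable, which is why it can sit at the end of the network and be trained with a coordinate regression loss such as L_P.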
4. The method for detecting human face feature points based on deep knowledge migration according to claim 3, wherein step S3 comprises:

training the teacher network and the student network separately, and optimizing the network parameters with a feature point loss function L_P computed by the Wing loss function, which is expressed as follows:

$$L_P = \sum_{i=1}^{2N} \mathrm{wing}(P_i - G_i)$$

$$\mathrm{wing}(x) = \begin{cases} \omega \ln\left(1 + |x| / \epsilon\right), & |x| < \omega \\ |x| - C, & \text{otherwise} \end{cases}$$

wherein C = ω - ω ln(1 + ω/ε) joins the two pieces continuously, P ∈ R^{1×2N} is the predicted face feature point coordinate vector, G ∈ R^{1×2N} is the ground-truth face feature point coordinate vector, N is the number of face feature points, and ω, ε are preset parameters of wing(x).
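A sketch of the Wing loss above; omega = 10 and epsilon = 2 are commonly used defaults rather than values fixed by the claim, and the mean reduction is our own choice:

```python
import math

import torch

def wing_loss(pred: torch.Tensor, gt: torch.Tensor,
              w: float = 10.0, eps: float = 2.0) -> torch.Tensor:
    """wing(x) = w * ln(1 + |x|/eps) for |x| < w, else |x| - C, where
    C = w - w * ln(1 + w/eps) makes the two pieces meet at |x| = w.
    Returns the mean over all coordinates."""
    x = (pred - gt).abs()
    c = w - w * math.log(1.0 + w / eps)
    small = w * torch.log1p(x / eps)       # log region: amplifies small errors
    large = x - c                          # linear region: large errors
    return torch.where(x < w, small, large).mean()

pred = torch.tensor([[0.5, 12.0]])         # one small and one large residual
gt = torch.zeros(1, 2)
loss = wing_loss(pred, gt)
```

The logarithmic branch gives small localization errors a relatively larger gradient than an L1 or L2 loss would, which is the motivation for using the Wing loss on feature point coordinates.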
5. The method for detecting human face feature points based on deep knowledge migration according to claim 4, wherein in step S4, extracting the deep dark knowledge learned by the teacher network comprises:

extracting the pixel distribution information on the feature maps with a feature-alignment knowledge distillation method, and aligning the pixel distribution of the teacher network's feature map with that of the student network's feature map, the feature-alignment knowledge distillation loss function being:

$$L_{FA} = \left\| A - r(B) \right\|_2^2$$

wherein A and B are the feature maps of the teacher network and the student network at the same stage, respectively, and r(·) denotes a 1 × 1 convolutional layer for aligning the channel dimensions of the A and B feature maps.
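A sketch of the feature-alignment loss: a 1 × 1 convolution lifts the student map B into the teacher's channel dimension before comparison. The per-sample L2 normalization is an assumption of ours; the claim only fixes the 1 × 1 convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignLoss(nn.Module):
    """Project the student feature map into the teacher's channel dimension
    with a learnable 1x1 convolution, then penalize the squared distance
    between the L2-normalized maps."""
    def __init__(self, c_student: int, c_teacher: int):
        super().__init__()
        self.proj = nn.Conv2d(c_student, c_teacher, kernel_size=1)

    def forward(self, a_teacher: torch.Tensor, b_student: torch.Tensor) -> torch.Tensor:
        b = self.proj(b_student)                       # align channel dimensions
        a = F.normalize(a_teacher.flatten(1), dim=1)   # per-sample unit vectors
        b = F.normalize(b.flatten(1), dim=1)
        return (a - b).pow(2).sum(dim=1).mean()

loss_fn = FeatureAlignLoss(c_student=32, c_teacher=64)
a = torch.randn(2, 64, 8, 8)                           # teacher feature map
b = torch.randn(2, 32, 8, 8)                           # student feature map
fa = loss_fn(a, b)
```

Because both normalized maps are unit vectors, the loss is bounded in [0, 4], which keeps the distillation term on a comparable scale to the supervised loss.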
6. The method for detecting human face feature points based on deep knowledge migration according to claim 5, wherein in step S4, the transferring the deep dark knowledge to the student network comprises:
and extracting face structure information under different scales by a knowledge distillation method based on block similarity, and transmitting the structural information of the face image to the student network from the teacher network.
7. A face feature point detection device based on deep knowledge migration, characterized by comprising:
the system comprises a providing module, a judging module and a judging module, wherein the providing module is used for providing a human face data set containing human face characteristic point labels, and cutting a human face image according to a human face detection frame or a surrounding frame of the human face characteristic points provided by the human face data set to obtain a training set, a verification set and a test set;
an output module;
a control processing module, configured to obtain training samples from the training set and test samples from the test set, and to input the test samples and the training samples into an initial face alignment network framework; the control processing module is further configured to set parameters of a convolutional neural network, train the teacher network and the student network in the initial face alignment network framework using PyTorch, and generate a training model when the loss function and the maximum number of iterations satisfy predetermined conditions; the control processing module is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer the deep dark knowledge to the student network, and supervise the training process of the student network to generate a final face alignment network model; and the control processing module is further configured to input an RGB face image in a natural scene into the final face alignment network model and output the face feature point detection result through the output module.
8. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;
the memory is configured to store one or more program instructions;

the processor is configured to execute the one or more program instructions to perform the method for detecting human face feature points based on deep knowledge migration according to any one of claims 1 to 6.
9. A computer-readable storage medium containing one or more program instructions for executing the method for deep knowledge migration based face feature point detection according to any one of claims 1 to 6.
CN202010809064.1A 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment Active CN112115783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809064.1A CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010809064.1A CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN112115783A true CN112115783A (en) 2020-12-22
CN112115783B CN112115783B (en) 2023-11-14

Family

ID=73805270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010809064.1A Active CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN112115783B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418195A (en) * 2021-01-22 2021-02-26 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium
CN112633406A (en) * 2020-12-31 2021-04-09 天津大学 Knowledge distillation-based few-sample target detection method
CN112634441A (en) * 2020-12-28 2021-04-09 深圳市人工智能与机器人研究院 3D human body model generation method, system and related equipment
CN112734632A (en) * 2021-01-05 2021-04-30 百果园技术(新加坡)有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN112767320A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113343898A (en) * 2021-06-25 2021-09-03 江苏大学 Mask shielding face recognition method, device and equipment based on knowledge distillation network
CN113343979A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for training a model
CN113470099A (en) * 2021-07-09 2021-10-01 北京的卢深视科技有限公司 Depth imaging method, electronic device and storage medium
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113628635A (en) * 2021-07-19 2021-11-09 武汉理工大学 Voice-driven speaking face video generation method based on teacher and student network
CN113705361A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Method and device for detecting model in living body and electronic equipment
CN113705317A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113947801A (en) * 2021-12-21 2022-01-18 中科视语(北京)科技有限公司 Face recognition method and device and electronic equipment
US20220156596A1 (en) * 2020-11-17 2022-05-19 A.I.MATICS Inc. Neural architecture search method based on knowledge distillation
WO2022156331A1 (en) * 2021-01-22 2022-07-28 北京市商汤科技开发有限公司 Knowledge distillation and image processing method and apparatus, electronic device, and storage medium
CN114821654A (en) * 2022-05-09 2022-07-29 福州大学 Human hand detection method fusing local and depth space-time diagram network

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108363962A (en) * 2018-01-25 2018-08-03 南京邮电大学 A kind of method for detecting human face and system based on multi-level features deep learning
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN110414400A (en) * 2019-07-22 2019-11-05 中国电建集团成都勘测设计研究院有限公司 A kind of construction site safety cap wearing automatic testing method and system
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning


Non-Patent Citations (2)

Title
LIU Lunhaojie; WANG Chenhui; LU Hui; WANG Jiahao: "Facial Expression Recognition Based on a Transfer Convolutional Neural Network", Computer Knowledge and Technology, no. 07 *
ZHANG Yan'an; WANG Hongyu; XU Fang: "Face Recognition Based on a Deep Convolutional Neural Network and Center Loss", Science Technology and Engineering, no. 35 *

Cited By (26)

Publication number Priority date Publication date Assignee Title
US12051004B2 (en) * 2020-11-17 2024-07-30 Aimatics Co., Ltd. Neural architecture search method based on knowledge distillation
US20220156596A1 (en) * 2020-11-17 2022-05-19 A.I.MATICS Inc. Neural architecture search method based on knowledge distillation
CN112634441A (en) * 2020-12-28 2021-04-09 深圳市人工智能与机器人研究院 3D human body model generation method, system and related equipment
CN112634441B (en) * 2020-12-28 2023-08-22 深圳市人工智能与机器人研究院 3D human body model generation method, system and related equipment
CN112633406A (en) * 2020-12-31 2021-04-09 天津大学 Knowledge distillation-based few-sample target detection method
CN112767320A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Image detection method, image detection device, electronic equipment and storage medium
WO2022141859A1 (en) * 2020-12-31 2022-07-07 平安科技(深圳)有限公司 Image detection method and apparatus, and electronic device and storage medium
CN112734632A (en) * 2021-01-05 2021-04-30 百果园技术(新加坡)有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN112734632B (en) * 2021-01-05 2024-02-27 百果园技术(新加坡)有限公司 Image processing method, device, electronic equipment and readable storage medium
CN112418195A (en) * 2021-01-22 2021-02-26 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium
WO2022156331A1 (en) * 2021-01-22 2022-07-28 北京市商汤科技开发有限公司 Knowledge distillation and image processing method and apparatus, electronic device, and storage medium
CN113705317A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113705317B (en) * 2021-04-14 2024-04-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113052144B (en) * 2021-04-30 2023-02-28 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113343979A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for training a model
CN113343898A (en) * 2021-06-25 2021-09-03 江苏大学 Mask shielding face recognition method, device and equipment based on knowledge distillation network
CN113470099A (en) * 2021-07-09 2021-10-01 北京的卢深视科技有限公司 Depth imaging method, electronic device and storage medium
CN113470099B (en) * 2021-07-09 2022-03-25 北京的卢深视科技有限公司 Depth imaging method, electronic device and storage medium
CN113628635B (en) * 2021-07-19 2023-09-15 武汉理工大学 Voice-driven speaker face video generation method based on teacher student network
CN113628635A (en) * 2021-07-19 2021-11-09 武汉理工大学 Voice-driven speaking face video generation method based on teacher and student network
CN113705361A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Method and device for detecting model in living body and electronic equipment
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113947801B (en) * 2021-12-21 2022-07-26 中科视语(北京)科技有限公司 Face recognition method and device and electronic equipment
CN113947801A (en) * 2021-12-21 2022-01-18 中科视语(北京)科技有限公司 Face recognition method and device and electronic equipment
CN114821654A (en) * 2022-05-09 2022-07-29 福州大学 Human hand detection method fusing local and depth space-time diagram network

Also Published As

Publication number Publication date
CN112115783B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN112115783B (en) Depth knowledge migration-based face feature point detection method, device and equipment
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109840556B (en) Image classification and identification method based on twin network
CN110569738B (en) Natural scene text detection method, equipment and medium based on densely connected network
Chen et al. Convolutional neural network based dem super resolution
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN110599502B (en) Skin lesion segmentation method based on deep learning
CN114913379B (en) Remote sensing image small sample scene classification method based on multitasking dynamic contrast learning
CN109299303B (en) Hand-drawn sketch retrieval method based on deformable convolution and depth network
TWI803243B (en) Method for expanding images, computer device and storage medium
CN116468919A (en) Image local feature matching method and system
Alshehri A content-based image retrieval method using neural network-based prediction technique
CN117197462A (en) Lightweight foundation cloud segmentation method and system based on multi-scale feature fusion and alignment
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN117693768A (en) Semantic segmentation model optimization method and device
CN116503398B (en) Insulator pollution flashover detection method and device, electronic equipment and storage medium
CN117765363A (en) Image anomaly detection method and system based on lightweight memory bank
Wang et al. Insulator defect detection based on improved you-only-look-once v4 in complex scenarios
CN116778164A (en) Semantic segmentation method for improving deep V < 3+ > network based on multi-scale structure
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network
CN117115880A (en) Lightweight face key point detection method based on heavy parameterization
CN116343034A (en) Remote sensing image change detection method, system, electronic equipment and medium
CN116246147A (en) Cross-species target detection method based on cross-layer feature fusion and linear attention optimization
CN114913382A (en) Aerial photography scene classification method based on CBAM-AlexNet convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant