CN112115783A

CN112115783A - Human face characteristic point detection method, device and equipment based on deep knowledge migration

Info

Publication number: CN112115783A
Application number: CN202010809064.1A
Authority: CN
Inventors: 吕科; 高鹏程; 薛健
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2020-12-22
Anticipated expiration: 2040-08-12
Also published as: CN112115783B

Abstract

The embodiment of the invention discloses a method, a device and equipment for detecting human face characteristic points based on deep knowledge migration, wherein the method comprises the following steps: providing a face data set, and cutting a face image according to a face detection frame or a surrounding frame of a face characteristic point provided by the face data set to obtain a training set, a verification set and a test set; inputting a test sample and a training sample into an initial face alignment network frame; training an initial face to align a teacher network and a student network in a network frame by using a Pythrch until a loss function and the maximum iteration number meet a preset condition to generate a training model; model parameters of a teacher network are frozen, deep dark knowledge learned by the teacher network is extracted and transmitted to a student network to generate a final face alignment network model; and inputting the RGB face image in the natural scene into a final face alignment network model, and outputting a face characteristic point detection result. The invention has the advantages of high face feature point detection precision, low model parameter quantity and low calculation complexity.

Description

Human face characteristic point detection method, device and equipment based on deep knowledge migration

Technical Field

The embodiment of the invention relates to the field of computer vision and digital image processing, in particular to a method, a device and equipment for detecting human face characteristic points based on deep knowledge migration.

Background

The existing method for detecting the human face characteristic points cannot effectively solve the problem of human face characteristic point positioning in a natural scene, and the complicated method has huge model parameters and higher calculation complexity and cannot meet the requirement of running speed. The simple method cannot cope with the interference of factors such as extreme postures, variable illumination, severe shielding and the like in a natural scene, and the precision cannot meet the application requirement.

Disclosure of Invention

The embodiment of the invention aims to provide a method, a device and equipment for detecting human face characteristic points based on deep knowledge migration, which are used for solving the problems of higher computational complexity, low running speed and low precision of the existing human face characteristic point detection.

In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:

in a first aspect, an embodiment of the present invention provides a method for detecting a human face feature point based on depth knowledge migration, including:

s1: providing a face data set containing face characteristic point labels, and cutting a face image according to a face detection frame or an enclosing frame of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set;

s2: acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework;

s3: setting parameters of a convolutional neural network, training a teacher network and a student network in the initial face alignment network framework by utilizing a Pythrch, and generating a training model until a loss function and the maximum iteration number meet preset conditions;

s4: freezing model parameters of a teacher network, extracting deep dark knowledge learned by the teacher network, transmitting the deep dark knowledge to the student network, and supervising a training process of the student network to generate a final face alignment network model;

s5: and inputting the RGB face image in the natural scene into the final face alignment network model, and outputting a face characteristic point detection result.

In one embodiment of the present invention, step S1 includes:

s1-1: providing a WFLW data set, wherein the WFLW data set comprises N training pictures and M testing pictures, each picture is provided with a picture label, the picture information comprises face frame information, face feature point position information and a plurality of attribute information, and N and M are positive integers larger than zero;

s1-2: and cutting a face image according to a face detection frame provided by the face data set, disturbing the face detection frame, and applying random rotation, size scaling and turning to the face image so as to perform data enhancement to obtain the training set, the verification set and the test set.

In one embodiment of the present invention, the initial face alignment web frame is generated by:

generating the teacher grid by adopting a network structure of an encoder-decoder, wherein the teacher grid encoder comprises three upper sampling layers and convolution layers, and is used for carrying out feature extraction and encoding on an input image, retaining feature extraction information of an original network, and removing a final average pooling layer, a full-connection layer for classification and a final 1 × 1 convolution layer for ascending dimension;

adding the decoder behind the encoder, performing spatial up-sampling on the image features extracted by the encoder to obtain a feature map, converting the channel dimension of the feature map into the number of the human face feature points, and calculating the coordinates of the human face feature points expected to be obtained on each transformed feature map by using spatial softargmax operation;

providing a student network with an EfficientFAN structure, wherein a student network encoder comprises three upsampling layers and convolutional layers, the student network is used for final human face characteristic point detection, EfficientNet-B0 is used as a main part of the student network encoder, and the final average pooling layer of EfficientNet-B0, a full connection layer for classification and a 1 x 1 convolutional layer of the last ascending dimension are removed;

adding a convolution layer of 1 multiplied by 1 behind the student grid encoder, converting the channel number of the feature map obtained by up-sampling the student grid encoder into the number of the human face feature points, and calculating the coordinates of the human face feature points on the converted feature map by using spatial softargmax operation.

In one embodiment of the present invention, step S3 includes:

training the teacher network and the student network separately, using a feature point loss function L_POptimizing network parameters, characteristic point loss function L_PCalculated by the Wingloss loss function, the Wingloss loss function is expressed as follows:

wherein P ∈ R^1×2NIs a coordinate vector of a predicted face characteristic point, G is belonged to R^1×2NIs the real face feature point coordinate vector, N is the number of face feature points, ω, ∈ is the preset parameter of f (x).

In one embodiment of the present invention, in step S4, extracting deep dark knowledge learned by the teacher network includes:

extracting pixel distribution information on a feature map based on a knowledge distillation method of feature alignment, aligning the pixel distribution of the teacher network feature map and the pixel distribution of the student network feature map, wherein a knowledge distillation loss function of feature alignment is as follows:

wherein A and B are feature maps of the teacher network and the student network at the same stage,

is a 1 x 1 convolutional layer for aligning the channel dimensions of both a and B feature maps.

In one embodiment of the present invention, in step S4, the passing the deep dark knowledge to the student network includes:

and extracting face structure information under different scales by a knowledge distillation method based on block similarity, and transmitting the structural information of the face image to the student network from the teacher network.

In a second aspect, an embodiment of the present invention further provides a face feature point detection apparatus based on depth knowledge migration, including:

the system comprises a providing module, a judging module and a judging module, wherein the providing module is used for providing a human face data set containing human face characteristic point labels, and cutting a human face image according to a human face detection frame or a surrounding frame of the human face characteristic points provided by the human face data set to obtain a training set, a verification set and a test set;

an output module;

the control processing module is used for acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework; the control processing module is further used for setting parameters of a convolutional neural network, training a teacher network and a student network in the initial face alignment network frame by utilizing a Pythrch, and generating a training model until a loss function and the maximum iteration number meet preset conditions; the control processing module is also used for freezing model parameters of a teacher network, extracting deep dark knowledge learned by the teacher network, transmitting the deep dark knowledge to the student network, and monitoring a training process of the student network to generate a final face alignment network model; the control processing module is further used for inputting the RGB face image in the natural scene into the final face alignment network model, and outputting the face characteristic point detection result through the output module.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method for detecting human face feature points based on deep knowledge migration according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions for being executed by the method for detecting human face feature points based on deep knowledge migration according to the first aspect.

The technical scheme provided by the embodiment of the invention at least has the following advantages:

according to the method, the device and the equipment for detecting the human face feature points based on the deep knowledge migration, provided by the embodiment of the invention, EfficientFAN is used as a simple and effective lightweight model, the up-sampling recovery process of the feature map is rapidly realized based on the decoder structure of up-sampling and deep separable convolution, and the spatial information of the feature map is effectively saved.

Compared with the current advanced large complex model, the method can achieve comparable human face characteristic point detection precision, but the model parameter quantity and the calculation complexity are obviously reduced.

The invention uses a knowledge distillation method and a knowledge migration module to improve the accuracy of positioning the feature points of the EfficientFAN face of a student network, provides a block similarity knowledge distillation method for learning multi-scale structural information of the face, combines the pixel distribution information on a feature alignment knowledge distillation learning feature map, and supervises and guides the training process of the EfficientFAN together. Under the premise of not changing the network structure and not increasing the model parameters, EfficientFAN obtains a more accurate human face characteristic point detection result through a knowledge migration method. Experimental results on a public data set show that EfficientFAN is a simple and effective human face characteristic point detection network, and the knowledge distillation method effectively improves the human face characteristic point detection precision. In combination, EfficientFAN has excellent performance, precision and speed.

Drawings

Fig. 1 is a flowchart of a method for detecting human face feature points based on depth knowledge migration according to the present invention.

Fig. 2 is a block diagram of the structure of the apparatus for detecting human face feature points based on deep knowledge migration according to the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.

In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "connected" are to be interpreted broadly, e.g., as meaning directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Fig. 1 is a flowchart of a method for detecting human face feature points based on depth knowledge migration according to the present invention. As shown in fig. 1, the method for detecting human face feature points based on depth knowledge migration of the present invention includes:

s1: providing a face data set containing the marks of the face characteristic points, and cutting a face image according to a face detection frame or an enclosing frame of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set.

Specifically, step S1 includes:

s1-1: a WFLW dataset is provided. The data set originates from IEEE Conference on Computer Vision and Pattern Recognition 2018, and comprises 10000 pictures (7500 training pictures and 2500 test pictures). Each picture label provides face frame information, 98 person face feature point location information, and 6 attribute information (pose, expression, lighting, makeup, occlusion, blur), and divides the entire data set into 6 classes of subsets according to the image attribute information.

S1-2: and cutting the face image according to a face detection frame provided by the face data set, disturbing the face detection frame, and applying random rotation, size scaling and overturning to the face image so as to perform data enhancement to obtain a training set, a verification set and a test set.

S2: and acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework.

Specifically, the teacher network employs an encoder-decoder network architecture, using EfficientNet-B7 as the backbone of its encoder. The encoder is used to perform feature extraction and encoding on the input image, only the feature extraction part of the original network is retained, the last average pooling layer and the fully connected layer used for classification are removed, the last up-dimension 1 × 1 convolutional layer is also removed and features are extracted from the last inverse residual module. The operation enables the number of channels of the feature map extracted by the teacher network to have a smaller number of channels (640vs.2048) than the feature map after the feature map passes through the 1 × 1 convolutional layer, maintains more original feature information, does not lose information due to dimension increase, and is more suitable for being analyzed by a decoder.

And a decoder is added after the last reverse residual error module of EfficientNet-B7, the image features extracted by the encoder are subjected to spatial up-sampling, a more natural up-sampling method is used for improving the spatial dimension of the feature map, namely, the combination of an up-sampling layer and a convolution layer is used for replacing deconvolution, a general up-sampling method is firstly used for carrying out spatial up-sampling on the feature map, and then convolution operation is carried out on the basis of the up-sampling feature map to enrich the transformation of the feature map.

The present invention uses a combination of three upsampling layers and convolutional layers as a decoder for a face alignment network, added after the encoder. In the network model, the traditional convolution operation is replaced by the deep separable convolution, so that the calculation amount in the up-sampling process is reduced.

Specifically, the scaling factor of the upsampling layer is set to 2, that is, the length and width of each upsampled feature map are expanded to be twice of the input feature map, and the upsampling of the feature map is realized based on the nearest neighbor interpolation algorithm. A spatial thermodynamic diagram is generated using 1 x 1 convolutional layers after the decoder and the channel dimensions of the feature map are converted to the number of human face feature points. And calculating the coordinates of the expected corresponding human face characteristic points on each transformed characteristic map by using a spatial softargmax operation.

The spatial softargmax operation can be divided into two steps, the first step is normalized on the output characteristic diagram by using the softmax operation, and can be represented as follows:

wherein x and y are pixel indexes, exp represents an exponential function, and the obtained M is a normalized feature map. In the second step, the coordinate P of the feature point l can be finally expressed as:

a small and light student Network called Efficient Face Alignment Network (Efficient FAN) has a Network structure similar to that of a teacher Network and is used for final Face feature point detection. EfficientNet-B0 was used as the backbone for the student network EfficientFAN encoder. Like the teacher's network, the encoder of the student's network also removed the last average pooling layer and the fully connected layer for classification in EfficientNet-B0, as well as the 1 × 1 convolutional layer for the last ascending dimension.

Likewise, a combination of three upsampled layers and convolutional layers is used as a decoder for the student network, added after the encoder. The scale factor for each upsampled layer is 2 and the number of output channels per convolutional layer is 128. A1 multiplied by 1 convolutional layer is added after a decoder of the student network, and the number of channels of the feature map obtained by up-sampling the decoder is converted from 128 into the number of face feature points.

And finally, calculating the coordinates of the human face characteristic points on the converted characteristic graph by using a spatial softargmax operation.

TABLE 1 student network

The concrete structure of the student network is shown in table 1, where MBConv represents a Mobile phone-side inverse residual module (Mobile inversed bottleeck) used by the efficiency, DSConv represents a depth separable convolution, and k represents the size of a convolution kernel.

The teacher network located above and the student network located below are organically linked together by a Knowledge Transfer (knowlege Transfer) module.

Two knowledge distillation methods are used in the efficient face alignment network based on deep knowledge migration, so that different types of dark knowledge are migrated to a student network EfficientFAN from a teacher network.

The knowledge distillation method of feature alignment extracts pixel distribution information on the feature graph, aligns pixel distribution of the teacher network and the student network feature graph, and enables the feature graph distribution of the student network to be close to the distribution of the teacher network.

Correspondingly, the knowledge distillation method of block similarity extracts the face structure information under different scales, and transmits the structural information of the face image to a student network from a teacher network, so that the simple student network can learn the face structure information of the current image.

Feature alignment distillation aligns the channel dimensions of feature maps at the same stage of teacher and student networks and directly compares the difference between the teacher and student network feature maps as the supervisory information in the student network training process.

S3: and setting parameters of the convolutional neural network, training an initial face by utilizing a Pythrch to align a teacher network and a student network in a network frame, and generating a training model until a loss function and the maximum iteration number meet preset conditions.

Specifically, the teacher network and the student network are trained separately, using only the feature point loss function L_PAnd optimizing the network parameters. Characteristic point loss function L_PThrough the calculation of the Wing loss function, the Wing loss function can be expressed as follows:

wherein P ∈ R^1×2NIs a coordinate vector of a predicted face characteristic point, G is belonged to R^1×2NIs the real face feature point coordinate vector, and N is the number of face feature points. (x) is a specially designed loss function that, for smaller errors, behaves as a logarithmic loss function with an offset; for larger errors, which are expressed as L1 loss functions, ω, ∈ is a preset parameter of f (x),

is a constant.

S4: model parameters of the teacher network are frozen, deep dark knowledge learned by the teacher network is extracted, the deep dark knowledge is transmitted to the student network, and a final face alignment network model is generated in the training process of the student network.

Specifically, the knowledge distillation method for feature alignment extracts pixel distribution information on a feature map, aligns pixel distribution of a teacher network and a student network feature map, and enables the feature map distribution of the student network to be close to the distribution of the teacher network. The knowledge distillation loss function for feature alignment can be defined as follows:

wherein A and B are respectively the characteristic diagrams of the teacher network and the student network at the same stage,

The knowledge distillation method of block similarity extracts face structure information under different scales, and transmits the structural information of the face image to a student network from a teacher network, so that the simple student network can learn the face structure information of the current image.

And constructing relationship graphs with different scales for the input feature graph, and calculating a similarity matrix based on the constructed relationship graphs. For a feature map of size H × W, the feature map region may be divided into local blocks of different sizes. The size of the characteristic diagram is usually H-W-2ⁿThe whole feature graph is used as a connected domain, the relation graph is constructed by taking local blocks with different sizes as nodes, and the nodes in the relation graph can be set to be 2^k×2^kK is a partial block of size 0, 1, …, k-1. A width 2ⁿ×2ⁿThe feature map of (2) constructs a node size of 2^k×2^kContains 2^n-k×2^n-kA local block or a relational node. For simplicity, 2 will be used with an average pooling operation^k×2^kThe local blocks of (a) are aggregated into a 1 × 1 relationship graph node. For a feature graph with the number of channels being, the vectorization of the first node in the constructed relationship graph can be expressed as f_i∈R^C. Calculating similarity between nodes in a relational graph by using cosine similarity of vectors, i-th node vector f_iAnd the jth node vector f_jThe similarity between a_ijThe calculation is as follows.

In particular, the intermediate feature maps of the teacher network and the student network at the same stage have the same resolution and different channel numbers. Suppose the characteristic diagram of the teacher network is A e R^C×H×WThe characteristic diagram of the student network is B e R^C′×H×W2 on the characteristic diagram^k×2^kIn the affinity graph constructed by using local blocks with the same size as the nodes, the number of the nodes is 4^n-kThe similarity relation between the two nodes can be calculated to obtain 4^n-k×4^n-kA similarity matrix of sizes. Order to

2 on the network characteristic diagram of the teacher^k×2^kThe local block of the size is the cosine similarity obtained by the ith node and the jth node in a relational graph constructed by the nodes,

the corresponding characteristic diagram of the student network is also shown by 2^k×2^kThe cosine similarity obtained by the ith node and the jth node in a relational graph constructed by local blocks with the size can be summarized as follows, and the loss function of the block similarity knowledge distillation method can be summarized as follows, wherein the size of the characteristic graph satisfies that H, W, 2ⁿ。

Introducing a knowledge migration loss function L by combining a feature alignment knowledge distillation method and a block similarity knowledge distillation method_KTThe training process of the student network is supervised as part of the network training loss function. The student network learns not only the true tag information provided by the labeled face feature point coordinates, but also more refined face structured knowledge and data distribution knowledge extracted from the teacher network. Optimizing the performance of the student network EfficientFAN by using a knowledge migration module and a knowledge distillation method, and completing pre-trainingThe parameters of the teacher network are kept in a frozen state, and the knowledge is transferred to a loss function L_KTAnd the method is added into a training loss function, and the dark knowledge learned by a teacher through a network is distilled and transmitted to a student network in the training process of EfficientFAN, so that the positioning precision of the face feature points of the student network is improved. The loss function ultimately used to optimize the student network EfficientFAN is shown below, as a characteristic point loss function L_PAnd L_KTIn combination, where λ is an adjustable weighting parameter, for balancing the influence of two loss functions,

and

respectively, a block similarity knowledge distillation loss function and a feature alignment knowledge distillation loss function at the d stage of the decoder.

S5: and inputting the RGB face image in the natural scene into a final face alignment network model, and outputting a face characteristic point detection result.

According to the face feature point detection method based on depth knowledge migration, provided by the embodiment of the invention, EfficientFAN is used as a simple and effective lightweight model, and the decoder structure based on upsampling and depth separable convolution quickly realizes the upsampling recovery process of the feature map, so that the spatial information of the feature map is effectively saved.

Fig. 2 is a block diagram of the structure of the apparatus for detecting human face feature points based on deep knowledge migration according to the present invention. As shown in fig. 2, the apparatus for detecting human face feature points based on depth knowledge migration according to the present invention includes: a module 100, an output module 200 and a control processing module 300 are provided.

The providing module 100 is configured to provide a face data set including a label of a face feature point, and crop a face image according to a face detection frame provided by the face data set or an enclosure of the face feature point to obtain a training set, a verification set, and a test set. The control processing module 300 is configured to obtain a training sample from a training set, obtain a test sample from a test set, and input the test sample and the training sample into an initial face alignment network framework. The control processing module 300 is further configured to set parameters of the convolutional neural network, train an initial face to align with a teacher network and a student network in the network framework by using a Pytorch, and generate a training model until a loss function and the maximum iteration number meet predetermined conditions. The control processing module 300 is further configured to freeze model parameters of the teacher network, extract deep dark knowledge learned by the teacher network, transmit the deep dark knowledge to the student network, and supervise a training process of the student network to generate a final face alignment network model. The control processing module 300 is further configured to input the RGB face images in the natural scene into the final face alignment network model, and output a face feature point detection result through the output module.

It should be noted that, the specific implementation of the face feature point detection apparatus based on deep knowledge migration in the embodiment of the present invention is similar to the specific implementation of the face feature point detection method based on deep knowledge migration in the embodiment of the present invention, and specific reference is specifically made to the description of the face feature point detection method based on deep knowledge migration, and details are not repeated for reducing redundancy.

In addition, other configurations and functions of the apparatus for detecting human face feature points based on deep knowledge migration according to the embodiments of the present invention are known to those skilled in the art, and are not described in detail for reducing redundancy.

An embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method for detecting human face feature points based on deep knowledge migration according to the first aspect.

The disclosed embodiments of the present invention provide a computer-readable storage medium having stored therein computer program instructions, which, when run on a computer, cause the computer to execute the above-mentioned face feature point detection method based on deep knowledge migration.

In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.

The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The processor reads the information in the storage medium and completes the steps of the method in combination with the hardware.

The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.

The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.

The volatile Memory may be a Random Access Memory (RAM) which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (ddr Data Rate SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).

The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.

Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When software is applied, the corresponding functionality may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.

The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims

1. A face feature point detection method based on depth knowledge migration is characterized by comprising the following steps:

2. The method for detecting human face feature points based on deep knowledge migration according to claim 1, wherein step S1 includes:

3. The method of claim 1, wherein the initial face alignment network framework is generated by:

4. The method for detecting human face feature points based on deep knowledge migration according to claim 3, wherein the step S3 includes:

training the teacher network and the student network separately, using a feature point loss function L_POptimizing network parameters, characteristic point loss function L_PCalculated by the Wing loss function, the Wing loss function is expressed as follows:

5. The method for detecting human face feature points based on deep knowledge migration of claim 4, wherein in step S4, extracting deep dark knowledge learned by the teacher network comprises:

wherein A and B are the teacher network respectivelyA profile of the student network at the same stage,

6. The method for detecting human face feature points based on deep knowledge migration according to claim 5, wherein in step S4, the transferring the deep dark knowledge to the student network comprises:

7. A face feature point detection device based on deep knowledge migration, characterized by comprising:

an output module;

8. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;

the memory is to store one or more program instructions;

the processor, configured to execute one or more program instructions to perform the method for detecting facial feature points based on deep knowledge migration according to any one of claims 1 to 6.

9. A computer-readable storage medium containing one or more program instructions for executing the method for deep knowledge migration based face feature point detection according to any one of claims 1 to 6.