CN114463812B

CN114463812B - Low-resolution face recognition method based on double-channel multi-branch fusion feature distillation

Info

Publication number: CN114463812B
Application number: CN202210054234.9A
Authority: CN
Inventors: 钟锐; 王晨; 汪廷华; 宋亚锋
Original assignee: Gannan Normal University
Current assignee: Gannan Normal University
Priority date: 2022-01-18
Filing date: 2022-01-18
Publication date: 2024-03-26
Anticipated expiration: 2042-01-18
Also published as: CN114463812A

Abstract

The invention provides a low-resolution face recognition method based on double-channel multi-branch fusion feature distillation. A dual-channel network simultaneously extracts fused local and global features from high- and low-resolution face images, and a feature distillation method screens representative fused features out of a complex face recognition model with a high recognition rate. These features are used to assist the training of a constructed lightweight low-resolution channel network, so that the trained low-resolution channel network can accurately recognize low-resolution face test samples in complex scenes. The disclosed model has a simple, compact structure yet retains the recognition accuracy of the complex high-performance face recognition model on low-resolution face images, has a marked advantage in running speed, and addresses both the low recognition rate of low-resolution models and their high consumption of computing resources. The invention performs well on low-resolution face recognition, particularly in small-scale application scenarios.

Description

Low-resolution face recognition method based on double-channel multi-branch fusion feature distillation

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a low-resolution face recognition method based on double-channel multi-branch fusion feature distillation.

Background

With the rapid development of computer vision, face recognition on high-resolution samples now achieves high recognition rates and is widely applied in fields such as smartphones, security surveillance and intelligent access control.

Low-resolution face recognition methods proposed in recent years fall broadly into two categories. The first is the indirect approach, i.e. low-resolution face recognition based on reconstructed super-resolution (SR) images, which has two main steps: super-resolution reconstruction is first applied to a low-resolution (LR) face image to obtain a high-resolution (HR) face image with good visual quality, and the reconstructed HR face image is then used for matching and recognition. The second is the direct approach, i.e. LR face recognition based on a common feature subspace: a feature subspace shared by high- and low-resolution face images is constructed, and test samples are classified within that common subspace.

However, in many practical application scenarios some of the key images collected have low resolution, for example because the camera is mounted high, so many classical face recognition algorithms cannot reach a satisfactory recognition rate. At the same time, recognition models designed to raise the recognition rate on low-resolution images are generally complex and need substantial computing resources for training, which makes them difficult to deploy on mobile devices.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a low-resolution face recognition method based on double-channel multi-branch fusion feature distillation that improves the recognition rate of a low-resolution model.

The technical scheme adopted by the invention for solving the technical problems is as follows: a low-resolution face recognition method based on double-channel multi-branch fusion feature distillation comprises the following steps:

S1: construct a dual-channel multi-branch feature extraction network comprising an HR channel network, an LR channel network and a weighted fusion loss function. The HR channel network and the LR channel network each comprise a backbone network and branch networks. The HR channel network comprises an HR backbone network, an HR first branch network, an HR second branch network, an HR third branch network and a feature screening module; its architecture is built from the Inception-v3 modules of GoogLeNet and comprises 2 convolution layers, 6 Inception modules, 3 max-pooling layers, 2 connection layers and 1 fully connected layer. The HR backbone network comprises an Inception4 module and an Inception5 module, and the HR first, second and third branch networks each comprise an Inception4 module. The LR channel network comprises an LR backbone network, an LR first branch network, an LR second branch network and an LR third branch network; it is built from convolution blocks and comprises 11 convolution layers, 2 max-pooling layers and 1 fully connected layer. The weighted fusion loss function comprises a soft loss function, a hard loss function and a multi-kernel maximum mean discrepancy loss function (MK-MMD);

S2: the output features of the backbone and branch networks of the HR channel network, and of the backbone and branch networks of the LR channel network, are fused by concatenation (cascade fusion) and connected to the fully connected layer of the respective channel network, yielding fused multi-region facial features that form the final face representation;

S3: input high-resolution face images into the HR channel network to train it, and screen out the deep face features that represent the identity information of the subject;

S4: input into the LR channel network low-resolution face images carrying the same subject identity information as the high-resolution face images, and train the LR channel network: the screened features are used to optimize the parameters of the LR channel network, and the weighted fusion loss function transfers the feature representation capability of the high-recognition-rate HR channel network to the LR channel network, so that the LR channel network learns that capability.

According to the above scheme, in the step S2, the specific steps are as follows:

S21: the backbone network extracts features from the global region of the face image, and the max-pooling layer in its Inception5 module reduces the feature map to 3×3×1024;

S22: the cropped features output by the second convolution layer and the Inception3 module of the backbone network serve as the input of each branch network: the feature map cropped from the second convolution layer is reduced to 6×6×64 by a max-pooling layer, and the cropped map is then passed through the Inception3 module to form the branch input; max pooling reduces the feature map of each branch network to 3×3×832;

S23: the output feature maps of the backbone and branch networks are fused by concatenation into deep fused face features, a fully connected layer reduces their dimension to 512, and the 512-dimensional feature vector serves as the final face representation.
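The concatenation in S23 can be illustrated with a shape-level sketch in plain Python. It assumes that the backbone map and the three branch maps are simply flattened and concatenated before the fully connected layer (the text does not state the flattening order), and the helper names `flatten`, `cascade_fuse` and `fused_length` are hypothetical.

```python
from math import prod

# Output shapes (H, W, C) stated in S21/S22: the backbone after Inception5
# and each of the three branches after its max-pooling layer.
BACKBONE_SHAPE = (3, 3, 1024)
BRANCH_SHAPE = (3, 3, 832)
EMBED_DIM = 512  # size of the final face representation (S23)


def flatten(fmap):
    """Flatten a nested [H][W][C] feature map into a 1-D list."""
    return [v for row in fmap for pixel in row for v in pixel]


def cascade_fuse(fmaps):
    """Concatenate the flattened backbone and branch maps into one vector."""
    fused = []
    for fmap in fmaps:
        fused.extend(flatten(fmap))
    return fused


def fused_length(shapes):
    """Length of the fused vector that enters the fully connected layer."""
    return sum(prod(s) for s in shapes)


shapes = [BACKBONE_SHAPE] + [BRANCH_SHAPE] * 3
print(fused_length(shapes), "->", EMBED_DIM)  # 31680 -> 512
```

Under these assumptions the fully connected layer maps a 31680-dimensional vector to 512 dimensions, i.e. a 31680 × 512 weight matrix of 16,220,160 weights plus 512 biases.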

According to the above scheme, in the step S3, the specific steps are as follows:

S31: input labelled high-resolution face images into the HR channel network, perform face detection and facial landmark localization on each input image, and divide it into facial feature sampling regions;

S32: the backbone network of the HR channel network learns a representation of the whole face image and extracts its global features;

S33: each branch network of the HR channel network learns a representation of image patches cropped around a facial component and extracts local features of the face image: the first branch network extracts eye-centred region features, the second branch network nose-centred region features, and the third branch network mouth-centred region features;
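The landmark-centred cropping of S31 and S33 could look roughly like the following sketch. The patch size, the landmark coordinates and the use of a single "eyes" point (for example the midpoint of the two eye landmarks) are assumptions, `crop_patch` and `branch_inputs` are hypothetical helpers, and the actual sampling regions are those shown in Fig. 4.

```python
def crop_patch(image, center, size):
    """Return a size x size patch of `image` (a list of rows) centred on
    center = (row, col), shifted where needed so it stays inside the image.
    Assumes size <= image height and width."""
    height, width = len(image), len(image[0])
    half = size // 2
    top = min(max(center[0] - half, 0), height - size)
    left = min(max(center[1] - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def branch_inputs(image, landmarks, size):
    """Map each branch (eyes, nose, mouth) to its landmark-centred crop."""
    return {name: crop_patch(image, landmarks[name], size)
            for name in ("eyes", "nose", "mouth")}


# Hypothetical 24x24 grey image and landmark positions (row, col).
image = [[r * 24 + c for c in range(24)] for r in range(24)]
landmarks = {"eyes": (8, 12), "nose": (13, 12), "mouth": (18, 12)}
patches = branch_inputs(image, landmarks, size=12)
print({name: p[0][0] for name, p in patches.items()})
# {'eyes': 54, 'nose': 174, 'mouth': 294}
```

Clamping the crop window instead of zero-padding keeps every patch the same size, so all branches receive inputs of identical shape.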

S34: the feature screening module combines the softmax loss function and the center loss function to screen face features with good semantic properties; joint training increases the distance between features of different classes and reduces the distance between features of the same class, and the deep face features representing the subject's identity information are screened out in the HR channel.
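One way to read the screening in S34 is: compute each class center, rank that class's features by their distance to it, and drop the farthest ones. The sketch below assumes the center is the class mean and that a fixed fraction `keep_ratio` of each class is kept; `keep_ratio`, `class_centers` and `screen_features` are hypothetical, since the text gives no threshold.

```python
import math
from collections import defaultdict


def class_centers(features, labels):
    """Mean feature vector of every class (assumed definition of the center)."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def screen_features(features, labels, keep_ratio=0.8):
    """Return the indices of the features kept after screening: within each
    class, the ceil(keep_ratio * class size) features closest to the class
    center survive and the farthest ones are discarded."""
    centers = class_centers(features, labels)
    by_class = defaultdict(list)
    for i, (x, y) in enumerate(zip(features, labels)):
        by_class[y].append((math.dist(x, centers[y]), i))
    kept = []
    for items in by_class.values():
        items.sort()
        n_keep = max(1, math.ceil(keep_ratio * len(items)))
        kept.extend(i for _, i in items[:n_keep])
    return sorted(kept)


feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [10.0, 10.0], [10.1, 10.0]]
labels = [0, 0, 0, 0, 1, 1]
print(screen_features(feats, labels, keep_ratio=0.75))  # [0, 1, 2, 4, 5]
```

The outlier `[5.0, 5.0]` of class 0 is the only vector removed, which mirrors the intent of eliminating feature vectors far from their class center.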

Further, in the step S34, the specific steps are as follows:

S341: at the end of the HR channel network, the softmax loss function classifies semantic features with the same label to obtain semantic sets. Let $x_n$ be the $n$-th feature vector, $y_n$ the label of $x_n$, $W_m$ the $m$-th column of the weight matrix $W$ of the last fully connected layer, $b$ the bias, $N$ the batch size and $M$ the number of classes in the training set. The softmax loss function $L_S$ is:

$$L_S = -\sum_{n=1}^{N} \log \frac{e^{W_{y_n}^{\top} x_n + b_{y_n}}}{\sum_{m=1}^{M} e^{W_m^{\top} x_n + b_m}}$$
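A direct transcription of the softmax loss into plain Python, following the definitions in S341: for every sample in the batch, the negative log-probability of its true class is accumulated. The weights are passed as a list of the $M$ columns $W_m$; the function name is hypothetical, and the log-sum-exp shift is a standard numerical-stability device rather than something stated in the text.

```python
import math


def softmax_loss(features, labels, W_columns, biases):
    """L_S = -sum_n log( exp(W_{y_n}.x_n + b_{y_n}) / sum_m exp(W_m.x_n + b_m) ).

    features: N feature vectors x_n; labels: class indices y_n in [0, M);
    W_columns: the M columns W_m of the last fully connected layer;
    biases: the M bias terms b_m.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w * v for w, v in zip(w_m, x)) + b_m
                  for w_m, b_m in zip(W_columns, biases)]
        top = max(logits)  # shift by the max for a stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
        total -= logits[y] - log_norm
    return total


# Two classes with zero weights: each sample contributes log(2).
print(softmax_loss([[1.0, 2.0], [3.0, -1.0]], [0, 1],
                   [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))
```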

S342: the center loss function computes the distance between each face image feature and the center of its class and pulls features of the same class towards that center; according to the feature distribution, face features with better semantic properties are screened out of the partitioned semantic sets, and feature vectors far from the center of their class are eliminated. Let $c_{y_n}$ denote the center of class $y_n$, which is updated as the extracted deep features change, and $x_n$ the $n$-th feature vector. The center loss function $L_C$ is:

$$L_C = \frac{1}{2}\sum_{n=1}^{N} \left\| x_n - c_{y_n} \right\|_2^2$$

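In each iteration, the change of each class center depends on the mean of the feature vectors of that class in the current batch. The class center is therefore updated from the gradient of $L_C$ with respect to $x_n$:

$$\frac{\partial L_C}{\partial x_n} = x_n - c_{y_n}, \qquad \Delta c_j = \frac{\sum_{n=1}^{N} \delta(y_n = j)\,(c_j - x_n)}{1 + \sum_{n=1}^{N} \delta(y_n = j)}, \qquad c_j \leftarrow c_j - \alpha\,\Delta c_j$$

where $\delta(\cdot)$ equals 1 when its condition holds and 0 otherwise, and $\alpha$ is the learning rate of the centers.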

The center loss function is used to minimize intra-class distances of features while keeping the different classes of features separable.
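The center loss and the per-batch center update of S342 can be sketched as below. The update rule follows the form used in the original center-loss formulation; the center learning rate `alpha = 0.5` is an assumed value, and the helper names are hypothetical.

```python
def center_loss(features, labels, centers):
    """L_C = 1/2 * sum_n ||x_n - c_{y_n}||^2 over the batch."""
    return 0.5 * sum(sum((v - c) ** 2 for v, c in zip(x, centers[y]))
                     for x, y in zip(features, labels))


def center_loss_grad(x, center):
    """Gradient of L_C with respect to one feature vector: x_n - c_{y_n}."""
    return [v - c for v, c in zip(x, center)]


def update_centers(features, labels, centers, alpha=0.5):
    """One center update per batch:
    delta_j = sum_{n: y_n = j} (c_j - x_n) / (1 + #{n: y_n = j}),
    c_j <- c_j - alpha * delta_j.  Classes absent from the batch are unchanged.
    alpha (the center learning rate) is an assumed value."""
    new_centers = {j: list(c) for j, c in centers.items()}
    for j, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue
        delta = [sum(c[k] - x[k] for x in members) / (1 + len(members))
                 for k in range(len(c))]
        new_centers[j] = [ck - alpha * dk for ck, dk in zip(c, delta)]
    return new_centers


feats, labels = [[1.0, 0.0], [3.0, 0.0]], [0, 0]
centers = {0: [0.0, 0.0], 1: [5.0, 5.0]}
print(center_loss(feats, labels, centers))  # 5.0
print(update_centers(feats, labels, centers))  # class 0 moves towards the batch mean
```

The `1 +` in the denominator damps the update when a class has few samples in the batch, so a center is never pulled all the way onto a single noisy feature.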

According to the above scheme, in the step S4, the specific steps are as follows:

S41: input low-resolution face images to pre-train the LR channel network: perform face detection and facial landmark localization on each low-resolution face image, divide it into facial feature sampling regions, and feed each sampling region into the LR channel network according to its branch category;

S42: optimize the parameters of the LR channel network using the high-resolution deep face feature set screened by the feature screening module together with the weighted fusion loss function, specifically:

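S421: let $\tau$ be the distillation temperature, $Z_{HR}$ the final output feature of the HR channel, $Z_{LR}$ the final output feature of the LR channel, and $\mathrm{softmax}(\cdot)$ the softmax function. The soft labels of the HR channel network are defined as:

$$P_{HR}^{\tau} = \mathrm{softmax}\!\left(\frac{Z_{HR}}{\tau}\right)$$

the soft labels of the LR channel network are:

$$P_{LR}^{\tau} = \mathrm{softmax}\!\left(\frac{Z_{LR}}{\tau}\right)$$

and the soft loss function $L_{soft}$ is defined as the cross entropy between $P_{HR}^{\tau}$ and $P_{LR}^{\tau}$:

$$L_{soft} = -\sum_{k} P_{HR}^{\tau}(k)\,\log P_{LR}^{\tau}(k)$$

where $k$ runs over the classes.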

The soft loss function $L_{soft}$ uses the soft labels to transfer knowledge from the HR channel network to the LR channel network;
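To show what the temperature τ of S421 does, the sketch below computes temperature-scaled softmax outputs and the cross entropy between the HR and LR soft labels. The logits are made-up numbers and the function names are hypothetical.

```python
import math


def softmax(z, tau=1.0):
    """Temperature-scaled softmax: softmax(z / tau)."""
    scaled = [v / tau for v in z]
    top = max(scaled)
    exps = [math.exp(v - top) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_loss(z_hr, z_lr, tau):
    """Cross entropy between the HR soft labels and the LR soft labels."""
    p_hr = softmax(z_hr, tau)
    p_lr = softmax(z_lr, tau)
    return -sum(p * math.log(q) for p, q in zip(p_hr, p_lr))


z_hr = [1.0, 4.0, 2.0]  # made-up HR (teacher) logits
z_lr = [0.5, 3.0, 2.5]  # made-up LR (student) logits
for tau in (1.0, 4.0):
    print(tau, [round(p, 3) for p in softmax(z_hr, tau)],
          round(soft_loss(z_hr, z_lr, tau), 4))
```

Raising τ flattens the distribution, so the relative similarities that the HR channel assigns to the non-target classes carry more weight in the soft loss.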

S422: let $H(\cdot)$ be the cross-entropy loss function, $X_{LR}$ the unsoftened class probabilities of the LR channel and $y$ the label. The hard loss function $L_{hard}$, which strengthens the classification performance of the LR channel network, is defined as:

$$L_{hard} = H(X_{LR}, y)$$

S423: let $N$ be the number of feature vectors, $x^i$ a sample from the high-resolution training dataset, $y^i$ a sample from the low-resolution training dataset, and $\phi(\cdot)$ an explicit mapping function that projects samples into a high-dimensional space in which their features are easier to classify; the form of this mapping is not fixed across datasets with different distributions. To reduce the difference between the datasets of the HR and LR channel networks, the multi-kernel maximum mean discrepancy (MK-MMD) is used as the loss function $L_{MMD}(x, y)$:

$$L_{MMD}(x, y) = \left\| \frac{1}{N}\sum_{i=1}^{N} \phi(x^i) - \frac{1}{N}\sum_{i=1}^{N} \phi(y^i) \right\|_{\mathcal{H}}^{2}$$

S424: let $\lambda_1$, $\lambda_2$ and $\lambda_3$ be the weights of the hard loss, the soft loss and the normalized MK-MMD feature-difference loss $\hat{L}_{MMD}$ (see S4232), respectively. The soft loss $L_{soft}$, the hard loss $L_{hard}$ and the distribution difference measured by MK-MMD are combined by weighting into the weighted fusion loss function, which improves the classification accuracy and overall classification performance of the model:

$$L_{total} = \lambda_1 L_{hard} + \lambda_2 L_{soft} + \lambda_3 \hat{L}_{MMD}$$

The optimized LR channel network model is then obtained by minimizing this loss with the backpropagation algorithm.
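Putting S421 to S424 together, the following sketch evaluates the weighted fusion loss for a single sample from HR and LR logits. The weight names `lam_hard`, `lam_soft`, `lam_mmd` and all example values are assumptions (the text gives no weight values), and the normalized MK-MMD term is passed in as a precomputed number.

```python
import math


def log_softmax(z, tau=1.0):
    """log softmax(z / tau), computed with a stable log-sum-exp."""
    scaled = [v / tau for v in z]
    top = max(scaled)
    log_norm = top + math.log(sum(math.exp(v - top) for v in scaled))
    return [v - log_norm for v in scaled]


def hard_loss(z_lr, y):
    """L_hard = H(X_LR, y): cross entropy of the unsoftened LR output."""
    return -log_softmax(z_lr)[y]


def soft_loss(z_hr, z_lr, tau):
    """L_soft: cross entropy between HR and LR soft labels at temperature tau."""
    p_hr = [math.exp(v) for v in log_softmax(z_hr, tau)]
    return -sum(p * lq for p, lq in zip(p_hr, log_softmax(z_lr, tau)))


def weighted_fusion_loss(z_hr, z_lr, y, mmd_norm, tau, lam_hard, lam_soft, lam_mmd):
    """lam_hard * L_hard + lam_soft * L_soft + lam_mmd * normalized MK-MMD."""
    return (lam_hard * hard_loss(z_lr, y)
            + lam_soft * soft_loss(z_hr, z_lr, tau)
            + lam_mmd * mmd_norm)


# Made-up logits, label, MMD value and weights, for illustration only.
print(weighted_fusion_loss([1.0, 4.0, 2.0], [0.5, 3.0, 2.5], y=1,
                           mmd_norm=0.12, tau=4.0,
                           lam_hard=0.3, lam_soft=0.7, lam_mmd=1.0))
```

In a real training loop these terms would be computed by an autodiff framework so that backpropagation can update the LR channel parameters; plain Python is used here only to make the arithmetic explicit.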

Further, in the step S423, the specific steps are as follows:

S4231: let $k(\cdot,\cdot)$ be a Gaussian kernel function that projects the sample vectors into a high-dimensional space, where $\sigma^2$ is the mean of the squared pairwise distances between samples:

$$k(a, b) = \exp\!\left(-\frac{\|a - b\|^2}{2\sigma^2}\right)$$

Projecting the samples of the LR and HR channel datasets with this Gaussian kernel, $L_{MMD}(x, y)$ expands to:

$$L_{MMD}(x, y) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} k(x^i, x^j) + \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} k(y^i, y^j) - \frac{2}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} k(x^i, y^j)$$

S4232: the MK-MMD feature-difference loss is normalized by replacing the original features $x$ and $y$ with their normalized counterparts, giving the normalized MK-MMD feature-difference loss

$$\hat{L}_{MMD} = L_{MMD}\!\left(\frac{x}{\|x\|_2}, \frac{y}{\|y\|_2}\right)$$

through which MK-MMD is integrated into the weighted fusion loss function.
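The MK-MMD term of S423 to S4232 can be sketched with the biased empirical estimator. Assumptions: the bandwidth σ² is the mean squared pairwise distance over the pooled HR and LR features, the multi-kernel part is an equal-weight average of Gaussian kernels at the hypothetical scales 0.5σ², σ² and 2σ², and normalization means L2-normalizing each feature vector. All function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as is)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mk_mmd(xs, ys, scales=(0.5, 1.0, 2.0)):
    """Biased estimate of MMD^2 between HR features xs and LR features ys
    using an equal-weight mix of Gaussian kernels."""
    pooled = xs + ys
    pair_d2 = [sq_dist(a, b) for i, a in enumerate(pooled) for b in pooled[i + 1:]]
    sigma2 = sum(pair_d2) / len(pair_d2)  # mean squared pairwise distance
    if sigma2 == 0:
        return 0.0  # all points coincide, so the distributions match

    def kernel(a, b):
        d2 = sq_dist(a, b)
        return sum(math.exp(-d2 / (2 * s * sigma2)) for s in scales) / len(scales)

    def mean_kernel(A, B):
        return sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


def normalized_mk_mmd(xs, ys):
    """MK-MMD computed on L2-normalized features (one reading of S4232)."""
    return mk_mmd([l2_normalize(x) for x in xs], [l2_normalize(y) for y in ys])


hr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lr_shifted = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
lr_scaled = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(mk_mmd(hr, lr_shifted) > 0, normalized_mk_mmd(hr, lr_scaled))
```

The second printed value is zero up to rounding: after normalization, feature sets that differ only in scale are indistinguishable, which is one motivation for normalizing before measuring the HR/LR discrepancy.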

A dual-channel multi-branch feature extraction network comprises an HR channel network, an LR channel network and a weighted fusion loss function. The HR channel network extracts features from high-resolution face images; the LR channel network extracts features from low-resolution face images carrying the same subject identity information as the high-resolution face images; and the weighted fusion loss function transfers the feature representation capability of the high-recognition-rate HR channel network to the LR channel network to improve the classification accuracy of the LR channel network. The HR channel network and the LR channel network each comprise a backbone network and branch networks. The HR channel network comprises an HR backbone network, an HR first branch network, an HR second branch network, an HR third branch network and a feature screening module: the HR backbone network extracts global features of the face image; the HR first, second and third branch networks extract local features from face crops centred on the eyes, nose and mouth, respectively; and the feature screening module screens the deep face features representing the subject's identity information in the HR channel network. The architecture of the HR channel network is built from the Inception-v3 modules of GoogLeNet and comprises 2 convolution layers, 6 Inception modules, 3 max-pooling layers, 2 connection layers and 1 fully connected layer; the HR backbone network comprises an Inception4 module and an Inception5 module, and the HR first, second and third branch networks each comprise an Inception4 module. The LR channel network comprises an LR backbone network, an LR first branch network, an LR second branch network and an LR third branch network; the LR first, second and third branch networks extract local features centred on the eyes, nose and mouth, respectively. The LR channel network is built from convolution blocks and comprises 11 convolution layers, 2 max-pooling layers and 1 fully connected layer. The weighted fusion loss function comprises a soft loss function, a hard loss function and an MK-MMD loss function: the soft loss uses soft labels to transfer knowledge from the HR channel to the LR channel, the hard loss strengthens the classification performance of the LR channel network, and the MK-MMD loss reduces the difference between the datasets of the LR and HR channel networks.

Further, the Inception-v3 module comprises three consecutive Inception module groups followed by Auxiliary Logits, global average pooling and Softmax classification. Each module group contains several Inception modules that act as multiple convolution filters: convolution is applied to the same input several times, which deepens the network, adds convolutions and non-linearities, refines the features and improves network performance.

Further, the HR backbone network comprises a first convolution layer, a second convolution layer, a first max-pooling layer, an Inception3 module, a second max-pooling layer, a fourth Inception4 module, a third max-pooling layer and an Inception5 module connected in sequence. The second convolution layer and the Inception3 module are connected through the first connection layer to a first Inception4 module to form the HR first branch network, to a second Inception4 module to form the HR second branch network, and to a third Inception4 module to form the HR third branch network. The second and third Inception4 modules are connected to form a second connection layer, which is connected to the fully connected layer together with the HR backbone network and the HR first branch network;

The layer parameters of the HR channel network are set as follows: the first convolution layer has a 7×7 kernel, stride 2 and output size 12×12×64; the second convolution layer has a 3×3 kernel, stride 1 and output size 3×3×192; the first, second and third max-pooling layers have 2×2 kernels and output size 6×6×64.

Further, the LR backbone network comprises a third convolution layer, a fourth max-pooling layer, a fifth convolution layer, a sixth convolution layer, a fifth max-pooling layer and a seventh convolution layer connected in sequence. The fourth convolution layer is connected through a third connection layer to the eighth and ninth convolution layers in sequence to form the LR first branch network, to the tenth and eleventh convolution layers in sequence to form the LR second branch network, and to the twelfth and thirteenth convolution layers in sequence to form the LR third branch network. The LR backbone network and the LR first, second and third branch networks are joined at the fully connected layer;

The layer parameters of the LR channel network are set as follows: the third convolution layer has a 7×7 kernel, stride 2 and output size 12×12×64; the fourth convolution layer has a 3×3 kernel, stride 1 and output size 3×3×192; the fourth and fifth max-pooling layers have 2×2 kernels and output size 6×6×64; the fifth, eighth, tenth and twelfth convolution layers have 3×3 kernels, stride 1 and output size 3×3×256; the sixth, ninth, eleventh and thirteenth convolution layers have 3×3 kernels, stride 1 and output size 3×3×512; the seventh convolution layer has a 3×3 kernel, stride 1 and output size 3×3×832.
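The LR layer settings listed above can be written down as data to check two statements in the text: that the LR channel has 11 convolution layers, and how large the concatenated input to the fully connected layer is. The sketch assumes each path ends at the layer listed last and that the final 3×3 maps are flattened and concatenated; `LR_LAYERS`, `LR_PATHS` and the helpers are hypothetical names.

```python
from math import prod

# Layer settings transcribed from the text: (kernel size, stride, output H, W, C).
# Pooling strides are not given, so they are left as None.
LR_LAYERS = {
    "Conv3": (7, 2, (12, 12, 64)),
    "Conv4": (3, 1, (3, 3, 192)),
    "MaxPool4": (2, None, (6, 6, 64)),
    "MaxPool5": (2, None, (6, 6, 64)),
    **{name: (3, 1, (3, 3, 256)) for name in ("Conv5", "Conv8", "Conv10", "Conv12")},
    **{name: (3, 1, (3, 3, 512)) for name in ("Conv6", "Conv9", "Conv11", "Conv13")},
    "Conv7": (3, 1, (3, 3, 832)),
}

# Backbone and branch paths as described for the LR channel network.
LR_PATHS = {
    "backbone": ["Conv3", "MaxPool4", "Conv5", "Conv6", "MaxPool5", "Conv7"],
    "branch1_eyes": ["Conv4", "Conv8", "Conv9"],
    "branch2_nose": ["Conv4", "Conv10", "Conv11"],
    "branch3_mouth": ["Conv4", "Conv12", "Conv13"],
}


def conv_layers(layers):
    """Names of all convolution layers in the channel."""
    return sorted(name for name in layers if name.startswith("Conv"))


def fused_length(layers, paths):
    """Size of the concatenated vector fed to the fully connected layer,
    assuming the last layer of each path is flattened."""
    return sum(prod(layers[path[-1]][2]) for path in paths.values())


print(len(conv_layers(LR_LAYERS)), fused_length(LR_LAYERS, LR_PATHS))  # 11 21312
```

Under these assumptions the fused LR vector has 3·3·832 + 3·(3·3·512) = 21312 entries.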

The beneficial effects of the invention are as follows:

1. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation extracts fused local and global features from high- and low-resolution face images simultaneously through a dual-channel network, uses feature distillation to screen representative fused features from a complex face recognition model with a high recognition rate, and uses these features to assist the training of the constructed lightweight low-resolution channel network, so that the trained low-resolution channel network accurately recognizes low-resolution face test samples in complex scenes.

2. The disclosed model has a simple, compact structure yet retains the recognition accuracy of the complex high-performance face recognition model on low-resolution face images, has a marked advantage in running speed, and addresses both the low recognition rate of low-resolution models and their high consumption of computing resources.

3. The invention performs well on low-resolution face recognition, particularly in small-scale application scenarios.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Fig. 2 is a diagram of an HR channel network model according to an embodiment of the present invention.

Fig. 3 is a diagram of an LR channel network model of an embodiment of the present invention.

Fig. 4 is a view of a face feature sampling area according to an embodiment of the present invention.

FIG. 5 shows the feature distribution of an embodiment of the present invention trained with the softmax loss and center loss functions.

Detailed Description

The invention will be described in further detail with reference to the drawings and the detailed description.

The main implementation of the invention is as follows. First, a dual-channel multi-branch feature fusion network consisting of two three-branch networks is constructed. The HR channel of the network is trained on high-resolution face images to obtain high-resolution face features; the center loss and softmax loss functions are used to compute the distance between each face sample feature and its class center, the dataset is then screened according to the feature distribution, the extracted features are clustered and pruned, and feature vectors of the same class lying far from the class center are removed, giving a deep face feature set that represents the subject's identity information well. Next, the soft loss, hard loss and MK-MMD loss functions form a weighted fusion loss function, which is used to transfer the feature representation capability of the HR channel to the LR channel network. This improves the classification accuracy of the LR channel, so that the LR channel model can accurately recognize LR samples in complex scenes.

Referring to fig. 1, the low-resolution face recognition method based on the dual-channel multi-branch fusion feature distillation according to the embodiment of the invention comprises the following steps:

Step S1: a dual-channel multi-branch feature extraction network is constructed and divided into an HR channel and an LR channel. Each channel is divided into several branches: a backbone network extracts global features, each branch network extracts local features, and feature fusion finally yields deep fused face features. The network model is constructed through the following steps; the specific construction flow is shown in Figs. 2 and 3:

step S11: the construction of the HR channel convolutional neural network model comprises the following specific steps:

Step S111: labelled high-resolution face images are input into the HR channel network; face detection and facial landmark localization are performed on each input image, and it is divided into facial feature sampling regions as shown in Fig. 4.

Step S112: multi-region facial features are extracted by the backbone and branch networks. The backbone network extracts features of the whole face region and the branch networks extract features of local facial regions: the first branch network extracts eye-centred region features, the second branch network nose-centred region features, and the third branch network mouth-centred region features.

Step S113: to help the model better extract the semantic features of the face, the invention constructs a three-branch network that uses 2 convolution layers, 6 Inception modules, 3 max-pooling layers, 2 connection layers and 1 fully connected layer in total.

Table 1 lists the parameter settings of each layer in the HR channel model.

Table 1 Parameter settings of each layer in the HR channel network model

The specific parameter setting steps are as follows:

Step S1131: Conv1 has a 7×7 kernel, stride 2 and output size 12×12×64;

Step S1132: Conv2 has a 3×3 kernel, stride 1 and output size 3×3×192;

Step S1133: the max-pooling layers have a 2×2 kernel and output size 6×6×64;

Step S1134: the Inception-v3 module of GoogLeNet is used. An Inception module acts as multiple convolution filters, applying convolution to the same input several times and thus providing deeper convolution. The specific structure of the Inception-v3 module is as follows:

Step S1135: the Inception-v3 module consists of three consecutive Inception module groups followed by Auxiliary Logits, global average pooling and Softmax classification modules. Each module group consists of several Inception modules that add convolutions and non-linearities to the model, so that features with better semantic properties can be extracted and the classification performance of the network is improved.
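An Inception module runs several convolution paths (1×1, 3×3, 5×5 and a pooling projection) on the same input and concatenates their outputs along the channel axis, so the output depth is the sum of the path widths. In the sketch below the per-path widths are borrowed, as an assumption, from the inception(4e) and inception(5b) rows of the original GoogLeNet architecture, whose totals match the 832 and 1024 channels quoted in this document; the patent does not list its own widths, and the helper names are hypothetical.

```python
from itertools import chain

# Per-path output widths (1x1, 3x3, 5x5, pool projection). Borrowed from the
# inception(4e) and inception(5b) rows of the original GoogLeNet as an
# assumption; the patent does not list its own widths.
INCEPTION_4E = {"1x1": 256, "3x3": 320, "5x5": 128, "pool_proj": 128}
INCEPTION_5B = {"1x1": 384, "3x3": 384, "5x5": 128, "pool_proj": 128}


def output_channels(widths):
    """Depth of an Inception output: the path widths add up after concatenation."""
    return sum(widths.values())


def concat_channels(*maps):
    """Concatenate [H][W][C_i] feature maps (same H and W) along the channel axis."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[list(chain.from_iterable(m[r][c] for m in maps)) for c in range(width)]
            for r in range(height)]


print(output_channels(INCEPTION_4E), output_channels(INCEPTION_5B))  # 832 1024
```

Because every path preserves the spatial size, only the channel counts change, which is why the 3×3×832 branch maps and the 3×3×1024 backbone map can be concatenated directly in the fusion step.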

Step S1136: Conv1, Conv2, a max-pooling layer, the Inception3 module, a max-pooling layer, an Inception4 module, a max-pooling layer and the Inception5 module are connected in sequence. Specifically, Conv2 and Inception3 are joined into connection layer 1, to which an Inception4 module is connected as the first branch; a second Inception4 module is connected to the same connection layer as the second branch, and a third Inception4 module as the third branch. The Inception4 modules of the second and third branches are joined into a further connection layer, and finally the backbone and the branches are connected to the fully connected layer. The specific network structure is shown in Fig. 2.

Step S12: constructing an LR channel convolutional neural network model, which comprises the following specific steps:

step S121: face detection and feature point positioning are carried out on the LR face picture, face feature sampling areas are divided, and each sampling area is input into an LR channel network according to branch categories.

Step S122: to keep the LR channel lightweight, it consists of 11 convolution layers (convolution layer 3 and convolution layer 4 on the different branches are identical convolution blocks), 2 max-pooling layers and 1 fully connected layer; Table 2 lists the parameter settings of each layer in the LR channel model.

Table 2 Parameter settings of each layer in the LR channel network model

The specific parameter setting steps are as follows:

Step S1221: Conv3 has a 7×7 kernel, stride 2 and output size 12×12×64;

Step S1222: Conv4 has a 3×3 kernel, stride 1 and output size 3×3×192;

Step S1223: the max-pooling layers have a 2×2 kernel and output size 6×6×64;

Step S1224: Conv5, Conv8, Conv10 and Conv12 have a 3×3 kernel, stride 1 and output size 3×3×256;

Step S1225: Conv6, Conv9, Conv11 and Conv13 have a 3×3 kernel, stride 1 and output size 3×3×512;

Step S1226: Conv7 has a 3×3 kernel, stride 1 and output size 3×3×832;


Step S1227: Conv1, Conv2, a max-pooling layer, Conv3, Conv4, a max-pooling layer and Conv5 are connected in sequence; Conv3 and Conv4 are connected to convolution layer 2 as the first branch, Conv3 and Conv4 are connected from the connection layer as the second branch, and Conv3 and Conv4 are connected from the connection layer as the third branch; finally the backbone and the branches are connected to the fully connected layer. The specific network structure is shown in Fig. 3.

Step S13: the backbone network extracts global facial features of the face sample and the branch networks extract local facial region features; the output features of the backbone and branch networks are then fused by concatenation and connected to the fully connected layer, yielding the final fused multi-region facial features. The specific feature fusion steps are as follows:

Step S131: the backbone network extracts features from the global face region, and the max pooling layer in the Inception 5 module of the backbone network reduces the feature map to 3×3×1024.

Step S132: each branch network takes as input the features cropped from the outputs of Conv2 and the Inception 3 module of the backbone network, rather than computing features from scratch. The feature map cropped from the Conv2 output is halved by max pooling to 6×6×64 and then concatenated with the map cropped from the Inception 3 output; the result is the input to the branch network. As in the backbone network, a max pooling layer reduces the feature map of each branch network to 3×3×832.
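The halving and concatenation in Step S132 can be sketched with plain lists. This is an illustration, not the patent's code: the feature values are made up, and the 32-channel size of the Inception 3 crop is a hypothetical placeholder because the text does not give it.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def max_pool_2x2(fmap: FeatureMap) -> FeatureMap:
    """2x2 max pooling with stride 2 on each channel; halves height and width."""
    pooled = []
    for ch in fmap:
        rows, cols = len(ch), len(ch[0])
        pooled.append([
            [max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)
        ])
    return pooled


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis (spatial sizes must match)."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes differ")
    return a + b


# Hypothetical crops: 64 channels at 12x12 from Conv2, and 32 channels
# (channel count assumed) at 6x6 from the Inception 3 output.
conv2_crop = [[[float(12 * r + c) for c in range(12)] for r in range(12)] for _ in range(64)]
inc3_crop = [[[0.0] * 6 for _ in range(6)] for _ in range(32)]

pooled = max_pool_2x2(conv2_crop)
branch_input = concat_channels(pooled, inc3_crop)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 64 6 6
print(len(branch_input))  # 96
```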

Step S133: the output feature maps of the backbone and branch networks are fused by concatenation into a deep face fusion feature, whose dimension is reduced to 512 by a fully connected layer; this 512-dimensional feature vector is used as the final face representation.
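Steps S131 to S133 amount to "flatten, concatenate, project". A minimal sketch follows, assuming three branch networks (eyes, nose, mouth, as in claim 7). It computes the fused length implied by the stated sizes and then runs a toy-sized projection to 512 dimensions with random, untrained weights, purely to show the data flow.

```python
import random
from typing import List, Sequence


def fuse(backbone: Sequence[float], branches: Sequence[Sequence[float]]) -> List[float]:
    """Cascade fusion: concatenate the flattened backbone and branch features."""
    fused = list(backbone)
    for branch in branches:
        fused.extend(branch)
    return fused


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """y = W x + b, with W stored as one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# Fused length implied by the text: backbone 3x3x1024 plus three branches of 3x3x832.
fused_dim = 3 * 3 * 1024 + 3 * (3 * 3 * 832)
print(fused_dim)  # 31680

# Toy-sized demo (small made-up inputs, random weights) projecting to 512.
rng = random.Random(0)
toy_fused = fuse([1.0] * 8, [[0.5] * 4, [0.25] * 4, [0.0] * 4])
W = [[rng.gauss(0.0, 0.1) for _ in toy_fused] for _ in range(512)]
b = [0.0] * 512
face_repr = fully_connected(toy_fused, W, b)
print(len(toy_fused), len(face_repr))  # 20 512
```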

Step S2: the deep face fusion feature set extracted by the HR channel contains many kinds of features, and not all of them represent the face well. The invention therefore constructs a feature screening module that computes the distance between each face sample feature and its class center, classifies and clusters the features, and screens out the deep face features that best represent the identity of the subject. Because these features describe the face from several aspects, both global and local, they have better discriminative properties. The invention combines the softmax loss function and the center loss function to screen face features with good semantic characteristics: joint training increases the distance between features of different classes and reduces the distance between features of the same class. The softmax loss separates semantic features by label, while the center loss pulls features of the same class toward the center of that class; face features with good semantic characteristics are then screened out of the classified semantic sets. The screening method comprises the following steps:

Step S21: classification at the end of the HR channel uses a softmax loss function, which forces deep fusion features of different classes to remain separable. Its specific form is:

$$L_S = -\sum_{n=1}^{N} \log \frac{e^{W_{y_n}^{T} x_n + b_{y_n}}}{\sum_{m=1}^{M} e^{W_m^{T} x_n + b_m}}$$

where $x_n$ represents the nth feature vector, $y_n$ is the label of $x_n$, $W_m$ represents the mth column of the weight W in the last fully connected layer, b represents the bias, N is the batch size, and M is the number of classes in the training set.
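The softmax loss of Step S21 can be sketched directly from its symbols. This assumes the standard softmax cross-entropy summed over a batch; the log-sum-exp shift is an implementation detail added for numerical stability, not something the text specifies.

```python
import math
from typing import Sequence


def softmax_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                 W: Sequence[Sequence[float]], b: Sequence[float]) -> float:
    """Summed softmax (cross-entropy) loss over a batch.

    W holds the M weight vectors W_m (one per class) and b the M biases.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w * v for w, v in zip(W_m, x)) + b_m for W_m, b_m in zip(W, b)]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[y]  # -log softmax probability of the true class
    return total


# With all-zero weights every class is equally likely, so each of the
# N = 2 samples contributes log(M) with M = 4 classes.
W0 = [[0.0, 0.0] for _ in range(4)]
b0 = [0.0] * 4
print(softmax_loss([[1.0, 2.0], [3.0, 4.0]], [0, 3], W0, b0))  # about 2.7726
```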

Step S22: the center loss function computes the distance between face sample features and their class center, so that the face fusion features closest to the class center can be screened out effectively. It can be expressed as:

$$L_C = \frac{1}{2} \sum_{n=1}^{N} \left\| x_n - c_{y_n} \right\|_2^2$$

where $c_{y_n}$ denotes the center of class $y_n$, and $L_C$ measures the distance of the face sample features from their class centers.
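A minimal sketch of the center loss in Step S22, assuming the 1/2-scaled squared Euclidean form summed over the mini-batch; class centers are kept in a plain dictionary keyed by label (an illustrative choice).

```python
from typing import Dict, List, Sequence


def center_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                centers: Dict[int, List[float]]) -> float:
    """L_C = 1/2 * sum_n ||x_n - c_{y_n}||^2 over the mini-batch."""
    return 0.5 * sum(
        sum((v - c) ** 2 for v, c in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )


centers = {0: [2.0, 0.0], 1: [0.0, 5.0]}
print(center_loss([[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]], [0, 0, 1], centers))  # 1.0
```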

Step S221: the center $c_{y_n}$ is updated on the current mini-batch rather than on the entire training set. In each iteration, the change of each class center depends on the mean of the feature vectors of that class in the current batch. Consequently, not every center is updated in every iteration, because a batch does not always contain all classes. To update $c_{y_n}$, the gradient of $L_C$ with respect to $x_n$ must be computed. The update steps of $c_{y_n}$ are as follows:

The center loss function minimizes the intra-class distances of the features while keeping features of different classes separable.
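The update formula itself does not survive in this text. The sketch below assumes the update rule from the original center-loss formulation (Wen et al., 2016), which matches the behavior described in Step S221: each center moves toward the batch mean of its class, and classes absent from the batch are left unchanged. With the 1/2-scaled loss, the gradient is $x_n - c_{y_n}$. The center learning rate `alpha` and its value 0.5 are hypothetical.

```python
from typing import Dict, List, Sequence


def grad_center_loss(x: Sequence[float], center: Sequence[float]) -> List[float]:
    """dL_C/dx_n = x_n - c_{y_n} for the 1/2-scaled center loss."""
    return [v - c for v, c in zip(x, center)]


def update_centers(features: Sequence[Sequence[float]], labels: Sequence[int],
                   centers: Dict[int, List[float]],
                   alpha: float = 0.5) -> Dict[int, List[float]]:
    """One mini-batch center update; alpha (center learning rate) is hypothetical."""
    new_centers = {}
    for j, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == j]
        # delta_j = sum(c_j - x_n) / (1 + count); zero when class j is absent.
        delta = [sum(c_k - x[k] for x in members) / (1 + len(members))
                 for k, c_k in enumerate(c)]
        new_centers[j] = [c_k - alpha * d for c_k, d in zip(c, delta)]
    return new_centers


centers = {0: [0.0, 0.0], 1: [5.0, 5.0]}
updated = update_centers([[2.0, 0.0], [4.0, 0.0]], [0, 0], centers)
print(updated)  # {0: [1.0, 0.0], 1: [5.0, 5.0]}
```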

Step S222: the data set is screened according to the distribution of the features. Extracted features with the same label are grouped together. Within each class, a selection strategy that minimizes intra-class distance ("intra-class aggregation") is applied: feature vectors far from the class center vector are removed, yielding a deep face fusion feature set that represents the identity of the subject of the current class well.
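The screening rule in Step S222 is stated only qualitatively. A minimal sketch, assuming features are ranked by Euclidean distance to their class center and that a fixed fraction is kept per class; `keep_ratio` and its default of 0.5 are hypothetical, since the text gives no threshold.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def screen_features(features: Sequence[Sequence[float]], labels: Sequence[int],
                    centers: Dict[int, Sequence[float]],
                    keep_ratio: float = 0.5) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep, per class, the keep_ratio fraction of features nearest the class center."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    kept_x, kept_y = [], []
    for y, xs in by_class.items():
        center = centers[y]
        ranked = sorted(xs, key=lambda x: math.dist(x, center))
        n_keep = max(1, math.ceil(keep_ratio * len(xs)))  # keep at least one per class
        kept_x.extend(ranked[:n_keep])
        kept_y.extend([y] * n_keep)
    return kept_x, kept_y


feats = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0], [0.0, 5.0]]
labs = [0, 0, 0, 0, 1]
kept_x, kept_y = screen_features(feats, labs, {0: [1.0, 0.0], 1: [0.0, 5.0]})
print(kept_x, kept_y)  # the outlier [10.0, 0.0] is removed
```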

Step S3: to let the LR channel learn feature representation ability from the high-recognition-rate HR channel and thereby improve its classification accuracy, the invention uses a weighted fusion loss function composed of a soft loss, a hard loss and a multi-kernel maximum mean discrepancy (MK-MMD) loss. The soft loss transfers knowledge from the HR channel to the LR channel through soft labels; the hard loss lets the LR channel develop its own classification ability; and the MK-MMD loss, used as a loss of the LR channel, reduces the difference between the data sets. First, low-resolution face images are input to pretrain the LR channel. After pretraining, the high-resolution deep face feature set screened by the feature screening module is used together with the three combined loss functions to optimize the LR channel network.

Step S31: the soft loss transfers knowledge from the HR channel to the LR channel through soft labels. The soft labels of the HR channel and the LR channel are defined as:

$$\mathrm{softmax}\left(Z_{HR}/\tau\right) \quad \text{and} \quad \mathrm{softmax}\left(Z_{LR}/\tau\right)$$

where τ is the distillation temperature parameter, $Z_{HR}$ is the final output feature of the HR channel, $Z_{LR}$ is the final output feature of the LR channel, and softmax() is the softmax function.

Based on this definition, the invention defines the soft loss $L_{soft}$ as the cross entropy between the soft labels of the HR channel and the LR channel:

$$L_{soft} = -\sum_{m=1}^{M} \mathrm{softmax}\left(Z_{HR}/\tau\right)_m \log \mathrm{softmax}\left(Z_{LR}/\tau\right)_m$$
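A minimal sketch of Step S31, assuming the soft loss is exactly the cross entropy between the two temperature-softened distributions (no extra τ² scaling, which some distillation setups add). The logits and τ values are illustrative.

```python
import math
from typing import List, Sequence


def soft_label(logits: Sequence[float], tau: float) -> List[float]:
    """softmax(Z / tau): the temperature-softened class distribution."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_loss(z_hr: Sequence[float], z_lr: Sequence[float], tau: float) -> float:
    """Cross entropy -sum_m p_HR[m] * log p_LR[m] between the soft labels."""
    p_hr = soft_label(z_hr, tau)
    p_lr = soft_label(z_lr, tau)
    return -sum(p * math.log(q) for p, q in zip(p_hr, p_lr))


z = [2.0, 1.0, 0.0]
# A higher temperature flattens the distribution (smaller top probability).
print(max(soft_label(z, 1.0)), max(soft_label(z, 4.0)))
print(soft_loss(z, z, 4.0), soft_loss(z, [0.0, 1.0, 2.0], 4.0))
```

The loss is smallest when the LR soft labels match the HR soft labels, which is what drives the LR channel toward the HR channel's output distribution.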

Step S32: because the soft loss only transfers the soft labels of the HR channel, the invention introduces a hard loss function to strengthen the classification performance of the LR channel. Its specific form is:

$$L_{hard} = H(X_{LR}, y)$$

where $L_{hard}$ is the hard loss function, H() is the cross-entropy loss function, $X_{LR}$ is the non-softened class probability of the LR channel, and y is the ground-truth label.
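A minimal sketch of the hard loss, assuming $X_{LR}$ is the ordinary softmax of the LR logits (τ = 1) and y is a one-hot label given as a class index, so the cross entropy reduces to the negative log-probability of the true class.

```python
import math
from typing import Sequence


def hard_loss(z_lr: Sequence[float], label: int) -> float:
    """H(X_LR, y) with X_LR = softmax(Z_LR) at tau = 1 and a one-hot y.

    For a one-hot y the cross entropy reduces to -log X_LR[label].
    """
    peak = max(z_lr)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in z_lr))
    return log_norm - z_lr[label]


print(hard_loss([0.0, 0.0], 0))   # log(2), about 0.6931
print(hard_loss([10.0, 0.0], 0))  # close to 0: confident and correct
```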

Step S33: in this scheme the HR channel is trained on high-resolution images and the LR channel on low-resolution images, so the features output by the two channels differ considerably in distribution, which sharply reduces the classification accuracy obtained with a conventional classification loss function. The invention therefore uses MK-MMD (Multi-Kernel Maximum Mean Discrepancy) as the classification loss function to reduce the difference between the data sets. Its specific form is:

$$L_{MMD}(x, y) = \left\| \frac{1}{N}\sum_{i=1}^{N} \phi\left(x^i\right) - \frac{1}{N}\sum_{i=1}^{N} \phi\left(y^i\right) \right\|_{\mathcal{H}}^{2}$$

where N is the number of feature vectors; $x^i$ and $y^i$ are two samples drawn from the high-resolution and the low-resolution training data sets, respectively; and φ(·) is an explicit mapping function that projects samples into a high-dimensional space where their features are easier to classify. As a consequence, the form of the mapping function is not fixed across data sets with different distributions.
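To see why the choice of mapping matters, the sketch below computes the squared distance between mean embeddings for two hand-picked maps. Both maps are hypothetical illustrations: the identity map only compares means, while adding squared terms also exposes a difference in spread.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mmd_explicit(xs: Sequence[Vector], ys: Sequence[Vector],
                 phi: Callable[[Vector], List[float]]) -> float:
    """Squared distance between the mean embeddings of two sample sets under phi."""
    fx = [phi(x) for x in xs]
    fy = [phi(y) for y in ys]
    mean_x = [sum(col) / len(fx) for col in zip(*fx)]
    mean_y = [sum(col) / len(fy) for col in zip(*fy)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


def identity(v: Vector) -> List[float]:
    return list(v)


def second_order(v: Vector) -> List[float]:
    return list(v) + [t * t for t in v]  # append squared terms


hr = [[-1.0], [1.0]]  # same mean as lr, larger spread
lr = [[0.0], [0.0]]
print(mmd_explicit(hr, lr, identity))      # 0.0: the means match
print(mmd_explicit(hr, lr, second_order))  # 1.0: the spread difference shows up
```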

Step S331: to resolve the problem that the form of the mapping function is not fixed, the invention uses a Gaussian kernel function to project the samples of the LR and HR data sets, so that the above formula can be expanded as follows:

where the Gaussian kernel function projects sample vectors into a high-dimensional space, σ² is set to the mean of the squared pairwise distances, and m is the index of the mth feature vector.
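The kernel expansion itself is not reproduced in this text. The sketch below uses the standard biased estimate of squared MMD with Gaussian kernels and sets σ² to the mean squared pairwise distance, as described. Assumptions: the kernel form exp(−d²/(2σ²)), σ² computed over all pairs of the pooled samples, and the bandwidth multipliers (0.5, 1, 2) used to form several kernels; passing `(1.0,)` gives a single-bandwidth kernel.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mk_mmd(xs: Sequence[Vector], ys: Sequence[Vector],
           bandwidth_multipliers: Sequence[float] = (0.5, 1.0, 2.0)) -> float:
    """Biased squared-MMD estimate with a sum of Gaussian kernels.

    sigma^2 is the mean squared pairwise distance over the pooled samples;
    each kernel uses bandwidth m * sigma^2 for m in bandwidth_multipliers.
    """
    pooled = list(xs) + list(ys)
    pair_d2 = [sq_dist(a, b) for a, b in combinations(pooled, 2)]
    sigma2 = sum(pair_d2) / len(pair_d2)
    if sigma2 == 0.0:
        return 0.0  # all samples identical: no discrepancy

    def k(a: Vector, b: Vector) -> float:
        d2 = sq_dist(a, b)
        return sum(math.exp(-d2 / (2.0 * m * sigma2)) for m in bandwidth_multipliers)

    def mean_k(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


hr = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
lr_same = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
lr_shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mk_mmd(hr, lr_same))     # 0.0
print(mk_mmd(hr, lr_shifted))  # clearly positive
```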

Step S332: to integrate MK-MMD into the overall loss function, the invention normalizes the MK-MMD feature difference loss by replacing the original features x and y with their normalized counterparts. This yields the normalized MK-MMD feature difference loss, denoted as:

Step S34: to further improve classification accuracy, the three loss functions are fused by weighting, which integrates the MK-MMD distribution difference into the overall loss function and thereby improves the overall classification performance of the model:

where the three coefficients are the weights of the hard loss, the soft loss and the normalized MK-MMD feature loss, respectively.
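Step S34 combines the three terms as a weighted sum. A minimal sketch follows; the weight names, their default values and the example loss values are all hypothetical, since the text does not give the weight symbols or values.

```python
def weighted_fusion_loss(l_hard: float, l_soft: float, l_mmd_norm: float,
                         w_hard: float = 1.0, w_soft: float = 1.0,
                         w_mmd: float = 1.0) -> float:
    """Weighted sum of the hard, soft and normalized MK-MMD losses.

    Weight names and defaults are hypothetical placeholders.
    """
    if min(w_hard, w_soft, w_mmd) < 0:
        raise ValueError("loss weights should be non-negative")
    return w_hard * l_hard + w_soft * l_soft + w_mmd * l_mmd_norm


# Made-up loss values, only to show the arithmetic: 0.35 + 0.6 + 0.005.
total = weighted_fusion_loss(0.7, 1.2, 0.05, w_hard=0.5, w_soft=0.5, w_mmd=0.1)
print(round(total, 4))  # 0.955
```

In training, this scalar is what backpropagation (Step S35) minimizes with respect to the LR channel parameters.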

Step S35: the optimized LR channel network model is obtained by solving with the backpropagation algorithm; this model can be used for low-resolution face recognition in complex scenes.

The above embodiments merely illustrate the design concept and features of the present invention, so that those skilled in the art can understand and implement it; the scope of the present invention is not limited to these embodiments. Therefore, all equivalent changes or modifications made according to the principles and design ideas of the present invention fall within its scope.

Claims

1. A low-resolution face recognition method based on double-channel multi-branch fusion feature distillation, characterized by comprising the following steps:

S2: the output features of the backbone and branch networks of the HR channel network, and those of the backbone and branch networks of the LR channel network, are each fused by concatenation and connected to the fully connected layer of the respective channel network, to obtain the multi-region facial fusion feature used as the final face representation;

2. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation according to claim 1, wherein step S2 specifically comprises:

3. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation according to claim 1, wherein step S3 specifically comprises:

4. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation according to claim 3, wherein step S34 specifically comprises:

S342: computing the distance between face image features and their class center with a center loss function, pulling the features of each class toward the center of that class, screening out, according to the feature distribution, the face features with better semantic characteristics in the partitioned semantic sets, and eliminating feature vectors that lie far from the center of their class; let $c_{y_n}$ denote the center of class $y_n$, which is updated as the extracted deep features change, and let $x_n$ denote the nth feature vector; the center loss function $L_C$ is expressed as:

in each iteration, the change of each class center depends on the mean of the feature vectors of that class in the current batch;

therefore, the class center $c_{y_n}$ is updated using the gradient of $L_C$ with respect to $x_n$:

5. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation according to claim 1, wherein step S4 specifically comprises:

S421: let τ be the distillation temperature parameter, $Z_{HR}$ the final output feature of the HR channel, $Z_{LR}$ the final output feature of the LR channel, and softmax() the softmax function; the soft labels of the HR channel network are defined as:

the soft labels of the LR channel network are:

the soft loss function $L_{soft}$ is defined as the cross entropy between the soft labels of the HR channel network and those of the LR channel network:

$L_{hard} = H(X_{LR}, y)$;

S423: let N be the number of feature vectors, $x^i$ a sample from the high-resolution training data set and $y^i$ a sample from the low-resolution training data set, and let an explicit mapping function project the samples into a high-dimensional space where the sample features are easier to classify, the form of the mapping function not being fixed across data sets with different distributions; to reduce the difference between the data sets of the HR and LR channel networks, the multi-kernel maximum mean discrepancy loss function MK-MMD is used as the classification loss function $L_{MMD}(x, y)$:

6. The low-resolution face recognition method based on double-channel multi-branch fusion feature distillation according to claim 5, wherein step S423 specifically comprises:

S4231: let a Gaussian kernel function project sample vectors into a high-dimensional space, let σ² be the mean of the squared pairwise distances, and let m index the mth feature vector; the Gaussian kernel function is then used to project the samples in the data sets of the LR and HR channel networks, and the classification loss function $L_{MMD}(x, y)$ is expanded as:

S4232: the MK-MMD feature difference loss is normalized by replacing the original features x and y with their normalized counterparts, giving the normalized MK-MMD feature difference loss, through which the multi-kernel maximum mean discrepancy loss function MK-MMD is integrated into the weighted fusion loss function.

7. A double-channel multi-branch feature extraction network, characterized by comprising an HR channel network, an LR channel network and a weighted fusion loss function; the HR channel network extracts features of high-resolution face images; the LR channel network extracts features of low-resolution face images carrying the same subject identity information as the high-resolution face images; the weighted fusion loss function transfers the feature representation ability of the high-recognition-rate HR channel network to the LR channel network to improve the classification accuracy of the LR channel network;

the HR channel network and the LR channel network each comprise a backbone network and branch networks;

the HR channel network comprises an HR backbone network, an HR first branch network, an HR second branch network, an HR third branch network and a feature screening module; the HR backbone network extracts global features of the face image; the HR first, second and third branch networks extract local features of face crops centered on the eyes, nose and mouth, respectively; the feature screening module screens the deep face features that represent the identity information of the subject in the HR channel network;

the HR channel network is built from the Inception-v3 modules of GoogLeNet and comprises 2 convolution layers, 6 Inception modules, 3 max pooling layers, 2 connection layers and a fully connected layer; the HR backbone network comprises an Inception 4 module and an Inception 5 module; the HR first, second and third branch networks each comprise an Inception 4 module;

the LR channel network comprises an LR backbone network, an LR first branch network, an LR second branch network and an LR third branch network; the LR first, second and third branch networks extract local features centered on the eyes, nose and mouth, respectively;

the LR channel network is built from convolution blocks and comprises 11 convolution layers, 2 max pooling layers and 1 fully connected layer;

the weighted fusion loss function comprises a soft loss function, a hard loss function and the multi-kernel maximum mean discrepancy loss function MK-MMD; the soft loss function transfers knowledge from the HR channel to the LR channel through soft labels; the hard loss function strengthens the classification performance of the LR channel network; the MK-MMD loss function reduces the difference between the data sets of the LR and HR channel networks.

8. The double-channel multi-branch feature extraction network according to claim 7, wherein:

Inception-v3 comprises three consecutive groups of Inception modules, auxiliary logits (Auxiliary Logits), global average pooling and Softmax classification; each group contains several Inception modules, which act as multiple convolution filters applied repeatedly to the same input, giving deeper convolution, adding convolutions and nonlinear transformations, refining the features and improving network performance.

9. The double-channel multi-branch feature extraction network according to claim 7, wherein:

the HR backbone network comprises a first convolution layer, a second convolution layer, a first max pooling layer, an Inception 3 module, a second max pooling layer, a fourth Inception 4 module, a third max pooling layer and an Inception 5 module connected in sequence;

the second convolution layer and the Inception 3 module are connected to a first Inception 4 module through the first connection layer, forming the HR first branch network;

the second convolution layer and the Inception 3 module are connected to a second Inception 4 module through the first connection layer, forming the HR second branch network;

the second convolution layer and the Inception 3 module are connected to a third Inception 4 module through the first connection layer, forming the HR third branch network;

the second Inception 4 module and the third Inception 4 module are connected to form a second connection layer;

the second connection layer is connected to the fully connected layer, where it is combined with the HR backbone network and the HR first branch network;

the parameters of each layer of the HR channel network are set as follows:

the first convolution layer has a 7×7 kernel, a stride of 2 and an output size of 12×12×64;

the second convolution layer has a 3×3 kernel, a stride of 1 and an output size of 3×3×192;

the first, second and third max pooling layers have a 2×2 pooling window and an output size of 6×6×64.

10. The double-channel multi-branch feature extraction network according to claim 7, wherein:

the LR backbone network comprises a third convolution layer, a fourth max pooling layer, a fifth convolution layer, a sixth convolution layer, a fifth max pooling layer and a seventh convolution layer connected in sequence;

the fourth convolution layer is connected in sequence to an eighth and a ninth convolution layer through a third connection layer, forming the LR first branch network;

the fourth convolution layer is connected in sequence to a tenth and an eleventh convolution layer through the third connection layer, forming the LR second branch network;

the fourth convolution layer is connected in sequence to a twelfth and a thirteenth convolution layer through the third connection layer, forming the LR third branch network;

the LR backbone network and the LR first, second and third branch networks are joined at the fully connected layer;

the parameters of each layer of the LR channel network are set as follows:

the third convolution layer has a 7×7 kernel, a stride of 2 and an output size of 12×12×64;

the fourth convolution layer has a 3×3 kernel, a stride of 1 and an output size of 3×3×192;

the fourth and fifth max pooling layers have a 2×2 pooling window and an output size of 6×6×64;

the fifth, eighth, tenth and twelfth convolution layers have a 3×3 kernel, a stride of 1 and an output size of 3×3×256;

the sixth, ninth, eleventh and thirteenth convolution layers have a 3×3 kernel, a stride of 1 and an output size of 3×3×512;

the seventh convolution layer has a 3×3 kernel, a stride of 1 and an output size of 3×3×832.
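As a consistency check, the per-layer parameters of claim 10 can be tabulated and counted against the totals stated in claim 7 (11 convolution layers, 2 max pooling layers, 1 fully connected layer). The dictionary layout is an illustrative choice; pooling strides and the size of the fully connected layer are not stated, so they are recorded as `None`.

```python
from collections import Counter

# LR channel layers as listed in claim 10: (type, kernel, stride, output HxWxC).
LR_LAYERS = {
    "conv3": ("conv", 7, 2, (12, 12, 64)),
    "conv4": ("conv", 3, 1, (3, 3, 192)),
    "pool4": ("maxpool", 2, None, (6, 6, 64)),
    "pool5": ("maxpool", 2, None, (6, 6, 64)),
    **{f"conv{i}": ("conv", 3, 1, (3, 3, 256)) for i in (5, 8, 10, 12)},
    **{f"conv{i}": ("conv", 3, 1, (3, 3, 512)) for i in (6, 9, 11, 13)},
    "conv7": ("conv", 3, 1, (3, 3, 832)),
    "fc": ("fc", None, None, None),
}

counts = Counter(layer_type for layer_type, *_ in LR_LAYERS.values())
print(counts["conv"], counts["maxpool"], counts["fc"])  # 11 2 1
```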