CN114519897B

CN114519897B - Human face living body detection method based on color space fusion and cyclic neural network

Info

Publication number: CN114519897B
Application number: CN202111663546.1A
Authority: CN
Inventors: 钱鹰; 张蓝; 刘歆; 陈仕杰
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2024-09-24
Anticipated expiration: 2041-12-31
Also published as: CN114519897A

Abstract

The invention discloses a face living body detection method based on color space fusion and a cyclic neural network, and relates to the technical field of living body detection. The invention includes fusing the new color space; constructing a face living body detection LSTM network; the color features of the fake face attack video of the public data set are input into the constructed LSTM for training; and utilizing the newly fused color space and the trained network model for human face living body detection. The human face living body detection algorithm provided by the invention can directly detect the human face living body of the content captured by the camera, can realize accurate detection under two-dimensional fake human face attack and fine-done three-dimensional fake human face attack, and solves the problem of low human face living body detection stability under multi-dimensional and cross-dataset fake human face attack.

Description

Human face living body detection method based on color space fusion and cyclic neural network

Technical Field

The invention belongs to the field of biological authentication anti-counterfeiting, and particularly relates to a human face living body detection method based on color space fusion and a cyclic neural network.

Background

With the advent of the artificial intelligence era, people began to use their own biological characteristics as identity marks, so that the identity recognition became more convenient and safer. In biological recognition, the face recognition has a large specific gravity in the biological recognition due to the advantages of low cost and no need of equipment management, and along with the development of face detection and recognition technology, the face recognition is widely applied to the daily life fields such as face payment, access control systems, attendance checking systems and the like.

The fake face attack is an attack aiming at the face recognition system, and the fake face attack attempts to enable the face recognition system to authenticate an illegal user as a legal user by presenting a fake version of the face of the legal user to a camera, so that the illegal user obtains the trust of the face recognition system.

In general, fake face attacks can be classified into 2 categories: two-dimensional fake face attacks and three-dimensional fake face attacks, wherein the two-dimensional fake face attacks comprise photo attacks and video attacks, the photo attacks refer to the trust of a face recognition system obtained by an illegal user in a mode that a photo or a picture of a legal user is printed or displayed on electronic equipment, and the video attacks refer to the trust of the face recognition system obtained by the illegal user by utilizing a video containing the face information of the legal user. The three-dimensional fake face attack mainly refers to that an illegal user utilizes various materials (such as silica gel and latex) to manufacture a 3D face mask of a legal user, and the trust of a face recognition system is obtained by wearing the 3D face mask.

In recent years, despite a great deal of research on human face living body detection, there is still room for improvement in the existing human face living body detection algorithm in terms of cross-dataset, multi-dimensional fake human face attacks. The existing human face living detection algorithm has methods such as LBP, LBP-TOP, markov and the like based on manual characteristics, and mainly utilizes the designed texture characteristics to distinguish real human faces from fake human faces; there are detection methods based on deep learning, such as: CNN, CNN+LSTM, DEEP TREE NET, etc., to learn the difference between the real face and the fake face by using the neural network, so as to judge the fake face attack; there are methods based on vital information (e.g., rpg method) that use vital information specific to a real face to distinguish a real face from a fake face attack. The method has higher detection accuracy when independently detecting two-dimensional or three-dimensional fake face attacks, but has lower detection accuracy in multi-dimensional and cross-dataset tests.

In the human face living body detection, the detection of two-dimensional and three-dimensional fake human face attacks is included, and the detection method using the mixed characteristics (firstly detecting the two-dimensional fake human face attacks and then detecting three fake human face attacks) can obtain higher detection accuracy on the multi-dimensional fake human face attacks, but the time is longer, so that the method is not beneficial to practical use. The deep learning method requires multiple data for training, and the generalization capability of the model tends to be reduced in the learning process. Therefore, the problem of multi-dimensional fake face attack across data sets is a current hot spot in the field of face living body detection.

The application publication number CN105354554A is a human face living body detection method based on color and singular value characteristics, and mainly solves the problems of complex calculation and low recognition rate of the existing human face authenticity recognition technology. The implementation steps are as follows: 1) Marking positive and negative samples of a face database, and dividing the positive and negative samples into a training set and a testing set; 2) Partitioning the training set face image, and extracting color features and singular value features of small blocks of the training set in batches; 3) Normalizing the extracted feature vectors, and sending the normalized feature vectors to a support vector machine classifier for training to obtain a training model; 4) And extracting the characteristics of the test set data, and predicting the characteristics of the test set data by using a training model to obtain a classification result. The invention improves the classification efficiency, obtains higher classification effect, and can be used for detecting the authenticity of the face in social networks or real life. According to the method, the extracted color features are trained and predicted through the support vector machine, so that mask fake face attacks with fine work can be hardly detected successfully, and the mask fake face has feature information very similar to that of a real face. The invention combines the specific pulse characteristics (rPPG signals) of the real human face to carry out human face living body detection, and has higher detection accuracy on three-dimensional fake human face attack.

CN111881815A is a human face living body detection method for multi-model feature migration, and by constructing and fusing heterogeneous data sets, living body training is carried out by adopting the multi-model feature migration method under various color spaces, and the accuracy and generalization capability of a living body detection model are improved. In the training stage, fusing visible light images on an open source or private data set, performing face detection, alignment and cutting, and simultaneously training an RGB model and a YUV model respectively until the models converge; in the prediction stage, the acquired visible light images are respectively input into a trained RGB model and a trained YUV model, the results of the two models are respectively obtained, a final score is obtained through a model score fusion strategy, and finally, a living body detection result is judged according to the score. The method has good generalization performance and high precision, and is suitable for industrial deployment and use. The method carries out the living detection of the human face by a multi-model characteristic migration method under a plurality of color spaces, can well avoid the influence of illumination on the detection result, but has the problem of difficult detection for mask type fake human face attacks with fine work and fake human face attacks across data sets. The invention can avoid the influence of external illumination on the detection result by color space fusion, combines the specific pulse characteristics (rPPG signals) of the real human face to carry out human face living detection, and has higher detection accuracy in multi-dimensional and cross-dataset fake human face attack.

Disclosure of Invention

The present invention is directed to solving the above problems of the prior art. The method and the device have the advantages that the original video is directly used for human face living body detection, the image in the video does not need to be preprocessed, and the method and the device accord with actual application scenes. Meanwhile, the human face living body detection method based on the color space fusion and the cyclic neural network has stable and higher detection precision under the cross-data set and multi-dimensional fake human face attack. The technical scheme of the invention is as follows:

A human face living body detection method based on color space fusion and a circulating neural network comprises the following steps:

Carrying out face detection on each frame of image in the original video, and dividing a face area and a background area in the image;

constructing a new color space by utilizing the correlation between the face region and the background region rPPG signal;

Face detection is carried out on each frame of input image, and a face area and a background area in the image are segmented; converting the segmented image into HSV and YCbCr color spaces from RGB color spaces, segmenting 9 color channels, and performing Fourier transformation on each color channel to obtain rPPG signals of face areas and background areas of the color channels;

By utilizing the principle that the correlation between a real face region and a background region rPPG is smaller and the correlation between a fake face region is larger, three color channels are selected to construct a new color space.

Constructing an LSTM network for human face living body detection;

inputting the color characteristics of the video in the public data set into a constructed LSTM network for training;

and performing face living body detection by using a face living body detection model trained by the new color space and the LSTM network.

Further, the step of dividing the face region image and the background region image includes the steps of:

detecting each frame of image in the original video using frontal _face_detector in dlib library as face detector;

Positioning and labeling the face position of the person in the image by using a shape_predictor_68_face_landmarks.dat face feature extraction extractor;

And dividing the face area and the background area according to the marked position information, and dividing the face area and the background area.

Further, the building of the new color space comprises the steps of:

Capturing each frame of image in an original video, carrying out face detection on the image, and dividing the image into a face part and a background part;

Performing color space conversion on the face and background area images, and converting the face and background area images into HSV and YCbCr color spaces;

dividing RGB, HSV, YCbCr three color spaces to obtain 9 color channels;

Acquiring color characteristics of each color channel of a segmentation area of each frame of a current video to form 9 face color characteristic lists and 9 background color characteristic lists;

performing Fourier transformation on the face color feature list and the background area color feature list to obtain an rPPG signal;

calculating and recording a correlation coefficient of a human face region and a background region rPPG signal;

For the average correlation coefficient value of all channels of all videos, for a real face video, arranging color channels according to the ascending order of the correlation coefficient, and for a fake face attack video, arranging the color channels according to the descending order, and selecting three color channels with the highest front-to-back contact ratio as a new color space.

Further, the calculating the correlation coefficient of the rpg signal of the face region and the background region specifically includes:

{R₁,R₂,R₃}＝f_min({m1，m2，m3.......m9})

{F₁,F₂,F₃}＝f_max({m1，m2，m3.......m9})

Wherein m _j is the correlation coefficient of the rPPG signal generated by the face of the same channel and the same video of the background, C _i is the average correlation coefficient f _max generated by n videos, f _min is the maximum 3 m values of 9 color channels, and f _min is the minimum 3 m values of 9 color channels.

Further, the construction of the LSTM network for human face living body detection comprises the following steps: extracting the color characteristics of the face area and the original area of each frame of the original video by using a new color space to form a Feature Map; introducing Feature Map into LSTM network, using LSTM layer with 100 hidden neurons, full connection layer and FFT layer, LSTM is used to estimate frame input sequence with N _f Wherein I _j represents the color characteristics of each frame, and the FFT layer converts the response of the full-connection layer into the fourier domain to obtain an rpg signal;

The method comprises the steps of accessing a Fourier transform layer after a full connection layer of an LSTM network, and further performing Fourier transform on a difference sequence output by the network by using the Fourier transform layer, so as to obtain frequency domain information;

And combining the correlation between the LSTM prediction result and the frequency information of each region of the face and the background region, and outputting the result.

Further, the step of inputting the color characteristics of the video in the public dataset into the constructed LSTM network for training comprises the following steps:

Aiming at a real face video in the public dataset, a traditional rPPG method is used for obtaining an rPPG signal of the real face video, the rPPG signal is used as ground truth in the network training process, and for a fake face attack video, the rPPG signal is set to be 0;

Extracting color features from the video in the public dataset by using the constructed new color space, and inputting the color feature sequence and the set ground truth into an LSTM network for training;

and inserting a Fourier transform layer after the full connection layer of the LSTM network, converting the color characteristic change sequence into an rPPG signal, and then classifying and predicting.

The invention has the advantages and beneficial effects as follows:

The method based on color space fusion and the cyclic neural network provided by the invention can ensure that stable and higher detection precision can be realized under the attack of multi-dimensional fake faces on the living body detection task of the faces; according to the related color space fusion method, through the principle that the correlation between the rPPG signals extracted by each region of the human face and the pulse signals extracted by the background region is low, and the correlation between the pulse signals extracted by each partial region of the human face and the rPPG signals of the background region is high, each frame of image is divided into a human face region and a background region, the correlation between each color channel of the human face region and each color channel of the background region is calculated, three color channels which can most represent the color characteristic change of the human face are obtained through statistics, and the three color channels are combined into a new color space, so that the influence of external environment noise on the human face living body detection process is reduced, and the human face living body detection precision is improved. Compared with the existing method, the method can combine the rPPG method and the cyclic neural network, and the color characteristic change of each frame of the video is captured to the greatest extent by constructing a new color space, so that the illumination robustness is enhanced, and compared with other methods, the accuracy and the practicability are improved well.

Drawings

Fig. 1 is a flowchart of a face in-vivo detection method based on color space fusion and recurrent neural network according to a preferred embodiment of the present invention.

Fig. 2 is a flow chart of face region and background region segmentation in an embodiment of the present invention.

Fig. 3 is a flowchart of color space fusion in an embodiment of the invention.

Fig. 4 is a diagram of a part of a recurrent neural network according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

In order to realize face living body detection, the face living body detection method based on the color space fusion and the cyclic neural network provided by the embodiment of the invention comprises three stages of face region and background region segmentation, color space fusion and LSTM network prediction rPPG signal introduction, and comprises the following steps:

face region and background region segmentation stage

S1: face detection is carried out on each frame of image in the original video, and a face area and a background area in the image are segmented

Color space fusion phase

S2: constructing a new color space by utilizing the correlation between the face region and the background region rPPG signal;

rPPG signal phase prediction by introducing cyclic neural network

S3: introducing an LSTM network for human face living body detection;

s4: inputting the color characteristics of the video in the public data set into a constructed cyclic neural network for training;

s5: and using the new color space and the LSTM network to train a human face living body detection model for human face living body detection.

In the step S1, the step of dividing the face region image and the background region image includes the following steps (as shown in fig. 2):

S11: detecting each frame of image in the original video using frontal _face_detector in dlib library as face detector;

S12: positioning and labeling the face position of the person in the image by using a shape_predictor_68_face_landmarks.dat face feature extraction extractor;

s13: and dividing the face area and the background area according to the marked position information, and dividing the face area and the background area. Mainly comprises the following steps:

A1: finding out the position midpoint of the face in the vertical direction;

Further, according to the identification information of 68 feature points, the position information 28 representing the midpoint of the eyes is found, the x1 variable name is taken to represent, the position information 34 representing the midpoint of the nose is found, the x2 variable name is taken to represent, the position information 9 representing the midpoint of the chin is found, and the x3 variable name is taken to represent;

further, the value of (x1+x3)/2 is stored by the x_mid_up variable through the position information of x1, x2 and x3, the position information of the upper part in the human face is represented, and the value of (x2+x3)/2 is stored by the x_mid_down variable, the position information of the lower part in the human face is represented.

A2: finding out the position midpoint of the human face in the horizontal direction;

further, according to the identification information of 68 feature points, position information 3 representing the middle contour point on the left side of the face is found, the y1 variable name is taken to represent the position information 30 representing the middle upper point in the vertical direction of the face is found, the y2 variable name is taken to represent the position information 13 representing the middle lower position contour point on the right side of the face is found, and the y3 variable name is taken to represent the position information;

further, the position information of the face part of the person with the nose far left is represented by the position information of y1, y2 and y3, the value of (y1+y2)/2 is stored by using a y_mid_left variable, and the position information of the face part with the nose far right is represented by using a value of (y2+y3)/2 stored by using an x_mid_right variable.

A3, dividing the face area according to the values of the following x1, x2, x3, y1, y2 and y 3;

A4, finding out the position information of the background area around the face;

Further, according to the position information of y_mid_left and y1, creating a distance variable to store the value of y_mid_left-y1, in order to avoid the appearance of a human face, let y0=y1-distance×3, y4=y3+distance×3 and reset the values of y1, y3 for the purpose of zooming out, let y1=y1-distance×2, y3=y3+distance×2;

a5, segmenting a background area;

further, judging the size of y0, when y0<0, cutting the left background area of the human face into img [ x1: x3,0: y1], and when y0>0, cutting the left background area of the human face into img [ x1: x3, y0: y1];

Further, judging the size of y4, when y4>640, cutting the background area on the right side of the human face into img [ x1: x3, y3:640], and when y4<640, cutting the background area on the right side of the human face into img [ x1: x3, y3: y4];

in S2, the color space fusion includes the following steps (as shown in fig. 3):

S21: according to the principle that the correlation between the rPPG signals extracted from each region of the face and the rPPG signals extracted from the background region is low, and the correlation between the rPPG signals extracted from each partial region of the forged face and the rPPG signals of the background region is high, performing color space conversion on the segmented face region and the background region, and converting RGB color space into HSV and YCbCr color space;

s22: dividing RGB, HSV, YCbCr color spaces of the face area and the background area image to obtain R, G, B, H, S, V, Y, cb, cr color channels of the face area and the background area;

s23: acquiring color characteristics of each color channel of a segmentation area of each frame of a current video to form 9 face color characteristic lists and 9 background color characteristic lists;

S24: performing Fourier transformation on the face color feature list and the background area color feature list to obtain an rPPG signal;

s25: calculating and recording the correlation coefficient of the rPPG signals of the face area and the background area, and utilizing a formula Obtaining average correlation coefficient values of all channels of all videos, wherein M _j is the correlation coefficient of rPPG signals generated by the same channel face and the same background video, and C _i is the average correlation coefficient generated by n videos;

S26: obtaining 3 color channels to be recorded through a formula {R₁,R₂,R₃}＝f_min({x1,x2,x3.......x9}){F₁,F₂,F₃}＝f_max({x1,x2,x3.......x9}), wherein f _max is the color channels which are arranged in a descending order according to the correlation coefficient when the original video is a fake face attack video, the 3 color channels with the largest correlation coefficient between a face area and a background area are taken for recording, f _min is the color channels which are arranged in an ascending order according to the correlation coefficient when the original video is a real face video, the 3 color channels with the smallest correlation coefficient between the face area and the background area are taken for recording, and three color channels with the highest front-to-back coincidence degree are selected to be used as a new color space;

in S3, the construction of the recurrent neural network for human face living body detection includes the following steps (as shown in fig. 3):

s31: extracting color features of a face region and an original region in each frame of image of an original video by using the new color space combined in the S2 to form a Feature Map;

S32: the Feature Map was imported into the LSTM network using LSTM layers with 100 hidden neurons. The purpose of LSTM is to estimate an input sequence with N _f frames Is a rpg signal f;

S33: the full-connection layer is accessed after the LSTM layer, the Fourier transform layer is accessed after the full-connection layer, and then the Fourier transform is carried out on the difference sequence output by the network by utilizing the layer, so that the frequency domain information is obtained;

s34: and combining the correlation between the LSTM prediction result and the frequency information of each region of the face and the background region, and outputting the result.

In the step S4, the color characteristics of the video in the public data set are input into the constructed cyclic neural network for training, and the method comprises the following steps:

S41: aiming at the real face video in the public dataset, a traditional rPPG method is used for obtaining an rPPG signal of the real face video, and the rPPG signal is used as ground truth in the network training process. For fake face attack video, the rPPG signal of the fake face attack video is set to be 0;

S42: extracting color features from video in the public dataset by using the color space constructed by S2, inputting the color feature sequence and the set ground truth into a recurrent neural network for training, wherein the objective function is set as follows Wherein θ _R is RNN parameter, F _j is face region feature map, N _s is number of frame sequences, F _i denotes ground truth of the i-th frame;

S43: and inserting a Fourier transform layer after the full connection layer of the LSTM network, converting the color characteristic change sequence into an rPPG signal, and then classifying and predicting.

Compared with the existing method, the detection result obtained by the color space fusion and cyclic neural network based detection method has stable ratio and better accuracy.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. The human face living body detection method based on the color space fusion and the cyclic neural network is characterized by comprising the following steps of:

Constructing a new color space by utilizing the correlation between the face region and the background region rPPG signal; face detection is carried out on each frame of input image, and a face area and a background area in the image are segmented;

Converting the segmented image into HSV and YCbCr color spaces from RGB color spaces, segmenting 9 color channels, and performing Fourier transformation on each color channel to obtain rPPG signals of face areas and background areas of the color channels;

By utilizing the principle that the correlation between a real face area and a background area rPPG is smaller and the correlation between a fake face area is larger, three color channels are selected to construct a new color space;

Constructing an LSTM network for human face living body detection;

Performing human face living body detection by using a human face living body detection model trained by a new color space and an LSTM network;

the step of dividing the face region image and the background region image comprises the following steps:

Dividing a face area and a background area according to the marked position information, and dividing the face area and the background area;

the building of the new color space comprises the steps of:

dividing RGB, HSV, YCbCr three color spaces to obtain 9 color channels;

for the average correlation coefficient value of all channels of all videos, arranging color channels according to the ascending order of the correlation coefficient for a real face video, arranging color channels according to the descending order of a fake face attack video, and selecting three color channels with the highest front-to-back contact ratio as a new color space;

the calculating the correlation coefficient of the rPPG signal of the face region and the background region specifically comprises the following steps:

{R₁,R₂,R₃}＝f_min({m1,m2,m3……m9})

{F₁,F₂,F₃}＝f_max({m1,m2,m3……m9})

Wherein m _j is the correlation coefficient of rPPG signals generated by the same channel face and the same background video, C _i is the average correlation coefficient f _max generated by n videos, f _min is the 3 m values with the largest 9 color channels, and f _min is the 3 m values with the smallest 9 color channels;

The construction of the LSTM network for human face living body detection comprises the following steps: extracting the color characteristics of the face area and the original area of each frame of the original video by using a new color space to form a Feature Map; introducing Feature Map into LSTM network, using LSTM layer with 100 hidden neurons, full connection layer and FFT layer, LSTM is used to estimate frame input sequence with N _f The FFT layer converts the response of the full connection layer into a Fourier domain to obtain an rPPG signal;

Combining the correlation of the LSTM prediction result and the frequency information of each region of the face and the background region, and outputting a result;

The method for inputting the color characteristics of the video in the public dataset into the constructed LSTM network for training comprises the following steps: