
CN114821786A - Gait recognition method based on human body contour and key point feature fusion - Google Patents

Gait recognition method based on human body contour and key point feature fusion Download PDF

Info

Publication number
CN114821786A
CN114821786A
Authority
CN
China
Prior art keywords
gait
human body
key point
contour
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210452885.3A
Other languages
Chinese (zh)
Inventor
陈志�
周晨
岳文静
艾虎
王悦
何丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210452885.3A priority Critical patent/CN114821786A/en
Publication of CN114821786A publication Critical patent/CN114821786A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gait recognition method based on the fusion of human body contour and key point features, which comprises the following steps: inputting a gait video of a single person walking and obtaining the pedestrian contour sequence in the video; substituting the gait video into an OpenPose algorithm module to obtain a normalized human body key point information sequence, and substituting the pedestrian contour sequence into a GaitSet algorithm module to obtain the features of the gait contour sequence; passing the key point information sequence through a human body key point feature extraction module composed of an LSTM and a CNN; obtaining a gait contour feature vector and a human body key point feature vector respectively; concatenating the gait contour feature vector with the human body key point feature vector and inputting the result into a feature fusion module; and importing the gait fusion features into a fusion network for feature learning to identify the person in the video. The invention extracts features from the contour sequence and the key point sequence with their respective feature extraction modules and then performs feature-layer fusion to obtain gait fusion features, improving the accuracy and robustness of gait recognition.

Description

Gait recognition method based on human body contour and key point feature fusion
Technical Field
The invention belongs to the interdisciplinary field of computer vision, identity recognition and feature fusion, and particularly relates to a gait recognition method based on the fusion of human body contour and key point features.
Background
Gait is one of the human biological and behavioral characteristics; it describes the pattern of motion of the joints of the upper and lower limbs as a person walks, and medicine considers each person's gait to be unique. Human gait features are globally unique, stable over the long term, hard to revoke, easy to collect, hard to mold or disguise, and can be captured without contact, making them among the biometric features currently best suited for wide deployment across multiple fields. Gait recognition extracts physical and behavioral features from an individual's walking pattern for identification. In the high-security field, because gait features are difficult to disguise and imitate, they can complement iris features in hybrid biometric recognition to improve security, bringing stronger guarantees to fields such as finance and the military.
Although each individual's gait is unique, factors such as clothing, carried items and viewing angle pose significant challenges to gait recognition, and various approaches have been proposed to address them. Early work distinguished people by differences in the motion of individual body parts during walking, but this meant modeling numerous body structures, involving a large number of variables and complex computation. Recognition directly from raw RGB images is another research direction, but it faces the challenge of eliminating gait-irrelevant information. With the rapid development of deep learning, two gait recognition approaches, one based on the human body contour and one based on human skeleton key points, have gradually become mainstream.
Contour-based methods can largely avoid interference from irrelevant pixel points in the video sequence, extract features from the contour sequence quickly and effectively, and remain applicable at low resolution. Despite these advantages, they retain only the outer contour of the human body, so the information inside the silhouette, such as the trunk, cannot be exploited during walking. Methods based on human skeleton key points estimate the poses of the people in the video sequence, extract their skeleton key points, and perform recognition on the key point sequence.
The two kinds of features are complementary, and their combination promises a more comprehensive representation of gait; however, past research has not fully exploited the complementary advantages of the contour and the skeleton key points.
Disclosure of Invention
Purpose of the invention: in order to overcome the defects in the prior art, the invention provides a gait recognition method based on the fusion of human body contour and key point features.
The technical scheme is as follows: the invention provides a gait recognition method based on human body contour and key point feature fusion, which comprises the following steps:
acquiring a pedestrian contour sequence in the video based on the input single walking gait video, and substituting the gait video into an OpenPose algorithm module to obtain a normalized human body key point information sequence;
substituting the pedestrian contour sequence into a GaitSet algorithm module to obtain the characteristics of the gait contour sequence; importing the human body key point information sequence into a human body key point feature extraction module to obtain the features of the human body key points;
respectively obtaining a gait contour feature vector and a human body key point feature vector based on the features of the gait contour sequence and the features of the human body key points;
connecting the gait contour feature vector with the human body key point feature vector, and inputting the concatenated vector into a feature fusion module to obtain gait fusion features;
and importing the gait fusion characteristics into a fusion network for characteristic learning, and identifying the identity of the person in the video.
In a further embodiment, the method for inputting the gait video of the single person walking and acquiring the pedestrian contour sequence in the video comprises the following steps:
extracting the human body contour of each frame of the gait video by using a KNN algorithm;
calculating the number of non-zero pixel points in each frame's contour image based on the extracted human body contour, and deciding whether to output the frame according to an image pixel-count threshold;
for each output image, acquiring the index interval between the highest and lowest rows whose pixel sums are non-zero, and cropping the regions above and below the image accordingly to obtain a cropped image;
searching for the median along the x axis of the cropped image, and taking the found median as the x-axis center point of the person in the image;
slicing from the center point to both sides to obtain a 64 × 64 image array;
and converting the image array type to obtain the pedestrian contour sequence.
In a further embodiment, the method for obtaining the normalized human body key point information sequence by substituting the gait video into the OpenPose algorithm module comprises the following steps:
acquiring position coordinates of each key point of a human body in a video based on the gait video;
selecting the position of a neck key point from position coordinates of each key point of a human body in a video as an origin, and normalizing other key points by taking the distance between the neck and the hip as a reference to obtain a normalized human body key point frame sequence;
wherein the normalization formula is:

P'_i = (P_i - P) / D

in the formula, P_i is the position of key point i, P'_i is the normalized position of key point i, P is the position of the neck key point, and D is the distance between the neck and hip key points.
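As a minimal sketch of this normalization in Python, assuming one frame of key points arrives as an array of (x, y) coordinates in OpenPose's BODY_25 ordering (the neck index 1 and mid-hip index 8 are assumptions from that layout, not fixed by the patent):

```python
import numpy as np

NECK, MID_HIP = 1, 8  # assumed OpenPose BODY_25 indices, not fixed by the patent

def normalize_keypoints(kps: np.ndarray) -> np.ndarray:
    """One frame of normalization: P'_i = (P_i - P) / D.

    kps: (J, 2) array of (x, y) key point positions. The neck key point
    becomes the origin and the neck-to-hip distance D the unit length.
    """
    origin = kps[NECK]
    d = np.linalg.norm(kps[MID_HIP] - origin)
    if d == 0:                        # degenerate frame (missing detections)
        return np.zeros_like(kps)
    return (kps - origin) / d
```

Applied frame by frame, this yields the normalized human body key point frame sequence that is fed to the feature extraction module.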
In a further embodiment, the human body key point feature extraction module includes an LSTM module and a CNN module; the key point information sequence is imported into the feature extraction module by passing the human body key point frame sequence through the LSTM module and the CNN module respectively, obtaining the per-frame features of the human body key points.
In a further embodiment, the method for obtaining the human body key point feature vector from the features of the human body key points comprises:
obtaining a feature vector for each frame from the per-frame features of the human body key points, and concatenating the per-frame feature vectors;
and inputting the concatenated feature vectors into the compression module to obtain a 62 × 128-dimensional human body key point feature vector.
In a further embodiment, the LSTM module consists of a fully connected layer and an LSTM layer, with the LSTM feature dimension set to 256;
the CNN module has ten 3 × 3 convolutional layers; the first convolutional layer has 32 filters; one pooling layer is placed between the second and third convolutional layers and another between the fifth and sixth; the second through fourth convolutional layers have 64 filters and the remaining layers 128; the first pooling layer is residually connected to the fourth convolutional layer and the second pooling layer to the seventh; the dimension of the fully connected layer is set to 256;
the feature extraction module further comprises a compression module, which consists of a BN layer, a ReLU layer, a Dropout layer and a 128-dimensional fully connected layer.
In a further embodiment, the gait contour feature vector and the human body key point feature vector are concatenated and then input into the feature fusion module, and the gait fusion features are obtained as follows:
connecting each dimension of the gait contour feature vector with the corresponding dimension of the human body key point feature vector to obtain a connection vector for each dimension;
importing the connection vector of each dimension into the fully connected layer of the feature fusion module to obtain the human gait fusion feature vector;
the feature fusion module introduces a ternary (triplet) loss function for training, expressed as:

L_BA = Σ_{i=1}^{P} Σ_{a=1}^{K} Σ_{p=1, p≠a}^{K} Σ_{j=1, j≠i}^{P} Σ_{n=1}^{K} max(ξ + d_{i,j,a,p,n}, 0)

d_{i,j,a,p,n} = D(f_i^a, f_i^p) - D(f_i^a, f_j^n)

where L_BA denotes the sum of the loss values over the positive and negative samples; d_{i,j,a,p,n} is the difference between the anchor-positive distance and the anchor-negative distance; f_i^a denotes the a-th (anchor) sample of identity i, f_i^p the p-th (positive) sample of identity i, and f_j^n the n-th (negative) sample of identity j; D denotes the distance between two samples; ξ is a threshold parameter set according to actual needs, used to control the gap between the anchor-positive distance and the anchor-negative distance; a, p and n index the anchor, positive and negative samples respectively; i and j index the identities of the positive and negative samples; a batch contains P ids with K samples per id.
In a further embodiment, the method for importing the gait fusion features into the fusion network for feature learning and identifying the person in the video comprises:
calculating the Euclidean distance between the gait fusion feature F_Q and each feature F_G in the fusion network feature library to obtain a distance result for each pair;
and selecting the closest distance result and determining the recognition result from the feature associated with it, thereby completing the identification of the person in the video.
In a second aspect, the invention provides a processing device comprising a memory and a processor; the memory stores a computer program that, when executed by the processor, implements the gait recognition method based on the fusion of human body contour and key point features.
In a third aspect, the invention provides a readable storage medium storing a computer program that, when executed by a processor, performs the steps of the method described above.
Beneficial effects: compared with the prior art, the invention has the following advantages:
(1) The method extracts the person contour sequence and the skeleton key point sequence from the gait video using a morphological method and the OpenPose algorithm respectively, and uses them as the pedestrian's initial walking feature representation.
(2) The invention introduces the GaitSet algorithm to extract features from the contour sequence, and extracts temporal features from the skeleton key point sequence through the combination of an LSTM network and a CNN network.
(3) The invention obtains the fused gait features with a feature-layer fusion method and, through the learning of a fusion network, obtains a comprehensive feature expression for each person, effectively improving the accuracy and reliability of gait recognition.
Drawings
Fig. 1 is a general method flow diagram.
Fig. 2 is an architecture diagram of a gait feature extraction network based on feature fusion.
Detailed Description
To give a fuller understanding of the technical content of the present invention, the technical solution is further described and illustrated below with reference to specific embodiments, without being limited thereto.
As shown in fig. 1 and 2, a gait recognition method based on human body contour and key point feature fusion includes the following steps:
step 1) inputting a gait video of single walking to acquire a pedestrian contour sequence in the video
Wherein in the process of acquiring the pedestrian contour sequence in the video, the pedestrian contour sequence is cut to remove invalid pixel points, and the specific steps are as follows:
step 11) extracting the human body contour of each frame of image by using a KNN algorithm;
step 12) calculating the number of non-zero pixel points of each frame's contour image; if the count is less than 10000, no image information is returned for that frame;
step 13) obtaining the highest and lowest indexes of rows whose pixel sums are non-zero, and cropping the regions above and below the image;
step 14) obtaining the median along the x axis and regarding it as the person's x center; if the median of the image cannot be found, no image information is returned;
step 15) slicing from the center point to both sides to obtain a 64 × 64 image array; if the slice exceeds the image range, translating it by padding both sides with all-zero arrays;
and step 16) converting the image array type and returning it to obtain the cropped pedestrian contour sequence, as sketched below.
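A minimal sketch of steps 12) to 16) in Python, under two assumptions not stated in the patent: the per-frame mask comes from OpenCV's KNN background subtractor (step 11), and the vertical crop is resized to 64 rows before the horizontal slice, since the patent leaves the height normalization implicit:

```python
import cv2
import numpy as np

# step 11) would produce `frame` with, for example:
#   subtractor = cv2.createBackgroundSubtractorKNN()
#   frame = subtractor.apply(bgr_frame)

def preprocess_silhouette(frame: np.ndarray, min_pixels: int = 10000,
                          out: int = 64):
    """Steps 12)-16): pixel-count check, vertical crop, x-median
    centering, and a 64x64 slice padded with all-zero columns."""
    if np.count_nonzero(frame) < min_pixels:         # step 12): too sparse
        return None

    ys = np.where(frame.sum(axis=1) != 0)[0]         # step 13): non-zero rows
    frame = frame[ys.min():ys.max() + 1]

    # Resizing the vertical crop to 64 rows is an assumption; the patent
    # does not state how the cropped height becomes 64.
    scale = out / frame.shape[0]
    frame = cv2.resize(frame, (max(1, int(frame.shape[1] * scale)), out))

    xs = np.where(frame.sum(axis=0) != 0)[0]         # step 14): x-axis median
    if xs.size == 0:
        return None                                  # no median: drop frame
    cx = int(np.median(xs))

    half = out // 2                                  # step 15): slice to both
    frame = np.pad(frame, ((0, 0), (half, half)))    # sides, all-zero padding
    frame = frame[:, cx:cx + out]                    # when the slice overruns

    return frame.astype(np.float32) / 255.0         # step 16): type conversion
```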
Step 2) substituting the gait video into the OpenPose algorithm module to obtain the normalized human body key point information sequence, and substituting the pedestrian contour sequence into the GaitSet algorithm module to obtain the features of the gait contour sequence;
the steps of obtaining the normalized human body key point information sequence are as follows:
step 21) obtaining position coordinates of each key point of a human body in the video;
step 22) among the detected human body key points, the neck and the hip are relatively stable key points; the position of the neck key point is selected as the origin;
step 23) normalizing the other key points with the neck-to-hip distance as the reference, the normalization formula being:

P'_i = (P_i - P) / D

where P_i is the position of key point i, P'_i is the normalized position of key point i, P is the position of the neck key point, and D is the distance between the neck and hip key points.
Secondly, the pedestrian contour feature extraction comprises the following steps:
step 24) inputting the pedestrian contour sequence obtained in step 1) into the GaitSet network to obtain the pedestrian contour features;
step 25) calculating with the GaitSet network to obtain a 62 × 128-dimensional feature vector.
Step 3) extracting the features of the human body key points from the key point information sequence with the human body key point feature extraction module composed of an LSTM and a CNN;
step 4, based on the characteristics of the human key points, the method for obtaining the characteristic vectors of the human key points comprises the following steps:
step 41) respectively transmitting the obtained human body key point frame sequence into an LSTM module and a CNN module to obtain the characteristics of each frame of the human body key point and connecting the obtained characteristic vectors of each frame;
and 42) inputting the connected feature vectors into the compressed block to obtain 62 gamma 128-dimensional human key point feature vectors.
The human body key point feature extraction module consists of an LSTM network for extracting temporal information, a CNN network for extracting spatial information, and a compression module.
The LSTM module consists of a fully connected layer and an LSTM layer, with the LSTM feature dimension set to 256.
The CNN module has ten 3 × 3 convolutional layers. The first convolutional layer has 32 filters; one pooling layer is placed between the second and third convolutional layers and another between the fifth and sixth; the second through fourth convolutional layers have 64 filters and the remaining layers 128. Borrowing the residual-connection idea of ResNet, the first pooling layer is residually connected to the fourth convolutional layer, and the second pooling layer to the seventh. The dimension of the fully connected layer is set to 256.
To prevent overfitting, a compression module is provided, consisting of a BN layer, a ReLU layer, a Dropout layer and a 128-dimensional fully connected layer. A sketch of the whole key point branch follows.
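The PyTorch sketch below follows the stated layer counts. Several details are assumptions: how the key point frame sequence is arranged into the CNN's 2-D input (a single-channel T × 2J map here), reading the temporal feature from the LSTM's last step, and the output shape (the 62 × 128-dimensional result, which matches GaitSet's 62 horizontal strips, is simplified to a single 128-dimensional vector):

```python
import torch
import torch.nn as nn

class KeypointFeatureExtractor(nn.Module):
    """LSTM + CNN branch for the human body key point sequence.

    Layer counts follow the patent: a fully connected layer + LSTM with
    feature dimension 256; ten 3x3 conv layers with 32/64/128 filters,
    pooling after layers 2 and 5, residual connections pool1->conv4 and
    pool2->conv7; a compression module of BN, ReLU, Dropout and a
    128-dim fully connected layer. Input packing and the final single
    128-dim vector (instead of 62 x 128) are simplifying assumptions.
    """

    def __init__(self, num_joints: int = 25, hidden: int = 256,
                 out_dim: int = 128, p_drop: float = 0.5):
        super().__init__()
        in_feats = num_joints * 2                      # (x, y) per joint

        # LSTM module: fully connected layer followed by an LSTM layer.
        self.fc_in = nn.Linear(in_feats, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.ReLU())

        # CNN module: ten 3x3 conv layers.
        self.c1, self.c2 = conv(1, 32), conv(32, 64)
        self.pool1 = nn.MaxPool2d(2)
        self.c3, self.c4 = conv(64, 64), conv(64, 64)
        self.c5 = conv(64, 128)
        self.pool2 = nn.MaxPool2d(2)
        self.c6, self.c7 = conv(128, 128), conv(128, 128)
        self.c8, self.c9, self.c10 = (conv(128, 128), conv(128, 128),
                                      conv(128, 128))
        self.cnn_fc = nn.Linear(128, hidden)

        # Compression module: BN + ReLU + Dropout + 128-dim FC.
        self.compress = nn.Sequential(
            nn.BatchNorm1d(2 * hidden), nn.ReLU(),
            nn.Dropout(p_drop), nn.Linear(2 * hidden, out_dim))

    def forward(self, kps: torch.Tensor) -> torch.Tensor:
        b, t, j, _ = kps.shape                         # kps: (B, T, J, 2)
        seq = kps.reshape(b, t, j * 2)

        lstm_out, _ = self.lstm(torch.relu(self.fc_in(seq)))
        f_time = lstm_out[:, -1]                       # (B, 256) temporal feature

        x = seq.unsqueeze(1)                           # (B, 1, T, 2J) spatial map
        x = self.pool1(self.c2(self.c1(x)))
        x = x + self.c4(self.c3(x))                    # residual: pool1 -> conv4
        x = self.pool2(self.c5(x))
        x = x + self.c7(self.c6(x))                    # residual: pool2 -> conv7
        x = self.c10(self.c9(self.c8(x)))
        f_space = self.cnn_fc(x.mean(dim=(2, 3)))      # (B, 256) spatial feature

        return self.compress(torch.cat([f_time, f_space], dim=1))
```

For example, a batch of 30-frame sequences of 25 key points, torch.rand(4, 30, 25, 2), maps to a (4, 128) feature.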
Step 5) concatenating the gait contour feature vector with the human body key point feature vector and inputting the result into the feature fusion module; the gait fusion features are obtained as follows:
step 51) connecting each dimension of the 62 × 128-dimensional gait contour feature vector with the corresponding dimension of the 62 × 128-dimensional human body key point feature vector to obtain a connection vector for each dimension;
step 52) importing the connection vector of each dimension into the fully connected layer of the feature fusion module to obtain the human gait fusion feature vector, as sketched below.
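A minimal sketch of steps 51) and 52) in PyTorch, assuming both feature vectors arrive as (batch, 62, 128) tensors; the output width of the fully connected layer is an assumption, since the patent names the layer but not its size:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Per-dimension fusion: each of the 62 strip dimensions concatenates
    its 128-dim contour and key point vectors (step 51) and passes the
    256-dim connection vector through a fully connected layer (step 52).
    The 128-dim output width is an assumption."""

    def __init__(self, feat: int = 128):
        super().__init__()
        self.fc = nn.Linear(2 * feat, feat)

    def forward(self, f_contour: torch.Tensor, f_keypoint: torch.Tensor):
        # f_contour, f_keypoint: (B, 62, 128) each
        connected = torch.cat([f_contour, f_keypoint], dim=-1)  # (B, 62, 256)
        return self.fc(connected)                               # (B, 62, 128)
```

Because torch.nn.Linear acts on the last dimension, the same fully connected layer is shared across the 62 strip dimensions.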
the feature fusion module introduces a ternary (triplet) loss function for training, expressed as:

L_BA = Σ_{i=1}^{P} Σ_{a=1}^{K} Σ_{p=1, p≠a}^{K} Σ_{j=1, j≠i}^{P} Σ_{n=1}^{K} max(ξ + d_{i,j,a,p,n}, 0)

d_{i,j,a,p,n} = D(f_i^a, f_i^p) - D(f_i^a, f_j^n)

where L_BA denotes the sum of the loss values over the positive and negative samples; d_{i,j,a,p,n} is the difference between the anchor-positive distance and the anchor-negative distance; f_i^a denotes the a-th (anchor) sample of identity i, f_i^p the p-th (positive) sample of identity i, and f_j^n the n-th (negative) sample of identity j; D denotes the distance between two samples; ξ is a threshold parameter set according to actual needs, used to control the gap between the anchor-positive distance and the anchor-negative distance; a, p and n index the anchor, positive and negative samples respectively; i and j index the identities of the positive and negative samples; a batch contains P ids with K samples per id.
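Assuming the Batch-All form above, a compact PyTorch sketch of the ternary loss is given below; the margin value 0.2 stands in for the threshold parameter ξ and is an assumption, not a value stated in the patent:

```python
import torch

def batch_all_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                           margin: float = 0.2) -> torch.Tensor:
    """Batch-All ternary (triplet) loss over a batch of P ids x K samples.

    feats: (N, D) fused gait features; labels: (N,) identity ids.
    `margin` plays the role of the threshold parameter xi; its value
    here is an assumption. The loss is averaged over active triplets.
    """
    dist = torch.cdist(feats, feats)                    # D(., .), shape (N, N)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)

    pos_mask = (same & ~eye).unsqueeze(2)               # valid (anchor, positive)
    neg_mask = (~same).unsqueeze(1)                     # valid (anchor, negative)

    # loss[a, p, n] = max(margin + D(a, p) - D(a, n), 0)
    loss = (margin + dist.unsqueeze(2) - dist.unsqueeze(1)).clamp(min=0)
    loss = loss * (pos_mask & neg_mask)                 # zero out invalid triplets
    active = (loss > 0).sum().clamp(min=1)              # count non-zero triplets
    return loss.sum() / active
```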
Step 6) importing the gait fusion features into the fusion network for feature learning; the identity of the person in the video is recognized as follows:
step 61) calculating the Euclidean distance between the gait fusion feature F_Q and each feature F_G in the fusion network feature library to obtain a distance result for each pair;
step 62) selecting the closest distance result and determining the recognition result from the feature associated with it, thereby completing the identification of the person in the video, as sketched below.
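A minimal retrieval sketch in Python, assuming the fusion network feature library is a plain in-memory mapping from person id to a stored fusion feature flattened to a 1-D vector (the patent does not specify the library's form):

```python
import numpy as np

def identify(f_q: np.ndarray, gallery: dict) -> str:
    """Steps 61)-62): Euclidean distance between the query fusion feature
    F_Q and every feature F_G in the (hypothetical, in-memory) feature
    library, then the identity of the nearest feature is returned.
    """
    ids = list(gallery)                                 # gallery: {id: F_G}
    dists = [np.linalg.norm(f_q - gallery[pid]) for pid in ids]
    return ids[int(np.argmin(dists))]                   # closest match wins
```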
In summary, the method obtains a gait video sequence, extracts a human body contour sequence and a human body key point sequence from it, extracts features from each with the human body contour feature extraction module and the human body key point feature extraction module respectively, performs feature-layer fusion to obtain the gait fusion features, learns the features through a fusion network, and introduces a ternary loss function for training, thereby accurately identifying the person in the video.
Embodiment 2 provides a processing apparatus comprising a memory and a processor; the memory stores a computer program that, when executed by the processor, implements the following gait recognition method based on the fusion of human body contour and key point features:
acquiring a pedestrian contour sequence in the video based on the input single walking gait video, and substituting the gait video into an OpenPose algorithm module to obtain a normalized human body key point information sequence;
substituting the pedestrian contour sequence into a GaitSet algorithm module to obtain the characteristics of the gait contour sequence; importing the human body key point information sequence into a human body key point feature extraction module to obtain the features of the human body key points;
respectively obtaining a gait contour feature vector and a human body key point feature vector based on the features of the gait contour sequence and the features of the human body key points;
connecting the gait contour feature vector with the human body key point feature vector, and inputting the concatenated vector into a feature fusion module to obtain gait fusion features;
and importing the gait fusion characteristics into a fusion network for characteristic learning, and identifying the identity of the person in the video.
Embodiment 3 provides a readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the above-described method.
In conclusion, the person contour sequence and the skeleton key point sequence in the gait video are extracted with a morphological method and the OpenPose algorithm respectively and used as the pedestrian's initial walking feature representation; the GaitSet algorithm is introduced to extract features from the contour sequence, and temporal features are extracted from the skeleton key point sequence through the combination of an LSTM network and a CNN network; feature-layer fusion yields the fused gait features, and learning through the fusion network produces a comprehensive feature expression for each person, effectively improving the accuracy and reliability of gait recognition.
Embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A gait recognition method based on human body contour and key point feature fusion is characterized by comprising the following steps:
acquiring a pedestrian contour sequence in the video based on the input single walking gait video, and substituting the gait video into an OpenPose algorithm module to obtain a normalized human body key point information sequence;
substituting the pedestrian contour sequence into a GaitSet algorithm module to obtain the characteristics of the gait contour sequence; importing the human body key point information sequence into a human body key point feature extraction module to obtain the features of the human body key points;
respectively obtaining a gait contour feature vector and a human body key point feature vector based on the features of the gait contour sequence and the features of the human body key points;
connecting the gait contour feature vector with the human body key point feature vector, and inputting the concatenated vector into a feature fusion module to obtain gait fusion features;
and importing the gait fusion characteristics into a fusion network for characteristic learning, and identifying the identity of the person in the video.
2. The gait recognition method based on the fusion of the human body contour and the key point features as claimed in claim 1, wherein the process of acquiring the pedestrian contour sequence in the video based on the input single-person walking gait video further comprises the steps of cutting the pedestrian contour sequence, and the method of acquiring the cut pedestrian contour sequence comprises the steps of:
extracting the human body contour of each frame of the gait video by using a KNN algorithm;
calculating the number of non-zero pixel points in each frame's contour image based on the extracted human body contour, and deciding whether to output the frame according to an image pixel-count threshold;
for each output image, acquiring the index interval between the highest and lowest rows whose pixel sums are non-zero, and cropping the regions above and below the image accordingly to obtain a cropped image;
searching for the median along the x axis of the cropped image, and taking the found median as the x-axis center point of the person in the image;
slicing from the center point to both sides to obtain a 64 × 64 image array;
and converting the image array type to obtain the pedestrian contour sequence.
3. The gait recognition method based on human body contour and key point feature fusion as claimed in claim 1, wherein the method for obtaining the normalized human body key point information sequence by substituting the gait video into the OpenPose algorithm module comprises:
based on the gait video, acquiring position coordinates of each key point of a human body in the video;
selecting the position of a neck key point from position coordinates of each key point of a human body in a video as an origin, and normalizing other key points by taking the distance between the neck and the hip as a reference to obtain a normalized human body key point frame sequence;
wherein the normalization formula is:

P'_i = (P_i - P) / D

in the formula, P_i is the position of key point i, P'_i is the normalized position of key point i, P is the position of the neck key point, and D is the distance between the neck and hip key points.
4. The gait recognition method based on human body contour and key point feature fusion of claim 1, wherein the human body key point feature extraction module comprises an LSTM module and a CNN module; the key point information sequence is imported into the feature extraction module by passing the human body key point frame sequence through the LSTM module and the CNN module respectively, obtaining the per-frame features of the human body key points.
5. The gait recognition method based on the fusion of the human body contour and the key point features as claimed in claim 1, wherein the method for obtaining the human body key point feature vector based on the features of the human body key point comprises:
obtaining a feature vector for each frame from the per-frame features of the human body key points, and concatenating the per-frame feature vectors;
and inputting the concatenated feature vectors into the compression module to obtain the 62 × 128-dimensional human body key point feature vector.
6. The gait recognition method based on the fusion of the human body contour and the key point features according to claim 4, characterized in that the LSTM module consists of a fully connected layer and an LSTM layer, and the LSTM feature dimension is set to 256;
the CNN module has ten 3 × 3 convolutional layers; the first convolutional layer has 32 filters; one pooling layer is placed between the second and third convolutional layers and another between the fifth and sixth; the second through fourth convolutional layers have 64 filters and the remaining layers 128; the first pooling layer is residually connected to the fourth convolutional layer and the second pooling layer to the seventh; the dimension of the fully connected layer is set to 256;
the feature extraction module further comprises a compression module, which consists of a BN layer, a ReLU layer, a Dropout layer and a 128-dimensional fully connected layer.
7. The gait recognition method based on human body contour and key point feature fusion of claim 1, characterized in that the gait contour feature vector and the human body key point feature vector are concatenated and then input into the feature fusion module, and the gait fusion features are obtained as follows:
connecting each dimension of the gait contour feature vector with the corresponding dimension of the human body key point feature vector to obtain a connection vector for each dimension;
importing the connection vector of each dimension into the fully connected layer of the feature fusion module to obtain the human gait fusion feature vector;
the feature fusion module introduces a ternary (triplet) loss function for training, expressed as:

L_BA = Σ_{i=1}^{P} Σ_{a=1}^{K} Σ_{p=1, p≠a}^{K} Σ_{j=1, j≠i}^{P} Σ_{n=1}^{K} max(ξ + d_{i,j,a,p,n}, 0)

d_{i,j,a,p,n} = D(f_i^a, f_i^p) - D(f_i^a, f_j^n)

where L_BA denotes the sum of the loss values over the positive and negative samples; d_{i,j,a,p,n} is the difference between the anchor-positive distance and the anchor-negative distance; f_i^a denotes the a-th (anchor) sample of identity i, f_i^p the p-th (positive) sample of identity i, and f_j^n the n-th (negative) sample of identity j; D denotes the distance between two samples; ξ is a threshold parameter set according to actual needs, used to control the gap between the anchor-positive distance and the anchor-negative distance; a, p and n index the anchor, positive and negative samples respectively; i and j index the identities of the positive and negative samples; a batch contains P ids with K samples per id.
8. The gait recognition method based on the human body contour and the key point feature fusion of claim 1, characterized in that the gait fusion features are imported into a fusion network for feature learning, and the method for recognizing the identity of the person in the video comprises:
calculating the Euclidean distance between the gait fusion feature F_Q and each feature F_G in the fusion network feature library to obtain a distance result for each pair;
and selecting the closest distance result and determining the recognition result from the feature associated with it, thereby completing the identification of the person in the video.
9. A processing apparatus comprising a memory and a processor, wherein the memory stores a computer program which is executed by the processor to implement the gait recognition method based on the fusion of the body contour and the key point feature according to any one of claims 1 to 8.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202210452885.3A 2022-04-27 2022-04-27 Gait recognition method based on human body contour and key point feature fusion Pending CN114821786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210452885.3A CN114821786A (en) 2022-04-27 2022-04-27 Gait recognition method based on human body contour and key point feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210452885.3A CN114821786A (en) 2022-04-27 2022-04-27 Gait recognition method based on human body contour and key point feature fusion

Publications (1)

Publication Number Publication Date
CN114821786A true CN114821786A (en) 2022-07-29

Family

ID=82508944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210452885.3A Pending CN114821786A (en) 2022-04-27 2022-04-27 Gait recognition method based on human body contour and key point feature fusion

Country Status (1)

Country Link
CN (1) CN114821786A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170243058A1 (en) * 2014-10-28 2017-08-24 Watrix Technology Gait recognition method based on deep learning
CN111428658A (en) * 2020-03-27 2020-07-17 大连海事大学 Gait recognition method based on modal fusion
CN111950418A (en) * 2020-08-03 2020-11-17 启航汽车有限公司 Gait recognition method, device and system based on leg features and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张秋红; 苏锦; 杨新锋: "Simulation Research on Gait Recognition Based on Feature Fusion and Neural Network" (基于特征融合和神经网络对步态识别仿真研究), 计算机仿真 (Computer Simulation), no. 08, 15 August 2012 (2012-08-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476198A (en) * 2020-04-24 2020-07-31 广西安良科技有限公司 Gait recognition method, device and system based on artificial intelligence, storage medium and server
CN111476198B (en) * 2020-04-24 2023-09-26 广西安良科技有限公司 Gait recognition method, device, system, storage medium and server based on artificial intelligence
CN116665309A (en) * 2023-07-26 2023-08-29 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116665309B (en) * 2023-07-26 2023-11-14 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features

Similar Documents

Publication Publication Date Title
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
CN110659589B (en) Pedestrian re-identification method, system and device based on attitude and attention mechanism
CN109934195A (en) A kind of anti-spoofing three-dimensional face identification method based on information fusion
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN113449704B (en) Face recognition model training method and device, electronic equipment and storage medium
CN112801054B (en) Face recognition model processing method, face recognition method and device
CN108537181A (en) A kind of gait recognition method based on the study of big spacing depth measure
CN111310668A (en) Gait recognition method based on skeleton information
CN104834905A (en) Facial image identification simulation system and method
CN111985332B (en) Gait recognition method of improved loss function based on deep learning
CN114821786A (en) Gait recognition method based on human body contour and key point feature fusion
CN111444488A (en) Identity authentication method based on dynamic gesture
Badave et al. Evaluation of person recognition accuracy based on openpose parameters
CN112541421B (en) Pedestrian reloading and reloading recognition method for open space
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN110909678B (en) Face recognition method and system based on width learning network feature extraction
CN116311377A (en) Method and system for re-identifying clothing changing pedestrians based on relationship between images
CN115100684A (en) Clothes-changing pedestrian re-identification method based on attitude and style normalization
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN114445691A (en) Model training method and device, electronic equipment and storage medium
CN117854155A (en) Human skeleton action recognition method and system
CN111444374B (en) Human body retrieval system and method
Pundir et al. A Review of Deep Learning Approaches for Human Gait Recognition
CN114863555B (en) 3D bone point action recognition method based on space-time multi-residual image convolution
CN114639116B (en) Pedestrian re-recognition method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination