CN113392740B - Pedestrian re-identification system based on dual attention mechanism - Google Patents
Pedestrian re-identification system based on dual attention mechanism
- Publication number
- CN113392740B (application number CN202110618743.5A)
- Authority
- CN
- China
- Prior art keywords
- layer
- convolutional
- attention mechanism
- convolution
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of image processing and particularly relates to a pedestrian re-identification system based on a dual attention mechanism. An attention mechanism comprising a channel attention mechanism and a spatial attention mechanism is introduced into the strongbaseline network. The channel attention mechanism compresses the feature map along the spatial dimension so that the model focuses on key channels, while the spatial attention mechanism highlights semantically important pixels by aggregating similar features across all channels. The essence of the attention mechanism is to assign weight coefficients to the image feature information, emphasizing positions useful for the learning target and suppressing irrelevant information. Inserting the attention mechanism into the person re-identification model alleviates problems such as camera angle, body posture change, body misalignment and image diversity, improves the feature extraction capability of the network model without significantly increasing the amount of computation or the number of parameters, and thus improves network performance.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a pedestrian re-identification system based on a dual attention mechanism.
Background
In recent years, researchers have conducted extensive research into person re-identification, which aims to verify the identity of a pedestrian across image sequences captured by non-overlapping cameras; it has many applications in public-safety video surveillance and is of great practical significance for security and criminal investigation. With the development of deep learning, convolutional neural networks have been applied successfully to person re-identification. These methods achieve good results when the background is relatively simple and the setting is relatively fixed. In many real-life scenarios, however, conditions are more complex, and person re-identification remains a challenging task because of scene and viewpoint changes such as spatial misalignment, background interference and changes in pedestrian pose. A conventional convolutional neural network cannot adaptively focus on the useful channels and regions of the feature map, which limits the accuracy of pedestrian re-identification.
Disclosure of Invention
Aiming at the defects of the prior art and in order to obtain higher accuracy, the invention provides a pedestrian re-identification system based on a dual attention mechanism. With its combined channel and spatial attention, the system focuses on important features while suppressing unnecessary ones, and improves the feature extraction capability of the network model without significantly increasing the amount of computation or the number of parameters.
The invention adopts the following technical scheme:
a pedestrian re-identification system based on a double attention mechanism introduces an attention mechanism in a strongbaseline network, and comprises a channel attention mechanism and a space attention mechanism, wherein the channel attention mechanism can promote a model to concentrate on a key channel by compressing in a space dimension; the spatial attention mechanism may highlight semantic pixels by aggregating similar features of all channels; the essence of the attention mechanism is to emphasize important positions useful for learning the target and suppress irrelevant information by assigning a weight coefficient to image feature information.
A pedestrian re-identification system based on a double attention mechanism is characterized in that a double attention mechanism module is inserted on the basis of a strongbaseline network; the structure is as follows:
the first layer is a convolution layer, the second layer is a normalization layer, the third layer is an activation function layer, and the fourth layer is a pooling layer, followed by a Stage structure formed by Stage1, Stage2, Stage3 and Stage4; wherein:
inserting a dual attention module behind the third layer of the first branch in the Conv Block of Stage1, and inserting a dual attention module behind the third convolutional layer in each Identity Block of Stage 1;
Inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage2, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 2;
inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage3, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 3;
inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage4, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 4;
and finally a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier are arranged in sequence.
The method for constructing the channel attention mechanism in the dual attention mechanism module comprises the following specific steps:
Step one: respectively carry out average pooling and maximum pooling on the feature map F produced by the block at the insertion position of the dual attention mechanism module, obtaining two C-dimensional pooled feature maps F_avg^c and F_max^c;
step two: will be provided withAndsending the data into a multilayer sensor MLP comprising a hidden layer to obtain two channel attention diagrams with the size of 1 × C; wherein, in order to reduce the number of parameters, the hidden layer of MLP The number of the neurons is C/r, and r is a compression ratio;
step three: and adding corresponding elements of the two channel attention diagrams obtained through the multilayer perceptron MLP, then performing an activation function, wherein the activation function adopts a Sigmoid activation function to obtain a final channel attention mechanism Mc (F), and applying Mc (F) to the feature diagram F to obtain a final channel attention diagram F'.
The spatial attention mechanism in the dual attention mechanism module is constructed by the following specific steps:
Step one: for the channel-refined feature map F', first carry out maximum pooling and average pooling along the channel direction to obtain two two-dimensional feature maps F_avg^s and F_max^s, both of size 1 × H × W, and splice (concat) the two feature maps along the channel dimension to obtain the spliced feature map;
step two: and generating a spatial attention mechanism Ms (F ') by using the spliced feature map through a convolution layer with a convolution kernel size of 7 x 7, and applying Ms (F') to the feature map F 'to obtain a final spatial attention map F'.
The pedestrian re-identification system based on the dual attention mechanism has the specific structure that:
the first layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 7 x 7, the second layer is a normalization layer, the third layer is an activation function layer, the activation function adopts a Relu activation function, the fourth layer is a pooling layer, the maximum pooling is adopted, and the pooling size is 3 x 3;
Next, Stage structure comprising Stage1, Stage2, Stage3, Stage 4; wherein:
stage1 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 256, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the obtained characteristic graphs to obtain a new input characteristic graph; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 1 x 1, the second layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage2 consists of a Conv Block and 3 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 512, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage3 consists of a Conv Block and 5 Identity blocks, where the Conv Block comprises two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 1024, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a layer of convolutional layers, the number of convolutional cores is 1024, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 1024, the size of each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage4 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, the second layer is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 3 × 3, the third layer is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is convolution layers, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, the second layer is convolution layers, the number of convolution kernels is 512, the size of each convolution kernel is 3 × 3, the third layer is convolution layers, the number of convolution kernels is 2048, the size of each convolution kernel is 1 × 1, and BN layers are added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
And sequentially passing the obtained feature graph through a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier, and classifying by the SoftMax classifier according to the features to obtain the category of the image.
And the pooling layer adopts global average pooling, and the pooling size is 3 x 3.
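To illustrate where the dual attention module sits in the blocks just described, the sketch below assembles one Identity Block (1 × 1, 3 × 3 and 1 × 1 convolutions, each followed by a BN layer) with the attention module behind the third convolutional layer and a fusion with the incoming features. It reuses the ChannelAttention and SpatialAttention sketches given above, treats the fusion as an element-wise residual addition, and is an illustration rather than the patent's own code.

```python
import torch
import torch.nn as nn

class IdentityBlockDA(nn.Module):
    """Identity Block of a Stage: three conv+BN layers, a dual attention module after
    the third convolution, and fusion with the previous block's feature map."""
    def __init__(self, channels: int, mid_channels: int):
        super().__init__()
        def conv_bn(cin, cout, k):
            return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
                                 nn.BatchNorm2d(cout))
        self.conv1 = conv_bn(channels, mid_channels, 1)      # e.g. 64 kernels of size 1 x 1 in Stage1
        self.conv2 = conv_bn(mid_channels, mid_channels, 3)  # e.g. 64 kernels of size 3 x 3
        self.conv3 = conv_bn(mid_channels, channels, 1)      # e.g. 256 kernels of size 1 x 1
        # Dual attention module inserted behind the third convolutional layer.
        self.attention = nn.Sequential(ChannelAttention(channels), SpatialAttention())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.attention(self.conv3(out))
        return self.relu(out + x)  # fuse with the previous block's features

# For Stage1 this would be instantiated as IdentityBlockDA(channels=256, mid_channels=64).
```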
The training process of the pedestrian re-identification system based on the double attention mechanism is as follows:
step one, acquiring a public pedestrian re-identification data set, and carrying out normalization operation on the sizes of pictures in the data set, so that the pixel size of each picture is 256 × 128;
secondly, initializing parameters of a strongbaseline network in the pedestrian re-identification system based on the double attention mechanism by adopting ImageNet pre-training network parameters, and randomly initializing the parameters by an introduced double attention mechanism module;
and step three, inputting the data set processed in the step one as a training set into a pedestrian re-identification system based on a double attention mechanism, enabling the system to learn the characteristics of each pedestrian in the training set by adopting a back propagation algorithm and a random gradient descent method, finally evaluating the effectiveness of the system in pedestrian re-identification through two indexes of mAP and Rank1, and obtaining a well-trained system when the mAP and Rank1 reach optimal values simultaneously.
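As a rough illustration of steps one and two, the snippet below resizes every picture to 256 × 128 and copies ImageNet pre-trained weights into the backbone while the inserted attention modules keep their random initialization. The normalization statistics, the torchvision weights API (torchvision 0.13 or newer) and the build_dual_attention_model constructor are assumptions for the sake of the example, not details given by the patent.

```python
import torchvision.transforms as T
from torchvision import models

# Step one: normalize every picture of the data set to 256 x 128 pixels.
transform = T.Compose([
    T.Resize((256, 128)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # usual ImageNet statistics (assumed)
])

# Step two: the strongbaseline backbone starts from ImageNet pre-trained parameters; the dual
# attention modules are absent from the checkpoint, so they stay randomly initialized.
model = build_dual_attention_model(num_classes=751)   # hypothetical constructor; 751 identities is illustrative
pretrained = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).state_dict()
missing, unexpected = model.load_state_dict(pretrained, strict=False)  # attention parameters remain random
```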
The invention has the beneficial effects that:
The invention combines the pedestrian re-identification model with an attention mechanism. Inserting the attention mechanism into the person re-identification model reduces problems such as camera angle, body posture change, body misalignment and image diversity, improves the feature extraction capability of the network model without significantly increasing the amount of computation or the number of parameters, improves network performance, identifies pedestrians of the same identity more accurately, and thus better supports fields such as security and criminal investigation.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a dual attention mechanism module according to the present invention;
FIG. 3 is a schematic view of the channel attention mechanism of the present invention;
FIG. 4 is a schematic view of the spatial attention mechanism of the present invention.
Detailed Description
The invention relates to a pedestrian re-identification system based on a dual attention mechanism, in which an attention mechanism module is inserted into the strongbaseline network. The attention mechanism module comprises a channel attention mechanism and a spatial attention mechanism; the attention map is multiplied with the input feature map to perform adaptive feature refinement, wherein:
the channel attention mechanism uses the inter-channel relationships of the features to generate a channel attention map, i.e. a set of weights: each channel of the feature map obtained by convolution is multiplied by a different weight that expresses how strongly the features represented by that channel are associated with the key information and how important they are to it. The larger the weight, the more important the information represented by that channel and the stronger its association with the key information; the smaller the weight, the less important that information is. Once the weight of each channel has been obtained, the weights are multiplied onto the values of the corresponding channels to obtain the new features.
The spatial attention mechanism uses the spatial relationships among the features to generate a spatial attention map. Through this attention mechanism more attention is paid to informative positions: the spatial information of the original picture is transformed into another space by a spatial transformation module while the key information is retained.
A pedestrian re-identification system based on a double attention mechanism is characterized in that a double attention mechanism module is inserted on the basis of a strongbaseline network; the structure is as follows:
the first layer is a convolutional layer, the second layer is a normalization layer, the third layer is an activation function layer, the fourth layer is a pooling layer, and the Stage structure comprises Stage1, Stage2, Stage3 and Stage 4; wherein:
inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage1, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 1;
inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage2, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 2;
inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage3, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 3;
Inserting a dual attention mechanism module behind the third layer of the first branch in Conv Block of Stage4, and inserting a dual attention mechanism module behind the third convolutional layer in each Identity Block of Stage 4;
and finally a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier are arranged in sequence.
And sequentially passing the obtained characteristic diagram through a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier, wherein the SoftMax classifier classifies the classes of the pedestrians according to the characteristics.
The method for constructing the channel attention mechanism in the dual attention mechanism module comprises the following specific steps:
Step one: respectively carry out average pooling and maximum pooling on the feature map F produced by the block at the insertion position of the dual attention mechanism module, aggregating the spatial information and obtaining two C-dimensional pooled feature maps F_avg^c and F_max^c;
step two: will be provided withAndsending the data into a multilayer sensor MLP comprising a hidden layer to obtain two channel attention diagrams with the size of 1 × C; wherein, in order to reduce the parameter number, the number of hidden layer neurons of the MLP is C/r, and r is a compression ratio;
step three: and adding corresponding elements of the two channel attention diagrams obtained through the multilayer perceptron MLP, then performing an activation function, wherein the activation function adopts a Sigmoid activation function to obtain a final channel attention mechanism Mc (F), and applying Mc (F) to the feature diagram F to obtain a final channel attention diagram F'.
The spatial attention mechanism in the dual attention mechanism module is constructed by the following specific steps:
Step one: for the channel-refined feature map F', first carry out maximum pooling and average pooling along the channel direction to obtain two two-dimensional feature maps F_avg^s and F_max^s, both of size 1 × H × W, and splice (concat) the two feature maps along the channel dimension to obtain the spliced feature map;
step two: and generating a spatial attention mechanism Ms (F ') through the convolution layer with the convolution kernel size of 7 × 7 for the spliced feature map, and applying Ms (F ') to the feature map F ' to obtain a final spatial attention map F ″.
The feature map before attention is applied is denoted F; F' is obtained after the channel attention mechanism acts on F, and F'' is obtained after the spatial attention mechanism acts on F'.
The pedestrian re-identification system based on the double attention mechanism comprises 2 basic blocks, one is an Identity Block, and the input and output dimensions are the same, so that a plurality of the basic blocks can be connected in series; another basic Block is Conv Block, the input and output dimensions are different, and they cannot be connected in series, and its specific structure is:
the first layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 7 x 7, the second layer is a normalization layer, the third layer is an activation function layer, the activation function adopts a Relu activation function, the fourth layer is a pooling layer, the maximum pooling is adopted, and the pooling size is 3 x 3;
Next, Stage structure comprising Stage1, Stage2, Stage3, Stage 4; wherein:
Stage1 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 256, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the obtained characteristic graphs of the two branches to obtain a new input characteristic graph; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
The first Identity Block is fused with the previous Conv Block feature, and the second Identity Block is fused with the previous Identity Block feature;
stage2 consists of a Conv Block and 3 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 512, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 128, each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 128, each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 512, each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage3 consists of a Conv Block and 5 Identity blocks, where the Conv Block comprises two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 1024, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a layer of convolutional layers, the number of convolutional cores is 1024, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 1024, the size of each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage4 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, the second layer is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 3 × 3, the third layer is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is convolution layers, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, the second layer is convolution layers, the number of convolution kernels is 512, the size of each convolution kernel is 3 × 3, the third layer is convolution layers, the number of convolution kernels is 2048, the size of each convolution kernel is 1 × 1, and BN layers are added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
And sequentially passing the obtained feature graph through a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier, and classifying by the SoftMax classifier according to the features to obtain the category of the image.
And the pooling layer adopts global average pooling, and the pooling size is 3 x 3.
The training process of the pedestrian re-identification system based on the double attention mechanism is as follows:
step one, acquiring a public pedestrian re-identification data set, and carrying out normalization operation on the sizes of pictures in the data set, so that the pixel size of each picture is 256 × 128;
different pedestrian photos are arranged in the pedestrian re-identification data set, different pedestrian categories are represented by different numbers, and each pedestrian has a plurality of different photos;
secondly, initializing parameters of the strongbaseline network in the pedestrian re-identification system based on the dual attention mechanism with ImageNet pre-training network parameters (a publicly available weight file in .pth format that is used directly after download), while the parameters of the introduced dual attention mechanism modules are randomly initialized;
and step three, inputting the data set processed in the step one as a training set into a pedestrian re-identification system based on a double attention mechanism, enabling the system to learn the characteristics of each pedestrian in the training set by adopting a back propagation algorithm and a random gradient descent method, finally evaluating the effectiveness of the system in pedestrian re-identification through two indexes of mAP and Rank1, and obtaining a well-trained system when the mAP and Rank1 reach optimal values simultaneously.
The effectiveness of the model in the pedestrian re-identification task is evaluated with the mAP and Rank1 indexes. Training is set to 1000 epochs; after 660 epochs, mAP and Rank1 reach their optimal values and the well-trained model is obtained. The loss combines triplet loss, center loss and ID loss.
The whole procedure is a model optimization process whose aim is to obtain a model that performs well. It relies on the back-propagation algorithm and gradient descent: a loss value is computed during training, back-propagation iterations update the weights of each layer according to the magnitude of the forward-pass loss, and this continual optimization drives the model toward good parameters.
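A minimal sketch of this optimization loop, combining the losses named above (ID / cross-entropy, triplet and center loss) with back-propagation and a stochastic-gradient-descent update, may look as follows; model, train_loader, make_triplets, center_loss, the learning rate, margin and loss weights are placeholders rather than values fixed by the patent.

```python
import torch
import torch.nn as nn

criterion_id = nn.CrossEntropyLoss()               # ID loss over the pedestrian identities
criterion_tri = nn.TripletMarginLoss(margin=0.3)   # triplet loss (margin assumed)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

for epoch in range(1000):                          # 1000 training epochs are set
    for images, pids in train_loader:
        features, logits = model(images)           # assumed to return (embedding, class scores)
        a, p, n = make_triplets(features, pids)    # hypothetical hard-triplet mining helper
        loss = (criterion_id(logits, pids)
                + criterion_tri(a, p, n)
                + 0.0005 * center_loss(features, pids))  # hypothetical center-loss module and weight
        optimizer.zero_grad()
        loss.backward()                            # back-propagation of the forward-pass loss
        optimizer.step()                           # weight update by stochastic gradient descent
```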
Example 2
As shown in fig. 1, the pedestrian re-identification system with the dual attention mechanism inserts an attention mechanism module on the basis of strongbaseline. The dual attention pedestrian re-identification model has 2 basic blocks: one is the Identity Block, whose input and output dimensions are the same, so that several Identity Blocks can be connected in series; the other basic block is the Conv Block, whose input and output dimensions are different, so that Conv Blocks cannot be connected in series directly. Its specific structure is:
The first layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 7 x 7, the second layer is a normalization layer, the third layer is an activation function layer, wherein the activation function adopts a Relu activation function, the fourth layer is a pooling layer, the maximum pooling is adopted, and the pooling size is 3 x 3;
next, Stage structure including Stage1, Stage2, Stage3, Stage 4.
Stage1 is composed of a Conv Block and 2 Identity blocks, wherein the Conv Block comprises two branches, the first layer of the first branch is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, a double attention mechanism module is inserted behind the layer, the second branch is a layer of convolution layers, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, a BN layer is added behind each convolution layer of each branch, and the obtained feature maps of the two branches are fused to obtain a new input feature map. The first layer of the Identity Block is convolution layers, the number of convolution kernels is 64, the size of each convolution kernel is 1 x 1, the second layer is convolution layers, the number of convolution kernels is 64, the size of each convolution kernel is 3 x 3, the third layer is convolution layers, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, and BN layers are added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the branch with the previous Block feature to obtain a new input feature graph;
Stage2 is composed of Conv Block and 3 Identity Block, wherein Conv Block includes two branches, the first layer of the first branch is convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 × 1, the second layer is convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 × 3, the third layer is convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, a double attention mechanism module is inserted behind the layer, the second branch is one convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, BN layer is added behind each convolution layer of each branch, and feature maps of the two branches are fused to obtain a new input feature map. The first layer of the Identity Block is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, the second layer is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 x 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the branch with the previous Block feature to obtain a new input feature graph;
Stage3 is composed of a Conv Block and 5 Identity blocks, wherein the Conv Block comprises two branches, the first layer of the first branch is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 1024, the size of each convolution kernel is 1 × 1, a double attention mechanism module is inserted behind the layer, the second branch is a layer of convolution layers, the number of convolution kernels is 1024, the size of each convolution kernel is 1 × 1, a BN layer is added behind each convolution layer of each branch, and the feature maps of the two branches are fused to obtain a new input feature map. The first layer of the Identity Block is convolution layers, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, the second layer is convolution layers, the number of convolution kernels is 256, the size of each convolution kernel is 3 x 3, the third layer is convolution layers, the number of convolution kernels is 1024, the size of each convolution kernel is 1 x 1, and BN layers are added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the branch with the previous Block feature to obtain a new input feature graph;
Stage4 is composed of Conv Block and 2 Identity Block, wherein Conv Block includes two branches, the first layer of the first branch is convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, the second layer is convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 3 × 3, the third layer is convolution layer, the number of convolution kernels is 2048, the size of each convolution kernel is 1 × 1, a double attention mechanism module is inserted behind the layer, the second branch is one convolution layer, the number of convolution kernels is 2048, the size of each convolution kernel is 1 × 1, a BN layer is added behind each convolution layer of each branch, and feature maps of the two branches are fused to obtain a new input feature map. The first layer of the Identity Block is a convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 x 1, the second layer is a convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 2048, the size of each convolution kernel is 1 x 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the branch with the previous Block feature to obtain a new input feature graph;
The obtained feature map is passed in sequence through a pooling layer, where global average pooling with a pooling size of 3 × 3 is performed, and a normalization layer; finally the full connection layer of the network extracts the image features to give the feature vector, and a SoftMax classifier classifies according to these features to obtain the image category.
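A sketch of this head (global average pooling, normalization layer, full connection layer and SoftMax classifier) is given below; the 2048-channel input matches the Stage4 output described above, while the number of identities (751) is only an illustrative assumption.

```python
import torch
import torch.nn as nn

class ReIDHead(nn.Module):
    """Pooling layer + normalization layer + full connection layer + SoftMax classifier."""
    def __init__(self, in_channels: int = 2048, num_classes: int = 751):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # global average pooling
        self.bn = nn.BatchNorm1d(in_channels)           # normalization layer
        self.fc = nn.Linear(in_channels, num_classes)   # full connection layer

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        x = self.pool(feature_map).flatten(1)           # (B, 2048) feature vector
        logits = self.fc(self.bn(x))
        return torch.softmax(logits, dim=1)             # class probabilities from the SoftMax classifier
```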
The training process of the pedestrian re-identification algorithm with the double attention mechanism is as follows:
step one, acquiring a public pedestrian re-identification data set, and carrying out normalization operation on the picture size to enable the pixel size of each picture to be 256 × 128;
secondly, initializing pedestrian re-recognition model parameters of the double attention mechanism by adopting ImageNet pre-training network parameters, and randomly initializing parameters by an introduced attention mechanism module;
and step three, inputting the data set into the pedestrian re-identification model with the dual attention mechanism for training, so that the model learns the characteristics of each pedestrian in the training set. Training uses the back-propagation algorithm and stochastic gradient descent; back-propagation iterations update the weights of each layer according to the magnitude of the forward-pass loss value. The effectiveness of the model in the pedestrian re-identification task is evaluated through mAP and Rank1; training is set to 1000 epochs, and after 660 epochs mAP and Rank1 reach their optimal values and the trained model is obtained, wherein the loss adopts triplet loss, center loss and ID loss.
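For the evaluation step, the sketch below shows one simple way Rank1 could be computed from pre-extracted query and gallery features; ranking by cosine similarity and omitting the camera-ID filtering used by the standard benchmark protocols are simplifying assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def rank1(query_feats, query_pids, gallery_feats, gallery_pids):
    """Fraction of queries whose nearest gallery feature carries the same pedestrian ID."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    best = (q @ g.t()).argmax(dim=1)                 # index of the top-1 gallery match per query
    return (gallery_pids[best] == query_pids).float().mean().item()
```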
As shown in fig. 2, in the dual attention mechanism module the feature F extracted from each block of the strongbaseline network is first compressed along the spatial dimension using global maximum pooling and global average pooling to obtain two one-dimensional vectors, from which the channel attention Mc is computed; F and Mc are fused into the feature F'. F' is then compressed along the channel dimension, again with global maximum pooling and global average pooling, to obtain two single-channel maps, from which the spatial attention Ms is computed; F' and Ms are fused into the feature F''. Combining F'' with F gives the final feature. Global average pooling feeds gradient back to every pixel of the feature map, whereas global maximum pooling passes gradient only to the location of the maximum response during back-propagation, so it serves as a complement to global average pooling.
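The flow of FIG. 2 can be summarized in code as below, reusing the ChannelAttention and SpatialAttention sketches from earlier; reading the final combination of F'' with F as an element-wise residual addition is an assumption of this sketch, not something the patent spells out.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """F -> Mc -> F' -> Ms -> F'', then F'' is combined with the input F."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f1 = self.channel(f)    # F'  = Mc(F) applied to F
        f2 = self.spatial(f1)   # F'' = Ms(F') applied to F'
        return f2 + f           # final feature: F'' combined with F
```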
As shown in fig. 3, which is a structural diagram of the channel attention mechanism, the channel attention mechanism is constructed by the following specific steps:
Step one: perform average pooling and maximum pooling operations, respectively, on the feature map F obtained from each block, aggregating the spatial information to obtain two C-dimensional pooled feature maps F_avg^c and F_max^c.
step two: will be provided withAndsending the signal into a multilayer sensor MLP comprising a hidden layer to obtain two channel attention diagrams of 1 × C. Among them, in order to reduce the number of parameters, the number of hidden layer neurons is C/r, and r is called the compression ratio.
Step three: adding corresponding elements of the two channel attention diagrams obtained through MLP, obtaining a final channel attention mechanism Mc (F) through an activation function by adopting a Sigmoid activation function, and obtaining a final channel attention diagram F' by acting Mc (F) on a feature diagram F, wherein the formula is as follows:
wherein the final channel attention mechanism mc (f) is expressed as follows:
wherein W0And W1Respectively represents a hidden layer weight and an output layer weight, AvgPool (F) and MaxPool (F) are respectivelyAnd
as shown in fig. 4, a structure diagram of the spatial attention mechanism is shown, and the spatial attention mechanism is constructed by the following specific steps:
Step one: for F', first carry out maximum pooling and average pooling along the channel direction to obtain two two-dimensional feature maps F_avg^s and F_max^s, both of size 1 × H × W; the two feature maps are spliced (concat) along the channel dimension to obtain the spliced feature map.
Step two: for the spliced feature map, a spatial attention mechanism Ms (F ') is generated through the convolution layer of 7 × 7, and the final spatial attention mechanism F ″ is obtained by applying Ms (F ') to the feature map F '.
The spatial attention map Ms(F') is expressed as follows:

Ms(F') = σ(f^{7×7}([AvgPool(F'); MaxPool(F')])) = σ(f^{7×7}([F_avg^s; F_max^s]))

where σ denotes the Sigmoid function, f^{7×7} represents the 7 × 7 convolution operation, and AvgPool(F') and MaxPool(F') are F_avg^s and F_max^s respectively.
The system can effectively match images of the same pedestrian, improves the feature extraction capability of the network model without significantly increasing the amount of computation or the number of parameters, and offers strong generalization capability and reliable transferability.
Claims (4)
1. A pedestrian re-identification system based on a double attention mechanism is characterized in that a double attention mechanism module is inserted on the basis of a strongbaseline network; the structure is as follows:
the first layer is a convolution layer, the second layer is a normalization layer, the third layer is an activation function layer, the fourth layer is a pooling layer, and a Stage structure is formed by the Stage1, Stage2, Stage3 and Stage 4; wherein:
inserting a dual attention module behind the third layer of the first branch in the Conv Block of Stage1, and inserting a dual attention module behind the third convolutional layer in each Identity Block of Stage 1;
Inserting a dual attention module behind the third layer of the first branch in the Conv Block of Stage2, and inserting a dual attention module behind the third convolutional layer in each Identity Block of Stage 2;
inserting a dual attention module behind the third layer of the first branch in the Conv Block of Stage3, and inserting a dual attention module behind the third convolutional layer in each Identity Block of Stage 3;
inserting a dual attention module behind the third layer of the first branch in the Conv Block of Stage4, and inserting a dual attention module behind the third convolutional layer in each Identity Block of Stage 4;
finally, a pooling layer, a normalization layer, a full connection layer and a SoftMax classifier are sequentially arranged;
the pedestrian re-identification system based on the dual attention mechanism has the specific structure that:
the first layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 7 x 7, the second layer is a normalization layer, the third layer is an activation function layer, the activation function adopts a Relu activation function, the fourth layer is a pooling layer, the maximum pooling is adopted, and the pooling size is 3 x 3;
next, Stage structures including Stage1, Stage2, Stage3, Stage 4; wherein:
Stage1 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 64, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 256, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the obtained characteristic graphs to obtain a new input characteristic graph; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 1 x 1, the second layer is a convolution layer, the number of convolution kernels is 64, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module into the back of the third layer of each Identity Block, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage2 consists of a Conv Block and 3 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 128, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 512, and each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 × 1, the second layer is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 × 3, the third layer is a convolution layer, the number of convolution kernels is 512, the size of each convolution kernel is 1 × 1, and a BN layer is added after each convolution layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage3 consists of a Conv Block and 5 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 256, each convolutional core size is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 1024, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is a convolutional layer, the number of convolutional cores is 1024, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolutional layer, the number of convolutional cores is 256, the size of each convolutional core is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 256, the size of each convolutional core is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 1024, the size of each convolutional core is 1 × 1, and a BN layer is added behind each convolutional layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Stage4 consists of a Conv Block and 2 Identity blocks, where the Conv Block contains two branches, the first layer of the first branch is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 1 × 1, the second layer is convolutional layer, the number of convolutional cores is 512, each convolutional core size is 3 × 3, the third layer is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1, a dual attention mechanism module is inserted behind the layer, the second branch is convolutional layer, the number of convolutional cores is 2048, each convolutional core size is 1 × 1; adding a BN layer after each convolution layer of each branch, and fusing the characteristic diagrams of the two branches to obtain a new input characteristic diagram; the first layer of the Identity Block is a convolutional layer, the number of convolutional cores is 512, the size of each convolutional core is 1 × 1, the second layer is a convolutional layer, the number of convolutional cores is 512, the size of each convolutional core is 3 × 3, the third layer is a convolutional layer, the number of convolutional cores is 2048, the size of each convolutional core is 1 × 1, and a BN layer is added after each convolutional layer; inserting a double attention mechanism module behind the third layer of each Identity Block layer, and fusing the feature graph of the Identity Block with the previous Block feature to obtain a new input feature graph;
Sequentially passing the obtained feature map through a pooling layer, a normalization layer, a full-link layer and a SoftMax classifier, and classifying the pedestrian category by the SoftMax classifier according to the features to obtain the category to which the image belongs;
the training process of the pedestrian re-identification system based on the double attention mechanism is as follows:
step one, acquiring a public pedestrian re-identification data set, and carrying out normalization operation on the sizes of pictures in the data set, so that the pixel size of each picture is 256 × 128;
secondly, initializing parameters of a strongbaseline network in the pedestrian re-identification system based on the double attention mechanism by adopting ImageNet pre-training network parameters, and randomly initializing the parameters by an introduced double attention mechanism module;
and step three, inputting the data set processed in the step one as a training set into a pedestrian re-identification system based on a double attention mechanism, enabling the system to learn the characteristics of each pedestrian in the training set by adopting a back propagation algorithm and a random gradient descent method, finally evaluating the effectiveness of the system in pedestrian re-identification through two indexes of mAP and Rank1, and obtaining a well-trained system when the mAP and Rank1 reach optimal values simultaneously.
2. The pedestrian re-identification system based on the dual attention mechanism is characterized in that the construction of the channel attention mechanism in the dual attention mechanism module comprises the following specific steps:
Step one: respectively carry out average pooling and maximum pooling on the feature map F produced by the block at the insertion position of the dual attention mechanism module to obtain two C-dimensional pooled feature maps F_avg^c and F_max^c;
step two: will be provided withAndsending the data into a multilayer sensor MLP comprising a hidden layer to obtain two channel attention diagrams with the size of 1 × C; wherein, in order to reduce the parameter number, the number of hidden layer neurons of the MLP is C/r, and r is a compression ratio;
step three: and adding corresponding elements of the two channel attention diagrams obtained through the multilayer perceptron MLP, then performing an activation function, wherein the activation function adopts a Sigmoid activation function to obtain a final channel attention mechanism Mc (F), and applying Mc (F) to the feature diagram F to obtain a final channel attention diagram F'.
3. The pedestrian re-identification system based on the dual attention mechanism is characterized in that the spatial attention mechanism in the dual attention mechanism module is constructed by the following specific steps:
step one: for the final channel attention map F', first performing maximum pooling and average pooling along the channel direction to obtain two two-dimensional feature maps F_max^s and F_avg^s, each of size 1 × H × W, and then performing concat splicing on the two obtained two-dimensional feature maps along the channel dimension to obtain a spliced feature map;
step two: generating the spatial attention mechanism Ms(F') by passing the spliced feature map through a convolutional layer with a convolution kernel size of 7 × 7, and applying Ms(F') to the feature map F' to obtain the final spatial attention map F'' (see the sketch below).
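A matching sketch of the spatial attention mechanism of steps one and two, under the same illustrative naming; the final output corresponds to F''.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f1):                             # f1 = F': (B, C, H, W)
        f_max = f1.amax(dim=1, keepdim=True)           # step one: max pooling along the channel direction
        f_avg = f1.mean(dim=1, keepdim=True)           # step one: average pooling along the channel direction
        ms = torch.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))   # step two: Ms(F') via 7x7 conv
        return f1 * ms                                 # F'' = Ms(F') applied element-wise to F'
```

Under these assumptions, cascading the two modules, e.g. `nn.Sequential(ChannelAttention(2048), SpatialAttention())`, could serve as the `DualAttention` placeholder used in the block sketch above.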
4. The dual attention mechanism-based pedestrian re-identification system of claim 3 wherein the pooling layer employs global average pooling of 3 x 3.