
CN112836637A - Pedestrian re-identification method based on spatial reverse attention network - Google Patents

Pedestrian re-identification method based on spatial reverse attention network

Info

Publication number
CN112836637A
CN112836637A
Authority
CN
China
Prior art keywords
attention
pedestrian
features
spatial
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110146335.4A
Other languages
Chinese (zh)
Other versions
CN112836637B (en)
Inventor
宋晓宁
王鹏
冯振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202110146335.4A
Publication of CN112836637A
Application granted
Publication of CN112836637B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G06V 40/103: Recognition of biometric, human-related or animal-related patterns; static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/044: Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/047: Neural networks; probabilistic or stochastic networks
    • G06N 3/084: Learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The invention discloses a pedestrian re-identification method based on a spatial reverse attention network, which comprises the following steps: collecting captured pictures and dividing them into a training set and a test set; constructing a spatial reverse attention network model based on ResNet-50, training the convolutional neural network on the training set, and adding CBAM-Pro; dividing the network into two branches after the added CBAM-Pro, performing forward learning and reverse attention simultaneously, and extracting forward and reverse global and local features; and concatenating the extracted features along the channel dimension to obtain pedestrian identification features containing multiple feature types, which are then verified with the test set to complete pedestrian re-identification. By extracting multiple types of pedestrian identification features with the spatial reverse attention network, the invention improves the effectiveness and reliability of re-identification.

Description

Pedestrian re-identification method based on spatial reverse attention network
Technical Field
The invention relates to the technical field of intelligent security, and in particular to a pedestrian re-identification method based on a spatial reverse attention network.
Background
Pedestrian re-identification is in great demand in the field of intelligent security. It aims to associate the same pedestrian across different times and places: given a query picture of a pedestrian, a trained model extracts features from the query picture and from the pictures in a gallery, and the gallery pictures are ranked by feature similarity so as to retrieve images of that pedestrian. In recent years the task has improved greatly, but because pedestrian images captured in open outdoor environments vary widely under interferences such as pose, occlusion, clothing, background noise and camera viewing angle, pedestrian re-identification remains a very challenging task.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the technical problem solved by the invention is as follows: the prior art cannot extract rich pedestrian features, and therefore cannot achieve high re-identification accuracy.
In order to solve the above technical problems, the invention provides the following technical scheme: collecting captured pictures and dividing them into a training set and a test set; constructing a spatial reverse attention network model based on ResNet-50, training the convolutional neural network on the training set, and adding CBAM-Pro; dividing the network into two branches after the added CBAM-Pro, performing forward learning and reverse attention simultaneously, and extracting forward and reverse global and local features; and concatenating the extracted features along the channel dimension to obtain pedestrian identification features containing multiple feature types, which are then verified with the test set to complete pedestrian re-identification. A minimal skeleton of this pipeline is sketched below.
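The following PyTorch skeleton is an illustrative reading of the scheme above, not the patent's own code: the CBAM-Pro module and the two branch networks are taken as placeholder modules, ImageNet pre-training of the trunk is assumed, and the mapping of res_conv_3 to torchvision's layer2 is an assumption about ResNet's layer naming.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class SpatialReverseAttentionNet(nn.Module):
        """Trunk shared up to res_conv_3, then CBAM-Pro, then two branches."""
        def __init__(self, cbam_pro: nn.Module, branch_fwd: nn.Module,
                     branch_rev: nn.Module):
            super().__init__()
            trunk = resnet50(pretrained=True)  # ImageNet weights (assumed)
            # Layers shared up to res_conv_3 (torchvision's layer2).
            self.stem = nn.Sequential(trunk.conv1, trunk.bn1, trunk.relu,
                                      trunk.maxpool, trunk.layer1, trunk.layer2)
            self.cbam_pro = cbam_pro        # improved convolutional block attention
            self.branch_fwd = branch_fwd    # forward-learning branch
            self.branch_rev = branch_rev    # reverse-attention branch

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.cbam_pro(self.stem(x))
            f_fwd = self.branch_fwd(x)      # forward global and local features
            f_rev = self.branch_rev(x)      # reverse global and local features
            # Channel-wise concatenation yields the final pedestrian descriptor.
            return torch.cat([f_fwd, f_rev], dim=1)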
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the process of performing forward learning and reverse attention comprises the following steps: after passing through CBAM-Pro, the network is divided into two branches; one branch is trained normally, namely forward learning is performed, and a gradient-guided spatial attention map is then used to obtain a reverse mask, so that reverse attention is performed on the other branch.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the generation of the spatial attention map comprises the following: given a feature map F ∈ R^{C×H×W} and its back-propagated gradient tensor G ∈ R^{C×H×W}, where C is the number of channels of the feature map and H × W denotes its size, a weight vector w ∈ R^{C×1} is first generated from G using global average pooling, and the gradient-guided spatial attention is then calculated.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the spatial attention map further comprises,

M = ReLU( ∑_{i=1}^{C} w_i · F^{(i)} )

where w_i is the i-th element of w, F^{(i)} is the sub-map of the i-th channel of F, and high values of M characterize the positions of greatest interest in the feature map.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the spatial reverse mask comprises,

a_i = 0 if m_i > T; a_i = 1 if m_i ≤ T

where a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the overall loss function for training the convolutional neural network comprises,

L = β₁ · L_softmax + β₂ · L_triplet

where L_softmax denotes the sum of the cross-entropy losses of all features, L_triplet denotes the sum of the triplet losses, and β₁, β₂ are balance parameters, set to β₁ = 2 and β₂ = 1 in the experiments.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the sum of the cross-entropy losses of all features comprises,

L_softmax = −(1/N) ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{j=1}^{C} exp(W_j^T f_i) )

where C denotes the number of classes in the data set, W denotes the weight vector of the corresponding class (y_i being the class label of the i-th sample), N denotes the experimental batch size, and f_i denotes the features in each batch.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: the sum of the triplet losses comprises,

L_triplet = ∑ max( ‖f_a − f_p‖₂ − ‖f_a − f_n‖₂ + α, 0 )

where f_a, f_p and f_n denote the features of the anchor identity, the positive samples and the negative samples, respectively, and α denotes the margin parameter of the triplet loss.
As a preferable scheme of the pedestrian re-identification method based on the spatial reverse attention network: CBAM-Pro denotes the improved convolutional block attention model, in which the efficient channel attention module of ECANet is used to improve CBAM; the channel weight feature vector comprises,

w = σ( C1D_k(C_avg) + C1D_k(C_max) )

where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution operation with convolution kernel size k.
The invention has the following beneficial effects: it extracts multiple types of pedestrian identification features based on the spatial reverse attention network, improving the effectiveness and reliability of re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
fig. 1 is a schematic basic flow chart of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a spatial reverse attention network model of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an experimental result of a channel attention neighborhood parameter k of a pedestrian re-identification method based on a spatial reverse attention network according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
The pedestrian re-identification task aims to find the same pedestrian under different cameras; although the development of deep learning has brought great improvement to pedestrian re-identification, it remains a challenging task. In recent years, attention mechanisms have been widely verified to be highly effective for pedestrian re-identification, but the effect of combining different types of attention mechanisms (such as spatial attention and self-attention) still needs to be explored.
Referring to fig. 1 to 3, an embodiment of the present invention provides a pedestrian re-identification method based on a spatial reverse attention network, including:
s1: collecting the shot pictures and dividing the pictures into a training set and a testing set;
s2: constructing a space reverse attention network model based on Resnet-50, training a convolutional neural network according to a training set, and adding CBAM-Pro;
it should be noted that, the overall loss function for training the convolutional neural network includes,
L=β1Lsoftmax2Ltriplet
wherein L issoftmaxRepresenting the sum of the cross-entropy losses of all features, LtripletRepresenting the sum of triad losses, beta1、β2Represents the equilibrium parameters, and is defined as beta in the experiment1=2,β2=1。
Here the sum of the cross-entropy losses of all features comprises,

L_softmax = −(1/N) ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{j=1}^{C} exp(W_j^T f_i) )

where C denotes the number of classes in the data set, W denotes the weight vector of the corresponding class (y_i being the class label of the i-th sample), N denotes the experimental batch size, and f_i denotes the features in each batch.
The sum of the triplet losses comprises,

L_triplet = ∑ max( ‖f_a − f_p‖₂ − ‖f_a − f_n‖₂ + α, 0 )

where f_a, f_p and f_n denote the features of the anchor identity, the positive samples and the negative samples, respectively, and α denotes the margin parameter of the triplet loss.
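As a hedged sketch, the two loss terms can be combined in PyTorch as below. The batch-hard mining inside the triplet term is an assumption (the text states only the margin form of the triplet loss), the function name total_loss is illustrative, and the margin value 1.2 follows the experimental settings given later.

    import torch
    import torch.nn.functional as F

    def total_loss(logits, feats, labels, margin=1.2, beta1=2.0, beta2=1.0):
        """logits: (N, C) classifier outputs; feats: (N, D) embeddings.
        In the full model the two terms are summed over all branch features."""
        l_softmax = F.cross_entropy(logits, labels)
        dist = torch.cdist(feats, feats)                  # pairwise distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        # Batch-hard mining (assumed): hardest positive / negative per anchor.
        d_pos = (dist * same.float()).max(dim=1).values
        d_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
        l_triplet = F.relu(d_pos - d_neg + margin).mean()
        return beta1 * l_softmax + beta2 * l_triplet      # beta1 = 2, beta2 = 1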
Further, CBAM-Pro denotes an improved convolutional block attention model, in which the efficient channel attention module of ECANet is used to improve CBAM; the channel weight feature vector comprises,

w = σ( C1D_k(C_avg) + C1D_k(C_max) )

where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution operation with convolution kernel size k.
Specifically, the invention improves CBAM to obtain the improved convolutional block attention model CBAM-Pro. CBAM considers both channel and spatial attention; the improvement keeps CBAM's spatial-dimension operation unchanged and focuses on its channel attention module. In fact, the multi-layer perceptron used by CBAM for channel attention is a Squeeze-and-Excitation module that assigns attention weights through two fully connected layers. Considering that ECANet has shown that the Squeeze operation can negatively affect the prediction of channel attention, the invention introduces the efficient channel attention module of ECANet to improve CBAM.
First, the feature map F is passed through two pooling layers to obtain C_avg and C_max, and the multi-layer perceptron is then replaced by the efficient channel attention module to assign attention weights. The efficient channel attention module captures inter-channel information interaction by attending to each channel together with its k adjacent neighbors, so it can be implemented with only a single one-dimensional convolution and achieves a better effect than compressing the channel domain with the Squeeze operation. As with the multi-layer perceptron in CBAM, the efficient channel attention module here shares parameters for C_avg and C_max; the channel weight feature vector is then obtained by element-wise addition and a Sigmoid operation:
w = σ( C1D_k(C_avg) + C1D_k(C_max) )
where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution operation with convolution kernel size k. CBAM-Pro not only retains the excellent characteristics of CBAM but also introduces ECANet's improvement of the Squeeze-and-Excitation module, so its performance is better. The choice of the parameter k is particularly important for the cross-channel information interaction module. As shown in fig. 3, the neighborhood parameter k of CBAM-Pro was tested on the Market-1501 data set. To eliminate the influence of other methods, a single-path global feature model is used as the Baseline, and six groups of settings are compared: the Baseline alone, the Baseline with CBAM added, and the Baseline with CBAM-Pro added under different neighborhood parameters k. The figure shows that adding CBAM to the Baseline improves the mAP/rank-1 indexes; on this basis, adding CBAM-Pro with different parameters k always outperforms adding CBAM, regardless of the value of k, which verifies the effectiveness of the CBAM-Pro improvement. In addition, mAP/rank-1 achieves the best results with k = 7. According to ECANet, the choice of k is related to the model and the number of feature map channels: ResNet-50 performs better with a larger k, the number of feature map channels in the model used here is 512, and ECANet's method for automatically computing k yields k = 5. It is therefore reasonable that, under the combined action of these two factors, the model achieves its optimal result at k = 7.
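A minimal PyTorch sketch of this CBAM-Pro channel attention follows, assuming the standard ECANet formulation: CBAM's shared multi-layer perceptron is replaced by a single shared one-dimensional convolution over the pooled channel descriptors. The class name ECAChannelAttention is illustrative, not from the patent, and the spatial-attention part of CBAM (kept unchanged in CBAM-Pro) is omitted here.

    import torch
    import torch.nn as nn

    class ECAChannelAttention(nn.Module):
        """Channel attention: w = sigmoid(C1D_k(C_avg) + C1D_k(C_max))."""
        def __init__(self, k: int = 7):
            super().__init__()
            # One 1-D convolution, shared (same parameters) by C_avg and C_max.
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            c_avg = x.mean(dim=(2, 3))      # (B, C) global average pooling
            c_max = x.amax(dim=(2, 3))      # (B, C) global max pooling
            # Treat each channel descriptor as a length-C sequence so the 1-D
            # convolution mixes each channel with its k nearest neighbors.
            w = self.conv(c_avg.unsqueeze(1)) + self.conv(c_max.unsqueeze(1))
            w = self.sigmoid(w).view(b, c, 1, 1)
            return x * w                    # reweight the channels of the input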
S3: dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
it should be noted that the forward learning and reverse attention performing process includes:
after the network passes through CBAM-Pro, the network is divided into two branches, one branch is normally trained, namely forward learning is carried out, and then reverse masks are obtained by utilizing the spatial attention of gradient guidance, so that the other branch is reversely noticed.
The generation of the spatial attention map comprises the following steps:
given a feature map F ∈ R^{C×H×W} and its back-propagated gradient tensor G ∈ R^{C×H×W}, where C is the number of channels of the feature map and H × W denotes its size, a weight vector w ∈ R^{C×1} is first generated from G using global average pooling, and the gradient-guided spatial attention is then calculated.
The spatial attention map is further given by,

M = ReLU( ∑_{i=1}^{C} w_i · F^{(i)} )

where w_i is the i-th element of w, F^{(i)} is the sub-map of the i-th channel of F, and high values of M characterize the positions of greatest interest in the feature map.
The spatial reverse mask comprises,

a_i = 0 if m_i > T; a_i = 1 if m_i ≤ T

where a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold.
Specifically, in the training phase of a convolutional neural network, the gradient of the feature map in the back-propagation operation characterizes the sensitivity of different positions of the feature map to the prediction: even a slight change at a position with a large gradient can strongly influence the prediction result, and the network update focuses more on the elements at that position. Based on this, the attention at each spatial pixel location is characterized by the gradient, thereby generating a visualized attention heat map. Given a feature map F ∈ R^{C×H×W} and its back-propagated gradient tensor G ∈ R^{C×H×W}, where C denotes the number of channels of the feature map and H × W denotes its size, a weight vector w ∈ R^{C×1} is first generated from G using global average pooling, and the gradient-guided spatial attention is then calculated:

M = ReLU( ∑_{i=1}^{C} w_i · F^{(i)} )

where w_i is the i-th element of w and F^{(i)} denotes the sub-map of the i-th channel of the feature map F; high values of M characterize the positions of greatest interest in the feature map.
During training, the gradient descent algorithm forces convergence toward the most sensitive positions in the image, so for the recognition task many less sensitive positions in the image are ignored. The gradient-guided attention map is therefore thresholded, and a reverse mask is used to shield the sensitive positions in the original feature map, forcing the network to recognize from the insensitive positions of the image. With the gradient-guided spatial attention map M defined, the spatial reverse mask A can be obtained:

a_i = 0 if m_i > T; a_i = 1 if m_i ≤ T

where a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold. Multiplying the mask A with the corresponding feature map forces the network to learn from the positions where the feature map is insensitive.
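Under the definitions above, the gradient-guided attention map M and the reverse mask A can be sketched as follows. This is an illustrative implementation, not the patent's code; in particular, normalizing M to [0, 1] before thresholding is an assumption about how the threshold T is applied.

    import torch
    import torch.nn.functional as F

    def reverse_mask(feat: torch.Tensor, grad: torch.Tensor, T: float = 0.6):
        """feat, grad: (C, H, W) feature map and its back-propagated gradient.
        Returns the attention map M and the reverse mask A, both (H, W)."""
        # Weight vector w in R^{C x 1}: global average pooling of the gradient.
        w = grad.mean(dim=(1, 2))                          # (C,)
        # Gradient-guided spatial attention: M = ReLU(sum_i w_i * F^(i)).
        m = F.relu((w.view(-1, 1, 1) * feat).sum(dim=0))
        m = m / (m.max() + 1e-12)                          # normalize (assumed)
        # Reverse mask: a_i = 0 where m_i > T (sensitive), 1 elsewhere.
        a = (m <= T).float()
        return m, a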
Further, fig. 2 shows the network structure used in the invention. The network infrastructure adopts ResNet-50, the backbone most frequently used in the pedestrian re-identification task; CBAM-Pro is added after res_conv_3 for attention learning, and the network is divided into two branches after CBAM-Pro. After one branch is trained normally (forward learning), a gradient-guided spatial attention map is used to obtain a reverse mask, so that reverse attention is performed on the other branch.
Specifically, for the forward attention branch, the gradient-guided spatial attention map M is calculated from the feature map F₁ of the res_conv_4 layer by the gradient back-propagation algorithm. Thresholding M and inverting it (0-1 negation) yields the reverse mask A, which is multiplied element-wise with the feature map F₂ of the res_conv_4 layer on the reverse attention branch, shielding the high-attention positions of F₂ and forcing the network to perform attention learning at the less sensitive positions of the reverse attention branch. The global features of the two branches are each processed by the remaining convolution layers and then by global average pooling (GAP), followed by a dimension-reduction layer consisting of a 1 × 1 convolution, batch normalization and ReLU activation, yielding two independent 256-dimensional attention global features. The reverse-mask computation (the dotted part of fig. 2) is used only in the training stage.
In addition, a sub-branch is split off after the res_conv_4 layer in each of the two attention branches to extract local features. To keep suitable receptive fields for the local features, res_conv_5 of the local feature branches does not use a down-sampling operation. After res_conv_5 and the division into the respective local feature maps, global max pooling (GMP) is applied to each local feature map; unlike the GAP used for the global feature maps, GMP is more conducive to mining the most discriminative local features. Each 256-dimensional local feature is obtained through a corresponding dimension-reduction layer, and finally the global and local features are concatenated along the channel dimension to obtain the final pedestrian discrimination feature containing multiple feature types, as sketched below.
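An illustrative sketch of these feature heads: GAP followed by a 1 × 1 convolution, batch normalization and ReLU reduction for the global features, GMP for the local stripes, and channel-wise concatenation into the final descriptor. The 256-dimensional size follows the text; the number of horizontal stripes per local branch (two here) and the bookkeeping of the head modules are assumptions.

    import torch
    import torch.nn as nn

    class ReductionHead(nn.Module):
        """1x1 conv + BN + ReLU reducing a pooled map to a 256-d feature."""
        def __init__(self, in_ch: int = 2048, out_ch: int = 256):
            super().__init__()
            self.reduce = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))

        def forward(self, x):                    # x: (B, in_ch, 1, 1)
            return self.reduce(x).flatten(1)     # (B, out_ch)

    def build_descriptor(g1, g2, l1, l2, heads):
        """g1, g2: global maps of the two branches; l1, l2: local maps;
        heads: list of six ReductionHead modules (an assumed layout)."""
        parts = [heads[0](g1.mean(dim=(2, 3), keepdim=True)),   # GAP, global
                 heads[1](g2.mean(dim=(2, 3), keepdim=True))]
        for i, lmap in enumerate((l1, l2)):
            for j, stripe in enumerate(lmap.chunk(2, dim=2)):   # horizontal stripes
                parts.append(heads[2 + 2 * i + j](
                    stripe.amax(dim=(2, 3), keepdim=True)))     # GMP, local
        return torch.cat(parts, dim=1)           # final multi-type descriptor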
S4: and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and performing re-identification verification on the pedestrian identification features by using a test set to complete re-identification of pedestrians.
The invention improves the convolutional block attention model and, on this basis, combines it with the spatial reverse attention network model to obtain a joint attention module capable of extracting different attention features; local branches are introduced for the pedestrian re-identification task based on this joint attention module.
Example 2
In order to verify the technical effects of the method, this embodiment carries out comparison tests between traditional technical schemes and the method of the invention, and compares the test results by means of scientific demonstration to verify the real effect of the method.
This embodiment performs experiments on the three data sets most frequently used for the pedestrian re-identification task, namely Market-1501, DukeMTMC-reID and CUHK03, and evaluates the experimental results using the first-match success rate (rank-1) and the mean average precision (mAP).
Market-1501 contains 1501 pedestrians of different identities captured by 6 cameras; 32668 single-pedestrian pictures were generated by a DPM detector and divided into non-overlapping training/test sets. The training set contains 12936 pictures of 751 identities; the test set contains 3368 query pictures and 19732 gallery pictures from 750 identities, and the detection boxes of the query pictures were drawn manually to ensure the accuracy of the test results. DukeMTMC-reID is the pedestrian re-identification subset of the DukeMTMC data set, acquired with 8 cameras and containing 36411 pictures of 1812 pedestrian identities. Of these, 1404 pedestrians appear under more than 2 cameras, and their pictures were divided by random sampling into a training set and a test set of 702 identities each; the remaining 408 pedestrians, each appearing under only 1 camera, had their pictures added to the test gallery as distractors. The training set contains 16522 pictures, the gallery 17661 pictures, and the query set 2228 pictures. The CUHK03 data set contains 14097 pictures of 1467 pedestrians, each identity captured by 2 different cameras; 767 identities are used for training and the other 700 for testing. CUHK03 provides both manual labeling and automatic detector labeling, and this embodiment is tested under both labeled data sets.
The local features in the spatial reverse attention network are extracted by direct division. In the training stage the input picture is resized to 384 × 128 and data enhancement is performed with random horizontal flipping, normalization and random erasing; in testing, pictures are likewise resized to 384 × 128 and only normalized. The network infrastructure adopts ResNet-50 pre-trained on the ImageNet data set, sharing parameters up to res_conv_3, after which CBAM-Pro is introduced into the model. All branches split off in the network are trained in parallel, each branch after res_conv_3 being initialized with the pre-trained weights of the corresponding layers after res_conv_3 of ResNet-50. The experimental batch size is set to 32: P = 8 identities are randomly sampled from the training set and K = 4 pictures are sampled per identity. The network is trained with the Adam optimizer at an initial learning rate of 3 × 10⁻⁴ for a total of 250 epochs; at epochs 150 and 230 the learning rate is decreased to 3 × 10⁻⁵ and 3 × 10⁻⁶, respectively, and the margin of the triplet loss is set to 1.2. The reverse attention operation is used only in the training stage; in the test stage the two branches extract features directly. A configuration sketch follows.
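The sketch below mirrors the training configuration just described; the normalization statistics (ImageNet values) are an assumption, the model object is a placeholder, and the MultiStepLR factor of 0.1 reproduces the stated decay from 3 × 10⁻⁴ to 3 × 10⁻⁵ and then 3 × 10⁻⁶ at epochs 150 and 230.

    import torch
    from torchvision import transforms

    train_tf = transforms.Compose([
        transforms.Resize((384, 128)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet (assumed)
                             std=[0.229, 0.224, 0.225]),
        transforms.RandomErasing()])                       # random erasing

    def make_optimizer(model: torch.nn.Module):
        optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
        scheduler = torch.optim.lr_scheduler.MultiStepLR(
            optimizer, milestones=[150, 230], gamma=0.1)   # 3e-4 -> 3e-5 -> 3e-6
        return optimizer, scheduler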
This embodiment compares the results of the method of the invention on the three benchmark data sets with recent methods, including methods using attention mechanisms, methods based on dividing local features, and other advanced methods. To ensure the fairness of the experimental comparison, neither this embodiment nor the compared methods use re-ranking. The experimental results are shown in the tables below.
Table 1: Market-1501 data set experimental results performance comparison (%) table.
Table 2: DukeMTMC-reID data set experimental results performance comparison (%) table.
Table 3: CUHK03 data set experimental results performance comparison (%) table.
Tables 1 and 2 compare performance on the Market-1501 and DukeMTMC-reID data sets, respectively. The method of the invention is superior to other methods using attention mechanisms: against the strong Auto-ReID, mAP is improved by 2.72%/3.63% and rank-1 by 0.80%/0.70% on the two data sets. Among methods based on dividing local features, MGN still holds the advantage; the method's rank-1 on Market-1501 is only slightly below MGN, while its mAP and rank-1 on DukeMTMC-reID are both better than MGN, and on the mAP index in particular the method improves on MGN by 0.92%/0.33% on the two data sets. Compared with other advanced methods the results are also excellent: relative to BDB, mAP on the two data sets is improved by 1.12%/2.73%, and rank-1 on DukeMTMC-reID by 0.20%. Table 3 compares performance on CUHK03. Existing methods such as Auto-ReID and BDB achieve excellent results on both labelings of CUHK03, and EANet performs well on the detector labeling; under both labelings, the results of the method of the invention are superior to these methods. Compared with the best-performing BDB, mAP/rank-1 is improved by 0.82%/0.10% on the manual labeling and by 0.11%/0.67% on the detector labeling. It is worth noting that MGN performs excellently on Market-1501 and DukeMTMC-reID but poorly on CUHK03, whereas the method of the invention performs well on all three data sets.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (9)

1. A pedestrian re-identification method based on a spatial reverse attention network, characterized by comprising the following steps:
collecting the shot pictures and dividing the pictures into a training set and a testing set;
constructing a spatial reverse attention network model based on ResNet-50, training the convolutional neural network according to the training set, and adding CBAM-Pro;
dividing the network into two branches according to the added CBAM-Pro, simultaneously executing forward learning and reverse attention, and extracting forward and reverse global features and local features;
and connecting the extracted features according to the channel dimensions to obtain pedestrian identification features containing various types of features, and performing re-identification verification on the pedestrian identification features by using the test set to complete re-identification of pedestrians.
2. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 1, wherein the process of performing forward learning and reverse attention comprises:
after the network passes through CBAM-Pro, it is divided into two branches; one branch is trained normally, namely forward learning is performed, and the gradient-guided spatial attention is then used to obtain a reverse mask, so that reverse attention is applied to the other branch.
3. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 2, wherein the generation of the spatial attention map comprises:
given a feature map F ∈ R^{C×H×W} and its back-propagated gradient tensor G ∈ R^{C×H×W}, where C is the number of channels of the feature map and H × W denotes its size, a weight vector w ∈ R^{C×1} is first generated from G using global average pooling, and the gradient-guided spatial attention is then calculated.
4. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 2 or 3, wherein the spatial attention map further comprises,
M = ReLU( ∑_{i=1}^{C} w_i · F^{(i)} )
where w_i is the i-th element of w, F^{(i)} is the sub-map of the i-th channel of F, and high values of M characterize the positions of greatest interest in the feature map.
5. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 1 or 2, wherein the spatial reverse mask comprises,
a_i = 0 if m_i > T; a_i = 1 if m_i ≤ T
where a_i and m_i denote the elements at pixel position i of A and M, respectively, and T denotes the set spatial attention threshold.
6. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 5, wherein the overall loss function for training the convolutional neural network comprises,
L = β₁ · L_softmax + β₂ · L_triplet
where L_softmax denotes the sum of the cross-entropy losses of all features, L_triplet denotes the sum of the triplet losses, and β₁, β₂ are balance parameters, set to β₁ = 2 and β₂ = 1 in the experiments.
7. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 6, wherein the sum of the cross-entropy losses of all features comprises,
L_softmax = −(1/N) ∑_{i=1}^{N} log( exp(W_{y_i}^T f_i) / ∑_{j=1}^{C} exp(W_j^T f_i) )
where C denotes the number of classes in the data set, W denotes the weight vector of the corresponding class (y_i being the class label of the i-th sample), N denotes the experimental batch size, and f_i denotes the features in each batch.
8. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 6, wherein the sum of the triplet losses comprises,
L_triplet = ∑ max( ‖f_a − f_p‖₂ − ‖f_a − f_n‖₂ + α, 0 )
where f_a, f_p and f_n denote the features of the anchor identity, the positive samples and the negative samples, respectively, and α denotes the margin parameter of the triplet loss.
9. The pedestrian re-identification method based on the spatial reverse attention network as claimed in claim 8, wherein CBAM-Pro denotes the improved convolutional block attention model, in which:
the efficient channel attention module of ECANet is used to improve CBAM, the channel weight feature vector comprising,
w = σ( C1D_k(C_avg) + C1D_k(C_max) )
where σ denotes the Sigmoid activation function and C1D_k denotes a one-dimensional convolution operation with convolution kernel size k.
CN202110146335.4A 2021-02-03 2021-02-03 Pedestrian re-identification method based on spatial reverse attention network Active CN112836637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146335.4A CN112836637B (en) Pedestrian re-identification method based on spatial reverse attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110146335.4A CN112836637B (en) Pedestrian re-identification method based on spatial reverse attention network

Publications (2)

Publication Number Publication Date
CN112836637A (en) 2021-05-25
CN112836637B (en) 2022-06-14

Family

ID=75931804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146335.4A Active CN112836637B (en) Pedestrian re-identification method based on spatial reverse attention network

Country Status (1)

Country Link
CN (1) CN112836637B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN111325111A (en) * 2020-01-23 2020-06-23 同济大学 Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111507217A (en) * 2020-04-08 2020-08-07 南京邮电大学 Pedestrian re-identification method based on local resolution feature fusion
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning
CN111368815A (en) * 2020-05-28 2020-07-03 之江实验室 Pedestrian re-identification method based on multi-component self-attention mechanism
CN111881780A (en) * 2020-07-08 2020-11-03 上海蠡图信息科技有限公司 Pedestrian re-identification method based on multi-layer fusion and alignment division
CN111898736A (en) * 2020-07-23 2020-11-06 武汉大学 Efficient pedestrian re-identification method based on attribute perception
CN111931624A (en) * 2020-08-03 2020-11-13 重庆邮电大学 Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAIYI SHU: "AMNet: Convolutional Neural Network Embedded with Attention Mechanism for Semantic Segmentation", HPCCT 2019: Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, June 2019, pp. 261-266, https://doi.org/10.1145/3341069.3342988 *
SHANSHAN WANG et al.: "Dual Attentive Features for Person Re-identification", 2019 5th International Conference on Control, Automation and Robotics (ICCAR), 19-22 April 2019 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657355A (en) * 2021-10-20 2021-11-16 之江实验室 Global and local perception pedestrian re-identification method fusing segmentation information
CN115393788A (en) * 2022-08-03 2022-11-25 华中农业大学 Multi-scale monitoring pedestrian re-identification method based on global information attention enhancement
CN115862073A (en) * 2023-02-27 2023-03-28 国网江西省电力有限公司电力科学研究院 Transformer substation harmful bird species target detection and identification method based on machine vision

Also Published As

Publication number Publication date
CN112836637B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
Zhuge et al. Salient object detection via integrity learning
CN109543602B (en) Pedestrian re-identification method based on multi-view image feature decomposition
CN112836637B (en) Pedestrian re-identification method based on space reverse attention network
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
Li et al. HAR-Net: Joint learning of hybrid attention for single-stage object detection
Faraki et al. Log‐Euclidean bag of words for human action recognition
Dou et al. Metagait: Learning to learn an omni sample adaptive representation for gait recognition
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN112070044A (en) Video object classification method and device
CN115482508A (en) Reloading pedestrian re-identification method, reloading pedestrian re-identification device, reloading pedestrian re-identification equipment and computer-storable medium
CN112580480A (en) Hyperspectral remote sensing image classification method and device
Li et al. Egocentric action recognition by automatic relation modeling
Zhou et al. Exploiting visual context semantics for sound source localization
Ramesh Babu et al. A novel framework design for semantic based image retrieval as a cyber forensic tool
Zhang et al. Video action recognition with Key-detail Motion Capturing based on motion spectrum analysis and multiscale feature fusion
Dong et al. Scene-oriented hierarchical classification of blurry and noisy images
CN111582057B (en) Face verification method based on local receptive field
CN116229580A (en) Pedestrian re-identification method based on multi-granularity pyramid intersection network
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
Wang et al. Self-trained video anomaly detection based on teacher-student model
Luo et al. An efficient feature pyramid attention network for person re-identification
Sathiyaprasad et al. Content based video retrieval using Improved gray level Co-occurrence matrix with region-based pre convoluted neural network–RPCNN
Raboh et al. Learning latent scene-graph representations for referring relationships
Lu et al. Complementary pseudolabel based on global-and-channel information for unsupervised person reidentification
Chen et al. Spatial mask ConvLSTM network and intra-class joint training method for human action recognition in video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant