CN113792669A - Pedestrian re-identification baseline method based on hierarchical self-attention network - Google Patents
- Publication number
- CN113792669A (application number CN202111087471.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- pedestrian
- swin
- loss
- block
- Prior art date
- 2021-09-16
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F17/10—Complex mathematical operations
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- Y02T10/40—Engine management systems
Abstract
The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, belonging to the field of computer vision. The method creatively introduces the Swin Transformer as a backbone network into the field of pedestrian re-identification, uses a weighted sum of ID loss and Circle loss as the loss function, and improves the feature-extraction capability while keeping the structure simple through effective data preprocessing and reasonable parameter tuning. Compared with the traditional ResNet-based baseline method, the method significantly improves the pedestrian re-identification effect.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification baseline method based on a hierarchical self-attention network.
Background
Pedestrian re-identification uses computer vision techniques to identify a specific pedestrian across cameras: given a surveillance image of a pedestrian, it retrieves images of that pedestrian captured by other devices. Identifying specific pedestrians is of great significance for violation judgment, criminal investigation, danger early warning, and the like.
A good baseline method should achieve strong results while keeping the parameter count low. Existing pedestrian re-identification baseline methods are based on ResNet and are constrained by the limitations of convolutional neural networks in feature extraction, so ResNet-based baselines cannot achieve ideal results.
With the progress of research, Transformers are gradually being applied to the field of computer vision. However, existing Transformer-based pedestrian re-identification methods suffer from problems such as excessive computation and a single-scale feature receptive field.
Disclosure of Invention
The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, which aims to solve the problems described in the background art and to achieve good results with a simple structure.
The technical scheme of the invention is as follows:
a pedestrian re-identification baseline method based on a hierarchical self-attention network comprises the following specific steps:
step one, preprocessing data;
Suppose there are N different pedestrians in total, and each pedestrian i has M_i images, where M_i > 1; M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian. For the i-th pedestrian, M_i - 1 images are used as the training set and 1 image as the validation set, with i used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) using a bicubic interpolation algorithm to scale the image to (H, W, C) as the input image, where H denotes the height of the image, W the width, and C = 3 the number of channels; the specific steps are as follows:
1.1.1) constructing the Bicubic function:

W(x) = (a+2)|x|^3 - (a+3)|x|^2 + 1, for |x| <= 1
W(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a, for 1 < |x| < 2
W(x) = 0, otherwise (1)

where a is a coefficient that controls the shape of the Bicubic curve;

1.1.2) the interpolation formula is as follows:

f(x, y) = Σ_{i=0}^{3} Σ_{j=0}^{3} f(x_i, y_j) · W(x - x_i) · W(y - y_j) (2)

where (x, y) denotes the pixel to be interpolated; for each pixel, the 4 × 4 pixels near it are used in the bicubic interpolation operation.
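By way of non-limiting illustration, a minimal Python sketch of this scaling step follows; the use of the Pillow library and the default target size are assumptions (the embodiment scales to (224, 224, 3)), and Pillow's BICUBIC filter applies the 4 × 4-neighbourhood kernel of equations (1)-(2) with a = -0.5.

```python
# Sketch of step 1.1): bicubic scaling of an input image.
# Pillow and the default target size are assumptions, not part of the patent text.
from PIL import Image

def resize_bicubic(path: str, size=(224, 224)) -> Image.Image:
    img = Image.open(path).convert("RGB")  # force C = 3 channels
    # Pillow's BICUBIC filter uses the standard 4x4 kernel of eqs. (1)-(2).
    return img.resize(size, Image.BICUBIC)
```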
1.2) performing data augmentation with a random erasing algorithm;
1.2.1) setting a threshold probability p and generating a random number p1 in [0, 1]; when p1 > p, the input image is left unprocessed; otherwise, the input image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
He=Rand(H/8,H/4) (4)
We=Rand(W/8,W/4) (5)
Se=He×We (6)
where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) determining an erasing coordinate;
xe=Rand(0,H-He) (7)
ye=Rand(0,W-We) (8)
where x_e denotes the x (row) coordinate of the top-left corner of the erased region and y_e its y (column) coordinate.
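A minimal sketch of this random-erasing step is given below, assuming the image is a NumPy array of shape (H, W, C); filling the erased region with 0 is an assumption, since the patent does not state the fill value.

```python
import random
import numpy as np

def random_erase(img: np.ndarray, p: float = 0.5) -> np.ndarray:
    H, W = img.shape[:2]
    p1 = random.random()                    # eq. (3): p1 = Rand(0, 1)
    if p1 > p:                              # leave the input image unprocessed
        return img
    He = random.randint(H // 8, H // 4)     # eq. (4): erase height H_e
    We = random.randint(W // 8, W // 4)     # eq. (5): erase width  W_e
    xe = random.randint(0, H - He)          # eq. (7): top-left row x_e
    ye = random.randint(0, W - We)          # eq. (8): top-left column y_e
    out = img.copy()
    out[xe:xe + He, ye:ye + We] = 0         # erased area S_e = H_e * W_e, eq. (6)
    return out
```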
Step two, inputting the preprocessed image into the hierarchical self-attention network, namely a Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, wherein the 2-4 stages have the same network structure, and the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the top-left corner of the image, the input image is divided into a set of non-overlapping 4 × 4 image patches, i.e., patches of shape (4, 4, 3), where the number of patches N_patch is:
Npatch=(H/4)×(W/4) (9)
2.1.2) linear embedding; each image patch is flattened and projected to a C-dimensional vector through a fully connected layer, and the resulting tokens are fed into two consecutive Swin blocks;
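The following PyTorch sketch illustrates 2.1.1)-2.1.2); the stride-4 convolution is a standard, mathematically equivalent realisation of "partition into 4 × 4 patches, flatten, fully connected layer", and C = 128 is taken from the embodiment.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Sketch of block division (2.1.1) and linear embedding (2.1.2)."""
    def __init__(self, patch: int = 4, in_ch: int = 3, dim: int = 128):
        super().__init__()
        # Equivalent to splitting into non-overlapping patch x patch blocks
        # and projecting each flattened block with a linear layer.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 3, H, W)
        x = self.proj(x)                                  # (B, dim, H/4, W/4)
        return x.flatten(2).transpose(1, 2)               # (B, N_patch, dim), eq. (9)
```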
2.1.3) extracting features with Swin blocks;
The Swin blocks comprise Swin block 1 and Swin block 2. Swin block 1 consists mainly of a window-based multi-head self-attention module and a multilayer perceptron (MLP); each of the two modules is preceded by layer normalization and followed by a residual connection. Swin block 2 consists mainly of a shifted-window multi-head self-attention module and an MLP, likewise with layer normalization before and a residual connection after each module;
After extraction by the Swin blocks, key feature information about the pedestrian's head, hands, actions, and the like is obtained, and a feature map of shape (H/4, W/4, C) is output;
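A simplified PyTorch sketch of the Swin block follows; for brevity it omits the cyclic shift, attention mask, and relative position bias of the full shifted-window design, keeping only the pre-normalized windowed self-attention and MLP structure with residual connections described above.

```python
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """Simplified Swin block (2.1.3): pre-norm windowed self-attention and MLP,
    each with a residual connection; shift masking and relative position bias
    of the real Swin Transformer are omitted."""
    def __init__(self, dim: int = 128, heads: int = 4, window: int = 7):
        super().__init__()
        self.w = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, Hp: int, Wp: int) -> torch.Tensor:
        B, N, D = x.shape
        w = self.w
        # partition the (Hp, Wp) token grid into non-overlapping w x w windows
        win = x.view(B, Hp // w, w, Wp // w, w, D)
        win = win.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, D)
        h = self.norm1(win)
        win = win + self.attn(h, h, h, need_weights=False)[0]  # W-MSA + residual
        win = win + self.mlp(self.norm2(win))                  # MLP + residual
        # reverse the window partition back to (B, N, D)
        win = win.view(B, Hp // w, Wp // w, w, w, D)
        return win.permute(0, 1, 3, 2, 4, 5).reshape(B, N, D)
```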
2.2) stage 2;
2.2.1) block fusion; the input features are merged in neighbouring 2 × 2 groups, a fully connected layer adjusts the feature dimension to twice the original, and a feature map of shape (H/8, W/8, 2C) is output;
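A sketch of this block-fusion (patch-merging) step, assuming the tokens lie on an (Hp, Wp) grid; the layer normalization before the linear reduction follows the published Swin Transformer design.

```python
import torch
import torch.nn as nn

class PatchMergingSketch(nn.Module):
    """Sketch of 2.2.1): concatenate each 2x2 group of neighbouring tokens
    (giving 4C channels) and project to 2C, halving height and width."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x: torch.Tensor, Hp: int, Wp: int) -> torch.Tensor:
        B, N, D = x.shape
        g = x.view(B, Hp, Wp, D)
        g = torch.cat([g[:, 0::2, 0::2], g[:, 1::2, 0::2],
                       g[:, 0::2, 1::2], g[:, 1::2, 1::2]], dim=-1)
        g = g.view(B, (Hp // 2) * (Wp // 2), 4 * D)
        return self.reduction(self.norm(g))  # (B, N/4, 2C)
```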
2.2.2) extracting features with Swin blocks, identical in structure to the Swin blocks in 2.1.3); after Swin-block processing, a key feature map of shape (H/8, W/8, 2C) is output;
2.3) stages 3-4;
The network structures of stage 3 and stage 4 are identical to that of stage 2; after processing, they output feature maps of shape (H/16, W/16, 4C) and (H/32, W/32, 8C), respectively;
2.4) global average pooling layer and fully connected layer; global average pooling is applied to the feature map output by stage 4 to obtain a vector of length 8C, and the features are mapped to N classes through a fully connected layer, where N is the number of pedestrian classes in the data set of step one.
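A sketch of the pooling-plus-classifier head described in 2.4); the defaults 8C = 1024 and N = 751 are the embodiment's values and serve here only as placeholders.

```python
import torch
import torch.nn as nn

class ReIDHead(nn.Module):
    """Sketch of 2.4): global average pooling over the stage-4 tokens,
    then a fully connected layer mapping to the N identity classes."""
    def __init__(self, dim: int = 1024, num_ids: int = 751):
        super().__init__()
        self.fc = nn.Linear(dim, num_ids)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, tokens, 8C)
        return self.fc(x.mean(dim=1))                     # (B, N) identity logits
```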
Step three, calculating the loss function, back-propagating, and updating the network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
Lreid=w1Lid+w2Lcircle (10)
where w_1 and w_2 denote the weights of the ID loss and the Circle loss, respectively; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) the ID loss is the cross-entropy loss:

L_id = -(1/n) Σ_{i=1}^{n} log p(y_i|x_i) (11)

where n denotes the number of samples in each training batch, and p(y_i|x_i) denotes the conditional probability that input image x_i is assigned its label y_i;
3.3) the Circle loss formula is as follows:

L_circle = log[1 + Σ_j exp(γ·a_n^j·(S_n^j - Δ_n)) · Σ_i exp(-γ·a_p^i·(S_p^i - Δ_p))] (12)

Δ_n = m (13)

Δ_p = 1 - m (14)

where N denotes the number of different pedestrian classes and M_i the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p the intra-class similarity score matrix; a_n and a_p are non-negative weighting matrices for S_n and S_p, respectively:

a_n = [S_n + m]_+ (15)

a_p = [1 + m - S_p]_+ (16)

where [·]_+ means negative values are clipped to zero;
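The sketch below combines the two losses per equation (10); the Circle-loss body follows the published formulation of Sun et al. (CVPR 2020) that equations (12)-(16) describe. Treating S_p and S_n as 1-D score vectors for a single anchor, and the default γ, m, w_1, w_2 values, are assumptions based on the embodiment.

```python
import torch
import torch.nn.functional as F

def circle_loss(sp: torch.Tensor, sn: torch.Tensor,
                gamma: float = 32.0, m: float = 0.25) -> torch.Tensor:
    """Circle loss for one anchor; sp/sn are 1-D intra-/inter-class scores."""
    ap = torch.clamp_min(1 + m - sp, 0.0)      # eq. (16): a_p = [1 + m - S_p]_+
    an = torch.clamp_min(sn + m, 0.0)          # eq. (15): a_n = [S_n + m]_+
    logit_p = -gamma * ap * (sp - (1 - m))     # Delta_p = 1 - m, eq. (14)
    logit_n = gamma * an * (sn - m)            # Delta_n = m,     eq. (13)
    # eq. (12): log(1 + sum exp(...) * sum exp(...)) via softplus + logsumexp
    return F.softplus(torch.logsumexp(logit_n, 0) + torch.logsumexp(logit_p, 0))

def reid_loss(logits, labels, sp, sn, w1: float = 0.4, w2: float = 0.6):
    """Eq. (10): L_reid = w1 * L_id + w2 * L_circle, with L_id the
    cross-entropy ID loss of eq. (11)."""
    return w1 * F.cross_entropy(logits, labels) + w2 * circle_loss(sp, sn)
```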
3.4) setting hyper-parameters and training the network; a warm-up learning rate is adopted, starting at r and gradually increasing to ten times r over the first 10 training epochs; the optimizer is a stochastic gradient descent algorithm augmented with weight decay d_1 and momentum d_2; back propagation is performed with the configured optimizer and learning rate, combined with the loss values calculated in 3.1)-3.3), and the network parameters are updated.
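A sketch of the training configuration in 3.4); the concrete values of r, d_1, and d_2 below are placeholders, since the patent leaves them symbolic, and holding the learning rate at 10r after warm-up is an assumption.

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 751)      # stand-in for the Swin backbone plus head
r, d1, d2 = 1e-4, 5e-4, 0.9       # placeholder values for r, weight decay, momentum

optimizer = torch.optim.SGD(model.parameters(), lr=r,
                            weight_decay=d1, momentum=d2)
# Warm-up: scale the learning rate linearly from r to 10r over the first
# 10 epochs, then keep it at 10r.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(epoch + 1, 10))
```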
Step four, carrying out pedestrian re-identification matching;
The image of the pedestrian to be identified is scaled and input into the Swin Transformer neural network of step two; the output is processed with softmax to obtain N probability values, each corresponding to the probability that the pedestrian belongs to one class, and the class with the largest probability value is the identity of the pedestrian.
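A sketch of the matching step, assuming a trained `model` that maps a resized image tensor to the N identity logits:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def identify(model: nn.Module, img: torch.Tensor) -> int:
    """Step four: softmax over the N class logits; the argmax is the identity."""
    probs = torch.softmax(model(img.unsqueeze(0)), dim=1)  # (1, N)
    return int(probs.argmax(dim=1))                        # predicted pedestrian ID
```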
The beneficial effects of the invention are as follows: the invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, which creatively introduces the Swin Transformer as a backbone network into the field of pedestrian re-identification, uses a weighted sum of ID loss and Circle loss as the loss function, and greatly improves the training effect while keeping the structure simple through effective data preprocessing and reasonable parameter tuning.
Drawings
FIG. 1 is a general improved concept diagram of the present invention;
FIG. 2 is a model diagram of a baseline pedestrian re-identification method based on a hierarchical self-attention network according to the present invention;
fig. 3 is a schematic diagram of the Swin block structure.
Detailed Description
The embodiments of the invention are described in detail below with reference to the accompanying drawings, giving a detailed implementation and concrete operating procedure. The data set of the experiment is the Market1501 data set, collected at a university; the training set comprises 751 identities and 12936 images, and the test set comprises 750 identities and 19732 images.
Fig. 1 is a general improvement idea diagram of the present invention, and fig. 2 is a model diagram of a pedestrian re-identification baseline method based on a hierarchical self-attention network according to the present invention, which includes the following specific steps:
step one, preprocessing data;
There are 751 identities in the training set, so N = 751; each pedestrian i has M_i images, where M_i > 1, M_i denoting the number of images in the class of the i-th pedestrian and i the ID number of each pedestrian. For the i-th pedestrian, M_i - 1 images are used as the training set and 1 image as the validation set, with i used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) scaling the image to (224, 224, 3) using a bicubic interpolation algorithm, where H = 224 denotes the height of the image, W = 224 the width, and C = 3 the number of channels; the specific steps are as follows:
1.1.1) constructing the Bicubic function of equation (1), with a = -0.5, the coefficient that controls the shape of the Bicubic curve;
1.1.2) interpolating according to equation (2): for each pixel (x, y) to be interpolated, the 4 × 4 pixels near it are used in the bicubic interpolation operation.
1.2) performing data augmentation with a random erasing algorithm;
1.2.1) setting the threshold probability p to 0.5 and generating a random number p1 in [0, 1]; when p1 > p, the image is left unprocessed; otherwise, the image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
He=Rand(H/8,H/4) (4)
We=Rand(W/8,W/4) (5)
Se=He×We (6)
where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) determining an erasing coordinate;
xe=Rand(0,H-He) (7)
ye=Rand(0,W-We) (8)
where x_e denotes the x (row) coordinate of the top-left corner of the erased region and y_e its y (column) coordinate.
Step two, inputting the preprocessed image into the hierarchical self-attention network, namely a Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, wherein the 2-4 stages have the same network structure, and the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the top-left corner of the image, the input image is divided into a set of non-overlapping 4 × 4 image patches, i.e., patches of shape (4, 4, 3), where the number of patches N_patch is:
Npatch=(H/4)×(W/4) (9)
where H and W denote the height and width of the input image, respectively; here N_patch = 56 × 56;
2.1.2) linear embedding; each image patch is flattened and projected to a 128-dimensional vector through a fully connected layer, and the resulting tokens are fed into two consecutive Swin blocks;
2.1.3) extracting features with Swin blocks;
As shown in fig. 3, the Swin blocks comprise Swin block 1 and Swin block 2. Swin block 1 consists mainly of a window-based multi-head self-attention module and a multilayer perceptron (MLP); each of the two modules is preceded by layer normalization and followed by a residual connection. Swin block 2 consists mainly of a shifted-window multi-head self-attention module and an MLP, likewise with layer normalization before and a residual connection after each module;
After extraction by the Swin blocks, key feature information about the pedestrian's head, hands, actions, and the like is obtained; a feature map of shape (56, 56, 128) is output and passed to the next module;
2.2) stage 2;
2.2.1) block fusion; the input features are merged in neighbouring 2 × 2 groups, a fully connected layer adjusts the feature dimension to twice the original, and a feature map of shape (28, 28, 256) is output;
2.2.2) extracting features with Swin blocks, identical in structure to 2.1.3); after Swin-block processing, a feature map of shape (28, 28, 256) is output;
2.3) stages 3-4;
The structures of stage 3 and stage 4 are identical to that of stage 2; after processing, they output feature maps of shape (14, 14, 512) and (7, 7, 1024), respectively;
2.4) global average pooling layer and fully connected layer; global average pooling is applied to the feature map output by stage 4 to obtain a vector of length 1024, and the features are mapped to 751 classes through a fully connected layer, 751 being the number of pedestrian classes in the data set used in this embodiment.
Step three, calculating the loss function, back-propagating, and updating the network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
Lreid=w1Lid+w2Lcircle (10)
where w_1 and w_2 denote the weights of the ID loss and the Circle loss, with w_1 = 0.4 and w_2 = 0.6; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) the ID loss follows equation (11), where n denotes the number of samples in each training batch, 16 in this embodiment, and p(y_i|x_i) denotes the conditional probability that input image x_i is assigned its label y_i;
3.3) the Circle loss follows equations (12)-(16), with

Δ_n = m (13)

Δ_p = 1 - m (14)

where N denotes the number of different pedestrian classes, 751 in this embodiment; M_i denotes the number of images in the class of the i-th pedestrian; γ is the scale parameter, 32 in this embodiment; m controls the strictness of the optimization, 0.25 in this embodiment; S_n is the inter-class similarity score matrix, S_p the intra-class similarity score matrix, and a_n and a_p the corresponding non-negative weighting matrices;
3.4) the hyper-parameters used for training the neural network are shown in Table 1; back propagation is performed with the configured optimizer and learning rate, combined with the loss values calculated in 3.1)-3.3), to update the network parameters.
TABLE 1 hyper-parameter settings for training networks
Step four, carrying out pedestrian re-identification matching;
The image of the pedestrian to be identified is scaled and input into the Swin Transformer neural network of step two; the output is processed with softmax to obtain 751 probability values, each corresponding to the probability that the pedestrian belongs to one class, and the class with the largest probability value is the identity of the pedestrian.
In this embodiment, the pedestrian re-identification effect is tested on the Market1501 data set and compared with existing pedestrian re-identification baseline models based on global features, as shown in Table 2:
table 2 comparison of results with existing baseline model
The comparison of experimental results shows that the proposed baseline model effectively improves the Rank-1 and mAP metrics of pedestrian re-identification, which proves the effectiveness of the method and is of great significance for the practical application of pedestrian re-identification; in addition, the network structure is simple and highly extensible, providing a valuable reference for the design of future pedestrian re-identification methods.
Claims (1)
1. A pedestrian re-identification baseline method based on a hierarchical self-attention network is characterized by comprising the following steps:
step one, preprocessing data;
setting a total of N different pedestrians, each pedestrian i having M_i images, where M_i > 1, M_i denoting the number of images in the class of the i-th pedestrian and i denoting the ID number of each pedestrian; for the i-th pedestrian, M_i - 1 images are used as the training set and 1 image as the validation set, with i used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) using a bicubic interpolation algorithm to scale the image to (H, W, C) as the input image, where H denotes the height of the image, W the width, and C = 3 the number of channels; the specific steps are as follows:
1.1.1) constructing the Bicubic function:

W(x) = (a+2)|x|^3 - (a+3)|x|^2 + 1, for |x| <= 1
W(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a, for 1 < |x| < 2
W(x) = 0, otherwise (1)

where a is a coefficient that controls the shape of the Bicubic curve;

1.1.2) the interpolation formula is as follows:

f(x, y) = Σ_{i=0}^{3} Σ_{j=0}^{3} f(x_i, y_j) · W(x - x_i) · W(y - y_j) (2)

where (x, y) denotes the pixel to be interpolated; for each pixel, the 4 × 4 pixels near it are used in the bicubic interpolation operation;
1.2) performing data augmentation with a random erasing algorithm;
1.2.1) setting a threshold probability p and generating a random number p1 in [0, 1]; when p1 > p, the input image is left unprocessed; otherwise, the input image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
He=Rand(H/8,H/4) (4)
We=Rand(W/8,W/4) (5)
Se=He×We (6)
where H denotes the height of the input image and W its width; H_e denotes the height of the erased region, W_e its width, and S_e its area;
1.2.3) determining an erasing coordinate;
xe=Rand(0,H-He) (7)
ye=Rand(0,W-We) (8)
where x_e denotes the x (row) coordinate of the top-left corner of the erased region and y_e its y (column) coordinate;
step two, inputting the preprocessed image into the hierarchical self-attention network, namely a Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, where stages 2-4 share the same network structure; the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the top-left corner of the image, the input image is divided into a set of non-overlapping 4 × 4 image patches, i.e., patches of shape (4, 4, 3), where the number of patches N_patch is:
Npatch=(H/4)×(W/4) (9)
2.1.2) linear embedding; each image patch is flattened and projected to a C-dimensional vector through a fully connected layer, and the resulting tokens are fed into two consecutive Swin blocks;
2.1.3) extracting features with Swin blocks;
as shown in fig. 3, the Swin blocks comprise Swin block 1 and Swin block 2; Swin block 1 consists mainly of a window-based multi-head self-attention module and a multilayer perceptron (MLP), each of the two modules being preceded by layer normalization and followed by a residual connection; Swin block 2 consists mainly of a shifted-window multi-head self-attention module and an MLP, likewise with layer normalization before and a residual connection after each module;
after extraction by the Swin blocks, key feature information about the pedestrian's head, hands, and actions is obtained, and a feature map of shape (H/4, W/4, C) is output;
2.2) stage 2;
2.2.1) block fusion; the input features are merged in neighbouring 2 × 2 groups, a fully connected layer adjusts the feature dimension to twice the original, and a feature map of shape (H/8, W/8, 2C) is output;
2.2.2) extracting features with Swin blocks, identical in structure to the Swin blocks in 2.1.3); after Swin-block processing, a key feature map of shape (H/8, W/8, 2C) is output;
2.3) stages 3-4;
the network structures of stage 3 and stage 4 are identical to that of stage 2; after processing, they output feature maps of shape (H/16, W/16, 4C) and (H/32, W/32, 8C), respectively;
2.4) global average pooling layer and fully connected layer; global average pooling is applied to the feature map output by stage 4 to obtain a vector of length 8C, and the features are mapped to N classes through a fully connected layer, where N is the number of pedestrian classes in the data set of step one;
step three, calculating the loss function, back-propagating, and updating the network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
Lreid=w1Lid+w2Lcircle (10)
where w_1 and w_2 denote the weights of the ID loss and the Circle loss, respectively; L_reid denotes the total loss function, L_id the ID loss, and L_circle the Circle loss;
3.2) the ID loss is the cross-entropy loss:

L_id = -(1/n) Σ_{i=1}^{n} log p(y_i|x_i) (11)

where n denotes the number of samples in each training batch, and p(y_i|x_i) denotes the conditional probability that input image x_i is assigned its label y_i;
3.3) the Circle loss formula is as follows:

L_circle = log[1 + Σ_j exp(γ·a_n^j·(S_n^j - Δ_n)) · Σ_i exp(-γ·a_p^i·(S_p^i - Δ_p))] (12)

Δ_n = m (13)

Δ_p = 1 - m (14)

where N denotes the number of different pedestrian classes and M_i the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p the intra-class similarity score matrix; a_n and a_p are non-negative weighting matrices for S_n and S_p, respectively:

a_n = [S_n + m]_+ (15)

a_p = [1 + m - S_p]_+ (16)

where [·]_+ means negative values are clipped to zero;
3.4) setting hyper-parameters and training the network; a warm-up learning rate is adopted, starting at r and gradually increasing to ten times r over the first 10 training epochs; the optimizer is a stochastic gradient descent algorithm augmented with weight decay d_1 and momentum d_2; back propagation is performed with the configured optimizer and learning rate, combined with the loss values calculated in 3.1)-3.3), and the network parameters are updated;
step four, carrying out pedestrian re-identification matching;
scaling the image of the pedestrian to be identified, inputting it into the Swin Transformer neural network of step two, and processing the output with softmax to obtain N probability values, each corresponding to the probability that the pedestrian belongs to one class; the class with the largest probability value is the identity of the pedestrian.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111087471.7A CN113792669B (en) | 2021-09-16 | 2021-09-16 | Pedestrian re-recognition baseline method based on hierarchical self-attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113792669A true CN113792669A (en) | 2021-12-14 |
CN113792669B CN113792669B (en) | 2024-06-14 |
Family
ID=78878614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111087471.7A Active CN113792669B (en) | 2021-09-16 | 2021-09-16 | Pedestrian re-recognition baseline method based on hierarchical self-attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113792669B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114842085A (en) * | 2022-07-05 | 2022-08-02 | 松立控股集团股份有限公司 | Full-scene vehicle attitude estimation method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710831A (en) * | 2018-04-24 | 2018-10-26 | 华南理工大学 | A kind of small data set face recognition algorithms based on machine vision |
US20210201010A1 (en) * | 2019-12-31 | 2021-07-01 | Wuhan University | Pedestrian re-identification method based on spatio-temporal joint model of residual attention mechanism and device thereof |
CN112183468A (en) * | 2020-10-27 | 2021-01-05 | 南京信息工程大学 | Pedestrian re-identification method based on multi-attention combined multi-level features |
CN112818790A (en) * | 2021-01-25 | 2021-05-18 | 浙江理工大学 | Pedestrian re-identification method based on attention mechanism and space geometric constraint |
Non-Patent Citations (1)
Title |
---|
LIU Ziyan; WAN Peipei: "Pedestrian re-identification feature extraction method based on attention mechanism", Journal of Computer Applications, no. 03, 31 December 2020 (2020-12-31) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114842085A (en) * | 2022-07-05 | 2022-08-02 | 松立控股集团股份有限公司 | Full-scene vehicle attitude estimation method |
CN114842085B (en) * | 2022-07-05 | 2022-09-16 | 松立控股集团股份有限公司 | Full-scene vehicle attitude estimation method |
Also Published As
Publication number | Publication date |
---|---|
CN113792669B (en) | 2024-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977918B (en) | Target detection positioning optimization method based on unsupervised domain adaptation | |
US20230186056A1 (en) | Grabbing detection method based on rp-resnet | |
CN109492529A (en) | A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion | |
CN108427921A (en) | A kind of face identification method based on convolutional neural networks | |
CN108509839A (en) | One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks | |
CN109359603A (en) | A kind of vehicle driver's method for detecting human face based on concatenated convolutional neural network | |
CN104361316B (en) | Dimension emotion recognition method based on multi-scale time sequence modeling | |
CN109325440B (en) | Human body action recognition method and system | |
CN110956082B (en) | Face key point detection method and detection system based on deep learning | |
CN110532925B (en) | Driver fatigue detection method based on space-time graph convolutional network | |
CN110245620B (en) | Non-maximization inhibition method based on attention | |
CN105243154A (en) | Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings | |
CN112164077B (en) | Cell instance segmentation method based on bottom-up path enhancement | |
Kaluri et al. | A framework for sign gesture recognition using improved genetic algorithm and adaptive filter | |
CN109101108A (en) | Method and system based on three decision optimization intelligence cockpit human-computer interaction interfaces | |
CN117058437B (en) | Flower classification method, system, equipment and medium based on knowledge distillation | |
CN113255557A (en) | Video crowd emotion analysis method and system based on deep learning | |
CN110599502A (en) | Skin lesion segmentation method based on deep learning | |
US20190266443A1 (en) | Text image processing using stroke-aware max-min pooling for ocr system employing artificial neural network | |
CN116258874A (en) | SAR recognition database sample gesture expansion method based on depth condition diffusion network | |
CN114742224A (en) | Pedestrian re-identification method and device, computer equipment and storage medium | |
CN114821736A (en) | Multi-modal face recognition method, device, equipment and medium based on contrast learning | |
CN116524189A (en) | High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization | |
CN113792669A (en) | Pedestrian re-identification baseline method based on hierarchical self-attention network | |
CN118522039B (en) | Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |