
CN113792669A - Pedestrian re-identification baseline method based on hierarchical self-attention network - Google Patents

Pedestrian re-identification baseline method based on hierarchical self-attention network

Info

Publication number
CN113792669A
Authority
CN
China
Prior art keywords
image
pedestrian
swin
loss
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111087471.7A
Other languages
Chinese (zh)
Other versions
CN113792669B (en)
Inventor
陈炳才
张繁盛
聂冰洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202111087471.7A
Publication of CN113792669A
Application granted
Publication of CN113792669B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, and belongs to the field of computer vision. In the invention, the Swin Transformer is creatively introduced into the field of pedestrian re-identification as the backbone network, a weighted sum of the ID loss and the Circle loss is used as the loss function, and the feature extraction capability is improved while a simple structure is maintained through effective data preprocessing and reasonable parameter adjustment. Compared with the traditional ResNet-based baseline method, the proposed method significantly improves pedestrian re-identification performance.

Description

Pedestrian re-identification baseline method based on hierarchical self-attention network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pedestrian re-identification baseline method based on a hierarchical self-attention network.
Background
Pedestrian re-identification uses computer vision techniques to identify a specific pedestrian across cameras: given a surveillance image of a pedestrian, images of the same pedestrian captured by other devices are retrieved. Identifying a specific pedestrian is of great significance for violation judgment, criminal investigation, danger early warning, and the like.
A good baseline method should achieve strong results while keeping the number of parameters low. Existing pedestrian re-identification baseline methods are based on ResNet and are limited by the insufficient feature-extraction capability of convolutional neural networks, so ResNet-based baselines cannot achieve ideal results.
With the progress of research, Transformers have gradually been applied to the field of computer vision. Existing Transformer-based pedestrian re-identification methods suffer from problems such as excessive computation and a single-scale feature receptive field.
Disclosure of Invention
The invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, which aims to solve the problems described in the background art and to achieve good results while keeping a simple structure.
The technical scheme of the invention is as follows:
a pedestrian re-identification baseline method based on a hierarchical self-attention network comprises the following specific steps:
step one, preprocessing data;
setting a total of N different pedestrians, each pedestrian including MiAn image of where Mi>1,MiIndicates the number of images in the class of the ith pedestrian, i indicates the ID number of each pedestrian; for the ith pedestrian, Mi1 image is used as a training set, 1 image is used as a verification set, i is used as a label and indicates that the image corresponds to the ith pedestrian;
1.1) using a bicubic interpolation algorithm to scale an image to (H, W, C) as an input image, wherein H represents the length of the image, W represents the width of the image, C represents the number of channels of the image, and C takes the value of 3; the method comprises the following specific steps:
1.1.1) constructing a Bicubic function:
W(x) = (a+2)·|x|^3 - (a+3)·|x|^2 + 1, for |x| ≤ 1
W(x) = a·|x|^3 - 5a·|x|^2 + 8a·|x| - 4a, for 1 < |x| < 2
W(x) = 0, otherwise (1)
wherein a is a variable value in the coefficient and is used for controlling the shape of the Bicubic curve;
1.1.2) interpolation formula is as follows:
f(x, y) = Σ_(i=0..3) Σ_(j=0..3) f(x_i, y_j) · W(x - x_i) · W(y - y_j) (2)
wherein (x, y) represents the pixel point to be interpolated, and for each pixel point, 4 × 4 pixel points near the pixel point are taken to perform bicubic interpolation operation.
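As an illustration of equations (1) and (2), the following is a minimal NumPy sketch of the bicubic weight function and of the 4 × 4 weighted interpolation for a single target point in a single-channel image; the function names are chosen here only for illustration, and image-border handling is omitted for brevity.

```python
import numpy as np

def bicubic_kernel(x: float, a: float = -0.5) -> float:
    """Bicubic weight function W(x) of equation (1); `a` controls the shape of the curve."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_interpolate(img: np.ndarray, x: float, y: float, a: float = -0.5) -> float:
    """Equation (2): weighted sum over the 4 x 4 neighbours of the target point (x, y)
    in a single-channel image; image-border handling is omitted for brevity."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    value = 0.0
    for i in range(-1, 3):      # the 4 neighbouring rows
        for j in range(-1, 3):  # the 4 neighbouring columns
            value += (img[x0 + i, y0 + j]
                      * bicubic_kernel(x - (x0 + i), a)
                      * bicubic_kernel(y - (y0 + j), a))
    return float(value)
```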
1.2) carrying out data enhancement by using a random erasure algorithm;
1.2.1) setting a threshold probability p and generating a random number p1 in [0, 1]; when p1 > p, the input image is not processed, otherwise the input image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
where H denotes the length of the input image and W denotes the width of the input image; H_e denotes the length of the erased region, W_e denotes its width, and S_e denotes its area;
1.2.3) determining an erasing coordinate;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
where x_e denotes the x-coordinate of the upper-left corner of the erased region and y_e denotes the y-coordinate of the upper-left corner.
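A minimal sketch of this random-erasing step, following equations (3)-(8), is given below; the NumPy image layout and the zero fill value of the erased region are assumptions made here, since the patent does not specify them.

```python
import random
import numpy as np

def random_erase(img: np.ndarray, p: float = 0.5) -> np.ndarray:
    """Random-erasing augmentation following equations (3)-(8); `img` is an (H, W, C) array.
    The erased region is filled with zeros here (the patent does not state the fill value)."""
    if random.uniform(0.0, 1.0) > p:          # equation (3): leave the image unchanged
        return img
    H, W = img.shape[0], img.shape[1]
    h_e = random.randint(H // 8, H // 4)      # equation (4): erased length
    w_e = random.randint(W // 8, W // 4)      # equation (5): erased width
    x_e = random.randint(0, H - h_e)          # equation (7): upper-left row of the region
    y_e = random.randint(0, W - w_e)          # equation (8): upper-left column of the region
    img[x_e:x_e + h_e, y_e:y_e + w_e, :] = 0  # erase an area S_e = h_e * w_e, equation (6)
    return img
```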
Step two, inputting the preprocessed image into the hierarchical self-attention network, namely the Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, wherein the 2-4 stages have the same network structure, and the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the upper-left corner of the image, the input image is divided into non-overlapping image blocks, each of size 4 × 4, i.e. blocks of shape (4, 4, 3); the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
2.1.2) linear embedding; each image block is flattened into a vector of dimension C through a fully connected layer, and the resulting tokens are fed into two consecutive Swin blocks;
2.1.3) extracting features with Swin blocks;
The Swin blocks comprise Swin block 1 and Swin block 2. Swin block 1 mainly consists of a window-based multi-head self-attention module and a multilayer perceptron; layer normalization is applied before each of the two modules, and a residual connection is added after each. Swin block 2 mainly consists of a shifted-window multi-head self-attention module and a multilayer perceptron, again with layer normalization before each module and a residual connection after each;
After extraction by the Swin blocks, key feature information such as the pedestrian's head, hands and actions is obtained, and a feature set of shape (H/4, W/4, C) is output;
2.2) stage 2;
2.2.1) block fusion; the input features are combined pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of shape (H/8, W/8, 2C) is output;
2.2.2) extracting features from Swin blocks; completely consistent with the Swin block structure in 2.1.3), and outputting a key feature set (H/8, W/8,2C) after the Swin block processing;
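For illustration, the following is a simplified PyTorch skeleton of the Swin block structure described in 2.1.3): layer normalization before each module, a residual connection after each, and a multilayer perceptron following the self-attention module. The window partitioning and the cyclic shift of the shifted-window block are omitted, and standard multi-head attention is used as a stand-in for window attention, so this sketch only illustrates the block layout, not the actual Swin Transformer implementation.

```python
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """Skeleton of one Swin block: layer normalization -> multi-head self-attention -> residual,
    then layer normalization -> MLP -> residual. The window partitioning and the cyclic shift
    of the shifted-window block are omitted; nn.MultiheadAttention is only a stand-in."""
    def __init__(self, dim: int, num_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, number of tokens, channel dimension)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)              # (shifted-)window attention stand-in
        x = x + h                              # residual connection after attention
        x = x + self.mlp(self.norm2(x))        # residual connection after the MLP
        return x

# Two consecutive blocks, as used in stage 1 (dim = C, e.g. C = 128 in the embodiment).
blocks = nn.Sequential(SwinBlockSketch(128, num_heads=4), SwinBlockSketch(128, num_heads=4))
tokens = torch.randn(1, 56 * 56, 128)          # (H/4) x (W/4) tokens of dimension C
print(blocks(tokens).shape)                    # torch.Size([1, 3136, 128])
```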
2.3) stages 3-4;
the network structures of the stage 3 and the stage 4 are completely consistent with the network structure of the stage 2, and after processing, feature sets (H/16, W/16,4C) and (H/32, W/32,8C) are respectively output;
2.4) global average pooling layer and fully connected layer; global average pooling is applied to the feature set output by stage 4 to obtain a vector of length 8C, and a fully connected layer maps the features to N classes, where N is the number of pedestrian classes in the data set from step one.
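The four-stage forward pass of steps 2.1)-2.4) can be illustrated with the following shape-level sketch. Strided convolutions are used here as stand-ins for block division and block fusion (the actual Swin Transformer concatenates 2 × 2 neighbouring features and applies a linear layer), and the concrete sizes H = W = 224, C = 128 and N = 751 are taken from the embodiment described below.

```python
import torch
import torch.nn as nn

H, W, C, N = 224, 224, 128, 751                    # example sizes taken from the embodiment

patch_embed = nn.Conv2d(3, C, kernel_size=4, stride=4)   # block division + linear embedding
merges = nn.ModuleList([nn.Conv2d(c, 2 * c, kernel_size=2, stride=2)  # stand-ins for block fusion
                        for c in (C, 2 * C, 4 * C)])
head = nn.Linear(8 * C, N)                               # fully connected classifier

x = torch.randn(1, 3, H, W)        # dummy input batch
x = patch_embed(x)                 # stage 1 input: (1, C, H/4, W/4) = (1, 128, 56, 56)
for merge in merges:
    # ... the Swin blocks of each stage would process x here ...
    x = merge(x)                   # stages 2-4: halve the resolution, double the channels
feat = x.mean(dim=(2, 3))          # global average pooling: (1, 8C) = (1, 1024)
logits = head(feat)                # (1, N) identity scores
print(x.shape, logits.shape)       # torch.Size([1, 1024, 7, 7]) torch.Size([1, 751])
```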
Step three, calculating a loss function, and reversely propagating and updating network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
L_reid = w_1·L_id + w_2·L_circle (10)
where w_1 and w_2 are the weights of the ID loss and the Circle loss, respectively; L_reid is the total loss function, L_id is the ID loss, and L_circle is the Circle loss;
3.2) the ID loss formula is as follows:
L_id = -(1/n) · Σ_(i=1..n) log p(y_i | x_i) (11)
where n denotes the number of samples in each training batch, and p(y_i | x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
3.3) the Circle loss formula is as follows:
L_circle = log[ 1 + Σ exp( γ·a_n·(S_n - Δ_n) ) · Σ exp( -γ·a_p·(S_p - Δ_p) ) ] (12)
Δ_n = m (13)
Δ_p = 1 - m (14)
where the first sum runs over all inter-class (negative) pairs and the second over all intra-class (positive) pairs; N denotes the number of different pedestrian classes and M_i denotes the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative weight matrices for S_n and S_p, respectively, computed as follows:
a_n = [S_n + m]_+ (15)
a_p = [1 + m - S_p]_+ (16)
where [·]_+ denotes clipping at zero, i.e. max(·, 0); S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix, as above;
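As an illustration of equations (12)-(16), the following is a minimal sketch of the Circle loss in its pairwise form; it assumes that the intra-class and inter-class similarity scores have already been flattened into one-dimensional tensors s_p and s_n, and it uses the identity log(1 + A·B) = softplus(log A + log B) for numerical stability.

```python
import torch
import torch.nn.functional as F

def circle_loss(s_p: torch.Tensor, s_n: torch.Tensor,
                m: float = 0.25, gamma: float = 32.0) -> torch.Tensor:
    """Pairwise Circle loss of equations (12)-(16): s_p holds intra-class (positive-pair)
    similarity scores and s_n holds inter-class (negative-pair) similarity scores."""
    a_p = torch.clamp_min(1.0 + m - s_p, 0.0)    # equation (16): non-negative weight for S_p
    a_n = torch.clamp_min(s_n + m, 0.0)          # equation (15): non-negative weight for S_n
    delta_p, delta_n = 1.0 - m, m                # equations (14) and (13)
    logit_p = -gamma * a_p * (s_p - delta_p)
    logit_n = gamma * a_n * (s_n - delta_n)
    # log(1 + sum(exp(logit_n)) * sum(exp(logit_p))), computed stably with softplus
    return F.softplus(torch.logsumexp(logit_n, dim=0) + torch.logsumexp(logit_p, dim=0))

# Example call with random similarity scores for 8 positive and 24 negative pairs.
loss = circle_loss(torch.rand(8), torch.rand(24))
```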
3.4) setting hyper-parameters and training the network; a warm-up learning rate is adopted, starting at r and gradually increasing to ten times r over the first 10 epochs of training; the optimizer is stochastic gradient descent with a weight decay of d_1 and a momentum of d_2; back-propagation is performed with the configured optimizer and learning rate, using the loss values calculated in 3.1)-3.3), and the network parameters are updated.
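A sketch of the training configuration of 3.4) is given below. The values of r, d_1 and d_2 are hypothetical (the patent keeps them symbolic), w_1 = 0.4 and w_2 = 0.6 are taken from the embodiment, and the `model`, `train_loader` and `num_epochs` objects are assumed to exist, with `model` returning both the classification logits and the positive/negative pair similarity scores used by the `circle_loss` sketch above.

```python
import torch
import torch.nn.functional as F

# Hypothetical hyper-parameter values; the patent keeps r, d_1 and d_2 symbolic.
r, d1, d2 = 1e-4, 5e-4, 0.9        # base learning rate, weight decay, momentum (assumed)
w1, w2 = 0.4, 0.6                  # loss weights taken from the embodiment

optimizer = torch.optim.SGD(model.parameters(), lr=r, weight_decay=d1, momentum=d2)

# Warm-up: the learning rate grows linearly from r to 10*r over the first 10 epochs.
warmup = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0 + 0.9 * epoch, 10.0))

for epoch in range(num_epochs):                      # `num_epochs` assumed to be defined
    for images, labels in train_loader:              # `train_loader` assumed to be defined
        logits, s_p, s_n = model(images, labels)     # assumed model interface (see circle_loss above)
        loss_id = F.cross_entropy(logits, labels)    # ID loss, equation (11)
        loss = w1 * loss_id + w2 * circle_loss(s_p, s_n)   # total loss, equation (10)
        optimizer.zero_grad()
        loss.backward()                              # back-propagation
        optimizer.step()                             # update the network parameters
    warmup.step()                                    # advance the warm-up schedule once per epoch
```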
Step four, performing pedestrian re-identification matching;
The image of the pedestrian to be identified is scaled and input into the Swin Transformer neural network of step two; the network output is processed with softmax to obtain N probability values corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the maximum probability is taken as the pedestrian's identity.
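A minimal inference sketch of step four, assuming a trained `model` and a 224 × 224 input size as in the embodiment; the torchvision preprocessing shown here is one possible way of applying the bicubic scaling and is not prescribed by the patent.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224),
                      interpolation=transforms.InterpolationMode.BICUBIC),  # bicubic scaling
    transforms.ToTensor(),
])

def identify(model: torch.nn.Module, image_path: str) -> int:
    """Return the index of the most probable pedestrian identity for a single image;
    at inference time the model is assumed to return only the N classification logits."""
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)   # N class probabilities
    return int(probs.argmax(dim=1))          # class with the maximum probability
```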
The invention has the following beneficial effects: the invention provides a pedestrian re-identification baseline method based on a hierarchical self-attention network, which creatively introduces the Swin Transformer into the field of pedestrian re-identification as the backbone network, uses a weighted sum of the ID loss and the Circle loss as the loss function, and greatly improves the training effect while keeping the structure simple through effective data preprocessing and reasonable parameter adjustment.
Drawings
FIG. 1 is a diagram of the overall improvement idea of the present invention;
FIG. 2 is a model diagram of a baseline pedestrian re-identification method based on a hierarchical self-attention network according to the present invention;
fig. 3 is a schematic diagram of the Swin block structure.
Detailed Description
The following describes an embodiment of the present invention in detail with reference to the accompanying drawings, giving a specific implementation and operating procedure. The data set used in this experiment is the Market1501 data set collected at a university; the training set contains 751 persons and 12,936 images, and the test set contains 750 persons and 19,732 images.
Fig. 1 shows the overall improvement idea of the present invention, and Fig. 2 shows the model of the pedestrian re-identification baseline method based on a hierarchical self-attention network according to the present invention; the specific steps are as follows:
step one, preprocessing data;
There are 751 persons in the training set, so N = 751; pedestrian i has M_i images, where M_i > 1, M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian; for the i-th pedestrian, M_i − 1 images are used as the training set and 1 image as the validation set, with i used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) the image is scaled to (224, 224, 3) using the bicubic interpolation algorithm as the input image, where H represents the length of the image, W the width, and C the number of channels; the specific steps are as follows:
1.1.1) constructing a Bicubic function:
W(x) = (a+2)·|x|^3 - (a+3)·|x|^2 + 1, for |x| ≤ 1
W(x) = a·|x|^3 - 5a·|x|^2 + 8a·|x| - 4a, for 1 < |x| < 2
W(x) = 0, otherwise (1)
wherein a = -0.5 is the coefficient that controls the shape of the Bicubic curve;
1.1.2) interpolation formula is as follows:
f(x, y) = Σ_(i=0..3) Σ_(j=0..3) f(x_i, y_j) · W(x - x_i) · W(y - y_j) (2)
wherein (x, y) represents the pixel point to be interpolated, and for each pixel point, 4 × 4 pixel points near the pixel point are taken to perform bicubic interpolation operation.
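In practice, this scaling step can be performed with an off-the-shelf bicubic resize. The sketch below uses Pillow, whose bicubic filter is based on the standard kernel with a = -0.5 as used in this embodiment; the file name is a placeholder.

```python
from PIL import Image

img = Image.open("pedestrian.jpg").convert("RGB")          # placeholder file name
img_224 = img.resize((224, 224), resample=Image.BICUBIC)   # bicubic scaling to (224, 224, 3)
```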
1.2) carrying out data enhancement by using a random erasure algorithm;
1.2.1) setting the threshold probability p to 0.5 and generating a random number p1 in [0, 1]; when p1 > p, the image is not processed, otherwise the image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
where H denotes the length of the input image and W denotes the width of the input image; H_e denotes the length of the erased region, W_e denotes its width, and S_e denotes its area;
1.2.3) determining an erasing coordinate;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
where x_e denotes the x-coordinate of the upper-left corner of the erased region and y_e denotes the y-coordinate of the upper-left corner.
Step two, inputting the preprocessed image into the hierarchical self-attention network, namely the Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, wherein the 2-4 stages have the same network structure, and the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the upper-left corner of the image, the input image is divided into non-overlapping image blocks, each of size 4 × 4, i.e. blocks of shape (4, 4, 3); the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
where H and W refer to the length and width of the input image, respectively; here N_patch = 56 × 56;
2.1.2) linear embedding; each image block is flattened into a vector of dimension 128 through a fully connected layer and fed into two consecutive Swin blocks;
2.1.3) extracting features with Swin blocks;
As shown in Fig. 3, the Swin blocks comprise Swin block 1 and Swin block 2. Swin block 1 mainly consists of a window-based multi-head self-attention module and a multilayer perceptron; layer normalization is applied before each of the two modules, and a residual connection is added after each. Swin block 2 mainly consists of a shifted-window multi-head self-attention module and a multilayer perceptron, again with layer normalization before each module and a residual connection after each;
After extraction by the Swin blocks, key feature information such as the pedestrian's head, hands and actions is obtained, and a feature set of shape (56, 56, 128) is output and passed to the next module;
2.2) stage 2;
2.2.1) block fusion; the input features are combined pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of shape (28, 28, 256) is output;
2.2.2) extracting features with Swin blocks; the structure is completely consistent with that in 2.1.3), and a feature set of shape (28, 28, 256) is output after the Swin block processing;
2.3) stages 3-4;
The structures of stage 3 and stage 4 are completely consistent with that of stage 2; after processing, feature sets of shape (14, 14, 512) and (7, 7, 1024) are output, respectively;
2.4) global average pooling layer and fully connected layer; global average pooling is applied to the feature set output by stage 4 to obtain a vector of length 1024, and a fully connected layer maps the features to 751 classes, where 751 is the number of pedestrian classes in the data set used in this embodiment.
Step three, calculating a loss function, and reversely propagating and updating network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
L_reid = w_1·L_id + w_2·L_circle (10)
where w_1 and w_2 are the weights of the ID loss and the Circle loss, respectively, with w_1 = 0.4 and w_2 = 0.6; L_reid is the total loss function, L_id is the ID loss, and L_circle is the Circle loss;
3.2) the ID loss formula is as follows:
L_id = -(1/n) · Σ_(i=1..n) log p(y_i | x_i) (11)
where n denotes the number of samples in each training batch, set to 16 in this embodiment, and p(y_i | x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
3.3) the Circle loss formula is as follows:
L_circle = log[ 1 + Σ exp( γ·a_n·(S_n - Δ_n) ) · Σ exp( -γ·a_p·(S_p - Δ_p) ) ] (12)
Δ_n = m (13)
Δ_p = 1 - m (14)
where the first sum runs over all inter-class (negative) pairs and the second over all intra-class (positive) pairs; N denotes the number of different pedestrian classes, 751 in this embodiment; M_i denotes the number of images in the class of the i-th pedestrian; γ is a scale parameter, set to 32 in this embodiment; m controls the strictness of the optimization, set to 0.25 in this embodiment; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative weight matrices for S_n and S_p, respectively, computed as follows:
a_n = [S_n + m]_+ (15)
a_p = [1 + m - S_p]_+ (16)
where [·]_+ denotes clipping at zero, i.e. max(·, 0); S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix, as above;
3.4) the hyper-parameters used for training the neural network are set as shown in Table 1; back-propagation is performed with the configured optimizer and learning rate, using the loss values calculated in 3.1)-3.3), and the network parameters are updated.
TABLE 1 hyper-parameter settings for training networks
Step four, performing pedestrian re-identification matching;
The image of the pedestrian to be identified is scaled and input into the Swin Transformer neural network of step two; the network output is processed with softmax to obtain 751 probability values corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the maximum probability is the pedestrian's identity.
In this embodiment, a pedestrian re-identification performance test is carried out on the Market1501 data set and compared with existing pedestrian re-identification baseline models based on global features, as shown in Table 2:
table 2 comparison of results with existing baseline model
The comparison of experimental results shows that the baseline model provided by the invention effectively improves the Rank-1 and mAP metrics of pedestrian re-identification, which demonstrates the effectiveness of the method and is of great significance for the practical application of pedestrian re-identification; in addition, the network structure is simple and highly extensible, providing a valuable reference for the design of future pedestrian re-identification methods.

Claims (1)

1. A pedestrian re-identification baseline method based on a hierarchical self-attention network is characterized by comprising the following steps:
step one, preprocessing data;
setting a total of N different pedestrians, where pedestrian i has M_i images with M_i > 1; M_i denotes the number of images in the class of the i-th pedestrian, and i denotes the ID number of each pedestrian; for the i-th pedestrian, M_i − 1 images are used as the training set and 1 image as the validation set, with i used as the label indicating that the image corresponds to the i-th pedestrian;
1.1) using a bicubic interpolation algorithm to scale an image to (H, W, C) as an input image, wherein H represents the length of the image, W represents the width of the image, C represents the number of channels of the image, and C takes the value of 3; the method comprises the following specific steps:
1.1.1) constructing a Bicubic function:
W(x) = (a+2)·|x|^3 - (a+3)·|x|^2 + 1, for |x| ≤ 1
W(x) = a·|x|^3 - 5a·|x|^2 + 8a·|x| - 4a, for 1 < |x| < 2
W(x) = 0, otherwise (1)
wherein a is a variable value in the coefficient and is used for controlling the shape of the Bicubic curve;
1.1.2) interpolation formula is as follows:
f(x, y) = Σ_(i=0..3) Σ_(j=0..3) f(x_i, y_j) · W(x - x_i) · W(y - y_j) (2)
wherein (x, y) represents the pixel points to be interpolated, and for each pixel point, 4 × 4 pixel points nearby the pixel point are taken to perform bicubic interpolation operation;
1.2) carrying out data enhancement by using a random erasure algorithm;
1.2.1) setting a threshold probability p and generating a random number p1 in [0, 1]; when p1 > p, the input image is not processed, otherwise the input image is erased:
p1=Rand(0,1) (3)
1.2.2) determining an erasing area;
H_e = Rand(H/8, H/4) (4)
W_e = Rand(W/8, W/4) (5)
S_e = H_e × W_e (6)
where H denotes the length of the input image and W denotes the width of the input image; H_e denotes the length of the erased region, W_e denotes its width, and S_e denotes its area;
1.2.3) determining an erasing coordinate;
x_e = Rand(0, H - H_e) (7)
y_e = Rand(0, W - W_e) (8)
where x_e denotes the x-coordinate of the upper-left corner of the erased region and y_e denotes the y-coordinate of the upper-left corner;
step two, inputting the preprocessed image into a hierarchical self-attention network, namely a Swin Transformer neural network, and performing forward propagation;
the backbone network comprises 4 processing stages, where stages 2-4 share the same network structure; the specific steps are as follows:
2.1) stage 1;
2.1.1) block division; starting from the upper-left corner of the image, the input image is divided into non-overlapping image blocks, each of size 4 × 4, i.e. blocks of shape (4, 4, 3); the number of image blocks N_patch is:
N_patch = (H/4) × (W/4) (9)
2.1.2) linear embedding; each image block is flattened into a vector of dimension C through a fully connected layer, and the resulting tokens are fed into two consecutive Swin blocks;
2.1.3) extracting features with Swin blocks;
as shown in fig. 3, the Swin blocks comprise Swin block 1 and Swin block 2; Swin block 1 mainly consists of a window-based multi-head self-attention module and a multilayer perceptron, with layer normalization applied before each of the two modules and a residual connection added after each; Swin block 2 mainly consists of a shifted-window multi-head self-attention module and a multilayer perceptron, again with layer normalization before each module and a residual connection after each;
after extraction is carried out through a Swin block, key feature information of the head, the hands and the actions of the pedestrian is obtained, and a feature set (H/4, W/4, C) is output;
2.2) stage 2;
2.2.1) block fusion; the input features are combined pairwise, a fully connected layer adjusts the feature dimension to twice the original, and a feature set of shape (H/8, W/8, 2C) is output;
2.2.2) extracting features from Swin blocks; completely consistent with the Swin block structure in 2.1.3), and outputting a key feature set (H/8, W/8,2C) after the Swin block processing;
2.3) stages 3-4;
the network structures of the stage 3 and the stage 4 are completely consistent with the network structure of the stage 2, and after processing, feature sets (H/16, W/16,4C) and (H/32, W/32,8C) are respectively output;
2.4) a global average pooling layer and a fully connected layer; global average pooling is performed on the feature set output by stage 4 to obtain a vector of length 8C, and the features are mapped to N classes through a fully connected layer, where N is the number of pedestrian classes in the data set in step one;
step three, calculating a loss function, and reversely propagating and updating network parameters;
3.1) the loss function consists of two parts, ID loss and Circle loss, and the formula is as follows:
L_reid = w_1·L_id + w_2·L_circle (10)
where w_1 and w_2 are the weights of the ID loss and the Circle loss, respectively; L_reid is the total loss function, L_id is the ID loss, and L_circle is the Circle loss;
3.2) the ID loss formula is as follows:
L_id = -(1/n) · Σ_(i=1..n) log p(y_i | x_i) (11)
where n denotes the number of samples in each training batch, and p(y_i | x_i) denotes the conditional probability that the input image x_i is assigned the label y_i;
3.3) the Circle loss formula is as follows:
L_circle = log[ 1 + Σ exp( γ·a_n·(S_n - Δ_n) ) · Σ exp( -γ·a_p·(S_p - Δ_p) ) ] (12)
Δ_n = m (13)
Δ_p = 1 - m (14)
where the first sum runs over all inter-class (negative) pairs and the second over all intra-class (positive) pairs; N denotes the number of different pedestrian classes and M_i denotes the number of images in the class of the i-th pedestrian; γ is a scale parameter; m controls the strictness of the optimization; S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix; a_n and a_p are non-negative weight matrices for S_n and S_p, respectively, computed as follows:
a_n = [S_n + m]_+ (15)
a_p = [1 + m - S_p]_+ (16)
where [·]_+ denotes clipping at zero, i.e. max(·, 0); S_n is the inter-class similarity score matrix and S_p is the intra-class similarity score matrix, as above;
3.4) setting hyper-parameters and training the network; a warm-up learning rate is adopted, starting at r and gradually increasing to ten times r over the first 10 epochs of training; the optimizer is stochastic gradient descent with a weight decay of d_1 and a momentum of d_2; back-propagation is performed with the configured optimizer and learning rate, using the loss values calculated in 3.1)-3.3), and the network parameters are updated;
step four, performing pedestrian re-identification matching;
the image of the pedestrian to be identified is scaled and input into the Swin Transformer neural network of step two; the network output is processed with softmax to obtain N probability values corresponding to the probabilities that the pedestrian belongs to the different classes, and the class with the maximum probability is taken as the pedestrian's identity.
CN202111087471.7A 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network Active CN113792669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111087471.7A CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111087471.7A CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Publications (2)

Publication Number Publication Date
CN113792669A true CN113792669A (en) 2021-12-14
CN113792669B CN113792669B (en) 2024-06-14

Family

ID=78878614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111087471.7A Active CN113792669B (en) 2021-09-16 2021-09-16 Pedestrian re-recognition baseline method based on hierarchical self-attention network

Country Status (1)

Country Link
CN (1) CN113792669B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842085A (en) * 2022-07-05 2022-08-02 松立控股集团股份有限公司 Full-scene vehicle attitude estimation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint
US20210201010A1 (en) * 2019-12-31 2021-07-01 Wuhan University Pedestrian re-identification method based on spatio-temporal joint model of residual attention mechanism and device thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
US20210201010A1 (en) * 2019-12-31 2021-07-01 Wuhan University Pedestrian re-identification method based on spatio-temporal joint model of residual attention mechanism and device thereof
CN112183468A (en) * 2020-10-27 2021-01-05 南京信息工程大学 Pedestrian re-identification method based on multi-attention combined multi-level features
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘紫燕; 万培佩: "Feature extraction method for pedestrian re-identification based on attention mechanism" (基于注意力机制的行人重识别特征提取方法), 计算机应用 (Journal of Computer Applications), no. 03, 31 December 2020 (2020-12-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842085A (en) * 2022-07-05 2022-08-02 松立控股集团股份有限公司 Full-scene vehicle attitude estimation method
CN114842085B (en) * 2022-07-05 2022-09-16 松立控股集团股份有限公司 Full-scene vehicle attitude estimation method

Also Published As

Publication number Publication date
CN113792669B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN109977918B (en) Target detection positioning optimization method based on unsupervised domain adaptation
US20230186056A1 (en) Grabbing detection method based on rp-resnet
CN109492529A (en) A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN108427921A (en) A kind of face identification method based on convolutional neural networks
CN108509839A (en) One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN109359603A (en) A kind of vehicle driver's method for detecting human face based on concatenated convolutional neural network
CN104361316B (en) Dimension emotion recognition method based on multi-scale time sequence modeling
CN109325440B (en) Human body action recognition method and system
CN110956082B (en) Face key point detection method and detection system based on deep learning
CN110532925B (en) Driver fatigue detection method based on space-time graph convolutional network
CN110245620B (en) Non-maximization inhibition method based on attention
CN105243154A (en) Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings
CN112164077B (en) Cell instance segmentation method based on bottom-up path enhancement
Kaluri et al. A framework for sign gesture recognition using improved genetic algorithm and adaptive filter
CN109101108A (en) Method and system based on three decision optimization intelligence cockpit human-computer interaction interfaces
CN117058437B (en) Flower classification method, system, equipment and medium based on knowledge distillation
CN113255557A (en) Video crowd emotion analysis method and system based on deep learning
CN110599502A (en) Skin lesion segmentation method based on deep learning
US20190266443A1 (en) Text image processing using stroke-aware max-min pooling for ocr system employing artificial neural network
CN116258874A (en) SAR recognition database sample gesture expansion method based on depth condition diffusion network
CN114742224A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN114821736A (en) Multi-modal face recognition method, device, equipment and medium based on contrast learning
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN113792669A (en) Pedestrian re-identification baseline method based on hierarchical self-attention network
CN118522039B (en) Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant