CN117877068B - Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method - Google Patents
Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method
- Publication number
- CN117877068B CN117877068B CN202410016648.1A CN202410016648A CN117877068B CN 117877068 B CN117877068 B CN 117877068B CN 202410016648 A CN202410016648 A CN 202410016648A CN 117877068 B CN117877068 B CN 117877068B
- Authority
- CN
- China
- Prior art keywords
- image
- pedestrian
- mask
- network
- image block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 72
- 238000012549 training Methods 0.000 claims abstract description 70
- 238000012360 testing method Methods 0.000 claims abstract description 45
- 230000008569 process Effects 0.000 claims abstract description 31
- 230000000295 complement effect Effects 0.000 claims abstract 8
- 230000000903 blocking effect Effects 0.000 claims abstract 3
- 230000006870 function Effects 0.000 claims description 53
- 230000014759 maintenance of location Effects 0.000 claims description 38
- 239000011159 matrix material Substances 0.000 claims description 36
- 230000011218 segmentation Effects 0.000 claims description 22
- 238000010586 diagram Methods 0.000 claims description 9
- 238000013527 convolutional neural network Methods 0.000 claims description 7
- 238000012546 transfer Methods 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 6
- 230000000644 propagated effect Effects 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 2
- 239000013074 reference sample Substances 0.000 claims 4
- 239000000523 sample Substances 0.000 claims 3
- 238000012163 sequencing technique Methods 0.000 claims 2
- 238000012544 monitoring process Methods 0.000 claims 1
- 238000005096 rolling process Methods 0.000 claims 1
- 230000017105 transposition Effects 0.000 claims 1
- 230000010365 information processing Effects 0.000 abstract 1
- 238000013528 artificial neural network Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000003709 image segmentation Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 230000001902 propagating effect Effects 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
Description
Technical Field
The present invention belongs to the field of person re-identification in computer vision, and in particular relates to an occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels.
Background Art
Person re-identification is the task of retrieving a specific pedestrian captured by different cameras. It plays a vital role in surveillance systems and has therefore attracted widespread attention. However, challenges arising during data collection, such as differences in camera position, low resolution, illumination changes, and occlusion by obstacles, leave existing person re-identification methods lacking robustness and with relatively low recognition accuracy. Among these challenges, occlusion is one of the most difficult problems in current person re-identification research.

At present, most deep-learning-based methods rely on the detection of human key points or body structure. Under occlusion, however, key points or body parts are often hidden by obstacles, so existing methods cannot identify them accurately. Moreover, visually similar obstacles cause similar content to appear in images of different pedestrians, which greatly reduces the discriminability of the features extracted by the neural network and further degrades the accuracy and robustness of person re-identification. A new person re-identification technique that can effectively overcome occlusion is therefore needed to improve recognition performance in complex scenes.
Summary of the Invention
To solve the above problems, the present invention provides an occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels. The method comprises the following steps.

Training data A, training data B and test data are collected from surveillance cameras. Training data A comprises occluded person re-identification images together with instance-segmentation-level annotations and pedestrian identity labels. Training data B comprises person re-identification images containing relatively complete human bodies. The test data is divided into query images and gallery images and contains only the original occluded person re-identification images and pedestrian identity labels.

First, the image completion model is trained to convergence using training data A and training data B. A test image fed into the converged image completion model has its occluded pedestrian pixels reconstructed, yielding a de-occluded pedestrian image. Then the occluded person re-identification network is trained to convergence using training data B. Finally, the test data is passed through the image completion module to obtain de-occluded pedestrian images, which are fed into the person re-identification network to achieve good recognition accuracy.
The mask-guided masked-autoencoder image completion model is characterized by performing de-occlusion on the occluded pedestrian images. The completion model is trained as follows:

(1) Using training data A, the existing instance segmentation network Mask2Former is trained until convergence, yielding an instance segmentation network that can output the human-body mask of a pedestrian image. The converged instance segmentation network predicts the pedestrian mask Mi corresponding to an image Xi of identity i. In the mask Mi, pixels at positions where the pedestrian is not occluded in the original image Xi are white, and all other pixels are black.
(2) An image modeling network guided by a self-supervised mask is used to reconstruct the occluded pedestrian pixels in an occluded pedestrian image. The occluded pedestrian image Xi ∈ R^(H×W×3) and its corresponding mask Mi (H and W are the image height and width, and 3 is the number of RGB channels) are converted into patch embeddings by an image patching function, where each patch has height Ph and width Pw. For image Xi, the patching function is a convolution over Xi with kernel size (Ph, Pw) and stride (Ph, Pw), and the output dimension of each patch is C. Applying the patching function to the image and flattening yields the patch embeddings:
XiP = Patch(Xi, θ)
MiP = Random(Mi)
Here Patch is the image patching function and XiP ∈ R^(N×C), where C is the embedding dimension, (Ph, Pw) is the resolution of each patch, N = HW/(Ph·Pw) is the total number of patches, and θ denotes the learnable parameters. The Random function computes the number of patches from the size of the input mask image and randomly generates a corresponding number of pixel retention scores. MiP holds the pixel retention score of every patch of the current image Xi, with values from 0 to 105; patches whose retention score is below 60 are marked as reconstruction patches, and the corresponding patch embeddings are discarded and do not enter the encoder. During training, MiP is generated by the random function.
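The patching and retention-score steps can be sketched in PyTorch as follows. This is a minimal illustration rather than the patent's code: the patch size (15×7), embedding dimension (768) and input size (210×98) are taken from the embodiment described later, and the function names are illustrative.

```python
import torch
import torch.nn as nn

class Patch(nn.Module):
    """Patch embedding: a convolution whose kernel size and stride both equal the patch size."""
    def __init__(self, in_ch=3, dim=768, patch_hw=(15, 7)):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_hw, stride=patch_hw)

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.proj(x)                        # (B, C, H/Ph, W/Pw)
        return x.flatten(2).transpose(1, 2)     # (B, N, C) flattened patch embeddings

def random_retention_scores(mask, patch_hw=(15, 7)):
    """Training-time stand-in for Random(Mi): one score in [0, 105] per patch."""
    b, _, h, w = mask.shape
    n_patches = (h // patch_hw[0]) * (w // patch_hw[1])
    return torch.randint(0, 106, (b, n_patches))

x = torch.randn(2, 3, 210, 98)                  # two RGB pedestrian images
m = torch.ones(2, 1, 210, 98)                   # their masks
tokens = Patch()(x)                             # (2, 196, 768) patch embeddings
keep = random_retention_scores(m) >= 60         # patches scoring below 60 are dropped before the encoder
```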
(3) The retained patch embeddings are added to the patch position encodings and fed into the pixel reconstruction encoder. The position encoding is a 2D position encoding computed as follows:

Here posX and posY are the horizontal and vertical coordinates of the patch in the original image, and dmodel is the dimension of the position encoding, which in the present invention equals the patch embedding dimension C. The index i takes integer values from 0 to 0.5C-1. After every term of the position encoding is computed, the terms are arranged as PE(posX, 0), PE(posY, 1), PE(posX, 2), PE(posY, 3), ..., PE(posX, C-2), PE(posY, C-1) to obtain the 2D position encoding.
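The encoding formula itself appears as a figure in the original text. A plausible reconstruction, assuming the standard Transformer sinusoidal encoding applied to posX at even output indices and posY at odd output indices (consistent with the interleaving described above), is:

$$
PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad
PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)
$$

with pos = posX for the even output positions and pos = posY for the odd ones.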
The retained patch embeddings are fed into the reconstruction encoder to obtain an intermediate tensor. A learnable tensor is inserted at each discarded-patch position of the intermediate tensor, giving the tensor to be learned. The learnable tensor has the same shape as the patch embedding it replaces, i.e. C dimensions, where C is the patch embedding dimension. The tensor to be learned is fed into the decoder to obtain the reconstructed patch embeddings, which are unpacked to produce the reconstructed image.
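A condensed sketch of this encoder/decoder flow with a learnable mask token, in the spirit of a masked autoencoder. The `encoder` and `decoder` arguments stand for Transformer stacks that are not specified here, and the sketch assumes every image in a batch retains the same number of patches.

```python
import torch

def reconstruct(tokens, keep, pos, encoder, decoder, mask_token):
    """tokens: (B, N, C) patch embeddings; keep: (B, N) bool retention mask;
    pos: (N, C) 2D position encoding; mask_token: learnable (1, 1, C) parameter."""
    B, N, C = tokens.shape
    x = tokens + pos                               # add position encodings
    kept = x[keep].view(B, -1, C)                  # only retained patches enter the encoder
    enc = encoder(kept)                            # intermediate tensor

    full = mask_token.expand(B, N, C).clone()      # learnable tensor at every discarded position
    full[keep] = enc.reshape(-1, C)                # put encoded patches back in place
    dec = decoder(full + pos)                      # reconstructed patch embeddings, (B, N, C)
    return dec                                     # unpatchifying dec yields the reconstructed image
```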
During training, the self-supervised mask-guided image modeling network is trained with training data B and its mask images. Specifically, the black pixels of each pedestrian image's mask are overlaid on the corresponding pedestrian image, while white pixels leave the image unchanged, producing training data B_withMask. Training data B_withMask is then used for self-supervised training of the image modeling network.

The self-supervised mask-guided image modeling network computes the loss only over patches whose pixel retention score is below 60, using the mean squared error as the loss function.
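A minimal sketch of such a masked reconstruction loss; `pred` and `target` hold the reconstructed and original per-patch pixels, and `recon` marks the patches with retention score below 60 (the names are illustrative).

```python
import torch.nn.functional as F

def masked_mse(pred, target, recon):
    """pred, target: (B, N, Ph*Pw*3) per-patch pixels; recon: (B, N) bool, True where score < 60."""
    loss = F.mse_loss(pred, target, reduction="none").mean(dim=-1)   # per-patch MSE
    return (loss * recon).sum() / recon.sum().clamp(min=1)           # average over reconstructed patches only
```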
Further, during prediction, the test data is fed into the converged instance segmentation network to obtain the test image mask MiTest. The test data and the test image mask MiTest are then fed together into the image modeling network to obtain the de-occluded pedestrian image. At prediction time, the image modeling network proceeds as follows:

The pedestrian image and the test image mask MiTest are fed together into the image patching function, giving the flattened patch embeddings and the pixel retention scores MiP. Patches whose retention score MiP is below 60 are discarded. The retained patch embeddings are added to the patch position encodings and fed into the pixel reconstruction encoder to obtain an intermediate tensor. A learnable tensor is inserted at each discarded-patch position of the intermediate tensor, giving the tensor to be learned, which is fed into the decoder to obtain the reconstructed patch embeddings; these are unpacked to produce the reconstructed image.

The only difference between prediction and training is how the pixel retention scores MiP are obtained. During testing they are computed as:
MiP = Patch_formask(MiTest, 1)
MiP holds the pixel retention score of every patch of the current image Xi; each score ranges from 0 to 105. Patches whose retention score is below 60 are marked as reconstruction patches, and the corresponding XiP entries are discarded and do not enter the encoder.

Patch_formask is an image patching function with the same structure as the Patch function. The difference is that the convolution kernel of Patch has learnable parameters θ, whereas the kernel parameters of Patch_formask are not learnable and are fixed to 1 (the 1 in Patch_formask(·, 1) denotes these fixed kernel weights). The output dimension of the pixel retention score is 1, i.e. each patch corresponds to one pixel retention score.
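With all-ones kernel weights, such a fixed convolution simply counts the retained (white) mask pixels inside each patch, which matches the stated score range of 0 to 105 (= 15×7 pixels per patch in the embodiment). A sketch under those assumptions:

```python
import torch
import torch.nn as nn

def patch_formask(mask, patch_hw=(15, 7)):
    """mask: (B, 1, H, W), 1 for visible pedestrian pixels, 0 elsewhere.
    Returns (B, N) pixel retention scores, one per patch, each in [0, Ph*Pw]."""
    conv = nn.Conv2d(1, 1, kernel_size=patch_hw, stride=patch_hw, bias=False)
    with torch.no_grad():
        conv.weight.fill_(1.0)                  # fixed, non-learnable all-ones kernel
    conv.weight.requires_grad_(False)
    scores = conv(mask)                         # each output entry counts retained pixels in one patch
    return scores.flatten(1)                    # (B, N)
```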
Further, the person re-identification network uses a dynamic graph structure module and a graph convolution feature propagation module to assist feature propagation. The dynamic graph structure module establishes different graph structures at multiple positions in the convolutional neural network, i.e. it converts the feature map into a K-nearest-neighbour graph at each of those positions. The module takes a feature map of height H and width W as input and outputs the adjacency matrix corresponding to that feature map. The pseudo-code of the dynamic graph structure module is as follows:
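The pseudo-code appears as a figure in the original. A minimal PyTorch sketch of a K-nearest-neighbour adjacency construction consistent with the description is given below; the dot-product similarity and the value of k are assumptions, not specified in the text.

```python
import torch

def knn_adjacency(feat, k=8):
    """feat: (C, H, W) feature map. Each spatial position is a graph node; connect it to
    its k most similar nodes and return a binary adjacency matrix of shape (N, N), N = H*W."""
    C, H, W = feat.shape
    nodes = feat.reshape(C, H * W).t()           # (N, C) node features
    sim = nodes @ nodes.t()                      # pairwise dot-product similarity
    idx = sim.topk(k, dim=-1).indices            # k nearest neighbours of each node
    A = torch.zeros(H * W, H * W, device=feat.device)
    A.scatter_(1, idx, 1.0)                      # A[i, j] = 1 if j is a neighbour of i
    return A
```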
Further, the correlation between every pair of nodes in the feature map is computed, giving the correlation matrix:
R = θ(F)·φ(F)^T
Here R ∈ R^(N×N) is the correlation matrix (N = H·W is the number of nodes), F ∈ R^(C×H×W) is the feature map output by the convolutional layer, C is the number of channels of the feature map, and W and H are its width and height. X^T denotes the transpose of matrix X. θ(F) and φ(F) denote feeding the feature map F into two transfer functions with exactly the same structure but different parameters; each transfer function consists of a 1×1 convolution layer, a batch normalization layer and a ReLU activation function.
Next, the correlation matrix R is multiplied element-wise with the adjacency matrix A and normalized with the softmax function to obtain the similarity adjacency matrix:

Â = softmax(R ⊙ A)

Here Â ∈ R^(N×N) is the similarity adjacency matrix, N is the number of nodes in the feature map, A is the adjacency matrix output by the dynamic graph structure module, and ⊙ denotes the Hadamard (element-wise) product.
Node feature propagation is then performed over the graph structure:

F' = Â·Fv

Here F' is the feature after propagation and Fv is the feature obtained by reshaping the feature map output by the convolutional layer into matrix form. Through the inverse reshaping, the propagated features can be converted back into a feature map.
All operations of the above dynamic graph structure module and graph convolution feature propagation module together are denoted OGA(F), where F ∈ R^(C×H×W) is the feature map output by the convolutional layer.
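A sketch of one OGA step as described above. The two transfer functions θ and φ are 1×1 convolution + batch normalization + ReLU blocks, and the K-nearest-neighbour adjacency comes from the `knn_adjacency` sketch given earlier; processing one image at a time is an implementation choice of this illustration, not part of the method.

```python
import torch
import torch.nn as nn

class OGA(nn.Module):
    """One dynamic-graph structure + graph-convolution feature propagation step (sketch)."""
    def __init__(self, channels, k=8):
        super().__init__()
        def transfer():
            return nn.Sequential(nn.Conv2d(channels, channels, 1),
                                 nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.theta, self.phi = transfer(), transfer()
        self.k = k

    def forward(self, F):                                  # F: (B, C, H, W)
        B, C, H, W = F.shape
        out = []
        for b in range(B):                                 # build a graph per image
            A = knn_adjacency(F[b], self.k)                # (N, N) binary adjacency
            t = self.theta(F[b:b+1]).reshape(C, -1).t()    # theta(F): (N, C)
            p = self.phi(F[b:b+1]).reshape(C, -1).t()      # phi(F):   (N, C)
            R = t @ p.t()                                  # correlation matrix (N, N)
            S = torch.softmax(R * A, dim=-1)               # similarity adjacency (Hadamard + softmax)
            Fv = F[b].reshape(C, -1).t()                   # feature map in matrix form (N, C)
            out.append((S @ Fv).t().reshape(C, H, W))      # propagate and reshape back to a map
        return torch.stack(out)
```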
Further, a residual structure is introduced into the feature propagation process. At the same time, stacking the OGA module several times allows the features to propagate more thoroughly. The residual OGA stacking takes the form

F ← F + β·OGA(F), applied repeatedly,

where β is a learnable parameter.
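A sketch of such residual stacking; the number of stacked blocks and the zero initialization of β are assumptions of this illustration.

```python
import torch
import torch.nn as nn

class StackedOGA(nn.Module):
    """Residual stacking of OGA blocks: F <- F + beta * OGA(F), repeated num_layers times."""
    def __init__(self, channels, num_layers=2):
        super().__init__()
        self.blocks = nn.ModuleList(OGA(channels) for _ in range(num_layers))
        self.betas = nn.Parameter(torch.zeros(num_layers))   # one learnable residual weight per block

    def forward(self, F):
        for beta, block in zip(self.betas, self.blocks):
            F = F + beta * block(F)                           # residual feature propagation
        return F
```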
Further, during training, the loss function of the occluded person re-identification network is:
L = L_ID + L_Triplet + ε·L_C
Here L_Triplet is the triplet loss, L_ID is the ID loss and L_C is the center loss. ε is the balance weight of the center loss and is set to 0.1 in this network.
L_Triplet is expressed as follows:

In the formula, B is the number of samples in a training mini-batch, the anchor denotes the reference sample, the positive sample is a different image of the same class as the reference sample, and the negative sample belongs to a different class from the reference sample. α is the training margin, set to 0.2 in the present invention, and f(x) is the feature of image x.
L_ID is expressed as follows:

In the formula, y denotes the ground-truth label of the training sample, N is the number of pedestrian identities in the dataset, and f(xi) is the embedding predicted by the network for image xi. In the present invention, ε is a constant set to 0.1.
L_C is expressed as follows:

In the formula, f(xi) is the feature of image xi and B is the number of samples in a training mini-batch. The center of class yi is the mean of the features of all images of class yi in the mini-batch.
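The three loss formulas appear as figures in the original. The sketch below assumes their standard forms (margin-based triplet loss, label-smoothed cross-entropy as the ID loss, and a mini-batch center loss), which are consistent with the surrounding description; it is an illustration, not the patent's exact definition.

```python
import torch
import torch.nn.functional as F

def triplet_loss(fa, fp, fn, margin=0.2):
    """Margin-based triplet loss over (anchor, positive, negative) feature batches."""
    d_ap = (fa - fp).norm(dim=1)
    d_an = (fa - fn).norm(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

def id_loss(logits, labels, eps=0.1):
    """ID loss: cross-entropy over identity logits with label smoothing eps."""
    return F.cross_entropy(logits, labels, label_smoothing=eps)

def center_loss(feats, labels):
    """Pull each feature toward the mini-batch mean feature (center) of its class."""
    loss = 0.0
    for c in labels.unique():
        f = feats[labels == c]
        loss = loss + ((f - f.mean(dim=0)) ** 2).sum(dim=1).mean()
    return loss / labels.unique().numel()

def total_loss(logits, feats, labels, fa, fp, fn, eps_c=0.1):
    """L = L_ID + L_Triplet + eps_c * L_C, with eps_c = 0.1 as in the text."""
    return id_loss(logits, labels) + triplet_loss(fa, fp, fn) + eps_c * center_loss(feats, labels)
```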
Further, after all networks have been trained to convergence, the test data is fed into the mask-guided masked-autoencoder image completion model to obtain de-occluded pedestrian images. The de-occluded pedestrian images are then fed into the occluded person re-identification network based on dynamic graphs and graph convolution to obtain a feature for each image. For each query image, the gallery images are sorted by feature distance, with closer gallery images ranked first, and the 10 nearest gallery images are taken as the retrieval result for that query. Gallery images with the same identity label are counted as correct matches for the query, and the average precision of the query is computed. Averaging over all query images gives the mean average precision (mAP) of the present invention on the dataset. The proportion of query images whose nearest gallery image is a correct match gives the Rank-1 (first-hit) rate.
The present invention provides an occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels, which has the following advantages:

(1) The self-supervised mask-guided image modeling network of the method is trained in a self-supervised manner, which avoids the time cost of large-scale image annotation while performing on par with supervised methods.

(2) The method uses pixel completion, so the network can fully reconstruct the occluded pedestrians in occluded data, allowing existing methods based on human key points to be applied to occluded images.

(3) The method uses an occluded person re-identification network that combines a dynamic graph neural network with graph convolution, making effective use of spatial image information. The dynamic graph neural network helps enlarge the effective receptive field of the convolutional neural network and improves pedestrian recognition performance across different scenes and occlusion conditions.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels provided by the present invention;

FIG. 2 is a schematic diagram of the completion model (comprising the instance segmentation network and the image modeling network);

FIG. 3 is a schematic diagram of the completion model's effect when completing an occluded image;

FIG. 4 is a schematic diagram of the structure of the occluded person re-identification network based on dynamic graphs and graph convolution.
Detailed Description
To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present invention.
Exemplary Method
As shown in FIG. 1, the present invention provides a flow chart of the occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels. The steps of the method are as follows:

Step S110: First, all test-set images in the dataset are kept. The training data is divided into images containing a relatively complete pedestrian body (denoted training data B) and person re-identification images containing large occluders (denoted training data A). In addition, the pedestrians in the training data A images are annotated at the instance segmentation level. All images are resized to a height of 210 pixels and a width of 98 pixels.

Step S120: The instance segmentation network is trained on training data A. The instance segmentation network is Mask2Former and the loss function is the Dice loss. After training, the images in training data B are fed into the instance segmentation network to obtain the pedestrian mask of each image in training data B.
Training data B and its mask images are then used to train the self-supervised mask-guided image modeling network to convergence. Specifically, the black pixels of each pedestrian image's mask are overlaid on the corresponding pedestrian image, while white pixels leave the image unchanged, yielding training data B_withMask, which is used for self-supervised training of the image modeling network (the training of the self-supervised mask-guided image modeling network does not use the image masks for patch selection; MiP is generated by the random function).
The image modeling network converts the image to be predicted Xi ∈ R^(H×W×3) and its corresponding mask (H and W are the image height and width, and 3 is the number of RGB channels) into multiple patches, each 15 pixels high and 7 pixels wide. For image Xi, the Patch operation is a convolution over Xi with a kernel of height 15 and width 7 and a stride of height 15 and width 7; the output dimension of each patch is C. Converting the image into patches and flattening yields the patch embeddings:
XiP = Patch(Xi, θ)
MiP = Random(Mi)
Here C is the embedding dimension, which is 768 in the present invention, and the mask has a single channel. (Ph, Pw) = (15, 7) is the resolution of each patch, and the total number of patches, N = HW/(Ph·Pw), is 196 in the present invention. Patch is the image patching function, whose convolution kernel has learnable parameters θ. The Random function computes the number of patches from the size of the input mask image and randomly generates a corresponding number of pixel retention scores. MiP holds the pixel retention score of every patch of the current image Xi, ranging from 0 to 105; patches whose retention score is below 60 are marked as reconstruction patches, and the corresponding patch embeddings are discarded and do not enter the encoder. During training, MiP is generated randomly.
The retained patch embeddings are added to the patch position encodings and fed into the pixel reconstruction encoder. The position encoding uses the 2D position encoding formula given above.

Here posX and posY are the horizontal and vertical coordinates of the patch in the original image, and dmodel is the dimension of the position encoding, equal to the patch embedding dimension C. The index i takes integer values from 0 to 0.5C-1. After every term of the position encoding is computed, the terms are arranged as PE(posX, 0), PE(posY, 1), PE(posX, 2), PE(posY, 3), ..., PE(posX, 766), PE(posY, 767) to obtain the 2D position encoding.
After the reconstruction encoder, the retained patch embeddings are converted into an intermediate tensor. A learnable tensor is inserted at each discarded-patch position of the intermediate tensor, giving the decoder input tensor. The learnable tensor has the same shape as the patch embedding it replaces, i.e. C = 768 dimensions. The decoder input tensor is fed into the decoder to obtain all reconstructed patch embeddings, which are unpacked to produce the reconstructed image.

The self-supervised mask-guided image modeling network computes the loss only over patches whose pixel retention score is below 60, using the mean squared error loss.
Further, during testing, the test data is fed into the converged instance segmentation network to obtain the test image mask MiTest. The test data and the test image mask MiTest are fed together into the image modeling network to obtain the de-occluded pedestrian image.

The only difference between testing and training is how the pixel retention scores MiP are obtained. During testing they are computed as:
MiP = Patch_formask(MiTest, 1)
MiP holds the pixel retention score of every patch of the current mask Mi; each score ranges from 0 to 105. Patches whose retention score is below 60 are marked as reconstruction patches, and the corresponding XiP entries are discarded and do not enter the encoder.

Patch_formask is an image patching function with the same structure as the Patch function. The difference is that the convolution kernel of Patch has learnable parameters θ, whereas the kernel parameters of Patch_formask are not learnable and are fixed to 1 (the 1 in Patch_formask(·, 1) denotes these fixed kernel weights). The output dimension of the pixel retention score is 1, i.e. each patch corresponds to one pixel retention score.

The remaining steps are identical to the training procedure described above: the test data is fed into the instance segmentation network to obtain the test image masks, and the test data and the masks are fed together into the self-supervised mask-guided image modeling network to obtain the de-occluded pedestrian images. The de-occluded pedestrian images serve as the test data for the occluded person re-identification network based on dynamic graphs and graph convolution.
Step S130: The occluded person re-identification network is trained with training data B. The person re-identification network comprises a ResNet-50 backbone, the dynamic graph structure module and the graph convolution feature propagation module. The ResNet-50 network extracts feature maps from the person re-identification images, and the dynamic graph neural network feature propagation module propagates features over the normalized feature maps to obtain more robust and discriminative features. Specifically, stacked dynamic graph structure modules and dynamic graph convolution feature propagation modules are inserted after each of the four ResNet-50 stages (stages 1, 2, 3 and 4); these modules change only the content of a feature map, not its shape.
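A sketch of how such shape-preserving modules can be interleaved with a torchvision ResNet-50 backbone, reusing the StackedOGA sketch above; the number of stacked blocks per stage and the global-average-pooled feature head are assumptions of this illustration.

```python
import torch.nn as nn
from torchvision.models import resnet50

class ReIDBackbone(nn.Module):
    """ResNet-50 with a StackedOGA block after each of its four stages (feature map shapes unchanged)."""
    def __init__(self, num_layers_per_stage=2):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        self.oga = nn.ModuleList(StackedOGA(c, num_layers_per_stage)
                                 for c in (256, 512, 1024, 2048))
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.stem(x)
        for stage, oga in zip(self.stages, self.oga):
            x = oga(stage(x))                   # feature maps keep their shape through OGA
        return self.pool(x).flatten(1)          # one feature vector per image
```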
A feature map of height H and width W is fed into the dynamic graph structure module, which outputs the adjacency matrix A corresponding to that feature map.
The pseudo-code of the dynamic graph structure module is the same as given above.
Further, the correlation between every pair of nodes in the feature map is computed, giving the correlation matrix:
R = θ(F)·φ(F)^T
Here R ∈ R^(N×N) is the correlation matrix (N = H·W is the number of nodes), F ∈ R^(C×H×W) is the feature map output by the convolutional layer, C is the number of channels of the feature map, and W and H are its width and height. X^T denotes the transpose of matrix X. θ(F) and φ(F) denote feeding the feature map F into two transfer functions with exactly the same structure but different parameters; each transfer function consists of a 1×1 convolution layer, a batch normalization layer and a ReLU activation function.
Next, the correlation matrix R is multiplied element-wise with the adjacency matrix A and normalized with the softmax function to obtain the similarity adjacency matrix:

Â = softmax(R ⊙ A)

Here Â ∈ R^(N×N) is the similarity adjacency matrix, N is the number of nodes in the feature map, A is the adjacency matrix output by the dynamic graph structure module, and ⊙ denotes the Hadamard (element-wise) product.
Node feature propagation is then performed over the graph structure:

F' = Â·Fv

Here F' is the feature after propagation and Fv is the feature obtained by reshaping the feature map output by the convolutional layer into matrix form. Through the inverse reshaping, the propagated features can be converted back into a feature map.
All operations of the above dynamic graph structure module and graph convolution feature propagation module together are denoted OGA(F), where F ∈ R^(C×H×W) is the feature map output by the convolutional layer.
Further, a residual structure is introduced into the feature propagation process. At the same time, stacking the OGA module several times allows the features to propagate more thoroughly. The residual OGA stacking takes the form

F ← F + β·OGA(F), applied repeatedly,

where β is a learnable parameter.
Further, during training, the loss function of the occluded person re-identification network is:
L = L_ID + L_Triplet + ε·L_C
Here L_Triplet is the triplet loss, L_ID is the ID loss and L_C is the center loss. ε is the balance weight of the center loss and is set to 0.1 in this network.
L_Triplet is expressed as follows:

In the formula, B is the number of samples in a training mini-batch, the anchor denotes the reference sample, the positive sample is a different image of the same class as the reference sample, and the negative sample belongs to a different class from the reference sample. α is the training margin, set to 0.2 in the present invention, and f(x) is the feature of image x.
L_ID is expressed as follows:

In the formula, y denotes the ground-truth label of the training sample, N is the number of pedestrian identities in the dataset, and f(xi) is the embedding predicted by the network for image xi. In the present invention, ε is a constant set to 0.1.
L_C is expressed as follows:

In the formula, f(xi) is the feature of image xi and B is the number of samples in a training mini-batch. The center of class yi is the mean of the features of all images of class yi in the mini-batch.
The network is trained with a gradient descent algorithm, using the Adam optimizer in its default configuration. Training ends when the loss value reaches its minimum.
Step S140: Once the network has been trained to convergence, the occluded person re-identification network based on dynamic graphs and graph convolution can be tested with the test data. The de-occluded pedestrian images obtained in step S120 are fed into the person re-identification network for feature extraction, giving a feature for each image. For each query image, the gallery images are sorted by feature distance, with closer gallery images ranked first, and the 10 nearest gallery images are taken as the retrieval result for that query. Gallery images with the same identity label are counted as correct matches for the query, and the average precision of the query is computed. Averaging over all query images gives the mean average precision (mAP) of the present invention on the dataset. The proportion of query images whose nearest gallery image is a correct match gives the Rank-1 (first-hit) rate.
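A compact sketch of the ranking and Rank-1 computation described above (mAP is omitted for brevity); the feature and label tensors are illustrative placeholders.

```python
import torch

def rank1(query_feats, query_ids, gallery_feats, gallery_ids, topk=10):
    """Sort gallery images by Euclidean feature distance to each query; report the Rank-1 rate."""
    dist = torch.cdist(query_feats, gallery_feats)     # (num_query, num_gallery) distances
    order = dist.argsort(dim=1)[:, :topk]              # the 10 nearest gallery images per query
    nearest_ids = gallery_ids[order[:, 0]]             # identity of the single nearest gallery image
    return (nearest_ids == query_ids).float().mean().item()
```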
Step S150: From the retrieval results obtained above, the person re-identification accuracy on the occluded person re-identification dataset Occluded-DukeMTMC is computed.

In this embodiment, the dataset is first divided into a part containing relatively complete human bodies and a part containing largely occluded pixels (training data B and training data A, respectively). Training data A and B are then used to train the image completion model to convergence, and training data B is used to train the occluded person re-identification network. During testing, the test data is first fed into the image completion model: it is passed through the instance segmentation network to obtain the mask corresponding to each test image, and the test data and the test image masks are then fed together into the image modeling network for occluded-pixel completion. Finally, the de-occluded pedestrian images are fed into the person re-identification network for feature extraction and prediction, giving the classification results predicted by the network.

To illustrate further, if an occluded person re-identification dataset is classified according to this embodiment, a classification result with higher accuracy than most existing methods will be obtained.
Results of the Embodiment
This embodiment uses the public occluded person re-identification dataset Occluded-DukeMTMC. The dataset is described as follows:

In this dataset, all query images are occluded by various objects (e.g., trees, cars, other people). The training, query and gallery sets contain 14%, 15% and 10% occluded images, respectively. The training set of Occluded-DukeMTMC contains 15,618 images covering 702 identities. The test set contains 1,110 identities, with 17,661 gallery images and 2,210 query images. Occluded-DukeMTMC is currently the largest and most difficult dataset for the occluded person re-identification task.

To verify the superiority of this embodiment (Ours), it is compared with several existing occluded person re-identification methods, including BoT, MHSA, OAMN and HG. The mean average precision (mAP) and Rank-1 accuracy of these methods on the above public dataset are compared; the detailed comparison is shown in Table 1.
Table 1. Accuracy on the Occluded-DukeMTMC dataset (%)
The comparison in the table above clearly shows that Ours achieves the best performance and significantly improves the accuracy of occluded person re-identification. The quantitative results demonstrate the superiority of Ours, because it better completes the human key-point information and the view-difference information in occluded images. By combining a convolutional neural network with a dynamic graph neural network, Ours effectively enlarges the network's receptive field and thereby improves robustness on occluded person re-identification data. Extensive experiments show that the method outperforms existing methods.

This embodiment proposes an occluded person re-identification method based on masked self-supervised reconstruction of occluded pixels, used to perform the person re-identification task on occluded images.

An image completion model consisting of an instance segmentation network and a self-supervised mask-guided image modeling network is trained to reconstruct the occluded pedestrian pixels in an image, yielding a de-occluded pedestrian image. An occluded person re-identification network based on dynamic graphs and graph convolution then performs person re-identification on the de-occluded pedestrian image. The dynamic graph structure used by the occluded person re-identification network effectively enlarges the receptive field of the convolutional neural network and increases the discriminability and robustness of the feature maps it learns. Experimental results on the public dataset Occluded-DukeMTMC show that this embodiment achieves higher accuracy than other methods and is therefore superior.

It should be understood that the above specific embodiments of the present invention are only intended to illustrate or explain the principles of the present invention and do not constitute a limitation of the present invention. Therefore, any modification, equivalent substitution, improvement, etc. made without departing from the spirit and scope of the present invention shall fall within its protection scope. Furthermore, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410016648.1A CN117877068B (en) | 2024-01-04 | 2024-01-04 | Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410016648.1A CN117877068B (en) | 2024-01-04 | 2024-01-04 | Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117877068A CN117877068A (en) | 2024-04-12 |
CN117877068B true CN117877068B (en) | 2024-09-20 |
Family
ID=90584054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410016648.1A Active CN117877068B (en) | 2024-01-04 | 2024-01-04 | Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117877068B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118170936B (en) * | 2024-05-08 | 2024-07-26 | 齐鲁工业大学(山东省科学院) | An occluded pedestrian retrieval method based on multimodal data and relationship enhancement |
CN119091467A (en) * | 2024-09-12 | 2024-12-06 | 北京航空航天大学 | A method for re-identification of occluded pedestrians based on subject target discrimination |
CN119205924B (en) * | 2024-11-27 | 2025-05-13 | 山东建筑大学 | Anti-shielding human head three-dimensional positioning method and system based on instance segmentation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520226A (en) * | 2018-04-03 | 2018-09-11 | 东北大学 | A Pedestrian Re-Identification Method Based on Body Decomposition and Saliency Detection |
CN113538273A (en) * | 2021-07-13 | 2021-10-22 | 荣耀终端有限公司 | Image processing method and image processing device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11205082B2 (en) * | 2019-10-08 | 2021-12-21 | Toyota Research Institute, Inc. | Spatiotemporal relationship reasoning for pedestrian intent prediction |
CN115909488A (en) * | 2022-11-10 | 2023-04-04 | 杭州电子科技大学 | An occluded person re-identification method based on pose guidance and dynamic feature extraction |
CN116682144B (en) * | 2023-06-20 | 2023-12-22 | 北京大学 | Multi-modal pedestrian re-recognition method based on multi-level cross-modal difference reconciliation |
-
2024
- 2024-01-04 CN CN202410016648.1A patent/CN117877068B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520226A (en) * | 2018-04-03 | 2018-09-11 | 东北大学 | A Pedestrian Re-Identification Method Based on Body Decomposition and Saliency Detection |
CN113538273A (en) * | 2021-07-13 | 2021-10-22 | 荣耀终端有限公司 | Image processing method and image processing device |
Also Published As
Publication number | Publication date |
---|---|
CN117877068A (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
CN117877068B (en) | Mask self-supervision shielding pixel reconstruction-based shielding pedestrian re-identification method | |
CN112597941B (en) | Face recognition method and device and electronic equipment | |
CN109934200B (en) | RGB color remote sensing image cloud detection method and system based on improved M-Net | |
CN108960141B (en) | Pedestrian Re-identification Method Based on Enhanced Deep Convolutional Neural Network | |
CN110084156A (en) | A kind of gait feature abstracting method and pedestrian's personal identification method based on gait feature | |
CN112836646B (en) | Video pedestrian re-identification method based on channel attention mechanism and application | |
CN108520216A (en) | A method of identity recognition based on gait images | |
CN110059768A (en) | The semantic segmentation method and system of the merging point and provincial characteristics that understand for streetscape | |
CN110390308B (en) | Video behavior identification method based on space-time confrontation generation network | |
CN114330529A (en) | Real-time pedestrian shielding detection method based on improved YOLOv4 | |
CN111523586B (en) | Noise-aware-based full-network supervision target detection method | |
CN113505719B (en) | Gait recognition model compression system and method based on local-whole joint knowledge distillation algorithm | |
CN110647820B (en) | Low-resolution face recognition method based on feature space super-resolution mapping | |
CN114998995B (en) | Cross-view gait recognition method based on metric learning and spatiotemporal dual-stream network | |
CN112861605A (en) | Multi-person gait recognition method based on space-time mixed characteristics | |
CN115019039B (en) | An instance segmentation method and system combining self-supervision and global information enhancement | |
CN118097150A (en) | A small sample camouflage target segmentation method | |
Sun et al. | [Retracted] Research on Face Recognition Algorithm Based on Image Processing | |
CN115830643B (en) | Light pedestrian re-recognition method based on posture guiding alignment | |
CN116704367A (en) | Multi-scale feature fusion farmland change detection method and system | |
Zhou et al. | MaskNet++: Inlier/outlier identification for two point clouds | |
CN115909398A (en) | A cross-domain pedestrian re-identification method based on feature enhancement | |
CN113283393B (en) | Deepfake video detection method based on image group and two-stream network | |
CN119495004A (en) | Water extraction method from remote sensing images based on multi-scale feature fusion of compressed pyramid gating units |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |