CN111079516B - Pedestrian gait segmentation method based on deep neural network - Google Patents
Pedestrian gait segmentation method based on deep neural network
- Publication number
- CN111079516B · CN201911050215.3A · CN201911050215A
- Authority
- CN
- China
- Prior art keywords
- mask
- pedestrian
- gait
- size
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a pedestrian gait segmentation method based on a deep neural network, addressing the problems that the O-shaped gap between the two legs is difficult to segment and that the legs themselves are not segmented finely enough. The method achieves fine segmentation of pedestrian gait through two steps: designing a dilated-convolution residual network and adding an edge-detector branch. Replacing the ordinary convolutions of the last stage of the ResNet with dilated convolutions enlarges the receptive field of the shallow network, so that features carrying more information are obtained and passed to the next stage; the resulting mask is then fed into an edge detector composed of edge-detection operators. This effectively addresses the poor fit of the predicted gait edge, yields a more accurate pedestrian gait edge, and improves the fineness of the leg segmentation.
Description
Technical Field
The invention relates to the technical field of image processing and pattern recognition in computer vision, in particular to a pedestrian gait segmentation method based on a deep neural network.
Background
In recent years, video surveillance has been widely applied in fields such as traffic, the military, urban construction, and public safety, and its importance can no longer be ignored.
Pedestrian gait segmentation is an indispensable part of video surveillance technology. Extracting the pedestrian region from images or video of a pedestrian's gait is a key link in pedestrian gait recognition and one of the most demanding computer vision tasks.
At present, research on pedestrian gait segmentation is scarce, whereas research on instance segmentation is relatively mature. Instance segmentation is a basic computer vision technique and a key step from image processing to image analysis; it is the first step of image analysis and one of the most demanding computer vision tasks, involving object localization and segmentation of object instances. In recent years, a large number of instance segmentation papers have been published and many instance segmentation methods proposed, providing a solid technical basis for pedestrian gait segmentation.
Disclosure of Invention
The invention aims to provide a pedestrian gait segmentation method based on a deep neural network.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a pedestrian gait segmentation method based on a deep neural network is characterized by comprising the following steps:
s1) predicting gait boundaries of pedestrians
Given a picture or a video, the gait boundaries of one or more pedestrians in it are predicted;
For a picture, all pedestrian targets in the single picture are detected and gait segmentation is performed on each target;
For a video, each frame is taken as input, all pedestrian targets in each frame are detected and segmented, and the processed frames are output and recombined into a segmented pedestrian gait video;
s2) image preprocessing and label making
Uniformly resize the segmented pedestrian gait images to h × w, where h is the image height and w is the image width;
Make the labels by processing the pixel values of the targets at the corresponding positions of the image: pixels with a value of 14 delineate the pedestrian positions, and pixels at non-pedestrian positions are uniformly set to 0 to represent the background;
s3) constructing a gait segmentation deep convolution neural network
S3-1) extracting characteristics by adopting a basic network
A ResNet-50 network is used as the base network; in the ResNet-50 structure, the ordinary convolutions of the last stage are replaced by dilated (atrous) convolutions with a dilation rate of 2;
S3-2) The image preprocessed in step S2) is input into the base network of step S3-1); the resulting features are then fed into an FPN to further extract features at each scale. The FPN uses the bottom-up feature maps of different resolutions of the same image at each layer to efficiently generate a multi-scale feature representation of the picture;
S3-3) The features extracted in step S3-2) are passed through ROIAlign to generate ROI features of size 14 × 14 × 256; ROIAlign maps the candidate-box region proposals to fixed-size feature maps and uses bilinear interpolation to obtain more accurate pedestrian candidate boxes;
S3-4) The 14 × 14 × 256 feature map from step S3-3) is transformed, through 5 convolutions followed by a deconvolution, into the pedestrian mask P_mask of size 28 × 28 × 1;
S3-5) A max pooling layer with kernel size 2 and stride 2 is applied to the 28 × 28 × 1 P_mask obtained in step S3-4), so that the predicted mask has the same spatial size as the output of step S3-3); the pooled mask is combined with that output to obtain a feature map of size 14 × 14 × 257;
This feature map is passed through 4 convolutional layers whose kernel size and number of filters are set to 3 and 256 respectively; 3 fully connected layers are then added, the first two with 1024 units and the last with as many units as there are categories, which here is 1 (pedestrian). The output value is the score of the mask; the threshold is set to 0.5, and masks with a score greater than 0.5 are adopted and defined as GT_mask;
S4) A loss function is constructed using the binary cross-entropy loss Binary_Cross_Entropy. The true probability is expressed as $p \in \{y, 1-y\}$ and the predicted probability as $q \in \{\hat{y}, 1-\hat{y}\}$, where $y$ denotes the probability that the sample belongs to a pedestrian, $1-y$ the probability that it belongs to the background, $\hat{y}$ the predicted pedestrian probability, and $1-\hat{y}$ the predicted background probability. The similarity between $p$ and $q$ is measured by the cross entropy:

$$H(p, q) = -\sum_i p_i \log q_i = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$$
S5) The binary cross-entropy loss Binary_Cross_Entropy is used to compare the information of each pixel in GT_mask and P_mask;
S6) The P_mask and GT_mask obtained in step S3) are input into an edge detector consisting of a single 3 × 3 edge-detection operator; each mask is convolved with the operator to obtain its edge. The edge result obtained from P_mask is defined as $E_P$, and the edge result obtained from GT_mask is defined as $E_{GT}$;
S7) A loss function loss is constructed from the edge results $E_P$ and $E_{GT}$ obtained in step S6), measuring the discrepancy between the predicted edge and the reference edge.
compared with the prior art, the invention has the following advantages:
the invention provides a pedestrian gait segmentation method based on a deep neural network, aiming at the situation that O-shaped legs exist in pedestrian gait and the leg shape is difficult to draw. The method realizes the fine segmentation of the gait of the pedestrian by two steps of designing a cavity convolution residual convolution network and adding an edge detector branch; the cavity convolution is used for replacing the common convolution of the last stage of the resnet to improve the receptive field of the shallow network, the characteristics of more information are obtained and transmitted to the next stage, and the finally obtained mask is input into an edge detector consisting of edge detection operators, so that the problem that the gait edge in the gait of the pedestrian is not fitted is well solved, and the gait edge of the pedestrian is more accurate.
Detailed Description
A pedestrian gait segmentation method based on a deep neural network is characterized by comprising the following steps:
s1) predicting gait boundaries of pedestrians
Given a picture or a video, the gait boundaries of one or more pedestrians in it are predicted;
For a picture, all pedestrian targets in the single picture are detected and gait segmentation is performed on each target;
For a video, each frame is taken as input, all pedestrian targets in each frame are detected and segmented, and the processed frames are output and recombined into a segmented pedestrian gait video.
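A minimal frame-by-frame sketch of the video case in step S1), using OpenCV; segment_pedestrians is a hypothetical stand-in for the detection-and-segmentation network described below, and the file paths and codec are illustrative assumptions:

```python
import cv2

def segment_video(in_path: str, out_path: str, segment_pedestrians) -> None:
    """Read a video, segment every frame, and recombine the frames into an output video."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    ok, frame = cap.read()
    while ok:
        out.write(segment_pedestrians(frame))  # detect and segment all pedestrians in the frame
        ok, frame = cap.read()
    cap.release()
    out.release()
```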
S2) image preprocessing and label making
Uniformly resize the segmented pedestrian gait images to h × w, where h is the image height and w is the image width;
Make the labels by processing the pixel values of the targets at the corresponding positions of the image: pixels with a value of 14 delineate the pedestrian positions, and pixels at non-pedestrian positions are uniformly set to 0 to represent the background.
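A minimal label-making sketch under the assumption that the pedestrian region is available as a boolean array for an image already resized to h × w; the values 14 (pedestrian) and 0 (background) follow the text above, while the array names and sizes are illustrative:

```python
import numpy as np
from PIL import Image

H, W = 512, 256  # example h x w after the uniform resize

def make_label(pedestrian_mask: np.ndarray) -> Image.Image:
    """pedestrian_mask: boolean (H, W) array, True where a pedestrian is present."""
    label = np.zeros((H, W), dtype=np.uint8)  # non-pedestrian positions stay 0 (background)
    label[pedestrian_mask] = 14               # pedestrian positions are delineated with value 14
    return Image.fromarray(label, mode="L")
```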
S3) constructing a gait segmentation deep convolution neural network
S3-1) extracting characteristics by adopting a basic network
A ResNet-50 network is used as the base network; on top of the ResNet-50 structure, the ordinary convolutions of its last stage are replaced by dilated (atrous) convolutions with a dilation rate of 2. This enlarges the receptive field of the network and benefits the subsequent feature extraction of the deep network. The ResNet-50 network is a 50-layer residual convolutional network, i.e., a deep residual network.
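A minimal sketch of such a base network, assuming a torchvision ResNet-50 stands in for the ResNet-50 described above; replace_stride_with_dilation turns the 3 × 3 convolutions of the last stage (layer4) into dilated convolutions with a dilation rate of 2:

```python
import torchvision

# Only the last stage is dilated; the earlier stages keep ordinary convolutions.
backbone = torchvision.models.resnet50(
    replace_stride_with_dilation=[False, False, True],
)
# layer4 now uses dilation=2 in place of its ordinary (dilation=1) 3x3 convolutions,
# enlarging the receptive field without further reducing the spatial resolution.
```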
S3-2) The image preprocessed in step S2) is input into the base network of step S3-1); after the base network, the features are fed into an FPN (Feature Pyramid Network) to further extract features at each scale. The FPN is an efficient CNN feature-extraction method: it uses the bottom-up feature maps of different resolutions of the same image at each layer to efficiently generate a multi-scale feature representation, producing feature maps with stronger expressive power for the computer vision task of the next stage;
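A minimal FPN sketch, assuming torchvision's FeaturePyramidNetwork stands in for the FPN described here; the channel counts correspond to ResNet-50 stages C2–C5 and the spatial sizes are illustrative:

```python
from collections import OrderedDict

import torch
from torchvision.ops import FeaturePyramidNetwork

fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)

# Bottom-up feature maps of the same image at decreasing resolution (dummy tensors).
feats = OrderedDict(
    c2=torch.rand(1, 256, 128, 64),
    c3=torch.rand(1, 512, 64, 32),
    c4=torch.rand(1, 1024, 32, 16),
    c5=torch.rand(1, 2048, 16, 8),
)
pyramid = fpn(feats)  # multi-scale feature maps, each with 256 channels
```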
S3-3) The features extracted in step S3-2) are passed through ROIAlign to generate ROI features of size 14 × 14 × 256; ROIAlign maps the candidate-box region proposals to fixed-size feature maps and uses bilinear interpolation to obtain more accurate pedestrian candidate boxes. ROIAlign is the region-feature aggregation method proposed in Kaiming He et al., Mask R-CNN, ICCV 2017;
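A minimal ROIAlign sketch using torchvision: a candidate box is mapped onto the feature map and bilinearly interpolated into a fixed 14 × 14 × 256 ROI feature. The box coordinates, feature-map size, and spatial_scale below are illustrative assumptions:

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.rand(1, 256, 50, 50)             # one FPN level with 256 channels
boxes = [torch.tensor([[10.0, 10.0, 40.0, 45.0]])]   # one pedestrian proposal in image coordinates
roi_feats = roi_align(
    feature_map, boxes, output_size=(14, 14),
    spatial_scale=0.25,    # stride of this feature map relative to the input image
    sampling_ratio=2,      # bilinear sampling points per output bin
)
print(roi_feats.shape)     # torch.Size([1, 256, 14, 14])
```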
S3-4) The 14 × 14 × 256 feature map from step S3-3) is transformed, through 5 convolutions followed by a deconvolution, into the pedestrian mask P_mask of size 28 × 28 × 1;
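A minimal sketch of this mask branch: five 3 × 3 convolutions followed by a 2 × 2 deconvolution (stride 2) turn the 14 × 14 × 256 ROI feature into a 28 × 28 × 1 pedestrian mask P_mask. Channel widths and the final 1 × 1 prediction layer are standard Mask R-CNN choices assumed here rather than taken verbatim from the patent:

```python
import torch
import torch.nn as nn

class MaskHead(nn.Module):
    def __init__(self, in_channels: int = 256):
        super().__init__()
        layers = []
        for _ in range(5):  # 5 convolutions
            layers += [nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            in_channels = 256
        self.convs = nn.Sequential(*layers)
        self.deconv = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)  # 14x14 -> 28x28
        self.predict = nn.Conv2d(256, 1, kernel_size=1)                      # 1 class: pedestrian

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        x = self.deconv(self.convs(roi_feats)).relu()
        return torch.sigmoid(self.predict(x))  # P_mask, shape (N, 1, 28, 28)

p_mask = MaskHead()(torch.rand(1, 256, 14, 14))
```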
S3-5) A max pooling layer with kernel size 2 and stride 2 is applied to the 28 × 28 × 1 P_mask obtained in step S3-4), so that the predicted mask has the same spatial size as the output of step S3-3); the pooled mask is combined with that output to obtain a feature map of size 14 × 14 × 257;
This feature map is passed through 4 convolutional layers whose kernel size and number of filters are set to 3 and 256 respectively; 3 fully connected layers are then added, the first two with 1024 units and the last with as many units as there are categories, which here is 1 (pedestrian). The output value is the score of the mask; the threshold is set to 0.5, and masks with a score greater than 0.5 are adopted and defined as GT_mask.
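A minimal sketch of this scoring branch: the 28 × 28 × 1 P_mask is max-pooled to 14 × 14, concatenated with the 14 × 14 × 256 ROI feature (257 channels), passed through four 3 × 3 convolutions with 256 filters and three fully connected layers (1024, 1024, 1), and the output is thresholded at 0.5. Details not stated above (activations, flattening) are assumptions:

```python
import torch
import torch.nn as nn

class MaskScoreHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 28x28 -> 14x14
        layers, in_ch = [], 257
        for _ in range(4):  # 4 convolutional layers
            layers += [nn.Conv2d(in_ch, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            in_ch = 256
        self.convs = nn.Sequential(*layers)
        self.fc = nn.Sequential(  # 3 fully connected layers
            nn.Flatten(),
            nn.Linear(256 * 14 * 14, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1),   # one category: pedestrian
        )

    def forward(self, roi_feats: torch.Tensor, p_mask: torch.Tensor) -> torch.Tensor:
        x = torch.cat([roi_feats, self.pool(p_mask)], dim=1)  # (N, 257, 14, 14)
        return torch.sigmoid(self.fc(self.convs(x)))          # mask score in (0, 1)

score = MaskScoreHead()(torch.rand(1, 256, 14, 14), torch.rand(1, 1, 28, 28))
keep = score > 0.5  # masks above the threshold define GT_mask
```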
S4) A loss function is constructed using the binary cross-entropy loss Binary_Cross_Entropy. The true probability is expressed as $p \in \{y, 1-y\}$ and the predicted probability as $q \in \{\hat{y}, 1-\hat{y}\}$, where $y$ denotes the probability that the sample belongs to a pedestrian, $1-y$ the probability that it belongs to the background, $\hat{y}$ the predicted pedestrian probability, and $1-\hat{y}$ the predicted background probability. The similarity between $p$ and $q$ is measured by the cross entropy:

$$H(p, q) = -\sum_i p_i \log q_i = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$$
S5) The binary cross-entropy loss Binary_Cross_Entropy is used to compare the information of each pixel in GT_mask and P_mask.
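A minimal sketch of the pixel-wise comparison in steps S4)–S5); P_mask and GT_mask are assumed to be probability maps of the same spatial size, and the tensors here are dummies:

```python
import torch
import torch.nn.functional as F

p_mask = torch.rand(1, 1, 28, 28)                    # predicted pedestrian probabilities
gt_mask = (torch.rand(1, 1, 28, 28) > 0.5).float()   # reference mask

# Mean over all pixels of -(y*log(y_hat) + (1-y)*log(1-y_hat))
bce = F.binary_cross_entropy(p_mask, gt_mask)
```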
S6) The P_mask and GT_mask obtained in step S3) are input into an edge detector consisting of a single 3 × 3 edge-detection operator; each mask is convolved with the operator to obtain its edge. The edge result obtained from P_mask is defined as $E_P$, and the edge result obtained from GT_mask is defined as $E_{GT}$.
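A minimal edge-detector sketch: a single fixed 3 × 3 edge-detection operator is convolved with each mask to obtain its edge. A Laplacian kernel is used here as an illustrative choice, since the text does not name a specific operator:

```python
import torch
import torch.nn.functional as F

laplacian = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def mask_edge(mask: torch.Tensor) -> torch.Tensor:
    """mask: (N, 1, H, W) probability map; returns its edge response."""
    return F.conv2d(mask, laplacian, padding=1)

e_p = mask_edge(torch.rand(1, 1, 28, 28))                   # edge of the predicted mask, E_P
e_gt = mask_edge((torch.rand(1, 1, 28, 28) > 0.5).float())  # edge of GT_mask, E_GT
```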
S7) A loss function loss is constructed from the edge results $E_P$ and $E_{GT}$ obtained in step S6), measuring the discrepancy between the predicted edge and the reference edge.
the gait edge fitting degree of the pedestrian after passing through the edge detector is greatly improved, and the gap contour between the two legs can be detected.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, many modifications and variations can be made without departing from the spirit of the present invention, and these modifications and variations should also be considered within the scope of the present invention.
Claims (1)
1. A pedestrian gait segmentation method based on a deep neural network is characterized by comprising the following steps:
s1) predicting gait boundaries of pedestrians
Given a picture or a video, the gait boundaries of one or more pedestrians in it are predicted;
For a picture, all pedestrian targets in the single picture are detected and gait segmentation is performed on each target;
For a video, each frame is taken as input, all pedestrian targets in each frame are detected and segmented, and the processed frames are output and recombined into a segmented pedestrian gait video;
s2) image preprocessing and label making
Uniformly resize the segmented pedestrian gait images to h × w, where h is the image height and w is the image width;
Make the labels by processing the pixel values of the targets at the corresponding positions of the image: pixels with a value of 14 delineate the edges of the pedestrian positions, and pixels at non-pedestrian positions are uniformly set to 0 to represent the background;
s3) constructing a gait segmentation deep convolutional neural network
S3-1) extracting characteristics by adopting a basic network
A ResNet-50 network is used as the base network; on top of the ResNet-50 structure, the ordinary convolutions of its last stage are replaced by dilated (atrous) convolutions with a dilation rate of 2;
S3-2) The image preprocessed in step S2) is input into the base network of step S3-1); the resulting features are then fed into an FPN to further extract features at each scale. The FPN uses the bottom-up feature maps of different resolutions of the same image at each layer to efficiently generate a multi-scale feature representation of the picture;
S3-3) The features extracted in step S3-2) are passed through ROIAlign to generate ROI features of size 14 × 14 × 256; ROIAlign maps the candidate-box region proposals to fixed-size feature maps and uses bilinear interpolation to obtain more accurate pedestrian candidate boxes;
S3-4) The 14 × 14 × 256 feature map from step S3-3) is passed through 5 convolutions and then a deconvolution, transforming it into the pedestrian mask P_mask of size 28 × 28 × 1;
S3-5) A max pooling layer with kernel size 2 and stride 2 is applied to the 28 × 28 × 1 P_mask obtained in step S3-4), so that the predicted mask has the same spatial size as the output of step S3-3); the pooled mask is combined with that output to obtain a feature map of size 14 × 14 × 257;
This feature map is passed through 4 convolutional layers whose kernel size and number of filters are set to 3 and 256 respectively; 3 fully connected layers are then added, the first two with 1024 units and the last with as many units as there are categories, which here is 1 (pedestrian). The output value is the score of the mask; the threshold is set to 0.5, and masks with a score greater than 0.5 are adopted and defined as GT_mask;
S4) A loss function is constructed using the binary cross-entropy loss Binary_Cross_Entropy. The true probability is expressed as $p \in \{y, 1-y\}$ and the predicted probability as $q \in \{\hat{y}, 1-\hat{y}\}$, where $y$ denotes the probability that the sample belongs to a pedestrian, $1-y$ the probability that it belongs to the background, $\hat{y}$ the predicted pedestrian probability, and $1-\hat{y}$ the predicted background probability. The similarity between $p$ and $q$ is measured by the cross entropy:

$$H(p, q) = -\sum_i p_i \log q_i = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$$
S5) The binary cross-entropy loss Binary_Cross_Entropy is used to compare the information of each pixel in GT_mask and P_mask;
S6) The P_mask and GT_mask obtained in step S3) are input into an edge detector consisting of a single 3 × 3 edge-detection operator; each mask is convolved with the operator to obtain its edge. The edge result obtained from P_mask is defined as $E_P$, and the edge result obtained from GT_mask is defined as $E_{GT}$;
S7) A loss function loss is constructed from the edge results $E_P$ and $E_{GT}$ obtained in step S6), measuring the discrepancy between the predicted edge and the reference edge.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911050215.3A CN111079516B (en) | 2019-10-31 | 2019-10-31 | Pedestrian gait segmentation method based on deep neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911050215.3A CN111079516B (en) | 2019-10-31 | 2019-10-31 | Pedestrian gait segmentation method based on deep neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079516A CN111079516A (en) | 2020-04-28 |
CN111079516B (en) | 2022-12-20
Family
ID=70310602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911050215.3A Active CN111079516B (en) | 2019-10-31 | 2019-10-31 | Pedestrian gait segmentation method based on deep neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079516B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898533B (en) * | 2020-07-30 | 2023-11-28 | 中国计量大学 | Gait classification method based on space-time feature fusion |
CN113160297B (en) * | 2021-04-25 | 2024-08-02 | Oppo广东移动通信有限公司 | Image depth estimation method and device, electronic equipment and computer readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348445A (en) * | 2019-06-06 | 2019-10-18 | 华中科技大学 | A kind of example dividing method merging empty convolution sum marginal information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016065534A1 (en) * | 2014-10-28 | 2016-05-06 | 中国科学院自动化研究所 | Deep learning-based gait recognition method |
- 2019-10-31: CN application CN201911050215.3A, patent CN111079516B — Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348445A (en) * | 2019-06-06 | 2019-10-18 | 华中科技大学 | A kind of example dividing method merging empty convolution sum marginal information |
Non-Patent Citations (3)
Title |
---|
Rethinking Atrous Convolution for Semantic Image Segmentation; Liang-Chieh Chen et al.; arXiv; 2017-12-05; full text *
Research on Ship Target Detection Based on Mask R-CNN; 吴金亮 et al.; Radio Engineering (《无线电工程》); 2018-10-19 (No. 11); full text *
Crowd Counting Based on the Fusion of Deep Convolutional Networks and Dilated Convolution; 盛馨心 et al.; Journal of Shanghai Normal University (Natural Sciences) (《上海师范大学学报(自然科学版)》); 2019-10-15 (No. 5); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111079516A (en) | 2020-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106845478B (en) | A kind of secondary licence plate recognition method and device of character confidence level | |
WO2019169816A1 (en) | Deep neural network for fine recognition of vehicle attributes, and training method thereof | |
CN107204006B (en) | Static target detection method based on double background difference | |
CN107316031A (en) | The image characteristic extracting method recognized again for pedestrian | |
CN109685045B (en) | Moving target video tracking method and system | |
CN102915544A (en) | Video image motion target extracting method based on pattern detection and color segmentation | |
CN101945257A (en) | Synthesis method for extracting chassis image of vehicle based on monitoring video content | |
CN111028263B (en) | Moving object segmentation method and system based on optical flow color clustering | |
CN111368742B (en) | Reconstruction and identification method and system of double yellow traffic marking lines based on video analysis | |
CN111079516B (en) | Pedestrian gait segmentation method based on deep neural network | |
CN105405138A (en) | Water surface target tracking method based on saliency detection | |
Bisio et al. | Traffic analysis through deep-learning-based image segmentation from UAV streaming | |
CN106951831B (en) | Pedestrian detection tracking method based on depth camera | |
CN109241932A (en) | A kind of thermal infrared human motion recognition method based on movement variogram phase property | |
Bailke et al. | Real-time moving vehicle counter system using opencv and python | |
Ouzounis et al. | Interactive collection of training samples from the max-tree structure | |
CN110570450B (en) | Target tracking method based on cascade context-aware framework | |
Kajatin et al. | Image segmentation of bricks in masonry wall using a fusion of machine learning algorithms | |
Chen et al. | Stingray detection of aerial images with region-based convolution neural network | |
CN110390283B (en) | Cross-camera pedestrian re-retrieval method in commercial scene | |
CN106603888A (en) | Image color extraction processing structure | |
CN118212572A (en) | Road damage detection method based on improvement YOLOv7 | |
Li et al. | Global anomaly detection in crowded scenes based on optical flow saliency | |
Wu et al. | Video surveillance object recognition based on shape and color features | |
Yuan et al. | Multi-scale deformable transformer encoder based single-stage pedestrian detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||