CN116363696A - Pedestrian rotating frame detection method, system, medium and device
- Publication number: CN116363696A
- Application number: CN202310229603.8A
- Authority: CN (China)
- Prior art keywords: pedestrian, frame, neural network, pedestrian detection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06V10/242 — Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The application provides a pedestrian rotating frame detection method, system, medium and device, comprising the following steps: acquiring pedestrian detection frames; constructing a compact pedestrian detection neural network; setting a positive sample screening strategy for the network; setting a loss function for the network; training the compact network with the positive sample screening strategy and the loss function based on the pedestrian detection frames; pruning the trained network; loading the pruned convolution weights and retraining the lightweight network to recover its accuracy; and testing the effect of the pruned network. The method designs a lightweight deep learning network from ordinary convolution structures, adopts re-parameterization training and the proposed efficient BN-layer-adapted pruning method, and reduces the model's computation to the greatest extent, so that the final model is lightweight and simple to deploy and retains the original accuracy while cutting the original model's computation by 40%.
Description
Technical Field
The invention belongs to the technical field of computer vision within artificial intelligence, relates to pedestrian detection methods, and in particular to a pedestrian rotating frame detection method, system, medium and device.
Background
In retail passenger-flow detection scenarios, terminal devices usually have weak computing power while running multiple tasks such as detection and attribute recognition, so reducing the computation and parameter count of the detection model without sacrificing coverage becomes the main problem. Current lightweight detection models typically detect horizontal boxes with a Yolo-style end-to-end model. The drawback of this approach is that it sacrifices distant and tilted targets: missed detections, or boxes that do not fit the targets tightly, degrade the detection effect and shrink the effective detection coverage. Most existing rotating-frame detection algorithms are applied to remote sensing and adopt a two-stage detection scheme; even among single-stage models, most describe the frame rotation through a predicted angle, with the output typically being the frame's center point, width and height together with the angle. In this representation the predicted values differ in magnitude between lengths and angles, which often causes large quantization errors when the terminal quantizes the outputs of the same feature map at low precision. In addition, the methods commonly used to solve the angle periodicity problem and the long/short side exchange (EoE) problem require more channels to represent the angle, which again costs considerable performance on terminal devices.
Therefore, on top of existing pedestrian detection technology, a lightweight pedestrian rotating frame detection technique is needed that can perform real-time, accurate and effective detection at low load on terminal devices.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present application is to provide a pedestrian rotating frame detection method, system, medium and device, which solve the problems in the prior art of reduced coverage of lightweight horizontal-frame models, quantization error of rotating frames caused by the angle-prediction representation, and large performance loss on terminal devices.
To achieve the above and other related objects, in a first aspect, the present application provides a pedestrian rotating frame detection method, including the steps of: acquiring a pedestrian detection frame; constructing a pedestrian detection neural network; setting a positive sample screening strategy of the pedestrian detection neural network; setting a loss function of the pedestrian detection neural network; training the pedestrian detection neural network by adopting the positive sample screening strategy and the loss function based on the pedestrian detection frame; pruning the trained pedestrian detection neural network; and based on the pedestrian detection frame, retraining the pruned pedestrian detection neural network by adopting the positive sample screening strategy and the loss function so as to detect pedestrians based on the retrained pedestrian detection neural network.
In one implementation manner of the first aspect, acquiring the pedestrian detection box includes the following steps: a pedestrian detection box is acquired based on the open source dataset.
In one implementation manner of the first aspect, constructing the pedestrian detection neural network includes the steps of: constructing a tiny-yolov3 network, and adding an attention module in the tiny-yolov3 network to perform feature fusion; setting a network post-processing module; and constructing the pedestrian detection neural network based on the tiny-yolov3 network and the network post-processing module.
In one implementation manner of the first aspect, setting a positive sample screening policy of the pedestrian detection neural network includes the following steps: selecting a predicted frame of the center point and 8 points around the center point of the real annotation frame as a preliminary screening pre-selected frame group of the real annotation frame; when a certain preselection frame is matched with a plurality of real labeling frames in a plurality of preliminary screening preselection frame groups at the same time, the preselection frame is only reserved in the preliminary screening preselection frame group with the largest cross-over ratio with the real labeling frame, and the preselection frame is deleted in other preliminary screening preselection frame groups; and calculating the cross-over ratio between the real annotation frame and the matched pre-screening frame group, calculating a cross-over ratio threshold value based on the cross-over ratio, and selecting candidate positive samples with the cross-over ratio larger than the cross-over ratio threshold value as final positive samples.
In one implementation manner of the first aspect, setting the loss function of the pedestrian detection neural network includes the following steps: obtaining confidence loss, category loss, shape and positioning loss of a pedestrian detection frame; setting the loss function as a sum of the confidence loss, the category loss, the shape and the positioning loss.
In one implementation manner of the first aspect, training the pedestrian detection neural network based on the pedestrian detection box using the positive sample screening policy and the loss function includes the steps of: acquiring positive sample preselection frames of all pedestrian detection frames by adopting the positive sample screening strategy; and training the pedestrian detection neural network based on the positive sample pre-selection frame until the loss function of the pedestrian detection neural network obtained by training meets the preset requirement.
In one implementation form of the first aspect, pruning the trained pedestrian detection neural network comprises the steps of: randomly selecting the number of clipping channels for each convolution layer of the trained pedestrian detection neural network, the number of clipping channels being determined by a clipping coefficient, which is a random fractional value; sorting the channels by the sum of their convolution kernel weights in ascending order and removing the selected number of channels; updating the batch normalization layer of the clipped pedestrian detection neural network and obtaining its average precision value; selecting a preset number of different clipping coefficients, obtaining the corresponding average precision values, and calculating each convolution's contribution to the average precision value from the clipping coefficients and average precision values; and determining each convolution's clipping weight from its contribution, randomly selecting each convolution's number of clipping channels based on the clipping weight, and obtaining the average precision value of the clipped pedestrian detection neural network, until the average precision value and the computation meet the preset requirements, thereby obtaining the clipped pedestrian detection neural network.
In an implementation manner of the first aspect, the pedestrian rotation frame detection method further includes the following steps: detecting and evaluating the trained pedestrian detection neural network model through detection and evaluation indexes; the detection evaluation index comprises: recall rate, accuracy, and overall index.
In a second aspect, the present application provides a pedestrian rotating frame detection system comprising: the acquisition module is used for acquiring the pedestrian detection frame; the network building module is used for building a pedestrian detection neural network; the positive sample screening module is used for setting a positive sample screening strategy of the pedestrian detection neural network; the configuration module is used for setting a loss function of the pedestrian detection network; the training module is used for training the pedestrian detection neural network by adopting the positive sample screening strategy and the loss function based on the pedestrian detection frame; the pruning module is used for pruning the trained pedestrian detection neural network; and the test module is used for testing the pedestrian detection neural network before and after pruning.
In a final aspect, the present application provides a pedestrian rotating frame detection device, including: a processor and a memory. The memory is used for storing a computer program; the processor is connected with the memory and is used for executing the computer program stored in the memory so that the pedestrian rotating frame detection device can execute the pedestrian rotating frame detection method.
As described above, the pedestrian rotating frame detection method, system, medium and device of the invention have the following beneficial effects:
(1) The model provided by the application completes end-to-end detection in one forward propagation and outputs rotating frame positions that fit the detection targets accurately. The rotated-rectangle description used in this application yields the frame closest to the human body, and the scales of the model's output channels do not differ greatly in magnitude, so a better quantization effect is obtained on terminal devices and the precision loss caused by quantization is reduced.
(2) The pedestrian rotating frame detection method reduces the computation by 40% while preserving model performance, improving model precision at a small computational cost.
(3) The pedestrian rotating frame detection model has a simple structure and outputs a feature map at only one scale; thanks to the CBAM module and the re-parameterization method, features of different scales are well fused into the output feature map, and this single-scale feature map meets the detection requirements of retail scenes.
Drawings
Fig. 1 is a flow chart of a pedestrian rotation frame detection method according to an embodiment of the invention.
Fig. 2 is a schematic diagram illustrating an implementation of the pedestrian rotation frame detection method in an application scenario.
Fig. 3A is a schematic flow chart of S12 in the pedestrian rotation frame detection method of the present invention.
Fig. 3B is a schematic diagram showing a network structure constructed in the pedestrian rotating frame detection method of the present invention.
Fig. 3C is a schematic diagram of re-parameterization training and inference in an embodiment of the pedestrian rotation frame detection method of the present invention.
Fig. 3D is a schematic diagram of a CBAM structure of a pedestrian rotation frame detection method according to an embodiment of the invention.
Fig. 3E is a schematic diagram illustrating a rotating frame description mode of the pedestrian rotating frame detection method according to an embodiment of the invention.
Fig. 4 is a flowchart of S13 in the pedestrian rotation frame detection method of the present invention.
Fig. 5 is a flowchart of S14 in the pedestrian rotation frame detection method of the present invention.
Fig. 6 is a flowchart of S15 in the pedestrian rotation frame detection method of the present invention.
Fig. 7A is a flowchart of S16 in the pedestrian rotation frame detection method of the present invention.
Fig. 7B is a schematic flow chart of pruning of the pedestrian detection neural network according to the present invention.
Fig. 8 is a schematic structural diagram of a pedestrian rotating frame detection system according to an embodiment of the invention.
Fig. 9 is a schematic structural diagram of a pedestrian rotating frame detection device according to an embodiment of the invention.
Description of element reference numerals
81. Acquisition module
82. Network building module
83. Positive sample screening module
84. Configuration module
85. Training module
86. Pruning module
87. Test module
91. Processor
92. Memory
S11 to S17 steps
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that the following embodiments, and the features within them, may be combined with one another when no conflict arises.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
The pedestrian rotation frame detection method provided in the embodiment of the present application will be described in detail below with reference to the drawings in the embodiment of the present application.
Fig. 1 and fig. 2 are a schematic flow chart of a pedestrian rotation frame detection method according to an embodiment of the invention and a schematic implementation diagram of the pedestrian rotation frame detection method according to the invention in an application scenario, respectively. As shown in fig. 1 and 2, the present embodiment provides a pedestrian rotating frame detection method.
The pedestrian rotating frame detection method specifically comprises the following steps:
s11, acquiring a pedestrian detection frame.
A pedestrian detection box is acquired based on the open source dataset.
In this embodiment, the pedestrian detection image and the label thereof are obtained from the internet open source data set. Wherein, the open source data set refers to a pedestrian rotating frame data set. The pedestrian rotation frame dataset may be divided into: a training data set and a test data set. The training data set is used for training the pedestrian detection network; the test dataset is used for evaluating and testing the acquired model.
The pedestrian detection box may be described as (x_center, y_center, x1, y1, x2, y2, x3, y3, x4, y4), where (x_center, y_center) are the coordinates of the center point of the pedestrian detection box and (x1, y1, x2, y2, x3, y3, x4, y4) are the vertex coordinates of the quadrilateral.
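For illustration, a 10-value annotation of this form can be split into center and vertices as in the following minimal sketch; NumPy and the function name are assumptions for the example, not part of the patent:

```python
import numpy as np

def parse_rotated_box(label: np.ndarray):
    """Split a 10-value rotated-box label into its center and 4 vertices.

    Layout, following the description above:
    (x_center, y_center, x1, y1, x2, y2, x3, y3, x4, y4)
    """
    assert label.shape == (10,)
    center = label[:2]                   # (x_center, y_center)
    vertices = label[2:].reshape(4, 2)   # quadrilateral vertices (x_i, y_i)
    return center, vertices
```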
S12, constructing a pedestrian detection neural network. Referring to fig. 3A, a flowchart of S12 in the pedestrian rotation frame detection method of the present invention is shown. As shown in fig. 3A, the step S12 includes the following steps:
S121, constructing a tiny-yolov3 network, and adding an attention module in the tiny-yolov3 network to perform feature fusion. Fig. 3B, 3C and 3D are, respectively, a schematic diagram of the network structure constructed in the pedestrian rotation frame detection method of the present invention, a schematic diagram of re-parameterization training and inference in an embodiment of the method, and a schematic diagram of the CBAM structure in an embodiment of the method. As shown in fig. 3B, 3C and 3D, in this embodiment the acquired pedestrian detection frames are processed with a tiny-yolov3 network to obtain multiple pedestrian features.
Specifically, the tiny-yolov3 network processes the pedestrian detection frames to obtain pedestrian feature matrices at different scales such as stride 8, stride 16 and stride 32. To reduce the performance loss caused by transmitting and decoding multiple pedestrian features, an attention module is adopted to fuse the pedestrian feature matrices of different scales.
The attention module is used for introducing attention in the neural network model, so that the model can grasp the emphasis to improve the understanding capability of the model. The CBAM module is preferred in this embodiment. The CBAM module is a lightweight attention module that incorporates both channel and spatial attention mechanism modules.
In this embodiment, since stride 16 is the most useful scale in retail scenes shot with common lenses, the CBAM module processes the pedestrian feature matrices of the different scales (stride 8, stride 16, stride 32), fully fusing the features of all scales into a final stride-16 pedestrian feature matrix, which effectively improves post-processing efficiency.
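As a rough illustration of the attention module described here, the following is a minimal PyTorch sketch of a CBAM block (channel attention followed by spatial attention); the reduction ratio and the 7x7 spatial kernel are common defaults assumed for the sketch, not values taken from the patent:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                       # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))             # spatial attention
```

In the fusion described above, such a block would be applied after resampling the stride-8 and stride-32 feature maps to the stride-16 resolution and concatenating them.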
Meanwhile, to fully exploit the image information, re-parameterization training is performed based on a reparam module: scene information can be fully learned during training, and the parallel branches are merged into ordinary convolutions at inference time.
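The inference-time merging can be sketched as follows, assuming a RepVGG-style pair of parallel 3x3 and 1x1 branches with biases and identical stride (BN folding is omitted; the patent does not specify the exact branch layout):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def merge_3x3_1x1(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Fold a parallel 1x1 conv into a 3x3 conv (same channels and stride)."""
    fused = nn.Conv2d(conv3.in_channels, conv3.out_channels, 3,
                      stride=conv3.stride, padding=1, bias=True)
    # Pad the 1x1 kernel to 3x3 (centered), then add weights and biases:
    # conv3(x) + conv1(x) == conv(x) with kernel W3 + pad(W1), by linearity.
    fused.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
    fused.bias.copy_(conv3.bias + conv1.bias)
    return fused
```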
S122, constructing a network post-processing module. By defining the decoding scheme for the network's feature-map matrix, one detection pre-selected box is obtained per feature-map grid cell.
The final output of the network is a matrix with (5 coordinate-related values + 1 confidence + the number of categories) channels per feature-map grid cell.
In this embodiment, the confidence and the category may be calculated with a Sigmoid function in the same manner as in other detection networks. Fig. 3E is a schematic diagram of the rotating frame description used by the pedestrian rotating frame detection method in an embodiment of the invention. As shown in Fig. 3E, points p1 and p2 are the midpoints of the two parallel sides, and their coordinates are computed from the first four channel bits (x1, y1, x2, y2). The stride of the feature map used here is 16.
The first four channel bits are decoded into the coordinates of the midpoints p1 and p2, each value decoded relative to its feature-map grid cell and scaled by the stride.
The sine and cosine of the pedestrian detection frame's angle follow from the direction vector between the two midpoints:
cosθ = (x_p2 − x_p1) / ‖p2 − p1‖,  sinθ = (y_p2 − y_p1) / ‖p2 − p1‖
The 5th channel bit h1 determines the width h of the frame: h = e^{h1} × stride.
From the above, the four vertices (x1, y1, x2, y2, x3, y3, x4, y4) can be derived by moving from each midpoint half the width along the direction perpendicular to p1p2:
(x1, y1), (x2, y2) = p1 ± (h/2) · (−sinθ, cosθ)
(x3, y3), (x4, y4) = p2 ± (h/2) · (−sinθ, cosθ)
according to the description of the formula, the confidence coefficient, the category confidence coefficient and the calculation mode of the frame position of the pedestrian detection frame are set, so that the coordinates, the category confidence coefficient, the frame confidence coefficient and the like output by the model pass through e x The function is used for representing, the value range of the channel can be in the same range, and then each variable can inform the value representation in the same range when the model output is subjected to low-precision quantization by the terminal, so that quantization loss is reduced.
S123, constructing the pedestrian detection neural network based on the tiny-yolov3 network and the network post-processing module.
In this embodiment, based on the tiny-yolov3 network and the network post-processing module, the fused pedestrian feature matrix obtained in step S121 is decoded into one pre-selected box per pedestrian feature grid cell, so that the output channel layout is a (5 coordinate-related values + confidence + number of categories) × (number of pedestrian feature grid cells) matrix. That is, during training the module directly outputs the decoded pre-selected boxes. Meanwhile, the deep neural network is trained on the pedestrian features to obtain the pedestrian detection neural network.
The network post-processing module obtains one pre-selected box per feature-map grid cell by decoding the network output matrix. Likewise, during inference the module applies NMS (non-maximum suppression) to the decoded pre-selected boxes and outputs the final pedestrian detection boxes.
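As an illustration of this inference-time step, the following is a minimal greedy NMS sketch over rotated boxes; computing the polygon IoU via Shapely is an illustrative choice, as the patent does not name a library:

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; each box is a (4, 2) array of corners in polygon order."""
    polys = [Polygon(b) for b in boxes]
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        survivors = []
        for j in order[1:]:
            inter = polys[i].intersection(polys[j]).area
            union = polys[i].union(polys[j]).area
            if union == 0 or inter / union <= iou_threshold:
                survivors.append(j)
        order = np.array(survivors, dtype=int)
    return keep
```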
The tiny-yolov3 network employed in this embodiment may also be replaced with other lightweight networks, such as: mobilenet, efficientnet, shufflenet, etc., including but not limited to the networks described above.
S13, setting a positive sample screening strategy of the pedestrian detection neural network. Referring to fig. 4, a flow chart of S13 in the pedestrian rotation frame detection method of the present invention is shown. As shown in fig. 4, the step S13 includes the following steps:
in this embodiment, positive and negative samples are selected automatically from the targets' statistics, based on a simplified ATSS (Adaptive Training Sample Selection) strategy. Meanwhile, a dynamic iou threshold is adopted and adjusted adaptively during training; that is, the network post-processing module directly outputs the decoded pre-selected boxes, and the positive sample screening algorithm divides them into positive and negative samples for use in computing the loss function. The specific process is as follows:
S131, selecting a predicted frame of the center point of the real annotation frame and 8 points around the center point as a preliminary screening pre-selected frame group.
All pre-selected boxes decoded by the network post-processing module are taken, and the prediction boxes at the center point of a real pedestrian annotation box (the gt box) and at the 8 surrounding points are selected as that box's pre-selected box group. Each real annotation box is thus paired with several preliminary screening pre-selected boxes as its preliminary pairing group.
S132, removing the weight of the preliminary screening pre-selection frame. When a certain preselection frame is matched with a plurality of real labeling frames in a plurality of preliminary screening preselection frame groups at the same time, the preselection frame is only reserved in the preliminary screening preselection frame group with the largest cross-over ratio with the real labeling frame, and the preselection frame is deleted in other preliminary screening preselection frame groups. Wherein the intersection ratio is the ratio of the intersection and union of the pre-screening frame and the real labeling frame.
In this embodiment, when the preliminary screening pre-selected frame is matched with more than 2 gt frames, calculating the cross ratio of the preliminary screening pre-selected frame and the real labeling frame, comparing all the cross ratios, selecting the gt frames represented by the maximum cross ratio for pairing, and deleting the preliminary selected frame from the paired preliminary selected frame groups of other gt frames.
S133, calculating the cross-over ratio between the real labeling frame and the pre-screening frame group matched with the real labeling frame, calculating a cross-over ratio threshold value based on the cross-over ratio, and selecting candidate positive samples with the cross-over ratio larger than the cross-over ratio threshold value as final positive samples.
In this embodiment, the cross-over ratio between the frame and the paired pre-screening frame group is calculated, then the average value and standard deviation of all the cross-over ratios are calculated, the average value and the standard deviation are added as a preset cross-over ratio threshold value, and finally the sample with the cross-over ratio greater than the cross-over ratio threshold value is selected from all the candidate positive samples as the final positive sample.
Specifically, the iou between the gt box and its paired preliminary screening pre-selected boxes is calculated; then the mean Avg_iou and the standard deviation Var_iou of all these iou values are computed, and their sum is taken as the preset iou threshold (Iou_th = Avg_iou + Var_iou); finally, the samples whose iou is greater than Iou_th are selected from all candidate positive samples as the final positive samples P_final.
Wherein, for two boxes A and B, the cross-over ratio (iou) is calculated as:
iou(A, B) = area(A ∩ B) / area(A ∪ B)
from the above, the value of iou ranges from 0 to 1. When the two pedestrian detection frames have no intersection at all, iou is 0, and when they are fully overlapped, iou is 1, that is, the smaller the overlap ratio is, the closer the iou is to 0, and the larger the overlap ratio is, the closer the iou is to 1.
Likewise, the positive sample screening in this application may use other strategies, such as SimOTA from YOLOX or the positive sample screening built into YOLOv3 through YOLOv5, alone or in combination; the usable methods include but are not limited to those above.
S14, setting a loss function of the pedestrian detection neural network. Referring to fig. 5, a flow chart of S14 in the pedestrian rotation frame detection method of the present invention is shown. As shown in fig. 5, the step S14 includes the following steps:
s141, obtaining confidence loss, category loss, shape and positioning loss of the pedestrian detection frame.
And respectively calculating confidence loss, category loss and shape and positioning loss of the pedestrian detection frame according to the loss function.
The confidence loss of the pedestrian detection frame can be written (reconstructed here as a binary cross entropy, consistent with the Sigmoid confidence output) as:
L_conf = λ_conf · BCE(conf_proposal, conf_target)
wherein conf_proposal represents the confidence output at the feature-map grids; λ_conf is the weighting factor of the confidence loss for the positive sample pre-selected boxes; and conf_target is a numeric matrix of the same size as conf_proposal in which the grids of the prediction boxes selected as positive samples P_final are set to 1 and all remaining (negative sample) grids are set to 0.
The category loss is calculated as:
L_class = λ_class · CE(class_Pfinal, class_gt)
wherein class_gt represents the category of the gt box; λ_class is the weighting factor of the category loss; class_Pfinal is the predicted category of the P_final boxes matched to the gt box; and the category loss CE is set as a cross entropy loss.
The shape and positioning loss is calculated as:
L_shape,loc = λ_coor · ( L_KFIoU(coordinate_gt, coordinate_Pfinal) + L_MSE(coordinate_center_gt, coordinate_center_Pfinal) )
wherein coordinate_gt denotes the four-point rectangle coordinates of the gt box; λ_coor is the weighting factor for the positive sample prediction boxes; coordinate_Pfinal denotes the four-point rectangle coordinates of the P_final boxes matched to the gt box; L_KFIoU is a function computing an approximate iou loss between two rectangles, whose advantage is that by converting the rectangles into Gaussian distributions it directly measures the difference in their form (including angle) and avoids the ambiguity caused by angle periodicity and the definition of long and short sides; and L_MSE(coordinate_center_gt, coordinate_center_Pfinal) measures the positioning loss of the model through the difference between the center points of the two rectangular boxes.
S142, setting the loss function as the sum of the confidence loss, the category loss, the shape and the positioning loss.
The sum of the confidence loss, the category loss, the shape and the positioning loss is calculated according to the respective loss functions in step S141.
The loss function is then:
Loss = λ_conf · BCE(conf_proposal, conf_target) + λ_class · CE(class_Pfinal, class_gt) + λ_coor · ( L_KFIoU(coordinate_gt, coordinate_Pfinal) + L_MSE(coordinate_center_gt, coordinate_center_Pfinal) )
wherein the symbols are as defined in step S141: conf_proposal is the confidence output at the feature-map grids; λ_conf weights the confidence loss of the positive sample pre-selected boxes; conf_target is a numeric matrix of the same size as conf_proposal; class_gt is the category of the gt box; λ_class weights the category loss, which is set as cross entropy; coordinate_gt denotes the four-point rectangle coordinates of the gt box; λ_coor weights the positive sample prediction boxes; coordinate_Pfinal denotes the four-point rectangle coordinates of the matched P_final boxes; and L_KFIoU computes the approximate iou loss between two rectangles.
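Under these definitions, the composition of the total loss can be sketched as follows; the binary cross entropy form of the confidence term and the `kfiou_loss` helper are assumptions for illustration, and KFIoU itself (which models each rotated box as a Gaussian) is not reimplemented here:

```python
import torch.nn.functional as F

def total_loss(conf_pred, conf_target, cls_logits, cls_gt,
               coords_pred, coords_gt, centers_pred, centers_gt,
               kfiou_loss, lam_conf=1.0, lam_cls=1.0, lam_coor=1.0):
    """Sum of confidence, category, and shape/positioning losses."""
    # Confidence: binary cross entropy against the 0/1 target matrix.
    l_conf = lam_conf * F.binary_cross_entropy(conf_pred, conf_target)
    # Category: cross entropy over the final positive samples P_final.
    l_cls = lam_cls * F.cross_entropy(cls_logits, cls_gt)
    # Shape via KFIoU on the 4-point coordinates, positioning via MSE
    # on the rectangle center points.
    l_shape = lam_coor * (kfiou_loss(coords_pred, coords_gt)
                          + F.mse_loss(centers_pred, centers_gt))
    return l_conf + l_cls + l_shape
```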
And S15, training the pedestrian detection neural network by adopting the positive sample screening strategy and the loss function based on the pedestrian detection frame until the loss function of the pedestrian detection neural network obtained by training meets the preset requirement. Referring to fig. 6, a flowchart of S15 in the pedestrian rotation frame detection method of the present invention is shown.
In this embodiment, the images from S11 are input into the model from S12 and post-processed to obtain pre-selected boxes; the boxes are divided with the positive sample screening strategy from S13; the loss is computed with the loss function from S14 and back-propagated to update the model; and this is iterated until the loss function no longer decreases, yielding a weight file and thus the trained pedestrian detection neural network model. The model is thereby optimized to meet the high-precision requirement.
S16, pruning the trained pedestrian detection neural network. Referring to fig. 7A and 7B, which are, respectively, a flowchart of S16 in the pedestrian rotation frame detection method and a schematic flow chart of pruning the pedestrian detection neural network in the present invention. As shown in fig. 7A and 7B, the step S16 includes the following steps:
S161, randomly selecting the number of clipping channels for each convolution layer of the trained pedestrian detection neural network. The number of clipping channels is determined by a clipping coefficient; the clipping coefficient is a random fractional value.
In this embodiment, the pedestrian detection neural network model trained in step S152 is subjected to channel clipping.
Specifically, assume the number of channels clipped from each convolution layer is N1, where N1 is determined by the clipping coefficient k.
The number of clipping channels is calculated as N1 = k × M, where M is the total number of channels and the clipping coefficient k is a fraction randomly selected from the interval (0, 0.5). The k value of each convolution is randomly selected within this range, giving one set of k and N1 values. Note that the 3x3 and 1x1 convolutions of a re-parameterized branch are treated as the same convolution and share a single k value.
S162, sorting the channels by the sum of their convolution kernel weights in ascending order, and removing the selected number of channels.
In this embodiment, for each channel the convolution kernel weights are summed according to the series of clipping channel counts obtained in the previous step; the channels are arranged in ascending order of this sum, and the first N1 channels of the convolution are deleted.
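A minimal PyTorch sketch of this ranking step follows; summing the absolute kernel weights (an L1 saliency) is an assumption, since the description says only that the kernels of each channel are summed:

```python
import torch
import torch.nn as nn

def channels_to_keep(conv: nn.Conv2d, k: float) -> torch.Tensor:
    """Mark the N1 = round(k * M) smallest output channels for removal."""
    n1 = int(round(k * conv.out_channels))          # clipping channel count
    # Sum of (absolute) kernel weights per output channel.
    saliency = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    order = torch.argsort(saliency)                 # ascending: smallest first
    return order[n1:].sort().values                 # indices of kept channels
```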
And S163, updating the batch normalization layer based on the cut pedestrian detection neural network, and acquiring the average precision value of the cut pedestrian detection neural network.
In the embodiment, the cut pedestrian detection neural network performs forward reasoning on a test data set in the pedestrian detection data set, so as to update the batch normalization layer; then, the test data set is tested, so that the average precision value (i.e. mAP, mean Average Precision) of the cut pedestrian detection neural network is obtained.
The batch normalization layer is also referred to as the BN layer; it normalizes the data to a mean of 0 and a variance of 1 before the data passes from one layer to the next.
The forward inference lets the BN layers of the new model adapt to the test dataset, so that stale BN statistics do not distort the model's effect during evaluation. In this way the clipped pedestrian detection method can select a network that truly extracts pedestrian features.
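A minimal sketch of this BN refresh, assuming a PyTorch model; setting momentum to None makes BatchNorm accumulate a plain running average over the calibration passes:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recalibrate_bn(model: nn.Module, loader):
    """Reset and re-estimate BN running mean/var by forward inference."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None          # accumulate a cumulative average
    model.train()                      # BN updates its stats in train mode
    for images in loader:
        model(images)
    model.eval()
```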
S164, selecting a preset number of groups of different clipping coefficients, each group corresponding to one clipped model; obtaining the average precision value of each of these models; and calculating each convolution's contribution to the average precision value from the clipping coefficients and the average precision values.
In this embodiment, a number X of different groups of clipping coefficients are selected and channel clipping is performed with each; the convolution channels are removed according to the clipping channel counts, each clipped network is tested on the test dataset, and its clipped mAP is obtained. The k values and the mAP are recorded for each trial. Each convolution then has X associated k values, and calculating the correlation between these X k values and the mAP values yields that convolution's contribution R to the mAP.
Specifically, 20 groups of different clipping coefficients k1-k20 are preferred here as an illustration. Channel clipping is performed with each group, the convolution channels are removed according to the clipping channel counts, each clipped network is tested on the test dataset, and the clipped mAP is obtained. Each clipping coefficient k and its corresponding mAP are recorded. Each convolution now has 20 k values, and the contribution of that convolution to the mAP is obtained by computing the correlation between the 20 k values and the mAP values.
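A sketch of the contribution computation: for each convolution, correlate its X sampled k values with the X resulting mAP values. Pearson correlation via NumPy, and flipping the sign so that harmful clipping yields a large positive contribution, are illustrative assumptions:

```python
import numpy as np

def contributions(k_samples: np.ndarray, maps: np.ndarray) -> np.ndarray:
    """Per-convolution contribution R to the mAP.

    k_samples: (num_convs, X) clipping coefficients over X clipping trials.
    maps: (X,) mAP of each clipped network.
    """
    # Sign is flipped: clipping an important convolution harder lowers
    # the mAP, i.e. k and mAP correlate negatively (assumed normalization).
    return np.array([-np.corrcoef(k_row, maps)[0, 1] for k_row in k_samples])
```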
S165, searching for a pedestrian detection neural network that meets the preset requirements according to the contributions. In each search, the number of clipping channels of each convolution is determined from that convolution's contribution, and the average precision value of the clipped pedestrian detection neural network is obtained; the search is repeated until the clipped network's mAP and computation meet the preset requirements, yielding the clipped pedestrian detection neural network.
In this embodiment, each search re-determines the number of clipping channels N2 for each convolution based on its contribution R and then clips according to N2, so that convolutions with small contributions are clipped more and those with large contributions are clipped less, making it easier to find a network that meets the requirements. The number of clipping channels is thus determined jointly by a clipping coefficient and the contribution value.
Specifically, a new clipping coefficient k_a is determined from each contribution R to obtain each convolution's number of clipping channels N2; here k_a is a fraction randomly selected from the interval (0, 0.3) multiplied by (1 − contribution). The pedestrian detection neural network is clipped according to the new N2, and the test dataset is evaluated to obtain the mAP under the new clipping. This clipping-and-testing process is repeated until a network is obtained whose computation is 60% of the original model's, with the mAP estimated at 70% of the original pedestrian detection neural network model's mAP. In this example, a model structure meeting the requirements was found after 30 repetitions, giving the clipped model weights.
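The search loop itself can then be sketched as follows; the `evaluate` callable standing in for the clip-and-test procedure, and the exact stopping logic around the 60% computation and 70% mAP targets stated above, are assumptions:

```python
import numpy as np

def search_pruned_model(contribs, total_channels, evaluate, max_iters=30):
    """Contribution-weighted clipping until computation and mAP targets hold.

    contribs: per-convolution contribution R (assumed scaled to [0, 1]).
    total_channels: per-convolution channel counts M.
    evaluate: callable(n2) -> (flops_ratio, map_ratio); stands for the
    clip-and-test procedure and is assumed given.
    """
    rng = np.random.default_rng()
    for _ in range(max_iters):
        # k_a = U(0, 0.3) * (1 - R): clip less where the contribution is high.
        k = rng.uniform(0.0, 0.3, size=len(contribs)) * (1.0 - np.asarray(contribs))
        n2 = np.round(k * np.asarray(total_channels)).astype(int)
        flops_ratio, map_ratio = evaluate(n2)
        if flops_ratio <= 0.6 and map_ratio >= 0.7:
            return n2            # clipping plan meeting the stated targets
    return None
```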
S17, loading the clipped convolution weights and retraining the network to recover its accuracy.
In this embodiment, the pedestrian detection neural network model finally obtained in step S16 is taken, the training dataset is fed into it, and its loss function is computed through the positive sample screening strategy for training; this is iterated until the loss function no longer decreases, after which the weights of the accuracy-recovered pedestrian detection neural network model are obtained.
Then, the effect of the clipped network is tested.
The detection effect of the clipped model is measured on the test dataset, using the precision evaluation indicators of the optimal model: the recall rate R (Recall), the accuracy rate P (Precision), and the comprehensive index F (F-Measure). The larger these indicators, the better the model's detection; the comprehensive index F balances recall and accuracy to evaluate the overall model effect.
The evaluation index is calculated as follows:
Recall = (number of correctly detected targets) / (total number of ground-truth targets)
Accuracy = (number of correctly detected targets) / (total number of detected targets)
The comprehensive index is calculated as:
F = (1 + α²) · P · R / (α² · P + R)
where α adjusts the relative importance of recall and accuracy in the evaluation. When α = 1, recall and accuracy are equally important. In this example, the test indicators of the network after clipping fully recover the test indicators of the network before clipping.
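With the reconstructed F formula, the three indicators can be computed as in this sketch (α = 1 reduces F to the familiar F1 score; nonzero denominators are assumed):

```python
def detection_metrics(num_correct, num_gt, num_pred, alpha=1.0):
    """Recall, accuracy (precision), and the comprehensive index F."""
    recall = num_correct / num_gt
    precision = num_correct / num_pred
    f = (1 + alpha ** 2) * precision * recall / (alpha ** 2 * precision + recall)
    return recall, precision, f
```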
By using this pedestrian rotating frame detection method, a rotated-rectangle description well suited to terminal-device quantization is provided for retail passenger-flow detection scenarios; a lightweight deep learning network is designed from ordinary convolution structures; and re-parameterization training together with the proposed efficient BN-layer-adapted pruning method reduces the model's computation to the greatest extent, so that the final model is lightweight and simple to deploy and keeps the original accuracy while cutting the original model's computation by 40%. The method suits the computing power of terminal devices and reduces the detection model's computation without sacrificing coverage. It also reduces the large quantization error that magnitude differences between lengths and angles otherwise cause when the terminal quantizes the outputs of the same feature map at low precision, and it largely reduces the performance loss for terminal-device computation.
The protection scope of the pedestrian rotating frame detection method according to the embodiment of the present application is not limited to the execution sequence of the steps listed in the embodiment, and all the schemes implemented by adding or removing steps and replacing steps according to the prior art made by the principles of the present application are included in the protection scope of the present application.
The present embodiment additionally provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pedestrian rotation frame detection method as described in fig. 1.
The present application may be a system, method, and/or computer program product at any possible level of technical detail. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised-in-groove structures having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards them for storage in a computer readable storage medium in the respective computing/processing device. Computer program instructions for carrying out operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or source or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk and C++ and procedural programming languages such as the "C" language or similar languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the internet using an internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which circuitry may execute the computer readable program instructions.
The embodiment of the application also provides a pedestrian rotating frame detection system, which can realize the pedestrian rotating frame detection method, but the implementation device of the pedestrian rotating frame detection method includes but is not limited to the structure of the pedestrian rotating frame detection system listed in the embodiment, and all structural deformation and replacement of the prior art made according to the principles of the application are included in the protection scope of the application.
The pedestrian rotation frame detection system provided in this embodiment will be described in detail below with reference to the drawings.
The present embodiment provides a pedestrian rotating frame detection system, including:
referring to fig. 8, a schematic structural diagram of a pedestrian rotating frame detection system according to an embodiment of the invention is shown. As shown in fig. 8, the pedestrian rotating frame detection system includes: an acquisition module 81, a network building module 82, a positive sample screening module 83, a configuration module 84, a training module 85, a pruning module 86 and a testing module 87.
The acquiring module 81 is configured to acquire a pedestrian detection frame.
In this embodiment, the pedestrian detection frame is acquired based on the open source data set.
Specifically, the pedestrian detection image and the label thereof can be obtained from the internet open source data set. Wherein, the open source data set refers to a pedestrian rotating frame data set. The pedestrian rotation frame dataset may be divided into: a training data set and a test data set. The training data set is used for training the pedestrian detection network; the test dataset is used for evaluating and testing the acquired model.
The pedestrian detection box may be described as (x_center, y_center, x1, y1, x2, y2, x3, y3, x4, y4), where (x_center, y_center) are the coordinates of the center point of the pedestrian detection box and (x1, y1, x2, y2, x3, y3, x4, y4) are the vertex coordinates of the quadrilateral.
The network construction module 82 is connected to the acquisition module 81, and is configured to construct a pedestrian detection neural network.
A tiny-yolov3 network is constructed, and an attention module is added into the tiny-yolov3 network to perform feature fusion.
In this embodiment, a tiny-yolov3 network is used to process the obtained pedestrian detection frame to obtain a plurality of pedestrian features.
Specifically, the tiny-yolov3 network processes the pedestrian detection frames to obtain pedestrian feature matrices at different scales such as stride 8, stride 16 and stride 32. To reduce the performance loss caused by transmitting and decoding multiple pedestrian features, an attention module is adopted to fuse the pedestrian feature matrices of different scales.
The attention module is used for introducing attention in the neural network model, so that the model can grasp the emphasis to improve the understanding capability of the model. The CBAM module is preferred in this embodiment. The CBAM module is a lightweight attention module that incorporates both channel and spatial attention mechanism modules.
In this embodiment, since stride 16 is the most useful scale in retail scenes shot with common lenses, the CBAM module processes the pedestrian feature matrices of the different scales (stride 8, stride 16, stride 32), fully fusing the features of all scales into a final stride-16 pedestrian feature matrix, which effectively improves post-processing efficiency.
Meanwhile, in order to fully exploit the image information, re-parameterization training is performed based on a re-parameterization (reparams) module: scene information can be fully learned during training, and the extra parameters are merged at inference time to become an ordinary convolution.
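The essence of re-parameterization is that branches and normalization used during training are merged into a plain convolution for inference. The sketch below folds a BatchNorm layer into its preceding convolution, the basic step of such a merge; it illustrates the idea only and is not the patent's reparams module.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      conv.dilation, conv.groups, bias=True)
    # y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-channel gain
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```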
Then, a network post-processing module is constructed. By setting the decoding mode of the network feature-map matrix, one detection pre-selected box is obtained per feature-map grid cell. The final output of the network is a matrix whose channel dimension is (5 coordinate-related values + confidence + number of classes) by the number of feature-map grid cells.
In this embodiment, the confidence and the category may be calculated by a Sigmoid function in the same manner as other detection networks.
The calculation modes of the frame confidence, the category confidence and the frame position of the pedestrian detection frame are set according to the related calculation formulas, so that the coordinates, category confidence, frame confidence and the like output by the model are represented through the e^x function. All channels then take values in the same range, so that when the terminal applies low-precision quantization to the model output, every variable is represented within that common range, which reduces quantization loss.
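For illustration, a sketch of such a decoding step is given below. The channel layout (5 coordinate-related channels, then one frame confidence, then the class channels) is an assumption made here, and sigmoid is itself built from e^x, so all decoded channels remain in comparable positive ranges.

```python
import torch

def decode_outputs(raw: torch.Tensor):
    """Map raw head outputs through exponential-family functions so that the
    channels share comparable value ranges before low-precision quantization.

    Assumed layout: raw is (N, 5 + 1 + num_classes, H, W) with the 5
    coordinate-related channels first; this split is illustrative only.
    """
    coords = torch.exp(raw[:, :5])      # coordinate-related channels, all > 0
    conf = torch.sigmoid(raw[:, 5:6])   # frame confidence in (0, 1)
    cls = torch.sigmoid(raw[:, 6:])     # category confidence in (0, 1)
    return coords, conf, cls
```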
And finally, constructing the pedestrian detection neural network based on the tiny-yolov3 network and the network post-processing module.
In this embodiment, based on the tiny-yolov3 network and the network post-processing module, the pedestrian fusion feature matrix, with (5 coordinate-related values + confidence + number of classes) channels by the number of pedestrian feature grid cells, is decoded into one pre-selected box per grid cell, so the output remains a matrix of (5 coordinate-related values + confidence + number of classes) channels by the number of grid cells. That is, during training the module directly outputs the decoded pre-selected boxes. Deep neural network training is then performed on the pedestrian features to obtain the pedestrian detection neural network.
The network post-processing module obtains the pre-selected boxes, one per feature-map grid cell, by decoding the network output matrix. Likewise, during inference the module performs NMS (non-maximum suppression) on the decoded pre-selected boxes and outputs the final pedestrian detection boxes.
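As a sketch of this inference-time step, the snippet below applies confidence filtering and NMS via torchvision; representing each rotated box by its axis-aligned hull is a simplification introduced here, since a full rotated-box NMS would compute rotated IoU.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                conf_thresh: float = 0.25, iou_thresh: float = 0.45):
    """Confidence filtering plus NMS over decoded pre-selected boxes.

    `boxes` are axis-aligned (x1, y1, x2, y2) hulls of the rotated boxes,
    a common simplification rather than the patent's exact procedure.
    """
    keep = scores > conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)   # indices of surviving boxes
    return boxes[kept], scores[kept]
```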
The networks usable in this embodiment include tiny-yolov3, MobileNet, EfficientNet, ShuffleNet and the like, and the above algorithms can run on processing servers or terminals equipped with a CPU, NPU or GPU.
The positive sample screening module 83 is configured to set a positive sample screening policy of the pedestrian detection neural network.
A positive sample screening strategy of the pedestrian detection neural network is set. In this embodiment, positive and negative samples are automatically selected according to the statistical information of the target, based on a simplified adaptive training sample selection (ATSS) strategy. Meanwhile, a dynamic intersection-over-union (IoU) threshold is adopted and adaptively adjusted during training. That is, the network post-processing module directly outputs the decoded pre-selected boxes, and the positive sample screening algorithm divides them into positive and negative samples for use in computing the loss function.
Specifically, for each real annotation box (i.e., gt box), the predicted boxes located at its center point and at the 8 points surrounding the center point are selected from all the pre-selected boxes output by the network post-processing module; these form the preliminary screening pre-selected box group paired with that real annotation box.
The preliminary screening pre-selected boxes are then de-duplicated. When a pre-selected box is simultaneously matched with several real annotation boxes across several preliminary screening groups, it is kept only in the group whose real annotation box has the largest IoU with it and is deleted from the other groups. Here the IoU is the ratio of the intersection to the union of the pre-selected box and the real annotation box.
In this embodiment, when a preliminary screening pre-selected box is matched with 2 or more gt boxes, the IoU between the pre-selected box and each real annotation box is calculated; all IoUs are compared, the gt box with the maximum IoU is chosen for the pairing, and the pre-selected box is deleted from the pairing groups of the other gt boxes.
The IoU between each real annotation box and the pre-selected box group paired with it is then calculated, an IoU threshold is derived from these values, and the candidate positive samples whose IoU exceeds the threshold are selected as the final positive samples.
In this embodiment, the IoUs between the gt box and its paired pre-selected box group are calculated, the mean and standard deviation of all these IoUs are computed, their sum is taken as the IoU threshold, and finally the samples whose IoU is greater than the threshold are selected from all candidate positive samples as the final positive samples.
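The mean-plus-standard-deviation thresholding can be sketched in a few lines; the function name is illustrative and the IoU values are assumed to have been computed beforehand.

```python
import torch

def final_positives(ious: torch.Tensor) -> torch.Tensor:
    """Mean-plus-std thresholding over the IoUs between one real annotation
    box and its preliminary pre-selected box group (simplified ATSS)."""
    threshold = ious.mean() + ious.std()
    return ious > threshold   # boolean mask of final positive samples

mask = final_positives(torch.tensor([0.62, 0.35, 0.48, 0.71, 0.22, 0.55]))
```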
Likewise, the positive sample screening in the present application may use other strategies, such as SimOTA from YOLOX or the positive sample screening built into YOLOv3 through YOLOv5, either alone or in combination; the usable methods include but are not limited to those above.
The configuration module 84 is used to set the loss function of the pedestrian detection network.
In this embodiment, the confidence loss, category loss, and shape and positioning losses of the pedestrian detection frame are obtained.
The confidence loss, the category loss, and the shape and positioning losses of the pedestrian detection frame are each calculated according to their respective loss functions. The overall loss function is then set to the sum of the confidence loss, the category loss, the shape loss and the positioning loss.
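A hedged sketch of this summed objective follows. Binary cross-entropy for the confidence and category terms and L1 for the shape and positioning terms are assumptions made here; the text only fixes the total as the sum of the four components.

```python
import torch.nn.functional as F

def detection_loss(pred: dict, target: dict):
    """Sum of confidence, category, shape, and positioning losses.

    BCE for confidence/category and L1 for shape/position are assumptions;
    the patent states only that the total is the sum of the four terms.
    """
    conf_loss = F.binary_cross_entropy_with_logits(pred["conf"], target["conf"])
    cls_loss = F.binary_cross_entropy_with_logits(pred["cls"], target["cls"])
    shape_loss = F.l1_loss(pred["shape"], target["shape"])
    loc_loss = F.l1_loss(pred["loc"], target["loc"])
    return conf_loss + cls_loss + shape_loss + loc_loss
```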
The training module 85 is configured to train the pedestrian detection neural network by using the positive sample screening policy and the loss function based on the pedestrian detection frame until the loss function of the pedestrian detection neural network obtained by training meets a preset requirement.
In this embodiment, the images in the training data set are input into the constructed model and post-processed to obtain the pre-selected boxes; the pre-selected boxes are divided using the positive sample screening strategy and passed to the loss function for loss calculation; the loss is back-propagated to update the model. This iterates until the loss function no longer decreases, at which point the weight file is saved and the trained pedestrian detection neural network model is obtained. The model is thereby optimized to meet the high-precision requirement.
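A schematic training loop under these rules might look as follows; `screen` and `criterion` stand in for the positive sample screening strategy and the loss function above, and the simple no-improvement patience rule approximates "iterate until the loss no longer decreases". All names are illustrative.

```python
import torch

def train(model, loader, optimizer, criterion, screen, patience=10):
    """Decode pre-selected boxes, screen positives, compute the loss, and
    stop once the epoch loss has not improved for `patience` epochs."""
    best, stale = float("inf"), 0
    while stale < patience:
        epoch_loss = 0.0
        for images, gt_boxes in loader:
            preselected = model(images)            # decoded pre-selected boxes
            targets = screen(preselected, gt_boxes)
            loss = criterion(preselected, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best:
            best, stale = epoch_loss, 0
        else:
            stale += 1
    torch.save(model.state_dict(), "pedestrian_det.pt")  # the weight file
```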
The pruning module 86 is used to prune the trained pedestrian detection neural network.
In this embodiment, the number of clipped channels of each convolution layer is selected randomly based on the trained pedestrian detection neural network. The number of clipped channels is determined by a clipping coefficient, which is a random fractional value.
The channels are sorted from small to large by the sum of their convolution kernel weights, and the smallest channels, up to the number of clipped channels, are removed.
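The ranking step can be sketched as below. Scoring each output channel by the absolute (L1) sum of its kernel weights is an assumption, since the text says only "the sum of the convolution kernels" and a raw signed sum could cancel.

```python
import torch

def channels_to_clip(conv_weight: torch.Tensor, clip_coeff: float) -> torch.Tensor:
    """Rank output channels by the L1 sum of their kernel weights, small to
    large, and return the indices of the first `clip_coeff` fraction.

    `conv_weight` has shape (out_channels, in_channels, kH, kW).
    """
    importance = conv_weight.abs().sum(dim=(1, 2, 3))  # one score per channel
    num_clip = int(clip_coeff * conv_weight.shape[0])
    order = torch.argsort(importance)                  # ascending
    return order[:num_clip]                            # channels to remove
```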
The batch normalization layers are updated based on the clipped pedestrian detection neural network, and the mean average precision of the clipped network is obtained.
In this embodiment, the clipped pedestrian detection neural network runs forward inference on the test data set of the pedestrian detection data set to update its BN layers, and is then evaluated on the test data set to obtain its mAP. The forward inference lets the new model adapt its BN layers to the test data set, so the evaluation excludes the influence of the BN layers; the procedure can therefore select a clipped network that genuinely extracts pedestrian features.
A preset number of groups of different clipping coefficients are selected, each group corresponding to one clipped model; the mean average precision of each of these models is obtained, and the contribution of each convolution to the average precision is calculated from the clipping coefficients and the precision values.
In this embodiment, a certain number X of different clipping coefficients are selected and channel clipping is performed for each; the convolution channels are removed according to the clipping channel values, each clipped network is tested on the test data set, and the resulting mAP is recorded together with the clipping coefficient k used. Each convolution thus has X k-values, and the correlation R between these X k-values and the corresponding mAP values is taken as that convolution's contribution to the mAP.
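The contribution R can be computed as a correlation coefficient, for example Pearson's, over the X recorded (k, mAP) pairs of one convolution; the numbers below are illustrative only.

```python
import numpy as np

def contribution_R(k_values, map_values):
    """Pearson correlation between one convolution's X clipping coefficients
    and the X resulting mAP values; this R is its contribution score."""
    return np.corrcoef(np.asarray(k_values, float),
                       np.asarray(map_values, float))[0, 1]

# e.g., X = 5 trials for one convolution (illustrative numbers):
R = contribution_R([0.1, 0.4, 0.2, 0.5, 0.3], [0.70, 0.55, 0.66, 0.49, 0.60])
```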
The pedestrian detection neural network meeting the preset requirements is then searched for according to the contributions. In each search step, the number of clipped channels of each convolution is determined according to its contribution and the mAP of the clipped network is obtained; the search is repeated until a clipped network whose mAP and computation amount meet the preset requirements is found, which yields the final clipped pedestrian detection neural network.
The test module 87 is used for testing the pedestrian detection neural network before and after pruning.
In this embodiment, the clipped convolution weights are loaded and the network undergoes accuracy-recovery training. The training data set is input into the resulting pedestrian detection neural network model, the loss function is calculated through the positive sample screening strategy for training, and iteration continues until the loss no longer decreases, after which the weights of the precision-recovered pedestrian detection neural network model are obtained.
Then, the detection effect of the clipped model is measured on the test data set. The precision evaluation indexes of the optimal model may be the recall rate R, the precision rate P, and the comprehensive index F.
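These indexes can be computed from true-positive, false-positive and false-negative counts. Taking F as the F1 score (the harmonic mean of P and R) is an assumption, since the text does not fix the formula for the comprehensive index.

```python
def evaluate(tp: int, fp: int, fn: int):
    """Recall R, precision P, and comprehensive index F from detection counts.

    F is computed here as the F1 score; this choice is an assumption.
    """
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return r, p, f
```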
Building a pedestrian rotating frame detection system from the pedestrian rotating frame detection model allows a retail passenger-flow detection system to make better use of the computing power of the terminal equipment, reducing the computation amount of the detection model without sacrificing coverage. Quantization error is also reduced, and the computing performance and service life of the terminal equipment are improved.
It should be noted that the division of the system into the above modules is merely a division of logical functions; in practice the modules may be fully or partially integrated into one physical entity or kept physically separate. These modules may all be implemented as software invoked by a processing element, all implemented as hardware, or partly as software invoked by a processing element and partly as hardware. For example, the x module may be a separately arranged processing element, may be integrated into a chip of the system, or may be stored in a memory of the system as program code whose function is called and executed by a processing element of the system. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
The above modules may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. For another example, the modules may be integrated together and implemented as a system-on-chip (SoC).
Referring to fig. 9, a schematic structural diagram of a pedestrian rotating frame detection device according to an embodiment of the invention is shown. As shown in fig. 9, the present embodiment provides a pedestrian rotating frame detection device including a processor 91 and a memory 92. The memory 92 is used for storing a computer program; the processor 91 is connected to the memory 92 and executes the computer program stored in the memory 92, causing the pedestrian rotating frame detection device to execute the steps of the pedestrian rotating frame detection method described above.
Preferably, the memory may comprise random access memory (Random Access Memory, abbreviated as RAM), and may further comprise non-volatile memory, such as at least one magnetic disk memory.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short) and a network processor (Network Processor, NP for short); it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the pedestrian rotating frame detection method, system, medium and device provided by the application have the following beneficial effects:
the model provided by the application can complete end-to-end detection in one forward propagation and output the position of the rotating frame, which fits the detection target closely. The rotated-rectangle description used in the application yields the frame closest to the human body, and the output channels of the model differ little in scale magnitude, so a better quantization effect can be obtained on terminal equipment and the precision loss caused by quantization is reduced. Meanwhile, the calculation amount can be reduced by 40% while the model performance is maintained, and the model precision is improved with only a small calculation amount. The pedestrian rotating frame detection model has a simple structure: only a feature map of a single scale needs to be output, and features of different scales are well fused into that output feature map, which meets the detection requirements of retail scenes.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the present invention.
Claims (10)
1. The pedestrian rotating frame detection method is characterized by comprising the following steps of:
acquiring a pedestrian detection frame;
constructing a pedestrian detection neural network;
setting a positive sample screening strategy of the pedestrian detection neural network;
setting a loss function of the pedestrian detection neural network;
training the pedestrian detection neural network by adopting the positive sample screening strategy and the loss function based on the pedestrian detection frame;
pruning the trained pedestrian detection neural network;
and based on the pedestrian detection frame, retraining the pruned pedestrian detection neural network by adopting the positive sample screening strategy and the loss function so as to detect pedestrians based on the retrained pedestrian detection neural network.
2. The pedestrian rotation frame detection method according to claim 1, wherein acquiring the pedestrian detection frame includes the steps of:
a pedestrian detection box is acquired based on the open source dataset.
3. The pedestrian rotating frame detection method of claim 1, wherein constructing the pedestrian detection neural network includes the steps of:
constructing a tiny-yolov3 network, and adding an attention module in the tiny-yolov3 network to perform feature fusion;
setting a network post-processing module;
and constructing the pedestrian detection neural network based on the tiny-yolov3 network and the network post-processing module.
4. The pedestrian rotating frame detection method of claim 1, wherein setting a positive sample screening policy of the pedestrian detection neural network includes the steps of:
selecting the predicted frames at the center point of the real annotation frame and at the 8 points around the center point as the preliminary screening pre-selected frame group of the real annotation frame;
when a certain preselection frame is matched with a plurality of real labeling frames in a plurality of preliminary screening preselection frame groups at the same time, the preselection frame is only reserved in the group with the largest cross-over ratio with the real labeling frames, and the preselection frame is deleted in other preliminary screening preselection frame groups;
and calculating the cross-over ratio between the real annotation frame and the pre-screening frame group matched with the real annotation frame, calculating a cross-over ratio threshold value based on the cross-over ratio, and selecting candidate positive samples with the cross-over ratio larger than the cross-over ratio threshold value as final positive samples.
5. The pedestrian rotating frame detection method according to claim 1, wherein setting a loss function of the pedestrian detection neural network includes the steps of:
obtaining confidence loss, category loss, shape and positioning loss of a pedestrian detection frame;
setting the loss function as a sum of the confidence loss, the category loss, the shape and the positioning loss.
6. The pedestrian rotating frame detection method of claim 1, wherein training the pedestrian detection neural network with the positive sample screening policy and the loss function based on the pedestrian detection frame comprises the steps of:
acquiring positive sample preselection frames of all pedestrian detection frames by adopting the positive sample screening strategy;
and training the pedestrian detection neural network based on the positive sample pre-selection frame until the loss function of the pedestrian detection neural network obtained by training meets the preset requirement.
7. The pedestrian rotating frame detection method of claim 1, wherein pruning the trained pedestrian detection neural network includes the steps of:
randomly selecting the number of cutting channels of each layer of convolution based on the trained pedestrian detection neural network; the number of the cutting channels is determined by a cutting coefficient; the clipping coefficient is a random decimal value;
sorting the sums of the convolution kernels of each channel from small to large, and removing channels up to the clipping channel number;
updating the batch normalization layer based on the cut pedestrian detection neural network, and acquiring an average precision value of the cut pedestrian detection neural network;
selecting a preset number of different clipping coefficients, respectively obtaining corresponding average precision values, and calculating the contribution value of each convolution to the average precision values based on the clipping coefficients and the average precision values;
and determining the clipping weight of each convolution according to each contribution degree, selecting the clipping channel number of each convolution layer based on the clipping weight, and obtaining the average precision value of the clipped pedestrian detection neural network until the average precision value and the calculated amount meet the preset requirement, thereby obtaining the clipped pedestrian detection neural network.
8. The pedestrian rotating frame detection method according to claim 1, characterized by further comprising the steps of:
detecting and evaluating the trained pedestrian detection neural network model through detection and evaluation indexes; the detection evaluation index comprises: recall rate, accuracy, and overall index.
9. A pedestrian rotating frame detection system, comprising:
The acquisition module is used for acquiring the pedestrian detection frame;
the network building module is used for building a pedestrian detection neural network;
the positive sample screening module is used for setting a positive sample screening strategy of the pedestrian detection neural network;
the configuration module is used for setting a loss function of the pedestrian detection network;
the training module is used for training the pedestrian detection neural network by adopting the positive sample screening strategy and the loss function based on the pedestrian detection frame;
the pruning module is used for pruning the trained pedestrian detection neural network;
and the test module is used for testing the pedestrian detection neural network before and after pruning.
10. A pedestrian rotating frame detection device, characterized by comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is connected to the memory for executing the computer program stored in the memory to cause the pedestrian rotation frame detection device to execute the pedestrian rotation frame detection method as claimed in any one of claims 1 to 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310229603.8A CN116363696A (en) | 2023-03-10 | 2023-03-10 | Pedestrian rotating frame detection method, system, medium and device
Publications (1)
Publication Number | Publication Date |
---|---|
CN116363696A (en) | 2023-06-30
Family
ID=86918045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310229603.8A Pending CN116363696A (en) | 2023-03-10 | 2023-03-10 | Pedestrian rotating frame detection method, system, medium and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116363696A (en) |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |