CN112016605B - Target detection method based on corner alignment and boundary matching of bounding box - Google Patents
- Publication number: CN112016605B
- Application number: CN202010837568.4A
- Authority
- CN
- China
- Prior art keywords
- loss
- bounding box
- boundary
- box
- loss function
- Prior art date: 2020-08-19
- Legal status: Active (status assumed by Google; not a legal conclusion)
Classifications
- G06F18/24 — Pattern recognition; Classification techniques
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; Target detection
Abstract
The invention discloses a target detection method based on corner alignment and boundary matching of the bounding box, comprising the following steps: (1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set; (2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture; (3) constructing loss functions and calculating the classification loss of the object and the localization loss of the bounding box; (4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached; (5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target. With the invention, the network can focus on learning bounding boxes with low overlap, improving overall detection accuracy.
Description
Technical Field
The invention belongs to the field of computer vision target detection, and in particular relates to a target detection method based on corner alignment and boundary matching of the bounding box.
Background
Object detection is an important task in the field of computer vision. It has developed hand in hand with deep learning and is applied in real scenes such as vehicle detection, pedestrian detection, and traffic light detection, and in fields such as autonomous driving and security systems. In recent years, with the development of convolutional neural networks and the introduction of deep learning models carefully designed for the detection task, the problem of target detection has made dramatic progress.
For example, Chinese patent publication No. CN111428625A discloses a traffic scene target detection method and system based on deep learning, which detects vehicles and pedestrians with an improved YOLOv3 detector. An improved YOLOv3 model is established for traffic scene feature extraction and training; it models each bounding box with a Gaussian distribution over the box's center and size information to predict the uncertainty of the bounding box, and sets the loss function accordingly. Traffic video collected by a vehicle is decomposed into pictures and annotated, the pictures are input into the trained improved YOLOv3 model, and vehicles and pedestrians in the traffic scene are identified.
In target detection, the bounding box of each object in a picture and the category of the object must be output. The similarity between two bounding boxes is usually measured with an $\ell_n$-norm loss. However, such loss functions can lead to inaccurate bounding box regression because they do not match the evaluation metric IoU (Intersection over Union): the same $\ell_n$ loss value can correspond to different IoU values. The IoU, GIoU, and DIoU loss functions therefore optimize the evaluation metric IoU directly, resolving the mismatch between the $\ell_n$ losses and the evaluation metric.
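As an illustration of this mismatch (a toy example of ours, not taken from the patent), the following Python snippet shows two predicted boxes at the same $\ell_1$ distance from the ground truth that nevertheless have different IoU values:

```python
# Toy example (ours): the same l1 distance to the ground-truth box can
# correspond to different IoU values, so minimizing an l_n loss does not
# directly optimize the evaluation metric.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 10, 10)
pred_1 = (2, 0, 12, 10)  # shifted right by 2:  l1 distance = 2+0+2+0 = 4
pred_2 = (1, 1, 11, 11)  # shifted diagonally:  l1 distance = 1+1+1+1 = 4
print(iou(gt, pred_1))   # 80 / 120 ≈ 0.667
print(iou(gt, pred_2))   # 81 / 119 ≈ 0.681
```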
However, IoU-based losses produce small gradients when bounding boxes barely overlap. During neural network training, gradients generated by bounding boxes with high overlap therefore dominate the back-propagated gradients, gradients generated by bounding boxes with low overlap are easily ignored, and the network's learning of hard samples suffers. As a result, the final overall detection accuracy is limited.
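The weak-gradient effect can be checked numerically (again a sketch of ours, reusing `iou()` from the snippet above): a finite-difference estimate of the IoU-loss gradient with respect to a horizontal shift is noticeably smaller for a low-overlap box than for a high-overlap one:

```python
# Finite-difference check (ours): the IoU loss (1 - IoU) changes less per
# unit of box movement when overlap is low, so low-overlap boxes contribute
# weaker gradients during training. Reuses iou() from the snippet above.

def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)

def d_loss_dx(pred, gt, eps=1e-4):
    shifted = (pred[0] + eps, pred[1], pred[2] + eps, pred[3])
    return (iou_loss(shifted, gt) - iou_loss(pred, gt)) / eps

gt = (0, 0, 10, 10)
print(d_loss_dx((1, 0, 11, 10), gt))  # high overlap: ~0.165 per unit shift
print(d_loss_dx((9, 0, 19, 10), gt))  # low overlap:  ~0.055 per unit shift
```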
Disclosure of Invention
The invention provides a target detection method based on corner alignment and boundary matching of the bounding box. It increases the gradient for bounding boxes with low overlap, so the network can focus on learning such boxes, improving overall detection accuracy.
A target detection method based on corner alignment and boundary matching of a bounding box comprises the following steps:
(1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture;
(3) constructing loss functions and calculating respectively the classification loss of the object and the localization loss of the bounding box;
(4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached;
(5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target.
The invention adopts a regression loss function based on the alignment of the edges and corner points of the bounding box. By considering both edge and corner alignment, it is on the one hand a more strongly constrained regression function, and on the other hand it correlates well with the evaluation metric IoU; it increases the gradient for bounding boxes with small overlap and can improve detection accuracy.
Preferably, in step (2), one picture is input to the detection model at a time.
In step (3), the loss function of the classification loss is:

$$\mathcal{L}_{cls} = \mathrm{CE}(p, y)$$

where $\mathrm{CE}(\cdot)$ denotes the cross-entropy loss function, $p$ is the predicted class probability, and $y$ is the ground-truth label.
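The original publication gives this formula only as an image; as a minimal sketch consistent with the definitions above (binary case; the multi-class form sums $-y_c \log p_c$ over classes $c$), the cross-entropy can be written as:

```python
import math

# Minimal sketch (ours) of the classification loss: binary cross-entropy
# between the predicted class probability p and the ground-truth label y.

def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

print(cross_entropy(0.9, 1))  # confident and correct -> small loss (~0.105)
print(cross_entropy(0.1, 1))  # confident and wrong   -> large loss (~2.303)
```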
In step (3), the loss function of the localization loss is the sum of a boundary-match term and a corner-alignment term:

$$\mathcal{L}_{loc} = \mathcal{L}_{boundary} + \mathcal{L}_{corner}$$

wherein $I$ denotes the coordinates of the overlap (coincidence) box of the predicted bounding box and the ground-truth bounding box, and $I_w$, $I_h$ denote the width and height of the overlap box; $C$ denotes the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and $C_w$, $C_h$ denote the width and height of the enclosing hull; $D_{lt}$ and $D_{rb}$ denote the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, respectively; $D_{diag}$ denotes the squared diagonal of the minimum enclosure; and a weighting factor $\alpha$ balances the two terms.
Preferably, the value of the weighting factor α is set to 0.2.
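The exact formulas of the boundary-match and corner-alignment terms appear only as images in the original publication. The sketch below computes the quantities the patent defines ($I_w$, $I_h$, $C_w$, $C_h$, $D_{lt}$, $D_{rb}$, $D_{diag}$); the way they are combined here — corner distances normalized by the enclosure diagonal, as in DIoU, plus boundary overlap ratios weighted by $\alpha$ — is our assumption, not the verbatim loss:

```python
# Hedged sketch (ours) of the localization loss. The patent defines the
# quantities below, but its exact formula is given only as images; the
# boundary/corner forms and the placement of alpha here are assumptions
# modeled on DIoU-style normalization.

def sca_loss(pred, gt, alpha=0.2):
    """pred, gt: non-degenerate boxes as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    # Overlap (coincidence) box of the predicted and ground-truth boxes.
    i_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    i_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    # Minimum enclosing box (convex hull of two axis-aligned boxes).
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared corner distances and squared enclosure diagonal.
    d_lt = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left
    d_rb = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right
    d_diag = c_w ** 2 + c_h ** 2
    corner_loss = (d_lt + d_rb) / d_diag          # 0 when corners align
    boundary_loss = 2.0 - i_w / c_w - i_h / c_h   # 0 when boundaries match
    return alpha * boundary_loss + corner_loss

print(sca_loss((0, 0, 10, 10), (2, 2, 12, 12)))  # offset box -> ~0.189
```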
In step (4), the loss function combining the classification loss and the localization loss is optimized as:

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{loc}$$

where $\mathcal{L}$ is the total loss used to train the model, $\mathcal{L}_{cls}$ is the classification loss, and $\mathcal{L}_{loc}$ is the localization loss.
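As a minimal end-to-end sketch (ours), the total objective is simply the sum of the two losses above; matching predictions to ground-truth objects (anchor assignment, non-maximum suppression, etc.) is detector-specific and omitted:

```python
# Hedged sketch (ours) of the total training objective of step (4),
# reusing cross_entropy() and sca_loss() from the sketches above.

def total_loss(p, y, pred_box, gt_box):
    return cross_entropy(p, y) + sca_loss(pred_box, gt_box)

# One annotated object: fairly confident correct class, slightly offset box.
print(total_loss(0.8, 1, (0, 0, 10, 10), (1, 1, 11, 11)))
```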
Compared with the prior art, the invention has the following beneficial effects:
1. The method based on corner alignment and boundary matching of the bounding box increases the gradient for bounding boxes with small overlap, so the network focuses on learning boxes with low overlap and the overall detection accuracy improves.
2. The proposed loss function is simple and effective, can be plugged into various detection networks, and improves their detection accuracy.
3. On the mainstream target detection data sets PASCAL VOC and COCO, extensive experiments show performance higher than other algorithms, demonstrating the superiority of the method.
Drawings
FIG. 1 is a schematic diagram of the overall framework and flow of the method of the present invention;
FIG. 2 is a schematic diagram of the corner alignment and boundary matching loss function of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, a target detection method based on corner point alignment and boundary matching of a bounding box includes the following steps:
s01, acquiring a real scene picture, labeling the category and the position of a boundary frame of a target object, and forming a training data set;
s02, inputting a picture in the training data set into the detection model for feature extraction, and obtaining the prediction category distribution and the prediction boundary box position of the object in the picture;
s03, constructing a loss function, and respectively calculating the classification loss of the object and the positioning loss of the bounding box as shown in FIG. 2, wherein the positioning loss of the bounding box directly minimizes the distance between two corner points and enlarges the coincidence ratio of the boundary to realize the purpose of positioning the bounding box.
S04, optimizing the loss functions of the classification loss and the positioning loss, selecting pictures in the training data set, repeating the steps S02 and S03, and finishing the training after the preset training times are reached;
and S05, after the training of the detection model is finished, selecting the picture to be detected to input the model, and obtaining the category and the position of the boundary box of the target.
To verify the effectiveness of the invention, we compare it with other current state-of-the-art loss function algorithms on the two mainstream target detection data sets, PASCAL VOC and COCO.
The Pascal Visual Object Classes (VOC) dataset is one of the most popular datasets for object classification, detection, and semantic segmentation. For object detection it has 20 categories with labeled bounding boxes. We used Pascal VOC 2007+2012 (the combination of VOC 2007 and VOC 2012), 16551 images, as the training set, and the 4952 images of the Pascal VOC 2007 test split as the test set.
MS COCO (Microsoft Common Objects in Context) is another popular dataset for target detection, instance segmentation, and target keypoint detection. It is a large-scale dataset containing 80 classes. We used COCO train2017 (135k images) as the training set, COCO val2017 (5k images) as the validation set, and COCO test-dev2017 (20k images) as the test set.
Table 1: comparison of five loss functions (MSE, IoU, GIoU, DIoU, and SCA (ours)) on the COCO dataset using the YOLOv3 detection framework.
TABLE 1

Loss/Evaluation | mAP | AP50 | AP65 | AP75 | AP80 | AP90
---|---|---|---|---|---|---
MSE | 31.4 | 56.0 | 45.1 | 31.4 | 23.1 | 4.0
IoU | 34.5 | 54.3 | 45.7 | 36.2 | 29.7 | 12.2
GIoU | 34.7 | 55.0 | 45.8 | 36.2 | 29.6 | 12.5
DIoU | 34.7 | 54.7 | 46.0 | 36.4 | 29.6 | 12.5
SCA (ours) | 35.2 | 55.6 | 46.5 | 36.8 | 30.4 | 12.8
Table 2: comparison of five loss functions (l1-smooth, IoU, GIoU, DIoU, and SCA (ours)) on the PASCAL VOC dataset using the SSD detection framework.
TABLE 2

Loss/Evaluation | mAP | AP50 | AP65 | AP75 | AP80 | AP90
---|---|---|---|---|---|---
l1-smooth | 51.0 | 78.7 | 68.6 | 54.8 | 45.0 | 15.7
IoU | 52.28 | 78.3 | 68.5 | 56.22 | 46.9 | 20.2
GIoU | 52.50 | 78.6 | 69.1 | 56.7 | 46.9 | 19.9
DIoU | 52.7 | 78.6 | 69.1 | 56.6 | 47.9 | 20.1
SCA (ours) | 53.2 | 79.0 | 69.3 | 57.0 | 48.9 | 21.6
As can be seen from Tables 1 and 2, the method of the present invention achieves the highest accuracy on both data sets, demonstrating its superiority.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only specific embodiments of the present invention and do not limit it; any modifications, additions, and equivalents made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.
Claims (4)
1. A target detection method based on corner alignment and boundary matching of a bounding box, characterized by comprising the following steps:
(1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture;
(3) constructing loss functions, and respectively calculating the classification loss of the object and the localization loss of the bounding box; the loss function of the classification loss is:

$$\mathcal{L}_{cls} = \mathrm{CE}(p, y)$$

wherein $\mathrm{CE}(\cdot)$ represents the cross-entropy loss function, $p$ represents the predicted class probability, and $y$ is the ground-truth label;
the loss function of the localization loss is:

$$\mathcal{L}_{loc} = \mathcal{L}_{boundary} + \mathcal{L}_{corner}$$

wherein $\mathcal{L}_{boundary}$ represents the boundary-match loss and $\mathcal{L}_{corner}$ represents the corner-alignment loss; in particular, $I$ represents the coordinates of the overlap box of the predicted and ground-truth bounding boxes, and $I_w$, $I_h$ represent the width and height of the overlap box; $C$ represents the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and $C_w$, $C_h$ represent the width and height of the enclosing hull; $D_{lt}$ and $D_{rb}$ represent the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, respectively; $D_{diag}$ represents the squared diagonal of the minimum enclosure; and $\alpha$ is a weighting factor in the localization loss;
(4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached;
(5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target.
2. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (2), one picture is input to the detection model at a time.
3. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein the value of the weighting factor α is set to 0.2.
4. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (4), the loss function of the classification loss and the localization loss is optimized as $\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{loc}$, wherein $\mathcal{L}$ is the total loss used to train the model, $\mathcal{L}_{cls}$ is the classification loss, and $\mathcal{L}_{loc}$ is the localization loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010837568.4A CN112016605B (en) | 2020-08-19 | 2020-08-19 | Target detection method based on corner alignment and boundary matching of bounding box |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112016605A CN112016605A (en) | 2020-12-01 |
CN112016605B true CN112016605B (en) | 2022-05-27 |
Family
ID=73505112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010837568.4A Active CN112016605B (en) | 2020-08-19 | 2020-08-19 | Target detection method based on corner alignment and boundary matching of bounding box |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016605B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112651500B (en) * | 2020-12-30 | 2021-12-28 | 深圳金三立视频科技股份有限公司 | Method for generating quantization model and terminal |
CN116508073A (en) * | 2021-03-05 | 2023-07-28 | 华为技术有限公司 | Method and device for determining target detection model |
CN113052031B (en) * | 2021-03-15 | 2022-08-09 | 浙江大学 | 3D target detection method without post-processing operation |
CN113780453A (en) * | 2021-09-16 | 2021-12-10 | 惠州市德赛西威汽车电子股份有限公司 | Perception model training method and scene perception method based on perception model |
CN114463720B (en) * | 2022-01-25 | 2022-10-21 | 杭州飞步科技有限公司 | Lane line detection method based on line segment intersection ratio loss function |
CN117437397A (en) * | 2022-07-15 | 2024-01-23 | 马上消费金融股份有限公司 | Model training method, target detection method and device |
CN116245950B (en) * | 2023-05-11 | 2023-08-01 | 合肥高维数据技术有限公司 | Screen corner positioning method for full screen or single corner deletion |
CN117036985B (en) * | 2023-10-09 | 2024-02-06 | 武汉工程大学 | Small target detection method and device for video satellite image |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171297A1 (en) * | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning |
CN105760886A (en) * | 2016-02-23 | 2016-07-13 | 北京联合大学 | Image scene multi-object segmentation method based on target identification and saliency detection |
CN109685152A (en) * | 2018-12-29 | 2019-04-26 | 北京化工大学 | A kind of image object detection method based on DC-SPP-YOLO |
CN109934121A (en) * | 2019-02-21 | 2019-06-25 | 江苏大学 | A kind of orchard pedestrian detection method based on YOLOv3 algorithm |
CN111222395A (en) * | 2019-10-21 | 2020-06-02 | 杭州飞步科技有限公司 | Target detection method and device and electronic equipment |
Non-Patent Citations (3)
Title |
---|
SIF: Self-Inspirited Feature Learning for Person Re-Identification; Long Wei; IEEE; 2020-03-04; full text *
Improved Tiny YOLOv3 target detection; Ma Li et al.; Optics and Precision Engineering; 2020-04-15 (No. 04); full text *
Jin Yifan. Facial keypoint detection algorithm based on cascaded convolutional neural networks. CNKI Doctoral and Master's Theses Database. 2015 *
Similar Documents

Publication | Title
---|---
CN112016605B (en) | Target detection method based on corner alignment and boundary matching of bounding box
CN110363122B (en) | Cross-domain target detection method based on multi-layer feature alignment
CN108830188B (en) | Vehicle detection method based on deep learning
CN112001385B (en) | Target cross-domain detection and understanding method, system, equipment and storage medium
CN106951830B (en) | Image scene multi-object marking method based on prior condition constraint
CN108681693B (en) | License plate recognition method based on trusted area
CN107679078A (en) | Checkpoint-image vehicle fast retrieval method and system based on deep learning
CN109325502B (en) | Shared bicycle parking detection method and system based on video progressive region extraction
CN105930791A (en) | Road traffic sign identification method with multi-camera fusion based on DS evidence theory
CN111078946A (en) | Checkpoint vehicle retrieval method and system based on multi-target regional feature aggregation
CN112215190A (en) | Illegal building detection method based on YOLOv4 model
CN103871077A (en) | Extraction method for key frames in road vehicle monitoring video
CN110852327A (en) | Image processing method, image processing device, electronic equipment and storage medium
CN103279738A (en) | Automatic identification method and system for vehicle logo
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment
Mijić et al. | Traffic sign detection using YOLOv3
Cao et al. | An end-to-end neural network for multi-line license plate recognition
Hu | Intelligent road sign inventory (IRSI) with image recognition and attribute computation from video log
CN117333845A (en) | Real-time detection method for small-target traffic signs based on improved YOLOv5s
CN116844126A (en) | Complex road scene target detection method based on improved YOLOv7
CN106548195A (en) | An object detection method based on improved HOG-ULBP feature operators
CN117456480B (en) | Lightweight vehicle re-identification method based on multi-source information fusion
CN113673534A (en) | RGB-D image fruit detection method based on fast RCNN
CN104331708A (en) | Automatic detection and analysis method and system for crosswalk lines
CN111832463A (en) | Traffic sign detection method based on deep learning
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |