CN112016605B - Target detection method based on corner alignment and boundary matching of bounding box - Google Patents
- Publication number: CN112016605B
- Application number: CN202010837568.4A
- Authority
- CN
- China
- Prior art keywords
- loss
- bounding box
- boundary
- box
- loss function
- Prior art date: 2020-08-19
- Legal status: Active (status assumed by Google; not a legal conclusion)
Classifications
- G06F18/24 — Pattern recognition; Classification techniques
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; Target detection
Abstract
The invention discloses a target detection method based on corner alignment and boundary matching of the bounding box, comprising the following steps: (1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set; (2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture; (3) constructing loss functions and calculating the classification loss of the object and the localization loss of the bounding box; (4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached; (5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target. With the invention, the network can focus on learning bounding boxes with low overlap, improving overall detection accuracy.
Description
Technical Field
The invention belongs to the field of computer vision target detection, and in particular relates to a target detection method based on corner alignment and boundary matching of the bounding box.
Background
Object detection is an important task in the field of computer vision. It has developed hand in hand with deep learning and is applied in real scenes such as vehicle detection, pedestrian detection, and traffic light detection, and in fields such as autonomous driving and security systems. In recent years, with the development of convolutional neural networks and the introduction of deep learning models carefully designed for the detection task, the problem of target detection has made dramatic progress.
For example, Chinese patent publication No. CN111428625A discloses a traffic scene target detection method and system based on deep learning, which detects vehicles and pedestrians with an improved YOLOv3 detector. An improved YOLOv3 model is established for traffic scene feature extraction and training; it models each bounding box with a Gaussian distribution over the box's center and size information to predict the uncertainty of the bounding box, and sets the loss function accordingly. Traffic video collected by a vehicle is decomposed into pictures and annotated, the pictures are input into the trained improved YOLOv3 model, and vehicles and pedestrians in the traffic scene are identified.
In target detection, the bounding box of each object in a picture and the category of the object must be output. The similarity between two bounding boxes is usually measured with an $\ell_n$-norm loss. However, such loss functions can lead to inaccurate bounding box regression because they do not match the evaluation metric IoU (Intersection over Union): the same $\ell_n$ loss value can correspond to different IoU values. The IoU, GIoU, and DIoU loss functions therefore optimize the evaluation metric IoU directly, resolving the mismatch between the $\ell_n$ losses and the evaluation metric.
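As an illustration of this mismatch (a toy example of ours, not taken from the patent), the following Python snippet shows two predicted boxes at the same $\ell_1$ distance from the ground truth that nevertheless have different IoU values:

```python
# Toy example (ours): the same l1 distance to the ground-truth box can
# correspond to different IoU values, so minimizing an l_n loss does not
# directly optimize the evaluation metric.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 10, 10)
pred_1 = (2, 0, 12, 10)  # shifted right by 2:  l1 distance = 2+0+2+0 = 4
pred_2 = (1, 1, 11, 11)  # shifted diagonally:  l1 distance = 1+1+1+1 = 4
print(iou(gt, pred_1))   # 80 / 120 ≈ 0.667
print(iou(gt, pred_2))   # 81 / 119 ≈ 0.681
```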
However, IoU-based losses produce small gradients when bounding boxes barely overlap. During neural network training, gradients generated by bounding boxes with high overlap therefore dominate the back-propagated gradients, gradients generated by bounding boxes with low overlap are easily ignored, and the network's learning of hard samples suffers. As a result, the final overall detection accuracy is limited.
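The weak-gradient effect can be checked numerically (again a sketch of ours, reusing `iou()` from the snippet above): a finite-difference estimate of the IoU-loss gradient with respect to a horizontal shift is noticeably smaller for a low-overlap box than for a high-overlap one:

```python
# Finite-difference check (ours): the IoU loss (1 - IoU) changes less per
# unit of box movement when overlap is low, so low-overlap boxes contribute
# weaker gradients during training. Reuses iou() from the snippet above.

def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)

def d_loss_dx(pred, gt, eps=1e-4):
    shifted = (pred[0] + eps, pred[1], pred[2] + eps, pred[3])
    return (iou_loss(shifted, gt) - iou_loss(pred, gt)) / eps

gt = (0, 0, 10, 10)
print(d_loss_dx((1, 0, 11, 10), gt))  # high overlap: ~0.165 per unit shift
print(d_loss_dx((9, 0, 19, 10), gt))  # low overlap:  ~0.055 per unit shift
```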
Disclosure of Invention
The invention provides a target detection method based on corner alignment and boundary matching of the bounding box. It increases the gradient for bounding boxes with low overlap, so the network can focus on learning such boxes, improving overall detection accuracy.
A target detection method based on corner alignment and boundary matching of a bounding box comprises the following steps:
(1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture;
(3) constructing loss functions and calculating respectively the classification loss of the object and the localization loss of the bounding box;
(4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached;
(5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target.
The invention adopts a regression loss function based on the alignment of the edges and corner points of the bounding box. By considering both edge and corner alignment, it is on the one hand a more strongly constrained regression function, and on the other hand it correlates well with the evaluation metric IoU; it increases the gradient for bounding boxes with small overlap and can improve detection accuracy.
Preferably, in step (2), one picture is input to the detection model at a time.
In step (3), the loss function of the classification loss is:

$$\mathcal{L}_{cls} = \mathrm{CE}(p, y)$$

where $\mathrm{CE}(\cdot)$ denotes the cross-entropy loss function, $p$ is the predicted class probability, and $y$ is the ground-truth label.
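The original publication gives this formula only as an image; as a minimal sketch consistent with the definitions above (binary case; the multi-class form sums $-y_c \log p_c$ over classes $c$), the cross-entropy can be written as:

```python
import math

# Minimal sketch (ours) of the classification loss: binary cross-entropy
# between the predicted class probability p and the ground-truth label y.

def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

print(cross_entropy(0.9, 1))  # confident and correct -> small loss (~0.105)
print(cross_entropy(0.1, 1))  # confident and wrong   -> large loss (~2.303)
```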
In step (3), the loss function of the localization loss is the sum of a boundary-match term and a corner-alignment term:

$$\mathcal{L}_{loc} = \mathcal{L}_{boundary} + \mathcal{L}_{corner}$$

wherein $I$ denotes the coordinates of the overlap (coincidence) box of the predicted bounding box and the ground-truth bounding box, and $I_w$, $I_h$ denote the width and height of the overlap box; $C$ denotes the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and $C_w$, $C_h$ denote the width and height of the enclosing hull; $D_{lt}$ and $D_{rb}$ denote the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, respectively; $D_{diag}$ denotes the squared diagonal of the minimum enclosure; and a weighting factor $\alpha$ balances the two terms.
Preferably, the value of the weighting factor α is set to 0.2.
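The exact formulas of the boundary-match and corner-alignment terms appear only as images in the original publication. The sketch below computes the quantities the patent defines ($I_w$, $I_h$, $C_w$, $C_h$, $D_{lt}$, $D_{rb}$, $D_{diag}$); the way they are combined here — corner distances normalized by the enclosure diagonal, as in DIoU, plus boundary overlap ratios weighted by $\alpha$ — is our assumption, not the verbatim loss:

```python
# Hedged sketch (ours) of the localization loss. The patent defines the
# quantities below, but its exact formula is given only as images; the
# boundary/corner forms and the placement of alpha here are assumptions
# modeled on DIoU-style normalization.

def sca_loss(pred, gt, alpha=0.2):
    """pred, gt: non-degenerate boxes as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    # Overlap (coincidence) box of the predicted and ground-truth boxes.
    i_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    i_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    # Minimum enclosing box (convex hull of two axis-aligned boxes).
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared corner distances and squared enclosure diagonal.
    d_lt = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left
    d_rb = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right
    d_diag = c_w ** 2 + c_h ** 2
    corner_loss = (d_lt + d_rb) / d_diag          # 0 when corners align
    boundary_loss = 2.0 - i_w / c_w - i_h / c_h   # 0 when boundaries match
    return alpha * boundary_loss + corner_loss

print(sca_loss((0, 0, 10, 10), (2, 2, 12, 12)))  # offset box -> ~0.189
```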
In step (4), the loss function combining the classification loss and the localization loss is optimized as:

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{loc}$$

where $\mathcal{L}$ is the total loss used to train the model, $\mathcal{L}_{cls}$ is the classification loss, and $\mathcal{L}_{loc}$ is the localization loss.
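As a minimal end-to-end sketch (ours), the total objective is simply the sum of the two losses above; matching predictions to ground-truth objects (anchor assignment, non-maximum suppression, etc.) is detector-specific and omitted:

```python
# Hedged sketch (ours) of the total training objective of step (4),
# reusing cross_entropy() and sca_loss() from the sketches above.

def total_loss(p, y, pred_box, gt_box):
    return cross_entropy(p, y) + sca_loss(pred_box, gt_box)

# One annotated object: fairly confident correct class, slightly offset box.
print(total_loss(0.8, 1, (0, 0, 10, 10), (1, 1, 11, 11)))
```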
Compared with the prior art, the invention has the following beneficial effects:
1. The method based on corner alignment and boundary matching of the bounding box increases the gradient for bounding boxes with small overlap, so the network focuses on learning boxes with low overlap and the overall detection accuracy improves.
2. The proposed loss function is simple and effective, can be plugged into various detection networks, and improves their detection accuracy.
3. On the mainstream target detection data sets PASCAL VOC and COCO, extensive experiments show performance higher than other algorithms, demonstrating the superiority of the method.
Drawings
FIG. 1 is a schematic diagram of the overall framework and flow of the method of the present invention;
FIG. 2 is a schematic diagram of the corner alignment and boundary matching loss function of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, a target detection method based on corner point alignment and boundary matching of a bounding box includes the following steps:
s01, acquiring a real scene picture, labeling the category and the position of a boundary frame of a target object, and forming a training data set;
s02, inputting a picture in the training data set into the detection model for feature extraction, and obtaining the prediction category distribution and the prediction boundary box position of the object in the picture;
s03, constructing a loss function, and respectively calculating the classification loss of the object and the positioning loss of the bounding box as shown in FIG. 2, wherein the positioning loss of the bounding box directly minimizes the distance between two corner points and enlarges the coincidence ratio of the boundary to realize the purpose of positioning the bounding box.
S04, optimizing the loss functions of the classification loss and the positioning loss, selecting pictures in the training data set, repeating the steps S02 and S03, and finishing the training after the preset training times are reached;
and S05, after the training of the detection model is finished, selecting the picture to be detected to input the model, and obtaining the category and the position of the boundary box of the target.
To verify the effectiveness of the invention, we compare it with other current state-of-the-art loss function algorithms on the two mainstream target detection data sets, PASCAL VOC and COCO.
The Pascal Visual Object Classes (VOC) dataset is one of the most popular datasets for object classification, detection, and semantic segmentation. For object detection it has 20 categories with labeled bounding boxes. We used Pascal VOC 2007+2012 (the combination of VOC 2007 and VOC 2012), 16551 images, as the training set, and the 4952 images of the Pascal VOC 2007 test split as the test set.
MS COCO (Microsoft Common Objects in Context) is another popular dataset for target detection, instance segmentation, and target keypoint detection. It is a large-scale dataset containing 80 classes. We used COCO train2017 (135k images) as the training set, COCO val2017 (5k images) as the validation set, and COCO test-dev2017 (20k images) as the test set.
Table 1: comparison of five loss functions (MSE, IoU, GIoU, DIoU, and SCA (ours)) on the COCO dataset using the YOLOv3 detection framework.
TABLE 1

Loss/Evaluation | mAP | AP50 | AP65 | AP75 | AP80 | AP90
---|---|---|---|---|---|---
MSE | 31.4 | 56.0 | 45.1 | 31.4 | 23.1 | 4.0
IoU | 34.5 | 54.3 | 45.7 | 36.2 | 29.7 | 12.2
GIoU | 34.7 | 55.0 | 45.8 | 36.2 | 29.6 | 12.5
DIoU | 34.7 | 54.7 | 46.0 | 36.4 | 29.6 | 12.5
SCA (ours) | 35.2 | 55.6 | 46.5 | 36.8 | 30.4 | 12.8
Table 2: comparison of five loss functions (l1-smooth, IoU, GIoU, DIoU, and SCA (ours)) on the PASCAL VOC dataset using the SSD detection framework.
TABLE 2

Loss/Evaluation | mAP | AP50 | AP65 | AP75 | AP80 | AP90
---|---|---|---|---|---|---
l1-smooth | 51.0 | 78.7 | 68.6 | 54.8 | 45.0 | 15.7
IoU | 52.28 | 78.3 | 68.5 | 56.22 | 46.9 | 20.2
GIoU | 52.50 | 78.6 | 69.1 | 56.7 | 46.9 | 19.9
DIoU | 52.7 | 78.6 | 69.1 | 56.6 | 47.9 | 20.1
SCA (ours) | 53.2 | 79.0 | 69.3 | 57.0 | 48.9 | 21.6
As can be seen from Tables 1 and 2, the method of the present invention achieves the highest accuracy on both data sets, demonstrating its superiority.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only specific embodiments of the present invention and do not limit it; any modifications, additions, and equivalents made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.
Claims (4)
1. A target detection method based on corner alignment and boundary matching of a bounding box, characterized by comprising the following steps:
(1) acquiring real scene pictures and marking the category and bounding box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding box position of the objects in each picture;
(3) constructing loss functions, and respectively calculating the classification loss of the object and the localization loss of the bounding box; the loss function of the classification loss is:

$$\mathcal{L}_{cls} = \mathrm{CE}(p, y)$$

wherein $\mathrm{CE}(\cdot)$ represents the cross-entropy loss function, $p$ represents the predicted class probability, and $y$ is the ground-truth label;
the loss function of the localization loss is:

$$\mathcal{L}_{loc} = \mathcal{L}_{boundary} + \mathcal{L}_{corner}$$

wherein $\mathcal{L}_{boundary}$ represents the boundary-match loss and $\mathcal{L}_{corner}$ represents the corner-alignment loss; in particular, $I$ represents the coordinates of the overlap box of the predicted and ground-truth bounding boxes, and $I_w$, $I_h$ represent the width and height of the overlap box; $C$ represents the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and $C_w$, $C_h$ represent the width and height of the enclosing hull; $D_{lt}$ and $D_{rb}$ represent the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, respectively; $D_{diag}$ represents the squared diagonal of the minimum enclosure; and $\alpha$ is a weighting factor in the localization loss;
(4) optimizing the loss function combining the classification loss and the localization loss, selecting pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached;
(5) after the detection model is trained, inputting a picture to be detected into the model to obtain the category and bounding box position of each target.
2. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (2), one picture is input to the detection model at a time.
3. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein the value of the weighting factor α is set to 0.2.
4. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (4), the loss function of the classification loss and the localization loss is optimized as $\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{loc}$, wherein $\mathcal{L}$ is the total loss used to train the model, $\mathcal{L}_{cls}$ is the classification loss, and $\mathcal{L}_{loc}$ is the localization loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010837568.4A CN112016605B (en) | 2020-08-19 | 2020-08-19 | Target detection method based on corner alignment and boundary matching of bounding box |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112016605A CN112016605A (en) | 2020-12-01 |
CN112016605B true CN112016605B (en) | 2022-05-27 |
Family
ID=73505112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010837568.4A Active CN112016605B (en) | 2020-08-19 | 2020-08-19 | Target detection method based on corner alignment and boundary matching of bounding box |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016605B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112651500B (en) * | 2020-12-30 | 2021-12-28 | 深圳金三立视频科技股份有限公司 | Method for generating quantization model and terminal |
CN116508073A (en) * | 2021-03-05 | 2023-07-28 | 华为技术有限公司 | Method and device for determining target detection model |
CN113052031B (en) * | 2021-03-15 | 2022-08-09 | 浙江大学 | 3D target detection method without post-processing operation |
CN113780453A (en) * | 2021-09-16 | 2021-12-10 | 惠州市德赛西威汽车电子股份有限公司 | Perception model training method and scene perception method based on perception model |
CN114463720B (en) * | 2022-01-25 | 2022-10-21 | 杭州飞步科技有限公司 | Lane line detection method based on line segment intersection ratio loss function |
CN117437397A (en) * | 2022-07-15 | 2024-01-23 | 马上消费金融股份有限公司 | Model training method, target detection method and device |
CN116245950B (en) * | 2023-05-11 | 2023-08-01 | 合肥高维数据技术有限公司 | Screen corner positioning method for full screen or single corner deletion |
CN117036985B (en) * | 2023-10-09 | 2024-02-06 | 武汉工程大学 | Small target detection method and device for video satellite image |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171297A1 (en) * | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning |
CN105760886A (en) * | 2016-02-23 | 2016-07-13 | 北京联合大学 | Image scene multi-object segmentation method based on target identification and saliency detection |
CN109685152A (en) * | 2018-12-29 | 2019-04-26 | 北京化工大学 | A kind of image object detection method based on DC-SPP-YOLO |
CN109934121A (en) * | 2019-02-21 | 2019-06-25 | 江苏大学 | A kind of orchard pedestrian detection method based on YOLOv3 algorithm |
CN111222395A (en) * | 2019-10-21 | 2020-06-02 | 杭州飞步科技有限公司 | Target detection method and device and electronic equipment |
Non-Patent Citations (3)
Title |
---|
SIF: Self-Inspirited Feature Learning for Person Re-Identification; Long Wei; IEEE; 2020-03-04; full text *
Improved Tiny YOLOv3 target detection; Ma Li et al.; Optics and Precision Engineering; 2020-04-15 (No. 04); full text *
Jin Yifan. Facial keypoint detection algorithm based on cascaded convolutional neural networks. CNKI Doctoral and Master's Theses Database. 2015 *
Similar Documents

Publication | Title
---|---
CN112016605B (en) | Target detection method based on corner alignment and boundary matching of bounding box
CN110363122B (en) | Cross-domain target detection method based on multi-layer feature alignment
CN108830188B (en) | Vehicle detection method based on deep learning
CN112001385B (en) | Target cross-domain detection and understanding method, system, equipment and storage medium
CN106951830B (en) | Image scene multi-object marking method based on prior condition constraint
CN108681693B (en) | License plate recognition method based on trusted area
CN107679078A (en) | Checkpoint-image vehicle fast retrieval method and system based on deep learning
CN109325502B (en) | Shared bicycle parking detection method and system based on video progressive region extraction
CN105930791A (en) | Road traffic sign identification method with multi-camera fusion based on DS evidence theory
CN111078946A (en) | Checkpoint vehicle retrieval method and system based on multi-target regional feature aggregation
CN112215190A (en) | Illegal building detection method based on YOLOv4 model
CN103871077A (en) | Extraction method for key frames in road vehicle monitoring video
CN110852327A (en) | Image processing method, image processing device, electronic equipment and storage medium
CN103279738A (en) | Automatic identification method and system for vehicle logo
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment
Mijić et al. | Traffic sign detection using YOLOv3
Cao et al. | An end-to-end neural network for multi-line license plate recognition
Hu | Intelligent road sign inventory (IRSI) with image recognition and attribute computation from video log
CN117333845A (en) | Real-time detection method for small-target traffic signs based on improved YOLOv5s
CN116844126A (en) | Complex road scene target detection method based on improved YOLOv7
CN106548195A (en) | An object detection method based on improved HOG-ULBP feature operators
CN117456480B (en) | Lightweight vehicle re-identification method based on multi-source information fusion
CN113673534A (en) | RGB-D image fruit detection method based on fast RCNN
CN104331708A (en) | Automatic detection and analysis method and system for crosswalk lines
CN111832463A (en) | Traffic sign detection method based on deep learning
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |