
CN112016605B - Target detection method based on corner alignment and boundary matching of bounding box - Google Patents

Target detection method based on corner alignment and boundary matching of bounding box

Info

Publication number
CN112016605B
CN112016605B (application CN202010837568.4A)
Authority
CN
China
Prior art keywords
loss
bounding box
boundary
box
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010837568.4A
Other languages
Chinese (zh)
Other versions
CN112016605A (en)
Inventor
郑途
蔡登
刘子立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010837568.4A priority Critical patent/CN112016605B/en
Publication of CN112016605A publication Critical patent/CN112016605A/en
Application granted granted Critical
Publication of CN112016605B publication Critical patent/CN112016605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on corner alignment and boundary matching of bounding boxes, comprising the following steps: (1) acquiring real scene pictures, and annotating the category and bounding-box position of each target object to form a training data set; (2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding-box position of the objects in the pictures; (3) constructing loss functions, and separately calculating the classification loss of the objects and the localization loss of the bounding boxes; (4) optimizing the loss functions of the classification loss and the localization loss, selecting further pictures from the training data set, and repeating steps (2) and (3) until a preset number of training iterations is reached; (5) after the detection model is trained, inputting the picture to be detected into the model to obtain the category and bounding-box position of the target. With the invention, the network focuses on learning bounding boxes with low overlap, and overall detection accuracy is improved.

Description

Target detection method based on corner alignment and boundary matching of bounding box
Technical Field
The invention belongs to the field of computer vision object detection, and particularly relates to a target detection method based on corner alignment and boundary matching of bounding boxes.
Background
Object detection is an important task in the field of computer vision. It has developed in tandem with deep learning and is applied in real scenes such as vehicle detection, pedestrian detection and traffic light detection, and in many fields such as autonomous driving and security systems. In recent years, with the development of convolutional neural networks and the introduction of various deep learning models carefully designed for the object detection task, the field has made dramatic progress.
For example, Chinese patent publication No. CN111428625A discloses a traffic scene target detection method and system based on deep learning, which uses an improved YOLOv3 target detection method to detect vehicles and pedestrians. It establishes an improved YOLOv3 model for traffic scene feature extraction and training; the improved model uses the center and size information of the bounding box to build a corresponding Gaussian model that predicts the uncertainty of the bounding box, and sets the loss function accordingly. Traffic video collected by a vehicle is decomposed into pictures and annotated, the pictures are input into the trained improved YOLOv3 model, and vehicles and pedestrians in the traffic scene are identified.
In object detection, the bounding box of each object in a picture and the category of the object must be output. Conventionally, the similarity between two bounding boxes is measured with an ℓn-norm loss (e.g., ℓ1 or ℓ2). However, such loss functions can lead to inaccurate bounding box regression because they do not match the evaluation metric IoU (Intersection over Union): two predictions with the same ℓn loss can have different IoU values. The IoU, GIoU and DIoU loss functions therefore directly optimize the IoU metric, resolving the mismatch between the ℓn loss and the evaluation metric.
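To make this mismatch concrete, the following minimal Python sketch (with illustrative boxes, not taken from the patent) shows two predictions with identical ℓ1 corner error but different IoU:

```python
# Minimal sketch: two predictions with the same l1 corner error but
# different IoU. The boxes below are illustrative examples.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def l1(a, b):
    """Sum of absolute corner-coordinate errors."""
    return sum(abs(x - y) for x, y in zip(a, b))

gt = (0, 0, 10, 10)
pred_a = (1, 1, 11, 11)   # shifted diagonally by (1, 1)
pred_b = (0, 0, 14, 10)   # stretched 4 units to the right

print(l1(gt, pred_a), l1(gt, pred_b))    # 4 and 4: identical l1 loss
print(iou(gt, pred_a), iou(gt, pred_b))  # ~0.68 vs ~0.71: different IoU
```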
However, IoU-based losses produce smaller gradients when the overlap between bounding boxes is small. During training, the gradients generated by bounding boxes with relatively high overlap therefore dominate the gradients back-propagated through the neural network, the gradients generated by bounding boxes with low overlap are easily ignored, and the network's learning of hard samples suffers. As a result, the final overall detection accuracy is limited.
Disclosure of Invention
The invention provides a target detection method based on corner alignment and boundary matching of bounding boxes. It enlarges the gradient for bounding boxes with small overlap, allowing the network to focus on learning low-overlap bounding boxes and improving overall detection accuracy.
A target detection method based on corner alignment and boundary matching of a bounding box comprises the following steps:
(1) acquiring real scene pictures, and annotating the category and bounding-box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding-box position of the objects in the pictures;
(3) constructing loss functions, and separately calculating the classification loss of the objects and the localization loss of the bounding boxes;
(4) optimizing the loss functions of the classification loss and the localization loss, selecting further pictures from the training data set, and repeating steps (2) and (3) until the preset number of training iterations is reached;
(5) after the detection model is trained, inputting the picture to be detected into the model to obtain the category and bounding-box position of the target.
The invention adopts a regression loss function based on alignment of the sides and corner points of bounding boxes. Considering the alignment of both the sides and the corner points, it is on the one hand a more strongly constrained regression objective, and on the other hand correlates well with the evaluation metric IoU; it enlarges the gradient when the box overlap is small and thereby improves detection accuracy.
Preferably, in step (2), one picture is input to the detection model at a time.
In step (3), the loss function for the classification loss is:

L_cls = CE(p, y)

where CE denotes the cross-entropy loss function, p denotes the predicted category probability, and y is the true label.
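A minimal Python sketch of this classification loss for the binary case (the function name and the clamping constant are illustrative, not from the patent):

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy: y is the true label (0 or 1), p the predicted probability."""
    p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

print(cross_entropy(0.9, 1))  # confident correct prediction -> small loss (~0.105)
print(cross_entropy(0.1, 1))  # confident wrong prediction  -> large loss (~2.303)
```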
In step (3), the loss function for the localization loss is as follows:

The input is a predicted bounding box B^p = (x_1^p, y_1^p, x_2^p, y_2^p), a ground-truth bounding box B^g = (x_1^g, y_1^g, x_2^g, y_2^g), and a weight factor α.

The output is the loss function value

L_loc = L_side + α·L_corner

where L_side denotes the boundary matching (side overlap) loss and L_corner denotes the corner alignment loss. In particular,

x_1^I = max(x_1^p, x_1^g),  y_1^I = max(y_1^p, y_1^g)
x_2^I = min(x_2^p, x_2^g),  y_2^I = min(y_2^p, y_2^g)
I_w = max(0, x_2^I − x_1^I),  I_h = max(0, y_2^I − y_1^I)
x_1^C = min(x_1^p, x_1^g),  y_1^C = min(y_1^p, y_1^g)
x_2^C = max(x_2^p, x_2^g),  y_2^C = max(y_2^p, y_2^g)
C_w = x_2^C − x_1^C,  C_h = y_2^C − y_1^C
D_lt = (x_1^p − x_1^g)^2 + (y_1^p − y_1^g)^2
D_rb = (x_2^p − x_2^g)^2 + (y_2^p − y_2^g)^2
D_diag = C_w^2 + C_h^2
L_side = 2 − I_w/C_w − I_h/C_h
L_corner = (D_lt + D_rb)/D_diag

where (x_1^I, y_1^I, x_2^I, y_2^I) are the coordinates of the overlap box of the predicted and ground-truth bounding boxes, and I_w, I_h are the width and height of the overlap box; (x_1^C, y_1^C, x_2^C, y_2^C) are the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and C_w, C_h are the width and height of the enclosing convex hull; D_lt and D_rb are the squared distances between the top-left corners and between the bottom-right corners, respectively, of the predicted and ground-truth bounding boxes; and D_diag is the squared diagonal of the minimum closure.
Preferably, the value of the weighting factor α is set to 0.2.
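The following is a minimal single-box Python sketch of the localization loss as reconstructed above, with the preferred α = 0.2; the function name is illustrative, and boxes are (x1, y1, x2, y2) tuples:

```python
def sca_localization_loss(bp, bg, alpha=0.2):
    """Boundary matching (side overlap) + corner alignment loss for one box pair.

    bp, bg: predicted and ground-truth boxes as (x1, y1, x2, y2).
    Follows the formulas reconstructed in the description above.
    """
    # Overlap (intersection) box width and height.
    iw = max(0.0, min(bp[2], bg[2]) - max(bp[0], bg[0]))
    ih = max(0.0, min(bp[3], bg[3]) - max(bp[1], bg[1]))
    # Minimum enclosing convex hull width and height.
    cw = max(bp[2], bg[2]) - min(bp[0], bg[0])
    ch = max(bp[3], bg[3]) - min(bp[1], bg[1])
    # Squared corner distances and squared diagonal of the minimum closure.
    d_lt = (bp[0] - bg[0]) ** 2 + (bp[1] - bg[1]) ** 2
    d_rb = (bp[2] - bg[2]) ** 2 + (bp[3] - bg[3]) ** 2
    d_diag = cw ** 2 + ch ** 2
    loss_side = 2.0 - iw / cw - ih / ch    # boundary matching loss
    loss_corner = (d_lt + d_rb) / d_diag   # corner alignment loss
    return loss_side + alpha * loss_corner

print(sca_localization_loss((1, 1, 11, 11), (0, 0, 10, 10)))  # non-zero for an imperfect box
print(sca_localization_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0 for a perfect match
```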
In step (4), the loss function combining the classification loss and the localization loss is optimized as:

L = L_cls + L_loc

where L is the total loss used to train the model, L_cls is the classification loss, and L_loc is the localization loss.
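A short usage sketch combining the two loss sketches above into the total loss (the sample values are illustrative):

```python
# Total training loss: classification loss plus localization loss,
# reusing the cross_entropy and sca_localization_loss sketches above.
cls_loss = cross_entropy(0.8, 1)                                  # L_cls
loc_loss = sca_localization_loss((1, 1, 11, 11), (0, 0, 10, 10))  # L_loc
total_loss = cls_loss + loc_loss                                  # L = L_cls + L_loc
print(total_loss)
```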
Compared with the prior art, the invention has the following beneficial effects:
1. The method based on corner alignment and boundary matching of bounding boxes enlarges the gradient for bounding boxes with small overlap, so that the network focuses on learning low-overlap bounding boxes, and overall detection accuracy is improved.
2. The loss function provided by the invention is simple and effective, can be plugged into various detection networks, and improves detection accuracy.
3. On the mainstream object detection datasets PASCAL VOC and COCO, extensive experiments show higher performance than other algorithms, demonstrating the superiority of the method.
Drawings
FIG. 1 is a schematic overall framework and flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the corner alignment and boundary matching loss function in the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in FIG. 1, a target detection method based on corner alignment and boundary matching of a bounding box includes the following steps (a schematic training-loop sketch follows this list):
S01, acquiring real scene pictures, annotating the category and bounding-box position of each target object, and forming a training data set;
S02, inputting a picture from the training data set into the detection model for feature extraction, and obtaining the predicted category distribution and predicted bounding-box position of the objects in the picture;
S03, constructing loss functions and separately calculating the classification loss of the objects and the localization loss of the bounding boxes, as shown in FIG. 2; the localization loss directly minimizes the distance between corresponding corner points and enlarges the overlap of the box boundaries to localize the bounding box;
S04, optimizing the loss functions of the classification loss and the localization loss, selecting further pictures from the training data set, and repeating steps S02 and S03 until the preset number of training iterations is reached;
S05, after training of the detection model is finished, inputting the picture to be detected into the model to obtain the category and bounding-box position of the target.
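The following schematic loop ties steps S01-S05 together using the loss sketches above; the annotation sampler and model predictor are hypothetical stand-ins, not part of the patent:

```python
import random

# Schematic of steps S01-S05, reusing cross_entropy and sca_localization_loss
# from the sketches above. The "dataset" and "model" here are stand-ins; only
# the loss combination and iteration structure follow the method described.

def sample_annotation():
    """S01: one annotated picture -> (true category label, true box). Illustrative values."""
    return 1, (0.0, 0.0, 10.0, 10.0)

def predict():
    """S02: stand-in for the detection model's output (category probability, box)."""
    jitter = lambda: random.uniform(-2.0, 2.0)
    return random.uniform(0.5, 1.0), (jitter(), jitter(), 10.0 + jitter(), 10.0 + jitter())

num_iterations = 100  # S04: preset number of training iterations
total = 0.0
for step in range(num_iterations):
    y, gt_box = sample_annotation()                # S01
    p, pred_box = predict()                        # S02
    loss = cross_entropy(p, y) + sca_localization_loss(pred_box, gt_box, 0.2)  # S03
    total += loss
    # S04: a real implementation would back-propagate `loss` and update the
    # detection model's parameters here; omitted in this stand-alone sketch.
print(total / num_iterations)  # average combined loss over the run
```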
To verify the effectiveness of the invention, the method is compared with other current state-of-the-art loss functions on the two mainstream object detection datasets, PASCAL VOC and COCO.
The PASCAL Visual Object Classes (VOC) dataset is one of the most popular datasets for object classification, detection and semantic segmentation. For object detection, it has 20 categories with labeled bounding boxes. We used PASCAL VOC 2007+2012 (the combination of VOC 2007 and VOC 2012), 16551 images, as the training set, and the 4952 images of the PASCAL VOC 2007 test split as the test set.
MS COCO (Microsoft Common Objects in Context) is another popular dataset for object detection, instance segmentation and object keypoint detection. It is a large-scale dataset containing 80 classes. We used COCO train2017 as the training set (135k images), COCO val2017 as the validation set (5k images), and COCO test-dev2017 as the test set (20k images).
Table 1: Comparison of five loss functions (MSE, IoU, GIoU, DIoU, and SCA (ours)) on the COCO dataset using the YOLOv3 detection framework.
TABLE 1
Loss/Evaluation mAP AP50 AP65 AP75 AP80 AP90
MSE 31.4 56.0 45.1 31.4 23.1 4.0
IoU 34.5 54.3 45.7 36.2 29.7 12.2
GIoU 34.7 55.0 45.8 36.2 29.6 12.5
DIoU 34.7 54.7 46.0 36.4 29.6 12.5
SCA(ours) 35.2 55.6 46.5 36.8 30.4 12.8
Table 2: Comparison of five loss functions (l1-smooth, IoU, GIoU, DIoU, and SCA (ours)) on the PASCAL VOC dataset using the SSD detection framework.
TABLE 2
Loss/Evaluation mAP AP50 AP65 AP75 AP80 AP90
l1-smooth 51.0 78.7 68.6 54.8 45.0 15.7
IoU 52.28 78.3 68.5 56.22 46.9 20.2
GIoU 52.50 78.6 69.1 56.7 46.9 19.9
DIoU 52.7 78.6 69.1 56.6 47.9 20.1
SCA(ours) 53.2 79.0 69.3 57.0 48.9 21.6
As can be seen from Tables 1 and 2, the method of the present invention achieves the highest accuracy on both datasets, demonstrating its superiority.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (4)

1. A target detection method based on corner alignment and boundary matching of a bounding box, characterized by comprising the following steps:
(1) acquiring real scene pictures, and annotating the category and bounding-box position of each target object to form a training data set;
(2) inputting pictures from the training data set into a detection model to obtain the predicted category distribution and predicted bounding-box position of the objects in the pictures;
(3) constructing loss functions, and separately calculating the classification loss of the object and the localization loss of the bounding box; the loss function for the classification loss is:

L_cls = CE(p, y)

wherein CE denotes the cross-entropy loss function, p denotes the predicted category probability, and y is the true label;

the loss function for the localization loss is as follows:

the input is a predicted bounding box B^p = (x_1^p, y_1^p, x_2^p, y_2^p), a ground-truth bounding box B^g = (x_1^g, y_1^g, x_2^g, y_2^g), and a weight factor α; the output is the loss function value

L_loc = L_side + α·L_corner

wherein L_side denotes the boundary matching loss and L_corner denotes the corner alignment loss; in particular,

x_1^I = max(x_1^p, x_1^g),  y_1^I = max(y_1^p, y_1^g)
x_2^I = min(x_2^p, x_2^g),  y_2^I = min(y_2^p, y_2^g)
I_w = max(0, x_2^I − x_1^I),  I_h = max(0, y_2^I − y_1^I)
x_1^C = min(x_1^p, x_1^g),  y_1^C = min(y_1^p, y_1^g)
x_2^C = max(x_2^p, x_2^g),  y_2^C = max(y_2^p, y_2^g)
C_w = x_2^C − x_1^C,  C_h = y_2^C − y_1^C
D_lt = (x_1^p − x_1^g)^2 + (y_1^p − y_1^g)^2
D_rb = (x_2^p − x_2^g)^2 + (y_2^p − y_2^g)^2
D_diag = C_w^2 + C_h^2
L_side = 2 − I_w/C_w − I_h/C_h
L_corner = (D_lt + D_rb)/D_diag

wherein (x_1^I, y_1^I, x_2^I, y_2^I) denote the coordinates of the overlap box of the predicted and ground-truth bounding boxes, and I_w, I_h denote the width and height of the overlap box; (x_1^C, y_1^C, x_2^C, y_2^C) denote the coordinates of the minimum enclosing convex hull of the predicted and ground-truth bounding boxes, and C_w, C_h denote the width and height of the enclosing convex hull; D_lt and D_rb denote the squared distances between the top-left corners and between the bottom-right corners, respectively, of the predicted and ground-truth bounding boxes; and D_diag denotes the squared diagonal of the minimum closure;
(4) optimizing the loss functions of the classification loss and the localization loss, selecting further pictures from the training data set, and repeating steps (2) and (3) until the preset number of training iterations is reached;
(5) after the detection model is trained, inputting the picture to be detected into the model to obtain the category and bounding-box position of the target.
2. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (2), one picture is input to the detection model at a time.
3. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein the value of the weighting factor α is set to 0.2.
4. The method for detecting an object based on corner point alignment and boundary matching of a bounding box as claimed in claim 1, wherein in step (4), the process of optimizing the loss functions of the classification loss and the localization loss is:

L = L_cls + L_loc

wherein L is the total loss for training the model, L_cls is the classification loss, and L_loc is the localization loss.
CN202010837568.4A 2020-08-19 2020-08-19 Target detection method based on corner alignment and boundary matching of bounding box Active CN112016605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010837568.4A CN112016605B (en) 2020-08-19 2020-08-19 Target detection method based on corner alignment and boundary matching of bounding box

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010837568.4A CN112016605B (en) 2020-08-19 2020-08-19 Target detection method based on corner alignment and boundary matching of bounding box

Publications (2)

Publication Number Publication Date
CN112016605A (en) 2020-12-01
CN112016605B (en) 2022-05-27

Family

ID=73505112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010837568.4A Active CN112016605B (en) 2020-08-19 2020-08-19 Target detection method based on corner alignment and boundary matching of bounding box

Country Status (1)

Country Link
CN (1) CN112016605B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651500B (en) * 2020-12-30 2021-12-28 深圳金三立视频科技股份有限公司 Method for generating quantization model and terminal
CN116508073A (en) * 2021-03-05 2023-07-28 华为技术有限公司 Method and device for determining target detection model
CN113052031B (en) * 2021-03-15 2022-08-09 浙江大学 3D target detection method without post-processing operation
CN113780453A (en) * 2021-09-16 2021-12-10 惠州市德赛西威汽车电子股份有限公司 Perception model training method and scene perception method based on perception model
CN114463720B (en) * 2022-01-25 2022-10-21 杭州飞步科技有限公司 Lane line detection method based on line segment intersection ratio loss function
CN117437397A (en) * 2022-07-15 2024-01-23 马上消费金融股份有限公司 Model training method, target detection method and device
CN116245950B (en) * 2023-05-11 2023-08-01 合肥高维数据技术有限公司 Screen corner positioning method for full screen or single corner deletion
CN117036985B (en) * 2023-10-09 2024-02-06 武汉工程大学 Small target detection method and device for video satellite image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN105760886A (en) * 2016-02-23 2016-07-13 北京联合大学 Image scene multi-object segmentation method based on target identification and saliency detection
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN111222395A (en) * 2019-10-21 2020-06-02 杭州飞步科技有限公司 Target detection method and device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SIF: Self-Inspirited Feature Learning for Person Re-Identification; Long Wei; IEEE; 2020-03-04; full text *
Improvement of Tiny YOLOv3 object detection; Ma Li et al.; Optics and Precision Engineering; 2020-04-15 (No. 04); full text *
Jin Yifan. Facial keypoint detection algorithm based on cascaded convolutional neural networks. CNKI: Masters and Doctoral Theses Database. 2015. *

Also Published As

Publication number Publication date
CN112016605A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN110363122B (en) Cross-domain target detection method based on multi-layer feature alignment
CN108830188B (en) Vehicle detection method based on deep learning
CN112001385B (en) Target cross-domain detection and understanding method, system, equipment and storage medium
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
CN108681693B (en) License plate recognition method based on trusted area
CN107679078A (en) A kind of bayonet socket image vehicle method for quickly retrieving and system based on deep learning
CN109325502B (en) Shared bicycle parking detection method and system based on video progressive region extraction
CN105930791A (en) Road traffic sign identification method with multiple-camera integration based on DS evidence theory
CN111078946A (en) Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation
CN112215190A (en) Illegal building detection method based on YOLOV4 model
CN103871077A (en) Extraction method for key frame in road vehicle monitoring video
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN103279738A (en) Automatic identification method and system for vehicle logo
CN116704490B (en) License plate recognition method, license plate recognition device and computer equipment
Mijić et al. Traffic sign detection using YOLOv3
Cao et al. An end-to-end neural network for multi-line license plate recognition
Hu Intelligent road sign inventory (IRSI) with image recognition and attribute computation from video log
CN117333845A (en) Real-time detection method for small target traffic sign based on improved YOLOv5s
CN116844126A (en) YOLOv7 improved complex road scene target detection method
CN106548195A (en) A kind of object detection method based on modified model HOG ULBP feature operators
CN117456480B (en) Light vehicle re-identification method based on multi-source information fusion
CN113673534A (en) RGB-D image fruit detection method based on fast RCNN
CN104331708A (en) Automatic detecting and analyzing method and system for crosswalk lines
CN111832463A (en) Deep learning-based traffic sign detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant