WO2021051601A1 - Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium - Google Patents
- Publication number
- WO2021051601A1 (PCT/CN2019/118279)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- iou
- detection frame
- mask
- polygon
- cnn
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
A method and system for selecting a detection box using a Mask R-CNN, and an electronic device and a storage medium, relating to the technical field of image recognition. The method comprises: performing instance segmentation on a target image using a Mask R-CNN, and obtaining a rectangular candidate detection box and a polygonal contour corresponding to the candidate detection box (S110); respectively calculating IOU values of the candidate detection box and the polygonal contour, and when the IOU value of the candidate detection box is greater than a first preset threshold IOU1 and the IOU value of the polygonal contour is greater than a second preset threshold IOU2, screening out the candidate detection box as a target detection box, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1 (S120). By means of IOU secondary screening of the polygonal contour, the detection precision of the detection box is improved.
Description
This application claims priority to Chinese patent application No. 201910885674.7, filed on September 19, 2019 and entitled "Method, device and storage medium for selecting a detection frame using Mask R-CNN".
This application relates to the technical field of image recognition, and in particular to a method and system for selecting a detection frame using Mask R-CNN, an electronic device, and a storage medium.
Video-based detection and tracking of moving human bodies is widely used in the surveillance of crowded, security-sensitive places such as banks and railway stations. Human tracking in real-time scenes, however, is complicated by interfering factors such as background changes and occlusion, making it difficult to satisfy the requirements of detection accuracy, robustness, and real-time performance simultaneously.
The applicant has realized that current human body detection and tracking methods rely on a rectangular search box, which has the following drawbacks:
1. Detection results are evaluated through the search box's IOU; even a search box that satisfies the IOU criterion may still contain interfering image content.
2. The detection targets of the search box are currently classified only into broad categories, such as humans or animals; finer distinctions, such as male versus female or old versus young, cannot be made.
3. Human detection against a complex background is strongly affected by the surroundings; for example, when a pedestrian's clothing is similar in color to the background, or the background lighting changes sharply, it is difficult to segment the moving body from the background.
4. "Shadows" and "mirrors" in the scene increase the complexity of the features inside the search box and interfere with detection, causing misjudgments such as "the reflection in the mirror is a person" or "the shadow region is a person"; moving objects in the scene, such as cars, swaying trees, or rippling water, likewise increase feature complexity and detection difficulty.
Given these problems, a target detection method is urgently needed that better rejects interference, distinguishes false targets, and supports finer-grained classification.
Summary of the Invention
This application provides a method and system for selecting a detection frame using Mask R-CNN, an electronic device, and a computer-readable storage medium. A rectangular frame and a polygon contour point set of the target are obtained through instance segmentation; the rectangular frame first passes a preliminary screening on its IOU value, the polygon contour point set then passes a secondary IOU screening, and rectangular frames that pass both screenings serve as target detection frames for continued target detection.
To achieve the above objective, this application provides a method for selecting a detection frame using Mask R-CNN, applied to an electronic device, the method comprising:
S110. Using Mask R-CNN to perform instance segmentation on a target image to obtain a rectangular candidate detection frame and its polygon contour; S120. Calculating the IOU values of the candidate detection frame and the polygon contour respectively; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygon contour is greater than a second preset threshold IOU2, selecting the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
To achieve the above objective, a system for selecting a detection frame using Mask R-CNN comprises an instance segmentation module and a target detection frame screening module. The instance segmentation module is configured to use Mask R-CNN to perform instance segmentation on a target image, obtaining a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame. The target detection frame screening module is configured to calculate the IOU values of the candidate detection frame and the polygon contour respectively and, when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygon contour is greater than a second preset threshold IOU2, to select the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
To achieve the above objective, this application provides an electronic device comprising a memory and a processor, the memory containing a detection frame selection program which, when executed by the processor, implements the following steps: S110. Using Mask R-CNN to perform instance segmentation on a target image to obtain a rectangular candidate detection frame and its polygon contour; S120. Calculating the IOU values of the candidate detection frame and the polygon contour respectively and comparing each with its preset threshold, wherein the preset threshold of the candidate detection frame is IOU1, the preset threshold of the polygon contour is IOU2, and IOU2 is greater than IOU1; S130. Selecting as the target detection frame any candidate detection frame whose IOU value is greater than IOU1 and whose polygon contour's IOU value is greater than IOU2.
In addition, to achieve the above objective, this application also provides a computer-readable storage medium storing a computer program that includes a detection frame selection program; when executed by a processor, the detection frame selection program implements the steps of the above method for selecting a detection frame using Mask R-CNN.
In the method and system for selecting a detection frame using Mask R-CNN, the electronic device, and the computer-readable storage medium proposed in this application, the image under detection is repeatedly convolved and pooled in a deep Mask R-CNN (Mask Region-based Convolutional Neural Network); the key features of the image are extracted and processed by the neural network to obtain detection results and categories (i.e., the rectangular frames of the objects in the image). The overlap between each obtained rectangular frame and the real target first undergoes a preliminary IOU screening; the polygon point set obtained from the Mask (i.e., the polygon contour from instance segmentation) is then used for a secondary IOU screening against the real target, and the frames satisfying the set thresholds are finally taken as detection frames. The beneficial effects are as follows:
(1) The polygon point set of the target is obtained through the Mask branch of Mask R-CNN, narrowing the pixel range relative to the rectangular candidate frame (i.e., shrinking the bounding box), thereby enabling finer-grained target classification;
(2) Based on the characteristics of shadows, two-dimensional array encoding is used to form an analysis method for judging whether a mirror image exists, thereby eliminating false targets such as shadows;
(3) The IOU of a polygon contour is computed by two-dimensional array encoding, which is accurate and fast;
(4) Candidate frames first pass a preliminary IOU screening of the rectangular frame and then a secondary IOU screening of the polygon point set, followed by further regression, yielding a more accurate target detection frame.
FIG. 1 is a flowchart of a preferred embodiment of the method for selecting a detection frame using Mask R-CNN of this application;
FIG. 2 is a flowchart of a preferred embodiment of the method for calculating an IOU value using the two-dimensional array mapping encoding method of this application;
FIG. 3 is a schematic diagram of a preferred embodiment of the two-dimensional array mapping encoding method of this application;
FIG. 4 is a schematic structural diagram of a preferred embodiment of the system for selecting a detection frame using Mask R-CNN of this application;
FIG. 5 is a schematic structural diagram of a preferred embodiment of the electronic device of this application;
The realization, functional characteristics, and advantages of the purpose of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
It should be noted that, herein, the words "first" and "second" are used only to distinguish otherwise identical names and do not imply any relationship or order between them.
The purpose of target detection is to identify and locate objects of a specific category in a picture or video. Detection can be regarded as a classification process that distinguishes target from background, and the choice of detection frame affects both how well interference is rejected during detection and how finely the detection can classify.
This application provides a method for selecting a detection frame using Mask R-CNN. FIG. 1 is a flowchart of a preferred embodiment of this method. The method may be executed by an apparatus, and the apparatus may be implemented in software and/or hardware.
Mask R-CNN (Mask Region-based Convolutional Neural Network) predicts the category of a detected object in an image and fine-tunes its bounding box, then segments the object's polygon contour through a mask; a bounding box is the smallest rectangular frame that can enclose an object in the image.
In this embodiment, the method for selecting a detection frame using Mask R-CNN includes steps S110 to S130.
S110. Use Mask R-CNN to perform instance segmentation on the target image to obtain a rectangular candidate detection frame and its polygon contour.
Instance segmentation with Mask R-CNN has two steps. The first step is selection: the position and class of the candidate frame are determined (i.e., the object's category is predicted and its bounding box fine-tuned), and the selected frame is rectangular. The second step is segmentation: a polygon contour is obtained through the mask layer (Mask branch).
S120. Calculate the IOU values of the candidate detection frame and the polygon contour respectively; when the IOU value of the candidate detection frame is greater than the first preset threshold IOU1 and the IOU value of the polygon contour is greater than the second preset threshold IOU2, select the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1. It should be noted that IOU (Intersection over Union) can be understood as the degree of overlap between the prediction frame and the candidate detection frame.
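For the rectangular candidate frames, the IOU is the familiar intersection-over-union of two axis-aligned boxes. A minimal sketch (the function name and the (x1, y1, x2, y2) corner format are illustrative assumptions, not part of this application):

```python
def rect_iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Intersection rectangle (degenerates to zero area when disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving IOU = 1 / (4 + 4 - 1) = 1/7.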
In a specific embodiment, the first preset threshold IOU1 and the second preset threshold IOU2 can be set according to the scenario; moreover, to improve the detection accuracy of the rectangular detection frame, the second preset threshold IOU2 is set greater than the first preset threshold IOU1.
The candidate detection frame is first matched against the predicted target, and the first matching result is screened; that is, candidate detection frames whose IOU value is greater than IOU1 are retained.
The polygon contour is then matched against the predicted target, and the second matching result is screened; that is, polygon contours whose IOU value is greater than IOU2 are retained.
The candidate detection frames that pass both screenings serve as the final target detection frames.
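The two-stage screening described above can be sketched as a simple filter over candidates that already carry both IOU values (the dictionary keys and the default thresholds are illustrative assumptions):

```python
def select_detection_frames(candidates, iou1=0.5, iou2=0.7):
    """Keep only candidates passing both screenings: the rectangular
    frame's IOU must exceed IOU1, and the polygon contour's IOU must
    exceed the stricter threshold IOU2 (with IOU2 > IOU1)."""
    return [c for c in candidates
            if c["box_iou"] > iou1 and c["mask_iou"] > iou2]
```

A candidate whose rectangular frame passes the first screening but whose polygon contour fails the second is discarded, which is precisely how the secondary screening tightens the selection.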
In specific embodiments, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7.
In summary, instance segmentation with Mask R-CNN yields two branch results: the candidate detection frame and the polygon contour. This application establishes a new judgment relationship between these two parallel, non-intersecting branch results: the candidate frames undergo a preliminary IOU screening, and the polygon contours undergo a secondary IOU screening, yielding target detection frames of higher detection accuracy.
Referring to FIG. 2, which is a flowchart of a preferred embodiment of the method for calculating an IOU value using the two-dimensional array mapping encoding method of this application, the method includes steps S210 to S230.
S210. Map the polygon contour and its prediction frame onto a plane template that has been pre-divided by a combination of line segments, wherein the line segment combination divides the plane template into equal-sized blocks.
Referring to FIG. 3, a schematic diagram of a preferred embodiment of the two-dimensional array mapping encoding method of this application, FIG. 3 shows the encoding process.
The right side shows the object under detection, surrounded by its polygon contour; the polygon contour is mapped onto a binary map. As shown in FIG. 3, the binary map is divided by the line segment combination into equal-sized blocks, each coded as either 1 or 0.
S220. Map the polygon contour and its prediction frame onto binary maps of the same size as the plane template, and represent each block as a two-dimensional array code (A, B), where A is the block's coding state with respect to the polygon contour and B its coding state with respect to the prediction frame: A = 1 when the block lies inside the polygon contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside.
As shown in FIG. 3, the human contour on the right is mapped onto the binary map on the left: a block inside the polygon contour is assigned 1, and a block outside it is assigned 0. The assigned binary map is shown in FIG. 3.
Specifically, because the polygon contour and its prediction frame differ, a block may receive different values for each. A block inside both the polygon contour and the prediction frame is coded (1, 1); inside the polygon contour only, (1, 0); inside the prediction frame only, (0, 1); inside neither, (0, 0). The block codes therefore fall into the four cases (1, 1), (1, 0), (0, 1), and (0, 0).
S230. Obtain the IOU value by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
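With the (A, B) encoding realized as two equal-sized binary grids, the counting formula of step S230 reduces to elementwise logical AND and OR; a minimal sketch using NumPy (the function and array names are illustrative assumptions):

```python
import numpy as np

def grid_iou(contour_grid, frame_grid):
    """IOU from two equal-sized binary grids: the A-codes for the
    polygon contour and the B-codes for its prediction frame.
    Cells coded (1, 1) form the intersection; cells coded (1, 0),
    (0, 1), or (1, 1) form the union, matching step S230."""
    inter = np.logical_and(contour_grid, frame_grid).sum()
    union = np.logical_or(contour_grid, frame_grid).sum()
    return inter / union if union else 0.0
```

For two 2x2 patches of ones on a 4x4 grid that share a single cell, this returns 1 / (4 + 4 - 1) = 1/7, agreeing with the area-ratio formula given below.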
IOU = area of the intersection polygon / (polygon contour area + prediction frame area - intersection polygon area).
The area of the intersection polygon is the area where the polygon contour and its prediction frame overlap, which is exactly the total area of the blocks coded (1, 1); the area of the union polygon, i.e., polygon contour area + prediction frame area - intersection polygon area, equals the total area of the blocks coded (1, 0), (0, 1), and (1, 1). Therefore, intersection area / union area = IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number coded (0, 1) + number coded (1, 1)].
In a specific embodiment, when a "shadow" or "mirror" exists in the detected scene, detection frames are generated for both the detection target and its mirror image (or shadow), which easily causes the misjudgment that two detection targets exist. All obtained candidate detection frames are therefore encoded by two-dimensional array mapping, and the encoded candidate frames are compared for coincidence; when the coincidence of two candidate detection frames exceeds a coincidence threshold, it is determined that a mirror image exists among the targets they detect.
The coincidence threshold here is set to 75%; that is, when the coding coincidence of two candidate detection frames reaches 75%, interference such as a mirror image or reflection is judged to exist, and the interference is excluded.
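One plausible reading of this coincidence test, with the coincidence of two encodings measured as shared coded cells over all coded cells (the measure, function names, and grid representation are assumptions; the application does not fix them):

```python
import numpy as np

def coding_coincidence(enc_a, enc_b):
    """Coincidence of two binary encodings, read here as shared
    coded cells over all coded cells (an assumed measure)."""
    inter = np.logical_and(enc_a, enc_b).sum()
    union = np.logical_or(enc_a, enc_b).sum()
    return inter / union if union else 0.0

def detect_mirror_pair(enc_a, enc_b, threshold=0.75):
    """Judge that one of two candidate frames is a mirror image or
    shadow of the other when their encodings coincide to >= 75%."""
    return coding_coincidence(enc_a, enc_b) >= threshold
```

Two near-identical encodings exceed the threshold and are flagged as a target-plus-mirror pair; two largely disjoint encodings are treated as genuinely distinct targets.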
In a specific embodiment, calculating the IOU value of the polygon contour includes calculating it by an intersection-and-union area method, which comprises: S310. Obtaining and labeling the key points of the polygon contour and its prediction frame, the key points including the vertices of the polygon contour and of its prediction frame as well as the intersection points between them; S320. Sorting the intersection points and the points lying inside the overlap to form the point set of the intersection polygon; S330. Calculating the areas of the polygon contour, its prediction frame, and the intersection polygon, and from these computing the polygon contour's IOU value: IOU = intersection polygon area / (polygon contour area + prediction frame area - intersection polygon area).
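Steps S310 to S330 can be sketched for the common case where the prediction frame is an axis-aligned rectangle: the intersection polygon's point set is built by clipping the contour against each edge of the frame (Sutherland-Hodgman), and each area comes from the shoelace formula. The function names and the point/rectangle formats are illustrative assumptions:

```python
def shoelace_area(pts):
    """Polygon area from an ordered list of (x, y) vertices."""
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_to_rect(poly, x1, y1, x2, y2):
    """Sutherland-Hodgman clipping of an ordered polygon against an
    axis-aligned rectangle; returns the intersection polygon's points."""
    def clip(pts, inside, cross):
        out = []
        for i in range(len(pts)):
            cur, prv = pts[i], pts[i - 1]
            if inside(cur):
                if not inside(prv):
                    out.append(cross(prv, cur))
                out.append(cur)
            elif inside(prv):
                out.append(cross(prv, cur))
        return out

    def vline(x):  # crossing point with the vertical line x = const
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def hline(y):  # crossing point with the horizontal line y = const
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = poly
    for inside, cross in [(lambda p: p[0] >= x1, vline(x1)),
                          (lambda p: p[0] <= x2, vline(x2)),
                          (lambda p: p[1] >= y1, hline(y1)),
                          (lambda p: p[1] <= y2, hline(y2))]:
        pts = clip(pts, inside, cross)
        if not pts:
            break
    return pts

def polygon_iou(poly, rect):
    """IOU = intersection / (contour area + frame area - intersection)."""
    x1, y1, x2, y2 = rect
    inter_pts = clip_to_rect(poly, x1, y1, x2, y2)
    inter = shoelace_area(inter_pts) if len(inter_pts) >= 3 else 0.0
    union = shoelace_area(poly) + (x2 - x1) * (y2 - y1) - inter
    return inter / union if union else 0.0
```

Clipping against a rectangle keeps the intersection-point construction of S320 simple; a general polygon-against-polygon prediction frame would need a full polygon clipping routine instead.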
The above method of selecting a detection frame using Mask R-CNN is implemented by a Mask R-CNN detection frame selection model. The neural network structure of the model includes the Mask R-CNN convolutional layers and an RoI Align layer located after them. The neural network structure further includes a mask layer, a classifier, and a fully connected layer, where the fully connected layer is used for RoI bounding box correction training.
Specifically, the neural network structure of the Mask R-CNN detection frame selection model includes the following.
Generally speaking, Mask R-CNN segments the target pixels while performing target detection; in other words, it adds a Mask branch network to the basic bounding box recognition architecture, where the Mask branch network performs the segmentation of target pixels so as to obtain the polygon contour point set of the target.
After the CNN convolutional layers comes the RoI Align layer, followed by the mask layer, the classifier, and RoI bounding box correction training (the fully connected layer). Mask R-CNN inherits the RPN part of Faster R-CNN.
The process of performing the task includes: using the shared convolutional layers to extract features from the target image, then sending the resulting feature maps to the RPN, which generates the frames to be detected (specifying the positions of the RoIs) and performs a first correction of the RoI bounding boxes. What follows is the Fast R-CNN architecture: RoIAlign selects the features corresponding to each RoI on the feature map according to the output of the RPN and sets the dimension to a fixed value. Finally, a fully connected layer (FC layer) classifies the frames and performs a second correction of the target bounding boxes, finally yielding the candidate detection frames (box regression) and the classification.
The other branch is the head part: Mask R-CNN finally enlarges the output dimension of RoIAlign and predicts a Mask; that is, the result obtained by the Mask branch is the point set of the polygon contour.
For Mask R-CNN, the Mask prediction and the classification (as well as the candidate detection frames) each have their own training parameters. Before training the Mask R-CNN model, its hyperparameters are set to the parameter values of the Faster R-CNN model, and the hyperparameters are pre-trained using ResNet50, ResNet101, and FPN networks; the Mask R-CNN model is then trained on a large number of samples to obtain the final model. After training, the Mask R-CNN model is tested on test samples to verify its accuracy.
In a specific embodiment, the training data set is COCO trainval35k, which contains 80 object categories and 1.5 million object instances.
In a specific embodiment, the detection results of the trained Mask R-CNN model are saved to a distributed database, so that the distributed database can be used to update the trained Mask R-CNN model.
In summary, the input images are multi-angle images of the target, forming a sample library; the samples are fed into the Mask R-CNN detection and recognition model for training, image features are extracted in the convolutional layers, and the accurate target classification frame, the corresponding target state, and the polygon point set of the instance segmentation are finally obtained.
In order to achieve the above purpose, a system 400 for selecting a detection frame using Mask R-CNN includes an instance segmentation module 410 and a target detection frame screening module 420. The instance segmentation module 410 is configured to perform instance segmentation on the target image using Mask R-CNN to obtain a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame. The target detection frame screening module 420 is configured to calculate the IOU values of the candidate detection frame and of the polygon contour respectively; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygon contour is greater than a second preset threshold IOU2, the candidate detection frame is screened out as the target detection frame, where the second preset threshold IOU2 is greater than the first preset threshold IOU1.
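Once the two IOU values are available, the dual-threshold screening performed by module 420 reduces to a small filter. A sketch, with illustrative threshold values drawn from the 0.5-0.7 range stated elsewhere in the application (the dictionary keys are assumed names, not from the patent):

```python
def select_detection_frames(candidates, iou1=0.5, iou2=0.6):
    # candidates: list of dicts with precomputed IOU values, e.g.
    # {"box_iou": IOU of the rectangular candidate frame,
    #  "contour_iou": IOU of the corresponding polygon contour}.
    # A frame is kept only if it passes BOTH thresholds, and the
    # contour threshold IOU2 is required to be the stricter one.
    assert iou2 > iou1, "the second preset threshold must exceed the first"
    return [c for c in candidates
            if c["box_iou"] > iou1 and c["contour_iou"] > iou2]
```

The stricter second threshold reflects the design intent: the polygon contour is a tighter description of the target than the rectangle, so its agreement with the ground truth is held to a higher bar.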
The target detection frame screening module 420 includes a first polygon-contour IOU value acquisition sub-module, which is configured to calculate the IOU value of the polygon contour by a two-dimensional array mapping encoding method.
Specifically, the first polygon-contour IOU value acquisition sub-module includes a two-dimensional array mapping unit and a first polygon-contour IOU value acquisition unit. The two-dimensional array mapping unit is configured to map the polygon contour and its prediction frame respectively onto a plane template that has been divided in advance by a combination of line segments, where the combination of line segments divides the plane template into equal-sized partition blocks; the mapping results of the polygon contour and of its prediction frame are then transferred onto binary maps of the same size as the plane template, and each partition block is represented in the two-dimensional-array mapping encoding form (A, B), where A is the encoding state of the partition block with respect to the polygon contour and B is its encoding state with respect to the prediction frame: A = 1 when the partition block lies inside the polygon contour and A = 0 when it lies outside; B = 1 when the partition block lies inside the prediction frame and B = 0 when it lies outside. The first polygon-contour IOU value acquisition unit is configured to obtain the IOU value by counting the encodings of the partition blocks: IOU = number of blocks encoded (1, 1) / [number of blocks encoded (1, 0) + number of blocks encoded (0, 1) + number of blocks encoded (1, 1)].
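A minimal Python rendering of this encoding-and-counting scheme follows. It uses the centre of each partition block for the inside/outside test; the application does not specify how a partially covered block is assigned, so that choice, and the cell size, are assumptions:

```python
def point_in_polygon(px, py, poly):
    # Ray-casting parity test against a polygon given as a vertex list.
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def grid_iou(contour, box, cell=1.0):
    # Encode each partition block of the template grid as (A, B):
    #   A = 1 if the block (tested at its centre) lies inside the contour,
    #   B = 1 if it lies inside the rectangular prediction frame,
    # then IOU = n(1,1) / [n(1,0) + n(0,1) + n(1,1)].
    bx1, by1, bx2, by2 = box
    xs = [p[0] for p in contour] + [bx1, bx2]
    ys = [p[1] for p in contour] + [by1, by2]
    n11 = n10 = n01 = 0
    y = min(ys) + cell / 2.0
    while y < max(ys):
        x = min(xs) + cell / 2.0
        while x < max(xs):
            a = point_in_polygon(x, y, contour)
            b = bx1 <= x <= bx2 and by1 <= y <= by2
            if a and b:
                n11 += 1
            elif a:
                n10 += 1
            elif b:
                n01 += 1
            x += cell
        y += cell
    denom = n10 + n01 + n11
    return n11 / denom if denom else 0.0
```

Unlike the exact intersection-union area method, the result here converges to the true IOU only as the partition blocks become small relative to the shapes.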
In a specific embodiment, the target detection frame screening module 420 includes a second polygon-contour IOU value acquisition sub-module, configured to calculate the IOU value of the polygon contour by the intersection-union area method. The second sub-module includes a point set acquisition unit and a second polygon-contour IOU value acquisition unit. The point set acquisition unit is configured to obtain the key points of the polygon contour and its prediction frame and to label them, where the key points include the vertices of the polygon contour and of its prediction frame as well as their intersection points; the intersection points and the points lying inside both shapes are sorted to form the point set of the intersection polygon. The second polygon-contour IOU value acquisition unit is configured to calculate the area of the polygon contour, the area of its prediction frame, and the area of the intersection polygon, and to compute the IOU value of the polygon contour from them: IOU = area of the intersection polygon / (area of the polygon contour + area of the prediction frame - area of the intersection polygon).
In a specific embodiment, the value ranges of the first preset threshold IOU1 and the second preset threshold IOU2 in the target detection frame screening module are both 0.5-0.7.
In a specific embodiment, the system further includes a mirror screening module 430, configured to perform two-dimensional array mapping encoding on all the screened-out candidate detection frames and to compare the coincidence degree of the encoded candidate detection frames; when the coincidence degree of two candidate detection frames is greater than the coincidence threshold, it is determined that a mirror image exists among the targets detected by the two candidate detection frames.
In a specific embodiment, the above system for selecting a detection frame using Mask R-CNN is implemented by a Mask R-CNN detection frame selection model, whose neural network structure includes the Mask R-CNN convolutional layers and an RoI Align layer located after them.
In a specific embodiment, the neural network structure of the Mask R-CNN detection frame selection model further includes a mask layer, a classifier, and a fully connected layer, where the fully connected layer is used for RoI bounding box correction training.
The present application provides a method for selecting a detection frame using Mask R-CNN, applied to an electronic device 5. FIG. 5 is a schematic diagram of the application environment of a preferred embodiment of the method for selecting a detection frame using Mask R-CNN according to this application.
In this embodiment, the electronic device 5 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 5 includes a processor 52, a memory 51, a communication bus 53, and a network interface 54.
The memory 51 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 5, for example a hard disk of the electronic device 5. In other embodiments, the readable storage medium may also be an external memory of the electronic device 5, for example a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the electronic device 5.
In this embodiment, the readable storage medium of the memory 51 is generally used to store the detection frame selection program 50 installed in the electronic device 5, and the like. The memory 51 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 52 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory 51 or to process data, for example to execute the detection frame selection program 50.
The communication bus 53 is used to realize connection and communication between these components.
The network interface 54 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 5 and other electronic devices.
FIG. 5 only shows the electronic device 5 with the components 51-54, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
Optionally, the electronic device 5 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a loudspeaker or earphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 5 may also include a display, which may also be referred to as a display screen or a display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch display, or the like. The display is used to show the information processed in the electronic device 5 and to display a visualized user interface.
Optionally, the electronic device 5 may also include a Radio Frequency (RF) circuit, sensors, an audio circuit, and so on, which will not be described in detail here.
In the device embodiment shown in FIG. 5, the memory 51, as a computer storage medium, may include an operating system and the detection frame selection program 50. When the processor 52 executes the detection frame selection program 50 stored in the memory 51, the following steps are implemented: S110, performing instance segmentation on the target image using Mask R-CNN to obtain a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame; S120, calculating the IOU values of the candidate detection frame and of the polygon contour respectively; when the IOU value of the candidate detection frame is greater than the first preset threshold IOU1 and the IOU value of the polygon contour is greater than the second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, where the second preset threshold IOU2 is greater than the first preset threshold IOU1.
In other embodiments, the detection frame selection program 50 may also be divided into one or more modules, which are stored in the memory 51 and executed by the processor 52 to complete the present application. A module referred to in this application is a series of computer program instruction segments capable of completing a specific function.
In addition, an embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium includes a detection frame selection program, and when the detection frame selection program is executed by a processor, the following operations are implemented: S110, performing instance segmentation on the target image using Mask R-CNN to obtain a rectangular candidate detection frame and its polygon contour; S120, calculating the IOU values of the candidate detection frame and of the polygon contour respectively; when the IOU value of the candidate detection frame is greater than the first preset threshold IOU1 and the IOU value of the polygon contour is greater than the second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, where the second preset threshold IOU2 is greater than the first preset threshold IOU1.
The computer-readable storage medium described in this application may be a non-volatile computer-readable storage medium. The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the above method, system, and electronic device for selecting a detection frame using Mask R-CNN, and will not be repeated here.
In summary, in the operation method of this application using the Mask R-CNN neural network, the monitored image is repeatedly convolved and pooled in the deep neural network, the key features of the image are extracted and processed by the neural network algorithm, and the rectangular frames of the objects in the image are obtained; a preliminary IOU-value screening is performed on the overlap between the obtained rectangular frames and the real target; then the polygon contour obtained by the Mask is further used to perform a secondary IOU-value screening between the polygon point set and the polygon of the real target, and the frames that finally meet the set thresholds are used as the detection frames.
It should be noted that in this document the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.
The serial numbers of the above embodiments of this application are for description only and do not represent the merits of the embodiments. Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of this application.
Claims (20)
- A method for selecting a detection frame using Mask R-CNN, applied to an electronic device, characterized in that the method comprises: performing instance segmentation on a target image using Mask R-CNN to obtain a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame; calculating the IOU values of the candidate detection frame and of the polygon contour respectively; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygon contour is greater than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
- The method for selecting a detection frame using Mask R-CNN according to claim 1, characterized in that calculating the IOU value of the polygon contour comprises calculating the IOU value of the polygon contour by a two-dimensional array mapping encoding method, wherein the two-dimensional array mapping encoding method comprises: mapping the polygon contour and its prediction frame respectively onto a plane template divided in advance by a combination of line segments, wherein the combination of line segments divides the plane template into equal-sized partition blocks; transferring the mapping results of the polygon contour and of its prediction frame respectively onto binary maps of the same size as the plane template, and representing each partition block in the two-dimensional-array mapping encoding form (A, B), wherein A is the encoding state of the partition block with respect to the polygon contour and B is its encoding state with respect to the prediction frame; A = 1 when the partition block lies inside the polygon contour and A = 0 when it lies outside the polygon contour; B = 1 when the partition block lies inside the prediction frame and B = 0 when it lies outside the prediction frame; and obtaining the IOU value by counting the encodings of the partition blocks, wherein IOU = number of partition blocks encoded (1, 1) / [number of partition blocks encoded (1, 0) + number of partition blocks encoded (0, 1) + number of partition blocks encoded (1, 1)].
- The method for selecting a detection frame using Mask R-CNN according to claim 1, characterized in that calculating the IOU value of the polygon contour comprises calculating the IOU value of the polygon contour by an intersection-union area method, wherein the intersection-union area method comprises: obtaining key points of the polygon contour and its prediction frame, and labeling the key points, wherein the key points include the vertices of the polygon contour and of its prediction frame as well as the intersection points of the polygon contour and its prediction frame; sorting the intersection points and the points lying inside both shapes to form the point set of the intersection polygon; and calculating the area of the polygon contour, the area of its prediction frame, and the area of the intersection polygon, and computing the IOU value of the polygon contour therefrom, wherein IOU = area of the intersection polygon / (area of the polygon contour + area of the prediction frame - area of the intersection polygon).
- The method for selecting a detection frame using Mask R-CNN according to claim 1, characterized in that the value ranges of the first preset threshold IOU1 and the second preset threshold IOU2 are both 0.5-0.7.
- The method for selecting a detection frame using Mask R-CNN according to claim 2, characterized in that, after screening out the candidate detection frame as the target detection frame, the method further comprises: performing two-dimensional array mapping encoding on all the screened-out candidate detection frames; comparing the coincidence degree of the encoded candidate detection frames; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, determining that a mirror image exists among the targets detected by the two candidate detection frames.
- The method for selecting a detection frame using Mask R-CNN according to claim 1, characterized in that the method is implemented by a Mask R-CNN detection frame selection model, wherein the neural network structure of the Mask R-CNN detection frame selection model includes the Mask R-CNN convolutional layers and an RoI Align layer located after the Mask R-CNN convolutional layers.
- The method for selecting a detection frame using Mask R-CNN according to claim 6, characterized in that the neural network structure of the Mask R-CNN detection frame selection model further includes a mask layer, a classifier, and a fully connected layer, wherein the fully connected layer is used for RoI bounding box correction training.
- A system for selecting a detection frame using Mask R-CNN, characterized in that it comprises an instance segmentation module and a target detection frame screening module; wherein the instance segmentation module is configured to perform instance segmentation on a target image using Mask R-CNN to obtain a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame; and the target detection frame screening module is configured to calculate the IOU values of the candidate detection frame and of the polygon contour respectively, and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygon contour is greater than a second preset threshold IOU2, to screen out the candidate detection frame as the target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that the target detection frame screening module comprises a first sub-module for obtaining the IOU value of the polygon contour, which is configured to calculate the IOU value of the polygon contour through a two-dimensional array mapping coding method; the first sub-module comprises a two-dimensional array mapping unit and a first unit for obtaining the IOU value of the polygon contour; wherein the two-dimensional array mapping unit is configured to map the polygon contour and its prediction frame onto a plane template pre-divided by a combination of line segments, the combination of line segments dividing the plane template into segmented blocks of equal size; to map the mapping results of the polygon contour and its prediction frame onto binary images of the same size as the plane template; and to represent each segmented block as a two-dimensional array mapping code of the form (A, B), where A is the coding state of the block with respect to the polygon contour and B is the coding state of the block with respect to the prediction frame; A=1 when the block lies inside the polygon contour and A=0 when it lies outside; B=1 when the block lies inside the prediction frame and B=0 when it lies outside the prediction frame; and the first unit for obtaining the IOU value of the polygon contour is configured to obtain the IOU value by counting the codes of the segmented blocks, where IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
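A minimal sketch of this block-coding scheme, assuming each block is tested at its centre point (the function names and the centre-sampling choice are illustrative assumptions, not from the patent text):

```python
def point_in_polygon(px, py, poly):
    # Standard ray-casting point-in-polygon test; poly is [(x, y), ...].
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > py) != (y2 > py):
            xint = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < xint:
                inside = not inside
    return inside

def grid_iou(poly, box, width, height, block=1.0):
    """Approximate the polygon-vs-box IOU with the (A, B) block coding:
    A=1 if a block centre lies inside the polygon contour, B=1 if it
    lies inside the prediction box (x0, y0, x1, y1), then
    IOU = #(1,1) / [#(1,0) + #(0,1) + #(1,1)].
    """
    x0, y0, x1, y1 = box
    n11 = n10 = n01 = 0
    for row in range(int(height / block)):
        for col in range(int(width / block)):
            gx, gy = (col + 0.5) * block, (row + 0.5) * block
            a = point_in_polygon(gx, gy, poly)
            b = x0 <= gx <= x1 and y0 <= gy <= y1
            if a and b:
                n11 += 1
            elif a:
                n10 += 1
            elif b:
                n01 += 1
    denom = n10 + n01 + n11
    return n11 / denom if denom else 0.0
```

Smaller blocks give a finer approximation of the IOU at the cost of more block tests.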
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that the target detection frame screening module comprises a second sub-module for obtaining the IOU value of the polygon contour, which is configured to calculate the IOU value of the polygon contour through an intersection-and-union area method; the second sub-module comprises a point set acquisition unit and a second unit for obtaining the IOU value of the polygon contour; wherein the point set acquisition unit is configured to obtain and mark the key points of the polygon contour and its prediction frame, the key points including the vertices of the polygon contour and of its prediction frame as well as the intersection points between the polygon contour and its prediction frame, and to sort the intersection points together with the points lying inside both shapes into the point set of the intersection polygon; and the second unit for obtaining the IOU value of the polygon contour is configured to calculate the areas of the polygon contour, of its prediction frame, and of the intersection polygon, and to compute the IOU value of the polygon contour from these areas as IOU = area of intersection polygon / (area of polygon contour + area of prediction frame - area of intersection polygon).
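Since the prediction frame is axis-aligned, one way to realise this intersection-and-union area method is Sutherland-Hodgman clipping plus the shoelace formula. This is a sketch of that approach, not necessarily the exact procedure the patent envisions:

```python
def shoelace_area(pts):
    # Absolute polygon area via the shoelace formula; 0 for < 3 points.
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_polygon_to_box(poly, box):
    """Sutherland-Hodgman clipping of `poly` against an axis-aligned
    box (x0, y0, x1, y1); returns the intersection polygon's vertices."""
    x0, y0, x1, y1 = box
    # Clip successively against the box's four half-planes.
    planes = [
        (lambda p: p[0] >= x0, 0, x0),
        (lambda p: p[0] <= x1, 0, x1),
        (lambda p: p[1] >= y0, 1, y0),
        (lambda p: p[1] <= y1, 1, y1),
    ]
    out = list(poly)
    for inside, axis, val in planes:
        if not out:
            break
        pts, out = out, []
        for i in range(len(pts)):
            cur, nxt = pts[i], pts[(i + 1) % len(pts)]
            cin, nin = inside(cur), inside(nxt)
            if cin:
                out.append(cur)
            if cin != nin:
                # Interpolate the crossing point on the clip line.
                t = (val - cur[axis]) / (nxt[axis] - cur[axis])
                out.append((cur[0] + t * (nxt[0] - cur[0]),
                            cur[1] + t * (nxt[1] - cur[1])))
    return out

def polygon_box_iou(poly, box):
    # IOU = intersection area / (polygon area + box area - intersection area).
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    inter = shoelace_area(clip_polygon_to_box(poly, box))
    union = shoelace_area(poly) + box_area - inter
    return inter / union if union else 0.0
```

Unlike the block-coding method, this computes the IOU exactly from the polygon geometry rather than approximating it on a grid.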
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that in the target detection frame screening module, the value ranges of the first preset threshold IOU 1 and the second preset threshold IOU 2 are both 0.5-0.7.
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that it further comprises a mirror screening module configured to perform two-dimensional array mapping coding on all the screened-out candidate detection frames, to compare the degree of coincidence of the coded candidate detection frames, and to determine that a mirror image exists among the targets detected by two candidate detection frames when the degree of coincidence of the two candidate detection frames is greater than a coincidence threshold.
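One simple way to realise the coincidence comparison in this claim is to use the box-overlap ratio directly (the function names and the default threshold below are illustrative assumptions, not values from the patent):

```python
def box_coincidence(b1, b2):
    # Overlap ratio (IOU) of two axis-aligned boxes (x0, y0, x1, y1).
    ix0, iy0 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix1, iy1 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = a1 + a2 - inter
    return inter / union if union else 0.0

def has_mirror(b1, b2, threshold=0.9):
    # Flag a likely mirror image when two candidate boxes nearly coincide.
    return box_coincidence(b1, b2) > threshold
```

Two boxes whose coincidence exceeds the threshold would then be treated as detecting the same target and its mirror image.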
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that the system is implemented by a Mask R-CNN detection frame selection model, the neural network structure of which comprises the Mask R-CNN convolutional layers and a RoI Align layer following the Mask R-CNN.
- The system for selecting a detection frame using Mask R-CNN according to claim 8, characterized in that the neural network structure of the Mask R-CNN detection frame selection model further comprises a mask layer, a classifier, and a fully connected layer, the fully connected layer being used for RoI bounding box correction training.
- An electronic device, characterized in that it comprises a memory and a processor, the memory storing a detection frame selection program which, when executed by the processor, implements the following steps: performing instance segmentation on a target image using Mask R-CNN to obtain a rectangular candidate detection frame and a polygon contour corresponding to the candidate detection frame; calculating the IOU values of the candidate detection frame and the polygon contour respectively; and screening out the candidate detection frame as a target detection frame when the IOU value of the candidate detection frame is greater than a first preset threshold IOU 1 and the IOU value of the polygon contour is greater than a second preset threshold IOU 2; wherein the second preset threshold IOU 2 is greater than the first preset threshold IOU 1.
- The electronic device according to claim 15, characterized in that calculating the IOU value of the polygon contour comprises calculating the IOU value of the polygon contour through a two-dimensional array mapping coding method: mapping the polygon contour and its prediction frame onto a plane template pre-divided by a combination of line segments, the combination of line segments dividing the plane template into segmented blocks of equal size; mapping the results onto binary images of the same size as the plane template and representing each segmented block as a two-dimensional array mapping code of the form (A, B), where A is the coding state of the block with respect to the polygon contour and B is the coding state of the block with respect to the prediction frame; A=1 when the block lies inside the polygon contour and A=0 when it lies outside; B=1 when the block lies inside the prediction frame and B=0 when it lies outside; and obtaining the IOU value by counting the codes of the segmented blocks, where IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
- The electronic device according to claim 15, characterized in that the value ranges of the first preset threshold IOU 1 and the second preset threshold IOU 2 are both 0.5-0.7.
- The electronic device according to claim 15, characterized in that after screening out the candidate detection frame as the target detection frame, the steps further comprise: performing two-dimensional array mapping coding on all the screened-out candidate detection frames; comparing the degree of coincidence of the coded candidate detection frames; and determining that a mirror image exists among the targets detected by two candidate detection frames when the degree of coincidence of the two candidate detection frames is greater than a coincidence threshold.
- The electronic device according to claim 15, characterized in that the detection frame selection program in the memory is implemented by a Mask R-CNN detection frame selection model, the neural network structure of which comprises the Mask R-CNN convolutional layers and a RoI Align layer following the Mask R-CNN.
- A computer-readable storage medium, characterized in that it stores a computer program comprising a detection frame selection program which, when executed by a processor, implements the steps of the method for selecting a detection frame using Mask R-CNN according to any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910885674.7A CN110738125B (en) | 2019-09-19 | 2019-09-19 | Method, device and storage medium for selecting detection frame by Mask R-CNN |
CN201910885674.7 | 2019-09-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021051601A1 true WO2021051601A1 (en) | 2021-03-25 |
Family
ID=69268320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/118279 WO2021051601A1 (en) | 2019-09-19 | 2019-11-14 | Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110738125B (en) |
WO (1) | WO2021051601A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409255A (en) * | 2021-06-07 | 2021-09-17 | 同济大学 | Zebra fish morphological classification method based on Mask R-CNN |
CN113409267A (en) * | 2021-06-17 | 2021-09-17 | 西安热工研究院有限公司 | Pavement crack detection and segmentation method based on deep learning |
CN113469302A (en) * | 2021-09-06 | 2021-10-01 | 南昌工学院 | Multi-circular target identification method and system for video image |
CN113591734A (en) * | 2021-08-03 | 2021-11-02 | 中国科学院空天信息创新研究院 | Target detection method based on improved NMS algorithm |
CN114526709A (en) * | 2022-02-21 | 2022-05-24 | 中国科学技术大学先进技术研究院 | Area measurement method and device based on unmanned aerial vehicle and storage medium |
CN116486265A (en) * | 2023-04-26 | 2023-07-25 | 北京卫星信息工程研究所 | Airplane fine granularity identification method based on target segmentation and graph classification |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507341B (en) * | 2020-04-20 | 2022-06-28 | 广州文远知行科技有限公司 | Method, device and equipment for adjusting target bounding box and storage medium |
CN111898411B (en) * | 2020-06-16 | 2021-08-31 | 华南理工大学 | Text image labeling system, method, computer device and storage medium |
CN112132832B (en) | 2020-08-21 | 2021-09-28 | 苏州浪潮智能科技有限公司 | Method, system, device and medium for enhancing image instance segmentation |
CN112861711A (en) * | 2021-02-05 | 2021-05-28 | 深圳市安软科技股份有限公司 | Regional intrusion detection method and device, electronic equipment and storage medium |
CN113343779B (en) * | 2021-05-14 | 2024-03-12 | 南方电网调峰调频发电有限公司 | Environment abnormality detection method, device, computer equipment and storage medium |
CN113408531B (en) * | 2021-07-19 | 2023-07-14 | 北博(厦门)智能科技有限公司 | Target object shape frame selection method and terminal based on image recognition |
CN113705643B (en) * | 2021-08-17 | 2022-10-28 | 荣耀终端有限公司 | Target detection method and device and electronic equipment |
CN114022508A (en) * | 2021-09-18 | 2022-02-08 | 浙江大华技术股份有限公司 | Target tracking method, terminal and computer readable storage medium |
CN114863265A (en) * | 2021-12-14 | 2022-08-05 | 青岛海尔电冰箱有限公司 | Method for identifying information of articles in refrigerator, refrigerator and computer storage medium |
CN114882348A (en) * | 2022-03-29 | 2022-08-09 | 青岛海尔电冰箱有限公司 | Method for identifying information of articles in refrigerator, refrigerator and computer storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150071529A1 (en) * | 2013-09-12 | 2015-03-12 | Kabushiki Kaisha Toshiba | Learning image collection apparatus, learning apparatus, and target object detection apparatus |
CN106529565A (en) * | 2016-09-23 | 2017-03-22 | 北京市商汤科技开发有限公司 | Target identification model training and target identification method and device, and computing equipment |
CN108009554A (en) * | 2017-12-01 | 2018-05-08 | 国信优易数据有限公司 | A kind of image processing method and device |
CN109389640A (en) * | 2018-09-29 | 2019-02-26 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109903310A (en) * | 2019-01-23 | 2019-06-18 | 平安科技(深圳)有限公司 | Method for tracking target, device, computer installation and computer storage medium |
CN110047095A (en) * | 2019-03-06 | 2019-07-23 | 平安科技(深圳)有限公司 | Tracking, device and terminal device based on target detection |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9972092B2 (en) * | 2016-03-31 | 2018-05-15 | Adobe Systems Incorporated | Utilizing deep learning for boundary-aware image segmentation |
US11475351B2 (en) * | 2017-11-15 | 2022-10-18 | Uatc, Llc | Systems and methods for object detection, tracking, and motion prediction |
CN108875577A (en) * | 2018-05-11 | 2018-11-23 | 深圳市易成自动驾驶技术有限公司 | Object detection method, device and computer readable storage medium |
CN109977943B (en) * | 2019-02-14 | 2024-05-07 | 平安科技(深圳)有限公司 | Image target recognition method, system and storage medium based on YOLO |
-
2019
- 2019-09-19 CN CN201910885674.7A patent/CN110738125B/en active Active
- 2019-11-14 WO PCT/CN2019/118279 patent/WO2021051601A1/en active Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409255A (en) * | 2021-06-07 | 2021-09-17 | 同济大学 | Zebra fish morphological classification method based on Mask R-CNN |
CN113409267A (en) * | 2021-06-17 | 2021-09-17 | 西安热工研究院有限公司 | Pavement crack detection and segmentation method based on deep learning |
CN113591734A (en) * | 2021-08-03 | 2021-11-02 | 中国科学院空天信息创新研究院 | Target detection method based on improved NMS algorithm |
CN113591734B (en) * | 2021-08-03 | 2024-02-20 | 中国科学院空天信息创新研究院 | Target detection method based on improved NMS algorithm |
CN113469302A (en) * | 2021-09-06 | 2021-10-01 | 南昌工学院 | Multi-circular target identification method and system for video image |
CN114526709A (en) * | 2022-02-21 | 2022-05-24 | 中国科学技术大学先进技术研究院 | Area measurement method and device based on unmanned aerial vehicle and storage medium |
CN116486265A (en) * | 2023-04-26 | 2023-07-25 | 北京卫星信息工程研究所 | Airplane fine granularity identification method based on target segmentation and graph classification |
CN116486265B (en) * | 2023-04-26 | 2023-12-19 | 北京卫星信息工程研究所 | Airplane fine granularity identification method based on target segmentation and graph classification |
Also Published As
Publication number | Publication date |
---|---|
CN110738125B (en) | 2023-08-01 |
CN110738125A (en) | 2020-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021051601A1 (en) | Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium | |
CN108009543B (en) | License plate recognition method and device | |
CN112560999B (en) | Target detection model training method and device, electronic equipment and storage medium | |
CN110738101B (en) | Behavior recognition method, behavior recognition device and computer-readable storage medium | |
CN109657533B (en) | Pedestrian re-identification method and related product | |
WO2019218824A1 (en) | Method for acquiring motion track and device thereof, storage medium, and terminal | |
WO2019223586A1 (en) | Method and apparatus for detecting parking space usage condition, electronic device, and storage medium | |
CN110781836A (en) | Human body recognition method and device, computer equipment and storage medium | |
CN108805016B (en) | Head and shoulder area detection method and device | |
WO2020258077A1 (en) | Pedestrian detection method and device | |
CN111274926B (en) | Image data screening method, device, computer equipment and storage medium | |
CN113642474A (en) | Hazardous area personnel monitoring method based on YOLOV5 | |
CN115170792B (en) | Infrared image processing method, device and equipment and storage medium | |
Jiang et al. | A fast and high-performance object proposal method for vision sensors: Application to object detection | |
CN113780145A (en) | Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium | |
CN112800978B (en) | Attribute identification method, training method and device of part attribute extraction network | |
CN113591758A (en) | Human behavior recognition model training method and device and computer equipment | |
Lashkov et al. | Edge-computing-facilitated nighttime vehicle detection investigations with CLAHE-enhanced images | |
Bisht et al. | Integration of hough transform and inter-frame clustering for road lane detection and tracking | |
US20220300774A1 (en) | Methods, apparatuses, devices and storage media for detecting correlated objects involved in image | |
US12100214B2 (en) | Video-based public safety incident prediction system and method therefor | |
CN111914844A (en) | Image identification method and device, electronic equipment and storage medium | |
CN113449629B (en) | Lane line false and true identification device, method, equipment and medium based on driving video | |
CN112069357B (en) | Video resource processing method and device, electronic equipment and storage medium | |
CN112784691A (en) | Target detection model training method, target detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19946046 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19946046 Country of ref document: EP Kind code of ref document: A1 |