
CN109977997A - Fast and robust image target detection and segmentation method based on a convolutional neural network - Google Patents

Fast and robust image target detection and segmentation method based on a convolutional neural network Download PDF

Info

Publication number
CN109977997A
CN109977997A (application CN201910113339.5A)
Authority
CN
China
Prior art keywords
bounding box
image
feature map
convolutional neural
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910113339.5A
Other languages
Chinese (zh)
Other versions
CN109977997B (en)
Inventor
王坤峰
王飞跃
张慧
田永林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Academy Of Intelligent Industries
Institute of Automation of Chinese Academy of Science
Original Assignee
Qingdao Academy Of Intelligent Industries
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Academy Of Intelligent Industries, Institute of Automation of Chinese Academy of Science filed Critical Qingdao Academy Of Intelligent Industries
Priority to CN201910113339.5A priority Critical patent/CN109977997B/en
Publication of CN109977997A publication Critical patent/CN109977997A/en
Application granted granted Critical
Publication of CN109977997B publication Critical patent/CN109977997B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates in particular to a fast and robust image target detection and segmentation method based on convolutional neural networks, intended to solve the problems of low accuracy and low efficiency in image target detection and segmentation. The method of the present invention includes: generating a multi-scale feature map group from an acquired image to be processed using a deep convolutional neural network; performing iterative feedback fusion to obtain a fused feature map group; obtaining a plurality of bounding boxes and their positions on the image to be processed based on the complementary relationships among the features in the fused feature map group; calculating, from the foreground probability and the conditional class probability, the posterior probability that each bounding box belongs to each category, and adjusting the bounding box positions; and performing target detection and segmentation of the image to be processed according to the positions of the bounding boxes on the image. The method of the present invention has strong feature representation ability, can overcome the large variation of targets in complex environments, offers high processing speed, and yields accurate detection and segmentation results.

Description

Fast and robust image target detection and segmentation method based on a convolutional neural network
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a fast and robust image target detection and segmentation method based on a convolutional neural network.
Background
Object detection and segmentation is a very important research problem in computer vision. With the continuous development of electronic imaging technology, cameras are widely used in social management, industrial production and daily life, generating massive amounts of image data every moment. Obtaining the high-level information of interest through image analysis enables intelligent perception and understanding of the physical world and has enormous application value. In image processing, the information of interest is usually information about target objects. In many cases, the objects of interest in an image must be detected and segmented accurately and in time, and the obtained information is then applied to real-world tasks such as intelligent monitoring and environmental perception. Object detection and segmentation research therefore receives great attention from both academia and industry.
Target detection requires a suitable representation of the target. In terms of shape, a target can be represented as a bounding box, a mask, salient points, a set of components, or other forms. A bounding box is a simple and coarse shape representation; because it contains pixels of the background or of other targets in addition to the target pixels, it easily introduces interference into feature extraction and degrades detection performance. Compared with a bounding box, a mask representation of the target shape segments the pixels of each target accurately in the image and strictly separates the target from the background and from other targets, thereby avoiding interfering features and enabling more accurate visual target detection.
At present, target detection methods based on deep neural networks fall mainly into two types: two-stage methods based on region proposals and single-stage methods without region proposals. Region-proposal based methods use texture, edge, color and similar cues to find in advance the positions where targets may appear in the image, so a high recall rate can be maintained with relatively few candidate windows (thousands or even hundreds), but the candidate region generation stage prevents real-time detection. Detection methods without region proposals mainly follow a regression idea and directly predict the positions and classes of bounding boxes from the whole image, which greatly improves efficiency and achieves real-time performance.
The prior art has the following problems: two-stage methods based on region proposals have high accuracy but poor real-time performance, while methods without region proposals are faster but less accurate; existing feature representations that generate candidate bounding boxes from only the last feature map can hardly cover the complexity of the environment and the large variation among targets; and within a single frame, both the accuracy and the computational efficiency of target detection algorithms still need improvement.
Disclosure of Invention
In order to solve the above problems in the prior art, namely, the problems of low accuracy and poor efficiency of image target detection and segmentation, a first aspect of the present invention provides a fast robust image target detection and segmentation method based on a convolutional neural network, including:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
In some preferred embodiments, in step S20, "iteratively feedback and fuse the multi-scale feature map set to obtain a fused feature map set", the method includes:
sequentially performing convolution, normalization and activation operations on the high-level multi-scale feature map, and fusing it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map.
In some preferred embodiments, in step S30, "obtaining a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship between the features in the fused feature map set", includes:
step S31, calculating the size of the bounding box in the multi-scale fused feature map group:
s_j = 2^j × s_min, j ∈ [0, 3]
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size;
step S32, calculating the width and height of the bounding box,
where r_m is the aspect ratio of the reference box;
step S33, calculating the coordinates of the center point of the bounding box and its width and height,
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
In some preferred embodiments, before the step S40 "performing the target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed", there is further provided a step of adjusting the position of the bounding box, which includes the steps of:
step B10, calculating foreground classification loss and conditional classification loss of the bounding box;
step B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss;
step B30, calculating posterior probability of the bounding box belonging to each category according to the calculated foreground classification loss, the condition classification loss and the position loss of the bounding box, and obtaining the position adjustment quantity of the bounding box;
and step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
In some preferred embodiments, the foreground classification loss and the conditional class classification loss are calculated as follows:
the foreground classification loss,
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples;
the conditional class classification loss,
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
In some preferred embodiments, the position loss of the bounding box is calculated by:
the bounding box position loss L_loc(g_j, g'_j),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
In some preferred embodiments, the "posterior probability of bounding box belonging to each category" in step B30 is calculated by:
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
In another aspect, the invention provides a fast and robust image target detection and segmentation system based on a convolutional neural network, which comprises an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being suitable for being loaded and executed by a processor to implement the above-mentioned fast and robust image object detection and segmentation method based on a convolutional neural network.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the fast and robust image target detection and segmentation method based on the convolutional neural network.
The invention has the beneficial effects that:
(1) the method has strong feature representation capability, can overcome the large variation of targets in complex environments, has high processing speed, and obtains accurate detection and segmentation results.
(2) The method obtains the feature representation of different scales from the input image through the convolutional neural network, carries out iterative feedback and fusion on the multi-scale features, and can obtain more robust feature representation.
(3) The method of the invention uses a series of convolution layers, pooling layers and deconvolution layers to construct the target segmentation module, and simultaneously fine-tunes the positions of the bounding boxes, thereby obtaining more precise target detection and segmentation results.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of the fast and robust image target detection and segmentation method based on the convolutional neural network of the present invention;
FIG. 2 is a diagram of an example of different features of an input image according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 3 is a characteristic example diagram of a more rich semantic information of an input image according to an embodiment of the fast and robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 4 is a schematic diagram of an iterative feedback fusion module according to an embodiment of the fast robust image target detection and segmentation method based on the convolutional neural network of the present invention;
FIG. 5 is an exemplary diagram of coordinates, foreground probability and conditional class probability scores of a prediction detection box on a plurality of feature layers according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 6 is a diagram illustrating visualization of segmentation module features according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 7 is an exemplary diagram of a target detection and segmentation result according to an embodiment of the fast and robust image target detection and segmentation method based on a convolutional neural network.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a fast and robust image target detection and segmentation method based on a convolutional neural network, which comprises the following steps:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
In order to more clearly describe the fast and robust image target detection and segmentation method based on the convolutional neural network of the present invention, the following describes each step in the embodiment of the method of the present invention in detail with reference to fig. 1.
The fast and robust image target detection and segmentation method based on a convolutional neural network comprises steps S10 to S40, which are described in detail as follows:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group.
The preferred embodiment of the present invention may use a ResNet or VGGNet as the deep convolutional neural network to obtain the feature representation of the image to be processed, where the reduction factors of the different feature maps with respect to the input image are 4, 8, 16, 32 and 64, respectively. Fig. 2 shows different feature maps of an input image according to an embodiment of the present invention.
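As a rough, non-authoritative sketch of this feature extraction step, the following PyTorch code builds a multi-scale feature map group from a ResNet-50 backbone; the choice of ResNet-50, the layer names and the extra stride-2 convolution used to obtain the /64 level are assumptions for illustration, not the patent's prescribed configuration.

```python
import torch
import torch.nn as nn
import torchvision

class MultiScaleBackbone(nn.Module):
    """Illustrative extraction of feature maps at reduction factors 4, 8, 16, 32, 64
    (sketch only; layer choices are assumptions, not the patented network)."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50()
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2   # /4, /8
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4   # /16, /32
        # Assumed extra stride-2 convolution to obtain the /64 level.
        self.extra = nn.Conv2d(2048, 2048, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        c2 = self.layer1(self.stem(x))   # reduction factor 4
        c3 = self.layer2(c2)             # reduction factor 8
        c4 = self.layer3(c3)             # reduction factor 16
        c5 = self.layer4(c4)             # reduction factor 32
        c6 = self.extra(c5)              # reduction factor 64
        return [c2, c3, c4, c5, c6]      # multi-scale feature map group

feature_group = MultiScaleBackbone()(torch.randn(1, 3, 512, 512))
print([f.shape for f in feature_group])
```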
And step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fused feature map group.
Shallow features have higher spatial resolution and contain more fine-grained information, but compared with deep features they lack rich semantic information, which limits feature extraction for targets with large variation. This is especially true for small and occluded targets in a detection system: as the convolutional layers deepen, a large amount of information about such difficult targets is lost in the high-level feature maps, while the low-level feature maps lack rich semantic information, which greatly affects detection performance.
The method sequentially performs convolution, normalization and activation operations on the high-level multi-scale feature map, and fuses it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map with richer semantic information. Fig. 3 illustrates features of an input image with richer semantic information according to an embodiment of the present invention. Different methods may be adopted in embodiments to implement these operations, for example:
method one, the feature layer LiObtaining an output characteristic layer M through convolution operationi
Computing output feature layer MiDimension of (H)i×Wi×li) Characteristic diagram F ofi LConversion to dimension (H)i×Wi×fi) Characteristic diagram F ofi MAs shown in formula (1):
Mi=LiΘC′iformula (1)
Where Θ represents the multidimensional convolution filter operation.
Method two, passing the high-level features upFeature F for generating a feedback context after a sampling operationi DMatching the feedback context feature with the lower layer feature diagram of the previous layer to obtain an output feature layer DiDimension of isCharacteristic diagram ofConversion to dimension (H)i×Wi×fi) Characteristic diagram F ofi D. Fig. 4 is a schematic diagram of an iterative feedback fusion module according to an embodiment of the present invention.
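A minimal sketch of one feedback fusion iteration is given below, assuming PyTorch; the channel widths, the use of batch normalization and the 1×1 fusion convolution are illustrative assumptions. It shows only the two paths described above: convolution, normalization and activation on the high-level map, upsampling it into a feedback context feature, and concatenation with the low-level map along the channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackFusion(nn.Module):
    """One iteration of feedback fusion between a high-level and a low-level
    feature map (illustrative sketch; channel sizes are assumed)."""
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        # Convolution + normalization + activation on the high-level map.
        self.reduce = nn.Sequential(
            nn.Conv2d(high_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Fuse the channel-wise concatenation back to the working width.
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + low_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, high, low):
        # Upsample the high-level features into a feedback context feature
        # matching the low-level spatial resolution.
        context = F.interpolate(self.reduce(high), size=low.shape[-2:],
                                mode="bilinear", align_corners=False)
        # Concatenate along the feature channel dimension, then fuse.
        return self.fuse(torch.cat([context, low], dim=1))

fused = FeedbackFusion(2048, 1024, 256)(torch.randn(1, 2048, 16, 16),
                                        torch.randn(1, 1024, 32, 32))
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```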
Step S30, based on the complementary relationship between the features in the fused feature map group, obtaining a plurality of bounding boxes and the positions of the bounding boxes on the image to be processed.
To adapt to scale changes of objects of different sizes, other methods convert the image to several scales, process each scale independently with a convolutional neural network, and then integrate the results from the different scales. In contrast, the present method adopts a framework that regresses detection boxes from multiple feature layers. Fig. 5 shows the coordinates, foreground probability and conditional class probability scores of predicted detection boxes on multiple feature layers according to an embodiment of the invention: the fused feature maps obtained after step S20 are denoted B2, B3, B4, B5 and B6, and the coordinates, foreground probability and conditional class probability score of each bounding box are predicted on these five feature maps of different scales. The receptive fields of different feature maps differ in size, so a specific position in a feature map is responsible for a specific region of the image and a specific object size.
Step S31, calculating the size of the bounding box in the multi-scale fused feature map group, as shown in formula (2):
s_j = 2^j × s_min, j ∈ [0, 3]    (2)
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size.
Step S32, calculating the width and height of the bounding box, as shown in formulas (3) and (4),
where r_m is the aspect ratio of the reference box.
In the embodiment of the invention, the minimum bounding box size s_min may be 32 pixels, and the reference box aspect ratios are preset.
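A minimal numeric sketch of the reference box enumeration follows. The size rule s_j = 2^j × s_min comes from formula (2); the square-root split of width and height by aspect ratio is the common SSD-style convention and, like the example aspect ratio set, is an assumption here, since formulas (3) and (4) are not reproduced in this text.

```python
import math

def reference_boxes(s_min=32, aspect_ratios=(1.0, 2.0, 0.5)):
    """Enumerate reference (prior) box sizes per feature scale.
    The sqrt-based width/height split and the aspect ratio set are assumed."""
    boxes = []
    for j in range(4):                       # j in [0, 3]
        s_j = (2 ** j) * s_min               # s_j = 2^j * s_min
        for r_m in aspect_ratios:            # assumed aspect ratio set
            w = s_j * math.sqrt(r_m)
            h = s_j / math.sqrt(r_m)
            boxes.append((j, r_m, round(w, 1), round(h, 1)))
    return boxes

for b in reference_boxes():
    print(b)   # e.g. (0, 1.0, 32.0, 32.0), (0, 2.0, 45.3, 22.6), ...
```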
Step S33, calculating the coordinates of the center point of the bounding box and its width and height, as shown in equations (5) to (12),
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
This calculation is a bounding box regression from a reference box to a nearby ground-truth box; compared with region-of-interest based methods, the method of the invention obtains the regression position of the detected bounding box in a different way. To handle the large scale variation of targets in complex environments, regression vectors for multiple detection bounding boxes are learned, each corresponding to one scale and one aspect ratio. In this way, detection bounding boxes of different sizes can be predicted from a feature of fixed size.
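Since equations (5) to (12) are not reproduced in this text, the sketch below assumes the standard log-space reference-box regression that matches the variables listed above (x, y, w, h for the predicted box; x_a, y_a, w_a, h_a for the reference box; x*, y*, w*, h* for the real box); it is an illustration, not the patent's exact parameterization.

```python
import math

def encode(box, anchor):
    """Regression targets of a box (x, y, w, h) relative to a reference box
    (x_a, y_a, w_a, h_a); assumed standard parameterization."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse transform: recover (x, y, w, h) from regression parameters."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 100.0, 32.0, 32.0)    # reference box (center x, center y, w, h)
gt = (108.0, 96.0, 40.0, 28.0)         # nearby real box
targets = encode(gt, anchor)           # targets toward the real box
print(targets, decode(targets, anchor))
```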
Step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
A target segmentation module is constructed from a series of convolution layers, pooling layers and deconvolution layers to complete image target segmentation. Fig. 6 illustrates an example of feature visualization in the segmentation module according to an embodiment of the present invention. For each detected bounding box, bounding box features are extracted from the multi-scale feedback-fused feature maps; the region blocks of the same scale extracted from each feature layer are then stacked directly along the feature channel, and the stacked features undergo a series of convolution and deconvolution operations to obtain the final target segmentation result. The specific steps for extracting the bounding box features are as follows: first, each bounding box is traversed and divided into a fixed number of unit blocks; then, within each unit block, image values at floating-point coordinates are computed by bilinear interpolation; finally, a max pooling operation is performed within each unit block.
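The bilinear-interpolation-plus-max-pooling extraction described above is similar in spirit to RoIAlign; the following simplified PyTorch sketch illustrates that idea for a single box on a single feature map, with the output grid size and sampling density chosen arbitrarily for illustration rather than taken from the patent.

```python
import torch
import torch.nn.functional as F

def extract_box_feature(feature, box, out_size=7, samples=2):
    """Pool a (1, C, H, W) feature map inside `box` = (x1, y1, x2, y2), given in
    feature-map coordinates, into an out_size x out_size grid: each unit block is
    sampled at floating-point locations by bilinear interpolation, then max-pooled.
    Simplified RoIAlign-style sketch."""
    _, c, h, w = feature.shape
    x1, y1, x2, y2 = box
    xs = torch.linspace(x1, x2, out_size * samples)
    ys = torch.linspace(y1, y2, out_size * samples)
    # Normalize sample coordinates to [-1, 1] for grid_sample (bilinear interpolation).
    grid_x = xs / (w - 1) * 2 - 1
    grid_y = ys / (h - 1) * 2 - 1
    grid = torch.stack(torch.meshgrid(grid_y, grid_x, indexing="ij"), dim=-1)[..., [1, 0]]
    sampled = F.grid_sample(feature, grid.unsqueeze(0), align_corners=True)
    # Max-pool the samples inside each unit block.
    return F.max_pool2d(sampled, kernel_size=samples)

feat = torch.randn(1, 256, 64, 64)
print(extract_box_feature(feat, (10.3, 12.7, 30.9, 40.2)).shape)  # (1, 256, 7, 7)
```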
For each class, the target segmentation result is a binary pixel-level segmentation map of dimension (N+1)·M², where N is the number of object classes (excluding the background) and M² is the resolution of the binary segmentation map for each class.
FIG. 7 is a diagram illustrating an exemplary target detection and segmentation result according to an embodiment of the present invention. The first column is the input image, the second column is the ground-truth label information, the third column is the corresponding detection and segmentation result with an input size of 512 × 512 pixels, and the fourth column is the corresponding detection and segmentation result with an input size of 1024 × 1024 pixels.
Before step S40 completes the target detection and segmentation of the image to be processed according to the position of the bounding box, a step of adjusting the bounding box position is also provided, comprising the following steps:
and step B10, calculating foreground classification loss and conditional classification loss of the bounding box.
The foreground classification loss is shown in formula (13),
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples.
The conditional class classification loss is shown in equation (14),
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
And B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss.
The bounding box position loss L_loc(g_j, g'_j) is shown in equation (15),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
And step B30, calculating posterior probability of the bounding box belonging to each category according to the foreground classification loss, the condition classification loss and the position loss of the bounding box obtained by calculation, and obtaining the position adjustment quantity of the bounding box.
The posterior probability that the bounding box belongs to each category is shown by equations (16) and (17):
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
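Equations (16) and (17) are not reproduced in this text; the sketch below shows one plausible way the per-level detection losses could be normalized by N_bb and N_loc and combined with the segmentation loss L_mask, and should be read as an assumption about the exact combination.

```python
def total_loss(obj_losses, cls_losses, loc_losses, mask_loss, n_bb, n_loc):
    """Assumed combination of per-level detection losses with the segmentation loss:
    classification terms normalized by the number of boxes N_bb, the position term
    by the number of positive samples N_loc."""
    det = 0.0
    for l_obj, l_cls, l_loc in zip(obj_losses, cls_losses, loc_losses):
        det += (l_obj + l_cls) / n_bb + l_loc / n_loc   # one term per feature level
    return det + mask_loss

# Example with three feature levels (illustrative numbers only).
print(total_loss([1.2, 0.8, 0.5], [2.0, 1.5, 1.0], [0.6, 0.4, 0.3],
                 mask_loss=0.9, n_bb=256, n_loc=32))
```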
And step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
The fast and robust image target detection and segmentation system based on a convolutional neural network of the invention comprises an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the convolutional neural network based fast robust image target detection and segmentation system provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are decomposed or combined again, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, which are suitable for being loaded and executed by a processor to implement the above-mentioned fast and robust image object detection and segmentation method based on convolutional neural network.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the fast and robust image target detection and segmentation method based on the convolutional neural network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A fast and robust image target detection and segmentation method based on a convolutional neural network is characterized by comprising the following steps:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
2. The convolutional neural network-based fast and robust image target detection and segmentation method as claimed in claim 1, wherein in step S20, "iteratively feedback fusing the multi-scale feature map group to obtain a fused feature map group" is performed by:
sequentially performing convolution, normalization and activation operations on the high-level multi-scale feature map, and fusing it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map.
3. The convolutional neural network-based fast and robust image target detection and segmentation method as claimed in claim 1, wherein in step S30, "based on the complementary relationship between the features in the fused feature map set, obtaining a plurality of bounding boxes and the positions of the bounding boxes on the image to be processed" includes the steps of:
step S31, calculating the size of the bounding box in the multi-scale fused feature map group:
s_j = 2^j × s_min, j ∈ [0, 3]
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size;
step S32, calculating the width and height of bounding box s_j,
where r_m is the aspect ratio of the reference box;
step S33, calculating the coordinates of the center point of the bounding box and its width and height,
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
4. The convolutional neural network based fast robust image object detection and segmentation method as claimed in claim 1, wherein step S40 "object detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed" is preceded by the step of bounding box position adjustment, which comprises the steps of:
step B10, calculating foreground classification loss and conditional classification loss of the bounding box;
step B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss;
step B30, calculating posterior probability of the bounding box belonging to each category according to the calculated foreground classification loss, the condition classification loss and the position loss of the bounding box, and obtaining the position adjustment quantity of the bounding box;
and step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
5. The convolutional neural network based fast robust image target detection and segmentation method as claimed in claim 4, wherein the foreground classification loss and the conditional classification loss are calculated by:
the foreground classification loss,
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples;
the conditional class classification loss,
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
6. The convolutional neural network based fast and robust image target detection and segmentation method as claimed in claim 5, wherein the position loss of the bounding box is calculated by:
the bounding box position loss L_loc(g_j, g'_j),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
7. The convolutional neural network based fast robust image target detection and segmentation method as claimed in claim 6, wherein in step B30, "posterior probability that bounding box belongs to each class" is calculated as:
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
8. A fast and robust image target detection and segmentation system based on a convolutional neural network, characterized by comprising an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
9. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the convolutional neural network based fast and robust image object detection and segmentation method of any of claims 1-7.
10. A processing apparatus, comprising
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
the convolutional neural network-based fast and robust image target detection and segmentation method as claimed in any one of claims 1 to 7.
CN201910113339.5A 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness Expired - Fee Related CN109977997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910113339.5A CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910113339.5A CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Publications (2)

Publication Number Publication Date
CN109977997A true CN109977997A (en) 2019-07-05
CN109977997B CN109977997B (en) 2021-02-02

Family

ID=67076963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910113339.5A Expired - Fee Related CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Country Status (1)

Country Link
CN (1) CN109977997B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555481A (en) * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style identification method and device and computer readable storage medium
CN110796649A (en) * 2019-10-29 2020-02-14 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111079623A (en) * 2019-12-09 2020-04-28 成都睿沿科技有限公司 Target detection method, device and storage medium
CN111524106A (en) * 2020-04-13 2020-08-11 北京推想科技有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN112016512A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image small target detection method based on feedback type multi-scale training
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112215853A (en) * 2020-10-12 2021-01-12 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN113689430A (en) * 2021-10-26 2021-11-23 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN114918944A (en) * 2022-06-02 2022-08-19 哈尔滨理工大学 Family service robot grabbing detection method based on convolutional neural network fusion
CN113496139B (en) * 2020-03-18 2024-02-13 北京京东乾石科技有限公司 Method and apparatus for detecting objects from images and training object detection models

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-direction Method for text detection in a kind of natural picture based on connection word section
US20180096457A1 (en) * 2016-09-08 2018-04-05 Carnegie Mellon University Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
CN107977620A (en) * 2017-11-29 2018-05-01 华中科技大学 A kind of multi-direction scene text single detection method based on full convolutional network
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN108710868A (en) * 2018-06-05 2018-10-26 中国石油大学(华东) A kind of human body critical point detection system and method based under complex scene
CN109190458A (en) * 2018-07-20 2019-01-11 华南理工大学 A kind of person of low position's head inspecting method based on deep learning
WO2019014625A1 (en) * 2017-07-14 2019-01-17 Google Llc Object detection using neural network systems

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
US20180096457A1 (en) * 2016-09-08 2018-04-05 Carnegie Mellon University Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-direction Method for text detection in a kind of natural picture based on connection word section
WO2019014625A1 (en) * 2017-07-14 2019-01-17 Google Llc Object detection using neural network systems
CN107977620A (en) * 2017-11-29 2018-05-01 华中科技大学 A kind of multi-direction scene text single detection method based on full convolutional network
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN108710868A (en) * 2018-06-05 2018-10-26 中国石油大学(华东) A kind of human body critical point detection system and method based under complex scene
CN109190458A (en) * 2018-07-20 2019-01-11 华南理工大学 A kind of person of low position's head inspecting method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIFANG XIE ET AL.: "Aircraft Detection of High-Resolution Remote Sensing Image Based on Faster R-CNN Model and SSD Model", 《WEB OF SCIENCE》 *
XIA, Yuan: "Image Object Detection and Classification Based on Deep Learning", China Master's Theses Full-text Database *
XU, Qingzhi: "Traffic Sign Recognition and Implementation Based on Deep Learning", China Master's Theses Full-text Database *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555481A (en) * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style identification method and device and computer readable storage medium
CN110796649B (en) * 2019-10-29 2022-08-30 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN110796649A (en) * 2019-10-29 2020-02-14 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111079623A (en) * 2019-12-09 2020-04-28 成都睿沿科技有限公司 Target detection method, device and storage medium
CN113496139B (en) * 2020-03-18 2024-02-13 北京京东乾石科技有限公司 Method and apparatus for detecting objects from images and training object detection models
CN111524106A (en) * 2020-04-13 2020-08-11 北京推想科技有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN111524106B (en) * 2020-04-13 2021-05-28 推想医疗科技股份有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN112016512A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image small target detection method based on feedback type multi-scale training
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112215853A (en) * 2020-10-12 2021-01-12 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN112766244B (en) * 2021-04-07 2021-06-08 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN113689430A (en) * 2021-10-26 2021-11-23 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN113689430B (en) * 2021-10-26 2022-02-15 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN114918944A (en) * 2022-06-02 2022-08-19 哈尔滨理工大学 Family service robot grabbing detection method based on convolutional neural network fusion

Also Published As

Publication number Publication date
CN109977997B (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
CN108288088B (en) Scene text detection method based on end-to-end full convolution neural network
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN110543837B (en) Visible light airport airplane detection method based on potential target point
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN107633226B (en) Human body motion tracking feature processing method
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN109711416B (en) Target identification method and device, computer equipment and storage medium
CN111627050B (en) Training method and device for target tracking model
CN113221787A (en) Pedestrian multi-target tracking method based on multivariate difference fusion
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
Zhuang et al. Instance segmentation based 6D pose estimation of industrial objects using point clouds for robotic bin-picking
CN113610895A (en) Target tracking method and device, electronic equipment and readable storage medium
CN108109163A (en) A kind of moving target detecting method for video of taking photo by plane
CN109658442B (en) Multi-target tracking method, device, equipment and computer readable storage medium
CN108764244B (en) Potential target area detection method based on convolutional neural network and conditional random field
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN107305691A (en) Foreground segmentation method and device based on images match
CN111027538A (en) Container detection method based on instance segmentation model
CN115019181B (en) Remote sensing image rotating target detection method, electronic equipment and storage medium
CN116503760A (en) Unmanned aerial vehicle cruising detection method based on self-adaptive edge feature semantic segmentation
CN111382638B (en) Image detection method, device, equipment and storage medium
Tan et al. Automobile Component Recognition Based on Deep Learning Network with Coarse‐Fine‐Grained Feature Fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210202

Termination date: 20220213

CF01 Termination of patent right due to non-payment of annual fee