
CN109977997A - Fast and robust image target detection and segmentation method based on a convolutional neural network - Google Patents

Fast and robust image target detection and segmentation method based on a convolutional neural network Download PDF

Info

Publication number
CN109977997A
CN109977997A (application CN201910113339.5A)
Authority
CN
China
Prior art keywords
bounding box
image
feature map
convolutional neural
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910113339.5A
Other languages
Chinese (zh)
Other versions
CN109977997B (en)
Inventor
王坤峰
王飞跃
张慧
田永林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Academy Of Intelligent Industries
Institute of Automation of Chinese Academy of Science
Original Assignee
Qingdao Academy Of Intelligent Industries
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Academy Of Intelligent Industries, Institute of Automation of Chinese Academy of Science filed Critical Qingdao Academy Of Intelligent Industries
Priority to CN201910113339.5A priority Critical patent/CN109977997B/en
Publication of CN109977997A publication Critical patent/CN109977997A/en
Application granted granted Critical
Publication of CN109977997B publication Critical patent/CN109977997B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and relates in particular to a fast and robust image target detection and segmentation method based on convolutional neural networks, intended to solve the problems of low accuracy and low efficiency in image target detection and segmentation. The method of the present invention includes: generating a multi-scale feature map group from an acquired image to be processed using a deep convolutional neural network; performing iterative feedback fusion to obtain a fused feature map group; obtaining a plurality of bounding boxes and their positions on the image to be processed based on the complementary relationships among the features in the fused feature map group; calculating, from the foreground probability and the conditional class probability, the posterior probability that each bounding box belongs to each category, and adjusting the bounding box positions; and performing target detection and segmentation of the image to be processed according to the positions of the bounding boxes on the image. The method of the present invention has strong feature representation ability, can overcome the large variation of targets in complex environments, offers high processing speed, and yields accurate detection and segmentation results.

Description

Fast and robust image target detection and segmentation method based on a convolutional neural network
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a fast and robust image target detection and segmentation method based on a convolutional neural network.
Background
Object detection and segmentation is a very important research problem in computer vision. With the continuous development of electronic imaging technology, cameras are widely used in social management, industrial production and daily life, generating massive amounts of image data every moment. Obtaining the high-level information of interest through image analysis enables intelligent perception and understanding of the physical world and has enormous application value. In image processing, the information of interest is usually information about target objects. In many cases, the objects of interest in an image must be detected and segmented accurately and in time, and the obtained information is then applied to real-world tasks such as intelligent monitoring and environmental perception. Object detection and segmentation research therefore receives great attention from both academia and industry.
Target detection requires a suitable representation of the target. In terms of shape, a target can be represented as a bounding box, a mask, salient points, a set of components, or other forms. A bounding box is a simple and coarse shape representation; because it contains pixels of the background or of other targets in addition to the target pixels, it easily introduces interference into feature extraction and degrades detection performance. Compared with a bounding box, a mask representation of the target shape segments the pixels of each target accurately in the image and strictly separates the target from the background and from other targets, thereby avoiding interfering features and enabling more accurate visual target detection.
At present, target detection methods based on deep neural networks fall mainly into two types: two-stage methods based on region proposals and single-stage methods without region proposals. Region-proposal based methods use texture, edge, color and similar cues to find in advance the positions where targets may appear in the image, so a high recall rate can be maintained with relatively few candidate windows (thousands or even hundreds), but the candidate region generation stage prevents real-time detection. Detection methods without region proposals mainly follow a regression idea and directly predict the positions and classes of bounding boxes from the whole image, which greatly improves efficiency and achieves real-time performance.
The prior art has the following problems: two-stage methods based on region proposals have high accuracy but poor real-time performance, while methods without region proposals are faster but less accurate; existing feature representations that generate candidate bounding boxes from only the last feature map can hardly cover the complexity of the environment and the large variation among targets; and within a single frame, both the accuracy and the computational efficiency of target detection algorithms still need improvement.
Disclosure of Invention
In order to solve the above problems in the prior art, namely, the problems of low accuracy and poor efficiency of image target detection and segmentation, a first aspect of the present invention provides a fast robust image target detection and segmentation method based on a convolutional neural network, including:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
In some preferred embodiments, in step S20, "iteratively feedback and fuse the multi-scale feature map set to obtain a fused feature map set", the method includes:
sequentially performing convolution, normalization and activation operations on the high-level multi-scale feature map, and fusing it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map.
In some preferred embodiments, in step S30, "obtaining a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship between the features in the fused feature map set", includes:
step S31, calculating the size of the bounding box in the multi-scale fused feature map group:
s_j = 2^j × s_min, j ∈ [0, 3]
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size;
step S32, calculating the width and height of the bounding box,
where r_m is the aspect ratio of the reference box;
step S33, calculating the coordinates of the center point of the bounding box and its width and height,
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
In some preferred embodiments, before the step S40 "performing the target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed", there is further provided a step of adjusting the position of the bounding box, which includes the steps of:
step B10, calculating foreground classification loss and conditional classification loss of the bounding box;
step B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss;
step B30, calculating posterior probability of the bounding box belonging to each category according to the calculated foreground classification loss, the condition classification loss and the position loss of the bounding box, and obtaining the position adjustment quantity of the bounding box;
and step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
In some preferred embodiments, the foreground classification loss and the conditional class classification loss are calculated as follows:
the foreground classification loss,
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples;
the conditional class classification loss,
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
In some preferred embodiments, the position loss of the bounding box is calculated by:
the bounding box position loss L_loc(g_j, g'_j),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
In some preferred embodiments, the "posterior probability of bounding box belonging to each category" in step B30 is calculated by:
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
In another aspect, the invention provides a fast and robust image target detection and segmentation system based on a convolutional neural network, which comprises an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being suitable for being loaded and executed by a processor to implement the above-mentioned fast and robust image object detection and segmentation method based on a convolutional neural network.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the fast and robust image target detection and segmentation method based on the convolutional neural network.
The invention has the beneficial effects that:
(1) the method has strong feature representation capability, can overcome the large variation of targets in complex environments, has high processing speed, and obtains accurate detection and segmentation results.
(2) The method obtains the feature representation of different scales from the input image through the convolutional neural network, carries out iterative feedback and fusion on the multi-scale features, and can obtain more robust feature representation.
(3) The method of the invention uses a series of convolution layers, pooling layers and deconvolution layers to construct the target segmentation module, and simultaneously fine-tunes the positions of the bounding boxes, thereby obtaining more precise target detection and segmentation results.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of the fast and robust image target detection and segmentation method based on the convolutional neural network of the present invention;
FIG. 2 is a diagram of an example of different features of an input image according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 3 is a characteristic example diagram of a more rich semantic information of an input image according to an embodiment of the fast and robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 4 is a schematic diagram of an iterative feedback fusion module according to an embodiment of the fast robust image target detection and segmentation method based on the convolutional neural network of the present invention;
FIG. 5 is an exemplary diagram of coordinates, foreground probability and conditional class probability scores of a prediction detection box on a plurality of feature layers according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 6 is a diagram illustrating visualization of segmentation module features according to an embodiment of the fast robust image target detection and segmentation method based on a convolutional neural network of the present invention;
FIG. 7 is an exemplary diagram of a target detection and segmentation result according to an embodiment of the fast and robust image target detection and segmentation method based on a convolutional neural network.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a fast and robust image target detection and segmentation method based on a convolutional neural network, which comprises the following steps:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
In order to more clearly describe the fast and robust image target detection and segmentation method based on the convolutional neural network of the present invention, the following describes each step in the embodiment of the method of the present invention in detail with reference to fig. 1.
The fast and robust image target detection and segmentation method based on a convolutional neural network comprises steps S10 to S40, which are described in detail as follows:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group.
The preferred embodiment of the present invention may use a ResNet or VGGNet as the deep convolutional neural network to obtain the feature representation of the image to be processed, where the reduction factors of the different feature maps with respect to the input image are 4, 8, 16, 32 and 64, respectively. Fig. 2 shows different feature maps of an input image according to an embodiment of the present invention.
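As a rough, non-authoritative sketch of this feature extraction step, the following PyTorch code builds a multi-scale feature map group from a ResNet-50 backbone; the choice of ResNet-50, the layer names and the extra stride-2 convolution used to obtain the /64 level are assumptions for illustration, not the patent's prescribed configuration.

```python
import torch
import torch.nn as nn
import torchvision

class MultiScaleBackbone(nn.Module):
    """Illustrative extraction of feature maps at reduction factors 4, 8, 16, 32, 64
    (sketch only; layer choices are assumptions, not the patented network)."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50()
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2   # /4, /8
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4   # /16, /32
        # Assumed extra stride-2 convolution to obtain the /64 level.
        self.extra = nn.Conv2d(2048, 2048, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        c2 = self.layer1(self.stem(x))   # reduction factor 4
        c3 = self.layer2(c2)             # reduction factor 8
        c4 = self.layer3(c3)             # reduction factor 16
        c5 = self.layer4(c4)             # reduction factor 32
        c6 = self.extra(c5)              # reduction factor 64
        return [c2, c3, c4, c5, c6]      # multi-scale feature map group

feature_group = MultiScaleBackbone()(torch.randn(1, 3, 512, 512))
print([f.shape for f in feature_group])
```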
And step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fused feature map group.
Shallow features have higher spatial resolution and contain more fine-grained information, but compared with deep features they lack rich semantic information, which limits feature extraction for targets with large variation. This is especially true for small and occluded targets in a detection system: as the convolutional layers deepen, a large amount of information about such difficult targets is lost in the high-level feature maps, while the low-level feature maps lack rich semantic information, which greatly affects detection performance.
The method sequentially performs convolution, normalization and activation operations on the high-level multi-scale feature map, and fuses it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map with richer semantic information. Fig. 3 illustrates features of an input image with richer semantic information according to an embodiment of the present invention. Different methods may be adopted in embodiments to implement these operations, for example:
method one, the feature layer LiObtaining an output characteristic layer M through convolution operationi
Computing output feature layer MiDimension of (H)i×Wi×li) Characteristic diagram F ofi LConversion to dimension (H)i×Wi×fi) Characteristic diagram F ofi MAs shown in formula (1):
Mi=LiΘC′iformula (1)
Where Θ represents the multidimensional convolution filter operation.
Method two, passing the high-level features upFeature F for generating a feedback context after a sampling operationi DMatching the feedback context feature with the lower layer feature diagram of the previous layer to obtain an output feature layer DiDimension of isCharacteristic diagram ofConversion to dimension (H)i×Wi×fi) Characteristic diagram F ofi D. Fig. 4 is a schematic diagram of an iterative feedback fusion module according to an embodiment of the present invention.
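A minimal sketch of one feedback fusion iteration is given below, assuming PyTorch; the channel widths, the use of batch normalization and the 1×1 fusion convolution are illustrative assumptions. It shows only the two paths described above: convolution, normalization and activation on the high-level map, upsampling it into a feedback context feature, and concatenation with the low-level map along the channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackFusion(nn.Module):
    """One iteration of feedback fusion between a high-level and a low-level
    feature map (illustrative sketch; channel sizes are assumed)."""
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        # Convolution + normalization + activation on the high-level map.
        self.reduce = nn.Sequential(
            nn.Conv2d(high_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Fuse the channel-wise concatenation back to the working width.
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + low_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, high, low):
        # Upsample the high-level features into a feedback context feature
        # matching the low-level spatial resolution.
        context = F.interpolate(self.reduce(high), size=low.shape[-2:],
                                mode="bilinear", align_corners=False)
        # Concatenate along the feature channel dimension, then fuse.
        return self.fuse(torch.cat([context, low], dim=1))

fused = FeedbackFusion(2048, 1024, 256)(torch.randn(1, 2048, 16, 16),
                                        torch.randn(1, 1024, 32, 32))
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```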
Step S30, based on the complementary relationship between the features in the fused feature map group, obtaining a plurality of bounding boxes and the positions of the bounding boxes on the image to be processed.
To adapt to scale changes of objects of different sizes, other methods convert the image to several scales, process each scale independently with a convolutional neural network, and then integrate the results from the different scales. In contrast, the present method adopts a framework that regresses detection boxes from multiple feature layers. Fig. 5 shows the coordinates, foreground probability and conditional class probability scores of predicted detection boxes on multiple feature layers according to an embodiment of the invention: the fused feature maps obtained after step S20 are denoted B2, B3, B4, B5 and B6, and the coordinates, foreground probability and conditional class probability score of each bounding box are predicted on these five feature maps of different scales. The receptive fields of different feature maps differ in size, so a specific position in a feature map is responsible for a specific region of the image and a specific object size.
Step S31, calculating the size of the bounding box in the multi-scale fused feature map group, as shown in formula (2):
s_j = 2^j × s_min, j ∈ [0, 3]    (2)
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size.
Step S32, calculating the width and height of the bounding box, as shown in formulas (3) and (4),
where r_m is the aspect ratio of the reference box.
In the embodiment of the invention, the minimum bounding box size s_min may be 32 pixels, and the reference box aspect ratios are preset.
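A minimal numeric sketch of the reference box enumeration follows. The size rule s_j = 2^j × s_min comes from formula (2); the square-root split of width and height by aspect ratio is the common SSD-style convention and, like the example aspect ratio set, is an assumption here, since formulas (3) and (4) are not reproduced in this text.

```python
import math

def reference_boxes(s_min=32, aspect_ratios=(1.0, 2.0, 0.5)):
    """Enumerate reference (prior) box sizes per feature scale.
    The sqrt-based width/height split and the aspect ratio set are assumed."""
    boxes = []
    for j in range(4):                       # j in [0, 3]
        s_j = (2 ** j) * s_min               # s_j = 2^j * s_min
        for r_m in aspect_ratios:            # assumed aspect ratio set
            w = s_j * math.sqrt(r_m)
            h = s_j / math.sqrt(r_m)
            boxes.append((j, r_m, round(w, 1), round(h, 1)))
    return boxes

for b in reference_boxes():
    print(b)   # e.g. (0, 1.0, 32.0, 32.0), (0, 2.0, 45.3, 22.6), ...
```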
Step S33, calculating the coordinates of the center point of the bounding box and its width and height, as shown in equations (5) to (12),
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
This calculation is a bounding box regression from a reference box to a nearby ground-truth box; compared with region-of-interest based methods, the method of the invention obtains the regression position of the detected bounding box in a different way. To handle the large scale variation of targets in complex environments, regression vectors for multiple detection bounding boxes are learned, each corresponding to one scale and one aspect ratio. In this way, detection bounding boxes of different sizes can be predicted from a feature of fixed size.
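Since equations (5) to (12) are not reproduced in this text, the sketch below assumes the standard log-space reference-box regression that matches the variables listed above (x, y, w, h for the predicted box; x_a, y_a, w_a, h_a for the reference box; x*, y*, w*, h* for the real box); it is an illustration, not the patent's exact parameterization.

```python
import math

def encode(box, anchor):
    """Regression targets of a box (x, y, w, h) relative to a reference box
    (x_a, y_a, w_a, h_a); assumed standard parameterization."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse transform: recover (x, y, w, h) from regression parameters."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 100.0, 32.0, 32.0)    # reference box (center x, center y, w, h)
gt = (108.0, 96.0, 40.0, 28.0)         # nearby real box
targets = encode(gt, anchor)           # targets toward the real box
print(targets, decode(targets, anchor))
```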
Step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
A target segmentation module is constructed from a series of convolution layers, pooling layers and deconvolution layers to complete image target segmentation. Fig. 6 illustrates an example of feature visualization in the segmentation module according to an embodiment of the present invention. For each detected bounding box, bounding box features are extracted from the multi-scale feedback-fused feature maps; the region blocks of the same scale extracted from each feature layer are then stacked directly along the feature channel, and the stacked features undergo a series of convolution and deconvolution operations to obtain the final target segmentation result. The specific steps for extracting the bounding box features are as follows: first, each bounding box is traversed and divided into a fixed number of unit blocks; then, within each unit block, image values at floating-point coordinates are computed by bilinear interpolation; finally, a max pooling operation is performed within each unit block.
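The bilinear-interpolation-plus-max-pooling extraction described above is similar in spirit to RoIAlign; the following simplified PyTorch sketch illustrates that idea for a single box on a single feature map, with the output grid size and sampling density chosen arbitrarily for illustration rather than taken from the patent.

```python
import torch
import torch.nn.functional as F

def extract_box_feature(feature, box, out_size=7, samples=2):
    """Pool a (1, C, H, W) feature map inside `box` = (x1, y1, x2, y2), given in
    feature-map coordinates, into an out_size x out_size grid: each unit block is
    sampled at floating-point locations by bilinear interpolation, then max-pooled.
    Simplified RoIAlign-style sketch."""
    _, c, h, w = feature.shape
    x1, y1, x2, y2 = box
    xs = torch.linspace(x1, x2, out_size * samples)
    ys = torch.linspace(y1, y2, out_size * samples)
    # Normalize sample coordinates to [-1, 1] for grid_sample (bilinear interpolation).
    grid_x = xs / (w - 1) * 2 - 1
    grid_y = ys / (h - 1) * 2 - 1
    grid = torch.stack(torch.meshgrid(grid_y, grid_x, indexing="ij"), dim=-1)[..., [1, 0]]
    sampled = F.grid_sample(feature, grid.unsqueeze(0), align_corners=True)
    # Max-pool the samples inside each unit block.
    return F.max_pool2d(sampled, kernel_size=samples)

feat = torch.randn(1, 256, 64, 64)
print(extract_box_feature(feat, (10.3, 12.7, 30.9, 40.2)).shape)  # (1, 256, 7, 7)
```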
For each class, the target segmentation result is a binary pixel-level segmentation map of dimension (N+1)·M², where N is the number of object classes (excluding the background) and M² is the resolution of the binary segmentation map for each class.
FIG. 7 is a diagram illustrating an exemplary target detection and segmentation result according to an embodiment of the present invention. The first column is the input image, the second column is the ground-truth label information, the third column is the corresponding detection and segmentation result with an input size of 512 × 512 pixels, and the fourth column is the corresponding detection and segmentation result with an input size of 1024 × 1024 pixels.
Before step S40 completes the target detection and segmentation of the image to be processed according to the position of the bounding box, a step of adjusting the bounding box position is also provided, comprising the following steps:
and step B10, calculating foreground classification loss and conditional classification loss of the bounding box.
The foreground classification loss is shown in formula (13),
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples.
The conditional class classification loss is shown in equation (14),
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
And B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss.
The bounding box position loss L_loc(g_j, g'_j) is shown in equation (15),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
And step B30, calculating posterior probability of the bounding box belonging to each category according to the foreground classification loss, the condition classification loss and the position loss of the bounding box obtained by calculation, and obtaining the position adjustment quantity of the bounding box.
The posterior probability that the bounding box belongs to each category is shown by equations (16) and (17):
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
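Equations (16) and (17) are not reproduced in this text; the sketch below shows one plausible way the per-level detection losses could be normalized by N_bb and N_loc and combined with the segmentation loss L_mask, and should be read as an assumption about the exact combination.

```python
def total_loss(obj_losses, cls_losses, loc_losses, mask_loss, n_bb, n_loc):
    """Assumed combination of per-level detection losses with the segmentation loss:
    classification terms normalized by the number of boxes N_bb, the position term
    by the number of positive samples N_loc."""
    det = 0.0
    for l_obj, l_cls, l_loc in zip(obj_losses, cls_losses, loc_losses):
        det += (l_obj + l_cls) / n_bb + l_loc / n_loc   # one term per feature level
    return det + mask_loss

# Example with three feature levels (illustrative numbers only).
print(total_loss([1.2, 0.8, 0.5], [2.0, 1.5, 1.0], [0.6, 0.4, 0.3],
                 mask_loss=0.9, n_bb=256, n_loc=32))
```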
And step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
The fast and robust image target detection and segmentation system based on a convolutional neural network of the invention comprises an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the convolutional neural network based fast robust image target detection and segmentation system provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are decomposed or combined again, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, which are suitable for being loaded and executed by a processor to implement the above-mentioned fast and robust image object detection and segmentation method based on convolutional neural network.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the fast and robust image target detection and segmentation method based on the convolutional neural network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A fast and robust image target detection and segmentation method based on a convolutional neural network is characterized by comprising the following steps:
step S10, acquiring a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network; the multi-scale feature map group comprises a high-level multi-scale feature map group and a low-level multi-scale feature map group;
step S20, performing iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
step S30, acquiring a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relation among the features in the fused feature map group;
and step S40, performing target detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed.
2. The convolutional neural network-based fast and robust image target detection and segmentation method as claimed in claim 1, wherein in step S20, "iteratively feedback fusing the multi-scale feature map group to obtain a fused feature map group" is performed by:
sequentially performing convolution, normalization and activation operations on the high-level multi-scale feature map, and fusing it with the corresponding low-level multi-scale feature map along the feature channel dimension to obtain a multi-scale fused feature map.
3. The convolutional neural network-based fast and robust image target detection and segmentation method as claimed in claim 1, wherein in step S30, "based on the complementary relationship between the features in the fused feature map set, obtaining a plurality of bounding boxes and the positions of the bounding boxes on the image to be processed" includes the steps of:
step S31, calculating the size of the bounding box in the multi-scale fused feature map group:
s_j = 2^j × s_min, j ∈ [0, 3]
where s_j is the size of the jth bounding box and s_min is the preset minimum bounding box size;
step S32, calculating the width and height of bounding box s_j,
where r_m is the aspect ratio of the reference box;
step S33, calculating the coordinates of the center point of the bounding box and its width and height,
where the coordinate parameters of the predicted bounding box relative to the reference bounding box and the coordinate parameters of the real bounding box relative to the reference bounding box are computed from x, y, w and h, the center coordinates, width and height of the predicted bounding box; x_a, y_a, w_a and h_a, the center coordinates, width and height of the reference bounding box; and x*, y*, w* and h*, the center coordinates, width and height of the real bounding box.
4. The convolutional neural network based fast robust image object detection and segmentation method as claimed in claim 1, wherein step S40 "object detection and segmentation of the image to be processed according to the position of the bounding box on the image to be processed" is preceded by the step of bounding box position adjustment, which comprises the steps of:
step B10, calculating foreground classification loss and conditional classification loss of the bounding box;
step B20, calculating the position loss of the bounding box according to the calculation results of the foreground classification loss and the conditional classification loss;
step B30, calculating posterior probability of the bounding box belonging to each category according to the calculated foreground classification loss, the condition classification loss and the position loss of the bounding box, and obtaining the position adjustment quantity of the bounding box;
and step B40, adjusting the position of the bounding box according to the position adjustment amount of the bounding box.
5. The convolutional neural network based fast robust image target detection and segmentation method as claimed in claim 4, wherein the foreground classification loss and the conditional classification loss are calculated by:
the foreground classification loss,
where p_i(object) is the predicted confidence that bounding box i contains an object, and the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples;
the conditional class classification loss,
where the conditional class probability distribution given that an object is contained is used; N is the number of object categories; the label of the ground-truth box corresponding to bounding box i is 1 for positive samples and 0 for negative samples; and i is the index of the bounding box.
6. The convolutional neural network based fast and robust image target detection and segmentation method as claimed in claim 5, wherein the position loss of the bounding box is calculated by:
the bounding box position loss L_loc(g_j, g'_j),
where g_j is the predicted bounding box coordinate and g'_j is the ground-truth bounding box coordinate.
7. The convolutional neural network based fast robust image target detection and segmentation method as claimed in claim 6, wherein in step B30, "posterior probability that bounding box belongs to each class" is calculated as:
where L is the total loss of image target detection and segmentation, L_mask is the target segmentation loss, the remaining terms are the detection losses on the corresponding feature levels, N_bb is the total number of bounding boxes participating in the gradient update, and N_loc is the number of positive samples among the bounding boxes participating in the gradient update.
8. A fast and robust image target detection and segmentation system based on a convolutional neural network, characterized by comprising an input module, a feature extraction module, a feedback fusion module, a target detection module, a position adjustment module, a target segmentation module and an output module;
the input module is configured to acquire and input an image to be processed;
the feature extraction module is configured to acquire a multi-scale feature map group of the image to be processed by adopting a deep convolutional neural network;
the feedback fusion module is configured to perform iterative feedback fusion on the multi-scale feature map group to obtain a fusion feature map group;
the target detection module is configured to acquire a plurality of bounding boxes and positions of the bounding boxes on the image to be processed based on the complementary relationship among the features in the fused feature map group;
the position adjusting module is configured to calculate posterior probabilities of the bounding boxes belonging to the various categories and adjust the positions of the bounding boxes;
the target detection and segmentation module is configured to detect and segment the target of the image to be processed according to the position of the bounding box on the image to be processed;
the output module is configured to output the obtained target detection and segmentation result.
9. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the convolutional neural network based fast and robust image object detection and segmentation method of any of claims 1-7.
10. A processing apparatus, comprising
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
the convolutional neural network-based fast and robust image target detection and segmentation method as claimed in any one of claims 1 to 7.
CN201910113339.5A 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness Expired - Fee Related CN109977997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910113339.5A CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910113339.5A CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Publications (2)

Publication Number Publication Date
CN109977997A true CN109977997A (en) 2019-07-05
CN109977997B CN109977997B (en) 2021-02-02

Family

ID=67076963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910113339.5A Expired - Fee Related CN109977997B (en) 2019-02-13 2019-02-13 Image target detection and segmentation method based on convolutional neural network rapid robustness

Country Status (1)

Country Link
CN (1) CN109977997B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555481A (en) * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style identification method and device and computer readable storage medium
CN110796649A (en) * 2019-10-29 2020-02-14 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111079623A (en) * 2019-12-09 2020-04-28 成都睿沿科技有限公司 Target detection method, device and storage medium
CN111524106A (en) * 2020-04-13 2020-08-11 北京推想科技有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN112016512A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image small target detection method based on feedback type multi-scale training
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112215853A (en) * 2020-10-12 2021-01-12 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN113689430A (en) * 2021-10-26 2021-11-23 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN114918944A (en) * 2022-06-02 2022-08-19 哈尔滨理工大学 Family service robot grabbing detection method based on convolutional neural network fusion
CN113496139B (en) * 2020-03-18 2024-02-13 北京京东乾石科技有限公司 Method and apparatus for detecting objects from images and training object detection models

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-direction Method for text detection in a kind of natural picture based on connection word section
US20180096457A1 (en) * 2016-09-08 2018-04-05 Carnegie Mellon University Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
CN107977620A (en) * 2017-11-29 2018-05-01 华中科技大学 A kind of multi-direction scene text single detection method based on full convolutional network
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN108710868A (en) * 2018-06-05 2018-10-26 中国石油大学(华东) A kind of human body critical point detection system and method based under complex scene
CN109190458A (en) * 2018-07-20 2019-01-11 华南理工大学 A kind of person of low position's head inspecting method based on deep learning
WO2019014625A1 (en) * 2017-07-14 2019-01-17 Google Llc Object detection using neural network systems

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
US20180096457A1 (en) * 2016-09-08 2018-04-05 Carnegie Mellon University Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-direction Method for text detection in a kind of natural picture based on connection word section
WO2019014625A1 (en) * 2017-07-14 2019-01-17 Google Llc Object detection using neural network systems
CN107977620A (en) * 2017-11-29 2018-05-01 华中科技大学 A kind of multi-direction scene text single detection method based on full convolutional network
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108288088A (en) * 2018-01-17 2018-07-17 浙江大学 A kind of scene text detection method based on end-to-end full convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108596101A (en) * 2018-04-25 2018-09-28 上海交通大学 A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN108710868A (en) * 2018-06-05 2018-10-26 中国石油大学(华东) A kind of human body critical point detection system and method based under complex scene
CN109190458A (en) * 2018-07-20 2019-01-11 华南理工大学 A kind of person of low position's head inspecting method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIFANG XIE ET AL.: "Aircraft Detection of High-Resolution Remote Sensing Image Based on Faster R-CNN Model and SSD Model", 《WEB OF SCIENCE》 *
XIA, Yuan: "Image Object Detection and Classification Based on Deep Learning", China Master's Theses Full-text Database *
XU, Qingzhi: "Traffic Sign Recognition and Implementation Based on Deep Learning", China Master's Theses Full-text Database *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555481A (en) * 2019-09-06 2019-12-10 腾讯科技(深圳)有限公司 Portrait style identification method and device and computer readable storage medium
CN110796649B (en) * 2019-10-29 2022-08-30 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN110796649A (en) * 2019-10-29 2020-02-14 北京市商汤科技开发有限公司 Target detection method and device, electronic equipment and storage medium
CN111079623A (en) * 2019-12-09 2020-04-28 成都睿沿科技有限公司 Target detection method, device and storage medium
CN113496139B (en) * 2020-03-18 2024-02-13 北京京东乾石科技有限公司 Method and apparatus for detecting objects from images and training object detection models
CN111524106A (en) * 2020-04-13 2020-08-11 北京推想科技有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN111524106B (en) * 2020-04-13 2021-05-28 推想医疗科技股份有限公司 Skull fracture detection and model training method, device, equipment and storage medium
CN112016512A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image small target detection method based on feedback type multi-scale training
CN112184635A (en) * 2020-09-10 2021-01-05 上海商汤智能科技有限公司 Target detection method, device, storage medium and equipment
CN112215853A (en) * 2020-10-12 2021-01-12 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN112766244B (en) * 2021-04-07 2021-06-08 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN113689430A (en) * 2021-10-26 2021-11-23 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN113689430B (en) * 2021-10-26 2022-02-15 紫东信息科技(苏州)有限公司 Image processing method and device for enteroscopy state monitoring
CN114918944A (en) * 2022-06-02 2022-08-19 哈尔滨理工大学 Family service robot grabbing detection method based on convolutional neural network fusion

Also Published As

Publication number Publication date
CN109977997B (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
CN108288088B (en) Scene text detection method based on end-to-end full convolution neural network
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN110543837B (en) Visible light airport airplane detection method based on potential target point
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN107633226B (en) Human body motion tracking feature processing method
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN109711416B (en) Target identification method and device, computer equipment and storage medium
CN111627050B (en) Training method and device for target tracking model
CN113221787A (en) Pedestrian multi-target tracking method based on multivariate difference fusion
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
Zhuang et al. Instance segmentation based 6D pose estimation of industrial objects using point clouds for robotic bin-picking
CN113610895A (en) Target tracking method and device, electronic equipment and readable storage medium
CN108109163A (en) A kind of moving target detecting method for video of taking photo by plane
CN109658442B (en) Multi-target tracking method, device, equipment and computer readable storage medium
CN108764244B (en) Potential target area detection method based on convolutional neural network and conditional random field
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN107305691A (en) Foreground segmentation method and device based on images match
CN111027538A (en) Container detection method based on instance segmentation model
CN115019181B (en) Remote sensing image rotating target detection method, electronic equipment and storage medium
CN116503760A (en) Unmanned aerial vehicle cruising detection method based on self-adaptive edge feature semantic segmentation
CN111382638B (en) Image detection method, device, equipment and storage medium
Tan et al. Automobile Component Recognition Based on Deep Learning Network with Coarse‐Fine‐Grained Feature Fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210202

Termination date: 20220213

CF01 Termination of patent right due to non-payment of annual fee