
CN111191730A - Method and system for detecting oversized image target facing embedded deep learning - Google Patents

Method and system for detecting oversized image target facing embedded deep learning

Info

Publication number
CN111191730A
CN111191730A
Authority
CN
China
Prior art keywords
image
target
sub
target detection
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010003131.0A
Other languages
Chinese (zh)
Other versions
CN111191730B (en)
Inventor
程陶然
白林亭
文鹏程
高泽
邹昌昊
李欣瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC filed Critical Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202010003131.0A
Publication of CN111191730A
Application granted
Publication of CN111191730B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method and system for oversized image target detection oriented to embedded deep learning. The system comprises an image preprocessing unit, a target detection unit and an image post-processing unit. Addressing the growing need to process oversized images in the embedded deep learning field and the limitations of running deep neural networks on embedded multi-core processors, the method divides a single image into a plurality of sub-images based on a tiling (blocking) idea, so that target detection on a single image can be performed in parallel; the image post-processing unit then analyses the detection results and merges them into a single result, effectively alleviating the low processing efficiency of embedded computing platforms.

Description

Method and system for detecting oversized image target facing embedded deep learning
Technical Field
The invention belongs to the field of intelligent computing and relates to an oversized image target detection method for embedded deep learning.
Background
With the continuous development of deep learning technology, image processing algorithms based on deep learning, such as target detection and semantic segmentation, have achieved great success. However, because deep neural networks have complex structures and a very large number of parameters, deep learning algorithms perform poorly on embedded computing platforms with limited resources and can only handle small images. For large remote-sensing images or high-definition images rich in detail, the processing efficiency of embedded deep learning algorithms is very low.
To improve the processing efficiency of embedded deep learning algorithms, hardware manufacturers have successively released a number of AI chips that raise processing speed by optimizing the computing architecture at the hardware level and increase data throughput through multi-core technology. However, the dense interconnections of a deep neural network make it difficult to parallelize the model when processing a single image, so the real-time processing requirements of streaming-media data can hardly make full use of the parallel computing capability of a multi-core processor.
Disclosure of Invention
The invention provides an oversized image target detection method oriented to embedded deep learning, which aims to process oversized image data efficiently in real time and to detect targets comprehensively and accurately.
The solution proposed by the invention is as follows:
the method for detecting the oversized image target facing the embedded deep learning comprises the following steps:
1) receiving an input image, and dividing the image into a plurality of sub-images according to the positions of pixel points, wherein any one sub-image is mutually overlapped with all the adjacent sub-images in a region close to a boundary, and the region is marked as a division redundant region;
2) respectively carrying out target detection on each sub-image to obtain target related information (target detection result);
3) the obtained target related information of each sub-image is referred to, and the targets of the divided redundant areas are re-detected and positioned; and marking the original image according to the updated target detection result, and outputting a visualization result.
On the basis of the above scheme, the invention is further refined as follows:
optionally, in the step 1), image segmentation is specifically performed according to a preset width W, a preset height H, and a preset redundancy threshold T; the redundancy threshold value T characterizes the number of pixels which are overlapped with each other in the area close to the boundary; and 2) specifically, respectively carrying out target detection on the sub-images with the size of W multiplied by H based on a convolutional neural network algorithm, and outputting target related information, wherein the target related information at least comprises a target position.
Optionally, the target-related information further includes the category to which the target belongs and a confidence level.
Optionally, the width W, the height H and the redundancy threshold T are determined according to the original image size, the target size and the computing power of the processor;
the width ranges of the sub-images are [0, W-1], [W-T, 2W-T-1], [2W-2T, 3W-2T-1], ...;
the height ranges of the sub-images are [0, H-1], [H-T, 2H-T-1], [2H-2T, 3H-2T-1], ...;
any width range combined with any height range constitutes one sub-image region (a sketch of this tiling rule follows).
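As an illustration of the tiling rule above, the following is a minimal sketch; the helper names tile_ranges and tile_image, and their signatures, are assumptions made for exposition rather than part of the patent.

```python
def tile_ranges(length, size, overlap):
    """Pixel ranges [start, end] of width `size`, each overlapping the previous
    one by `overlap` pixels: [0, size-1], [size-overlap, 2*size-overlap-1], ...
    The last range may extend past `length`; the caller is expected to zero-pad it."""
    step = size - overlap
    ranges, start = [], 0
    while True:
        ranges.append((start, start + size - 1))
        if start + size - 1 >= length - 1:
            break
        start += step
    return ranges

def tile_image(width, height, W, H, T):
    """Every combination of one width range and one height range is a sub-image region."""
    return [(xr, yr) for xr in tile_ranges(width, W, T) for yr in tile_ranges(height, H, T)]
```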
Optionally, step 3) specifically comprises:
analysing whether any sub-image contains, within its segmentation redundancy region, a target regression box whose distance to the sub-image boundary is less than aT, where a is a preset coefficient with a preferred range of 0 < a < 0.5 (see the sketch after this list);
if such a target regression box exists, re-determining the corresponding width or height range with the segmentation redundancy region as the centre and sampling a new W × H sub-image;
and performing target detection on the new sub-image again, adopting only the detection results that fall within the segmentation redundancy region, updating the target-related information of that region, and marking all target-related information, together with the earlier results from the non-redundant regions, on the original image according to the required rules to form a visualized output image.
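A minimal sketch of the boundary test described above, assuming tile-local box coordinates; the helper name near_split_boundary is an assumption, and for simplicity it treats every tile edge as a shared edge rather than only the edges that actually border a neighbouring sub-image.

```python
def near_split_boundary(box, tile_w, tile_h, T, a=0.3):
    """True when the detected box (x0, y0, x1, y1), in the tile's own coordinates,
    lies within a*T pixels of a tile edge, i.e. it may have been cut by the split
    and should be re-detected on a re-centred sub-image."""
    x0, y0, x1, y1 = box
    margin = a * T
    return (x0 < margin or y0 < margin
            or (tile_w - 1) - x1 < margin or (tile_h - 1) - y1 < margin)
```

With T = 30 and a = 0.3, for example, a box that starts 3 pixels from the shared edge of its sub-image would trigger re-detection, as in the worked example given in the detailed description below.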
Correspondingly, the invention also provides an oversized image target detection system oriented to embedded deep learning, which comprises:
an image preprocessing unit for receiving an input image and dividing it into a plurality of sub-images according to pixel positions, wherein every sub-image overlaps each of its adjacent sub-images in a region near their shared boundary, and this region is marked as the segmentation redundancy region;
a target detection unit for performing target detection on each sub-image separately to obtain target-related information;
an image post-processing unit for re-detecting and re-locating the targets in the segmentation redundancy regions with reference to the target-related information obtained for each sub-image, marking the original image according to the updated target detection results, and outputting a visualized result.
Optionally, the image post-processing unit re-detects and re-locates the targets in a segmentation redundancy region by generating a new sub-image centred on that region, feeding it into the target detection unit again, and adopting only the detection results that fall within the segmentation redundancy region.
Correspondingly, the invention also provides an embedded device comprising a processor and a program memory; when the program stored in the program memory is loaded by the processor, the above oversized image target detection method oriented to embedded deep learning is executed.
Compared with the prior art, the invention has the following beneficial effects:
the method for detecting the target of the super-large-size image facing the embedded deep learning realizes the parallel target detection of a single super-large-size image based on the deep neural network, fully utilizes the computing resources of a multi-core processor, improves the deep learning processing efficiency under the embedded computing environment, and effectively solves the real-time processing problem of the captured data of the (super) high-definition camera.
The invention fully considers that the possible target is on the segmentation line when the sub-image is segmented, so that the redundant processing (which is equivalent to the offset of the standard grid unit in the horizontal direction and the vertical direction) and the corresponding image post-processing are particularly carried out on the pixels near the boundary, thereby avoiding the missing detection of the target and realizing the complete and accurate detection of the target.
Drawings
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The invention is further described in detail below with reference to the figures and examples.
Aiming at the growing processing demands of the embedded deep learning field and the limitations of running deep neural networks on embedded multi-core processors, the invention provides an oversized image target detection method oriented to embedded deep learning.
As shown in Fig. 1, the oversized image target detection method oriented to embedded deep learning may be implemented by the following software modules:
an image preprocessing unit for receiving an input image and segmenting it according to a preset width W, a preset height H and a preset redundancy threshold T; specifically, the image width is divided into the ranges [0, W-1], [W-T, 2W-T-1], [2W-2T, 3W-2T-1], ..., and the image height into [0, H-1], [H-T, 2H-T-1], [2H-2T, 3H-2T-1], ...;
a target detection unit for performing target detection on each W × H sub-image based on a convolutional neural network algorithm and marking information such as the target position, category and confidence;
an image post-processing unit for fusing the target detection results and re-detecting and re-locating the targets in the segmentation redundancy regions to form a visualized target detection result. Specifically, the unit analyses whether a target regression box lies in the segmentation redundancy region shared by adjacent sub-images with a distance to the sub-image boundary smaller than aT (0 < a < 0.5); if so, a new W × H sub-image centred on that segmentation redundancy region is sampled and target detection is performed again, and for targets inside the segmentation redundancy region the second-round detection result prevails.
The image post-processing unit then updates the target-related information of the segmentation redundancy regions according to the second round of detection results and, together with the first-round target-related information of the non-redundant regions, marks it on the original image according to the required rules to form a visualized output image (one possible window-recentring rule is sketched below).
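The patent does not fix an exact rule for positioning the re-sampled window, so the sketch below shows one plausible choice (a window of the original sub-image size, roughly centred on the redundancy strip and clipped to the image); the helper name recentre_window is an assumption.

```python
def recentre_window(red_lo, red_hi, size, img_len):
    """Return a [start, end] range of width `size` roughly centred on the
    redundancy strip [red_lo, red_hi], clipped so it stays inside the image."""
    centre = (red_lo + red_hi) // 2
    start = max(0, min(centre - size // 2, img_len - size))
    return start, start + size - 1
```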
Taking the Faster R-CNN target detection algorithm as an example, the specific workflow shown in Fig. 1 is as follows:
first, an input image is segmented by an image preprocessing unit based on a preset rule to obtain a group of sub-images. The widths and heights of the sub-images are set to 612 and 426, respectively, and the redundancy threshold is 30, that is, the width of the original image is divided into [0,611] [582,1193] [1164,1775] [1746,2357] [2328,2939] [2910,3521] [3492,4103], [4096,4103] is filled with zero expansion, the height of the original image is divided into [0,455] [426,881] [852,1307] [1278,1733] [1704,2159], and the total of 7 × 5 to 35 sub-images, by comprehensively considering the original image size (4096 × 2160), the target size (assuming that the minimum target size is 32 × 32 and the maximum target size is 200 × 200), the computing power of the processor, and the like.
Then, the sub-images enter the target detection unit as a batch, and target detection is performed on them in parallel based on the Faster R-CNN algorithm.
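A minimal sketch of this batched detection step, using torchvision's off-the-shelf Faster R-CNN as a stand-in for the patent's detection unit; the function detect_tiles is an assumption, and the weight-loading argument differs across torchvision versions (older releases use pretrained=True instead of weights="DEFAULT").

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def detect_tiles(tiles):
    """tiles: list of H x W x 3 uint8 arrays (the W x H sub-images).
    Returns one dict per tile with 'boxes', 'labels' and 'scores'."""
    batch = [torch.from_numpy(t).permute(2, 0, 1).float() / 255.0 for t in tiles]
    return model(batch)  # the whole batch goes through the network in a single call
```

On a multi-core or multi-accelerator platform the tile list can equally be split across workers, which is the parallelism the patent relies on.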
Finally, the image post-processing unit analyses the detection results and checks whether any target regression box lies in a segmentation redundancy region at a distance of less than 9 pixels (0.3T) from the sub-image edge. If such a target exists, a new sub-image centred on the segmentation redundancy region is sampled and detection is performed again. For example, if the diagonal of a target regression box in sub-image 2 is ([585,215], [645,255]), i.e. the box is 3 pixels away from the sub-image edge, the region with abscissa range [241,852] and ordinate range [0,455] is resampled to generate a new sub-image, which is fed back into the target detection unit to re-locate the targets in its middle area. In addition, the image post-processing unit marks target information such as position, category and confidence on the original image according to the detection results, completes the visualization processing and outputs the image.
For the above per-sub-image target detection, different conventional algorithms (usually convolutional-neural-network-based) may be adopted according to actual needs; besides the Faster R-CNN algorithm (which emphasizes localization accuracy), the YOLO algorithm (which emphasizes speed) may also be used, for example.

Claims (9)

1. An oversized image target detection method oriented to embedded deep learning, characterized by comprising the following steps:
1) receiving an input image and dividing it into a plurality of sub-images according to pixel positions, wherein every sub-image overlaps each of its adjacent sub-images in a region near their shared boundary, and this region is marked as the segmentation redundancy region;
2) performing target detection on each sub-image separately to obtain target-related information;
3) with reference to the target-related information obtained for each sub-image, re-detecting and re-locating the targets in the segmentation redundancy regions, marking the original image according to the updated target detection results, and outputting a visualized result.
2. The oversized image target detection method oriented to embedded deep learning according to claim 1, characterized in that:
step 1) specifically segments the image according to a preset width W, a preset height H and a preset redundancy threshold T, where the redundancy threshold T characterizes the number of pixels by which adjacent sub-images overlap near their boundary;
and step 2) specifically performs target detection on each W × H sub-image based on a convolutional neural network algorithm and outputs target-related information that at least includes the target position.
3. The oversized image target detection method oriented to embedded deep learning according to claim 2, characterized in that the target-related information further includes the category to which the target belongs and a confidence level.
4. The oversized image target detection method oriented to embedded deep learning according to claim 2, characterized in that the width W, the height H and the redundancy threshold T are determined according to the original image size, the target size and the computing power of the processor;
the width ranges of the sub-images are [0, W-1], [W-T, 2W-T-1], [2W-2T, 3W-2T-1], ...;
the height ranges of the sub-images are [0, H-1], [H-T, 2H-T-1], [2H-2T, 3H-2T-1], ...;
and any width range combined with any height range constitutes one sub-image region.
5. The oversized image target detection method oriented to embedded deep learning according to claim 2, characterized in that step 3) specifically comprises:
analysing whether any sub-image contains, within its segmentation redundancy region, a target regression box whose distance to the sub-image boundary is less than aT, where a is a preset coefficient;
if such a target regression box exists, re-determining the corresponding width or height range with the segmentation redundancy region as the centre and sampling a new W × H sub-image;
and performing target detection on the new sub-image again, adopting only the detection results that fall within the segmentation redundancy region, updating the target-related information of that region, and marking all target-related information, together with the earlier results from the non-redundant regions, on the original image according to the required rules to form a visualized output image.
6. The oversized image target detection method oriented to embedded deep learning according to claim 5, characterized in that 0 < a < 0.5.
7. An oversized image target detection system oriented to embedded deep learning, characterized by comprising:
an image preprocessing unit for receiving an input image and dividing it into a plurality of sub-images according to pixel positions, wherein every sub-image overlaps each of its adjacent sub-images in a region near their shared boundary, and this region is marked as the segmentation redundancy region;
a target detection unit for performing target detection on each sub-image separately to obtain target-related information;
and an image post-processing unit for re-detecting and re-locating the targets in the segmentation redundancy regions with reference to the target-related information obtained for each sub-image, marking the original image according to the updated target detection results, and outputting a visualized result.
8. The oversized image target detection system oriented to embedded deep learning according to claim 7, characterized in that the image post-processing unit re-detects and re-locates the targets in a segmentation redundancy region by generating a new sub-image centred on that region, feeding it into the target detection unit again, and adopting only the detection results that fall within the segmentation redundancy region.
9. An embedded device comprising a processor and a program memory, characterized in that, when the program stored in the program memory is loaded by the processor, the oversized image target detection method oriented to embedded deep learning of claim 1 is executed.
CN202010003131.0A 2020-01-02 2020-01-02 Method and system for detecting oversized image target oriented to embedded deep learning Active CN111191730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010003131.0A CN111191730B (en) 2020-01-02 2020-01-02 Method and system for detecting oversized image target oriented to embedded deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010003131.0A CN111191730B (en) 2020-01-02 2020-01-02 Method and system for detecting oversized image target oriented to embedded deep learning

Publications (2)

Publication Number Publication Date
CN111191730A (en) 2020-05-22
CN111191730B CN111191730B (en) 2023-05-12

Family

ID=70709746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003131.0A Active CN111191730B (en) 2020-01-02 2020-01-02 Method and system for detecting oversized image target oriented to embedded deep learning

Country Status (1)

Country Link
CN (1) CN111191730B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751363B1 (en) * 1999-08-10 2004-06-15 Lucent Technologies Inc. Methods of imaging based on wavelet retrieval of scenes
US20160012302A1 (en) * 2013-03-21 2016-01-14 Fuji Xerox Co., Ltd. Image processing apparatus, image processing method and non-transitory computer readable medium
US20150036921A1 (en) * 2013-08-02 2015-02-05 Canon Kabushiki Kaisha Image composition evaluating apparatus, information processing apparatus and methods thereof
JP2015106360A (en) * 2013-12-02 2015-06-08 三星電子株式会社Samsung Electronics Co.,Ltd. Object detection method and object detection device
CN104408482A (en) * 2014-12-08 2015-03-11 电子科技大学 Detecting method for high-resolution SAR (Synthetic Aperture Radar) image object
US20160328856A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Systems and methods for reducing a plurality of bounding regions
US20170357860A1 (en) * 2016-06-09 2017-12-14 Hyundai Motor Company Method and apparatus for detecting side of object using ground boundary information of obstacle
WO2018058573A1 (en) * 2016-09-30 2018-04-05 富士通株式会社 Object detection method, object detection apparatus and electronic device
KR20180107988A (en) * 2017-03-23 2018-10-04 한국전자통신연구원 Apparatus and methdo for detecting object of image
WO2019000653A1 (en) * 2017-06-30 2019-01-03 清华大学深圳研究生院 Image target identification method and apparatus
CN108154521A (en) * 2017-12-07 2018-06-12 中国航空工业集团公司洛阳电光设备研究所 A kind of moving target detecting method based on object block fusion
KR101896357B1 (en) * 2018-02-08 2018-09-07 주식회사 라디코 Method, device and program for detecting an object
CN110781839A (en) * 2019-10-29 2020-02-11 北京环境特性研究所 Sliding window-based small and medium target identification method in large-size image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALI SHARIFI BOROUJERDI et al.: "Deep Interactive Region Segmentation and Captioning" *
LI Qingzhong; LI Yibing; NIU Jiong: "Real-time detection of underwater fish targets based on improved YOLO and transfer learning" (in Chinese) *
LI Liang et al.: "Research on the application of improved Mask R-CNN to disaster detection in aerial images" (in Chinese) *
WANG Ping: "Research on detection methods for overlapping image edge regions under a cloud computing model" (in Chinese) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132164A (en) * 2020-11-20 2020-12-25 北京易真学思教育科技有限公司 Target detection method, system, computer device and storage medium
CN112132164B (en) * 2020-11-20 2021-03-09 北京易真学思教育科技有限公司 Target detection method, system, computer device and storage medium
CN113762220A (en) * 2021-11-03 2021-12-07 通号通信信息集团有限公司 Object recognition method, electronic device, and computer-readable storage medium
WO2023116641A1 (en) * 2021-12-21 2023-06-29 北京罗克维尔斯科技有限公司 Parking space detection model training method and apparatus, and parking space detection method and apparatus
CN114332456A (en) * 2022-03-16 2022-04-12 山东力聚机器人科技股份有限公司 Target detection and identification method and device for large-resolution image

Also Published As

Publication number Publication date
CN111191730B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN111191730A (en) Method and system for detecting oversized image target facing embedded deep learning
US8744168B2 (en) Target analysis apparatus, method and computer-readable medium
CN100513997C (en) Multiple angle movement target detection, positioning and aligning method
CN108961235A (en) A kind of disordered insulator recognition methods based on YOLOv3 network and particle filter algorithm
CN111738206A (en) Excavator detection method for unmanned aerial vehicle inspection based on CenterNet
CN113902792B (en) Building height detection method, system and electronic equipment based on improved RETINANET network
CN114332233B (en) Laser SLAM loop detection method and system
CN114170230B (en) Glass defect detection method and device based on deformable convolution and feature fusion
CN113989604B (en) Tire DOT information identification method based on end-to-end deep learning
CN110598698A (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN113205023B (en) High-resolution image building extraction fine processing method based on prior vector guidance
KR102285269B1 (en) Image analysis apparatus and method for utilizing the big data base GEO AI
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN110807430B (en) Method for preprocessing live panoramic traffic sign picture
CN107247967B (en) Vehicle window annual inspection mark detection method based on R-CNN
US20230021591A1 (en) Model generation method, model generation apparatus, non-transitory storage medium, mobile object posture estimation method, and mobile object posture estimation apparatus
CN112215073A (en) Traffic marking line rapid identification and tracking method under high-speed motion scene
WO2023070955A1 (en) Method and apparatus for detecting tiny target in port operation area on basis of computer vision
CN114862913A (en) Machine vision target positioning method based on artificial intelligence network
CN114608522A (en) Vision-based obstacle identification and distance measurement method
CN118015595A (en) Vehicle target detection and identification method based on attention mechanism and dynamic convolution
CN115457559B (en) Method, device and equipment for intelligently correcting texts and license pictures
JPH09245168A (en) Picture recognizing device
CN111860332B (en) Dual-channel electrokinetic diagram part detection method based on multi-threshold cascade detector
Li et al. Depth image restoration method based on improved FMM algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant