CN118196401B

CN118196401B - Target detection method, target detection system, storage medium and electronic equipment

Info

Publication number: CN118196401B
Application number: CN202410612636.5A
Authority: CN
Inventors: 徐健锋; 王瑞华; 易文博; 卢巧红; 吴成磊
Original assignee: Nanchang Kanglai Medical Technology Co ltd; Nanchang University
Current assignee: Nanchang Kanglai Medical Technology Co ltd; Nanchang University
Priority date: 2024-05-17
Filing date: 2024-05-17
Publication date: 2024-07-19
Anticipated expiration: 2044-05-17
Also published as: CN118196401A

Abstract

The invention provides a target detection method, a target detection system, a storage medium and electronic equipment. The target detection method comprises the following steps: preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network; extracting features and detecting key points of the training image to obtain two heatmaps of different sizes; performing maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix; performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby training the target detection neural network, wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset; and inputting an image to be detected into the trained target detection neural network to output a target detection result. The target detection method can detect targets with high precision.

Description

Target detection method, target detection system, storage medium and electronic equipment

Technical Field

The present invention relates to the field of target detection technologies, and in particular, to a target detection method, a target detection system, a storage medium, and an electronic device.

Background

With the continuous growth of computing power and the development of artificial intelligence technology, computer vision has driven the application and development of image target detection.

However, image target detection still faces many problems that restrict its further application and development: 1. The image resolution is high and the targets are unevenly distributed. 2. The size and aspect ratio of the targets vary greatly. 3. The rotation angle of the targets is variable. Because remote sensing images are photographed from the air, the angles of their targets vary more than those of targets in horizontally photographed images. The horizontal bounding box used in general target detection therefore often fits the target poorly, and a rotated rectangular bounding box is required for high-precision target detection. 4. The image background features are complex. In addition, classical methods for image target detection lack interpretability and extensibility.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a target detection method intended to solve the technical problems described in the background.

To achieve the above object, the invention adopts the following technical scheme:

A target detection method comprising the steps of:

Preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network;

Extracting features and detecting key points of the training image to obtain two heatmaps of different sizes;

Performing maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix;

Performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby training the target detection neural network;

Wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset;

Inputting the image to be detected into the trained target detection neural network to output a target detection result;

The target detection neural network is provided with a cascade filtering multi-scale detection head comprising a plurality of key point detection heads, which include a first key point detection head and a second key point detection head. The specific steps of extracting features and detecting key points of the training image to obtain two heatmaps of different sizes comprise:

Inputting an image into the target detection neural network based on the training image;

Generating a first feature map and a second feature map from the image by feature extraction through a backbone network, wherein the size of the first feature map is larger than that of the second feature map;

Inputting the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and their sizes match the sizes of the first feature map and the second feature map respectively;

Outputting a first heatmap and a second heatmap through the first key point detection head and the second key point detection head respectively, wherein the size of the first heatmap matches that of the first feature map, and the size of the second heatmap matches that of the second feature map.

Further, the specific steps of performing maximum point calculation on the two heatmaps to obtain the first key point matrix and the second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix, include:

Calculating all maximum points in the first heatmap and in the second heatmap by matrix offset and subtraction;

Obtaining, from the calculated maximum points, the first key point matrix and the second key point matrix respectively, wherein the size of the first key point matrix matches that of the first feature map and the size of the second key point matrix matches that of the second feature map.
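The maximum point calculation described above can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: "matrix offset and subtraction" is read here as shifting the heatmap by one cell in each of the eight neighbouring directions and subtracting the shifted copy, a cell being kept when every difference is positive. The helper names `shift` and `keypoint_matrix`, the eight-neighbour window and the confidence `threshold` are all hypothetical.

```python
def shift(grid, dr, dc, fill=float("-inf")):
    """Return a copy where out[r][c] = grid[r + dr][c + dc]; cells outside get `fill`."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            grid[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else fill
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def keypoint_matrix(heatmap, threshold=0.5):
    """Binary matrix with 1 at cells that exceed `threshold` (assumed) and all 8 neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    keep = [[1 if heatmap[r][c] > threshold else 0 for c in range(cols)] for r in range(rows)]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = shift(heatmap, dr, dc)
            for r in range(rows):
                for c in range(cols):
                    # Offset-and-subtract: a maximum point needs a positive difference.
                    if heatmap[r][c] - neighbour[r][c] <= 0:
                        keep[r][c] = 0
    return keep


# Illustrative 4 x 4 heatmap with two peaks; the values are made up.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.4, 0.8],
]
peaks = keypoint_matrix(heatmap)
```

The resulting matrix has the same size as the heatmap, which is consistent with the requirement that each key point matrix match the size of its feature map.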

Further, the specific steps of performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix include:

Performing, by the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask whose size is consistent with that of the second key point matrix;

Filtering the second key point matrix with the mask, retaining only the key points whose value is 1 in the second key point matrix and whose value is 0 in the mask, so as to obtain a third key point matrix in which the repeated prediction frames of the second key point matrix have been filtered out.

Further, after the step of filtering the second key point matrix with the mask to obtain the third key point matrix, the method further comprises:

Sequentially performing the filtering operation on the training image through the remaining adjacent key point detection heads, and recording the prediction result of the target detection neural network as the final output, so as to train the target detection neural network.

Further, the specific steps of determining the loss function of the rotated rectangular bounding box based on projection position offset include:

Obtaining a real rectangular frame and a predicted rectangular frame of the training image;

Uniformly dividing the real rectangular frame N times in the w direction and N times in the h direction to obtain a plurality of position points;

Denoting each position point of the real rectangular frame by its division index in the w direction and its division index in the h direction, and denoting the position point of the predicted rectangular frame with the same indices as its corresponding position point; the average offset between all such pairs of position points is the loss function of the rotated rectangular bounding box.

Further, the offset between a position point of the real rectangular frame and its corresponding position point of the predicted rectangular frame is calculated as follows:

The real rectangular frame and the predicted rectangular frame are each encoded by the coordinates of their center point, their length and width, and their rotation angle;

Taking one position point in the real rectangular frame, obtaining its two intercepts relative to the upper-left corner of the real rectangular frame, and calculating the coordinates of the position point from the geometry of the frame;

Taking the corresponding position point in the predicted rectangular frame, obtaining its two intercepts relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point in the same way;

Calculating the offset between the two corresponding position points as the Euclidean distance between their coordinates, and performing scale normalization on the Euclidean distance.
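As an illustration of the steps above, the loss can be written out under assumed notation. The symbols are chosen here for illustration only: $(x, y, w, h, \theta)$ and $(\hat{x}, \hat{y}, \hat{w}, \hat{h}, \hat{\theta})$ are assumed encodings of the real and predicted rectangular frames, the point count $(N+1)^2$ assumes that the uniform division includes both edges of the frame, and the normalizer $s$ (the diagonal of the real frame) is one possible choice of scale normalization.

```latex
P_{ij} =
\begin{pmatrix} x \\ y \end{pmatrix}
+ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \left(\tfrac{i}{N} - \tfrac{1}{2}\right) w \\[2pt] \left(\tfrac{j}{N} - \tfrac{1}{2}\right) h \end{pmatrix},
\qquad
\hat{P}_{ij} =
\begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}
+ \begin{pmatrix} \cos\hat{\theta} & -\sin\hat{\theta} \\ \sin\hat{\theta} & \cos\hat{\theta} \end{pmatrix}
\begin{pmatrix} \left(\tfrac{i}{N} - \tfrac{1}{2}\right) \hat{w} \\[2pt] \left(\tfrac{j}{N} - \tfrac{1}{2}\right) \hat{h} \end{pmatrix},
\qquad i, j = 0, 1, \dots, N

L = \frac{1}{(N+1)^2} \sum_{i=0}^{N} \sum_{j=0}^{N}
\frac{\lVert P_{ij} - \hat{P}_{ij} \rVert_2}{s},
\qquad s = \sqrt{w^2 + h^2}
```

Measuring the intercepts from the center, as done here, is equivalent to measuring them from the upper-left corner, since the corner is itself the point with $i = j = 0$. Under this form, two identical frames give $L = 0$, and a pure translation of the predicted frame gives the translation distance divided by $s$.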

The present invention also provides a target detection system, comprising:

A preprocessing module, configured to preprocess an image to obtain a training image and to input the training image into a target detection neural network;

Wherein the target detection neural network is provided with a cascade filtering multi-scale detection head comprising a plurality of key point detection heads, which include a first key point detection head and a second key point detection head;

A heatmap acquisition module, configured to extract features and detect key points of the training image to obtain two heatmaps of different sizes;

The heatmap acquisition module is specifically configured to input an image into the target detection neural network based on the training image;

A matrix acquisition module, configured to perform maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;

The matrix acquisition module is specifically configured to calculate all maximum points in the first heatmap and in the second heatmap by matrix offset and subtraction;

A filtering module, configured to perform maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby train the target detection neural network;

The filtering module is specifically configured to perform, through the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask whose size is consistent with that of the second key point matrix;

A training module, configured to sequentially perform the filtering operation on the training image through the remaining adjacent key point detection heads, and to record the prediction result of the target detection neural network as the final output, so as to train the target detection neural network;

A determination module, configured to acquire a real rectangular frame and a predicted rectangular frame of the training image, the offset between each position point of the real rectangular frame and its corresponding position point of the predicted rectangular frame being calculated as in the method described above;

A detection module, configured to input the image to be detected into the trained target detection neural network to output a target detection result.

The present invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method as described above.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the target detection method as described above when executing the computer program.

Compared with the prior art, the invention has the beneficial effects that:

According to the invention, heatmaps are extracted from an image by the target detection neural network, maximum point calculation is performed on the heatmaps to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter repeated prediction frames out of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and target features are enhanced. The rotated rectangular bounding box loss function based on projection position offset adopted in the image processing process is more interpretable, more extensible and more intuitive; it copes effectively with the variable angles and shapes of targets in the image, improves detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals;

The key-point-based cascade filtering multi-scale detection head effectively filters repeated detections where the sizes and areas of the targets in the image vary widely, which greatly improves detection efficiency and avoids repeated calculation.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of a target detection method according to a first embodiment of the present invention;

FIG. 2 is a block flow diagram of steps S20-S40 in FIG. 1;

FIG. 3 is an enlarged view of the image in FIG. 2;

FIG. 4 is a schematic diagram of the rotated rectangular bounding box loss based on projection position offset according to a first embodiment of the present invention;

FIG. 5 is a schematic diagram showing calculation of the projection position offset of the rotated rectangular bounding box according to a first embodiment of the present invention;

FIG. 6 is a block diagram of an object detection system according to a second embodiment of the present invention;

FIG. 7 is a block diagram of an electronic device according to a third embodiment of the present invention;

The invention will be further described in the following detailed description in conjunction with the above-described figures.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Example 1

Referring to fig. 1, a target detection method in a first embodiment of the present invention includes steps S10 to S50 as follows:

S10, preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network;

S20, extracting features and detecting key points of the training image to obtain two heatmaps of different sizes;

S30, performing maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix;

S40, performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby training the target detection neural network;

S50, inputting the image to be detected into the trained target detection neural network to output a target detection result.

It can be understood that, in the invention, heatmaps are extracted from the image by the target detection neural network, maximum point calculation is performed on the heatmaps to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter repeated prediction frames out of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and target features are enhanced. The rotated rectangular bounding box loss function based on projection position offset used in the image processing process is more interpretable and more extensible, and gives a more intuitive rotated target detection frame; it copes effectively with the variable angles and shapes of targets in the image, improves detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals.

Specifically, the target detection neural network is provided with a cascade filtering multi-scale detection head comprising a plurality of key point detection heads, which include a first key point detection head and a second key point detection head. As shown in FIGS. 2-3, the specific steps of step S20, extracting features and detecting key points of the training image to obtain two heatmaps of different sizes, include:

This embodiment shows the calculation process of the cascade filtering multi-scale detection head for the "plane" category with two key point detection heads;

A first feature map and a second feature map are generated from the image by feature extraction through a backbone network, wherein the size of the first feature map is larger than that of the second feature map;

Specifically, in this embodiment, the size of the first heatmap is 8 x 8 and the size of the second heatmap is 4 x 4.

Further, referring again to FIG. 2, the specific steps of step S30, performing maximum point calculation on the two heatmaps to obtain the first key point matrix and the second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix, include:

Each maximum point corresponds to one target of the "plane" category;

In this embodiment, the size of the first key point matrix is 8 x 8 and the size of the second key point matrix is 4 x 4.

Further, referring again to FIG. 2, the image contains three targets of the "plane" category in total, two smaller targets located in the upper left corner of the image and one larger target located in the lower right corner. The specific steps of step S40, performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, include:

The cascade filtering multi-scale detection head performs maximum pooling with a window size of 2 x 2 and a stride of 2 on the first key point matrix to obtain a mask of size 4 x 4, which is consistent with the size of the second key point matrix;

The second key point matrix is filtered with the mask, retaining only the key points whose value is 1 in the second key point matrix and whose value is 0 in the mask, so as to obtain a new 4 x 4 third key point matrix in which the repeated prediction frames of the second key point matrix have been filtered out.
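The filtering step of this embodiment can be sketched on concrete matrices. This is a minimal illustration, not the patented implementation: the key point positions below are invented to mimic the scene described (two small targets in the upper left, one large target in the lower right), and the helper names `max_pool` and `filter_keypoints` are hypothetical.

```python
def max_pool(matrix, size=2, stride=2):
    """Max pooling over size x size windows, moving by `stride` in both directions."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


def filter_keypoints(coarse, mask):
    """Keep key points whose value is 1 in `coarse` and whose mask value is 0."""
    return [
        [1 if k == 1 and m == 0 else 0 for k, m in zip(k_row, m_row)]
        for k_row, m_row in zip(coarse, mask)
    ]


# 8 x 8 first key point matrix: the finer head finds the two small targets
# in the upper left (positions are illustrative).
first = [[0] * 8 for _ in range(8)]
first[1][1] = 1
first[2][3] = 1

# 4 x 4 second key point matrix: the coarser head re-detects both small
# targets and also finds the large target in the lower right.
second = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

mask = max_pool(first)                  # 4 x 4, same size as `second`
third = filter_keypoints(second, mask)  # third key point matrix
```

In this toy case the mask is 1 wherever the finer head already holds a key point, so the two repeated detections are dropped from the coarser matrix and only the large target in the lower right remains in the third key point matrix.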

The filtering operation is sequentially performed on the training image through the remaining adjacent key point detection heads, and the prediction result of the target detection neural network is recorded as the final output, so as to train the target detection neural network. This filtering flow ensures that smaller targets are predicted from the finer-grained feature map, that is, from the larger key point matrix, which safeguards the regression accuracy of target bounding boxes, especially for dense and small targets.

It can be understood that the target detection neural network uses the conv1-conv5 layers of a ResNet-50 network as the backbone network for convolutional feature extraction, adds an average pooling layer (Average Pooling) for downsampling to reduce the influence of background noise, then adds an additional convolution layer conv6 for feature extraction, followed by a maximum pooling layer (Max Pooling) for downsampling and enhancing the target features. The extracted image features are input into the key-point-based cascade filtering multi-scale detection head for target detection, and the detection result is output after post-processing.
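A quick size check may help relate this backbone description to the 8 x 8 and 4 x 4 heatmaps of the embodiment. The sketch below is only arithmetic under assumptions that the source does not state: a 512 x 512 input, the standard total stride of 32 for the conv1-conv5 stages of ResNet-50, 2 x 2 pooling with stride 2 for both pooling layers, and a size-preserving conv6. Which layers actually feed the two key point detection heads is likewise an assumption.

```python
def spatial_sizes(input_size, stages):
    """Track the side length of a square feature map through (name, stride) stages."""
    sizes = []
    size = input_size
    for name, stride in stages:
        size //= stride
        sizes.append((name, size))
    return sizes


# All strides except the ResNet-50 total stride of 32 are assumptions.
stages = [
    ("conv1-conv5 (ResNet-50)", 32),
    ("average pooling", 2),   # assumed 2 x 2, stride 2
    ("conv6", 1),             # assumed to preserve spatial size
    ("max pooling", 2),       # assumed 2 x 2, stride 2
]

sizes = spatial_sizes(512, stages)  # 512 x 512 input is an assumption
for name, size in sizes:
    print(f"{name}: {size} x {size}")
```

Under these assumptions the map after average pooling is 8 x 8 and the map after max pooling is 4 x 4, which would be consistent with the two heatmap sizes given for this embodiment.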

Further, referring to FIGS. 4-5, the loss function of the rotated rectangular bounding box is determined based on projection position offset as follows:

The offset between corresponding position points is calculated from the coordinates of each position point of the real rectangular frame and of its corresponding position point of the predicted rectangular frame, and scale normalization is performed on the Euclidean distance, so that the size of the rectangle is taken into account and the loss remains scale-invariant.
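The loss computation can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the patented formula: each frame is encoded as a hypothetical tuple `(cx, cy, w, h, theta)` with the angle in radians, the grid uses `(n + 1) ** 2` points (both edges included), and the Euclidean distance is normalized by the diagonal of the real frame as one possible scale normalization. The function names are illustrative.

```python
import math


def grid_point(box, u, v):
    """Image coordinates of the point at fractions (u, v) along the w and h directions."""
    cx, cy, w, h, theta = box
    # Intercepts in the frame's own axes, measured from the center
    # (equivalent to measuring from the upper-left corner).
    dx, dy = (u - 0.5) * w, (v - 0.5) * h
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)


def projection_offset_loss(real_box, pred_box, n=4):
    """Mean scale-normalized distance between corresponding position points."""
    _, _, w, h, _ = real_box
    scale = math.hypot(w, h)  # assumed normalizer: diagonal of the real frame
    total = 0.0
    for i in range(n + 1):
        for j in range(n + 1):
            px, py = grid_point(real_box, i / n, j / n)
            qx, qy = grid_point(pred_box, i / n, j / n)
            total += math.hypot(px - qx, py - qy) / scale
    return total / (n + 1) ** 2


real = (10.0, 10.0, 6.0, 8.0, 0.0)
loss_same = projection_offset_loss(real, real)
# Same size and angle, center moved by (3, 4): every point pair is 5 apart,
# and the diagonal of the real frame is 10.
loss_shifted = projection_offset_loss(real, (13.0, 14.0, 6.0, 8.0, 0.0))
```

Because every term is a distance between two points that share the same grid indices, the same sketch would apply to any shape whose position points can be indexed this way, which matches the stated potential for extension to arbitrary quadrilaterals.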

In summary, according to the target detection method in the above embodiment of the present invention, heatmaps are extracted from an image by the target detection neural network, maximum point calculation is performed on the heatmaps to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter repeated prediction frames out of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and target features are enhanced. The rotated rectangular bounding box loss function based on projection position offset used in the image processing process is more interpretable and more extensible, and gives a more intuitive rotated target detection frame; it copes effectively with the variable angles and shapes of targets in the image, improves detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals.

Example two

Referring to fig. 6, an object detection system 40 according to a second embodiment of the present invention includes:

Preprocessing module 11: configured to preprocess an image to obtain a training image and to input the training image into a target detection neural network;

Heatmap acquisition module 12: configured to extract features and detect key points of the training image to obtain two heatmaps of different sizes;

The heatmap acquisition module 12 is specifically configured to input an image into the target detection neural network based on the training image;

Matrix acquisition module 13: configured to perform maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;

The matrix acquisition module 13 is specifically configured to calculate all maximum points in the first heatmap and in the second heatmap by matrix offset and subtraction;

Filtering module 14: configured to perform maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby train the target detection neural network;

The filtering module 14 is specifically configured to perform, through the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask whose size is consistent with that of the second key point matrix;

Training module 15: configured to sequentially perform the filtering operation on the training image through the remaining adjacent key point detection heads, and to record the prediction result of the target detection neural network as the final output, so as to train the target detection neural network;

Determination module 16: configured to acquire a real rectangular frame and a predicted rectangular frame of the training image;

Detection module 17: configured to input the image to be detected into the trained target detection neural network to output a target detection result.

Example III

The present invention also proposes an electronic device, referring to fig. 7, which shows an electronic device according to a third embodiment of the present invention, including a memory 10, a processor 20, and a computer program 30 stored in the memory 10 and capable of running on the processor 20, where the processor 20 implements the above-mentioned target detection method when executing the computer program 30.

The memory 10 includes at least one type of storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 10 may in some embodiments be an internal storage unit of an electronic device, such as a hard disk of the electronic device. The memory 10 may also be an external storage device such as a plug-in hard disk, a smart memory card (SMART MEDIA CARD, SMC), a Secure Digital (SD) card, a flash memory card (FLASH CARD), or the like in other embodiments. Further, the memory 10 may also include both internal storage units and external storage devices of the electronic device. The memory 10 may be used not only for storing application software installed in an electronic device and various types of data, but also for temporarily storing data that has been output or is to be output.

The processor 20 may be, in some embodiments, an electronic control unit (Electronic Control Unit, ECU for short, also called a car computer), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor or other data processing chip for running program codes or processing data stored in the memory 10, for example, executing an access restriction program or the like.

It should be noted that the structure shown in fig. 7 does not constitute a limitation of the electronic device, and in other embodiments the electronic device may comprise fewer or more components than shown, or may combine certain components, or may have a different arrangement of components.

The embodiment of the invention also provides a readable storage medium, on which a computer program is stored, which when executed by a processor implements the object detection method as described above.

Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A method of target detection comprising the steps of:

2. The target detection method according to claim 1, wherein the specific steps of performing maximum point calculation on the two heatmaps to obtain the first key point matrix and the second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix, include:

3. The target detection method according to claim 2, wherein the specific steps of performing maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix include:

4. The target detection method according to claim 3, wherein, after the step of filtering the second key point matrix with the mask, retaining only the key points whose value is 1 in the second key point matrix and whose value is 0 in the mask, to obtain the third key point matrix, the method further comprises:

5. The target detection method according to claim 1, wherein the specific steps of determining the loss function of the rotated rectangular bounding box based on projection position offset include:

6. The target detection method according to claim 5, wherein the offset between a position point of the real rectangular frame and its corresponding position point of the predicted rectangular frame is calculated as follows:

7. An object detection system, comprising:

Outputting a first heatmap and a second heatmap through the first key point detection head and the second key point detection head respectively, wherein the size of the first heatmap matches that of the first feature map, and the size of the second heatmap matches that of the second feature map;

A matrix acquisition module, configured to perform maximum point calculation on the two heatmaps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;

A filtering module, configured to perform maximum pooling on the first key point matrix to filter repeated prediction frames out of the second key point matrix, and thereby train the target detection neural network;

8. A storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method according to any of claims 1-6.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the object detection method according to any of claims 1-6 when executing the computer program.