
CN118196401B - Target detection method, target detection system, storage medium and electronic equipment - Google Patents

Info

Publication number
CN118196401B
Authority
CN
China
Prior art keywords
key point
matrix
target detection
feature map
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410612636.5A
Other languages
Chinese (zh)
Other versions
CN118196401A (en
Inventor
徐健锋
王瑞华
易文博
卢巧红
吴成磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Kanglai Medical Technology Co ltd
Nanchang University
Original Assignee
Nanchang Kanglai Medical Technology Co ltd
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Kanglai Medical Technology Co ltd, Nanchang University
Priority to CN202410612636.5A
Publication of CN118196401A
Application granted
Publication of CN118196401B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method, a target detection system, a storage medium and electronic equipment. The target detection method comprises the following steps: preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network; performing feature extraction and key point detection on the training image to obtain two heat maps of different sizes; performing maximum point calculation on the two heat maps to obtain a first key point matrix and a second key point matrix; performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and then training the target detection neural network, wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset; and inputting the image to be detected into the trained target detection neural network to output a target detection result. The target detection method can detect targets with high precision.

Description

Target detection method, target detection system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a target detection method, a target detection system, a storage medium, and an electronic device.
Background
With the continuous growth of computing power and the development of artificial intelligence technology, computer vision technology has promoted the application and development of image target detection.
However, the field of image target detection still faces many problems that restrict its further application and development: 1. The image resolution is high and the target distribution is not uniform. 2. The size and aspect ratio of the targets vary greatly. 3. The rotation angle of the targets varies. Compared with targets in horizontally photographed images, targets in remote sensing images photographed from the air appear at far more varied angles. Therefore, the horizontal bounding box used in general target detection often cannot fit the target well, and a rotated rectangular bounding box is required for high-precision target detection in such images. 4. The image background features are complex. In addition, classical methods for image target detection lack interpretability and extensibility.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a target detection method that solves the technical problems described in the background.
In order to achieve the above object, the present invention is achieved by the following technical scheme:
a target detection method comprising the steps of:
Preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network;
Performing feature extraction and key point detection on the training image to obtain two heat maps of different sizes;
performing maximum point calculation on the two heat maps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix;
performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and then training the target detection neural network;
wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset;
inputting the image to be detected into the trained target detection neural network to output a target detection result;
The target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, and the key point detection heads comprise a first key point detection head and a second key point detection head. The specific steps of performing feature extraction and key point detection on the training image to obtain two heat maps of different sizes comprise:
inputting an image, based on the training image, into the target detection neural network;
generating a first feature map and a second feature map from the image through feature extraction by a backbone network, wherein the size of the first feature map is larger than that of the second feature map;
inputting the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and the sizes of the first key point detection head and the second key point detection head match the sizes of the first feature map and the second feature map respectively;
outputting a first heat map and a second heat map through the first key point detection head and the second key point detection head respectively, wherein the size of the first heat map matches that of the first feature map, and the size of the second heat map matches that of the second feature map.
Further, the specific steps of performing maximum point calculation on the two heat maps to obtain the first key point matrix and the second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix, comprise:
calculating all maximum points in the first heat map and the second heat map by matrix offset and subtraction;
obtaining the first key point matrix and the second key point matrix respectively based on the calculated maximum points, wherein the size of the first key point matrix matches that of the first feature map, and the size of the second key point matrix matches that of the second feature map.
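One possible reading of "matrix offset and subtraction" is sketched below in Python with NumPy: the heat map is compared against shifted copies of itself, and a cell counts as a maximum point when every difference is positive. The function name `local_maxima`, the 8-neighbour window, the strict inequality and the `threshold` parameter are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def local_maxima(heatmap, threshold=0.0):
    """Return a 0/1 key point matrix marking strict local maxima of a 2-D heat map.

    Hypothetical helper: window shape and threshold are illustrative assumptions.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border cells are only compared against real neighbours.
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    is_peak = heatmap > threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Offset: a copy of the heat map shifted by (dy, dx).
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            # Subtraction: a maximum point must exceed each shifted neighbour.
            is_peak &= (heatmap - shifted) > 0
    return is_peak.astype(np.uint8)

# Small made-up heat map with two isolated peaks.
heat = np.zeros((4, 4))
heat[1, 1] = 0.9
heat[3, 2] = 0.7
keypoints = local_maxima(heat, threshold=0.5)
```

The result has the same shape as the heat map, which is consistent with the statement that each key point matrix matches the size of its feature map.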
Further, the specific steps of performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix comprise:
performing, by the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask, the size of the mask being consistent with the size of the second key point matrix;
filtering the second key point matrix using the mask, and retaining the key points whose value is 1 in the second key point matrix and 0 in the mask, thereby obtaining a third key point matrix, so that the repeated prediction frames of the second key point matrix are filtered out.
Further, the step of filtering the second key point matrix using the mask to obtain the third key point matrix further comprises:
sequentially performing the filtering operation on the training image through the remaining adjacent key point detection heads, and recording the prediction result of the target detection neural network as the final output, so as to train the target detection neural network.
Further, the specific steps of determining the loss function of the rotated rectangular bounding box based on the projection position offset comprise:
obtaining a real rectangular frame and a predicted rectangular frame of the training image;
dividing the real rectangular frame uniformly N times in the w direction and in the h direction respectively to obtain a plurality of position points;
marking the position point at a given division in the w direction and a given division in the h direction of the real rectangular frame, and marking the position point corresponding to it in the predicted rectangular frame, wherein the average offset between the position point pairs is the loss function of the rotated rectangular bounding box.
Further, the offset between a position point in the real rectangular frame and the corresponding position point in the predicted rectangular frame is calculated by an offset calculation formula,
wherein the real rectangular frame and the predicted rectangular frame are each encoded by their center point coordinates, their length and width dimensions, and their rotation angle;
taking one of the position points in the real rectangular frame, acquiring the intercepts of the position point relative to the upper left corner of the real rectangular frame, and calculating the coordinates of the position point according to the geometric properties;
taking the corresponding position point in the predicted rectangular frame, acquiring the intercepts of that position point relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point;
calculating the offset between the pair of corresponding position points from their coordinates as a Euclidean distance, and applying scale normalization to the Euclidean distance.
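The steps above can be illustrated with a minimal NumPy sketch. Everything concrete in it is an assumption made for illustration, since the formulas themselves are not reproduced in this text: the box encoding order `(cx, cy, w, h, theta)`, the choice of `n + 1` evenly spaced points per direction, the corner convention, and normalizing by the diagonal of the real frame. A training implementation would also need a differentiable framework rather than plain NumPy.

```python
import numpy as np

def grid_points(box, n):
    """Position points of a rotated box (cx, cy, w, h, theta), shape (n+1, n+1, 2)."""
    cx, cy, w, h, theta = box
    u_w = np.array([np.cos(theta), np.sin(theta)])    # unit vector along w
    u_h = np.array([-np.sin(theta), np.cos(theta)])   # unit vector along h
    # Reference corner: centre shifted by half the width and half the height.
    corner = np.array([cx, cy]) - 0.5 * w * u_w - 0.5 * h * u_h
    fractions = np.linspace(0.0, 1.0, n + 1)
    a = fractions[:, None, None] * w                  # intercepts along w
    b = fractions[None, :, None] * h                  # intercepts along h
    return corner + a * u_w + b * u_h

def projection_offset_loss(real_box, pred_box, n=4):
    """Mean scale-normalized Euclidean offset between corresponding points."""
    p = grid_points(real_box, n)
    q = grid_points(pred_box, n)
    distances = np.linalg.norm(p - q, axis=-1)
    scale = np.hypot(real_box[2], real_box[3])        # assumed normalizer
    return float(distances.mean() / scale)

# A predicted frame that is a pure translation of the real frame by (3, 4).
real = (0.0, 0.0, 6.0, 8.0, 0.0)
pred = (3.0, 4.0, 6.0, 8.0, 0.0)
loss = projection_offset_loss(real, pred, n=4)
same = projection_offset_loss(real, real, n=4)
```

Because the points are generated from the same fractional divisions in both frames, the pairing between a point in the real frame and its counterpart in the predicted frame is implicit in the array indices.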
The present invention also provides a target detection system, comprising:
A preprocessing module: configured to preprocess an image to obtain a training image, and input the training image into a target detection neural network;
the target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, and the key point detection heads comprise a first key point detection head and a second key point detection head;
A heat map acquisition module: configured to perform feature extraction and key point detection on the training image to obtain two heat maps of different sizes;
the heat map acquisition module is specifically configured to: input an image, based on the training image, into the target detection neural network;
generate a first feature map and a second feature map from the image through feature extraction by a backbone network, wherein the size of the first feature map is larger than that of the second feature map;
input the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and the sizes of the first key point detection head and the second key point detection head match the sizes of the first feature map and the second feature map respectively;
output a first heat map and a second heat map through the first key point detection head and the second key point detection head respectively, wherein the size of the first heat map matches that of the first feature map, and the size of the second heat map matches that of the second feature map.
A matrix acquisition module: configured to perform maximum point calculation on the two heat maps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix;
the matrix acquisition module is specifically configured to: calculate all maximum points in the first heat map and the second heat map by matrix offset and subtraction;
obtain the first key point matrix and the second key point matrix respectively based on the calculated maximum points, wherein the size of the first key point matrix matches that of the first feature map, and the size of the second key point matrix matches that of the second feature map.
A filtering module: configured to perform maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and then train the target detection neural network;
the filtering module is specifically configured to: perform, by the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask, the size of the mask being consistent with the size of the second key point matrix;
filter the second key point matrix using the mask, and retain the key points whose value is 1 in the second key point matrix and 0 in the mask, thereby obtaining a third key point matrix, so that the repeated prediction frames of the second key point matrix are filtered out.
A training module: configured to sequentially perform the filtering operation on the training image through the remaining adjacent key point detection heads, and record the prediction result of the target detection neural network as the final output, so as to train the target detection neural network.
Wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset;
A determination module: configured to obtain a real rectangular frame and a predicted rectangular frame of the training image;
divide the real rectangular frame uniformly N times in the w direction and in the h direction respectively to obtain a plurality of position points;
mark the position point at a given division in the w direction and a given division in the h direction of the real rectangular frame, and mark the position point corresponding to it in the predicted rectangular frame, wherein the average offset between the position point pairs is the loss function of the rotated rectangular bounding box.
The offset between a position point in the real rectangular frame and the corresponding position point in the predicted rectangular frame is calculated by an offset calculation formula,
wherein the real rectangular frame and the predicted rectangular frame are each encoded by their center point coordinates, their length and width dimensions, and their rotation angle;
taking one of the position points in the real rectangular frame, acquiring the intercepts of the position point relative to the upper left corner of the real rectangular frame, and calculating the coordinates of the position point according to the geometric properties;
taking the corresponding position point in the predicted rectangular frame, acquiring the intercepts of that position point relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point;
calculating the offset between the pair of corresponding position points from their coordinates as a Euclidean distance, and applying scale normalization to the Euclidean distance.
A detection module: configured to input the image to be detected into the trained target detection neural network to output a target detection result.
The present invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method as described above.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the target detection method as described above when executing the computer program.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, heat maps are extracted from an image by the target detection neural network, maximum point calculation is performed on the heat maps to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter the repeated prediction frames of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and target features are enhanced. A rotated rectangular bounding box loss function based on projection position offset is adopted in the image processing; it is more interpretable, more extensible and more intuitive, copes effectively with the variable angles and shapes of targets in the image, improves detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals;
For images in which the size and area of targets vary widely, the key-point-based cascade filtering multi-scale detection head effectively filters repeatedly detected targets, which greatly improves detection efficiency and avoids repeated calculation.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a target detection method according to a first embodiment of the present invention;
FIG. 2 is a block flow diagram of steps S20-S40 in FIG. 1;
FIG. 3 is an enlarged view of the image in FIG. 2;
FIG. 4 is a diagram showing the loss of a rotated rectangular bounding box based on projection position offset according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram showing calculation of the projection position offset of a rotated rectangular bounding box according to a first embodiment of the present invention;
FIG. 6 is a block diagram of an object detection system according to a second embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to a third embodiment of the present invention;
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Example 1
Referring to fig. 1, a target detection method in a first embodiment of the present invention includes steps S10 to S50 as follows:
S10, preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network;
S20, performing feature extraction and key point detection on the training image to obtain two heat maps of different sizes;
S30, performing maximum point calculation on the two heat maps to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix;
S40, performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and then training the target detection neural network;
wherein, in training the target detection neural network, a loss function of a rotated rectangular bounding box is determined based on projection position offset;
S50, inputting the image to be detected into the trained target detection neural network to output a target detection result.
It can be understood that, in the invention, heat maps are extracted from the image by the target detection neural network, maximum point calculation is performed on the heat maps to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter the repeated prediction frames of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and target features are enhanced. A rotated rectangular bounding box loss function based on projection position offset is adopted in the image processing; it is more interpretable, more extensible and more intuitive, copes effectively with the variable angles and shapes of targets in the image, improves detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals;
For images in which the size and area of targets vary widely, the key-point-based cascade filtering multi-scale detection head effectively filters repeatedly detected targets, which greatly improves detection efficiency and avoids repeated calculation.
Specifically, the target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, and the key point detection heads comprise a first key point detection head and a second key point detection head. As shown in fig. 2-3, the specific steps of step S20, performing feature extraction and key point detection on the training image to obtain two heat maps of different sizes, comprise:
inputting an image, based on the training image, into the target detection neural network;
this embodiment shows the calculation process of the cascade filtering multi-scale detection head for the "plane" category in the case of two key point detection heads;
generating a first feature map and a second feature map from the image through feature extraction by a backbone network, wherein the size of the first feature map is larger than that of the second feature map;
inputting the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and the sizes of the first key point detection head and the second key point detection head match the sizes of the first feature map and the second feature map respectively;
outputting a first heat map and a second heat map through the first key point detection head and the second key point detection head respectively, wherein the size of the first heat map matches that of the first feature map, and the size of the second heat map matches that of the second feature map.
Specifically, in this embodiment, the size of the first heat map is 8 x 8 and the size of the second heat map is 4 x 4.
Further, referring to fig. 2 again, the specific steps of step S30, performing maximum point calculation on the two heat maps to obtain the first key point matrix and the second key point matrix, the size of the first key point matrix being larger than the size of the second key point matrix, comprise:
calculating all maximum points in the first heat map and the second heat map by matrix offset and subtraction;
each maximum point corresponds to a target of the "plane" category;
obtaining the first key point matrix and the second key point matrix respectively based on the calculated maximum points, wherein the size of the first key point matrix matches that of the first feature map, and the size of the second key point matrix matches that of the second feature map.
In this embodiment, the size of the first key point matrix is 8 x 8 and the size of the second key point matrix is 4 x 4.
Further, please refer to fig. 2 again, the imageTargets comprising 3 "plane" categories in total, two smaller targets being located in the upper left corner of the image and one larger target being located in the lower right corner of the image, said step S40 of matrix said first keypointPerforming maximum pooling to filter the second keypoint matrixThe specific steps of the repeated prediction block of (a) include:
The cascade filtering multi-scale detection head is used for aiming at the first key point matrix Maximum pooling of size 2 x 2 and step size 2 is performed to obtain a mask of size 4 x4The mask is provided withIs of a size and the second key point matrixIs uniform in size;
using the mask For the second key point matrixFiltering and reserving the second key point matrixIs 1 in (f), and the mask isThe key point with the middle value of 0, and then a new 4 multiplied by 4 third key point matrix is obtainedTo the second key point matrixIs filtered by the repeated prediction block.
Further, after the step of filtering the second key point matrix using the mask, retaining the key points whose value in the second key point matrix is 1 and whose value in the mask is 0, and thereby obtaining the third key point matrix, the method further includes:
sequentially performing the filtering operation on the training image through the remaining adjacent key point detection heads, and recording the prediction result of the target detection neural network as the final output, thereby training the target detection neural network. This filtering flow ensures that smaller targets are predicted from the finer-grained feature maps, which yield the larger key point matrices, and thereby safeguards the regression accuracy of target bounding boxes, especially for dense and small targets.
It can be understood that the target detection neural network works as follows: the conv1-conv5 layers of a ResNet-50 network serve as the backbone network for convolutional feature extraction; an average pooling layer (Average Pooling) is added for downsampling to reduce the influence of background noise information; an additional convolution layer conv6 is then added for further feature extraction, followed by a maximum pooling layer (Max Pooling) that downsamples and enhances the target features. The extracted image features are input into the key-point-based cascade filtering multi-scale detection head for target detection, and the detection result is output after post-processing.
Further, referring to figs. 4-5, the specific steps of determining the loss function of the rotating rectangular bounding box based on projection position offset include:
obtaining a real rectangular frame and a predicted rectangular frame of the training image;
dividing the real rectangular frame uniformly N times in the w direction and in the h direction respectively to obtain a plurality of position points;
pairing each position point of the real rectangular frame, identified by its division index in the w direction and its division index in the h direction, with the corresponding position point in the predicted rectangular frame; the average offset over all position point pairs is the loss function of the rotating rectangular bounding box.
Further, the offset between a position point of the real rectangular frame and the corresponding position point of the predicted rectangular frame is calculated as follows:
the real rectangular frame and the predicted rectangular frame are each encoded by their center point coordinates, their length and width dimensions, and their rotation angle;
taking one position point in the real rectangular frame, obtaining its two intercepts relative to the upper left corner of the real rectangular frame, and calculating the coordinates of the position point from the geometric properties;
taking the corresponding position point in the predicted rectangular frame, obtaining its two intercepts relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point in the same way;
calculating the offset between the pair of corresponding position points from their coordinates, and scale-normalizing the Euclidean distance so that the size of the rectangle is taken into account and the offset remains scale-invariant.
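The steps above can be sketched in Python as follows. This is only one possible reading, not the patent's own formula, which is not reproduced in this text. The assumptions are: a frame is given as `(cx, cy, w, h, theta)` with `theta` in radians and rotation about the center; N divisions produce N + 1 points per direction, edges included; and the Euclidean distance is normalized by the diagonal of the real frame. All function names are hypothetical.

```python
import numpy as np


def grid_points(box, n):
    """Grid points of a rotated frame (cx, cy, w, h, theta), shape (n+1, n+1, 2)."""
    cx, cy, w, h, theta = box
    frac = np.linspace(0.0, 1.0, n + 1)
    # Intercepts measured from the frame's own upper left corner along w and h
    # (image coordinates, y pointing down).
    u, v = np.meshgrid(frac * w, frac * h, indexing="ij")
    # Coordinates relative to the center, before rotation.
    lx, ly = u - w / 2.0, v - h / 2.0
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    x = cx + lx * cos_t - ly * sin_t
    y = cy + lx * sin_t + ly * cos_t
    return np.stack([x, y], axis=-1)


def projection_offset_loss(real_box, pred_box, n=4):
    """Average scale-normalized distance between corresponding grid points."""
    p = grid_points(real_box, n)
    q = grid_points(pred_box, n)
    dist = np.linalg.norm(p - q, axis=-1)
    # Assumption: normalize by the diagonal length of the real frame.
    scale = np.hypot(real_box[2], real_box[3])
    return float(np.mean(dist / scale))


real = (10.0, 10.0, 6.0, 8.0, 0.0)
loss_same = projection_offset_loss(real, real)
loss_shift = projection_offset_loss(real, (13.0, 14.0, 6.0, 8.0, 0.0))
loss_rot = projection_offset_loss(real, (10.0, 10.0, 6.0, 8.0, 0.3))
```

Under these assumptions a perfect prediction gives a loss of zero, a pure translation by (3, 4) of a 6 x 8 frame gives 5 / 10 = 0.5, and a pure rotation gives a positive loss that grows with the angle error.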
In summary, according to the target detection method in the above embodiment of the present invention, thermodynamic diagrams are extracted from an image through the target detection neural network, maximum point calculation is performed on the thermodynamic diagrams to obtain key point matrices, and maximum pooling is performed on the larger key point matrix to filter the repeated prediction frames of the smaller key point matrix, so that noise information and interference factors in the image are effectively filtered and the target features are enhanced. The rotating rectangular bounding box loss function based on projection position offset used during image processing has better interpretability and better extensibility; the more intuitive rotated target detection frame copes effectively with the variable angles and shapes of targets in the image, improving detection efficiency and accuracy, and has the potential to be extended to arbitrary quadrilaterals.
The key-point-based cascade filtering multi-scale detection head effectively filters repeatedly detected targets when targets in the image vary widely in size and area, which greatly improves detection efficiency and avoids repeated calculation.
Example two
Referring to fig. 6, an object detection system 40 according to a second embodiment of the present invention includes:
Preprocessing module 11: configured to preprocess an image to obtain a training image and to input the training image into a target detection neural network;
the target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, and the key point detection heads comprise a first key point detection head and a second key point detection head;
Thermodynamic diagram acquisition module 12: configured to perform feature extraction and key point detection on the training image to obtain two thermodynamic diagrams of different sizes;
the thermodynamic diagram acquisition module 12 is specifically configured to: input, based on the training image, an image into the target detection neural network;
generate a first feature map and a second feature map from the image after feature extraction by the backbone network, the size of the first feature map being larger than that of the second feature map;
input the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and their sizes match the sizes of the first feature map and the second feature map respectively;
output a first thermodynamic diagram and a second thermodynamic diagram through the first key point detection head and the second key point detection head respectively, the size of the first thermodynamic diagram matching that of the first feature map and the size of the second thermodynamic diagram matching that of the second feature map.
Matrix acquisition module 13: configured to perform maximum point calculation on the two thermodynamic diagrams to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;
the matrix acquisition module 13 is specifically configured to: calculate all maximum points in the first thermodynamic diagram and in the second thermodynamic diagram by matrix offset and subtraction;
obtain, based on the calculated maximum points, a first key point matrix and a second key point matrix respectively, the size of the first key point matrix matching that of the first feature map and the size of the second key point matrix matching that of the second feature map.
Filtering module 14: configured to perform maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and thereby train the target detection neural network;
the filtering module 14 is specifically configured to: perform, by the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask, the size of the mask being consistent with that of the second key point matrix;
filter the second key point matrix using the mask, retaining only the key points whose value in the second key point matrix is 1 and whose value in the mask is 0, thereby obtaining a third key point matrix, so that the repeated prediction frames of the second key point matrix are filtered out.
Training module 15: configured to sequentially perform the filtering operation on the training image through the remaining adjacent key point detection heads, and to record the prediction result of the target detection neural network as the final output, so as to train the target detection neural network;
wherein, in training the target detection neural network, a loss function of a rotating rectangular bounding box based on projection position offset is determined.
Determination module 16: configured to obtain a real rectangular frame and a predicted rectangular frame of the training image;
divide the real rectangular frame uniformly N times in the w direction and in the h direction respectively to obtain a plurality of position points;
pair each position point of the real rectangular frame, identified by its division index in the w direction and its division index in the h direction, with the corresponding position point in the predicted rectangular frame, the average offset over all position point pairs being the loss function of the rotating rectangular bounding box.
The offset between a position point of the real rectangular frame and the corresponding position point of the predicted rectangular frame is calculated as follows:
the real rectangular frame and the predicted rectangular frame are each encoded by their center point coordinates, their length and width dimensions, and their rotation angle;
taking one position point in the real rectangular frame, obtaining its two intercepts relative to the upper left corner of the real rectangular frame, and calculating the coordinates of the position point from the geometric properties;
taking the corresponding position point in the predicted rectangular frame, obtaining its two intercepts relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point in the same way;
calculating the offset between the pair of corresponding position points from their coordinates, and scale-normalizing the Euclidean distance.
Detection module 17: configured to input the image to be detected into the trained target detection neural network so as to output a target detection result.
Example three
The present invention also proposes an electronic device, referring to fig. 7, which shows an electronic device according to a third embodiment of the present invention, including a memory 10, a processor 20, and a computer program 30 stored in the memory 10 and capable of running on the processor 20, where the processor 20 implements the above-mentioned target detection method when executing the computer program 30.
The memory 10 includes at least one type of storage medium, including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 10 may in some embodiments be an internal storage unit of an electronic device, such as a hard disk of the electronic device. In other embodiments, the memory 10 may also be an external storage device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), or the like. Further, the memory 10 may also include both internal storage units and external storage devices of the electronic device. The memory 10 may be used not only for storing application software installed in an electronic device and various types of data, but also for temporarily storing data that has been output or is to be output.
The processor 20 may be, in some embodiments, an electronic control unit (Electronic Control Unit, ECU for short, also called a car computer), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor or other data processing chip for running program codes or processing data stored in the memory 10, for example, executing an access restriction program or the like.
It should be noted that the structure shown in fig. 7 does not constitute a limitation of the electronic device, and in other embodiments the electronic device may comprise fewer or more components than shown, or may combine certain components, or may have a different arrangement of components.
The embodiment of the invention also provides a readable storage medium, on which a computer program is stored, which when executed by a processor implements the object detection method as described above.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (9)

1. A target detection method, comprising the following steps:
preprocessing an image to obtain a training image, and inputting the training image into a target detection neural network;
performing feature extraction and key point detection on the training image to obtain two thermodynamic diagrams of different sizes;
performing maximum point calculation on the two thermodynamic diagrams to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;
performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and thereby training the target detection neural network;
wherein, in training the target detection neural network, a loss function of a rotating rectangular bounding box based on projection position offset is determined;
inputting the image to be detected into the trained target detection neural network to output a target detection result;
wherein the target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, the key point detection heads comprise a first key point detection head and a second key point detection head, and the specific steps of performing feature extraction and key point detection on the training image to obtain two thermodynamic diagrams of different sizes comprise:
inputting, based on the training image, an image into the target detection neural network;
generating a first feature map and a second feature map from the image after feature extraction by a backbone network, the size of the first feature map being larger than that of the second feature map;
inputting the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and their sizes match the sizes of the first feature map and the second feature map respectively;
outputting a first thermodynamic diagram and a second thermodynamic diagram through the first key point detection head and the second key point detection head respectively, the size of the first thermodynamic diagram matching that of the first feature map and the size of the second thermodynamic diagram matching that of the second feature map.
2. The target detection method according to claim 1, wherein the specific steps of performing maximum point calculation on the two thermodynamic diagrams to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix, comprise:
calculating all maximum points in the first thermodynamic diagram and in the second thermodynamic diagram by matrix offset and subtraction;
obtaining, based on the calculated maximum points, a first key point matrix and a second key point matrix respectively, the size of the first key point matrix matching that of the first feature map and the size of the second key point matrix matching that of the second feature map.
3. The target detection method according to claim 2, wherein the specific steps of performing maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix comprise:
performing, by the cascade filtering multi-scale detection head, maximum pooling on the first key point matrix to obtain a mask, the size of the mask being consistent with that of the second key point matrix;
filtering the second key point matrix using the mask, retaining only the key points whose value in the second key point matrix is 1 and whose value in the mask is 0, thereby obtaining a third key point matrix, so that the repeated prediction frames of the second key point matrix are filtered out.
4. The target detection method according to claim 3, wherein, after the step of filtering the second key point matrix using the mask, retaining the key points whose value in the second key point matrix is 1 and whose value in the mask is 0, and thereby obtaining the third key point matrix, the method further comprises:
sequentially performing the filtering operation on the training image through the remaining adjacent key point detection heads, and recording the prediction result of the target detection neural network as the final output, so as to train the target detection neural network.
5. The target detection method according to claim 1, wherein the specific steps of determining the loss function of the rotating rectangular bounding box based on projection position offset comprise:
obtaining a real rectangular frame and a predicted rectangular frame of the training image;
dividing the real rectangular frame uniformly N times in the w direction and in the h direction respectively to obtain a plurality of position points;
pairing each position point of the real rectangular frame, identified by its division index in the w direction and its division index in the h direction, with the corresponding position point in the predicted rectangular frame, the average offset over all position point pairs being the loss function of the rotating rectangular bounding box.
6. The target detection method according to claim 5, wherein the offset between a position point of the real rectangular frame and the corresponding position point of the predicted rectangular frame is calculated as follows:
the real rectangular frame and the predicted rectangular frame are each encoded by their center point coordinates, their length and width dimensions, and their rotation angle;
taking one position point in the real rectangular frame, obtaining its two intercepts relative to the upper left corner of the real rectangular frame, and calculating the coordinates of the position point from the geometric properties;
taking the corresponding position point in the predicted rectangular frame, obtaining its two intercepts relative to the corresponding corner point of the predicted rectangular frame, and calculating the coordinates of that position point in the same way;
calculating the offset between the pair of corresponding position points from their coordinates, and scale-normalizing the Euclidean distance.
7. A target detection system, comprising:
a preprocessing module, configured to preprocess an image to obtain a training image and to input the training image into a target detection neural network;
the target detection neural network is provided with a cascade filtering multi-scale detection head, the cascade filtering multi-scale detection head comprises a plurality of key point detection heads, and the key point detection heads comprise a first key point detection head and a second key point detection head;
a thermodynamic diagram acquisition module, configured to perform feature extraction and key point detection on the training image to obtain two thermodynamic diagrams of different sizes;
the thermodynamic diagram acquisition module is specifically configured to: input, based on the training image, an image into the target detection neural network;
generate a first feature map and a second feature map from the image after feature extraction by a backbone network, the size of the first feature map being larger than that of the second feature map;
input the first feature map and the second feature map into the first key point detection head and the second key point detection head respectively, wherein the first key point detection head and the second key point detection head are arranged adjacently, and their sizes match the sizes of the first feature map and the second feature map respectively;
output a first thermodynamic diagram and a second thermodynamic diagram through the first key point detection head and the second key point detection head respectively, the size of the first thermodynamic diagram matching that of the first feature map and the size of the second thermodynamic diagram matching that of the second feature map;
a matrix acquisition module, configured to perform maximum point calculation on the two thermodynamic diagrams to obtain a first key point matrix and a second key point matrix, the size of the first key point matrix being larger than that of the second key point matrix;
a filtering module, configured to perform maximum pooling on the first key point matrix to filter the repeated prediction frames of the second key point matrix, and thereby train the target detection neural network;
wherein, in training the target detection neural network, a loss function of a rotating rectangular bounding box based on projection position offset is determined;
a detection module, configured to input the image to be detected into the trained target detection neural network so as to output a target detection result.
8. A storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method according to any of claims 1-6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the object detection method according to any of claims 1-6 when executing the computer program.
CN202410612636.5A 2024-05-17 2024-05-17 Target detection method, target detection system, storage medium and electronic equipment Active CN118196401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410612636.5A CN118196401B (en) 2024-05-17 2024-05-17 Target detection method, target detection system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410612636.5A CN118196401B (en) 2024-05-17 2024-05-17 Target detection method, target detection system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN118196401A CN118196401A (en) 2024-06-14
CN118196401B true CN118196401B (en) 2024-07-19

Family

ID=91402041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410612636.5A Active CN118196401B (en) 2024-05-17 2024-05-17 Target detection method, target detection system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN118196401B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580487A (en) * 2018-06-08 2019-12-17 Oppo广东移动通信有限公司 Neural network training method, neural network construction method, image processing method and device
CN111553212A (en) * 2020-04-16 2020-08-18 中国科学院深圳先进技术研究院 Remote sensing image target detection method based on smooth frame regression function

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102019106123A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3D) pose estimation from the side of a monocular camera
CN112465821A (en) * 2020-12-22 2021-03-09 中国科学院合肥物质科学研究院 Multi-scale pest image detection method based on boundary key point perception
CN112364843A (en) * 2021-01-11 2021-02-12 中国科学院自动化研究所 Plug-in aerial image target positioning detection method, system and equipment
CN115830638A (en) * 2022-12-14 2023-03-21 中国电信股份有限公司 Small-size human head detection method based on attention mechanism and related equipment
CN117523336A (en) * 2023-11-13 2024-02-06 中国工程物理研究院电子工程研究所 Training method of key point detection model and key point detection method
CN117711063A (en) * 2023-12-14 2024-03-15 苏州万店掌网络科技有限公司 Target behavior detection method, device, equipment and medium
CN117934827A (en) * 2023-12-25 2024-04-26 浙江大华技术股份有限公司 Key point detection model training method, key point detection method and related device

Also Published As

Publication number Publication date
CN118196401A (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant