
CN115588126A - GAM, CARAFE and SnIoU fused vehicle target detection method - Google Patents

GAM, CARAFE and SnIoU fused vehicle target detection method Download PDF

Info

Publication number
CN115588126A
CN115588126A
Authority
CN
China
Prior art keywords
gam
substep
sniou
carafe
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211194651.XA
Other languages
Chinese (zh)
Inventor
吴昌昊
骆文辉
徐徐
邢凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Information Intelligence Innovation Research Institute
Original Assignee
Yangtze River Delta Information Intelligence Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Information Intelligence Innovation Research Institute filed Critical Yangtze River Delta Information Intelligence Innovation Research Institute
Priority to CN202211194651.XA
Publication of CN115588126A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle target detection method that fuses GAM, CARAFE and SnIoU, comprising the following steps: converting the data set into a format suitable for YOLOv5 training and applying data augmentation to the images; adding GAM modules to the YOLOv5 backbone and neck networks; replacing nearest-neighbour interpolation with CARAFE for upsampling in the neck network; and finally using SnIoU-Loss as the loss function of the algorithm to detect various kinds of vehicles from a surveillance viewpoint. The invention integrates a GAM attention mechanism into the backbone network and combines the attention module with content-aware feature reassembly (CARAFE) upsampling in the neck network: the reassembly kernel is predicted from the content of the underlying feature map, features are reassembled within a predefined neighbourhood, and global weight information is learned from features of different scales and fused efficiently. A loss function is also proposed that improves both the convergence process and the final training result. The invention addresses the poor detection accuracy of the prior art when targets are occluded or blurred.

Description

GAM, CARAFE and SnIoU fused vehicle target detection method
Technical Field
The invention relates to the technical field of vehicle target detection, in particular to a vehicle target detection method fusing GAM, CARAFE and SnIoU.
Background
With the steady rise in living standards, the number of vehicles used for daily transportation keeps growing, and effectively managing vehicles on the road has become a major challenge. Vehicle target detection is a key enabling technology for building smart cities and has long attracted attention from researchers at home and abroad. There are two main approaches: the first extracts target features with traditional machine-learning methods such as HOG and feeds them into a classifier such as a support vector machine (SVM) or AdaBoost for classification and detection; the second uses deep learning (for example, convolutional neural networks) to perform feature extraction and detection automatically. Compared with image datasets, video datasets often contain blurred and mutually occluding objects, so existing methods struggle to extract the target information correctly and to localize and classify targets accurately.
Many related inventions use YOLOv5 for efficient target detection. For example, CN114882393A, published on 9 August 2022, discloses a method for detecting wrong-way driving and traffic-accident events based on target detection, comprising the following steps: S1, acquiring raw data; S2, extracting samples from the raw data and annotating the vehicle position and vehicle type in each frame; S3, producing a training set and a validation set through data processing; S4, improving the data-enhancement method and activation function of the original YOLOv5 to obtain a YOLOv5-better model; S5, feeding the training and validation sets into the YOLOv5-better model and training it to obtain a weight file of the improved model; S6, loading the weight file into the YOLOv5-better model, running it on the test set to obtain vehicle information, and passing that information to DeepSORT to obtain each vehicle's identifier and type; and S7, feeding the position of each identifier in the video frames into a logic-judgement algorithm to decide whether the vehicle is driving the wrong way or has had an accident. This method is suitable for intelligent video-analysis work. However, such models often cannot exploit more global semantic information for object detection.
Disclosure of Invention
1. Technical problem to be solved by the invention
To overcome the problems in the prior art, the invention provides a vehicle target detection method that fuses GAM, CARAFE and SnIoU. In this method, the GAM module mines positional and local information, CARAFE upsampling extracts the semantic information of the target more effectively, and SnIoU-Loss makes the model converge faster and more accurately, so that high-level semantic information can be used to dynamically generate adaptive kernels and the regression vector angle can improve the accuracy of the prediction box.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the invention discloses a vehicle target detection method fusing GAM, CARAFE and SnIoU, which comprises the following steps:
step 1, acquiring a vehicle target detection dataset;
step 2, preprocessing the dataset images;
step 3, constructing the detection network and introducing GAM modules into the backbone and neck networks of YOLOv5;
step 4, using CARAFE to replace nearest-neighbour interpolation upsampling in the neck network of YOLOv5;
step 5, replacing the loss function with SnIoU-Loss;
and step 6, inputting the training set into the improved YOLOv5-GCS model for training to obtain a weight file, and then testing the test set with the weight file to obtain the final result.
Further, in step 1 the vehicle target detection dataset needs to be converted into a format suitable for YOLOv5 training, and in step 2 data augmentation is applied according to the characteristics of the dataset.
Further, in step 3 a GAM module entry [-1, 1, GAMAttention, [n, n]] is added to the backbone and neck networks, i.e. layers 9, 19, 23 and 27, in the YOLOv5 source-code file YOLOv5s.yaml, where -1 indicates that the layer takes its input from the output of the previous layer, 1 is the number of repetitions of the layer, and [n, n] indicates that the numbers of input and output channels are both n; the channel count differs between layers.
Furthermore, the GAM module introduced in step 3 consists of two sequential parts, the first being channel attention and the second spatial attention, specifically:
substep 1: the input vector F_1 ∈ R^(C×H×W) obtained from the preceding convolutional layer is transformed into the vector F_2 by the channel attention M_c;
substep 2: the vector F_2 ∈ R^(C×H×W) is transformed into the vector F_3 by the spatial attention M_s.
Further, in step 4 the original nn.Upsample nearest-neighbour interpolation in the neck network, i.e. layers 12 and 16, is replaced with CARAFE upsampling in the YOLOv5 source-code file YOLOv5s.yaml.
Further, the modified upsampling method proceeds as follows:
substep 1: for an input feature map of shape H × W × C, the number of channels is first compressed to C_m by a 1 × 1 convolution;
substep 2: the upsampling kernel to be predicted is given the shape σH × σW × k_up × k_up, where σ is the upsampling factor; for the input feature map compressed in substep 1, a convolutional layer with kernel size k_encoder × k_encoder predicts the upsampling kernels, with C_m input channels and σ²·k_up² output channels; the channel dimension is then unfolded along the spatial dimensions to obtain upsampling kernels of shape σH × σW × k_up²;
substep 3: the upsampling kernels obtained in substep 2 are normalized with softmax so that the weights of each kernel sum to 1;
substep 4: each position in the output feature map is mapped back to the input feature map, the k_up × k_up region centred on it is extracted, and its dot product with the upsampling kernel predicted for that position in substep 2 gives the output value; different channels at the same position share the same upsampling kernel.
Further, the SnIoU-Loss in step 5 is computed as follows:
substep 1: compute the angle loss Λ;
substep 2: compute the distance loss Δ from the angle loss Λ;
substep 3: compute the shape loss Ω;
substep 4: compute the SnIoU-Loss as L_SnIoU = 1 − IoU^n + (Δ + Ω)/2, where IoU is the ratio of the intersection to the union of the prediction box and the ground-truth box, and n is a constant.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
in the prior monitoring, a field tiled monitoring equipment list is implemented, and the monitoring position is not intuitive; the equipment operation is comparatively traditional, needs the manual inspection of high frequency. At present, the problems of low efficiency, low reliability and the like exist in manual detection under huge industrial scale. With the advent of large-scale data sets, the difficulty of machine learning feature engineering is increasing, while deep learning models can learn the intrinsic features of data from the data itself. According to the method, the position and local information are mined through the GAM module, the semantic information of the target is better extracted through CARAFE upsampling, and finally the model is converged more quickly and accurately through SnIoU-Loss, so that the detection of the road vehicle target is completed. By utilizing the improved YOLOv5-GCS model, the vehicle target detection with high speed and high accuracy can be realized, and the method has important significance for the field application of maintaining road safety and relieving traffic jam.
Drawings
FIG. 1 is a schematic diagram of a model structure according to the present invention;
FIG. 2 is a schematic diagram of the channel attention submodule of the present invention;
FIG. 3 is a schematic diagram of the spatial attention submodule of the present invention;
FIG. 4 is a schematic diagram of the CARAFE upsampling module of the present invention;
FIG. 5 is a schematic diagram of an angle relationship between a real frame and a predicted frame according to the present invention;
FIG. 6 is a schematic diagram of the intersection over union (IoU) of the ground-truth box and the predicted box in the present invention;
FIG. 7 is a confusion matrix thermodynamic diagram of a model of the invention;
FIG. 8 is a flow chart of the detection method of the present invention.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Example 1
The vehicle target detection model proposed in the present embodiment has the structure shown in fig. 1. With reference to fig. 8, the method for detecting vehicle targets by fusing GAM, CARAFE and SnIoU in this embodiment includes the following specific steps:
step 1 searching vehicle target detection data set close to monitoring visual angle
The UA-DETRAC data set of the vehicle is shot by using a road overpass mainly in Beijing and Tianjin, and is converted into a format suitable for Yolov5 training.
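For illustration, a minimal conversion sketch follows, assuming the usual UA-DETRAC XML annotation layout (frame elements containing target boxes with left/top/width/height attributes and a vehicle_type attribute) and 960 × 540 frames; the class mapping, file names and paths are hypothetical and would need to be adapted to the data actually used. YOLOv5 expects one .txt label file per image with lines of the form "class x_center y_center width height", all normalized to [0, 1].

```python
# Minimal sketch: UA-DETRAC XML annotations -> YOLOv5 label files.
# The XML layout, class mapping and frame size below are assumptions.
import xml.etree.ElementTree as ET
from pathlib import Path

CLASS_IDS = {"car": 0, "van": 1, "bus": 2, "others": 3}   # hypothetical mapping
IMG_W, IMG_H = 960, 540                                    # assumed frame size

def detrac_xml_to_yolo(xml_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = ET.parse(xml_path).getroot()
    for frame in root.iter("frame"):
        lines = []
        for target in frame.iter("target"):
            box = target.find("box")
            attr = target.find("attribute")
            cls = CLASS_IDS.get(attr.get("vehicle_type", "others"), 3)
            left, top = float(box.get("left")), float(box.get("top"))
            w, h = float(box.get("width")), float(box.get("height"))
            xc, yc = (left + w / 2) / IMG_W, (top + h / 2) / IMG_H
            lines.append(f"{cls} {xc:.6f} {yc:.6f} {w / IMG_W:.6f} {h / IMG_H:.6f}")
        # one label file per frame; file naming must match the image files used
        (out / f"img{int(frame.get('num')):05d}.txt").write_text("\n".join(lines))
```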
Step 2: preprocess the dataset images
Data augmentation is applied according to the characteristics of the dataset.
Substep 1: HSV data augmentation is applied first, to account for variations in weather and lighting.
Substep 2: horizontal image flipping is applied, to account for the direction of vehicle travel.
Substep 3: four pictures are randomly cropped and then stitched into a single picture used as training data; this enriches the image backgrounds and, by combining four pictures into one, effectively increases the batch size.
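A simplified sketch of the HSV jitter and horizontal flip described above is given below; the gain values are assumed defaults, and in practice YOLOv5's own dataloader applies equivalent transforms (together with the mosaic stitching of substep 3).

```python
# Sketch of two of the augmentations above (assumed gain values).
import cv2
import numpy as np

def augment_hsv(img, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """Randomly jitter hue, saturation and value to mimic weather/lighting changes."""
    r = np.random.uniform(-1, 1, 3) * [h_gain, s_gain, v_gain] + 1   # random gains around 1
    hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
    lut_hue = ((np.arange(256) * r[0]) % 180).astype(img.dtype)
    lut_sat = np.clip(np.arange(256) * r[1], 0, 255).astype(img.dtype)
    lut_val = np.clip(np.arange(256) * r[2], 0, 255).astype(img.dtype)
    hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def horizontal_flip(img, boxes):
    """Flip the image and mirror normalized YOLO-format x_center coordinates."""
    img = np.fliplr(img).copy()
    boxes = boxes.copy()                 # boxes: [class, x_c, y_c, w, h] per row
    boxes[:, 1] = 1.0 - boxes[:, 1]
    return img, boxes
```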
Step 3, constructing a detection network, and introducing GAM modules into a backbone network and a neck network of YOLOv5
The kernel weights of a CNN are shared: all convolution kernels in the same layer use one common set of weights, so the feature detected is the same at every position of the image, and when an object in the image is translated the corresponding feature-map response is translated in the same way. Moreover, because the feature maps are compressed by pooling, the final output of the CNN is the same as before the translation.
Under a surveillance viewpoint the positional information of vehicles is essentially fixed, and GAM is a global attention mechanism that improves the performance of deep neural networks by reducing information dispersion and amplifying global interactive representations, capturing important features across all three dimensions: channel, spatial width and spatial height.
In this embodiment, a GAM module entry [-1, 1, GAMAttention, [n, n]] is added at layers 9, 19, 23 and 27 of the backbone and neck networks in the YOLOv5 source-code file YOLOv5s.yaml, where -1 indicates that the layer takes its input from the output of the previous layer, 1 is the number of repetitions of the layer, and [n, n] indicates that the numbers of input and output channels are both n (the channel count differs between layers). The GAM attention allows the model to focus more on particular regions. The GAM module consists of two sequential parts, the first being channel attention and the second spatial attention, described in detail as follows:
Substep 1: the input vector F_1 ∈ R^(C×H×W) obtained from the preceding convolutional layer is transformed into the vector F_2 by the channel attention M_c, computed as
F_2 = M_c(F_1) ⊗ F_1,
where ⊗ denotes element-wise multiplication of co-located elements; the role of M_c is shown in fig. 2, and
M_c(F_1) = σ(MLP(P(F_1))) = σ(P′(W_1(W_0(P(F_1))))),
where P is the channel permutation that moves the first (channel) dimension to the last dimension, W_0 and W_1 are fully connected weight matrices, P′ is the inverse permutation that restores the shape after P, and σ is the Sigmoid activation function.
Substep 2: the vector F_2 ∈ R^(C×H×W) is transformed into the vector F_3 by the spatial attention M_s, computed as
F_3 = M_s(F_2) ⊗ F_2,
where ⊗ denotes element-wise multiplication of co-located elements; the role of M_s is shown in fig. 3, and
M_s(F_2) = σ(BN(Conv_7×7(BN(Conv_7×7(F_2))))),
where Conv_7×7 denotes a convolution with a 7 × 7 kernel, BN is batch normalization, and σ is the Sigmoid activation function.
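A minimal PyTorch sketch of the two attention submodules and the combined GAM block described in substeps 1 and 2 is given below. The reduction ratio of 4 and the ReLU activations between the two linear layers and between the two 7 × 7 convolutions are assumptions not spelled out in the formulas above, and the class and argument names are illustrative only.

```python
# Sketch of the GAM attention block (assumed reduction ratio and activations).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),   # W_0
            nn.ReLU(inplace=True),                        # assumed activation
            nn.Linear(channels // reduction, channels),   # W_1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x.permute(0, 2, 3, 1)           # P: move channels to the last dimension
        y = self.mlp(y)                     # W_1(W_0(.)) applied at every position
        y = y.permute(0, 3, 1, 2)           # P': restore (B, C, H, W)
        return x * torch.sigmoid(y)         # F_2 = M_c(F_1) (x) F_1

class SpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=7, padding=3),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),          # assumed activation
            nn.Conv2d(mid, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.body(x))   # F_3 = M_s(F_2) (x) F_2

class GAMAttention(nn.Module):
    """Channel attention followed by spatial attention, as a drop-in block.
    In YOLOv5s.yaml such a block corresponds to an entry like
    [-1, 1, GAMAttention, [n, n]] at the layers named in the text
    (input and output channel counts are equal there)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.channel = ChannelAttention(in_channels)
        self.spatial = SpatialAttention(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))
```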
Step 4: replace nearest-neighbour interpolation upsampling with CARAFE in the neck network
The upsampling operation can be expressed, at every position, as a dot product between an upsampling kernel and the pixels in the corresponding neighbourhood of the input feature map; this is called feature reassembly. CARAFE has a larger receptive field during reassembly, can guide the reassembly process according to the input features, expresses the semantic information extracted by the preceding convolutions better, and therefore achieves a better sampling effect on blurred vehicles.
In this embodiment, the original nn.Upsample nearest-neighbour interpolation at layers 12 and 16 of the neck network is replaced with CARAFE upsampling in the YOLOv5 source-code file YOLOv5s.yaml. The modified upsampling method proceeds as follows:
Substep 1: feature-map channel compression
For an input feature map of shape H × W × C (the output feature map of the previous layer), the number of channels is first compressed to C_m by a 1 × 1 convolution; the main purpose of this step is to reduce the computation of the subsequent steps.
Substep 2: content encoding and upsampling-kernel prediction
Let the upsampling kernel size be k_up × k_up (a larger upsampling kernel means a larger receptive field but also more computation). In this embodiment a different upsampling kernel is used for each position of the output feature map, so the shape of the kernels to be predicted is σH × σW × k_up × k_up, where σ is the upsampling factor. For the input feature map compressed in substep 1, a convolutional layer with kernel size k_encoder × k_encoder predicts the upsampling kernels, with C_m input channels and σ²·k_up² output channels; the channel dimension is then unfolded along the spatial dimensions to obtain upsampling kernels of shape σH × σW × k_up².
Substep 3: upsampling-kernel normalization
The upsampling kernels obtained in substep 2 are normalized with softmax so that the weights of each kernel sum to 1.
Substep 4: feature reassembly
Each position in the output feature map is mapped back to the input feature map, the k_up × k_up region centred on it is extracted, and its dot product with the upsampling kernel predicted for that position in substep 2 gives the output value. Different channels at the same position share the same upsampling kernel.
The detailed structure of the kernel prediction and feature reassembly is shown in fig. 4.
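A compact sketch of the upsampler described in substeps 1-4 is given below; σ = 2, k_up = 5, k_encoder = 3 and C_m = 64 are assumed defaults, and the official CARAFE implementation is more optimized, but the logic is the same: predict a k_up × k_up kernel for every output position, softmax-normalize it, and reassemble each output value as a weighted sum over the corresponding k_up × k_up input neighbourhood, with all channels at a position sharing one kernel.

```python
# Sketch of CARAFE upsampling (assumed default hyper-parameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    def __init__(self, channels: int, c_mid: int = 64, scale: int = 2,
                 k_up: int = 5, k_enc: int = 3):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)                     # substep 1
        self.encoder = nn.Conv2d(c_mid, scale * scale * k_up * k_up,      # substep 2
                                 k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        kernels = self.encoder(self.compress(x))               # (B, s^2*k^2, H, W)
        kernels = F.pixel_shuffle(kernels, self.scale)          # (B, k^2, sH, sW)
        kernels = F.softmax(kernels, dim=1)                     # substep 3
        # substep 4: gather each k_up x k_up neighbourhood of the input and
        # take its dot product with the kernel predicted for the output position.
        unfolded = F.unfold(x, self.k_up, padding=self.k_up // 2)      # (B, C*k^2, H*W)
        unfolded = unfolded.view(b, c, self.k_up * self.k_up, h, w)
        unfolded = F.interpolate(   # map each output position back to the input grid
            unfolded.view(b, c * self.k_up * self.k_up, h, w),
            scale_factor=self.scale, mode="nearest",
        ).view(b, c, self.k_up * self.k_up, h * self.scale, w * self.scale)
        return (unfolded * kernels.unsqueeze(1)).sum(dim=2)      # (B, C, sH, sW)
```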
Step 5, replacing the Loss function with SnIoU-Loss
Where vehicles are dense, the prediction boxes of different but closely spaced vehicles may be removed by NMS. SIoU-Loss introduces the regression vector angle so that prediction boxes converge faster and are less likely to be suppressed by NMS; fusing an exponent n further improves the accuracy.
Substep 1: compute the angle loss
The model first tries to bring the prediction box onto either the horizontal X axis or the vertical Y axis of the ground-truth box (whichever is closer) and then continues the approach along that axis. During convergence, if α ≤ π/4 the model tries to minimize α first, otherwise it minimizes β = π/2 − α. The angle-cost calculation is illustrated in fig. 5:
Λ = 1 − 2·sin²(arcsin(c_h/σ) − π/4),
where c_h is the height difference between the centre points of the ground-truth box and the prediction box, σ is the distance between the two centre points, and arcsin(c_h/σ) equals the angle α.
Substep 2: compute the distance loss
Taking the above angle loss into account, the distance loss is redefined as
Δ = Σ_{t=x,y} (1 − e^(−γρ_t)),  with γ = 2 − Λ,
where
ρ_x = (c_w / c_w2)²,  ρ_y = (c_h / c_h2)².
Here (c_w, c_h) are the width and height differences between the centre points of the ground-truth box and the prediction box, and (c_w2, c_h2) are the width and height of the minimum enclosing rectangle of the ground-truth box and the prediction box.
Substep 3: compute the shape loss
The shape loss is defined as
Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ,
where
ω_w = |w − w_gt| / max(w, w_gt),  ω_h = |h − h_gt| / max(h, h_gt).
Here (w, h) and (w_gt, h_gt) are the width and height of the prediction box and the ground-truth box respectively, and θ controls how much attention is paid to the shape loss. To avoid paying excessive attention to the shape loss and thereby restricting the movement of the prediction box, a genetic algorithm is used to compute θ, which turns out to be close to 4, so θ is defined as a parameter in the range [2, 6].
Substep 4: compute the SnIoU-Loss
L_SnIoU = 1 − IoU^n + (Δ + Ω) / 2,
where IoU, illustrated in fig. 6, is the ratio of the intersection to the union of the prediction box and the ground-truth box. Taking n = 3 typically enlarges the gradient and accelerates convergence.
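The following sketch assembles substeps 1-4 into a single loss function, written against the standard SIoU definitions; the placement of the exponent n on the IoU term is an assumption made for illustration, since the original gives the final formula only as an image, and the function name is likewise illustrative. Boxes are (x1, y1, x2, y2) tensors of shape (N, 4).

```python
# Sketch of SnIoU-Loss following substeps 1-4 (exponent placement assumed).
import math
import torch

def sniou_loss(pred, target, theta: float = 4.0, n: int = 3, eps: float = 1e-7):
    # IoU: ratio of intersection to union of prediction and ground-truth boxes
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # substep 1: angle loss (arcsin(c_h / sigma) equals the angle alpha)
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    s_cw, s_ch = cx2 - cx1, cy2 - cy1                    # centre offsets c_w, c_h
    sigma = torch.sqrt(s_cw ** 2 + s_ch ** 2) + eps      # centre-point distance
    angle = 1 - 2 * torch.sin(torch.arcsin(torch.abs(s_ch) / sigma) - math.pi / 4) ** 2

    # substep 2: distance loss, using the enclosing-box width/height c_w2, c_h2
    cw2 = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch2 = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    gamma = 2 - angle
    rho_x, rho_y = (s_cw / (cw2 + eps)) ** 2, (s_ch / (ch2 + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # substep 3: shape loss
    omega_w = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    omega_h = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    # substep 4: SnIoU-Loss
    return 1 - iou ** n + (dist + shape) / 2
```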
Step 6: the UA-DETRAC training set is input into the improved YOLOv5-GCS model for training to obtain a weight file, and the UA-DETRAC test set is then tested with this weight file to obtain the final result.
The official test set groups all vehicles into a single class. Faster R-CNN generates region proposals as its first stage, which usually yields higher recognition accuracy, but generating a large number of candidate boxes greatly reduces the efficiency of the system. RN-VID addresses the blur problem with the help of optical flow and future frames, but computing optical flow and relying on future information makes online detection difficult. CenterNet detects an object as a point and cannot draw a complete detection box when vehicles occlude each other severely. YOLOv5 balances speed and accuracy and allows real-time target detection. Table 1 lists the average precision of these models at an IoU threshold of 0.7, with the proposed model reaching the highest accuracy on the single class.
TABLE 1 Average precision of different models
[Table 1 is provided only as an image in the original publication.]
Traffic policies differ for different vehicle driving behaviours, so the categories were split into car, van, bus and others according to the original labels, and the model was retrained. To show the effect of the model more intuitively, a confusion-matrix heatmap of the test results and an ablation table for some of the modules are given below. FIG. 7 is the confusion-matrix heatmap of the test results of the proposed YOLOv5-GCS model, in which the colour depth of each square represents the prediction rate. As can be seen from fig. 7, because "others" covers many kinds of vehicles (for example police cars, construction vehicles and trucks) and the number of car samples is very large, "others" is easily misclassified as "car". The rest of the confusion matrix shows that the proposed model has good prediction performance. Table 2 shows that the detection accuracy of the proposed model is better than that of the original YOLOv5 model, further demonstrating its superiority.
TABLE 2 Ablation experiments for the modules
[Table 2 is provided only as an image in the original publication.]
The invention uses the object detection model YOLOv5.1 (YOLOv5 for short), which is currently popular and has been iterated continuously since its release, to replace traditional models such as HOG and DPM, so that vehicles can be detected efficiently under different backgrounds and viewing angles. The invention integrates a GAM attention mechanism into the YOLOv5 backbone network, which amplifies global cross-dimension interaction features while reducing information dispersion. In the neck network, the attention module (GAM) is combined with content-aware feature reassembly upsampling (CARAFE): the reassembly kernel is predicted from the content of the underlying feature map, features are reassembled within a predefined neighbourhood, and global weight information is then learned from features of different scales and fused efficiently. In addition, the invention proposes a loss function, SnIoU-Loss, based on SIoU-Loss; by introducing the regression vector angle and the exponent n, it greatly improves the convergence process and the training result, addressing the insufficient detection accuracy of the prior art and yielding the improved YOLOv5-GCS detection algorithm. With this deep-learning-based vehicle target detection method, the problems of occluded and blurred targets and poor detection accuracy in the prior art can be solved.
The invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited to it. Therefore, structural modes and embodiments similar to this technical solution that are designed by a person skilled in the art in light of this teaching, without departing from the spirit of the invention and without inventive effort, shall fall within the protection scope of the invention.

Claims (7)

1. A vehicle target detection method fusing GAM, CARAFE and SnIoU, characterized by comprising the following steps:
step 1, acquiring a vehicle target detection dataset;
step 2, preprocessing the dataset images;
step 3, constructing the detection network and introducing GAM modules into the backbone and neck networks of YOLOv5;
step 4, using CARAFE to replace nearest-neighbour interpolation upsampling in the neck network of YOLOv5;
step 5, replacing the loss function with SnIoU-Loss;
and step 6, inputting the training set into the improved YOLOv5-GCS model for training to obtain a weight file, and then testing the test set with the weight file to obtain the final result.
2. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to claim 1, characterized in that: the vehicle target detection dataset in step 1 needs to be converted into a format suitable for YOLOv5 training, and in step 2 data augmentation is applied according to the characteristics of the dataset.
3. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to claim 2, characterized in that: in step 3, a GAM module entry [-1, 1, GAMAttention, [n, n]] is added to the backbone and neck networks, i.e. layers 9, 19, 23 and 27, in the YOLOv5 source-code file YOLOv5s.yaml, where -1 indicates that the layer takes its input from the output of the previous layer, 1 is the number of repetitions of the layer, and [n, n] indicates that the numbers of input and output channels are both n; the channel count differs between layers.
4. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to claim 3, characterized in that: the GAM module introduced in step 3 consists of two sequential parts, the first being channel attention and the second spatial attention, specifically:
substep 1: the input vector F_1 ∈ R^(C×H×W) obtained from the preceding convolutional layer is transformed into the vector F_2 by the channel attention M_c;
substep 2: the vector F_2 ∈ R^(C×H×W) is transformed into the vector F_3 by the spatial attention M_s.
5. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to any one of claims 1 to 4, characterized in that: in step 4, in the YOLOv5 source-code file YOLOv5s.yaml, the original nn.Upsample nearest-neighbour interpolation in the neck network, i.e. layers 12 and 16, is replaced with CARAFE upsampling.
6. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to claim 5, characterized in that the modified upsampling method proceeds as follows:
substep 1: for an input feature map of shape H × W × C, the number of channels is first compressed to C_m by a 1 × 1 convolution;
substep 2: the upsampling kernel to be predicted is given the shape σH × σW × k_up × k_up, where σ is the upsampling factor; for the input feature map compressed in substep 1, a convolutional layer with kernel size k_encoder × k_encoder predicts the upsampling kernels, with C_m input channels and σ²·k_up² output channels; the channel dimension is then unfolded along the spatial dimensions to obtain upsampling kernels of shape σH × σW × k_up²;
substep 3: the upsampling kernels obtained in substep 2 are normalized with softmax so that the weights of each kernel sum to 1;
substep 4: each position in the output feature map is mapped back to the input feature map, the k_up × k_up region centred on it is extracted, and its dot product with the upsampling kernel predicted for that position in substep 2 gives the output value; different channels at the same position share the same upsampling kernel.
7. The vehicle target detection method fusing GAM, CARAFE and SnIoU according to claim 6, characterized in that the SnIoU-Loss in step 5 is computed as follows:
substep 1: compute the angle loss Λ;
substep 2: compute the distance loss Δ from the angle loss Λ;
substep 3: compute the shape loss Ω;
substep 4: compute the SnIoU-Loss as L_SnIoU = 1 − IoU^n + (Δ + Ω)/2, where IoU is the ratio of the intersection to the union of the prediction box and the ground-truth box, and n is a constant.
CN202211194651.XA 2022-09-29 2022-09-29 GAM, CARAFE and SnIoU fused vehicle target detection method Pending CN115588126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211194651.XA CN115588126A (en) 2022-09-29 2022-09-29 GAM, CARAFE and SnIoU fused vehicle target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211194651.XA CN115588126A (en) 2022-09-29 2022-09-29 GAM, CARAFE and SnIoU fused vehicle target detection method

Publications (1)

Publication Number Publication Date
CN115588126A true CN115588126A (en) 2023-01-10

Family

ID=84777936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211194651.XA Pending CN115588126A (en) 2022-09-29 2022-09-29 GAM, CARAFE and SnIoU fused vehicle target detection method

Country Status (1)

Country Link
CN (1) CN115588126A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning
CN116681983B (en) * 2023-06-02 2024-06-11 中国矿业大学 Long and narrow target detection method based on deep learning
CN117351353A (en) * 2023-10-16 2024-01-05 常熟理工学院 Crop pest real-time detection method and device based on deep learning and computer storage medium
CN117468084A (en) * 2023-12-27 2024-01-30 浙江晶盛机电股份有限公司 Crystal bar growth control method and device, crystal growth furnace system and computer equipment
CN117468084B (en) * 2023-12-27 2024-05-28 浙江晶盛机电股份有限公司 Crystal bar growth control method and device, crystal growth furnace system and computer equipment

Similar Documents

Publication Publication Date Title
Saha et al. Translating images into maps
CN107169421B (en) Automobile driving scene target detection method based on deep convolutional neural network
CN110414418B (en) Road detection method for multi-scale fusion of image-laser radar image data
CN110728200A (en) Real-time pedestrian detection method and system based on deep learning
CN110298257B (en) Driver behavior recognition method based on human body multi-part characteristics
CN115588126A (en) GAM, CARAFE and SnIoU fused vehicle target detection method
CN110379020A (en) A kind of laser point cloud painting methods and device based on generation confrontation network
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
CN115019043B (en) Cross-attention mechanism-based three-dimensional object detection method based on image point cloud fusion
CN111814863A (en) Detection method for light-weight vehicles and pedestrians
CN112070174A (en) Text detection method in natural scene based on deep learning
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN114842085A (en) Full-scene vehicle attitude estimation method
CN117975436A (en) Three-dimensional target detection method based on multi-mode fusion and deformable attention
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN111666988A (en) Target detection algorithm based on multi-layer information fusion
CN117111055A (en) Vehicle state sensing method based on thunder fusion
CN117115690A (en) Unmanned aerial vehicle traffic target detection method and system based on deep learning and shallow feature enhancement
CN115082869B (en) Vehicle-road cooperative multi-target detection method and system for serving special vehicle
CN117058641A (en) Panoramic driving perception method based on deep learning
CN116563825A (en) Improved Yolov 5-based automatic driving target detection algorithm
CN116109649A (en) 3D point cloud instance segmentation method based on semantic error correction
Nakano et al. Detection of objects jumping in front of car using deep learning
Hadi et al. Semantic instance segmentation in a 3D traffic scene reconstruction task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination