
CN112163602A - Target detection method based on deep neural network - Google Patents

Target detection method based on deep neural network

Info

Publication number
CN112163602A
CN112163602A
Authority
CN
China
Prior art keywords
target detection
deep neural
neural network
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010960423.3A
Other languages
Chinese (zh)
Inventor
李利荣
王子炎
熊炜
朱莉
巩朋成
张开
杨荻椿
艾美慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202010960423.3A priority Critical patent/CN112163602A/en
Publication of CN112163602A publication Critical patent/CN112163602A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of deep learning and machine vision and discloses a target detection method based on a deep neural network. The method comprises: acquiring an image set of target detection objects; preprocessing the image set to obtain a data set and constructing a training sample set from the data set; constructing a deep neural network comprising a feature extraction module, a feature fusion module, and a classification and regression module, wherein the feature extraction module is a new network structure, eSE-dResNet, combining a d-ResNet network with an eSENet module; training the deep neural network with the training sample set to generate a target detection model; and inputting the image of the object to be detected into the target detection model to obtain the target detection result. The invention solves the problems of heavy computation, long processing time, poor generalization ability and low recognition accuracy of target detection in the prior art, significantly improves the target detection effect, and is applicable to target detection under various adverse conditions.

Figure 202010960423

Description

Target detection method based on deep neural network
Technical Field
The invention relates to the technical field of deep learning and machine vision, in particular to a target detection method based on a deep neural network.
Background
With the rapid development of deep learning, the detection efficiency and accuracy of target detection, an important research direction in computer vision, have improved greatly. However, the detection effect of existing target detection methods is still unsatisfactory, and they cannot be applied to target detection under adverse conditions such as complicated image backgrounds, high environmental noise, low contrast and uneven illumination.
Taking the detection of train bottom parts as an example: as necessary conditions for train operation, the parts at the bottom of a train are among its most important components, and to ensure safe operation the components of an inbound train need to be routinely checked. Detection is generally carried out in one of two ways. The first is manual visual inspection of the important parts; however, with the rapid increase in the number of trains, long and monotonous manual inspection in the complex environment at the bottom of a train leads to visual fatigue, inattention or misperception, so detections are easily missed and the safe operation of the train may be affected.
The traditional target detection algorithm is mainly divided into three steps: region selection, feature extraction and classifier-based classification. The first step, region selection, locates the position of the target. Since the target may appear anywhere in the image and its size and aspect ratio are uncertain, the whole image is initially traversed with a sliding-window strategy, and different scales and aspect ratios must be set. Although this exhaustive strategy covers all possible positions of the target, its drawbacks are also evident: the time complexity is too high and too many redundant windows are generated, which seriously affects the speed and performance of the subsequent feature extraction and classification. In the second step, feature extraction, it is difficult to design a robust feature because of factors such as the diversity of target forms, illumination changes and backgrounds, yet the quality of the extracted features directly affects the accuracy of classification. In the third step, classification, the features extracted in the previous step are classified by a classifier, generally a support vector machine.
In summary, conventional target detection methods have several major problems: the sliding-window region selection strategy is untargeted, has high time complexity and produces redundant windows; manually designed features are not robust to diverse changes; and the collected photos differ greatly in image background, environmental noise, contrast and exposure, so it is difficult to realize target detection across multiple scenes with a single type of image processing technology.
Disclosure of Invention
The invention provides a target detection method based on a deep neural network, which solves the problems of heavy computation, long processing time, poor generalization ability and low recognition accuracy of target detection in the prior art.
The invention provides a target detection method based on a deep neural network, which comprises the following steps:
step 1, obtaining a target detection object image set;
step 2, preprocessing the target detection object image set to obtain a data set, and constructing a training sample set according to the data set;
step 3, constructing a deep neural network, wherein the deep neural network comprises a feature extraction module, a feature fusion module and a classification and regression module; the feature extraction module is a new network structure eSE-dResNet combining a d-ResNet network and an eSEnet module;
step 4, training the deep neural network by using the training sample set to generate a target detection model;
and step 5, inputting the image of the object to be detected into the target detection model to obtain a target detection result.
Preferably, in the step 2, the preprocessing the target detection object image set includes: cutting and correcting the original image; if the original images in the target detection object image set are consistent in width and unequal in height, maintaining the image width unchanged, and cutting the images at different heights, wherein the cutting correction is realized by adopting the following mode:
h = (w - h1) × n + (n - 1) × h1

where h and w represent the total length (height) and the width of the original picture, respectively, and h1 denotes the height of the excess rectangle left after n pictures are cut out.
Preferably, in the step 2, the preprocessing the target detection object image set further includes: expanding the cut and corrected data set to obtain an expanded data set; and marking the target contained in the target detection image in the expanded data set by using a marking tool.
Preferably, in step 3, the d-ResNet network is obtained by adding two cross-layer connections to the identity block in the original ResNet50 structure; the d-ResNet network performs a feature concatenation operation on the input of the first 1 × 1 convolution block, the output of the first 1 × 1 convolution block and the output of the 3 × 3 convolution block, and then takes the concatenated result as the input of the second 1 × 1 convolution block;
the eSENet module is embedded between the identity blocks and conv blocks in the d-ResNet network; and the eSENet module replaces the two fully-connected layers of the excitation part in the original SENet with a convolution layer with a convolution kernel size of 1.
Preferably, in step 3, the feature fusion module performs feature fusion of different dimensions by using a feature pyramid structure.
Preferably, in the step 3, the feature extraction module comprises i stages P1 to Pi in total, and the feature fusion module comprises i - j + 1 stages Ci to Cj in total;
a dimensionality-reduction operation is performed on the calculation result of stage Pi to obtain the calculation result of stage Ci; the intermediate result obtained by up-sampling the calculation result of stage Ci is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pi-1 to obtain the calculation result of stage Ci-1;
the intermediate result obtained by up-sampling the calculation result of stage Cm+1 is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pm to obtain the calculation result of stage Cm; wherein m ∈ [j, i-2].
Preferably, in the step 3, the classification and regression module includes: classifying sub-networks and regressing sub-networks;
obtaining a classification result through the classification sub-network, and obtaining prior frame coordinate change information through the regression sub-network; obtaining prior frame parameter information by using a k-means clustering algorithm, and obtaining predicted frame position information according to the prior frame parameter information and the coordinate change information of the prior frame; after a plurality of prediction frames are obtained, screening the prediction frames with the scores larger than a given threshold value, and obtaining the score information of the prediction frames; and carrying out non-maximum suppression processing by utilizing the position information of the prediction frame and the score information of the prediction frame to obtain positioning and classification result information.
Preferably, the classification sub-network comprises 4 convolutions of 256 dimensions and 1 convolution of N × K dimensions;
the regression subnetwork comprises 4 convolutions of 256 dimensions and 1 convolution of 4 xK dimensions;
wherein, K represents the number of prior frames possessed by the input feature layer, and N represents the number of types of the objects to be detected.
Preferably, in the step 4, the total loss function adopted by the target detection model includes a classification loss function and a regression loss function; the classification loss function adopts the Focal loss function, the regression loss function adopts the Smooth L1 loss function, and the total loss function is as follows:

Loss = FL(pt) + SmoothL1(x)

where Loss represents the total loss function, FL(pt) represents the classification loss function, and SmoothL1(x) represents the regression loss function.
Preferably, the classification loss function is as follows:

FL(pt) = -αt (1 - pt)^γ log(pt)

where αt represents a weight coefficient, (1 - pt)^γ denotes the adjustment coefficient, and pt represents the probability that a sample is predicted to be positive;

the definition of the regression loss function and its derivative form are as follows:

SmoothL1(x) = 0.5x²        if |x| < 1
SmoothL1(x) = |x| - 0.5    otherwise

d SmoothL1(x)/dx = x       if |x| < 1
d SmoothL1(x)/dx = ±1      otherwise

where x represents the difference between the predicted value and the true value.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
Firstly, a target detection object image set is obtained and preprocessed to obtain a data set, and a training sample set is constructed from the data set. The deep neural network that is then constructed comprises a feature extraction module, a feature fusion module, and a classification and regression module; the feature extraction module is a new network structure, eSE-dResNet, combining a d-ResNet network with an eSENet module. Next, the deep neural network is trained with the training sample set to generate a target detection model, and finally the image of the object to be detected is input into the target detection model to obtain the target detection result. By adopting a detection method based on a deep neural network, the invention can automatically learn target features, has strong generalization ability, and is applicable to target detection under adverse conditions such as complicated image backgrounds, high environmental noise, low contrast and uneven illumination.
Drawings
Fig. 1 is a flowchart of a target detection method based on a deep neural network according to embodiment 2 of the present invention;
fig. 2 is a schematic diagram of a prior frame in a target detection method based on a deep neural network according to embodiment 2 of the present invention;
fig. 3 is an overall structure diagram of a deep neural network corresponding to the target detection method based on the deep neural network according to embodiment 2 of the present invention;
fig. 4 is a schematic structural diagram of a feature extraction module in the deep neural network-based target detection method according to embodiment 2 of the present invention;
fig. 5 is a schematic structural diagram of an eSENet module in the deep neural network-based target detection method according to embodiment 2 of the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
the embodiment 1 provides a target detection method based on a deep neural network, which comprises the following steps:
step 1, obtaining an image set of a target detection object.
Step 2, preprocessing the target detection object image set to obtain a data set, and constructing a training sample set according to the data set.
Specifically, the preprocessing the target detection object image set includes: cutting and correcting the original image; if the original images in the target detection object image set are consistent in width and different in height, maintaining the image width unchanged, and cutting the images at different heights, wherein the cutting correction is realized by adopting the following method:
h = (w - h1) × n + (n - 1) × h1

where h and w represent the total length (height) and the width of the original picture, respectively, and h1 denotes the height of the excess rectangle left after n pictures are cut out.
The preprocessing the target detection object image set further comprises: expanding the cut and corrected data set to obtain an expanded data set; and marking the target contained in the target detection image in the expanded data set by using a marking tool.
Step 3, constructing a deep neural network, wherein the deep neural network comprises a feature extraction module, a feature fusion module and a classification and regression module; the feature extraction module is a new network structure eSE-dResNet combining a d-ResNet network and an eSEnet module.
The d-ResNet network is obtained by adding two cross-layer connections to the identity block in the original ResNet50 structure; the d-ResNet network performs a feature concatenation operation on the input of the first 1 × 1 convolution block, the output of the first 1 × 1 convolution block and the output of the 3 × 3 convolution block, and then takes the concatenated result as the input of the second 1 × 1 convolution block. The eSENet module is embedded between the identity blocks and conv blocks in the d-ResNet network; the eSENet module replaces the two fully-connected layers of the excitation part in the original SENet with a convolution layer with a convolution kernel size of 1.
The feature fusion module adopts a feature pyramid structure to perform feature fusion of different dimensions.
The feature extraction module comprises i stages P1 to Pi in total, and the feature fusion module comprises i - j + 1 stages Ci to Cj in total.
A dimensionality-reduction operation is performed on the calculation result of stage Pi to obtain the calculation result of stage Ci; the intermediate result obtained by up-sampling the calculation result of stage Ci is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pi-1 to obtain the calculation result of stage Ci-1.
The intermediate result obtained by up-sampling the calculation result of stage Cm+1 is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pm to obtain the calculation result of stage Cm, where m ∈ [j, i-2].
The classification and regression module includes: classifying sub-networks and regressing sub-networks; obtaining a classification result through the classification sub-network, and obtaining prior frame coordinate change information through the regression sub-network; obtaining prior frame parameter information by using a k-means clustering algorithm, and obtaining predicted frame position information according to the prior frame parameter information and the prior frame coordinate change information; after a plurality of prediction frames are obtained, screening out the prediction frames with the scores larger than a given threshold value, and obtaining score information of the prediction frames; and carrying out non-maximum suppression processing by utilizing the position information of the prediction frame and the score information of the prediction frame to obtain positioning and classification result information.
The classification sub-network comprises 4 convolutions of dimension 256 and 1 convolution of dimension nxk; the regression subnetwork comprises 4 convolutions of 256 dimensions and 1 convolution of 4 xK dimensions; wherein, K represents the number of prior frames possessed by the input feature layer, and N represents the number of the types of the targets to be detected.
Step 4, training the deep neural network by using the training sample set to generate a target detection model.
Specifically, the total loss function adopted by the target detection model comprises a classification loss function and a regression loss function; the classification loss function adopts the Focal loss function, the regression loss function adopts the Smooth L1 loss function, and the total loss function is as follows:

Loss = FL(pt) + SmoothL1(x)

where Loss represents the total loss function, FL(pt) represents the classification loss function, and SmoothL1(x) represents the regression loss function.
The classification loss function is as follows:

FL(pt) = -αt (1 - pt)^γ log(pt)

where αt represents a weight coefficient, (1 - pt)^γ denotes the adjustment coefficient, and pt represents the probability that a sample is predicted to be positive;

the definition of the regression loss function and its derivative form are as follows:

SmoothL1(x) = 0.5x²        if |x| < 1
SmoothL1(x) = |x| - 0.5    otherwise

d SmoothL1(x)/dx = x       if |x| < 1
d SmoothL1(x)/dx = ±1      otherwise

where x represents the difference between the predicted value and the true value.
Step 5, inputting the image of the object to be detected into the target detection model to obtain a target detection result.
The present invention will be further described below by taking the detection of train bottom parts as an example.
Example 2:
embodiment 2 provides a target detection method based on a deep neural network, and designs a new target detection model, which can quickly locate key components at the bottom of a train, realize multi-target classification of a plurality of key components such as axles, hooks and piston rods, reduce manual detection links, and improve detection efficiency. According to the complexity of the environment at the bottom of the locomotive, an improved d-ResNet network is designed on the basis of a residual error network ResNet50, and an eSEnet module is embedded in the improved d-ResNet network, so that the characteristic extraction performance is enhanced; meanwhile, the characteristic pyramid structure is adopted to perform characteristic fusion of different dimensions, so that the network can learn more abundant low-dimensional characteristics and high-dimensional characteristics, and vehicle bottom parts can be detected more accurately. The experimental result shows that the designed network model greatly improves the detection effect of the vehicle bottom part.
The flowchart of this embodiment is shown in fig. 1, and the specific steps are as follows:
step 1: and (6) data processing.
The data set used in this embodiment is provided by the local railway office. The original data set was collected with high-definition line-scan cameras erected at the edge of the rail; each picture is 2048 pixels wide and 29956-39956 pixels high, so the pictures cannot be directly input into the network for training, and the original data need to be cut and corrected. The clipping manner adopted by this embodiment is as follows:
h = (w - h1) × n + (n - 1) × h1

where h and w represent the total length (height) and the width of the original picture, respectively, and h1 denotes the height of the excess rectangle left after n pictures are cut out. This clipping manner is very simple and is suitable for pictures with a large aspect ratio.
The width of each picture is kept unchanged and the cuts are made along the height; for convenient calculation, the input pictures are uniformly cut to a size of 2048 × 4096. Because the overall data set is limited, the amount of data after cutting is insufficient and includes some pictures that contain no targets, so the data set is expanded from the original 5123 pictures to 11747 through geometric transformations such as translation, transposition, mirroring and rotation. The processed data are divided proportionally into a training set of 8037 pictures and a test set of 3710 pictures. The objects to be detected include five types: type-I axles, type-II axles, car logos, hooks and piston rods, and the targets contained in each picture are marked with a marking tool.
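As a concrete illustration of the cropping and expansion described above, the following minimal sketch (Python, using OpenCV and NumPy) splits a tall fixed-width image into fixed-height tiles and applies simple geometric augmentations. The tile height of 4096 matches this embodiment; the zero-padding of the last (excess) tile and the particular set of transformations are illustrative assumptions.

```python
import cv2
import numpy as np

def crop_tall_image(img, tile_h=4096):
    """Split a very tall image of fixed width into fixed-height tiles."""
    h, w = img.shape[:2]
    n = int(np.ceil(h / tile_h))
    tiles = []
    for i in range(n):
        tile = img[i * tile_h:(i + 1) * tile_h, :]
        if tile.shape[0] < tile_h:  # pad the last (excess) rectangle
            pad = np.zeros((tile_h - tile.shape[0], w, img.shape[2]), dtype=img.dtype)
            tile = np.vstack([tile, pad])
        tiles.append(tile)
    return tiles

def augment(tile):
    """Simple geometric expansion: horizontal/vertical mirroring and 180-degree rotation."""
    return [tile, cv2.flip(tile, 1), cv2.flip(tile, 0), cv2.rotate(tile, cv2.ROTATE_180)]
```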
Step 2: generating prior frames.
In order to improve the detection performance, 4 kinds of prior frames with different sizes suitable for the data set are obtained by using a k-means clustering algorithm before the deep neural network is trained, the sizes of the prior frames are adjusted according to different feature layers, and each feature layer can divide an input picture into grids corresponding to the length and the width of the feature layer.
It should be noted that the number of the prior frames can be adjusted for different detection objects. The types of the targets detected by the embodiment are only five, the shapes and the sizes are fixed, and for the characteristics of the data set of the embodiment, the embodiment adopts 4 prior frames.
Fig. 2 shows the arrangement of prior frames in different feature layers. Of the 5 output feature maps of the feature fusion module, only the last two feature layers are shown here because the other feature layers are too large. Fig. 2(a) shows the input picture, and fig. 2(b) and fig. 2(c) show the distribution of prior frames in the feature layers C6 and C7, respectively. The size of the C7 feature layer is 8 × 4, so the whole picture is divided into an 8 × 4 grid, and the 4 prior frames of different shapes obtained by clustering are then placed at the center of each grid cell; the other feature layers are handled in the same way.
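The prior-frame generation can be sketched as follows. This embodiment only states that k-means clustering is applied to the labelled box sizes, so the 1 − IoU distance used below (a common choice when clustering anchor shapes) and the initialization details are assumptions.

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (w, h) pairs when all boxes are aligned at the origin."""
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + (centers[:, 0] * centers[:, 1])[None, :] - inter
    return inter / union

def kmeans_priors(boxes_wh, k=4, iters=100, seed=0):
    """Cluster labelled box sizes into k prior (anchor) boxes."""
    rng = np.random.default_rng(seed)
    centers = boxes_wh[rng.choice(len(boxes_wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes_wh, centers), axis=1)   # nearest = highest IoU
        new = np.array([boxes_wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers   # k prior-frame (width, height) pairs
```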
Step 3: designing the loss function.
The model training phase needs to improve the overall performance of the model by minimizing the loss function. The loss function used in this embodiment is divided into two parts, including a classification loss function and a regression loss function, and the two are combined into a total loss metric in this embodiment.
The detection model designed in this embodiment is a single-stage detection model. Prior frames improve detection performance, but an imbalance between positive and negative samples and between easy and hard samples can also appear, so the Focal loss used by the RetinaNet network is taken as the classification loss function of the model. Compared with the cross-entropy loss function, Focal loss introduces a weight coefficient αt; adjusting αt reduces the influence of negative samples on training. At the same time, the coefficient (1 - pt)^γ is introduced to adjust the weights between easy-to-classify and hard-to-classify samples and to increase the contribution of hard samples to the loss value. The loss function is defined as follows:
FL(pt) = -αt (1 - pt)^γ log(pt)
wherein p istRepresents the probability that the sample is predicted to be positive, when the value of gamma is 2, alphatThe experimental results are optimal when the value is 0.25.
The regression loss function is the Smooth L1 loss function; the loss function and its derivative are defined as follows:

SmoothL1(x) = 0.5x²        if |x| < 1
SmoothL1(x) = |x| - 0.5    otherwise

d SmoothL1(x)/dx = x       if |x| < 1
d SmoothL1(x)/dx = ±1      otherwise
where x represents the difference between the predicted value and the true value. The Smooth L1 loss function limits the magnitude of the gradient and combines the advantages of the L1 and L2 losses, so the loss function is differentiable at 0 and the network is more robust. As the derivative formula shows, the gradient does not become excessive when the difference between the prediction frame and the ground-truth frame is large, and a sufficiently small gradient is guaranteed when the difference is small.
The overall loss function is shown below:

Loss = FL(pt) + SmoothL1(x)
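A minimal PyTorch sketch of this combined loss is given below. The values γ = 2 and αt = 0.25 follow this embodiment; the sigmoid-based formulation, the sum reduction and the equal-weight addition of the two terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred_logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss; targets is a 0/1 tensor of the same shape as pred_logits."""
    p = torch.sigmoid(pred_logits)
    pt = torch.where(targets > 0.5, p, 1.0 - p)
    alpha_t = torch.where(targets > 0.5,
                          torch.full_like(p, alpha), torch.full_like(p, 1.0 - alpha))
    return (-alpha_t * (1.0 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))).sum()

def total_loss(cls_logits, cls_targets, box_pred, box_targets):
    """Total loss = Focal classification loss + Smooth L1 regression loss."""
    cls = focal_loss(cls_logits, cls_targets)
    reg = F.smooth_l1_loss(box_pred, box_targets, reduction='sum')
    return cls + reg
```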
and 4, step 4: and inputting the data set into a deep neural network for training.
The training set obtained in step 1 is input into the network in batches for training. The data are trained for 50 epochs; because the picture size is very large and memory is limited, 2 pictures are input into the deep neural network at a time, and the number of iterations is 200000. The network uses an Adam optimizer, and the initial learning rate is set to 1 × 10⁻⁴.
The deep neural network framework is shown in fig. 3, and the whole deep neural network is divided into three modules:
(1) a feature extraction module:
In this embodiment, d-ResNet combined with eSENet, improved on the basis of ResNet50, is used as the feature extraction module; the module has 56 layers and is divided into 7 stages P1-P7 (see fig. 1). To increase the richness and accuracy of feature extraction, this embodiment adds two cross-layer connections to the identity block in the original ResNet50 structure, as shown in fig. 4. The original identity block consists of two 1 × 1 convolution blocks and one 3 × 3 convolution block; the modified identity block concatenates the input of the first 1 × 1 convolution block, the output of the first 1 × 1 convolution block and the output of the 3 × 3 convolution block (see the C connection in fig. 4), and the concatenated result is then used as the input of the second 1 × 1 convolution block. The subsequent convolution therefore operates on several different feature layers, which realizes enhanced, repeated extraction of different features and improves the overall effect; this structure is referred to as dense-ResNet (d-ResNet for short). In order to fully consider the relationship between feature channels and enable the network to extract more valuable features, an eSENet module is embedded in each identity block and conv block; the combination of d-ResNet and eSENet is shown in fig. 4.
The eSENet module is an improvement on SENet (Squeeze-and-Excitation Network). Like SENet, it is divided into a squeeze part and an excitation part and fuses feature channels by feature re-calibration. The squeeze part uses an adaptive global pooling operation to compress an input of dimension W × H × C to an output of dimension 1 × 1 × C, so that the output features fuse global information. SENet scales the feature dimension through two fully-connected layers: with a parameter r, the first fully-connected layer changes the C-dimensional input into a C/r-dimensional output, and the second fully-connected layer restores it to the initial dimension. The dimensionality reduction in this process causes information loss, so eSENet replaces the two fully-connected layers of the excitation part with a single convolution with kernel size 1, which reduces the information loss to a certain extent and at the same time reduces the computation, improving the operating efficiency of the deep neural network. The eSENet structure is shown in fig. 5.
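The modified identity block and the eSENet module can be sketched in PyTorch as follows. The channel widths, the placement of the activations and the omission of batch normalization are simplifying assumptions; the text fixes only the dense concatenation of the three branches and the single 1 × 1 convolution used for excitation.

```python
import torch
import torch.nn as nn

class ESE(nn.Module):
    """eSENet: squeeze by adaptive global pooling, excite with one 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        w = torch.sigmoid(self.fc(self.pool(x)))
        return x * w                                # channel-wise re-calibration

class DIdentityBlock(nn.Module):
    """d-ResNet identity block: concatenate the block input, the first 1x1 output
    and the 3x3 output before the second 1x1 convolution."""
    def __init__(self, channels, mid=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels + 2 * mid, channels, kernel_size=1)
        self.ese = ESE(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y1 = self.relu(self.conv1(x))
        y2 = self.relu(self.conv2(y1))
        cat = torch.cat([x, y1, y2], dim=1)         # the two added cross-layer connections
        out = self.ese(self.conv3(cat))
        return self.relu(out + x)                   # original residual shortcut
```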
(2) A feature fusion module:
The feature fusion module fuses the calculation results of the feature extraction module and enhances the detection effect of the deep neural network on objects of different sizes by adding feature channels with different resolutions and different semantic information. First, a dimensionality-reduction operation is performed on the calculation result of stage P7 to obtain C7, changing the feature dimension from 8 × 4 × 2048 to 8 × 4 × 256; C7 is then up-sampled, changing its dimension from 8 × 4 × 256 to 16 × 8 × 256; finally, a dimensionality-reduction operation is performed on P6, and the result is added to the up-sampled C7 to obtain C6. Similarly, the numbers of feature channels of P5-P3 are reduced to 256 by dimensionality reduction and then added to the up-sampled result of the layer above to obtain C5-C3. The feature fusion only adds cross-layer connections on top of the feature extraction module, so it improves the model's effect without increasing the number of parameters, and the small increase in computation is negligible.
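The fusion just described follows a feature-pyramid pattern; a compact PyTorch sketch is shown below. The 256-channel lateral 1 × 1 convolutions and the addition with the up-sampled upper layer mirror the dimensions given in the text, while the nearest-neighbour up-sampling and the dictionary interface are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, in_channels):     # e.g. {'P3': 256, 'P4': 512, ..., 'P7': 2048}
        super().__init__()
        self.lateral = nn.ModuleDict(
            {name: nn.Conv2d(c, 256, kernel_size=1) for name, c in in_channels.items()})

    def forward(self, feats):             # feats: {'P3': tensor, ..., 'P7': tensor}
        names = sorted(feats, reverse=True)          # P7, P6, ..., P3 (top-down)
        outs, prev = {}, None
        for name in names:
            lat = self.lateral[name](feats[name])    # dimensionality reduction to 256
            if prev is not None:                     # add the up-sampled upper layer
                lat = lat + F.interpolate(prev, size=lat.shape[-2:], mode='nearest')
            outs['C' + name[1:]] = lat               # C7 ... C3
            prev = lat
        return outs
```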
(3) Classification and regression:
The deeper a deep neural network is, the more spatial information is lost from the extracted features, which affects the effect of feature fusion; if the network is not deep enough, the semantic information of the extracted features is not rich enough and the detection effect on large targets is poor. Experiments show that the 5 feature layers adopted in this embodiment give the best detection effect.
After feature fusion, 5 feature layers of different sizes but the same dimension are obtained, and these 5 feature layers are processed by a classification sub-network and a regression sub-network to obtain the detection result. The classification sub-network comprises 4 convolutions of 256 dimensions and 1 convolution of N × K dimensions, where K is the number of prior frames of the input feature layer and N is the number of types of targets to be detected; the classification result is output after the features pass through the N × K convolution. The regression sub-network comprises 4 convolutions of 256 dimensions and 1 convolution of 4 × K dimensions, and its output is the coordinate change of each prior frame; combining the prior frames with these changes gives the position information of the prediction frames. After the classification and regression networks, a number of prediction frames are obtained; the prediction frames whose scores are larger than a given threshold are screened out, and NMS (non-maximum suppression) is performed using the position information and scores of these frames to obtain the final detection result.
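A sketch of the two sub-network heads and the post-processing step is given below. K = 4 priors and N = 5 classes come from this embodiment; the 3 × 3 kernels, the score threshold of 0.5 and the use of torchvision's NMS operator are assumptions for illustration.

```python
import torch.nn as nn
from torchvision.ops import nms

def head(out_dim):
    """Shared head pattern: 4 convolutions of 256 dims, then one out_dim convolution."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(256, out_dim, kernel_size=3, padding=1))
    return nn.Sequential(*layers)

K, N = 4, 5
cls_head = head(N * K)      # classification sub-network
reg_head = head(4 * K)      # regression sub-network

def postprocess(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Keep prediction frames above the score threshold, then apply NMS."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thr)
    return boxes[idx], scores[idx]
```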
The target detection method based on the deep neural network provided by the embodiment of the invention at least comprises the following technical effects:
(1) In traditional target detection algorithms, manually designed features are not robust to diverse changes; the detection algorithm based on a deep neural network can automatically learn the features of the target, has strong generalization ability, and can be applied to more scenes.
(2) The invention improves on the ResNet network by designing a d-ResNet network and combining it with an eSENet module as the feature extraction module. Compared with other feature extraction modules, this module introduces dense connections into the residual module, realizes enhanced, repeated extraction of different features, has better feature extraction performance, and adds little extra computation.
(3) An attention mechanism is introduced into the feature extraction module: fusion between feature channels uses a feature-weight re-calibration method in which the weights of the feature channels are learned and assigned by the network itself, so that the weights of useful feature channels are increased and the weights of weakly correlated feature channels are reduced.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. A target detection method based on a deep neural network is characterized by comprising the following steps:
step 1, obtaining a target detection object image set;
step 2, preprocessing the target detection object image set to obtain a data set, and constructing a training sample set according to the data set;
step 3, constructing a deep neural network, wherein the deep neural network comprises a feature extraction module, a feature fusion module and a classification and regression module; the feature extraction module is a new network structure eSE-dResNet combining a d-ResNet network and an eSEnet module;
step 4, training the deep neural network by using the training sample set to generate a target detection model;
and step 5, inputting the image of the object to be detected into the target detection model to obtain a target detection result.
2. The method for detecting the target based on the deep neural network of claim 1, wherein in the step 2, the preprocessing the target detection object image set comprises: cutting and correcting the original image; if the original images in the target detection object image set are consistent in width and different in height, maintaining the image width unchanged, and cutting the images at different heights, wherein the cutting correction is realized by adopting the following method:
h = (w - h1) × n + (n - 1) × h1

wherein h and w represent the total length (height) and the width of the original picture, respectively, and h1 denotes the height of the excess rectangle left after n pictures are cut out.
3. The method for detecting an object based on a deep neural network of claim 2, wherein in the step 2, the preprocessing the image set of the object detection object further comprises: expanding the cut and corrected data set to obtain an expanded data set; and marking the target contained in the target detection image in the expanded data set by using a marking tool.
4. The deep neural network-based target detection method according to claim 1, wherein in the step 3, the d-ResNet network is obtained by adding two cross-layer connections to the identity block in an original ResNet50 structure; the d-ResNet network performs a feature concatenation operation on the input of the first 1 × 1 convolution block, the output of the first 1 × 1 convolution block and the output of the 3 × 3 convolution block, and then takes the concatenated result as the input of the second 1 × 1 convolution block;
the eSENet module is embedded between the identity blocks and conv blocks in the d-ResNet network; the eSENet module replaces the two fully-connected layers of the excitation part in the original SENet with a convolution layer with a convolution kernel size of 1.
5. The method for detecting the target based on the deep neural network of claim 1, wherein in the step 3, the feature fusion module performs feature fusion of different dimensions by using a feature pyramid structure.
6. The deep neural network-based target detection method according to claim 1, wherein in the step 3, the feature extraction module comprises i stages P1 to Pi in total, and the feature fusion module comprises i - j + 1 stages Ci to Cj in total;
a dimensionality-reduction operation is performed on the calculation result of stage Pi to obtain the calculation result of stage Ci; the intermediate result obtained by up-sampling the calculation result of stage Ci is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pi-1 to obtain the calculation result of stage Ci-1;
the intermediate result obtained by up-sampling the calculation result of stage Cm+1 is added to the intermediate result obtained by performing a dimensionality-reduction operation on the calculation result of stage Pm to obtain the calculation result of stage Cm; wherein m ∈ [j, i-2].
7. The deep neural network-based target detection method of claim 1, wherein in the step 3, the classification and regression module comprises: classifying sub-networks and regressing sub-networks;
obtaining a classification result through the classification sub-network, and obtaining prior frame coordinate change information through the regression sub-network; obtaining prior frame parameter information by using a k-means clustering algorithm, and obtaining predicted frame position information according to the prior frame parameter information and the prior frame coordinate change information; after a plurality of prediction frames are obtained, screening out the prediction frames with the scores larger than a given threshold value, and obtaining the score information of the prediction frames; and carrying out non-maximum suppression processing by utilizing the position information of the prediction frame and the score information of the prediction frame to obtain positioning and classification result information.
8. The deep neural network-based object detection method of claim 7, wherein the classification sub-network comprises 4 convolutions of 256 dimensions and 1 convolution of nxk dimensions;
the regression subnetwork comprises 4 convolutions of 256 dimensions and 1 convolution of 4 xK dimensions;
wherein, K represents the number of prior frames possessed by the input feature layer, and N represents the number of the types of the targets to be detected.
9. The method for detecting the target based on the deep neural network of claim 1, wherein in the step 4, the total loss function adopted by the target detection model comprises a classification loss function and a regression loss function; the classification loss function adopts the Focal loss function, the regression loss function adopts the Smooth L1 loss function, and the total loss function is as follows:

Loss = FL(pt) + SmoothL1(x)

wherein Loss represents the total loss function, FL(pt) represents the classification loss function, and SmoothL1(x) represents the regression loss function.
10. The deep neural network-based object detection method of claim 9, wherein the classification loss function is as follows:

FL(pt) = -αt (1 - pt)^γ log(pt)

wherein αt represents a weight coefficient, (1 - pt)^γ denotes the adjustment coefficient, and pt represents the probability that a sample is predicted to be positive;

the definition of the regression loss function and its derivative form are as follows:

SmoothL1(x) = 0.5x²        if |x| < 1
SmoothL1(x) = |x| - 0.5    otherwise

d SmoothL1(x)/dx = x       if |x| < 1
d SmoothL1(x)/dx = ±1      otherwise

wherein x represents the difference between the predicted value and the true value.
CN202010960423.3A 2020-09-14 2020-09-14 Target detection method based on deep neural network Pending CN112163602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010960423.3A CN112163602A (en) 2020-09-14 2020-09-14 Target detection method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010960423.3A CN112163602A (en) 2020-09-14 2020-09-14 Target detection method based on deep neural network

Publications (1)

Publication Number Publication Date
CN112163602A true CN112163602A (en) 2021-01-01

Family

ID=73858002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010960423.3A Pending CN112163602A (en) 2020-09-14 2020-09-14 Target detection method based on deep neural network

Country Status (1)

Country Link
CN (1) CN112163602A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669343A (en) * 2021-01-04 2021-04-16 桂林电子科技大学 Zhuang minority nationality clothing segmentation method based on deep learning
CN112699900A (en) * 2021-01-05 2021-04-23 东北林业大学 Improved traffic sign identification method of YOLOv4
CN112801110A (en) * 2021-02-01 2021-05-14 中车青岛四方车辆研究所有限公司 Target detection method and device for image distortion correction of linear array camera of rail train
CN112801027A (en) * 2021-02-09 2021-05-14 北京工业大学 Vehicle target detection method based on event camera
CN112861989A (en) * 2021-03-04 2021-05-28 水利部信息中心 Deep neural network regression model based on density screening
CN113221795A (en) * 2021-05-24 2021-08-06 大连恒锐科技股份有限公司 Feature extraction, fusion and comparison method and device for shoe sample retrieval in video
CN113221947A (en) * 2021-04-04 2021-08-06 青岛日日顺乐信云科技有限公司 Industrial quality inspection method and system based on image recognition technology
CN113255837A (en) * 2021-06-29 2021-08-13 南昌工程学院 Improved CenterNet network-based target detection method in industrial environment
CN113421230A (en) * 2021-06-08 2021-09-21 浙江理工大学 Vehicle-mounted liquid crystal display light guide plate defect visual detection method based on target detection network
CN115121913A (en) * 2022-08-30 2022-09-30 北京博清科技有限公司 Method for extracting laser center line
CN115205184A (en) * 2021-04-09 2022-10-18 南京大学 Bottle cap flaw detection method and system
CN115998295A (en) * 2023-03-24 2023-04-25 广东工业大学 Blood fat estimation method, system and device combining far-near infrared light
CN117593593A (en) * 2024-01-18 2024-02-23 湖北工业大学 An image emotion classification method based on multi-scale semantic fusion under emotional gain
CN118135622A (en) * 2023-10-23 2024-06-04 来邦科技股份公司 Face recognition-based non-contact body temperature measurement method and system
CN118368146A (en) * 2024-06-19 2024-07-19 北京航空航天大学杭州创新研究院 A computer network intrusion detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214399A (en) * 2018-10-12 2019-01-15 清华大学深圳研究生院 A kind of improvement YOLOV3 Target Recognition Algorithms being embedded in SENet structure
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
CN111507199A (en) * 2020-03-25 2020-08-07 杭州电子科技大学 Method and device for detecting behavior of wearing a mask

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214399A (en) * 2018-10-12 2019-01-15 清华大学深圳研究生院 A kind of improvement YOLOV3 Target Recognition Algorithms being embedded in SENet structure
CN111126472A (en) * 2019-12-18 2020-05-08 南京信息工程大学 Improved target detection method based on SSD
CN111507199A (en) * 2020-03-25 2020-08-07 杭州电子科技大学 Method and device for detecting behavior of wearing a mask

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
G. HUANG: "Densely Connected Convolutional Networks", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, 9 September 2017 (2017-09-09), pages 2261 - 2269 *
Y. LE: "CenterMask: Real-Time Anchor-Free Instance Segmentation", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
Y. LE: "CenterMask: Real-Time Anchor-Free Instance Segmentation", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, 5 August 2020 (2020-08-05), pages 13903 - 13912 *
言有三: 《深度学习之人脸图像处理 核心算法与案例实战》 (Deep Learning for Face Image Processing: Core Algorithms and Case Studies), Beijing: China Machine Press, pages: 101 - 105 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669343A (en) * 2021-01-04 2021-04-16 桂林电子科技大学 Zhuang minority nationality clothing segmentation method based on deep learning
CN112699900A (en) * 2021-01-05 2021-04-23 东北林业大学 Improved traffic sign identification method of YOLOv4
CN112801110A (en) * 2021-02-01 2021-05-14 中车青岛四方车辆研究所有限公司 Target detection method and device for image distortion correction of linear array camera of rail train
CN112801027A (en) * 2021-02-09 2021-05-14 北京工业大学 Vehicle target detection method based on event camera
CN112801027B (en) * 2021-02-09 2024-07-12 北京工业大学 Vehicle target detection method based on event camera
CN112861989A (en) * 2021-03-04 2021-05-28 水利部信息中心 Deep neural network regression model based on density screening
CN113221947A (en) * 2021-04-04 2021-08-06 青岛日日顺乐信云科技有限公司 Industrial quality inspection method and system based on image recognition technology
CN115205184A (en) * 2021-04-09 2022-10-18 南京大学 Bottle cap flaw detection method and system
CN113221795A (en) * 2021-05-24 2021-08-06 大连恒锐科技股份有限公司 Feature extraction, fusion and comparison method and device for shoe sample retrieval in video
CN113221795B (en) * 2021-05-24 2024-05-14 大连恒锐科技股份有限公司 Method and device for extracting, fusing and comparing shoe sample features in video
CN113421230A (en) * 2021-06-08 2021-09-21 浙江理工大学 Vehicle-mounted liquid crystal display light guide plate defect visual detection method based on target detection network
CN113421230B (en) * 2021-06-08 2023-10-20 浙江理工大学 Visual detection method for defects of vehicle-mounted liquid crystal display light guide plate based on target detection network
CN113255837A (en) * 2021-06-29 2021-08-13 南昌工程学院 Improved CenterNet network-based target detection method in industrial environment
CN115121913A (en) * 2022-08-30 2022-09-30 北京博清科技有限公司 Method for extracting laser center line
CN115121913B (en) * 2022-08-30 2023-01-10 北京博清科技有限公司 Method for extracting laser central line
CN115998295A (en) * 2023-03-24 2023-04-25 广东工业大学 Blood fat estimation method, system and device combining far-near infrared light
CN118135622A (en) * 2023-10-23 2024-06-04 来邦科技股份公司 Face recognition-based non-contact body temperature measurement method and system
CN118135622B (en) * 2023-10-23 2025-05-30 来邦科技股份公司 Face recognition-based non-contact body temperature measurement method and system
CN117593593A (en) * 2024-01-18 2024-02-23 湖北工业大学 An image emotion classification method based on multi-scale semantic fusion under emotional gain
CN117593593B (en) * 2024-01-18 2024-04-09 湖北工业大学 Image emotion classification method for multi-scale semantic fusion under emotion gain
CN118368146A (en) * 2024-06-19 2024-07-19 北京航空航天大学杭州创新研究院 A computer network intrusion detection method and system

Similar Documents

Publication Publication Date Title
CN112163602A (en) Target detection method based on deep neural network
CN110532859B (en) Remote sensing image target detection method based on deep evolutionary pruning convolutional network
CN114202672B (en) A small object detection method based on attention mechanism
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110781924B (en) Side-scan sonar image feature extraction method based on full convolution neural network
CN113780211A (en) Lightweight aircraft detection method based on improved yolk 4-tiny
CN111860171B (en) Method and system for detecting irregular-shaped target in large-scale remote sensing image
CN113610144B (en) A vehicle classification method based on multi-branch local attention network
CN113139896B (en) Target detection system and method based on super-resolution reconstruction
CN110263712B (en) A Coarse and Fine Pedestrian Detection Method Based on Region Candidates
CN114241003B (en) All-weather lightweight high-real-time sea surface ship detection and tracking method
CN113505640B (en) A small-scale pedestrian detection method based on multi-scale feature fusion
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN110555877B (en) Image processing method, device and equipment and readable medium
CN112528862A (en) Remote sensing image target detection method based on improved cross entropy loss function
CN111340019A (en) Detection method of granary pests based on Faster R-CNN
CN112801182A (en) RGBT target tracking method based on difficult sample perception
CN116363610A (en) An improved YOLOv5-based detection method for aerial vehicle rotating targets
CN116523897A (en) Semi-supervised enteromorpha detection method and system based on transconductance learning
CN111815526B (en) Method and system for removing rain streaks in rainy images based on image filtering and CNN
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
CN112926552A (en) Remote sensing image vehicle target recognition model and method based on deep neural network
CN112906658A (en) Lightweight automatic detection method for ground target investigation by unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210101