
CN112257786A - Feature detection method based on combination of convolutional neural network and attention mechanism - Google Patents

Feature detection method based on combination of convolutional neural network and attention mechanism

Info

Publication number
CN112257786A
CN112257786A
Authority
CN
China
Prior art keywords
attention mechanism
feature map
feature
training
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011146098.3A
Other languages
Chinese (zh)
Inventor
柏杨
焦新峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Taliang Numeric Control Tech Co ltd
Original Assignee
Nanjing Taliang Numeric Control Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Taliang Numeric Control Tech Co ltd filed Critical Nanjing Taliang Numeric Control Tech Co ltd
Priority to CN202011146098.3A priority Critical patent/CN112257786A/en
Publication of CN112257786A publication Critical patent/CN112257786A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a feature detection method based on the combination of a convolutional neural network and an attention mechanism. The method combines the yolov3 network with an attention mechanism to construct a new feature map, trains the feature map using self-collected images as training samples, and detects images with the trained feature map, thereby improving detection precision.

Description

Feature detection method based on combination of convolutional neural network and attention mechanism
Technical Field
The invention relates to the technical field of computer image processing, and in particular to a feature detection method based on the combination of a convolutional neural network and an attention mechanism.
Background
Current target detection methods fall into two categories: traditional methods and deep learning methods. A traditional target detection pipeline has three main parts: sliding a window over the image, extracting features, and classifying with a classifier. Traditional target detection suffers from two main problems: the sliding-window region selection strategy has high time complexity and produces redundant candidate boxes, and hand-designed features leave the classifier with poor robustness, unable to adapt to variation in appearance. With the continuing development and application of deep learning in various fields, convolutional neural networks have gradually been applied to image processing as well. Convolutional neural networks perform well at image classification and at extracting image features, and they greatly improve detection accuracy and fineness compared with traditional target detection methods; yolov3 is one of the stronger such neural networks. yolov3 also detects considerably faster than other detection networks, but its detection accuracy is relatively low.
Disclosure of Invention
The invention aims to provide a feature detection method based on the combination of a convolutional neural network and an attention mechanism that improves detection precision.
In order to achieve the above object, the present invention provides a feature detection method based on a convolutional neural network in combination with an attention mechanism, comprising the following steps:
combining the yolov3 network with an attention mechanism to construct a new feature map;
training the feature map by using self-collected images as training samples;
detecting with the trained feature map, and outputting the detection result.
Combining the yolov3 network with an attention mechanism to construct a new feature map comprises the following steps:
inputting the obtained original feature map into the yolov3 network for multiple rounds of down-sampling, and squeezing the feature map obtained by convolution to obtain the corresponding global features.
Combining the yolov3 network with an attention mechanism to construct a new feature map further comprises:
exciting the global features, learning the relationship between channels, computing weights for the different channels, and multiplying the original feature map by the resulting weights to obtain the feature map.
Training the feature map using self-collected images as training samples comprises:
obtaining a plurality of self-collected images and dividing them into a training set and a test set at a ratio of 4:1.
Training the feature map using self-collected images as training samples further comprises:
collecting and annotating the training set, feeding it into the feature map, and training the feature map.
According to the feature detection method based on the combination of the convolutional neural network and the attention mechanism provided by the invention, the yolov3 network is combined with the attention mechanism to construct a new feature map, the feature map is then trained using self-collected images as training samples, and images are detected with the trained feature map, so that the detection precision can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the steps of a feature detection method based on a convolutional neural network combined with an attention mechanism according to the present invention.
Fig. 2 is the yolov3 network structure provided by the present invention.
Fig. 3 is a diagram of an attention mechanism provided by the present invention.
Fig. 4 is an attention mechanism principle provided by the present invention.
Fig. 5 is a comparison of the performance of the new neural network constructed by the present invention with that of the conventional yolov3 network.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, the present invention provides a feature detection method based on a convolutional neural network and attention mechanism, which includes the following steps:
and S101, combining the yolov3 network with an attention mechanism to construct a new feature map.
Specifically, yolov3 is selected as the master network for feature extraction and detection, and a new neural network is built by combining the yolov3 network with an attention module. The yolov3 feature extraction layer performs five rounds of down-sampling, each of which halves the preceding feature map, and an attention mechanism module (SE module) is added after the convolution layer. The SE module first performs a squeeze operation on the feature map obtained by convolution to obtain channel-level global features: for an original feature map of size H × W × C, where H is the height, W is the width and C is the number of channels, a global pooling layer squeezes the feature map to 1 × 1 × C, which amounts to compressing it to one dimension; after this compression, each of the resulting parameters has the receptive field of the whole original map, so the perceived region is wider. An excitation operation is then applied to the global features to learn the relationship between channels and obtain weights for the different channels: a fully connected layer is added after the squeezed 1 × 1 × C features to predict the importance of each channel, yielding importance weights for the different channels, which are then applied (excited) to the corresponding channels of the original H × W × C feature map for subsequent operations. Finally, the result is multiplied by the original feature map to obtain the final features, and the final result is output after global average pooling, a fully connected layer and feature weighting. In essence, the SE module performs an attention or gating operation in the channel dimension, which allows the model to focus more on the most informative channel features while suppressing less important ones. The new network obtained by combining darknet53 (the yolov3 feature extraction network) with the attention module extracts a more complete feature map as the input of the detection network.
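The channel attention described above follows the standard squeeze-and-excitation pattern; a minimal PyTorch sketch is given below for illustration. The reduction ratio of 16 inside the excitation step is an assumption (a common default) and is not specified in this disclosure.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention on an H x W x C feature map."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction is assumed
        super().__init__()
        # Squeeze: global average pooling compresses H x W x C to 1 x 1 x C.
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        # Excitation: fully connected layers predict the importance of each channel.
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # squeeze: (B, C, H, W) -> (B, C)
        w = self.excite(w).view(b, c, 1, 1)   # per-channel weights in (0, 1)
        return x * w                          # reweight the original feature map
```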
As shown in fig. 2, yolov3 uses a pre-trained darknet53 model to generate the initial feature maps, integrates context information with a feature pyramid structure, and generates the final prediction maps by convolving the feature maps of different layers. As shown in fig. 3 and 4, fig. 3 is a schematic diagram of a single squeeze-and-excitation operation. The squeeze operation is performed first: the original feature map is of size H × W × C, where H is the height, W is the width and C is the number of channels, and the global pooling layer squeezes it to 1 × 1 × C, which amounts to compressing it to one dimension; after this compression, each of the resulting parameters has the receptive field of the whole original map, so the perceived region is wider. The excitation operation is then performed: a fully connected layer is added to the squeezed 1 × 1 × C features to predict the importance of each channel, yielding importance weights for the different channels, which are then applied (excited) to the corresponding channels of the original H × W × C feature map for subsequent operations. The channel attention mechanism redistributes the weights of the outputs of the feature extraction layer, so the extracted features are more complete and more favorable to detection by the network.
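For illustration only, the sketch below (structure and layer sizes assumed, not the exact darknet53 configuration) shows how each of five down-sampling stages could be followed by the SEBlock defined in the previous sketch, halving the spatial size at every stage.

```python
import torch
import torch.nn as nn
# SEBlock from the previous sketch is assumed to be in scope.

def conv_se_stage(in_ch: int, out_ch: int) -> nn.Sequential:
    """One down-sampling convolution stage with channel attention appended."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
        SEBlock(out_ch),              # channel attention after the convolution
    )

# Five down-samplings: 416 -> 208 -> 104 -> 52 -> 26 -> 13 (input size assumed).
backbone = nn.Sequential(
    conv_se_stage(3, 64),
    conv_se_stage(64, 128),
    conv_se_stage(128, 256),
    conv_se_stage(256, 512),
    conv_se_stage(512, 1024),
)
features = backbone(torch.randn(1, 3, 416, 416))   # -> (1, 1024, 13, 13)
```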
S102, training the feature map by using self-collected images as training samples.
Specifically, the self-collected images are used as training samples to train the newly built neural network. The self-collected images are divided into a training set and a test set, the training data are collected and annotated, the images are used for training and testing, and the training data are normalized. The data in the invention comprise 4500 images, of which the training set contains 3600 and the test set contains 900. The training process is a process of adjusting parameters: suitable parameters are sought through continuous training to achieve a better prediction effect. The difference between this patent and common methods is that other training procedures pay no attention to the channel weights, whereas this patent trains the channel weights at the same time, improving prediction accuracy.
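A minimal sketch of the 4:1 split and normalization described above follows; the folder path, file extension and the [0, 1] pixel scaling are assumptions for illustration.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image

paths = sorted(Path("data/self_collected").glob("*.jpg"))  # hypothetical folder
random.seed(0)                      # fixed seed so the split is reproducible
random.shuffle(paths)
split = int(len(paths) * 0.8)       # 4:1 split; 4500 images -> 3600 train, 900 test
train_paths, test_paths = paths[:split], paths[split:]

def load_normalized(path: Path) -> np.ndarray:
    """Load an image and normalize pixel values to [0, 1]."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return img / 255.0
```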
S103, detecting with the trained feature map, and outputting the detection result.
The acquired data set is detected under the same training and test data and the same training parameters, and the detection effect of the neural network used in this method is compared and analyzed against that of yolov3 by comparing the mAP values and recall rates of the two networks, as shown in fig. 5, where (a) is the mAP value of yolov3, (b) is the mAP value of the network of the invention, (c) is the recall rate of yolov3, and (d) is the recall rate of the network of the invention. It can be seen that the mAP value and recall rate of the new neural network are both higher than those obtained by training with yolov3 alone, so the detection precision is improved. mAP is the mean of the per-class AP values, i.e., mean average precision. In machine learning, a prediction falls into one of four cases: true positive (TP), predicted positive and actually positive, a correct prediction; false negative (FN), predicted negative but actually positive, a wrong prediction; false positive (FP), predicted positive but actually negative, a wrong prediction; true negative (TN), predicted negative and actually negative, a correct prediction. In multi-class target detection, each class yields a curve of precision against recall; the area under the curve is the AP value, and the mean of the APs of all classes is the mAP.
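For illustration, the sketch below computes per-class AP as the area under the precision-recall curve and averages over classes to obtain mAP, following the definition above; the per-detection input format is an assumption, not part of the disclosure.

```python
import numpy as np

def average_precision(confidences, is_tp, num_gt):
    """AP for one class: area under the precision-recall curve."""
    order = np.argsort(confidences)[::-1]            # rank detections by confidence
    hits = np.asarray(is_tp, dtype=bool)[order]
    tp = np.cumsum(hits)                             # true positives so far
    fp = np.cumsum(~hits)                            # false positives so far
    recall = tp / num_gt                             # TP / (TP + FN)
    precision = tp / (tp + fp)                       # TP / (TP + FP)
    return float(np.trapz(precision, recall))        # area under the PR curve

def mean_average_precision(per_class):
    """mAP: mean of the per-class AP values."""
    return float(np.mean([average_precision(*c) for c in per_class]))

# Toy usage: one class with three detections and two ground-truth boxes.
print(mean_average_precision([([0.9, 0.8, 0.3], [True, False, True], 2)]))
```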
The invention builds a new neural network from the existing yolov3 network and an attention mechanism module. The advantage of yolov3 is that it uses a feature pyramid structure and fuses multi-scale features, achieving a good detection effect; the advantage of the attention mechanism is that it extracts more complete features, and inputting the extracted feature map into the detection network improves the detection precision. To fully combine the advantages of the two, the new neural network structure retains the feature pyramid structure and feeds the output of the new network into the detection network. Compared with yolov3, the invention shows a clear improvement in detection effect and in the various performance metrics.
According to the feature detection method based on the combination of the convolutional neural network and the attention mechanism provided by the invention, the yolov3 network is combined with the attention mechanism to construct a new feature map, the feature map is then trained using self-collected images as training samples, and images are detected with the trained feature map, so that the detection precision can be improved.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A feature detection method based on a convolutional neural network combined with an attention mechanism, characterized by comprising the following steps:
combining the yolov3 network with an attention mechanism to construct a new feature map;
training the feature map by using self-collected images as training samples;
detecting with the trained feature map, and outputting the detection result.
2. The feature detection method based on the combination of the convolutional neural network and the attention mechanism according to claim 1, wherein combining the yolov3 network with an attention mechanism to construct a new feature map comprises the following steps:
inputting the obtained original feature map into the yolov3 network for multiple rounds of down-sampling, and squeezing the feature map obtained by convolution to obtain the corresponding global features.
3. The feature detection method based on the combination of the convolutional neural network and the attention mechanism according to claim 2, wherein combining the yolov3 network with an attention mechanism to construct a new feature map further comprises:
exciting the global features, learning the relationship between channels, computing weights for the different channels, and multiplying the original feature map by the resulting weights to obtain the feature map.
4. The feature detection method based on the combination of the convolutional neural network and the attention mechanism according to claim 3, wherein training the feature map using self-collected images as training samples comprises:
obtaining a plurality of self-collected images and dividing them into a training set and a test set at a ratio of 4:1.
5. The feature detection method based on the combination of the convolutional neural network and the attention mechanism according to claim 4, wherein training the feature map using self-collected images as training samples further comprises:
collecting and annotating the training set, feeding it into the feature map, and training the feature map.
CN202011146098.3A 2020-10-23 2020-10-23 Feature detection method based on combination of convolutional neural network and attention mechanism Withdrawn CN112257786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011146098.3A CN112257786A (en) 2020-10-23 2020-10-23 Feature detection method based on combination of convolutional neural network and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011146098.3A CN112257786A (en) 2020-10-23 2020-10-23 Feature detection method based on combination of convolutional neural network and attention mechanism

Publications (1)

Publication Number Publication Date
CN112257786A true CN112257786A (en) 2021-01-22

Family

ID=74263299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011146098.3A Withdrawn CN112257786A (en) 2020-10-23 2020-10-23 Feature detection method based on combination of convolutional neural network and attention mechanism

Country Status (1)

Country Link
CN (1) CN112257786A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909666A (en) * 2019-11-20 2020-03-24 西安交通大学 Night vehicle detection method based on improved YOLOv3 convolutional neural network
US20200151448A1 (en) * 2018-11-13 2020-05-14 Adobe Inc. Object Detection In Images
CN111612751A (en) * 2020-05-13 2020-09-01 河北工业大学 Lithium battery defect detection method based on Tiny-yolov3 network embedded with grouping attention module
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210122