CN116579992A - Small target bolt defect detection method for unmanned aerial vehicle inspection - Google Patents
- Publication number
- CN116579992A CN116579992A CN202310446386.8A CN202310446386A CN116579992A CN 116579992 A CN116579992 A CN 116579992A CN 202310446386 A CN202310446386 A CN 202310446386A CN 116579992 A CN116579992 A CN 116579992A
- Authority
- CN
- China
- Prior art keywords
- target
- data set
- small target
- aerial vehicle
- unmanned aerial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 80
- 238000007689 inspection Methods 0.000 title claims abstract description 57
- 230000007547 defect Effects 0.000 title claims abstract description 50
- 238000000605 extraction Methods 0.000 claims abstract description 47
- 238000012549 training Methods 0.000 claims abstract description 30
- 230000004927 fusion Effects 0.000 claims abstract description 6
- 230000007246 mechanism Effects 0.000 claims abstract description 5
- 238000000034 method Methods 0.000 claims description 31
- 238000010586 diagram Methods 0.000 claims description 14
- 230000008569 process Effects 0.000 claims description 11
- 230000001629 suppression Effects 0.000 claims description 4
- 230000002349 favourable effect Effects 0.000 claims description 3
- 230000002401 inhibitory effect Effects 0.000 claims description 3
- 238000002372 labelling Methods 0.000 claims description 2
- 238000005728 strengthening Methods 0.000 claims description 2
- 238000010276 construction Methods 0.000 claims 2
- 230000005540 biological transmission Effects 0.000 abstract description 25
- 238000005516 engineering process Methods 0.000 abstract description 5
- 238000012360 testing method Methods 0.000 description 9
- 230000006870 function Effects 0.000 description 6
- 238000013135 deep learning Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000011478 gradient descent method Methods 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/17—Terrestrial scenes taken from planes or by drones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Remote Sensing (AREA)
- Quality & Reliability (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a small target bolt defect detection method for unmanned aerial vehicle inspection, which comprises the following steps: (1) constructing a backbone network suitable for small target feature extraction; (2) constructing a global-local two-stage small target bolt defect detection model; (3) constructing a small target bolt defect data set based on unmanned aerial vehicle inspection images; (4) training the global-local two-stage small target bolt defect detection model on the unmanned aerial vehicle inspection picture data set; and (5) using the trained model to intelligently identify bolt defects in unmanned aerial vehicle inspection pictures. To address the small size of bolt defect targets and the difficulty of feature extraction in power transmission line inspection pictures, the invention combines local image extraction, feature fusion, attention mechanisms, and related techniques into a small target bolt defect detection method for unmanned aerial vehicle inspection.
Description
Technical Field
The invention belongs to the technical field of digital image recognition, and particularly relates to a small target bolt defect detection method for unmanned aerial vehicle inspection.
Background
Transmission line inspection is an important means of guaranteeing the reliable operation of the power system. Traditionally, inspectors observe through a telescope from below the tower or climb the tower to inspect it; as the power grid continues to expand, this manual mode has become increasingly unable to meet transmission line inspection requirements. In recent years, unmanned aerial vehicle (UAV) inspection has been widely adopted for transmission lines and has greatly improved inspection efficiency. In UAV inspection, the drone flies to designated locations under pilot control or along a preset route and photographs the transmission equipment. A large number of pictures are generated during UAV inspection, and combining computer vision technology with UAV inspection has effectively promoted the development of automated transmission line inspection.
Bolts serve as fastening components between transmission line connecting fittings. Defects such as missing bolts, rust, and loose nuts are widespread on transmission lines, and inspecting for them is an important task. Combining image recognition technology with unmanned aerial vehicle inspection to intelligently identify bolt defects greatly improves transmission line inspection efficiency and helps ensure the safety of the transmission line.
From the perspective of image recognition, bolts are physically small and constitute typical small targets. Deep learning-based models have made great progress in power image recognition, but these methods remain poorly suited to recognizing small-sized components. Because of limits on unmanned aerial vehicle positioning accuracy and endurance, it is often difficult to photograph small components on a transmission line closely during inspection. As a result, small hardware such as bolts occupies only a tiny area of the inspection picture, while background containing no equipment information occupies most of it. Computer vision algorithms typically first downsample the high-resolution inspection image to a fixed size, which loses a large amount of information and greatly increases the difficulty of detecting small targets such as bolt defects. If the original image is analyzed directly instead, the large invalid background area consumes enormous amounts of computing resources and time.
Therefore, analyzing the characteristics of transmission line bolt defects in depth and, in combination with existing deep learning technology, proposing an intelligent bolt defect recognition method for transmission line images that is suitable for unmanned aerial vehicle inspection is of great significance for improving transmission line inspection efficiency and promoting the intelligent development of inspection.
Disclosure of Invention
To solve the above problems, the invention provides a small target bolt defect detection method for unmanned aerial vehicle inspection. By intelligently analyzing transmission line inspection images captured by unmanned aerial vehicles, it can effectively identify transmission line bolt defects, provide a reference for power inspection personnel, and help ensure the reliability of power transmission.
The technical scheme adopted by the invention is as follows:
a small target bolt defect detection method for unmanned aerial vehicle inspection comprises the following steps:
step 1: constructing a backbone network suitable for small target feature extraction;
step 2: constructing a global-local two-stage small target bolt defect detection model;
step 3: constructing a small target bolt defect data set based on the unmanned aerial vehicle inspection image;
step 4: training a global-local two-stage small target bolt defect detection model on an unmanned aerial vehicle inspection picture data set;
step 5: and using the trained model to intelligently identify the bolt defects in the unmanned aerial vehicle inspection picture.
In the step 1, the specific implementation is as follows:
step 1.1: designing a small target bolt defect feature extraction network by taking ResNet as a main network;
step 1.2: A hybrid attention module is added to the feature extraction network. The module consists mainly of a channel attention layer and a spatial attention layer: the channel attention layer strengthens the feature maps that help characterize the target and suppresses the others, while the spatial attention layer strengthens the foreground region where the target is located in the feature map and suppresses background region information. With the hybrid attention module, foreground information is preserved as much as possible during feature extraction, improving the effectiveness of feature extraction;
step 1.3: A feature pyramid network is used in the feature extraction network for multi-scale feature fusion; by fusing features of different scales, information loss during feature extraction is reduced and the bolt defect detection accuracy is improved.
In the step 2, the specific implementation is as follows:
step 2.1: The global-local ultra-small target detection module consists of two branches: a global salient region detection branch and a local target detection branch. The global salient region detection branch takes the downsampled original image as input; after the feature extraction network, a region proposal network generates salient regions, and the corresponding pixel blocks are cropped from the original image according to the salient region coordinates. The cropped picture blocks are passed to the local target detection branch, where targets are identified and mapped back into the original image, and repeated results are removed by non-maximum suppression, yielding the ultra-small target recognition result;
step 2.2: In the global salient region detection branch, candidate targets are clustered with a k-means algorithm, which avoids the need for labeled data in the salient region generation stage;
step 2.3: The local target detection branch takes the foreground picture blocks extracted by the global salient region detection branch as input data, is designed with reference to the Faster RCNN network, and introduces a self-attention mechanism to optimize the RPN network.
In the step 3, the specific implementation is as follows:
step 3.1: In the implementation process, the bolt targets in the original inspection images are first labeled, and the original data set is then divided into 2 sub-data sets, where the first sub-data set is used for training the global salient region detection branch and the second sub-data set is used for training the local target detection branch;
step 3.2: The first sub-data set merges all labeled bolt target categories in the inspection pictures into a single target type, 'foreground', forming a target detection data set;
step 3.3: The second sub-data set is formed by randomly cropping the original data set into 640×640 small pictures, with target categories consistent with the original data set.
In the step 4, the specific implementation is as follows:
step 4.1: First, the method pre-trains the feature extraction network on the ImageNet 2012 public data set;
step 4.2: The global salient region detection branch and the local target detection branch are each initialized with the trained feature extraction network, and the two modules are trained on the first and second sub-data sets respectively.
To address the small size of bolt defect targets and the difficulty of feature extraction in transmission line inspection pictures, the invention provides a small target bolt defect detection method for unmanned aerial vehicle inspection that combines local image extraction, feature fusion, attention mechanisms, and related techniques. The global salient region detection branch locates feature-dense regions in the high-resolution image to obtain fine local images; the global-local detection network then detects pictures at different scales, and the global and local recognition results are fused with an improved non-maximum suppression method, achieving end-to-end recognition of small target bolt defects in transmission line unmanned aerial vehicle inspection pictures.
Drawings
FIG. 1 is a schematic diagram of a hybrid attention module configuration of the present invention;
FIG. 2 is a schematic diagram of a backbone network employing multi-scale feature fusion in accordance with the present invention;
FIG. 3 is a diagram of the overall structure of the global-local two-stage small target bolt defect detection model of the present invention;
FIG. 4 shows examples of the bolt defect categories according to the present invention.
Detailed Description
To facilitate understanding and practice of the invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described herein are for illustration and explanation only and are not intended to limit the invention.
The invention provides a small target bolt defect detection method for unmanned aerial vehicle inspection, which comprises the following steps:
step 1: constructing a backbone network suitable for small target feature extraction;
step 1.1: The invention designs an ultra-small target feature extraction network based on ResNet. The deep residual network (ResNet) is one of the most representative feature extraction networks in computer vision and mainly includes structures of different depths such as ResNet-50, ResNet-101, and ResNet-152. As the number of layers increases, the ability to extract deep features is further enhanced, but texture features are lost, which is unfavorable for small target detection, and the computation cost increases greatly. This embodiment builds the backbone network with ResNet-50; it should be understood that the use of other backbone networks also falls within the scope of this patent.
Step 1.2: After the convolutional neural network extracts features from the pixel information in the picture, a feature map of a certain depth is formed and used to characterize the target. In the ultra-small target detection task for transmission lines, the pixel information available to characterize the target to be detected is scarce, so background information dominates during global convolutional feature extraction and effective information is difficult to extract. A hybrid attention module is therefore added to the feature extraction network. It consists mainly of a channel attention layer and a spatial attention layer: the channel attention layer strengthens the feature maps that help characterize the target and suppresses the others, while the spatial attention layer strengthens the foreground region where the target is located and suppresses background region information. With the hybrid attention module, foreground information is preserved as much as possible during feature extraction, improving the effectiveness of feature extraction. A block diagram of the hybrid attention module is shown in FIG. 1.
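The patent does not give source code for the hybrid attention module; the following is a minimal PyTorch-style sketch assuming a CBAM-like layout (channel attention followed by spatial attention). The class names ChannelAttention, SpatialAttention, and HybridAttention and the reduction ratio are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: strengthens channels useful for the target, suppresses the rest."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # global average pooling over H, W
        mx = self.mlp(x.amax(dim=(2, 3)))       # global max pooling over H, W
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Spatial attention: emphasises foreground locations, suppresses background."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))
```

Under these assumptions, such a module would typically be inserted after a ResNet stage whose output later feeds the feature pyramid.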
Step 1.3: The convolutional neural network obtains feature maps of different scales by applying convolution and pooling operations to the original image. Experiments show that shallow feature maps have high resolution and fully retain the detail of the original picture, but represent the overall shape of objects poorly; deep feature maps contain rich semantic information after complex nonlinear transformations, but their low resolution loses the detail in the picture. For the ultra-small target detection task on transmission lines, shallow feature maps struggle to capture the overall shape of the target, while deep feature maps lose small-target pixel information through convolution, greatly reducing the usable information. A feature pyramid network is therefore adopted for multi-scale feature fusion, improving the accuracy of ultra-small target detection on transmission lines.
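As a companion sketch (an assumption, not the patent's exact design), a minimal top-down feature pyramid over ResNet-50 stage outputs might look as follows; the channel counts 512/1024/2048 correspond to ResNet-50 stages C3-C5.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Minimal top-down FPN: 1x1 lateral convs align channel depth, deeper maps are
    upsampled and added to shallower ones, then smoothed by 3x3 convs."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):
        # feats: [C3, C4, C5] from shallow to deep
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [sm(p) for sm, p in zip(self.smooth, laterals)]
```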
Step 2: constructing a global-local two-stage small target bolt defect detection model;
step 2.1: The global-local ultra-small target detection module consists of two branches: a global salient region detection branch and a local target detection branch. The global salient region detection branch takes the downsampled original image as input; after the feature extraction network, a region proposal network generates salient regions, and the corresponding pixel blocks are cropped from the original image according to the salient region coordinates. The cropped picture blocks are passed to the local target detection branch, where targets are identified and mapped back into the original image, and repeated results are removed by non-maximum suppression, yielding the ultra-small target recognition result. The overall structure of the global-local two-stage small target bolt defect detection model is shown in FIG. 3, and the inference flow is sketched below.
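A hedged Python sketch of this two-stage inference flow follows, assuming PyTorch tensors and torchvision's NMS. The helper names detect_small_targets, global_branch, and local_branch are hypothetical; the global branch is assumed to return salient-region boxes in original-image coordinates and the local branch to return (boxes, scores, labels) in patch coordinates.

```python
import torch
from torchvision.ops import nms

def detect_small_targets(image, global_branch, local_branch, score_thr=0.5, iou_thr=0.5):
    """image: CHW tensor of the original high-resolution inspection picture."""
    regions = global_branch(image)                         # stage 1: salient regions (N, 4)
    all_boxes, all_scores, all_labels = [], [], []
    for (x1, y1, x2, y2) in regions.tolist():
        patch = image[..., int(y1):int(y2), int(x1):int(x2)]   # crop the pixel block
        boxes, scores, labels = local_branch(patch)             # stage 2: detect inside the block
        boxes = boxes.clone()
        boxes[:, [0, 2]] += x1                                   # map back to original image
        boxes[:, [1, 3]] += y1
        all_boxes.append(boxes); all_scores.append(scores); all_labels.append(labels)
    if not all_boxes:
        return torch.empty(0, 4), torch.empty(0), torch.empty(0, dtype=torch.long)
    boxes = torch.cat(all_boxes); scores = torch.cat(all_scores); labels = torch.cat(all_labels)
    keep = scores > score_thr
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_thr)                      # remove duplicate detections
    return boxes[keep], scores[keep], labels[keep]
```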
Step 2.2: the global saliency region detection branch firstly downsamples an original picture to 800×800, a feature extraction network is utilized to obtain a feature map, and then an ultra-small target saliency region extraction module is designed based on a region suggestion network.
For the input feature map, taking a 25×25×256 feature map as an example, 25×25 anchor points are selected at equal intervals in the original image as the center points of candidate windows, and 9 candidate windows of different sizes {32×32, 32×64, 64×32, 64×64, 64×128, 128×64, 128×256, 256×128} are set at each center point. A binary label is assigned to each candidate window, so the label of each point can ultimately be represented by an 18-dimensional vector. In this module, if the overlap between a candidate window and any one target is larger than 70% of that target's area, the region is regarded as a salient region and the candidate box is given a positive label; if the overlap between the candidate window and every target is smaller than 30% of the target area, the region is regarded as background and the candidate box is given a negative label; the remaining candidate boxes are neither positive nor negative. Note that the feature extraction network in this model adopts a feature pyramid structure, so the 9 candidate boxes of different sizes are distributed across three feature maps of different scales for prediction.
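The 70%/30% labeling rule above measures overlap relative to each target's own area rather than IoU. A small NumPy sketch of that rule (the function name and array layout are assumptions) is:

```python
import numpy as np

def label_candidate_windows(windows, targets, pos_frac=0.7, neg_frac=0.3):
    """Assign +1 / -1 / 0 labels to candidate windows based on how much of each
    ground-truth target's area they cover. Both inputs are (N, 4) arrays of
    (x1, y1, x2, y2) boxes."""
    labels = np.zeros(len(windows), dtype=np.int64)   # 0 = neither positive nor negative
    target_area = (targets[:, 2] - targets[:, 0]) * (targets[:, 3] - targets[:, 1])
    for i, (wx1, wy1, wx2, wy2) in enumerate(windows):
        ix1 = np.maximum(wx1, targets[:, 0]); iy1 = np.maximum(wy1, targets[:, 1])
        ix2 = np.minimum(wx2, targets[:, 2]); iy2 = np.minimum(wy2, targets[:, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        coverage = inter / np.clip(target_area, 1e-6, None)  # fraction of each target covered
        if (coverage > pos_frac).any():
            labels[i] = 1          # salient region: covers most of at least one target
        elif (coverage < neg_frac).all():
            labels[i] = -1         # background
    return labels
```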
The loss function of the ultra-small target significance region extraction network is as follows.
where N_cls denotes the number of categories; the ultra-small target salient region extraction module distinguishes only the 2 categories foreground and background, so N_cls = 2. L_cls(p_i, p_i*) denotes the classification loss function; here L_cls(p_i, p_i*) measures the error between the predicted and actual values using cross entropy, which can be expressed specifically as follows.
where p_i* denotes the actual category and p_i denotes the predicted category.
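The formula images referenced above are not reproduced in this text. Based on the surrounding definitions (N_cls, L_cls, cross entropy), a standard form consistent with the description would be the following; this is a reconstruction, not the patent's own equation.

```latex
L = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i, p_i^{*}),
\qquad
L_{cls}(p_i, p_i^{*}) = -\left[\, p_i^{*} \log p_i + (1 - p_i^{*}) \log (1 - p_i) \,\right]
```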
During model training, because the number of candidate boxes is large and positive and negative samples are severely imbalanced, only 256 positive samples and 256 negative samples are selected from the feature map of each scale to participate in training. At test time, the 300 candidate boxes with the highest confidence are finally retained as foreground regions.
For each candidate box, the target position is characterized by its center point and the midpoints of its four sides, yielding 1500 anchor points in total. k-means clustering is applied to these 1500 anchor points to obtain N cluster centers; within each cluster, an anchor point is regarded as invalid if fewer than 3 anchor points lie within a distance of 64 from it. For the valid anchor points, the corresponding boundary is obtained, yielding the salient regions. The specific implementation is shown in Algorithm 1; a sketch of this step is given below.
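Algorithm 1 itself is not reproduced in this text. The following Python sketch illustrates one plausible reading of the grouping step (the value of N, here n_clusters, and the function name are assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

def salient_regions_from_boxes(boxes, n_clusters=10, radius=64.0, min_neighbors=3):
    """boxes: (M, 4) array of candidate boxes (x1, y1, x2, y2). Each box contributes
    5 anchor points: its centre and the midpoints of its 4 sides."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    points = np.concatenate([
        np.stack([cx, cy], 1),    # centre
        np.stack([cx, y1], 1),    # top midpoint
        np.stack([cx, y2], 1),    # bottom midpoint
        np.stack([x1, cy], 1),    # left midpoint
        np.stack([x2, cy], 1),    # right midpoint
    ])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
    regions = []
    for k in range(n_clusters):
        cluster = points[labels == k]
        # keep an anchor point only if at least `min_neighbors` others lie within `radius`
        dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=-1)
        valid = cluster[(dists < radius).sum(axis=1) - 1 >= min_neighbors]
        if len(valid):
            regions.append((valid[:, 0].min(), valid[:, 1].min(),
                            valid[:, 0].max(), valid[:, 1].max()))
    return regions
```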
Step 2.3: the local target detection branch takes a foreground picture block extracted by the global significance region detection branch as input data, uniformly resamples the picture block to 640 multiplied by 640, acquires a picture feature map through a feature extraction network, designs a target detection stage by referring to a fast RCNN network, and simultaneously introduces a self-attention mechanism to optimize an RPN network, wherein the overall structure of the local target detection module is shown as a local target detection branch in the figure 3.
First, self-attention semantic feature extraction branches are established on the conv_4 and conv_5 output features, shown by the blue and orange arrows in FIG. 3 respectively. Adding these semantic feature extraction branches preserves the correlation between pixels during picture downsampling. The semantic feature maps obtained from conv_4 and conv_5 are then concatenated; note that the conv_4 semantic feature map is first downsampled by average pooling to keep its dimensions consistent with the conv_5 semantic feature map. Finally, the semantic feature map is fused with the conv_proposal feature map and used for subsequent target detection.
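A hedged sketch of this semantic-branch idea is given below, using PyTorch's built-in multi-head attention as the self-attention layer. The exact layer shapes, channel counts, and fusion operator are not specified in this text, so the names SemanticBranch and fuse_semantic_features and the choice of concatenation for fusion are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticBranch(nn.Module):
    """Self-attention over a backbone feature map, treating each spatial position as a token.
    `channels` must be divisible by `heads`."""
    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # (b, h*w, c)
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(b, c, h, w)

def fuse_semantic_features(feat_c4, feat_c5, proposal_feat, branch_c4, branch_c5):
    """conv_4 semantics are average-pooled to conv_5 resolution (assumes conv_4 is exactly
    2x the spatial size of conv_5, as in ResNet), concatenated with the conv_5 semantics,
    and then fused here by concatenation with the proposal feature map, which is assumed
    to share the conv_5 spatial size."""
    s4 = F.avg_pool2d(branch_c4(feat_c4), kernel_size=2)
    s5 = branch_c5(feat_c5)
    semantic = torch.cat([s4, s5], dim=1)
    return torch.cat([semantic, proposal_feat], dim=1)
```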
The loss function of the local object detection module is as follows.
where N_cls denotes the number of categories; in the local target detection module the bolts are divided into 6 classes according to their state, and the background must be treated as an additional class, so N_cls = 7. L_cls(p_i, p_i*) denotes the classification loss function; here L_cls(p_i, p_i*) is computed with cross entropy and can be expressed by the following formula.
where p_i* denotes the actual category and p_i denotes the predicted category.
N_pos denotes the number of position coordinates; a rectangular box is used here to represent the target position, so N_pos = 4. L_pos denotes the position loss function; to accelerate model convergence, a CIoU (complete intersection over union) loss function is introduced to calculate the position loss, which can be expressed specifically by the following formula.
where t_i denotes the position of the predicted box and t_i* the position of the actual box; ρ denotes the Euclidean distance between the center point of the predicted box t_i and the center point of the actual box t_i*; c denotes the diagonal length of the smallest box covering both t_i and t_i*; and IoU denotes the intersection-over-union ratio of t_i and t_i*, which can be calculated by the following equation.
α is a balance coefficient and can be calculated by the following formula.
where v is a coefficient measuring the consistency between the aspect ratio of the predicted target and that of the actual target, and can be expressed by the following formula.
where w and h denote the width and height of the predicted target, and w* and h* denote the width and height of the actual target.
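The CIoU formula images are not reproduced in this text. For reference, the standard CIoU formulation that matches the symbols defined above (ρ, c, IoU, α, v, w, h) is, as a reconstruction:

```latex
L_{pos}(t_i, t_i^{*}) = 1 - \mathrm{IoU}(t_i, t_i^{*})
  + \frac{\rho^{2}(t_i, t_i^{*})}{c^{2}} + \alpha v,
\qquad
\mathrm{IoU}(t_i, t_i^{*}) = \frac{\lvert t_i \cap t_i^{*} \rvert}{\lvert t_i \cup t_i^{*} \rvert}

\alpha = \frac{v}{(1 - \mathrm{IoU}) + v},
\qquad
v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{*}}{h^{*}} - \arctan\frac{w}{h} \right)^{2}
```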
Step 3: constructing a small target bolt defect data set based on the unmanned aerial vehicle inspection image;
step 3.1: The main targets of the data set used in the invention are bolts at connecting fittings such as triangle yoke plates, adjusting plates, and wire clamps. According to their different visual forms, the bolt targets to be identified are divided into six types: (a) nut with pin, normal form; (b) double nuts, normal form; (c) nut with pin, pin missing; (d) double nuts, nut loosened; (e) double nuts, nut missing; (f) nut rusted. Examples of the various categories are shown in FIG. 4.
Step 3.2: in the embodiment, 1852 sample pictures with bolt defects are collected in total, 1482 sample pictures are randomly selected as a training set, 370 sample pictures are taken as a test set, and the sample pictures are marked according to the PASCAL VOC standard;
step 3.3: To make the data set better suited to the training tasks of the two stages of the method, the training set is further processed into two data sets;
data set A: all target categories are merged into a single target type, 'foreground'; the training and test sets contain 14635 targets in total, and this data set is used for training and testing the global salient region detection branch;
data set B: the data set is obtained by randomly clipping an original data set, the size of a clipped picture is 640 multiplied by 640, and two types of objects of A, B are considered to be far more than other four types of objects, so that if the clipped picture only comprises an A type object or a B type object, the picture is omitted. 5 pictures are randomly cut from each picture, 7410 pictures are obtained from a final training set, 1850 pictures are obtained from a test set, and a data set B is formed for training and testing of local target detection branches;
step 4: training a global-local two-stage small target bolt defect detection model on an unmanned aerial vehicle inspection picture data set;
step 4.1: In this embodiment, the feature extraction network, the global salient region detection branch, and the local target detection branch are trained separately; to help the model converge better on the data set of this embodiment, the feature extraction network is first pre-trained on the ImageNet data set to obtain initialization weights;
step 4.2: In this embodiment, the model structure designed for the local target detection branch is pre-trained on the ImageNet data set. The whole model is first randomly initialized and then trained on the pre-training data set for 40 epochs, with an initial learning rate of 0.001 that decays to 0.0001 after 32 epochs. Training uses mini-batch gradient descent with 64 pictures per batch; parameters are optimized by stochastic gradient descent, and momentum (set to 0.9) is used to accelerate convergence. After training, the trained parameters are saved to a weight file; a sketch of this schedule is given below;
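A minimal PyTorch sketch of this schedule, under the assumption that "0.001 decaying to 0.0001 after 32 epochs" refers to the learning rate, is:

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def build_pretraining_optimizer(model):
    """Mini-batch SGD with momentum 0.9; learning rate 1e-3 decayed by 10x after epoch 32."""
    optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9)
    scheduler = MultiStepLR(optimizer, milestones=[32], gamma=0.1)
    return optimizer, scheduler

# Usage sketch (batch size 64, 40 epochs):
# for epoch in range(40):
#     for images, targets in loader:       # DataLoader with batch_size=64
#         loss = model(images, targets)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
#     scheduler.step()
```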
step 4.3: When training the global salient region detection branch, the feature extraction network is first initialized with the pre-trained model and the other parameters are randomly initialized; the parameters of the first three layers of the network are frozen, and only the parameters of the last two layers and the head are trained. The model is trained on data set A for 15 epochs, with input pictures uniformly resampled to 800×800, an initial learning rate of 0.01, and a learning rate decay to 0.001 after 12 epochs. The rest of the optimization strategy is consistent with the pre-training. The local target detection branch is trained on data set B, with the same training procedure and hyper-parameter settings as the global salient region detection branch;
step 5: using a trained model to intelligently identify bolt defects in the unmanned aerial vehicle inspection picture;
step 5.1: To evaluate the effectiveness of the proposed method, comparative verification is performed against the Faster RCNN, SSD, and RetinaNet models, with the test results shown in Table 1. The results show that the method provided by the invention performs far better than the other methods on the transmission line bolt defect detection task.
Table 1 comparison of model test results
It should be understood that parts of the specification not specifically set forth herein are all prior art.
It should be understood that the foregoing description of the preferred embodiments is relatively detailed and should not be taken as limiting the scope of the invention, which is defined by the appended claims; those skilled in the art may make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.
Claims (5)
1. A small target bolt defect detection method for unmanned aerial vehicle inspection is characterized in that:
comprises the following steps:
step 1: constructing a backbone network suitable for small target feature extraction;
step 2: constructing a global-local two-stage small target bolt defect detection model;
step 3: constructing a small target bolt defect data set based on the unmanned aerial vehicle inspection image;
step 4: training a global-local two-stage small target bolt defect detection model on an unmanned aerial vehicle inspection picture data set;
step 5: and using the trained model to intelligently identify the bolt defects in the unmanned aerial vehicle inspection picture.
2. The method for detecting the defects of the small target bolts for unmanned aerial vehicle inspection according to claim 1, wherein the method comprises the following steps: the construction of the backbone network suitable for small target feature extraction in the step 1 comprises the following specific steps:
step 1.1: designing a small target bolt defect feature extraction network by taking ResNet as a main network;
step 1.2: adding a hybrid attention module to the feature extraction network, wherein the module is divided mainly into a channel attention layer and a spatial attention layer, the channel attention layer strengthens feature maps that help characterize the target and suppresses other feature maps, the spatial attention layer strengthens the foreground region where the target is located in the feature map and suppresses background region information, and by designing the hybrid attention module, foreground information can be preserved as much as possible during feature extraction and the effectiveness of feature extraction is improved;
step 1.3: using a feature pyramid network in the feature extraction network for multi-scale feature fusion, so that by fusing features of different scales, information loss during feature extraction is reduced and the bolt defect detection accuracy is improved.
3. The method for detecting the defects of the small target bolts for unmanned aerial vehicle inspection according to claim 1, wherein the method comprises the following steps: the construction of the global-local two-stage small target bolt defect detection model in the step 2 comprises the following specific steps:
step 2.1: the global-local ultra-small target detection module consists of two branches: a global salient region detection branch and a local target detection branch; the global salient region detection branch takes the downsampled original image as input, and after the feature extraction network a region proposal network generates salient regions; corresponding pixel blocks are cropped from the original image according to the salient region coordinates, the cropped image blocks are passed to the local target detection branch, targets are identified in the image blocks and mapped back into the original image, and repeated results are removed through non-maximum suppression, thereby obtaining the ultra-small target recognition result;
step 2.2: in the global salient region detection branch, candidate targets are clustered with a k-means algorithm, which avoids the need for labeled data in the salient region generation stage; the algorithm implementation flow is as follows:
step 2.3: the local target detection branch takes the foreground picture blocks extracted by the global salient region detection branch as input data, is designed with reference to the Faster RCNN network, and introduces a self-attention mechanism to optimize the RPN network.
4. The method for detecting the defects of the small target bolts for unmanned aerial vehicle inspection according to claim 1, wherein the method comprises the following steps: the method for constructing the small target bolt defect data set based on the unmanned aerial vehicle inspection image comprises the following specific steps of:
step 3.1: in the implementation process, the bolt targets in the original inspection images are first labeled, and the original data set is then divided into 2 sub-data sets, where the first sub-data set is used for training the global salient region detection branch and the second sub-data set is used for training the local target detection branch;
step 3.2: the first sub-data set merges all labeled bolt target categories in the inspection pictures into a single target type, 'foreground', forming a target detection data set;
step 3.3: the second sub-data set is formed by randomly cropping the original data set into 640×640 small pictures, with target categories consistent with the original data set.
5. The method for detecting the defects of the small target bolts for unmanned aerial vehicle inspection according to claim 1, wherein the method comprises the following steps: the global-local two-stage small target bolt defect detection model provided by the invention is trained on an unmanned aerial vehicle inspection picture data set in the step 4, and the specific steps are as follows:
step 4.1: first, the method pre-trains the feature extraction network on the ImageNet 2012 public data set;
step 4.2: the global salient region detection branch and the local target detection branch are each initialized with the trained feature extraction network, and the two modules are trained on the first and second sub-data sets respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310446386.8A CN116579992A (en) | 2023-04-23 | 2023-04-23 | Small target bolt defect detection method for unmanned aerial vehicle inspection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310446386.8A CN116579992A (en) | 2023-04-23 | 2023-04-23 | Small target bolt defect detection method for unmanned aerial vehicle inspection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116579992A true CN116579992A (en) | 2023-08-11 |
Family
ID=87544503
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310446386.8A Pending CN116579992A (en) | 2023-04-23 | 2023-04-23 | Small target bolt defect detection method for unmanned aerial vehicle inspection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116579992A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883391A (en) * | 2023-09-05 | 2023-10-13 | 中国科学技术大学 | Two-stage distribution line defect detection method based on multi-scale sliding window |
CN116883391B (en) * | 2023-09-05 | 2023-12-19 | 中国科学技术大学 | Two-stage distribution line defect detection method based on multi-scale sliding window |
CN117237363A (en) * | 2023-11-16 | 2023-12-15 | 国网山东省电力公司曲阜市供电公司 | Method, system, medium and equipment for identifying external broken source of power transmission line |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |