CN109325947A - A kind of SAR image steel tower object detection method based on deep learning - Google Patents
A kind of SAR image steel tower object detection method based on deep learning
- Publication number: CN109325947A
- Application number: CN201811100702.1A
- Authority: CN (China)
- Prior art keywords: size, layer, sample, window, layers
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
- G06N3/045: Neural networks; architecture; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06T2207/10032: Image acquisition modality; satellite or aerial image; remote sensing
- G06T2207/10044: Image acquisition modality; radar image
- G06T2207/20081: Special algorithmic details; training; learning
Abstract
The present invention relates to a deep-learning-based SAR image steel (iron) tower target detection method, comprising: randomly selecting several SAR images from an SAR data set and segmenting them to obtain sample slices; setting a sample label for each sample slice to construct a training sample set; processing each sample slice in the training sample set to generate multiple different artificial sample slices, which are labeled and added to the training sample set to expand it; constructing an SSD model; inputting the expanded training sample set into the constructed SSD model and training the SSD model by gradient descent; and cutting the data graph to be detected into multiple slices of the same size as the sample slices and inputting them into the trained SSD model to obtain the target detection results of the data graph. The present invention has the advantages of strong robustness, fast running speed, high detection performance and easy migration; it does not require high contrast between target and background during detection, and can be used for detection in complex scenes.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an SAR image iron tower target detection method based on deep learning.
Background
Spaceborne Synthetic Aperture Radar (SAR) is a microwave imaging radar. Because it is unaffected by weather and climate and can observe the earth around the clock, in all weather, at high resolution and over large areas, SAR is widely applied in fields such as military target detection, ocean monitoring, resource exploration, agriculture and forestry.
The Convolutional Neural Network (CNN) is a common deep feed-forward artificial neural network and one of the principal deep learning methods; it has been applied successfully in the field of computer vision. With the resurgence of deep learning, target detection methods based on deep learning have developed rapidly in recent years: the candidate-region-based R-CNN method and its improved versions Fast R-CNN and Faster R-CNN, as well as the end-to-end YOLO and SSD methods, have been proposed in turn and widely used. However, the Faster R-CNN model has poor real-time performance, and the YOLO model has poor accuracy.
Many target detection methods for SAR images have been developed. The constant false alarm rate (CFAR) detection method is widely applied to SAR image target detection because it is simple, fast and strongly real-time. Different target types present different signatures in SAR images, and correspondingly different detection methods have been proposed. However, these existing SAR image detection methods use only the statistical properties of local areas of the SAR image and can only perform pixel-level detection; they require high contrast between target and background, and while their detection performance is good in simple scenes, it is poor in complex scenes.
Disclosure of Invention
Technical problem to be solved
The invention aims to solve the technical problem that target detection methods for SAR images in the prior art require high contrast between the target and the background and perform poorly in complex scenes.
(II) technical scheme
In order to solve the technical problem, the invention provides a method for detecting an SAR image iron tower target based on deep learning, which is used for improving the detection performance under a complex scene and comprises the following steps:
s1, randomly extracting a plurality of SAR images from the SAR data set, segmenting to obtain sample slices, setting a sample label for each sample slice, and constructing a training sample set, wherein the sample label comprises coordinate information and category information of a target in the sample slice;
s2, processing each sample slice in the training sample set to generate a plurality of different artificial sample slices, which are given sample labels and added to the training sample set to expand it;
s3, constructing the SSD model, including:
15 layers of convolutional neural networks, which are used for preliminarily extracting the image characteristics of the input image;
8 layers of convolutional neural networks are used for further extracting image features of different scales;
the multi-scale detection network is used for detecting the extracted image characteristics with different scales;
s4, inputting the training sample set expanded in the step S2 as an input image into the SSD model constructed in the step S3, and training the SSD model by adopting a gradient descent method;
and S5, cutting the data graph to be detected into a plurality of slices to be detected, wherein the slices have the same size as the sample slice, and inputting the SSD model trained in the step S4 to obtain a target detection result of the data graph to be detected.
Preferably, the step S1 includes:
s1-1, randomly extracting 100 SAR images from the MiniSAR data set;
s1-2, obtaining a plurality of sample slices from each SAR image by random segmentation;
s1-3, adding a sample label to each sample slice, wherein the sample label comprises an absolute path of the sample slice, the number of targets, coordinates of a target frame and the type of the target;
and S1-4, forming the sample slices with the corresponding sample labels into a training sample set.
Preferably, in step S2, when each sample slice in the training sample set is processed to generate a plurality of different artificial sample slices, one or more of translation, flipping, rotation and noise addition are applied at random, and 100 different artificial sample slices are generated from each sample slice.
Preferably, the step S4 includes:
s4-1, inputting the expanded training sample set into an SSD model;
s4-2, calculating the cost function of the current SSD model on the input images by forward propagation;
s4-3, respectively calculating gradient values of the cost function to parameters in each convolution layer in the SSD model by using a back propagation method;
s4-4, updating the parameters in each convolution layer according to the gradient value of the cost function to the parameters in each convolution layer by using a gradient descent method;
and S4-5, performing steps S4-2 to S4-4 in a loop to update the SSD model, ending training when the number of iterations reaches a set value or the parameters of the convolutional layers in the SSD model are no longer updated, and saving the current SSD model.
Preferably, the step S5 includes:
s5-1, cutting the data graph to be detected, in a grid pattern, into a plurality of slices of a size similar to the sample slices, which serve as the slices to be detected;
s5-2, inputting the slices to be detected into the SSD model trained in the step S4, and performing target detection on each slice to be detected by using the SSD model to obtain a target detection result of each slice to be detected;
and S5-3, merging the target detection results of the slices to be detected at the corresponding positions of the original data graph to be detected to obtain the target detection results of the data graph to be detected.
Preferably, the input image size of the SSD model constructed in step S3 is 300 × 300 pixels;
the 15-layer convolutional neural network comprises:
a first group of convolutional layers: the 1st to 2nd convolutional layers, each using 64 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 64 feature maps of size 300 × 300;
a second group of convolutional layers: the 3rd to 4th convolutional layers, each using 128 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 128 feature maps of size 150 × 150;
a third group of convolutional layers: the 5th to 7th convolutional layers, each using 256 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 256 feature maps of size 75 × 75;
a fourth group of convolutional layers: the 8th to 10th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 38 × 38;
a fifth group of convolutional layers: the 11th to 13th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 19 × 19;
a sixth group of convolutional layers: the 14th convolutional layer, using 1024 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 1024 feature maps of size 19 × 19;
a seventh group of convolutional layers: the 15th convolutional layer, using 1024 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding, outputting 1024 feature maps of size 19 × 19;
the feature maps output by the 2nd, 4th, 7th, 10th and 13th convolutional layers are downsampled by a max-pooling layer using a pooling kernel of window size 2 × 2 with a stride of 2.
Preferably, the 8-layer convolutional neural network in step S3 comprises:
an eighth group of convolutional layers: the 16th to 17th convolutional layers; the 16th layer uses 256 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 17th layer uses 512 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the group outputs 512 feature maps of size 10 × 10;
a ninth group of convolutional layers: the 18th to 19th convolutional layers; the 18th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 19th layer uses 256 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the group outputs 256 feature maps of size 5 × 5;
a tenth group of convolutional layers: the 20th to 21st convolutional layers; the 20th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 21st layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the group outputs 256 feature maps of size 3 × 3;
an eleventh group of convolutional layers: the 22nd to 23rd convolutional layers; the 22nd layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 23rd layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the group outputs 256 feature maps of size 1 × 1.
Preferably, the multi-scale detection network in step S3 performs the following operations on the feature maps output by the 10th, 15th, 17th, 19th, 21st and 23rd convolutional layers, respectively:
normalizing the feature map;
extracting the coordinate information of the k default boxes at each position using a convolutional layer with 4k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein the index k = 1, 2, 3, 4, 5, 6 corresponds to the 10th, 15th, 17th, 19th, 21st and 23rd layers respectively;
extracting the confidence scores of the k default boxes for each class using a convolutional layer with (classes + 1) × k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein classes is the number of classes of targets to be detected;
and splicing the coordinate information extracted for each default box of each layer's feature map with the confidence scores to generate detection results.
Preferably, for the 10th, 15th, 17th, 19th, 21st and 23rd layers, the default box scale values are $s_1$ to $s_6$:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m],\ m = 6$$

where $s_k$ is the scale value of the default box, $s_{min} = 0.2$ and $s_{max} = 0.9$;

setting different aspect ratios $a_r$ for each layer's feature map and combining them with the layer's scale value $s_k$ gives the default box width $w_k^a$ and height $h_k^a$:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

wherein, for the 10th, 21st and 23rd layers, $a_r \in \{1, 2, 1/2\}$; for the 15th, 17th and 19th layers, $a_r \in \{1, 2, 3, 1/2, 1/3\}$; and for these six convolutional layers there is also a corresponding additional scale $s_k' = \sqrt{s_k s_{k+1}}$.
Preferably, when the multi-scale detection network in step S3 splices the coordinate information extracted for each default box of each layer's feature map with the confidence scores to generate detection results, all results with confidence scores below a confidence threshold are discarded first; then non-maximum suppression is used to retain the best detection results with higher confidence scores while suppressing suboptimal ones; finally, the screened coordinate information and confidence scores are spliced.
(III) advantageous effects
The technical scheme of the invention has the following advantages: the invention provides a method for detecting an iron tower target of an SAR image based on deep learning. Compared with the prior art, the invention has the advantages that:
(1) strong robustness
The invention adopts a multilayer convolutional neural network model that can fully extract high-level features of the target and exploits the translation invariance of the convolutional layers, so it is strongly robust to translations in the SAR image. Meanwhile, the invention adopts a multi-aspect-ratio detection method on multi-scale feature maps, so it is strongly robust to deformations in the SAR image.
(2) The running speed is high
The traditional CFAR detection method and the Faster R-CNN model both require two steps, generating suspicious targets and then identifying them, so their detection efficiency is low. The invention adopts an end-to-end training and detection method that integrates generation and identification, improving training and detection speed.
(3) High detection performance
The traditional CFAR detection method needs parameters set according to prior information about the image to be detected, and unreasonable parameter settings seriously degrade the detection result. The network model of the invention fully considers multi-aspect-ratio targets on all the multi-scale feature maps and avoids the unreasonable parameter settings of the CFAR detection method, so accuracy is improved.
(4) Easy migration
The lower layer parameters in the trained convolutional neural network reflect underlying features common to the images, such as edges, shapes, and the like. Thus, when it is desired to train a model that detects other targets, the training can be completed more quickly based on the transfer learning technique.
Drawings
Fig. 1 is a schematic diagram illustrating steps of a method for detecting an SAR image iron tower target based on deep learning in an embodiment of the invention;
FIG. 2 is a schematic view of an SSD model in a second embodiment of the present invention;
FIG. 3 is a diagram of data to be measured according to a third embodiment of the present invention;
fig. 4a and 4b are partial enlarged views of the detection results of fig. 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Example one
As shown in fig. 1, an SAR image iron tower target detection method based on deep learning provided in an embodiment of the present invention includes:
s1, randomly extracting a plurality of SAR images from the SAR data set, segmenting to obtain sample slices, setting a sample label for each sample slice, and constructing a training sample set, wherein the sample label comprises coordinate information and category information of a target in the sample slice.
Preferably, step S1 includes:
s1-1, randomly extracting 100 SAR images from the MiniSAR data set;
s1-2, obtaining a plurality of sample slices from each SAR image by random segmentation;
s1-3, adding a sample label to each sample slice, wherein the sample label comprises the absolute path of the sample slice, the number of targets, the target box coordinates (i.e. coordinate information) and the target class (i.e. class information); specifically, the sample label may take the form (an illustrative parsing sketch follows step S1-4 below):

<DIR> n x_t1 y_t1 x_b1 y_b1 c_1 x_t2 y_t2 x_b2 y_b2 c_2 … x_tn y_tn x_bn y_bn c_n

where <DIR> is the absolute path of the corresponding sample slice and n is the number of iron tower targets in the slice, followed by n groups of target information; each group comprises (x_ti, y_ti), (x_bi, y_bi) and c_i, where x_ti and y_ti are the coordinates of the upper-left corner of the i-th target box, x_bi and y_bi are the coordinates of the lower-right corner of the i-th target box, and c_i is the class of the i-th target.
And S1-4, forming the sample slices with the corresponding sample labels into a training sample set.
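For illustration, the following minimal Python sketch writes and parses label lines of this form; the helper names are hypothetical, as the patent specifies only the field order:

```python
def format_label(path, targets):
    """targets: list of (xt, yt, xb, yb, c) tuples, one per iron tower box."""
    fields = [path, str(len(targets))]
    for xt, yt, xb, yb, c in targets:
        fields += [str(xt), str(yt), str(xb), str(yb), str(c)]
    return " ".join(fields)

def parse_label(line):
    # Assumes the absolute path contains no spaces.
    parts = line.split()
    path, n = parts[0], int(parts[1])
    vals = parts[2:]
    targets = [(int(vals[i]), int(vals[i + 1]), int(vals[i + 2]),
                int(vals[i + 3]), int(vals[i + 4])) for i in range(0, 5 * n, 5)]
    return path, targets
```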
And S2, processing each sample slice in the training sample set to generate a plurality of different artificial sample slices, which are given sample labels and added to the training sample set to expand it.
Preferably, in step S2, when each sample slice in the training sample set is processed to generate a plurality of different artificial sample slices, one or more of translation, flipping, rotation and noise addition are applied at random; 100 different artificial sample slices are generated by transforming each sample slice, a sample label corresponding to each artificial sample slice is derived from the specific operations applied, and each artificial sample slice with its sample label is added to the training sample set to obtain the expanded training sample set.
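A minimal sketch of this augmentation step is given below, assuming single-channel image arrays; the shift ranges and noise level are illustrative assumptions, and the box coordinates in the sample label would have to be transformed alongside the pixels (omitted here):

```python
import numpy as np

def augment(img, rng):
    """Randomly translate, flip, rotate, or add noise to one sample slice."""
    out = img.copy()
    if rng.random() < 0.5:   # random translation of up to 10 pixels
        out = np.roll(out, tuple(rng.integers(-10, 11, size=2)), axis=(0, 1))
    if rng.random() < 0.5:   # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:   # rotation by a multiple of 90 degrees
        out = np.rot90(out, k=int(rng.integers(1, 4)))
    if rng.random() < 0.5:   # additive Gaussian noise
        out = out + rng.normal(0.0, 0.01, out.shape)
    return out

rng = np.random.default_rng(0)
sample = np.zeros((300, 300))                            # stand-in for one original slice
artificial = [augment(sample, rng) for _ in range(100)]  # 100 per original slice
```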
S3, constructing the SSD model, including:
15 layers of convolutional neural network, used for preliminary extraction of image features from the input image; this part can be constructed on the basis of the VGG-16 network;
the 8-layer convolutional neural network is connected behind the 15-layer convolutional neural network and is used for further extracting image features of different scales;
and the multi-scale detection network is connected behind the 8-layer convolutional neural network and is used for detecting the extracted image features of different scales.
And S4, inputting the training sample set expanded in the step S2 as an input image into the SSD model constructed in the step S3, and training the SSD model by adopting a gradient descent method.
Preferably, step S4 includes:
s4-1, inputting the expanded training sample set into an SSD model;
s4-2, calculating the cost function of the current SSD model on the input images by forward propagation;
s4-3, respectively calculating gradient values of the cost function to parameters in each convolution layer in the SSD model by using a back propagation method;
s4-4, updating the parameters in each convolution layer according to the gradient value of the cost function to the parameters in each convolution layer by using a gradient descent method;
and S4-5, performing steps S4-2 to S4-4 in a loop to update the SSD model, ending training when the number of iterations reaches a set value or the parameters of the convolutional layers in the SSD model are no longer updated, and saving the current SSD model.
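A hedged PyTorch sketch of this loop follows; `ssd_model`, `ssd_loss` and `loader` are assumed to exist, and the learning rate, momentum and step limit are illustrative rather than values given by the patent:

```python
import torch

# `ssd_model`, `ssd_loss` and `loader` are assumed; `loader` yields
# (images, targets) batches, repeating over the expanded training set.
optimizer = torch.optim.SGD(ssd_model.parameters(), lr=1e-3, momentum=0.9)
max_steps = 120_000   # the "set value" for the loop count (illustrative)

for step, (images, targets) in enumerate(loader):
    loss = ssd_loss(ssd_model(images), targets)  # S4-2: forward propagation
    optimizer.zero_grad()
    loss.backward()                              # S4-3: gradients via backpropagation
    optimizer.step()                             # S4-4: gradient-descent update
    if step + 1 >= max_steps:                    # S4-5: stop at the set loop count
        break

torch.save(ssd_model.state_dict(), "ssd_tower.pth")  # save the current SSD model
```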
And S5, cutting the data graph to be detected into a plurality of slices to be detected of the same or similar size as the sample slices, and inputting them into the SSD model trained in step S4 to obtain the target detection result of the data graph to be detected. Preferably, the sample slice should be square, or rectangular with a nearly square aspect ratio, for example an aspect ratio of no more than 3/2; the slice to be detected should be of a size similar to the sample slices used in training, for example with the ratio between the long sides of the two slices (sample slice and slice to be detected) between 2/3 and 3/2.
Preferably, step S5 includes:
s5-1, cutting the data graph to be detected, in a grid pattern, into a plurality of slices of a size similar to the sample slices, which serve as the slices to be detected;
s5-2, inputting the slices to be detected into the SSD model trained in the step S4, and performing target detection on each slice to be detected by using the SSD model to obtain a target detection result of each slice to be detected;
and S5-3, merging the target detection results of the slices to be detected at the corresponding positions of the original data graph to be detected to obtain the target detection results of the data graph to be detected.
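The following sketch illustrates S5-1 to S5-3, assuming a `detect` function that runs the trained SSD model on one slice and returns boxes in slice coordinates:

```python
def detect_scene(scene, detect, tile=300):
    """Grid-cut `scene` (a 2-D array), run per-slice detection, and map the
    boxes back to scene coordinates. `detect(patch)` is assumed to return
    tuples of (xt, yt, xb, yb, score, cls) in patch coordinates; edge
    patches smaller than `tile` may need padding to the model input size."""
    results = []
    h, w = scene.shape[:2]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            patch = scene[y0:y0 + tile, x0:x0 + tile]
            for xt, yt, xb, yb, score, cls in detect(patch):
                # S5-3: shift each box to its position in the original graph
                results.append((xt + x0, yt + y0, xb + x0, yb + y0, score, cls))
    return results
```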
The method provided by the invention uses deep learning, extracting target features with a multilayer convolutional neural network model, to detect iron tower targets in SAR images; it does not require high contrast between target and background, and its multi-aspect-ratio detection on multi-scale feature maps extracts features at multiple scales, so it is strongly robust to deformations of the SAR image and can detect targets in complex scenes. Suspicious-target generation and identification are integrated, so training and detection are fast.
Example two
As shown in fig. 2, the second embodiment is basically the same as the first embodiment, and the same parts are not repeated herein, except that:
The input image size of the SSD model constructed in step S3 is 300 × 300 pixels.
Fig. 2 is a schematic view of the SSD model in this embodiment, in which the key convolutional layers are labeled. Convolutional layer 4_3 denotes the 3rd convolutional layer in the fourth group of convolutional layers, i.e. the 10th convolutional layer overall; the remaining labels follow the same convention.
As shown in fig. 2, the 15-layer convolutional neural network constructed in step S3 has the following specific structure:
a first group of convolutional layers: the 1st to 2nd convolutional layers, each using 64 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 64 feature maps of size 300 × 300;
a second group of convolutional layers: the 3rd to 4th convolutional layers, each using 128 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 128 feature maps of size 150 × 150;
a third group of convolutional layers: the 5th to 7th convolutional layers, each using 256 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 256 feature maps of size 75 × 75;
a fourth group of convolutional layers: the 8th to 10th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 38 × 38;
a fifth group of convolutional layers: the 11th to 13th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 19 × 19;
a sixth group of convolutional layers: the 14th convolutional layer, using 1024 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 1024 feature maps of size 19 × 19;
a seventh group of convolutional layers: the 15th convolutional layer, using 1024 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding, outputting 1024 feature maps of size 19 × 19;
the feature maps output by the 2nd, 4th, 7th, 10th and 13th convolutional layers are downsampled by a max-pooling layer using a pooling kernel of window size 2 × 2 with a stride of 2.
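As a concrete illustration, here is a hedged PyTorch sketch of this 15-layer backbone, assuming a single-channel SAR input. One caveat: the patent lists a 2 × 2, stride-2 pooling after layer 13, but to preserve the 19 × 19 resolution it states for layers 14 and 15, the sketch uses the 3 × 3, stride-1 pooling of the original SSD at that point:

```python
import torch.nn as nn

def conv_block(cin, cout, n):
    # n successive 3x3 convolutions (stride 1, 1-pixel padding), each followed by ReLU
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return layers

backbone = nn.Sequential(
    *conv_block(1, 64, 2),    nn.MaxPool2d(2, 2),                  # layers 1-2, 300 -> 150
    *conv_block(64, 128, 2),  nn.MaxPool2d(2, 2),                  # layers 3-4, 150 -> 75
    *conv_block(128, 256, 3), nn.MaxPool2d(2, 2, ceil_mode=True),  # layers 5-7, 75 -> 38
    *conv_block(256, 512, 3),        # layers 8-10; detection taps this 38x38 output
    nn.MaxPool2d(2, 2),                                            # 38 -> 19
    *conv_block(512, 512, 3), nn.MaxPool2d(3, 1, padding=1),       # layers 11-13, stays 19
    nn.Conv2d(512, 1024, 3, padding=1), nn.ReLU(inplace=True),     # layer 14, 19x19
    nn.Conv2d(1024, 1024, 1),           nn.ReLU(inplace=True),     # layer 15, 19x19
)
```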
The 8-layer convolutional neural network in step S3 has the following specific structure:
an eighth group of convolutional layers: the 16th to 17th convolutional layers; the 16th layer uses 256 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 17th layer uses 512 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the eighth group as a whole outputs 512 feature maps of size 10 × 10;
a ninth group of convolutional layers: the 18th to 19th convolutional layers; the 18th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 19th layer uses 256 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the ninth group as a whole outputs 256 feature maps of size 5 × 5;
a tenth group of convolutional layers: the 20th to 21st convolutional layers; the 20th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 21st layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the tenth group as a whole outputs 256 feature maps of size 3 × 3;
an eleventh group of convolutional layers: the 22nd to 23rd convolutional layers; the 22nd layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 23rd layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the eleventh group as a whole outputs 256 feature maps of size 1 × 1.
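The extra 8-layer network can be sketched the same way; the spatial sizes in the comments follow the group outputs stated above:

```python
import torch.nn as nn

extras = nn.Sequential(
    nn.Conv2d(1024, 256, 1), nn.ReLU(inplace=True),                      # layer 16
    nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # layer 17, 19 -> 10
    nn.Conv2d(512, 128, 1), nn.ReLU(inplace=True),                       # layer 18
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # layer 19, 10 -> 5
    nn.Conv2d(256, 128, 1), nn.ReLU(inplace=True),                       # layer 20
    nn.Conv2d(128, 256, 3), nn.ReLU(inplace=True),                       # layer 21, 5 -> 3
    nn.Conv2d(256, 128, 1), nn.ReLU(inplace=True),                       # layer 22
    nn.Conv2d(128, 256, 3), nn.ReLU(inplace=True),                       # layer 23, 3 -> 1
)
```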
The multi-scale detection network in step S3 performs the following operations on the feature maps output by the 10th, 15th, 17th, 19th, 21st and 23rd convolutional layers, respectively:
a) normalizing the feature map.
b) extracting the coordinate information of the k default boxes at each position using a convolutional layer with 4k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein the index k = 1, 2, 3, 4, 5, 6 corresponds to the 10th, 15th, 17th, 19th, 21st and 23rd layers respectively.
c) extracting the confidence scores of the k default boxes for each class using a convolutional layer with (classes + 1) × k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein classes is the number of classes of targets to be detected, i.e. the total number of target classes in the training sample set.
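The prediction heads can be sketched as below; the per-layer default-box counts (4 or 6) follow the aspect-ratio settings described later in this embodiment, and `classes = 1` (iron tower only) is an assumption:

```python
import torch.nn as nn

classes = 1                                # iron tower only (assumption)
ks      = [4, 6, 6, 6, 4, 4]               # default boxes per position, per source layer
chans   = [512, 1024, 512, 256, 256, 256]  # channels of layers 10, 15, 17, 19, 21, 23

# One 3x3 conv per source layer for box offsets, one for class scores.
loc_heads  = nn.ModuleList(nn.Conv2d(c, 4 * k, 3, padding=1)
                           for c, k in zip(chans, ks))
conf_heads = nn.ModuleList(nn.Conv2d(c, (classes + 1) * k, 3, padding=1)
                           for c, k in zip(chans, ks))
```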
The default box scale $s_k$ and aspect ratio $a_r$ are hyper-parameters. For the 10th, 15th, 17th, 19th, 21st and 23rd layers, the default box scale values are $s_1$ to $s_6$:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m],\ m = 6$$

where $s_k$ is the scale value of the default box, $s_{min} = 0.2$ and $s_{max} = 0.9$; that is, the default boxes in the lowest-level (10th-layer) feature map have scale 0.2, those in the highest-level (23rd-layer) feature map have scale 0.9, and the scales of the default boxes in the feature maps of the intermediate layers are uniformly spaced between them.

Setting different aspect ratios $a_r$ for each layer's feature map and combining them with the layer's scale value $s_k$ gives the width $w_k^a$ and height $h_k^a$ of the default boxes:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

wherein, for the 10th, 21st and 23rd layers, $a_r \in \{1, 2, 1/2\}$; for the 15th, 17th and 19th layers, $a_r \in \{1, 2, 3, 1/2, 1/3\}$; and for each of the six convolutional layers (the 10th, 15th, 17th, 19th, 21st and 23rd) one more scale $s_k' = \sqrt{s_k s_{k+1}}$ with aspect ratio 1 is also considered. Thus, for the feature maps of the 10th, 21st and 23rd layers, up to 4 different default boxes can be generated at each position; for the feature maps of the 15th, 17th and 19th layers, up to 6 different default boxes can be generated at each position. This arrangement can substantially cover iron tower targets of various shapes and sizes in the input image.
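A small sketch of these scale and shape computations; the value used for $s_{m+1}$ in the extra scale of the last layer is an assumption, as the patent does not define it:

```python
import math

s_min, s_max, m = 0.2, 0.9, 6
scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
# scales == [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

def default_boxes(k, ratios):
    """Width/height pairs for layer index k (1-based) and its aspect ratios."""
    sk = scales[k - 1]
    wh = [(sk * math.sqrt(ar), sk / math.sqrt(ar)) for ar in ratios]
    sk_next = scales[k] if k < m else 1.0   # s_{m+1} undefined in the patent (assumption)
    wh.append((math.sqrt(sk * sk_next),) * 2)  # extra scale s'_k, aspect ratio 1
    return wh

print(default_boxes(1, [1, 2, 1/2]))         # 4 boxes for the 10th-layer feature map
print(default_boxes(2, [1, 2, 3, 1/2, 1/3])) # 6 boxes for the 15th-layer feature map
```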
d) splicing the coordinate information extracted for each default box of each layer's feature map with the confidence scores to generate detection results. Preferably, four coordinate values and one confidence score form a group representing one detection result; when splicing, the correspondence between coordinate information and confidence scores is maintained.
Preferably, all results with confidence scores below the confidence threshold (0.1 in the present invention) are discarded first; then non-maximum suppression (NMS) is used to retain the best detection results with higher confidence scores while suppressing suboptimal ones; finally, the screened coordinate information and confidence scores are spliced. Non-maximum suppression prevents the same target from being detected multiple times. Preferably, the invention judges whether the overlap between two regions A and B is too high on the basis of the intersection-over-union ratio

$$IoU(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

if $IoU(A, B)$ exceeds a set threshold, the overlap between region A and region B is considered too high.
Further preferably, calculating the cost function of the current SSD model on the input images by forward propagation in step S4-2 comprises:
matching the target boxes and the default boxes on the basis of the intersection-over-union ratio: each target box is first matched with the default box having the greatest IoU with it, and then every default box is matched with any target box whose IoU with it exceeds a threshold (0.5 in the present invention); this simplifies the learning process and better handles prediction among overlapping targets.
Let $x_{ij}^p \in \{0, 1\}$ indicate, for class p, whether the i-th default box matches the j-th target box: $x_{ij}^p = 1$ means the i-th default box matches the j-th target box of class p, and $x_{ij}^p = 0$ means it does not. From the matching strategy described above, $\sum_i x_{ij}^p \geq 1$, i.e. one target box may be matched to multiple default boxes.
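A compact sketch of this matching strategy over an IoU matrix:

```python
import numpy as np

def match(ious, thresh=0.5):
    """ious: (num_defaults, num_targets) IoU matrix. Each target box gets its
    best default box; in addition, any default box with IoU above the
    threshold is matched. Returns a boolean matrix x with x[i, j] == True
    meaning default box i matches target box j."""
    x = ious > thresh                                        # threshold matches
    x[ious.argmax(axis=0), np.arange(ious.shape[1])] = True  # best default per target
    return x
```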
Preferably, the cost function, i.e. the overall loss function, can be expressed as:

$$L(x, c, l, g) = \frac{1}{N}\big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\big)$$

where x is the set of matching indicators $x_{ij}^p$, N is the number of matched default boxes (if N = 0, the loss is set to 0), the weight parameter α is set to 1 in the method, $L_{conf}(x, c)$ is the classification loss component of the overall loss function, and $L_{loc}(x, l, g)$ is the localization loss component.

The classification loss function $L_{conf}(x, c)$ is the softmax loss over all the classes considered, which can be expressed as:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log\big(\hat{c}_i^p\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^0\big), \qquad \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$$

where $c_i^p$ is the output classification score of the i-th default box for class p, and $\hat{c}_i^p$ is the confidence probability obtained by applying softmax over the output classification scores of the i-th default box for the different classes p.
The localization loss function $L_{loc}(x, l, g)$ is the Smooth L1 loss between the predicted boxes and the encoded target boxes, which can be expressed as:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, \mathrm{smooth}_{L1}\big(l_i^m - \hat{g}_j^m\big)$$

where $l_i^{cx}$, $l_i^{cy}$, $l_i^w$ and $l_i^h$ are the x-coordinate, y-coordinate, width and height of the i-th predicted box, and $\hat{g}_j^{cx}$, $\hat{g}_j^{cy}$, $\hat{g}_j^w$ and $\hat{g}_j^h$ are the x-coordinate, y-coordinate, width and height of the encoded j-th target box. The Smooth L1 loss can be expressed as:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

The coordinates of the target box are encoded with the coordinates of the default box as:

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^w}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^h}, \quad \hat{g}_j^w = \log\frac{g_j^w}{d_i^w}, \quad \hat{g}_j^h = \log\frac{g_j^h}{d_i^h}$$

where $g_j^{cx}$, $g_j^{cy}$, $g_j^w$ and $g_j^h$ are the x-coordinate, y-coordinate, width and height of the j-th target box, and $d_i^{cx}$, $d_i^{cy}$, $d_i^w$ and $d_i^h$ are the x-coordinate, y-coordinate, width and height of the i-th default box.
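The Smooth L1 term and the box encoding can be sketched as:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def encode(g, d):
    """Encode a target box g against a default box d; both are (cx, cy, w, h)."""
    return np.array([(g[0] - d[0]) / d[2],
                     (g[1] - d[1]) / d[3],
                     np.log(g[2] / d[2]),
                     np.log(g[3] / d[3])])
```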
In the method provided by the invention, multi-aspect-ratio targets on each multi-scale feature map are fully considered when constructing the network model, different aspect ratios are set for the lower-level and higher-level feature maps, and the unreasonable parameter settings of the CFAR detection method are avoided, so accuracy is improved. Moreover, the lower-layer parameters of the trained convolutional neural network reflect underlying features common to images, such as edges and shapes; therefore, when a model for detecting other targets needs to be trained, training can be completed faster on the basis of transfer learning.
EXAMPLE III
As shown in fig. 3, 4a and 4b, the third embodiment is substantially the same as the second embodiment, and the description of the same parts is omitted, except that:
the data map to be measured used in the present embodiment is a scene SAR image, as shown in fig. 3. The image size of the data image to be detected is 16384 × 8192 pixels, and the data image to be detected comprises various artificial targets such as iron towers, buildings and the like, and aims to detect and position all types of iron tower targets in the data image to be detected.
In order to better observe the detection result, as shown in fig. 4a and 4b, a part of the target detection result is enlarged, and a white frame mark part in the figure is detected as a part of a steel tower.
In this embodiment, the detection result of the whole to-be-detected data graph is calculated, so that the accuracy rate of the method provided by the invention is 82.4%, the recall ratio (recall ratio) is 97.6%, and the value of the comprehensive evaluation index F1 (the harmonic mean value of the accurate value and the recall ratio) is 89.4%, so that a better test result is obtained.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A SAR image iron tower target detection method based on deep learning is characterized by comprising the following steps:
s1, randomly extracting a plurality of SAR images from the SAR data set, segmenting to obtain sample slices, setting a sample label for each sample slice, and constructing a training sample set, wherein the sample label comprises coordinate information and category information of a target in the sample slice;
s2, processing each sample slice in the training sample set to generate a plurality of different artificial sample slices, which are given sample labels and added to the training sample set to expand it;
s3, constructing the SSD model, including:
15 layers of convolutional neural networks, which are used for preliminarily extracting the image characteristics of the input image;
8 layers of convolutional neural networks are used for further extracting image features of different scales;
the multi-scale detection network is used for detecting the extracted image characteristics with different scales;
s4, inputting the training sample set expanded in the step S2 as an input image into the SSD model constructed in the step S3, and training the SSD model by adopting a gradient descent method;
and S5, cutting the data graph to be detected into a plurality of slices to be detected, wherein the slices have the same size as the sample slice, and inputting the SSD model trained in the step S4 to obtain a target detection result of the data graph to be detected.
2. The SAR image iron tower target detection method based on deep learning of claim 1, wherein the step S1 includes:
s1-1, randomly extracting 100 SAR images from the MiniSAR data set;
s1-2, obtaining a plurality of sample slices from each SAR image by random segmentation;
s1-3, adding a sample label to each sample slice, wherein the sample label comprises an absolute path of the sample slice, the number of targets, coordinates of a target frame and the type of the target;
and S1-4, forming the sample slices with the corresponding sample labels into a training sample set.
3. The SAR image iron tower target detection method based on deep learning of claim 1, characterized in that:
in step S2, when each sample slice in the training sample set is processed to generate a plurality of different artificial sample slices, one or more of translation, flipping, rotation and noise addition are applied at random, and 100 different artificial sample slices are generated by transforming each sample slice.
4. The SAR image iron tower target detection method based on deep learning of claim 1, wherein the step S4 includes:
s4-1, inputting the expanded training sample set into an SSD model;
s4-2, calculating the cost function of the current SSD model on the input images by forward propagation;
s4-3, respectively calculating gradient values of the cost function to parameters in each convolution layer in the SSD model by using a back propagation method;
s4-4, updating the parameters in each convolution layer according to the gradient value of the cost function to the parameters in each convolution layer by using a gradient descent method;
and S4-5, performing steps S4-2 to S4-4 in a loop to update the SSD model, ending training when the number of iterations reaches a set value or the parameters of the convolutional layers in the SSD model are no longer updated, and saving the current SSD model.
5. The SAR image iron tower target detection method based on deep learning of claim 1, wherein the step S5 includes:
s5-1, cutting the data graph to be detected, in a grid pattern, into a plurality of slices of a size similar to the sample slices, which serve as the slices to be detected;
s5-2, inputting the slices to be detected into the SSD model trained in the step S4, and performing target detection on each slice to be detected by using the SSD model to obtain a target detection result of each slice to be detected;
and S5-3, merging the target detection results of the slices to be detected at the corresponding positions of the original data graph to be detected to obtain the target detection results of the data graph to be detected.
6. The SAR image iron tower target detection method based on deep learning of any one of claims 1 to 5, wherein the input image size of the SSD model constructed in step S3 is 300 × 300 pixels;
the 15-layer convolutional neural network comprises:
a first group of convolutional layers: the 1st to 2nd convolutional layers, each using 64 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 64 feature maps of size 300 × 300;
a second group of convolutional layers: the 3rd to 4th convolutional layers, each using 128 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 128 feature maps of size 150 × 150;
a third group of convolutional layers: the 5th to 7th convolutional layers, each using 256 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 256 feature maps of size 75 × 75;
a fourth group of convolutional layers: the 8th to 10th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 38 × 38;
a fifth group of convolutional layers: the 11th to 13th convolutional layers, each using 512 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 512 feature maps of size 19 × 19;
a sixth group of convolutional layers: the 14th convolutional layer, using 1024 convolution kernels of window size 3 × 3 with a stride of 1 and 1-pixel edge padding, outputting 1024 feature maps of size 19 × 19;
a seventh group of convolutional layers: the 15th convolutional layer, using 1024 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding, outputting 1024 feature maps of size 19 × 19;
the feature maps output by the 2nd, 4th, 7th, 10th and 13th convolutional layers are downsampled by a max-pooling layer using a pooling kernel of window size 2 × 2 with a stride of 2.
7. The SAR image iron tower target detection method based on deep learning of claim 6, wherein the 8-layer convolutional neural network in step S3 comprises:
an eighth group of convolutional layers: the 16th to 17th convolutional layers; the 16th layer uses 256 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 17th layer uses 512 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the group outputs 512 feature maps of size 10 × 10;
a ninth group of convolutional layers: the 18th to 19th convolutional layers; the 18th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 19th layer uses 256 convolution kernels of window size 3 × 3 with a stride of 2 and 1-pixel edge padding; the group outputs 256 feature maps of size 5 × 5;
a tenth group of convolutional layers: the 20th to 21st convolutional layers; the 20th layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 21st layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the group outputs 256 feature maps of size 3 × 3;
an eleventh group of convolutional layers: the 22nd to 23rd convolutional layers; the 22nd layer uses 128 convolution kernels of window size 1 × 1 with a stride of 1 and no edge padding; the 23rd layer uses 256 convolution kernels of window size 3 × 3 with a stride of 1 and no edge padding; the group outputs 256 feature maps of size 1 × 1.
8. The SAR image iron tower target detection method based on deep learning of claim 7, wherein the multi-scale detection network in step S3 performs the following operations on the feature maps output by the 10th, 15th, 17th, 19th, 21st and 23rd convolutional layers, respectively:
normalizing the feature map;
extracting the coordinate information of the k default boxes at each position using a convolutional layer with 4k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein the index k = 1, 2, 3, 4, 5, 6 corresponds to the 10th, 15th, 17th, 19th, 21st and 23rd layers respectively;
extracting the confidence scores of the k default boxes for each class using a convolutional layer with (classes + 1) × k convolution kernels of window size 3 × 3, a stride of 1 and 1-pixel edge padding, wherein classes is the number of classes of targets to be detected;
and splicing the coordinate information extracted for each default box of each layer's feature map with the confidence scores to generate detection results.
9. The SAR image iron tower target detection method based on deep learning of claim 8, wherein, for the 10th, 15th, 17th, 19th, 21st and 23rd layers, the default box scale values are $s_1$ to $s_6$:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m],\ m = 6$$

where $s_k$ is the scale value of the default box, $s_{min} = 0.2$ and $s_{max} = 0.9$;

setting different aspect ratios $a_r$ for each layer's feature map and combining them with the layer's scale value $s_k$ gives the default box width $w_k^a$ and height $h_k^a$:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

wherein, for the 10th, 21st and 23rd layers, $a_r \in \{1, 2, 1/2\}$; for the 15th, 17th and 19th layers, $a_r \in \{1, 2, 3, 1/2, 1/3\}$; and for these six convolutional layers there is also a corresponding additional scale $s_k' = \sqrt{s_k s_{k+1}}$.
10. The SAR image iron tower target detection method based on deep learning of claim 9, characterized in that:
when the multi-scale detection network in step S3 splices the coordinate information extracted for each default box of each layer's feature map with the confidence scores to generate detection results, all results with confidence scores below a confidence threshold are discarded first; then non-maximum suppression is used to retain the best detection results with higher confidence scores while suppressing suboptimal ones; finally, the screened coordinate information and confidence scores are spliced.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811100702.1A | 2018-09-20 | 2018-09-20 | A kind of SAR image steel tower object detection method based on deep learning

Publications (1)
Publication Number | Publication Date
---|---
CN109325947A | 2019-02-12
Legal Events
- PB01: Publication (application publication date: 2019-02-12)
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication