CN108710919A

CN108710919A - An automatic crack delineation method based on multi-scale feature fusion deep learning

Info

Publication number: CN108710919A
Application number: CN201810517520.8A
Authority: CN
Inventors: Zhang Jian (张建); Ni Futao (倪富陶)
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2018-05-25
Filing date: 2018-05-25
Publication date: 2018-10-26

Abstract

The present invention discloses an automatic crack delineation method based on multi-scale feature fusion deep learning. The method comprises three parts. First, multi-scale crack feature maps are obtained using a transfer learning strategy. Second, the feature maps of multiple scales are fused successively using convolutional layers with a 1 × 1 kernel and bilinear interpolation, yielding a multi-dimensional fused feature. Third, the pixel information of the fused feature is further combined by consecutive multi-scale fully convolutional layers, which predict the class of every pixel in the image. The invention learns crack features at multiple scales. It fully accounts for two kinds of relationship: between corresponding pixels of features at different scales, and between pixels within a local region. This yields fast, high-accuracy automatic crack delineation that is applicable to crack detection in all types of complex environments.

Description

An automatic crack delineation method based on multi-scale feature fusion deep learning

Technical field

The invention relates to the field of structural inspection and evaluation. In particular, it relates to a method for automatically detecting cracks on structural surfaces using images, videos and the like.

Background art

Structural cracks are among the most common defects in civil engineering and seriously harm the durability and safety of structures. Cracks are therefore one of the main indicators for evaluating the health of all types of structures. At present, crack inspection is still mainly manual: inspectors mark and outline the cracks by hand, then analyze their length, width and type. This approach requires scaffolding and other equipment, and it is labor-intensive, unsafe and inefficient.

Crack detection based on digital image processing has gradually been adopted for structural inspection. However, traditional image-processing methods for automatic crack delineation, such as edge detection, threshold segmentation and simple neural networks, have poor applicability. Their results depend heavily on manual intervention, and they cope poorly with the heavy staining, hand-drawn markings and other interference found in complex real-world engineering images.

In recent years, following the remarkable success of deep learning in image recognition, deep learning has gradually been applied to crack detection.

Outside China, Chen, ZhiQiang et al. applied deep learning to image-based crack recognition, using Faster R-CNN to automatically detect cracks of different scales in images. In China, Li Hui et al. used a deep restricted Boltzmann machine to automatically detect fatigue cracks in steel structures.

Patent document CN106910186A discloses a bridge crack detection and localization method based on CNN deep learning. It has two drawbacks. Its CNN template is only 16 pixels in size, which makes it hard to detect cracks of different widths in an image. It also cannot output a pixel-accurate binarized crack image.

Patent document CN107133960A discloses an image crack segmentation method based on fully convolutional networks (FCN). Its drawbacks are that it is insensitive to crack details and does not fully consider the relationships between pixels. It also omits the spatial regularization step normally used in segmentation methods based on pixel classification.

In summary, crack condition is one of the main indicators for assessing the health of all types of structures, yet outlining cracks by hand is time-consuming and laborious. Traditional crack detection based on digital image processing has poor applicability and is hard to apply in complex engineering environments. Deep-learning-based crack detection has achieved some results, but there is still considerable room for improvement before it is ready for practical engineering use.

Summary of the invention

In view of the shortcomings of the prior art, the present invention discloses an automatic crack delineation method based on multi-scale feature fusion deep learning. The method detects and localizes cracks relying entirely on deep learning itself, and it achieves high-precision automatic delineation.

The idea of the present invention is as follows:

Based on feature pyramid networks (FPN) and a transfer learning strategy, the present invention provides a deep learning network that locates and segments cracks in images. The network automatically identifies cracks in an image and fuses the features of different scales and dimensions learned by its layers. It then uses subsequent multi-layer fully convolutional layers to predict cracks down to the level of individual pixels.

To solve the above technical problems, the present invention adopts the following technical solution:

An automatic crack delineation method based on multi-scale feature fusion deep learning, comprising:

(1) Qualitative crack detection and feature extraction based on transfer learning

A camera is used to collect images of cracks on the surfaces of different structures, and a crack image database is built from them.

Many pre-trained image recognition models already exist that were trained on the 1000-class ImageNet database, such as GoogLeNet, VGG-16 and ResNet-101; their image classification ability has surpassed that of humans. The last layer of such a pre-trained model holds the image classification information. It is replaced so that the number of classification outputs becomes 2 (crack and non-crack).

From the crack image database, a crack training set and test set are built by cropping, rotating, scaling and similar operations, sized to match the input of the pre-trained model. The sets contain two types of images: crack images labeled 1 and non-crack images labeled 0.

The pre-trained parameters already contain prior knowledge of qualitative image classification. Transfer learning is therefore performed on the crack image database, training the model until the network converges. The result is a high-quality model for qualitative classification of crack images.
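The "replace the last layer" step can be illustrated without a deep learning framework. The sketch below rests on several assumptions. The pre-trained backbone is treated as an opaque, frozen-for-now feature extractor producing a 1024-dimensional vector (GoogLeNet's pooled feature size before its classifier). `LinearHead` is a hypothetical stand-in for a framework's fully connected layer. Index 1 of the new head means "crack" (label 1) and index 0 means "non-crack" (label 0).

```python
import math
import random

class LinearHead:
    """A fully connected classification layer: logits = W @ x + b (hypothetical stand-in)."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        # Small random initialization for a freshly created head.
        self.weight = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features

    @property
    def out_features(self):
        return len(self.weight)

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

FEATURE_DIM = 1024  # pooled GoogLeNet feature size before the classifier
# Hypothetical pre-trained model: backbone weights plus a 1000-class ImageNet head.
model = {"backbone": "pretrained-weights", "head": LinearHead(FEATURE_DIM, 1000)}

# Transfer learning: keep the backbone, swap the head for a 2-way classifier.
model["head"] = LinearHead(FEATURE_DIM, 2)

features = [0.5] * FEATURE_DIM           # stand-in for the backbone output on one image
probs = softmax(model["head"](features))  # [P(non-crack), P(crack)]
print(probs)
```

In a real framework, the same step amounts to reassigning the model's final fully connected layer and then fine-tuning all weights with SGD on the crack data set.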

When the trained model performs qualitative detection on a crack image, the outputs of its layers are used to extract features of the input image at each scale, each with its own number of channels. This completes feature extraction for the input image.

Because the network has many layers, there is no need to take the output of every layer as a separate scale. The output feature maps of successive layers generally shrink from large to small. Among layers whose outputs have the same size, only the last such layer is taken as the feature extraction layer. This yields crack feature maps at a series of scales that shrink by a constant ratio.

Taking GoogLeNet as an example, with a 224×224-pixel input the extracted feature maps are, in order:
- 64 first-order feature maps of size 112×112;
- 192 second-order feature maps of size 56×56;
- 480 third-order feature maps of size 28×28;
- 832 fourth-order feature maps of size 14×14;
- 1024 fifth-order feature maps of size 7×7.
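The rule "keep only the last layer for each feature-map size" can be checked against a layer listing. The listing below is a simplified table of GoogLeNet output shapes for a 224×224 input. The layer names follow common implementations such as torchvision's and should be treated as illustrative. The classifier layers (global pooling, dropout, fully connected) are omitted, since they would add a 1×1 "scale" that the method does not use.

```python
from typing import List, Tuple

# (layer name, output channels, output spatial size) for a 224x224 input.
GOOGLENET_OUTPUTS: List[Tuple[str, int, int]] = [
    ("conv1", 64, 112), ("maxpool1", 64, 56), ("conv2", 64, 56), ("conv3", 192, 56),
    ("maxpool2", 192, 28), ("inception3a", 256, 28), ("inception3b", 480, 28),
    ("maxpool3", 480, 14), ("inception4a", 512, 14), ("inception4b", 512, 14),
    ("inception4c", 512, 14), ("inception4d", 528, 14), ("inception4e", 832, 14),
    ("maxpool4", 832, 7), ("inception5a", 832, 7), ("inception5b", 1024, 7),
]

def select_feature_layers(outputs):
    """Keep the last layer for each spatial size, ordered from largest to smallest size."""
    last_by_size = {}
    for name, channels, size in outputs:
        last_by_size[size] = (name, channels, size)  # later layers overwrite earlier ones
    return sorted(last_by_size.values(), key=lambda t: -t[2])

for order, (name, c, s) in enumerate(select_feature_layers(GOOGLENET_OUTPUTS), start=1):
    print(f"order {order}: {name:12s} {c:5d} maps of {s}x{s}")
```

The five selected layers reproduce the 64/192/480/832/1024-channel feature maps listed above.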

(2) Multi-scale deep learning feature fusion

Once the trained model has extracted crack feature maps at multiple scales, the features of each scale are fused layer by layer, finally yielding a multi-dimensional crack feature map. High-order feature maps are small but have many channels, while low-order feature maps are large but have few channels. Fusing features of different scales therefore requires unifying both the spatial size and the channel dimension. The specific method is:

① Features of several scales are fused pairwise, layer by layer, one adjacent pair at a time. The two highest-order scales are first fused into a single-scale feature map, and this fused map is then fused with the next lower-order feature map;

② When fusing two adjacent orders, their channel dimensions are unified first. A convolutional layer with a 1×1 kernel reduces the channel dimension of the higher-order features to that of the lower-order features. Taking the fusion of GoogLeNet's 4th- and 5th-order features as an example: the 4th order consists of 832 feature maps of size 14×14, and the 5th order of 1024 feature maps of size 7×7. The 5th-order maps are followed by a 1×1 convolutional layer with 832 output channels, which reduces them to 832 feature maps of size 7×7;

③ Once dimension reduction has unified the channel dimension, the spatial sizes must also be unified. The reduced higher-order feature maps are enlarged by bilinear interpolation to the size of the lower-order feature maps. In the same 4th/5th-order GoogLeNet example, the 5th-order features, now 832 channels, are scaled by bilinear interpolation from 7×7 to 14×14. They thus become 832 feature maps of size 14×14;

④ Once the two adjacent orders agree in both channel dimension and size, they are fused by adding the values of corresponding pixels in corresponding channels. In the 4th/5th-order GoogLeNet example, the matched 5th-order features are added pixel by pixel to the 4th-order features in each of the 832 channels. This yields 832 fused feature maps of size 14×14.

⑤ After the selected multi-order features have been fused, a 1×1 convolutional layer can increase or reduce the channel dimension of the final fused feature to suit the available computing capacity.

Proceeding in the same way, multi-order features are fused pairwise, layer by layer. This patent also covers the variant in which low-order features are raised in dimension by a 1×1 convolutional layer and then fused with high-order features.
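Steps ② to ④ of the pairwise fusion can be sketched on tiny tensors stored as nested lists (channels × height × width). The sketch makes three assumptions:
- Bilinear interpolation uses the half-pixel-centre convention, as in PyTorch's `F.interpolate(..., mode="bilinear", align_corners=False)`. The description does not state which convention it uses.
- The function names `conv1x1`, `bilinear` and `fuse_pair` are hypothetical.
- The toy sizes stand in for the 1024×7×7 and 832×14×14 tensors of the example.

```python
import math

def conv1x1(x, weight, bias):
    """1x1 convolution on a C_in x H x W map: at every pixel, out = weight @ x[:, i, j] + bias."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(wk[c] * x[c][i][j] for c in range(c_in)) + bk for j in range(w)]
             for i in range(h)] for wk, bk in zip(weight, bias)]

def _src(dst, n_in, n_out):
    # Half-pixel-centre mapping from an output index to a source coordinate, clamped.
    return min(max((dst + 0.5) * n_in / n_out - 0.5, 0.0), n_in - 1.0)

def bilinear(channel, out_h, out_w):
    """Bilinear resize of one H x W channel."""
    in_h, in_w = len(channel), len(channel[0])
    out = []
    for i in range(out_h):
        y = _src(i, in_h, out_h)
        y0 = math.floor(y); y1 = min(y0 + 1, in_h - 1); dy = y - y0
        row = []
        for j in range(out_w):
            x = _src(j, in_w, out_w)
            x0 = math.floor(x); x1 = min(x0 + 1, in_w - 1); dx = x - x0
            top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
            bot = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
            row.append(top * (1 - dy) + bot * dy)
        out.append(row)
    return out

def fuse_pair(high, low, weight, bias):
    """Reduce high-order channels (step 2), upsample to the low-order size (step 3), add (step 4)."""
    reduced = conv1x1(high, weight, bias)                  # C_high -> C_low channels
    h, w = len(low[0]), len(low[0][0])
    upsampled = [bilinear(ch, h, w) for ch in reduced]     # match H x W
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(upsampled, low)]             # element-wise sum per channel

# Toy example: 2 channels of 2x2 fused into 1 channel of 4x4.
high = [[[0.0, 4.0], [0.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
low = [[[1.0] * 4 for _ in range(4)]]
fused = fuse_pair(high, low, weight=[[1.0, 1.0]], bias=[0.0])
print(fused[0][0])  # each row: 1 + [0, 1, 3, 4]
```

In the embodiment, `weight` would be an 832×1024 matrix learned during training, and `bilinear` would be applied to each of the 832 channels.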

(3) Crack prediction with consecutive multi-scale fully convolutional layers

After the fused multi-order, multi-scale features have been obtained, consecutive multi-scale fully convolutional layers perform pixel-level crack prediction. The specific method is:

① Four consecutive multi-scale fully convolutional layers are built from convolutional layers, nonlinear activation layers and batch normalization layers, without any pooling layers.

② The four layers are configured as follows:
- The first fully convolutional layer uses a small kernel, for example 1×1 or 3×3, to linearly combine each pixel's information across channels. It is followed by a batch normalization layer and a nonlinear activation layer, and it keeps the output dimension unchanged.
- The second fully convolutional layer uses a large kernel, for example 21×21, to integrate the information of each pixel with that of a large surrounding region. It is likewise followed by batch normalization and nonlinear activation, and it keeps the output dimension unchanged.
- The third fully convolutional layer has the same structure as the first.
- The fourth fully convolutional layer uses a 1×1 kernel and reduces the output to a single channel. A Sigmoid function then gives the final predicted value for each pixel.

③ The output of the previous step is not yet a binary crack image. The map predicted directly by the network is first scaled by bilinear interpolation to the size of the input image. The scaled map is then thresholded at 0.5: pixels with values greater than 0.5 are crack pixels, and pixels with values less than 0.5 are background pixels.

(4) The multi-scale feature fusion deep learning network is trained on samples as follows:

① First, the qualitative crack recognition network is trained on the crack database using the transfer learning strategy, so that its layers can extract crack features at different scales;

② A subset of crack images is taken from the training and test sets of the qualitative recognition network and outlined by hand (crack pixels have value 1, background pixels value 0). These form the training and test sets for the multi-scale feature fusion network. Their labels are resized by bilinear interpolation to match the network's final output size, then binarized with a threshold of 0.3;

③ The loss function of the training network is set to the cross-entropy cost function, whose expression is:

$$C = -\frac{1}{n}\sum_{x}\left[y\ln a + (1-y)\ln(1-a)\right]$$

where y is the label (0 or 1) of the corresponding pixel, a is the actual output of the network, and the sum runs over the n pixels x.

④ Using transfer learning, the weights of the feature extraction layers of the qualitative crack recognition network are copied into the multi-scale feature fusion network for automatic crack delineation. Stochastic gradient descent then updates the network weights iteratively, reducing the loss until the network converges.
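The per-pixel loss of the training procedure above is the standard binary cross-entropy, C = -(1/n) Σ [y ln a + (1 - y) ln(1 - a)], averaged over n pixels. A direct transcription follows. It assumes labels and sigmoid outputs have been flattened into equal-length lists. The clipping constant `eps` is our addition, to keep `log` away from 0; it is not part of the described method.

```python
import math

def cross_entropy(labels, outputs, eps=1e-7):
    """Mean binary cross-entropy over pixels.

    labels:  per-pixel targets y in {0, 1}
    outputs: per-pixel sigmoid outputs a in (0, 1), clipped to [eps, 1 - eps]
    """
    assert labels and len(labels) == len(outputs)
    total = 0.0
    for y, a in zip(labels, outputs):
        a = min(max(a, eps), 1.0 - eps)
        total += y * math.log(a) + (1 - y) * math.log(1.0 - a)
    return -total / len(labels)

# A crack pixel predicted at 0.9 and a background pixel predicted at 0.2.
print(cross_entropy([1, 0], [0.9, 0.2]))  # (-ln 0.9 - ln 0.8) / 2
```

Because every pixel counts equally, thin cracks occupy only a small fraction of the loss. This is one reason the choice of label threshold during resizing matters.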

(5) Crack detection and localization, and automatic crack delineation

After network training is complete, automatic crack delineation proceeds in two steps:

① The qualitative crack network localizes cracks. When the image is larger than the input of the detection model, the image is scanned with a sliding window, with successive windows offset by half the input size in the x and y directions. The classification of each window gives the approximate location of the cracks.

② Based on this localization, the image of each window in which a crack was detected is cropped out. It is fed to the multi-scale feature fusion network, which outputs the crack delineation within that window. Together, these windows complete the automatic delineation of cracks in the whole image.
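The window layout for step ① can be computed as follows, for a window equal to the model input and a stride of half a window. The description does not say how the image border is handled when the last regular window stops short of the edge. This sketch assumes one extra window placed flush with the edge. The helper names are hypothetical.

```python
def window_origins(length, window, stride):
    """Start offsets of sliding windows along one axis.

    Windows advance by `stride`; if the last regular window stops short of the
    edge, one extra window flush with the edge is appended (an assumption).
    Images no larger than one window get a single window at 0 (padding not handled).
    """
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts

def sliding_windows(height, width, window=224):
    stride = window // 2  # successive windows overlap by half a window
    return [(y, x) for y in window_origins(height, window, stride)
                   for x in window_origins(width, window, stride)]

# A 4000 x 6000 image (height x width) with 224 x 224 windows and a stride of 112.
wins = sliding_windows(4000, 6000)
print(len(wins))
```

With the edge rule, a 4000×6000 image needs 35 × 53 = 1855 windows. Without it, there would be 34 × 52 and the last 80 rows and 64 columns would never be scanned.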

Compared with the prior art, the present invention has the following beneficial effects. The device developed in the invention can be widely applied to structural surface crack detection, completing high-precision automatic crack delineation in only a few seconds:

(1) The technical solution delineates structural surface cracks quickly, automatically and accurately. Unlike traditional methods, the proposed algorithm requires no manual adjustment of coefficients such as segmentation thresholds. It relies entirely on the network itself to identify cracks, giving it a wide range of application and a high degree of automation;

(2) Deep learning extracts crack features at every scale. The method fully considers the relationships between corresponding pixels of features at different scales, and between pixels within a region, so high-precision crack segmentation can be achieved quickly.

(3) The technique applies to crack detection on all types of structural surfaces and effectively rejects noise in various complex environments. It needs only a few seconds for high-precision automatic detection of the cracks in an image. It can be mounted on a UAV or a handheld inspection platform for real-time analysis of images as they are acquired.

Description of drawings

Fig. 1 is a schematic diagram of the automatic crack delineation method based on multi-scale feature fusion deep learning according to the present invention;

Fig. 2 is a schematic diagram of the GoogLeNet network in the specific embodiment of the present invention;

Fig. 3 is a schematic diagram of fusing the 3rd-, 4th- and 5th-order features extracted by the GoogLeNet network in the specific embodiment of the present invention;

Fig. 4 is a schematic diagram of the consecutive multi-scale fully convolutional layers in the specific embodiment of the present invention;

Fig. 5 is a comparative analysis of crack detection against different complex environmental backgrounds in the example of the present invention.

Detailed description of the embodiments

To make the purpose, content and advantages of the present invention easier to understand, embodiments of the invention are described in detail below. They shall not limit the scope of protection of the invention.

Embodiment: an automatic crack delineation method based on multi-scale feature fusion deep learning, comprising the following steps:

(1) qualitative crack detection and feature extraction based on transfer learning;

(2) multi-scale deep learning feature fusion;

(3) crack prediction with consecutive multi-scale fully convolutional layers;

(4) sample training of the multi-scale feature fusion deep learning network, as described above;

(5) crack detection and localization, and automatic crack delineation.

The details are as follows.

Step 1. The model is the publicly available GoogLeNet, with a 224×224×3-pixel input. The original network's output layer has 1000 classes; it is changed to 2 classes, crack and non-crack, as shown in Fig. 2.

The crack data sets consist of 224×224×3 images:
- training set: 64,000 crack images and 64,000 non-crack images;
- test set: 16,000 crack images and 16,000 non-crack images.

Using the transfer learning strategy, the weights of the open-source 1000-class pre-trained model are copied into the model modified to 2 classes. Stochastic gradient descent then updates the network weights until the network converges.

After training, the feature maps output by the network's layers come in five sizes: 112×112, 56×56, 28×28, 14×14 and 7×7. The highest-order layer that outputs each size is taken as the output layer for that size, giving:
- first-order features: 64 channels, 112×112;
- second-order features: 192 channels, 56×56;
- third-order features: 480 channels, 28×28;
- fourth-order features: 832 channels, 14×14;
- fifth-order features: 1024 channels, 7×7.

Step 2. This step fuses the 3rd-, 4th- and 5th-order feature maps, as shown in Fig. 3.

The fifth-order feature map is 7×7×1024. A 1×1 convolutional layer first converts it to 7×7×832, and bilinear interpolation then converts it to 14×14×832. The converted 5th-order features are added pixel by pixel, in corresponding channels, to the 4th-order features, giving a fused 4th-order feature map of 14×14×832.

In the same way, a 1×1 convolutional layer and a bilinear interpolation layer convert the fused 4th-order map to 28×28×480. This is added pixel by pixel, channel by channel, to the 3rd-order features, giving a fused 3rd-order feature map of 28×28×480.

Finally, to reduce the computational burden of training, the data are further reduced in dimension. A 1×1 convolutional layer reduces the fused 3rd-order map to 28×28×192, and two successive bilinear interpolations enlarge it to 112×112×192. This completes the fusion of the 3rd-, 4th- and 5th-order features.
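The tensor shapes in Step 2 can be traced with a short bookkeeping sketch. It tracks only (channels, height, width) tuples, not values. Shapes are written channel-first, whereas the text writes H×W×C. The assertion in `add_shapes` encodes the requirement that element-wise addition needs identical shapes. The factor-2 upsampling is an assumption that holds for GoogLeNet, because each selected scale halves the previous one. All function names are hypothetical.

```python
def conv1x1_shape(shape, out_channels):
    c, h, w = shape
    return (out_channels, h, w)              # a 1x1 convolution changes channels only

def upsample_shape(shape, factor=2):
    c, h, w = shape
    return (c, h * factor, w * factor)       # bilinear x2 changes spatial size only

def add_shapes(a, b):
    assert a == b, f"element-wise addition needs equal shapes, got {a} and {b}"
    return a

def fuse_345(f3, f4, f5, final_channels=192, final_size=112):
    x = upsample_shape(conv1x1_shape(f5, f4[0]))   # 1024x7x7  -> 832x14x14
    x = add_shapes(x, f4)                           # fused 4th order
    x = upsample_shape(conv1x1_shape(x, f3[0]))    # 832x14x14 -> 480x28x28
    x = add_shapes(x, f3)                           # fused 3rd order
    x = conv1x1_shape(x, final_channels)            # 480 -> 192 channels
    while x[1] < final_size:                        # 28 -> 56 -> 112
        x = upsample_shape(x)
    return x

print(fuse_345((480, 28, 28), (832, 14, 14), (1024, 7, 7)))  # (192, 112, 112)
```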

Step 3. Multi-scale fully convolutional layers are built to predict every pixel, as shown in Fig. 4. The fused feature from step 2 is 112×112×192, and it passes through four layers:
- Layer 1, small-scale: 1×1 kernel, followed by batch normalization and a nonlinear activation. Output: 192 channels, spatial size unchanged.
- Layer 2, large-scale: 21×21 kernel, followed by batch normalization and a nonlinear activation. Output: 112×112×192.
- Layer 3, small-scale: 1×1 kernel, followed by batch normalization and a nonlinear activation. Output: 192 channels, spatial size unchanged.
- Layer 4, the last layer: 1×1 kernel with a single output channel. Output: 112×112×1.

The last layer is immediately followed by a sigmoid layer, which produces the pixel-by-pixel crack prediction.
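The cost of the four layers in Step 3 can be estimated from their output shapes and convolution weight counts. Two assumptions apply. First, "same" padding of kernel // 2 keeps the 112×112 size, which the text implies but does not state. Second, only convolution weights are counted; biases and batch-normalization parameters are ignored.

```python
# Four fully convolutional layers of Step 3: (kernel size, output channels).
HEAD_LAYERS = [(1, 192), (21, 192), (1, 192), (1, 1)]

def conv_output_size(size, kernel, padding, stride=1):
    return (size + 2 * padding - kernel) // stride + 1

def head_summary(size=112, in_channels=192):
    """Return (output shape as H x W x C, convolution weight count) for each layer."""
    rows, c_in = [], in_channels
    for kernel, c_out in HEAD_LAYERS:
        pad = kernel // 2                    # assumed "same" padding for odd kernels
        size = conv_output_size(size, kernel, pad)
        rows.append(((size, size, c_out), kernel * kernel * c_in * c_out))
        c_in = c_out
    return rows

rows = head_summary()
total = sum(w for _, w in rows)
for shape, weights in rows:
    print(shape, weights, f"{weights / total:.1%}")
```

Under these assumptions, the 21×21 layer holds 16,257,024 of the 16,330,944 convolution weights, over 99% of the head. It is the layer that gives each pixel its large receptive field.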

Step 4. The training and test sets of the multi-scale feature fusion network are built, and the network is trained.

Crack images are taken from the crack training and test sets of the qualitative recognition network built in step 1. The cracks are outlined by hand to create label files (crack pixels 1, non-crack pixels 0). Because the network's output size is set to 112×112, the hand-drawn labels are resized to 112×112 by bilinear interpolation and binarized with a threshold of 0.3.

The loss function is set to the cross-entropy cost function. Using the transfer learning strategy, the weights of the feature extraction layers of the qualitative recognition network are copied into the multi-scale feature fusion network. Stochastic gradient descent then updates the network weights iteratively, reducing the loss until the network converges.
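The label-resizing rule in Step 4 (224×224 labels reduced to 112×112, then thresholded at 0.3) has a simple effect, assuming the half-pixel-centre bilinear convention (as in `align_corners=False`). With an exact factor of 2, every output sample falls midway between four input pixels, so bilinear downsampling equals the mean of each 2×2 block. A threshold of 0.3 then marks a block as crack only if at least 2 of its 4 pixels are crack. The helper names are hypothetical, and other interpolation conventions would give different values.

```python
def downsample_label_2x(label):
    """Bilinear 2x downsampling of a label map under the half-pixel convention.

    Each output sample sits midway between a 2x2 block of inputs, so the
    bilinear value is exactly the mean of that block.
    """
    h, w = len(label), len(label[0])
    assert h % 2 == 0 and w % 2 == 0
    return [[(label[2 * i][2 * j] + label[2 * i][2 * j + 1]
              + label[2 * i + 1][2 * j] + label[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]

def binarize(values, threshold):
    return [[1 if v > threshold else 0 for v in row] for row in values]

# 4x4 toy label: a short one-pixel-wide diagonal crack plus one isolated crack pixel.
label = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
small = downsample_label_2x(label)
print(small)                  # block means
print(binarize(small, 0.3))   # isolated single pixel (mean 0.25) is dropped
```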

Step 5. Both trained networks are applied to an input crack image of 4000×6000 pixels.

First, the qualitative crack network localizes cracks by scanning the image with a sliding window, with successive windows offset by 112 pixels in the x and y directions. The classification of each window gives the approximate crack locations.

Next, the 224×224 (3-channel) image of each window in which a crack was detected is cropped out and fed to the multi-scale feature fusion network. The map predicted directly for each window is 112×112 pixels. It is bilinearly interpolated to 224×224 and thresholded at 0.5, giving a binary crack delineation for that window. Combining all windows completes the automatic delineation of cracks in the whole image.
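Step 5 does not say how the binary masks of overlapping windows are merged into one full-image result. The sketch below assumes a logical OR: a pixel is a crack if any window covering it predicts a crack. Windows that the classifier rejected contribute only background. The function name and the dictionary layout (window origin mapped to binary mask) are hypothetical.

```python
def stitch_masks(height, width, window_masks):
    """Merge per-window binary masks into one full-image mask by logical OR.

    window_masks maps a window origin (y, x) to a binary mask (list of rows);
    overlapping regions are marked as crack if any covering window says so.
    """
    full = [[0] * width for _ in range(height)]
    for (y0, x0), mask in window_masks.items():
        for i, mask_row in enumerate(mask):
            row = full[y0 + i]
            for j, v in enumerate(mask_row):
                if v:
                    row[x0 + j] = 1
    return full

# Toy 6x6 image with two overlapping 4x4 windows at (0, 0) and (2, 2).
mask_a = [[1 if i == j else 0 for j in range(4)] for i in range(4)]  # diagonal crack
mask_b = [[0] * 4 for _ in range(4)]
mask_b[0][0] = 1
mask_b[3][3] = 1
full = stitch_masks(6, 6, {(0, 0): mask_a, (2, 2): mask_b})
for row in full:
    print(row)
```

An OR merge favors recall. Averaging the overlapping probability maps before the 0.5 threshold would be an equally plausible alternative.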

Application example:

The present invention is further described below through specific examples, which shall not limit its scope of protection.

Crack images of varying complexity were collected on different structural surfaces and against different environmental backgrounds. The invention was used to detect the cracks automatically, and its accuracy was analyzed.

(1) Deep learning scanning detection

The trained GoogLeNet model with a 224×224×3 input is used to scan the test images; some of the outputs are shown in the first column of Figure 5.

(2) Automatic crack delineation with the multi-scale feature fusion deep learning network

The 224×224 (3-channel) image of each window in which a crack was detected is cropped and used as the input of the multi-scale feature fusion network, which outputs the crack delineation for that window. Bilinear interpolation and threshold segmentation then binarize the crack image in each window, completing the automatic delineation of cracks in the whole image. Some of the outputs are shown in the second column of Figure 5.

(3) Comparison with traditional methods

Within each window in which a crack was detected in step (1), the results are compared visually with those of adaptive threshold segmentation and Canny edge detection; some of the outputs are shown in the third and fourth columns of Figure 5.

(4) Accuracy analysis

The binary crack images produced by the proposed method are compared with binary images of manually drawn cracks. Each image is divided into a grid of 7×7-pixel squares, giving 857×571 squares in total for a 6000×4000-pixel image (6000/7 and 4000/7, rounded down). The detection accuracy is computed by comparing, square by square, whether the manual and the automatic delineation contain a crack.

TP (true positive): the number of squares in which both the manual and the automatic delineation contain a crack;

TN (true negative): the number of squares in which neither the manual nor the automatic delineation contains a crack;

FP (false positive): the number of squares in which the manual delineation contains no crack but the automatic delineation does;

FN (false negative): the number of squares in which the manual delineation contains a crack but the automatic delineation does not.
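The square-by-square counting can be sketched as follows. The sketch assumes both binary images have the same size, that a square "contains a crack" if any of its pixels is a crack pixel, and that leftover partial squares at the right and bottom edges are ignored; the last two points are assumptions, not rules stated in the patent. The function name is hypothetical.

```python
def grid_confusion(manual, auto, cell=7):
    """Count TP, TN, FP, FN over non-overlapping cell x cell squares of two 0/1 masks."""
    height, width = len(manual), len(manual[0])
    tp = tn = fp = fn = 0
    # Floor division drops any partial squares at the right and bottom edges.
    for gy in range(height // cell):
        for gx in range(width // cell):
            rows = range(gy * cell, (gy + 1) * cell)
            cols = range(gx * cell, (gx + 1) * cell)
            has_manual = any(manual[y][x] for y in rows for x in cols)
            has_auto = any(auto[y][x] for y in rows for x in cols)
            if has_manual and has_auto:
                tp += 1
            elif has_auto:
                fp += 1
            elif has_manual:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


# Toy 14x14 masks, i.e. a 2x2 grid of 7x7 squares.
manual = [[0] * 14 for _ in range(14)]
auto = [[0] * 14 for _ in range(14)]
manual[0][0] = manual[7][7] = 1  # cracks in squares (0, 0) and (1, 1)
auto[1][1] = auto[0][8] = 1      # cracks in squares (0, 0) and (0, 1)
print(grid_confusion(manual, auto))  # (1, 1, 1, 1)
print(4000 // 7, 6000 // 7)          # 571 857
```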

Three metrics are defined:

Precision = TP/(TP+FP), recall = TP/(TP+FN), and F-measure = 2 × precision × recall/(precision + recall).
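The three metrics follow directly from the square counts. In this sketch the counts in the example are made-up illustrative numbers, not results from the patent, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def crack_metrics(tp, fp, fn):
    """Precision, recall and F-measure; returns 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


p, r, f = crack_metrics(tp=8, fp=2, fn=8)
print(p, r, round(f, 4))  # 0.8 0.5 0.6154
```

The F-measure is the harmonic mean of precision and recall, so it stays low unless both are high.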

The proposed method maintains high accuracy, precision and recall across the different complex environments tested.

Claims

1. A method for automatic crack delineation based on multi-scale feature fusion deep learning, characterized in that the method comprises the following steps:

(1) qualitative crack detection and feature extraction based on transfer learning;

(2) multi-scale deep learning feature fusion;

(3) crack prediction with continuous multi-scale fully convolutional layers;

(4) sample training of the multi-scale feature fusion deep learning network;

(5) crack detection and localization, and automatic crack delineation.

2. The method for automatic crack delineation based on multi-scale feature fusion deep learning according to claim 1, characterized in that step (1), qualitative crack detection and feature extraction based on transfer learning, is specifically as follows:

A camera is used to collect images of cracks on the surfaces of different structures to build a crack image database. From this database, and taking into account the input image size of the pre-trained model, a training set and a test set of crack images are built by cropping, rotation, scaling and similar operations. The database contains two classes: crack images, labeled 1, and non-crack images, labeled 0. Because the parameters of the pre-trained model already encode prior knowledge of qualitative image classification, transfer learning is performed on the crack image database, and the model is trained until the network converges, yielding a high-quality model for qualitative crack image classification. When the trained model classifies a crack image, the outputs of its layers are used to extract features of the input image at various scales and in different dimensions, which completes the feature extraction. The output feature maps of a deep learning network generally shrink from layer to layer; among the layers that share the same output size, only the last one is used as a feature extraction layer. This yields crack feature maps at several scales, each proportionally smaller than the previous one.
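The layer-selection rule in claim 2 (keep only the last layer of each output size) can be sketched as below. The layer list is an illustrative, abridged GoogLeNet-style backbone on a 224×224 input; the names and sizes are examples, not a list taken from the patent, and the function name is hypothetical.

```python
def select_feature_layers(layers):
    """Keep, for each distinct output size, the last layer that produces it.

    layers: (name, (height, width)) pairs in forward order.
    """
    last_by_size = {}
    for name, size in layers:
        last_by_size[size] = name  # a later layer of the same size replaces an earlier one
    # Dicts keep first-insertion order, so the result runs from large to small maps.
    return list(last_by_size.items())


# Illustrative, abridged layer list for a GoogLeNet-style network on a 224x224 input.
layers = [
    ("conv1", (112, 112)), ("maxpool1", (56, 56)), ("conv2", (56, 56)),
    ("conv3", (56, 56)), ("maxpool2", (28, 28)), ("inception3a", (28, 28)),
    ("inception3b", (28, 28)), ("maxpool3", (14, 14)), ("inception4a", (14, 14)),
    ("inception4e", (14, 14)), ("maxpool4", (7, 7)), ("inception5b", (7, 7)),
]
for size, name in select_feature_layers(layers):
    print(size, name)
```

The selected maps halve in size from one scale to the next (112, 56, 28, 14, 7), which is the proportional reduction described in the claim.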

3. The method for automatic crack delineation based on multi-scale feature fusion deep learning according to claim 1, characterized in that step (2), multi-scale deep learning feature fusion, is specifically as follows:

① Features of multiple scales are fused level by level, two adjacent levels at a time: the features of the two highest-level scales are first fused into a single-scale feature, and the next lower-level feature is then fused with this fused high-level feature map;

② When fusing the feature maps of two adjacent levels, their dimensions (channel counts) are unified first: a convolutional layer with a 1×1 kernel reduces the dimension of the higher-level features to that of the lower-level features;

③ Once the dimensions are unified, the spatial sizes of the feature maps are unified: the dimension-reduced higher-level feature map is enlarged by bilinear interpolation to the size of the lower-level feature map;

④ Once the two adjacent feature maps have the same dimension and size, the two scale features are fused by adding, pixel by pixel, the values of corresponding pixels in corresponding dimensions;

⑤ After all the selected levels have been fused, a convolutional layer with a 1×1 kernel may be used to increase or reduce the dimension of the final fused feature to suit the available computing capacity;

By repeating this procedure, the multi-level features are fused level by level.
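Steps ① to ④ of claim 3 can be sketched in plain Python as below, with feature maps stored as nested lists of shape [channels][height][width]. In the patented network the 1×1 convolution weights are learned; here they are fixed toy numbers, biases are omitted, and the half-pixel bilinear convention is an assumption. The optional final 1×1 convolution of step ⑤ would be one more call to `conv1x1` on the result. All function names are hypothetical.

```python
def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of one 2-D channel (half-pixel centres, clamped at borders)."""
    in_h, in_w = len(img), len(img[0])

    def source(i, n_out, n_in):
        f = min(max((i + 0.5) * n_in / n_out - 0.5, 0.0), n_in - 1)
        i0 = int(f)
        return i0, min(i0 + 1, n_in - 1), f - i0

    out = []
    for i in range(out_h):
        y0, y1, dy = source(i, out_h, in_h)
        row = []
        for j in range(out_w):
            x0, x1, dx = source(j, out_w, in_w)
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def conv1x1(feat, weights):
    """1x1 convolution without bias: feat is [C_in][H][W], weights is [C_out][C_in]."""
    h, w = len(feat[0]), len(feat[0][0])
    return [[[sum(wo[c] * feat[c][y][x] for c in range(len(feat)))
              for x in range(w)] for y in range(h)] for wo in weights]


def fuse_pair(low, high, weights):
    """Fuse a higher-level (deeper, smaller) map into the adjacent lower-level map."""
    reduced = conv1x1(high, weights)                            # step 2: match channels
    h, w = len(low[0]), len(low[0][0])
    upsampled = [bilinear_resize(ch, h, w) for ch in reduced]   # step 3: match size
    return [[[low[c][y][x] + upsampled[c][y][x] for x in range(w)]
             for y in range(h)] for c in range(len(low))]       # step 4: add pixelwise


def fuse_all(features, weight_list):
    """features: lowest level (largest) first; weight_list[i] maps level i+1 to level i."""
    fused = features[-1]
    for i in range(len(features) - 2, -1, -1):  # start from the two highest levels
        fused = fuse_pair(features[i], fused, weight_list[i])
    return fused


level0 = [[[0.0] * 8 for _ in range(8)]]                    # 1 channel, 8x8
level1 = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4x4
level2 = [[[1.0] * 2 for _ in range(2)] for _ in range(3)]  # 3 channels, 2x2
w1 = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]  # 3 -> 2 channels
w0 = [[0.25, 0.25]]                      # 2 -> 1 channel
out = fuse_all([level0, level1, level2], [w0, w1])
print(len(out), len(out[0]), len(out[0][0]), out[0][0][0])  # 1 8 8 1.0
```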

4. The method for automatic crack delineation based on multi-scale feature fusion deep learning according to claim 1, characterized in that step (3), crack prediction with continuous multi-scale fully convolutional layers, is specifically as follows:

① Four consecutive multi-scale fully convolutional layers are built from convolutional layers, nonlinear activation layers and batch normalization layers, without any pooling layers;

② The first fully convolutional layer uses a small kernel, such as 1×1 or 3×3, to linearly combine the information of each pixel across dimensions; it is followed by a batch normalization layer and a nonlinear activation layer and keeps the output dimension unchanged. The second fully convolutional layer uses a large kernel, such as 21×21, to integrate the information of each pixel with that of a large surrounding area; it is also followed by a batch normalization layer and a nonlinear activation layer and keeps the output dimension unchanged. The third fully convolutional layer has the same structure as the first. The fourth fully convolutional layer uses a 1×1 kernel to reduce the output dimension to 1, after which a sigmoid function gives the final predicted value for each pixel;

③ The map obtained in the previous step is not yet a binary crack image. The map predicted directly by the network is scaled by bilinear interpolation to the size of the input image, and the scaled map is thresholded at 0.5: pixels with values greater than 0.5 are crack pixels, and pixels with values less than 0.5 are background pixels.
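To see how large the "surrounding area" brought in by the second layer is, one can compute the receptive field of the four-layer head on the fused feature map. The sketch assumes stride-1 convolutions with "same" padding, so that without pooling the spatial size is preserved. The channel count of 64 is a hypothetical placeholder, since the claim only says the first three layers keep the fused dimension unchanged. Batch normalization, the activations and the sigmoid do not change the receptive field and are left out.

```python
def head_layers(small_kernel=1, large_kernel=21, channels=64):
    """(kernel size, output channels) of the four fully convolutional layers."""
    return [
        (small_kernel, channels),  # layer 1: per-pixel mixing across dimensions
        (large_kernel, channels),  # layer 2: large spatial context
        (small_kernel, channels),  # layer 3: same structure as layer 1
        (1, 1),                    # layer 4: reduce to one channel before the sigmoid
    ]


def receptive_field(layers):
    """Side length of the input square seen by one output pixel (stride 1, no pooling)."""
    size = 1
    for kernel, _ in layers:
        size += kernel - 1  # each k x k stride-1 convolution widens the field by k - 1
    return size


print(receptive_field(head_layers()))                # 21
print(receptive_field(head_layers(small_kernel=3)))  # 25
```

With 1×1 small kernels, the 21×21 layer alone sets the context each prediction sees; choosing 3×3 kernels widens it slightly, to 25×25 feature-map pixels.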

5. The method for automatic crack delineation based on multi-scale feature fusion deep learning according to claim 1, characterized in that the sample training of the multi-scale feature fusion deep learning network in step (4) is specifically as follows:

① First, using a transfer learning strategy and the constructed crack database, the qualitative crack identification network is trained, and deep learning is used to extract crack features of different scales;

② Some crack images are taken from the training and test sets for qualitative crack identification and outlined by hand (crack pixel value 1, background pixel value 0) to build the training and test sets of the multi-scale feature fusion network. The label size of these sets follows the final output size of the network: the labels are resized by bilinear interpolation and then binarized with a threshold of 0.3;

③ The loss function of the network is the cross-entropy cost function, whose expression is:

$$C = -\frac{1}{n}\sum_{x}\left[y\ln a + (1-y)\ln(1-a)\right]$$

where the sum runs over the n labelled pixels x, y is the label (0 or 1) of the corresponding pixel, and a is the actual output of the network for that pixel;

④ Using transfer learning, the weights of the feature extraction layers of the qualitative crack identification network are assigned to the multi-scale feature fusion network for automatic crack delineation, and stochastic gradient descent continuously updates the network weights to reduce the loss until the network converges.
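The cross-entropy cost of claim 5 can be evaluated per pixel as below. The patent names the cost function; the explicit form used here, C = -(1/n) Σ [y ln a + (1 - y) ln(1 - a)] averaged over the n pixels, is the standard binary cross-entropy and is assumed. The clipping constant `eps` is an added numerical safeguard, not part of the patent.

```python
import math


def cross_entropy_cost(labels, outputs, eps=1e-7):
    """Mean binary cross-entropy over pixels: -(1/n) * sum[y ln a + (1 - y) ln(1 - a)]."""
    total = 0.0
    for y, a in zip(labels, outputs):
        a = min(max(a, eps), 1.0 - eps)  # keep log() finite when a is exactly 0 or 1
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / len(labels)


print(round(cross_entropy_cost([1, 0], [0.5, 0.5]), 4))  # 0.6931
```

For a sigmoid output a = σ(z), the derivative of each pixel's term with respect to z is simply a − y, which is the error signal that stochastic gradient descent propagates back into the network.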

6. The method for automatic crack delineation based on multi-scale feature fusion deep learning according to claim 1, characterized in that step (5), crack detection and localization and automatic crack delineation, is specifically as follows: after network training is complete, automatic crack delineation is carried out in two steps:

① The qualitative crack network performs crack localization. When the image is larger than the input of the detection model, it is scanned with sliding windows, successive windows being offset by half the model's input size in the x and y directions; the classification of each window gives the approximate location of the cracks;

② Based on this localization, the image of each window in which a crack was detected is cropped and used as the input of the multi-scale feature fusion network, which outputs the crack delineation for that window; together, these window outputs give the automatic delineation of cracks in the whole image.
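Because the windows overlap by half their size, the per-window binary masks have to be merged into one full-size mask. The claim does not say how overlaps are combined; the sketch below assumes a logical OR, so a pixel is marked as crack if any window covering it marks it. Windows classified as crack-free contribute nothing. The function name and the toy sizes are hypothetical.

```python
def stitch_windows(height, width, window_masks):
    """Paste per-window binary masks into a full-size mask.

    window_masks: list of ((y0, x0), mask) for the windows classified as cracked.
    Overlapping windows are merged with a logical OR (an assumption).
    """
    full = [[0] * width for _ in range(height)]
    for (y0, x0), mask in window_masks:
        for dy, mask_row in enumerate(mask):
            full_row = full[y0 + dy]
            for dx, value in enumerate(mask_row):
                if value:
                    full_row[x0 + dx] = 1
    return full


# Toy 4x6 image with two overlapping 2x2 windows (stride 1).
masks = [((0, 0), [[0, 1], [0, 0]]),
         ((0, 1), [[1, 0], [0, 1]])]  # both windows mark pixel (0, 1)
full = stitch_windows(4, 6, masks)
print(full[0], full[1])  # [0, 1, 0, 0, 0, 0] [0, 0, 1, 0, 0, 0]
```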