
CN111079683A - Remote sensing image cloud and snow detection method based on convolutional neural network - Google Patents

Remote sensing image cloud and snow detection method based on convolutional neural network Download PDF

Info

Publication number
CN111079683A
CN111079683A
Authority
CN
China
Prior art keywords
feature
convolution
network
feature map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911350284.6A
Other languages
Chinese (zh)
Other versions
CN111079683B (en)
Inventor
李坤
杜洪才
郭建华
杨敬钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201911350284.6A
Publication of CN111079683A
Application granted
Publication of CN111079683B
Active legal status
Anticipated expiration of legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image processing and aims to provide a remote sensing image cloud and snow detection method, reasonable in design and high in identification accuracy, that performs multi-scale, multi-path feature fusion across multiple convolutional layers of a convolutional neural network and classifies images accurately at the pixel level. The technical scheme adopted by the invention is a convolutional neural network-based remote sensing image cloud and snow detection method comprising the following steps: the encoding part of the network performs feature encoding on the input image information; the decoding part of the network fuses the basic depth features produced by the encoding structure through a multi-scale fusion module, recovers the resolution information of the original image, and generates a cloud and snow detection result with the same resolution as the original image. The method is mainly applied to meteorological image processing.

Description

Remote sensing image cloud and snow detection method based on convolutional neural network
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a remote sensing image cloud and snow detection method that extracts and fuses multi-scale, multi-path semantic information from multiple feature layers of a convolutional neural network.
Background
Remote sensing images are widely used for land monitoring, target detection, geographical mapping and the like, and the distribution of cloud and snow in an image strongly affects its spectrum. Cloud and snow present in a remote sensing image adversely affect downstream applications such as atmospheric correction, target recognition and target detection, so improving the accuracy of cloud and snow detection has become a goal of many remote sensing applications.
Improving the cloud and snow detection accuracy of optical satellite remote sensing images is therefore of great significance. Cloud and snow detection in remote sensing images is a multi-class classification problem, and machine learning methods such as artificial neural networks (ANN) and support vector machines (SVM) have been applied to remote sensing image classification. These methods, however, rely on hand-designed features and binary classifiers and do not exploit high-level features. Today, convolutional neural networks (CNN), a typical deep learning algorithm, have become a research focus and have been applied to classification, target recognition and target detection; the CNN framework is widely used in speech recognition and semantic image analysis because it can accurately extract semantic information from large numbers of input images. Many deep CNNs for cloud and snow detection make pixel-based predictions.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a remote sensing image cloud and snow detection method, reasonable in design and high in identification accuracy, that performs multi-scale, multi-path feature fusion across multiple convolutional layers of a convolutional neural network and classifies images accurately at the pixel level. To this end, the technical scheme adopted by the invention is a convolutional neural network-based remote sensing image cloud and snow detection method comprising the following steps:
the encoding part of the network: performing feature encoding on the input image information;
the decoding part of the network: fusing the basic depth features produced by the encoding structure through a multi-scale fusion module, recovering the resolution information of the original image, and generating a cloud and snow detection result with the same resolution as the original image.
A combination of cross-entropy loss and mean square error loss is used as the loss function; the proposed network is trained against this combined objective, and network performance is evaluated with pixel accuracy and mean Intersection over Union (mIoU).
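As a concrete illustration, the combined objective can be sketched in plain Python for a single pixel. The relative weights `alpha` and `beta` of the two terms are assumptions, since the patent does not state how the cross-entropy and mean-square-error terms are weighted:

```python
import math

def combined_loss(pred_probs, target_onehot, alpha=1.0, beta=1.0):
    """Combined cross-entropy + mean-square-error loss for one pixel.

    pred_probs    -- predicted class probabilities (softmax output)
    target_onehot -- one-hot ground-truth label
    alpha, beta   -- term weights (assumed; not stated in the patent)
    """
    eps = 1e-12  # guard against log(0)
    ce = -sum(t * math.log(p + eps)
              for p, t in zip(pred_probs, target_onehot))
    mse = sum((p - t) ** 2
              for p, t in zip(pred_probs, target_onehot)) / len(pred_probs)
    return alpha * ce + beta * mse
```

In training, the loss would be averaged over all pixels of the predicted detection map before back-propagation.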
The base depth features include global information and local information and are obtained by fusing output features of different convolutional layers.
The encoding part of the network, namely the step of performing feature encoding on the input image information, specifically comprises the following steps:
(1) process the input image to a uniform size of 256 × 256, use the 50-layer residual network ResNet-50 as the encoding part of the proposed network, and, following the 5-stage processing structure of ResNet-50, decompose the network by residual blocks into a preprocessing unit plus 4 residual blocks;
(2) input the uniformly sized images into the ResNet-50 network structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature, with resolutions: residual block 1 is 32 × 32, residual block 2 is 16 × 16, residual block 3 is 16 × 16, and residual block 4 is 16 × 16, for a total of 4 local residual-block output features.
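The downsampling schedule above can be checked with a small bookkeeping sketch. The per-stage stride factors are an assumption inferred from the stated output sizes (the later stages keep the resolution fixed, consistent with the dilated convolutions applied to them):

```python
def encoder_resolutions(input_size=256, stage_strides=(8, 2, 1, 1)):
    """Trace the spatial resolution of each residual block's output.

    stage_strides are cumulative per-stage downsampling factors that
    reproduce the resolutions stated in the patent (32, 16, 16, 16 for
    a 256x256 input). Blocks 3 and 4 use stride 1, since dilated
    convolutions keep the resolution unchanged. The exact strides are
    inferred, not stated in the patent.
    """
    sizes = []
    size = input_size
    for stride in stage_strides:
        size //= stride          # each stage divides the resolution
        sizes.append(size)
    return sizes
```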
The decoding part of the network, namely the process of fusing the features generated in step 1 with the multi-scale fusion module, specifically comprises the following steps:
(1) apply a 3 × 3 convolution with stride 1 to the features output by residual block 1 (resolution 32 × 32), obtaining a 32 × 32 feature map, denoted feature map 1;
(2) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 6 to the features output by residual block 2 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 2;
(3) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 12 to the features output by residual block 3 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 3;
(4) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 18 to the features output by residual block 4 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 4;
(5) pass the features output by residual block 4 (resolution 16 × 16) through a global average pooling layer followed by a 3 × 3 convolution with stride 1, obtaining a 16 × 16 feature map, denoted feature map 5;
(6) upsample feature map 2, feature map 3, feature map 4 and feature map 5 by 2× each, generating feature map 2a, feature map 3a, feature map 4a and feature map 5a;
(7) concatenate feature map 1 with feature map 2a, feature map 3a, feature map 4a and feature map 5a to obtain feature cascade map A;
(8) apply a 1 × 1 convolution with stride 1 to feature cascade map A, obtaining a 32 × 32 feature map, denoted feature fusion map B;
(9) upsample feature fusion map B by 2×, generating feature fusion map C with resolution 64 × 64;
(10) apply a 1 × 1 convolution with stride 1 to the 64 × 64 feature map output by the second residual unit in residual block 1, obtaining a 64 × 64 feature map, and concatenate it with feature fusion map C (resolution 64 × 64) to obtain feature fusion map D;
(11) apply a 3 × 3 convolution with stride 1 to feature fusion map D, obtaining a 64 × 64 feature map, and upsample it by 4× to generate a 256 × 256 detection map.
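The steps above can be traced as pure resolution bookkeeping, a sketch that only checks that the stated sizes are mutually consistent (channel counts are omitted):

```python
def fusion_module_trace(block_res=(32, 16, 16, 16), low_level_res=64, out=256):
    """Spatial-size bookkeeping for the multi-scale fusion module.

    Convolutions keep the resolution (stride 1); upsampling multiplies
    it. Returns the sizes of the named feature maps.
    """
    fmap1 = block_res[0]              # step (1): 3x3 conv, stride 1
    fmap2a = block_res[1] * 2         # steps (2),(6): dilated conv + 2x up
    fmap5a = block_res[3] * 2         # steps (5),(6): pooling branch + 2x up
    assert fmap1 == fmap2a == fmap5a  # cascade A needs equal sizes
    fused_b = fmap1                   # step (8): 1x1 conv, stride 1
    fused_c = fused_b * 2             # step (9): 2x upsample
    assert fused_c == low_level_res   # step (10): cascade with low-level map
    detection = fused_c * 4           # step (11): 3x3 conv + 4x upsample
    assert detection == out
    return {"A": fmap1, "B": fused_b, "C": fused_c, "detection": detection}
```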
The training step for the proposed network specifically comprises the following steps:
(1) calculate the cross-entropy loss and mean square error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm;
(2) after network training is complete, measure prediction performance with pixel accuracy and mIoU (mean Intersection over Union).
The invention has the characteristics and beneficial effects that:
the method is reasonable in design, local information and global information are fully considered, the calculation detection accuracy of the image is improved by adopting a multi-scale multi-convolution layer feature fusion method, the provided network is trained by taking the combination of cross entropy loss and mean square error loss as a minimum target loss function, and the cloud and snow detection accuracy of the image remote sensing image is effectively improved.
Description of the drawings:
fig. 1 is an overall network structure proposed by the present invention.
Fig. 2 is an input image of the inventive test.
Fig. 3 is a cloud and snow label of an input image tested by the present invention.
FIG. 4 shows the results of the detection of the present invention.
Detailed Description
In order to reduce the influence of external factors such as uncertainty and ambiguity, make full use of the information in an image, and obtain a better feature representation, the invention proposes a remote sensing image cloud and snow detection method that extracts and fuses multi-path semantic information across multiple convolutional layers and multiple scales of a convolutional neural network. The method first extracts semantic features of different levels in the ResNet-50 model with dilated convolutions of different dilation rates, which enlarge the receptive field, and then performs multi-scale, multi-path feature fusion across the convolutional layers, improving cloud and snow detection performance on remote sensing images.
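The receptive-field growth from dilation can be quantified with the standard effective-kernel formula; for the 3 × 3 kernels and dilation rates 6, 12 and 18 used here, it gives spans of 13, 25 and 37 pixels:

```python
def effective_kernel(k=3, dilation=1):
    """Effective spatial extent of a k x k dilated convolution kernel.

    A dilated kernel inserts (dilation - 1) gaps between taps, so along
    each axis it spans k + (k - 1) * (dilation - 1) pixels.
    """
    return k + (k - 1) * (dilation - 1)
```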
The technical scheme of the invention is realized by the following steps:
1. performing feature coding on input image information;
2. extracting basic depth features, fusing the basic depth features through a multi-scale fusion module, recovering original image resolution information, and generating a cloud and snow object detection result which is consistent with the original image resolution;
3. and (3) taking the comprehensive cross entropy loss and the mean square error loss as loss functions, training the proposed network by using the comprehensive loss function target, and evaluating the network performance by using the accuracy and the mIOU.
The relationship among the three steps is as follows: step 2 processes the encoded information obtained in step 1 and generates the result, so step 1 is the basis of step 2; steps 1 and 2 together form the whole network structure of the invention, and step 3 is its training method.
The specific implementation method of step 1 is as follows:
(1) process the input image to a uniform size of 256 × 256, use the ResNet-50 network structure as the encoding part of the proposed network, and, following the 5-stage processing structure of ResNet-50, decompose the network by residual blocks into a preprocessing unit plus 4 residual blocks;
(2) input the uniformly sized images into the ResNet-50 network structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature, with resolutions: residual block 1 is 32 × 32, residual block 2 is 16 × 16, residual block 3 is 16 × 16, and residual block 4 is 16 × 16, for a total of 4 local residual-block output features.
The specific implementation method of step 2 is as follows:
(1) apply a 3 × 3 convolution with stride 1 to the features output by residual block 1 (resolution 32 × 32), obtaining a 32 × 32 feature map, denoted feature map 1;
(2) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 6 to the features output by residual block 2 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 2;
(3) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 12 to the features output by residual block 3 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 3;
(4) apply a 3 × 3 dilated convolution with stride 1 and dilation rate 18 to the features output by residual block 4 (resolution 16 × 16), obtaining a 16 × 16 feature map, denoted feature map 4;
(5) pass the features output by residual block 4 (resolution 16 × 16) through a global average pooling layer followed by a 3 × 3 convolution with stride 1, obtaining a 16 × 16 feature map, denoted feature map 5;
(6) upsample feature map 2, feature map 3, feature map 4 and feature map 5 by 2× each, generating feature map 2a, feature map 3a, feature map 4a and feature map 5a;
(7) concatenate feature map 1 with feature map 2a, feature map 3a, feature map 4a and feature map 5a to obtain feature cascade map A;
(8) apply a 1 × 1 convolution with stride 1 to feature cascade map A, obtaining a 32 × 32 feature map, denoted feature fusion map B;
(9) upsample feature fusion map B by 2×, generating feature fusion map C with resolution 64 × 64;
(10) apply a 1 × 1 convolution with stride 1 to the 64 × 64 feature map output by the second residual unit in residual block 1, obtaining a 64 × 64 feature map, and concatenate it with feature fusion map C (resolution 64 × 64) to obtain feature fusion map D;
(11) apply a 3 × 3 convolution with stride 1 to feature fusion map D, obtaining a 64 × 64 feature map, and upsample it by 4× to generate a 256 × 256 detection map.
The specific implementation method of step 3 is as follows:
(1) calculate the cross-entropy loss and mean square error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm;
(2) after network training is complete, measure prediction performance with pixel accuracy and mIoU (mean Intersection over Union).
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
To address the problem of fully exploiting global and local information in remote sensing image cloud and snow detection, the invention proposes a detection method built on a multi-path, multi-scale feature fusion module. As shown in FIG. 1, in the network structure of the invention, the output features of each residual block of the ResNet-50 backbone are passed through multi-scale dilated convolutions to extract features; the features are brought to a common size by an upsampling layer and concatenated, fused by convolution, concatenated again with low-level features, and finally restored to the original image resolution by convolution and upsampling, making the classification result more reliable. In effect, feature extraction at different scales is performed on different feature layers at the feature extraction end of the network; since convolution kernels of different scales have different receptive fields, the scale information carried by each path differs, and a series of features from local to global is obtained. Such a fusion result takes both local and global information into account. The output of the network is a detection map whose resolution matches the original image; detection accuracy is computed against the existing labels, and the proposed network is trained to minimize the cross-entropy loss and mean square error loss.
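The 2× and 4× upsampling steps used to restore the resolution can be illustrated with a minimal sketch. The patent does not specify the interpolation mode, so nearest-neighbour here is an assumption (bilinear would be equally plausible):

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map by an integer
    factor: every value is repeated factor times along each axis."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]
```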
In this embodiment, a remote sensing image cloud and snow detection method based on a convolutional neural network includes the following steps:
step S1, performing feature encoding on the input image information, the specific processing method of this step is as follows:
step S1.1, processing the input image to a uniform size of 256 multiplied by 256, taking a ResNet-50 network structure as a pre-trained basic convolutional neural network, and decomposing the input image into a preprocessing unit and 4 residual blocks according to the residual block according to 5 stage processing structures of ResNet-50.
Step S1.2, inputting the images with uniform sizes into a ResNet-50 network structure, wherein after the images are subjected to a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature, and the resolution of the output feature of each residual block is as follows: the residual block 1 is 32 × 32, the residual block 2 is 16 × 16, the residual block 3 is 16 × 16, and the residual block 4 is 16 × 16, for a total of 4 local residual block output characteristics.
Step S2: pass the encoded information obtained in step S1 through the multi-scale fusion module to recover the resolution information of the original image and generate a cloud and snow detection result with the same resolution as the original image. The specific processing of this step is as follows:
s2.1, performing 3 × 3 convolution on the features output by the residual block 1 with the output feature resolution of 32 × 32, wherein the convolution step size of the convolution is 1, and obtaining a 32 × 32 feature map which is recorded as a feature map 1.
And S2.2, performing 3 × 3 expansion convolution on the features output by the residual block 2 with the output feature resolution of 16 × 16, wherein the convolution step of the expansion convolution is 1, and the expansion rate is 6, and obtaining a 16 × 16 feature map which is recorded as the feature map 2.
And S2.3, performing 3 × 3 expansion convolution on the features output by the residual block 3 with the output feature resolution of 16 × 16, wherein the convolution step of the expansion convolution is 1, the expansion rate is 12, and a 16 × 16 feature map is obtained and is marked as a feature map 3.
And S2.4, performing 3 × 3 expansion convolution on the features output by the residual block 4 with the output feature resolution of 16 × 16, wherein the convolution step of the expansion convolution is 1, the expansion rate is 18, and a 16 × 16 feature map is obtained and is marked as a feature map 4.
S2.5, the features output by the residual block 4 with the output feature resolution of 16 × 16 are subjected to convolution of 3 × 3 through a global average pooling layer, the convolution step of the convolution is 1, and a 16 × 16 feature map is obtained and recorded as a feature map 5.
And S2.7, respectively performing 2-time upsampling on the feature map 2, the feature map 3, the feature map 4 and the feature map 5 to generate a feature map 2a, a feature map 3a, a feature map 4a and a feature map 5 a.
And S2.8, cascading the feature diagram 1 with the feature diagram 2a, the feature diagram 3a, the feature diagram 4a and the feature diagram 5a to obtain a feature cascade diagram A.
S2.9, performing convolution on the feature cascade diagram A by 1 × 1, wherein the convolution step of the convolution is 1, obtaining a feature diagram of 32 × 32, and marking as a feature fusion diagram B.
S2.10, the feature fusion image B is subjected to double upsampling to generate a feature fusion image C with the resolution of 64 x 64.
S2.11, performing 1 × 1 convolution on the feature map with the resolution of 64 × 64 output by the second residual unit in the residual block 1, wherein the convolution step size of the convolution is 1, so as to obtain a 64 × 64 feature map, and cascading the feature map with the feature fusion map C with the resolution of 64 × 64 to obtain a feature fusion map D.
S2.12, performing 3 × 3 convolution on the feature fusion image D, wherein the convolution step of the convolution is 1, obtaining a 64 × 64 feature image, and performing up-sampling on the feature image by 4 times to generate a 256 × 256 detection image.
Step S3: train the network proposed in steps S1 and S2 against the combined cross-entropy and mean square error loss, and evaluate network performance with pixel accuracy and mIoU. The specific processing of this step is as follows:
Step S3.1: calculate the cross-entropy loss and mean square error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm.
Step S3.2: after network training is complete, measure the prediction performance of the network with pixel accuracy and mIoU (mean Intersection over Union).
The following experiment illustrates the recognition effect of the method of the invention.
Test environment: Python 2.7 (Python language development environment); TensorFlow (neural network library); Ubuntu 16.04 (operating system); NVIDIA GTX 1080Ti GPU.
Test data: the selected data set is an image data set for cloud and snow detection built from ZY-3 remote sensing satellite cloud and snow images, comprising 21290 satellite remote sensing images.
Test indexes: the invention uses pixel accuracy and mIoU as performance evaluation indexes. Pixel accuracy is the fraction of pixels classified correctly. mIoU is the mean, over classes, of the ratio of intersection to union between the predicted and ground-truth pixel sets for each class. The same indexes are computed for several currently popular algorithms for comparison, showing that the proposed method achieves better results in cloud and snow detection for remote sensing images.
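The two evaluation indexes can be stated precisely as a short sketch over flattened label lists (a plain-Python illustration, not the implementation used in the experiments):

```python
def mean_iou(pred, truth, num_classes):
    """Mean Intersection over Union across classes.

    For each class c, IoU = |pred==c AND truth==c| / |pred==c OR truth==c|;
    classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the label."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(pred)
```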
The test results were as follows:
(Comparison table of pixel accuracy and mIoU; rendered in the original publication as image BDA0002334492580000061.)
As can be seen from the comparison data, the pixel accuracy and mIoU of the invention are markedly improved over existing algorithms.
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but also includes other embodiments that can be derived from the technical solutions of the present invention by those skilled in the art.

Claims (6)

1. A remote sensing image cloud and snow detection method based on a convolutional neural network is characterized by comprising the following steps:
the coding part of the network is as follows: performing feature coding on input image information;
the decoding part of the network: fusing the basic depth features produced by the encoding structure through a multi-scale fusion module, recovering the resolution information of the original image, and generating a cloud and snow detection result with the same resolution as the original image.
2. The remote sensing image cloud and snow detection method based on the convolutional neural network as claimed in claim 1, wherein a combination of cross-entropy loss and mean square error loss is used as the loss function, the proposed network is trained against this combined objective, and network performance is evaluated with pixel accuracy and mean Intersection over Union (mIoU).
3. The convolutional neural network-based remote sensing image cloud and snow detection method as claimed in claim 1, wherein the base depth features comprise global information and local information and are obtained by fusing output features of different convolutional layers.
4. The method for detecting the cloud and snow of the remote sensing image based on the convolutional neural network as claimed in claim 1, wherein the encoding part of the network, namely the step of performing feature encoding on the input image information, specifically comprises the following steps:
(1) process the input image to a uniform size of 256 × 256, use the 50-layer residual network ResNet-50 as the encoding part of the proposed network, and, following the 5-stage processing structure of ResNet-50, decompose the network by residual blocks into a preprocessing unit plus 4 residual blocks;
(2) input the uniformly sized images into the ResNet-50 network structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature, with resolutions: residual block 1 is 32 × 32, residual block 2 is 16 × 16, residual block 3 is 16 × 16, and residual block 4 is 16 × 16, for a total of 4 local residual-block output features.
5. The method for detecting the cloud and snow of the remote sensing image based on the convolutional neural network as claimed in claim 1, wherein a coding part of the network, namely a process of fusing the features generated in the step 1 by using a multi-scale fusion module, specifically comprises the following steps:
(1) applying a 3 × 3 convolution with stride 1 to the 32 × 32 features output by residual block 1, obtaining a 32 × 32 feature map, denoted feature map 1;
(2) applying a 3 × 3 dilated convolution with stride 1 and dilation rate 6 to the 16 × 16 features output by residual block 2, obtaining a 16 × 16 feature map, denoted feature map 2;
(3) applying a 3 × 3 dilated convolution with stride 1 and dilation rate 12 to the 16 × 16 features output by residual block 3, obtaining a 16 × 16 feature map, denoted feature map 3;
(4) applying a 3 × 3 dilated convolution with stride 1 and dilation rate 18 to the 16 × 16 features output by residual block 4, obtaining a 16 × 16 feature map, denoted feature map 4;
(5) passing the 16 × 16 features output by residual block 4 through a global average pooling layer followed by a 3 × 3 convolution with stride 1, obtaining a 16 × 16 feature map, denoted feature map 5;
(6) upsampling feature map 2, feature map 3, feature map 4 and feature map 5 by a factor of 2, generating feature map 2a, feature map 3a, feature map 4a and feature map 5a, respectively;
(7) concatenating feature map 1 with feature map 2a, feature map 3a, feature map 4a and feature map 5a, obtaining feature concatenation map A;
(8) applying a 1 × 1 convolution with stride 1 to feature concatenation map A, obtaining a 32 × 32 feature map, denoted feature fusion map B;
(9) upsampling feature fusion map B by a factor of 2, generating feature fusion map C with a resolution of 64 × 64;
(10) applying a convolution with stride 1 to the 64 × 64 feature map output by the second residual unit in residual block 1, obtaining a 64 × 64 feature map, and concatenating it with feature fusion map C (resolution 64 × 64), obtaining feature fusion map D;
(11) applying a 3 × 3 convolution with stride 1 to feature fusion map D, obtaining a 64 × 64 feature map, and upsampling this feature map by a factor of 4, generating a 256 × 256 detection map.
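As an illustrative aside (standard convolution arithmetic, assumed rather than recited in the claim), the 3 × 3 dilated convolutions in steps (2)-(4) preserve the 16 × 16 resolution whenever the padding equals the dilation rate, while their effective receptive field grows with the rate:

```python
def conv_out_size(in_size, kernel=3, stride=1, padding=0, dilation=1):
    """Convolution output size: floor((n + 2p - d*(k-1) - 1) / s) + 1."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def receptive_field(kernel=3, dilation=1):
    """Effective receptive field of a single dilated convolution."""
    return dilation * (kernel - 1) + 1

# Dilation rates 6, 12 and 18 from steps (2)-(4); padding = rate keeps 16 x 16.
for rate in (6, 12, 18):
    assert conv_out_size(16, padding=rate, dilation=rate) == 16

print([receptive_field(dilation=r) for r in (1, 6, 12, 18)])  # [3, 13, 25, 37]
```

The growing receptive fields (3, 13, 25, 37 pixels) are what let the parallel branches capture context at multiple scales before the concatenation in step (7).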
6. The remote sensing image cloud and snow detection method based on a convolutional neural network according to claim 1, wherein the step of training the proposed network specifically comprises:
(1) computing the cross-entropy loss and the mean squared error loss between the predicted detection map and the labeled detection map, and updating the weights with the back-propagation algorithm;
(2) after training is finished, measuring the prediction performance of the network with the pixel accuracy and the mean Intersection over Union (mIoU).
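As a minimal NumPy sketch only (the class count of 3 and the toy label maps are assumptions; the patent does not fix these details here), the combined loss of step (1) and the metrics of step (2) can be computed as:

```python
import numpy as np

def combined_loss(prob, onehot, eps=1e-12):
    """Cross-entropy plus mean squared error between prediction and label maps."""
    ce = -np.mean(np.sum(onehot * np.log(prob + eps), axis=-1))
    mse = np.mean((prob - onehot) ** 2)
    return ce + mse

def confusion_matrix(pred, label, num_classes):
    """Confusion matrix (rows: true class, columns: predicted class)."""
    idx = num_classes * label.ravel() + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of correctly classified pixels."""
    return np.diag(cm).sum() / cm.sum()

def mean_iou(cm):
    """Mean Intersection over Union (mIoU) averaged over all classes."""
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    return float(np.mean(inter / np.maximum(union, 1)))

# Toy 3x3 label and prediction maps with 3 hypothetical classes.
label = np.array([[0, 0, 1], [1, 2, 2], [2, 2, 0]])
pred  = np.array([[0, 1, 1], [1, 2, 2], [2, 0, 0]])
cm = confusion_matrix(pred, label, 3)
print(pixel_accuracy(cm))  # 7/9: 7 of 9 pixels correct
print(mean_iou(cm))        # mean of per-class IoUs 1/2, 2/3, 3/4
```

A perfect prediction drives the combined loss to (numerically) zero and both metrics to 1, which is the behavior the training step relies on when back-propagating.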
CN201911350284.6A 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network Active CN111079683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911350284.6A CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911350284.6A CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111079683A true CN111079683A (en) 2020-04-28
CN111079683B CN111079683B (en) 2023-12-12

Family

ID=70317348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911350284.6A Active CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111079683B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553289A (en) * 2020-04-29 2020-08-18 Aerospace Information Research Institute, Chinese Academy of Sciences Remote sensing image cloud detection method and system
CN111611932A (en) * 2020-05-22 2020-09-01 Harbin Institute of Technology (Shenzhen) Remote sensing image cloud detection method based on a fully convolutional network, terminal and storage medium
CN111627012A (en) * 2020-05-28 2020-09-04 North China Electric Power University (Baoding) Deep neural network surface defect detection method based on feature fusion
CN111883181A (en) * 2020-06-30 2020-11-03 Haier Uhome Intelligent Technology (Beijing) Co., Ltd. Audio detection method and device, storage medium and electronic device
CN112418165A (en) * 2020-12-07 2021-02-26 Wuhan Institute of Technology Small-size target detection method and device based on an improved cascade neural network
CN112597882A (en) * 2020-12-22 2021-04-02 Land Satellite Remote Sensing Application Center, Ministry of Natural Resources Remote sensing image snow detection method based on a deep convolutional neural network
CN112734642A (en) * 2021-01-12 2021-04-30 Wuhan Institute of Technology Remote sensing satellite super-resolution method and device with a multi-scale texture transfer residual network
CN113240589A (en) * 2021-04-01 2021-08-10 Chongqing Zhaoguang Technology Co., Ltd. Image defogging method and system based on multi-scale feature fusion
CN113936204A (en) * 2021-11-22 2022-01-14 Anhui Normal University High-resolution remote sensing image cloud and snow identification method and device fusing terrain data and a deep neural network
CN114821121A (en) * 2022-05-09 2022-07-29 Yancheng Institute of Technology Image classification method based on RGB three-component grouping attention weighted fusion
CN115546076A (en) * 2022-12-05 2022-12-30 Gengyu Muxing (Beijing) Space Technology Co., Ltd. Remote sensing image thin cloud removal method based on a convolutional network
CN116229272A (en) * 2023-03-14 2023-06-06 PLA Army Military Transportation University, Zhenjiang Campus High-precision remote sensing image detection method and system based on representative point representation
CN118212543A (en) * 2023-12-11 2024-06-18 Land Satellite Remote Sensing Application Center, Ministry of Natural Resources Radiation anomaly target detection method with bilateral fusion and an improved lightweight network

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018045269A1 (en) * 2016-09-02 2018-03-08 Ohio State Innovation Foundation System and method of otoscopy image analysis to diagnose ear pathology
CN108596184A (en) * 2018-04-25 2018-09-28 Tsinghua University Shenzhen Graduate School Training method, readable storage medium and electronic device for an image semantic segmentation model
CN108710830A (en) * 2018-04-20 2018-10-26 Zhejiang Gongshang University Dense human 3D pose estimation method combining an attention pyramid residual network with equidistant constraints
CN108846446A (en) * 2018-07-04 2018-11-20 Academy of Broadcasting Science, SAPPRFT Object detection method based on multi-path dense feature fusion with a fully convolutional network
CN109190752A (en) * 2018-07-27 2019-01-11 Academy of Broadcasting Science, SAPPRFT Image semantic segmentation method based on deep-learning global and local features
CN109685842A (en) * 2018-12-14 2019-04-26 University of Electronic Science and Technology of China Sparse depth densification method based on a multi-scale network
CN110110719A (en) * 2019-03-27 2019-08-09 Zhejiang University of Technology Object detection method based on an attention-layer region convolutional neural network
CN110119728A (en) * 2019-05-23 2019-08-13 Harbin Institute of Technology Remote sensing image cloud detection method based on a multi-scale fusion semantic segmentation network
CN110163836A (en) * 2018-11-14 2019-08-23 Ningbo University Deep-learning-based excavator detection method for high-altitude inspection
CN110188720A (en) * 2019-06-05 2019-08-30 Shanghai Yunshen Intelligent Technology Co., Ltd. Object detection method and system based on a convolutional neural network
CN110276316A (en) * 2019-06-26 2019-09-24 University of Electronic Science and Technology of China Human body keypoint detection method based on deep learning
US20190325203A1 (en) * 2017-01-20 2019-10-24 Intel Corporation Dynamic emotion recognition in unconstrained scenarios
CN110543890A (en) * 2019-07-22 2019-12-06 Hangzhou Dianzi University Deep neural network image matching method based on a feature pyramid


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tan Mingming, Fan Yingle, Wu Wei, She Qingshan, Gan Haitao: "Contour perception with multi-path convolutional neural networks", vol. 24, no. 10, pages 1750 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553289A (en) * 2020-04-29 2020-08-18 Aerospace Information Research Institute, Chinese Academy of Sciences Remote sensing image cloud detection method and system
CN111611932A (en) * 2020-05-22 2020-09-01 Harbin Institute of Technology (Shenzhen) Remote sensing image cloud detection method based on a fully convolutional network, terminal and storage medium
CN111627012B (en) * 2020-05-28 2021-12-21 North China Electric Power University (Baoding) Deep neural network surface defect detection method based on feature fusion
CN111627012A (en) * 2020-05-28 2020-09-04 North China Electric Power University (Baoding) Deep neural network surface defect detection method based on feature fusion
CN111883181A (en) * 2020-06-30 2020-11-03 Haier Uhome Intelligent Technology (Beijing) Co., Ltd. Audio detection method and device, storage medium and electronic device
CN112418165A (en) * 2020-12-07 2021-02-26 Wuhan Institute of Technology Small-size target detection method and device based on an improved cascade neural network
CN112418165B (en) * 2020-12-07 2023-04-07 Wuhan Institute of Technology Small-size target detection method and device based on an improved cascade neural network
CN112597882A (en) * 2020-12-22 2021-04-02 Land Satellite Remote Sensing Application Center, Ministry of Natural Resources Remote sensing image snow detection method based on a deep convolutional neural network
CN112734642A (en) * 2021-01-12 2021-04-30 Wuhan Institute of Technology Remote sensing satellite super-resolution method and device with a multi-scale texture transfer residual network
CN113240589A (en) * 2021-04-01 2021-08-10 Chongqing Zhaoguang Technology Co., Ltd. Image defogging method and system based on multi-scale feature fusion
CN113936204A (en) * 2021-11-22 2022-01-14 Anhui Normal University High-resolution remote sensing image cloud and snow identification method and device fusing terrain data and a deep neural network
CN114821121A (en) * 2022-05-09 2022-07-29 Yancheng Institute of Technology Image classification method based on RGB three-component grouping attention weighted fusion
CN114821121B (en) * 2022-05-09 2023-02-03 Yancheng Institute of Technology Image classification method based on RGB three-component grouping attention weighted fusion
CN115546076A (en) * 2022-12-05 2022-12-30 Gengyu Muxing (Beijing) Space Technology Co., Ltd. Remote sensing image thin cloud removal method based on a convolutional network
CN116229272A (en) * 2023-03-14 2023-06-06 PLA Army Military Transportation University, Zhenjiang Campus High-precision remote sensing image detection method and system based on representative point representation
CN116229272B (en) * 2023-03-14 2023-10-31 PLA Army Military Transportation University, Zhenjiang Campus High-precision remote sensing image detection method and system based on representative point representation
CN118212543A (en) * 2023-12-11 2024-06-18 Land Satellite Remote Sensing Application Center, Ministry of Natural Resources Radiation anomaly target detection method with bilateral fusion and an improved lightweight network

Also Published As

Publication number Publication date
CN111079683B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN111079683B (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN110188765B (en) Image semantic segmentation model generation method, device, equipment and storage medium
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN110598600A (en) Remote sensing image cloud detection method based on UNET neural network
US20240257423A1 (en) Image processing method and apparatus, and computer readable storage medium
CN114998566B (en) Interpretable multi-scale infrared dim small target detection network design method
CN109871749B (en) Pedestrian re-identification method and device based on deep hash and computer system
CN113139618B (en) Robustness-enhanced classification method and device based on integrated defense
CN114140831A (en) Human body posture estimation method and device, electronic equipment and storage medium
CN115984714B (en) Cloud detection method based on dual-branch network model
CN115527072A (en) Chip surface defect detection method based on sparse space perception and meta-learning
Patel et al. A novel approach for semantic segmentation of automatic road network extractions from remote sensing images by modified UNet
CN117523645B (en) Face key point detection method and device, electronic equipment and storage medium
CN116129280B (en) Method for detecting snow in remote sensing image
Elagamy et al. HACR-MDL: handwritten Arabic character recognition model using deep learning
CN117523333A (en) Attention mechanism-based earth surface coverage classification method
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
CN112597882A (en) Remote sensing image snow detection method based on deep convolutional neural network
CN113705489B (en) Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance
CN116630610A (en) ROI region extraction method based on semantic segmentation model and conditional random field
CN117274971B (en) Image processing method applied to water meter data extraction and electronic equipment
CN116030347B (en) High-resolution remote sensing image building extraction method based on attention network
CN118155093B (en) Noctilucent remote sensing image cloud detection method and device based on day and night data fusion
CN118212543B (en) Bilateral fusion and lightweight network improved radiation abnormal target detection method
CN113705322B (en) Handwritten Chinese character recognition method and device based on threshold graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant