
CN110084817B - Digital elevation model production method based on deep learning

Info

Publication number
CN110084817B
Authority
CN
China
Prior art keywords
image
remote sensing
semantic segmentation
value
module
Prior art date
Legal status
Active
Application number
CN201910217696.6A
Other languages
Chinese (zh)
Other versions
CN110084817A (en)
Inventor
姜光 (Jiang Guang)
赵秀臣 (Zhao Xiuchen)
张岚春 (Zhang Lanchun)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910217696.6A
Publication of CN110084817A
Application granted
Publication of CN110084817B
Legal status: Active

Classifications

    • G06F 18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/24147 — Distances to closest patterns, e.g. nearest neighbour classification
    • G06T 17/05 — Three dimensional [3D] modelling; geographic models
    • G06T 7/10 — Image analysis; segmentation; edge detection
    • G06V 20/13 — Scenes; terrestrial scenes; satellite images
    • G06T 2207/10024 — Image acquisition modality; color image
    • G06T 2207/10032 — Image acquisition modality; satellite or aerial image; remote sensing
    • G06T 2207/20081 — Special algorithmic details; training; learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30181 — Subject of image; Earth observation


Abstract

The invention discloses a digital elevation model production method based on deep learning, relating to the technical field of computer vision. By creating a target remote sensing image set, creating a semantic segmentation network based on deep learning, training the semantic segmentation network, and creating a digital elevation model, the method can effectively segment massive remote sensing images and improves segmentation efficiency.

Description

Digital elevation model production method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, and in particular to a digital elevation model production method based on deep learning.
Background
Semantic segmentation is one of the important directions in the field of deep learning: it takes raw data such as images as input, marks out the regions of interest, and classifies those regions. Deep-learning-based semantic segmentation can process massive image data efficiently, improving both recognition accuracy and efficiency.
The patent application with application number 201710675694.2 discloses a remote sensing ship contour segmentation and detection method based on a deep learning FCN fully convolutional network. The disclosed scheme comprises: constructing a remote sensing ship target database and labeling the remote sensing ship targets pixel by pixel; designing a deeper 6-layer fully convolutional network (FCN) structure trained through convolution and deconvolution; performing overlapping segmentation and detection on a wide remote sensing image; and merging all blocks to obtain the final ship detection result. The scheme realizes remote sensing ship target detection efficiently and rapidly, achieves accurate segmentation of the ship contour, and simplifies the traditionally complex detection process. However, it extracts features with conventional convolution, so the amount of computation is large and training is slow; moreover, because the feature information of images at different resolutions is not well utilized and semantic features are not fused, its segmentation accuracy is low.
The patent application with application number 201711193696.4 discloses a method for automatically extracting forest canopy closure from an unmanned aerial vehicle digital elevation model. The disclosed scheme comprises: acquiring image data of an east forest farm with an unmanned aerial vehicle and splicing the image data through digital modeling to extract digital elevation model data; extracting the pixel-value intervals of non-occluded areas in the digital elevation model (DEM) data with remote sensing image processing software and applying mask processing; and obtaining the proportions of the masked and unmasked sections through statistical analysis, thereby obtaining an accurate value of the forest canopy closure. Compared with other canopy closure measurement algorithms, this method greatly improves accuracy, precision and efficiency, but it has the following defects: the pixel-value intervals of the non-occluded areas in the DEM must be extracted manually, which is repetitive work with a heavy workload; and since the selection of the non-occluded areas depends on manual experience, wrong selections are possible, so large-scale data cannot be processed efficiently.
Disclosure of Invention
Aiming at the defects in the prior art, the embodiment of the invention provides a digital elevation model production method based on deep learning, which comprises the following steps:
acquiring a certain amount of original remote sensing images, and performing data enhancement on the original remote sensing images to generate a first target remote sensing image set, wherein the data enhancement comprises random scaling and mirror image processing on the original remote sensing images;
creating a semantic segmentation network based on deep learning, the semantic segmentation network comprising a feature extraction module, a feature fusion module and a pixel classification module, wherein:
the feature extraction module is used for extracting features of the image through a convolutional neural network;
the feature fusion module is used for improving the image classification precision;
the pixel classification module is used for performing pixel classification on the image by using a full convolution network;
training the semantic segmentation network, including:
acquiring a plurality of target objects from the first target remote sensing image set, and respectively setting mapping relations between RGB values and gray values of the target objects, wherein the target objects comprise buildings, trees, automobiles, sundries, crops, roads, lakes and grasslands;
respectively converting each target object into a gray-scale image according to the set mapping relation between the RGB value and the gray-scale value;
inputting each gray-scale image into the semantic segmentation network to obtain a first label image set; calculating the softmax function value of each label image according to the formula

$$S_i = \frac{e^{V_i}}{\sum_{k=1}^{j} e^{V_k}}$$

and obtaining the loss value of each label image according to the formula

$$H_i = -\log S_i$$

wherein $S_i$ represents the softmax function value of the ith label image, $H_i$ represents the loss value of the ith label image, $j$ represents the number of types of target objects, and $V_i$ represents the ith output value of the fully connected layer of the convolutional neural network;
sequencing all the label images according to their loss values, determining the object type of each label image in turn, feeding the loss values and their corresponding object types back to the semantic segmentation network, and updating the node parameters of the deep learning semantic segmentation network with the root-mean-square back-propagation (RMSProp) algorithm to obtain an optimal semantic segmentation network;
inputting each gray image into the optimal semantic segmentation network model to obtain a second label image set;
replacing the gray value of each label image in the second label image set with a corresponding RGB value according to the set mapping relation between the RGB value and the gray value, and coloring each image in the second label image set according to the RGB value to obtain a second target remote sensing image set;
and comparing the second target remote sensing image set with a digital surface model constructed by a plurality of original remote sensing images, extracting non-ground elements in the digital surface model, dividing the digital surface model into the non-ground elements and the ground elements, and removing the non-ground elements to generate the digital elevation model.
Further, culling the non-ground elements comprises:
eliminating the elevation data corresponding to buildings, trees, automobiles and sundries, and filling the removed values from the adjacent elevation data by using an interpolation algorithm;
subtracting a preset numerical value from the elevation data corresponding to the crops, and performing smooth filtering on the elevation data by using a Gaussian function;
and smoothing and filtering the elevation data corresponding to the roads, the lakes and the grasslands by using a Gaussian function.
Further, the semantic feature extraction module adopts the first 7 layers of MobileNet as the backbone network, comprising 1 two-dimensional conventional convolution and 6 inverted residual modules of MobileNet; the feature fusion module comprises cascaded dilated convolution modules and a dilated-convolution spatial pyramid pooling (ASPP) module; the pixel classification module comprises a fully convolutional network (FCN).
Further, the interpolation algorithm is an inverse distance weighted interpolation algorithm, and the expression is as follows:
$$z_p = \frac{\sum_{i=1}^{n} z_i \, d_i^{-\mu}}{\sum_{i=1}^{n} d_i^{-\mu}}$$

wherein $z_p$ is the elevation of the point to be interpolated, $z_i$ is the elevation of the ith of the $n$ sample points, $\mu$ is the weight exponent, $d_i$ is the distance from the ith sample point to the interpolated point, and $d_i^{-\mu}$ is the distance decay function.
The digital elevation model production method based on deep learning provided by the embodiment of the invention has the following beneficial effects:
(1) a semantic segmentation network is designed with deep learning techniques, combining the light weight of MobileNet with the ability of the atrous spatial pyramid pooling module to capture more global semantic information, so that the segmentation of massive remote sensing images can be completed;
(2) the deep learning semantic segmentation technique is applied to the DEM creation process: the remote sensing image is segmented with the semantic segmentation network to determine the non-ground features corresponding to the digital surface model, such as buildings, trees, rivers, lawns, roads and vehicles; different DEM repair strategies are then formulated according to the elevation attributes of the different features, and the grid data of the DEM repair areas are repaired. Compared with the traditional processing methods of manual visual interpretation and ordinary filtering, this improves the working efficiency of remote sensing image segmentation.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for producing a digital elevation model based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a semantic segmentation network according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, a method for producing a digital elevation model based on deep learning according to an embodiment of the present invention includes the following steps:
s101, creating a target remote sensing image set, comprising:
the method comprises the steps of obtaining a certain number of original remote sensing images, carrying out data enhancement on the original remote sensing images, and generating a target remote sensing image set, wherein the data enhancement comprises the steps of carrying out random scaling and mirror image processing on the original remote sensing images.
Further, random scaling means that the original remote sensing image is randomly shrunk or enlarged by a factor between 0.5 and 1.5, and mirroring means that the original remote sensing image is flipped horizontally.
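A minimal sketch of this augmentation step, assuming a Pillow image stack (the patent does not name an image library); the function name and the 50% flip probability are illustrative assumptions:

```python
import random

from PIL import Image, ImageOps

def augment(image: Image.Image) -> Image.Image:
    """Data enhancement: random 0.5x-1.5x scaling plus horizontal mirroring."""
    # Random scaling: shrink or enlarge the original remote sensing image.
    factor = random.uniform(0.5, 1.5)
    w, h = image.size
    image = image.resize((max(1, round(w * factor)), max(1, round(h * factor))),
                         Image.BILINEAR)
    # Mirroring: flip the image horizontally (here with 50% probability).
    if random.random() < 0.5:
        image = ImageOps.mirror(image)
    return image
```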
S102, creating a semantic segmentation network based on deep learning, wherein:
as shown in fig. 2, the semantic segmentation network includes a feature extraction module 1, a feature fusion module 2, and a pixel classification module 3, where:
the characteristic extraction module is used for extracting the characteristics of the image through a convolutional neural network;
the feature fusion module is used for improving the image classification precision;
the pixel classification module is used for carrying out pixel classification on the image by utilizing a full convolution network.
Furthermore, the feature extraction module mainly adopts the basic unit of MobileNet, the bottleneck: the module is formed by connecting 6 bottlenecks in series, each with a different number of convolution kernels. The bottleneck does not use pooling; downsampling is performed by setting the sampling stride. The bottleneck is an inverted residual module, shaped like an inverted funnel: the channel dimension of the input is first expanded inside the module, and reduced again after the image features have been extracted. The expansion ratio of the feature dimension is set by an expansion coefficient, generally set to 1, 6 and 6. The convolution inside the bottleneck uses depthwise separable convolution, which reduces the amount of computation on the one hand and strengthens the extraction of image features on the other. The depthwise separable convolution is applied repeatedly within a bottleneck, and repeating the six bottlenecks 1, 2, 3, 4, 3 and 3 times respectively makes feature extraction more effective.
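The following PyTorch sketch illustrates such an inverted residual bottleneck in the MobileNetV2 style described above; the framework choice, layer names and the default expansion of 6 are assumptions for illustration, not taken from the patent:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Inverted residual block: expand channels, depthwise conv, project back down."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expansion: int = 6):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 convolution raises the channel dimension ("inverted funnel" widening).
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # Depthwise convolution; stride > 1 replaces pooling for downsampling.
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 projection reduces the dimension after features are extracted.
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out
```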
The output of the semantic feature extraction module is fed into the feature fusion module. First, cascaded dilated (atrous) convolution modules are introduced. The image output by the feature extraction module is 16×16 pixels; since an oversized dilated convolution would be no different in effect from an ordinary 1×1 convolution at this size, dilated convolution modules with dilation rates 2, 4 and 8 are cascaded on the 16×16 feature map to obtain context information at different scales, acting on the sixth bottleneck. The feature map produced by the cascaded dilated convolution modules is then sent to the ASPP module. The ASPP module consists of dilated convolution modules with dilation rates 6, 12 and 18 connected in parallel; each dilated convolution module acts on the sixth bottleneck to enhance semantic features at a different scale. This multi-scale fusion improves classification accuracy. The output of the ASPP module is concatenated with a convolution module of dilation rate 1 and the average-pooled global image, a convolution operation is applied, and the result finally enters the fully convolutional network for pixel classification.
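A hedged PyTorch sketch of this feature fusion stage — cascaded dilated convolutions (rates 2, 4, 8) feeding an ASPP head (rates 6, 12, 18) with a pooled global branch; channel counts and API choices are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Cascaded dilated convs (rates 2, 4, 8) followed by ASPP (rates 6, 12, 18)."""
    def __init__(self, in_ch: int = 160, out_ch: int = 256):
        super().__init__()
        # Cascaded dilated convolutions enlarge the receptive field on the 16x16 map.
        self.cascade = nn.Sequential(*[
            nn.Conv2d(in_ch if r == 2 else out_ch, out_ch, 3, padding=r, dilation=r)
            for r in (2, 4, 8)
        ])
        # ASPP: a 1x1 conv branch plus parallel dilated convs at rates 6, 12, 18.
        self.branch1x1 = nn.Conv2d(out_ch, out_ch, 1)
        self.branches = nn.ModuleList(
            [nn.Conv2d(out_ch, out_ch, 3, padding=r, dilation=r) for r in (6, 12, 18)])
        self.project = nn.Conv2d(out_ch * 5, 8, 1)  # 8 = number of target-object classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cascade(x)
        # Global average pooling captures image-level context, then upsample back.
        pooled = F.interpolate(F.adaptive_avg_pool2d(x, 1), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        feats = [self.branch1x1(x)] + [b(x) for b in self.branches] + [pooled]
        return self.project(torch.cat(feats, dim=1))
```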
As a specific embodiment, the work flow of the semantic segmentation network is as follows:
(1) The input image is of size 512×512×3; one layer of ordinary convolution outputs an image of size 256×256×16;
(2) the image enters the MobileNet network, and 6 bottleneck operations produce an image of size 16×16×160;
(3) to enlarge the receptive field and obtain more semantic information, 3 dilated convolution modules with dilation rates 2, 4 and 8 are connected in series and act on the sixth bottleneck, outputting an image of size 16×16×256;
(4) to obtain global context information, an ordinary 1×1 convolution module and 3 dilated convolution modules with dilation rates 6, 12 and 18 are connected in parallel, forming an image of size 16×16×256×4;
(5) the image is globally averaged to obtain image-level information, and the output of size 16×16×256 is upsampled;
(6) the images output by steps (4) and (5) are merged by a concatenation operation, forming an image of size 16×16×256×5;
(7) a convolution operation on the 16×16×256×5 image forms an image of size 16×16×8;
(8) upsampling and deconvolution operations on the 16×16×8 image yield the final image of size 512×512×8, as sketched below.
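A small sketch of step (8) under stated assumptions (PyTorch; bilinear upsampling followed by two transposed convolutions to realize the overall 32× factor, since the patent does not fix the exact upsampling schedule):

```python
import torch
import torch.nn as nn

# Step (8): recover full resolution from the 16x16x8 class map.
# Assumed split: bilinear upsampling x8, then two stride-2 deconvolutions (x4),
# giving the overall x32 factor from 16x16 back to 512x512.
decoder = nn.Sequential(
    nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
    nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1),
)

x = torch.randn(1, 8, 16, 16)        # output of step (7)
y = decoder(x)
assert y.shape == (1, 8, 512, 512)   # one score map per target-object class
```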
S103, training the semantic segmentation network, comprising:
acquiring a plurality of target objects from a first target remote sensing image set, and respectively setting a mapping relation between RGB (red, green and blue) values and gray values of each target object, wherein the target objects comprise buildings, trees, automobiles, sundries, crops, roads, lakes and grasslands;
respectively converting each target object into a gray-scale image according to the set mapping relation between the RGB value and the gray-scale value;
as a specific embodiment, the mapping relationship may be: the grayscale value corresponding to the image with RGB value (0, 0, 255) is 1, the grayscale value corresponding to the image with RGB value (0, 255, 255) is 2, and the grayscale value corresponding to the image with RGB value (255, 255, 255) is 3.
Each gray-scale image is input into the semantic segmentation network to obtain a first label image set. The softmax function value of each label image is calculated according to the formula

$$S_i = \frac{e^{V_i}}{\sum_{k=1}^{j} e^{V_k}}$$

and the loss value of each label image is obtained according to the formula

$$H_i = -\log S_i$$

where $S_i$ represents the softmax function value of the ith label image, $H_i$ represents the loss value of the ith label image, $j$ represents the number of types of target objects, and $V_i$ represents the ith output value of the fully connected layer of the convolutional neural network.
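Written out as a small NumPy sketch (illustrative only; names follow the definitions above):

```python
import numpy as np

def softmax_and_loss(V: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """V holds the fully connected layer outputs V_1..V_j (j = number of object types).

    Returns S (softmax values) and H (per-entry loss H_i = -log S_i).
    """
    e = np.exp(V - V.max())   # subtract the max for numerical stability
    S = e / e.sum()           # S_i = exp(V_i) / sum_k exp(V_k)
    H = -np.log(S)            # H_i = -log S_i
    return S, H

S, H = softmax_and_loss(np.array([2.0, 1.0, 0.5]))
```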
All the label images are sorted according to their loss values, the object type of each label image is determined in turn, the loss values and their corresponding object types are fed back to the semantic segmentation network, and the node parameters of the deep learning semantic segmentation network are updated with the root-mean-square back-propagation (RMSProp) algorithm to obtain the optimal semantic segmentation network.
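A hedged sketch of one such update step, assuming PyTorch and its torch.optim.RMSprop as the root-mean-square back-propagation; the model and data below are stand-ins, not the patent's network:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 8, 3, padding=1)        # stand-in for the segmentation network
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()            # softmax + negative log-likelihood

gray = torch.randn(4, 1, 64, 64)             # batch of gray-scale input images
target = torch.randint(0, 8, (4, 64, 64))    # per-pixel object-type labels

optimizer.zero_grad()
loss = criterion(model(gray), target)        # combines S_i and H_i per pixel
loss.backward()                              # back-propagate the loss
optimizer.step()                             # RMSProp update of the node parameters
```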
S104, creating a digital elevation model, comprising:
inputting each gray image into the optimal semantic segmentation network model to obtain a second label image set;
replacing the gray value of each label image in the second label image set with a corresponding RGB value according to the set mapping relation between the RGB value and the gray value, and coloring each image in the second label image set according to the RGB value to obtain a second target remote sensing image set;
and comparing the second target remote sensing image set with a digital surface model constructed by a plurality of original remote sensing images, extracting non-ground elements in the digital surface model, dividing the digital surface model into the non-ground elements and the ground elements, and removing the non-ground elements to generate the digital elevation model.
Optionally, culling the non-ground elements comprises the following per-class strategies (sketched in code after this list):
eliminating the elevation data corresponding to buildings, trees, automobiles and sundries, and filling the removed values from the adjacent elevation data by using an interpolation algorithm;
subtracting a preset numerical value from the elevation data corresponding to the crops, and performing smooth filtering on the elevation data by using a Gaussian function;
and smoothing and filtering the elevation data corresponding to the roads, the lakes and the grasslands by using a Gaussian function.
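An illustrative NumPy/SciPy sketch of these three repair strategies on a gridded DSM; the class codes, Gaussian sigma and crop offset are placeholders, and a nearest-neighbour fill stands in for the interpolation algorithm detailed below:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def dsm_to_dem(dsm: np.ndarray, cls: np.ndarray, crop_offset: float = 0.5) -> np.ndarray:
    """Per-class repair of a DSM grid into a DEM grid.

    Assumed class codes: 1 building, 2 tree, 3 car, 4 sundries,
    5 crops, 6 road, 7 lake, 8 grassland.
    """
    dem = dsm.astype(float).copy()
    # (1) Buildings, trees, cars, sundries: remove the elevation data and fill the
    #     holes from the nearest remaining cells (a stand-in for interpolation).
    removed = np.isin(cls, (1, 2, 3, 4))
    dem[removed] = np.nan
    idx = distance_transform_edt(np.isnan(dem), return_distances=False,
                                 return_indices=True)
    dem = dem[tuple(idx)]
    # (2) Crops: subtract a preset value, then smooth with a Gaussian function.
    crops = cls == 5
    dem[crops] -= crop_offset
    dem[crops] = gaussian_filter(dem, sigma=1.0)[crops]
    # (3) Roads, lakes, grassland: Gaussian smoothing only.
    flat = np.isin(cls, (6, 7, 8))
    dem[flat] = gaussian_filter(dem, sigma=1.0)[flat]
    return dem
```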
Optionally, the semantic feature extraction module adopts the first 7 layers of MobileNet as the backbone network, comprising 1 two-dimensional conventional convolution module and 6 inverted residual modules of MobileNet; the feature fusion module comprises cascaded dilated convolution modules and an ASPP (Atrous Spatial Pyramid Pooling) module; the pixel classification module comprises a fully convolutional network (FCN).
Optionally, the interpolation algorithm is an inverse distance weighted interpolation algorithm, and its expression is:
$$z_p = \frac{\sum_{i=1}^{n} z_i \, d_i^{-\mu}}{\sum_{i=1}^{n} d_i^{-\mu}}$$

where $z_p$ is the elevation of the point to be interpolated, $z_i$ is the elevation of the ith of the $n$ sample points, $\mu$ is the weight exponent, $d_i$ is the distance from the ith sample point to the interpolated point, and $d_i^{-\mu}$ is the distance decay function.
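A compact NumPy sketch of this inverse distance weighted interpolation; the sample arrays and the choice μ = 2 are illustrative:

```python
import numpy as np

def idw(samples_xy: np.ndarray, samples_z: np.ndarray,
        point_xy: np.ndarray, mu: float = 2.0) -> float:
    """z_p = sum(z_i * d_i**-mu) / sum(d_i**-mu) over the n sample points."""
    d = np.linalg.norm(samples_xy - point_xy, axis=1)
    if np.any(d == 0):                 # the point coincides with a sample point
        return float(samples_z[np.argmin(d)])
    w = d ** -mu                       # distance decay function d_i^-mu
    return float(np.sum(w * samples_z) / np.sum(w))

z_p = idw(np.array([[0.0, 0], [1, 0], [0, 1]]),
          np.array([10.0, 12.0, 11.0]),
          np.array([0.4, 0.4]))
```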
According to the digital elevation model production method based on deep learning provided by the embodiment of the invention, creating the target remote sensing image set, creating the deep-learning-based semantic segmentation network, training the semantic segmentation network and creating the digital elevation model together allow massive remote sensing images to be segmented effectively and improve segmentation efficiency.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In addition, the memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
It should be noted that the above-mentioned embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the protection scope of the present invention.

Claims (3)

1. A digital elevation model production method based on deep learning is characterized by comprising the following steps:
acquiring a certain amount of original remote sensing images, and performing data enhancement on the original remote sensing images to generate a first target remote sensing image set, wherein the data enhancement comprises random scaling and mirror image processing on the original remote sensing images;
creating a semantic segmentation network based on deep learning, wherein the semantic segmentation network comprises a feature extraction module, a feature fusion module and a pixel classification module, and the semantic segmentation network comprises:
the feature extraction module is used for extracting features of the image through a convolutional neural network;
the feature fusion module is used for improving the image classification precision;
the pixel classification module is used for performing pixel classification on the image by using a full convolution network;
training the semantic segmentation network, including:
acquiring a plurality of target objects from the first target remote sensing image set, and respectively setting mapping relations between RGB values and gray values of the target objects, wherein the target objects comprise buildings, trees, automobiles, sundries, crops, roads, lakes and grasslands;
respectively converting each target object into a gray-scale image according to the set mapping relation between the RGB value and the gray-scale value;
inputting each gray-scale image into the semantic segmentation network to obtain a first label image set; calculating the softmax function value of each label image according to the formula

$$S_i = \frac{e^{V_i}}{\sum_{k=1}^{j} e^{V_k}}$$

and obtaining the loss value of each label image according to the formula

$$H_i = -\log S_i$$

wherein $S_i$ represents the softmax function value of the ith label image, $H_i$ represents the loss value of the ith label image, $j$ represents the number of types of target objects, and $V_i$ represents the ith output value of the fully connected layer of the convolutional neural network;
sequencing all the label images according to the loss value, sequentially determining the object type of each label image, sending the loss value and the object type corresponding to the loss value to the semantic segmentation network, and updating the node parameters of the deep learning semantic segmentation network by using a root-mean-square back propagation algorithm to obtain an optimal semantic segmentation network;
inputting each gray image into the optimal semantic segmentation network model to obtain a second label image set;
replacing the gray value of each label image in the second label image set with a corresponding RGB value according to the set mapping relation between the RGB value and the gray value, and coloring each image in the second label image set according to the RGB value to obtain a second target remote sensing image set;
comparing the second target remote sensing image set with a digital surface model constructed by a plurality of original remote sensing images, extracting non-ground elements in the digital surface model, dividing the digital surface model into the non-ground elements and the ground elements, removing the non-ground elements, and generating the digital elevation model, wherein:
the semantic feature extraction module adopts the first 7 layers of MobileNet as a backbone network, comprising 1 two-dimensional conventional convolution and 6 inverted residual modules of MobileNet; the feature fusion module comprises cascaded dilated convolution modules and a dilated-convolution (atrous) spatial pyramid pooling ASPP module; the pixel classification module comprises a fully convolutional network FCN.
2. The deep learning-based digital elevation model production method of claim 1, wherein culling the non-ground elements comprises:
eliminating elevation data corresponding to buildings, trees, automobiles and sundries, and supplementing the elevation data adjacent to the elevation data by using an interpolation algorithm;
subtracting a preset numerical value from the elevation data corresponding to the crops, and performing smooth filtering on the elevation data by using a Gaussian function;
and smoothing and filtering the elevation data corresponding to the roads, the lakes and the grasslands by using a Gaussian function.
3. The deep learning based digital elevation model production method of claim 2, wherein:
the interpolation algorithm is an inverse distance weighted interpolation algorithm, and the expression is as follows:
$$z_p = \frac{\sum_{i=1}^{n} z_i \, d_i^{-\mu}}{\sum_{i=1}^{n} d_i^{-\mu}}$$

wherein $z_p$ is the elevation of the point to be interpolated, $z_i$ is the elevation of the ith of the $n$ sample points, $\mu$ is the weight exponent, $d_i$ is the distance from the ith sample point to the interpolated point, and $d_i^{-\mu}$ is the distance decay function.
CN201910217696.6A 2019-03-21 2019-03-21 Digital elevation model production method based on deep learning Active CN110084817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910217696.6A CN110084817B (en) 2019-03-21 2019-03-21 Digital elevation model production method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910217696.6A CN110084817B (en) 2019-03-21 2019-03-21 Digital elevation model production method based on deep learning

Publications (2)

Publication Number Publication Date
CN110084817A CN110084817A (en) 2019-08-02
CN110084817B true CN110084817B (en) 2021-06-25

Family

ID=67413373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910217696.6A Active CN110084817B (en) 2019-03-21 2019-03-21 Digital elevation model production method based on deep learning

Country Status (1)

Country Link
CN (1) CN110084817B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749578A (en) * 2019-10-29 2021-05-04 中科星图股份有限公司 Remote sensing image automatic road extraction method based on deep convolutional neural network
CN111178206B (en) * 2019-12-20 2023-05-16 山东大学 Building embedded part detection method and system based on improved YOLO
CN111275112A (en) * 2020-01-20 2020-06-12 上海高仙自动化科技发展有限公司 Robot control method, robot, and readable storage medium
CN112733662A (en) * 2020-12-31 2021-04-30 上海智臻智能网络科技股份有限公司 Feature detection method and device
CN112802181B (en) * 2021-01-18 2024-06-11 郑州轻工业大学 Large-scale three-dimensional river scene reconstruction method based on low-detail elevation data
CN113505627B (en) * 2021-03-31 2024-07-23 北京苍灵科技有限公司 Remote sensing data processing method and device, electronic equipment and storage medium
CN113159174B (en) * 2021-04-21 2022-05-06 云南师范大学 DEM (digital elevation model) matching and deformation quantity detecting method and device without control points
CN113409327A (en) * 2021-06-01 2021-09-17 北京工业大学 Example segmentation improvement method based on ordering and semantic consistency constraint
CN113326847B (en) * 2021-06-04 2023-07-14 天津大学 Remote sensing image semantic segmentation method and device based on full convolution neural network
CN113379397B (en) * 2021-07-16 2023-09-22 北京华博创科科技股份有限公司 Cloud workflow frame intelligent management and scheduling system based on machine learning
CN114926646B (en) * 2022-05-19 2024-10-15 中国人民解放军战略支援部队信息工程大学 Remote sensing image elevation extraction method integrating context information
CN115858519B (en) * 2023-02-27 2023-05-16 航天宏图信息技术股份有限公司 DEM leveling method and device


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9123183B1 (en) * 2011-10-30 2015-09-01 Lockheed Martin Corporation Multi-layer digital elevation model
CN106097353A (en) * 2016-06-15 2016-11-09 北京市商汤科技开发有限公司 The method for segmenting objects merged based on multi-level regional area and device, calculating equipment
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning
CN107784657A (en) * 2017-09-29 2018-03-09 西安因诺航空科技有限公司 A kind of unmanned aerial vehicle remote sensing image partition method based on color space classification
CN108230329A (en) * 2017-12-18 2018-06-29 孙颖 Semantic segmentation method based on multiple dimensioned convolutional neural networks
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN108460382A (en) * 2018-03-26 2018-08-28 西安电子科技大学 Remote sensing image Ship Detection based on deep learning single step detector
CN109086715A (en) * 2018-07-31 2018-12-25 天图软件科技有限公司 The method of system, device and data processing based on the distant data middleware of admittance

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Kemker R et al; Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning; ISPRS Journal of Photogrammetry and Remote Sensing; 2018-11-30; Vol. 145, pp. 60-77 *
Yu Wang et al; Application of B Spline and Smoothing Spline on Interpolating the DEM Based on Rectangular Grid; Acta Geodaetica et Cartographica Sinica; 2000-08-25; Vol. 29, No. 3, pp. 240-244 *
Yi Wei; Research and application of terrain feature extraction algorithms based on DEM; China Masters' Theses Full-text Database, Information Science and Technology; 2013-03-15; No. 3, I138-4 *
Zhang Yue; Research on semantic segmentation methods for dual-source remote sensing data based on convolutional neural networks; China Masters' Theses Full-text Database, Engineering Science and Technology II; 2019-01-15; No. 1, C028-193 *
Wu Haiman; A comparative study of DEM extraction methods based on multi-source remote sensing data; Wanfang Data Knowledge Service Platform; 2018-12-11; pp. 1-64 *

Also Published As

Publication number Publication date
CN110084817A (en) 2019-08-02


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant