CN116485652B - Super-resolution reconstruction method for remote sensing image vehicle target detection - Google Patents
- Publication number: CN116485652B (application CN202310465820.7A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution (output image resolution higher than the sensor resolution)
- G06T3/4023 — Scaling based on decimating or inserting pixels or lines of pixels
- G06T7/13 — Edge detection
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06V10/44 — Local feature extraction by analysis of parts of the pattern (edges, contours, corners)
- G06V10/54 — Extraction of image or video features relating to texture
- G06V10/764 — Recognition or understanding using classification
- G06V10/774 — Generating sets of training patterns
- G06V10/806 — Fusion of extracted features
- G06V10/82 — Recognition or understanding using neural networks
- G06T2207/10032 — Satellite or aerial image; remote sensing
- G06V2201/07 — Target detection
- G06V2201/08 — Detecting or categorising vehicles
- Y02T10/40 — Engine management systems
Abstract
The invention relates to a super-resolution reconstruction method for vehicle target detection in remote sensing images, comprising the following steps: constructing a high-resolution remote sensing image dataset rich in different scenes, and preprocessing it; deriving a corresponding low-resolution remote sensing image dataset from the high-resolution dataset and constructing a target super-resolution reconstruction dataset; extracting edge features from the low-resolution remote sensing images in that dataset; constructing a super-resolution reconstruction model, and training and optimizing it with the target super-resolution reconstruction dataset and the edge features; and performing high-resolution recovery and reconstruction of the target with the trained model. The scheme effectively addresses the low detection rate of vehicle targets caused by their weak, small-target characteristics, improves the reconstruction quality of the target, and reduces the computational cost of reconstruction.
Description
Technical Field
The invention relates to the technical field of super-resolution reconstruction, and in particular to a super-resolution reconstruction method for vehicle target detection in remote sensing images.
Background
Vehicle target detection in remote sensing images is widely applied in fields such as traffic, security, and the military. Unlike natural images, remote sensing images are acquired under more diverse imaging conditions and contain more complex targets. Vehicle targets in remote sensing images are small, densely arranged, and have inconspicuous edge and texture features. The bounding box of a vehicle target in the public dataset cocc is about 32 pixels × 20 pixels, while in the MS COCO data it is about 70 pixels × 46 pixels, so the pixel proportion of a vehicle target in a remote sensing image is very small. As a result, little effective information can be extracted from the vehicle target, which exhibits small-target characteristics, making vehicle detection in remote sensing images a significant challenge. Acquiring high-resolution imagery of the vehicle target is an effective way to improve detection performance.
High-resolution images contain rich texture and detail features, which are key to advanced computer vision applications. However, high-resolution images are difficult to obtain owing to limitations of acquisition equipment, environment, information transmission, imaging conditions, and so on. Super-resolution reconstruction can be regarded as the inverse process of image blurring and degradation: it aims to reconstruct a low-resolution image algorithmically and restore a high-resolution image with more detail information. Super-resolution reconstruction technology is widely applied in fields such as military reconnaissance, medical diagnosis, and remote sensing.
In recent years, with the development of deep learning, and especially of generative adversarial network technology, the quality of image super-resolution reconstruction has improved greatly. Because remote sensing images exhibit scale diversity, complex texture details, and other such characteristics, super-resolution reconstruction in this field still needs improvement, especially for small targets such as vehicles. Most existing deep-learning-based super-resolution methods improve the quality of reconstructed images by building complex networks (deepening and widening the model). In addition, the datasets used for vehicle detection in remote sensing images cover only a single type of scene, which hinders the generalization of vehicle target detection.
Disclosure of Invention
The invention aims to solve the technical problem of low detection rates for weak and small targets in remote sensing image vehicle target detection.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
the embodiment of the invention provides a super-resolution reconstruction method for detecting a vehicle target by remote sensing images, which comprises the following steps:
constructing a high-resolution remote sensing image dataset rich in different scenes, and preprocessing the high-resolution remote sensing image dataset;
obtaining a corresponding low-resolution remote sensing image dataset from the high-resolution remote sensing image dataset, and constructing a target super-resolution reconstruction dataset;
extracting edge features of the low-resolution remote sensing images in the low-resolution remote sensing image dataset;
constructing a super-resolution reconstruction model, and training and optimizing it using the target super-resolution reconstruction dataset and the edge features;
and performing high-resolution recovery and reconstruction of the target using the super-resolution reconstruction model.
According to an aspect of the embodiment of the present invention, constructing a high-resolution remote sensing image dataset rich in different scenes includes: searching Google Earth for high-resolution remote sensing images of vehicle targets in typical scenes including cities, villages, airports, ports, bridges, and mines.
According to an aspect of the embodiment of the present invention, the preprocessing the high-resolution remote sensing image dataset includes:
slicing the high-resolution remote sensing images using overlapping frames;
screening out target slice images that contain a complete target and some background slice images that contain no target;
and labelling the target slice images and background slice images by category using one-hot coding.
According to an aspect of the embodiment of the present invention, obtaining a corresponding low-resolution remote sensing image dataset from the high-resolution remote sensing image dataset and constructing a target super-resolution reconstruction dataset includes:
performing data enhancement, including rotation at different angles and random flipping, on the preprocessed high-resolution remote sensing image dataset;
compressing the high-resolution remote sensing image dataset and the enhanced high-resolution remote sensing image dataset to different degrees by interpolation to obtain low-resolution remote sensing images of different sizes;
restoring the low-resolution remote sensing images of different sizes to the size of the high-resolution remote sensing images by interpolation;
and forming image pairs from the acquired high-resolution and low-resolution remote sensing images, thereby constructing the target super-resolution reconstruction dataset.
According to an aspect of the embodiment of the present invention, extracting the edge features of the low-resolution remote sensing image includes:
performing grayscale normalization on the low-resolution remote sensing image to obtain a grayscale image;
and removing noise with Gaussian filtering and detecting edges in the grayscale image with the Canny gradient operator, thereby extracting the edge features of the target.
According to an aspect of the embodiment of the present invention, the constructing a super-resolution reconstruction model includes:
designing a generator comprising an underlying feature extraction module, two downsampling modules, a high-level semantic feature extraction module, an image reconstruction module, and a reconstruction feature fusion module;
designing a discriminator that takes a ResNet18 network as its backbone, with two fully connected layers after the global average pooling layer serving as two classifiers: one discriminates real images from synthesized images, and the other classifies targets versus background.
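The dual-head discriminator idea above can be sketched as shared pooled backbone features feeding two independent softmax classifiers. The following is a minimal NumPy illustration only; the feature dimension, weight initialization, and head sizes are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 512))  # pooled backbone features for a batch of 4

# Two independent fully connected heads on the same shared features
W_adv, b_adv = rng.normal(size=(512, 2)) * 0.01, np.zeros(2)  # real vs synthesized
W_cls, b_cls = rng.normal(size=(512, 2)) * 0.01, np.zeros(2)  # target vs background

p_adv = softmax(feat @ W_adv + b_adv)  # adversarial head output
p_cls = softmax(feat @ W_cls + b_cls)  # auxiliary classification head output
```

Splitting the two tasks into separate heads lets the same backbone supervise both realism and semantic class, which matches the auxiliary-classification role described for the discriminator.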
According to one aspect of the embodiment of the present invention, the underlying feature extraction module uses 3×3 convolution kernels to extract detail features, with ReLU as the activation function;
the downsampling module adopts a bottleneck layer with a residual branch: its trunk branch is a stack of 1×1, 3×3, and 1×1 convolutions, and its residual branch is a 1×1 convolution;
the high-level semantic feature extraction module consists of a series of LMFA modules, layer-jump (skip) connections, and global attention modules connected in series. The LMFA module is an improved multi-scale large-kernel attention module that fuses features from different kernel sizes, achieving a larger receptive field with fewer parameters and enriching the context information of the extracted features; the layer-jump connection adds the input of the LMFA module to the features produced by the attention mechanism, enhancing effective features while improving feature mobility; the global attention module adaptively integrates local features and global dependencies, enriching the contextual relationships of the features;
the reconstruction feature fusion module fuses the features of the downsampling module and the image reconstruction module, enhancing the edge features of the image.
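As a rough illustration of the layer-jump addition and gated fusion idea (not the patent's actual module), the sketch below weights edge-rich low-level features by a sigmoid gate derived from the reconstruction features and adds them back through a skip connection; the specific gating rule is a simplified assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_skip_fusion(low_level, recon):
    """Layer-jump addition with a gate: low-level (edge-rich) features are
    weighted element-wise by a sigmoid gate computed from the reconstruction
    features, then added back, so effective features are enhanced while the
    original reconstruction signal is preserved."""
    gate = sigmoid(recon)            # element-wise gate in (0, 1)
    return recon + gate * low_level  # skip connection keeps recon intact

low = np.ones((4, 4))   # stand-in for downsampling-module features
rec = np.zeros((4, 4))  # stand-in for reconstruction-module features
fused = gated_skip_fusion(low, rec)
```

The additive form (rather than concatenation) mirrors the document's claim that an additive attention mechanism enhances effective features without replacing them.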
According to an aspect of the embodiment of the present invention, training and optimizing the super-resolution reconstruction model using the target super-resolution reconstruction dataset and the edge features includes:
designing an objective function for training and optimizing the super-resolution reconstruction model, the objective function comprising loss functions for the generator and the discriminator;
and setting training hyper-parameters, minimizing the loss functions with an optimization algorithm, and iteratively training the discriminator and the generator until convergence.
According to one aspect of an embodiment of the invention, the generator's loss function includes an adversarial loss, a pixel-level reconstruction loss, a response difference based on the discriminator's intermediate-layer features, and a target/background classification loss from the discriminator.
According to one aspect of an embodiment of the invention, the discriminator's loss function comprises an adversarial loss and an auxiliary classification loss.
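The four-term generator objective can be written, in hedged NumPy form, as a weighted sum of the losses named above. The weights `w` and the exact distance measures (L1 for pixels, L2 for intermediate features) are illustrative assumptions; the patent does not specify them in this excerpt.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy on probabilities."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def generator_loss(d_fake_prob, sr, hr, feat_fake, feat_real,
                   cls_prob, cls_onehot, w=(1e-3, 1.0, 1e-2, 1e-2)):
    adv = bce(d_fake_prob, np.ones_like(d_fake_prob))  # fool the discriminator
    pix = np.mean(np.abs(sr - hr))                     # pixel-level reconstruction (L1)
    fea = np.mean((feat_fake - feat_real) ** 2)        # intermediate-feature response difference
    cls = -np.mean(np.sum(cls_onehot * np.log(np.clip(cls_prob, 1e-7, 1.0)),
                          axis=-1))                    # target/background classification
    return w[0] * adv + w[1] * pix + w[2] * fea + w[3] * cls

rng = np.random.default_rng(0)
loss = generator_loss(
    d_fake_prob=rng.uniform(0.1, 0.9, size=(4, 1)),
    sr=rng.random((4, 8, 8)), hr=rng.random((4, 8, 8)),
    feat_fake=rng.random((4, 16)), feat_real=rng.random((4, 16)),
    cls_prob=np.full((4, 2), 0.5), cls_onehot=np.tile([1, 0], (4, 1)),
)
```

The discriminator's own objective would analogously sum its adversarial BCE with the same auxiliary classification term evaluated on its predictions.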
Compared with the prior art, the invention has the following beneficial effects:
according to the scheme of the embodiment of the invention, aiming at the problem of poor detection performance of the weak and small targets in the remote sensing image target detection problem, the method carries out detail recovery and super-resolution reconstruction on the candidate target frames. Aiming at the problem of single scene, the target data set under rich scenes is obtained by searching google earth, so that the scene applicability of target detection is improved. Aiming at the limitation of reconstructing super-resolution images of single-scale low-resolution images, the invention adopts a Bicubic interpolation mode to carry out different-size scaling on original high-resolution images before image reconstruction, restores the scaled images, and takes the scaled images with uniform sizes as the input of a generator constructed by the invention.
Aiming at the problem of unclear boundary of a weak and small target, the invention fuses the traditional edge detection operator. Compared with the traditional method, the method acquires the images with different resolutions before reconstruction, and is favorable for reconstructing images with different scales. The deep learning features are combined with the traditional manual features, enhancing the edge features. Through designing the improved multi-scale large-core attention module and fusing different levels of attention mechanisms (local and global), the receptive field is enriched, the characteristic dependence is enhanced, and the target characteristics extracted by the model are richer and more effective.
Meanwhile, the invention uses a gating attention mechanism and layer jump connection to fuse the bottom layer characteristics with the reconstruction characteristics, strengthen the characteristic multiplexing and strengthen textures and edge details. Compared with the prior art, the method adopts the mode of the additive attention mechanism, so that the effective characteristics can be enhanced, and the synthetic quality of the image is improved. In the optimization function design, the intermediate characteristic layer loss and the auxiliary classification loss of the discriminators are adopted, so that the synthesized vehicle weak and small targets are more real.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them without inventive effort by a person of ordinary skill in the art.
Fig. 1 to 3 schematically illustrate flowcharts of a super-resolution reconstruction method for detecting a target of a vehicle by using a remote sensing image according to an embodiment of the present invention;
FIG. 4 schematically illustrates a network architecture diagram of a generator in a super-resolution reconstruction model disclosed in an embodiment of the present invention;
FIG. 5 schematically illustrates a network architecture of a global channel location attention module disclosed in an embodiment of the present invention;
FIG. 6 schematically illustrates a network architecture diagram of the discriminator in a super-resolution reconstruction model according to an embodiment of the present invention;
fig. 7 schematically illustrates a network architecture of an LMFA module disclosed in an embodiment of the present invention.
Detailed Description
The description of the embodiments of this specification should be read in conjunction with the accompanying drawings, which form a complete part of the description. In the drawings, the shape or thickness of elements may be enlarged or simplified for convenience. Parts of the structures in the drawings are described separately; elements not shown or described in the drawings are of a form known to those of ordinary skill in the art.
Any references to directions and orientations in the description are for convenience only and should not be construed as limiting the scope of the invention in any way. The following description of the preferred embodiments refers to combinations of features, which may be present alone or in combination; the invention is not limited to the preferred embodiments. The scope of the invention is defined by the claims.
The embodiment of the invention discloses a super-resolution reconstruction method for remote sensing image target detection. It fuses a deep learning algorithm with traditional image processing operators to refine the features of the reconstructed image and improve its quality; by means of an improved multi-scale large-kernel attention module, a multi-level attention mechanism, and multiple loss functions, it effectively improves the reconstruction quality of the vehicle target while reducing the computational cost of reconstruction. High-quality reconstruction of small targets further alleviates the problem of detecting small vehicle targets in wide-swath remote sensing satellite images with complex backgrounds.
As shown in figs. 1 to 3, the super-resolution reconstruction method for remote sensing image vehicle target detection includes the following steps:
s110, constructing rich high-resolution remote sensing image data sets in different scenes, and preprocessing the high-resolution remote sensing image data sets.
This embodiment describes the super-resolution reconstruction method taking small targets such as vehicles as an example. The high-resolution remote sensing image dataset rich in different scenes is built by searching Google Earth for high-resolution remote sensing images of vehicle targets in typical scenes including cities, villages, airports, ports, bridges, and mines, so that the dataset is as rich as possible and the scene adaptability of vehicle target detection is improved.
The preprocessing of the constructed high-resolution remote sensing image dataset in step S110 specifically includes: slicing the high-resolution remote sensing images using overlapping frames; screening out target slice images containing a complete target and some background slice images containing no target; and labelling the target slice images and background slice images by category using one-hot coding.
In this embodiment, for example, the bounding boxes of vehicle targets in some vehicle detection datasets are smaller than 100 pixels × 100 pixels, so the acquired remote sensing images are sliced with overlapping frames at a slice size of 128×128, ensuring the integrity of the targets as far as possible. The slice data are then screened to retain slice images containing complete vehicles along with some background slice images without vehicles. Finally, category labels are assigned to the vehicle target slices and background slices using one-hot coding, e.g. vehicle = [1,0] and background = [0,1].
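The overlapping slicing and one-hot labelling can be sketched as follows. The stride (and hence the overlap between windows) is an illustrative assumption, since this excerpt only specifies the 128×128 slice size.

```python
import numpy as np

def slice_overlapping(img, size=128, stride=96):
    """Cut an image into size×size tiles whose windows overlap by
    size - stride pixels, so a vehicle cut at one tile border is likely
    to appear whole in a neighbouring tile."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, max(h - size, 0) + 1, stride):
        for x in range(0, max(w - size, 0) + 1, stride):
            tiles.append(img[y:y + size, x:x + size])
    return tiles

# one-hot category labels as given in the text
VEHICLE, BACKGROUND = np.array([1, 0]), np.array([0, 1])
```

A 256×256 scene with this stride yields four overlapping 128×128 tiles; a production pipeline would also pad the image edges so no border region is dropped.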
And S120, obtaining a corresponding low-resolution remote sensing image data set according to the high-resolution remote sensing image data set, and constructing a target super-resolution reconstruction data set.
The step S120 specifically includes: and carrying out data enhancement processing comprising rotation and random overturn of different angles on the preprocessed high-resolution remote sensing image data set, namely carrying out data enhancement processing on the high-resolution remote sensing slice image, wherein the data enhancement processing specifically comprises rotation and random overturn of different angles. For a vehicle target, the inclined frame detection performance of the vehicle target detection is lower than that of the positive frame detection performance, the rotation enhancement can improve the angle interference of the vehicle target detection, for example, the rotation angle range of data can be-30 degrees to 30 degrees, and the angle interval can be 5 degrees.
And compressing the high-resolution remote sensing image data set and the enhanced high-resolution remote sensing image data set to different degrees by using an interpolation mode to obtain low-resolution remote sensing images with different sizes and corresponding to the high-resolution remote sensing images. That is, the original high-resolution remote sensing slice image and the enhanced high-resolution remote sensing slice image are subjected to low-resolution processing, the high-resolution remote sensing slice image is compressed in different degrees by using a Bicubic interpolation mode, the compression sizes of the low resolution are 2, 4 and 8, and the corresponding obtained images are 64×64, 32×32 and 16×16.
The low-resolution remote sensing images of different sizes are then restored to the size of the high-resolution remote sensing image by interpolation. In this embodiment, Bicubic interpolation is used to restore the low-resolution remote sensing slice images of different sizes to the 128x128 size of the original high-resolution slice image, avoiding the limitation of a single scaling factor.
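A minimal sketch of the degradation side of this pipeline: the Keys cubic-convolution kernel (a = -0.5) that underlies standard Bicubic interpolation, together with the three downscaled sizes. The kernel here is the textbook formulation, not code from the patent:

```python
def bicubic_weight(x, a=-0.5):
    """Keys cubic-convolution kernel, the 1-D weighting function behind
    Bicubic interpolation (a = -0.5 is the conventional choice)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

# Downscaling factors 2, 4 and 8 applied to the 128x128 slices
LR_SIZES = [128 // s for s in (2, 4, 8)]
print(LR_SIZES)  # [64, 32, 16]
```

The kernel equals 1 at the sample itself and 0 at every other integer offset, which is why Bicubic interpolation reproduces the original pixels exactly when no resampling is needed.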
The acquired high-resolution and low-resolution remote sensing images form image pairs, from which the target super-resolution reconstruction data set for training the vehicle target reconstruction model is constructed. In this embodiment, each high-resolution slice sample has three corresponding low-resolution images, which not only breaks the limitation of single-scale image reconstruction but also increases the number of samples to some extent.
S130, extracting edge features of the low-resolution remote sensing images in the low-resolution remote sensing image data set. In this embodiment, vehicle targets appear as small targets in remote sensing images and their edge features are easily lost, so the invention extracts and refines the vehicle edges before reconstruction.
The specific steps of extracting the edge features of the low-resolution remote sensing image in step S130 include the following:
and carrying out gray scale normalization processing on the low-resolution remote sensing image to obtain a gray scale image. The gray scale normalization formula is:
Gray(i,j)=0.299×R(i,j)+0.587×G(i,j)+0.114×B(i,j)
where Gray(i,j) represents the gray pixel value at position (i,j), and R(i,j), G(i,j) and B(i,j) represent the pixel values of the respective channels of the color image.
Gaussian filtering is used to remove noise, and the Canny gradient operator is used to detect edges in the gray-scale image, extracting the edge features of the target. Both noise and edges belong to the high-frequency part of an image; the noise is first removed by Gaussian filtering, and the gradient operator then detects the edges, yielding the edge features of the vehicle target, which serve as one of the inputs to the subsequent neural network (the reconstruction model).
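The edge-extraction stage can be sketched in pure Python as below. Gaussian smoothing followed by Sobel gradients covers only the first two stages of the Canny pipeline (non-maximum suppression and hysteresis thresholding are omitted), and the tiny step-edge image is illustrative only:

```python
def to_gray(r, g, b):
    """ITU-R BT.601 luma weights, matching the normalization formula above."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def convolve_valid(img, kernel):
    """'Valid' 2-D correlation of a small image with a 3x3 kernel."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(kernel[u][v] * img[i + u][j + v]
                    for u in range(3) for v in range(3))
            row.append(s)
        out.append(row)
    return out

GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]                # 3x3 Gaussian smoothing
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient

# 7x7 step edge: dark on the left, bright on the right
img = [[0, 0, 0, 1, 1, 1, 1] for _ in range(7)]
smoothed = convolve_valid(img, GAUSS)    # 5x5 smoothed image
gx = convolve_valid(smoothed, SOBEL_X)   # 3x3 gradient, peaks at the step
print(gx[1])  # [3.0, 3.0, 1.0]
```

The gradient magnitude is largest at the intensity step, which is the edge a Canny detector would keep after thresholding.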
And S140, constructing a super-resolution reconstruction model, and training and optimizing the super-resolution reconstruction model by utilizing the target super-resolution reconstruction data set and the edge characteristics. The model is used for realizing the super-resolution reconstruction of the vehicle target.
The super-resolution reconstruction model constructed in step S140 is implemented based on a generative adversarial network and mainly comprises a generator and a discriminator; super-resolution reconstruction of the target is accomplished through the adversarial game between the two. The edge features and the low-resolution image are used together as the input of the super-resolution reconstruction model, allowing the edge features of the vehicle target to be refined. To give the reconstructed image more complete detail and texture, an improved large-kernel attention model characterizes multi-scale features in the downsampled space, and depthwise convolution, depthwise dilated convolution and pointwise convolution fuse local and global information, enlarging the receptive field while reducing computation. Furthermore, attention mechanisms over features at different scales enhance the effective features. The discriminator adds an auxiliary classifier on top of the conventional real/fake classification of a generative adversarial network, which is used to optimize the generator.
The process of constructing the super-resolution reconstruction model in step S140 specifically includes the following steps:
(1) Design of the generator. As shown in fig. 4, the generator comprises a bottom-layer feature extraction module, two downsampling modules, a high-level semantic feature extraction module, an image reconstruction module and a reconstruction feature fusion module. The downsampling modules let the model learn at a lower resolution, reducing its complexity. The high-level semantic feature extraction module characterizes the features: it is designed as a lightweight multi-scale feature extraction module, to which a local attention mechanism is added to strengthen the dependence between local positions and channels, and finally a global attention module extends the range of feature dependence, so that attention over features of different scales and levels lets the model extract more effective vehicle target information. In addition, in the image reconstruction module, skip connections fuse the downsampling features with the corresponding upsampling features, enhancing information flow and enriching the texture details and edge features of the reconstruction.
The bottom-layer feature extraction module uses a convolution kernel of size 3x3 to extract detail features, with a ReLU activation function and 64 channels. To reduce model complexity and computational cost, the original input is downsampled twice. Each downsampling module adopts a bottleneck layer with a residual branch: the trunk branch is a stack of a 1x1 convolution, a 3x3 convolution (stride 2) and a 1x1 convolution, and the residual branch is a 1x1 convolution (stride 2), so that the features are downsampled and the channels doubled (128, 256) while features are extracted. The high-level semantic feature extraction module consists of a series of cascaded LMFA modules, skip connections and a global attention module. The LMFA module is an improved multi-scale large-kernel attention module that fuses features of different kernel sizes; the large-kernel attention module decouples a convolution into a depthwise convolution, a depthwise dilated convolution and a pointwise convolution, as shown in fig. 7, achieving local feature extraction and long-range feature dependence with fewer parameters. The improved multi-scale large-kernel attention module fuses features of different kernel sizes, acquiring features of different granularity with different receptive fields and enriching the context of the vehicle target features.
The module first expands the feature channels to twice the input via a 1x1 convolution (c=256), groups them evenly, extracts features at different scales with 4 branches, selectively fuses the features across branches, and then reduces the channel dimension of the fused 4-branch features with a 1x1 convolution. The process is expressed as:
F_out = Cat(F_1, F_2, F_3, F_4)
where F_out represents the output features and Cat denotes concatenation of features along the channel dimension. F_1 is given by:
F_1 = Conv(F_in)_{0~c/4}
where F_in represents the input of each LMFA block, Conv represents a 1x1 convolution, and c represents the number of channels after convolution. F_2 is given by:
in the above formula, the 1/4 characteristic channel after the dimension rise is processed by DepC (depth separable convolution), depD (deep expansion convolution) and Conv (1×1 conventional convolution), and finally is subjected to matrix multiplication processing with the input characteristic. Through the above procedure, a large nuclear attention module with a nuclear size of 21 can be realized with a 5×5 DepC and a 7×7 DepD (d=3), reducing the parameters while combining the features of the different receptive fields. F (F) 3 The expression is as follows:
where F_3 denotes the output of the 3rd branch, which is fused with the output of the 2nd branch; here the kernel size is 11, the DepC kernel is 5x5, the DepD kernel is 4x4, and the dilation rate is 3. The output of the 4th branch, F_4, is expressed below, with a kernel size of 13, a DepC kernel of 5x5, a DepD kernel of 4x4, and a dilation rate of 4:
according to the process, 4 different branches are fused based on an average grouping, a light weight idea and a multi-scale large-core attention mechanism, the core size of each branch is different, the receptive field is enriched by fewer parameters, and vehicle feature extraction with different fine granularity is realized.
After multi-scale feature extraction, a channel attention mechanism is used to highlight important features and integrate local channel semantic dependencies; a skip connection adds the LMFA input to the features after the attention mechanism, enhancing the effective features while improving feature flow.
In the high-level semantic feature extraction module, after several cascaded LMFA modules, the input is added to the cascaded result to strengthen global feature flow. In addition, as shown in fig. 5, a global channel-position attention module (CAM and PAM modules) is introduced, so that the model adaptively integrates local features and global dependence, enriching the contextual dependence of features and improving the reconstruction effect.
Design of the image reconstruction module. The characterized features undergo two successive upsampling-and-convolution stages to achieve super-resolution reconstruction of the vehicle target, with the channel count going 256->128->64 before finally being restored to the original number of input channels. The upsampling module uses a sub-pixel layer, whose larger receptive field provides more context information and synthesizes a more realistic super-resolution image.
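The sub-pixel layer mentioned above is commonly realized as a pixel shuffle, which rearranges channels into spatial positions. A minimal pure-Python sketch on nested lists (the real layer operates on 4-D tensors):

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] feature map into [C][H*r][W*r].

    Each group of r*r channels fills an r x r spatial block, which is how
    a sub-pixel layer performs upsampling without interpolation.
    """
    cr2, h, w = len(x), len(x[0]), len(x[0][0])
    c = cr2 // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                src = x[ch * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[ch][y * r + i][z * r + j] = src[y][z]
    return out

# Four 1x1 channels -> one 2x2 channel
up = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], r=2)
print(up)  # [[[1, 2], [3, 4]]]
```

Two such r=2 stages in sequence give the 4x upsampling path from 32x32 features back to the 128x128 output resolution.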
Design of the reconstruction feature fusion module, which fuses the features of the downsampling module and the image reconstruction module and enhances the edge features of the image. During downsampling and feature characterization the learned features become increasingly abstract and shallow information is lost, yet in image reconstruction the shallow features (texture and edges) preserve the detail information of the image. A gated attention mechanism is therefore introduced, expressed as follows:
F_AG = F_d × Sigmoid(BN(Conv(ReLU(F_d + UP_Sample(F_u)))))
where F_d and F_u represent the downsampling-layer features and the features of the layer preceding the corresponding upsampling layer, respectively. The downsampling process extracts high-level semantic information through successive convolutions and feature dimension reduction, so the resulting features lose part of the detail information. The features correlating the downsampling features with the corresponding upsampling features are computed and integrated into those upsampling features, enhancing feature flow and effectively improving the texture details of image reconstruction. Meanwhile, after the interpolation-reconstructed input is fused into the reconstruction module for convolution, the edge features of the image are enhanced.
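The gating formula above can be sketched element-wise as below. The 1x1 convolution and batch normalization inside the gate are omitted for brevity, so this only illustrates the ReLU -> Sigmoid gating of the downsampled features:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def attention_gate(f_d, f_u_upsampled):
    """F_AG = F_d * sigmoid(relu(F_d + Up(F_u))), with Conv/BN omitted.

    f_d: downsampling-layer features; f_u_upsampled: the corresponding
    decoder features, already upsampled to match f_d's resolution.
    """
    return [d * sigmoid(max(0.0, d + u)) for d, u in zip(f_d, f_u_upsampled)]

print(attention_gate([0.0, 2.0], [1.0, -5.0]))  # [0.0, 1.0]
```

A zero downsampled feature is fully suppressed, while a feature whose sum with the decoder signal is negative passes through with a neutral 0.5 gate, so the gate modulates rather than erases the skip-connected detail.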
(2) Design of the discriminator. As shown in fig. 6, the discriminator uses a ResNet18 network as its backbone and connects two fully connected layers (FC1 and FC2) after the global average pooling layer as two classifiers: one discriminates real images from synthesized images, and the other classifies target versus background. In addition, minimizing the feature response difference of the discriminator's intermediate layers can be used to improve the synthesis quality of the generator.
The specific process in step S140 of training and optimizing the super-resolution reconstruction model using the target super-resolution reconstruction data set and the edge features includes:
An objective function for training and optimizing the super-resolution reconstruction model is designed, comprising improved loss functions for the generator and the discriminator. The loss function of the generator includes the adversarial loss, a pixel-level reconstruction loss, the response difference of the discriminator's intermediate-layer features (synthesized image versus original high-resolution image), and a classification loss based on the discriminator's vehicle-target and background classes.
The adversarial loss of the generator is given as follows:
where Img_LR represents the input low-resolution image, Gen_HR represents the synthesized super-resolution image, and D(Gen_HR) represents the probability that the discriminator judges the synthesized image to be a real image.
The pixel-level reconstruction loss is expressed as follows:
L_pix = (1/N) Σ_n ||Img_HR − Gen_HR||_1
where n indexes the current sample, N represents the total number of samples used to optimize the network, Img_HR represents the real high-resolution image, and Gen_HR represents the synthesized super-resolution image; the generator is optimized by minimizing the L1 loss between the real high-resolution image and the synthesized image.
The response difference based on the discriminator's intermediate-layer features is expressed as follows:
where Ω represents the set of discriminator layers and f_k represents the feature response of the k-th layer of the discriminator; by minimizing the intermediate feature response difference, the synthesized super-resolution image is made as close as possible to the real high-resolution image.
For the classification loss based on the discriminator's target and background classes, the auxiliary classifier classifies the synthesized image so that the generator recovers more vehicle target detail. It is expressed as follows:
the total loss function of the generator is:
wherein, gamma 1 、γ 2 、γ 3 And gamma 4 And the weight parameters of the corresponding loss functions are represented.
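The generator objective can be sketched as a weighted sum of the four terms enumerated above; the pixel term follows the L1 definition in the text, while the gamma values here are placeholders, not values from the patent:

```python
def l1_pixel_loss(hr, gen):
    """Mean absolute error between real and synthesized images (L1 loss)."""
    n = len(hr)
    return sum(abs(a - b) for a, b in zip(hr, gen)) / n

def generator_loss(l_adv, l_pix, l_feat, l_cls,
                   gammas=(1.0, 1.0, 1.0, 1.0)):
    """Total generator loss: adversarial + pixel-level reconstruction +
    discriminator intermediate-feature response difference + auxiliary
    vehicle/background classification, each weighted by gamma_i."""
    g1, g2, g3, g4 = gammas
    return g1 * l_adv + g2 * l_pix + g3 * l_feat + g4 * l_cls

print(l1_pixel_loss([1.0, 2.0], [1.0, 4.0]))  # 1.0
```

In practice the gamma weights balance how strongly pixel fidelity, perceptual similarity and target-class recovery pull on the generator during training.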
The loss function of the discriminator is then designed; it consists of the adversarial loss and the auxiliary classification loss, expressed as follows:
where E(·) represents expectation and log(·) the logarithmic function; D(Img_HR) represents the probability that the discriminator judges the real high-resolution image to be a real image, Gen_HR represents the synthesized super-resolution image, D(Gen_HR) represents the probability that the discriminator judges the synthesized image to be a real image, D_cls(Img_HR) represents the probability of classifying the real high-resolution image as a vehicle target, and α represents the weight coefficient of the classification loss within the total loss.
Training hyperparameters are set, the loss functions are minimized with an optimization algorithm, and the discriminator and generator are trained iteratively until convergence. For example, the learning rate is 0.001 with an exponential decay strategy, the number of iterations is 100, the training batch size is 16, and the Adam algorithm is selected as the optimizer.
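The exponential learning-rate decay mentioned above can be sketched as below. The decay factor of 0.9 per epoch is an assumed value, since the text fixes only the initial rate of 0.001, the 100 iterations and the batch size of 16:

```python
def exp_decay_lr(base_lr, gamma, epoch):
    """Exponential decay schedule: lr_t = base_lr * gamma ** epoch."""
    return base_lr * gamma ** epoch

# Values from the text except GAMMA, which is an assumed decay factor
BASE_LR, GAMMA, EPOCHS, BATCH = 0.001, 0.9, 100, 16

schedule = [exp_decay_lr(BASE_LR, GAMMA, e) for e in range(EPOCHS)]
print(schedule[0], schedule[2])
```

The schedule starts at the configured 0.001 and shrinks geometrically, so late epochs take much smaller optimizer steps than early ones.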
S150, performing high-resolution recovery and reconstruction of the target using the super-resolution reconstruction model. Specifically, a low-resolution vehicle target candidate region generated by a target detection network on a remote sensing image is input to the super-resolution reconstruction model, which reconstructs a high-resolution target and synthesizes a super-resolution image of the weak, small target; the result can be applied to subsequent vehicle target detection to improve the detection rate.
Through the above steps, super-resolution reconstruction of vehicle targets in remote sensing images is achieved. The method obtains a target data set over rich scenes by searching Google Earth, improving the scene applicability of target detection. Meanwhile, it takes low-resolution images at different downscaling factors as input, breaking the limitation of reconstruction at a single scaling factor. In addition, the method designs a lightweight multi-scale model that fuses attention mechanisms at different levels, so the extracted features are richer and more effective without introducing excessive computation. In the reconstruction module, the method merges shallow and abstract features, strengthening the detail information and edge features of the target. During training, the optimal reconstruction model is obtained from the pixel-level loss, the discriminator's intermediate-layer feature response difference and the discriminator's auxiliary loss, effectively improving the reconstruction quality of the target.
The sequence numbers of the steps of the method of the present invention do not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and shall not limit the implementation of the embodiments of the present invention in any way.
The foregoing description of preferred embodiments is not intended to limit the present invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
1. A super-resolution reconstruction method for detecting a vehicle target by remote sensing images comprises the following steps:
constructing rich high-resolution remote sensing image data sets in different scenes, and preprocessing the high-resolution remote sensing image data sets;
obtaining a corresponding low-resolution remote sensing image data set according to the high-resolution remote sensing image data set, and constructing a target super-resolution reconstruction data set; the method for constructing the target super-resolution reconstruction data set comprises the steps of:
carrying out data enhancement processing comprising rotation at different angles and random overturn on the preprocessed high-resolution remote sensing image data set;
performing compression of different degrees on the high-resolution remote sensing image data set and the enhanced high-resolution remote sensing image data set by using an interpolation mode to obtain low-resolution remote sensing images with different sizes;
restoring the low-resolution remote sensing images with different sizes to the sizes of the high-resolution remote sensing images by using an interpolation mode;
forming an image pair by the acquired high-resolution remote sensing image and the low-resolution remote sensing image, so as to construct a target super-resolution reconstruction data set;
extracting edge features of the low-resolution remote sensing image in the low-resolution remote sensing image dataset; the extracting the edge features of the low-resolution remote sensing image comprises the following steps:
carrying out gray scale normalization processing on the low-resolution remote sensing image to obtain a gray scale image;
removing noise by Gaussian filtering, detecting the edge of the gray level image by using a Canny gradient operator, and extracting edge characteristics of a target;
constructing a super-resolution reconstruction model, and training and optimizing the super-resolution reconstruction model by utilizing the target super-resolution reconstruction data set and the edge characteristics;
and carrying out high-resolution recovery and reconstruction on the target by using the super-resolution reconstruction model.
2. The method of claim 1, wherein constructing the rich high-resolution remote sensing image data sets in different scenes comprises: searching, via Google Earth, high-resolution remote sensing images of vehicle targets in different typical scenes including cities, villages, airports, ports, bridges and mines.
3. The method of claim 1, wherein the preprocessing the high resolution remote sensing image dataset comprises:
slicing the high-resolution remote sensing image by adopting an overlapped frame mode;
screening out target slice images containing a complete target and some background slice images containing no target;
and performing category labeling on the target slice image and the background slice image by using one-hot coding.
4. The method of claim 1, wherein the constructing the super-resolution reconstruction model comprises:
designing a generator, wherein the generator comprises a bottom-layer feature extraction module, two downsampling modules, a high-level semantic feature extraction module, an image reconstruction module and a reconstruction feature fusion module;
the method comprises the steps of designing a discriminator, wherein the discriminator takes a ResNet18 network as a framework, and is connected with two full-connection layers after a global average pooling layer to serve as two classifiers, one classifier is used for discriminating a real image and a synthesized image, and the other classifier is used for classifying a target and a background.
5. The method of claim 4, wherein the underlying feature extraction module uses a convolution kernel of kernel size 3 x 3 for extracting detail features, and the activation function employs a ReLU function;
the downsampling module adopts a bottleneck layer with residual branches, the trunk branches of the downsampling module are a stack of 1×1 convolution, 3×3 convolution and 1×1 convolution, and the residual branches are 1×1 convolution;
the high-level semantic feature extraction module is composed of a series of cascaded LMFA modules, skip connections and a global attention module, wherein the LMFA module is an improved multi-scale large-kernel attention module fusing features of different kernel sizes, achieving a larger receptive field with fewer parameters and enriching the context information of the extracted features; the skip connection adds the input of the LMFA module to the features after the attention mechanism, enhancing the effective features while improving feature flow; the global attention module adaptively integrates local features and global dependence and enriches the contextual dependence of features;
the reconstruction feature fusion module fuses the features of the downsampling module and the image reconstruction module, and enhances the edge features of the image.
6. The method of claim 1, wherein the optimizing the super-resolution reconstruction model using the target super-resolution reconstruction dataset and the edge feature training comprises:
designing an objective function for training and optimizing the super-resolution reconstruction model, wherein the objective function comprises a loss function of a generator and a discriminator;
setting training hyperparameters, minimizing the loss function using an optimization algorithm, and iteratively training the discriminator and the generator until convergence.
7. The method of claim 6, wherein the generator's loss function includes a countering loss, a pixel-level based reconstruction loss, a difference in response based on a discriminator intermediate layer feature, and a classification loss based on a discriminator's target and background.
8. The method of claim 6, wherein the loss function of the arbiter is a counterloss and an auxiliary classification loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310465820.7A CN116485652B (en) | 2023-04-26 | 2023-04-26 | Super-resolution reconstruction method for remote sensing image vehicle target detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116485652A CN116485652A (en) | 2023-07-25 |
CN116485652B true CN116485652B (en) | 2024-03-01 |
Family
ID=87221116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310465820.7A Active CN116485652B (en) | 2023-04-26 | 2023-04-26 | Super-resolution reconstruction method for remote sensing image vehicle target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485652B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117440104B (en) * | 2023-12-21 | 2024-03-29 | 北京遥感设备研究所 | Data compression reconstruction method based on target significance characteristics |
CN118628714A (en) * | 2024-06-04 | 2024-09-10 | 中国科学院长春光学精密机械与物理研究所 | Small target detection method for medium-low resolution optical remote sensing image |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110322418A (en) * | 2019-07-11 | 2019-10-11 | 北京航空航天大学 | A kind of super-resolution image generates the training method and device of confrontation network |
CN111461134A (en) * | 2020-05-18 | 2020-07-28 | 南京大学 | Low-resolution license plate recognition method based on generation countermeasure network |
WO2020244261A1 (en) * | 2019-06-05 | 2020-12-10 | 中国科学院长春光学精密机械与物理研究所 | Scene recognition system for high-resolution remote sensing image, and model generation method |
CN112070670A (en) * | 2020-09-03 | 2020-12-11 | 武汉工程大学 | Face super-resolution method and system of global-local separation attention mechanism |
CN112734638A (en) * | 2020-12-24 | 2021-04-30 | 桂林理工大学 | Remote sensing image super-resolution reconstruction method and device and storage medium |
CN113643182A (en) * | 2021-08-20 | 2021-11-12 | 中国地质大学(武汉) | Remote sensing image super-resolution reconstruction method based on dual learning graph network |
CN114041163A (en) * | 2020-03-24 | 2022-02-11 | Tcl欧洲研发中心 | Method, processing system and computer program product for restoring high resolution images |
CN114049261A (en) * | 2022-01-13 | 2022-02-15 | 武汉理工大学 | Image super-resolution reconstruction method focusing on foreground information |
CN115358932A (en) * | 2022-10-24 | 2022-11-18 | 山东大学 | Multi-scale feature fusion face super-resolution reconstruction method and system |
CN115393191A (en) * | 2022-08-24 | 2022-11-25 | 广东工业大学 | Method, device and equipment for reconstructing super-resolution of lightweight remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||