CN118608799A

CN118608799A - Remote sensing image compression method based on multi-scale asymmetric coding and decoding network

Info

Publication number: CN118608799A
Application number: CN202410645443.XA
Authority: CN
Inventors: 王柯俨; 马豪逸; 周培诚; 贾佳; 唐苒; 熊浩博; 宋娟; 刘凯; 李云松
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2024-05-23
Filing date: 2024-05-23
Publication date: 2024-09-06

Abstract

The invention discloses a remote sensing image compression method based on a multi-scale asymmetric coding and decoding network, which mainly addresses the failure of existing methods to fully extract the multi-scale detail information in remote sensing images. The scheme is implemented as follows: a remote sensing image data set is acquired and divided into a training set and a test set; an asymmetric remote sensing image compression network based on multi-scale dilated attention is constructed together with its loss function, the network consisting of a multi-scale residual module, a multi-scale dilated attention module, a multi-scale asymmetric codec, an adaptive context entropy model and a hyperprior codec; the training set is divided into image groups according to the batch size, and the groups are fed to the network in turn, cycling until the loss function converges; and the test set is input to the trained compression network to obtain reconstructed remote sensing images. The invention enhances the compression model's ability to extract multi-scale features from remote sensing images, reduces channel redundancy, improves remote sensing image compression performance, and can be used to process images with rich texture detail and large data volume.

Description

Remote sensing image compression method based on multi-scale asymmetric coding and decoding network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a remote sensing image compression method which can be used for processing images with abundant texture details and large data volume.

Background

Increasingly mature remote sensing technology has greatly advanced tasks such as satellite image acquisition, and remote sensing images are now widely used in agriculture, emergency disaster relief, dynamic land monitoring, urban planning, military reconnaissance, resource exploration, environmental monitoring and other fields. Compared with natural images, however, remote sensing images carry more information and have higher data dimensionality, and the resulting data volume makes them difficult to store, transmit and use. Research on efficient and practical remote sensing image compression algorithms is therefore necessary to ease transmission and save storage resources. At present, remote sensing image compression methods fall mainly into two groups: traditional methods and methods based on deep learning.

The traditional remote sensing image compression method relies on a manually designed feature extractor, and mainly comprises a compression method based on vector quantization, a compression method based on transformation and a compression method based on prediction, wherein:

Compression methods based on vector quantization map a group of continuous pixels to a finite set of vector codes, thereby compressing the image data. A typical method is described by Ryan et al. in "The lossless compression of AVIRIS images by vector quantization" [J]. IEEE Transactions on Geoscience and Remote Sensing, 1997, 35(3): 546-550, which indexes only a sub-codebook during vector quantization to improve the coding process and achieves better compression performance through mean normalization. However, vector quantization methods have high computational complexity and perform well only on images with continuous tonal variation and rich texture, which limits their applicability considerably.

Transform-based compression methods map the original data into a transform domain in which the correlation of the data is markedly reduced, greatly reducing redundant information. A typical approach is presented by Bo et al. in "Remote-sensing image compression using two-dimensional oriented wavelet transform" [J]. IEEE Transactions on Geoscience and Remote Sensing, 2010, 49(1): 236-250, which uses fewer bits to represent pixels sparsely in the transform domain. Although such methods reduce spatial redundancy, they require a trade-off between compression performance and overall algorithm complexity and are difficult to apply directly.

Prediction-based compression methods mainly exploit the correlation between the current pixel and the pixels in its neighborhood: the value at the current position is predicted from the values of nearby known pixels, which reduces the information entropy of the original image data. A typical method is the adaptive prediction length method (C-DPCM-Adaptive Prediction Length, C-DPCM-APL) proposed by Jarno et al. in "Lossless compression of hyperspectral images using clustered linear prediction with adaptive prediction length" [J]. IEEE Geoscience and Remote Sensing Letters, 2012, 9(6): 1118-1121. Prediction-based methods have low algorithmic complexity and are easy to implement in hardware, but their compression performance is relatively poor.

Traditional compression methods rely solely on manually designed feature extractors. They can handle the transmission of high-resolution satellite image data, but at high compression ratios they produce severe distortion in detailed textures and cannot cope well with highly variable optical remote sensing images.

Deep-learning-based natural image compression has shown great potential, applying deep learning to remote sensing image compression is highly feasible, and deep-learning-based remote sensing image compression has become an active research topic. Deep learning offers strong learning, feature extraction and representation capabilities and handles high-dimensional data well; compared with traditional compression methods, it is better suited to the large data volume, abundant detail, high redundancy and high compression ratios involved in remote sensing image compression.

Zhao Hua et al., in the patent document with application number CN202211594112.5, disclose a remote sensing image compression method based on an end-to-end convolutional neural network, which extracts feature information from the image with an encoder, removes feature redundancy through quantization and entropy coding, and finally jointly optimizes image distortion and compression bit rate through rate-distortion optimization.

Chong et al., in "High-order Markov random field as attention network for high-resolution remote-sensing image compression" [J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-14, propose an attention model for high-resolution remote sensing image compression based on a high-order Markov random field. Parameter learning of the Markov random field is combined with the image compression network, so that prior information is expressed effectively and training converges faster.

Han et al., in "Edge-Guided Remote Sensing Image Compression" [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, propose a high-fidelity lossy remote sensing image compression network that incorporates an edge-guided dual-discriminator GAN and minimizes multiple edge fidelity constraints, and that is capable of reconstructing the sharp edge structure and texture details of remote sensing images.

Zhai et al., in "Adaptive scene-aware deep attention network for remote sensing image compression" [J]. Journal of Electronic Imaging, 2021, 30(5): 053008, introduce a remote sensing image classification strategy into deep-learning-based image compression and achieve good remote sensing image compression results.

The four techniques above are all deep-learning-based remote sensing image compression methods. Because they pay little attention to the multi-scale detail information contained in remote sensing images, they fit the entropy distribution poorly, have high model complexity and high channel redundancy, and struggle to reach optimal compression performance.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a remote sensing image compression method based on a multi-scale asymmetric coding and decoding network, in order to enhance the compression model's ability to extract multi-scale features from remote sensing images, improve how well the model fits the information entropy of remote sensing images, reduce model complexity and channel redundancy, and improve remote sensing image compression performance.

The technical idea of the invention is as follows: a multi-scale residual module and a multi-scale dilated attention module enhance the compression model's ability to extract multi-scale features from remote sensing images; an asymmetric coding and decoding network reduces model complexity; and an adaptive context entropy model improves the fit to the information entropy of remote sensing images and reduces channel redundancy.

Following this technical idea, the invention is implemented through the following steps:

(1) Selecting a group of remote sensing images from an existing remote sensing image data set, and preprocessing them by normalization and cropping to obtain a data set with a uniform size of 640×640; then dividing the data set into a training set and a test set at a ratio of 9:1;

(2) Constructing a remote sensing image compression network based on the multi-scale asymmetric coding and decoding network under the PyTorch framework:

(2a) Establishing a multi-scale residual module MSRB comprising a multi-scale feature fusion module and a local residual learning module;

(2b) Establishing a multi-scale dilated attention module MSDA comprising a multi-scale sliding-window dilated module, a convolution module and a basic residual block;

(2c) Establishing an adaptive context entropy model comprising a non-uniform channel context model and a discrete Laplacian mixture entropy model;

(2d) Sequentially cascading the multi-scale residual module MSRB, the multi-scale dilated attention module MSDA, a basic residual block and a convolution block to form a multi-scale asymmetric encoder;

(2e) Establishing a multi-scale asymmetric decoder with the same structure as the multi-scale asymmetric encoder;

(2f) Establishing a hyperprior encoder comprising n convolution layers of different sizes, where 0 < n < 5;

(2g) Establishing a hyperprior decoder with the same structure as the hyperprior encoder;

(2h) Sequentially cascading the multi-scale asymmetric encoder, the hyperprior decoder, the adaptive context entropy model and the multi-scale asymmetric decoder to form the remote sensing image compression network based on the multi-scale asymmetric coding and decoding network;

(2i) Selecting the existing rate-distortion loss function L as the loss function of the network;

(3) Inputting the training set from step (1) into the remote sensing image compression network and training it iteratively by gradient descent with an Adam optimizer, with minimization of the loss function as the objective, to obtain a trained remote sensing image compression network;

(4) Inputting the test set into the trained remote sensing image compression network and outputting the reconstructed remote sensing images.

Compared with the prior art, the invention has the following advantages:

First, strong multi-scale feature extraction with low model complexity:

Existing deep-learning-based remote sensing image compression models are not optimized for the multi-scale features in remote sensing images, so their feature extraction capability on such images is poor;

The multi-scale asymmetric codec provided by the invention uses multi-scale residual blocks to adaptively detect and fuse multi-scale features, and uses multi-scale dilated attention to capture and aggregate semantic information at each scale, so that the network not only extracts features of ground objects at different scales effectively but also retains more of the detail in remote sensing images; it also reduces decoding complexity without affecting image reconstruction performance.

Second, a better fit to the information entropy and lower channel redundancy:

Existing deep-learning-based remote sensing image compression models generally model the latent features with a Gaussian entropy model and rarely consider that entropy is distributed non-uniformly across channels. The adaptive context entropy model of the invention uses a discrete Laplacian mixture model to model the quantized latent features more accurately, and uses a non-uniform channel context model to assign finer channel granularity to channel slices with larger entropy, which effectively reduces channel redundancy and improves compression efficiency.

Experimental results show that, compared with traditional and deep-learning-based remote sensing image compression methods, the method of the invention focuses on reconstructing the detailed texture information of remote sensing images, achieves better reconstruction quality at the same bit rate, and attains higher compression performance.

Drawings

FIG. 1 is a general flow chart of an implementation of the present invention;

FIG. 2 is a schematic diagram of a remote sensing image compression network based on a multi-scale asymmetric codec network constructed in the present invention;

FIG. 3 is a schematic diagram of the multi-scale asymmetric codec of FIG. 2;

FIG. 4 is a schematic diagram of the multi-scale residual module and the multi-scale dilated attention module in FIG. 3;

FIG. 5 is a schematic diagram of the hyperprior codec of FIG. 2;

FIG. 6 is a schematic diagram of the adaptive context entropy model of FIG. 2;

FIG. 7 compares images reconstructed by the remote sensing image compression method of the invention and by prior-art methods;

FIG. 8 is an enlarged detail view of the reconstructed images of FIG. 7.

Detailed Description

In order that those skilled in the art may better understand the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.

The remote sensing image compression network based on the multi-scale asymmetric coding and decoding network performs lossy compression of remote sensing images at low bit rates. An input image is first fed to the multi-scale encoder, which extracts latent features at different scales; side information is then extracted from the latent features, and this multi-scale side information guides the entropy coding of the quantized latent features to generate a bitstream. At the decoding end the bitstream is entropy-decoded and fed to the multi-scale decoder to obtain the reconstructed image.

Referring to fig. 1, the specific implementation steps of this example are as follows:

Step 1: Acquire, preprocess and partition the data set.

The existing DIOR remote sensing data set is acquired; it contains 23463 remote sensing images in total, each with a size of 800×800 pixels;

All images in the data set are preprocessed by normalization and cropping so that each image has a size of 640×640 pixels;

The preprocessed data set is then divided at a ratio of 9:1: 2347 images are randomly selected as the test set and the remaining images form the training set.
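The 9:1 split of Step 1 can be sketched as follows. This is a minimal illustration, assuming that images are identified by integer indices and that the test-set size is the image count divided by 10 and rounded up, which reproduces the 2347 test images stated above. The function name `split_dataset` and the fixed seed are hypothetical and are not taken from the patent.

```python
import random


def split_dataset(num_images, test_fraction_denominator=10, seed=0):
    """Randomly split image indices into (train, test) lists.

    Assumption: the test set holds ceil(num_images / 10) images.
    """
    indices = list(range(num_images))
    # Ceiling division without floating point.
    num_test = -(-num_images // test_fraction_denominator)
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    test = rng.sample(indices, num_test)
    test_set = set(test)
    train = [i for i in indices if i not in test_set]
    return train, test


train_ids, test_ids = split_dataset(23463)
print(len(train_ids), len(test_ids))  # 21116 2347
```

The two lists are disjoint and together cover all 23463 indices, so no test image leaks into training.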

Step 2: Construct the remote sensing image compression network based on the multi-scale asymmetric coding and decoding network.

2.1) Constructing the multi-scale asymmetric codec:

2.1.1) Constructing a multi-scale residual block MSRB containing 2 parallel branches, as shown in FIG. 4(a):

The 1st branch is formed by cascading two residual blocks and a sixth convolution layer. Each residual block comprises three convolution layers and two activation functions, cascaded in the order: first convolution layer, first ReLU activation layer, second convolution layer, second ReLU activation layer, third convolution layer; the output features of the third convolution layer are added element by element to the original input features to give the output of the residual block. The convolution kernels of the first, third and sixth convolution layers are all 1×1, the convolution kernel of the second convolution layer is 3×3, and the strides of these four convolution layers are all 2;

The 2nd branch is formed by cascading two ConvNeXt blocks and a seventh convolution layer. Each ConvNeXt block comprises three convolution layers, a normalization layer and a GELU activation layer: the first depthwise convolution layer is cascaded with the normalization layer, and the normalized features pass through a fourth convolution layer, a GELU activation layer and a fifth convolution layer; the output features of the fifth convolution layer are added element by element to the original input features to give the output of the ConvNeXt block. The convolution kernel of the first depthwise convolution layer is 7×7, the convolution kernels of the fourth, fifth and seventh convolution layers are all 1×1, and the strides of these four convolution layers are all 2;

The outputs of the two branches are concatenated, reduced in dimension by an eighth convolution layer, and added to the input features to give the output of the multi-scale residual block MSRB.

2.1.2) Constructing a multi-scale dilated attention block MSDA containing 2 parallel branches, as shown in FIG. 4(b):

The first branch is formed by cascading a multi-scale sliding-window dilated module MSWDB, three residual blocks, a ninth convolution layer and a Sigmoid function layer. The MSWDB divides the channels of the feature map into three heads, performs self-attention in each head with a different dilation rate r (r = 1, 2 and 3), concatenates the resulting multi-scale features, and feeds them to a linear layer for feature aggregation. The convolution kernel of the ninth convolution layer is 1×1 with a stride of 2, and the third, fourth and fifth residual blocks have the same structure as the residual blocks in the multi-scale residual block MSRB;

The second branch is formed by sequentially cascading a sixth, a seventh and an eighth residual block, each identical to the residual block in the multi-scale residual block MSRB;

The output features of the two branches are multiplied, and the result is added element by element to the original input features to give the output of the multi-scale dilated attention block MSDA, which aggregates semantic information across the multiple scales of different receptive fields.
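Two ideas from the MSDA block described above can be illustrated in plain Python: how a window with rate r samples positions that are r pixels apart, and how the Sigmoid branch gates the other branch before the residual addition. This is only a sketch under stated assumptions: the 3×3 window size is assumed (the text does not give one), out-of-range positions are simply skipped, and both function names are hypothetical. It is not an implementation of the MSWDB module itself.

```python
import math


def dilated_window(row, col, rate, height, width, kernel=3):
    """Return the (row, col) positions a dilated window samples around a pixel.

    Assumption: a kernel x kernel window (3 x 3 here) whose taps are spaced
    `rate` pixels apart; taps that fall outside the feature map are skipped.
    """
    half = kernel // 2
    positions = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr * rate, col + dc * rate
            if 0 <= r < height and 0 <= c < width:
                positions.append((r, c))
    return positions


def gated_residual(x, trunk, mask_logits):
    """Combine the two branches: x + trunk * sigmoid(mask_logits)."""
    return [
        xi + ti / (1.0 + math.exp(-mi))
        for xi, ti, mi in zip(x, trunk, mask_logits)
    ]


# One head per rate r = 1, 2, 3: same number of taps, growing spatial extent.
for rate in (1, 2, 3):
    taps = dilated_window(8, 8, rate, height=17, width=17)
    rows = [r for r, _ in taps]
    print(rate, len(taps), max(rows) - min(rows) + 1)
```

For an interior pixel each head still reads 9 positions, but the covered extent grows from 3 to 5 to 7 pixels, which is how different heads see different receptive fields at the same cost.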

2.1.3) Cascading one convolution layer, three residual blocks, three multi-scale residual blocks MSRB and two multi-scale dilated attention blocks MSDA to form the multi-scale asymmetric encoder, as shown in FIG. 3(a), with the following structure:

1st residual block → 1st multi-scale residual block MSRB → 2nd residual block → 1st multi-scale dilated attention block MSDA → 2nd multi-scale residual block MSRB → 3rd residual block → 3rd multi-scale residual block MSRB → 1st convolution layer → 2nd multi-scale dilated attention block MSDA;

The convolution kernel of the 1st convolution layer is 3×3 with a stride of 2, and the 1st, 2nd and 3rd residual blocks have the same structure as the residual block in the multi-scale residual block MSRB.

2.1.4) Cascading one convolution layer, three residual blocks, one multi-scale residual block MSRB and two multi-scale dilated attention blocks MSDA to form the multi-scale asymmetric decoder, as shown in FIG. 3(b), with the following structure:

3rd multi-scale dilated attention block MSDA → 4th residual block → 5th residual block → 4th multi-scale dilated attention block MSDA → 4th multi-scale residual block MSRB → 6th residual block → 2nd convolution layer;

The convolution kernel of the 2nd convolution layer is 3×3 with a stride of 2, and the 4th, 5th and 6th residual blocks have the same structure as the residual block in the multi-scale residual block MSRB.
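The asymmetry between the encoder and decoder layouts listed above can be made concrete by writing the two block sequences down and counting block types. The labels below (`ResBlock`, `MSRB`, `MSDA`, `Conv3x3/s2`) are hypothetical shorthand for the blocks named in the text; the sketch only tallies the layout and does not build a network.

```python
from collections import Counter

# Block order copied from the encoder and decoder structures in the text.
ENCODER = ["ResBlock", "MSRB", "ResBlock", "MSDA", "MSRB",
           "ResBlock", "MSRB", "Conv3x3/s2", "MSDA"]
DECODER = ["MSDA", "ResBlock", "ResBlock", "MSDA", "MSRB",
           "ResBlock", "Conv3x3/s2"]


def block_counts(layout):
    """Count how many blocks of each type a layout contains."""
    return Counter(layout)


enc, dec = block_counts(ENCODER), block_counts(DECODER)
for name in sorted(set(enc) | set(dec)):
    print(f"{name:12s} encoder={enc[name]} decoder={dec[name]}")
```

The decoder keeps both MSDA blocks but only one of the three MSRB blocks, which is where the reduction in decoding complexity comes from.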

2.2) Cascading three convolution layers, namely the 3rd, 4th and 5th convolution layers, to form the hyperprior encoder, as shown in FIG. 5(a), wherein the convolution kernel of the 3rd convolution layer is 3×3, the convolution kernels of the 4th and 5th convolution layers are 5×5, and the strides of the three convolution layers are all 2.

2.3) Cascading three convolution layers, namely the 6th, 7th and 8th convolution layers, to form the hyperprior decoder, as shown in FIG. 5(b), wherein the convolution kernel of the 8th convolution layer is 3×3, the convolution kernels of the 6th and 7th convolution layers are 5×5, and the strides of the three convolution layers are all 2.
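The effect of the three stride-2 convolution layers of 2.2 on the spatial size of the features can be traced with the standard convolution output-size formula. This is a sketch under assumptions: the padding of each layer is taken to be half the kernel size, which the text does not state, and the 40×40 input size is a hypothetical example value.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def hyperprior_encoder_sizes(size):
    """Trace the spatial size through the three stride-2 layers of 2.2.

    Assumption: each layer pads by kernel // 2, which the text does not state.
    """
    sizes = [size]
    for kernel in (3, 5, 5):  # 3rd, 4th and 5th convolution layers
        size = conv_output_size(size, kernel, stride=2, padding=kernel // 2)
        sizes.append(size)
    return sizes


print(hyperprior_encoder_sizes(40))  # [40, 20, 10, 5]
```

Under these assumptions each layer halves the height and width, so the side information is 8 times smaller per dimension than the features it summarizes.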

2.4) Constructing the non-uniform channel context model, as shown in FIG. 6(a):

2.4.1) The latent feature with m channels is divided along the channel dimension into n channel slices, indexed by i with 0 < i < n. Each channel slice is passed to the context module MEM++, which, jointly with the hyperprior, predicts the entropy parameters Φ of the slice; the slice is then fed to the context module MEM++ as feature information for the next channel slice;

2.4.2) Step 2.4.1) is repeated to obtain the non-uniform channel context model.

The module MEM++ is an existing basic context module comprising two parts, a spatial context and a channel context. In the k-th non-uniform group, the spatial context model identifies spatial redundancy and the channel context model models channel context information; the outputs of the spatial and channel branches are concatenated with the hyperprior representation ψ to obtain the entropy parameters, which are used to decode the features of the k-th group.
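The slicing used by the non-uniform channel context model can be sketched as follows. The slice sizes (16, 16, 32, 64, 192 out of m = 320 channels) are a hypothetical example of a non-uniform grouping and are not taken from the patent; the sketch only shows how each slice is conditioned on the slices before it, and does not implement the MEM++ module.

```python
def slice_channels(slice_sizes):
    """Turn per-slice channel counts into (start, end) channel ranges."""
    ranges, start = [], 0
    for size in slice_sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


def context_for(slice_index, ranges):
    """Channel ranges already processed, available as context for a slice."""
    return ranges[:slice_index]


# Hypothetical non-uniform grouping of m = 320 channels into n = 5 slices.
sizes = [16, 16, 32, 64, 192]
ranges = slice_channels(sizes)
for i, (lo, hi) in enumerate(ranges):
    print(f"slice {i}: channels [{lo}, {hi}) context={context_for(i, ranges)}")
```

Narrow slices give the entropy model more conditioning steps per channel, which is the sense in which slices with larger entropy receive finer granularity.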

2.5) Constructing the discrete Laplacian mixture entropy model, comprising a quantization operation, arithmetic entropy encoding and arithmetic entropy decoding, as shown in FIG. 6(b), wherein:

The quantization operation maps the input latent features to a finite set of values using universal quantization (U-Q) to obtain the approximate quantized value $\hat{y} = \lfloor y + u \rceil - u$, where $\lfloor \cdot \rceil$ is the rounding function, $y$ is the input latent feature, and $u$ is uniform noise on the range [-0.5, 0.5];

Entropy encoding and decoding perform entropy estimation on the quantized latent features $\hat{y}$ to obtain the probability distribution of the output features; the predicted probability distribution parameters are then passed to the arithmetic encoder and arithmetic decoder to guide entropy encoding and entropy decoding. Here $i$ denotes the position index in the feature map, $k$ denotes the number of Laplacian components in the mixture, and $Q(\cdot)$ denotes the quantization function; each component is a single Laplacian distribution with three parameters for each element $y_i$: a weight, a mean and a variance.
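The quantization and the mixture likelihood described above can be sketched in plain Python. Two assumptions are made and are not taken from the patent text: the probability of a quantized value is obtained by integrating each Laplacian component over a unit-width bin centred on that value, and the variance is converted to the Laplace scale b through variance = 2b². All function names and the example parameters are hypothetical.

```python
import math
import random


def universal_quantize(y, u):
    """Universal quantization: round(y + u) - u, with u drawn from [-0.5, 0.5]."""
    return math.floor(y + u + 0.5) - u


def laplace_cdf(x, mean, variance):
    """CDF of a Laplace distribution; its scale b satisfies variance = 2 b^2."""
    b = math.sqrt(variance / 2.0)
    if x < mean:
        return 0.5 * math.exp((x - mean) / b)
    return 1.0 - 0.5 * math.exp(-(x - mean) / b)


def mixture_pmf(value, weights, means, variances):
    """Probability mass of the bin [value - 0.5, value + 0.5].

    Assumption: each component is integrated over the unit-width bin and the
    results are combined with the mixture weights (which sum to 1).
    """
    return sum(
        w * (laplace_cdf(value + 0.5, m, v) - laplace_cdf(value - 0.5, m, v))
        for w, m, v in zip(weights, means, variances)
    )


rng = random.Random(0)
y = 2.3
y_hat = universal_quantize(y, rng.uniform(-0.5, 0.5))
print(abs(y_hat - y) <= 0.5)  # True: the error never exceeds half a step

weights, means, variances = [0.6, 0.3, 0.1], [0.0, 1.5, -2.0], [0.5, 2.0, 4.0]
p = mixture_pmf(0, weights, means, variances)
print(f"P(0) = {p:.4f}, bits = {-math.log2(p):.3f}")
```

The value `-log2(p)` is the ideal code length that an arithmetic coder approaches for that symbol, so a mixture that fits the latent distribution more closely yields a shorter bitstream.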

2.6) Cascading the non-uniform channel context model and the discrete Laplacian mixture entropy model to form the adaptive context entropy model.

2.7) Cascading the multi-scale asymmetric encoder, the hyperprior decoder, the adaptive context entropy model and the multi-scale asymmetric decoder to form the remote sensing image compression network based on the multi-scale asymmetric coding and decoding network.

Step 3: Train the remote sensing image compression network based on the multi-scale asymmetric coding and decoding network.

3.1) Dividing the preprocessed images equally into image groups according to the batch size, and inputting the first image group into the remote sensing image compression network to obtain the initial weights and biases of each convolution operation of the network;

3.2) Calculating the loss value Loss for each image in the training set:

$Loss = R(\hat{y}) + R(\hat{z}) + \lambda \cdot D(x, \hat{x})$

where $R(\hat{y})$ is the bit-rate loss of the underlying latent representation $\hat{y}$, $R(\hat{z})$ is the bit-rate loss of the hyperprior latent representation $\hat{z}$, $D(x, \hat{x})$ is the MSE distortion loss between the original image $x$ and the reconstructed image $\hat{x}$, and $\lambda$ is a weight parameter. The compression bit rate of the remote sensing image is controlled through $\lambda$: the larger the value of $\lambda$, the lower the compression ratio of the remote sensing image;

3.3) Updating the parameters of the remote sensing image compression network with an Adam optimizer, with minimization of the loss function value as the objective, to obtain the compression network model after the first parameter update;

3.4) Inputting the second image group into the compression network model after the first parameter update and repeating steps 3.2) to 3.3) to obtain the compression network model after the second parameter update, and so on until all image groups have been input, which completes one training round;

3.5) Repeating steps 3.2) to 3.4) until the set number of training rounds is reached to obtain the trained compression network model; in this example the number of rounds is set to, but not limited to, 200.

Step 4: Input the test set into the trained remote sensing image compression network and output the reconstructed images.
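The loss used in the training steps above, a sum of the two bit-rate terms and the λ-weighted MSE distortion, can be sketched in plain Python. The assumptions are that each rate term is measured in bits per pixel, estimated as the sum of -log2 of the symbol likelihoods divided by the number of pixels, and that images are given as flat lists of pixel values. The function names and the toy numbers are hypothetical.

```python
import math


def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: sum of -log2(p) over all symbols, divided by pixel count."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


def mse(original, reconstructed):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rate_distortion_loss(y_likelihoods, z_likelihoods, x, x_hat, lam):
    """Loss = R(y_hat) + R(z_hat) + lam * D(x, x_hat)."""
    num_pixels = len(x)
    rate_y = bits_per_pixel(y_likelihoods, num_pixels)
    rate_z = bits_per_pixel(z_likelihoods, num_pixels)
    return rate_y + rate_z + lam * mse(x, x_hat)


# Toy values (hypothetical): 4 pixels, 2 latent symbols, 1 hyper-latent symbol.
x = [0.0, 0.5, 1.0, 0.5]
x_hat = [0.0, 0.5, 0.5, 0.5]
loss = rate_distortion_loss([0.5, 0.25], [0.5], x, x_hat, lam=0.01)
print(loss)
```

Raising `lam` makes distortion more expensive relative to rate, so training favours higher quality at a higher bit rate, which matches the statement that a larger λ gives a lower compression ratio.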

The effect of the invention is further illustrated by the following simulation tests:

test conditions

Data set: adopting the existing DIOR remote sensing image dataset;

Experiment platform: the CPU is an Intel(R) Core(TM) i9-10900X @ 3.70 GHz with 64 GB of memory, the operating system is Ubuntu 18.04, the GPU is an NVIDIA GeForce RTX 3090, and the PyTorch version is 1.8.1;

Parameter setting: the training batch size of the invention is set to 8, the initial learning rate is set to 1e-4, and the training is carried out for 200 rounds.

Simulation test content

Test 1: The DIOR test set is compressed with the method of the invention and with the existing JPEG, JPEG2000, BPG, MLIC++, Mean-Scale and Cheng-atten methods to obtain reconstructed images at different compression bit rates, and the peak signal-to-noise ratio (PSNR) of each is calculated. Two evaluation indexes are used: BD-rate (%), the bit rate saved at the same reconstruction quality (PSNR), and BD-PSNR (dB), the PSNR gain at the same bit rate. The Mean-Scale method serves as the baseline, with its BD-rate and BD-PSNR set to 0. The results of the comparison are shown in Table 1.

Table 1 BD-rate performance comparison on the DIOR test set between the present invention and the 6 existing compression methods

In Table 1, a negative BD-rate value indicates the bit rate saved at the same PSNR; a negative BD-PSNR value indicates reduced PSNR performance at the same bit rate, and a positive BD-PSNR value indicates improved PSNR performance. The higher the PSNR at the same bit rate, the better the remote sensing image compression performance.

As can be seen from table 1, the image recovery quality and the saved code rate of the proposed method are superior to those of other compression methods, which shows that the method has obvious advantages in compression performance.
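The PSNR on which both evaluation indexes are based can be computed as in the sketch below. It assumes 8-bit pixel values (peak value 255) and flat lists of pixels; the function name and the sample values are hypothetical.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


original = [52, 55, 61, 59]
reconstructed = [53, 54, 62, 58]  # every pixel off by one, so MSE = 1
print(round(psnr(original, reconstructed), 2))  # 48.13
```

BD-rate and BD-PSNR are then derived from several such (bit rate, PSNR) points per method by comparing the fitted rate-distortion curves against the baseline.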

Test 2: The DIOR test set is compressed with the method of the invention and with the existing JPEG, JPEG2000, BPG, MLIC++, Mean-Scale and Cheng-atten methods to obtain the reconstructed images of the different methods, as shown in FIG. 7, wherein:

FIG. 7 (a) is a test set raw image;

FIG. 7 (b) is a graph showing the result of the compressed reconstruction of the remote sensing image of FIG. 7 (a) using the method of the present invention;

FIG. 7 (c) is a graph showing the result of the compressed reconstruction of the remote sensing image of FIG. 7 (a) using the MLIC++ method;

FIG. 7 (d) is a result of the reconstruction of the compressed remote sensing image of FIG. 7 (a) using the Cheng-atten method;

FIG. 7 (e) is a graph showing the result of the reconstruction of the compressed remote sensing image of FIG. 7 (a) using the Mean-Scale method;

FIG. 7 (f) is a graph showing the result of the reconstruction of the compressed remote sensing image of FIG. 7 (a) using the BPG method;

FIG. 7 (g) is a graph showing the result of the reconstruction of the compressed remote sensing image of FIG. 7 (a) using the JPEG2000 method;

FIG. 7 (h) is a result of reconstruction of the remote sensing image of FIG. 7 (a) after compression using the JPEG method;

FIG. 8 is an enlarged detail view of FIG. 7 that presents the reconstructed images more intuitively.

As can be seen from FIG. 7 and FIG. 8, the images reconstructed by the traditional JPEG and JPEG2000 methods show obvious blurring and blocking artifacts. The traditional BPG method produces reconstructions comparable to those of the deep-learning-based MLIC++, Mean-Scale and Cheng-atten methods, but all four recover image detail poorly. The image reconstructed by the method of the invention retains more texture detail and tonal characteristics, is closer to the original image, and gives a better visual result.

In summary, the comparison of the simulation results of the seven methods shows that the compression performance of the method of the present invention on remote sensing images is superior to that of the other six existing methods.

It should be noted that the step numbers in the description and the claims of the present invention are used only to describe the embodiments clearly and to aid understanding; they do not limit the order of the steps.

Claims

1. A remote sensing image compression method based on a multi-scale asymmetric coding and decoding network is characterized by comprising the following steps:

2. The method of claim 1, wherein the multi-scale residual module MSRB established in step (2 a) is implemented as follows:

(2a1) Constructing a local residual learning submodule comprising two residual blocks and a sixth convolution layer, wherein each residual block comprises three convolution layers and two activation functions, the first convolution layer, the first ReLU activation function layer, the second convolution layer, the second ReLU activation function layer and the third convolution layer are sequentially cascaded, and the output characteristic of the third convolution layer is added with the original input characteristic element by element and then used as the output of the residual block;

(2a2) Constructing a multi-scale feature fusion submodule comprising two ConvNeXt blocks and a seventh convolution layer, wherein each ConvNeXt block comprises a depthwise convolution layer, two convolution layers, a normalization layer and a GELU activation function layer; the first depthwise convolution layer and the normalization layer are cascaded, the normalized features are then passed through a cascaded fourth convolution layer, GELU activation function layer and fifth convolution layer, and the output features of the fifth convolution layer are added element by element to the original input features to form the output of the ConvNeXt block;

(2a3) After the outputs of the local residual learning submodule and the multi-scale feature fusion submodule are connected in parallel, the dimension is reduced through an eighth convolution layer, and the result is added to the input features to obtain the multi-scale residual module MSRB.
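The data flow of step (2a3) can be illustrated with a minimal per-pixel sketch. It assumes that "connected in parallel" means concatenating the two branch outputs along the channel axis, and it replaces both branches with simple placeholder functions; the function names, the placeholder arithmetic and the reduction weights are hypothetical and are not the layers of the invention.

```python
# Per-pixel sketch of the MSRB data flow: two parallel branches, channel
# concatenation (an assumption), 1x1 channel reduction, and a skip connection.
# The branch bodies are placeholders, not the trained layers of the patent.

def conv1x1(x, weight):
    """At a single pixel, a 1x1 convolution is a matrix-vector product over channels."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def local_residual_branch(x):
    # Placeholder for "two residual blocks + sixth convolution layer".
    return [max(v, 0.0) + v for v in x]  # ReLU path plus identity

def multiscale_fusion_branch(x):
    # Placeholder for "two ConvNeXt blocks + seventh convolution layer".
    mean = sum(x) / len(x)
    return [(v - mean) + v for v in x]  # centred path plus identity

def msrb(x, reduce_weight):
    a = local_residual_branch(x)             # C channels
    b = multiscale_fusion_branch(x)          # C channels
    fused = a + b                            # list concatenation -> 2C channels
    reduced = conv1x1(fused, reduce_weight)  # "eighth convolution layer": 2C -> C
    return [r + v for r, v in zip(reduced, x)]  # add the module input

C = 4
x = [1.0, -2.0, 0.5, 3.0]
# Hypothetical reduction weights: average the two branches channel by channel.
reduce_weight = [[0.5 if j in (i, i + C) else 0.0 for j in range(2 * C)]
                 for i in range(C)]
y = msrb(x, reduce_weight)
print(y)
```

The point of the sketch is the channel bookkeeping: two C-channel branches give 2C channels after concatenation, and the reduction back to C channels is what makes the final element-wise addition with the input well defined.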

3. The method according to claim 2, characterized in that:

the convolution kernel size of the second convolution layer is 3×3, and the step size is 2;

the convolution kernel sizes of the first, third, fourth, fifth, sixth, seventh and eighth convolution layers are all 1×1, and the step size is 2;

the convolution kernel size of the first depthwise convolution layer is 7×7, and the step size is 2.

4. The method of claim 1, wherein the multi-scale hole attention module MSDA established in step (2 b) is implemented as follows:

(2b1) Establishing a first branch formed by sequentially cascading a sliding-window hole module, three residual blocks, a convolution layer and a Sigmoid function layer, wherein the sliding-window hole module divides the channels of the feature map into three different heads, performs self-attention on each head using a different expansion rate r, concatenates the multi-scale features obtained at the different expansion rates, and inputs the concatenated multi-scale features into the residual blocks;

(2b2) Establishing a second branch consisting of a sixth residual block, a seventh residual block and an eighth residual block which are sequentially cascaded;

(2b3) Multiplying the output features of the two branches, and adding the result element by element to the original input features to obtain the multi-scale hole attention module MSDA.

5. The method according to claim 4, wherein:

the expansion rates of the first branch are r = 1, 2, 3;

the convolution kernel size of the ninth convolution layer is 1×1, and the step size is 2;

the third, fourth, fifth, sixth, seventh and eighth residual blocks are all identical in structure to the residual blocks in the multi-scale residual module MSRB.
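As an illustration of how three heads with expansion rates r = 1, 2, 3 see different neighbourhoods, the following sketch splits a channel count into three heads and lists the positions each head would attend to around one pixel. The 3×3 window size, the channel count of 192, the 9×9 feature map and the function names are assumptions made for the example; the claims do not specify them.

```python
def split_heads(num_channels, num_heads=3):
    """Return (start, stop) channel ranges, one per attention head."""
    base, extra = divmod(num_channels, num_heads)
    ranges, start = [], 0
    for h in range(num_heads):
        stop = start + base + (1 if h < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def dilated_window(row, col, rate, height, width, kernel=3):
    """Positions attended to from (row, col) in a kernel x kernel window whose
    taps are `rate` pixels apart; taps outside the feature map are dropped."""
    half = kernel // 2
    coords = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr * rate, col + dc * rate
            if 0 <= r < height and 0 <= c < width:
                coords.append((r, c))
    return coords

heads = split_heads(192)   # hypothetical channel count
rates = (1, 2, 3)          # expansion rates stated in the claim
windows = {r: dilated_window(4, 4, r, height=9, width=9) for r in rates}

for (start, stop), r in zip(heads, rates):
    print(f"channels {start}:{stop} use rate {r}, window {windows[r]}")
```

With the same number of sampled positions per head, a larger rate spreads the window over a wider area, which is how the concatenated head outputs end up carrying features at several scales.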

6. The method of claim 1, wherein the step (2 c) of establishing an adaptive context entropy model is performed as follows:

(2c1) Constructing a non-uniform channel context model consisting of n context modules MEM++ connected in parallel, wherein the input features having m channels are divided along the channel dimension into n channel slices, and these channel slices are fed into the context modules MEM++, whose outputs are used jointly with the super-prior to predict the entropy parameters Φ;

(2c2) Constructing a discrete Laplace mixed entropy model comprising quantization, arithmetic entropy coding and arithmetic entropy decoding operations, wherein the quantization operation adopts general quantization U-Q to realize approximate quantization; the entropy coding and decoding operation is to carry out entropy estimation on the quantized potential features, predict the probability distribution of the output features, and transmit the predicted probability distribution parameters to an arithmetic coder and an arithmetic decoder for guiding entropy coding and entropy decoding;

(2c3) Cascading the non-uniform channel context model and the discrete Laplace mixed entropy model, and performing entropy coding and entropy decoding under the guidance of the probability distribution of the potential features, to obtain the adaptive context entropy model.
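To make the role of the predicted probability distribution in step (2c2) concrete, the following sketch evaluates a discretized Laplace mixture on integer symbols and converts the probabilities into an ideal code length. It is only an illustration under stated assumptions: plain rounding stands in for the U-Q quantization, the mixture parameters are made-up numbers rather than outputs of the context model, and all function names are hypothetical.

```python
import math

def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution with location mu and scale b > 0."""
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def discrete_laplace_mixture_pmf(q, weights, mus, scales):
    """Probability that a quantized symbol equals the integer q:
    the mixture mass on the interval [q - 0.5, q + 0.5]."""
    return sum(
        w * (laplace_cdf(q + 0.5, mu, b) - laplace_cdf(q - 0.5, mu, b))
        for w, mu, b in zip(weights, mus, scales)
    )

def code_length_bits(symbols, weights, mus, scales):
    """Ideal arithmetic-coding cost: the sum of -log2 p(q) over all symbols."""
    return sum(
        -math.log2(discrete_laplace_mixture_pmf(q, weights, mus, scales))
        for q in symbols
    )

# Made-up mixture parameters; in the method they come from the entropy model.
weights, mus, scales = [0.7, 0.3], [0.0, 2.0], [1.0, 0.5]
# Plain rounding is used here as a stand-in for the U-Q quantization.
symbols = [round(v) for v in [0.2, -0.4, 1.7, 2.1]]
bits = code_length_bits(symbols, weights, mus, scales)
total_mass = sum(discrete_laplace_mixture_pmf(q, weights, mus, scales)
                 for q in range(-60, 61))
print(symbols, bits, total_mass)
```

The same probabilities are needed by both the arithmetic encoder and the arithmetic decoder, which is why the entropy parameters must be reproducible on the decoding side.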

7. The method according to claim 6, wherein: each context module MEM++ comprises two parts, a spatial context and a channel context; in the k-th non-uniform grouping block, a spatial context model is used to identify spatial redundancy and a channel context model is used to model channel context information, the outputs of the spatial and channel branches are connected with the super-prior representation ψ to obtain the entropy parameters, and the entropy parameters are decoded to obtain the features of the k-th channel.
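The non-uniform channel grouping can be sketched as simple index bookkeeping. The slice sizes below are hypothetical (the claims do not state them), and the sketch assumes that the channel context of slice k consists of the slices that precede it, which is a common design for channel-wise context models but is not confirmed by the text.

```python
def channel_slices(sizes):
    """Turn slice sizes into (start, stop) ranges along the channel axis."""
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges

def decoding_plan(sizes):
    """For slice k, list the earlier slices assumed to serve as channel context."""
    ranges = channel_slices(sizes)
    return [
        {"slice": k, "channels": ranges[k], "channel_context": ranges[:k]}
        for k in range(len(ranges))
    ]

sizes = [16, 16, 32, 64, 192]   # hypothetical non-uniform split of m = 320 channels
plan = decoding_plan(sizes)
for entry in plan:
    print(entry)
```

Under this assumption the first slice has no channel context and relies on the spatial context and the super-prior alone, while later slices can draw on progressively more channel information.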

8. The method according to claim 1, characterized in that:

The multi-scale asymmetric encoder established in the step (2 d) comprises three residual blocks, a convolution layer, three multi-scale residual blocks MSRB and two multi-scale hole attention blocks MSDA, and the structural relationship is as follows: 1st residual block → 1st multi-scale residual block MSRB → 2nd residual block → 1st multi-scale hole attention block MSDA → 2nd multi-scale residual block MSRB → 3rd residual block → 3rd multi-scale residual block MSRB → 1st convolution layer → 2nd multi-scale hole attention block MSDA;

The multi-scale asymmetric decoder established in the step (2 e) comprises three residual blocks, a convolution layer, a multi-scale residual block MSRB and two multi-scale hole attention blocks MSDA, and the structural relationship is as follows: 3rd multi-scale hole attention block MSDA → 4th residual block → 5th residual block → 4th multi-scale hole attention block MSDA → 4th multi-scale residual block MSRB → 6th residual block → 2nd convolution layer;

The convolution kernel sizes of the 1st convolution layer and the 2nd convolution layer are both 3×3, and the step sizes are both 2;

The 1 st residual block, the 2 nd residual block, the 3 rd residual block, the 4 th residual block, the 5 th residual block and the 6 th residual block are all identical to the residual blocks in the multi-scale residual module MSRB in structure.
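The asymmetry between the encoder and decoder can be checked by writing the two block sequences of this claim as lists and counting the block types. The short labels RB, MSRB, MSDA and Conv are abbreviations introduced for this sketch.

```python
from collections import Counter

# Block order as listed in the claim (RB = residual block, Conv = convolution layer).
ENCODER = ["RB", "MSRB", "RB", "MSDA", "MSRB", "RB", "MSRB", "Conv", "MSDA"]
DECODER = ["MSDA", "RB", "RB", "MSDA", "MSRB", "RB", "Conv"]

encoder_counts = Counter(ENCODER)
decoder_counts = Counter(DECODER)
# Counter subtraction keeps only the block types the encoder has more of.
extra_in_encoder = encoder_counts - decoder_counts

print("encoder:", dict(encoder_counts))
print("decoder:", dict(decoder_counts))
print("extra in encoder:", dict(extra_in_encoder))
```

The counts agree with the totals stated in the claim, and the only difference between the two sides is that the encoder carries two more MSRB blocks than the decoder.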

9. The method according to claim 1, characterized in that:

The super prior encoder established in the step (2 f) is formed by sequentially cascading a 3rd convolution layer, a 4th convolution layer and a 5th convolution layer, wherein the convolution kernel size of the 3rd convolution layer is 3×3, the convolution kernel sizes of the 4th convolution layer and the 5th convolution layer are 5×5, and the step sizes of the three convolution layers are all 2;

The super prior decoder established in the step (2 g) is formed by sequentially cascading a 6th convolution layer, a 7th convolution layer and an 8th convolution layer, wherein the convolution kernel sizes of the 6th convolution layer and the 7th convolution layer are 5×5, the convolution kernel size of the 8th convolution layer is 3×3, and the step sizes of the three convolution layers are all 2.
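The effect of three cascaded stride-2 convolutions on the spatial size can be worked out with the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below applies it to the super prior encoder; the padding of k // 2 and the 16×16 input size are assumptions, since the claim gives only kernel sizes and step sizes.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def hyper_encoder_sizes(size):
    """Sizes after the 3rd, 4th and 5th convolution layers (kernels 3, 5, 5; stride 2).
    Padding of kernel // 2 is an assumption, not stated in the claim."""
    sizes = [size]
    for kernel in (3, 5, 5):
        size = conv_out(size, kernel, stride=2, padding=kernel // 2)
        sizes.append(size)
    return sizes

sizes = hyper_encoder_sizes(16)   # hypothetical 16x16 latent feature map
print(sizes)
```

Under these assumptions each layer halves the resolution, so the super-prior representation is one eighth of the input size per side. For the decoder to restore the original resolution, its stride-2 layers would have to upsample (for example as transposed convolutions), which is an interpretation rather than something the claim states.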

10. The method of claim 1, wherein the rate distortion loss function L in step (2 i) is represented as follows:

wherein the loss function comprises the code rate loss of the underlying potential representation, the code rate loss of the super-prior potential representation, and the MSE distortion loss between the original image and the reconstructed image; λ represents a weight parameter, and different rate-distortion trade-offs, and hence different compression ratios, can be achieved by adjusting λ.
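Since the formula itself is not reproduced in this text, the following is a minimal sketch of a rate-distortion loss of the commonly used form L = R_y + R_z + λ·D, built from the three terms named in the claim. The symbols R_y, R_z and D, the placement of λ on the distortion term, the measurement of rate in bits per pixel and all function names are assumptions.

```python
import math

def rate_bpp(likelihoods, num_pixels):
    """Code rate in bits per pixel from the model's symbol likelihoods."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels

def mse(original, reconstructed):
    """Mean squared error between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def rd_loss(y_likelihoods, z_likelihoods, original, reconstructed, lam):
    """Assumed form: L = R_y + R_z + lam * D."""
    n = len(original)
    rate_y = rate_bpp(y_likelihoods, n)   # underlying potential representation
    rate_z = rate_bpp(z_likelihoods, n)   # super-prior potential representation
    return rate_y + rate_z + lam * mse(original, reconstructed)

# Toy values, chosen only so that the arithmetic is easy to follow.
original = [0.0, 1.0, 0.5, 0.25]
reconstructed = [0.0, 0.5, 0.5, 0.25]
loss = rd_loss([0.5, 0.25], [0.5], original, reconstructed, lam=16.0)
print(loss)
```

In this toy case the two rate terms contribute 0.75 and 0.25 bits per pixel and the weighted distortion contributes 16 × 0.0625 = 1.0. Raising λ makes distortion more expensive relative to rate, which pushes training toward higher-quality, higher-rate operating points.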

11. The method of claim 1, wherein the iterative training of the remote sensing image compression network with the goal of minimizing the loss function in step (3) is performed using Adam optimizer and gradient descent method, as follows:

(3a) Dividing the remote sensing image training data set into a plurality of image groups according to the batch size, and inputting the first image group into a compression network to obtain initial weight and offset of each convolution operation of the network;

(3b) Substituting the verification image into the rate-distortion optimization loss formula, and calculating the loss value Loss corresponding to each image in the training set;

(3c) Updating the network parameters by using the Adam optimizer with minimization of the loss function as the target, to obtain the compression network model after the first parameter update;

(3d) Inputting the second image group into the updated compression network model, and repeating the steps (3b) to (3d) to obtain the compression network model after the second parameter update; and so on, until the last image group has been input into the most recently updated compression network, so as to obtain the trained compression network.
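The training procedure of steps (3a) to (3d) can be sketched as a loop over image groups with one Adam update per group. The sketch is a toy under stated assumptions: a single scalar parameter stands in for the network weights, a squared-error loss stands in for the rate-distortion loss, and the learning rate, number of passes and all names are hypothetical.

```python
import math

def make_groups(samples, batch_size):
    """Divide the training set into image groups according to the batch size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

class Adam:
    """Adam update rule for a single scalar parameter."""
    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = 0.0, 0.0, 0

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)   # bias-corrected second moment
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def group_loss(w, group):
    # Toy stand-in for the rate-distortion loss of one image group.
    return sum((w - x) ** 2 for x in group) / len(group)

def group_grad(w, group):
    # Gradient of the toy loss with respect to w.
    return sum(2 * (w - x) for x in group) / len(group)

def train(samples, batch_size, passes):
    w, optimizer = 0.0, Adam()
    for _ in range(passes):
        for group in make_groups(samples, batch_size):      # next image group
            w = optimizer.step(w, group_grad(w, group))     # parameter update
    return w

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
initial_loss = group_loss(0.0, samples)
w = train(samples, batch_size=4, passes=100)
final_loss = group_loss(w, samples)
print(initial_loss, final_loss)
```

Each group sees the parameters left by the previous update, which mirrors the claim's description of feeding every new image group into the most recently updated model.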