CN117495679A - Image super-resolution method and device based on non-local sparse attention - Google Patents
- Publication number
- CN117495679A (application CN202311460075.3A)
- Authority
- CN
- China
- Prior art keywords
- sparse
- feature map
- channel
- processing
- map
- Prior art date
- Legal status: Granted (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T3/4053—Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0499—Feedforward networks
- G06N3/08—Learning methods
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/10024—Color image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
Abstract
The present disclosure relates to an image super-resolution method and device based on non-local sparse attention, belonging to the technical field of image processing. The method includes: acquiring a compressed low-resolution image; extracting shallow features of the low-resolution image to obtain a shallow feature map of the low-resolution image; performing channel sparse processing and spatial sparse processing, each at least once, on the shallow feature map to obtain a two-dimensional sparse feature map; fusing the two-dimensional sparse feature map with the shallow feature map to obtain a feature fusion map of the low-resolution image; and performing convolution and up-sampling on the feature fusion map to obtain a super-resolution image.
Description
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, and in particular to an image super-resolution method and device based on non-local sparse attention.
Background
With the rapid development of the information age, digital images have become an important medium of information transmission, and the demand for higher resolution keeps growing. Existing image reconstruction approaches can partition the features of a low-resolution image to some extent, but their ability to learn global features is poor; irrelevant textures may appear during reconstruction, and the reconstruction quality suffers.
Disclosure of Invention
It is an object of embodiments of the present disclosure to provide a new solution for an image super-resolution method and apparatus based on non-local sparse attention.
According to a first aspect of the present disclosure, there is provided a non-local sparse attention-based image super-resolution method, the method comprising:
acquiring a compressed low resolution image;
extracting shallow features of the low-resolution image to obtain a shallow feature map of the low-resolution image;
carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map to obtain a two-dimensional sparse feature map;
fusing the two-dimensional sparse feature map and the shallow feature map to obtain a feature fusion map of the low-resolution image;
and carrying out convolution and up-sampling processing on the feature fusion map to obtain a super-resolution image.
Optionally, the performing at least one channel sparse processing and at least one space sparse processing on the shallow feature map to obtain a two-dimensional sparse feature map includes:
performing at least one channel sparse processing on the shallow feature map through a two-dimensional sparse processing module, and then performing space sparse processing to obtain a two-dimensional sparse feature map after at least one processing; the two-dimensional sparse processing module comprises at least one group of channel sparse networks and space sparse networks which are connected in series; the shallow layer feature map is input to a 1 st channel sparse network; the channel sparse network performs channel sparse processing on the input feature map to obtain a channel sparse feature map, and the same group of space sparse networks performs space sparse processing on the channel sparse feature map to obtain a processed depth sparse feature map; each space sparse network outputs a processed depth sparse feature map, the processed depth sparse feature map output by the space sparse network of the former group is input to the channel sparse network of the adjacent latter group, and channel sparse processing is continued.
Optionally, performing channel sparse processing on the input feature map to obtain a channel sparse feature map, including:
extracting non-local features of the input feature map to obtain a first reshaping matrix;
obtaining an attention map according to the vectors of the first reshaping matrix;
screening attention features of a set proportion in the attention map according to a set screening mechanism;
and obtaining a channel sparse feature map according to the attention feature, a preset first activation function and the vector.
Optionally, performing spatial sparse processing on the channel sparse feature map to obtain the processed depth sparse feature map, where the processing includes:
dividing the channel sparse feature map into a plurality of groups of blocks according to a set partitioning rule;
extracting local features of the multiple groups of image blocks to obtain a second reshaping matrix;
obtaining an attention map according to the vectors of the second reshaping matrix;
and obtaining a depth sparse feature map according to the attention map, a preset second activation function and the vector.
Optionally, after performing channel sparse processing on the input feature map to obtain a channel sparse feature map, the method further includes:
extracting first high-frequency information from the channel sparse feature map through a first gated convolution feed-forward network connected between the channel sparse network and the spatial sparse network of the same group; the first high-frequency information is the image high-frequency information in the channel sparse feature map after channel sparse processing.
Optionally, after performing spatial sparse processing on the channel sparse feature map to obtain the processed depth sparse feature map, the method further includes:
extracting second high-frequency information in the depth sparse feature map through a second gating convolution feed-forward network connected between the channel sparse network of the latter group and the space sparse network of the former group; the second high-frequency information is image high-frequency information in the depth sparse feature map after spatial sparse processing.
Optionally, the image high-frequency information extracted by each gated convolution feed-forward network is obtained from the features extracted from the input sparse feature map and a preset third activation function.
Optionally, the performing at least one channel sparse processing and at least one space sparse processing on the shallow feature map to obtain a two-dimensional sparse feature map includes:
the steps of performing channel sparse processing at least once and spatial sparse processing at least once on the shallow feature map are executed by N serially connected two-dimensional sparse processing modules, so that N processed deep feature maps are obtained, where N ≥ 2 and N is an integer; the shallow feature map is input to the 1st two-dimensional sparse processing module; each two-dimensional sparse processing module performs channel sparse processing at least once and spatial sparse processing at least once on the input feature map to obtain a processed deep feature map; the processed deep feature map output by each two-dimensional sparse processing module is input to the next adjacent module to continue two-dimensional sparse processing; and the two-dimensional sparse feature map is obtained according to the shallow feature map and the N processed deep feature maps.
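As a hypothetical illustration of the N-module structure above, the following NumPy sketch chains toy stand-in modules and combines the shallow map with the N deep maps by a residual-style sum; the module implementations and the exact fusion operator are assumptions, not the patent's definition:

```python
import numpy as np

def deep_feature_extraction(shallow, modules):
    """Chain N serial two-dimensional sparse processing modules: each module
    refines the previous module's output, and the final two-dimensional sparse
    feature map is formed from the shallow map and all N deep maps
    (here an assumed residual-style sum)."""
    feat = shallow
    deep_maps = []
    for module in modules:
        feat = module(feat)        # one round of channel + spatial sparse processing
        deep_maps.append(feat)
    return shallow + sum(deep_maps)

# Toy stand-in modules (N = 2); real modules would be the sparse networks.
modules = [lambda f: 0.5 * f, lambda f: 0.5 * f]
x = np.ones((2, 2, 2))
out = deep_feature_extraction(x, modules)
print(out.shape)  # (2, 2, 2)
```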
According to a second aspect of the present disclosure, there is also provided an image super-resolution apparatus based on non-local sparse attention, the apparatus comprising:
an image acquisition module for acquiring a compressed low resolution image;
a feature acquisition module for extracting the shallow features of the low-resolution image to obtain a shallow feature map of the low-resolution image;
the feature processing module is used for carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map to obtain a two-dimensional sparse feature map;
the feature fusion module is used for fusing the two-dimensional sparse feature map and the shallow feature map to obtain a feature fusion map of the low-resolution image;
and an image obtaining module for performing convolution and up-sampling processing on the feature fusion map to obtain a super-resolution image.
According to a third aspect of the present disclosure, there is also provided an image super-resolution apparatus based on non-local sparse attention, comprising a memory for storing a computer program and a processor; the processor is configured to execute the computer program to implement the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to the first aspect of the present disclosure.
The method has the advantage that the local features and the non-local features of the low-resolution image are processed through the channel sparse processing of the channel sparse network and the spatial sparse processing of the spatial sparse network, thereby improving the reconstruction of the low-resolution image, reducing the generation of artifacts, and obtaining a higher-quality super-resolution image.
Other features of the disclosed embodiments and their advantages will become apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the embodiments of the disclosure.
FIG. 1 is a schematic diagram of the composition of a non-local sparse attention-based image super-resolution system capable of applying a non-local sparse attention-based image super-resolution method according to one embodiment;
FIG. 2 is a flow diagram of a non-local sparse attention-based image super-resolution method according to one embodiment;
FIG. 3 is a schematic diagram of the structure of a channel sparse network according to one embodiment;
FIG. 4 is a schematic diagram of the structure of a spatially sparse network according to one embodiment;
FIG. 5 is a schematic diagram of a structure of a two-dimensional sparse processing module, according to one embodiment;
FIG. 6 is a schematic diagram of a gated convolutional feed forward network in accordance with one embodiment;
FIG. 7 is a schematic diagram of the composition of a two-dimensional sparse processing module, according to one embodiment;
FIG. 8 is a schematic diagram of the composition of a residual sparse Transformer group according to one embodiment;
FIG. 9 is a block schematic diagram of a non-local sparse attention-based image super-resolution apparatus according to one embodiment;
fig. 10 is a schematic diagram of a hardware structure of an image super-resolution apparatus based on non-local sparse attention according to one embodiment.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
< System example >
Fig. 1 is a schematic diagram of a composition structure of a non-local sparse attention-based image super-resolution system to which a non-local sparse attention-based image super-resolution method according to an embodiment can be applied. As shown in fig. 1, the system includes an image transmitting apparatus 1000 and an image receiving apparatus 2000, and can be applied to a scene of image processing.
The image transmitting apparatus 1000 may be communicatively connected to the image receiving apparatus 2000 by wired or wireless, and the image transmitting apparatus 1000 may compress some of the images into low resolution images to transmit to the image receiving apparatus 2000, and the image receiving apparatus 2000 may reconstruct the images upon receiving the compressed low resolution images and may obtain super resolution images to display.
The image transmission apparatus 1000 may also include a processor 1100, a memory 1200, an interface device 1300, and a communication device 1400.
The image receiving apparatus 2000 may also include a processor 2100, a memory 2200, an interface device 2300, and a communication device 2400.
The processors 1100 and 2100 are configured to execute computer programs, which may be written in the instruction set of an architecture such as x86, ARM, RISC, MIPS, SSE, etc. The memories 1200 and 2200 include, for example, ROM (read-only memory), RAM (random access memory), and nonvolatile memory such as a hard disk. The interface devices 1300 and 2300 include, for example, USB interfaces, video interfaces, and network interfaces. The communication devices 1400 and 2400 can, for example, perform wired or wireless communication; the communication device 1400 may include at least one short-range communication module, for example, any module performing short-range wireless communication based on a protocol such as HiLink, Wi-Fi (IEEE 802.11), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, or LiFi, and may also include a remote communication module, for example, any module performing WLAN, GPRS, or 2G/3G/4G/5G remote communication.
The memory 2200 of the image receiving apparatus 2000 is for storing a computer program for controlling the processor 2100 to operate to perform the non-local sparse attention based image super resolution method according to any embodiment of the present disclosure to reconstruct a high resolution original image from a received low resolution image, completing image restoration. The skilled person can design a computer program according to the method steps and how the computer program controls the processor to operate, which is well known in the art and will not be described in detail here.
The above image transmitting apparatus 1000 and the image receiving apparatus 2000 may be any electronic apparatus having image processing capability, for example, may be any type of user terminal apparatus, may be a server, or the like, and are not limited herein.
< method example >
FIG. 2 is a flow diagram of a non-local sparse attention-based image super-resolution method according to one embodiment. The implementation subject is, for example, the image receiving apparatus 2000 in fig. 1.
As shown in fig. 2, the non-local sparse attention-based image super-resolution method of the present embodiment may include the following steps S210 to S250:
step S210, a compressed low resolution image is acquired.
In the present embodiment, the low resolution is lower than the resolution specified by the above-described image transmission apparatus 1000, and the specified resolution is, for example, 320×240, 1024×768, or 1600×1280, which is not limited herein.
In some examples, the low resolution image may be a multi-channel color image, such as an RGB three-channel, 8-channel, 16-channel, or 32-channel color image, or the like.
Step S220, shallow layer features of the low-resolution image are extracted, and a shallow layer feature map of the low-resolution image is obtained.
In some embodiments, shallow features of the low-resolution image are extracted by one 3×3 convolution layer to obtain the shallow feature map of the low-resolution image.
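As a rough illustration of this step, the following NumPy sketch applies a single zero-padded 3×3 convolution to a toy RGB image; the random weights and the helper name `shallow_features` are stand-ins, not taken from the patent:

```python
import numpy as np

def shallow_features(img, num_feats=16, rng=None):
    """Single 3x3 convolution with zero padding mapping a C-channel image to a
    num_feats-channel shallow feature map; random weights stand in for the
    learned convolution kernel."""
    rng = rng or np.random.default_rng(0)
    C, H, W = img.shape
    weight = rng.standard_normal((num_feats, C, 3, 3)) * 0.1
    padded = np.pad(img, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((num_feats, H, W))
    for i in range(3):
        for j in range(3):
            # accumulate each kernel tap over the shifted, padded image
            out += np.einsum('oc,chw->ohw', weight[:, :, i, j],
                             padded[:, i:i + H, j:j + W])
    return out

lr = np.random.rand(3, 16, 16)   # toy low-resolution RGB image
feat = shallow_features(lr)
print(feat.shape)  # (16, 16, 16)
```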
And step S230, carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map to obtain a two-dimensional sparse feature map.
In this embodiment, the features of the low resolution image may include local features and non-local features, where the local features are features of a plurality of tiles into which the image is segmented, and the non-local features are features reflecting the overall effect of the image. The shallow feature map of the low-resolution image can be subjected to sparse processing for different times to obtain a two-dimensional sparse feature map. For example, the shallow feature map of the low-resolution image may be subjected to channel sparse processing at least once to obtain a channel sparse feature map, and then spatial sparse processing is performed on the channel sparse feature map at least once to obtain a two-dimensional sparse feature map. For another example, the shallow feature map of the low-resolution image may be subjected to at least one spatial sparse process to obtain a spatial sparse feature map, and then subjected to at least one channel sparse process to obtain a two-dimensional sparse feature map. For another example, the shallow feature map of the low-resolution image may be subjected to channel sparse processing and space sparse processing simultaneously to obtain a channel sparse feature map and a space sparse feature map, and the channel sparse feature map and the space sparse feature map are fused to obtain a two-dimensional sparse feature map.
In some embodiments, step S230 may include the following: performing at least one channel sparse processing and then space sparse processing on the shallow feature map of the low-resolution image through a two-dimensional sparse processing module to obtain a two-dimensional sparse feature map after at least one processing; the two-dimensional sparse processing module comprises at least one group of channel sparse networks and space sparse networks which are connected in series; the shallow layer feature map is input to a 1 st channel sparse network; the channel sparse network performs channel sparse processing on the input feature map to obtain a channel sparse feature map, and the same group of space sparse networks performs space sparse processing on the channel sparse feature map to obtain a processed depth sparse feature map; each space sparse network outputs a processed depth sparse feature map, the processed depth sparse feature map output by the space sparse network of the former group is input to the channel sparse network of the adjacent latter group, and channel sparse processing is continued.
For example, suppose the two-dimensional sparse processing module comprises two groups, each containing a channel sparse network and a spatial sparse network, with the first group's spatial sparse network connected in series to the second group's channel sparse network. The first group's channel sparse network performs channel sparse processing on the shallow feature map to obtain a first channel sparse feature map; the first group's spatial sparse network performs spatial sparse processing on this map to obtain a spatial sparse feature map; the second group's channel sparse network processes the spatial sparse feature map to obtain a second channel sparse feature map; and the second group's spatial sparse network performs spatial sparse processing on the second channel sparse feature map to obtain the two-dimensional sparse feature map.
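The serial channel-then-spatial data flow of this two-group example can be sketched as follows; the `channel_sparse` and `spatial_sparse` callables are placeholders for the actual sparse networks:

```python
import numpy as np

def two_dim_sparse_module(feat, channel_sparse, spatial_sparse, groups=2):
    """One two-dimensional sparse processing module: `groups` serial pairs of
    channel sparse then spatial sparse processing, the output of each pair
    feeding the next pair's channel sparse network."""
    for _ in range(groups):
        feat = channel_sparse(feat)   # channel sparse network of this group
        feat = spatial_sparse(feat)   # spatial sparse network of the same group
    return feat

# Identity stand-ins just to show the data flow through the serial groups.
identity = lambda f: f
x = np.random.rand(8, 4, 4)
y = two_dim_sparse_module(x, identity, identity)
print(np.allclose(x, y))  # True with identity stand-ins
```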
The channel sparse network is, for example, a channel sparse transform network, and the space sparse network is, for example, a space sparse transform network.
In this way, performing channel sparse processing on the shallow feature map before spatial sparse processing effectively reduces the amount of computation needed to obtain the two-dimensional sparse feature map and saves computing resources.
In some embodiments, for an input feature map, performing channel sparseness processing, and a process of obtaining the channel sparseness feature map includes the following contents: extracting non-local features of an input feature map to obtain a first remodelling matrix; obtaining an attention map according to the vector of the first remodelling matrix; screening attention features of a set proportion in an attention map according to a set screening mechanism; and obtaining a channel sparse feature map according to the attention feature, a preset first activation function and the vector.
In some examples, as shown in fig. 3, for an input feature map, the corresponding non-local features may be extracted by one layer normalization, one 1×1 convolution layer and one 3×3 convolution layer; that is, a first reshaping matrix representing the non-local features is calculated. The vectors of the first reshaping matrix are the query Q, key K and value V. After layer normalization, Q and K are matrix-multiplied in the channel dimension to obtain the attention map. The set screening mechanism is, for example, a Top-K mechanism that retains the top K fraction of attention features, where K is, for example, 50%, 60% or 70%, which is not limited here. Applying the Top-K mechanism to the attention map and keeping only the top-K attention features achieves content sparsity. The preset first activation function is, for example, ReLU; it is applied to the retained attention features, the result is matrix-multiplied with the value V in the channel dimension to obtain an output, and the output is reshaped and passed through a 1×1 convolution layer to form the channel sparse feature map.
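A minimal NumPy sketch of this channel-dimension sparse attention, under simplifying assumptions: the learned Q/K/V projections are replaced by the reshaped features themselves, layer normalization by L2 normalization, and the Top-K screening keeps a fixed fraction of each row of the C×C attention map:

```python
import numpy as np

def channel_sparse_attention(x, keep_ratio=0.5):
    """Channel-wise sparse attention sketch: attention is computed between
    channels (a C x C map), the largest keep_ratio entries per row survive a
    Top-K screen, ReLU is applied, and the result re-weights the value V."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W)            # first reshaping matrix: C x (HW)
    q, k, v = tokens, tokens, tokens        # stand-ins for learned Q/K/V projections
    # L2 normalization along the spatial axis (stand-in for layer normalization)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-6)
    attn = q @ k.T                          # channel-dimension attention map, C x C
    # Top-K screening: keep the largest keep_ratio entries per row, zero the rest
    kth = int(C * (1 - keep_ratio))
    thresh = np.sort(attn, axis=1)[:, kth:kth + 1]
    sparse_attn = np.where(attn >= thresh, attn, 0.0)
    sparse_attn = np.maximum(sparse_attn, 0.0)   # ReLU as the first activation
    out = sparse_attn @ v                   # multiply with V in the channel dimension
    return out.reshape(C, H, W)

x = np.random.rand(8, 4, 4).astype(np.float32)
y = channel_sparse_attention(x)
print(y.shape)  # (8, 4, 4)
```

The final reshaping and 1×1 convolution of the figure are omitted here, since they do not change the sparsity mechanism being illustrated.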
In some embodiments, for the channel sparse feature map, performing spatial sparse processing, and obtaining a processed depth sparse feature map includes the following contents: dividing the channel sparse feature map into a plurality of groups of blocks according to a set partitioning rule; extracting local features of a plurality of groups of image blocks to obtain a second plastic matrix; obtaining an attention map according to the vector of the second plastic matrix; and obtaining a depth sparse feature map according to the attention map, the preset second activation function and the vector.
In some examples, as shown in fig. 4, the set partitioning rule may be a slicing operation based on position sparsity: the input channel sparse feature map is marked pixel by pixel with a granularity I, and pixels with the same mark are assigned to the same group, re-partitioning the channel sparse feature map to achieve position sparsity. After the input channel sparse feature map is divided into multiple groups of image blocks by this partitioning rule, local features of the groups are extracted by one layer normalization, one 1×1 convolution layer and one 3×3 convolution, and a second reshaping matrix representing the local features of the groups is calculated. The preset second activation function is, for example, a ReLU activation function. The vectors of the second reshaping matrix are the query Q, key K and value V. After layer normalization, Q and K are matrix-multiplied in the spatial dimension to obtain the attention map; the attention map is passed through the activation function and matrix-multiplied with the value V in the spatial dimension to obtain an output, which is reshaped and passed through a 1×1 convolution layer to form the depth sparse feature map.
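The position-sparse partition and the per-group spatial attention might be sketched as follows; the interleaved-grid grouping (with `stride` playing the role of granularity I) and the omission of learned projections are simplifying assumptions:

```python
import numpy as np

def interleaved_groups(feat, stride=2):
    """Position-sparse partition sketch: pixels are labelled with a granularity
    of `stride`, and pixels with equal labels form one group (an interleaved
    grid), re-partitioning the channel sparse feature map."""
    C, H, W = feat.shape
    groups = []
    for dy in range(stride):
        for dx in range(stride):
            groups.append(feat[:, dy::stride, dx::stride])
    return groups

def spatial_attention(group):
    """Self-attention over the spatial dimension of one group (sketch)."""
    C, h, w = group.shape
    tokens = group.reshape(C, h * w).T        # (hw) x C spatial tokens
    attn = tokens @ tokens.T                  # (hw) x (hw) attention map
    attn = np.maximum(attn, 0.0)              # ReLU as the second activation
    return (attn @ tokens).T.reshape(C, h, w)

x = np.random.rand(4, 8, 8).astype(np.float32)
outs = [spatial_attention(g) for g in interleaved_groups(x)]
print(len(outs), outs[0].shape)  # 4 groups of shape (4, 4, 4)
```

Attention within each group is quadratic in the group's pixel count rather than the whole image's, which is where the computational saving of the partition comes from.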
In some embodiments, after obtaining the channel sparsity feature map, the method further includes: extracting first high-frequency information in a channel sparse feature map through a first gating convolution feedforward network connected between a channel sparse network of the same group and a space sparse network of the same group; the first high-frequency information is the image high-frequency information in the channel sparse feature map after the channel sparse processing.
In some examples, as shown in fig. 5, each two-dimensional sparse processing module may include a channel sparse network, a first gated convolutional feed-forward network, a spatial sparse network, and a second gated convolutional feed-forward network connected in series in sequence. The first gated convolutional feed-forward network extracts first high-frequency information from the channel sparse feature map output by the channel sparse network; this first high-frequency information may contain the image high-frequency information of the channel sparse feature map, which enhances the network's expression of texture information.
In some embodiments, after the processed depth sparse feature map is obtained, the method further includes: extracting second high-frequency information from the depth sparse feature map through a second gated convolutional feed-forward network connected between the spatial sparse network of one group and the channel sparse network of the following group; the second high-frequency information is the image high-frequency information in the depth sparse feature map after spatial sparse processing.
In some examples, as shown in fig. 5, the second gated convolutional feed-forward network may extract second high-frequency information from the depth sparse feature map output by the spatial sparse network; this second high-frequency information may contain the image high-frequency information of the depth sparse feature map, which enhances the network's expression of texture information.
In some embodiments, each gated convolutional feed-forward network obtains image high-frequency information from the features extracted from the input sparse feature map and a preset third activation function. The third activation function is, for example, a GELU activation function.
In some examples, as shown in fig. 6, the first gated convolutional feed-forward network may extract features from the channel sparse feature map through one layer normalization, one 1×1 convolution layer, and one 3×3 convolution to obtain a first calculation result; the first calculation result is fed into the GELU activation function to obtain a second calculation result; the second calculation result, used as a coefficient, is multiplied elementwise with the first calculation result to obtain a third calculation result; and the third calculation result is passed through a 1×1 convolution layer to obtain the channel sparse feature map with the image high-frequency information extracted.
In some examples, as shown in fig. 6, the second gated convolutional feed-forward network likewise extracts features from the depth sparse feature map through one layer normalization, one 1×1 convolution layer, and one 3×3 convolution to obtain a fourth calculation result; the fourth calculation result is fed into the GELU activation function to obtain a fifth calculation result; the fifth calculation result, used as a coefficient, is multiplied elementwise with the fourth calculation result to obtain a sixth calculation result; and the sixth calculation result is passed through a 1×1 convolution layer to obtain the depth sparse feature map with the image high-frequency information extracted.
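Both feed-forward networks share the same gating structure: one branch is passed through GELU and used as an elementwise coefficient on the other. A minimal NumPy sketch of just that gating step, in which the convolution branch is replaced by an identity (a hypothetical simplification):

```python
import math
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gated_feedforward(fmap):
    # fmap: (C, H, W). `feat` stands in for the layer-norm + 1x1 + 3x3
    # convolution branch of fig. 6; the gate GELU(feat) scales it elementwise.
    feat = fmap                 # first/fourth calculation result (stand-in)
    gate = gelu(feat)           # second/fifth calculation result
    return gate * feat          # third/sixth calculation result (elementwise product)
```

Because GELU suppresses small and negative responses, the product passes strong local responses through while damping weak ones, which is why the gate tends to emphasize high-frequency structure.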
The image high-frequency information may represent regions where the image intensity, such as brightness or gray level, changes sharply, for example texture or edge regions.
In some embodiments, step S220 may include the following: performing at least one channel sparse processing and at least one spatial sparse processing on the shallow feature map through each of N serially connected two-dimensional sparse processing modules to obtain N processed deep feature maps, where N ≥ 2 and N is an integer. The shallow feature map is input to the 1st two-dimensional sparse processing module; each two-dimensional sparse processing module performs at least one channel sparse processing and at least one spatial sparse processing on the features of the input feature map to obtain a processed deep feature map; each module outputs a processed deep feature map, and the deep feature map output by one module is input to the adjacent next module to continue the two-dimensional sparse processing; and a two-dimensional sparse feature map is obtained according to the shallow feature map and the N processed deep feature maps.
In some examples, as shown in figs. 7 and 8, shallow features of the low-resolution image are extracted by one 3×3 convolution layer to obtain a shallow feature map of the low-resolution image. The shallow feature map is input to the 1st channel sparse network of the 1st two-dimensional sparse processing module; the 1st channel sparse network performs channel sparse processing on the shallow feature map to obtain a channel sparse feature map, the 1st spatial sparse network performs spatial sparse processing on the channel sparse feature map to obtain a depth sparse feature map, the depth sparse feature map is input to the 2nd channel sparse network for processing, and so on; the last spatial sparse network of the last two-dimensional sparse processing module outputs the two-dimensional sparse feature map. The number of two-dimensional sparse processing modules is, for example, 3, 10, or 4 as shown in fig. 7, and is not limited here; every 4 two-dimensional sparse processing modules may form a residual sparse Transformer group, and the number of groups may be, for example, 3, 5, or 6 as shown in fig. 8, which is likewise not limited here. In other words, the processing by the plurality of two-dimensional sparse processing modules achieves coupled two-dimensional content sparsity and position sparsity of the image while effectively reducing the number of parameters and floating-point operations.
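The chaining of the N modules plus the final combination with the shallow feature map can be sketched as a simple residual pipeline; this is a hypothetical simplification in which each module is an arbitrary callable rather than the channel/spatial sparse pair of the patent:

```python
import numpy as np

def two_dim_sparse_pipeline(shallow, modules):
    # Feed the shallow feature map through N serially connected
    # two-dimensional sparse processing modules, then combine the result
    # with the shallow features (global residual), as around figs. 7 and 8.
    x = shallow
    for module in modules:      # each module: channel sparse then spatial sparse
        x = module(x)
    return x + shallow          # two-dimensional sparse feature map
```

The global residual means each module only has to learn a correction on top of the shallow features, which is the usual reason such super-resolution backbones train stably.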
In some examples, as shown in fig. 8, after the two-dimensional sparse feature map is obtained, its features may be extracted by one 3×3 convolution layer.
In step S240, the two-dimensional sparse feature map and the shallow feature map are fused to obtain a feature fusion map of the low-resolution image.
In some examples, as shown in fig. 8, a feature fusion map of the low-resolution image may be obtained by fusing the shallow feature map of the low-resolution image extracted by one 3×3 convolution layer with the features extracted from the two-dimensional sparse feature map.
In step S250, convolution and up-sampling are performed on the feature fusion map to obtain a super-resolution image.
In some examples, as shown in fig. 8, the features of the feature fusion map are extracted by one 3×3 convolution layer and then up-sampled by sub-pixel convolution to obtain the corresponding super-resolution image.
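Sub-pixel convolution ends with a pixel-shuffle rearrangement that trades channels for spatial resolution. A NumPy sketch of that final step, assuming the usual (C·r², H, W) → (C, H·r, W·r) layout:

```python
import numpy as np

def pixel_shuffle(fmap, r):
    # Sub-pixel upsampling: rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r),
    # so each block of r*r channels fills an r-by-r spatial neighborhood.
    Crr, H, W = fmap.shape
    C = Crr // (r * r)
    x = fmap.reshape(C, r, r, H, W)
    x = x.transpose(0, 3, 1, 4, 2)      # (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```

For r = 2, four input channels at one pixel become the 2×2 patch of one output channel, so the convolution before the shuffle does all the learning and the upsampling itself is a free rearrangement.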
In other words, the local features and the non-local features of the low-resolution image are processed through the channel sparse processing of the channel sparse network and the space sparse processing of the space sparse network, so that the reconstruction effect of the low-resolution image is improved, the generation of artifacts is reduced, and the super-resolution image with higher quality is obtained.
In some examples, after acquiring the compressed low resolution image, the method further includes training the channel sparse network and the spatial sparse network with training samples.
Training the channel sparse network and the space sparse network through training samples, including:
the method comprises the steps of taking a common DIV2K data set as a training sample, wherein the common DIV2K data set comprises a plurality of groups of low-resolution training images and corresponding high-resolution training images, inputting the low-resolution training images into a network, outputting a processing result, carrying out L1Loss function calculation on the processing result and the high-resolution training images to obtain a Loss value, and adjusting parameters in the whole network architecture according to the obtained Loss value so as to correct learning deviation. The network architecture here may be one consisting of 6 sets of residual sparse Trasformer sets as in fig. 8.
< device example one >
FIG. 9 is a functional block diagram of an image super-resolution device based on non-local sparse attention, according to one embodiment. As shown in fig. 9, the non-local sparse attention-based image super-resolution apparatus 900 may include:
an image acquisition module 910 for acquiring a compressed low resolution image;
The feature obtaining module 920 is configured to extract a shallow feature of the low-resolution image, and obtain a shallow feature map of the low-resolution image;
the feature processing module 930 is configured to perform at least one channel sparse processing and at least one spatial sparse processing on the shallow feature map to obtain a two-dimensional sparse feature map;
the feature fusion module 940 is configured to fuse the two-dimensional sparse feature map with the shallow feature map to obtain a feature fusion map of the low-resolution image;
the image obtaining module 950 is configured to perform convolution and upsampling processing on the feature fusion map to obtain a super resolution image.
Optionally, the feature processing module 930 is further configured to perform, through the two-dimensional sparse processing module, at least one round of channel sparse processing followed by spatial sparse processing on the shallow feature map, to obtain the two-dimensional sparse feature map after at least one round of processing. The two-dimensional sparse processing module includes at least one group of serially connected channel sparse and spatial sparse networks; the shallow feature map is input to the 1st channel sparse network; the channel sparse network performs channel sparse processing on the input feature map to obtain a channel sparse feature map, and the spatial sparse network of the same group performs spatial sparse processing on the channel sparse feature map to obtain a processed depth sparse feature map; each spatial sparse network outputs a processed depth sparse feature map, and the depth sparse feature map output by the spatial sparse network of one group is input to the channel sparse network of the adjacent next group to continue channel sparse processing.
Optionally, the feature processing module 930 is further configured to extract non-local features of the input feature map to obtain a first reshaping matrix; obtain an attention map according to the vectors of the first reshaping matrix; screen attention features of a set proportion in the attention map according to a set screening mechanism; and obtain a channel sparse feature map according to the attention features, a preset first activation function, and the vectors.
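One simple reading of the "set screening mechanism" is row-wise top-k selection on the attention map: only the strongest fraction of attention entries in each row is kept before normalization. This NumPy sketch is an assumption for illustration (`keep_ratio` and the softmax normalization are not specified by the patent):

```python
import numpy as np

def screen_attention(attn, keep_ratio):
    # Keep only the strongest `keep_ratio` fraction of entries in each row
    # of the channel attention map; the rest are suppressed to -inf so they
    # receive zero weight after the softmax-style normalization.
    C = attn.shape[-1]
    k = max(1, int(round(keep_ratio * C)))
    thresh = np.sort(attn, axis=-1)[..., -k][..., None]   # per-row k-th largest
    masked = np.where(attn >= thresh, attn, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Zeroing the weak entries is what makes the attention sparse: each output channel aggregates only its most relevant peers instead of all of them.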
Optionally, the feature processing module 930 is further configured to divide the channel sparse feature map into a plurality of groups of image blocks according to a set partitioning rule; extract local features of the groups of image blocks to obtain a second reshaping matrix; obtain an attention map according to the vectors of the second reshaping matrix; and obtain a depth sparse feature map according to the attention map, a preset second activation function, and the vectors.
Optionally, the image super-resolution device 900 based on non-local sparse attention further includes a first information extraction module configured to extract first high-frequency information from the channel sparse feature map through a first gated convolutional feed-forward network connected between the channel sparse network and the spatial sparse network of the same group; the first high-frequency information is the image high-frequency information in the channel sparse feature map after channel sparse processing.
Optionally, the image super-resolution device 900 based on non-local sparse attention further includes a second information extraction module configured to extract second high-frequency information from the depth sparse feature map through a second gated convolutional feed-forward network connected between the spatial sparse network of one group and the channel sparse network of the following group; the second high-frequency information is the image high-frequency information in the depth sparse feature map after spatial sparse processing.
The feature processing module 930 is further configured to perform at least one channel sparse processing and at least one spatial sparse processing on the shallow feature map through each of N serially connected two-dimensional sparse processing modules to obtain N processed deep feature maps, where N ≥ 2 and N is an integer; the shallow feature map is input to the 1st two-dimensional sparse processing module; each two-dimensional sparse processing module performs at least one channel sparse processing and at least one spatial sparse processing on the features of the input feature map to obtain a processed deep feature map; the deep feature map output by one module is input to the adjacent next module to continue the two-dimensional sparse processing; and a two-dimensional sparse feature map is obtained according to the shallow feature map and the N processed deep feature maps.
The non-local sparse attention based image super resolution apparatus 900 may be an image receiving device 2000.
< device example two >
Fig. 10 is a schematic diagram of a hardware structure of an image super-resolution apparatus based on non-local sparse attention according to another embodiment.
As shown in fig. 10, the non-local sparse attention based image super resolution device 100 comprises a processor 1010 and a memory 1020, the memory 1020 for storing an executable computer program, the processor 1010 for performing a method as in any of the method embodiments above, according to control of the computer program.
The non-local sparse attention based image super resolution apparatus 100 may be an image receiving device 2000.
The modules of the image super-resolution apparatus 900 based on the non-local sparse attention may be implemented by the processor 1010 executing the computer program stored in the memory 1020 in the present embodiment, or may be implemented by other structures, which are not limited herein.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information for computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.
Claims (10)
1. An image super-resolution method based on non-local sparse attention, the method comprising:
acquiring a compressed low resolution image;
extracting shallow features of the low-resolution image to obtain a shallow feature map of the low-resolution image;
carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map to obtain a two-dimensional sparse feature map;
fusing the two-dimensional sparse feature map and the shallow feature map to obtain a feature fusion map of the low-resolution image;
And carrying out convolution and up-sampling processing on the feature fusion map to obtain a super-resolution image.
2. The method of claim 1, wherein the performing at least one channel sparseness treatment and at least one space sparseness treatment on the shallow feature map to obtain a two-dimensional sparse feature map comprises:
performing at least one channel sparse processing on the shallow feature map through a two-dimensional sparse processing module, and then performing space sparse processing to obtain a two-dimensional sparse feature map after at least one processing; the two-dimensional sparse processing module comprises at least one group of channel sparse networks and space sparse networks which are connected in series; the shallow layer feature map is input to a 1 st channel sparse network; the channel sparse network performs channel sparse processing on the input feature map to obtain a channel sparse feature map, and the same group of space sparse networks performs space sparse processing on the channel sparse feature map to obtain a processed depth sparse feature map; each space sparse network outputs a processed depth sparse feature map, the processed depth sparse feature map output by the space sparse network of the former group is input to the channel sparse network of the adjacent latter group, and channel sparse processing is continued.
3. The method of claim 2, wherein the performing channel sparseness processing on the input feature map to obtain a channel sparseness feature map includes:
extracting non-local features of the input feature map to obtain a first reshaping matrix;
obtaining an attention map according to the vector of the first reshaping matrix;
screening attention features of a set proportion in the attention map according to a set screening mechanism;
and obtaining a channel sparse feature map according to the attention feature, a preset first activation function and the vector.
4. The method according to claim 2, wherein the performing spatial sparseness processing on the channel sparseness feature map to obtain the processed depth sparseness feature map includes:
dividing the channel sparse feature map into a plurality of groups of blocks according to a set partitioning rule;
extracting local features of the multiple groups of image blocks to obtain a second reshaping matrix;
obtaining an attention map according to the vector of the second reshaping matrix;
and obtaining a depth sparse feature map according to the attention map, a preset second activation function and the vector.
5. The method according to claim 2, wherein the channel sparseness processing is performed on the input feature map, and after obtaining the channel sparseness feature map, the method further includes:
Extracting first high-frequency information in the channel sparse feature map through a first gating convolution feedforward network connected between the channel sparse network of the same group and the space sparse network of the same group; the first high-frequency information is image high-frequency information in the channel sparse feature map after channel sparse processing.
6. The method of claim 5, wherein the performing spatial sparseness processing on the channel sparseness feature map, after obtaining the processed depth sparseness feature map, further comprises:
extracting second high-frequency information in the depth sparse feature map through a second gating convolution feed-forward network connected between the channel sparse network of the latter group and the space sparse network of the former group; the second high-frequency information is image high-frequency information in the depth sparse feature map after spatial sparse processing.
7. The method according to claim 5 or 6, wherein each gated convolutional feed-forward network obtains image high-frequency information from the features extracted from the input sparse feature map and a preset third activation function.
8. The method of claim 1, wherein the performing at least one channel sparseness treatment and at least one space sparseness treatment on the shallow feature map to obtain a two-dimensional sparse feature map comprises:
The steps of carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map are respectively executed by N serially connected two-dimensional sparse processing modules, so as to obtain N processed deep feature maps; wherein N is greater than or equal to 2 and is an integer; the shallow feature map is input to a 1st two-dimensional sparse processing module; the two-dimensional sparse processing module performs at least one channel sparse processing and at least one space sparse processing on the features of the input feature map to obtain a processed deep feature map; each two-dimensional sparse processing module outputs a processed deep feature map, and the processed deep feature map output by the previous two-dimensional sparse processing module is input to the adjacent next two-dimensional sparse processing module to continue the two-dimensional sparse processing; and a two-dimensional sparse feature map is obtained according to the shallow feature map and the N processed deep feature maps.
9. An image super-resolution device based on non-local sparse attention, the device comprising:
an image acquisition module for acquiring a compressed low resolution image;
the feature acquisition module is used for extracting the shallow features of the low-resolution image to obtain a shallow feature map of the low-resolution image;
The feature processing module is used for carrying out channel sparse processing at least once and space sparse processing at least once on the shallow feature map to obtain a two-dimensional sparse feature map;
the feature fusion module is used for fusing the two-dimensional sparse feature map and the shallow feature map to obtain a feature fusion map of the low-resolution image;
and the image obtaining module is used for performing convolution and up-sampling processing on the feature fusion map to obtain a super-resolution image.
10. An image super-resolution device based on non-local sparse attention, comprising a memory and a processor, the memory being for storing a computer program; the processor is configured to execute the computer program to implement the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311460075.3A CN117495679B (en) | 2023-11-03 | 2023-11-03 | Image super-resolution method and device based on non-local sparse attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117495679A true CN117495679A (en) | 2024-02-02 |
CN117495679B CN117495679B (en) | 2024-09-03 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022242029A1 (en) * | 2021-05-18 | 2022-11-24 | 广东奥普特科技股份有限公司 | Generation method, system and apparatus capable of visual resolution enhancement, and storage medium |
CN116681592A (en) * | 2023-06-13 | 2023-09-01 | 杭州电子科技大学 | Image super-resolution method based on multi-scale self-adaptive non-local attention network |
CN116777745A (en) * | 2023-06-19 | 2023-09-19 | 青岛大学 | Image super-resolution reconstruction method based on sparse self-adaptive clustering |
CN116797456A (en) * | 2023-05-12 | 2023-09-22 | 苏州大学 | Image super-resolution reconstruction method, system, device and storage medium |
-
2023
- 2023-11-03 CN CN202311460075.3A patent/CN117495679B/en active Active
Non-Patent Citations (1)
Title |
---|
Lei Pengcheng; Liu Cong; Tang Jiangang; Peng Dunlu: "Hierarchical feature fusion attention network for image super-resolution reconstruction", Journal of Image and Graphics (中国图象图形学报), no. 09, 16 September 2020 (2020-09-16) * |
Also Published As
Publication number | Publication date |
---|---|
CN117495679B (en) | 2024-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Efficient long-range attention network for image super-resolution | |
CN111104962B (en) | Semantic segmentation method and device for image, electronic equipment and readable storage medium | |
CN111832570A (en) | Image semantic segmentation model training method and system | |
CN112771578B (en) | Image generation using subdivision scaling and depth scaling | |
CN110915215A (en) | Tiled image compression using neural networks | |
CN110163801A | Image super-resolution and colorization method, system and electronic device | |
CN109410141B (en) | Image processing method and device, electronic equipment and storage medium | |
CN109997168A | Multi-scale image generation | |
CN108986029B (en) | Text image super-resolution reconstruction method, system, terminal equipment and storage medium | |
CN113298728B (en) | Video optimization method and device, terminal equipment and storage medium | |
CN113421312B (en) | Coloring method and device for black-and-white video, storage medium and terminal | |
US20200118003A1 (en) | Information processing apparatus, method, and program | |
CN107547803A (en) | Video segmentation result edge optimization processing method, device and computing device | |
CN115841420A (en) | Polarization image super-resolution reconstruction method based on deep learning | |
CN116342884B (en) | Image segmentation and model training method and server | |
CN114913061A (en) | Image processing method and device, storage medium and electronic equipment | |
CN114037893A (en) | High-resolution remote sensing image building extraction method based on convolutional neural network | |
CN116485651A (en) | Image super-resolution reconstruction method | |
CN115941966A (en) | Video compression method and electronic equipment | |
CN111124658B (en) | Method, apparatus and computer program product for processing target data | |
Esmaeilzehi et al. | MuRNet: A deep recursive network for super resolution of bicubically interpolated images | |
CN117495679B (en) | Image super-resolution method and device based on non-local sparse attention | |
CN114781499A | Method for constructing a dense prediction task adapter based on a ViT model | |
CN116228544B (en) | Image processing method and device and computer equipment | |
CN115375909A (en) | Image processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |