CN115861343A

CN115861343A - Image representation method and system of arbitrary scale based on dynamic implicit image function

Info

Publication number: CN115861343A
Application number: CN202211590183.8A
Authority: CN
Inventors: 金枝; 何宗耀
Original assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Current assignee: Sun Yat Sen University; Sun Yat Sen University Shenzhen Campus
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-03-28
Anticipated expiration: 2042-12-12
Also published as: CN115861343B

Abstract

The invention discloses an arbitrary-scale image representation method and system based on a dynamic implicit image function. The method comprises: acquiring an image to be processed; performing implicit encoding on the image to be processed with a pre-trained encoder to obtain a two-dimensional feature map; and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction with a two-stage multilayer perceptron to obtain image pixel values. Embodiments of the invention reduce the computational cost of continuous image representation, improve processing performance, and can be widely applied in the technical field of artificial intelligence.

Description

Arbitrary-scale image representation method and system based on a dynamic implicit image function

Technical field

The invention relates to the technical field of artificial intelligence, and in particular to an arbitrary-scale image representation method and system based on a dynamic implicit image function.

Background

A digital image is a two-dimensional representation of the real world in the digital world, but the continuous physical world is usually quantized by sensors and stored in computers as a discrete matrix of pixels. If an image can be expressed in continuous form, an image of any resolution can be obtained from the continuous space, preserving the fidelity of the depicted scene. Although continuous image representation methods in the related art perform well, their computational cost grows quadratically with the image magnification factor, which makes arbitrary-scale super-resolution reconstruction very time-consuming. The technical problems in the related art therefore urgently need to be solved.

Summary of the invention

In view of this, embodiments of the present invention provide an arbitrary-scale image representation method and system based on a dynamic implicit image function, so as to reduce computational cost and improve processing performance.

In one aspect, the present invention provides an arbitrary-scale image representation method based on a dynamic implicit image function, comprising:

acquiring an image to be processed;

performing implicit encoding on the image to be processed with a pre-trained encoder to obtain a two-dimensional feature map;

inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction with a two-stage multilayer perceptron to obtain image pixel values.

Optionally, performing dynamic coordinate slicing on the two-dimensional feature map comprises:

inputting an image magnification factor;

taking a feature vector from the two-dimensional feature map as a latent code, and grouping the coordinates in the two-dimensional feature map according to the latent code to obtain a feature coordinate group;

slicing the feature coordinate group according to the image magnification factor to obtain coordinate slices.

Optionally, slicing the feature coordinate group according to the image magnification factor to obtain coordinate slices comprises:

determining a slice interval according to the image magnification factor;

dividing the feature coordinate group according to the slice interval to obtain coordinate slices, wherein all coordinates within a coordinate slice share the same latent code.

Optionally, performing pixel value prediction with the two-stage multilayer perceptron comprises:

inputting a coordinate slice and the latent code of the slice;

performing first-stage processing on the coordinate slice and the slice latent code to obtain a slice hidden vector;

acquiring a coordinate to be predicted, the coordinate to be predicted being any coordinate in the coordinate slice;

performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value at the coordinate to be predicted.

Optionally, the two-stage multilayer perceptron comprises hidden layers, each hidden layer consisting of a linear layer and an activation function.

Optionally, before performing implicit encoding on the image to be processed with the pre-trained encoder to obtain the two-dimensional feature map, the method further comprises pre-training the encoder and the dynamic implicit image network, specifically comprising:

acquiring a training image;

performing pixel prediction on the training image with the encoder and the dynamic implicit image network to obtain predicted pixel values;

determining a pixel loss value from the pixel values of the training image and the predicted pixel values;

updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.

In another aspect, an embodiment of the present invention further provides a system, comprising:

a first module, configured to acquire an image to be processed;

a second module, configured to perform implicit encoding on the image to be processed with a pre-trained encoder to obtain a two-dimensional feature map;

a third module, configured to input the two-dimensional feature map into a dynamic implicit image network, perform dynamic coordinate slicing on the two-dimensional feature map, and perform pixel value prediction with a two-stage multilayer perceptron to obtain image pixel values.

Optionally, the third module comprises:

a first submodule, configured to perform dynamic coordinate slicing on the two-dimensional feature map;

a second submodule, configured to perform pixel value prediction with the two-stage multilayer perceptron.

Optionally, the first submodule comprises:

a first unit, configured to input an image magnification factor;

a second unit, configured to take a feature vector from the two-dimensional feature map as a latent code, and to group the coordinates in the two-dimensional feature map according to the latent code to obtain a feature coordinate group;

a third unit, configured to slice the feature coordinate group according to the image magnification factor to obtain coordinate slices.

Optionally, the second submodule comprises:

a fourth unit, configured to input a coordinate slice and the latent code of the slice;

a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice latent code to obtain a slice hidden vector;

a sixth unit, configured to acquire a coordinate to be predicted, the coordinate to be predicted being any coordinate in the coordinate slice;

a seventh unit, configured to perform second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value at the coordinate to be predicted.

Compared with the prior art, the above technical solution has the following technical effects. The embodiment of the present invention inputs the two-dimensional feature map into the dynamic implicit image network and performs dynamic coordinate slicing on it, which lets the neural network perform a many-to-many mapping from coordinate slices to pixel value slices, so that the decoder uses the latent code only once to predict all pixel values of a coordinate slice, reducing computational cost. Pixel value prediction is then performed with a two-stage multilayer perceptron to obtain the image pixel values, which lets the decoder accept a non-fixed number of coordinates as input and thereby reduces the number of hidden layers, improving processing performance.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an arbitrary-scale image representation method based on a dynamic implicit image function provided by an embodiment of the present application;

FIG. 2 is an overall framework diagram of a dynamic implicit image function provided by an embodiment of the present application;

FIG. 3 is an example diagram of coordinate slicing provided by an embodiment of the present application;

FIG. 4 is a structural diagram of a two-stage multilayer perceptron provided by an embodiment of the present application.

Detailed description

To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the present application and do not limit it.

The arbitrary-scale image representation method and system based on a dynamic implicit image function provided in the embodiments of the present application mainly involve artificial intelligence technology. Artificial intelligence (AI) comprises the theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. AI is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Specifically, the method and system provided in the embodiments of the present application can use computer vision and machine learning/deep learning techniques to analyze and process an image and obtain a continuous representation of that image. For different tasks, the methods provided in the embodiments of the present application can be executed in the application scenario of the corresponding artificial intelligence system, and they can be executed at any point in the operating flow of that system.

Implicit neural representation: compared with explicit representations, implicit neural representations can capture the details of an object with a small number of parameters, and their differentiability allows backpropagation through neural rendering models. However, when applied to two-dimensional vision tasks, implicit neural representations usually need to predict every pixel independently, which incurs a large computational cost and a long running time.

The Local Implicit Image Function (LIIF) is a novel implicit image representation method that uses a multilayer perceptron to infer the pixel value at each coordinate.

In the related art, although LIIF delivers stable performance on arbitrary-scale super-resolution tasks up to 30×, its computational cost increases rapidly as the magnification factor increases.

In view of this, referring to FIG. 1, an embodiment of the present invention provides an arbitrary-scale image representation method based on a dynamic implicit image function, comprising:

S101. acquiring an image to be processed;

S102. performing implicit encoding on the image to be processed with a pre-trained encoder to obtain a two-dimensional feature map;

S103. inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction with a two-stage multilayer perceptron to obtain image pixel values.

In the embodiment of the present invention, a Dynamic Implicit Image Function (DIIF) is proposed, which is a fast and efficient arbitrary-scale image representation method. Referring to FIG. 2, I_in denotes the input image, and the encoder maps the input image to a two-dimensional feature map that serves as its DIIF representation. Given the resolution of the ground-truth image, a latent code z^* can be taken from the two-dimensional feature map, together with the coordinate slice around that latent code:

X^* = [X_1st, …, X_last]

where X_1st denotes the first coordinate of the coordinate slice and X_last denotes its last coordinate. The decoding function then uses this information to predict all pixel values of the coordinate slice. Pixel values are predicted by a two-stage multilayer perceptron (also called a coarse-to-fine multilayer perceptron): the first stage (coarse stage) predicts the slice hidden vector H^*, which, together with the coordinate to be predicted X_i, is the input of the second stage (fine stage), whose output is the pixel value I_out-i at that coordinate. In the training stage, the predicted pixel value I_out-i and the ground-truth pixel value I_gt-i are used to compute the loss function; the encoder and the decoding function are trained jointly on a self-supervised super-resolution task, and the learned network parameters are shared by all images. By using an image coordinate grouping and slicing strategy, the embodiment enables the neural network to perform a many-to-many mapping from coordinate slices to pixel value slices, instead of predicting the pixel value of one given coordinate at a time. The embodiment further proposes a Coarse-to-Fine Multilayer Perceptron (C2F-MLP) that performs image decoding based on the dynamic coordinate slicing strategy, so that the number of coordinates in each slice can vary with the magnification factor. DIIF with dynamic coordinate slicing can significantly reduce the computational cost of large-scale super-resolution. Experimental results show that DIIF achieves the best computational efficiency and super-resolution performance compared with existing arbitrary-scale super-resolution methods.

As a further preferred embodiment, performing dynamic coordinate slicing on the two-dimensional feature map comprises:

inputting an image magnification factor;

In the embodiment of the present invention, a vector is selected from the two-dimensional feature map as a latent code, and the coordinates that are closer to this latent code than to any other latent code are grouped together to obtain a feature coordinate group. A feature coordinate group allows the latent code to be shared within the group, so that the decoder can use the latent code only once to predict all pixel values of the group. The number of coordinates in a coordinate group grows with the magnification factor, so the larger the magnification factor, the more computational cost can be saved. However, coordinate grouping requires the decoder to predict all pixel values of a group simultaneously, which places a heavy burden on the decoder for large-scale super-resolution. The embodiment therefore slices the feature coordinate group according to the image magnification factor to obtain coordinate slices: a coordinate group is divided into multiple coordinate slices, and the latent code input is shared only within a coordinate slice rather than across the whole coordinate group.
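The grouping step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes an integer magnification factor, pixel centres in normalized [0, 1) coordinates, and one latent code per feature-map cell. The function name `group_coordinates` and its parameters are hypothetical.

```python
def group_coordinates(in_h, in_w, scale):
    """Assign every output pixel centre to its nearest latent code.

    Assumption: latent codes sit at the centres of an in_h x in_w grid of
    cells, so the nearest latent code is the cell that contains the pixel.
    """
    out_h, out_w = in_h * scale, in_w * scale
    groups = {}
    for oy in range(out_h):
        for ox in range(out_w):
            # Centre of the output pixel in normalized coordinates.
            y = (oy + 0.5) / out_h
            x = (ox + 0.5) / out_w
            # Index of the feature-map cell (latent code) containing it.
            ly = min(int(y * in_h), in_h - 1)
            lx = min(int(x * in_w), in_w - 1)
            groups.setdefault((ly, lx), []).append((y, x))
    return groups


groups = group_coordinates(in_h=2, in_w=3, scale=4)
# One group per latent code; under these assumptions each group
# holds scale * scale output coordinates.
print(len(groups), {len(g) for g in groups.values()})
```

Under these assumptions every latent code owns a square block of output coordinates, which is what makes sharing one latent code across a whole group (or slice) possible.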

As a further preferred embodiment, slicing the feature coordinate group according to the image magnification factor to obtain coordinate slices comprises:

To strike the best balance between performance and efficiency, a suitable slice interval must be set. The simplest approach is fixed coordinate slicing, which uses a fixed slice interval in all cases. However, this strategy retains the quadratic growth of computational cost as the magnification factor increases. In addition, it causes two major problems inside the coordinate slices: spatial discontinuity and redundant coordinates. To solve these problems, the embodiment of the present invention proposes dynamic coordinate slicing, which adjusts the slice interval as the magnification factor changes. A first strategy is linear-order coordinate slicing, which sets the slice interval to the magnification factor; with it, the computational cost of DIIF grows linearly with the magnification factor. Another strategy, called constant-order coordinate slicing, sets the slice interval to the square of the magnification factor; with it, the computational cost of DIIF is determined only by the resolution of the input image and stays constant as the magnification factor increases. In the embodiment of the present invention, the feature coordinate group is divided according to the slice interval to obtain coordinate slices, and all coordinates within a slice share the same latent code. FIG. 3 shows coordinate grouping at a magnification factor of 4 and coordinate slicing with a slice interval of 4, where z^* denotes the latent code, X_1st denotes the first coordinate of the coordinate slice, and X_last denotes its last coordinate.
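The three slicing strategies can be compared with a short calculation. The sketch below assumes an integer magnification factor s and a coordinate group of s × s coordinates per latent code (an assumption that is consistent with the constant-order case, where an interval of s² yields one slice per group). The function names and the default fixed interval of 4 are hypothetical.

```python
import math


def slice_interval(scale, strategy, fixed_interval=4):
    """Number of coordinates per slice for each strategy."""
    if strategy == "fixed":
        return fixed_interval
    if strategy == "linear":      # linear-order coordinate slicing
        return scale
    if strategy == "constant":    # constant-order coordinate slicing
        return scale * scale
    raise ValueError(f"unknown strategy: {strategy}")


def slices_per_group(scale, strategy, fixed_interval=4):
    """Slices (latent-code uses) needed per latent code."""
    group_size = scale * scale    # assumed size of one coordinate group
    interval = slice_interval(scale, strategy, fixed_interval)
    return math.ceil(group_size / interval)


def split_group(coords, interval):
    """Divide one coordinate group into consecutive slices."""
    return [coords[i:i + interval] for i in range(0, len(coords), interval)]


for strategy in ("fixed", "linear", "constant"):
    print(strategy, [slices_per_group(s, strategy) for s in (2, 4, 8)])
# fixed    [1, 4, 16]  -> grows quadratically with the magnification
# linear   [2, 4, 8]   -> grows linearly
# constant [1, 1, 1]   -> independent of the magnification
```

The slice count per latent code is a proxy for how often the latent code must be processed, which is why the three strategies give quadratic, linear, and constant cost growth respectively.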

As a further preferred embodiment, performing pixel value prediction with the two-stage multilayer perceptron comprises:

To carry out the dynamic coordinate slicing strategy, the decoder must be flexible enough to take a non-fixed number of coordinates as input and output the corresponding pixel values. However, an ordinary MLP only accepts fixed-length vectors as input. To solve this problem, the embodiment of the present invention proposes a two-stage multilayer perceptron (C2F-MLP) as the decoder, divided into a first stage (coarse stage) that predicts the slice hidden vector and a second stage (fine stage) that predicts pixel values. In the embodiment of the present invention, the hidden layers of the coarse stage take the boundary coordinates of a coordinate slice and its corresponding latent code as input and generate the slice hidden vector. The slice hidden vector contains information about all pixel values in the slice and is used as input to the fine stage. The computational cost of the coarse stage is determined by the number of coordinate slices, which, thanks to the dynamic coordinate slicing strategy, is much smaller than the number of output coordinates. The coarse stage also lets the decoding function exploit the spatial relationships within a slice, which makes its pixel value predictions more accurate. The hidden layers of the fine stage take the slice hidden vector output by the coarse stage and any coordinate in the given coordinate slice as input and predict the pixel value at that coordinate. The fine stage is designed to predict the pixel value at each coordinate to be predicted independently. The decoding function used in the fine stage can be expressed as:

$$I(X^*) = f_\theta\left(z^*, [x_{tl} - v^*, \dots, x_{rb} - v^*]\right)$$

where $I$ is the pixel value, $X^* = [x_{tl}, \dots, x_{rb}]$ is the given coordinate slice, $f_\theta$ is the decoder, $z^*$ is the latent code corresponding to the coordinate slice, $v^*$ is the coordinate of the latent code, and $x_{tl}$ and $x_{rb}$ are the first and last coordinates of the coordinate slice, respectively.
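The arguments of the decoding function above can be assembled as in the following sketch, which only illustrates the relative-coordinate term x - v^* from the formula. The helper name `decoder_input` and the flat-list layout of the input are assumptions made for illustration; the source does not specify how the inputs are concatenated.

```python
def decoder_input(latent_code, latent_pos, slice_coords):
    """Build [z*, x_tl - v*, ..., x_rb - v*] as one flat list.

    latent_code  : feature vector z* (list of floats)
    latent_pos   : coordinate v* of the latent code, as (y, x)
    slice_coords : coordinates of the slice, each as (y, x)
    """
    vy, vx = latent_pos
    # Coordinates are expressed relative to the latent code position.
    relative = [(y - vy, x - vx) for (y, x) in slice_coords]
    return list(latent_code) + [c for pair in relative for c in pair]


features = decoder_input(
    latent_code=[0.1, 0.2],
    latent_pos=(0.5, 0.5),
    slice_coords=[(0.25, 0.25), (0.75, 0.5)],
)
print(features)  # [0.1, 0.2, -0.25, -0.25, 0.25, 0.0]
```

Using offsets from v^* rather than absolute coordinates means the decoder sees the same input range for every latent code, regardless of where that code sits in the image.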

Because the slice hidden vector is shorter than the latent code and the fine stage has fewer hidden layers, the fine stage of DIIF requires significantly less computation than the LIIF decoder.

As a further preferred embodiment, the two-stage multilayer perceptron comprises hidden layers, each hidden layer consisting of a linear layer and an activation function.

Referring to FIG. 4, C2F-MLP divides the decoder into a coarse stage that predicts the slice hidden vector and a fine stage that predicts pixel values. Each hidden layer of C2F-MLP consists of a linear layer of dimension 256 followed by a ReLU activation function. In the coarse stage, the latent code z^*, the first coordinate X_1st of the coordinate slice, the last coordinate X_last of the coordinate slice, and the pixel area a at the current magnification factor are taken as input, and the coordinate hidden vector H_lt~rb (the slice hidden vector) is output. In the fine stage, the coordinate hidden vector and the coordinate to be predicted X_i are taken as input, and the pixel value I_i is output. To predict RGB values, the fine stage ends with an output linear layer of dimension 3.
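The two-stage structure can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the patented network: the number of layers per stage, the slice hidden vector length `slice_dim`, the random initialization, and the class name `C2FMLP` are all hypothetical. Only the inputs of each stage, the ReLU hidden layers of width 256, and the 3-channel output layer come from the text.

```python
import random


def make_linear(n_in, n_out, rng):
    """Randomly initialized linear layer as (weights, biases)."""
    bound = (1.0 / n_in) ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class C2FMLP:
    def __init__(self, code_dim, hidden_dim=256, slice_dim=64, seed=0):
        rng = random.Random(seed)
        # Coarse input: latent code + first coord (2) + last coord (2) + area (1).
        self.coarse = [make_linear(code_dim + 5, hidden_dim, rng),
                       make_linear(hidden_dim, slice_dim, rng)]
        # Fine input: slice hidden vector + one query coordinate (2).
        self.fine_hidden = make_linear(slice_dim + 2, hidden_dim, rng)
        self.fine_out = make_linear(hidden_dim, 3, rng)  # RGB output layer

    def coarse_stage(self, z, x_first, x_last, area):
        h = list(z) + list(x_first) + list(x_last) + [area]
        for layer in self.coarse:
            h = relu(linear(layer, h))
        return h  # slice hidden vector

    def fine_stage(self, slice_vec, x_query):
        h = relu(linear(self.fine_hidden, list(slice_vec) + list(x_query)))
        return linear(self.fine_out, h)

    def decode_slice(self, z, coords, area):
        # The coarse stage runs once per slice; the fine stage once per coordinate.
        slice_vec = self.coarse_stage(z, coords[0], coords[-1], area)
        return [self.fine_stage(slice_vec, c) for c in coords]


model = C2FMLP(code_dim=8, hidden_dim=16, slice_dim=4)
pixels = model.decode_slice([0.1] * 8, [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2)], area=0.01)
print(len(pixels), len(pixels[0]))  # 3 coordinates, 3 channels each
```

Because the coarse stage only sees the two boundary coordinates, its input length is fixed regardless of how many coordinates the slice contains, which is how the decoder accommodates slices of varying size.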

As a further preferred embodiment, before performing implicit encoding on the image to be processed with the pre-trained encoder to obtain the two-dimensional feature map, the method further comprises pre-training the encoder and the dynamic implicit image network, specifically comprising:

acquiring a training image;

In the embodiment of the present invention, the training stage computes a pixel-level loss from the predicted pixel values and the pixel values of the ground-truth image. The encoder and the decoding function are trained jointly on a self-supervised super-resolution task, and the learned network parameters are shared by all images.
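A pixel-level loss of this kind can be sketched as follows. The text does not state which loss is used; the mean absolute error (L1) below is an assumption chosen for illustration, and the function name `l1_pixel_loss` is hypothetical.

```python
def l1_pixel_loss(predicted, target):
    """Mean absolute error over all pixels and channels.

    predicted, target: sequences of per-pixel channel tuples, e.g. (r, g, b).
    Assumption: L1 loss; the source only says "pixel-level loss".
    """
    total, count = 0.0, 0
    for pred_pixel, true_pixel in zip(predicted, target):
        for p, t in zip(pred_pixel, true_pixel):
            total += abs(p - t)
            count += 1
    return total / count


predicted = [(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]
ground_truth = [(1.0, 0.5, 0.0), (1.0, 0.0, 0.0)]
print(l1_pixel_loss(predicted, ground_truth))  # (0.5 + 0.5) / 6
```

In a full training loop this value would be differentiated with respect to the weights of both the encoder and the decoder, since the two are trained jointly.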

Optionally, the third module comprises:

The invention proposes an arbitrary-scale image representation method and system based on a dynamic implicit image function for fast and efficient arbitrary-scale image representation. In DIIF, a pixel-based image is represented as a two-dimensional feature map, and the decoding function takes a coordinate slice and a local feature vector as input and predicts the corresponding group of pixel values. By sharing the local feature vector within a coordinate slice, DIIF can perform large-scale super-resolution reconstruction at very low computational cost. Experimental results show that DIIF outperforms existing arbitrary-scale super-resolution methods in both super-resolution performance and computational efficiency at all scaling factors. Compared with LIIF, DIIF saves up to 87% of the computational cost while consistently achieving better PSNR. DIIF can be applied efficiently in scenarios that need to display images in real time at arbitrary resolution. Applying the embodiment of the present invention enables arbitrary zooming in image viewing/editing software, enlargement and restoration of low-resolution images, and compression and storage of high-resolution images.
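For reference, PSNR (the quality metric mentioned above) follows the standard definition 10·log10(MAX² / MSE). The sketch below assumes pixel values flattened into a list and normalized to [0, 1]; the function name `psnr` and its parameters are hypothetical and not taken from the source.

```python
import math


def psnr(predicted, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 gives an MSE of 0.01, i.e. about 20 dB.
print(round(psnr([0.0, 1.0], [0.1, 0.9]), 3))
```

Higher PSNR means the reconstruction is closer to the ground-truth image.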

在一些可选择的实施例中，在方框图中提到的功能/操作可以不按照操作示图提到的顺序发生。例如，取决于所涉及的功能/操作，连续示出的两个方框实际上可以被大体上同时地执行或所述方框有时能以相反顺序被执行。此外，在本发明的流程图中所呈现和描述的实施例以示例的方式被提供，目的在于提供对技术更全面的理解。所公开的方法不限于本文所呈现的操作和逻辑流程。可选择的实施例是可预期的，其中各种操作的顺序被改变以及其中被描述为较大操作的一部分的子操作被独立地执行。In some alternative implementations, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

此外，虽然在功能性模块的背景下描述了本发明，但应当理解的是，除非另有相反说明，所述的功能和/或特征中的一个或多个可以被集成在单个物理装置和/或软件模块中，或者一个或多个功能和/或特征可以在单独的物理装置或软件模块中被实现。还可以理解的是，有关每个模块的实际实现的详细讨论对于理解本发明是不必要的。更确切地说，考虑到在本文中公开的装置中各种功能模块的属性、功能和内部关系的情况下，在工程师的常规技术内将会了解该模块的实际实现。因此，本领域技术人员运用普通技术就能够在无需过度试验的情况下实现在权利要求书中所阐明的本发明。还可以理解的是，所公开的特定概念仅仅是说明性的，并不意在限制本发明的范围，本发明的范围由所附权利要求书及其等同方案的全部范围来决定。Furthermore, although the invention has been described in the context of functional modules, it should be understood that one or more of the described functions and/or features may be integrated into a single physical device and/or unless stated to the contrary. or software modules, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to understand the present invention. Rather, given the attributes, functions and internal relationships of the various functional blocks in the devices disclosed herein, the actual implementation of the blocks will be within the ordinary skill of the engineer. Accordingly, those skilled in the art can implement the present invention set forth in the claims without undue experimentation using ordinary techniques. It is also to be understood that the particular concepts disclosed are illustrative only and are not intended to limit the scope of the invention which is to be determined by the appended claims and their full scope of equivalents.

The logic and/or steps represented in the flowcharts or otherwise described herein, which may for example be considered a sequenced listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.

It should be understood that the various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention, and that the scope of the present invention is defined by the claims and their equivalents.

The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to those embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims

1. A method for representing images at arbitrary scale based on a dynamic implicit image function, characterized in that the method comprises:

acquiring an image to be processed;

performing implicit encoding processing on the image to be processed by a pre-trained encoder to obtain a two-dimensional feature map; and

inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slice processing on the two-dimensional feature map, and performing pixel value prediction processing through a two-stage multi-layer perceptron to obtain image pixel values.

2. The method according to claim 1, wherein said performing dynamic coordinate slice processing on said two-dimensional feature map comprises:

inputting an image magnification;

obtaining a feature vector from the two-dimensional feature map and determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group; and

performing slice processing on the feature coordinate group according to the image magnification to obtain a coordinate slice.

3. The method according to claim 2, wherein said performing slice processing on the feature coordinate group according to the image magnification to obtain a coordinate slice comprises:

determining a slice interval according to the image magnification; and

dividing the feature coordinate group according to the slice interval to obtain the coordinate slice, wherein all coordinates in the coordinate slice share the same hidden code.
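The grouping and slicing steps of claims 2 and 3 can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, coordinates are assumed to be output-pixel centers normalized to (0, 1), each coordinate is assumed to join the group of the feature-map cell that contains it, and the rule that a slice holds about `scale` coordinates is invented here, since the claims only say that the slice interval is determined from the image magnification.

```python
from collections import defaultdict


def pixel_centers(n_out):
    """Centers of n_out output pixels, normalized to the range (0, 1)."""
    return [(i + 0.5) / n_out for i in range(n_out)]


def group_coordinates(feat_h, feat_w, scale):
    """Group output-pixel coordinates by the feature-map cell that contains them.

    Each cell (row, col) stands for one hidden code of the feature map.
    Assumption: nearest-cell assignment; the claims do not fix the rule.
    """
    out_h, out_w = round(feat_h * scale), round(feat_w * scale)
    groups = defaultdict(list)
    for y in pixel_centers(out_h):
        for x in pixel_centers(out_w):
            row = min(int(y * feat_h), feat_h - 1)
            col = min(int(x * feat_w), feat_w - 1)
            groups[(row, col)].append((y, x))
    return dict(groups)


def slice_interval(scale):
    """Hypothetical rule: one slice holds about `scale` coordinates."""
    return max(1, round(scale))


def slice_group(coords, interval):
    """Split one coordinate group into consecutive slices of `interval` coordinates."""
    return [coords[i:i + interval] for i in range(0, len(coords), interval)]


groups = group_coordinates(feat_h=2, feat_w=2, scale=4)
slices = slice_group(groups[(0, 0)], slice_interval(4))
print(len(groups), len(groups[(0, 0)]), len(slices), len(slices[0]))  # prints: 4 16 4 4
```

With a 2×2 feature map and a magnification of 4, each hidden code owns 16 output coordinates, which this sketch divides into 4 slices of 4 coordinates. Every coordinate in a slice is then served by the same hidden code.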

4. The method according to claim 1, wherein said performing pixel value prediction processing by a two-stage multi-layer perceptron comprises:

inputting a coordinate slice and a slice hidden code;

performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;

acquiring coordinates to be predicted, wherein the coordinates to be predicted are any coordinates in the coordinate slice; and

performing second-stage processing on the slice hidden vector according to the coordinates to be predicted to obtain pixel values of the coordinates to be predicted.

5. The method of claim 1, wherein the two-stage multilayer perceptron includes a hidden layer consisting of a linear layer and an activation function.
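The two-stage prediction of claims 4 and 5 can be sketched as follows. This is a toy illustration under assumptions, not the patented network: the class name, layer sizes, random initialization, ReLU activation, zero-padding of short slices, and concatenation of inputs are all hypothetical choices. What it aims to show is the division of labor: the first stage runs once per slice, and the second stage runs once per coordinate while reusing the slice hidden vector.

```python
import random


def linear(x, weight, bias):
    """Linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Activation function (ReLU is an assumption; the claims do not name one)."""
    return [max(0.0, v) for v in x]


def init_linear(rng, n_in, n_out):
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class TwoStageMLP:
    """Hypothetical two-stage perceptron with untrained, randomly set weights."""

    def __init__(self, code_dim, slice_len, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.slice_len = slice_len
        self.stage1 = init_linear(rng, code_dim + 2 * slice_len, hidden_dim)
        self.stage2_hidden = init_linear(rng, hidden_dim + 2, hidden_dim)
        self.stage2_out = init_linear(rng, hidden_dim, 3)  # 3 outputs, e.g. RGB

    def first_stage(self, hidden_code, coord_slice):
        """Slice hidden code + all slice coordinates -> slice hidden vector."""
        if len(coord_slice) > self.slice_len:
            raise ValueError("slice is longer than slice_len")
        flat = [c for coord in coord_slice for c in coord]
        flat += [0.0] * (2 * self.slice_len - len(flat))  # zero-pad short slices
        return relu(linear(list(hidden_code) + flat, *self.stage1))

    def second_stage(self, slice_vector, coord):
        """Slice hidden vector + one coordinate -> pixel value."""
        h = relu(linear(slice_vector + list(coord), *self.stage2_hidden))
        return linear(h, *self.stage2_out)

    def predict_slice(self, hidden_code, coord_slice):
        slice_vector = self.first_stage(hidden_code, coord_slice)  # once per slice
        return [self.second_stage(slice_vector, c) for c in coord_slice]


mlp = TwoStageMLP(code_dim=4, slice_len=4)
code = [0.1, 0.2, 0.3, 0.4]
coord_slice = [(0.1, 0.1), (0.1, 0.3), (0.3, 0.1), (0.3, 0.3)]
pixels = mlp.predict_slice(code, coord_slice)
```

In this sketch the cost of the first stage is shared by every coordinate in the slice, so only the smaller second stage is repeated per pixel.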

6. The method according to any one of claims 1 to 5, characterized in that, before said performing implicit encoding processing on the image to be processed by the pre-trained encoder to obtain the two-dimensional feature map, the method further comprises pre-training the encoder and the dynamic implicit image network, comprising:

acquiring training images;

performing pixel prediction processing on the training images through the encoder and the dynamic implicit image network to obtain predicted pixel values;

determining a pixel loss value based on the pixel values of the training images and the predicted pixel values; and

updating weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain a trained encoder and a trained dynamic implicit image network.
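The training steps of claim 6 (predict, compute a pixel loss, update weights) follow the usual gradient-descent pattern, which a toy sketch can make concrete. Everything below is a stand-in chosen for illustration: a single linear map with two weights replaces the encoder and the dynamic implicit image network, the loss is assumed to be mean squared error (the claim does not name the loss), and the samples are synthetic (feature, pixel value) pairs rather than real training images.

```python
def predict(weights, feature):
    """Stand-in for encoder + dynamic implicit image network: one linear map."""
    w, b = weights
    return w * feature + b


def pixel_loss(weights, samples):
    """Mean squared error between predicted and ground-truth pixel values."""
    return sum((predict(weights, f) - t) ** 2 for f, t in samples) / len(samples)


def train(samples, steps=200, lr=0.1):
    """Plain gradient descent on the pixel loss; returns weights and loss history."""
    w, b = 0.0, 0.0
    losses = []
    n = len(samples)
    for _ in range(steps):
        losses.append(pixel_loss((w, b), samples))
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict((w, b), f) - t) * f for f, t in samples) / n
        grad_b = sum(2 * (predict((w, b), f) - t) for f, t in samples) / n
        # Weight update driven by the pixel loss.
        w, b = w - lr * grad_w, b - lr * grad_b
    losses.append(pixel_loss((w, b), samples))
    return (w, b), losses


# Synthetic targets generated by pixel = 0.5 * feature + 0.2.
samples = [(x / 4, 0.5 * (x / 4) + 0.2) for x in range(5)]
weights, losses = train(samples)
```

In a real system the two scalar weights would be replaced by the parameters of both networks, and the gradients would typically come from automatic differentiation rather than hand-written formulas.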

7. An arbitrary-scale image representation system based on a dynamic implicit image function, characterized in that the system comprises:

a first module, configured to acquire an image to be processed;

a second module, configured to perform implicit encoding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map; and

a third module, configured to input the two-dimensional feature map into a dynamic implicit image network, perform dynamic coordinate slice processing on the two-dimensional feature map, and perform pixel value prediction processing through a two-stage multi-layer perceptron to obtain image pixel values.

8. The system according to claim 7, wherein the third module comprises:

a first submodule, configured to perform dynamic coordinate slice processing on the two-dimensional feature map; and

a second submodule, configured to perform pixel value prediction processing through the two-stage multi-layer perceptron.

9. The system according to claim 8, wherein the first submodule comprises:

a first unit, configured to input an image magnification;

a second unit, configured to obtain a feature vector from the two-dimensional feature map and determine the feature vector as a hidden code, and to group coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate group; and

a third unit, configured to perform slice processing on the feature coordinate group according to the image magnification to obtain a coordinate slice.

10. The system according to claim 8, wherein the second submodule comprises:

a fourth unit, configured to input a coordinate slice and a slice hidden code;

a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;

a sixth unit, configured to acquire coordinates to be predicted, wherein the coordinates to be predicted are any coordinates in the coordinate slice; and

a seventh unit, configured to perform second-stage processing on the slice hidden vector according to the coordinates to be predicted to obtain pixel values of the coordinates to be predicted.