WO2024179510A1 - Image processing method and related device - Google Patents
- Publication number
- WO2024179510A1 (application PCT/CN2024/078973)
- Authority
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
Definitions
- The present application relates to the field of artificial intelligence, and in particular to an image processing method and related devices.
- Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
- Image restoration refers to the use of technical means to remove degradation components from low-quality images, thereby restoring clear, high-quality images.
- The present application provides an image processing method that can improve image restoration quality and reduce processing time.
- In a first aspect, an embodiment of the present application provides an image processing method, the method comprising: acquiring a first image; converting the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information based on the first data, where the first high-frequency information is a prediction of the information in a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, which contains noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network based on the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; the second low-frequency information and the first high-frequency information are used to obtain a second image.
- Converting the image to the frequency domain for restoration avoids the need for image segmentation (segments must be processed separately and then merged, which may cause boundary artifacts; for large images the number of segments becomes too large, resulting in long processing times), thereby improving restoration quality and reducing processing time.
- Restoration based on noise estimation yields higher image quality (more details can be restored) and significantly reduces the total sampling time.
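The flow described above (frequency-domain conversion, high-frequency prediction, network-based denoising of the low-frequency channel) can be sketched as follows. This is a minimal illustration: the stubs standing in for the patent's "first network" and "second network", and all shapes, are assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stubs: the patent's "second network" predicts high-frequency
# information from the frequency-domain data, and its "first network"
# estimates noise. Real versions would be trained neural networks.
def second_network(first_data):
    return np.tanh(first_data)

def first_network(high_freq, low_freq):
    # A trained network would also use high_freq; this stub ignores it.
    return 0.1 * low_freq

first_data = rng.standard_normal((64, 64))  # first image in the frequency domain
first_high = second_network(first_data)     # first high-frequency information (prediction)
first_low = rng.standard_normal((64, 64))   # first low-frequency information (random noise)

first_noise = first_network(first_high, first_low)  # first noise information
second_low = first_low - first_noise                # second low-frequency information
```

The second low-frequency information and the first high-frequency information would then be combined to produce the second (restored) image.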
- Obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information may include: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- Obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. The method further includes: at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information. In this case, the step in which the second low-frequency information and the first high-frequency information are used to obtain the second image includes: the third low-frequency information and the first high-frequency information are used to obtain the second image.
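The iteration above amounts to a diffusion-style sampling loop: the high-frequency conditioning signal stays fixed while the low-frequency estimate is progressively denoised, the output of step i feeding step i+1. A minimal sketch, in which the step count, the network stub, and the update rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def first_network(high_freq, low_freq, step):
    # Placeholder noise estimator; a trained network would also condition
    # on the iteration index (the diffusion timestep) and on high_freq.
    return 0.1 * low_freq

high_freq = rng.standard_normal((64, 64))  # fixed conditioning signal
low_freq = rng.standard_normal((64, 64))   # starts as randomly generated noise
start_norm = np.linalg.norm(low_freq)

num_steps = 10  # illustrative; the patent does not fix a step count
for i in range(1, num_steps + 1):
    noise = first_network(high_freq, low_freq, i)
    low_freq = low_freq - noise            # output of step i feeds step i + 1
```

After the loop, `low_freq` plays the role of the final low-frequency information that is fused with the high-frequency prediction.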
- The method may further include: performing target mapping on the second low-frequency information to obtain target low-frequency information;
- the target mapping does not include a noise-estimation term;
- the step in which the second low-frequency information and the first high-frequency information are used to obtain the second image includes: fusing the target low-frequency information and the first high-frequency information to obtain a fusion result, and mapping the fusion result to the spatial domain to obtain the second image.
- That is, the second low-frequency information can be mapped through the target mapping (which contains no noise-estimation term) to obtain the target low-frequency information; the target low-frequency information and the first high-frequency information are fused (for example, by splicing) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
- In this way, the total number of sampling steps can be greatly reduced (for example, to about 1/5 of the original), thereby improving sampling efficiency.
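The final fusion and spatial-domain mapping can be sketched as assembling the low- and high-frequency channels and applying an inverse wavelet transform. The single-level Haar inverse below is an assumption for illustration: the patent uses a second-order transform and does not fix the wavelet family.

```python
import numpy as np

def inverse_haar_dwt2(ll, lh, hl, hh):
    """Invert one level of a 2-D Haar wavelet transform (with /2 averaging
    on the forward side), doubling the spatial resolution."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    out = np.empty((2 * h, 2 * w))
    out[0::2, :] = a + d
    out[1::2, :] = a - d
    return out

rng = np.random.default_rng(2)
target_low = rng.standard_normal((64, 64))     # target low-frequency information
lh, hl, hh = rng.standard_normal((3, 64, 64))  # predicted high-frequency bands

# "Fusion" here is simply assembling the channels (splicing); the inverse
# transform then maps the fusion result back to the spatial domain.
second_image = inverse_haar_dwt2(target_low, lh, hl, hh)
```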
- the first low-frequency information is randomly generated noise.
- determining the first high-frequency information according to the first data includes: determining the first high-frequency information through a second network according to the first data.
- converting the first image into the frequency domain includes: converting the first image into the frequency domain by using a second-order wavelet transform.
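Reading "second-order wavelet transform" as a two-level 2-D discrete wavelet transform (an interpretation, not stated in the patent), the conversion reduces the spatial side length by a factor of 4 while splitting the image into one low-frequency channel and several high-frequency bands. A sketch using a hand-rolled Haar transform (the wavelet choice is arbitrary here):

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar wavelet transform. Returns the low-frequency
    (LL) band and the three high-frequency bands (LH, HL, HH), each at half
    the input resolution."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

image = np.random.rand(256, 256)  # stand-in for the first image
ll1, high1 = haar_dwt2(image)     # first level:  128 x 128 bands
ll2, high2 = haar_dwt2(ll1)       # second level:  64 x  64 bands
```

The 64x64 `ll2` band illustrates why the first data has a lower spatial resolution than the first image.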
- In a second aspect, the present application provides a model training method, the method comprising: acquiring a first image and a second image, the first image and the second image being acquired for the same scene, and the second image being a high-quality image corresponding to the first image; and converting the first image and the second image into the frequency domain to obtain first data and second data respectively;
- the spatial resolution of the first data is lower than that of the first image, and the spatial resolution of the second data is lower than that of the second image;
- the second data includes first low-frequency information, where the first low-frequency information is the information of a low-frequency channel in the second data;
- first high-frequency information is determined based on the first data; the first high-frequency information is a prediction of the information in a high-frequency channel of the high-quality image corresponding to the first image;
- first noise information is obtained through a first network; the first noise information and second noise information are used together to determine a first loss, where the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
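The training objective above pairs the network's predicted noise with the randomly generated noise that was injected. One gradient step might look like the following; the one-parameter linear stand-in model, MSE loss, and plain gradient descent are all illustrative assumptions, since the patent does not specify the loss form or optimizer.

```python
import numpy as np

rng = np.random.default_rng(3)

# One-parameter stand-in for the "first network": it predicts the injected
# noise as w * (noisy low-frequency input). A real network would also
# condition on the high-frequency information.
w = 0.0

high_freq = rng.standard_normal((64, 64))     # first high-frequency information
clean_low = rng.standard_normal((64, 64))     # low-frequency channel of the second data
second_noise = rng.standard_normal((64, 64))  # second noise information (random)
noisy_low = clean_low + second_noise          # low-frequency input with noise added

lr = 0.1
for _ in range(100):
    pred_noise = w * noisy_low                              # first noise information
    first_loss = np.mean((pred_noise - second_noise) ** 2)  # first loss (MSE)
    grad = np.mean(2 * (pred_noise - second_noise) * noisy_low)
    w -= lr * grad                                          # update the first network
```

Under these assumptions the weight converges toward the least-squares optimum (about 0.5 here), driving the noise-prediction loss down.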
- Converting the image to the frequency domain for restoration avoids the need for image segmentation (segments must be processed separately and then merged, which may cause boundary artifacts; for large images the number of segments becomes too large, resulting in long processing times), thereby improving restoration quality and reducing processing time.
- Restoration based on noise estimation yields higher image quality (more details can be restored) and significantly reduces the total sampling time.
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information includes:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the method further includes:
- obtaining third noise information through the first network according to the first high-frequency information and third low-frequency information; the third noise information and fourth noise information are used together to determine a second loss, where the fourth noise information is randomly generated noise;
- the updated first network is further updated according to the second loss.
- determining first high-frequency information according to the first data includes:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- converting the first image and the second image into a frequency domain includes:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- the present application provides an image processing device, the device comprising:
- An acquisition module used for acquiring a first image
- a processing module configured to convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- the first low-frequency information includes noise of a low-frequency channel of the high-quality image
- first noise information is obtained through a first network; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; and the processing module is further used to:
- second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image, including:
- the third low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module is further configured to:
- the second low-frequency information is subjected to target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item;
- the processing module is specifically used for:
- the target low-frequency information and the first high-frequency information are used to fuse to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- the first low-frequency information is randomly generated noise.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network.
- the processing module is specifically configured to:
- the first image is converted into the frequency domain by second-order wavelet transform.
- the present application provides a model training device, the device comprising:
- An acquisition module used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;
- a processing module configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- first noise information is obtained through a first network; the first noise information is used to determine a first loss together with the second noise information; the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:
- third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;
- the updated first network is further updated according to the second loss.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- the processing module is specifically configured to:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- an embodiment of the present application provides an image processing device, which may include a memory, a processor, and a bus system, wherein the memory is used to store programs, and the processor is used to execute the programs in the memory to perform any optional method as described in the first aspect above.
- an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored.
- the computer-readable storage medium is run on a computer, the computer executes the above-mentioned first aspect and any optional method.
- an embodiment of the present application provides a computer program product, including code, which, when executed, is used to implement the above-mentioned first aspect and any optional method.
- The present application provides a chip system, which includes a processor configured to support an image processing device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
- The chip system may further include a memory, which is used to store the program instructions and data necessary for the execution device or the training device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- FIG. 1A is a schematic diagram of a structure of an artificial intelligence main framework;
- FIGS. 1B and 2 are schematic diagrams of the application system framework of the present invention;
- FIG. 3 is a schematic diagram of an optional hardware structure of a terminal;
- FIG. 4 is a schematic diagram of the structure of a server;
- FIG. 5 is a schematic diagram of a system architecture of the present application;
- FIG. 6 is a schematic diagram of a process of a cloud service;
- FIG. 7 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
- FIG. 8 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
- FIG. 9 is a schematic diagram of a process of an image processing method;
- FIG. 10 is a schematic diagram of an image processing method;
- FIG. 11 is a schematic diagram of an image processing method;
- FIG. 12A is a schematic diagram of a beneficial effect;
- FIG. 12B is a schematic diagram of an architecture;
- FIG. 13 is a schematic diagram of the structure of an image processing device provided in an embodiment of the present application;
- FIG. 14 is a schematic diagram of an execution device provided in an embodiment of the present application;
- FIG. 15 is a schematic diagram of a training device provided in an embodiment of the present application;
- FIG. 16 is a schematic diagram of a chip provided in an embodiment of the present application.
- Figure 1A shows a structural diagram of the main framework of artificial intelligence.
- The following explains this artificial intelligence framework along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
- The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
- The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing data) to the industrial ecology of the system.
- the infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
- The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
- The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes methods such as data training, machine learning, deep learning, searching, reasoning, and decision-making.
- Machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
- Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, sorting, and prediction.
- Based on the results of data processing, some general capabilities can be further formed, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.
- Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart terminals, smart transportation, smart medical care, autonomous driving, smart cities, etc.
- The embodiments of the present application can be applied to tasks related to image processing, such as image enhancement.
- The product form of an embodiment of the present application may be an image processing application, which may run on a terminal device or on a cloud-side server.
- The image processing application may implement image processing tasks, or tasks based on image processing results.
- a user can open an application with image processing functions installed on a terminal device.
- the application can obtain image data captured by the camera or image data specified by the user.
- the image processing application can obtain processing results based on the input data through the method provided in the embodiment of the present application, and present the image processing results or downstream task results based on the image processing results to the user (the presentation method can be but is not limited to display, saving, uploading to the cloud side, etc.).
- a user can open an image processing application installed on a terminal device.
- the application can obtain image data captured by a camera or image data specified by a user.
- the image processing application can send the data (or the result obtained after certain processing on the data) to a server on the cloud side.
- the server on the cloud side generates an image processing result based on the image through the method provided in an embodiment of the present application, and transmits the image processing result or the result of a downstream task implemented based on the image processing result back to the terminal device.
- the terminal device can present the image processing result or the result of a downstream task implemented based on the image processing result to the user (the presentation method can be but is not limited to display, saving, uploading to the cloud side, etc.).
- FIG. 1B is a schematic diagram of the functional architecture of an image processing application in an embodiment of the present application:
- an image processing application 102 may receive input data 101 (e.g., image and event data) and generate a processing result 103.
- the image processing application 102 may be executed on, for example, at least one computer system, and includes computer code that, when executed by one or more computers, causes the computers to execute the image processing method described herein.
- FIG. 2 is a schematic diagram of the physical architecture for running an image processing application in an embodiment of the present application:
- Fig. 2 shows a schematic diagram of a system architecture.
- the system may include a terminal 100 and a server 200.
- the server 200 may include one or more servers (one server is used as an example in Fig. 2 for illustration), and the server 200 may provide image processing services for one or more terminals or perform downstream tasks based on image processing results.
- the terminal 100 can be installed with an image processing application, or a web page related to image processing or downstream tasks based on image processing results can be opened.
- the above application and web page can provide an interface.
- the terminal 100 can receive relevant parameters entered by the user on the image processing or downstream task interface based on image processing results, and send the above parameters to the server 200.
- the server 200 can obtain the processing results based on the received parameters and return the processing results to the terminal 100.
- the terminal 100 can also complete the data processing by itself based on the received parameters, without cooperation from the server; the embodiments of the present application are not limited in this respect.
- the terminal 100 in the embodiment of the present application can be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc., and the embodiment of the present application does not impose any limitation on this.
- FIG. 3 shows a schematic diagram of an optional hardware structure of the terminal 100 .
- the terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190.
- FIG. 3 is merely an example of a terminal or multi-function device and does not constitute a limitation; the device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
- the input unit 130 can be used to receive input digital or character information, and generate key signal input related to the user settings and function control of the portable multi-function device.
- the input unit 130 may include a touch screen 131 (optional) and/or other input devices 132.
- the touch screen 131 can collect user touch operations on or near it (such as operations performed by the user using fingers, joints, stylus, or any other suitable objects on or near the touch screen), and drive the corresponding connection device according to a pre-set program.
- the touch screen can detect the user's touch action on the touch screen, convert the touch action into a touch signal and send it to the processor 170, and can receive and execute commands sent by the processor 170; the touch signal at least includes touch point coordinate information.
- the touch screen 131 can provide communication between the terminal 100 and the user.
- the touch screen can be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave.
- the input unit 130 can also include other input devices.
- the other input devices 132 can include but are not limited to one or more of a physical keyboard, a function key (such as a volume control key, a switch key, etc.), a trackball, a mouse, a joystick, etc.
- other input devices 132 can obtain image data collected by a camera or image data specified by a user, etc.
- the display unit 140 may be used to display information input by the user or provided to the user, various menus of the terminal 100, interactive interfaces, file display, and/or playback of any multimedia file.
- the display unit 140 may be used to display an interface of an application program related to image processing, etc.
- the memory 120 can be used to store instructions and data.
- the memory 120 can mainly include an instruction storage area and a data storage area.
- the data storage area can store various data, such as multimedia files, texts, etc.;
- the instruction storage area can store software units such as the operating system, applications, and the instructions required for at least one function, or their subsets and extensions. The memory 120 can also include a non-volatile random access memory, and can provide the processor 170 with management of the hardware, software and data resources of the computing and processing device and support for the control software and applications. It is also used to store multimedia files and to store running programs and applications.
- the processor 170 is the control center of the terminal 100. It uses various interfaces and lines to connect various parts of the entire terminal 100. By running or executing instructions stored in the memory 120 and calling data stored in the memory 120, it executes various functions of the terminal 100 and processes data, thereby controlling the terminal device as a whole.
- the processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface and application program, and the modem processor mainly processes wireless communication. It is understandable that the above-mentioned modem processor may not be integrated into the processor 170.
- the processor and the memory may be implemented on a single chip, and in some embodiments, they may also be implemented separately on separate chips.
- the processor 170 may also be used to generate corresponding operation control signals, send them to corresponding components of the computing and processing device, read and process data in the software, especially read and process data and programs in the memory 120, so that each functional module therein performs corresponding functions, thereby controlling the corresponding components to act according to the requirements of the instructions.
- the memory 120 can be used to store software codes related to the image processing method
- the processor 170 can execute the software code stored in the memory 120 to perform the steps of the image processing method, and can also schedule other units (such as the above-mentioned input unit 130 and display unit 140) to realize corresponding functions.
- the RF unit 110 (optional) can be used for receiving and sending information or for receiving and sending signals during a call. For example, after receiving downlink information from the base station, it sends the information to the processor 170 for processing; in addition, it sends uplink data to the base station.
- the RF circuit includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, etc.
- the RF unit 110 can also communicate with network devices and other devices through wireless communication.
- the wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
- the RF unit 110 can send image data to the server 200 and receive information on processing results sent by the server 200.
- radio frequency unit 110 is optional and can be replaced by other communication interfaces, such as a network port.
- the terminal 100 also includes a power supply 190 (such as a battery) for supplying power to various components.
- the power supply can be logically connected to the processor 170 through a power management system, so that the power management system can manage functions such as charging, discharging, and power consumption.
- the terminal 100 also includes an external interface 180, which can be a standard Micro USB interface or a multi-pin connector. It can be used to connect the terminal 100 to communicate with other devices, and can also be used to connect a charger to charge the terminal 100.
- the terminal 100 may also include a flashlight, a wireless fidelity (WiFi) module, a Bluetooth module, sensors with different functions, etc., which will not be described in detail here. Some or all of the methods described below may be applied to the terminal 100 shown in FIG. 3 .
- Fig. 4 provides a schematic diagram of the structure of a server 200.
- the server 200 includes a bus 201, a processor 202, a communication interface 203 and a memory 204.
- the processor 202, the memory 204 and the communication interface 203 communicate with each other via the bus 201.
- the bus 201 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
- the bus may be divided into an address bus, a data bus, a control bus, etc.
- for ease of representation, FIG. 4 uses only one thick line, but this does not mean that there is only one bus or only one type of bus.
- the processor 202 may be any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
- the memory 204 may include a volatile memory (volatile memory), such as a random access memory (RAM).
- the memory 204 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
- the memory 204 may be used to store software code related to the image processing method, and the processor 202 may execute the software code to perform the steps of the image processing method, and may also schedule other units to implement corresponding functions.
- the above-mentioned terminal 100 and server 200 can be centralized or distributed devices, and the processors in the above-mentioned terminal 100 and server 200 (such as processor 170 and processor 202) can be hardware circuits (such as application specific integrated circuit (ASIC), field-programmable gate array (FPGA), general-purpose processor, digital signal processor (DSP), microprocessor or microcontroller, etc.), or a combination of these hardware circuits.
- the processor can be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.
- the steps related to the model reasoning process in the embodiments of the present application involve AI-related operations.
- the instruction execution architecture of the terminal device and the server is not limited to the processor combined with the memory architecture described above.
- the system architecture provided in the embodiments of the present application is described in detail below in conjunction with Figure 5.
- a task processing system 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550 and a data acquisition device 560, and the execution device 510 includes a computing module 511.
- the data acquisition device 560 is used to obtain an open source large-scale data set (i.e., a training set) required by the user, and store the training set in the database 530.
- the training device 520 trains the target model/rule 501 based on the training set maintained in the database 530, and the trained neural network obtained by the training is then used on the execution device 510.
- the execution device 510 can call the data, code, etc. in the data storage system 550, and can also store data, instructions, etc. in the data storage system 550.
- the data storage system 550 can be placed in the execution device 510, or the data storage system 550 can be an external memory relative to the execution device 510.
- the trained neural network obtained after the target model/rule 501 is trained by the training device 520 can be applied to different systems or devices (i.e., the execution device 510), which can be edge devices or end-side devices, such as mobile phones, tablets, laptops, monitoring systems (such as cameras), security systems, etc.
- the execution device 510 is configured with an I/O interface 512 for data interaction with external devices, and a "user" can input data to the I/O interface 512 through a client device 540.
- the client device 540 can be a camera device of a monitoring system; the images and event data captured by the camera device are input as input data to the computing module 511 of the execution device 510, the computing module 511 processes the input target image to obtain a processing result, and then outputs the processing result to the camera device or directly displays it on the display interface of the execution device 510 (if any). In addition, in some embodiments of the present application, the client device 540 can also be integrated in the execution device 510. For example, when the execution device 510 is a mobile phone, the target task can be obtained directly through the mobile phone (for example, the image and event data can be captured by the camera of the mobile phone), or the target task sent by another device (for example, another mobile phone) can be received; the computing module 511 in the mobile phone then detects the target task, obtains the detection result, and directly presents the detection result on the display interface of the mobile phone.
- the product form of the execution device 510 and the client device 540 is not limited here.
- FIG. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the data storage system 550 is an external memory relative to the execution device 510. In other cases, the data storage system 550 can also be placed in the execution device 510. It should be understood that the above-mentioned execution device 510 can be deployed in the client device 540.
- the computing module 511 of the above-mentioned execution device 510 can obtain the code stored in the data storage system 550 to implement the steps related to the model reasoning process in the embodiment of the present application.
- the computing module 511 of the execution device 510 may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits.
- the training device 520 can be a hardware system with the function of executing instructions, such as CPU, DSP, etc., or a hardware system without the function of executing instructions, such as ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without the function of executing instructions and hardware systems with the function of executing instructions.
- the computing module 511 of the execution device 510 can be a hardware system with an execution instruction function, and the steps related to the model reasoning process provided in the embodiment of the present application can be software codes stored in the memory.
- the computing module 511 of the execution device 510 can obtain the software code from the memory and execute the obtained software code to implement the steps related to the model reasoning process provided in the embodiment of the present application.
- the computing module 511 of the execution device 510 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model reasoning process provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the computing module 511 of the execution device 510, which is not limited here.
- the above-mentioned training device 520 can obtain the code stored in the memory (not shown in Figure 5, which can be integrated into the training device 520 or deployed separately from the training device 520) to implement the steps related to model training in an embodiment of the present application.
- the training device 520 may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits.
- the training device 520 may be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.
- the training device 520 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model training provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the training device 520, which is not limited here.
- Image processing cloud services provided by the server:
- the server can provide image processing services to the end side or perform downstream tasks based on the image processing results through an application programming interface (API).
- the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud.
- the server can obtain processing results based on the received parameters and return the processing results (such as enhanced image data) to the terminal.
- FIG. 6 shows a process of using an image processing cloud service provided by a cloud platform.
- the cloud platform provides software development kits (SDKs) in multiple development versions for users to choose according to the requirements of the development environment, such as a JAVA SDK, a Python SDK, a PHP SDK, an Android SDK, etc.
- the SDK project is imported into the local development environment, and configuration and debugging are performed in the local development environment.
- the local development environment can also be used to develop other functions, thus forming an application that integrates image processing capabilities.
- API calls for image processing or downstream tasks based on image processing results can be triggered.
- an API request is initiated to the running instance of the image processing service in the cloud environment, where the API request carries an image, and the running instance in the cloud environment processes the image to obtain the processing result.
- the cloud environment returns the processing results to the application, thereby completing the image processing once or making a downstream task service call based on the image processing results.
- a neural network may be composed of neural units, and a neural unit may refer to an operation unit that takes xs (i.e., input data) and an intercept 1 as input, and the output of the operation unit may be: h(x) = f(∑_{s=1}^{n} Ws·xs + b), where:
- n is a natural number greater than 1
- Ws is the weight of xs
- b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into the output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
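- To make the formula above concrete, here is a small illustrative sketch (not part of the patent; the function name and sample weights are invented for the example) of a single neural unit that computes the weighted sum of its inputs xs plus the bias b and applies a sigmoid activation f:

```python
import math

def neural_unit(xs, ws, b):
    """One neural unit: weighted sum of inputs plus bias b,
    passed through a sigmoid activation f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation f

# Example: three inputs xs with weights Ws and bias b (invented values)
out = neural_unit([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], b=0.05)
print(round(out, 4))  # sigmoid(0.35) ≈ 0.5866
```

The output lies in (0, 1) and can serve as the input of the next layer, as described above.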
- a neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- Convolutional neural network is a deep neural network with a convolutional structure.
- a convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter.
- a convolutional layer refers to a neuron layer in a convolutional neural network that performs convolution processing on the input signal.
- a neuron can only be connected to some neurons in the adjacent layers.
- a convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
- Shared weights can be understood as the way to extract features is independent of position.
- the convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
- the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
- CNN is a very common neural network.
- convolutional neural network is a deep neural network with a convolution structure and a deep learning architecture.
- a deep learning architecture refers to multiple levels of learning at different abstract levels through machine learning algorithms.
- CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
- a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional layer/pooling layer 220 (wherein the pooling layer is optional), and a fully connected layer 230 .
- the convolution layer/pooling layer 220 may include layers 221-226, for example: in one implementation, layer 221 is a convolution layer, layer 222 is a pooling layer, layer 223 is a convolution layer, layer 224 is a pooling layer, layer 225 is a convolution layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.
- the convolution layer 221 may include a plurality of convolution operators, which are also called kernels.
- the convolution operator is equivalent to a filter that extracts specific information from the input image matrix in image processing.
- the convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing convolution operations on the image, the weight matrix is usually moved across the input image in the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the work of extracting specific features from the image.
- the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
- the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix will produce a convolution output with a single depth dimension, but in most cases, a single weight matrix is not used, but multiple weight matrices of the same size (row ⁇ column), that is, multiple isotype matrices, are applied.
- the output of each weight matrix is stacked to form the depth dimension of the convolution image, and the dimension here can be understood as being determined by the "multiple" mentioned above.
- Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to blur unnecessary noise in the image.
- the multiple weight matrices have the same size (rows ⁇ columns), and the feature maps extracted by the multiple weight matrices of the same size are also the same size. Multiple extracted feature maps of the same size are merged to form the output of the convolution operation.
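- The sliding of weight matrices described above can be sketched as follows (an illustrative toy example, not the patent's implementation; the kernel values are invented). Each kernel is slid over the image with the given stride, and the per-kernel outputs are stacked to form the depth dimension:

```python
def conv2d(image, kernels, stride=1):
    """Slide each weight matrix (kernel) over the image; stack the
    per-kernel outputs to form the depth dimension of the result."""
    h, w = len(image), len(image[0])
    k = len(kernels[0])  # kernels are k x k
    out = []
    for kern in kernels:
        plane = []
        for i in range(0, h - k + 1, stride):
            row = []
            for j in range(0, w - k + 1, stride):
                row.append(sum(kern[a][b] * image[i + a][j + b]
                               for a in range(k) for b in range(k)))
            plane.append(row)
        out.append(plane)
    return out  # depth = number of kernels

image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
edge = [[1, 0], [0, -1]]             # toy "edge" kernel
blur = [[0.25, 0.25], [0.25, 0.25]]  # toy averaging kernel
maps = conv2d(image, [edge, blur], stride=1)
print(len(maps), len(maps[0]), len(maps[0][0]))  # depth 2, each map 3x3
```

Two same-size kernels yield two same-size feature maps, which together form the depth-2 output of the convolution, as the text describes.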
- weight values in these weight matrices need to be obtained through a lot of training in practical applications.
- the weight matrices formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
- the initial convolutional layer (for example, 221) often extracts more general features, which can also be called low-level features.
- the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.
- a convolution layer may be followed by a pooling layer, or multiple convolution layers may be followed by one or more pooling layers.
- the pooling layer may include an average pooling operator and/or a maximum pooling operator to sample the input image to obtain an image of smaller size.
- the average pooling operator may calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
- the maximum pooling operator may take the pixel with the largest value in the range within a specific range as the result of maximum pooling.
- the operator in the pooling layer should also be related to the image size.
- the size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
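- A minimal sketch of the pooling operators described above (illustrative only, not taken from the patent): max pooling keeps the largest value of each sub-region, average pooling keeps its mean, and either way the spatial size shrinks:

```python
def pool2d(image, size, mode="max"):
    """Downsample by taking the max (or average) over
    non-overlapping size x size sub-regions."""
    out = []
    for i in range(0, len(image), size):
        row = []
        for j in range(0, len(image[0]), size):
            block = [image[i + a][j + b]
                     for a in range(size) for b in range(size)]
            row.append(max(block) if mode == "max" else sum(block) / len(block))
        out.append(row)
    return out

img = [[1, 3, 2, 4],
       [5, 7, 6, 8],
       [9, 2, 1, 0],
       [3, 4, 5, 6]]
print(pool2d(img, 2, "max"))  # [[7, 8], [9, 6]]
print(pool2d(img, 2, "avg"))  # [[4.0, 5.0], [4.5, 3.0]]
```

Each output pixel corresponds to one 2 x 2 sub-region of the input, matching the description above.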
- After being processed by the convolution layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information. As mentioned above, the convolution layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output, or a group of outputs whose number equals the number of required classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 7), and the parameters contained in the multiple hidden layers can be pre-trained according to the relevant training data of the specific task type. For example, the task type may include image recognition, image classification, image super-resolution reconstruction, etc.
- After the multiple hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to the categorical cross entropy and is specifically used to calculate the prediction error.
- once the forward propagation of the entire convolutional neural network 200 (as shown in FIG. 7, the propagation from 210 to 240 is the forward propagation) is completed, the back propagation (as shown in FIG. 7, the propagation from 240 to 210 is the back propagation) begins to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
- the convolutional neural network 200 shown in Figure 7 is only an example of a convolutional neural network.
- the convolutional neural network may also exist in the form of other network models, for example, including only a part of the network structure shown in Figure 7.
- the convolutional neural network used in the embodiment of the present application may only include an input layer 210, a convolution layer/pooling layer 220 and an output layer 240.
- the convolutional neural network 100 shown in FIG. 7 is only an example of a convolutional neural network.
- the convolutional neural network can also exist in the form of other network models. For example, multiple convolutional layers/pooling layers are used in parallel as shown in FIG. 8, and the extracted features are input to the fully connected layer 230 for processing.
- a deep neural network (DNN) is also known as a multi-layer neural network.
- the layers of a DNN can be divided into three categories: input layer, hidden layer, and output layer.
- the first layer is the input layer
- the last layer is the output layer
- the layers in between are all hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- the coefficient from the kth neuron in the (L-1)th layer to the jth neuron in the Lth layer is defined as W_jk^L. It should be noted that the input layer has no W parameter.
- more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks.
- Training a deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by many layers of vector W).
- the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial model during the training process, so that the error loss of the model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial model are updated by back propagating the error loss information, so that the error loss converges.
- the back propagation algorithm is a back propagation movement dominated by error loss, aiming to obtain the optimal model parameters, such as the weight matrix.
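- The back propagation idea can be illustrated with a one-parameter toy example (invented for illustration; this is not the patent's training procedure): the squared-error loss is differentiated with respect to the weight, and the weight is moved against the gradient until the error loss converges:

```python
# One weight w, one input x, one target y; loss L = (w*x - y)^2.
# Back propagation gives dL/dw = 2*(w*x - y)*x, and the update moves
# w against the gradient so that the error loss becomes smaller.
def train(w, x, y, lr=0.1, steps=50):
    for _ in range(steps):
        err = w * x - y
        grad = 2 * err * x  # error loss propagated back to the weight
        w -= lr * grad      # gradient-descent update
    return w

w = train(w=0.0, x=1.0, y=3.0)
print(round(w, 3))  # converges toward 3.0
```

With many layers, the same gradient rule is applied layer by layer from the output back to the input, which is the back propagation movement described above.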
- Diffusion Models refer to defining a Markov chain of diffusion steps, gradually adding random noise to the data, and then learning the inverse diffusion process to construct the required data samples from the noise.
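- A toy sketch of the forward (noising) half of such a diffusion Markov chain (illustrative only; the noise schedule and values are invented): at each step a little Gaussian noise is mixed into the sample, so that after many steps the data is essentially noise, from which the learned inverse process would reconstruct samples:

```python
import math
import random

def forward_diffuse(x0, betas, rng=random.Random(0)):
    """One realization of the forward Markov chain: at each step t,
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    gradually replacing the data with Gaussian noise."""
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1)
             for v in x]
    return x

x0 = [1.0, -0.5, 2.0]  # toy "data sample"
xt = forward_diffuse(x0, betas=[0.02] * 100)
print(len(xt))  # same shape as x0, but mostly noise after 100 steps
```

The reverse (denoising) direction is what the model actually learns; only the fixed forward chain is sketched here.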
- Image Restoration refers to the process of removing the degraded components in low-quality images caused by various factors and restoring high-quality images with complete details.
- This application can be applied in practical scenarios such as image enhancement and restoration, terminal applications, and autonomous driving.
- Image restoration refers to the use of technical means to remove the degradation components in these low-quality pictures, thereby restoring clear and high-quality pictures.
- the present application provides an image processing method, which can be a feedforward process of model training or an inference process.
- FIG. 9 is an image processing method provided by an embodiment of the present application. As shown in FIG. 9 , the image processing method provided by the present application includes:
- the execution subject of step 901 may be a terminal device, and the terminal device may be a portable mobile device, such as but not limited to a mobile or portable computing device (such as a smart phone), a personal computer, a server computer, a handheld device (such as a tablet), or a laptop.
- such computers or devices include laptops, multiprocessor systems, game consoles or controllers, microprocessor-based systems, set-top boxes, programmable consumer electronics products, mobile phones, mobile computing and/or communication devices with wearable or accessory form factors (e.g., watches, glasses, headphones or earbuds), network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
- the execution entity of step 901 may also be a server on the cloud side, and the server may receive the first image sent from the terminal device, and then the server may obtain the first image.
- the first image may be a low-quality image, for example an image occluded by the natural environment (such as raindrops), an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
- the first image may be converted into the frequency domain by a second-order wavelet transform.
- the spatial domain RGB low-quality image Xd can be transformed by a second-order Haar wavelet transform to obtain the image xd in the wavelet domain.
- the image size is changed from H×W×3 to (H/4)×(W/4)×48. This reduces the spatial resolution by a factor of 16, which can speed up processing.
- the diffusion model is introduced from the spatial domain to the wavelet domain using wavelet transform, which can significantly reduce the image processing time (the model only needs to learn part of the spectrum of the image, which is relatively simpler. At the same time, due to the reduction in spatial resolution, the model takes less time to process the image).
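As an illustration of the shape change described above, a two-level Haar transform can be sketched in NumPy as follows; the normalization and sub-band ordering here are assumptions, and the actual transform used by the embodiment may differ:

```python
import numpy as np

def haar_dwt_level(x):
    """One level of a 2-D Haar transform, applied per channel.
    (H, W, C) -> (H/2, W/2, 4C): the LL, LH, HL, HH sub-bands are
    stacked along the channel axis."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]   # even/odd rows and columns
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0             # low-frequency (average)
    lh = (a + b - c - d) / 2.0             # horizontal detail
    hl = (a - b + c - d) / 2.0             # vertical detail
    hh = (a - b - c + d) / 2.0             # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=-1)

img = np.random.rand(256, 256, 3)   # spatial-domain RGB image, H x W x 3
x1 = haar_dwt_level(img)            # (128, 128, 12) after the first level
x2 = haar_dwt_level(x1)             # (64, 64, 48) after the second level
```

The number of spatial positions drops by a factor of 16 (4 in each dimension) while the channel count grows from 3 to 48, so no information is discarded; the model simply works on a smaller spatial grid.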
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
- the first high-frequency information can be determined based on the first data through a pre-trained second network.
- the second network can be composed of multiple (e.g., 14) convolutional layers with residual structures. Its main function is to learn the difference between the high-frequency spectrum of the low-quality image and the high-frequency spectrum of its corresponding clear image, so as to predict the high-frequency spectrum of the low-quality image after restoration.
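A network of the kind described, convolutional layers with residual structures mapping the wavelet spectrum of the low-quality image to a prediction of the restored high-frequency spectrum, could be sketched in PyTorch as follows; the channel counts, width, and block count are illustrative assumptions, not the embodiment's exact configuration:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # residual connection

class HighFreqRefiner(nn.Module):
    """Maps the full wavelet spectrum of the low-quality image to a
    prediction of the high-frequency sub-bands of the restored image
    (48 input channels and 45 high-frequency channels are assumptions)."""
    def __init__(self, in_ch=48, hf_ch=45, width=64, n_blocks=6):
        super().__init__()
        self.head = nn.Conv2d(in_ch, width, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(width, hf_ch, 3, padding=1)

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))

xd = torch.randn(1, 48, 64, 64)   # wavelet-domain low-quality image
hf_pred = HighFreqRefiner()(xd)   # predicted high-frequency spectrum
```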
- the high-frequency spectrum of the image in the embodiment of the present application (or the information corresponding to the high-frequency channel) is relative to the low-frequency spectrum of the image (or the information corresponding to the low-frequency channel).
- the frequency corresponding to the high-frequency spectrum is higher than the frequency of the low-frequency spectrum.
- the high-quality image corresponding to the first image may include information of multiple channels, wherein the multiple channels may include a high-frequency channel and a low-frequency channel relative to the high-frequency channel.
- obtaining first low-frequency information, where the first low-frequency information includes noise in a low-frequency channel of the high-quality image.
- step 904 and the subsequent step 905 may be an iterative process, and the result obtained in step 905 may be used as the first low-frequency information obtained in the next step 904 .
- if step 904 is the first iteration, the first low-frequency information may be randomly generated noise (e.g., Gaussian white noise). If step 904 is the i-th (i is greater than 1) iteration, the first low-frequency information may be the denoised low-frequency information output by the previous iteration (i.e., the result of the preceding step 905).
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information.
- the first high-frequency information and the first low-frequency information may be input into a first network, and the first network is a pre-trained network, and the first noise information may be obtained according to the first high-frequency information and the first low-frequency information.
- the first data may also be input into the first network, that is, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; at the i+1-th iteration, obtaining the second noise information through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information.
- the purpose of this network is to estimate, step by step, the noise that needs to be removed from this noisy input at each moment.
- the noise in the image is removed until it becomes a low-frequency spectrum of a clear image.
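The stepwise removal of noise described above follows the standard diffusion (DDPM-style) reverse process; a minimal sketch, with the conditioned noise-estimation network replaced by a zero placeholder and an assumed linear noise schedule, might look like:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)
rng = np.random.default_rng(0)

def reverse_step(x_t, eps_pred, t):
    """One denoising step x_t -> x_{t-1}: subtract the scaled noise
    estimate (the DDPM posterior mean), then add fresh noise except
    at the final step."""
    coef = betas[t] / np.sqrt(1.0 - alphas_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

x = rng.standard_normal((3, 64, 64))   # start from Gaussian white noise
for t in range(T - 1, -1, -1):
    eps_pred = np.zeros_like(x)        # placeholder for the conditioned U-Net output
    x = reverse_step(x, eps_pred, t)
```

In the actual method, `eps_pred` would be produced by the first network conditioned on the predicted high-frequency spectrum (and optionally the first data), so that the iterate converges to the low-frequency spectrum of a clear image.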
- the second low-frequency information can be mapped through target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item; the target low-frequency information and the first high-frequency information are used for fusion (for example, splicing) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
- the whole sampling process uses skip sampling with a quantization interval S as the span to reduce the number of sampling steps from T to T/S.
- the embodiment of the present application further explores a high-efficiency conditional sampling algorithm, which can directly predict the original image at the middle moment M in the sampling process, that is, there is no need to go through the entire DIS process.
- the number of sampling steps is (T-M)/S.
- in the flow of the sampling method, M can be a preset proportion of T (for example, 80% of T).
- the formula for obtaining Xt-1 as follows is the corresponding implementation of denoising the first low-frequency information according to the first noise information, and the formula for obtaining X0 is the corresponding implementation of the target mapping.
- the total number of sampling steps can be greatly reduced (for example, reduced to about 1/5 of the original), thereby improving the sampling efficiency.
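The step count under skip sampling with a direct prediction at the intermediate moment M can be verified with a short calculation (T, S, and the 80% proportion are illustrative values taken from the examples above, not fixed by the method):

```python
T, S = 1000, 5                 # total diffusion steps and skip span (example values)
M = int(0.8 * T)               # intermediate moment, e.g. a preset 80% of T
# skip sampling visits only every S-th step, from T down to M; at step M the
# original image x0 is predicted directly from x_M and the noise estimate, i.e.
#   x0 = (x_M - sqrt(1 - alpha_bar_M) * eps) / sqrt(alpha_bar_M)
timesteps = list(range(T - 1, M - 1, -S))
num_steps = len(timesteps)     # (T - M) / S = 40 steps instead of T = 1000
```

With these example values the sampler runs 40 network evaluations rather than 1000, consistent with the claimed reduction in total sampling time.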
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- the low-frequency spectrum of the clear image is restored from the Gaussian white noise; it is then fused with the high-frequency spectrum of the clear image predicted by the HFRM, and after a second-order inverse Haar wavelet transform, the spatial-domain restoration result of the low-quality image Xd is obtained.
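The fusion and return to the spatial domain rely on a second-order inverse Haar transform; a self-contained NumPy round-trip sketch (with an assumed orthonormal normalization and sub-band ordering) shows that the inverse reconstructs the image exactly:

```python
import numpy as np

def haar_dwt_level(x):
    """One forward Haar level, (H, W, C) -> (H/2, W/2, 4C)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return np.concatenate([(a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
                           (a - b + c - d) / 2.0, (a - b - c + d) / 2.0], axis=-1)

def haar_idwt_level(y):
    """Inverse of one Haar level: (H, W, 4C) sub-bands -> (2H, 2W, C)."""
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    a = (ll + lh + hl + hh) / 2.0
    b = (ll + lh - hl - hh) / 2.0
    c = (ll - lh + hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    out = np.empty((2 * y.shape[0], 2 * y.shape[1], y.shape[2] // 4))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = c; out[1::2, 1::2] = d
    return out

img = np.random.rand(64, 64, 3)
spec = haar_dwt_level(haar_dwt_level(img))    # second-order forward transform
rec = haar_idwt_level(haar_idwt_level(spec))  # second-order inverse transform
err = np.abs(rec - img).max()                 # reconstruction error
```

In the method itself, the restored low-frequency channels and the HFRM's predicted high-frequency channels would be concatenated into `spec` before the inverse transform.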
- the present application provides an image processing method, the method comprising: obtaining a first image; converting the first image to a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data; the first high-frequency information is the information prediction result of the high-frequency channel of the high-quality image corresponding to the first image; obtaining first low-frequency information, the first low-frequency information containing the noise of the low-frequency channel of the high-quality image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; the second low-frequency information and the first high-frequency information are used to obtain a second image.
- converting the image to the frequency domain for image restoration can avoid the use of image segmentation (segmentation needs to be processed separately and then merged, boundary artifacts may occur, and when the image size is large, the number of segments is too large, resulting in a long processing time), thereby improving the restoration quality and reducing the processing time.
- image restoration based on the noise has a higher image quality (more details can be restored, while significantly reducing the total sampling time).
- FIG. 10 is a model training method provided in an embodiment of the present application. As shown in FIG. 10 , the model training method provided in the present application includes:
- the first image may be a low-quality image, for example an image occluded by the natural environment (such as raindrops), an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
- the second image may be a high-quality image corresponding to the first image, for example, an image obtained by removing raindrops from the first image, solving underexposure problems (for example, enhancing dark light photos to natural light levels), or removing moiré patterns.
- converting the first image and the second image into the frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data.
- the first image and the second image may be converted into a frequency domain by a second-order wavelet transform.
- the spatial domain RGB low-quality image Xd and the corresponding clear image X0 can be transformed by the second-order Haar wavelet transform to obtain the wavelet domain images xd and x0 .
- the image size is changed from H×W×3 to (H/4)×(W/4)×48. This reduces the spatial resolution by a factor of 16, speeding up processing.
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
- the first high-frequency information can be determined according to the first data through a second network; the second network is a pre-trained network.
- the frequency domain data corresponding to the low-quality image can be input into the second network to predict the information of the high-frequency channels of the high-quality image corresponding to the low-quality image, and the real information of the high-frequency channels of the high-quality image corresponding to the low-quality image can be obtained.
- the loss is constructed to update the second network, so that the second network has the ability to predict the information of the high-frequency channels of the high-quality image corresponding to the low-quality image based on the frequency domain data corresponding to the low-quality image.
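The training procedure just described (predict the clean high-frequency channels from the low-quality spectrum, construct a loss against the real channels, update the second network) reduces to an ordinary supervised loop; a toy PyTorch sketch with an assumed stand-in network, an assumed L1 loss, and random data:

```python
import torch
import torch.nn as nn

# stand-in for the high-frequency prediction network (architecture assumed)
model = nn.Sequential(nn.Conv2d(48, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 45, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()                   # the exact loss is not specified by the text

xd = torch.randn(2, 48, 16, 16)         # wavelet spectra of low-quality images
hf_true = torch.randn(2, 45, 16, 16)    # real high-frequency channels of the clean images
for _ in range(2):                      # a couple of illustrative update steps
    loss = loss_fn(model(xd), hf_true)  # compare prediction with the real channels
    opt.zero_grad(); loss.backward(); opt.step()
```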
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to determine a first loss together with second noise information, and the second noise information is randomly generated noise.
- the first noise information may be obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; noise may also be superimposed on the first low-frequency information to obtain third low-frequency information; at the (i+1)-th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine a second loss together with fourth noise information; the fourth noise information is randomly generated noise; and the first network updated according to the first loss is further updated according to the second loss.
- the low-frequency spectrum of the high-quality image x0 after wavelet transformation is first added with different degrees of Gaussian white noise at different times t, and then sent to the noise estimation network.
- the noise estimation network is a classic U-network structure, and its purpose is to correctly estimate the noise superimposed on the low-frequency spectrum of the high-quality image x0 at each time.
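Adding time-dependent Gaussian noise to the clean low-frequency spectrum follows the standard forward diffusion rule x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε; a sketch with an assumed linear noise schedule (the embodiment's actual schedule is not specified):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)

x0_low = rng.standard_normal((3, 64, 64))   # low-frequency spectrum of the clean image
t = int(rng.integers(0, T))                 # random training timestep
eps = rng.standard_normal(x0_low.shape)     # the noise the U-Net must estimate
x_t = np.sqrt(alphas_bar[t]) * x0_low + np.sqrt(1.0 - alphas_bar[t]) * eps
# training loss (not computed here): ||U-Net(x_t, t, conditions) - eps||^2
```

Larger t gives a larger 1 − ᾱ_t, i.e. "different degrees of Gaussian white noise at different times t" as described above.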
- the image is converted to the frequency domain for image restoration, which can avoid the use of image segmentation (the segments need to be processed separately and then merged, which may cause boundary artifacts, and when the image size is large, the number of segments is too large, resulting in a long processing time), thereby improving the restoration quality and reducing the processing time.
- the image restoration based on the noise has a higher image quality (it can restore more details and significantly reduce the total sampling time).
- Tables 1 to 4 respectively show the performance of the embodiment of the present application (WaveDM) and existing methods on the image raindrop removal dataset (RainDrop), the defocus blur removal dataset (DPDD), the demoiréing dataset (London's Buildings), and the dark-light enhancement dataset (LOL-v1).
- the evaluation indicators are PSNR, SSIM, and recovery time (Time). It can be seen from the tables that the present application achieves the best results on both quality indicators, while its speed is comparable.
- FIG. 12B shows an architecture diagram of an embodiment of the present application, including a training framework and a sampling framework, which are mainly composed of wavelet transform and spectrum separation, a high-frequency fine-tuning module, a noise estimation network, a high-efficiency sampling algorithm, and an inverse wavelet transform.
- the functions of each part are described as follows:
- Wavelet transform: uses a specific wavelet to transform the image from the spatial domain to the wavelet domain to obtain the wavelet spectrum of the image;
- High-frequency fine-tuning module: restores the high-frequency spectrum corresponding to the clear image from the high-frequency spectrum of the low-quality image;
- Noise estimation network: using the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation as conditions, iteratively restores the low-frequency spectrum of the high-quality image from Gaussian white noise;
- High-efficiency conditional sampling algorithm: using the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation as conditions, directly predicts the high-quality image at an intermediate sampling step, thereby reducing the number of sampling steps;
- Inverse wavelet transform: combines the low-frequency spectrum of the high-quality image output by the noise estimation network with the high-frequency spectrum of the high-quality image output by the high-frequency fine-tuning module, and performs a specific inverse wavelet transform to obtain a clear spatial-domain RGB high-quality image.
- an image processing device 1300 provided by an embodiment of the present application includes:
- An acquisition module 1301 is used to acquire a first image
- the specific description of the acquisition module 1301 can refer to the description of step 901 in the above embodiment, which will not be repeated here.
- a processing module 1302 is used to convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- the first low-frequency information includes noise of a low-frequency channel of the high-quality image
- first noise information is obtained through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module 1302 can refer to the description of step 902 to step 905 in the above embodiment, which will not be repeated here.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; and the processing module is further used to:
- second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image, including:
- the third low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module is further configured to:
- the second low-frequency information is subjected to target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item;
- the processing module is specifically used for:
- the target low-frequency information and the first high-frequency information are used to fuse to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- the first low-frequency information is randomly generated noise.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network.
- the processing module is specifically configured to:
- the first image is converted into the frequency domain by second-order wavelet transform.
- the embodiment of the present application further provides a model training device (which may correspond to the model training method of FIG. 10 ), the device comprising:
- An acquisition module used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;
- a processing module configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- first noise information is obtained through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to determine a first loss together with second noise information; the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:
- third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;
- the first network updated according to the first loss is further updated according to the second loss.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- the processing module is specifically configured to:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- FIG 14 is a structural schematic diagram of an execution device provided in an embodiment of the present application.
- the execution device 1400 can be specifically expressed as a mobile phone, a tablet, a laptop computer, an intelligent wearable device, a server, etc., which is not limited here.
- the execution device 1400 implements the function of the image processing method in the corresponding embodiment of FIG. 9 .
- the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403 and a memory 1404 (wherein the number of processors 1403 in the execution device 1400 can be one or more), wherein the processor 1403 may include an application processor 14031 and a communication processor 14032.
- the receiver 1401, the transmitter 1402, the processor 1403 and the memory 1404 may be connected via a bus or other means.
- the memory 1404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include a non-volatile random access memory (NVRAM). The memory 1404 stores operating instructions, executable modules or data structures, or subsets thereof, or extended sets thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
- the processor 1403 controls the operation of the execution device.
- the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc.
- for the sake of clarity, the various buses are all referred to as the bus system in the figure.
- the method disclosed in the above embodiment of the present application can be applied to the processor 1403, or implemented by the processor 1403.
- the processor 1403 can be an integrated circuit chip with signal processing capabilities.
- each step of the above method can be completed by the hardware integrated logic circuit or software instructions in the processor 1403.
- the above processor 1403 can be a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, and a vision processor (vision processing unit, VPU), a tensor processor (tensor processing unit, TPU) and other processors suitable for AI computing, and can further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field-programmable gate array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
- the processor 1403 can implement or execute the disclosed methods, steps and logic block diagrams in the embodiments of the present application.
- the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
- the steps of the method disclosed in the embodiment of the present application can be directly embodied as being executed by a hardware decoding processor, or being executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the art such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404, and completes the steps 901 to 905 in the above embodiment in combination with its hardware.
- the receiver 1401 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device.
- the transmitter 1402 can be used to output digital or character information through the first interface; the transmitter 1402 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1402 can also include a display device such as a display screen.
- FIG. 15 is a structural diagram of the training device provided by the embodiment of the present application, specifically, the training device 1500 is implemented by one or more servers, and the training device 1500 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1515 (for example, one or more processors) and memory 1532, one or more storage media 1530 (for example, one or more mass storage devices) storing application programs 1542 or data 1544.
- the memory 1532 and the storage medium 1530 can be short-term storage or permanent storage.
- the program stored in the storage medium 1530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device.
- the central processor 1515 can be configured to communicate with the storage medium 1530 to execute a series of instruction operations in the storage medium 1530 on the training device 1500.
- the training device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input and output interfaces 1558; or, one or more operating systems 1541, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
- the embodiment of the present application also provides a computer program product including computer-readable instructions, which, when executed on a computer, enables the computer to execute the steps executed by the aforementioned execution device, or enables the computer to execute the steps executed by the aforementioned training device.
- a computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing.
- when the computer-readable storage medium is run on a computer, it enables the computer to execute the steps executed by the aforementioned execution device, or enables the computer to execute the steps executed by the aforementioned training device.
- the execution device, training device or terminal device provided in the embodiments of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc.
- the processing unit may execute the computer execution instructions stored in the storage unit, so that the chip in the execution device executes the model training method described in the above embodiment, or so that the chip in the training device executes the steps related to model training in the above embodiment.
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
- FIG. 16 is a schematic diagram of a structure of a chip provided in an embodiment of the present application.
- the chip may be embodied as a neural network processor NPU 1600. The NPU 1600 is mounted on the host CPU (Host CPU) as a coprocessor, and the host CPU assigns tasks.
- the core part of the NPU is the operation circuit 1603; the controller 1604 controls the operation circuit 1603 to extract matrix data from the memory and perform multiplication operations.
- the operation circuit 1603 includes multiple processing units (Process Engine, PE) inside.
- the operation circuit 1603 is a two-dimensional systolic array.
- the operation circuit 1603 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
- the operation circuit 1603 is a general-purpose matrix processor.
- the operation circuit takes the corresponding data of matrix B from the weight memory 1602 and caches it on each PE in the operation circuit.
- the operation circuit takes the matrix A data from the input memory 1601 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1608.
- the unified memory 1606 is used to store input data and output data.
- the weight data is directly transferred to the weight memory 1602 through the direct memory access controller (DMAC) 1605.
- the input data is also transferred to the unified memory 1606 through the DMAC.
- BIU stands for Bus Interface Unit, that is, the bus interface unit 1610, which is used for the interaction between AXI bus and DMAC and instruction fetch buffer (IFB) 1609.
- the bus interface unit 1610 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1609 to obtain instructions from the external memory, and is also used for the storage unit access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606 or to transfer weight data to the weight memory 1602 or to transfer input data to the input memory 1601.
- the vector calculation unit 1607 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.
- the vector calculation unit 1607 can store the processed output vector to the unified memory 1606.
- the vector calculation unit 1607 can apply a linear function or a nonlinear function to the output of the operation circuit 1603, for example, performing linear interpolation on the feature planes extracted by the convolution layers, or, for example, accumulating vectors of values to generate activation values.
- the vector calculation unit 1607 generates a normalized value, a pixel-level summed value, or both.
- the processed output vector can be used as an activation input to the operation circuit 1603, for example, for use in a subsequent layer in a neural network.
- An instruction fetch buffer 1609 connected to the controller 1604 is used to store instructions used by the controller 1604;
- The unified memory 1606, the input memory 1601, the weight memory 1602 and the instruction fetch memory 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.
- The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above methods.
- The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- The technical solution of the present application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
- all or part of the embodiments may be implemented by software, hardware, firmware, or any combination thereof.
- the present invention may be fully or partially implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from a website site, a computer, a training device, or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, training device, or data center.
- The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a training device or a data center that integrates one or more available media.
- The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
Abstract
An image processing method, which can be applied to the field of artificial intelligence. The method comprises: acquiring a first image; converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image; determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, the first low-frequency information comprising noise of a low-frequency channel of the high-quality image; and according to the first high-frequency information and the first low-frequency information, obtaining first noise information by means of a first network, the first noise information being used for denoising the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information being used for obtaining a second image. The present application can improve image restoration quality, and reduce processing time.
Description
This application claims priority to Chinese patent application No. 202310233422.2, entitled "An image processing method and related device", filed with the State Intellectual Property Office on February 28, 2023, the entire contents of which are incorporated herein by reference.
The present application relates to the field of artificial intelligence, and in particular to an image processing method and a related device.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
In real life, due to the influence of the environment and photographic technology, captured pictures often contain degradation components such as raindrops, blur, noise, and moiré, which reduce their quality. Image restoration refers to removing the degradation components from these low-quality pictures by technical means, thereby recovering clear, high-quality pictures.
At present, there are already many methods in the industry for image restoration targeting single or multiple tasks. Early methods were mostly traditional methods based on statistical priors. Due to their limitations, these traditional methods cannot remove degradation components well and may produce color artifacts. In recent years, the academic community has proposed methods based on deep learning. Most of these methods use a CNN or a Transformer to directly predict the corresponding clear image from the blurred image through end-to-end training, which often requires a large amount of training data.
However, the image restoration techniques in the prior art have low processing accuracy.
Summary of the invention
The present application provides an image processing method, which can improve image restoration quality and reduce processing time.
In a first aspect, an embodiment of the present application provides an image processing method, the method comprising: acquiring a first image; converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image; determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, the first low-frequency information containing noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information being used to obtain a second image.
In the embodiments of the present application, the image is converted to the frequency domain for restoration, which avoids image tiling (tiles must be processed separately and then merged, which may cause boundary artifacts; and when the image is large, the number of tiles is excessive, resulting in long processing time), thereby improving restoration quality and reducing processing time. In addition, the noise is predicted from the high-frequency information and the noise-containing low-frequency information; image restoration based on this predicted noise yields higher image quality (more details can be recovered, while the total sampling time is significantly reduced).
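The inference flow described above can be illustrated end to end. The sketch below is minimal and self-contained under stated assumptions: `haar2` is a single-level Haar transform standing in for the frequency-domain conversion, `predict_high_freq` and `predict_noise` are hypothetical placeholders for the learned second and first networks, and the denoising update is a simplified iterative step rather than the exact sampling schedule of the method.

```python
import numpy as np

def haar2(x):
    """Single-level 2-D Haar transform: low-frequency band (LL) plus the
    stacked high-frequency bands (LH, HL, HH), each at half resolution."""
    a, b, c, d = x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    hf = np.stack([(a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2])
    return ll, hf

def ihaar2(ll, hf):
    """Inverse of haar2: fuse the bands and map back to the spatial domain."""
    lh, hl, hh = hf
    x = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def restore(first_image, predict_high_freq, predict_noise, steps=5):
    first_data = haar2(first_image)              # first data: frequency domain, lower resolution
    hf = predict_high_freq(first_data)           # first high-frequency information (prediction)
    lf = np.random.randn(*first_data[0].shape)   # first low-frequency information: pure noise
    for t in range(steps, 0, -1):                # iterative denoising
        noise = predict_noise(hf, first_data, lf, t)
        lf = lf - noise / steps                  # simplified update; the real schedule differs
    return ihaar2(lf, hf)                        # fuse and map back to the spatial domain
```

With stub networks (e.g., a `predict_noise` that pulls the low-frequency band toward the degraded image's LL band), `restore` already produces an output of the same size as the input; in the actual method both networks are trained models.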
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information includes: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. The method further includes: at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information. The first low-frequency information and the first high-frequency information being used to obtain the second image includes: the third low-frequency information and the first high-frequency information being used to obtain the second image.
In a possible implementation, the method further includes: passing the second low-frequency information through a target mapping to obtain target low-frequency information, the target mapping not including a noise estimation term. The second low-frequency information and the first high-frequency information being used to obtain the second image includes: the target low-frequency information and the first high-frequency information being used for fusion to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the second low-frequency information can be passed through a target mapping to obtain target low-frequency information, where the target mapping does not include a noise estimation term; the target low-frequency information and the first high-frequency information are used for fusion (for example, concatenation) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
In this way, the total number of sampling steps can be greatly reduced (for example, to about one fifth of the original), thereby improving sampling efficiency.
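A noise-free target mapping of this kind can be illustrated with a deterministic update that jumps from a partially denoised sample directly to the clean estimate, with the stochastic noise term dropped. The sketch below uses a generic linear beta schedule purely for illustration; the actual schedule and mapping used by the method are not specified here.

```python
import numpy as np

def make_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an illustrative linear schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def target_mapping(lf_t, eps_pred, alpha_bar_t):
    """Map a noisy low-frequency band and the predicted noise directly to the
    clean estimate; no noise (stochastic) term is added back, so the final
    sampling step becomes deterministic."""
    return (lf_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

If the noise prediction is exact, this mapping recovers the clean band in a single step, which is why dropping the noise term can cut the number of sampling steps.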
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the determining of the first high-frequency information according to the first data includes: determining the first high-frequency information through a second network according to the first data.
In a possible implementation, the converting of the first image to the frequency domain includes: converting the first image to the frequency domain through a second-order wavelet transform.
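A second-order (two-level) wavelet transform reduces the spatial resolution by a factor of four while preserving all of the image's information in additional channels. A minimal Haar version is sketched below; Haar is an illustrative choice, and the description does not mandate a specific wavelet basis.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar transform: four half-resolution sub-bands."""
    a, b, c, d = x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]
    return [(a + b + c + d) / 2,   # LL: low-frequency approximation
            (a - b + c - d) / 2,   # LH: horizontal detail
            (a + b - c - d) / 2,   # HL: vertical detail
            (a - b - c + d) / 2]   # HH: diagonal detail

def second_order_dwt2(x):
    """Apply the Haar transform twice (here to every sub-band): an H x W image
    becomes 16 sub-bands of size H/4 x W/4, with no information lost."""
    return [sb2 for sb in haar_dwt2(x) for sb2 in haar_dwt2(sb)]
```

Because the transform is orthonormal, the total signal energy is preserved across the 16 sub-bands, and the lower-resolution representation can be processed without tiling the original image.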
In a second aspect, the present application provides a model training method, the method comprising:
acquiring a first image and a second image, the first image and the second image being collected for the same scene, and the second image being a high-quality image corresponding to the first image;
converting the first image and the second image to a frequency domain to obtain first data and second data respectively, the spatial resolution of the first data being lower than that of the first image, the spatial resolution of the second data being lower than that of the second image, the second data including first low-frequency information, and the first low-frequency information being information of a low-frequency channel in the second data;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to determine a first loss together with second noise information, the second noise information being randomly generated noise; and
updating the first network according to the first loss.
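One training step of the kind described above can be sketched as follows. This is an illustration under assumptions: `predict_noise` is a placeholder for the learned first network, the forward noising uses a generic diffusion-style schedule, and the loss is taken as a mean squared error between the predicted noise and the randomly generated noise, which is one common choice rather than a detail stated in the text.

```python
import numpy as np

def training_step(lf_clean, hf_pred, predict_noise, alpha_bar, rng):
    """One sketched training step: noise the clean low-frequency band of the
    high-quality image, let the first network predict that noise conditioned
    on the predicted high-frequency information, and score the prediction
    against the randomly generated noise (the 'second noise information')."""
    t = rng.integers(len(alpha_bar))              # random noising step
    eps = rng.standard_normal(lf_clean.shape)     # second noise information
    ab = alpha_bar[t]
    lf_noisy = np.sqrt(ab) * lf_clean + np.sqrt(1 - ab) * eps  # noised low-frequency band
    eps_pred = predict_noise(hf_pred, lf_noisy, t)             # first noise information
    loss = np.mean((eps_pred - eps) ** 2)         # first loss (MSE, illustrative)
    return loss
```

The returned loss would then drive a gradient update of the first network; the high-frequency prediction network can be trained or pre-trained separately, as the description notes.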
In the embodiments of the present application, the image is converted to the frequency domain for restoration, which avoids image tiling (tiles must be processed separately and then merged, which may cause boundary artifacts; and when the image is large, the number of tiles is excessive, resulting in long processing time), thereby improving restoration quality and reducing processing time. In addition, the noise is predicted from the high-frequency information and the noise-containing low-frequency information; image restoration based on this predicted noise yields higher image quality (more details can be recovered, while the total sampling time is significantly reduced).
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information includes:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer. The method further includes:
superimposing noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used to determine a second loss together with fourth noise information, the fourth noise information being randomly generated noise; and
updating the updated first network according to the second loss.
In a possible implementation, the determining of the first high-frequency information according to the first data includes:
determining the first high-frequency information through a second network according to the first data, the second network being a pre-trained network.
In a possible implementation, the converting of the first image and the second image to the frequency domain includes:
converting the first image and the second image to the frequency domain through a second-order wavelet transform.
In a third aspect, the present application provides an image processing device, the device comprising:
an acquisition module, used for acquiring a first image; and
a processing module, used for converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
acquiring first low-frequency information, the first low-frequency information containing noise of a low-frequency channel of the high-quality image; and
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to denoise the first low-frequency information to obtain second low-frequency information;
wherein the second low-frequency information and the first high-frequency information are used to obtain a second image.
In a possible implementation, the processing module is specifically used for:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; the processing module is further used for:
at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information;
wherein the first low-frequency information and the first high-frequency information being used to obtain the second image includes:
the third low-frequency information and the first high-frequency information being used to obtain the second image.
In a possible implementation, the processing module is further used for:
passing the second low-frequency information through a target mapping to obtain target low-frequency information, the target mapping not including a noise estimation term;
the processing module being specifically used for:
fusing the target low-frequency information and the first high-frequency information to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the processing module is specifically used for:
determining the first high-frequency information through a second network according to the first data.
In a possible implementation, the processing module is specifically used for:
converting the first image to the frequency domain through a second-order wavelet transform.
In a fourth aspect, the present application provides a model training device, the device comprising:
an acquisition module, used for acquiring a first image and a second image, the first image and the second image being collected for the same scene, and the second image being a high-quality image corresponding to the first image; and
a processing module, used for converting the first image and the second image to a frequency domain to obtain first data and second data respectively, the spatial resolution of the first data being lower than that of the first image, the spatial resolution of the second data being lower than that of the second image, the second data including first low-frequency information, and the first low-frequency information being information of a low-frequency channel in the second data;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to determine a first loss together with second noise information, the second noise information being randomly generated noise; and
updating the first network according to the first loss.
In a possible implementation, the processing module is specifically used for:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the processing module is further used for:
superimposing noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used to determine a second loss together with fourth noise information, the fourth noise information being randomly generated noise; and
updating the updated first network according to the second loss.
In a possible implementation, the processing module is specifically used for:
determining the first high-frequency information through a second network according to the first data, the second network being a pre-trained network.
In a possible implementation, the processing module is specifically used for:
converting the first image and the second image to the frequency domain through a second-order wavelet transform.
In a fifth aspect, an embodiment of the present application provides an image processing device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory to perform any optional method of the first aspect above.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is run on a computer, the computer is caused to execute the first aspect above and any optional method thereof.
In a seventh aspect, an embodiment of the present application provides a computer program product, including code, which, when executed, is used to implement the first aspect above and any optional method thereof.
In an eighth aspect, the present application provides a chip system, the chip system including a processor for supporting an image processing device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods. In a possible design, the chip system further includes a memory, the memory being used to store program instructions and data necessary for the execution device or the training device. The chip system may be composed of a chip, or may include a chip and other discrete devices.
FIG. 1A is a schematic diagram of a structure of the artificial intelligence main framework;
FIG. 1B and FIG. 2 are schematic diagrams of the application system framework of the present invention;
FIG. 3 is a schematic diagram of an optional hardware structure of a terminal;
FIG. 4 is a schematic diagram of the structure of a server;
FIG. 5 is a schematic diagram of a system architecture of the present application;
FIG. 6 is a flow of a cloud service;
FIG. 7 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
FIG. 8 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
FIG. 9 is a schematic flow diagram of an image processing method;
FIG. 10 is a schematic diagram of an image processing method;
FIG. 11 is a schematic diagram of an image processing method;
FIG. 12A is a schematic diagram of a beneficial effect;
FIG. 12B is a schematic diagram of an architecture;
FIG. 13 is a schematic diagram of the structure of an image processing device provided in an embodiment of the present application;
FIG. 14 is a schematic diagram of an execution device provided in an embodiment of the present application;
FIG. 15 is a schematic diagram of a training device provided in an embodiment of the present application;
FIG. 16 is a schematic diagram of a chip provided in an embodiment of the present application.
The following describes the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the implementation part of the present invention are only used to explain specific embodiments of the present invention and are not intended to limit the present invention.
The embodiments of the present application are described below with reference to the accompanying drawings. A person of ordinary skill in the art knows that, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner adopted in the embodiments of the present application to distinguish objects with the same attributes when describing them. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units that are not clearly listed or that are inherent to the process, method, product, or device.
First, the overall workflow of the artificial intelligence system is described. Refer to FIG. 1A, which shows a schematic diagram of a structure of the artificial intelligence main framework. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementation) of artificial intelligence to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. The infrastructure communicates with the outside world through sensors; its computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes related platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to smart chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet-of-Things data of conventional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning may perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and performing machine thinking and problem solving by using formalized information according to a reasoning control strategy. Its typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning has been performed on intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data has undergone the data processing mentioned above, some general capabilities may be further formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Smart products and industry applications
Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include smart terminals, smart transportation, smart healthcare, autonomous driving, smart cities, and the like.
The embodiments of the present application can be applied to tasks related to image processing, for example, in fields such as image enhancement.
The application scenarios of the present application are first introduced. The present application can be applied to, but is not limited to, applications with image processing functions, cloud services provided by cloud-side servers, and the like, which are introduced separately below:
1. Image processing applications
The product form of the embodiments of the present application may be an image processing application. The image processing application may run on a terminal device or on a cloud-side server.
In a possible implementation, the image processing application may implement image processing tasks, or tasks based on image processing results.
In a possible implementation, a user may open an application with image processing functions installed on a terminal device. The application may obtain image data captured by a camera or image data specified by the user, obtain a processing result based on the input data by using the method provided in the embodiments of the present application, and present the image processing result, or the result of a downstream task implemented based on the image processing result, to the user (the presentation manner may be, but is not limited to, displaying, saving, or uploading to the cloud side).
In a possible implementation, a user may open an image processing application installed on a terminal device. The application may obtain image data captured by a camera or image data specified by the user, and may send the data (or a result obtained after certain processing is performed on the data) to a cloud-side server. The cloud-side server generates an image processing result based on the image by using the method provided in the embodiments of the present application, and transmits the image processing result, or the result of a downstream task implemented based on the image processing result, back to the terminal device. The terminal device may then present the result to the user (the presentation manner may be, but is not limited to, displaying, saving, or uploading to the cloud side).
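The terminal-to-cloud round trip described above can be sketched as follows. This is a minimal illustration, not the method of the application: `cloud_process` is a hypothetical stand-in for the cloud-side server, the byte-wise brightening stands in for a real image processing model, and base64 is used only as a transport encoding for the image bytes.

```python
import base64

def cloud_process(request: dict) -> dict:
    """Hypothetical stand-in for the cloud-side server: decode the image
    payload, apply a toy 'enhancement', and return the result."""
    pixels = base64.b64decode(request["image"])
    enhanced = bytes(min(p + 10, 255) for p in pixels)  # placeholder for the model
    return {"status": "ok", "image": base64.b64encode(enhanced).decode()}

def terminal_submit(image_bytes: bytes) -> bytes:
    """Terminal side: package the camera data, 'send' it, and unpack the reply."""
    request = {"image": base64.b64encode(image_bytes).decode()}
    reply = cloud_process(request)  # would travel over the network in practice
    if reply["status"] != "ok":
        raise RuntimeError("cloud-side processing failed")
    return base64.b64decode(reply["image"])

result = terminal_submit(bytes([0, 100, 250]))
print(list(result))  # [10, 110, 255]
```

The terminal could equally present, save, or upload the returned bytes, matching the presentation manners listed above.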
Next, the image processing application in the embodiments of the present application is introduced from the perspectives of its functional architecture and the product architecture that implements its functions.
Referring to FIG. 1B, FIG. 1B is a schematic diagram of the functional architecture of an image processing application in an embodiment of the present application.
In a possible implementation, as shown in FIG. 1B, an image processing application 102 may receive input data 101 (for example, image and event data) and generate a processing result 103. The image processing application 102 may be executed on, for example, at least one computer system, and includes computer code that, when executed by one or more computers, causes the computers to perform the image processing method described herein.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the physical architecture for running an image processing application in an embodiment of the present application.
FIG. 2 shows a schematic diagram of a system architecture. The system may include a terminal 100 and a server 200, where the server 200 may include one or more servers (FIG. 2 uses one server as an example for illustration). The server 200 may provide image processing services for one or more terminals, or perform downstream tasks based on image processing results.
An image processing application may be installed on the terminal 100, or a web page related to image processing, or to downstream tasks based on image processing results, may be opened on the terminal 100. The application or web page may provide an interface; the terminal 100 may receive relevant parameters entered by the user on the interface and send the parameters to the server 200, and the server 200 may obtain a processing result based on the received parameters and return the processing result to the terminal 100.
It should be understood that, in some optional implementations, the terminal 100 may also obtain the data processing result based on the received parameters by itself, without requiring cooperation from the server; the embodiments of the present application are not limited in this respect.
Next, the product form of the terminal 100 in FIG. 2 is described.
The terminal 100 in the embodiments of the present application may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like; the embodiments of the present application impose no limitation in this respect.
FIG. 3 shows a schematic diagram of an optional hardware structure of the terminal 100.
Referring to FIG. 3, the terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190. A person skilled in the art will appreciate that FIG. 3 is merely an example of a terminal or multi-function device and does not constitute a limitation on the terminal or multi-function device, which may include more or fewer components than shown, combine certain components, or use different components.
The input unit 130 may be used to receive input digital or character information and to generate key signal inputs related to user settings and function control of the portable multi-function device. Specifically, the input unit 130 may include a touch screen 131 (optional) and/or other input devices 132. The touch screen 131 may collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch screen using a finger, a knuckle, a stylus, or any other suitable object) and drive the corresponding connection apparatus according to a preset program. The touch screen may detect the user's touch action on the touch screen, convert the touch action into a touch signal, and send it to the processor 170, and can also receive and execute commands sent by the processor 170; the touch signal includes at least touch point coordinate information. The touch screen 131 may provide an input interface and an output interface between the terminal 100 and the user. In addition, the touch screen may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch screen 131, the input unit 130 may also include other input devices. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The other input devices 132 may obtain image data captured by a camera, image data specified by a user, and the like.
The display unit 140 may be used to display information entered by the user or information provided to the user, various menus of the terminal 100, interactive interfaces, file display, and/or playback of any multimedia file. In the embodiments of the present application, the display unit 140 may be used to display the interface of an application related to image processing, and the like.
The memory 120 may be used to store instructions and data. The memory 120 may mainly include an instruction storage area and a data storage area: the data storage area may store various data such as multimedia files and text, and the instruction storage area may store software units such as an operating system, applications, and the instructions required for at least one function, or subsets and extended sets thereof. The memory 120 may also include a non-volatile random access memory, and provides the processor 170 with management of the hardware, software, and data resources in the computing and processing device, supporting control software and applications. It is also used for the storage of multimedia files and for the storage of running programs and applications.
The processor 170 is the control center of the terminal 100. It connects the various parts of the entire terminal 100 through various interfaces and lines, and performs various functions of the terminal 100 and processes data by running or executing instructions stored in the memory 120 and calling data stored in the memory 120, thereby controlling the terminal device as a whole. Optionally, the processor 170 may include one or more processing units. Preferably, the processor 170 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 170. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented separately on independent chips. The processor 170 may also be used to generate corresponding operation control signals, send them to the corresponding components of the computing and processing device, and read and process data in software, in particular the data and programs in the memory 120, so that the functional modules therein perform their corresponding functions, thereby controlling the corresponding components to act as required by the instructions.
The memory 120 may be used to store software code related to the image processing method, and the processor 170 may execute the steps of the image processing method, and may also schedule other units (for example, the input unit 130 and the display unit 140) to implement corresponding functions.
The radio frequency unit 110 (optional) may be used to receive and send information, or to receive and send signals during a call; for example, after receiving downlink information from a base station, it delivers the information to the processor 170 for processing, and it also sends uplink data to the base station. Generally, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the radio frequency unit 110 may also communicate with network devices and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
In the embodiments of the present application, the radio frequency unit 110 may send image data to the server 200 and receive information about the processing result sent by the server 200.
It should be understood that the radio frequency unit 110 is optional and may be replaced with another communication interface, for example, a network port.
The terminal 100 further includes a power supply 190 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 170 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
The terminal 100 further includes an external interface 180, which may be a standard Micro USB interface or a multi-pin connector. It may be used to connect the terminal 100 to other apparatuses for communication, and may also be used to connect a charger to charge the terminal 100.
Although not shown, the terminal 100 may further include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, sensors with different functions, and the like, which are not described in detail here. Some or all of the methods described below may be applied to the terminal 100 shown in FIG. 3.
Next, the product form of the server 200 in FIG. 2 is described.
FIG. 4 provides a schematic structural diagram of a server 200. As shown in FIG. 4, the server 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. The processor 202, the memory 204, and the communication interface 203 communicate with each other via the bus 201.
The bus 201 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The processor 202 may be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 204 may include volatile memory, for example, random access memory (RAM). The memory 204 may also include non-volatile memory, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 204 may be used to store software code related to the image processing method, and the processor 202 may execute the steps of the image processing method, and may also schedule other units to implement corresponding functions.
It should be understood that the terminal 100 and the server 200 may be centralized or distributed devices, and the processors in the terminal 100 and the server 200 (for example, the processor 170 and the processor 202) may be hardware circuits (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, a processor may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
It should be understood that the steps related to the model inference process in the embodiments of the present application involve AI-related operations. When AI operations are performed, the instruction execution architecture of the terminal device and the server is not limited to the processor-plus-memory architecture described above. The system architecture provided in the embodiments of the present application is described in detail below with reference to FIG. 5.
Referring to FIG. 5, FIG. 5 is a system architecture diagram of a system provided by an embodiment of the present application. In FIG. 5, a task processing system 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560, and the execution device 510 includes a computing module 511. The data acquisition device 560 is used to obtain the open-source large-scale data set (i.e., the training set) required by the user and store the training set in the database 530; the training device 520 trains the target model/rule 501 based on the training set maintained in the database 530, and the trained neural network obtained through training is then used on the execution device 510. The execution device 510 may call data, code, and the like in the data storage system 550, and may also store data, instructions, and the like in the data storage system 550. The data storage system 550 may be placed in the execution device 510, or may be an external memory relative to the execution device 510.
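The train-then-deploy flow above — the training device 520 fits the target model/rule 501 from the training set in the database 530, and the trained result is then used on the execution device 510 — can be sketched with a deliberately trivial model. The data, the threshold rule, and both function names are illustrative assumptions, not the neural network of the present application:

```python
# Toy stand-in for the FIG. 5 pipeline: "training device" fits a brightness
# threshold from labeled samples; "execution device" loads the result for
# inference. A real system would train and deploy a neural network instead.
def training_device(training_set):
    """Fit the model: the midpoint between the two class means."""
    dark = [x for x, label in training_set if label == "dark"]
    bright = [x for x, label in training_set if label == "bright"]
    threshold = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2
    return {"threshold": threshold}  # the trained "target model/rule 501"

def execution_device(model, sample):
    """Run inference with the deployed model."""
    return "bright" if sample >= model["threshold"] else "dark"

database = [(10, "dark"), (30, "dark"), (200, "bright"), (220, "bright")]
model = training_device(database)        # training side
print(execution_device(model, 180))      # inference side: bright
```

The separation of `training_device` and `execution_device` mirrors why the trained model can run on an edge or end-side device distinct from where it was trained.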
The trained neural network obtained after the training device 520 trains the target model/rule 501 can be applied to different systems or devices (i.e., the execution device 510), specifically edge devices or end-side devices, for example, mobile phones, tablets, laptops, monitoring systems (e.g., cameras), security systems, and so on. In FIG. 5, the execution device 510 is configured with an I/O interface 512 for data interaction with external devices, and a "user" can input data to the I/O interface 512 through the client device 540. For example, the client device 540 may be a camera device of a monitoring system; the images and event data captured by the camera device are input as input data to the computing module 511 of the execution device 510, the computing module 511 processes the input target image to obtain a processing result, and then outputs the processing result to the camera device or displays it directly on the display interface (if any) of the execution device 510. In addition, in some embodiments of the present application, the client device 540 may also be integrated into the execution device 510; for example, when the execution device 510 is a mobile phone, the target task can be obtained directly through the mobile phone (for example, images and event data can be captured by the camera of the mobile phone) or received from another device (for example, another mobile phone), and the computing module 511 in the mobile phone then processes the target task to obtain a result and presents the result directly on the display interface of the mobile phone. The product forms of the execution device 510 and the client device 540 are not limited here.
It is worth noting that FIG. 5 is merely a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 5, the data storage system 550 is an external memory relative to the execution device 510; in other cases, the data storage system 550 may also be placed in the execution device 510. It should be understood that the execution device 510 may be deployed in the client device 540.
From the inference side of the model:
In the embodiments of the present application, the computing module 511 of the execution device 510 can obtain the code stored in the data storage system 550 to implement the steps related to the model inference process in the embodiments of the present application.
In the embodiments of the present application, the computing module 511 of the execution device 510 may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, the computing module 511 may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
Specifically, the computing module 511 of the execution device 510 may be a hardware system with an instruction execution function; the steps related to the model inference process provided in the embodiments of the present application may be software code stored in a memory, and the computing module 511 of the execution device 510 may obtain the software code from the memory and execute the obtained software code to implement the steps related to the model inference process provided in the embodiments of the present application.
It should be understood that the computing module 511 of the execution device 510 may be a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function; some of the steps related to the model inference process provided in the embodiments of the present application may also be implemented by the hardware system without an instruction execution function in the computing module 511 of the execution device 510, which is not limited here.
From the training side of the model:
In the embodiments of the present application, the training device 520 can obtain the code stored in a memory (not shown in FIG. 5; it may be integrated into the training device 520 or deployed separately from the training device 520) to implement the steps related to model training in the embodiments of the present application.
In the embodiments of the present application, the training device 520 may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, the training device 520 may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
应理解,训练设备520可以为不具有执行指令功能的硬件系统以及具有执行指令功能的硬件系统的组合,本申请实施例提供的中和模型训练相关的部分步骤还可以通过训练设备520中不具有执行指令功能的硬件系统来实现,这里并不限定。It should be understood that the training device 520 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model training provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the training device 520, which is not limited here.
二、服务器提供的图像处理类云服务:2. Image processing cloud services provided by the server:
在一种可能的实现中,服务器可以通过应用程序编程接口(application programming interface,API)为端侧提供图像处理或者基于图像处理结果进行下游任务的服务。In one possible implementation, the server can provide image processing services to the end side or perform downstream tasks based on the image processing results through an application programming interface (API).
其中,终端设备可以通过云端提供的API,将相关参数(例如图像数据)发送至服务器,服务器可以基于接收到的参数,得到处理结果,并将处理结果(例如增强后的图像数据)返回至终端。Specifically, the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud; the server can obtain a processing result based on the received parameters and return the processing result (such as enhanced image data) to the terminal.
关于终端以及服务器的描述可以参照上述实施例的描述,这里不再赘述。For descriptions of the terminal and the server, reference may be made to the descriptions in the above embodiments, and details are not repeated here.
图6示出了使用云平台提供的一项图像处理类云服务的流程。FIG. 6 shows a process of using an image processing cloud service provided by a cloud platform.
1.开通并购买内容审核服务。1. Activate and purchase content review service.
2.用户可以下载内容审核服务对应的软件开发工具包(software development kit,SDK),通常云平台提供多个开发版本的SDK,供用户根据开发环境的需求选择,例如JAVA版本的SDK、python版本的SDK、PHP版本的SDK、Android版本的SDK等。2. Users can download the software development kit (SDK) corresponding to the content review service. Usually, the cloud platform provides multiple development versions of the SDK for users to choose according to the requirements of the development environment, such as JAVA version SDK, Python version SDK, PHP version SDK, Android version SDK, etc.
3.用户根据需求下载对应版本的SDK到本地后,将SDK工程导入至本地开发环境,在本地开发环境中进行配置和调试,本地开发环境还可以进行其他功能的开发,使得形成一个集合了图像处理类能力的应用。3. After the user downloads the corresponding version of the SDK to the local computer according to the needs, the SDK project is imported into the local development environment, and configuration and debugging are performed in the local development environment. The local development environment can also be used to develop other functions, thus forming an application that integrates image processing capabilities.
4.图像处理类应用在被使用的过程中,当需要进行图像处理或者基于图像处理结果进行下游任务时,可以触发图像处理或者基于图像处理结果进行下游任务的API调用。当应用触发图像处理或者基于图像处理结果进行下游任务功能时,发起API请求至云环境中的图像处理类服务的运行实例,其中,API请求中携带图像,由云环境中的运行实例对图像进行处理,获得处理结果。4. When image processing applications are used, when image processing is required or downstream tasks are performed based on image processing results, API calls for image processing or downstream tasks based on image processing results can be triggered. When the application triggers image processing or performs downstream task functions based on image processing results, an API request is initiated to the running instance of the image processing service in the cloud environment, where the API request carries an image, and the running instance in the cloud environment processes the image to obtain the processing result.
5.云环境将处理结果返回至应用,由此完成一次的图像处理或者基于图像处理结果进行下游任务服务调用。5. The cloud environment returns the processing results to the application, thereby completing the image processing once or making a downstream task service call based on the image processing results.
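As a purely illustrative sketch of steps 4 and 5 above, an application might build an API request that carries an image to the cloud image-processing service. The endpoint URL and JSON field names below are invented placeholders, not a real cloud platform interface:

```python
# Hypothetical sketch of the API call in steps 4-5. The endpoint URL and the
# JSON field names are invented placeholders, not a real cloud API.
import json
import urllib.request

def build_image_request(image_bytes, endpoint="https://example.com/v1/image-process"):
    """Build an HTTP request that carries the image to the cloud service."""
    payload = json.dumps({"image": image_bytes.hex()}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_image_request(b"\x89PNG")
# Sending the request (urllib.request.urlopen(req)) would return the processing
# result, which the cloud environment sends back to the application in step 5.
```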
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
Since the embodiments of the present application involve the application of a large number of neural networks, in order to facilitate understanding, the relevant terms and related concepts such as neural networks involved in the embodiments of the present application are first introduced below.
(1)神经网络(1) Neural Network
神经网络可以是由神经单元组成的,神经单元可以是指以xs(即输入数据)和截距1为输入的运算单元,该运算单元的输出可以为:h_{W,b}(x) = f(W^T·x) = f(∑_{s=1}^{n} W_s·x_s + b)。A neural network may be composed of neural units, and a neural unit may refer to an operation unit that takes xs (i.e., input data) and intercept 1 as input; the output of the operation unit may be: h_{W,b}(x) = f(W^T·x) = f(∑_{s=1}^{n} W_s·x_s + b).
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Where s=1, 2, ...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into the output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field. The local receptive field can be an area composed of several neural units.
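As an illustrative sketch (not part of the application itself), the single neural unit described above can be written directly from the formula, here taking the activation function f to be the sigmoid function mentioned in the text:

```python
# Sketch of one neural unit: output = f(sum_s Ws*xs + b), f = sigmoid here.
import math

def neural_unit(xs, ws, b):
    """Weighted sum of inputs plus bias b, passed through a sigmoid activation."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs xs, their weights Ws, and a bias b:
y = neural_unit([1.0, 2.0], [0.5, -0.25], 0.1)  # sigmoid(0.1) ≈ 0.525
```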
(2)卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。(2) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor consisting of convolutional layers and subsampling layers, which can be regarded as a filter. A convolutional layer refers to a neuron layer in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some neurons in the adjacent layers. A convolutional layer usually contains several feature planes, each of which may be composed of some rectangularly arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Shared weights can be understood as meaning that the way features are extracted is independent of position. The convolution kernel can be initialized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, a direct benefit of shared weights is reducing the connections between the layers of the convolutional neural network, while also reducing the risk of overfitting.
CNN是一种非常常见的神经网络,下面结合图7重点对CNN的结构进行详细的介绍。如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。CNN is a very common neural network. The following is a detailed introduction to the structure of CNN in conjunction with Figure 7. As mentioned in the previous basic concept introduction, convolutional neural network is a deep neural network with a convolution structure and a deep learning architecture. A deep learning architecture refers to multiple levels of learning at different abstract levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
如图7所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及全连接层(fully connected layer)230。As shown in FIG. 7 , a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional layer/pooling layer 220 (wherein the pooling layer is optional), and a fully connected layer 230 .
卷积层/池化层220:Convolutional layer/pooling layer 220:
卷积层:Convolutional Layer:
如图7所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。As shown in FIG7 , the convolution layer/pooling layer 220 may include layers 221-226, for example: in one implementation, layer 221 is a convolution layer, layer 222 is a pooling layer, layer 223 is a convolution layer, layer 224 is a pooling layer, layer 225 is a convolution layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。The following will take the convolution layer 221 as an example to introduce the internal working principle of a convolution layer.
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将
提取到的多个尺寸相同的特征图合并形成卷积运算的输出。The convolution layer 221 may include a plurality of convolution operators, which are also called kernels. The convolution operator is equivalent to a filter that extracts specific information from the input image matrix in image processing. The convolution operator can be essentially a weight matrix, which is usually predefined. In the process of performing convolution operations on the image, the weight matrix is usually processed one pixel after another (or two pixels after two pixels... depending on the value of the step length stride) in the horizontal direction on the input image, thereby completing the work of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image. In the process of performing convolution operations, the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix will produce a convolution output with a single depth dimension, but in most cases, a single weight matrix is not used, but multiple weight matrices of the same size (row × column), that is, multiple isotype matrices, are applied. The output of each weight matrix is stacked to form the depth dimension of the convolution image, and the dimension here can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to blur unnecessary noise in the image. The multiple weight matrices have the same size (rows × columns), and the feature maps extracted by the multiple weight matrices of the same size are also the same size. 
Multiple extracted feature maps of the same size are merged to form the output of the convolution operation.
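The sliding-window convolution described above can be sketched minimally as follows (a toy single-kernel, stride-1 example for illustration, not the patent's implementation):

```python
# Minimal sketch of sliding one weight matrix (kernel) over an image, stride 1.
def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Element-wise product of the kernel with the current window, summed.
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

# A 3x3 kernel applied to a 4x4 image yields a 2x2 feature map.
feat = conv2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
              [[0, 0, 0], [0, 1, 0], [0, 0, 0]])  # identity kernel picks the centre pixel
```

Stacking the outputs of several different kernels along the depth dimension then gives the multi-channel feature map the text describes.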
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。The weight values in these weight matrices need to be obtained through a lot of training in practical applications. The weight matrices formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, 221) often extracts more general features, which can also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.
池化层:Pooling layer:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图7中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。Since it is often necessary to reduce the number of training parameters, it is often necessary to periodically introduce a pooling layer after the convolution layer. In each layer 221-226 as shown in 220 in FIG. 7, a convolution layer may be followed by a pooling layer, or multiple convolution layers may be followed by one or more pooling layers. In the image processing process, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator to sample the input image to obtain an image of smaller size. The average pooling operator may calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value in the range within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolution layer should be related to the image size, the operator in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
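The average and max pooling operators described above can be sketched as follows (non-overlapping 2×2 windows assumed for illustration):

```python
# Sketch of pooling: downsample by taking the max or mean of each window.
def pool2d(image, size=2, mode="max"):
    out = []
    for i in range(0, len(image), size):
        row = []
        for j in range(0, len(image[0]), size):
            win = [image[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(max(win) if mode == "max" else sum(win) / len(win))
        out.append(row)
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled_max = pool2d(img)              # each output pixel = max of its 2x2 window
pooled_avg = pool2d(img, mode="avg")  # each output pixel = mean of its 2x2 window
```

Either way the output is half the width and height of the input, matching the text's point that pooling only reduces the spatial size of the image.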
全连接层230:Fully connected layer 230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用全连接层230来生成一个或者一组所需要的类的数量的输出。因此,在全连接层230中可以包括多层隐含层(如图7所示的231、232至23n),该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等……After being processed by the convolution layer/pooling layer 220, the convolution neural network 200 is not sufficient to output the required output information. Because as mentioned above, the convolution layer/pooling layer 220 will only extract features and reduce the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolution neural network 200 needs to use the fully connected layer 230 to generate one or a group of outputs of the required number of classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 7), and the parameters contained in the multiple hidden layers can be pre-trained according to the relevant training data of the specific task type. For example, the task type may include image recognition, image classification, image super-resolution reconstruction, etc.
在全连接层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图7由210至240方向的传播为前向传播)完成,反向传播(如图7由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。After the multiple hidden layers in the fully connected layer 230, that is, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (as shown in FIG. 7, the propagation from 210 to 240 is the forward propagation) is completed, the back propagation (as shown in FIG. 7, the propagation from 240 to 210 is the back propagation) will begin to update the weight values and biases of the aforementioned layers to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
需要说明的是,如图7所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,仅包括图7中所示的网络结构的一部分,比如,本申请实施例中所采用的卷积神经网络可以仅包括输入层210、卷积层/池化层220和输出层240。It should be noted that the convolutional neural network 200 shown in Figure 7 is only an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models, for example, including only a part of the network structure shown in Figure 7. For example, the convolutional neural network used in the embodiment of the present application may only include an input layer 210, a convolution layer/pooling layer 220 and an output layer 240.
需要说明的是,如图7所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,如图8所示的多个卷积层/池化层并行,将分别提取的特征均输入给全连接层230进行处理。It should be noted that the convolutional neural network 200 shown in FIG. 7 is only an example of a convolutional neural network. In specific applications, the convolutional neural network can also exist in the form of other network models, for example, multiple convolutional layers/pooling layers in parallel as shown in FIG. 8, with the separately extracted features all input to the fully connected layer 230 for processing.
(3)深度神经网络(3) Deep Neural Networks
深度神经网络(Deep Neural Network,DNN),也称多层神经网络,可以理解为具有很多层隐含层的神经网络,这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:y = α(Wx + b),其中,x是输入向量,y是输出向量,b是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多,则系数W和偏移向量b的数量也就很多了。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为W^3_{24},上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是:第L-1层的第k个神经元到第L层的第j个神经元的系数定义为W^L_{jk}。需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的过程也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Divided by the position of the layers, the neural network inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is actually not complicated; simply put, it is the following linear relationship expression: y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Since a DNN has many layers, there are correspondingly many coefficients W and offset vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}. Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
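The per-layer operation y = α(Wx + b) described above can be sketched as follows (sigmoid taken as α purely for illustration):

```python
# Sketch of one fully connected DNN layer: y = alpha(W*x + b), alpha = sigmoid.
import math

def dense_layer(x, W, b):
    # W[j][k] maps input neuron k to output neuron j, matching the W^L_{jk} indexing.
    z = [sum(Wj[k] * x[k] for k in range(len(x))) + bj for Wj, bj in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

y = dense_layer([1.0, -1.0], [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0])
```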
(4)损失函数(4) Loss Function
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。In the process of training a deep neural network, because we hope that the output of the deep neural network is as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight vector of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is high, adjust the weight vector to make it predict a lower value, and keep adjusting until the deep neural network can predict the target value we really want or a value very close to the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which are important equations used to measure the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the deep neural network becomes a process of minimizing this loss as much as possible.
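For example, one common choice of loss function measuring the gap between predicted and target values is the mean squared error (a generic example; the application does not fix a particular loss):

```python
# Mean squared error: a larger output (loss) means a bigger prediction-target gap.
def mse_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

loss = mse_loss([0.9, 0.2], [1.0, 0.0])  # (0.01 + 0.04) / 2 = 0.025
```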
(5)反向传播算法(5) Back propagation algorithm
可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始模型中参数的大小,使得模型的误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的模型参数,例如权重矩阵。The error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial model during the training process, so that the error loss of the model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, aiming to obtain the optimal model parameters, such as the weight matrix.
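The idea of updating parameters against the error gradient can be illustrated with a one-weight toy model (a sketch of gradient descent on a squared-error loss, not the full BP algorithm for a multi-layer network):

```python
# Toy gradient descent: the weight moves opposite the loss gradient each step.
def train_weight(w, x, target, lr=0.1, steps=100):
    for _ in range(steps):
        pred = w * x                     # forward pass
        grad = 2 * (pred - target) * x   # d(loss)/dw for squared error (pred-target)^2
        w -= lr * grad                   # update against the gradient
    return w

w = train_weight(0.0, x=2.0, target=6.0)  # converges toward w = 3, since 3*2 = 6
```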
(6)扩散模型:Diffusion Models,是指定义一个扩散步骤的马尔可夫链,逐渐向数据添加随机噪声,然后学习逆扩散过程,从噪声中构建所需的数据样本。(6) Diffusion Models: Diffusion Models refer to defining a Markov chain of diffusion steps, gradually adding random noise to the data, and then learning the inverse diffusion process to construct the required data samples from the noise.
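A hedged sketch of the forward (noising) half of the Markov chain described above; the reverse, learned denoising process is what a trained diffusion model provides. The step form x_t = sqrt(1-β_t)·x_{t-1} + sqrt(β_t)·ε is the standard one and is assumed here, not quoted from the application:

```python
# Forward diffusion sketch: each Markov step mixes the sample with Gaussian noise,
# x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps  (standard form, assumed).
import math
import random

def forward_diffuse(x, betas, seed=0):
    rng = random.Random(seed)
    for beta in betas:
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1) for v in x]
    return x

noisy = forward_diffuse([1.0, -1.0], [0.1] * 10)  # 10 noising steps
```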
(7)图像复原:Image Restoration,是指将由于各种因素导致的低质量图片中的退化成分去除,恢复出具有完整细节的高质量图片的过程。(7) Image restoration: Image Restoration refers to the process of removing the degraded components in low-quality images caused by various factors and restoring high-quality images with complete details.
本申请可以应用在图像增强与修复、终端应用、自动驾驶等实际场景中。This application can be applied in practical scenarios such as image enhancement and restoration, terminal applications, and autonomous driving.
例如,自动驾驶过程中往往会因阴雨天气而被遮挡前窗视线,具有极大的安全隐患,本申请可以有效地去除雨滴,恢复出清晰视线;For example, during autonomous driving, the front window view is often blocked due to rainy weather, which poses a great safety hazard. This application can effectively remove raindrops and restore a clear view;
例如,由于环境光的影响,现有设备拍摄出的照片往往会曝光不足,本申请可以显著将暗光照片增强到自然光水平,方便后续处理;For example, due to the influence of ambient light, photos taken by existing devices are often underexposed. This application can significantly enhance low-light photos to natural light levels, making subsequent processing easier.
例如,当直接用终端摄像头对屏幕进行拍照时,由于屏幕是实时动态刷新的,因此摄像头拍摄下来的照片会存在明显的摩尔纹,本申请可以有效去除摩尔纹,弥补终端设备的不足。For example, when taking a picture of the screen directly with the terminal camera, since the screen is refreshed in real time and dynamically, the pictures taken by the camera will have obvious moiré patterns. This application can effectively remove the moiré patterns and make up for the shortcomings of the terminal device.
在现实生活中,由于环境和摄影技术的影响,拍摄到的图片往往会包含退化成分如雨滴、模糊、噪声、摩尔纹等而导致其本身的质量降低。图像复原指的就是通过技术手段去除这些低质图片中的退化成分,从而恢复出清晰的高质量的图片。In real life, due to the influence of the environment and photography technology, the pictures taken often contain degradation components such as raindrops, blur, noise, moiré, etc., which reduce their quality. Image restoration refers to the use of technical means to remove the degradation components in these low-quality pictures, thereby restoring clear and high-quality pictures.
目前业界已经存在较多针对单个或多个任务进行图像恢复的方法。早期它们大多是基于统计先验的传统方法。但由于传统方法的局限性,这些方法不能很好地去除退化成分,而且可能会产生彩色伪影。近年来学术界提出了一些基于深度学习的方法,这些方法大多基于CNN或者Transformer直接通过端对端的训练方式从模糊图像中预测对应的清晰图像,往往需要大量训练数据。At present, there are many methods for image restoration in the industry for single or multiple tasks. In the early days, most of them were traditional methods based on statistical priors. However, due to the limitations of traditional methods, these methods cannot remove degradation components well and may produce color artifacts. In recent years, the academic community has proposed some methods based on deep learning. Most of these methods are based on CNN or Transformer to directly predict the corresponding clear image from the blurred image through end-to-end training, which often requires a large amount of training data.
然而现有技术中的图像复原技术的处理精度较低。However, the processing accuracy of the image restoration technology in the prior art is low.
为了解决上述问题,本申请提供了一种图像处理方法,该图像处理方法可以为模型训练的前馈过程,也可以为推理过程。In order to solve the above problems, the present application provides an image processing method, which can be a feedforward process of model training or an inference process.
参照图9,图9为本申请实施例提供的一种图像处理方法,如图9所示,本申请提供的图像处理方法,包括:Referring to FIG. 9 , FIG. 9 is an image processing method provided by an embodiment of the present application. As shown in FIG. 9 , the image processing method provided by the present application includes:
901、获取第一图像。901. Acquire a first image.
本申请实施例中,步骤901的执行主体可以为终端设备,终端设备可以为便携式移动设备,例如但不限于移动或便携式计算设备(如智能手机)、个人计算机、服务器计算机、手持式设备(例如平板)或膝上型设备、多处理器系统、游戏控制台或控制器、基于微处理器的系统、机顶盒、可编程消费电子产品、移动电话、具有可穿戴或配件形状因子(例如,手表、眼镜、头戴式耳机或耳塞)的移动计算和/或通信设备、网络PC、小型计算机、大型计算机、包括上面的系统或设备中的任何一种的分布式计算环境等等。In the embodiment of the present application, the execution entity of step 901 may be a terminal device, and the terminal device may be a portable mobile device, such as but not limited to a mobile or portable computing device (such as a smartphone), a personal computer, a server computer, a handheld device (such as a tablet) or laptop device, a multiprocessor system, a game console or controller, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a mobile phone, a mobile computing and/or communication device with a wearable or accessory form factor (e.g., a watch, glasses, headphones or earbuds), a network PC, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and the like.
本申请实施例中,步骤901的执行主体也可以为云侧的服务器,服务器可以接收来自终端设备发送的第一图像,进而服务器可以获取到第一图像。In an embodiment of the present application, the execution entity of step 901 may alternatively be a server on the cloud side; the server may receive the first image sent by the terminal device, and thereby obtain the first image.
在一种可能的实现中,第一图像可以为低质图像,第一图像可以为包括雨滴等自然环境遮挡的图像、或者是由于环境光的影响存在曝光不足的图像、存在明显的摩尔纹的图像。In a possible implementation, the first image may be a low-quality image, an image occluded by natural environment such as raindrops, or an image that is underexposed due to the influence of ambient light or an image with obvious moiré.
902、将所述第一图像转换到频域,得到第一数据;所述第一数据的空间分辨率低于所述第一图像。902. Convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image.
在一种可能的实现中,可以通过二阶小波变换将所述第一图像转换到频域。In a possible implementation, the first image may be converted into the frequency domain by a second-order wavelet transform.
如图11左半部分所示,可以将空间域RGB低质图像Xd进行二阶Haar小波变换,得到小波域内图像xd,可选的,图像尺寸由H×W×3变成(H/4)×(W/4)×48,使得空间分辨率下降了16倍,从而可以加快处理时间。As shown in the left half of Figure 11, the spatial-domain RGB low-quality image Xd can be transformed by a two-level Haar wavelet transform to obtain the wavelet-domain image xd. Optionally, the image size changes from H×W×3 to (H/4)×(W/4)×48, so that the spatial resolution is reduced by a factor of 16, which can speed up processing time.
通过上述方式,利用小波变换将扩散模型从空间域引入到小波域内,能够显著减少图像处理时间(模型只需学习图像的部分频谱,相对更加简单,同时由于空间分辨率的降低,模型处理图片的时间更少)。In the above manner, the diffusion model is introduced from the spatial domain to the wavelet domain using wavelet transform, which can significantly reduce the image processing time (the model only needs to learn part of the spectrum of the image, which is relatively simpler. At the same time, due to the reduction in spatial resolution, the model takes less time to process the image).
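The two-level Haar decomposition can be sketched for a single channel as below. This is an illustrative wavelet-packet-style implementation (an assumption, not quoted from the application), in which both levels split every sub-band, so one H×W channel becomes 16 sub-bands of size H/4 × W/4 and the spatial resolution drops by a factor of 16, matching the description above:

```python
# One Haar level: split an even-sized image into 4 half-resolution sub-bands
# (LL approximation plus LH/HL/HH detail bands), using 2x2 block sums/differences.
def haar_level(img):
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rll, rlh, rhl, rhh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rll.append((a + b + c + d) / 2)
            rlh.append((a - b + c - d) / 2)
            rhl.append((a + b - c - d) / 2)
            rhh.append((a - b - c + d) / 2)
        ll.append(rll); lh.append(rlh); hl.append(rhl); hh.append(rhh)
    return [ll, lh, hl, hh]

def haar_two_level(img):
    # Apply a second Haar level to every first-level sub-band (packet style, assumed).
    return [sub2 for sub1 in haar_level(img) for sub2 in haar_level(sub1)]

img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # one 8x8 channel
bands = haar_two_level(img)  # 16 sub-bands, each 2x2: 16x fewer spatial positions
```

For an RGB image this is done per channel, so 3 channels yield 3 × 16 = 48 sub-band channels at quarter resolution.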
903、根据所述第一数据,确定第一高频信息;所述第一高频信息为对所述第一图像对应的高质图像的高频通道的信息预测结果。903. Determine first high-frequency information according to the first data; the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
在一种可能的实现中,可以根据所述第一数据,通过预训练的第二网络,确定第一高频信息。In a possible implementation, the first high-frequency information can be determined based on the first data through a pre-trained second network.
可选的,第二网络可以有多个(例如14个)具有残差结构的卷积层堆叠构成。其主要作用是学习低质图像高频频谱和其对应的清晰图像高频频谱之间的差异,从而预测出低质图像恢复后的高频频谱
Optionally, the second network can be composed of multiple (e.g., 14) convolutional layers with residual structures. Its main function is to learn the difference between the high-frequency spectrum of the low-quality image and the high-frequency spectrum of its corresponding clear image, so as to predict the high-frequency spectrum of the low-quality image after restoration.
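The residual structure can be sketched as below. This is a toy 1-D version with a hypothetical fixed kernel, purely to show the skip-connection form y = x + conv(x); the actual second network stacks about 14 learned 2-D convolutional layers:

```python
# Toy residual layer: each layer adds its convolution output back onto its input,
# so a stack of such layers learns a *difference* (here, the gap between the
# degraded and restored high-frequency spectra). Kernel values are hypothetical.
def residual_layer(x, kernel):
    k = len(kernel) // 2
    conv = [sum(kernel[u] * (x[i + u - k] if 0 <= i + u - k < len(x) else 0.0)
                for u in range(len(kernel)))
            for i in range(len(x))]
    return [xi + ci for xi, ci in zip(x, conv)]

out = [1.0, 2.0, 3.0, 4.0]
for _ in range(3):  # the patent text stacks roughly 14 such layers
    out = residual_layer(out, [0.0, 0.1, 0.0])  # toy kernel: adds a 10% correction
```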
应理解,本申请实施例中图像的高频频谱(或者可以称之为高频通道对应的信息),是相对于图像的低频频谱(或者可以称之为低频通道对应的信息)而言的。高频频谱对应的频率相对于低频频谱的频率是更高的。It should be understood that the high-frequency spectrum of the image in the embodiment of the present application (or the information corresponding to the high-frequency channel) is relative to the low-frequency spectrum of the image (or the information corresponding to the low-frequency channel). The frequency corresponding to the high-frequency spectrum is higher than the frequency of the low-frequency spectrum.
例如,第一图像对应的高质图像可以包含多个通道的信息,其中,多个通道可以包括高频通道和相对于高频通道而言的低频通道。For example, the high-quality image corresponding to the first image may include information of multiple channels, wherein the multiple channels may include a high-frequency channel and a low-frequency channel relative to the high-frequency channel.
904. Obtain first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image.
In a possible implementation, step 904 and the subsequent step 905 may form an iterative process, and the result obtained in step 905 may serve as the first low-frequency information obtained in the next execution of step 904.
In a possible implementation, if step 904 belongs to the first iteration, the first low-frequency information may be randomly generated noise (for example, Gaussian white noise). If step 904 belongs to the i-th iteration (i greater than 1), the first low-frequency information may be the denoised low-frequency information obtained in the (i-1)-th iteration (that is, the result of step 905 in the previous iteration).
905. Obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information.
In a possible implementation, the first high-frequency information and the first low-frequency information may be input into a first network. The first network is a pre-trained network that obtains the first noise information from the first high-frequency information and the first low-frequency information.
In a possible implementation, the first data may also be input into the first network; that is, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. At the (i+1)-th iteration, second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information.
In the sampling framework, the initial time is t = T, and a Gaussian white noise sample is input into the noise estimation network. The purpose of the network at this stage is to estimate, from this input, the noise that needs to be removed at each time step, removing it step by step until only the low-frequency spectrum of a clear image remains. During this noise-removal process, at every time t the low-quality image xd and the clear-image high-frequency spectrum predicted by HFRM are likewise input together into the noise estimation network as conditions for the estimation. t decreases from T, and the iteration repeats until t = 0.
Next, how to denoise the first low-frequency information according to the first noise information to obtain the second low-frequency information is introduced:
In a possible implementation, the second low-frequency information can be passed through a target mapping to obtain target low-frequency information, where the target mapping does not contain a noise estimation term. The target low-frequency information and the first high-frequency information are used for fusion (for example, concatenation) to obtain a fusion result, and the second image is obtained by mapping the fusion result back to the spatial domain (for example, through an inverse wavelet transform).
Most existing diffusion-model work adopts the DDIM implicit sampling (DIS) scheme, in which skip sampling with a quantization interval S over the whole sampling process (t → t-1, 1 ≤ t ≤ T) reduces the number of sampling steps from T to T/S. On top of this sampling method, the embodiment of the present application further develops a high-efficiency conditional sampling algorithm that can directly predict the original image at an intermediate moment M of the sampling process, so there is no need to run the entire DIS process; the number of sampling steps then becomes (T-M)/S. In this sampling method, M can be a preset proportion of T (for example, 80%). The formula for obtaining x_{t-1} corresponds to denoising the first low-frequency information according to the first noise information, and the formula for obtaining x_0 corresponds to the target mapping.
In the above manner, the total number of sampling steps can be greatly reduced (for example, to about 1/5 of the original), thereby improving sampling efficiency.
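The skip-step sampling with an early exit at an intermediate step M can be sketched as follows; the toy noise estimator and the linear beta schedule are placeholder assumptions standing in for the patent's trained conditional U-Net:

```python
import numpy as np

T, S = 1000, 25          # total diffusion steps, skip interval
M = int(0.8 * T)         # early-exit step, e.g. a preset 80% of T

# Standard DDPM/DDIM-style linear beta schedule and cumulative alpha-bar.
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_theta(x_t, t, cond):
    """Placeholder noise estimator; the real one is a conditional U-Net
    conditioned on (low-quality image, predicted high-frequency spectrum)."""
    return 0.1 * x_t

def ddim_step(x_t, t, t_prev, cond):
    """One deterministic DDIM skip step t -> t_prev (denoising update)."""
    eps = eps_theta(x_t, t, cond)
    x0 = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_prev]) * x0 + np.sqrt(1 - alpha_bar[t_prev]) * eps

def predict_x0(x_t, t, cond):
    """'Target mapping': jump straight to x0, without adding noise back."""
    eps = eps_theta(x_t, t, cond)
    return (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))          # starts as Gaussian white noise
cond = None                              # stands in for the conditioning inputs
steps = list(range(T - 1, M - 1, -S))    # only (T - M)/S steps are executed
for k, t in enumerate(steps):
    t_prev = steps[k + 1] if k + 1 < len(steps) else M
    x = ddim_step(x, t, t_prev, cond)
x0_hat = predict_x0(x, M, cond)          # early exit at t = M
print(len(steps))                        # (T - M)/S = 200/25 = 8 steps
```

With T = 1000, S = 25 and M = 800, only 8 network evaluations are needed instead of the 40 of a full T/S skip schedule, matching the roughly 1/5 reduction the text describes.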
In a possible implementation, the second low-frequency information and the first high-frequency information are used to obtain a second image.
In the sampling framework, after the low-frequency spectrum of a clear image has been recovered from the Gaussian white noise, it is fused with the clear-image high-frequency spectrum predicted by HFRM, and a second-order inverse Haar wavelet transform is applied, yielding the spatial-domain restoration result of the low-quality image Xd.
The present application provides an image processing method, the method including: obtaining a first image; converting the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data, where the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image; obtaining first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; and using the second low-frequency information and the first high-frequency information to obtain a second image. In the embodiments of the present application, converting the image to the frequency domain for restoration avoids splitting the image into patches (patches must be processed separately and then merged, which may produce boundary artifacts, and a large image yields too many patches and hence long processing times), thereby improving restoration quality and reducing processing time. In addition, noise is predicted from the high-frequency information and the noise-containing low-frequency information, and image restoration based on this predicted noise yields higher image quality (more details can be recovered while the total sampling time is significantly reduced).
Referring to FIG. 10, FIG. 10 shows a model training method provided in an embodiment of the present application. As shown in FIG. 10, the model training method provided in the present application includes:
1001. Acquire a first image and a second image; the first image and the second image are captured for the same scene; the second image is a high-quality image corresponding to the first image.
In a possible implementation, the first image may be a low-quality image, for example an image occluded by natural elements such as raindrops, an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
In a possible implementation, the second image may be a high-quality image corresponding to the first image, for example an image obtained from the first image by removing raindrops, correcting underexposure (for example, enhancing a dark photo to natural-light levels), or removing moiré patterns.
1002. Convert the first image and the second image to the frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is the information of the low-frequency channel in the second data.
In a possible implementation, the first image and the second image may be converted to the frequency domain by a second-order wavelet transform.
As shown in the left half of FIG. 11, the spatial-domain RGB low-quality image Xd and the corresponding clear image X0 are both transformed by a second-order Haar wavelet transform to obtain the wavelet-domain images xd and x0. Optionally, the image size changes from H×W×3 to (H/4)×(W/4)×48, so the spatial resolution drops by a factor of 16, which speeds up processing.
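The shape bookkeeping of a second-order Haar transform can be sketched as follows (hand-rolled in numpy for illustration; in practice a wavelet library such as PyWavelets would be used):

```python
import numpy as np

def haar_level(x):
    """One orthonormal Haar level per channel: each 2x2 block becomes four
    subband values, so (H, W, C) -> (H/2, W/2, 4C)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency (approximation)
    lh = (a + b - c - d) / 2.0   # horizontal detail
    hl = (a - b + c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=-1)

H, W = 256, 256
img = np.random.default_rng(0).random((H, W, 3))   # RGB image, H x W x 3
x1 = haar_level(img)    # first order:  (H/2, W/2, 12)
x2 = haar_level(x1)     # second order: (H/4, W/4, 48), spatial area / 16
print(img.shape, '->', x2.shape)
```

All pixel information is preserved (the transform is invertible); only its arrangement changes, which is why the diffusion model can operate on a 16x smaller spatial grid.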
1003. Determine first high-frequency information according to the first data; the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image.
In a possible implementation, the first high-frequency information can be determined from the first data through a second network, the second network being a pre-trained network.
When training the second network, the frequency-domain data corresponding to a low-quality image can be input into the second network to predict the information of the high-frequency channel of the corresponding high-quality image. The ground-truth information of the high-frequency channel of that high-quality image is also obtained, and a loss constructed from the two is used to update the second network, so that the second network acquires the ability to predict, from the frequency-domain data of a low-quality image, the information of the high-frequency channel of the corresponding high-quality image.
1004. Obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used together with second noise information to determine a first loss; the second noise information is randomly generated noise.
In a possible implementation, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer. Noise may also be superimposed on the first low-frequency information to obtain third low-frequency information. At the (i+1)-th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used together with fourth noise information to determine a second loss, the fourth noise information being randomly generated noise; and the updated first network is updated again according to the second loss.
In the training framework, the low-frequency spectrum of the wavelet-transformed high-quality image x0 is first corrupted with different amounts of Gaussian white noise at different time steps t and then fed into the noise estimation network. The noise estimation network is a classic U-Net structure whose purpose is to correctly estimate the noise superimposed on the low-frequency spectrum of the high-quality image x0 at each time step. During this estimation, at every time t the low-quality image xd and the clear-image high-frequency spectrum predicted by HFRM are input together into the noise estimation network as conditions for the estimation. t increases from 0, and the iteration repeats until t = T.
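The forward-noising step and the noise-estimation loss described above can be sketched as follows; the zero-output estimator and random arrays are toy stand-ins (the real estimator is the conditional U-Net and the inputs are wavelet spectra):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)       # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def noise_estimator(x_t, t, cond):
    """Placeholder for the conditional U-Net noise estimation network."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x0_low = rng.standard_normal((16, 16))   # clean image's low-frequency spectrum
cond = None                              # stands in for (x_d, predicted HF spectrum)

t = int(rng.integers(0, T))              # random diffusion time step
eps = rng.standard_normal(x0_low.shape)  # randomly generated target noise
# Forward process: corrupt the clean low-frequency spectrum to noise level t.
x_t = np.sqrt(alpha_bar[t]) * x0_low + np.sqrt(1 - alpha_bar[t]) * eps
# The loss compares the estimated noise with the noise actually added.
eps_hat = noise_estimator(x_t, t, cond)
loss = float(np.mean((eps_hat - eps) ** 2))
```

Minimizing this loss over many (image, t, eps) samples is what trains the first network to output usable noise estimates at sampling time.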
1005. Update the first network according to the first loss.
In the embodiments of the present application, converting the image to the frequency domain for restoration avoids splitting the image into patches (patches must be processed separately and then merged, which may produce boundary artifacts, and a large image yields too many patches and hence long processing times), thereby improving restoration quality and reducing processing time. In addition, noise is predicted from the high-frequency information and the noise-containing low-frequency information, and image restoration based on this predicted noise yields higher image quality (more details can be recovered while the total sampling time is significantly reduced).
To compare performance with existing algorithms, Tables 1 to 4 show the results of the embodiment of the present application (WaveDM) and existing methods on the raindrop removal dataset (RainDrop), the defocus deblurring dataset (DPDD), the demoiréing dataset (London's Buildings) and the low-light enhancement dataset (LOL-v1), respectively. The evaluation metrics are PSNR, SSIM and restoration time (Time). As the tables show, the present application achieves the best results on both quality metrics while running at a comparable speed.
Table 1
Table 2
Table 3
Table 4
A visual demonstration of the beneficial effects of the embodiment of the present application is shown in FIG. 12A. Compared with other existing methods, the present application recovers more image detail, and its sharpness is clearly better than that of the existing methods.
A schematic architecture of an embodiment of the present application is shown in FIG. 12B. It includes a training framework and a sampling framework, and mainly consists of a wavelet transform with spectrum separation, a high-frequency fine-tuning module, a noise estimation network, a high-efficiency sampling algorithm and an inverse wavelet transform. The functions of each part are as follows:
Wavelet transform: uses a specific wavelet to transform the image from the spatial domain to the wavelet domain, obtaining the image's wavelet spectrum;
High-frequency fine-tuning module: restores the high-frequency spectrum of the corresponding clear image from the high-frequency spectrum of the low-quality image;
Noise estimation network: conditioned on the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation, iteratively recovers the low-frequency spectrum of the high-quality image from Gaussian white noise;
High-efficiency conditional sampling algorithm: conditioned on the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation, directly predicts the high-quality image at an intermediate sampling step, thereby reducing the number of sampling steps;
Inverse wavelet transform: fuses the low-frequency spectrum of the high-quality image output by the noise estimation network with the high-frequency spectrum output by the high-frequency fine-tuning module and applies the specific inverse wavelet transform, obtaining a clear spatial-domain RGB high-quality image.
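The final fusion-plus-inverse-transform step is the exact inverse of the Haar split; a hand-rolled numpy sketch (the "low/high" channel split below is illustrative, assuming the first 12 channels of a second-order decomposition hold the low-frequency subband):

```python
import numpy as np

def haar_level(x):
    """(H, W, C) -> (H/2, W/2, 4C): one orthonormal Haar split."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return np.concatenate([(a + b + c + d) / 2, (a + b - c - d) / 2,
                           (a - b + c - d) / 2, (a - b - c + d) / 2], axis=-1)

def inv_haar_level(y):
    """Exact inverse: (H/2, W/2, 4C) -> (H, W, C)."""
    c4 = y.shape[-1] // 4
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    a = (ll + lh + hl + hh) / 2; b = (ll + lh - hl - hh) / 2
    cc = (ll - lh + hl - hh) / 2; d = (ll - lh - hl + hh) / 2
    out = np.zeros((y.shape[0] * 2, y.shape[1] * 2, c4))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = cc; out[1::2, 1::2] = d
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
spec = haar_level(haar_level(img))          # (16, 16, 48) wavelet spectrum
# "Fusion" is just re-assembling the low- and high-frequency channels into
# one tensor before the inverse transform.
low, high = spec[..., :12], spec[..., 12:]  # toy channel split (illustrative)
fused = np.concatenate([low, high], axis=-1)
restored = inv_haar_level(inv_haar_level(fused))
print(np.allclose(restored, img))           # perfect reconstruction
```

Because the transform pair is exactly invertible, any restoration error in the output comes only from the predicted spectra, not from the wavelet step itself.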
Next, an image processing device provided in an embodiment of the present application is introduced from the perspective of the device. Referring to FIG. 13, FIG. 13 is a schematic structural diagram of an image processing device provided in an embodiment of the present application. As shown in FIG. 13, the image processing device 1300 provided in an embodiment of the present application includes:
an acquisition module 1301, configured to acquire a first image;
For a detailed description of the acquisition module 1301, reference may be made to the description of step 901 in the foregoing embodiment, which is not repeated here.
a processing module 1302, configured to convert the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image;
determine first high-frequency information according to the first data, where the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image;
obtain first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image; and
obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
the second low-frequency information and the first high-frequency information being used to obtain a second image.
For a detailed description of the processing module 1302, reference may be made to the description of steps 902 to 905 in the foregoing embodiment, which is not repeated here.
In a possible implementation, the processing module is specifically configured to:
obtain first noise information through a first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; the processing module is further configured to:
at the (i+1)-th iteration, obtain second noise information through the first network according to the first high-frequency information and the second low-frequency information, where the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
and using the first low-frequency information and the first high-frequency information to obtain the second image includes:
using the third low-frequency information and the first high-frequency information to obtain the second image.
In a possible implementation, the processing module is further configured to:
pass the second low-frequency information through a target mapping to obtain target low-frequency information, where the target mapping does not contain a noise estimation term;
the processing module being specifically configured to:
use the target low-frequency information and the first high-frequency information for fusion to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the processing module is specifically configured to:
determine the first high-frequency information according to the first data through a second network.
In a possible implementation, the processing module is specifically configured to:
convert the first image to the frequency domain by a second-order wavelet transform.
In addition, an embodiment of the present application further provides a model training device (which may correspond to the model training method of FIG. 10), the device including:
an acquisition module, configured to acquire a first image and a second image, the first image and the second image being captured for the same scene, and the second image being a high-quality image corresponding to the first image;
a processing module, configured to convert the first image and the second image to the frequency domain to obtain first data and second data respectively, where the spatial resolution of the first data is lower than that of the first image, the spatial resolution of the second data is lower than that of the second image, the second data includes first low-frequency information, and the first low-frequency information is the information of the low-frequency channel in the second data;
determine first high-frequency information according to the first data, the first high-frequency information being a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image;
obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used together with second noise information to determine a first loss, and the second noise information being randomly generated noise; and
update the first network according to the first loss.
In a possible implementation, the processing module is specifically configured to:
obtain the first noise information through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the processing module is further configured to:
superimpose noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtain third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used together with fourth noise information to determine a second loss, and the fourth noise information being randomly generated noise; and
update the updated first network according to the second loss.
In a possible implementation, the processing module is specifically configured to:
determine the first high-frequency information according to the first data through a second network, the second network being a pre-trained network.
In a possible implementation, the processing module is specifically configured to:
convert the first image and the second image to the frequency domain by a second-order wavelet transform.
Next, an execution device provided in an embodiment of the present application is introduced. Referring to FIG. 14, FIG. 14 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1400 may specifically be a mobile phone, a tablet, a laptop computer, a smart wearable device, a server, or the like, which is not limited here. The execution device 1400 implements the functions of the image processing method in the embodiment corresponding to FIG. 10. Specifically, the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403 and a memory 1404 (the number of processors 1403 in the execution device 1400 may be one or more), where the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of the present application, the receiver 1401, the transmitter 1402, the processor 1403 and the memory 1404 may be connected by a bus or in other ways.
The memory 1404 may include read-only memory and random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include non-volatile random access memory (NVRAM). The memory 1404 stores operation instructions, executable modules or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1403 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and so on. For clarity, however, the various buses are all referred to as the bus system in the figure.
The method disclosed in the foregoing embodiments of the present application may be applied to the processor 1403 or implemented by the processor 1403. The processor 1403 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the foregoing method may be completed by integrated logic circuits of hardware in the processor 1403 or by instructions in the form of software. The processor 1403 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, or a processor suitable for AI computing such as a vision processing unit (VPU) or a tensor processing unit (TPU), and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1403 may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404 and completes steps 901 to 905 of the foregoing embodiment in combination with its hardware.
The receiver 1401 may be configured to receive input digital or character information and to generate signal inputs related to settings and function control of the execution device. The transmitter 1402 may be configured to output digital or character information through a first interface; the transmitter 1402 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 1402 may also include a display device such as a display screen.
An embodiment of this application further provides a training device. Referring to FIG. 15, FIG. 15 is a schematic structural diagram of the training device according to an embodiment of this application. Specifically, the training device 1500 is implemented by one or more servers. The training device 1500 may vary greatly depending on its configuration or performance, and may include one or more central processing units (CPU) 1515 (for example, one or more processors), a memory 1532, and one or more storage media 1530 (for example, one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may provide transient storage or persistent storage. The program stored in the storage medium 1530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Further, the central processing unit 1515 may be configured to communicate with the storage medium 1530 and execute, on the training device 1500, the series of instruction operations in the storage medium 1530.

The training device 1500 may further include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
An embodiment of this application further provides a computer program product including computer-readable instructions that, when run on a computer, cause the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.

An embodiment of this application further provides a computer-readable storage medium storing a program for signal processing that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the image processing method described in the foregoing embodiments, or so that a chip in the training device performs the steps related to model training in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; alternatively, the storage unit may be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 16, FIG. 16 is a schematic structural diagram of a chip according to an embodiment of this application. The chip may be embodied as a neural network processing unit (NPU) 1600. The NPU 1600 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is the operation circuit 1603; the controller 1604 controls the operation circuit 1603 to extract matrix data from memory and perform multiplication operations.
In some implementations, the operation circuit 1603 internally includes multiple processing engines (PE). In some implementations, the operation circuit 1603 is a two-dimensional systolic array. The operation circuit 1603 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1603 is a general-purpose matrix processor.

For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1602 and caches it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 1601, performs a matrix operation with matrix B, and stores partial or final results of the resulting matrix in the accumulator 1608.
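The data flow just described, with weight tiles cached at the PEs and partial results collected in an accumulator, can be illustrated by a tile-wise multiply-accumulate sketch. This is an illustrative model only, not the NPU's actual microarchitecture; the function name `tiled_matmul` and the tile size are invented for this example:

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Tile-wise multiply-accumulate: each loop pass streams one K-tile of A
    (from the 'input memory') against the matching cached tile of B (from the
    'weight memory'), and the running sum in C plays the accumulator's role."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))                   # accumulator for partial results
    for k0 in range(0, K, tile):           # stream one K-tile at a time
        a_tile = A[:, k0:k0 + tile]        # input data tile
        b_tile = B[k0:k0 + tile, :]        # weight tile cached for this step
        C += a_tile @ b_tile               # partial product accumulated
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.ones((3, 2))
assert np.allclose(tiled_matmul(A, B, tile=2), A @ B)
```

The partial sums held in `C` between loop passes correspond to the partial results that the operation circuit stores in the accumulator 1608 before the final matrix is produced.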
The unified memory 1606 is configured to store input data and output data. The weight data is transferred to the weight memory 1602 directly through the direct memory access controller (DMAC) 1605. The input data is also transferred to the unified memory 1606 through the DMAC.

The bus interface unit (BIU) 1610 is used for interaction between the AXI bus on one side and the DMAC and the instruction fetch buffer (IFB) 1609 on the other. The bus interface unit 1610 is used by the instruction fetch buffer 1609 to obtain instructions from an external memory, and is also used by the direct memory access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606, to transfer weight data to the weight memory 1602, or to transfer input data to the input memory 1601.
The vector calculation unit 1607 includes multiple operation processing units and, where necessary, further processes the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for network calculations at non-convolutional/fully connected layers in a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature maps.

In some implementations, the vector calculation unit 1607 can store the processed output vector to the unified memory 1606. For example, the vector calculation unit 1607 may apply a linear or nonlinear function to the output of the operation circuit 1603, for example, performing linear interpolation on a feature map extracted by a convolutional layer, or applying a nonlinear function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1607 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1603, for example, for use in a subsequent layer of the neural network.
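The kind of post-processing delegated to the vector unit, normalization of the accumulator output followed by a nonlinear activation, can be sketched as follows. This is a minimal illustrative sketch; `vector_postprocess` and its parameters are invented for the example and are not part of any NPU interface:

```python
import numpy as np

def vector_postprocess(acc_out, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-norm-style normalization over the batch axis followed by a ReLU
    activation -- representative of the elementwise work a vector unit does
    on the matrix unit's accumulated output."""
    mean = acc_out.mean(axis=0)
    var = acc_out.var(axis=0)
    normed = gamma * (acc_out - mean) / np.sqrt(var + eps) + beta
    return np.maximum(normed, 0.0)  # activation values for the next layer

x = np.array([[1.0, -2.0], [3.0, 4.0]])  # pretend accumulator output
y = vector_postprocess(x)
assert y.shape == x.shape and (y >= 0).all()
```

The returned activations could then be written back to the unified memory or fed to the operation circuit as the input of a subsequent layer, matching the data flow described above.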
An instruction fetch buffer 1609 connected to the controller 1604 is configured to store instructions used by the controller 1604.

The unified memory 1606, the input memory 1601, the weight memory 1602, and the instruction fetch buffer 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.

The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.

Based on the descriptions of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or certainly by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, for example, analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is, in most cases, the better implementation. Based on such an understanding, the technical solutions of this application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
In the foregoing embodiments, implementation may be entirely or partially by software, hardware, firmware, or any combination thereof. When software is used for implementation, implementation may be entirely or partially in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
Claims (27)
- An image processing method, wherein the method comprises: acquiring a first image; converting the first image to a frequency domain to obtain first data, wherein a spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, wherein the first low-frequency information comprises noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information are used to obtain a second image.
- The method according to claim 1, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information comprises: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The method according to claim 1 or 2, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer greater than 1, and the method further comprises: at an (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the third low-frequency information and the first high-frequency information are used to obtain the second image.
- The method according to claim 1 or 2, wherein the method further comprises: passing the second low-frequency information through a target mapping to obtain target low-frequency information, wherein the target mapping does not comprise a noise estimation term; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the target low-frequency information and the first high-frequency information are used for fusion to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- The method according to any one of claims 1 to 4, wherein the first low-frequency information is randomly generated noise.
- The method according to any one of claims 1 to 5, wherein the determining first high-frequency information according to the first data comprises: determining the first high-frequency information through a second network according to the first data.
- The method according to any one of claims 1 to 6, wherein the converting the first image to a frequency domain comprises: converting the first image to the frequency domain through a second-order wavelet transform.
- A model training method, wherein the method comprises: acquiring a first image and a second image, wherein the first image and the second image are acquired for a same scene, and the second image is a high-quality image corresponding to the first image; converting the first image and the second image to a frequency domain to obtain first data and second data respectively, wherein a spatial resolution of the first data is lower than that of the first image, a spatial resolution of the second data is lower than that of the second image, the second data comprises first low-frequency information, and the first low-frequency information is information of a low-frequency channel in the second data; determining first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of the high-quality image corresponding to the first image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used together with second noise information to determine a first loss, and the second noise information is randomly generated noise; and updating the first network according to the first loss.
- The method according to claim 8, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information comprises: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The method according to claim 8 or 9, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer, and the method further comprises: superimposing noise on the first low-frequency information to obtain third low-frequency information; at an (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, wherein the third noise information is used together with fourth noise information to determine a second loss, and the fourth noise information is randomly generated noise; and updating the updated first network according to the second loss.
- The method according to any one of claims 8 to 10, wherein the determining first high-frequency information according to the first data comprises: determining the first high-frequency information through a second network according to the first data, wherein the second network is a pre-trained network.
- The method according to any one of claims 8 to 11, wherein the converting the first image and the second image to a frequency domain comprises: converting the first image and the second image to the frequency domain through a second-order wavelet transform.
- An image processing apparatus, wherein the apparatus comprises: an acquisition module, configured to acquire a first image; and a processing module, configured to: convert the first image to a frequency domain to obtain first data, wherein a spatial resolution of the first data is lower than that of the first image; determine first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquire first low-frequency information, wherein the first low-frequency information comprises noise of a low-frequency channel of the high-quality image; and obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information are used to obtain a second image.
- The apparatus according to claim 13, wherein the processing module is specifically configured to: obtain the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The apparatus according to claim 13 or 14, wherein the obtaining first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer greater than 1, and the processing module is further configured to: at an (i+1)-th iteration, obtain second noise information through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the third low-frequency information and the first high-frequency information are used to obtain the second image.
- The apparatus according to claim 13 or 14, wherein the processing module is further configured to: pass the second low-frequency information through a target mapping to obtain target low-frequency information, wherein the target mapping does not comprise a noise estimation term; and the processing module is specifically configured such that the target low-frequency information and the first high-frequency information are used for fusion to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- The apparatus according to any one of claims 13 to 16, wherein the first low-frequency information is randomly generated noise.
- The apparatus according to any one of claims 13 to 17, wherein the processing module is specifically configured to: determine the first high-frequency information through a second network according to the first data.
- The apparatus according to any one of claims 13 to 18, wherein the processing module is specifically configured to: convert the first image to the frequency domain through a second-order wavelet transform.
- 一种模型训练装置,其特征在于,所述装置包括:A model training device, characterized in that the device comprises:获取模块,用于获取第一图像和第二图像;所述第一图像和所述第二图像为针对于相同场景采集的;所述第二图像为所述第一图像对应的高质图像;An acquisition module, used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;处理模块,用于将所述第一图像和所述第二图像转换到频域,分别得到第一数据和第二数据;所述第一数据的空间分辨率低于所述第一图像;所述第二数据的空间分辨率低于所述第二图像;所述第二数据包括第一低频信息;所述第一低频信息为所述第二数据中低频通道的信息;a processing module, configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;根据所述第一数据,确定第一高频信息;所述第一高频信息为对所述第一图像对应的高质图像的高频通道的信息预测结果;Determine first high-frequency information according to the first data; the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;根据所述第一高频信息和所述第一低频信息,通过第一网络得到第一噪声信息;所述第一噪声信息用于和第二噪声信息确定第一损失;所述第二噪声信息为随机生成的噪声;According to the first high-frequency information and the first low-frequency information, first noise information is obtained through a first network; the first noise information is used to determine a first loss together with the second noise information; the second noise information is randomly generated noise;根据所述第一损失,对所述第一网络进行更新。The first network is updated according to the first loss.
- 根据权利要求20所述的装置,其特征在于,所述处理模块,具体用于:The device according to claim 20, characterized in that the processing module is specifically used to:根据所述第一高频信息、所述第一数据和所述第一低频信息,通过第一网络得到第一噪声信息。First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- 根据权利要求20或21所述的装置,其特征在于,所述根据所述第一高频信息和所述第一低频信息,通过第一网络得到第一噪声信息为在第i次迭代时执行的,所述i为正整数;所述处理模块,还用于:The device according to claim 20 or 21, characterized in that the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:对所述第一低频信息叠加噪声,得到第三低频信息;superimposing noise on the first low-frequency information to obtain third low-frequency information;在第i+1次迭代时,根据所述第一高频信息和所述第三低频信息,通过所述第一网络得到第三噪声信息;所述第三噪声信息用于和第四噪声信息确定第二损失;所述第四噪声信息为随机生成的噪声;In the (i+1)th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;根据所述第二损失,对更新后的所述第一网络进行更新。The updated first network is updated according to the second loss.
- 根据权利要求20至22任一所述的装置,其特征在于,所述处理模块,具体用于:The device according to any one of claims 20 to 22, characterized in that the processing module is specifically used to:根据所述第一数据,通过第二网络,确定第一高频信息;所述第二网络为预先训练好的网络。According to the first data, the first high-frequency information is determined through a second network; the second network is a pre-trained network.
- 根据权利要求20至23任一所述的装置,其特征在于,所述处理模块,具体用于:The device according to any one of claims 20 to 23, characterized in that the processing module is specifically used to:通过二阶小波变换将所述第一图像和所述第二图像转换到频域。The first image and the second image are converted into the frequency domain by second-order wavelet transform.
- A computing device, characterized in that the computing device comprises a memory and a processor; the memory stores code, and the processor is configured to obtain the code and execute the method according to any one of claims 1 to 12.
- A computer storage medium, characterized in that the computer storage medium stores one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method according to any one of claims 1 to 12.
- A computer program product comprising code, characterized in that the code, when executed, is used to implement the method according to any one of claims 1 to 12.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310233422.2 | 2023-02-28 | ||
CN202310233422.2A CN116258651A (en) | 2023-02-28 | 2023-02-28 | Image processing method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024179510A1 true WO2024179510A1 (en) | 2024-09-06 |
Family
ID=86684186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/078973 WO2024179510A1 (en) | 2023-02-28 | 2024-02-28 | Image processing method and related device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116258651A (en) |
WO (1) | WO2024179510A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116258651A (en) * | 2023-02-28 | 2023-06-13 | 华为技术有限公司 | Image processing method and related device |
CN118037607A (en) * | 2024-03-18 | 2024-05-14 | 无锡英菲感知技术有限公司 | Infrared image non-uniformity correction method, device, equipment and storage medium |
- 2023-02-28: CN application CN202310233422.2A filed; published as CN116258651A (active, pending)
- 2024-02-28: PCT application PCT/CN2024/078973 filed; published as WO2024179510A1 (status unknown)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103747189A (en) * | 2013-11-27 | 2014-04-23 | 杨新锋 | Digital image processing method |
US20170256037A1 (en) * | 2016-03-01 | 2017-09-07 | Realtek Semiconductor Corp. | Image de-noising method and apparatus thereof |
CN107742279A (en) * | 2017-10-31 | 2018-02-27 | 努比亚技术有限公司 | A kind of image processing method, device and storage medium |
US20200357098A1 (en) * | 2019-05-07 | 2020-11-12 | Healcerion Co., Ltd. | Discrete wavelet transform-based noise removal apparatus for removing noise from image signal and remote medical diagnosis system including the same |
CN113870104A (en) * | 2020-06-30 | 2021-12-31 | 微软技术许可有限责任公司 | Super-resolution image reconstruction |
WO2022021025A1 (en) * | 2020-07-27 | 2022-02-03 | 华为技术有限公司 | Image enhancement method and apparatus |
KR102448498B1 (en) * | 2022-04-05 | 2022-09-28 | 한화시스템 주식회사 | Method and apparatus for removing the noise of IR image |
CN116258651A (en) * | 2023-02-28 | 2023-06-13 | 华为技术有限公司 | Image processing method and related device |
Non-Patent Citations (1)
Title |
---|
HUANG YI, HUANG JIANCHENG, LIU JIANZHUANG, DONG YU, LV JIAXI, CHEN SHIFENG: "WaveDM: Wavelet-Based Diffusion Models for Image Restoration", IEEE TRANSACTIONS ON MULTIMEDIA, 5 February 2024 (2024-02-05), XP093205865 * |
Also Published As
Publication number | Publication date |
---|---|
CN116258651A (en) | 2023-06-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2022116856A1 (en) | Model structure, model training method, and image enhancement method and device | |
WO2022134971A1 (en) | Noise reduction model training method and related apparatus | |
CN113066017B (en) | Image enhancement method, model training method and equipment | |
WO2022042713A1 (en) | Deep learning training method and apparatus for use in computing device | |
WO2024179510A1 (en) | Image processing method and related device | |
EP4163832A1 (en) | Neural network training method and apparatus, and image processing method and apparatus | |
CN113011562A (en) | Model training method and device | |
WO2024041479A1 (en) | Data processing method and apparatus | |
CN111738403B (en) | Neural network optimization method and related equipment | |
CN113065635A (en) | Model training method, image enhancement method and device | |
WO2023231794A1 (en) | Neural network parameter quantification method and apparatus | |
CN114595799A (en) | Model training method and device | |
WO2024213099A1 (en) | Data processing method and apparatus | |
CN111950700A (en) | Neural network optimization method and related equipment | |
WO2024083121A1 (en) | Data processing method and apparatus | |
WO2022111387A1 (en) | Data processing method and related apparatus | |
WO2024061269A1 (en) | Three-dimensional reconstruction method and related apparatus | |
WO2024002211A1 (en) | Image processing method and related apparatus | |
WO2024212648A1 (en) | Method for training classification model, and related apparatus | |
CN113066018A (en) | Image enhancement method and related device | |
WO2024188171A1 (en) | Image processing method and related device thereof | |
WO2024199409A1 (en) | Data processing method and apparatus thereof | |
WO2022001364A1 (en) | Method for extracting data features, and related apparatus | |
WO2024175014A1 (en) | Image processing method and related device thereof | |
WO2024160219A1 (en) | Model quantization method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 24763190; Country of ref document: EP; Kind code of ref document: A1 |