WO2024179510A1 - Image processing method and related device - Google Patents
- Publication number
- WO2024179510A1 (application PCT/CN2024/078973)
- Authority
- WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
Definitions
- The present application relates to the field of artificial intelligence, and in particular to an image processing method and related devices.
- Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
- Image restoration refers to the use of technical means to remove degradation components from low-quality images, thereby restoring clear, high-quality images.
- The present application provides an image processing method that can improve image restoration quality and reduce processing time.
- In a first aspect, an embodiment of the present application provides an image processing method, the method comprising: acquiring a first image; converting the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information based on the first data, where the first high-frequency information is a prediction of the information in a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, which contains noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network based on the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; the second low-frequency information and the first high-frequency information are used to obtain a second image.
- Converting the image to the frequency domain for restoration avoids the need for image segmentation (segments must be processed separately and then merged, which may cause boundary artifacts; for large images the number of segments becomes too large, resulting in long processing times), thereby improving restoration quality and reducing processing time.
- Restoration based on noise estimation yields higher image quality (more details can be restored) and significantly reduces the total sampling time.
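The flow described above (frequency-domain conversion, high-frequency prediction, network-based denoising of the low-frequency channel) can be sketched as follows. This is a minimal illustration: the stubs standing in for the patent's "first network" and "second network", and all shapes, are assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stubs: the patent's "second network" predicts high-frequency
# information from the frequency-domain data, and its "first network"
# estimates noise. Real versions would be trained neural networks.
def second_network(first_data):
    return np.tanh(first_data)

def first_network(high_freq, low_freq):
    # A trained network would also use high_freq; this stub ignores it.
    return 0.1 * low_freq

first_data = rng.standard_normal((64, 64))  # first image in the frequency domain
first_high = second_network(first_data)     # first high-frequency information (prediction)
first_low = rng.standard_normal((64, 64))   # first low-frequency information (random noise)

first_noise = first_network(first_high, first_low)  # first noise information
second_low = first_low - first_noise                # second low-frequency information
```

The second low-frequency information and the first high-frequency information would then be combined to produce the second (restored) image.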
- Obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information may include: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- Obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. The method further includes: at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information. In this case, the step in which the second low-frequency information and the first high-frequency information are used to obtain the second image includes: the third low-frequency information and the first high-frequency information are used to obtain the second image.
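The iteration above amounts to a diffusion-style sampling loop: the high-frequency conditioning signal stays fixed while the low-frequency estimate is progressively denoised, the output of step i feeding step i+1. A minimal sketch, in which the step count, the network stub, and the update rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def first_network(high_freq, low_freq, step):
    # Placeholder noise estimator; a trained network would also condition
    # on the iteration index (the diffusion timestep) and on high_freq.
    return 0.1 * low_freq

high_freq = rng.standard_normal((64, 64))  # fixed conditioning signal
low_freq = rng.standard_normal((64, 64))   # starts as randomly generated noise
start_norm = np.linalg.norm(low_freq)

num_steps = 10  # illustrative; the patent does not fix a step count
for i in range(1, num_steps + 1):
    noise = first_network(high_freq, low_freq, i)
    low_freq = low_freq - noise            # output of step i feeds step i + 1
```

After the loop, `low_freq` plays the role of the final low-frequency information that is fused with the high-frequency prediction.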
- The method may further include: performing target mapping on the second low-frequency information to obtain target low-frequency information;
- the target mapping does not include a noise-estimation term;
- the step in which the second low-frequency information and the first high-frequency information are used to obtain the second image includes: fusing the target low-frequency information and the first high-frequency information to obtain a fusion result, and mapping the fusion result to the spatial domain to obtain the second image.
- That is, the second low-frequency information can be mapped through the target mapping (which contains no noise-estimation term) to obtain the target low-frequency information; the target low-frequency information and the first high-frequency information are fused (for example, by splicing) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
- In this way, the total number of sampling steps can be greatly reduced (for example, to about 1/5 of the original), thereby improving sampling efficiency.
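The final fusion and spatial-domain mapping can be sketched as assembling the low- and high-frequency channels and applying an inverse wavelet transform. The single-level Haar inverse below is an assumption for illustration: the patent uses a second-order transform and does not fix the wavelet family.

```python
import numpy as np

def inverse_haar_dwt2(ll, lh, hl, hh):
    """Invert one level of a 2-D Haar wavelet transform (with /2 averaging
    on the forward side), doubling the spatial resolution."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    out = np.empty((2 * h, 2 * w))
    out[0::2, :] = a + d
    out[1::2, :] = a - d
    return out

rng = np.random.default_rng(2)
target_low = rng.standard_normal((64, 64))     # target low-frequency information
lh, hl, hh = rng.standard_normal((3, 64, 64))  # predicted high-frequency bands

# "Fusion" here is simply assembling the channels (splicing); the inverse
# transform then maps the fusion result back to the spatial domain.
second_image = inverse_haar_dwt2(target_low, lh, hl, hh)
```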
- the first low-frequency information is randomly generated noise.
- determining the first high-frequency information according to the first data includes: determining the first high-frequency information through a second network according to the first data.
- converting the first image into the frequency domain includes: converting the first image into the frequency domain by using a second-order wavelet transform.
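Reading "second-order wavelet transform" as a two-level 2-D discrete wavelet transform (an interpretation, not stated in the patent), the conversion reduces the spatial side length by a factor of 4 while splitting the image into one low-frequency channel and several high-frequency bands. A sketch using a hand-rolled Haar transform (the wavelet choice is arbitrary here):

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar wavelet transform. Returns the low-frequency
    (LL) band and the three high-frequency bands (LH, HL, HH), each at half
    the input resolution."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

image = np.random.rand(256, 256)  # stand-in for the first image
ll1, high1 = haar_dwt2(image)     # first level:  128 x 128 bands
ll2, high2 = haar_dwt2(ll1)       # second level:  64 x  64 bands
```

The 64x64 `ll2` band illustrates why the first data has a lower spatial resolution than the first image.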
- In a second aspect, the present application provides a model training method, the method comprising: acquiring a first image and a second image, the first image and the second image being acquired for the same scene, and the second image being a high-quality image corresponding to the first image; and converting the first image and the second image into the frequency domain to obtain first data and second data respectively;
- the spatial resolution of the first data is lower than that of the first image, and the spatial resolution of the second data is lower than that of the second image;
- the second data includes first low-frequency information, where the first low-frequency information is the information of a low-frequency channel in the second data;
- first high-frequency information is determined based on the first data; the first high-frequency information is a prediction of the information in a high-frequency channel of the high-quality image corresponding to the first image;
- first noise information is obtained through a first network; the first noise information and second noise information are used together to determine a first loss, where the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
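The training objective above pairs the network's predicted noise with the randomly generated noise that was injected. One gradient step might look like the following; the one-parameter linear stand-in model, MSE loss, and plain gradient descent are all illustrative assumptions, since the patent does not specify the loss form or optimizer.

```python
import numpy as np

rng = np.random.default_rng(3)

# One-parameter stand-in for the "first network": it predicts the injected
# noise as w * (noisy low-frequency input). A real network would also
# condition on the high-frequency information.
w = 0.0

high_freq = rng.standard_normal((64, 64))     # first high-frequency information
clean_low = rng.standard_normal((64, 64))     # low-frequency channel of the second data
second_noise = rng.standard_normal((64, 64))  # second noise information (random)
noisy_low = clean_low + second_noise          # low-frequency input with noise added

lr = 0.1
for _ in range(100):
    pred_noise = w * noisy_low                              # first noise information
    first_loss = np.mean((pred_noise - second_noise) ** 2)  # first loss (MSE)
    grad = np.mean(2 * (pred_noise - second_noise) * noisy_low)
    w -= lr * grad                                          # update the first network
```

Under these assumptions the weight converges toward the least-squares optimum (about 0.5 here), driving the noise-prediction loss down.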
- Converting the image to the frequency domain for restoration avoids the need for image segmentation (segments must be processed separately and then merged, which may cause boundary artifacts; for large images the number of segments becomes too large, resulting in long processing times), thereby improving restoration quality and reducing processing time.
- Restoration based on noise estimation yields higher image quality (more details can be restored) and significantly reduces the total sampling time.
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information includes:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the method further includes:
- obtaining third noise information through the first network according to the first high-frequency information and third low-frequency information; the third noise information and fourth noise information are used together to determine a second loss, where the fourth noise information is randomly generated noise;
- the updated first network is further updated according to the second loss.
- determining first high-frequency information according to the first data includes:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- converting the first image and the second image into a frequency domain includes:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- the present application provides an image processing device, the device comprising:
- An acquisition module used for acquiring a first image
- a processing module configured to convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- the first low-frequency information includes noise of a low-frequency channel of the high-quality image
- first noise information is obtained through a first network; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; and the processing module is further used to:
- second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image, including:
- the third low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module is further configured to:
- the second low-frequency information is subjected to target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item;
- the processing module is specifically used for:
- the target low-frequency information and the first high-frequency information are used to fuse to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- the first low-frequency information is randomly generated noise.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network.
- the processing module is specifically configured to:
- the first image is converted into the frequency domain by second-order wavelet transform.
- the present application provides a model training device, the device comprising:
- An acquisition module used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;
- a processing module configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- first noise information is obtained through a first network; the first noise information is used to determine a first loss together with the second noise information; the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:
- third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;
- the updated first network is further updated according to the second loss.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- the processing module is specifically configured to:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- an embodiment of the present application provides an image processing device, which may include a memory, a processor, and a bus system, wherein the memory is used to store programs, and the processor is used to execute the programs in the memory to perform any optional method as described in the first aspect above.
- an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored.
- the computer-readable storage medium is run on a computer, the computer executes the above-mentioned first aspect and any optional method.
- an embodiment of the present application provides a computer program product, including code, which, when executed, is used to implement the above-mentioned first aspect and any optional method.
- The present application provides a chip system, which includes a processor configured to support an image processing device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
- The chip system may further include a memory, which is used to store the program instructions and data necessary for the execution device or the training device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- FIG. 1A is a schematic diagram of a structure of an artificial intelligence main framework;
- FIGS. 1B and 2 are schematic diagrams of the application system framework of the present invention;
- FIG. 3 is a schematic diagram of an optional hardware structure of a terminal;
- FIG. 4 is a schematic diagram of the structure of a server;
- FIG. 5 is a schematic diagram of a system architecture of the present application;
- FIG. 6 is a schematic diagram of a process of a cloud service;
- FIG. 7 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
- FIG. 8 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
- FIG. 9 is a schematic diagram of a process of an image processing method;
- FIG. 10 is a schematic diagram of an image processing method;
- FIG. 11 is a schematic diagram of an image processing method;
- FIG. 12A is a schematic diagram of a beneficial effect;
- FIG. 12B is a schematic diagram of an architecture;
- FIG. 13 is a schematic diagram of the structure of an image processing device provided in an embodiment of the present application;
- FIG. 14 is a schematic diagram of an execution device provided in an embodiment of the present application;
- FIG. 15 is a schematic diagram of a training device provided in an embodiment of the present application;
- FIG. 16 is a schematic diagram of a chip provided in an embodiment of the present application.
- Figure 1A shows a structural diagram of the main framework of artificial intelligence.
- The following explains this artificial intelligence framework along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
- The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
- The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing data) to the industrial ecology of the system.
- the infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
- The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
- The data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes methods such as data training, machine learning, deep learning, searching, reasoning, and decision-making.
- Machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
- Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, sorting, and prediction.
- Based on the results of data processing, some general capabilities can be further formed, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.
- Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart terminals, smart transportation, smart medical care, autonomous driving, smart cities, etc.
- The embodiments of the present application can be applied to tasks related to image processing, such as image enhancement.
- The product form of an embodiment of the present application may be an image processing application, which may run on a terminal device or on a cloud-side server.
- The image processing application may implement image processing tasks, or tasks based on image processing results.
- a user can open an application with image processing functions installed on a terminal device.
- the application can obtain image data captured by the camera or image data specified by the user.
- the image processing application can obtain processing results based on the input data through the method provided in the embodiment of the present application, and present the image processing results or downstream task results based on the image processing results to the user (the presentation method can be but is not limited to display, saving, uploading to the cloud side, etc.).
- a user can open an image processing application installed on a terminal device.
- the application can obtain image data captured by a camera or image data specified by a user.
- the image processing application can send the data (or the result obtained after certain processing on the data) to a server on the cloud side.
- the server on the cloud side generates an image processing result based on the image through the method provided in an embodiment of the present application, and transmits the image processing result or the result of a downstream task implemented based on the image processing result back to the terminal device.
- the terminal device can present the image processing result or the result of a downstream task implemented based on the image processing result to the user (the presentation method can be but is not limited to display, saving, uploading to the cloud side, etc.).
- FIG. 1B is a schematic diagram of the functional architecture of an image processing application in an embodiment of the present application:
- an image processing application 102 may receive input data 101 (e.g., image and event data) and generate a processing result 103.
- the image processing application 102 may be executed on, for example, at least one computer system, and includes computer code that, when executed by one or more computers, causes the computers to execute the image processing method described herein.
- FIG. 2 is a schematic diagram of the physical architecture for running an image processing application in an embodiment of the present application:
- Fig. 2 shows a schematic diagram of a system architecture.
- the system may include a terminal 100 and a server 200.
- the server 200 may include one or more servers (one server is used as an example in Fig. 2 for illustration), and the server 200 may provide image processing services for one or more terminals or perform downstream tasks based on image processing results.
- the terminal 100 can be installed with an image processing application, or a web page related to image processing or downstream tasks based on image processing results can be opened.
- the above application and web page can provide an interface.
- the terminal 100 can receive relevant parameters entered by the user on the image processing or downstream task interface based on image processing results, and send the above parameters to the server 200.
- the server 200 can obtain the processing results based on the received parameters and return the processing results to the terminal 100.
- the terminal 100 can also complete the data processing by itself based on the received parameters, without cooperation from the server; the embodiments of the present application are not limited in this respect.
- the terminal 100 in the embodiment of the present application can be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc., and the embodiment of the present application does not impose any limitation on this.
- FIG. 3 shows a schematic diagram of an optional hardware structure of the terminal 100 .
- the terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190.
- FIG. 3 is merely an example of a terminal or multi-function device and does not constitute a limitation; the device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
- the input unit 130 can be used to receive input digital or character information, and generate key signal input related to the user settings and function control of the portable multi-function device.
- the input unit 130 may include a touch screen 131 (optional) and/or other input devices 132.
- the touch screen 131 can collect user touch operations on or near it (such as operations performed by the user using fingers, joints, stylus, or any other suitable objects on or near the touch screen), and drive the corresponding connection device according to a pre-set program.
- the touch screen can detect the user's touch action on the touch screen, convert the touch action into a touch signal and send it to the processor 170, and can receive and execute commands sent by the processor 170; the touch signal at least includes touch point coordinate information.
- the touch screen 131 can provide communication between the terminal 100 and the user.
- the touch screen can be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave.
- the input unit 130 can also include other input devices.
- the other input devices 132 can include but are not limited to one or more of a physical keyboard, a function key (such as a volume control key, a switch key, etc.), a trackball, a mouse, a joystick, etc.
- other input devices 132 can obtain image data collected by a camera or image data specified by a user, etc.
- the display unit 140 may be used to display information input by the user or provided to the user, various menus of the terminal 100, interactive interfaces, file display, and/or playback of any multimedia file.
- the display unit 140 may be used to display an interface of an application program related to image processing, etc.
- the memory 120 can be used to store instructions and data.
- the memory 120 can mainly include an instruction storage area and a data storage area.
- the data storage area can store various data, such as multimedia files, texts, etc.;
- the instruction storage area can store software units such as the operating system, applications, and the instructions required for at least one function, or their subsets and extensions. The memory 120 can also include a non-volatile random access memory, and can provide the processor 170 with management of the hardware, software and data resources of the computing and processing device and support for the control software and applications. It is also used to store multimedia files and to store running programs and applications.
- the processor 170 is the control center of the terminal 100. It uses various interfaces and lines to connect various parts of the entire terminal 100. By running or executing instructions stored in the memory 120 and calling data stored in the memory 120, it executes various functions of the terminal 100 and processes data, thereby controlling the terminal device as a whole.
- the processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface and application program, and the modem processor mainly processes wireless communication. It is understandable that the above-mentioned modem processor may not be integrated into the processor 170.
- the processor and the memory may be implemented on a single chip, and in some embodiments, they may also be implemented separately on separate chips.
- the processor 170 may also be used to generate corresponding operation control signals, send them to corresponding components of the computing and processing device, read and process data in the software, especially read and process data and programs in the memory 120, so that each functional module therein performs corresponding functions, thereby controlling the corresponding components to act according to the requirements of the instructions.
- the memory 120 can be used to store software codes related to the image processing method
- the processor 170 can execute the software code stored in the memory 120 to perform the steps of the image processing method, and can also schedule other units (such as the above-mentioned input unit 130 and display unit 140) to realize corresponding functions.
- the RF unit 110 (optional) can be used for receiving and sending information or for receiving and sending signals during a call. For example, after receiving downlink information from the base station, it sends the information to the processor 170 for processing; in addition, it sends uplink data to the base station.
- the RF circuit includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, etc.
- the RF unit 110 can also communicate with network devices and other devices through wireless communication.
- the wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
- the RF unit 110 can send image data to the server 200 and receive information on processing results sent by the server 200.
- radio frequency unit 110 is optional and can be replaced by other communication interfaces, such as a network port.
- the terminal 100 also includes a power supply 190 (such as a battery) for supplying power to various components.
- the power supply can be logically connected to the processor 170 through a power management system, so that the power management system can manage functions such as charging, discharging, and power consumption.
- the terminal 100 also includes an external interface 180, which can be a standard Micro USB interface or a multi-pin connector. It can be used to connect the terminal 100 to communicate with other devices, and can also be used to connect a charger to charge the terminal 100.
- the terminal 100 may also include a flashlight, a wireless fidelity (WiFi) module, a Bluetooth module, sensors with different functions, etc., which will not be described in detail here. Some or all of the methods described below may be applied to the terminal 100 shown in FIG. 3 .
- Fig. 4 provides a schematic diagram of the structure of a server 200.
- the server 200 includes a bus 201, a processor 202, a communication interface 203 and a memory 204.
- the processor 202, the memory 204 and the communication interface 203 communicate with each other via the bus 201.
- the bus 201 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
- the bus may be divided into an address bus, a data bus, a control bus, etc.
- for ease of representation, FIG. 4 uses only one thick line, but this does not mean that there is only one bus or only one type of bus.
- the processor 202 may be any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
- the memory 204 may include a volatile memory (volatile memory), such as a random access memory (RAM).
- the memory 204 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
- the memory 204 may be used to store software code related to the image processing method, and the processor 202 may execute the software code to perform the steps of the image processing method, and may also schedule other units to implement corresponding functions.
- the above-mentioned terminal 100 and server 200 can be centralized or distributed devices, and the processors in the above-mentioned terminal 100 and server 200 (such as processor 170 and processor 202) can be hardware circuits (such as application specific integrated circuit (ASIC), field-programmable gate array (FPGA), general-purpose processor, digital signal processor (DSP), microprocessor or microcontroller, etc.), or a combination of these hardware circuits.
- the processor can be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.
- the steps related to the model reasoning process in the embodiments of the present application involve AI-related operations.
- the instruction execution architecture of the terminal device and the server is not limited to the processor combined with the memory architecture described above.
- the system architecture provided in the embodiments of the present application is described in detail below in conjunction with Figure 5.
- a task processing system 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550 and a data acquisition device 560, and the execution device 510 includes a computing module 511.
- the data acquisition device 560 is used to obtain an open source large-scale data set (i.e., a training set) required by the user, and store the training set in the database 530.
- the training device 520 trains the target model/rule 501 based on the training set maintained in the database 530, and the trained neural network obtained by the training is then used on the execution device 510.
- the execution device 510 can call the data, code, etc. in the data storage system 550, and can also store data, instructions, etc. in the data storage system 550.
- the data storage system 550 can be placed in the execution device 510, or the data storage system 550 can be an external memory relative to the execution device 510.
- the trained neural network obtained after the target model/rule 501 is trained by the training device 520 can be applied to different systems or devices (i.e., the execution device 510), which can be edge devices or end-side devices, such as mobile phones, tablets, laptops, monitoring systems (such as cameras), security systems, etc.
- the execution device 510 is configured with an I/O interface 512 for data interaction with external devices, and a "user" can input data to the I/O interface 512 through a client device 540.
- the client device 540 can be a camera device of a monitoring system; the images and event data captured by the camera device are input as input data to the computing module 511 of the execution device 510, the computing module 511 processes the input target image to obtain a processing result, and then outputs the processing result to the camera device or directly displays it on the display interface of the execution device 510 (if any). In addition, in some embodiments of the present application, the client device 540 can also be integrated in the execution device 510. For example, when the execution device 510 is a mobile phone, the target task can be obtained directly through the mobile phone (for example, the image and event data can be captured by the camera of the mobile phone), or the target task sent by another device (for example, another mobile phone) can be received; the computing module 511 in the mobile phone then detects the target task, obtains the detection result, and directly presents the detection result on the display interface of the mobile phone.
- the product form of the execution device 510 and the client device 540 is not limited here.
- FIG. 5 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the data storage system 550 is an external memory relative to the execution device 510. In other cases, the data storage system 550 can also be placed in the execution device 510. It should be understood that the above-mentioned execution device 510 can be deployed in the client device 540.
- the computing module 511 of the above-mentioned execution device 510 can obtain the code stored in the data storage system 550 to implement the steps related to the model reasoning process in the embodiment of the present application.
- the computing module 511 of the execution device 510 may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits.
- the training device 520 can be a hardware system with the function of executing instructions, such as CPU, DSP, etc., or a hardware system without the function of executing instructions, such as ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without the function of executing instructions and hardware systems with the function of executing instructions.
- the computing module 511 of the execution device 510 can be a hardware system with an execution instruction function, and the steps related to the model reasoning process provided in the embodiment of the present application can be software codes stored in the memory.
- the computing module 511 of the execution device 510 can obtain the software code from the memory and execute the obtained software code to implement the steps related to the model reasoning process provided in the embodiment of the present application.
- the computing module 511 of the execution device 510 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model reasoning process provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the computing module 511 of the execution device 510, which is not limited here.
- the above-mentioned training device 520 can obtain the code stored in the memory (not shown in Figure 5, which can be integrated into the training device 520 or deployed separately from the training device 520) to implement the steps related to model training in an embodiment of the present application.
- the training device 520 may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits.
- the training device 520 may be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.
- the training device 520 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model training provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the training device 520, which is not limited here.
- Image processing cloud services provided by the server:
- the server can provide image processing services to the end side or perform downstream tasks based on the image processing results through an application programming interface (API).
- the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud.
- the server can obtain processing results based on the received parameters and return the processing results (such as enhanced image data) to the terminal.
- FIG. 6 shows a process of using an image processing cloud service provided by a cloud platform.
- the cloud platform provides software development kits (SDKs) in multiple development versions for users to choose according to the requirements of the development environment, such as a JAVA SDK, a Python SDK, a PHP SDK, an Android SDK, etc.
- the SDK project is imported into the local development environment, and configuration and debugging are performed in the local development environment.
- the local development environment can also be used to develop other functions, thus forming an application that integrates image processing capabilities.
- API calls for image processing or downstream tasks based on image processing results can be triggered.
- an API request is initiated to the running instance of the image processing service in the cloud environment, where the API request carries an image, and the running instance in the cloud environment processes the image to obtain the processing result.
- the cloud environment returns the processing results to the application, thereby completing the image processing once or making a downstream task service call based on the image processing results.
- a neural network may be composed of neural units, and a neural unit may refer to an operation unit that takes xs (i.e., input data) and an intercept 1 as input, and the output of the operation unit may be: h(x) = f(∑_{s=1}^{n} Ws·xs + b), where:
- n is a natural number greater than 1
- Ws is the weight of xs
- b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into the output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
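- To make the formula above concrete, here is a small illustrative sketch (not part of the patent; the function name and sample weights are invented for the example) of a single neural unit that computes the weighted sum of its inputs xs plus the bias b and applies a sigmoid activation f:

```python
import math

def neural_unit(xs, ws, b):
    """One neural unit: weighted sum of inputs plus bias b,
    passed through a sigmoid activation f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation f

# Example: three inputs xs with weights Ws and bias b (invented values)
out = neural_unit([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], b=0.05)
print(round(out, 4))  # sigmoid(0.35) ≈ 0.5866
```

The output lies in (0, 1) and can serve as the input of the next layer, as described above.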
- a neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- Convolutional neural network is a deep neural network with a convolutional structure.
- a convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter.
- a convolutional layer refers to a neuron layer in a convolutional neural network that performs convolution processing on the input signal.
- a neuron can only be connected to some neurons in the adjacent layers.
- a convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
- Shared weights can be understood as the way to extract features is independent of position.
- the convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
- the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
- CNN is a very common neural network.
- convolutional neural network is a deep neural network with a convolution structure and a deep learning architecture.
- a deep learning architecture refers to multiple levels of learning at different abstract levels through machine learning algorithms.
- CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
- a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional layer/pooling layer 220 (wherein the pooling layer is optional), and a fully connected layer 230 .
- the convolution layer/pooling layer 220 may include layers 221-226, for example: in one implementation, layer 221 is a convolution layer, layer 222 is a pooling layer, layer 223 is a convolution layer, layer 224 is a pooling layer, layer 225 is a convolution layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.
- the convolution layer 221 may include a plurality of convolution operators, which are also called kernels.
- the convolution operator is equivalent to a filter that extracts specific information from the input image matrix in image processing.
- the convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing convolution operations on the image, the weight matrix is usually moved across the input image in the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the work of extracting specific features from the image.
- the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
- the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix will produce a convolution output with a single depth dimension, but in most cases, a single weight matrix is not used, but multiple weight matrices of the same size (row ⁇ column), that is, multiple isotype matrices, are applied.
- the output of each weight matrix is stacked to form the depth dimension of the convolution image, and the dimension here can be understood as being determined by the "multiple" mentioned above.
- Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to blur unnecessary noise in the image.
- the multiple weight matrices have the same size (rows ⁇ columns), and the feature maps extracted by the multiple weight matrices of the same size are also the same size. Multiple extracted feature maps of the same size are merged to form the output of the convolution operation.
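- The sliding of weight matrices described above can be sketched as follows (an illustrative toy example, not the patent's implementation; the kernel values are invented). Each kernel is slid over the image with the given stride, and the per-kernel outputs are stacked to form the depth dimension:

```python
def conv2d(image, kernels, stride=1):
    """Slide each weight matrix (kernel) over the image; stack the
    per-kernel outputs to form the depth dimension of the result."""
    h, w = len(image), len(image[0])
    k = len(kernels[0])  # kernels are k x k
    out = []
    for kern in kernels:
        plane = []
        for i in range(0, h - k + 1, stride):
            row = []
            for j in range(0, w - k + 1, stride):
                row.append(sum(kern[a][b] * image[i + a][j + b]
                               for a in range(k) for b in range(k)))
            plane.append(row)
        out.append(plane)
    return out  # depth = number of kernels

image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
edge = [[1, 0], [0, -1]]             # toy "edge" kernel
blur = [[0.25, 0.25], [0.25, 0.25]]  # toy averaging kernel
maps = conv2d(image, [edge, blur], stride=1)
print(len(maps), len(maps[0]), len(maps[0][0]))  # depth 2, each map 3x3
```

Two same-size kernels yield two same-size feature maps, which together form the depth-2 output of the convolution, as the text describes.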
- weight values in these weight matrices need to be obtained through a lot of training in practical applications.
- the weight matrices formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
- the initial convolutional layer (for example, 221) often extracts more general features, which can also be called low-level features.
- the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.
- a convolution layer may be followed by a pooling layer, or multiple convolution layers may be followed by one or more pooling layers.
- the pooling layer may include an average pooling operator and/or a maximum pooling operator to sample the input image to obtain an image of smaller size.
- the average pooling operator may calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
- the maximum pooling operator may take the pixel with the largest value in the range within a specific range as the result of maximum pooling.
- the operator in the pooling layer should also be related to the image size.
- the size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
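- A minimal sketch of the pooling operators described above (illustrative only, not taken from the patent): max pooling keeps the largest value of each sub-region, average pooling keeps its mean, and either way the spatial size shrinks:

```python
def pool2d(image, size, mode="max"):
    """Downsample by taking the max (or average) over
    non-overlapping size x size sub-regions."""
    out = []
    for i in range(0, len(image), size):
        row = []
        for j in range(0, len(image[0]), size):
            block = [image[i + a][j + b]
                     for a in range(size) for b in range(size)]
            row.append(max(block) if mode == "max" else sum(block) / len(block))
        out.append(row)
    return out

img = [[1, 3, 2, 4],
       [5, 7, 6, 8],
       [9, 2, 1, 0],
       [3, 4, 5, 6]]
print(pool2d(img, 2, "max"))  # [[7, 8], [9, 6]]
print(pool2d(img, 2, "avg"))  # [[4.0, 5.0], [4.5, 3.0]]
```

Each output pixel corresponds to one 2 x 2 sub-region of the input, matching the description above.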
- After being processed by the convolution layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information. As mentioned above, the convolution layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output, or a group of outputs whose number equals the number of required classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 7), and the parameters contained in the multiple hidden layers can be pre-trained according to the relevant training data of the specific task type. For example, the task type may include image recognition, image classification, image super-resolution reconstruction, etc.
- After the multiple hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to the categorical cross entropy and is specifically used to calculate the prediction error.
- once the forward propagation of the entire convolutional neural network 200 (as shown in FIG. 7, the propagation from 210 to 240 is the forward propagation) is completed, the back propagation (as shown in FIG. 7, the propagation from 240 to 210 is the back propagation) begins to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
- the convolutional neural network 200 shown in Figure 7 is only an example of a convolutional neural network.
- the convolutional neural network may also exist in the form of other network models, for example, including only a part of the network structure shown in Figure 7.
- the convolutional neural network used in the embodiment of the present application may only include an input layer 210, a convolution layer/pooling layer 220 and an output layer 240.
- the convolutional neural network 100 shown in FIG. 7 is only an example of a convolutional neural network.
- the convolutional neural network can also exist in the form of other network models. For example, multiple convolutional layers/pooling layers are used in parallel as shown in FIG. 8, and the extracted features are input to the fully connected layer 230 for processing.
- a deep neural network (DNN) is also known as a multi-layer neural network.
- the layers of a DNN can be divided into three categories: input layer, hidden layer, and output layer.
- the first layer is the input layer
- the last layer is the output layer
- the layers in between are all hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- the coefficient from the kth neuron in the (L-1)th layer to the jth neuron in the Lth layer is defined as W_jk^L. It should be noted that the input layer has no W parameter.
- more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks.
- Training a deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by many layers of vector W).
- the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial model during the training process, so that the error loss of the model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial model are updated by back propagating the error loss information, so that the error loss converges.
- the back propagation algorithm is a back propagation movement dominated by error loss, aiming to obtain the optimal model parameters, such as the weight matrix.
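- The back propagation idea can be illustrated with a one-parameter toy example (invented for illustration; this is not the patent's training procedure): the squared-error loss is differentiated with respect to the weight, and the weight is moved against the gradient until the error loss converges:

```python
# One weight w, one input x, one target y; loss L = (w*x - y)^2.
# Back propagation gives dL/dw = 2*(w*x - y)*x, and the update moves
# w against the gradient so that the error loss becomes smaller.
def train(w, x, y, lr=0.1, steps=50):
    for _ in range(steps):
        err = w * x - y
        grad = 2 * err * x  # error loss propagated back to the weight
        w -= lr * grad      # gradient-descent update
    return w

w = train(w=0.0, x=1.0, y=3.0)
print(round(w, 3))  # converges toward 3.0
```

With many layers, the same gradient rule is applied layer by layer from the output back to the input, which is the back propagation movement described above.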
- Diffusion Models refer to defining a Markov chain of diffusion steps, gradually adding random noise to the data, and then learning the inverse diffusion process to construct the required data samples from the noise.
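- A toy sketch of the forward (noising) half of such a diffusion Markov chain (illustrative only; the noise schedule and values are invented): at each step a little Gaussian noise is mixed into the sample, so that after many steps the data is essentially noise, from which the learned inverse process would reconstruct samples:

```python
import math
import random

def forward_diffuse(x0, betas, rng=random.Random(0)):
    """One realization of the forward Markov chain: at each step t,
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    gradually replacing the data with Gaussian noise."""
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1)
             for v in x]
    return x

x0 = [1.0, -0.5, 2.0]  # toy "data sample"
xt = forward_diffuse(x0, betas=[0.02] * 100)
print(len(xt))  # same shape as x0, but mostly noise after 100 steps
```

The reverse (denoising) direction is what the model actually learns; only the fixed forward chain is sketched here.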
- Image Restoration refers to the process of removing the degraded components in low-quality images caused by various factors and restoring high-quality images with complete details.
- This application can be applied in practical scenarios such as image enhancement and restoration, terminal applications, and autonomous driving.
- Image restoration refers to the use of technical means to remove the degradation components in these low-quality pictures, thereby restoring clear and high-quality pictures.
- the present application provides an image processing method, which can be a feedforward process of model training or an inference process.
- FIG. 9 is an image processing method provided by an embodiment of the present application. As shown in FIG. 9 , the image processing method provided by the present application includes:
- the execution subject of step 901 may be a terminal device, and the terminal device may be a portable mobile device, such as but not limited to a mobile or portable computing device (such as a smart phone), a personal computer, a server computer, a handheld device (such as a tablet), or a laptop.
- such computers or devices include laptops, multiprocessor systems, game consoles or controllers, microprocessor-based systems, set-top boxes, programmable consumer electronics products, mobile phones, mobile computing and/or communication devices with wearable or accessory form factors (e.g., watches, glasses, headphones or earbuds), network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
- the execution entity of step 901 may also be a server on the cloud side, and the server may receive the first image sent from the terminal device, and then the server may obtain the first image.
- the first image may be a low-quality image, for example an image occluded by the natural environment (such as raindrops), an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
- the first image may be converted into the frequency domain by a second-order wavelet transform.
- the spatial domain RGB low-quality image Xd can be transformed by a second-order Haar wavelet transform to obtain the image xd in the wavelet domain.
- the image size is changed from H×W×3 to (H/4)×(W/4)×48. This reduces the spatial resolution by a factor of 16, which can speed up processing.
- the diffusion model is introduced from the spatial domain to the wavelet domain using wavelet transform, which can significantly reduce the image processing time (the model only needs to learn part of the spectrum of the image, which is relatively simpler. At the same time, due to the reduction in spatial resolution, the model takes less time to process the image).
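As an illustration of the shape change described above, a two-level Haar transform can be sketched in NumPy as follows; the normalization and sub-band ordering here are assumptions, and the actual transform used by the embodiment may differ:

```python
import numpy as np

def haar_dwt_level(x):
    """One level of a 2-D Haar transform, applied per channel.
    (H, W, C) -> (H/2, W/2, 4C): the LL, LH, HL, HH sub-bands are
    stacked along the channel axis."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]   # even/odd rows and columns
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0             # low-frequency (average)
    lh = (a + b - c - d) / 2.0             # horizontal detail
    hl = (a - b + c - d) / 2.0             # vertical detail
    hh = (a - b - c + d) / 2.0             # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=-1)

img = np.random.rand(256, 256, 3)   # spatial-domain RGB image, H x W x 3
x1 = haar_dwt_level(img)            # (128, 128, 12) after the first level
x2 = haar_dwt_level(x1)             # (64, 64, 48) after the second level
```

The number of spatial positions drops by a factor of 16 (4 in each dimension) while the channel count grows from 3 to 48, so no information is discarded; the model simply works on a smaller spatial grid.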
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
- the first high-frequency information can be determined based on the first data through a pre-trained second network.
- the second network can be composed of multiple (e.g., 14) convolutional layers with residual structures. Its main function is to learn the difference between the high-frequency spectrum of the low-quality image and the high-frequency spectrum of its corresponding clear image, so as to predict the high-frequency spectrum of the low-quality image after restoration.
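A network of the kind described, convolutional layers with residual structures mapping the wavelet spectrum of the low-quality image to a prediction of the restored high-frequency spectrum, could be sketched in PyTorch as follows; the channel counts, width, and block count are illustrative assumptions, not the embodiment's exact configuration:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)  # residual connection

class HighFreqRefiner(nn.Module):
    """Maps the full wavelet spectrum of the low-quality image to a
    prediction of the high-frequency sub-bands of the restored image
    (48 input channels and 45 high-frequency channels are assumptions)."""
    def __init__(self, in_ch=48, hf_ch=45, width=64, n_blocks=6):
        super().__init__()
        self.head = nn.Conv2d(in_ch, width, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(width, hf_ch, 3, padding=1)

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))

xd = torch.randn(1, 48, 64, 64)   # wavelet-domain low-quality image
hf_pred = HighFreqRefiner()(xd)   # predicted high-frequency spectrum
```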
- the high-frequency spectrum of the image in the embodiment of the present application (or the information corresponding to the high-frequency channel) is relative to the low-frequency spectrum of the image (or the information corresponding to the low-frequency channel).
- the frequency corresponding to the high-frequency spectrum is higher than the frequency of the low-frequency spectrum.
- the high-quality image corresponding to the first image may include information of multiple channels, wherein the multiple channels may include a high-frequency channel and a low-frequency channel relative to the high-frequency channel.
- obtaining first low-frequency information, where the first low-frequency information includes noise in a low-frequency channel of the high-quality image.
- step 904 and the subsequent step 905 may be an iterative process, and the result obtained in step 905 may be used as the first low-frequency information obtained in the next step 904 .
- if step 904 is the first iteration, the first low-frequency information may be randomly generated noise (e.g., Gaussian white noise). If step 904 is the i-th (i is greater than 1) iteration, the first low-frequency information may be the denoised low-frequency information output by the previous iteration (i.e., the result of the preceding step 905).
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information.
- the first high-frequency information and the first low-frequency information may be input into a first network, and the first network is a pre-trained network, and the first noise information may be obtained according to the first high-frequency information and the first low-frequency information.
- the first data may also be input into the first network, that is, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; at the i+1-th iteration, obtaining the second noise information through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information.
- the purpose of this network is to estimate, step by step, the noise that needs to be removed from this noisy input at each moment.
- the noise in the image is removed until it becomes a low-frequency spectrum of a clear image.
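The stepwise removal of noise described above follows the standard diffusion (DDPM-style) reverse process; a minimal sketch, with the conditioned noise-estimation network replaced by a zero placeholder and an assumed linear noise schedule, might look like:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)
rng = np.random.default_rng(0)

def reverse_step(x_t, eps_pred, t):
    """One denoising step x_t -> x_{t-1}: subtract the scaled noise
    estimate (the DDPM posterior mean), then add fresh noise except
    at the final step."""
    coef = betas[t] / np.sqrt(1.0 - alphas_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

x = rng.standard_normal((3, 64, 64))   # start from Gaussian white noise
for t in range(T - 1, -1, -1):
    eps_pred = np.zeros_like(x)        # placeholder for the conditioned U-Net output
    x = reverse_step(x, eps_pred, t)
```

In the actual method, `eps_pred` would be produced by the first network conditioned on the predicted high-frequency spectrum (and optionally the first data), so that the iterate converges to the low-frequency spectrum of a clear image.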
- the second low-frequency information can be mapped through target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item; the target low-frequency information and the first high-frequency information are used for fusion (for example, splicing) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
- the whole sampling process uses skip sampling with a quantization interval S as the span to reduce the number of sampling steps from T to T/S.
- the embodiment of the present application further explores a high-efficiency conditional sampling algorithm, which can directly predict the original image at the middle moment M in the sampling process, that is, there is no need to go through the entire DIS process.
- the number of sampling steps is (T-M)/S.
- in the flow of the sampling method, M can be a preset proportion of T (for example, 80% of T).
- the formula for obtaining Xt-1 as follows is the corresponding implementation of denoising the first low-frequency information according to the first noise information, and the formula for obtaining X0 is the corresponding implementation of the target mapping.
- the total number of sampling steps can be greatly reduced (for example, reduced to about 1/5 of the original), thereby improving the sampling efficiency.
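The step count under skip sampling with a direct prediction at the intermediate moment M can be verified with a short calculation (T, S, and the 80% proportion are illustrative values taken from the examples above, not fixed by the method):

```python
T, S = 1000, 5                 # total diffusion steps and skip span (example values)
M = int(0.8 * T)               # intermediate moment, e.g. a preset 80% of T
# skip sampling visits only every S-th step, from T down to M; at step M the
# original image x0 is predicted directly from x_M and the noise estimate, i.e.
#   x0 = (x_M - sqrt(1 - alpha_bar_M) * eps) / sqrt(alpha_bar_M)
timesteps = list(range(T - 1, M - 1, -S))
num_steps = len(timesteps)     # (T - M) / S = 40 steps instead of T = 1000
```

With these example values the sampler runs 40 network evaluations rather than 1000, consistent with the claimed reduction in total sampling time.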
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- the low-frequency spectrum of the clear image is restored from the Gaussian white noise; it is then fused with the high-frequency spectrum of the clear image predicted by the HFRM, and after a second-order inverse Haar wavelet transform, the spatial-domain restoration result of the low-quality image Xd is obtained.
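The fusion and return to the spatial domain rely on a second-order inverse Haar transform; a self-contained NumPy round-trip sketch (with an assumed orthonormal normalization and sub-band ordering) shows that the inverse reconstructs the image exactly:

```python
import numpy as np

def haar_dwt_level(x):
    """One forward Haar level, (H, W, C) -> (H/2, W/2, 4C)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return np.concatenate([(a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
                           (a - b + c - d) / 2.0, (a - b - c + d) / 2.0], axis=-1)

def haar_idwt_level(y):
    """Inverse of one Haar level: (H, W, 4C) sub-bands -> (2H, 2W, C)."""
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    a = (ll + lh + hl + hh) / 2.0
    b = (ll + lh - hl - hh) / 2.0
    c = (ll - lh + hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    out = np.empty((2 * y.shape[0], 2 * y.shape[1], y.shape[2] // 4))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = c; out[1::2, 1::2] = d
    return out

img = np.random.rand(64, 64, 3)
spec = haar_dwt_level(haar_dwt_level(img))    # second-order forward transform
rec = haar_idwt_level(haar_idwt_level(spec))  # second-order inverse transform
err = np.abs(rec - img).max()                 # reconstruction error
```

In the method itself, the restored low-frequency channels and the HFRM's predicted high-frequency channels would be concatenated into `spec` before the inverse transform.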
- the present application provides an image processing method, the method comprising: obtaining a first image; converting the first image to a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data; the first high-frequency information is the information prediction result of the high-frequency channel of the high-quality image corresponding to the first image; obtaining first low-frequency information, the first low-frequency information containing the noise of the low-frequency channel of the high-quality image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; the second low-frequency information and the first high-frequency information are used to obtain a second image.
- converting the image to the frequency domain for image restoration can avoid the use of image segmentation (segmentation needs to be processed separately and then merged, boundary artifacts may occur, and when the image size is large, the number of segments is too large, resulting in a long processing time), thereby improving the restoration quality and reducing the processing time.
- image restoration based on the noise has a higher image quality (more details can be restored, while significantly reducing the total sampling time).
- FIG. 10 is a model training method provided in an embodiment of the present application. As shown in FIG. 10 , the model training method provided in the present application includes:
- the first image may be a low-quality image, for example an image occluded by the natural environment (such as raindrops), an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
- the second image may be a high-quality image corresponding to the first image, for example, an image obtained by removing raindrops from the first image, solving underexposure problems (for example, enhancing dark light photos to natural light levels), or removing moiré patterns.
- converting the first image and the second image into the frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data.
- the first image and the second image may be converted into a frequency domain by a second-order wavelet transform.
- the spatial domain RGB low-quality image Xd and the corresponding clear image X0 can be transformed by the second-order Haar wavelet transform to obtain the wavelet domain images xd and x0 .
- the image size is changed from H×W×3 to (H/4)×(W/4)×48. This reduces the spatial resolution by a factor of 16, speeding up processing.
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
- the first high-frequency information can be determined according to the first data through a second network; the second network is a pre-trained network.
- the frequency domain data corresponding to the low-quality image can be input into the second network to predict the information of the high-frequency channels of the high-quality image corresponding to the low-quality image, and the real information of the high-frequency channels of the high-quality image corresponding to the low-quality image can be obtained.
- the loss is constructed to update the second network, so that the second network has the ability to predict the information of the high-frequency channels of the high-quality image corresponding to the low-quality image based on the frequency domain data corresponding to the low-quality image.
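The training procedure just described (predict the clean high-frequency channels from the low-quality spectrum, construct a loss against the real channels, update the second network) reduces to an ordinary supervised loop; a toy PyTorch sketch with an assumed stand-in network, an assumed L1 loss, and random data:

```python
import torch
import torch.nn as nn

# stand-in for the high-frequency prediction network (architecture assumed)
model = nn.Sequential(nn.Conv2d(48, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 45, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()                   # the exact loss is not specified by the text

xd = torch.randn(2, 48, 16, 16)         # wavelet spectra of low-quality images
hf_true = torch.randn(2, 45, 16, 16)    # real high-frequency channels of the clean images
for _ in range(2):                      # a couple of illustrative update steps
    loss = loss_fn(model(xd), hf_true)  # compare prediction with the real channels
    opt.zero_grad(); loss.backward(); opt.step()
```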
- obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to determine a first loss together with second noise information, and the second noise information is randomly generated noise.
- the first noise information may be obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; noise may also be superimposed on the first low-frequency information to obtain third low-frequency information; at the (i+1)-th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine a second loss together with fourth noise information; the fourth noise information is randomly generated noise; and the first network updated according to the first loss is further updated according to the second loss.
- the low-frequency spectrum of the high-quality image x0 after wavelet transformation is first added with different degrees of Gaussian white noise at different times t, and then sent to the noise estimation network.
- the noise estimation network is a classic U-network structure, and its purpose is to correctly estimate the noise superimposed on the low-frequency spectrum of the high-quality image x0 at each time.
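Adding time-dependent Gaussian noise to the clean low-frequency spectrum follows the standard forward diffusion rule x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε; a sketch with an assumed linear noise schedule (the embodiment's actual schedule is not specified):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)

x0_low = rng.standard_normal((3, 64, 64))   # low-frequency spectrum of the clean image
t = int(rng.integers(0, T))                 # random training timestep
eps = rng.standard_normal(x0_low.shape)     # the noise the U-Net must estimate
x_t = np.sqrt(alphas_bar[t]) * x0_low + np.sqrt(1.0 - alphas_bar[t]) * eps
# training loss (not computed here): ||U-Net(x_t, t, conditions) - eps||^2
```

Larger t gives a larger 1 − ᾱ_t, i.e. "different degrees of Gaussian white noise at different times t" as described above.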
- the image is converted to the frequency domain for image restoration, which can avoid the use of image segmentation (the segments need to be processed separately and then merged, which may cause boundary artifacts, and when the image size is large, the number of segments is too large, resulting in a long processing time), thereby improving the restoration quality and reducing the processing time.
- the image restoration based on the noise has a higher image quality (it can restore more details and significantly reduce the total sampling time).
- Tables 1 to 4 respectively show the performance of the embodiment of the present application (WaveDM) and existing methods on the image raindrop removal dataset (RainDrop), the defocus blur removal dataset (DPDD), the demoiréing dataset (London's Buildings), and the dark-light enhancement dataset (LOL-v1).
- the evaluation indicators are PSNR, SSIM, and recovery time (Time). It can be seen from the tables that the present application achieves the best results on both quality indicators, while its speed is comparable.
- FIG. 12B shows an architecture diagram of an embodiment of the present application, including a training framework and a sampling framework, which are mainly composed of wavelet transform and spectrum separation, a high-frequency fine-tuning module, a noise estimation network, a high-efficiency sampling algorithm, and an inverse wavelet transform.
- the functions of each part are described as follows:
- Wavelet transform: uses a specific wavelet to transform the image from the spatial domain to the wavelet domain to obtain the wavelet spectrum of the image;
- High-frequency fine-tuning module: restores the high-frequency spectrum corresponding to the clear image from the high-frequency spectrum of the low-quality image;
- Noise estimation network: using the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation as conditions, iteratively restores the low-frequency spectrum of the high-quality image from Gaussian white noise;
- High-efficiency conditional sampling algorithm: using the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation as conditions, directly predicts the high-quality image at an intermediate sampling step, thereby reducing the number of sampling steps;
- Inverse wavelet transform: combines the low-frequency spectrum of the high-quality image output by the noise estimation network with the high-frequency spectrum of the high-quality image output by the high-frequency fine-tuning module, and performs a specific inverse wavelet transform to obtain a clear spatial-domain RGB high-quality image.
- an image processing device 1300 provided by an embodiment of the present application includes:
- An acquisition module 1301 is used to acquire a first image
- the specific description of the acquisition module 1301 can refer to the description of step 901 in the above embodiment, which will not be repeated here.
- a processing module 1302 is used to convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- the first low-frequency information includes noise of a low-frequency channel of the high-quality image
- first noise information is obtained through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module 1302 can refer to the description of step 902 to step 905 in the above embodiment, which will not be repeated here.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; and the processing module is further used to:
- second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
- the second low-frequency information and the first high-frequency information are used to obtain a second image, including:
- the third low-frequency information and the first high-frequency information are used to obtain a second image.
- processing module is further configured to:
- the second low-frequency information is subjected to target mapping to obtain target low-frequency information; the target mapping does not include a noise estimation item;
- the processing module is specifically used for:
- the target low-frequency information and the first high-frequency information are used to fuse to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- the first low-frequency information is randomly generated noise.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network.
- the processing module is specifically configured to:
- the first image is converted into the frequency domain by second-order wavelet transform.
- the embodiment of the present application further provides a model training device (which may correspond to the model training method of FIG. 10 ), the device comprising:
- An acquisition module used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;
- a processing module configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;
- the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image
- first noise information is obtained through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to determine a first loss together with second noise information; the second noise information is randomly generated noise;
- the first network is updated according to the first loss.
- the processing module is specifically configured to:
- First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:
- third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;
- the first network updated according to the first loss is further updated according to the second loss.
- the processing module is specifically configured to:
- first high-frequency information is determined through a second network; the second network is a pre-trained network.
- the processing module is specifically configured to:
- the first image and the second image are converted into the frequency domain by second-order wavelet transform.
- FIG 14 is a structural schematic diagram of an execution device provided in an embodiment of the present application.
- the execution device 1400 can be specifically expressed as a mobile phone, a tablet, a laptop computer, an intelligent wearable device, a server, etc., which is not limited here.
- the execution device 1400 implements the function of the image processing method in the corresponding embodiment of FIG. 9 .
- the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403 and a memory 1404 (wherein the number of processors 1403 in the execution device 1400 can be one or more), wherein the processor 1403 may include an application processor 14031 and a communication processor 14032.
- the receiver 1401, the transmitter 1402, the processor 1403 and the memory 1404 may be connected via a bus or other means.
- the memory 1404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include a non-volatile random access memory (NVRAM). The memory 1404 stores operating instructions, executable modules or data structures, or subsets thereof, or extended sets thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
- the processor 1403 controls the operation of the execution device.
- the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc.
- for the sake of clarity, the various buses are all referred to as the bus system in the figure.
- the method disclosed in the above embodiment of the present application can be applied to the processor 1403, or implemented by the processor 1403.
- the processor 1403 can be an integrated circuit chip with signal processing capabilities.
- each step of the above method can be completed by the hardware integrated logic circuit or software instructions in the processor 1403.
- the above processor 1403 can be a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, and a vision processor (vision processing unit, VPU), a tensor processor (tensor processing unit, TPU) and other processors suitable for AI computing, and can further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field-programmable gate array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
- the processor 1403 can implement or execute the disclosed methods, steps and logic block diagrams in the embodiments of the present application.
- the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
- the steps of the method disclosed in the embodiment of the present application can be directly embodied as being executed by a hardware decoding processor, or being executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the art such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404, and completes the steps 901 to 905 in the above embodiment in combination with its hardware.
- the receiver 1401 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device.
- the transmitter 1402 can be used to output digital or character information through the first interface; the transmitter 1402 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1402 can also include a display device such as a display screen.
- FIG. 15 is a structural diagram of the training device provided by the embodiment of the present application, specifically, the training device 1500 is implemented by one or more servers, and the training device 1500 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1515 (for example, one or more processors) and memory 1532, one or more storage media 1530 (for example, one or more mass storage devices) storing application programs 1542 or data 1544.
- the memory 1532 and the storage medium 1530 can be short-term storage or permanent storage.
- the program stored in the storage medium 1530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device.
- the central processor 1515 can be configured to communicate with the storage medium 1530 to execute a series of instruction operations in the storage medium 1530 on the training device 1500.
- the training device 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input and output interfaces 1558; or, one or more operating systems 1541, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
- the embodiment of the present application also provides a computer program product including computer-readable instructions, which, when executed on a computer, enables the computer to execute the steps executed by the aforementioned execution device, or enables the computer to execute the steps executed by the aforementioned training device.
- a computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing.
- when the computer-readable storage medium is run on a computer, it enables the computer to execute the steps executed by the aforementioned execution device, or enables the computer to execute the steps executed by the aforementioned training device.
- the execution device, training device or terminal device provided in the embodiments of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc.
- the processing unit may execute the computer execution instructions stored in the storage unit, so that the chip in the execution device executes the model training method described in the above embodiment, or so that the chip in the training device executes the steps related to model training in the above embodiment.
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
- FIG. 16 is a schematic diagram of a structure of a chip provided in an embodiment of the present application.
- the chip may be embodied as a neural network processor NPU 1600. The NPU 1600 is mounted on the host CPU (Host CPU) as a coprocessor, and the host CPU assigns tasks.
- the core part of the NPU is the operation circuit 1603; the controller 1604 controls the operation circuit 1603 to extract matrix data from the memory and perform multiplication operations.
- the operation circuit 1603 includes multiple processing units (Process Engine, PE) inside.
- the operation circuit 1603 is a two-dimensional systolic array.
- the operation circuit 1603 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
- the operation circuit 1603 is a general-purpose matrix processor.
- the operation circuit takes the corresponding data of matrix B from the weight memory 1602 and caches it on each PE in the operation circuit.
- the operation circuit takes the matrix A data from the input memory 1601 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1608.
- the unified memory 1606 is used to store input data and output data.
- the weight data is directly transferred to the weight memory 1602 through the direct memory access controller (DMAC) 1605.
- the input data is also transferred to the unified memory 1606 through the DMAC.
- BIU stands for Bus Interface Unit, that is, the bus interface unit 1610, which is used for the interaction between AXI bus and DMAC and instruction fetch buffer (IFB) 1609.
- the bus interface unit 1610 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1609 to obtain instructions from the external memory, and is also used for the storage unit access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606 or to transfer weight data to the weight memory 1602 or to transfer input data to the input memory 1601.
- the vector calculation unit 1607 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.
- the vector calculation unit 1607 can store the processed output vector to the unified memory 1606.
- the vector calculation unit 1607 can apply a linear function or a nonlinear function to the output of the operation circuit 1603, for example, performing linear interpolation on the feature planes extracted by the convolution layers, or, for example, accumulating vectors of values to generate activation values.
- the vector calculation unit 1607 generates a normalized value, a pixel-level summed value, or both.
- the processed output vector can be used as an activation input to the operation circuit 1603, for example, for use in a subsequent layer in a neural network.
- An instruction fetch buffer 1609 connected to the controller 1604 is used to store instructions used by the controller 1604;
- The unified memory 1606, the input memory 1601, the weight memory 1602 and the instruction fetch memory 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.
- The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the above methods.
- The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- The technical solution of the present application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
- all or part of the embodiments may be implemented by software, hardware, firmware, or any combination thereof.
- the present invention may be fully or partially implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from a website site, a computer, a training device, or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, training device, or data center.
- The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a training device or a data center that integrates one or more available media.
- The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
Abstract
An image processing method, which can be applied to the field of artificial intelligence. The method comprises: acquiring a first image; converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image; determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, the first low-frequency information comprising noise of a low-frequency channel of the high-quality image; and according to the first high-frequency information and the first low-frequency information, obtaining first noise information by means of a first network, the first noise information being used for denoising the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information being used for obtaining a second image. The present application can improve image restoration quality, and reduce processing time.
Description
This application claims priority to Chinese patent application No. 202310233422.2, entitled "An image processing method and related device", filed with the State Intellectual Property Office on February 28, 2023, the entire contents of which are incorporated herein by reference.
The present application relates to the field of artificial intelligence, and in particular to an image processing method and a related device.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
In real life, due to the influence of the environment and photographic technology, captured pictures often contain degradation components such as raindrops, blur, noise, and moiré, which reduce their quality. Image restoration refers to removing the degradation components from these low-quality pictures by technical means, thereby recovering clear, high-quality pictures.
At present, there are already many methods in the industry for image restoration targeting single or multiple tasks. Early methods were mostly traditional methods based on statistical priors. Due to their limitations, these traditional methods cannot remove degradation components well and may produce color artifacts. In recent years, the academic community has proposed methods based on deep learning. Most of these methods use a CNN or a Transformer to directly predict the corresponding clear image from the blurred image through end-to-end training, which often requires a large amount of training data.
However, the image restoration techniques in the prior art have low processing accuracy.
Summary of the invention
The present application provides an image processing method, which can improve image restoration quality and reduce processing time.
In a first aspect, an embodiment of the present application provides an image processing method, the method comprising: acquiring a first image; converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image; determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, the first low-frequency information containing noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information being used to obtain a second image.
In the embodiments of the present application, the image is converted to the frequency domain for restoration, which avoids image tiling (tiles must be processed separately and then merged, which may cause boundary artifacts; and when the image is large, the number of tiles is excessive, resulting in long processing time), thereby improving restoration quality and reducing processing time. In addition, the noise is predicted from the high-frequency information and the noise-containing low-frequency information; image restoration based on this predicted noise yields higher image quality (more details can be recovered, while the total sampling time is significantly reduced).
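The inference flow described above can be illustrated end to end. The sketch below is minimal and self-contained under stated assumptions: `haar2` is a single-level Haar transform standing in for the frequency-domain conversion, `predict_high_freq` and `predict_noise` are hypothetical placeholders for the learned second and first networks, and the denoising update is a simplified iterative step rather than the exact sampling schedule of the method.

```python
import numpy as np

def haar2(x):
    """Single-level 2-D Haar transform: low-frequency band (LL) plus the
    stacked high-frequency bands (LH, HL, HH), each at half resolution."""
    a, b, c, d = x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    hf = np.stack([(a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2])
    return ll, hf

def ihaar2(ll, hf):
    """Inverse of haar2: fuse the bands and map back to the spatial domain."""
    lh, hl, hh = hf
    x = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def restore(first_image, predict_high_freq, predict_noise, steps=5):
    first_data = haar2(first_image)              # first data: frequency domain, lower resolution
    hf = predict_high_freq(first_data)           # first high-frequency information (prediction)
    lf = np.random.randn(*first_data[0].shape)   # first low-frequency information: pure noise
    for t in range(steps, 0, -1):                # iterative denoising
        noise = predict_noise(hf, first_data, lf, t)
        lf = lf - noise / steps                  # simplified update; the real schedule differs
    return ihaar2(lf, hf)                        # fuse and map back to the spatial domain
```

With stub networks (e.g., a `predict_noise` that pulls the low-frequency band toward the degraded image's LL band), `restore` already produces an output of the same size as the input; in the actual method both networks are trained models.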
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information includes: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. The method further includes: at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information. The first low-frequency information and the first high-frequency information being used to obtain the second image includes: the third low-frequency information and the first high-frequency information being used to obtain the second image.
In a possible implementation, the method further includes: passing the second low-frequency information through a target mapping to obtain target low-frequency information, the target mapping not including a noise estimation term. The second low-frequency information and the first high-frequency information being used to obtain the second image includes: the target low-frequency information and the first high-frequency information being used for fusion to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the second low-frequency information can be passed through a target mapping to obtain target low-frequency information, where the target mapping does not include a noise estimation term; the target low-frequency information and the first high-frequency information are used for fusion (for example, concatenation) to obtain a fusion result, and the second image is obtained by mapping the fusion result to the spatial domain (for example, through an inverse wavelet transform).
In this way, the total number of sampling steps can be greatly reduced (for example, to about one fifth of the original), thereby improving sampling efficiency.
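A noise-free target mapping of this kind can be illustrated with a deterministic update that jumps from a partially denoised sample directly to the clean estimate, with the stochastic noise term dropped. The sketch below uses a generic linear beta schedule purely for illustration; the actual schedule and mapping used by the method are not specified here.

```python
import numpy as np

def make_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an illustrative linear schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def target_mapping(lf_t, eps_pred, alpha_bar_t):
    """Map a noisy low-frequency band and the predicted noise directly to the
    clean estimate; no noise (stochastic) term is added back, so the final
    sampling step becomes deterministic."""
    return (lf_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

If the noise prediction is exact, this mapping recovers the clean band in a single step, which is why dropping the noise term can cut the number of sampling steps.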
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the determining of the first high-frequency information according to the first data includes: determining the first high-frequency information through a second network according to the first data.
In a possible implementation, the converting of the first image to the frequency domain includes: converting the first image to the frequency domain through a second-order wavelet transform.
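A second-order (two-level) wavelet transform reduces the spatial resolution by a factor of four while preserving all of the image's information in additional channels. A minimal Haar version is sketched below; Haar is an illustrative choice, and the description does not mandate a specific wavelet basis.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar transform: four half-resolution sub-bands."""
    a, b, c, d = x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]
    return [(a + b + c + d) / 2,   # LL: low-frequency approximation
            (a - b + c - d) / 2,   # LH: horizontal detail
            (a + b - c - d) / 2,   # HL: vertical detail
            (a - b - c + d) / 2]   # HH: diagonal detail

def second_order_dwt2(x):
    """Apply the Haar transform twice (here to every sub-band): an H x W image
    becomes 16 sub-bands of size H/4 x W/4, with no information lost."""
    return [sb2 for sb in haar_dwt2(x) for sb2 in haar_dwt2(sb)]
```

Because the transform is orthonormal, the total signal energy is preserved across the 16 sub-bands, and the lower-resolution representation can be processed without tiling the original image.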
In a second aspect, the present application provides a model training method, the method comprising:
acquiring a first image and a second image, the first image and the second image being collected for the same scene, and the second image being a high-quality image corresponding to the first image;
converting the first image and the second image to a frequency domain to obtain first data and second data respectively, the spatial resolution of the first data being lower than that of the first image, the spatial resolution of the second data being lower than that of the second image, the second data including first low-frequency information, and the first low-frequency information being information of a low-frequency channel in the second data;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to determine a first loss together with second noise information, the second noise information being randomly generated noise; and
updating the first network according to the first loss.
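One training step of the kind described above can be sketched as follows. This is an illustration under assumptions: `predict_noise` is a placeholder for the learned first network, the forward noising uses a generic diffusion-style schedule, and the loss is taken as a mean squared error between the predicted noise and the randomly generated noise, which is one common choice rather than a detail stated in the text.

```python
import numpy as np

def training_step(lf_clean, hf_pred, predict_noise, alpha_bar, rng):
    """One sketched training step: noise the clean low-frequency band of the
    high-quality image, let the first network predict that noise conditioned
    on the predicted high-frequency information, and score the prediction
    against the randomly generated noise (the 'second noise information')."""
    t = rng.integers(len(alpha_bar))              # random noising step
    eps = rng.standard_normal(lf_clean.shape)     # second noise information
    ab = alpha_bar[t]
    lf_noisy = np.sqrt(ab) * lf_clean + np.sqrt(1 - ab) * eps  # noised low-frequency band
    eps_pred = predict_noise(hf_pred, lf_noisy, t)             # first noise information
    loss = np.mean((eps_pred - eps) ** 2)         # first loss (MSE, illustrative)
    return loss
```

The returned loss would then drive a gradient update of the first network; the high-frequency prediction network can be trained or pre-trained separately, as the description notes.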
In the embodiments of the present application, the image is converted to the frequency domain for restoration, which avoids image tiling (tiles must be processed separately and then merged, which may cause boundary artifacts; and when the image is large, the number of tiles is excessive, resulting in long processing time), thereby improving restoration quality and reducing processing time. In addition, the noise is predicted from the high-frequency information and the noise-containing low-frequency information; image restoration based on this predicted noise yields higher image quality (more details can be recovered, while the total sampling time is significantly reduced).
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information includes:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer. The method further includes:
superimposing noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used to determine a second loss together with fourth noise information, the fourth noise information being randomly generated noise; and
updating the updated first network according to the second loss.
In a possible implementation, the determining of the first high-frequency information according to the first data includes:
determining the first high-frequency information through a second network according to the first data, the second network being a pre-trained network.
In a possible implementation, the converting of the first image and the second image to the frequency domain includes:
converting the first image and the second image to the frequency domain through a second-order wavelet transform.
In a third aspect, the present application provides an image processing device, the device comprising:
an acquisition module, used for acquiring a first image; and
a processing module, used for converting the first image to a frequency domain to obtain first data, the spatial resolution of the first data being lower than that of the first image;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
acquiring first low-frequency information, the first low-frequency information containing noise of a low-frequency channel of the high-quality image; and
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to denoise the first low-frequency information to obtain second low-frequency information;
wherein the second low-frequency information and the first high-frequency information are used to obtain a second image.
In a possible implementation, the processing module is specifically used for:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; the processing module is further used for:
at the (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, the second noise information being used to denoise the second low-frequency information to obtain third low-frequency information;
wherein the first low-frequency information and the first high-frequency information being used to obtain the second image includes:
the third low-frequency information and the first high-frequency information being used to obtain the second image.
In a possible implementation, the processing module is further used for:
passing the second low-frequency information through a target mapping to obtain target low-frequency information, the target mapping not including a noise estimation term;
the processing module being specifically used for:
fusing the target low-frequency information and the first high-frequency information to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the processing module is specifically used for:
determining the first high-frequency information through a second network according to the first data.
In a possible implementation, the processing module is specifically used for:
converting the first image to the frequency domain through a second-order wavelet transform.
In a fourth aspect, the present application provides a model training device, the device comprising:
an acquisition module, used for acquiring a first image and a second image, the first image and the second image being collected for the same scene, and the second image being a high-quality image corresponding to the first image; and
a processing module, used for converting the first image and the second image to a frequency domain to obtain first data and second data respectively, the spatial resolution of the first data being lower than that of the first image, the spatial resolution of the second data being lower than that of the second image, the second data including first low-frequency information, and the first low-frequency information being information of a low-frequency channel in the second data;
determining first high-frequency information according to the first data, the first high-frequency information being an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;
obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used to determine a first loss together with second noise information, the second noise information being randomly generated noise; and
updating the first network according to the first loss.
In a possible implementation, the processing module is specifically used for:
obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
In a possible implementation, the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the processing module is further used for:
superimposing noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used to determine a second loss together with fourth noise information, the fourth noise information being randomly generated noise; and
updating the updated first network according to the second loss.
In a possible implementation, the processing module is specifically used for:
determining the first high-frequency information through a second network according to the first data, the second network being a pre-trained network.
In a possible implementation, the processing module is specifically used for:
converting the first image and the second image to the frequency domain through a second-order wavelet transform.
In a fifth aspect, an embodiment of the present application provides an image processing device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory to perform any optional method of the first aspect above.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is run on a computer, the computer is caused to execute the first aspect above and any optional method thereof.
In a seventh aspect, an embodiment of the present application provides a computer program product, including code, which, when executed, is used to implement the first aspect above and any optional method thereof.
In an eighth aspect, the present application provides a chip system, the chip system including a processor for supporting an image processing device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods. In a possible design, the chip system further includes a memory, the memory being used to store program instructions and data necessary for the execution device or the training device. The chip system may be composed of a chip, or may include a chip and other discrete devices.
FIG. 1A is a schematic diagram of a structure of the artificial intelligence main framework;
FIG. 1B and FIG. 2 are schematic diagrams of the application system framework of the present invention;
FIG. 3 is a schematic diagram of an optional hardware structure of a terminal;
FIG. 4 is a schematic diagram of the structure of a server;
FIG. 5 is a schematic diagram of a system architecture of the present application;
FIG. 6 is a flow of a cloud service;
FIG. 7 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
FIG. 8 is a schematic diagram of the structure of a neural network model in an embodiment of the present application;
FIG. 9 is a schematic flow diagram of an image processing method;
FIG. 10 is a schematic diagram of an image processing method;
FIG. 11 is a schematic diagram of an image processing method;
FIG. 12A is a schematic diagram of a beneficial effect;
FIG. 12B is a schematic diagram of an architecture;
FIG. 13 is a schematic diagram of the structure of an image processing device provided in an embodiment of the present application;
FIG. 14 is a schematic diagram of an execution device provided in an embodiment of the present application;
FIG. 15 is a schematic diagram of a training device provided in an embodiment of the present application;
FIG. 16 is a schematic diagram of a chip provided in an embodiment of the present application.
The following describes the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The terms used in the implementation part of the present invention are only used to explain specific embodiments of the present invention and are not intended to limit the present invention.
The embodiments of the present application are described below with reference to the accompanying drawings. A person of ordinary skill in the art knows that, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner adopted in the embodiments of the present application to distinguish objects with the same attributes when describing them. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units that are not clearly listed or that are inherent to the process, method, product, or device.
First, the overall workflow of the artificial intelligence system is described. Refer to FIG. 1A, which shows a schematic diagram of a structure of the artificial intelligence main framework. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementation) of artificial intelligence to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. The infrastructure communicates with the outside world through sensors; its computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes related platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to smart chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet-of-Things data of conventional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning may perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and performing machine thinking and problem solving by using formalized information according to a reasoning control strategy. Its typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning has been performed on intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data has undergone the data processing mentioned above, some general capabilities may be further formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Smart products and industry applications
Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include smart terminals, smart transportation, smart healthcare, autonomous driving, smart cities, and the like.
The embodiments of the present application can be applied to tasks related to image processing, for example, in fields such as image enhancement.
The application scenarios of the present application are first introduced. The present application can be applied to, but is not limited to, applications with image processing functions, cloud services provided by cloud-side servers, and the like, which are introduced separately below:
1. Image processing applications
The product form of the embodiments of the present application may be an image processing application. The image processing application may run on a terminal device or on a cloud-side server.
In a possible implementation, the image processing application may implement image processing tasks, or tasks based on image processing results.
In a possible implementation, a user may open an application with image processing functions installed on a terminal device. The application may obtain image data captured by a camera or image data specified by the user, obtain a processing result based on the input data by using the method provided in the embodiments of the present application, and present the image processing result, or the result of a downstream task implemented based on the image processing result, to the user (the presentation manner may be, but is not limited to, displaying, saving, or uploading to the cloud side).
In a possible implementation, a user may open an image processing application installed on a terminal device. The application may obtain image data captured by a camera or image data specified by the user, and may send the data (or a result obtained after certain processing is performed on the data) to a cloud-side server. The cloud-side server generates an image processing result based on the image by using the method provided in the embodiments of the present application, and transmits the image processing result, or the result of a downstream task implemented based on the image processing result, back to the terminal device. The terminal device may then present the result to the user (the presentation manner may be, but is not limited to, displaying, saving, or uploading to the cloud side).
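The terminal-to-cloud round trip described above can be sketched as follows. This is a minimal illustration, not the method of the application: `cloud_process` is a hypothetical stand-in for the cloud-side server, the byte-wise brightening stands in for a real image processing model, and base64 is used only as a transport encoding for the image bytes.

```python
import base64

def cloud_process(request: dict) -> dict:
    """Hypothetical stand-in for the cloud-side server: decode the image
    payload, apply a toy 'enhancement', and return the result."""
    pixels = base64.b64decode(request["image"])
    enhanced = bytes(min(p + 10, 255) for p in pixels)  # placeholder for the model
    return {"status": "ok", "image": base64.b64encode(enhanced).decode()}

def terminal_submit(image_bytes: bytes) -> bytes:
    """Terminal side: package the camera data, 'send' it, and unpack the reply."""
    request = {"image": base64.b64encode(image_bytes).decode()}
    reply = cloud_process(request)  # would travel over the network in practice
    if reply["status"] != "ok":
        raise RuntimeError("cloud-side processing failed")
    return base64.b64decode(reply["image"])

result = terminal_submit(bytes([0, 100, 250]))
print(list(result))  # [10, 110, 255]
```

The terminal could equally present, save, or upload the returned bytes, matching the presentation manners listed above.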
Next, the image processing application in the embodiments of the present application is introduced from the perspectives of its functional architecture and the product architecture that implements its functions.
Referring to FIG. 1B, FIG. 1B is a schematic diagram of the functional architecture of an image processing application in an embodiment of the present application.
In a possible implementation, as shown in FIG. 1B, an image processing application 102 may receive input data 101 (for example, image and event data) and generate a processing result 103. The image processing application 102 may be executed on, for example, at least one computer system, and includes computer code that, when executed by one or more computers, causes the computers to perform the image processing method described herein.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the physical architecture for running an image processing application in an embodiment of the present application.
FIG. 2 shows a schematic diagram of a system architecture. The system may include a terminal 100 and a server 200, where the server 200 may include one or more servers (FIG. 2 uses one server as an example for illustration). The server 200 may provide image processing services for one or more terminals, or perform downstream tasks based on image processing results.
An image processing application may be installed on the terminal 100, or a web page related to image processing, or to downstream tasks based on image processing results, may be opened on the terminal 100. The application or web page may provide an interface; the terminal 100 may receive relevant parameters entered by the user on the interface and send the parameters to the server 200, and the server 200 may obtain a processing result based on the received parameters and return the processing result to the terminal 100.
It should be understood that, in some optional implementations, the terminal 100 may also obtain the data processing result based on the received parameters by itself, without requiring cooperation from the server; the embodiments of the present application are not limited in this respect.
Next, the product form of the terminal 100 in FIG. 2 is described.
The terminal 100 in the embodiments of the present application may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like; the embodiments of the present application impose no limitation in this respect.
FIG. 3 shows a schematic diagram of an optional hardware structure of the terminal 100.
Referring to FIG. 3, the terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190. A person skilled in the art will appreciate that FIG. 3 is merely an example of a terminal or multi-function device and does not constitute a limitation on the terminal or multi-function device, which may include more or fewer components than shown, combine certain components, or use different components.
The input unit 130 may be used to receive input digital or character information and to generate key signal inputs related to user settings and function control of the portable multi-function device. Specifically, the input unit 130 may include a touch screen 131 (optional) and/or other input devices 132. The touch screen 131 may collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch screen using a finger, a knuckle, a stylus, or any other suitable object) and drive the corresponding connection apparatus according to a preset program. The touch screen may detect the user's touch action on the touch screen, convert the touch action into a touch signal, and send it to the processor 170, and can also receive and execute commands sent by the processor 170; the touch signal includes at least touch point coordinate information. The touch screen 131 may provide an input interface and an output interface between the terminal 100 and the user. In addition, the touch screen may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch screen 131, the input unit 130 may also include other input devices. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The other input devices 132 may obtain image data captured by a camera, image data specified by a user, and the like.
The display unit 140 may be used to display information entered by the user or information provided to the user, various menus of the terminal 100, interactive interfaces, file display, and/or playback of any multimedia file. In the embodiments of the present application, the display unit 140 may be used to display the interface of an application related to image processing, and the like.
The memory 120 may be used to store instructions and data. The memory 120 may mainly include an instruction storage area and a data storage area: the data storage area may store various data such as multimedia files and text, and the instruction storage area may store software units such as an operating system, applications, and the instructions required for at least one function, or subsets and extended sets thereof. The memory 120 may also include a non-volatile random access memory, and provides the processor 170 with management of the hardware, software, and data resources in the computing and processing device, supporting control software and applications. It is also used for the storage of multimedia files and for the storage of running programs and applications.
The processor 170 is the control center of the terminal 100. It connects the various parts of the entire terminal 100 through various interfaces and lines, and performs various functions of the terminal 100 and processes data by running or executing instructions stored in the memory 120 and calling data stored in the memory 120, thereby controlling the terminal device as a whole. Optionally, the processor 170 may include one or more processing units. Preferably, the processor 170 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 170. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented separately on independent chips. The processor 170 may also be used to generate corresponding operation control signals, send them to the corresponding components of the computing and processing device, and read and process data in software, in particular the data and programs in the memory 120, so that the functional modules therein perform their corresponding functions, thereby controlling the corresponding components to act as required by the instructions.
The memory 120 may be used to store software code related to the image processing method, and the processor 170 may execute the steps of the image processing method, and may also schedule other units (for example, the input unit 130 and the display unit 140) to implement corresponding functions.
The radio frequency unit 110 (optional) may be used to receive and send information, or to receive and send signals during a call; for example, after receiving downlink information from a base station, it delivers the information to the processor 170 for processing, and it also sends uplink data to the base station. Generally, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the radio frequency unit 110 may also communicate with network devices and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Messaging Service (SMS).
In the embodiments of the present application, the radio frequency unit 110 may send image data to the server 200 and receive information about the processing result sent by the server 200.
It should be understood that the radio frequency unit 110 is optional and may be replaced with another communication interface, for example, a network port.
The terminal 100 further includes a power supply 190 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 170 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
The terminal 100 further includes an external interface 180, which may be a standard Micro USB interface or a multi-pin connector. It may be used to connect the terminal 100 to other apparatuses for communication, and may also be used to connect a charger to charge the terminal 100.
Although not shown, the terminal 100 may further include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, sensors with different functions, and the like, which are not described in detail here. Some or all of the methods described below may be applied to the terminal 100 shown in FIG. 3.
Next, the product form of the server 200 in FIG. 2 is described.
FIG. 4 provides a schematic structural diagram of a server 200. As shown in FIG. 4, the server 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. The processor 202, the memory 204, and the communication interface 203 communicate with each other via the bus 201.
The bus 201 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The processor 202 may be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 204 may include volatile memory, for example, random access memory (RAM). The memory 204 may also include non-volatile memory, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 204 may be used to store software code related to the image processing method, and the processor 202 may execute the steps of the image processing method, and may also schedule other units to implement corresponding functions.
It should be understood that the terminal 100 and the server 200 may be centralized or distributed devices, and the processors in the terminal 100 and the server 200 (for example, the processor 170 and the processor 202) may be hardware circuits (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, a processor may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
It should be understood that the steps related to the model inference process in the embodiments of the present application involve AI-related operations. When AI operations are performed, the instruction execution architecture of the terminal device and the server is not limited to the processor-plus-memory architecture described above. The system architecture provided in the embodiments of the present application is described in detail below with reference to FIG. 5.
Referring to FIG. 5, FIG. 5 is a system architecture diagram of a system provided by an embodiment of the present application. In FIG. 5, a task processing system 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560, and the execution device 510 includes a computing module 511. The data acquisition device 560 is used to obtain the open-source large-scale data set (i.e., the training set) required by the user and store the training set in the database 530; the training device 520 trains the target model/rule 501 based on the training set maintained in the database 530, and the trained neural network obtained through training is then used on the execution device 510. The execution device 510 may call data, code, and the like in the data storage system 550, and may also store data, instructions, and the like in the data storage system 550. The data storage system 550 may be placed in the execution device 510, or may be an external memory relative to the execution device 510.
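The train-then-deploy flow above — the training device 520 fits the target model/rule 501 from the training set in the database 530, and the trained result is then used on the execution device 510 — can be sketched with a deliberately trivial model. The data, the threshold rule, and both function names are illustrative assumptions, not the neural network of the present application:

```python
# Toy stand-in for the FIG. 5 pipeline: "training device" fits a brightness
# threshold from labeled samples; "execution device" loads the result for
# inference. A real system would train and deploy a neural network instead.
def training_device(training_set):
    """Fit the model: the midpoint between the two class means."""
    dark = [x for x, label in training_set if label == "dark"]
    bright = [x for x, label in training_set if label == "bright"]
    threshold = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2
    return {"threshold": threshold}  # the trained "target model/rule 501"

def execution_device(model, sample):
    """Run inference with the deployed model."""
    return "bright" if sample >= model["threshold"] else "dark"

database = [(10, "dark"), (30, "dark"), (200, "bright"), (220, "bright")]
model = training_device(database)        # training side
print(execution_device(model, 180))      # inference side: bright
```

The separation of `training_device` and `execution_device` mirrors why the trained model can run on an edge or end-side device distinct from where it was trained.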
The trained neural network obtained after the training device 520 trains the target model/rule 501 can be applied to different systems or devices (i.e., the execution device 510), specifically edge devices or end-side devices, for example, mobile phones, tablets, laptops, monitoring systems (e.g., cameras), security systems, and so on. In FIG. 5, the execution device 510 is configured with an I/O interface 512 for data interaction with external devices, and a "user" can input data to the I/O interface 512 through the client device 540. For example, the client device 540 may be a camera device of a monitoring system; the images and event data captured by the camera device are input as input data to the computing module 511 of the execution device 510, the computing module 511 processes the input target image to obtain a processing result, and then outputs the processing result to the camera device or displays it directly on the display interface (if any) of the execution device 510. In addition, in some embodiments of the present application, the client device 540 may also be integrated into the execution device 510; for example, when the execution device 510 is a mobile phone, the target task can be obtained directly through the mobile phone (for example, images and event data can be captured by the camera of the mobile phone) or received from another device (for example, another mobile phone), and the computing module 511 in the mobile phone then processes the target task to obtain a result and presents the result directly on the display interface of the mobile phone. The product forms of the execution device 510 and the client device 540 are not limited here.
It is worth noting that FIG. 5 is merely a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 5, the data storage system 550 is an external memory relative to the execution device 510; in other cases, the data storage system 550 may also be placed in the execution device 510. It should be understood that the execution device 510 may be deployed in the client device 540.
From the inference side of the model:
In the embodiments of the present application, the computing module 511 of the execution device 510 can obtain the code stored in the data storage system 550 to implement the steps related to the model inference process in the embodiments of the present application.
In the embodiments of the present application, the computing module 511 of the execution device 510 may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, the computing module 511 may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
Specifically, the computing module 511 of the execution device 510 may be a hardware system with an instruction execution function; the steps related to the model inference process provided in the embodiments of the present application may be software code stored in a memory, and the computing module 511 of the execution device 510 may obtain the software code from the memory and execute the obtained software code to implement the steps related to the model inference process provided in the embodiments of the present application.
It should be understood that the computing module 511 of the execution device 510 may be a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function; some of the steps related to the model inference process provided in the embodiments of the present application may also be implemented by the hardware system without an instruction execution function in the computing module 511 of the execution device 510, which is not limited here.
From the training side of the model:
In the embodiments of the present application, the training device 520 can obtain the code stored in a memory (not shown in FIG. 5; it may be integrated into the training device 520 or deployed separately from the training device 520) to implement the steps related to model training in the embodiments of the present application.
In the embodiments of the present application, the training device 520 may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of these hardware circuits. For example, the training device 520 may be a hardware system with an instruction execution function, such as a CPU or DSP; or a hardware system without an instruction execution function, such as an ASIC or FPGA; or a combination of a hardware system without an instruction execution function and a hardware system with an instruction execution function.
应理解,训练设备520可以为不具有执行指令功能的硬件系统以及具有执行指令功能的硬件系统的组合,本申请实施例提供的中和模型训练相关的部分步骤还可以通过训练设备520中不具有执行指令功能的硬件系统来实现,这里并不限定。It should be understood that the training device 520 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some of the steps related to the model training provided in the embodiments of the present application can also be implemented by the hardware system that does not have the function of executing instructions in the training device 520, which is not limited here.
二、服务器提供的图像处理类云服务:2. Image processing cloud services provided by the server:
在一种可能的实现中,服务器可以通过应用程序编程接口(application programming interface,API)为端侧提供图像处理或者基于图像处理结果进行下游任务的服务。In one possible implementation, the server can provide image processing services to the end side or perform downstream tasks based on the image processing results through an application programming interface (API).
其中,终端设备可以通过云端提供的API,将相关参数(例如图像数据)发送至服务器,服务器可以基于接收到的参数,得到处理结果,并将处理结果(例如增强后的图像数据)返回至终端。Specifically, the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud; the server can obtain a processing result based on the received parameters and return the processing result (such as enhanced image data) to the terminal.
关于终端以及服务器的描述可以参照上述实施例的描述,这里不再赘述。For descriptions of the terminal and the server, reference may be made to the descriptions in the above embodiments, and details are not repeated here.
图6示出了使用云平台提供的一项图像处理类云服务的流程。FIG. 6 shows a process of using an image processing cloud service provided by a cloud platform.
1.开通并购买内容审核服务。1. Activate and purchase content review service.
2.用户可以下载内容审核服务对应的软件开发工具包(software development kit,SDK),通常云平台提供多个开发版本的SDK,供用户根据开发环境的需求选择,例如JAVA版本的SDK、python版本的SDK、PHP版本的SDK、Android版本的SDK等。2. Users can download the software development kit (SDK) corresponding to the content review service. Usually, the cloud platform provides multiple development versions of the SDK for users to choose according to the requirements of the development environment, such as JAVA version SDK, Python version SDK, PHP version SDK, Android version SDK, etc.
3.用户根据需求下载对应版本的SDK到本地后,将SDK工程导入至本地开发环境,在本地开发环境中进行配置和调试,本地开发环境还可以进行其他功能的开发,使得形成一个集合了图像处理类能力的应用。3. After the user downloads the corresponding version of the SDK to the local computer according to the needs, the SDK project is imported into the local development environment, and configuration and debugging are performed in the local development environment. The local development environment can also be used to develop other functions, thus forming an application that integrates image processing capabilities.
4.图像处理类应用在被使用的过程中,当需要进行图像处理或者基于图像处理结果进行下游任务时,可以触发图像处理或者基于图像处理结果进行下游任务的API调用。当应用触发图像处理或者基于图像处理结果进行下游任务功能时,发起API请求至云环境中的图像处理类服务的运行实例,其中,API请求中携带图像,由云环境中的运行实例对图像进行处理,获得处理结果。4. When image processing applications are used, when image processing is required or downstream tasks are performed based on image processing results, API calls for image processing or downstream tasks based on image processing results can be triggered. When the application triggers image processing or performs downstream task functions based on image processing results, an API request is initiated to the running instance of the image processing service in the cloud environment, where the API request carries an image, and the running instance in the cloud environment processes the image to obtain the processing result.
5.云环境将处理结果返回至应用,由此完成一次的图像处理或者基于图像处理结果进行下游任务服务调用。5. The cloud environment returns the processing results to the application, thereby completing the image processing once or making a downstream task service call based on the image processing results.
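As a purely illustrative sketch of steps 4 and 5 above, an application might build an API request that carries an image to the cloud image-processing service. The endpoint URL and JSON field names below are invented placeholders, not a real cloud platform interface:

```python
# Hypothetical sketch of the API call in steps 4-5. The endpoint URL and the
# JSON field names are invented placeholders, not a real cloud API.
import json
import urllib.request

def build_image_request(image_bytes, endpoint="https://example.com/v1/image-process"):
    """Build an HTTP request that carries the image to the cloud service."""
    payload = json.dumps({"image": image_bytes.hex()}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_image_request(b"\x89PNG")
# Sending the request (urllib.request.urlopen(req)) would return the processing
# result, which the cloud environment sends back to the application in step 5.
```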
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
Since the embodiments of the present application involve the application of a large number of neural networks, in order to facilitate understanding, the relevant terms and related concepts such as neural networks involved in the embodiments of the present application are first introduced below.
(1)神经网络(1) Neural Network
神经网络可以是由神经单元组成的,神经单元可以是指以xs(即输入数据)和截距1为输入的运算单元,该运算单元的输出可以为:h_{W,b}(x) = f(W^T·x) = f(∑_{s=1}^{n} W_s·x_s + b)。A neural network may be composed of neural units, and a neural unit may refer to an operation unit that takes xs (i.e., input data) and intercept 1 as input; the output of the operation unit may be: h_{W,b}(x) = f(W^T·x) = f(∑_{s=1}^{n} W_s·x_s + b).
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Where s=1, 2, ...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into the output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field. The local receptive field can be an area composed of several neural units.
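As an illustrative sketch (not part of the application itself), the single neural unit described above can be written directly from the formula, here taking the activation function f to be the sigmoid function mentioned in the text:

```python
# Sketch of one neural unit: output = f(sum_s Ws*xs + b), f = sigmoid here.
import math

def neural_unit(xs, ws, b):
    """Weighted sum of inputs plus bias b, passed through a sigmoid activation."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs xs, their weights Ws, and a bias b:
y = neural_unit([1.0, 2.0], [0.5, -0.25], 0.1)  # sigmoid(0.1) ≈ 0.525
```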
(2)卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。(2) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor consisting of convolutional layers and subsampling layers, which can be regarded as a filter. A convolutional layer refers to a neuron layer in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some neurons in the adjacent layers. A convolutional layer usually contains several feature planes, each of which may be composed of some rectangularly arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Shared weights can be understood as meaning that the way features are extracted is independent of position. The convolution kernel can be initialized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, a direct benefit of shared weights is reducing the connections between the layers of the convolutional neural network, while also reducing the risk of overfitting.
CNN是一种非常常见的神经网络,下面结合图7重点对CNN的结构进行详细的介绍。如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。CNN is a very common neural network. The following is a detailed introduction to the structure of CNN in conjunction with Figure 7. As mentioned in the previous basic concept introduction, convolutional neural network is a deep neural network with a convolution structure and a deep learning architecture. A deep learning architecture refers to multiple levels of learning at different abstract levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
如图7所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及全连接层(fully connected layer)230。As shown in FIG. 7 , a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional layer/pooling layer 220 (wherein the pooling layer is optional), and a fully connected layer 230 .
卷积层/池化层220:Convolutional layer/pooling layer 220:
卷积层:Convolutional Layer:
如图7所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。As shown in FIG7 , the convolution layer/pooling layer 220 may include layers 221-226, for example: in one implementation, layer 221 is a convolution layer, layer 222 is a pooling layer, layer 223 is a convolution layer, layer 224 is a pooling layer, layer 225 is a convolution layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。The following will take the convolution layer 221 as an example to introduce the internal working principle of a convolution layer.
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将
提取到的多个尺寸相同的特征图合并形成卷积运算的输出。The convolution layer 221 may include a plurality of convolution operators, which are also called kernels. The convolution operator is equivalent to a filter that extracts specific information from the input image matrix in image processing. The convolution operator can be essentially a weight matrix, which is usually predefined. In the process of performing convolution operations on the image, the weight matrix is usually processed one pixel after another (or two pixels after two pixels... depending on the value of the step length stride) in the horizontal direction on the input image, thereby completing the work of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image. In the process of performing convolution operations, the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix will produce a convolution output with a single depth dimension, but in most cases, a single weight matrix is not used, but multiple weight matrices of the same size (row × column), that is, multiple isotype matrices, are applied. The output of each weight matrix is stacked to form the depth dimension of the convolution image, and the dimension here can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to blur unnecessary noise in the image. The multiple weight matrices have the same size (rows × columns), and the feature maps extracted by the multiple weight matrices of the same size are also the same size. 
Multiple extracted feature maps of the same size are merged to form the output of the convolution operation.
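The sliding-window convolution described above can be sketched minimally as follows (a toy single-kernel, stride-1 example for illustration, not the patent's implementation):

```python
# Minimal sketch of sliding one weight matrix (kernel) over an image, stride 1.
def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Element-wise product of the kernel with the current window, summed.
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

# A 3x3 kernel applied to a 4x4 image yields a 2x2 feature map.
feat = conv2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
              [[0, 0, 0], [0, 1, 0], [0, 0, 0]])  # identity kernel picks the centre pixel
```

Stacking the outputs of several different kernels along the depth dimension then gives the multi-channel feature map the text describes.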
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。The weight values in these weight matrices need to be obtained through a lot of training in practical applications. The weight matrices formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (for example, 221) often extracts more general features, which can also be called low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features. Features with higher semantics are more suitable for the problem to be solved.
池化层:Pooling layer:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图7中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。Since it is often necessary to reduce the number of training parameters, it is often necessary to periodically introduce a pooling layer after the convolution layer. In each layer 221-226 as shown in 220 in FIG. 7, a convolution layer may be followed by a pooling layer, or multiple convolution layers may be followed by one or more pooling layers. In the image processing process, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator to sample the input image to obtain an image of smaller size. The average pooling operator may calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling. The maximum pooling operator may take the pixel with the largest value in the range within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix used in the convolution layer should be related to the image size, the operator in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or maximum value of the corresponding sub-region of the image input to the pooling layer.
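The average and max pooling operators described above can be sketched as follows (non-overlapping 2×2 windows assumed for illustration):

```python
# Sketch of pooling: downsample by taking the max or mean of each window.
def pool2d(image, size=2, mode="max"):
    out = []
    for i in range(0, len(image), size):
        row = []
        for j in range(0, len(image[0]), size):
            win = [image[i + u][j + v] for u in range(size) for v in range(size)]
            row.append(max(win) if mode == "max" else sum(win) / len(win))
        out.append(row)
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled_max = pool2d(img)              # each output pixel = max of its 2x2 window
pooled_avg = pool2d(img, mode="avg")  # each output pixel = mean of its 2x2 window
```

Either way the output is half the width and height of the input, matching the text's point that pooling only reduces the spatial size of the image.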
全连接层230:Fully connected layer 230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用全连接层230来生成一个或者一组所需要的类的数量的输出。因此,在全连接层230中可以包括多层隐含层(如图7所示的231、232至23n),该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等……After being processed by the convolution layer/pooling layer 220, the convolution neural network 200 is not sufficient to output the required output information. Because as mentioned above, the convolution layer/pooling layer 220 will only extract features and reduce the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolution neural network 200 needs to use the fully connected layer 230 to generate one or a group of outputs of the required number of classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 7), and the parameters contained in the multiple hidden layers can be pre-trained according to the relevant training data of the specific task type. For example, the task type may include image recognition, image classification, image super-resolution reconstruction, etc.
在全连接层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图7由210至240方向的传播为前向传播)完成,反向传播(如图7由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。After the multiple hidden layers in the fully connected layer 230, that is, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 (as shown in FIG. 7, the propagation from 210 to 240 is the forward propagation) is completed, the back propagation (as shown in FIG. 7, the propagation from 240 to 210 is the back propagation) will begin to update the weight values and biases of the aforementioned layers to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
需要说明的是,如图7所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,仅包括图7中所示的网络结构的一部分,比如,本申请实施例中所采用的卷积神经网络可以仅包括输入层210、卷积层/池化层220和输出层240。It should be noted that the convolutional neural network 200 shown in Figure 7 is only an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models, for example, including only a part of the network structure shown in Figure 7. For example, the convolutional neural network used in the embodiment of the present application may only include an input layer 210, a convolution layer/pooling layer 220 and an output layer 240.
需要说明的是,如图7所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,如图8所示的多个卷积层/池化层并行,将分别提取的特征均输入给全连接层230进行处理。It should be noted that the convolutional neural network 200 shown in FIG. 7 is only an example of a convolutional neural network. In specific applications, the convolutional neural network can also exist in the form of other network models, for example, multiple convolutional layers/pooling layers in parallel as shown in FIG. 8, with the separately extracted features all input to the fully connected layer 230 for processing.
(3)深度神经网络(3) Deep Neural Networks
深度神经网络(Deep Neural Network,DNN),也称多层神经网络,可以理解为具有很多层隐含层的神经网络,这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:y = α(Wx + b),其中,x是输入向量,y是输出向量,b是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多,则系数W和偏移向量b的数量也就很多了。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为W^3_{24},上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是:第L-1层的第k个神经元到第L层的第j个神经元的系数定义为W^L_{jk}。需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的过程也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Divided by the position of the layers, the neural network inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is actually not complicated; simply put, it is the following linear relationship expression: y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Since a DNN has many layers, there are correspondingly many coefficients W and offset vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}. Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
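The per-layer operation y = α(Wx + b) described above can be sketched as follows (sigmoid taken as α purely for illustration):

```python
# Sketch of one fully connected DNN layer: y = alpha(W*x + b), alpha = sigmoid.
import math

def dense_layer(x, W, b):
    # W[j][k] maps input neuron k to output neuron j, matching the W^L_{jk} indexing.
    z = [sum(Wj[k] * x[k] for k in range(len(x))) + bj for Wj, bj in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

y = dense_layer([1.0, -1.0], [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0])
```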
(4)损失函数(4) Loss Function
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。In the process of training a deep neural network, because we hope that the output of the deep neural network is as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight vector of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is high, adjust the weight vector to make it predict a lower value, and keep adjusting until the deep neural network can predict the target value we really want or a value very close to the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which are important equations used to measure the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the deep neural network becomes a process of minimizing this loss as much as possible.
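For example, one common choice of loss function measuring the gap between predicted and target values is the mean squared error (a generic example; the application does not fix a particular loss):

```python
# Mean squared error: a larger output (loss) means a bigger prediction-target gap.
def mse_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

loss = mse_loss([0.9, 0.2], [1.0, 0.0])  # (0.01 + 0.04) / 2 = 0.025
```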
(5)反向传播算法(5) Back propagation algorithm
可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始模型中参数的大小,使得模型的误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的模型参数,例如权重矩阵。The error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial model during the training process, so that the error loss of the model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, aiming to obtain the optimal model parameters, such as the weight matrix.
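The idea of updating parameters against the error gradient can be illustrated with a one-weight toy model (a sketch of gradient descent on a squared-error loss, not the full BP algorithm for a multi-layer network):

```python
# Toy gradient descent: the weight moves opposite the loss gradient each step.
def train_weight(w, x, target, lr=0.1, steps=100):
    for _ in range(steps):
        pred = w * x                     # forward pass
        grad = 2 * (pred - target) * x   # d(loss)/dw for squared error (pred-target)^2
        w -= lr * grad                   # update against the gradient
    return w

w = train_weight(0.0, x=2.0, target=6.0)  # converges toward w = 3, since 3*2 = 6
```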
(6)扩散模型:Diffusion Models,是指定义一个扩散步骤的马尔可夫链,逐渐向数据添加随机噪声,然后学习逆扩散过程,从噪声中构建所需的数据样本。(6) Diffusion Models: Diffusion Models refer to defining a Markov chain of diffusion steps, gradually adding random noise to the data, and then learning the inverse diffusion process to construct the required data samples from the noise.
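A hedged sketch of the forward (noising) half of the Markov chain described above; the reverse, learned denoising process is what a trained diffusion model provides. The step form x_t = sqrt(1-β_t)·x_{t-1} + sqrt(β_t)·ε is the standard one and is assumed here, not quoted from the application:

```python
# Forward diffusion sketch: each Markov step mixes the sample with Gaussian noise,
# x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps  (standard form, assumed).
import math
import random

def forward_diffuse(x, betas, seed=0):
    rng = random.Random(seed)
    for beta in betas:
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1) for v in x]
    return x

noisy = forward_diffuse([1.0, -1.0], [0.1] * 10)  # 10 noising steps
```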
(7)图像复原:Image Restoration,是指将由于各种因素导致的低质量图片中的退化成分去除,恢复出具有完整细节的高质量图片的过程。(7) Image restoration: Image Restoration refers to the process of removing the degraded components in low-quality images caused by various factors and restoring high-quality images with complete details.
本申请可以应用在图像增强与修复、终端应用、自动驾驶等实际场景中。This application can be applied in practical scenarios such as image enhancement and restoration, terminal applications, and autonomous driving.
例如,自动驾驶过程中往往会因阴雨天气而被遮挡前窗视线,具有极大的安全隐患,本申请可以有效地去除雨滴,恢复出清晰视线;For example, during autonomous driving, the front window view is often blocked due to rainy weather, which poses a great safety hazard. This application can effectively remove raindrops and restore a clear view;
例如,由于环境光的影响,现有设备拍摄出的照片往往会曝光不足,本申请可以显著将暗光照片增强到自然光水平,方便后续处理;For example, due to the influence of ambient light, photos taken by existing devices are often underexposed. This application can significantly enhance low-light photos to natural light levels, making subsequent processing easier.
例如,当直接用终端摄像头对屏幕进行拍照时,由于屏幕是实时动态刷新的,因此摄像头拍摄下来的照片会存在明显的摩尔纹,本申请可以有效去除摩尔纹,弥补终端设备的不足。For example, when taking a picture of the screen directly with the terminal camera, since the screen is refreshed in real time and dynamically, the pictures taken by the camera will have obvious moiré patterns. This application can effectively remove the moiré patterns and make up for the shortcomings of the terminal device.
在现实生活中,由于环境和摄影技术的影响,拍摄到的图片往往会包含退化成分如雨滴、模糊、噪声、摩尔纹等而导致其本身的质量降低。图像复原指的就是通过技术手段去除这些低质图片中的退化成分,从而恢复出清晰的高质量的图片。In real life, due to the influence of the environment and photography technology, the pictures taken often contain degradation components such as raindrops, blur, noise, moiré, etc., which reduce their quality. Image restoration refers to the use of technical means to remove the degradation components in these low-quality pictures, thereby restoring clear and high-quality pictures.
目前业界已经存在较多针对单个或多个任务进行图像恢复的方法。早期它们大多是基于统计先验的传统方法。但由于传统方法的局限性,这些方法不能很好地去除退化成分,而且可能会产生彩色伪影。近年来学术界提出了一些基于深度学习的方法,这些方法大多基于CNN或者Transformer直接通过端对端的训练方式从模糊图像中预测对应的清晰图像,往往需要大量训练数据。At present, there are many methods for image restoration in the industry for single or multiple tasks. In the early days, most of them were traditional methods based on statistical priors. However, due to the limitations of traditional methods, these methods cannot remove degradation components well and may produce color artifacts. In recent years, the academic community has proposed some methods based on deep learning. Most of these methods are based on CNN or Transformer to directly predict the corresponding clear image from the blurred image through end-to-end training, which often requires a large amount of training data.
然而现有技术中的图像复原技术的处理精度较低。However, the processing accuracy of the image restoration technology in the prior art is low.
为了解决上述问题,本申请提供了一种图像处理方法,该图像处理方法可以为模型训练的前馈过程,也可以为推理过程。In order to solve the above problems, the present application provides an image processing method, which can be a feedforward process of model training or an inference process.
参照图9,图9为本申请实施例提供的一种图像处理方法,如图9所示,本申请提供的图像处理方法,包括:Referring to FIG. 9 , FIG. 9 is an image processing method provided by an embodiment of the present application. As shown in FIG. 9 , the image processing method provided by the present application includes:
901、获取第一图像。901. Acquire a first image.
本申请实施例中,步骤901的执行主体可以为终端设备,终端设备可以为便携式移动设备,例如但不限于移动或便携式计算设备(如智能手机)、个人计算机、服务器计算机、手持式设备(例如平板)或膝上型设备、多处理器系统、游戏控制台或控制器、基于微处理器的系统、机顶盒、可编程消费电子产品、移动电话、具有可穿戴或配件形状因子(例如,手表、眼镜、头戴式耳机或耳塞)的移动计算和/或通信设备、网络PC、小型计算机、大型计算机、包括上面的系统或设备中的任何一种的分布式计算环境等等。In the embodiment of the present application, the execution entity of step 901 may be a terminal device, and the terminal device may be a portable mobile device, such as but not limited to a mobile or portable computing device (such as a smartphone), a personal computer, a server computer, a handheld device (such as a tablet) or laptop device, a multiprocessor system, a game console or controller, a microprocessor-based system, a set-top box, a programmable consumer electronics product, a mobile phone, a mobile computing and/or communication device with a wearable or accessory form factor (e.g., a watch, glasses, headphones or earbuds), a network PC, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and the like.
本申请实施例中,步骤901的执行主体也可以为云侧的服务器,服务器可以接收来自终端设备发送的第一图像,进而服务器可以获取到第一图像。In an embodiment of the present application, the execution entity of step 901 may alternatively be a server on the cloud side; the server may receive the first image sent by the terminal device, and thereby obtain the first image.
在一种可能的实现中,第一图像可以为低质图像,第一图像可以为包括雨滴等自然环境遮挡的图像、或者是由于环境光的影响存在曝光不足的图像、存在明显的摩尔纹的图像。In a possible implementation, the first image may be a low-quality image, an image occluded by natural environment such as raindrops, or an image that is underexposed due to the influence of ambient light or an image with obvious moiré.
902、将所述第一图像转换到频域,得到第一数据;所述第一数据的空间分辨率低于所述第一图像。902. Convert the first image into a frequency domain to obtain first data; the spatial resolution of the first data is lower than that of the first image.
在一种可能的实现中,可以通过二阶小波变换将所述第一图像转换到频域。In a possible implementation, the first image may be converted into the frequency domain by a second-order wavelet transform.
如图11左半部分所示,可以将空间域RGB低质图像Xd进行二阶Haar小波变换,得到小波域内图像xd,可选的,图像尺寸由H×W×3变成(H/4)×(W/4)×48,使得空间分辨率下降了16倍,从而可以加快处理时间。As shown in the left half of Figure 11, the spatial-domain RGB low-quality image Xd can be transformed by a two-level Haar wavelet transform to obtain the wavelet-domain image xd. Optionally, the image size changes from H×W×3 to (H/4)×(W/4)×48, so that the spatial resolution is reduced by a factor of 16, which can speed up processing time.
通过上述方式,利用小波变换将扩散模型从空间域引入到小波域内,能够显著减少图像处理时间(模型只需学习图像的部分频谱,相对更加简单,同时由于空间分辨率的降低,模型处理图片的时间更少)。In the above manner, the diffusion model is introduced from the spatial domain to the wavelet domain using wavelet transform, which can significantly reduce the image processing time (the model only needs to learn part of the spectrum of the image, which is relatively simpler. At the same time, due to the reduction in spatial resolution, the model takes less time to process the image).
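The two-level Haar decomposition can be sketched for a single channel as below. This is an illustrative wavelet-packet-style implementation (an assumption, not quoted from the application), in which both levels split every sub-band, so one H×W channel becomes 16 sub-bands of size H/4 × W/4 and the spatial resolution drops by a factor of 16, matching the description above:

```python
# One Haar level: split an even-sized image into 4 half-resolution sub-bands
# (LL approximation plus LH/HL/HH detail bands), using 2x2 block sums/differences.
def haar_level(img):
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        rll, rlh, rhl, rhh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rll.append((a + b + c + d) / 2)
            rlh.append((a - b + c - d) / 2)
            rhl.append((a + b - c - d) / 2)
            rhh.append((a - b - c + d) / 2)
        ll.append(rll); lh.append(rlh); hl.append(rhl); hh.append(rhh)
    return [ll, lh, hl, hh]

def haar_two_level(img):
    # Apply a second Haar level to every first-level sub-band (packet style, assumed).
    return [sub2 for sub1 in haar_level(img) for sub2 in haar_level(sub1)]

img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # one 8x8 channel
bands = haar_two_level(img)  # 16 sub-bands, each 2x2: 16x fewer spatial positions
```

For an RGB image this is done per channel, so 3 channels yield 3 × 16 = 48 sub-band channels at quarter resolution.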
903、根据所述第一数据,确定第一高频信息;所述第一高频信息为对所述第一图像对应的高质图像的高频通道的信息预测结果。903. Determine first high-frequency information according to the first data; the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image.
在一种可能的实现中,可以根据所述第一数据,通过预训练的第二网络,确定第一高频信息。In a possible implementation, the first high-frequency information can be determined based on the first data through a pre-trained second network.
可选的,第二网络可以有多个(例如14个)具有残差结构的卷积层堆叠构成。其主要作用是学习低质图像高频频谱和其对应的清晰图像高频频谱之间的差异,从而预测出低质图像恢复后的高频频谱
Optionally, the second network can be composed of multiple (e.g., 14) convolutional layers with residual structures. Its main function is to learn the difference between the high-frequency spectrum of the low-quality image and the high-frequency spectrum of its corresponding clear image, so as to predict the high-frequency spectrum of the low-quality image after restoration.
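The residual structure can be sketched as below. This is a toy 1-D version with a hypothetical fixed kernel, purely to show the skip-connection form y = x + conv(x); the actual second network stacks about 14 learned 2-D convolutional layers:

```python
# Toy residual layer: each layer adds its convolution output back onto its input,
# so a stack of such layers learns a *difference* (here, the gap between the
# degraded and restored high-frequency spectra). Kernel values are hypothetical.
def residual_layer(x, kernel):
    k = len(kernel) // 2
    conv = [sum(kernel[u] * (x[i + u - k] if 0 <= i + u - k < len(x) else 0.0)
                for u in range(len(kernel)))
            for i in range(len(x))]
    return [xi + ci for xi, ci in zip(x, conv)]

out = [1.0, 2.0, 3.0, 4.0]
for _ in range(3):  # the patent text stacks roughly 14 such layers
    out = residual_layer(out, [0.0, 0.1, 0.0])  # toy kernel: adds a 10% correction
```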
应理解,本申请实施例中图像的高频频谱(或者可以称之为高频通道对应的信息),是相对于图像的低频频谱(或者可以称之为低频通道对应的信息)而言的。高频频谱对应的频率相对于低频频谱的频率是更高的。It should be understood that the high-frequency spectrum of the image in the embodiment of the present application (or the information corresponding to the high-frequency channel) is relative to the low-frequency spectrum of the image (or the information corresponding to the low-frequency channel). The frequency corresponding to the high-frequency spectrum is higher than the frequency of the low-frequency spectrum.
例如,第一图像对应的高质图像可以包含多个通道的信息,其中,多个通道可以包括高频通道和相对于高频通道而言的低频通道。For example, the high-quality image corresponding to the first image may include information of multiple channels, wherein the multiple channels may include a high-frequency channel and a low-frequency channel relative to the high-frequency channel.
904. Obtain first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image.
In a possible implementation, step 904 and the subsequent step 905 may form an iterative process, and the result obtained in step 905 may serve as the first low-frequency information obtained in the next execution of step 904.
In a possible implementation, if step 904 belongs to the first iteration, the first low-frequency information may be randomly generated noise (for example, Gaussian white noise). If step 904 belongs to the i-th iteration (i greater than 1), the first low-frequency information may be the denoised low-frequency information obtained in the (i-1)-th iteration (that is, the result of step 905 in the previous iteration).
905. Obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information.
In a possible implementation, the first high-frequency information and the first low-frequency information may be input into a first network. The first network is a pre-trained network that obtains the first noise information from the first high-frequency information and the first low-frequency information.
In a possible implementation, the first data may also be input into the first network; that is, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1. At the (i+1)-th iteration, second noise information is obtained through the first network according to the first high-frequency information and the second low-frequency information, and the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information.
In the sampling framework, the initial time is t = T, and a Gaussian white noise sample is input into the noise estimation network. The purpose of the network at this stage is to estimate, from this input, the noise that needs to be removed at each time step, removing it step by step until only the low-frequency spectrum of a clear image remains. During this noise-removal process, at every time t the low-quality image xd and the clear-image high-frequency spectrum predicted by HFRM are likewise input together into the noise estimation network as conditions for the estimation. t decreases from T, and the iteration repeats until t = 0.
Next, how to denoise the first low-frequency information according to the first noise information to obtain the second low-frequency information is introduced:
In a possible implementation, the second low-frequency information can be passed through a target mapping to obtain target low-frequency information, where the target mapping does not contain a noise estimation term. The target low-frequency information and the first high-frequency information are used for fusion (for example, concatenation) to obtain a fusion result, and the second image is obtained by mapping the fusion result back to the spatial domain (for example, through an inverse wavelet transform).
Most existing diffusion-model work adopts the DDIM implicit sampling (DIS) scheme, in which skip sampling with a quantization interval S over the whole sampling process (t → t-1, 1 ≤ t ≤ T) reduces the number of sampling steps from T to T/S. On top of this sampling method, the embodiment of the present application further develops a high-efficiency conditional sampling algorithm that can directly predict the original image at an intermediate moment M of the sampling process, so there is no need to run the entire DIS process; the number of sampling steps then becomes (T-M)/S. In this sampling method, M can be a preset proportion of T (for example, 80%). The formula for obtaining x_{t-1} corresponds to denoising the first low-frequency information according to the first noise information, and the formula for obtaining x_0 corresponds to the target mapping.
In the above manner, the total number of sampling steps can be greatly reduced (for example, to about 1/5 of the original), thereby improving sampling efficiency.
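The skip-step sampling with an early exit at an intermediate step M can be sketched as follows; the toy noise estimator and the linear beta schedule are placeholder assumptions standing in for the patent's trained conditional U-Net:

```python
import numpy as np

T, S = 1000, 25          # total diffusion steps, skip interval
M = int(0.8 * T)         # early-exit step, e.g. a preset 80% of T

# Standard DDPM/DDIM-style linear beta schedule and cumulative alpha-bar.
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_theta(x_t, t, cond):
    """Placeholder noise estimator; the real one is a conditional U-Net
    conditioned on (low-quality image, predicted high-frequency spectrum)."""
    return 0.1 * x_t

def ddim_step(x_t, t, t_prev, cond):
    """One deterministic DDIM skip step t -> t_prev (denoising update)."""
    eps = eps_theta(x_t, t, cond)
    x0 = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_prev]) * x0 + np.sqrt(1 - alpha_bar[t_prev]) * eps

def predict_x0(x_t, t, cond):
    """'Target mapping': jump straight to x0, without adding noise back."""
    eps = eps_theta(x_t, t, cond)
    return (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))          # starts as Gaussian white noise
cond = None                              # stands in for the conditioning inputs
steps = list(range(T - 1, M - 1, -S))    # only (T - M)/S steps are executed
for k, t in enumerate(steps):
    t_prev = steps[k + 1] if k + 1 < len(steps) else M
    x = ddim_step(x, t, t_prev, cond)
x0_hat = predict_x0(x, M, cond)          # early exit at t = M
print(len(steps))                        # (T - M)/S = 200/25 = 8 steps
```

With T = 1000, S = 25 and M = 800, only 8 network evaluations are needed instead of the 40 of a full T/S skip schedule, matching the roughly 1/5 reduction the text describes.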
In a possible implementation, the second low-frequency information and the first high-frequency information are used to obtain a second image.
In the sampling framework, after the low-frequency spectrum of a clear image has been recovered from the Gaussian white noise, it is fused with the clear-image high-frequency spectrum predicted by HFRM, and a second-order inverse Haar wavelet transform is applied, yielding the spatial-domain restoration result of the low-quality image Xd.
The present application provides an image processing method, the method including: obtaining a first image; converting the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data, where the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image; obtaining first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information; and using the second low-frequency information and the first high-frequency information to obtain a second image. In the embodiments of the present application, converting the image to the frequency domain for restoration avoids splitting the image into patches (patches must be processed separately and then merged, which may produce boundary artifacts, and a large image yields too many patches and hence long processing times), thereby improving restoration quality and reducing processing time. In addition, noise is predicted from the high-frequency information and the noise-containing low-frequency information, and image restoration based on this predicted noise yields higher image quality (more details can be recovered while the total sampling time is significantly reduced).
Referring to FIG. 10, FIG. 10 shows a model training method provided in an embodiment of the present application. As shown in FIG. 10, the model training method provided in the present application includes:
1001. Acquire a first image and a second image; the first image and the second image are captured for the same scene; the second image is a high-quality image corresponding to the first image.
In a possible implementation, the first image may be a low-quality image, for example an image occluded by natural elements such as raindrops, an image that is underexposed due to the influence of ambient light, or an image with obvious moiré patterns.
In a possible implementation, the second image may be a high-quality image corresponding to the first image, for example an image obtained from the first image by removing raindrops, correcting underexposure (for example, enhancing a dark photo to natural-light levels), or removing moiré patterns.
1002. Convert the first image and the second image to the frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is the information of the low-frequency channel in the second data.
In a possible implementation, the first image and the second image may be converted to the frequency domain by a second-order wavelet transform.
As shown in the left half of FIG. 11, the spatial-domain RGB low-quality image Xd and the corresponding clear image X0 are both transformed by a second-order Haar wavelet transform to obtain the wavelet-domain images xd and x0. Optionally, the image size changes from H×W×3 to (H/4)×(W/4)×48, so the spatial resolution drops by a factor of 16, which speeds up processing.
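The shape bookkeeping of a second-order Haar transform can be sketched as follows (hand-rolled in numpy for illustration; in practice a wavelet library such as PyWavelets would be used):

```python
import numpy as np

def haar_level(x):
    """One orthonormal Haar level per channel: each 2x2 block becomes four
    subband values, so (H, W, C) -> (H/2, W/2, 4C)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency (approximation)
    lh = (a + b - c - d) / 2.0   # horizontal detail
    hl = (a - b + c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=-1)

H, W = 256, 256
img = np.random.default_rng(0).random((H, W, 3))   # RGB image, H x W x 3
x1 = haar_level(img)    # first order:  (H/2, W/2, 12)
x2 = haar_level(x1)     # second order: (H/4, W/4, 48), spatial area / 16
print(img.shape, '->', x2.shape)
```

All pixel information is preserved (the transform is invertible); only its arrangement changes, which is why the diffusion model can operate on a 16x smaller spatial grid.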
1003. Determine first high-frequency information according to the first data; the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image.
In a possible implementation, the first high-frequency information can be determined from the first data through a second network, the second network being a pre-trained network.
When training the second network, the frequency-domain data corresponding to a low-quality image can be input into the second network to predict the information of the high-frequency channel of the corresponding high-quality image. The ground-truth information of the high-frequency channel of that high-quality image is also obtained, and a loss constructed from the two is used to update the second network, so that the second network acquires the ability to predict, from the frequency-domain data of a low-quality image, the information of the high-frequency channel of the corresponding high-quality image.
1004. Obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information; the first noise information is used together with second noise information to determine a first loss; the second noise information is randomly generated noise.
In a possible implementation, the first noise information may be obtained through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer. Noise may also be superimposed on the first low-frequency information to obtain third low-frequency information. At the (i+1)-th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used together with fourth noise information to determine a second loss, the fourth noise information being randomly generated noise; and the updated first network is updated again according to the second loss.
In the training framework, the low-frequency spectrum of the wavelet-transformed high-quality image x0 is first corrupted with different amounts of Gaussian white noise at different time steps t and then fed into the noise estimation network. The noise estimation network is a classic U-Net structure whose purpose is to correctly estimate the noise superimposed on the low-frequency spectrum of the high-quality image x0 at each time step. During this estimation, at every time t the low-quality image xd and the clear-image high-frequency spectrum predicted by HFRM are input together into the noise estimation network as conditions for the estimation. t increases from 0, and the iteration repeats until t = T.
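The forward-noising step and the noise-estimation loss described above can be sketched as follows; the zero-output estimator and random arrays are toy stand-ins (the real estimator is the conditional U-Net and the inputs are wavelet spectra):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)       # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def noise_estimator(x_t, t, cond):
    """Placeholder for the conditional U-Net noise estimation network."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x0_low = rng.standard_normal((16, 16))   # clean image's low-frequency spectrum
cond = None                              # stands in for (x_d, predicted HF spectrum)

t = int(rng.integers(0, T))              # random diffusion time step
eps = rng.standard_normal(x0_low.shape)  # randomly generated target noise
# Forward process: corrupt the clean low-frequency spectrum to noise level t.
x_t = np.sqrt(alpha_bar[t]) * x0_low + np.sqrt(1 - alpha_bar[t]) * eps
# The loss compares the estimated noise with the noise actually added.
eps_hat = noise_estimator(x_t, t, cond)
loss = float(np.mean((eps_hat - eps) ** 2))
```

Minimizing this loss over many (image, t, eps) samples is what trains the first network to output usable noise estimates at sampling time.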
1005. Update the first network according to the first loss.
In the embodiments of the present application, converting the image to the frequency domain for restoration avoids splitting the image into patches (patches must be processed separately and then merged, which may produce boundary artifacts, and a large image yields too many patches and hence long processing times), thereby improving restoration quality and reducing processing time. In addition, noise is predicted from the high-frequency information and the noise-containing low-frequency information, and image restoration based on this predicted noise yields higher image quality (more details can be recovered while the total sampling time is significantly reduced).
To compare performance with existing algorithms, Tables 1 to 4 show the results of the embodiment of the present application (WaveDM) and existing methods on the raindrop removal dataset (RainDrop), the defocus deblurring dataset (DPDD), the demoiréing dataset (London's Buildings) and the low-light enhancement dataset (LOL-v1), respectively. The evaluation metrics are PSNR, SSIM and restoration time (Time). As the tables show, the present application achieves the best results on both quality metrics while running at a comparable speed.
Table 1
Table 2
Table 3
Table 4
A visual demonstration of the beneficial effects of the embodiment of the present application is shown in FIG. 12A. Compared with other existing methods, the present application recovers more image detail, and its sharpness is clearly better than that of the existing methods.
A schematic architecture of an embodiment of the present application is shown in FIG. 12B. It includes a training framework and a sampling framework, and mainly consists of a wavelet transform with spectrum separation, a high-frequency fine-tuning module, a noise estimation network, a high-efficiency sampling algorithm and an inverse wavelet transform. The functions of each part are as follows:
Wavelet transform: uses a specific wavelet to transform the image from the spatial domain to the wavelet domain, obtaining the image's wavelet spectrum;
High-frequency fine-tuning module: restores the high-frequency spectrum of the corresponding clear image from the high-frequency spectrum of the low-quality image;
Noise estimation network: conditioned on the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation, iteratively recovers the low-frequency spectrum of the high-quality image from Gaussian white noise;
High-efficiency conditional sampling algorithm: conditioned on the output of the high-frequency fine-tuning module and the low-frequency spectrum remaining after spectrum separation, directly predicts the high-quality image at an intermediate sampling step, thereby reducing the number of sampling steps;
Inverse wavelet transform: fuses the low-frequency spectrum of the high-quality image output by the noise estimation network with the high-frequency spectrum output by the high-frequency fine-tuning module and applies the specific inverse wavelet transform, obtaining a clear spatial-domain RGB high-quality image.
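The final fusion-plus-inverse-transform step is the exact inverse of the Haar split; a hand-rolled numpy sketch (the "low/high" channel split below is illustrative, assuming the first 12 channels of a second-order decomposition hold the low-frequency subband):

```python
import numpy as np

def haar_level(x):
    """(H, W, C) -> (H/2, W/2, 4C): one orthonormal Haar split."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return np.concatenate([(a + b + c + d) / 2, (a + b - c - d) / 2,
                           (a - b + c - d) / 2, (a - b - c + d) / 2], axis=-1)

def inv_haar_level(y):
    """Exact inverse: (H/2, W/2, 4C) -> (H, W, C)."""
    c4 = y.shape[-1] // 4
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    a = (ll + lh + hl + hh) / 2; b = (ll + lh - hl - hh) / 2
    cc = (ll - lh + hl - hh) / 2; d = (ll - lh - hl + hh) / 2
    out = np.zeros((y.shape[0] * 2, y.shape[1] * 2, c4))
    out[0::2, 0::2] = a; out[0::2, 1::2] = b
    out[1::2, 0::2] = cc; out[1::2, 1::2] = d
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
spec = haar_level(haar_level(img))          # (16, 16, 48) wavelet spectrum
# "Fusion" is just re-assembling the low- and high-frequency channels into
# one tensor before the inverse transform.
low, high = spec[..., :12], spec[..., 12:]  # toy channel split (illustrative)
fused = np.concatenate([low, high], axis=-1)
restored = inv_haar_level(inv_haar_level(fused))
print(np.allclose(restored, img))           # perfect reconstruction
```

Because the transform pair is exactly invertible, any restoration error in the output comes only from the predicted spectra, not from the wavelet step itself.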
Next, an image processing device provided in an embodiment of the present application is introduced from the perspective of the device. Referring to FIG. 13, FIG. 13 is a schematic structural diagram of an image processing device provided in an embodiment of the present application. As shown in FIG. 13, the image processing device 1300 provided in an embodiment of the present application includes:
an acquisition module 1301, configured to acquire a first image;
For a detailed description of the acquisition module 1301, reference may be made to the description of step 901 in the foregoing embodiment, which is not repeated here.
a processing module 1302, configured to convert the first image to the frequency domain to obtain first data, where the spatial resolution of the first data is lower than that of the first image;
determine first high-frequency information according to the first data, where the first high-frequency information is a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image;
obtain first low-frequency information, where the first low-frequency information contains noise of the low-frequency channel of the high-quality image; and
obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, where the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information;
the second low-frequency information and the first high-frequency information being used to obtain a second image.
For a detailed description of the processing module 1302, reference may be made to the description of steps 902 to 905 in the foregoing embodiment, which is not repeated here.
In a possible implementation, the processing module is specifically configured to:
obtain first noise information through a first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer greater than 1; the processing module is further configured to:
at the (i+1)-th iteration, obtain second noise information through the first network according to the first high-frequency information and the second low-frequency information, where the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information;
and using the first low-frequency information and the first high-frequency information to obtain the second image includes:
using the third low-frequency information and the first high-frequency information to obtain the second image.
In a possible implementation, the processing module is further configured to:
pass the second low-frequency information through a target mapping to obtain target low-frequency information, where the target mapping does not contain a noise estimation term;
the processing module being specifically configured to:
use the target low-frequency information and the first high-frequency information for fusion to obtain a fusion result, the second image being obtained by mapping the fusion result to the spatial domain.
In a possible implementation, the first low-frequency information is randomly generated noise.
In a possible implementation, the processing module is specifically configured to:
determine the first high-frequency information according to the first data through a second network.
In a possible implementation, the processing module is specifically configured to:
convert the first image to the frequency domain by a second-order wavelet transform.
In addition, an embodiment of the present application further provides a model training device (which may correspond to the model training method of FIG. 10), the device including:
an acquisition module, configured to acquire a first image and a second image, the first image and the second image being captured for the same scene, and the second image being a high-quality image corresponding to the first image;
a processing module, configured to convert the first image and the second image to the frequency domain to obtain first data and second data respectively, where the spatial resolution of the first data is lower than that of the first image, the spatial resolution of the second data is lower than that of the second image, the second data includes first low-frequency information, and the first low-frequency information is the information of the low-frequency channel in the second data;
determine first high-frequency information according to the first data, the first high-frequency information being a prediction of the information of the high-frequency channel of the high-quality image corresponding to the first image;
obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, the first noise information being used together with second noise information to determine a first loss, and the second noise information being randomly generated noise; and
update the first network according to the first loss.
In a possible implementation, the processing module is specifically configured to:
obtain the first noise information through the first network according to the first high-frequency information, the first data and the first low-frequency information.
In a possible implementation, obtaining the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; the processing module is further configured to:
superimpose noise on the first low-frequency information to obtain third low-frequency information;
at the (i+1)-th iteration, obtain third noise information through the first network according to the first high-frequency information and the third low-frequency information, the third noise information being used together with fourth noise information to determine a second loss, and the fourth noise information being randomly generated noise; and
update the updated first network according to the second loss.
In a possible implementation, the processing module is specifically configured to:
determine the first high-frequency information according to the first data through a second network, the second network being a pre-trained network.
In a possible implementation, the processing module is specifically configured to:
convert the first image and the second image to the frequency domain by a second-order wavelet transform.
Next, an execution device provided in an embodiment of the present application is introduced. Referring to FIG. 14, FIG. 14 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1400 may specifically be a mobile phone, a tablet, a laptop computer, a smart wearable device, a server, or the like, which is not limited here. The execution device 1400 implements the functions of the image processing method in the embodiment corresponding to FIG. 10. Specifically, the execution device 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403 and a memory 1404 (the number of processors 1403 in the execution device 1400 may be one or more), where the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of the present application, the receiver 1401, the transmitter 1402, the processor 1403 and the memory 1404 may be connected by a bus or in other ways.
The memory 1404 may include read-only memory and random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include non-volatile random access memory (NVRAM). The memory 1404 stores operation instructions, executable modules or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1403 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and so on. For clarity, however, the various buses are all referred to as the bus system in the figure.
The method disclosed in the foregoing embodiments of the present application may be applied to the processor 1403 or implemented by the processor 1403. The processor 1403 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the foregoing method may be completed by integrated logic circuits of hardware in the processor 1403 or by instructions in the form of software. The processor 1403 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, or a processor suitable for AI computing such as a vision processing unit (VPU) or a tensor processing unit (TPU), and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1403 may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404 and completes steps 901 to 905 of the foregoing embodiment in combination with its hardware.
The receiver 1401 may be configured to receive input digital or character information and to generate signal inputs related to settings and function control of the execution device. The transmitter 1402 may be configured to output digital or character information through a first interface; the transmitter 1402 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; and the transmitter 1402 may also include a display device such as a display screen.
An embodiment of this application further provides a training device. Referring to FIG. 15, FIG. 15 is a schematic structural diagram of the training device according to an embodiment of this application. Specifically, the training device 1500 is implemented by one or more servers. The training device 1500 may vary greatly depending on its configuration or performance, and may include one or more central processing units (CPU) 1515 (for example, one or more processors), a memory 1532, and one or more storage media 1530 (for example, one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may provide transient storage or persistent storage. The program stored in the storage medium 1530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Further, the central processing unit 1515 may be configured to communicate with the storage medium 1530 and execute, on the training device 1500, the series of instruction operations in the storage medium 1530.

The training device 1500 may further include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
An embodiment of this application further provides a computer program product including computer-readable instructions that, when run on a computer, cause the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.

An embodiment of this application further provides a computer-readable storage medium storing a program for signal processing that, when run on a computer, causes the computer to perform the steps performed by the foregoing execution device or the steps performed by the foregoing training device.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the image processing method described in the foregoing embodiments, or so that a chip in the training device performs the steps related to model training in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; alternatively, the storage unit may be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 16, FIG. 16 is a schematic structural diagram of a chip according to an embodiment of this application. The chip may be embodied as a neural network processing unit (NPU) 1600. The NPU 1600 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core part of the NPU is the operation circuit 1603; the controller 1604 controls the operation circuit 1603 to extract matrix data from memory and perform multiplication operations.
In some implementations, the operation circuit 1603 internally includes multiple processing engines (PE). In some implementations, the operation circuit 1603 is a two-dimensional systolic array. The operation circuit 1603 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1603 is a general-purpose matrix processor.

For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1602 and caches it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 1601, performs a matrix operation with matrix B, and stores partial or final results of the resulting matrix in the accumulator 1608.
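The data flow just described, with weight tiles cached at the PEs and partial results collected in an accumulator, can be illustrated by a tile-wise multiply-accumulate sketch. This is an illustrative model only, not the NPU's actual microarchitecture; the function name `tiled_matmul` and the tile size are invented for this example:

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Tile-wise multiply-accumulate: each loop pass streams one K-tile of A
    (from the 'input memory') against the matching cached tile of B (from the
    'weight memory'), and the running sum in C plays the accumulator's role."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))                   # accumulator for partial results
    for k0 in range(0, K, tile):           # stream one K-tile at a time
        a_tile = A[:, k0:k0 + tile]        # input data tile
        b_tile = B[k0:k0 + tile, :]        # weight tile cached for this step
        C += a_tile @ b_tile               # partial product accumulated
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.ones((3, 2))
assert np.allclose(tiled_matmul(A, B, tile=2), A @ B)
```

The partial sums held in `C` between loop passes correspond to the partial results that the operation circuit stores in the accumulator 1608 before the final matrix is produced.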
The unified memory 1606 is configured to store input data and output data. The weight data is transferred to the weight memory 1602 directly through the direct memory access controller (DMAC) 1605. The input data is also transferred to the unified memory 1606 through the DMAC.

The bus interface unit (BIU) 1610 is used for interaction between the AXI bus on one side and the DMAC and the instruction fetch buffer (IFB) 1609 on the other. The bus interface unit 1610 is used by the instruction fetch buffer 1609 to obtain instructions from an external memory, and is also used by the direct memory access controller 1605 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1606, to transfer weight data to the weight memory 1602, or to transfer input data to the input memory 1601.
The vector calculation unit 1607 includes multiple operation processing units and, where necessary, further processes the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for network calculations at non-convolutional/fully connected layers in a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature maps.

In some implementations, the vector calculation unit 1607 can store the processed output vector to the unified memory 1606. For example, the vector calculation unit 1607 may apply a linear or nonlinear function to the output of the operation circuit 1603, for example, performing linear interpolation on a feature map extracted by a convolutional layer, or applying a nonlinear function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1607 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1603, for example, for use in a subsequent layer of the neural network.
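The kind of post-processing delegated to the vector unit, normalization of the accumulator output followed by a nonlinear activation, can be sketched as follows. This is a minimal illustrative sketch; `vector_postprocess` and its parameters are invented for the example and are not part of any NPU interface:

```python
import numpy as np

def vector_postprocess(acc_out, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-norm-style normalization over the batch axis followed by a ReLU
    activation -- representative of the elementwise work a vector unit does
    on the matrix unit's accumulated output."""
    mean = acc_out.mean(axis=0)
    var = acc_out.var(axis=0)
    normed = gamma * (acc_out - mean) / np.sqrt(var + eps) + beta
    return np.maximum(normed, 0.0)  # activation values for the next layer

x = np.array([[1.0, -2.0], [3.0, 4.0]])  # pretend accumulator output
y = vector_postprocess(x)
assert y.shape == x.shape and (y >= 0).all()
```

The returned activations could then be written back to the unified memory or fed to the operation circuit as the input of a subsequent layer, matching the data flow described above.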
An instruction fetch buffer 1609 connected to the controller 1604 is configured to store instructions used by the controller 1604.

The unified memory 1606, the input memory 1601, the weight memory 1602, and the instruction fetch buffer 1609 are all on-chip memories. The external memory is private to the NPU hardware architecture.

The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.

Based on the descriptions of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or certainly by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, for example, analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is, in most cases, the better implementation. Based on such an understanding, the technical solutions of this application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
In the foregoing embodiments, implementation may be entirely or partially by software, hardware, firmware, or any combination thereof. When software is used for implementation, implementation may be entirely or partially in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
Claims (27)
- An image processing method, wherein the method comprises: acquiring a first image; converting the first image to a frequency domain to obtain first data, wherein a spatial resolution of the first data is lower than that of the first image; determining first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquiring first low-frequency information, wherein the first low-frequency information comprises noise of a low-frequency channel of the high-quality image; and obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information are used to obtain a second image.
- The method according to claim 1, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information comprises: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The method according to claim 1 or 2, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer greater than 1, and the method further comprises: at an (i+1)-th iteration, obtaining second noise information through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the third low-frequency information and the first high-frequency information are used to obtain the second image.
- The method according to claim 1 or 2, wherein the method further comprises: passing the second low-frequency information through a target mapping to obtain target low-frequency information, wherein the target mapping does not comprise a noise estimation term; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the target low-frequency information and the first high-frequency information are used for fusion to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- The method according to any one of claims 1 to 4, wherein the first low-frequency information is randomly generated noise.
- The method according to any one of claims 1 to 5, wherein the determining first high-frequency information according to the first data comprises: determining the first high-frequency information through a second network according to the first data.
- The method according to any one of claims 1 to 6, wherein the converting the first image to a frequency domain comprises: converting the first image to the frequency domain through a second-order wavelet transform.
- A model training method, wherein the method comprises: acquiring a first image and a second image, wherein the first image and the second image are acquired for a same scene, and the second image is a high-quality image corresponding to the first image; converting the first image and the second image to a frequency domain to obtain first data and second data respectively, wherein a spatial resolution of the first data is lower than that of the first image, a spatial resolution of the second data is lower than that of the second image, the second data comprises first low-frequency information, and the first low-frequency information is information of a low-frequency channel in the second data; determining first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of the high-quality image corresponding to the first image; obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used together with second noise information to determine a first loss, and the second noise information is randomly generated noise; and updating the first network according to the first loss.
- The method according to claim 8, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information comprises: obtaining the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The method according to claim 8 or 9, wherein the obtaining first noise information through a first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer, and the method further comprises: superimposing noise on the first low-frequency information to obtain third low-frequency information; at an (i+1)-th iteration, obtaining third noise information through the first network according to the first high-frequency information and the third low-frequency information, wherein the third noise information is used together with fourth noise information to determine a second loss, and the fourth noise information is randomly generated noise; and updating the updated first network according to the second loss.
- The method according to any one of claims 8 to 10, wherein the determining first high-frequency information according to the first data comprises: determining the first high-frequency information through a second network according to the first data, wherein the second network is a pre-trained network.
- The method according to any one of claims 8 to 11, wherein the converting the first image and the second image to a frequency domain comprises: converting the first image and the second image to the frequency domain through a second-order wavelet transform.
- An image processing apparatus, wherein the apparatus comprises: an acquisition module, configured to acquire a first image; and a processing module, configured to: convert the first image to a frequency domain to obtain first data, wherein a spatial resolution of the first data is lower than that of the first image; determine first high-frequency information according to the first data, wherein the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image; acquire first low-frequency information, wherein the first low-frequency information comprises noise of a low-frequency channel of the high-quality image; and obtain first noise information through a first network according to the first high-frequency information and the first low-frequency information, wherein the first noise information is used to denoise the first low-frequency information to obtain second low-frequency information, and the second low-frequency information and the first high-frequency information are used to obtain a second image.
- The apparatus according to claim 13, wherein the processing module is specifically configured to: obtain the first noise information through the first network according to the first high-frequency information, the first data, and the first low-frequency information.
- The apparatus according to claim 13 or 14, wherein the obtaining first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at an i-th iteration, wherein i is a positive integer greater than 1, and the processing module is further configured to: at an (i+1)-th iteration, obtain second noise information through the first network according to the first high-frequency information and the second low-frequency information, wherein the second noise information is used to denoise the second low-frequency information to obtain third low-frequency information; and that the second low-frequency information and the first high-frequency information are used to obtain the second image comprises: the third low-frequency information and the first high-frequency information are used to obtain the second image.
- The apparatus according to claim 13 or 14, wherein the processing module is further configured to: pass the second low-frequency information through a target mapping to obtain target low-frequency information, wherein the target mapping does not comprise a noise estimation term; and the processing module is specifically configured such that the target low-frequency information and the first high-frequency information are used for fusion to obtain a fusion result, and the second image is obtained by mapping the fusion result to a spatial domain.
- The apparatus according to any one of claims 13 to 16, wherein the first low-frequency information is randomly generated noise.
- The apparatus according to any one of claims 13 to 17, wherein the processing module is specifically configured to: determine the first high-frequency information through a second network according to the first data.
- The apparatus according to any one of claims 13 to 18, wherein the processing module is specifically configured to: convert the first image to the frequency domain through a second-order wavelet transform.
- 一种模型训练装置,其特征在于,所述装置包括:A model training device, characterized in that the device comprises:获取模块,用于获取第一图像和第二图像;所述第一图像和所述第二图像为针对于相同场景采集的;所述第二图像为所述第一图像对应的高质图像;An acquisition module, used to acquire a first image and a second image; the first image and the second image are acquired for the same scene; the second image is a high-quality image corresponding to the first image;处理模块,用于将所述第一图像和所述第二图像转换到频域,分别得到第一数据和第二数据;所述第一数据的空间分辨率低于所述第一图像;所述第二数据的空间分辨率低于所述第二图像;所述第二数据包括第一低频信息;所述第一低频信息为所述第二数据中低频通道的信息;a processing module, configured to convert the first image and the second image into a frequency domain to obtain first data and second data respectively; the spatial resolution of the first data is lower than that of the first image; the spatial resolution of the second data is lower than that of the second image; the second data includes first low-frequency information; the first low-frequency information is information of a low-frequency channel in the second data;根据所述第一数据,确定第一高频信息;所述第一高频信息为对所述第一图像对应的高质图像的高频通道的信息预测结果;Determine first high-frequency information according to the first data; the first high-frequency information is an information prediction result of a high-frequency channel of a high-quality image corresponding to the first image;根据所述第一高频信息和所述第一低频信息,通过第一网络得到第一噪声信息;所述第一噪声信息用于和第二噪声信息确定第一损失;所述第二噪声信息为随机生成的噪声;According to the first high-frequency information and the first low-frequency information, first noise information is obtained through a first network; the first noise information is used to determine a first loss together with the second noise information; the second noise information is randomly generated noise;根据所述第一损失,对所述第一网络进行更新。The first network is updated according to the first loss.
- 根据权利要求20所述的装置,其特征在于,所述处理模块,具体用于:The device according to claim 20, characterized in that the processing module is specifically used to:根据所述第一高频信息、所述第一数据和所述第一低频信息,通过第一网络得到第一噪声信息。First noise information is obtained through a first network according to the first high-frequency information, the first data, and the first low-frequency information.
- 根据权利要求20或21所述的装置,其特征在于,所述根据所述第一高频信息和所述第一低频信息,通过第一网络得到第一噪声信息为在第i次迭代时执行的,所述i为正整数;所述处理模块,还用于:The device according to claim 20 or 21, characterized in that the obtaining of the first noise information through the first network according to the first high-frequency information and the first low-frequency information is performed at the i-th iteration, where i is a positive integer; and the processing module is further used to:对所述第一低频信息叠加噪声,得到第三低频信息;superimposing noise on the first low-frequency information to obtain third low-frequency information;在第i+1次迭代时,根据所述第一高频信息和所述第三低频信息,通过所述第一网络得到第三噪声信息;所述第三噪声信息用于和第四噪声信息确定第二损失;所述第四噪声信息为随机生成的噪声;In the (i+1)th iteration, third noise information is obtained through the first network according to the first high-frequency information and the third low-frequency information; the third noise information is used to determine the second loss together with the fourth noise information; the fourth noise information is randomly generated noise;根据所述第二损失,对更新后的所述第一网络进行更新。The updated first network is updated according to the second loss.
- 根据权利要求20至22任一所述的装置,其特征在于,所述处理模块,具体用于:The device according to any one of claims 20 to 22, characterized in that the processing module is specifically used to:根据所述第一数据,通过第二网络,确定第一高频信息;所述第二网络为预先训练好的网络。According to the first data, the first high-frequency information is determined through a second network; the second network is a pre-trained network.
- 根据权利要求20至23任一所述的装置,其特征在于,所述处理模块,具体用于:The device according to any one of claims 20 to 23, characterized in that the processing module is specifically used to:通过二阶小波变换将所述第一图像和所述第二图像转换到频域。The first image and the second image are converted into the frequency domain by second-order wavelet transform.
- A computing device, characterized in that the computing device comprises a memory and a processor; the memory stores code, and the processor is configured to obtain the code and execute the method according to any one of claims 1 to 12.
- A computer storage medium, characterized in that the computer storage medium stores one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method according to any one of claims 1 to 12.
- A computer program product comprising code, characterized in that the code, when executed, is used to implement the method according to any one of claims 1 to 12.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310233422.2 | 2023-02-28 | ||
CN202310233422.2A CN116258651A (en) | 2023-02-28 | 2023-02-28 | Image processing method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024179510A1 true WO2024179510A1 (en) | 2024-09-06 |
Family
ID=86684186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/078973 WO2024179510A1 (en) | 2023-02-28 | 2024-02-28 | Image processing method and related device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116258651A (en) |
WO (1) | WO2024179510A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116258651A (en) * | 2023-02-28 | 2023-06-13 | 华为技术有限公司 | Image processing method and related device |
CN118037607A (en) * | 2024-03-18 | 2024-05-14 | 无锡英菲感知技术有限公司 | Infrared image non-uniformity correction method, device, equipment and storage medium |
- 2023-02-28: CN application CN202310233422.2A filed; published as CN116258651A (active, pending)
- 2024-02-28: PCT application PCT/CN2024/078973 filed; published as WO2024179510A1 (status unknown)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103747189A (en) * | 2013-11-27 | 2014-04-23 | 杨新锋 | Digital image processing method |
US20170256037A1 (en) * | 2016-03-01 | 2017-09-07 | Realtek Semiconductor Corp. | Image de-noising method and apparatus thereof |
CN107742279A (en) * | 2017-10-31 | 2018-02-27 | 努比亚技术有限公司 | A kind of image processing method, device and storage medium |
US20200357098A1 (en) * | 2019-05-07 | 2020-11-12 | Healcerion Co., Ltd. | Discrete wavelet transform-based noise removal apparatus for removing noise from image signal and remote medical diagnosis system including the same |
CN113870104A (en) * | 2020-06-30 | 2021-12-31 | 微软技术许可有限责任公司 | Super-resolution image reconstruction |
WO2022021025A1 (en) * | 2020-07-27 | 2022-02-03 | 华为技术有限公司 | Image enhancement method and apparatus |
KR102448498B1 (en) * | 2022-04-05 | 2022-09-28 | 한화시스템 주식회사 | Method and apparatus for removing the noise of IR image |
CN116258651A (en) * | 2023-02-28 | 2023-06-13 | 华为技术有限公司 | Image processing method and related device |
Non-Patent Citations (1)
Title |
---|
HUANG YI, HUANG JIANCHENG, LIU JIANZHUANG, DONG YU, LV JIAXI, CHEN SHIFENG: "WaveDM: Wavelet-Based Diffusion Models for Image Restoration", IEEE TRANSACTIONS ON MULTIMEDIA, 5 February 2024 (2024-02-05), XP093205865 * |
Also Published As
Publication number | Publication date |
---|---|
CN116258651A (en) | 2023-06-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2022116856A1 (en) | Model structure, model training method, and image enhancement method and device | |
WO2022134971A1 (en) | Noise reduction model training method and related apparatus | |
CN113066017B (en) | Image enhancement method, model training method and equipment | |
WO2022042713A1 (en) | Deep learning training method and apparatus for use in computing device | |
WO2024179510A1 (en) | Image processing method and related device | |
EP4163832A1 (en) | Neural network training method and apparatus, and image processing method and apparatus | |
CN113011562A (en) | Model training method and device | |
WO2024041479A1 (en) | Data processing method and apparatus | |
CN111738403B (en) | Neural network optimization method and related equipment | |
CN113065635A (en) | Model training method, image enhancement method and device | |
WO2023231794A1 (en) | Neural network parameter quantification method and apparatus | |
CN114595799A (en) | Model training method and device | |
WO2024213099A1 (en) | Data processing method and apparatus | |
CN111950700A (en) | Neural network optimization method and related equipment | |
WO2024083121A1 (en) | Data processing method and apparatus | |
WO2022111387A1 (en) | Data processing method and related apparatus | |
WO2024061269A1 (en) | Three-dimensional reconstruction method and related apparatus | |
WO2024002211A1 (en) | Image processing method and related apparatus | |
WO2024212648A1 (en) | Method for training classification model, and related apparatus | |
CN113066018A (en) | Image enhancement method and related device | |
WO2024188171A1 (en) | Image processing method and related device thereof | |
WO2024199409A1 (en) | Data processing method and apparatus thereof | |
WO2022001364A1 (en) | Method for extracting data features, and related apparatus | |
WO2024175014A1 (en) | Image processing method and related device thereof | |
WO2024160219A1 (en) | Model quantization method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 24763190; Country of ref document: EP; Kind code of ref document: A1 |