WO2020215644A1 - Video image processing method and device - Google Patents
Video image processing method and device
- Publication number
- WO2020215644A1 (application PCT/CN2019/114139)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- frame
- processing
- convolution
- deblurring
Classifications
- H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/68: Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
- H04N23/6811: Motion detection based on the image signal
- H04N23/682: Vibration or motion blur correction
- H04N23/683: Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
- H04N23/80: Camera processing pipelines; Components thereof
- H04N23/95: Computational photography systems, e.g. light-field imaging systems
- G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/02: Neural networks
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
Definitions
- This application relates to the field of image processing technology, and in particular to a video image processing method and device.
- Captured video is prone to blurring.
- Blur caused by camera shake or by motion of the subject often ruins the shot or prevents subsequent video-based processing.
- Traditional methods can remove blur from video images by means of optical flow or neural networks, but their deblurring effect is poor.
- The embodiments of the present application provide a video image processing method and device.
- An embodiment of the present application provides a video image processing method, including: acquiring multiple frames of continuous video images, wherein the multiple frames of continuous video images include the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image, where N is a positive integer; obtaining a deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and performing deblurring processing on the Nth frame image with the deblurring convolution kernel to obtain the Nth frame deblurred image.
- The deblurring convolution kernel of the Nth frame image in the video can be obtained, and the Nth frame image can then be convolved with that kernel, which effectively removes the blur in the Nth frame image and yields the Nth frame deblurred image.
- Obtaining the deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
- A deblurring convolution kernel is obtained for each pixel, and that kernel is used to perform deconvolution processing on the corresponding pixel in the Nth frame image to remove the blur at that pixel. By generating one deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed, and the deblurred image is clear and natural.
- Performing convolution processing on the pixels of the image to be processed to obtain a deblurring convolution kernel includes: performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and encoding the alignment convolution kernel to obtain the deblurring convolution kernel.
- The alignment convolution kernel of each pixel is obtained, and it can be used for subsequent alignment processing. Then, through convolution processing of the alignment convolution kernel, the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image is extracted, and the deblurring convolution kernel is obtained.
- The deblurring convolution kernel thus contains not only the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, but also the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the removal of blur from the Nth frame image.
- Performing deblurring processing on the Nth frame image with the deblurring convolution kernel to obtain the Nth frame deblurred image includes: performing convolution processing on the pixels of the feature image of the Nth frame image with the deblurring convolution kernel to obtain a first feature image; and decoding the first feature image to obtain the Nth frame deblurred image.
- Deblurring the feature image of the Nth frame image, rather than the image itself, with the deblurring convolution kernel reduces the amount of data processed during deblurring and increases the processing speed.
- Performing convolution processing on the pixels of the feature image of the Nth frame image with the deblurring convolution kernel to obtain the first feature image includes: adjusting the dimensions of the deblurring convolution kernel so that its number of channels is the same as the number of channels of the feature image of the Nth frame image; and performing convolution processing on the pixels of the feature image of the Nth frame image with the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
- Adjusting the dimensions of the deblurring convolution kernel makes them match the dimensions of the feature image of the Nth frame image, so that the dimension-adjusted deblurring convolution kernel can be used to perform convolution processing on the feature image of the Nth frame image.
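The dimension adjustment and per-pixel convolution described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of the application: the function name `adaptive_conv`, the k x k single-channel kernel layout, the edge padding, and the replication of each kernel across channels are all hypothetical choices made for the example.

```python
import numpy as np

def adaptive_conv(features, kernels):
    """Convolve every pixel of `features` with its own kernel.

    features: (H, W, C) feature image.
    kernels:  (H, W, k, k) one single-channel k x k kernel per pixel (k odd).
    """
    h, w, c = features.shape
    k = kernels.shape[2]
    if kernels.shape != (h, w, k, k) or k % 2 == 0:
        raise ValueError("kernels must have shape (H, W, k, k) with odd k")
    # Dimension adjustment (assumed): replicate each kernel across the C channels.
    kernels_c = np.repeat(kernels[..., np.newaxis], c, axis=-1)  # (H, W, k, k, C)
    pad = k // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(features.shape, dtype=np.float64)
    # Accumulate one kernel tap at a time (cross-correlation, no kernel flip).
    for i in range(k):
        for j in range(k):
            out += kernels_c[:, :, i, j, :] * padded[i:i + h, j:j + w, :]
    return out

features = np.arange(4 * 5 * 2, dtype=np.float64).reshape(4, 5, 2)
identity = np.zeros((4, 5, 3, 3))
identity[:, :, 1, 1] = 1.0  # a kernel that keeps only the centre pixel
first_feature = adaptive_conv(features, identity)
```

Because each pixel has its own kernel, regions with different blur can be treated differently, which is the point of generating a kernel per pixel.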
- After performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel, the method further includes: performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the alignment convolution kernel to obtain a second feature image.
- Convolving the pixels of the feature image of the N-1th frame image with the alignment convolution kernel aligns the feature image of the N-1th frame image in time with the Nth frame.
- Performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the alignment convolution kernel to obtain a second feature image includes: adjusting the dimensions of the alignment convolution kernel so that its number of channels is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the dimension-adjusted alignment convolution kernel to obtain the second feature image.
- Adjusting the dimensions of the alignment convolution kernel makes them match the dimensions of the feature image of the N-1th frame image, so that the dimension-adjusted alignment convolution kernel can be used to perform convolution processing on the feature image of the N-1th frame image.
- Decoding the first feature image to obtain the Nth frame deblurred image includes: fusing the first feature image and the second feature image to obtain a third feature image; and decoding the third feature image to obtain the Nth frame deblurred image.
- Fusing the first feature image and the second feature image improves the deblurring effect on the Nth frame image; the fused third feature image is then decoded to obtain the Nth frame deblurred image.
- Performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtain the alignment convolution kernel includes: superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encoding the image to be processed to obtain a fourth feature image; performing convolution processing on the fourth feature image to obtain a fifth feature image; and adjusting the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
- The motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image is extracted by convolution processing; then, to facilitate subsequent processing, the number of channels of the fifth feature image is adjusted to the first preset value by convolution processing.
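One common way to adjust the number of channels by convolution is a 1x1 convolution, which mixes channels at each pixel without changing the spatial size. The sketch below assumes that choice; the text above does not state the kernel size, and the name `conv1x1`, the random weights, the 64 input channels, and the preset value 25 are all hypothetical.

```python
import numpy as np

def conv1x1(features, weights):
    """Apply a 1x1 convolution: (H, W, C_in) with (C_in, C_out) gives (H, W, C_out)."""
    if features.shape[-1] != weights.shape[0]:
        raise ValueError("channel count of features must match weights")
    # Matrix multiplication over the last axis mixes channels independently at every pixel.
    return features @ weights

rng = np.random.default_rng(0)
fifth_feature = rng.standard_normal((100, 100, 64))  # hypothetical fifth feature image
first_preset = 25                                    # hypothetical first preset value
weights = rng.standard_normal((64, first_preset))    # would be learned during training
alignment_kernel = conv1x1(fifth_feature, weights)
print(alignment_kernel.shape)  # (100, 100, 25)
```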
- Encoding the alignment convolution kernel to obtain the deblurring convolution kernel includes: adjusting the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; fusing the fourth feature image and the sixth feature image to obtain a seventh feature image; and performing convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
- Obtaining the deblurring convolution kernel by convolution processing of the alignment convolution kernel means that the deblurring convolution kernel contains not only the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, but also the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, which improves how well the deblurring convolution kernel subsequently removes the blur of the Nth frame image.
- Performing convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image and obtain the deblurring convolution kernel includes: performing convolution processing on the seventh feature image to obtain an eighth feature image; and adjusting the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
- The motion information of the pixels of the N-1th frame image relative to the pixels of the N-1th frame deblurred image is extracted, and the number of channels of the eighth feature image is then adjusted to the first preset value through convolution processing.
- Decoding the third feature image to obtain the Nth frame deblurred image includes: performing deconvolution processing on the third feature image to obtain a ninth feature image; performing convolution processing on the ninth feature image to obtain the Nth frame decoded image; and adding the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
- The third feature image is decoded through deconvolution processing and convolution processing to obtain the Nth frame decoded image; the pixel values of corresponding pixels in the Nth frame image and the Nth frame decoded image are then added to obtain the Nth frame deblurred image, which further improves the deblurring effect.
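The final addition step can be sketched as an element-wise sum of two images of identical shape, so that the decoded image acts as a per-pixel correction to the blurred frame. The function name `add_decoded_to_frame` and the sample values are hypothetical; the deconvolution and convolution stages that would produce the decoded image are not modelled here.

```python
import numpy as np

def add_decoded_to_frame(frame_n, decoded_n):
    """Add pixel values at identical positions of the Nth frame and its decoded image."""
    if frame_n.shape != decoded_n.shape:
        raise ValueError("both images must have the same shape")
    return frame_n + decoded_n

frame_n = np.full((4, 4, 3), 0.5)    # hypothetical Nth frame image
decoded_n = np.full((4, 4, 3), 0.1)  # hypothetical Nth frame decoded image
deblurred_n = add_decoded_to_frame(frame_n, decoded_n)
```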
- An embodiment of the present application also provides a video image processing device, including: an acquiring unit configured to acquire multiple frames of continuous video images, wherein the multiple frames of continuous video images include the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image, where N is a positive integer;
- a first processing unit configured to obtain a deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and
- a second processing unit configured to perform deblurring processing on the Nth frame image with the deblurring convolution kernel to obtain the Nth frame deblurred image.
- The first processing unit includes: a first convolution processing subunit configured to perform convolution processing on the pixels of the image to be processed to obtain a deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
- The first convolution processing subunit is configured to: perform convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and encode the alignment convolution kernel to obtain the deblurring convolution kernel.
- The second processing unit includes: a second convolution processing subunit configured to perform convolution processing on the pixels of the feature image of the Nth frame image with the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit configured to decode the first feature image to obtain the Nth frame deblurred image.
- The second convolution processing subunit is configured to: adjust the dimensions of the deblurring convolution kernel so that its number of channels is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image with the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
- The first convolution processing subunit is further configured to: after performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel, perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the alignment convolution kernel to obtain the second feature image.
- The first convolution processing subunit is further configured to: adjust the dimensions of the alignment convolution kernel so that its number of channels is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the dimension-adjusted alignment convolution kernel to obtain the second feature image.
- The second processing unit is configured to: fuse the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the Nth frame deblurred image.
- The first convolution processing subunit is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encode the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
- The first convolution processing subunit is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; fuse the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
- The first convolution processing subunit is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
- The second processing unit is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain the Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
- An embodiment of the present application further provides a processor configured to execute the method of the foregoing first aspect and any of its possible implementations.
- An embodiment of the present application also provides an electronic device, including a processor, an input device, an output device, and a memory. The processor, input device, output device, and memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor executes the method of the foregoing first aspect and any of its possible implementations.
- The embodiments of the present application also provide a computer-readable storage medium storing a computer program that includes program instructions; when the program instructions are executed by a processor of an electronic device, the processor is caused to execute the method of the foregoing first aspect and any of its possible implementations.
- FIG. 1 is a schematic diagram of corresponding pixels in different images provided by an embodiment of the application.
- FIG. 2 is a non-uniformly blurred image provided by an embodiment of the application.
- FIG. 3 is a schematic flowchart of a video image processing method provided by an embodiment of this application.
- FIG. 4 is a schematic flowchart of deblurring processing in a video image processing method according to an embodiment of the application.
- FIG. 5 is a schematic flowchart of another video image processing method provided by an embodiment of the application.
- FIG. 6 is a schematic diagram of a process for obtaining a deblurring convolution kernel and an alignment convolution kernel provided by an embodiment of the application.
- FIG. 7 is a schematic diagram of an encoding module provided by an embodiment of the application.
- FIG. 8 is a schematic diagram of an aligned convolution kernel generation module provided by an embodiment of the application.
- FIG. 9 is a schematic diagram of a deblurring convolution kernel generation module provided by an embodiment of the application.
- FIG. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the application.
- FIG. 11 is a schematic diagram of an adaptive convolution processing module provided by an embodiment of the application.
- FIG. 12 is a schematic diagram of a decoding module provided by an embodiment of this application.
- FIG. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of this application.
- FIG. 14 is a schematic structural diagram of an alignment convolution kernel and deblurring convolution kernel generation module provided by an embodiment of the application.
- FIG. 15 is a schematic structural diagram of a video image processing device provided by an embodiment of this application.
- FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
- The word "corresponding" appears frequently below; corresponding pixels in two images are the two pixels at the same position in the two images.
- For example, pixel a in image A corresponds to pixel d in image B, and pixel b in image A corresponds to pixel c in image B.
- Corresponding pixels in multiple images have the same meaning as corresponding pixels in two images.
- A non-uniformly blurred image, as the term is used below, is an image in which different pixels are blurred to different degrees, that is, in which the motion trajectories of different pixels differ.
- For example, the text on the sign in the upper-left corner is more blurred than the car in the lower-right corner; that is, the blur degrees of the two areas are inconsistent.
- The embodiments of the present application can be used to remove the blur in a non-uniformly blurred image. The embodiments of the present application are described below with reference to the drawings.
- FIG. 3 is a schematic flowchart of a video image processing method provided by an embodiment of the present application. As shown in FIG. 3, the method includes:
- The multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, where N is a positive integer.
- The multiple frames of continuous video images can be obtained by shooting video with a camera.
- The Nth frame image and the N-1th frame image are two adjacent frames of the multiple frames of continuous video images; the Nth frame image follows the N-1th frame image and is the frame currently to be processed (that is, deblurred using the implementation provided in this application).
- The N-1th frame deblurred image is the image obtained by deblurring the N-1th frame image.
- Deblurring of video images in the embodiments of this application is a recursive process: the N-1th frame deblurred image is used as an input image when deblurring the Nth frame image.
- Likewise, the Nth frame deblurred image is used as an input image when deblurring the N+1th frame image.
- When N = 1, that is, when the object of the current deblurring process is the first frame in the video, the N-1th frame image and the N-1th frame deblurred image are both taken to be the Nth frame image; that is, three copies of the first frame image are acquired.
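The recursion over frames, including the N = 1 case, can be sketched as a simple loop. This is an illustration under stated assumptions: `deblur_frame` is a hypothetical stand-in that returns the frame unchanged (the actual per-frame step is the network described in this application), and `deblur_video` is a name introduced only for the example.

```python
import numpy as np

def deblur_frame(frame_n, frame_prev, deblurred_prev):
    """Hypothetical stand-in for the per-frame deblurring step (returns the frame unchanged)."""
    return frame_n.copy()

def deblur_video(frames, deblur_fn=deblur_frame):
    """Deblur frames one at a time, feeding each result into the next step."""
    outputs = []
    for idx, frame_n in enumerate(frames):
        if idx == 0:
            # N = 1: no earlier frame exists, so the first frame fills all three inputs.
            frame_prev, deblurred_prev = frame_n, frame_n
        else:
            frame_prev, deblurred_prev = frames[idx - 1], outputs[idx - 1]
        outputs.append(deblur_fn(frame_n, frame_prev, deblurred_prev))
    return outputs

frames = [np.full((2, 2, 3), float(i)) for i in range(3)]
results = deblur_video(frames)
```

Passing the per-frame step as `deblur_fn` keeps the loop independent of how a single frame is deblurred.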
- The sequence obtained by arranging the frames of a video in order of shooting time is called a video frame sequence.
- The image obtained after deblurring is called the deblurred image.
- The video images are deblurred in the order of the video frame sequence, and only one frame is deblurred at a time.
- The video images and the deblurred images can be stored in the memory of the electronic device.
- Here, the video refers to a video stream; that is, the video images are stored in the memory of the electronic device in the order of the video frame sequence. The electronic device can therefore obtain the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image directly from the memory.
- The video images mentioned in the embodiments of the present application may be video captured in real time by a camera of the electronic device, or video images stored in a memory of the electronic device.
- Obtaining the deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
- The Nth frame image, the N-1th frame image, and the N-1th frame deblurred image are superimposed in the channel dimension to obtain the image to be processed.
- For example (Example 1), if the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image are all of size 100*100*3, the image to be processed after superposition is of size 100*100*9. That is, in the image obtained by superimposing the three images, the number of pixels is unchanged compared with any one of the three images, but the number of channels of each pixel becomes 3 times that of any one of them.
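The shape arithmetic of Example 1 can be checked with a short NumPy sketch. The function name `stack_frames` and the zero-filled placeholder images are hypothetical; only the sizes come from the example above.

```python
import numpy as np

def stack_frames(frame_n, frame_prev, deblurred_prev):
    """Superimpose three H x W x C images along the channel dimension."""
    if not (frame_n.shape == frame_prev.shape == deblurred_prev.shape):
        raise ValueError("all three images must have the same shape")
    return np.concatenate([frame_n, frame_prev, deblurred_prev], axis=-1)

frame_n = np.zeros((100, 100, 3), dtype=np.float32)         # Nth frame image
frame_prev = np.zeros((100, 100, 3), dtype=np.float32)      # N-1th frame image
deblurred_prev = np.zeros((100, 100, 3), dtype=np.float32)  # N-1th frame deblurred image
to_process = stack_frames(frame_n, frame_prev, deblurred_prev)
print(to_process.shape)  # (100, 100, 9)
```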
- the convolution processing performed on the pixels of the image to be processed can be implemented by multiple arbitrarily stacked convolution layers.
- the embodiment of this application controls the number of convolution layers and the size of the convolution kernel in the convolution layer. Not limited.
- the characteristic information of the pixels in the image to be processed can be extracted to obtain a deblurring convolution kernel.
- the characteristic information includes motion information of pixels of the N-1 frame image relative to the pixels of the N frame image, and pixels of the N-1 frame image relative to the N-1 frame deblurring The deblurring information of the pixels of the processed image.
- the aforementioned motion information includes the motion speed and direction of the pixel in the N-1th frame of image relative to the corresponding pixel in the Nth frame of image.
- the deblurring convolution kernel in the embodiment of the present application is the result of the convolution processing of the image to be processed, and it is used as the convolution kernel of the convolution processing in the subsequent processing of the embodiment of the present application.
- performing convolution processing on pixels of the image to be processed refers to performing convolution processing on each pixel of the image to be processed to obtain a deblurring convolution kernel for each pixel respectively.
- continuing Example 1 (Example 2): the size of the image to be processed is 100*100*9, that is, the image to be processed contains 100*100 pixels. The pixels of the image to be processed are convolved to obtain a 100*100 feature image, where each pixel in this 100*100 feature image can be used as a deblurring convolution kernel for subsequently deblurring the corresponding pixel in the Nth frame image.
- the deblurring processing is performed on the Nth frame image through the deblurring convolution kernel to obtain the deblurred image of the Nth frame.
- the feature image of the Nth frame image can be obtained by performing feature extraction processing on the Nth frame image.
- the feature extraction processing may be convolution processing or pooling processing, which is not limited in the embodiment of the present application.
- the deblurring convolution kernel of each pixel in the image to be processed is obtained.
- the number of pixels in the image to be processed is the same as the number of pixels in the Nth frame of image, and the pixels in the image to be processed correspond to the pixels in the Nth frame of image one-to-one.
- the meaning of one-to-one correspondence can be seen in the following example: pixel A in the image to be processed corresponds to pixel B in the Nth frame image, that is, the position of pixel A in the image to be processed is the same as the position of pixel B in the Nth frame image.
- the foregoing decoding processing may be implemented through deconvolution processing, or may be obtained through a combination of deconvolution processing and convolution processing, which is not limited in the embodiment of the present application.
- the pixel value of each pixel in the image obtained by decoding the first characteristic image is added to the pixel value of the corresponding pixel of the Nth frame image, and the image obtained after the "addition" is regarded as the deblurred image of the Nth frame.
- the information of the Nth frame image can be used to obtain the Nth frame deblurred image.
- the pixel value of pixel E in the deblurred image of the Nth frame obtained after the "addition" is 350, where the position of C in the image to be processed, the position of D in the Nth frame image, and the position of E in the deblurred image of the Nth frame are the same.
- the motion trajectories of different pixels in a non-uniformly blurred image are different, and the more complex the motion trajectory of the pixel, the higher the degree of blur.
- the embodiment of the present application predicts one deblurring convolution kernel for each pixel in the image to be processed, and uses the predicted deblurring convolution kernel to perform convolution processing on the corresponding feature point of the Nth frame image, so as to remove the blur of the pixels in the Nth frame image. Since different pixels in a non-uniformly blurred image have different degrees of blur, generating a separate deblurring convolution kernel for each pixel removes the blur of each pixel more effectively, and thereby removes the blur in the non-uniformly blurred image.
- the embodiment of the present application obtains the deblurring convolution kernel of each pixel based on the deblurring information between the pixels of the N-1th frame image and the deblurred image of the N-1th frame, and uses the deblurring convolution kernel to perform deconvolution processing on the corresponding pixels in the Nth frame image to remove the blur of the pixels in the Nth frame image. By generating a deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed; the deblurred image is clear and natural, and the entire deblurring process takes little time and runs fast.
- FIG. 5 is a schematic flowchart of a possible implementation manner of 302 according to an embodiment of the present application. As shown in FIG. 5, the method includes:
- the motion information includes speed and direction, and can be understood as the motion information of a pixel point from the time of the N-1th frame (the time when the N-1th frame image was taken) to the time of the Nth frame (the time when the Nth frame image was taken).
- the motion information of the pixel points helps to remove the blur of the Nth frame image.
- the convolution processing performed on the pixels of the image to be processed can be implemented by multiple arbitrarily stacked convolution layers.
- the embodiments of this application do not limit the number of convolution layers or the size of the convolution kernel in each convolution layer.
- the characteristic information of the pixels in the image to be processed can be extracted to obtain the aligned convolution kernel.
- the feature information here includes motion information of pixels of the N-1th frame image relative to the pixels of the Nth frame image.
- the aligned convolution kernel in the embodiment of the present application is the result obtained by performing the aforementioned convolution processing on the image to be processed, and will be used as the convolution kernel of the convolution processing in the subsequent processing of the embodiment of the present application.
- because the alignment convolution kernel is obtained by performing convolution processing on the image to be processed, which extracts the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, the pixel points of the Nth frame image can subsequently be aligned by means of the alignment convolution kernel.
- the aligned convolution kernel obtained in this embodiment is also obtained in real time, that is, through the above processing, the aligned convolution kernel of each pixel in the Nth frame of image is obtained.
- the encoding processing here can be convolution processing or pooling processing.
- the foregoing encoding processing is convolution processing, and the convolution processing can be implemented by a plurality of arbitrarily stacked convolution layers.
- the embodiment of the present application does not limit the number of convolution layers or the size of the convolution kernel in each convolution layer.
- the convolution processing in 402 is different from the convolution processing in 401.
- the convolution processing in 401 is implemented by 3 convolutional layers with 32 channels (the size of the convolution kernel is 3*3), and the convolution processing in 402 is implemented by 5 convolutional layers with 64 channels (the size of the convolution kernel is 3*3). Both (the 3 convolutional layers and the 5 convolutional layers) are essentially convolution processing, but their specific implementations differ.
- since the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the deblurred image of the N-1th frame in the channel dimension, the image to be processed contains the information of the Nth frame image, the N-1th frame image, and the deblurred image of the N-1th frame.
- the convolution processing in 401 focuses more on extracting the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image; that is to say, after the processing of 401, the deblurring information between the N-1th frame image and the deblurred image of the N-1th frame in the image to be processed has not been extracted.
- the image to be processed and the alignment convolution kernel may be fused, so that the alignment convolution kernel obtained after fusion includes the deblurring information between the N-1th frame image and the deblurred image of the N-1th frame.
- the deblurring information of the image after deblurring processing in the N-1th frame relative to the pixels of the N-1th frame image is extracted to obtain the deblurring convolution kernel.
- the deblurring information can be understood as the mapping relationship between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame, that is, the mapping relationship between the pixels before deblurring and the pixels after deblurring.
- the deblurring convolution kernel obtained by performing convolution processing on the alignment convolution kernel includes not only the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame, but also the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image.
- Subsequent convolution processing is performed on the pixels of the Nth frame of image through the deblurring convolution kernel to improve the deblurring effect.
- the embodiment of the present application obtains the alignment convolution kernel of each pixel based on the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, so that subsequent alignment processing can be performed through the alignment convolution kernel. Convolution processing is then performed on the alignment convolution kernel to extract the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame, and the deblurring convolution kernel is obtained. As a result, the deblurring convolution kernel includes not only the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame, but also the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the effect of removing the blur of the Nth frame image.
- the foregoing embodiments all obtain the deblurring convolution kernel and the alignment convolution kernel by performing convolution processing on the image. Because an image contains a large number of pixels, processing the image directly involves a large amount of data and is slow. Therefore, the embodiment of the present application provides an implementation that obtains the deblurring convolution kernel and the alignment convolution kernel based on feature images.
- FIG. 6 is a schematic diagram of a process for obtaining a deblurring convolution kernel and an alignment convolution kernel according to Embodiment 6 of the present application. As shown in FIG. 6, the method includes:
- please refer to step 302 for the implementation of obtaining the image to be processed, which will not be repeated here.
- the foregoing encoding processing can be implemented in multiple ways, such as convolution, pooling, etc., which are not specifically limited in the embodiment of the present application.
- the module shown in Figure 7 can be used to encode the image to be processed.
- the module includes, in order: a convolutional layer with 32 channels (the size of the convolution kernel is 3*3); two residual blocks with 32 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3); a convolutional layer with 64 channels (the size of the convolution kernel is 3*3); two residual blocks with 64 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3); a convolutional layer with 128 channels (the size of the convolution kernel is 3*3); and two residual blocks with 128 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3).
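The layer sequence of the encoding module listed above can be written down as a small table and checked mechanically. The names `ENCODER_LAYOUT`, `count_conv_layers`, and `output_channels` below are hypothetical and only serve this illustration; the layout itself is taken from the description of Figure 7.

```python
# Each entry: (layer type, number of channels). All convolution kernels are 3*3.
ENCODER_LAYOUT = [
    ("conv", 32), ("resblock", 32), ("resblock", 32),
    ("conv", 64), ("resblock", 64), ("resblock", 64),
    ("conv", 128), ("resblock", 128), ("resblock", 128),
]
CONVS_PER_RESBLOCK = 2  # each residual block contains two convolutional layers


def count_conv_layers(layout):
    """Total number of convolutional layers, counting those inside residual blocks."""
    return sum(CONVS_PER_RESBLOCK if kind == "resblock" else 1 for kind, _ in layout)


def output_channels(layout):
    """Channel count produced by the last stage of the layout."""
    return layout[-1][1]


print(count_conv_layers(ENCODER_LAYOUT))  # 15
print(output_channels(ENCODER_LAYOUT))    # 128
```

The final channel count of 128 matches the 25*25*128 fourth feature image given in the example.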
- the image to be processed is subjected to layer-by-layer convolution processing to complete the encoding of the image to be processed, and the fourth characteristic image is obtained.
- the feature content and semantic information extracted by each convolution layer are different. Specifically, the encoding processing abstracts the features of the image to be processed step by step and gradually removes relatively minor features. Therefore, the later a feature image is extracted, the smaller its size and the more concentrated its semantic information.
- the image to be processed is convolved step by step and the corresponding features are extracted, and finally a fourth feature image of fixed size is obtained. In this way, the main content information of the image to be processed (i.e., the fourth feature image) is obtained while the image size is reduced, which reduces the amount of data to be processed and increases the processing speed.
- for example (Example 3), assuming that the size of the image to be processed is 100*100*3, the size of the fourth characteristic image obtained through the encoding processing of the module shown in FIG. 7 is 25*25*128.
- the above convolution processing is implemented as follows: the convolution layer performs convolution processing on the image to be processed, that is, the convolution kernel slides over the image to be processed; at each position, the pixel values of the image to be processed are multiplied by the corresponding values of the convolution kernel, and the sum of all the products is taken as the pixel value, in the output image, of the pixel corresponding to the center of the convolution kernel. After the kernel has slid over all the pixels of the image to be processed, the fourth feature image is obtained.
- the step size of the convolutional layer may be set to 2.
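The sliding multiply-and-add described above, together with a step size of 2, can be sketched as a naive single-channel convolution. The zero padding of 1 and the choice of applying the stride-2 layer twice are assumptions made for this illustration (the text does not state them); under these assumptions a 100*100 input shrinks to 25*25, in line with Example 3.

```python
import numpy as np


def conv2d(image, kernel, stride=1, padding=0):
    """Naive single-channel 2D convolution (cross-correlation, as used in CNN layers)."""
    padded = np.pad(image, padding)  # zero padding on every side
    kh, kw = kernel.shape
    out_h = (padded.shape[0] - kh) // stride + 1
    out_w = (padded.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the input currently covered by the kernel.
            window = padded[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Multiply element-wise and sum: one output pixel.
            out[i, j] = np.sum(window * kernel)
    return out


image = np.ones((100, 100))
kernel = np.ones((3, 3)) / 9.0  # hypothetical averaging kernel
once = conv2d(image, kernel, stride=2, padding=1)
twice = conv2d(once, kernel, stride=2, padding=1)
print(once.shape, twice.shape)  # (50, 50) (25, 25)
```

A real convolution layer would repeat this over all input and output channels with learned kernel values; only the spatial mechanics are shown here.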
- FIG. 8 is a module for generating an aligned convolution kernel provided by an embodiment of the application.
- the fourth feature image is input to the module shown in Figure 8.
- the fourth feature image sequentially passes through a convolutional layer with 128 channels (the size of the convolution kernel is 3*3) and two residual blocks with 64 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3) to realize the convolution processing of the fourth feature image, which extracts the motion information between the pixel points of the N-1th frame image and the pixel points of the Nth frame image in the fourth feature image and yields the fifth feature image.
- the size of the image does not change, that is, the size of the fifth characteristic image obtained is the same as the size of the fourth characteristic image.
- the size of the fourth feature image is 25*25*128, and the size of the fifth feature image obtained through the processing of 303 is also 25*25*128.
- the fourth layer in Figure 8 performs convolution processing on the fifth feature image, and the size of the obtained aligned convolution kernel is 25*25*c*k*k (it should be understood that this fourth-layer convolution processing adjusts the number of channels of the fifth feature image), where c is the number of channels of the fifth feature image and k is a positive integer; optionally, the value of k is 5.
- 25*25*c*k*k is adjusted to 25*25*ck^2, where ck^2 is the first preset value.
- the height and width of the aligned convolution kernel are both 25.
- the aligned convolution kernel contains 25*25 elements, each element contains c pixels, and different elements occupy different positions in the aligned convolution kernel. For example, if the plane spanned by the width and height of the aligned convolution kernel is defined as the xoy plane, each element in the aligned convolution kernel can be identified by coordinates (x, y), where o is the origin.
- the elements of the aligned convolution kernel are the convolution kernels used for pixel alignment in the subsequent processing, and the size of each element is 1*1*ck^2.
- continuing Example 4 (Example 5): the size of the fifth feature image is 25*25*128, and the size of the aligned convolution kernel obtained by the processing of 304 is 25*25*128*k*k, that is, 25*25*128k^2.
- the aligned convolution kernel contains 25*25 elements, each element contains 128 pixels, and different elements have different positions in the first aligned convolution kernel.
- the size of each element is 1*1*128*k^2.
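The size adjustment from 25*25*c*k*k to 25*25*ck^2 amounts to flattening the last three axes. A minimal NumPy sketch, using c = 128 from the example and the optional value k = 5 (the zero-filled array is only a placeholder for real kernel values):

```python
import numpy as np

c, k = 128, 5  # c from the example; k = 5 is the optional value mentioned in the text
kernels = np.zeros((25, 25, c, k, k))      # one c*k*k kernel per spatial position
flat = kernels.reshape(25, 25, c * k * k)  # 25*25*ck^2

print(flat.shape)        # (25, 25, 3200)
print(flat[0, 0].shape)  # (3200,)
```

Each of the 25*25 positions then holds a single vector of ck^2 values, which corresponds to an element of size 1*1*128k^2 in the example.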
- the fourth layer is a convolutional layer, and the larger the convolution kernel of the convolutional layer, the greater the amount of data processing.
- the fourth layer in FIG. 8 is a convolutional layer with 128 channels and a convolution kernel size of 1*1. Adjusting the number of channels of the fifth feature image through the convolution layer with the convolution kernel size of 1*1 can reduce the amount of data processing and increase the processing speed.
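To illustrate why a 1*1 convolution is cheap, the sketch below counts the weights of a convolution layer and implements a 1*1 convolution as the same linear map applied to every pixel's channel vector. The function names and the random weights are hypothetical; biases are ignored.

```python
import numpy as np


def conv_weight_count(c_in, c_out, k):
    """Number of weights in a k*k convolution layer (bias ignored)."""
    return c_in * c_out * k * k


def conv1x1(features, weights):
    """1*1 convolution: one linear map applied independently at every pixel."""
    return np.einsum("hwc,cd->hwd", features, weights)


fifth = np.random.rand(25, 25, 128)   # placeholder for the fifth feature image
weights = np.random.rand(128, 128)    # hypothetical 1*1 kernel, 128 -> 128 channels
out = conv1x1(fifth, weights)

print(out.shape)  # (25, 25, 128)
print(conv_weight_count(128, 128, 3) // conv_weight_count(128, 128, 1))  # 9
```

With the same channel counts, a 3*3 layer has 9 times as many weights (and multiply-adds per pixel) as a 1*1 layer.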
- since the number of channels of the fifth feature image is adjusted by the convolution processing in 504 (that is, the fourth layer in Figure 8), before performing convolution processing on the alignment convolution kernel to obtain the deblurring convolution kernel, the number of channels of the alignment convolution kernel needs to be adjusted to the second preset value (that is, the number of channels of the fifth characteristic image).
- the number of channels of the aligned convolution kernel is adjusted to the second preset value through convolution processing to obtain the sixth characteristic image.
- the convolution processing can be implemented by a convolution layer with 128 channels and a convolution kernel size of 1*1.
- 502 to 504 of the present embodiment focus more on extracting the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image in the image to be processed. Since the subsequent processing needs to extract the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame in the image to be processed, the fourth characteristic image and the sixth characteristic image are fused before the subsequent processing, so that the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame is added to the characteristic image.
- the fourth feature image and the sixth feature image are concatenated, that is, the fourth feature image and the sixth feature image are superimposed in the channel dimension to obtain the seventh feature image.
- the seventh characteristic image contains the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame; performing convolution processing on the seventh characteristic image further extracts this deblurring information to obtain the deblurring convolution kernel.
- the process includes the following steps:
- Convolution processing is performed on the seventh feature image to obtain an eighth feature image; the number of channels of the eighth feature image is adjusted to the first preset value through convolution processing to obtain a deblurring convolution kernel.
- the seventh feature image is input to the module shown in Figure 9 and sequentially passes through a convolutional layer with 128 channels (the size of the convolution kernel is 3*3) and two residual blocks with 64 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3) to realize the convolution processing of the seventh feature image, which extracts the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame in the seventh feature image and yields the eighth characteristic image.
- the processing procedure of the seventh characteristic image by the module shown in FIG. 9 can refer to the processing procedure of the fifth characteristic image by the module shown in FIG. 8, which will not be repeated here.
- compared with the module shown in Figure 9 (used to generate the deblurring convolution kernel), the module shown in Figure 8 (used to generate the aligned convolution kernel) has one more convolutional layer (that is, the fourth layer of the module shown in Figure 8). Although the rest of the composition is the same, the weights of the two modules are different, which directly determines that their uses are different.
- the weights of the modules shown in FIG. 8 and the modules shown in FIG. 9 may be obtained by training the modules shown in FIG. 8 and FIG. 9.
- the deblurring convolution kernel obtained by 507 contains a deblurring convolution kernel for each pixel in the seventh feature image, and the size of the convolution kernel of each pixel is 1*1*ck^2.
- continuing Example 5 (Example 6): the size of the seventh feature image is 25*25*128*k*k, that is to say, the seventh feature image contains 25*25 pixels. Accordingly, the obtained deblurring convolution kernel (of size 25*25*128k^2) contains 25*25 deblurring convolution kernels (that is, each pixel corresponds to one deblurring convolution kernel, and the size of the deblurring convolution kernel of each pixel is 1*1*128k^2).
- the information of each pixel in the seventh characteristic image is synthesized into one convolution kernel, that is, the deblurring convolution kernel of that pixel.
- the motion information between the pixels of the N-1 frame image and the pixels of the N frame image is extracted, and the aligned convolution kernel of each pixel is obtained.
- the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame is extracted, and the deblurring convolution kernel of each pixel is obtained.
- This embodiment explains in detail how to obtain the deblurring convolution kernel and the aligned convolution kernel.
- the following embodiments will elaborate on how to remove the blur in the Nth frame image through the deblurring convolution kernel and the aligned convolution kernel, and obtain the deblurred image of the Nth frame.
- FIG. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the present application. As shown in FIG. 10, the method includes:
- the above-mentioned feature image of the Nth frame image may be obtained by performing feature extraction processing on the Nth frame image, where the feature extraction processing may be convolution processing or pooling processing, which is not limited in the embodiment of the application.
- the feature extraction process of the Nth frame image can be performed by the encoding module shown in FIG. 7 to obtain the feature image of the Nth frame image.
- the specific composition of FIG. 7 and the processing process of the Nth frame image in FIG. 7 can be referred to 502, which will not be repeated here.
- the feature image of the Nth frame image includes the information of the Nth frame image (in this application, the information here can be understood as the information of the blurred area in the Nth frame image), so performing the subsequent processing on the feature image of the Nth frame image can reduce the amount of data processing and increase the processing speed.
- each pixel in the image to be processed is subjected to convolution processing to obtain the deblurring convolution kernel of each pixel respectively. Performing convolution processing on the pixel points of the feature image of the Nth frame image through the deblurring convolution kernel refers to: using the deblurring convolution kernel of each pixel in the deblurring convolution kernel obtained by the foregoing embodiment as the convolution kernel of the corresponding pixel in the feature image of the Nth frame image, and performing convolution processing on each pixel of the feature image of the Nth frame image.
- the deblurring convolution kernel of each pixel in the deblurring convolution kernel contains the information of each pixel in the seventh feature image, and this information is one-dimensional in the deblurring convolution kernel.
- the pixel points of the feature image of the Nth frame image are three-dimensional. Therefore, to use the information of each pixel point in the seventh feature image as the convolution kernel of each pixel point in the feature image of the Nth frame image, the dimension of the deblurring convolution kernel needs to be adjusted. Based on the above considerations, the implementation process of 901 includes the following steps:
- through the module shown in FIG. 11 (the adaptive convolution processing module), the deblurring convolution kernel of each pixel in the deblurring convolution kernel obtained in the foregoing embodiment can be used as the convolution kernel of the corresponding pixel in the feature image of the Nth frame image, and convolution processing is performed on that pixel.
- the reshape in Figure 11 refers to adjusting the dimension of the deblurring convolution kernel of each pixel in the deblurring convolution kernel, that is, the dimension of the deblurring kernel of each pixel is adjusted from 1*1*ck^2 to c*k*k.
- continuing Example 6 (Example 7): the size of the deblurring convolution kernel of each pixel is 1*1*128k^2; after reshaping the deblurring convolution kernel of each pixel, the size of the resulting convolution kernel is 128*k*k.
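The reshape-then-convolve step of the adaptive convolution module can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes k is odd, zero padding at the borders, and that each of the c channels is filtered by its own k*k slice of the per-pixel kernel (so the output keeps c channels). The function name `adaptive_conv` is hypothetical.

```python
import numpy as np


def adaptive_conv(features, flat_kernels, k):
    """Apply a different k*k kernel at every pixel and channel.

    features:     (H, W, C) feature image
    flat_kernels: (H, W, C*k*k) predicted per-pixel kernels
    """
    h, w, c = features.shape
    kernels = flat_kernels.reshape(h, w, c, k, k)  # the "reshape" step: ck^2 -> c*k*k
    r = k // 2
    padded = np.pad(features, ((r, r), (r, r), (0, 0)))  # zero padding (assumption)
    out = np.zeros_like(features)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + k, x:x + k, :]       # (k, k, C) neighbourhood
            window = np.transpose(window, (2, 0, 1))   # (C, k, k)
            # Per-channel multiply-and-sum with this pixel's own kernel.
            out[y, x] = np.sum(window * kernels[y, x], axis=(1, 2))
    return out


k = 5
features = np.random.rand(25, 25, 128)
# Sanity check with identity kernels: weight 1 at the kernel centre, 0 elsewhere.
identity = np.zeros((25, 25, 128, k, k))
identity[..., k // 2, k // 2] = 1.0
out = adaptive_conv(features, identity.reshape(25, 25, 128 * k * k), k)
print(np.allclose(out, features))  # True
```

With identity kernels the output reproduces the input, which confirms that each pixel is filtered by its own kernel; in the described method the kernels would instead come from the kernel generation modules.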
- performing convolution processing on the pixel points of the feature image of the deblurred image of the N-1th frame through the aligned convolution kernel to obtain a second feature image includes: adjusting the dimension of the aligned convolution kernel so that the number of channels of the aligned convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixel points of the feature image of the deblurred image of the N-1th frame through the dimension-adjusted aligned convolution kernel to obtain the second feature image.
- in the same way that the module shown in FIG. 11 uses the deblurring convolution kernel obtained in the previous embodiment as the convolution kernel of each pixel of the feature image of the Nth frame image to deblur that feature image, the reshape in the module shown in FIG. 11 adjusts the dimension of the alignment convolution kernel of each pixel in the alignment convolution kernel obtained in the foregoing embodiment to 128*k*k, and the dimension-adjusted alignment convolution kernel performs convolution processing on the corresponding pixels in the feature image of the deblurred image of the N-1th frame.
- the feature image of the deblurred image of the N-1th frame contains a large number of clear (that is, not blurred) pixels, but there is a displacement between the pixels in the feature image of the deblurred image of the N-1th frame and the pixels of the current frame. Therefore, through the processing of 902, the positions of the pixel points of the feature image of the deblurred image of the N-1th frame are adjusted so that the adjusted positions are closer to the positions at the time of the Nth frame (the position here refers to the position of the subject in the Nth frame image). In this way, the subsequent processing can use the information of the second feature image to remove the blur in the Nth frame image.
- 901 can be executed first, then 902, or 902 can be executed first, then 901, or 901 and 902 can be executed simultaneously.
- 901 may be executed first, and then 505-507, or 505-507 may be executed first, and then 901 or 902 may be executed.
- the embodiments of this application do not limit this.
- by fusing the first feature image with the second feature image, the deblurring effect can be improved by using the information of the feature image of the (aligned) N-1th frame image, on the basis of the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image and the deblurring information between the pixels of the N-1th frame image and the pixels of the deblurred image of the N-1th frame.
- the first feature image and the second feature image are superimposed on the channel dimension to obtain the third feature image.
- the decoding processing can be any one of deconvolution processing, bilinear interpolation processing, and depooling processing.
- Figure 12 shows the decoding module, which includes, in order: a deconvolution layer with 64 channels (the size of the convolution kernel is 3*3); two residual blocks with 64 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3); a deconvolution layer with 32 channels (the size of the convolution kernel is 3*3); and two residual blocks with 32 channels (each residual block contains two convolutional layers, and the size of the convolution kernel of each convolutional layer is 3*3).
- decoding the third characteristic image through the decoding module shown in FIG. 12 to obtain the deblurred image of the Nth frame includes the following steps: performing deconvolution processing on the third characteristic image to obtain the ninth characteristic image; and performing convolution processing on the ninth characteristic image to obtain the Nth frame decoded image.
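As a rough check of how two deconvolution layers could restore the spatial size, the sketch below applies the standard output-size formula for a transposed convolution. The stride of 2, padding of 1, and output padding of 1 are assumptions chosen for illustration; the text only gives the 3*3 kernel size.

```python
def deconv_output_size(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial output size of a transposed convolution (dilation of 1 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


after_first = deconv_output_size(25)            # first deconvolution layer
after_second = deconv_output_size(after_first)  # second deconvolution layer
print(after_first, after_second)  # 50 100
```

Under these assumed settings, a 25*25 feature image grows to 50*50 and then to 100*100, mirroring the reduction performed by the encoding module.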
- the pixel value of the first pixel of the Nth frame image can be added to the pixel value of the second pixel of the Nth frame decoded image to obtain the deblurred image of the Nth frame, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image. This makes the deblurred image of the Nth frame more natural.
- the feature image of the Nth frame image can be deblurred by the deblurring convolution kernel obtained in the foregoing embodiment, and the feature image of the N-1th frame image can be aligned by the alignment convolution kernel obtained in the foregoing embodiment.
- fusing the first feature image obtained by the deblurring processing with the second feature image obtained by the alignment processing, and decoding the resulting third feature image, can improve the deblurring effect on the Nth frame image and make the deblurred image of the Nth frame more natural.
- the target of both the deblurring processing and the alignment processing in this embodiment is the feature image, therefore, the data processing amount is small, the processing speed is fast, and real-time deblurring of the video image can be realized.
- This application also provides a video image deblurring neural network for implementing the method in the foregoing embodiment.
- FIG. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of the present application.
- The video image deblurring neural network includes: an encoding module, an alignment convolution kernel and deblurring convolution kernel generation module, and a decoding module.
- the encoding module in FIG. 13 is the same as the encoding module shown in FIG. 7, and the decoding module in FIG. 13 is the same as the decoding module shown in FIG. 12, which will not be repeated here.
- The alignment convolution kernel and deblurring convolution kernel generation module shown in FIG. 14 includes: a decoding module, an alignment convolution kernel generation module, and a deblurring convolution kernel generation module. Between the alignment convolution kernel generation module and the deblurring convolution kernel generation module there is a convolution layer with 128 channels and a convolution kernel size of 1*1, and this convolution layer is connected to a concatenate layer.
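A 1*1 convolution followed by a concatenate layer can be illustrated as below. The sketch assumes a channel-first `(C, H, W)` layout; the 75-channel kernel tensor, the 128-channel encoded feature, and the random weights are hypothetical placeholders chosen only to make the shapes concrete, and are not taken from the source.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1*1 convolution on a (C_in, H, W) map: a per-pixel linear map over channels."""
    # weight has shape (C_out, C_in); bias has shape (C_out,).
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
H, W = 8, 8
align_kernel = rng.standard_normal((75, H, W))   # hypothetical kernel tensor
encoded = rng.standard_normal((128, H, W))       # hypothetical encoded feature image

weight = rng.standard_normal((128, 75)) * 0.01   # untrained placeholder weights
bias = np.zeros(128)

compressed = conv1x1(align_kernel, weight, bias)        # shape (128, H, W)
fused = np.concatenate([encoded, compressed], axis=0)   # concatenate along channels
```

The 1*1 convolution changes only the channel count, so the concatenate layer can join the two maps without any spatial resampling.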
- the adaptive convolutional layer shown in FIG. 14 is the module shown in FIG. 11.
- Through the adaptive convolution layer, the alignment convolution kernel and the deblurring convolution kernel generated by the module shown in FIG. 14 are used to perform convolution processing on the pixels of the feature image of the N-1th frame image and the pixels of the feature image of the Nth frame image respectively (that is, alignment processing and deblurring processing).
- The Nth frame decoded image is then obtained, and the pixel value of a first pixel of the Nth frame image is added to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, where the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
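An adaptive convolution layer differs from an ordinary convolution in that every pixel is filtered with its own kernel. The following NumPy sketch shows one way this could work, assuming a `(C, H, W)` feature image, a kernel tensor of shape `(C, k, k, H, W)`, zero padding at the borders, and no kernel flip (cross-correlation, as is usual in neural networks). None of these layout choices are stated in the source.

```python
import numpy as np

def adaptive_conv(feature: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Filter each pixel of `feature` with its own k x k kernel.

    feature: (C, H, W) feature image.
    kernels: (C, k, k, H, W), one k x k kernel per channel and per pixel.
    """
    C, H, W = feature.shape
    k = kernels.shape[1]
    pad = k // 2
    padded = np.pad(feature, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((C, H, W), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Neighbour at offset (dy - pad, dx - pad), weighted by the tap
            # that belongs to each individual output pixel.
            out += kernels[:, dy, dx] * padded[:, dy:dy + H, dx:dx + W]
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 5, 6))

identity = np.zeros((2, 3, 3, 5, 6))
identity[:, 1, 1] = 1.0              # centre tap only: output equals input
same = adaptive_conv(feat, identity)

shift = np.zeros((2, 3, 3, 5, 6))
shift[:, 1, 2] = 1.0                 # right-neighbour tap: content moves left by one pixel
moved = adaptive_conv(feat, shift)
```

A kernel that selects a neighbouring pixel acts as a local shift, which is how a per-pixel kernel can align content, while a kernel that weights a neighbourhood can sharpen or smooth it.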
- the Nth frame image and the deblurred image of the Nth frame are used as the input of the video image deblurring neural network to process the N+1th frame image.
- the video image deblurring neural network requires 4 inputs to deblur each frame of the video.
- Taking the Nth frame image as the frame to be deblurred, the 4 inputs are: the N-1th frame image, the N-1th frame deblurred image, the Nth frame image, and the feature image of the N-1th frame deblurred image (that is, the fused feature image obtained when the N-1th frame was processed).
- the video image deblurring neural network provided by this embodiment can perform deblurring processing on the video image, and the entire processing process only needs 4 inputs to directly obtain the deblurred image, and the processing speed is fast.
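The frame-by-frame reuse of outputs as inputs can be sketched as a simple loop. Everything here is hypothetical scaffolding: `deblur_video`, `StepFn` and `toy_step` are illustrative names, the step function stands in for the whole network, and bootstrapping the first frame with itself is an assumption the source does not discuss.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical signature of one network step:
# (prev_blurry, prev_deblurred, cur_blurry, prev_fused_feature)
#   -> (cur_deblurred, cur_fused_feature)
StepFn = Callable[[Any, Any, Any, Any], Tuple[Any, Any]]

def deblur_video(frames: Sequence[Any], step: StepFn, initial_feature: Any) -> List[Any]:
    """Run the per-frame step over a video, feeding outputs back as inputs."""
    outputs: List[Any] = []
    if not frames:
        return outputs
    # Assumption: the first frame stands in for the missing "previous frame"
    # and "previous deblurred image" at the start of the video.
    prev_blurry = prev_deblurred = frames[0]
    prev_feature = initial_feature
    for cur in frames:
        deblurred, fused = step(prev_blurry, prev_deblurred, cur, prev_feature)
        outputs.append(deblurred)
        # The current frame and its results become the "previous" inputs.
        prev_blurry, prev_deblurred, prev_feature = cur, deblurred, fused
    return outputs

def toy_step(prev_blurry, prev_deblurred, cur, prev_feature):
    # Stand-in for the network: average the current frame with the previous
    # output, and count how many steps have run.
    return 0.5 * cur + 0.5 * prev_deblurred, prev_feature + 1

restored = deblur_video([1.0, 3.0, 5.0], toy_step, initial_feature=0)
```

Only the results of the immediately preceding frame are carried forward, so memory use stays constant however long the video is.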
- The deblurring convolution kernel generation module and the alignment convolution kernel generation module generate a deblurring convolution kernel and an alignment convolution kernel for each pixel in the image, which improves the deblurring effect of the video image deblurring neural network on non-uniformly blurred images in different frames of the video.
- Based on the video image deblurring neural network provided in the foregoing embodiment, an embodiment of this application provides a training method for the video image deblurring neural network.
- A mean square error loss function is used to determine the error between the Nth frame deblurred image output by the video image deblurring neural network and the clear image of the Nth frame image (that is, the supervision data of the Nth frame image).
- The specific expression of the mean square error loss function is as follows:
- C, H and W are respectively the number of channels, height and width of the Nth frame image (assuming that the video image deblurring neural network deblurs the Nth frame image), R is the Nth frame deblurred image output by the video image deblurring neural network, and S is the supervision data of the Nth frame image.
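The expression for formula (1) is not legible in this text. Read against the definitions of C, H, W, R and S above, a standard mean square error of the following form would be consistent with them; it is offered as a reconstruction under that assumption, not as the formula as filed:

```latex
\mathcal{L}_{\mathrm{mse}} = \frac{1}{C H W}\,\lVert R - S \rVert_2^2 \tag{1}
```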
- A perceptual loss function is used to determine the Euclidean distance between the features that the VGG-19 network extracts from the Nth frame deblurred image and the features that it extracts from the supervision data of the Nth frame image.
- The specific expression of the perceptual loss function is as follows:
- Φ_j(·) is the feature image output by the jth layer of the pre-trained VGG-19 network; C_j, H_j and W_j are respectively the number of channels, height and width of that feature image; R is the Nth frame deblurred image output by the video image deblurring neural network; and S is the ground truth of the Nth frame image.
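The expression for formula (2) is likewise missing here. Writing the jth-layer VGG-19 feature extractor as Φ_j and the channel count, height and width of its output as C_j, H_j and W_j (symbol names assumed for illustration), a squared Euclidean distance between features, normalised by the feature size, would match the description; this is a reconstruction, not the formula as filed:

```latex
\mathcal{L}_{\mathrm{perceptual}} = \frac{1}{C_j H_j W_j}\,\lVert \Phi_j(R) - \Phi_j(S) \rVert_2^2 \tag{2}
```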
- This embodiment obtains the loss function of the video image deblurring neural network by taking a weighted sum of formula (1) and formula (2).
- The specific expression is as follows:
- λ is the weight; optionally, λ is a positive number.
- Optionally, the value of j may be 15, and the value of λ may be 0.01.
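The combined expression is also absent from this text. With a single weight, here written λ, applied to the perceptual term, a weighted sum of the mean square error loss and the perceptual loss would read as follows; the subscript names are assumed for illustration and the form is a reconstruction from the surrounding description:

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{mse}} + \lambda\,\mathcal{L}_{\mathrm{perceptual}} \tag{3}
```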
- the training of the video image deblurring neural network of this embodiment can be completed.
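To make the weighted loss concrete, the sketch below computes a mean square error term, a feature-space term, and their weighted sum in NumPy. It assumes the normalised squared-error forms suggested by the symbol definitions above. The function `toy_features` is a hypothetical stand-in (2*2 average pooling) for the jth layer of a pre-trained VGG-19, which is not reproduced here, and all function names are illustrative.

```python
import numpy as np

def mse_loss(restored: np.ndarray, target: np.ndarray) -> float:
    """Squared error between two (C, H, W) images, divided by C*H*W."""
    c, h, w = restored.shape
    return float(np.sum((restored - target) ** 2) / (c * h * w))

def perceptual_loss(restored, target, features) -> float:
    """Squared error between feature images, divided by the feature size."""
    fr, fs = features(restored), features(target)
    cj, hj, wj = fr.shape
    return float(np.sum((fr - fs) ** 2) / (cj * hj * wj))

def total_loss(restored, target, features, lam: float = 0.01) -> float:
    """Weighted sum of the two terms; lam = 0.01 follows the optional value in the text."""
    return mse_loss(restored, target) + lam * perceptual_loss(restored, target, features)

def toy_features(img: np.ndarray) -> np.ndarray:
    # Hypothetical feature extractor: 2*2 average pooling instead of VGG-19.
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

restored = np.zeros((3, 4, 4))
target = np.ones((3, 4, 4))
loss = total_loss(restored, target, toy_features)
```

In an actual training setup, `toy_features` would be replaced by a frozen pre-trained network and the loss would be minimised by gradient descent; neither is shown here.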
- the embodiments of the present application provide several possible implementation scenarios.
- Applying the embodiments of the present application to a drone can remove the blur of the video image captured by the drone in real time, and provide users with clearer videos.
- The UAV's flight control system controls the UAV's attitude and movement based on the deblurred video images, which can improve control accuracy and provide strong support for the UAV in completing various aerial operations.
- the embodiments of this application can also be applied to mobile terminals (such as mobile phones, sports cameras, etc.).
- When the user uses the terminal to capture video of an object that moves vigorously, the terminal can, by running the method provided in the embodiments of this application, process the video captured by the user in real time, reducing the blur caused by the intense movement of the subject and improving the user experience.
- Here, the intense movement of the subject refers to relative movement between the terminal and the subject.
- the video image processing method provided by the embodiments of the present application has fast processing speed and good real-time performance.
- The neural network provided by the embodiments of the present application has fewer weights and requires fewer processing resources to run; therefore, it can be applied to mobile terminals.
- FIG. 15 is a schematic structural diagram of a video image processing apparatus provided by an embodiment of the application.
- the apparatus 1 includes: an acquisition unit 11, a first processing unit 12, and a second processing unit 13, wherein:
- the acquiring unit 11 is configured to acquire multiple frames of continuous video images, wherein the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer;
- the first processing unit 12 is configured to obtain a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image;
- the second processing unit 13 is configured to perform deblurring processing on the Nth frame of image through the deblurring convolution kernel to obtain a deblurred image of the Nth frame.
- The first processing unit 12 includes: a first convolution processing subunit 121, configured to perform convolution processing on pixels of the image to be processed to obtain a deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
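Superimposing images in the channel dimension amounts to concatenating them along the channel axis. The sketch below assumes three RGB images stored channel-first as `(3, H, W)` arrays, so the image to be processed has 9 channels; the function name `stack_inputs` and the layout are illustrative assumptions.

```python
import numpy as np

def stack_inputs(frame_n: np.ndarray,
                 frame_prev: np.ndarray,
                 deblurred_prev: np.ndarray) -> np.ndarray:
    """Concatenate three (C, H, W) images along the channel axis."""
    images = (frame_n, frame_prev, deblurred_prev)
    if len({img.shape for img in images}) != 1:
        raise ValueError("all three images must have the same shape")
    return np.concatenate(images, axis=0)

h, w = 4, 6
frame_n = np.full((3, h, w), 0.2)          # toy Nth frame image
frame_prev = np.full((3, h, w), 0.5)       # toy N-1th frame image
deblurred_prev = np.full((3, h, w), 0.8)   # toy N-1th frame deblurred image
to_process = stack_inputs(frame_n, frame_prev, deblurred_prev)
```

Spatial size is unchanged; only the channel count grows, so the subsequent convolution sees all three images at every pixel position.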
- The first convolution processing subunit 121 is configured to: perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining the alignment convolution kernel, where the motion information includes speed and direction; and encode the alignment convolution kernel to obtain the deblurring convolution kernel.
- The second processing unit 13 includes: a second convolution processing subunit 131, configured to perform convolution processing on the pixels of the feature image of the Nth frame image with the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit 132, configured to decode the first feature image to obtain the Nth frame deblurred image.
- The second convolution processing subunit 131 is configured to: adjust the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image with the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
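One plausible reading of the dimension adjustment is a reshape: if the kernel generator emits `channels * k * k` values per pixel, they can be regrouped so that each feature channel gets its own k*k kernel at each pixel. The layout below (channel-major ordering, shapes `(channels*k*k, H, W)` to `(channels, k, k, H, W)`) is an assumption made for illustration and is not specified in the source.

```python
import numpy as np

def reshape_kernels(raw: np.ndarray, channels: int, k: int) -> np.ndarray:
    """Regroup a (channels*k*k, H, W) tensor into (channels, k, k, H, W)."""
    ck2, h, w = raw.shape
    if ck2 != channels * k * k:
        raise ValueError("first axis must equal channels * k * k")
    # Assumption: the k*k taps of each channel are stored contiguously.
    return raw.reshape(channels, k, k, h, w)

raw = np.arange(2 * 3 * 3 * 4 * 5, dtype=np.float64).reshape(18, 4, 5)
kernels = reshape_kernels(raw, channels=2, k=3)
```

After the reshape the leading axis matches the channel count of the feature image, so each channel can be filtered with its own per-pixel kernel.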
- The first convolution processing subunit 121 is further configured to: after the alignment convolution kernel is obtained by performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the alignment convolution kernel to obtain a second feature image.
- The first convolution processing subunit 121 is further configured to: adjust the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the dimension-adjusted alignment convolution kernel to obtain the second feature image.
- The second processing unit 13 is configured to: perform fusion processing on the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the Nth frame deblurred image.
- The first convolution processing subunit 121 is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encode the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
- The first convolution processing subunit 121 is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; perform fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
- The first convolution processing subunit 121 is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
- The second processing unit 13 is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain the Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
- the functions or units included in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- the embodiment of the present application also provides an electronic device, including: a processor, an input device, an output device, and a memory.
- The processor, the input device, the output device, and the memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor is caused to execute the method described in the embodiment of the present application.
- the embodiment of the present application also provides a processor configured to execute the method described in the embodiment of the present application.
- FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
- the electronic device 2 includes a processor 21, a memory 22, and a camera 23.
- the processor 21, the memory 22, and the camera 23 are coupled through a connector, and the connector includes various interfaces, transmission lines or buses, etc., which are not limited in the embodiment of the present application.
- coupling refers to mutual connection in a specific manner, including direct connection or indirect connection through other devices, for example, connection through various interfaces, transmission lines, buses, etc.
- the processor 21 may be one or more graphics processing units (Graphics Processing Unit, GPU).
- the processor 21 may be a single-core GPU or a multi-core GPU.
- the processor 21 may be a processor group composed of multiple GPUs, and the multiple processors are coupled to each other through one or more buses.
- the processor may also be other types of processors, etc., which is not limited in the embodiment of the present application.
- the memory 22 may be used to store computer program instructions and various computer program codes including program codes used to execute the solutions of the present application.
- The memory includes but is not limited to Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), or Compact Disc Read-Only Memory (CD-ROM), and is used to store related instructions and data.
- the camera 23 can be used to obtain related videos or images and so on.
- the memory can be used not only to store related instructions, but also to store related images and videos.
- The memory can be used to store videos acquired by the camera 23, or to store the deblurred images generated by the processor 21, and so on; the embodiment of the present application does not limit the specific videos or images stored in the memory.
- FIG. 16 only shows a simplified design of the video image processing device.
- the video image processing device may also include other necessary components, including but not limited to any number of input/output devices, processors, controllers, memories, etc., and all devices that can implement the embodiments of this application are Within the protection scope of this application.
- the embodiments of the present application also provide a computer-readable storage medium in which a computer program is stored.
- The computer program includes program instructions which, when executed by a processor of an electronic device, cause the processor to execute the method described in the embodiment of the present application.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
- The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (for example, coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (for example, infrared, wireless, microwave, etc.) means.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)), or a semiconductor medium (for example, a Solid State Disk (SSD)), etc.
- All or part of the processes of the foregoing method embodiments can be completed by a computer program instructing relevant hardware.
- The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
- the aforementioned storage media include: read-only memory (Read-Only Memory, ROM) or random access memory (Random Access Memory, RAM), magnetic disks or optical disks, and various media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (27)
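- A video image processing method, comprising: acquiring multiple frames of continuous video images, wherein the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer; obtaining a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
- The method according to claim 1, wherein obtaining the deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image comprises: performing convolution processing on pixels of an image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
- The method according to claim 2, wherein performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel comprises: performing convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, to obtain an alignment convolution kernel, wherein the motion information includes speed and direction; and encoding the alignment convolution kernel to obtain the deblurring convolution kernel.
- The method according to claim 2 or 3, wherein performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image comprises: performing convolution processing on pixels of a feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and decoding the first feature image to obtain the Nth frame deblurred image.
- The method according to claim 4, wherein performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain the first feature image comprises: adjusting the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and performing convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
- The method according to claim 3, wherein after performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, to obtain the alignment convolution kernel, the method further comprises: performing convolution processing on pixels of a feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
- The method according to claim 6, wherein performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain the second feature image comprises: adjusting the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
- The method according to claim 7, wherein decoding the first feature image to obtain the Nth frame deblurred image comprises: performing fusion processing on the first feature image and the second feature image to obtain a third feature image; and decoding the third feature image to obtain the Nth frame deblurred image.
- The method according to claim 3, wherein performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, to obtain the alignment convolution kernel, comprises: superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encoding the image to be processed to obtain a fourth feature image; performing convolution processing on the fourth feature image to obtain a fifth feature image; and adjusting the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
- The method according to claim 9, wherein encoding the alignment convolution kernel to obtain the deblurring convolution kernel comprises: adjusting the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; performing fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and performing convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, to obtain the deblurring convolution kernel.
- The method according to claim 10, wherein performing convolution processing on the seventh feature image to extract the deblurring information of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, to obtain the deblurring convolution kernel, comprises: performing convolution processing on the seventh feature image to obtain an eighth feature image; and adjusting the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
- The method according to claim 8, wherein decoding the third feature image to obtain the Nth frame deblurred image comprises: performing deconvolution processing on the third feature image to obtain a ninth feature image; performing convolution processing on the ninth feature image to obtain an Nth frame decoded image; and adding the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.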
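- A video image processing apparatus, comprising: an acquiring unit, configured to acquire multiple frames of continuous video images, wherein the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer; a first processing unit, configured to obtain a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and a second processing unit, configured to perform deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
- The apparatus according to claim 13, wherein the first processing unit comprises: a first convolution processing subunit, configured to perform convolution processing on pixels of an image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
- The apparatus according to claim 14, wherein the first convolution processing subunit is configured to: perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, to obtain an alignment convolution kernel, wherein the motion information includes speed and direction; and encode the alignment convolution kernel to obtain the deblurring convolution kernel.
- The apparatus according to claim 14 or 15, wherein the second processing unit comprises: a second convolution processing subunit, configured to perform convolution processing on pixels of a feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit, configured to decode the first feature image to obtain the Nth frame deblurred image.
- The apparatus according to claim 16, wherein the second convolution processing subunit is configured to: adjust the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
- The apparatus according to claim 15, wherein the first convolution processing subunit is further configured to: after performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, to obtain the alignment convolution kernel, perform convolution processing on pixels of a feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
- The apparatus according to claim 18, wherein the first convolution processing subunit is further configured to: adjust the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
- The apparatus according to claim 19, wherein the second processing unit is configured to: perform fusion processing on the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the Nth frame deblurred image.
- The apparatus according to claim 15, wherein the first convolution processing subunit is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encode the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
- The apparatus according to claim 21, wherein the first convolution processing subunit is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; perform fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, to obtain the deblurring convolution kernel.
- The apparatus according to claim 22, wherein the first convolution processing subunit is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
- The method according to claim 20, wherein the second processing unit is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain an Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.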
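- A processor, configured to execute the method according to any one of claims 1 to 12.
- An electronic device, comprising: a processor, an input device, an output device, and a memory, wherein the processor, the input device, the output device, and the memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor is caused to execute the method according to any one of claims 1 to 12.
- A computer-readable storage medium, in which a computer program is stored, wherein the computer program includes program instructions which, when executed by a processor of an electronic device, cause the processor to execute the method according to any one of claims 1 to 12.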
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021520271A JP7123256B2 (ja) | 2019-04-22 | 2019-10-29 | ビデオ画像処理方法及び装置 |
KR1020217009399A KR20210048544A (ko) | 2019-04-22 | 2019-10-29 | 비디오 이미지 처리 방법 및 장치 |
SG11202108197SA SG11202108197SA (en) | 2019-04-22 | 2019-10-29 | Video image processing method and apparatus |
US17/384,910 US20210352212A1 (en) | 2019-04-22 | 2021-07-26 | Video image processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910325282.5 | 2019-04-22 | ||
CN201910325282.5A CN110062164B (zh) | 2019-04-22 | 2019-04-22 | 视频图像处理方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/384,910 Continuation US20210352212A1 (en) | 2019-04-22 | 2021-07-26 | Video image processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020215644A1 true WO2020215644A1 (zh) | 2020-10-29 |
Family
ID=67319990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/114139 WO2020215644A1 (zh) | 2019-04-22 | 2019-10-29 | 视频图像处理方法及装置 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20210352212A1 (zh) |
JP (1) | JP7123256B2 (zh) |
KR (1) | KR20210048544A (zh) |
CN (3) | CN113992848A (zh) |
SG (1) | SG11202108197SA (zh) |
TW (1) | TWI759668B (zh) |
WO (1) | WO2020215644A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2023523502A (ja) * | 2021-04-07 | 2023-06-06 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | モデルトレーニング方法、歩行者再識別方法、装置および電子機器 |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113992848A (zh) * | 2019-04-22 | 2022-01-28 | 深圳市商汤科技有限公司 | 视频图像处理方法及装置 |
CN112465698A (zh) | 2019-09-06 | 2021-03-09 | 华为技术有限公司 | 一种图像处理方法和装置 |
CN111241985B (zh) * | 2020-01-08 | 2022-09-09 | 腾讯科技(深圳)有限公司 | 一种视频内容识别方法、装置、存储介质、以及电子设备 |
CN112200732B (zh) * | 2020-04-30 | 2022-10-21 | 南京理工大学 | 一种清晰特征融合的视频去模糊方法 |
CN113409209B (zh) * | 2021-06-17 | 2024-06-21 | Oppo广东移动通信有限公司 | 图像去模糊方法、装置、电子设备与存储介质 |
US20230034727A1 (en) * | 2021-07-29 | 2023-02-02 | Rakuten Group, Inc. | Blur-robust image segmentation |
CN116362976A (zh) * | 2021-12-22 | 2023-06-30 | 北京字跳网络技术有限公司 | 一种模糊视频修复方法及装置 |
CN114708166A (zh) * | 2022-04-08 | 2022-07-05 | Oppo广东移动通信有限公司 | 图像处理方法、装置、存储介质以及终端 |
CN116132798B (zh) * | 2023-02-02 | 2023-06-30 | 深圳市泰迅数码有限公司 | 一种智能摄像头的自动跟拍方法 |
CN116128769B (zh) * | 2023-04-18 | 2023-06-23 | 聊城市金邦机械设备有限公司 | 摇摆运动机构的轨迹视觉记录系统 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100201865A1 (en) * | 2009-02-09 | 2010-08-12 | Samsung Electronics Co., Ltd. | Imaging method for use with variable coded aperture device and imaging apparatus using the imaging method |
US20120033096A1 (en) * | 2010-08-06 | 2012-02-09 | Honeywell International, Inc. | Motion blur modeling for image formation |
CN102576454A (zh) * | 2009-10-16 | 2012-07-11 | 伊斯曼柯达公司 | 利用空间图像先验的图像去模糊法 |
CN104103050A (zh) * | 2014-08-07 | 2014-10-15 | 重庆大学 | 一种基于局部策略的真实视频复原方法 |
CN108109121A (zh) * | 2017-12-18 | 2018-06-01 | 深圳市唯特视科技有限公司 | 一种基于卷积神经网络的人脸模糊快速消除方法 |
CN108875900A (zh) * | 2017-11-02 | 2018-11-23 | 北京旷视科技有限公司 | 视频图像处理方法和装置、神经网络训练方法、存储介质 |
CN109345449A (zh) * | 2018-07-17 | 2019-02-15 | 西安交通大学 | 一种基于融合网络的图像超分辨率及去非均匀模糊方法 |
CN110062164A (zh) * | 2019-04-22 | 2019-07-26 | 深圳市商汤科技有限公司 | 视频图像处理方法及装置 |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8654201B2 (en) * | 2005-02-23 | 2014-02-18 | Hewlett-Packard Development Company, L.P. | Method for deblurring an image |
EP2153407A1 (en) * | 2007-05-02 | 2010-02-17 | Agency for Science, Technology and Research | Motion compensated image averaging |
KR101574733B1 (ko) * | 2008-11-19 | 2015-12-04 | 삼성전자 주식회사 | 고화질 컬러 영상을 획득하기 위한 영상 처리 장치 및 방법 |
WO2010093040A1 (ja) | 2009-02-13 | 2010-08-19 | 国立大学法人静岡大学 | モーションブラー制御装置、方法、及びプログラム |
US8379120B2 (en) * | 2009-11-04 | 2013-02-19 | Eastman Kodak Company | Image deblurring using a combined differential image |
JP5204165B2 (ja) * | 2010-08-05 | 2013-06-05 | パナソニック株式会社 | 画像復元装置および画像復元方法 |
CN102073993B (zh) * | 2010-12-29 | 2012-08-22 | 清华大学 | 一种基于摄像机自标定的抖动视频去模糊方法和装置 |
CN102158730B (zh) * | 2011-05-26 | 2014-04-02 | 威盛电子股份有限公司 | 影像处理系统及方法 |
KR101844332B1 (ko) * | 2012-03-13 | 2018-04-03 | 삼성전자주식회사 | 블러 영상 및 노이즈 영상으로 구성된 멀티 프레임을 이용하여 비균일 모션 블러를 제거하는 방법 및 장치 |
CN103049891B (zh) * | 2013-01-25 | 2015-04-08 | 西安电子科技大学 | 基于自适应窗口选择的视频图像去模糊方法 |
US9392173B2 (en) * | 2013-12-13 | 2016-07-12 | Adobe Systems Incorporated | Image deblurring based on light streaks |
CN104932868B (zh) * | 2014-03-17 | 2019-01-15 | 联想(北京)有限公司 | 一种数据处理方法及电子设备 |
CN104135598B (zh) * | 2014-07-09 | 2017-05-17 | 清华大学深圳研究生院 | 一种视频图像稳定方法及装置 |
CN106033595B (zh) * | 2015-03-13 | 2021-06-22 | 中国科学院西安光学精密机械研究所 | 一种基于局部约束的图像盲去模糊方法 |
CN105405099A (zh) * | 2015-10-30 | 2016-03-16 | 北京理工大学 | 一种基于点扩散函数的水下图像超分辨率重建方法 |
CN105957036B (zh) * | 2016-05-06 | 2018-07-10 | 电子科技大学 | 一种加强字符先验的视频去运动模糊方法 |
CN106251297A (zh) * | 2016-07-19 | 2016-12-21 | 四川大学 | 一种改进的基于多幅图像模糊核估计的盲超分辨率重建算法 |
CN106791273B (zh) * | 2016-12-07 | 2019-08-20 | 重庆大学 | 一种结合帧间信息的视频盲复原方法 |
CN107273894A (zh) * | 2017-06-15 | 2017-10-20 | 珠海习悦信息技术有限公司 | 车牌的识别方法、装置、存储介质及处理器 |
CN108875486A (zh) * | 2017-09-28 | 2018-11-23 | 北京旷视科技有限公司 | 目标对象识别方法、装置、系统和计算机可读介质 |
CN107944416A (zh) * | 2017-12-06 | 2018-04-20 | 成都睿码科技有限责任公司 | 一种通过视频进行真人验证的方法 |
CN108256629B (zh) * | 2018-01-17 | 2020-10-23 | 厦门大学 | 基于卷积网络和自编码的eeg信号无监督特征学习方法 |
CN108629743B (zh) * | 2018-04-04 | 2022-03-25 | 腾讯科技(深圳)有限公司 | 图像的处理方法、装置、存储介质和电子装置 |
CN108846861B (zh) * | 2018-06-12 | 2020-12-29 | 广州视源电子科技股份有限公司 | 图像单应矩阵计算方法、装置、移动终端及存储介质 |
CN108830221A (zh) * | 2018-06-15 | 2018-11-16 | 北京市商汤科技开发有限公司 | 图像的目标对象分割及训练方法和装置、设备、介质、产品 |
CN109410130B (zh) * | 2018-09-28 | 2020-12-04 | 华为技术有限公司 | 图像处理方法和图像处理装置 |
CN109472837A (zh) * | 2018-10-24 | 2019-03-15 | 西安电子科技大学 | 基于条件生成对抗网络的光电图像转换方法 |
CN109360171B (zh) * | 2018-10-26 | 2021-08-06 | 北京理工大学 | 一种基于神经网络的视频图像实时去模糊方法 |
-
2019
- 2019-04-22 CN CN202111217908.4A patent/CN113992848A/zh not_active Withdrawn
- 2019-04-22 CN CN201910325282.5A patent/CN110062164B/zh active Active
- 2019-04-22 CN CN202111217907.XA patent/CN113992847A/zh not_active Withdrawn
- 2019-10-29 JP JP2021520271A patent/JP7123256B2/ja active Active
- 2019-10-29 WO PCT/CN2019/114139 patent/WO2020215644A1/zh active Application Filing
- 2019-10-29 KR KR1020217009399A patent/KR20210048544A/ko active IP Right Grant
- 2019-10-29 SG SG11202108197SA patent/SG11202108197SA/en unknown
- 2019-12-13 TW TW108145856A patent/TWI759668B/zh active
-
2021
- 2021-07-26 US US17/384,910 patent/US20210352212A1/en not_active Abandoned
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100201865A1 (en) * | 2009-02-09 | 2010-08-12 | Samsung Electronics Co., Ltd. | Imaging method for use with variable coded aperture device and imaging apparatus using the imaging method |
CN102576454A (zh) * | 2009-10-16 | 2012-07-11 | 伊斯曼柯达公司 | 利用空间图像先验的图像去模糊法 |
US20120033096A1 (en) * | 2010-08-06 | 2012-02-09 | Honeywell International, Inc. | Motion blur modeling for image formation |
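CN104103050A (zh) * | 2014-08-07 | 2014-10-15 | Chongqing University | Real video restoration method based on a local strategy |
CN108875900A (zh) * | 2017-11-02 | 2018-11-23 | Beijing Kuangshi Technology Co., Ltd. | Video image processing method and device, neural network training method, and storage medium |
CN108109121A (zh) * | 2017-12-18 | 2018-06-01 | Shenzhen Weiteshi Technology Co., Ltd. | Fast face blur removal method based on a convolutional neural network |
CN109345449A (zh) * | 2018-07-17 | 2019-02-15 | Xi'an Jiaotong University | Image super-resolution and non-uniform deblurring method based on a fusion network |
CN110062164A (zh) * | 2019-04-22 | 2019-07-26 | Shenzhen SenseTime Technology Co., Ltd. | Video image processing method and device |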
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
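JP2023523502A (ja) * | 2021-04-07 | 2023-06-06 | Beijing Baidu Netcom Science Technology Co., Ltd. | Model training method, pedestrian re-identification method, device, and electronic equipment |
JP7403673B2 (ja) | 2021-04-07 | 2023-12-22 | Beijing Baidu Netcom Science Technology Co., Ltd. | Model training method, pedestrian re-identification method, device, and electronic equipment |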
Also Published As
Publication number | Publication date |
---|---|
CN110062164B (zh) | 2021-10-26 |
CN113992847A (zh) | 2022-01-28 |
TW202040986A (zh) | 2020-11-01 |
KR20210048544A (ko) | 2021-05-03 |
SG11202108197SA (en) | 2021-08-30 |
US20210352212A1 (en) | 2021-11-11 |
CN110062164A (zh) | 2019-07-26 |
JP2021528795A (ja) | 2021-10-21 |
JP7123256B2 (ja) | 2022-08-22 |
TWI759668B (zh) | 2022-04-01 |
CN113992848A (zh) | 2022-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
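WO2020215644A1 (zh) | | Video image processing method and device |
US12008797B2 (en) | | Image segmentation method and image processing apparatus |
CN110473137B (zh) | | Image processing method and device |
US11688070B2 (en) | | Video frame segmentation using reduced resolution neural network and masks from previous frames |
TWI777185B (zh) | | Robot image enhancement method, processor, electronic device, and computer-readable storage medium |
CN101540046B (zh) | | Panorama stitching method and device based on image features |
JP7086235B2 (ja) | | Video processing method, device, and computer storage medium |
WO2022042124A1 (zh) | | Super-resolution image reconstruction method, device, computer equipment, and storage medium |
CN112950471A (zh) | | Video super-resolution processing method and device, super-resolution reconstruction model, and medium |
CN110428382B (zh) | | Efficient video enhancement method, device, and storage medium for mobile terminals |
WO2020146911A2 (en) | | Multi-stage multi-reference bootstrapping for video super-resolution |
CN109005334A (zh) | | Imaging method, device, terminal, and storage medium |
Conde et al. | | Lens-to-lens bokeh effect transformation. NTIRE 2023 challenge report |
JP2023525462A (ja) | | Method, device, electronic equipment, storage medium, and computer program for extracting features |
CN113949808A (zh) | | Video generation method, device, readable medium, and electronic equipment |
CN107644423A (zh) | | Real-time video data processing method, device, and computing equipment based on scene segmentation |
CN109949234A (zh) | | Video restoration model training method and video restoration method based on a deep network |
CN113379600A (zh) | | Deep-learning-based short video super-resolution conversion method, device, and medium |
CN115170383A (zh) | | Image blurring method, device, storage medium, and terminal equipment |
CN114677286A (zh) | | Image processing method, device, storage medium, and terminal equipment |
CN113658050A (zh) | | Image denoising method, denoising device, mobile terminal, and storage medium |
TWI586144B (zh) | | Multiple stream processing techniques for video analysis and encoding |
CN109996056B (zh) | | Method, device, and electronic equipment for converting 2D video to 3D video |
CN115170581A (zh) | | Portrait segmentation model generation method, portrait segmentation model, and portrait segmentation method |
CN115984348A (zh) | | Panoramic image processing method, device, electronic equipment, and storage medium |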
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19926716; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021520271; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 20217009399; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.02.2022) |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 19926716; Country of ref document: EP; Kind code of ref document: A1 |