
WO2020215644A1 - Video image processing method and device - Google Patents

Video image processing method and device

Info

Publication number
WO2020215644A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
processing
convolution
deblurring
Prior art date
Application number
PCT/CN2019/114139
Other languages
English (en)
French (fr)
Inventor
周尚辰
张佳维
任思捷
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司
Priority to JP2021520271A (JP7123256B2, ja)
Priority to KR1020217009399A (KR20210048544A, ko)
Priority to SG11202108197SA (SG11202108197SA, en)
Publication of WO2020215644A1 (zh)
Priority to US17/384,910 (US20210352212A1, en)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/68 Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682 Vibration or motion blur correction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/68 Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/681 Motion detection
    • H04N23/6811 Motion detection based on the image signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/68 Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682 Vibration or motion blur correction
    • H04N23/683 Vibration or motion blur correction performed by a processor, e.g. controlling the readout of an image memory
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95 Computational photography systems, e.g. light-field imaging systems

Definitions

  • This application relates to the field of image processing technology, and in particular to a video image processing method and device.
  • Captured video is prone to blurring.
  • Blur caused by camera shake or by motion of the subject often ruins the shot or makes subsequent video-based processing impossible.
  • Traditional methods can remove blur from video images using optical flow or a neural network, but the deblurring effect is poor.
  • the embodiments of the present application provide a video image processing method and device.
  • An embodiment of the present application provides a video image processing method, including: acquiring multiple frames of continuous video images, wherein the multiple frames of continuous video images include the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image, where N is a positive integer; obtaining a deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image.
  • The deblurring convolution kernel of the Nth frame image in the video can be obtained, and convolving the Nth frame image with this kernel effectively removes the blur in the Nth frame image, yielding the Nth frame deblurred image.
  • Obtaining the deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
  • A deblurring convolution kernel is obtained for each pixel and is used to perform deconvolution processing on the corresponding pixel in the Nth frame image, removing the blur of that pixel. By generating a deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed, and the deblurred image is clear and natural.
  • Performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel includes: performing convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and encoding the alignment convolution kernel to obtain the deblurring convolution kernel.
  • The alignment convolution kernel of each pixel is obtained, and it can be used for subsequent alignment processing. Convolution processing of the alignment convolution kernel then extracts the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, yielding the deblurring convolution kernel.
  • The deblurring convolution kernel therefore contains not only the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, but also the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the removal of blur from the Nth frame image.
  • Performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image includes: performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and decoding the first feature image to obtain the Nth frame deblurred image.
  • Performing the deblurring on the feature image of the Nth frame image through the deblurring convolution kernel reduces the amount of data processed during deblurring and increases the processing speed.
  • Performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain the first feature image includes: adjusting the dimensions of the deblurring convolution kernel so that its number of channels is the same as the number of channels of the feature image of the Nth frame image; and performing convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
  • By adjusting the dimensions of the deblurring convolution kernel so that they match those of the feature image of the Nth frame image, the dimension-adjusted deblurring convolution kernel can be used to perform convolution processing on the feature image of the Nth frame image.
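  • The dimension adjustment and per-pixel convolution described above might be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the function name `adaptive_conv`, the kernel size `k`, the flattened layout `C * k * k` of the kernel map, and the per-channel way each kernel is applied are all hypothetical choices made for the example.

```python
import numpy as np

def adaptive_conv(feature, kernels, k):
    """Filter each pixel of `feature` with its own k x k kernel per channel.

    feature: (H, W, C) feature image.
    kernels: (H, W, C * k * k) kernel map, one flattened kernel set per pixel
             (assumed layout, chosen for this sketch).
    """
    h, w, c = feature.shape
    # Dimension adjustment: expose C channels so the kernel matches the feature image.
    per_pixel = kernels.reshape(h, w, c, k, k)
    pad = k // 2
    padded = np.pad(feature, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(feature.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            # Shifted neighbourhood, weighted by the matching tap of each pixel's kernel.
            out += padded[dy:dy + h, dx:dx + w, :] * per_pixel[:, :, :, dy, dx]
    return out

rng = np.random.default_rng(0)
feature = rng.random((8, 8, 4))
# A kernel map whose only non-zero tap is the centre leaves the feature image unchanged.
identity = np.zeros((8, 8, 4, 3, 3))
identity[..., 1, 1] = 1.0
restored = adaptive_conv(feature, identity.reshape(8, 8, -1), k=3)
```

  • Because every spatial position carries its own kernel, different pixels can be filtered differently, which is the property the text relies on for non-uniform blur.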
  • After performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel, the method further includes: performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
  • Convolving the pixels of the feature image of the N-1th frame image with the alignment convolution kernel aligns that feature image in time to the Nth frame.
  • Performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain the second feature image includes: adjusting the dimensions of the alignment convolution kernel so that its number of channels is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
  • By adjusting the dimensions of the alignment convolution kernel so that they match those of the feature image of the N-1th frame image, the dimension-adjusted alignment convolution kernel can be used to perform convolution processing on the feature image of the N-1th frame image.
  • Decoding the first feature image to obtain the Nth frame deblurred image includes: performing fusion processing on the first feature image and the second feature image to obtain a third feature image; and decoding the third feature image to obtain the Nth frame deblurred image.
  • Fusing the first feature image with the second feature image improves the deblurring of the Nth frame image; the fused third feature image is then decoded to obtain the Nth frame deblurred image.
  • Performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtain the alignment convolution kernel includes: superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encoding the image to be processed to obtain a fourth feature image; performing convolution processing on the fourth feature image to obtain a fifth feature image; and adjusting the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
  • The motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image is extracted; then, to facilitate subsequent processing, the number of channels of the fifth feature image is adjusted to the first preset value through convolution processing.
  • Encoding the alignment convolution kernel to obtain the deblurring convolution kernel includes: adjusting the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; performing fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and performing convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
  • Obtaining the deblurring convolution kernel by convolution processing of the alignment convolution kernel means that the deblurring convolution kernel contains not only the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, but also the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, which improves the subsequent removal of blur from the Nth frame image by the deblurring convolution kernel.
  • Performing convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image and obtain the deblurring convolution kernel includes: performing convolution processing on the seventh feature image to obtain an eighth feature image; and adjusting the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
  • The motion information of the pixels of the N-1th frame image relative to the pixels of the N-1th frame deblurred image is extracted, and the number of channels of the eighth feature image is then adjusted to the first preset value through convolution processing.
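  • The data flow from the image to be processed to the two kernels (fourth through eighth feature images) can be traced with a shape-level sketch. Everything concrete here is an assumption made for illustration: each convolution stage is replaced by an untrained 1x1 linear map, fusion is taken to be channel concatenation, and the names `conv1x1`, `generate_kernels`, `hidden`, and the preset values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_channels):
    """Stand-in for a convolution stage: a random, untrained 1x1 linear map over channels."""
    weights = rng.standard_normal((x.shape[-1], out_channels)) * 0.1
    return x @ weights

def generate_kernels(to_be_processed, first_preset, second_preset, hidden=16):
    fourth = conv1x1(to_be_processed, hidden)            # encoding of the stacked input
    fifth = conv1x1(fourth, hidden)                      # convolution on the fourth feature image
    alignment_kernel = conv1x1(fifth, first_preset)      # channels -> first preset value
    sixth = conv1x1(alignment_kernel, second_preset)     # channels -> second preset value
    seventh = np.concatenate([fourth, sixth], axis=-1)   # fusion (assumed: concatenation)
    eighth = conv1x1(seventh, hidden)                    # convolution on the seventh feature image
    deblur_kernel = conv1x1(eighth, first_preset)        # channels -> first preset value
    return alignment_kernel, deblur_kernel

stacked = rng.random((16, 16, 9))                        # illustrative image to be processed
alignment_kernel, deblur_kernel = generate_kernels(stacked, first_preset=36, second_preset=8)
```

  • In this sketch both kernels end with the same channel count (the first preset value), and the deblurring kernel depends on the alignment kernel through the sixth and seventh feature images, mirroring the dependency described in the text.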
  • Decoding the third feature image to obtain the Nth frame deblurred image includes: performing deconvolution processing on the third feature image to obtain a ninth feature image; performing convolution processing on the ninth feature image to obtain the Nth frame decoded image; and adding the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
  • The third feature image is decoded through deconvolution processing and convolution processing to obtain the Nth frame decoded image; the pixel values of corresponding pixels in the Nth frame image and the Nth frame decoded image are then added to obtain the Nth frame deblurred image, which further improves the deblurring effect.
  • An embodiment of the present application also provides a video image processing device, including: an acquiring unit configured to acquire multiple frames of continuous video images, wherein the multiple frames of continuous video images include the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image, where N is a positive integer; a first processing unit configured to obtain a deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and a second processing unit configured to perform deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image.
  • The first processing unit includes: a first convolution processing subunit configured to perform convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
  • The first convolution processing subunit is configured to: perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining the alignment convolution kernel, where the motion information includes speed and direction; and encode the alignment convolution kernel to obtain the deblurring convolution kernel.
  • The second processing unit includes: a second convolution processing subunit configured to perform convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit configured to decode the first feature image to obtain the Nth frame deblurred image.
  • The second convolution processing subunit is configured to: adjust the dimensions of the deblurring convolution kernel so that its number of channels is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
  • The first convolution processing subunit is further configured to: after performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel, perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain the second feature image.
  • The first convolution processing subunit is further configured to: adjust the dimensions of the alignment convolution kernel so that its number of channels is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
  • The second processing unit is configured to: perform fusion processing on the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the Nth frame deblurred image.
  • The first convolution processing subunit is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; encode the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
  • The first convolution processing subunit is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; perform fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
  • The first convolution processing subunit is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
  • The second processing unit is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain the Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
  • an embodiment of the present application further provides a processor, which is configured to execute the foregoing first aspect and any one of the possible implementation methods thereof.
  • An embodiment of the present application also provides an electronic device, including a processor, an input device, an output device, and a memory. The processor, input device, output device, and memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor executes the above-mentioned first aspect and any one of its possible implementation methods.
  • The embodiments of the present application also provide a computer-readable storage medium in which a computer program is stored; the computer program includes program instructions that, when executed by a processor of an electronic device, cause the processor to execute the above-mentioned first aspect and any one of its possible implementation methods.
  • FIG. 1 is a schematic diagram of corresponding pixels in different images provided by an embodiment of the application
  • FIG. 2 is a non-uniformly blurred image provided by an embodiment of the application
  • FIG. 3 is a schematic flowchart of a video image processing method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of the flow of deblurring processing in a video image processing method according to an embodiment of the application
  • FIG. 5 is a schematic flowchart of another video image processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a process for obtaining a deblurring convolution kernel and an alignment convolution kernel provided by an embodiment of the application;
  • FIG. 7 is a schematic diagram of an encoding module provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of an aligned convolution kernel generation module provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of a deblurring convolution kernel generation module provided by an embodiment of the application.
  • FIG. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of an adaptive convolution processing module provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of a decoding module provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of this application.
  • FIG. 14 is a schematic structural diagram of an aligned convolution kernel and deblurring convolution kernel generation module provided by an embodiment of the application;
  • FIG. 15 is a schematic structural diagram of a video image processing device provided by an embodiment of this application.
  • FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
  • the word “correspondence” will appear a lot, where the corresponding pixels in the two images refer to two pixels at the same position in the two images.
  • the pixel point a in the image A corresponds to the pixel point d in the image B
  • the pixel point b in the image A corresponds to the pixel point c in the image B.
  • the corresponding pixels in the multiple images have the same meaning as the corresponding pixels in the two images.
  • the non-uniform blurred image that appears in the following refers to the different degrees of blurring of different pixels in the image, that is, the motion trajectories of different pixels are different.
  • the blur degree of the font on the sign in the upper left corner is greater than the blur degree of the car in the lower right corner, that is, the blur degrees of the two areas are inconsistent.
  • the embodiments of the present application can be used to remove the blur in the non-uniformly blurred image. The embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
  • FIG. 3 is a schematic flowchart of a video image processing method provided by an embodiment of the present application. As shown in FIG. 3, the method includes:
  • the multi-frame continuous video image includes an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, where N is a positive integer.
  • multiple frames of continuous video images can be obtained by shooting video with a camera.
  • The Nth frame image and the N-1th frame image are two adjacent frames in the multiple frames of continuous video images, with the Nth frame image following the N-1th frame image; the Nth frame image is the frame currently to be processed (that is, to be deblurred using the implementation provided in this application).
  • the image after deblurring the N-1th frame is the image obtained after deblurring the N-1th frame image.
  • the deblurring of the video image in the embodiments of this application is a recursive process, that is, the image after the deblurring of the N-1th frame will be used as the input image of the Nth frame of image deblurring.
  • the deblurred image of the Nth frame will be used as the input image of the N+1th frame of image deblurring process.
  • When N = 1, that is, when the object of the current deblurring processing is the first frame in the video, the N-1th frame image and the N-1th frame deblurred image are both taken to be the Nth frame image; in other words, three copies of the first frame image are acquired.
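  • The recursive scheme, including the first-frame case, can be summarised in a short sketch. The per-frame routine is deliberately left abstract: `deblur_frame` is a hypothetical callable standing in for the kernel generation and deblurring steps, and `deblur_video` is a name introduced only for this example.

```python
def deblur_video(frames, deblur_frame):
    """Deblur frames one at a time in shooting order, feeding each result forward."""
    results = []
    for index, frame in enumerate(frames):
        if index == 0:
            # First frame (N = 1): no predecessor exists, so the frame itself stands in
            # for both the N-1th frame image and the N-1th frame deblurred image.
            prev_frame, prev_deblurred = frame, frame
        else:
            prev_frame, prev_deblurred = frames[index - 1], results[index - 1]
        results.append(deblur_frame(frame, prev_frame, prev_deblurred))
    return results

# A stub that records its inputs makes the data flow visible.
calls = []

def record(frame, prev_frame, prev_deblurred):
    calls.append((frame, prev_frame, prev_deblurred))
    return f"deblurred({frame})"

outputs = deblur_video(["f1", "f2", "f3"], record)
```

  • Here the second call receives `"deblurred(f1)"` as its third argument, showing how the result for one frame becomes an input for the next.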
  • a sequence obtained by arranging each frame of images in the video in the order of shooting time is called a video frame sequence.
  • the image obtained after deblurring is called the image after deblurring.
  • the video image is deblurred according to the sequence of video frames, and only one frame of the image is deblurred at a time.
  • the video image and the deblurred image can be stored in the storage of the electronic device.
  • the video refers to the video stream, that is, the video images are stored in the memory of the electronic device in the order of the video frame sequence. Therefore, the electronic device can directly obtain the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image from the memory.
  • the video image mentioned in the embodiments of the present application may be a video captured in real time by a camera of an electronic device, or may be a video image stored in a memory of the electronic device.
  • Obtaining the deblurring convolution kernel for the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
  • The Nth frame image, the N-1th frame image, and the N-1th frame deblurred image are superimposed in the channel dimension to obtain the image to be processed.
  • For example, if the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image each have a size of 100*100*3, the image to be processed after superposition has a size of 100*100*9. That is, the number of pixels in the image to be processed is the same as in any one of the three images, but the number of channels of each pixel is 3 times that of any one of the three images.
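  • The 100*100*3 to 100*100*9 example can be reproduced directly with NumPy. This is a minimal sketch; the random arrays and the variable names are placeholders for real frames, and the channel dimension is assumed to be the last axis.

```python
import numpy as np

# Illustrative stand-ins for the three inputs; each is height x width x 3 channels.
height, width = 100, 100
rng = np.random.default_rng(0)
frame_n = rng.random((height, width, 3))          # Nth frame image
frame_prev = rng.random((height, width, 3))       # N-1th frame image
deblurred_prev = rng.random((height, width, 3))   # N-1th frame deblurred image

# Superimpose along the channel dimension (the last axis here).
to_be_processed = np.concatenate([frame_n, frame_prev, deblurred_prev], axis=-1)

print(to_be_processed.shape)  # (100, 100, 9)
```

  • The spatial size is untouched; only the per-pixel channel count grows from 3 to 9.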
  • The convolution processing performed on the pixels of the image to be processed can be implemented by multiple convolution layers stacked in any manner; the embodiments of this application do not limit the number of convolution layers or the size of the convolution kernels in them.
  • the characteristic information of the pixels in the image to be processed can be extracted to obtain a deblurring convolution kernel.
  • The feature information includes motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, and deblurring information of the pixels of the N-1th frame image relative to the pixels of the N-1th frame deblurred image.
  • the aforementioned motion information includes the motion speed and direction of the pixel in the N-1th frame of image relative to the corresponding pixel in the Nth frame of image.
  • the deblurring convolution kernel in the embodiment of the present application is the result of the convolution processing of the image to be processed, and it is used as the convolution kernel of the convolution processing in the subsequent processing of the embodiment of the present application.
  • performing convolution processing on pixels of the image to be processed refers to performing convolution processing on each pixel of the image to be processed to obtain a deblurring convolution kernel for each pixel respectively.
  • Continuing the earlier example, the size of the image to be processed is 100*100*9, that is, the image to be processed contains 100*100 pixels. Convolving the pixels of the image to be processed yields a 100*100 feature image, in which each pixel can be used as the deblurring convolution kernel for subsequently deblurring the corresponding pixel in the Nth frame image.
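  • One way to picture "one kernel per pixel" is a convolution whose output keeps the 100*100 spatial grid and stores a flattened kernel at every position. The sketch below uses a single untrained 1x1 convolution as a stand-in for the stacked convolution layers; `pointwise_conv`, `kernel_size`, and the random weights are assumptions for illustration only.

```python
import numpy as np

def pointwise_conv(image, weights):
    """Apply a 1x1 convolution: an independent linear map over the channels of every pixel."""
    # image: (H, W, C_in), weights: (C_in, C_out) -> (H, W, C_out)
    return image @ weights

rng = np.random.default_rng(0)
to_be_processed = rng.random((100, 100, 9))    # stacked input from the example
kernel_size = 3                                # hypothetical per-pixel kernel size
weights = rng.standard_normal((9, kernel_size * kernel_size))  # untrained, illustrative

kernel_map = pointwise_conv(to_be_processed, weights)   # one entry per pixel position
# The entry at each spatial position is that pixel's own kernel.
kernel_at_pixel = kernel_map[10, 20].reshape(kernel_size, kernel_size)
```

  • The kernel at position (10, 20) is computed only from the stacked pixel at that position in this simplified stand-in; real stacked layers with larger receptive fields would also draw on neighbouring pixels.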
  • the deblurring processing is performed on the Nth frame of image through the deblurring convolution kernel to obtain the deblurred image of the Nth frame,
  • the feature image of the Nth frame image can be obtained by performing feature extraction processing on the Nth frame image.
  • the feature extraction processing may be convolution processing or pooling processing, which is not limited in the embodiment of the present application.
  • the deblurring convolution kernel of each pixel in the image to be processed is obtained.
  • the number of pixels in the image to be processed is the same as the number of pixels in the Nth frame of image, and the pixels in the image to be processed correspond to the pixels in the Nth frame of image one-to-one.
  • The meaning of one-to-one correspondence can be seen in the following example: pixel A in the image to be processed corresponds to pixel B in the Nth frame image, that is, the position of pixel A in the image to be processed is the same as the position of pixel B in the Nth frame image.
  • the foregoing decoding processing may be implemented through deconvolution processing, or may be obtained through a combination of deconvolution processing and convolution processing, which is not limited in the embodiment of the present application.
  • The pixel value of each pixel in the image obtained by decoding the first feature image is added to the pixel value of the corresponding pixel of the Nth frame image, and the image obtained after this "addition" is taken as the Nth frame deblurred image.
  • the information of the Nth frame image can be used to obtain the Nth frame deblurred image.
  • The pixel value of pixel E in the Nth frame deblurred image obtained after the "addition" is 350, where the position of C in the image to be processed, the position of D in the Nth frame image, and the position of E in the Nth frame deblurred image are the same.
  • the motion trajectories of different pixels in a non-uniformly blurred image are different, and the more complex the motion trajectory of the pixel, the higher the degree of blur.
  • The embodiment of the present application predicts a deblurring convolution kernel for each pixel in the image to be processed, and uses the predicted deblurring convolution kernel to convolve the feature points in the Nth frame image, thereby removing the blur of the pixels in the Nth frame's features. Since different pixels in a non-uniformly blurred image have different degrees of blur, generating a separate deblurring convolution kernel for each pixel removes the blur of each pixel more effectively, and in turn removes the blur in the non-uniformly blurred image.
  • The embodiment of the present application obtains the deblurring convolution kernel of each pixel based on the deblurring information between the pixels of the N-1th frame image and the N-1th frame deblurred image, and uses the deblurring convolution kernel to perform deconvolution processing on the corresponding pixels in the Nth frame image to remove their blur. By generating a deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed; the deblurred image is clear and natural, and the entire deblurring process takes little time and runs fast.
  • FIG. 5 is a schematic flowchart of a possible implementation manner of 302 according to an embodiment of the present application. As shown in FIG. 5, the method includes:
  • The motion information includes speed and direction, and can be understood as the motion of a pixel from the time of the N-1th frame (the time when the N-1th frame image was taken) to the time of the Nth frame (the time when the Nth frame image was taken).
  • the motion information of the pixel points helps to remove the blur of the Nth frame image.
  • the convolution processing performed on the pixels of the image to be processed can be implemented by multiple arbitrarily stacked convolution layers.
  • The embodiments of this application do not limit the number of convolution layers or the size of the convolution kernels in the convolution layers.
  • the characteristic information of the pixels in the image to be processed can be extracted to obtain the aligned convolution kernel.
  • the feature information here includes motion information of pixels of the N-1th frame image relative to the pixels of the Nth frame image.
  • the aligned convolution kernel in the embodiment of the present application is the result obtained by performing the aforementioned convolution processing on the image to be processed, and will be used as the convolution kernel of the convolution processing in the subsequent processing of the embodiment of the present application.
  • Because the alignment convolution kernel, obtained by performing convolution processing on the image to be processed, captures the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, it can subsequently be used to align the pixels of the Nth frame image.
  • the aligned convolution kernel obtained in this embodiment is also obtained in real time, that is, through the above processing, the aligned convolution kernel of each pixel in the Nth frame of image is obtained.
  • the encoding processing here can be convolution processing or pooling processing.
  • the foregoing encoding processing is convolution processing, and the convolution processing can be implemented by a plurality of arbitrarily stacked convolution layers.
  • The embodiment of the present application does not limit the number of convolution layers or the size of the convolution kernels in the convolution layers.
  • the convolution processing in 402 is different from the convolution processing in 401.
  • For example, the convolution processing in 401 is implemented by 3 convolutional layers with 32 channels (convolution kernel size 3*3), and the convolution processing in 402 is implemented by 5 convolutional layers with 64 channels (convolution kernel size 3*3). Both (the 3 convolutional layers and the 5 convolutional layers) are essentially convolution processing, but their specific implementations differ.
  • Since the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension, the image to be processed contains the information of the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image.
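  • The channel-dimension superposition described above can be sketched as follows. This is a minimal illustration in Python with NumPy; the function name `build_input` and the assumption that each frame has 3 channels (so that the stacked result has the 9 channels of the 100*100*9 example) are illustrative and not stated in the application.

```python
import numpy as np

def build_input(frame_n, frame_prev, deblurred_prev):
    """Stack three H x W x 3 images along the channel axis, giving H x W x 9.

    build_input is a hypothetical helper name used only for this sketch.
    """
    return np.concatenate([frame_n, frame_prev, deblurred_prev], axis=-1)

h, w = 100, 100
frame_n = np.zeros((h, w, 3), dtype=np.float32)              # Nth frame image
frame_prev = np.ones((h, w, 3), dtype=np.float32)            # N-1th frame image
deblurred_prev = np.full((h, w, 3), 2.0, dtype=np.float32)   # N-1th frame deblurred image

to_process = build_input(frame_n, frame_prev, deblurred_prev)
print(to_process.shape)  # (100, 100, 9)
```

    Each pixel position of the stacked array carries the values of all three source images at that position, which is why later layers can read motion and deblurring cues from a single input.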
  • The convolution processing in 401 focuses more on extracting the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image; that is, after the processing of 401, the deblurring information between the N-1th frame image and the N-1th frame deblurred image in the image to be processed has not been extracted.
  • The image to be processed and the alignment convolution kernel may be fused, so that the alignment convolution kernel obtained after fusion includes the deblurring information between the N-1th frame image and the N-1th frame deblurred image.
  • the deblurring information of the image after deblurring processing in the N-1th frame relative to the pixels of the N-1th frame image is extracted to obtain the deblurring convolution kernel.
  • The deblurring information can be understood as the mapping relationship between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, that is, the mapping relationship between the pixels before deblurring and the pixels after deblurring.
  • The deblurring convolution kernel obtained by convolving the alignment convolution kernel includes both the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, and the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image.
  • Subsequent convolution processing is performed on the pixels of the Nth frame of image through the deblurring convolution kernel to improve the deblurring effect.
  • The embodiment of the present application obtains the alignment convolution kernel of each pixel based on the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, so that alignment processing can subsequently be performed with the alignment convolution kernel. Convolution processing of the alignment convolution kernel then extracts the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image and yields the deblurring convolution kernel. As a result, the deblurring convolution kernel includes not only the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, but also the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the effect of removing the blur of the Nth frame image.
  • The foregoing embodiments all obtain the deblurring convolution kernel and the alignment convolution kernel by performing convolution processing on the image. Because an image contains a large number of pixels, processing the image directly involves a large amount of data and is slow. Therefore, the embodiment of the present application provides an implementation that obtains the deblurring convolution kernel and the alignment convolution kernel based on feature images.
  • FIG. 6 is a schematic diagram of a process for obtaining a deblurring convolution kernel and an alignment convolution kernel according to Embodiment 6 of the present application. As shown in FIG. 6, the method includes:
  • For how the image to be processed is obtained, refer to step 302; details are not repeated here.
  • the foregoing encoding processing can be implemented in multiple ways, such as convolution, pooling, etc., which are not specifically limited in the embodiment of the present application.
  • the module shown in Figure 7 can be used to encode the image to be processed.
  • The module includes, in order: a convolutional layer with 32 channels (convolution kernel size 3*3); two residual blocks with 32 channels (each residual block contains two convolutional layers with convolution kernel size 3*3); a convolutional layer with 64 channels (convolution kernel size 3*3); two residual blocks with 64 channels (each residual block contains two convolutional layers with convolution kernel size 3*3); a convolutional layer with 128 channels (convolution kernel size 3*3); and two residual blocks with 128 channels (each residual block contains two convolutional layers with convolution kernel size 3*3).
  • the image to be processed is subjected to layer-by-layer convolution processing to complete the encoding of the image to be processed, and the fourth characteristic image is obtained.
  • The feature content and semantic information extracted by each convolution layer are different. Specifically, the encoding processing abstracts the features of the image to be processed step by step and gradually removes relatively minor features; therefore, the later a feature image is extracted, the smaller its size and the more concentrated its semantic information.
  • The image to be processed is convolved step by step and the corresponding features are extracted, finally yielding a fourth feature image of fixed size. In this way, the main content information of the image to be processed (that is, the fourth feature image) is obtained while the image size is reduced, which reduces the amount of data to be processed and increases the processing speed.
  • For example (Example 3), assuming that the size of the image to be processed is 100*100*3, the size of the fourth feature image obtained through the encoding processing of the module shown in FIG. 7 is 25*25*128.
  • The convolution processing above is implemented as follows: the convolution layer slides a convolution kernel over the image to be processed, multiplies the pixel values of the image by the corresponding values of the convolution kernel, and adds all the products to give the pixel value, in the output image, at the position corresponding to the center of the convolution kernel. Once the kernel has slid over all pixels of the image to be processed, the fourth feature image is obtained.
  • the step size of the convolutional layer may be set to 2.
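  • The sliding-window convolution and the effect of a step size of 2 can be sketched as follows. This is a simplified single-channel illustration in Python with NumPy, not the module of FIG. 7; the names `conv_out_size` and `conv2d`, the zero padding of 1, and the idea that two layers with step size 2 take a side length of 100 down to 25 are assumptions made for the sketch.

```python
import numpy as np

def conv_out_size(size, kernel=3, stride=1, pad=1):
    """Output side length of a convolution over one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def conv2d(image, kernel, stride=1, pad=1):
    """Naive single-channel convolution: slide, multiply, and sum."""
    k = kernel.shape[0]
    padded = np.pad(image, pad)  # zero padding on every side (assumed)
    out_h = conv_out_size(image.shape[0], k, stride, pad)
    out_w = conv_out_size(image.shape[1], k, stride, pad)
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(window * kernel)  # multiply element-wise, then add
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0  # kernel that copies the pixel under its center

downsampled = conv2d(image, identity, stride=2)
print(downsampled.shape)  # (2, 2)

# Two stride-2 layers: 100 -> 50 -> 25, matching the 25*25 size in Example 3.
print(conv_out_size(conv_out_size(100, stride=2), stride=2))  # 25
```

    With a step size of 2 the kernel visits every second pixel in each direction, so each such layer halves the height and width and roughly quarters the amount of data passed on.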
  • FIG. 8 is a module for generating an aligned convolution kernel provided by an embodiment of the application.
  • the fourth feature image is input to the module shown in Figure 8.
  • The fourth feature image passes in sequence through a convolutional layer with 128 channels (convolution kernel size 3*3) and two residual blocks with 64 channels (each residual block contains two convolutional layers with convolution kernel size 3*3). This convolves the fourth feature image and extracts the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image contained in the fourth feature image, yielding the fifth feature image.
  • the size of the image does not change, that is, the size of the fifth characteristic image obtained is the same as the size of the fourth characteristic image.
  • the size of the fourth feature image is 25*25*128, and the size of the fifth feature image obtained through the processing of 303 is also 25*25*128.
  • The fourth layer in Figure 8 performs convolution processing on the fifth feature image, and the size of the resulting aligned convolution kernel is 25*25*c*k*k (it should be understood that the number of channels of the fifth feature image is adjusted by the convolution processing of the fourth layer), where c is the number of channels of the fifth feature image and k is a positive integer; optionally, the value of k is 5.
  • 25*25*c*k*k is adjusted to 25*25*ck^2, where ck^2 is the first preset value.
  • the height and width of the aligned convolution kernel are both 25.
  • The aligned convolution kernel contains 25*25 elements, each element contains c pixels, and different elements occupy different positions in the aligned convolution kernel. For example, if the plane spanned by the width and height of the aligned convolution kernel is defined as the xoy plane, each element in the aligned convolution kernel can be identified by coordinates (x, y), where o is the origin.
  • The elements of the aligned convolution kernel are the convolution kernels used for pixel alignment in the subsequent processing, and the size of each element is 1*1*ck^2.
  • Continuing Example 4 (Example 5): the size of the fifth feature image is 25*25*128, and the size of the aligned convolution kernel obtained by the processing of 304 is 25*25*128*k*k, that is, 25*25*128k^2.
  • The aligned convolution kernel contains 25*25 elements, each element contains 128 pixels, and different elements have different positions in the aligned convolution kernel.
  • The size of each element is 1*1*128*k^2.
  • the fourth layer is a convolutional layer, and the larger the convolution kernel of the convolutional layer, the greater the amount of data processing.
  • the fourth layer in FIG. 8 is a convolutional layer with 128 channels and a convolution kernel size of 1*1. Adjusting the number of channels of the fifth feature image through the convolution layer with the convolution kernel size of 1*1 can reduce the amount of data processing and increase the processing speed.
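  • A convolution with a 1*1 kernel only mixes channels at each pixel, which is why it is a cheap way to adjust the channel count. The sketch below, in Python with NumPy, maps a feature image with c channels to a per-pixel kernel of length ck^2. The function name `conv1x1`, the random weights, and the small value c = 8 (chosen to keep the example light; the examples in the text use 128) are assumptions for illustration only.

```python
import numpy as np

def conv1x1(features, weights, bias=None):
    """1*1 convolution as a per-pixel linear map.

    features: (H, W, C_in), weights: (C_in, C_out) -> result: (H, W, C_out)
    """
    out = features @ weights  # the same matrix is applied at every pixel
    if bias is not None:
        out = out + bias
    return out

rng = np.random.default_rng(0)
c, k = 8, 5                                    # c is illustrative; k = 5 as in the text
fifth = rng.standard_normal((25, 25, c))       # stand-in for the fifth feature image
w = rng.standard_normal((c, c * k * k))        # hypothetical layer weights

kernels = conv1x1(fifth, w)
print(kernels.shape)        # (25, 25, 200), i.e. 25*25*ck^2
print(kernels[3, 7].shape)  # (200,), one 1*1*ck^2 element
```

    Each output pixel costs C_in * C_out multiplications, against 9 times that for a 3*3 kernel, which is the saving the paragraph above refers to.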
  • Since the number of channels of the fifth feature image was adjusted by the convolution processing in 504 (that is, the fourth layer in Figure 8), the number of channels of the alignment convolution kernel needs to be adjusted to the second preset value (that is, the number of channels of the fifth feature image) before the alignment convolution kernel is convolved to obtain the deblurring convolution kernel.
  • the number of channels of the aligned convolution kernel is adjusted to the second preset value through convolution processing to obtain the sixth characteristic image.
  • the convolution processing can be implemented by a convolution layer with 128 channels and a convolution kernel size of 1*1.
  • Steps 502 to 504 of this embodiment focus more on extracting the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image in the image to be processed. Since the subsequent processing needs to extract the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image in the image to be processed, the fourth feature image and the sixth feature image are fused before that processing, which adds the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image to the feature image.
  • the fourth feature image and the sixth feature image are concatenated, that is, the fourth feature image and the sixth feature image are superimposed in the channel dimension to obtain the seventh feature image.
  • The seventh feature image contains the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, and convolution processing of the seventh feature image can further extract this deblurring information to obtain a deblurring convolution kernel.
  • The process includes the following steps:
  • Convolution processing is performed on the seventh feature image to obtain an eighth feature image; the number of channels of the eighth feature image is adjusted to the first preset value through convolution processing to obtain a deblurring convolution kernel.
  • The seventh feature image is input to the module shown in Figure 9 and passes in sequence through a convolutional layer with 128 channels (convolution kernel size 3*3) and two residual blocks with 64 channels (each residual block contains two convolutional layers with convolution kernel size 3*3). This convolves the seventh feature image and extracts the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image contained in the seventh feature image, yielding the eighth feature image.
  • the processing procedure of the seventh characteristic image by the module shown in FIG. 9 can refer to the processing procedure of the fifth characteristic image by the module shown in FIG. 8, which will not be repeated here.
  • Compared with the module shown in Figure 9 (used to generate deblurring convolution kernels), the module shown in Figure 8 (used to generate aligned convolution kernels) has one more convolutional layer (that is, the fourth layer of the module shown in Figure 8). Although the rest of the composition is the same, the weights of the two differ, which directly determines that the two serve different purposes.
  • the weights of the modules shown in FIG. 8 and the modules shown in FIG. 9 may be obtained by training the modules shown in FIG. 8 and FIG. 9.
  • The deblurring convolution kernel obtained in 507 contains a deblurring convolution kernel for each pixel in the seventh feature image, and the size of the convolution kernel of each pixel is 1*1*ck^2.
  • Continuing Example 5 (Example 6): the size of the seventh feature image is 25*25*128*k*k, that is to say, the seventh feature image contains 25*25 pixels. Accordingly, the obtained deblurring convolution kernel (size 25*25*128k^2) contains 25*25 deblurring convolution kernels (that is, each pixel corresponds to one deblurring convolution kernel, and the size of each pixel's deblurring convolution kernel is 1*1*128k^2).
  • The information of each pixel in the seventh feature image is synthesized into one convolution kernel, that is, the deblurring convolution kernel of that pixel.
  • the motion information between the pixels of the N-1 frame image and the pixels of the N frame image is extracted, and the aligned convolution kernel of each pixel is obtained.
  • The deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image is extracted, and the deblurring convolution kernel of each pixel is obtained.
  • This embodiment explains in detail how to obtain the deblurring convolution kernel and the aligned convolution kernel.
  • The following embodiments elaborate on how to remove the blur in the Nth frame image through the deblurring convolution kernel and the aligned convolution kernel, and obtain the Nth frame deblurred image.
  • FIG. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the present application. As shown in FIG. 10, the method includes:
  • the above-mentioned feature image of the Nth frame image may be obtained by performing feature extraction processing on the Nth frame image, where the feature extraction processing may be convolution processing or pooling processing, which is not limited in the embodiment of the application.
  • the feature extraction process of the Nth frame image can be performed by the encoding module shown in FIG. 7 to obtain the feature image of the Nth frame image.
  • the specific composition of FIG. 7 and the processing process of the Nth frame image in FIG. 7 can be referred to 502, which will not be repeated here.
  • The feature image of the Nth frame image includes the information of the Nth frame image (in this application, this information can be understood as the information of the blurred areas in the Nth frame image), so performing the subsequent processing on the feature image of the Nth frame image reduces the amount of data to be processed and increases the processing speed.
  • Each pixel in the image to be processed is subjected to convolution processing to obtain the deblurring convolution kernel of each pixel respectively. Performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel means: using the deblurring convolution kernel of each pixel, from the deblurring convolution kernel obtained in the foregoing embodiment, as the convolution kernel of the corresponding pixel in the feature image of the Nth frame image, and convolving each pixel of that feature image.
  • The deblurring convolution kernel of each pixel contains the information of that pixel in the seventh feature image, and this information is one-dimensional within the deblurring convolution kernel.
  • The pixels of the feature image of the Nth frame image are three-dimensional. Therefore, to use the information of each pixel in the seventh feature image as the convolution kernel of each pixel in the feature image of the Nth frame image, the dimension of the deblurring convolution kernel needs to be adjusted. Based on these considerations, the implementation of 901 includes the following steps:
  • Through the module shown in FIG. 11 (the adaptive convolution processing module), the deblurring convolution kernel of each pixel in the deblurring convolution kernel obtained in the foregoing embodiment can be used as the convolution kernel of the corresponding pixel in the feature image of the Nth frame image, and convolution processing is performed on that pixel.
  • The reshape in Figure 11 refers to adjusting the dimension of the deblurring convolution kernel of each pixel in the deblurring convolution kernel, that is, the dimension of each pixel's deblurring kernel is adjusted from 1*1*ck^2 to c*k*k.
  • Continuing Example 6 (Example 7): the size of the deblurring convolution kernel of each pixel is 1*1*128k^2; after the reshape, the size of the resulting convolution kernel is 128*k*k.
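  • The reshape and the per-pixel convolution can be sketched as follows in Python with NumPy. The function name `adaptive_conv` is hypothetical, and the sketch assumes that each of the c slices of a pixel's c*k*k kernel is applied to the matching channel of the feature image, so the output keeps c channels; the text above does not spell this detail out. Zero padding at the border is likewise an assumption.

```python
import numpy as np

def adaptive_conv(features, kernels, k):
    """Convolve every pixel with its own kernel.

    features: (H, W, C) feature image
    kernels:  (H, W, C*k*k), one flattened 1*1*ck^2 kernel per pixel
    returns:  (H, W, C)
    """
    h, w, c = features.shape
    r = k // 2
    per_pixel = kernels.reshape(h, w, c, k, k)           # reshape: ck^2 -> c*k*k
    padded = np.pad(features, ((r, r), (r, r), (0, 0)))  # zero padding (assumed)
    out = np.zeros_like(features)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + k, x:x + k, :]          # (k, k, C) neighbourhood
            kern = per_pixel[y, x]                       # (C, k, k) kernel of this pixel
            out[y, x] = np.einsum('ijc,cij->c', patch, kern)
    return out

h, w, c, k = 4, 4, 2, 3
features = np.arange(h * w * c, dtype=float).reshape(h, w, c)

# A kernel whose only non-zero weight sits one step right of the center
# copies the right-hand neighbour, i.e. shifts the content one pixel left.
shift = np.zeros((h, w, c, k, k))
shift[..., k // 2, k // 2 + 1] = 1.0
aligned = adaptive_conv(features, shift.reshape(h, w, c * k * k), k)
print(np.array_equal(aligned[:, :-1], features[:, 1:]))  # True
```

    The shift kernel shows how a predicted kernel can move content, which is the role of the aligned convolution kernel; a kernel with other weights would instead sharpen or smooth, which is the role of the deblurring convolution kernel.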
  • Performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the aligned convolution kernel to obtain a second feature image includes: adjusting the dimension of the aligned convolution kernel so that the number of channels of the aligned convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image with the dimension-adjusted aligned convolution kernel to obtain the second feature image.
  • In the same way that the module shown in FIG. 11 uses the deblurring convolution kernel obtained in the previous embodiment as the deblurring convolution kernel of each pixel of the feature image of the Nth frame image to deblur that feature image, the reshape in the module shown in FIG. 11 adjusts the dimension of the alignment convolution kernel of each pixel in the alignment convolution kernel obtained in the foregoing embodiment to 128*k*k, and the dimension-adjusted aligned convolution kernel is used to convolve the corresponding pixels in the feature image of the N-1th frame deblurred image.
  • The feature image of the N-1th frame deblurred image contains a large number of clear (that is, not blurred) pixels, but there is a displacement between the pixels in the feature image of the N-1th frame deblurred image and the pixels of the current frame. Therefore, the processing of 902 adjusts the positions of the pixels of the feature image of the N-1th frame deblurred image so that the adjusted positions are closer to their positions at the time of the Nth frame (position here refers to the position of the subject in the Nth frame image). In this way, the subsequent processing can use the information of the second feature image to remove the blur in the Nth frame image.
  • 901 can be executed first, then 902, or 902 can be executed first, then 901, or 901 and 902 can be executed simultaneously.
  • 901 may be executed first, and then 505-507, or 505-507 may be executed first, and then 901 or 902 may be executed.
  • the embodiments of this application do not limit this.
  • By fusing the first feature image with the second feature image, the information of the feature image of the (aligned) N-1th frame image can be used to improve the deblurring effect, on the basis of the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image and the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image.
  • the first feature image and the second feature image are superimposed on the channel dimension to obtain the third feature image.
  • The decoding processing can be any one of deconvolution processing, bilinear interpolation processing, and unpooling processing, or a combination of any one of these with convolution processing.
  • Figure 12 shows the decoding module, which includes, in order: a deconvolution layer with 64 channels (convolution kernel size 3*3); two residual blocks with 64 channels (each residual block contains two convolutional layers with convolution kernel size 3*3); a deconvolution layer with 32 channels (convolution kernel size 3*3); and two residual blocks with 32 channels (each residual block contains two convolutional layers with convolution kernel size 3*3).
  • Decoding the third feature image through the decoding module shown in FIG. 12 to obtain the Nth frame deblurred image includes the following steps: performing deconvolution processing on the third feature image to obtain the ninth feature image; and performing convolution processing on the ninth feature image to obtain the Nth frame decoded image.
  • The pixel value of the first pixel of the Nth frame image can be added to the pixel value of the second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, where the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image. This makes the Nth frame deblurred image more natural.
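  • The pixel-wise addition described above amounts to an element-wise sum of two images of the same size. A minimal sketch in Python with NumPy follows; the function name `add_residual` and the pixel values are made up for illustration, and no clipping is applied because the text does not mention any.

```python
import numpy as np

def add_residual(frame_n, decoded_n):
    """Add the decoded image to the Nth frame image, pixel by pixel."""
    if frame_n.shape != decoded_n.shape:
        raise ValueError("both images must have the same size")
    return frame_n + decoded_n

frame_n = np.array([[100.0, 120.0],
                    [90.0, 80.0]])    # illustrative values of the Nth frame image
decoded_n = np.array([[5.0, -10.0],
                      [0.0, 20.0]])   # illustrative values of the Nth frame decoded image

deblurred_n = add_residual(frame_n, decoded_n)
print(deblurred_n.tolist())  # [[105.0, 110.0], [90.0, 100.0]]
```

    Because the decoded image is added to the input, the decoding branch only has to produce a correction to the blurred frame rather than the whole picture.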
  • The feature image of the Nth frame image can be deblurred by the deblurring convolution kernel obtained in the foregoing embodiment, and the feature image of the N-1th frame image can be aligned by the alignment convolution kernel obtained in the foregoing embodiment.
  • Fusing the first feature image obtained by the deblurring processing with the second feature image obtained by the alignment processing to obtain the third feature image, and then decoding the third feature image, can improve the deblurring effect on the Nth frame image and make the Nth frame deblurred image more natural.
  • the target of both the deblurring processing and the alignment processing in this embodiment is the feature image, therefore, the data processing amount is small, the processing speed is fast, and real-time deblurring of the video image can be realized.
  • This application also provides a video image deblurring neural network for implementing the method in the foregoing embodiment.
  • FIG. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of the present application.
  • The video image deblurring neural network includes: an encoding module, an alignment convolution kernel and deblurring convolution kernel generation module, and a decoding module.
  • the encoding module in FIG. 13 is the same as the encoding module shown in FIG. 7, and the decoding module in FIG. 13 is the same as the decoding module shown in FIG. 12, which will not be repeated here.
  • The aligned convolution kernel and deblurring convolution kernel generation module shown in Fig. 14 includes: a decoding module, an aligned convolution kernel generation module, and a deblurring convolution kernel generation module. Between the aligned convolution kernel generation module and the deblurring convolution kernel generation module there is a convolution layer with 128 channels and a convolution kernel size of 1*1, and this convolution layer is connected to a concatenate layer.
  • the adaptive convolutional layer shown in FIG. 14 is the module shown in FIG. 11.
  • Through the adaptive convolution layer, the aligned convolution kernel and the deblurring convolution kernel generated by the module shown in Figure 14 are used to convolve the pixels of the feature image of the N-1th frame image and of the feature image of the Nth frame image, respectively (that is, alignment processing and deblurring processing).
  • The Nth frame decoded image is obtained, and the pixel value of the first pixel of the Nth frame image is added to the pixel value of the second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, where the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
  • The Nth frame image and the Nth frame deblurred image are then used as inputs of the video image deblurring neural network when the N+1th frame image is processed.
  • The video image deblurring neural network requires 4 inputs to deblur each frame of the video.
  • Taking the Nth frame image as the frame to be deblurred, the 4 inputs are: the N-1th frame image, the N-1th frame deblurred image, the Nth frame image, and the feature image of the N-1th frame deblurred image (that is, the fused feature image described above).
  • The video image deblurring neural network provided by this embodiment can deblur video images; the entire process needs only 4 inputs to obtain the deblurred image directly, and the processing speed is fast.
  • The deblurring convolution kernel generation module and the alignment convolution kernel generation module generate a deblurring convolution kernel and an alignment convolution kernel for each pixel in the image, which can improve the deblurring effect of the video image deblurring neural network on non-uniformly blurred images in different frames of the video.
  • Based on the video image deblurring neural network provided in the foregoing embodiment, an embodiment of the application provides a training method for the video image deblurring neural network.
  • In this embodiment, a mean square error loss function is used to determine the error between the Nth frame deblurred image output by the video image deblurring neural network and the clear image (that is, the supervision data, or ground truth) of the Nth frame image.
  • The specific expression of the mean square error loss function is as follows:
  • Here C, H, and W are respectively the number of channels, the height, and the width of the Nth frame image (assuming that the video image deblurring neural network is deblurring the Nth frame image), R is the Nth frame deblurred image output by the video image deblurring neural network, and S is the supervision data of the Nth frame image.
  • A perceptual loss function is used to determine the Euclidean distance between the features that a VGG-19 network outputs for the Nth frame deblurred image and the features it outputs for the supervision data of the Nth frame image.
  • The specific expression of the perceptual loss function is as follows:
  • Here Φ_j(·) is the feature image output by the j-th layer of the pre-trained VGG-19 network, the normalizing terms are the number of channels, the height, and the width of that feature image, R is the Nth frame deblurred image output by the video image deblurring neural network, and S is the ground truth of the Nth frame image.
  • Optionally, this embodiment obtains the loss function of the video image deblurring neural network by a weighted summation of formula (1) and formula (2).
  • The specific expression is as follows:
  • Here λ is the weight; optionally, λ is a natural number.
  • Optionally, the value of j may be 15, and the value of λ may be 0.01.
  • With this loss function, the training of the video image deblurring neural network of this embodiment can be completed.
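The expressions referred to as formulas (1) and (2), and their weighted sum, are not reproduced in the text above. The block below is a plausible reconstruction from the stated definitions only. The symbol names (the loss names, C_j, H_j, W_j, Φ_j, λ), the squared-norm form, and the numbering of the third expression are assumptions, not the published formulas.

```latex
\begin{align}
\mathcal{L}_{\mathrm{mse}} &= \frac{1}{C\,H\,W}\,\lVert R - S \rVert_2^2 \tag{1}\\
\mathcal{L}_{\mathrm{perc}} &= \frac{1}{C_j\,H_j\,W_j}\,\lVert \Phi_j(R) - \Phi_j(S) \rVert_2^2 \tag{2}\\
\mathcal{L} &= \mathcal{L}_{\mathrm{mse}} + \lambda\,\mathcal{L}_{\mathrm{perc}} \tag{3}
\end{align}
```

Here R is the network output for the Nth frame, S is its ground truth, and C_j, H_j, W_j denote the channel count, height, and width of the feature image Φ_j(·).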
  • The embodiments of the present application provide several possible implementation scenarios.
  • Applying the embodiments of the present application to a drone can remove the blur of the video images captured by the drone in real time and provide users with clearer videos.
  • The drone's flight control system can also control the drone's attitude and movement based on the deblurred video images, which can improve control accuracy and provide strong support for the drone in completing various aerial operations.
  • The embodiments of this application can also be applied to mobile terminals (such as mobile phones and action cameras).
  • When a user uses the terminal to capture video of a vigorously moving object, the terminal can, by running the method provided in the embodiments of this application, process the captured video in real time to reduce the blur caused by the intense movement of the subject and improve the user experience.
  • Here, the intense movement of the subject refers to relative movement between the terminal and the subject.
  • The video image processing method provided by the embodiments of the present application has a fast processing speed and good real-time performance.
  • The neural network provided by the embodiments of the present application has fewer weights and requires fewer processing resources to run, and can therefore be applied to mobile terminals.
  • FIG. 15 is a schematic structural diagram of a video image processing apparatus provided by an embodiment of the application.
  • The apparatus 1 includes: an acquisition unit 11, a first processing unit 12, and a second processing unit 13, wherein:
  • the acquisition unit 11 is configured to acquire multiple frames of continuous video images, wherein the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer;
  • the first processing unit 12 is configured to obtain a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image;
  • the second processing unit 13 is configured to perform deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
  • The first processing unit 12 includes: a first convolution processing subunit 121, configured to perform convolution processing on the pixels of an image to be processed to obtain the deblurring convolution kernel, wherein the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
  • The first convolution processing subunit 121 is configured to: perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and perform encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
  • The second processing unit 13 includes: a second convolution processing subunit 131, configured to perform convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit 132, configured to perform decoding processing on the first feature image to obtain the Nth frame deblurred image.
  • The second convolution processing subunit 131 is configured to: adjust the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
  • The first convolution processing subunit 121 is further configured to: after the alignment convolution kernel is obtained by performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
  • The first convolution processing subunit 121 is further configured to: adjust the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
  • The second processing unit 13 is configured to: perform fusion processing on the first feature image and the second feature image to obtain a third feature image; and perform decoding processing on the third feature image to obtain the Nth frame deblurred image.
  • The first convolution processing subunit 121 is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; perform encoding processing on the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
  • The first convolution processing subunit 121 is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; perform fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
  • The first convolution processing subunit 121 is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
  • The second processing unit 13 is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain an Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, wherein the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
  • The functions or units included in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • The embodiment of the present application also provides an electronic device, including: a processor, an input device, an output device, and a memory.
  • The processor, the input device, the output device, and the memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor is caused to execute the method described in the embodiment of the present application.
  • The embodiment of the present application also provides a processor configured to execute the method described in the embodiment of the present application.
  • FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
  • The electronic device 2 includes a processor 21, a memory 22, and a camera 23.
  • The processor 21, the memory 22, and the camera 23 are coupled through a connector; the connector includes various interfaces, transmission lines, buses, and the like, which are not limited in the embodiment of the present application.
  • Here, coupling refers to mutual connection in a specific manner, including direct connection or indirect connection through other devices, for example, connection through various interfaces, transmission lines, or buses.
  • The processor 21 may be one or more graphics processing units (GPUs).
  • A single GPU used as the processor 21 may be a single-core GPU or a multi-core GPU.
  • The processor 21 may also be a processor group composed of multiple GPUs, with the multiple processors coupled to each other through one or more buses.
  • The processor may also be another type of processor, which is not limited in the embodiment of the present application.
  • The memory 22 may be used to store computer program instructions and various computer program code, including the program code used to execute the solutions of the present application.
  • The memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related instructions and data.
  • The camera 23 can be used to obtain related videos or images.
  • The memory can be used not only to store related instructions, but also to store related images and videos.
  • For example, the memory can be used to store videos acquired by the camera 23, or to store the deblurred images generated by the processor 21; the embodiment of the present application does not limit the specific videos or images stored in the memory.
  • FIG. 16 only shows a simplified design of the video image processing device.
  • The video image processing device may also include other necessary components, including but not limited to any number of input/output devices, processors, controllers, and memories, and all devices that can implement the embodiments of this application fall within the protection scope of this application.
  • The embodiments of the present application also provide a computer-readable storage medium in which a computer program is stored.
  • The computer program includes program instructions; when the program instructions are executed by a processor of an electronic device, the processor is caused to execute the method described in the embodiment of the present application.
  • The disclosed system, device, and method may be implemented in other ways.
  • The device embodiments described above are only illustrative.
  • For example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The computer program product includes one or more computer instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
  • All or part of the processes of the foregoing method embodiments can be completed by a computer program instructing relevant hardware.
  • The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
  • The aforementioned storage media include: read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and various other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

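The embodiments of the present application disclose a video image processing method and apparatus. The method includes: acquiring multiple frames of continuous video images, where the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer; obtaining a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.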

Description

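Video image processing method and apparatus
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 201910325282.5 filed on April 22, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of image processing technology, and in particular to a video image processing method and apparatus.
Background
With the growing popularity of handheld and airborne cameras, more and more people capture video with cameras, and processing can be performed on the captured video; for example, drones and self-driving cars can implement functions such as tracking and obstacle avoidance based on the captured video.
Because of camera shake, defocus, fast motion of the subject, and similar causes, captured video is prone to blur; for example, when a robot moves, blur arises from camera shake or from the motion of the subject. This often causes the shot to fail or makes further processing based on the video impossible. Traditional methods can remove blur from video images by means of optical flow or neural networks, but their deblurring results are poor.
Summary
The embodiments of the present application provide a video image processing method and apparatus.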
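In a first aspect, an embodiment of the present application provides a video image processing method, including: acquiring multiple frames of continuous video images, where the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer; obtaining a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
With the technical solution provided in the first aspect, the deblurring convolution kernel of the Nth frame image in the video can be obtained, and convolution processing is then performed on the Nth frame image through that kernel, which can effectively remove the blur in the Nth frame image and yield the Nth frame deblurred image.
In a possible implementation, obtaining the deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of an image to be processed to obtain the deblurring convolution kernel, where the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
In this possible implementation, a deblurring convolution kernel for each pixel is obtained based on the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, and the deblurring convolution kernel is used to perform deconvolution processing on the corresponding pixel in the Nth frame image to remove the blur of that pixel. By generating a separate deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed, and the deblurred image is clear and natural.
In another possible implementation, performing convolution processing on the pixels of the image to be processed to obtain the deblurring convolution kernel includes: performing convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and performing encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
In this possible implementation, an alignment convolution kernel for each pixel is obtained based on the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, and alignment processing can subsequently be performed through the alignment kernel. Convolution processing is then performed on the alignment kernel to extract the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, obtaining the deblurring kernel. The deblurring kernel therefore contains both the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image and the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the removal of blur from the Nth frame image.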
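In yet another possible implementation, performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image includes: performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and performing decoding processing on the first feature image to obtain the Nth frame deblurred image.
In this possible implementation, performing the deblurring processing on the feature image of the Nth frame image through the deblurring convolution kernel can reduce the amount of data processed during deblurring and increase the processing speed.
In yet another possible implementation, performing convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain the first feature image includes: adjusting the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and performing convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
In this possible implementation, the dimension of the deblurring convolution kernel is adjusted so that it matches the dimension of the feature image of the Nth frame image, so that the dimension-adjusted deblurring convolution kernel can be used to perform convolution processing on the feature image of the Nth frame image.
In yet another possible implementation, after performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel, the method further includes: performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
In this possible implementation, convolution processing is performed on the pixels of the feature image of the N-1th frame image through the alignment convolution kernel, so that the feature image of the N-1th frame image is aligned to the time of the Nth frame.
In yet another possible implementation, performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain the second feature image includes: adjusting the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and performing convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
In this possible implementation, the dimension of the alignment convolution kernel is adjusted so that it matches the dimension of the feature image of the N-1th frame image, so that the dimension-adjusted alignment convolution kernel can be used to perform convolution processing on the feature image of the N-1th frame image.
In yet another possible implementation, performing decoding processing on the first feature image to obtain the Nth frame deblurred image includes: performing fusion processing on the first feature image and the second feature image to obtain a third feature image; and performing decoding processing on the third feature image to obtain the Nth frame deblurred image.
In this possible implementation, fusing the first feature image and the second feature image improves the deblurring effect on the Nth frame image, and the fused third feature image is then decoded to obtain the Nth frame deblurred image.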
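In yet another possible implementation, performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image and obtaining the alignment convolution kernel includes: superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; performing encoding processing on the image to be processed to obtain a fourth feature image; performing convolution processing on the fourth feature image to obtain a fifth feature image; and adjusting the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
In this possible implementation, convolution processing is performed on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image; to facilitate subsequent processing, the number of channels of the fifth feature image is then adjusted to the first preset value through convolution processing.
In yet another possible implementation, performing encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel includes: adjusting the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; performing fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and performing convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
In this possible implementation, the deblurring convolution kernel is obtained by performing convolution processing on the alignment convolution kernel, so that the deblurring convolution kernel contains not only the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, but also the deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, which improves the subsequent removal of blur from the Nth frame image through the deblurring convolution kernel.
In yet another possible implementation, performing convolution processing on the seventh feature image to extract the deblurring information of the N-1th frame deblurred image relative to the pixels of the N-1th frame image and obtaining the deblurring convolution kernel includes: performing convolution processing on the seventh feature image to obtain an eighth feature image; and adjusting the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
In this possible implementation, convolution processing is performed on the seventh feature image to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the N-1th frame deblurred image; to facilitate subsequent processing, the number of channels of the eighth feature image is then adjusted to the first preset value through convolution processing.
In yet another possible implementation, performing decoding processing on the third feature image to obtain the Nth frame deblurred image includes: performing deconvolution processing on the third feature image to obtain a ninth feature image; performing convolution processing on the ninth feature image to obtain an Nth frame decoded image; and adding the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, where the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.
In this possible implementation, the third feature image is decoded through deconvolution processing and convolution processing to obtain the Nth frame decoded image, and the pixel values of corresponding pixels in the Nth frame image and the Nth frame decoded image are then added to obtain the Nth frame deblurred image, which further improves the deblurring effect.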
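In a second aspect, an embodiment of the present application further provides a video image processing apparatus, including: an acquisition unit, configured to acquire multiple frames of continuous video images, where the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer; a first processing unit, configured to obtain a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image; and a second processing unit, configured to perform deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
In a possible implementation, the first processing unit includes: a first convolution processing subunit, configured to perform convolution processing on the pixels of an image to be processed to obtain the deblurring convolution kernel, where the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
In another possible implementation, the first convolution processing subunit is configured to: perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and perform encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
In yet another possible implementation, the second processing unit includes: a second convolution processing subunit, configured to perform convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit, configured to perform decoding processing on the first feature image to obtain the Nth frame deblurred image.
In yet another possible implementation, the second convolution processing subunit is configured to: adjust the dimension of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel is the same as the number of channels of the feature image of the Nth frame image; and perform convolution processing on the pixels of the feature image of the Nth frame image through the dimension-adjusted deblurring convolution kernel to obtain the first feature image.
In yet another possible implementation, the first convolution processing subunit is further configured to: after the alignment convolution kernel is obtained by performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the alignment convolution kernel to obtain a second feature image.
In yet another possible implementation, the first convolution processing subunit is further configured to: adjust the dimension of the alignment convolution kernel so that the number of channels of the alignment convolution kernel is the same as the number of channels of the feature image of the N-1th frame image; and perform convolution processing on the pixels of the feature image of the N-1th frame deblurred image through the dimension-adjusted alignment convolution kernel to obtain the second feature image.
In yet another possible implementation, the second processing unit is configured to: perform fusion processing on the first feature image and the second feature image to obtain a third feature image; and perform decoding processing on the third feature image to obtain the Nth frame deblurred image.
In yet another possible implementation, the first convolution processing subunit is further configured to: superimpose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension to obtain the image to be processed; perform encoding processing on the image to be processed to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
In yet another possible implementation, the first convolution processing subunit is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; perform fusion processing on the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract deblurring information of the pixels of the N-1th frame deblurred image relative to the pixels of the N-1th frame image, obtaining the deblurring convolution kernel.
In yet another possible implementation, the first convolution processing subunit is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
In yet another possible implementation, the second processing unit is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain an Nth frame decoded image; and add the pixel value of a first pixel of the Nth frame image to the pixel value of a second pixel of the Nth frame decoded image to obtain the Nth frame deblurred image, where the position of the first pixel in the Nth frame image is the same as the position of the second pixel in the Nth frame decoded image.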
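In a third aspect, an embodiment of the present application further provides a processor, configured to execute the method of the first aspect or any possible implementation thereof.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, and the memory stores program instructions; when the program instructions are executed by the processor, the processor is caused to execute the method of the first aspect or any possible implementation thereof.
In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor of an electronic device, cause the processor to execute the method of the first aspect or any possible implementation thereof.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the embodiments of the present disclosure.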
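Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the background more clearly, the drawings needed in the embodiments or the background are described below.
The drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a schematic diagram of corresponding pixels in different images provided by an embodiment of the present application;
FIG. 2 is a non-uniformly blurred image provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a video image processing method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of the deblurring processing in the video image processing method of an embodiment of the present application;
FIG. 5 is a schematic flowchart of another video image processing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of obtaining a deblurring convolution kernel and an alignment convolution kernel provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an encoding module provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an alignment convolution kernel generation module provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a deblurring convolution kernel generation module provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an adaptive convolution processing module provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a decoding module provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an alignment convolution kernel and deblurring convolution kernel generation module provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a video image processing apparatus provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.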
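Detailed description
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a specific feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The word "corresponding" appears frequently in the embodiments of the present application. Corresponding pixels in two images are the two pixels at the same position in the two images. For example, as shown in FIG. 1, pixel a in image A corresponds to pixel d in image B, and pixel b in image A corresponds to pixel c in image B. It should be understood that corresponding pixels in multiple images have the same meaning as corresponding pixels in two images.
A non-uniformly blurred image, as the term is used below, is an image in which different pixels have different degrees of blur, that is, different pixels have different motion trajectories. For example, as shown in FIG. 2, the text on the sign in the upper-left region is more blurred than the car in the lower-right corner, so the degree of blur in these two regions is inconsistent. The embodiments of the present application can be applied to remove the blur in non-uniformly blurred images, and are described below with reference to the drawings.
Refer to FIG. 3, which is a schematic flowchart of a video image processing method provided by an embodiment of the present application. As shown in FIG. 3, the method includes:
301. Acquire multiple frames of continuous video images, where the multiple frames of continuous video images include an Nth frame image, an N-1th frame image, and an N-1th frame deblurred image, and N is a positive integer.
In the embodiments of the present application, the multiple frames of continuous video images can be obtained by capturing video with a camera. The Nth frame image and the N-1th frame image are two adjacent frames among the multiple frames of continuous video images, the Nth frame image is the frame that follows the N-1th frame image, and the Nth frame image is the frame currently to be processed (that is, deblurred by applying the implementations provided in this application). The N-1th frame deblurred image is the image obtained by deblurring the N-1th frame image.
It should be understood that deblurring video images in the embodiments of the present application is a recursive process: the N-1th frame deblurred image serves as an input image for deblurring the Nth frame image, and likewise the Nth frame deblurred image serves as an input image for deblurring the N+1th frame image.
Optionally, if N is 1, the object currently being deblurred is the first frame of the video. In this case, both the N-1th frame image and the N-1th frame deblurred image are the Nth frame itself; that is, three copies of the first frame image are acquired.
In the embodiments of the present application, the sequence obtained by arranging the frames of a video in the order in which they were captured is called the video frame sequence. The image obtained after deblurring processing is called the deblurred image.
The embodiments of the present application deblur the video images in the order of the video frame sequence, deblurring only one frame at a time.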
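The recursion described above can be sketched as a simple loop. This is a minimal illustration only: `deblur_frame` is a hypothetical placeholder for steps 302 and 303, and the `Frame` type (a single-channel nested list) is an assumption made for brevity.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # hypothetical stand-in: a single-channel H x W image


def deblur_video(
    frames: Sequence[Frame],
    deblur_frame: Callable[[Frame, Frame, Frame], Frame],
) -> List[Frame]:
    """Deblur frames in capture order, feeding each result into the next step.

    deblur_frame(current, previous, previous_deblurred) is a placeholder for
    steps 302-303; its internals are not defined here.
    """
    outputs: List[Frame] = []
    for n, current in enumerate(frames):
        if n == 0:
            # First frame: the frame itself stands in for both the previous
            # frame and the previous deblurred frame (three copies of frame 1).
            previous, previous_deblurred = current, current
        else:
            previous, previous_deblurred = frames[n - 1], outputs[n - 1]
        outputs.append(deblur_frame(current, previous, previous_deblurred))
    return outputs
```

Only one frame is processed per iteration, and each output is kept so that it can serve as an input for the following frame.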
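Optionally, the video images and the deblurred images can be stored in the memory of the electronic device. Here, video refers to a video stream; that is, the video images are stored in the memory of the electronic device in the order of the video frame sequence. The electronic device can therefore obtain the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image directly from the memory.
It should be understood that the video images mentioned in the embodiments of the present application may be video captured in real time by the camera of the electronic device, or video images stored in the memory of the electronic device.
302. Obtain a deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image.
In an optional embodiment of the present application, obtaining the deblurring convolution kernel of the Nth frame image based on the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image includes: performing convolution processing on the pixels of an image to be processed to obtain the deblurring convolution kernel, where the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension.
In this embodiment, the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image are superimposed in the channel dimension to obtain the image to be processed. For example (Example 1), suppose the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image all have a size of 100*100*3; the image to be processed obtained after superposition then has a size of 100*100*9. In other words, the number of pixels in the image to be processed is the same as in any one of the three images, but each pixel has 3 times as many channels as in any one of the three images.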
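Example 1 can be checked with a small sketch of channel-wise stacking. Images are represented here as nested H x W x C lists, which is an assumption for illustration; the helper names are hypothetical.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C nested lists (illustrative layout)


def stack_channels(*images: Image) -> Image:
    """Concatenate same-sized H x W x C images along the channel axis."""
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must share the same height and width")
    return [
        [
            [value for img in images for value in img[y][x]]
            for x in range(width)
        ]
        for y in range(height)
    ]


def shape(img: Image) -> tuple:
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def constant_image(h: int, w: int, c: int, value: float) -> Image:
    """Build an h x w x c image filled with one value (test data only)."""
    return [[[value] * c for _ in range(w)] for _ in range(h)]


frame_n = constant_image(100, 100, 3, 0.1)
frame_prev = constant_image(100, 100, 3, 0.2)
frame_prev_deblurred = constant_image(100, 100, 3, 0.3)
to_process = stack_channels(frame_n, frame_prev, frame_prev_deblurred)
print(shape(to_process))  # (100, 100, 9)
```

The pixel count stays at 100*100 while each pixel now carries the 9 channel values of the three source images.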
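In the embodiments of the present application, the convolution processing performed on the pixels of the image to be processed can be implemented by multiple arbitrarily stacked convolution layers; the embodiments do not limit the number of convolution layers or the size of the convolution kernels in them.
By performing convolution processing on the pixels of the image to be processed, the feature information of those pixels can be extracted to obtain the deblurring convolution kernel. The feature information includes the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, and the deblurring information of the pixels of the N-1th frame image relative to the pixels of the N-1th frame deblurred image. The motion information includes the speed and direction of motion of a pixel in the N-1th frame image relative to the corresponding pixel in the Nth frame image.
It should be understood that the deblurring convolution kernel in the embodiments of the present application is the result of performing convolution processing on the image to be processed, and it is used as the convolution kernel of convolution processing in the subsequent steps.
It should also be understood that performing convolution processing on the pixels of the image to be processed means performing convolution processing on every pixel of the image to be processed, yielding a deblurring convolution kernel for each pixel. Continuing Example 1 (Example 2): the image to be processed has a size of 100*100*9, that is, it contains 100*100 pixels; after convolution processing is performed on its pixels, a 100*100 feature image is obtained, and each pixel of this 100*100 feature image can serve as the deblurring convolution kernel used later to deblur a pixel of the Nth frame image.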
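303. Perform deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain an Nth frame deblurred image.
In an optional embodiment of the present application, as shown in FIG. 4, performing deblurring processing on the Nth frame image through the deblurring convolution kernel to obtain the Nth frame deblurred image may include:
3031. Perform convolution processing on the pixels of the feature image of the Nth frame image through the deblurring convolution kernel to obtain a first feature image.
The feature image of the Nth frame image can be obtained by performing feature extraction processing on the Nth frame image. The feature extraction processing may be convolution processing or pooling processing, which is not limited in the embodiments of the present application.
The processing in 302 yields a deblurring convolution kernel for each pixel of the image to be processed. The number of pixels in the image to be processed is the same as the number of pixels in the Nth frame image, and the pixels of the image to be processed correspond one-to-one with the pixels of the Nth frame image. In the embodiments of the present application, one-to-one correspondence has the following meaning: pixel A in the image to be processed corresponds one-to-one with pixel B in the Nth frame image when the position of pixel A in the image to be processed is the same as the position of pixel B in the Nth frame image.
3032. Perform decoding processing on the first feature image to obtain the Nth frame deblurred image.
The decoding processing can be implemented by deconvolution processing, or by a combination of deconvolution processing and convolution processing, which is not limited in the embodiments of the present application.
Optionally, to improve the deblurring effect on the Nth frame image, the pixel values of the pixels of the image obtained by decoding the first feature image are added to the pixel values of the pixels of the Nth frame image, and the image obtained after this addition is taken as the Nth frame deblurred image. Through this addition, the information of the Nth frame image can be used in obtaining the Nth frame deblurred image.
For example, suppose pixel C in the image obtained after decoding has a pixel value of 200 and pixel D in the Nth frame image has a pixel value of 150; then pixel E in the Nth frame deblurred image obtained after the addition has a pixel value of 350, where the position of C in the image to be processed, the position of D in the Nth frame image, and the position of E in the Nth frame deblurred image are the same.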
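The pixel-wise addition in this example can be written as a short sketch. Single-channel nested lists are assumed for brevity, and `add_residual` is a hypothetical helper name.

```python
from typing import List

Grid = List[List[float]]  # single-channel H x W image (illustrative layout)


def add_residual(frame: Grid, decoded: Grid) -> Grid:
    """Add the decoded image to the input frame, pixel by pixel.

    Both images must have the same height and width; pixels are paired by
    position, matching the "same position" requirement in the text.
    """
    if len(frame) != len(decoded) or any(
        len(a) != len(b) for a, b in zip(frame, decoded)
    ):
        raise ValueError("frame and decoded image must have the same size")
    return [[a + b for a, b in zip(row_f, row_d)] for row_f, row_d in zip(frame, decoded)]


print(add_residual([[150.0]], [[200.0]]))  # [[350.0]]
```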
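As described above, different pixels in a non-uniformly blurred image have different motion trajectories, and the more complex a pixel's trajectory, the greater its blur. The embodiments of the present application predict a separate deblurring convolution kernel for each pixel of the image to be processed, and use the predicted deblurring convolution kernels to perform convolution processing on the feature points of the Nth frame image, so as to remove the blur of the pixels in the Nth frame features. Because different pixels in a non-uniformly blurred image are blurred to different degrees, generating a corresponding deblurring convolution kernel for each pixel removes the blur of each pixel more effectively, and thereby removes the blur in the non-uniformly blurred image.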
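The idea of filtering every pixel with its own kernel can be illustrated with the sketch below. It assumes a single-channel feature map, odd-sized k x k kernels, and zero padding at the borders; these choices and the function name are illustrative assumptions, not details taken from the text.

```python
from typing import List

Grid = List[List[float]]  # single-channel H x W feature map (illustrative)


def per_pixel_convolve(feature: Grid, kernels: List[List[Grid]]) -> Grid:
    """Filter each pixel of `feature` with its own k x k kernel.

    kernels[y][x] is the k x k kernel predicted for pixel (y, x); k is odd.
    Neighbours that fall outside the image are treated as zero. As in most
    deep-learning frameworks, the kernel is applied without flipping.
    """
    height, width = len(feature), len(feature[0])
    k = len(kernels[0][0])
    r = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernels[y][x][dy][dx] * feature[yy][xx]
            out[y][x] = acc
    return out
```

Unlike an ordinary convolution layer, where one kernel is shared across all positions, the kernel here changes from pixel to pixel, which is what lets differently blurred regions be treated differently.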
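The embodiments of the present application obtain a deblurring convolution kernel for each pixel based on the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, and use that kernel to perform deconvolution processing on the corresponding pixel of the Nth frame image to remove its blur. By generating a separate deblurring convolution kernel for each pixel in the Nth frame image, the blur in the Nth frame image (a non-uniformly blurred image) can be removed; the deblurred image is clear and natural, and the whole deblurring process takes little time and runs fast.
Refer to FIG. 5, which is a schematic flowchart of a possible implementation of 302 provided by an embodiment of the present application. As shown in FIG. 5, the method includes:
401. Perform convolution processing on the image to be processed to extract motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction.
In the embodiments of the present application, the motion information includes speed and direction; the motion information of a pixel can be understood as the motion trajectory of that pixel from the time of the N-1th frame (the moment the N-1th frame image was captured) to the time of the Nth frame (the moment the Nth frame image was captured).
Because the photographed object moves within a single exposure time, and its trajectory is a curve, blur arises in the captured image. In other words, the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image helps remove the blur of the Nth frame image.
In the embodiments of the present application, the convolution processing performed on the pixels of the image to be processed can be implemented by multiple arbitrarily stacked convolution layers; the embodiments do not limit the number of convolution layers or the size of the convolution kernels in them.
By performing convolution processing on the pixels of the image to be processed, the feature information of those pixels can be extracted to obtain the alignment convolution kernel. Here, the feature information includes the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image.
It should be understood that the alignment convolution kernel in the embodiments of the present application is the result of performing the above convolution processing on the image to be processed, and it is used as the convolution kernel of convolution processing in the subsequent steps. Specifically, because the alignment convolution kernel is obtained by performing convolution processing on the image to be processed to extract the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image, alignment processing can subsequently be performed on the pixels of the Nth frame image through the alignment convolution kernel.
It should be noted that the alignment convolution kernel in this embodiment is also obtained in real time; that is, the above processing yields an alignment convolution kernel for every pixel in the Nth frame image.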
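402. Perform encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
The encoding processing here may be convolution processing or pooling processing.
In a possible implementation, the encoding processing is convolution processing, which can be implemented by multiple arbitrarily stacked convolution layers; the embodiments of the present application do not limit the number of convolution layers or the size of the convolution kernels in them.
It should be understood that the convolution processing in 402 differs from the convolution processing in 401. For example, suppose the convolution processing in 401 is implemented by 3 convolution layers with 32 channels (convolution kernel size 3*3), and the convolution processing in 402 is implemented by 5 convolution layers with 64 channels (convolution kernel size 3*3). Both (the 3 convolution layers and the 5 convolution layers) are convolution processing in essence, but their specific implementations differ.
Because the image to be processed is obtained by superimposing the Nth frame image, the N-1th frame image, and the N-1th frame deblurred image in the channel dimension, it contains the information of all three images. The convolution processing in 401, however, focuses on extracting the motion information of the pixels of the N-1th frame image relative to the pixels of the Nth frame image; that is, after the processing in 401, the deblurring information between the N-1th frame image and the N-1th frame deblurred image in the image to be processed has not yet been extracted.
Optionally, before the alignment convolution kernel is encoded, fusion processing can be performed on the image to be processed and the alignment convolution kernel, so that the alignment convolution kernel obtained after fusion contains the deblurring information between the N-1th frame image and the N-1th frame deblurred image.
By performing convolution processing on the alignment convolution kernel, the deblurring information of the N-1th frame deblurred image relative to the pixels of the N-1th frame image is extracted to obtain the deblurring convolution kernel. The deblurring information can be understood as the mapping between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, that is, the mapping between pixels before deblurring and pixels after deblurring.
In this way, the deblurring convolution kernel obtained by performing convolution processing on the alignment convolution kernel contains both the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image and the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image. Subsequently performing convolution processing on the pixels of the Nth frame image through the deblurring convolution kernel can improve the deblurring effect.
The embodiments of the present application obtain an alignment convolution kernel for each pixel based on the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, and alignment processing can subsequently be performed through that alignment convolution kernel. Convolution processing is then performed on the alignment convolution kernel to extract the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image, obtaining the deblurring convolution kernel. The deblurring convolution kernel therefore contains both the deblurring information between the pixels of the N-1th frame image and the pixels of the N-1th frame deblurred image and the motion information between the pixels of the N-1th frame image and the pixels of the Nth frame image, which helps improve the removal of blur from the Nth frame image.
The above embodiments all obtain the deblurring convolution kernel and the alignment convolution kernel by performing convolution processing on images. Because an image contains a large number of pixels, processing the image directly requires a large amount of data to be processed and is slow. The embodiments of the present application therefore provide an implementation that obtains the deblurring convolution kernel and the alignment convolution kernel from feature images.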
请参阅图6,图6是本申请实施例6提供的一种获得去模糊卷积核以及对齐卷积核的流程示意图,如图6所示,所述方法包括:
501、对第N帧图像、第N-1帧图像以及第N-1帧去模糊处理后的图像在通道维度上进行叠加处理,得到待处理图像。
请参见步骤302得到待处理图像的实现方式,此处将不再赘述。
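502. Perform encoding processing on the to-be-processed image to obtain a fourth feature image.
The encoding processing may be implemented in various ways, such as convolution or pooling; the embodiments of the present application do not specifically limit this.
In some possible implementations, referring to Fig. 7, the module shown in Fig. 7 may be used to encode the to-be-processed image. The module includes, in order: one convolution layer with 32 channels (kernel size 3*3); two residual blocks with 32 channels (each residual block contains two convolution layers with kernel size 3*3); one convolution layer with 64 channels (kernel size 3*3); two residual blocks with 64 channels (each residual block contains two convolution layers with kernel size 3*3); one convolution layer with 128 channels (kernel size 3*3); and two residual blocks with 128 channels (each residual block contains two convolution layers with kernel size 3*3).
The module convolves the to-be-processed image layer by layer to complete the encoding and obtain the fourth feature image. The feature content and semantic information extracted by each convolution layer differ: the encoding abstracts the features of the to-be-processed image step by step while gradually discarding relatively minor features, so the later a feature image is extracted, the smaller its size and the more condensed its semantic information. Convolving the to-be-processed image stage by stage through multiple convolution layers and extracting the corresponding features finally yields a fourth feature image of fixed size. In this way, the main content information of the to-be-processed image (that is, the fourth feature image) is obtained while the image size is reduced, which reduces the amount of data to be processed and increases processing speed.
For example (Example 3), if the size of the to-be-processed image is 100*100*3, the fourth feature image obtained by encoding with the module shown in Fig. 7 has a size of 25*25*128.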
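The size change in Example 3 can be checked with a short calculation. The sketch below assumes that the 32-channel convolution layer uses stride 1, that the 64-channel and 128-channel convolution layers use stride 2 with padding 1, and that the residual blocks keep the spatial size. These strides and the padding are illustrative assumptions; the text only says that a stride of 2 is optional.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_output_shape(height, width):
    """Shape of the fourth feature image under the assumed strides."""
    # (output channels, stride) of the three plain convolution layers.
    # The strides are assumptions; residual blocks are taken as size-preserving.
    layers = [(32, 1), (64, 2), (128, 2)]
    channels = 0
    for channels, stride in layers:
        height = conv_out(height, stride=stride)
        width = conv_out(width, stride=stride)
    return height, width, channels


print(encoder_output_shape(100, 100))  # (25, 25, 128)
```

Under these assumptions the spatial size goes 100 -> 100 -> 50 -> 25, which matches the 25*25*128 result stated in Example 3.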
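In one possible implementation, the convolution processing is carried out as follows: a convolution layer convolves the to-be-processed image by sliding a convolution kernel over it, multiplying each pixel of the to-be-processed image under the kernel by the corresponding kernel value, and taking the sum of all these products as the pixel value at the image position corresponding to the center of the kernel. Once the kernel has slid over all pixels of the to-be-processed image, the fourth feature image is obtained. Optionally, in this implementation, the stride of the convolution layer may be 2.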
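The sliding-window procedure described above can be sketched for a single channel as follows. This is a minimal illustration with no padding, not the network's implementation; as is usual for convolution layers, the kernel is not flipped, and the function name `conv2d` is chosen here for illustration.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (nested lists) and sum the products."""
    k = len(kernel)
    out = []
    for top in range(0, len(image) - k + 1, stride):
        row = []
        for left in range(0, len(image[0]) - k + 1, stride):
            acc = 0
            for i in range(k):
                for j in range(k):
                    # Multiply each covered pixel by the matching kernel value.
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)  # value assigned to the position under the kernel center
        out.append(row)
    return out


image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20],
         [21, 22, 23, 24, 25]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = conv2d(image, box, stride=2)
print(result)  # [[63, 81], [153, 171]]
```

With a stride of 2 the kernel visits every second position, so a 5*5 input yields a 2*2 output here; this is how a strided layer shrinks the feature image.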
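Referring to Fig. 8, Fig. 8 shows a module for generating the alignment convolution kernel provided by an embodiment of the present application. For the specific process of generating the alignment convolution kernel with the module shown in Fig. 8, see 503 to 504.
503. Perform convolution processing on the fourth feature image to obtain a fifth feature image.
As shown in Fig. 8, the fourth feature image is input to the module shown in Fig. 8 and passes, in order, through one convolution layer with 128 channels (kernel size 3*3) and two residual blocks with 64 channels (each residual block contains two convolution layers with kernel size 3*3). This convolves the fourth feature image and extracts from it the motion information between the pixels of the (N-1)-th frame image and the pixels of the N-th frame image, yielding the fifth feature image.
It should be understood that this processing of the fourth feature image does not change the image size; that is, the fifth feature image has the same size as the fourth feature image.
Continuing from Example 3 (Example 4), the size of the fourth feature image is 25*25*128, and the size of the fifth feature image obtained through the processing of 503 is also 25*25*128.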
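504. Adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
To further extract the motion information between the pixels of the (N-1)-th frame image and the pixels of the N-th frame image from the fifth feature image, the fourth layer in Fig. 8 convolves the fifth feature image, and the resulting alignment convolution kernel has a size of 25*25*c*k*k (it should be understood that the convolution of the fourth layer is what adjusts the number of channels of the fifth feature image), where c is the number of channels of the fifth feature image and k is a positive integer; optionally, k is 5. For ease of processing, 25*25*c*k*k is reshaped to 25*25*ck², where ck² is the first preset value.
It should be understood that the height and width of the alignment convolution kernel are both 25. The alignment convolution kernel contains 25*25 elements, each element contains c pixels, and different elements occupy different positions in the alignment convolution kernel. For example, if the plane spanned by the width and height of the alignment convolution kernel is defined as the xoy plane, each element of the alignment convolution kernel can be identified by its coordinates (x, y), where o is the origin. The elements of the alignment convolution kernel are the convolution kernels used to align pixels in subsequent processing, and each element has a size of 1*1*ck².
Continuing from Example 4 (Example 5), the size of the fifth feature image is 25*25*128, and the alignment convolution kernel obtained through the processing of 504 has a size of 25*25*128*k*k, that is, 25*25*128k². The alignment convolution kernel contains 25*25 elements, each element contains 128 pixels, and different elements occupy different positions in the alignment convolution kernel. Each element has a size of 1*1*128k².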
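The shape bookkeeping in Example 5 can be made concrete with a few lines of arithmetic. The sketch below uses k = 5, the optional value given in the text, and c = 128; the function name is chosen here for illustration.

```python
def alignment_kernel_shape(height, width, channels, k):
    """Return the unflattened and flattened shapes of the per-pixel kernels."""
    per_pixel = (channels, k, k)      # one k x k filter per channel at each position
    flattened = channels * k * k      # the first preset value, c * k^2
    return (height, width) + per_pixel, (height, width, flattened)


full, flat = alignment_kernel_shape(25, 25, 128, 5)
print(full)  # (25, 25, 128, 5, 5)
print(flat)  # (25, 25, 3200)
```

With these values, every one of the 25*25 positions carries a vector of 3200 numbers, which is later reshaped back to 128*5*5 before it is applied.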
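Because the fourth layer is a convolution layer, and a larger convolution kernel means a larger amount of data to process, the fourth layer in Fig. 8 is optionally a convolution layer with 128 channels and a kernel size of 1*1. Adjusting the number of channels of the fifth feature image with a 1*1 convolution layer reduces the amount of data to be processed and increases processing speed.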
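The saving from a 1*1 kernel can be estimated by counting weights. The sketch below compares a 1*1 and a 3*3 convolution layer with the same input and output channel counts; the figure of 128 input channels is an assumption made for illustration, and bias terms are ignored.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Number of weights in a standard convolution layer, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size


weights_1x1 = conv_weight_count(128, 128, 1)
weights_3x3 = conv_weight_count(128, 128, 3)
print(weights_1x1, weights_3x3, weights_3x3 // weights_1x1)  # 16384 147456 9
```

The multiply-accumulate count per output position scales the same way, so the 1*1 layer needs roughly one ninth of the work of a 3*3 layer for the same channel adjustment.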
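505. Adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image.
Because 504 adjusted the number of channels of the fifth feature image through convolution processing (the fourth layer in Fig. 8), the number of channels of the alignment convolution kernel must be adjusted to the second preset value (that is, the number of channels of the fifth feature image) before the alignment convolution kernel is convolved to obtain the deblurring convolution kernel.
In one possible implementation, the number of channels of the alignment convolution kernel is adjusted to the second preset value through convolution processing to obtain the sixth feature image. Optionally, this convolution processing may be implemented by a convolution layer with 128 channels and a kernel size of 1*1.
506. Stack the fourth feature image and the sixth feature image in the channel dimension to obtain a seventh feature image.
Steps 502 to 504 of this embodiment focus on extracting the motion information between the pixels of the (N-1)-th frame image and the pixels of the N-th frame image in the to-be-processed image. Because subsequent processing needs to extract the deblurring information between the pixels of the (N-1)-th frame image and the pixels of the deblurred (N-1)-th frame image in the to-be-processed image, the fourth feature image and the sixth feature image are fused beforehand, so that this deblurring information is added to the feature image.
In one possible implementation, the fourth feature image and the sixth feature image are fused (concatenate), that is, stacked in the channel dimension, to obtain the seventh feature image.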
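Stacking in the channel dimension keeps the height and width and adds the channel counts. The following sketch shows this on nested lists of shape H*W*C; it assumes both inputs have 128 channels, which follows from the sizes given for the fourth feature image and the optional 128-channel layer of 505, and the helper names are illustrative.

```python
def concat_channels(a, b):
    """Concatenate two H x W x C feature maps (nested lists) along the channel axis."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("spatial sizes must match")
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def zeros(height, width, channels):
    """Helper that builds an all-zero H x W x C feature map."""
    return [[[0.0] * channels for _ in range(width)] for _ in range(height)]


fourth = zeros(25, 25, 128)
sixth = zeros(25, 25, 128)
seventh = concat_channels(fourth, sixth)
shape = (len(seventh), len(seventh[0]), len(seventh[0][0]))
print(shape)  # (25, 25, 256)
```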
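507. Perform convolution processing on the seventh feature image to extract the deblurring information of the pixels of the deblurred (N-1)-th frame image relative to the pixels of the (N-1)-th frame image, obtaining the deblurring convolution kernel.
The seventh feature image contains the already extracted deblurring information between the pixels of the (N-1)-th frame image and the pixels of the deblurred (N-1)-th frame image. Convolving the seventh feature image further extracts this deblurring information and yields the deblurring convolution kernel. The process includes the following steps:
Perform convolution processing on the seventh feature image to obtain an eighth feature image; adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
In some possible implementations, as shown in Fig. 9, the seventh feature image is input to the module shown in Fig. 9 and passes, in order, through one convolution layer with 128 channels (kernel size 3*3) and two residual blocks with 64 channels (each residual block contains two convolution layers with kernel size 3*3). This convolves the seventh feature image and extracts from it the deblurring information between the pixels of the (N-1)-th frame image and the pixels of the deblurred (N-1)-th frame image, yielding the eighth feature image.
For how the module shown in Fig. 9 processes the seventh feature image, see how the module shown in Fig. 8 processes the fifth feature image; it is not repeated here.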
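It should be understood that, compared with the module shown in Fig. 9 (used to generate the deblurring convolution kernel), the module shown in Fig. 8 (used to generate the alignment convolution kernel) has one more convolution layer (the fourth layer of the module shown in Fig. 8). The remaining components are the same, but the weights of the two modules differ, and this directly determines that they serve different purposes.
Optionally, the weights of the modules shown in Fig. 8 and Fig. 9 may be obtained by training those modules.
It should be understood that the deblurring convolution kernel obtained in 507 comprises a deblurring convolution kernel for every pixel in the seventh feature image, and the kernel of each pixel has a size of 1*1*ck².
Continuing from Example 5 (Example 6), the size of the seventh feature image is 25*25*128*k*k; that is, the seventh feature image contains 25*25 pixels. Accordingly, the resulting deblurring convolution kernel (of size 25*25*128k²) contains 25*25 deblurring convolution kernels (one per pixel, each of size 1*1*128k²).
By merging the three-dimensional information of each pixel in the seventh feature image into one-dimensional information, the information of each pixel in the seventh feature image is combined into one convolution kernel, namely the deblurring convolution kernel of that pixel.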
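This embodiment convolves the feature image of the to-be-processed image to extract the motion information between the pixels of the (N-1)-th frame image and the pixels of the N-th frame image, obtaining an alignment convolution kernel for every pixel. It then convolves the seventh feature image to extract the deblurring information between the pixels of the (N-1)-th frame image and the pixels of the deblurred (N-1)-th frame image, obtaining a deblurring convolution kernel for every pixel. The alignment convolution kernel and the deblurring convolution kernel can then be used to deblur the N-th frame image.
This embodiment has described in detail how to obtain the deblurring convolution kernel and the alignment convolution kernel. The following embodiment describes in detail how to use them to remove the blur from the N-th frame image and obtain the deblurred N-th frame image.
Referring to Fig. 10, Fig. 10 is a schematic flowchart of another video image processing method provided by an embodiment of the present application. As shown in Fig. 10, the method includes: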
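901. Perform convolution processing on the pixels of the feature image of the N-th frame image with the deblurring convolution kernel to obtain a first feature image.
The feature image of the N-th frame image may be obtained by feature extraction on the N-th frame image, where the feature extraction may be convolution processing or pooling processing; the embodiments of the present application do not limit this.
In one possible implementation, the encoding module shown in Fig. 7 may be used to perform feature extraction on the N-th frame image to obtain its feature image. For the composition of Fig. 7 and how it processes the N-th frame image, see 502; it is not repeated here.
The feature image obtained by feature extraction on the N-th frame image with the encoding module shown in Fig. 7 is smaller than the N-th frame image and contains the information of the N-th frame image (in the present application, this information can be understood as the information of the blurred regions in the N-th frame image). Subsequent processing of the feature image of the N-th frame image therefore reduces the amount of data to be processed and increases processing speed.
As described above, convolving each pixel of the to-be-processed image yields a deblurring convolution kernel for each pixel. Convolving the pixels of the feature image of the N-th frame image with the deblurring convolution kernel means taking the per-pixel deblurring kernel obtained in the foregoing embodiment as the convolution kernel of the corresponding pixel in the feature image of the N-th frame image and convolving each pixel of that feature image with it.
As described in 507, the per-pixel deblurring kernel within the deblurring convolution kernel contains the information of each pixel in the seventh feature image, and in the deblurring convolution kernel this information is one-dimensional. The pixels of the feature image of the N-th frame image, however, are three-dimensional. Therefore, to use the information of each pixel in the seventh feature image as the convolution kernel of the corresponding pixel in the feature image of the N-th frame image, the dimensions of the deblurring convolution kernel must be adjusted. Based on this consideration, 901 is implemented through the following steps:
Adjust the dimensions of the deblurring convolution kernel so that its number of channels equals the number of channels of the feature image of the N-th frame image; perform convolution processing on the pixels of the feature image of the N-th frame image with the reshaped deblurring convolution kernel to obtain the first feature image.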
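Referring to Fig. 11, the module shown in Fig. 11 (the adaptive convolution processing module) takes the per-pixel deblurring kernel within the deblurring convolution kernel obtained in the foregoing embodiment as the convolution kernel of the corresponding pixel in the feature image of the N-th frame image and convolves that pixel with it.
The reshape in Fig. 11 refers to adjusting the dimensions of each pixel's deblurring kernel within the deblurring convolution kernel, that is, changing the dimensions of each pixel's deblurring kernel from 1*1*ck² to c*k*k.
Continuing from Example 6 (Example 7), each pixel's deblurring kernel has a size of 1*1*128k²; after the reshape, the resulting kernel has a size of 128*k*k.
The reshape yields the deblurring kernel of every pixel of the feature image of the N-th frame image. Each pixel is then convolved with its own deblurring kernel to remove the blur at that pixel, finally producing the first feature image.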
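The reshape and the per-pixel convolution can be sketched together as follows. The sketch assumes that each of the c channels is filtered by its own k*k slice of the reshaped kernel and that positions outside the feature image count as zero; both points, and the function names, are illustrative assumptions rather than details stated in the text.

```python
def reshape_kernel(flat, c, k):
    """Reshape a length c*k*k vector into c filters of size k x k."""
    if len(flat) != c * k * k:
        raise ValueError("kernel vector has the wrong length")
    return [[flat[ch * k * k + i * k: ch * k * k + (i + 1) * k] for i in range(k)]
            for ch in range(c)]


def adaptive_conv(features, kernels, c, k):
    """features: H x W x c; kernels: H x W x (c*k*k), one kernel vector per pixel."""
    h, w, r = len(features), len(features[0]), k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            filt = reshape_kernel(kernels[y][x], c, k)  # this pixel's own kernel
            for ch in range(c):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        yy, xx = y + i - r, x + j - r
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding (assumption)
                            acc += features[yy][xx][ch] * filt[ch][i][j]
                out[y][x][ch] = acc
    return out


# A kernel whose only non-zero weight is the center leaves the features unchanged.
identity = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] * 2  # c = 2, k = 3
features = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
kernels = [[identity for _ in range(2)] for _ in range(2)]
print(adaptive_conv(features, kernels, c=2, k=3) == features)  # True
```

Unlike an ordinary convolution layer, the kernel here changes from pixel to pixel, which is what lets the network treat differently blurred regions differently.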
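902. Perform convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain a second feature image.
In an optional embodiment of the present application, performing convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain the second feature image includes: adjusting the dimensions of the alignment convolution kernel so that its number of channels equals the number of channels of the feature image of the (N-1)-th frame image; and performing convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the reshaped alignment convolution kernel to obtain the second feature image.
In the same way that 901 uses the module shown in Fig. 11 to take the deblurring convolution kernel obtained in the foregoing embodiment as the per-pixel deblurring kernel of the feature image of the N-th frame image and deblur that feature image, this embodiment uses the reshape in the module shown in Fig. 11 to adjust the dimensions of each pixel's alignment kernel within the alignment convolution kernel to 128*k*k, and convolves the corresponding pixels of the feature image of the deblurred (N-1)-th frame image with the reshaped alignment kernel. This aligns the feature image of the deblurred (N-1)-th frame image with the current frame as the reference; that is, according to the motion information contained in each pixel's alignment kernel, the position of each pixel in that feature image is adjusted, yielding the second feature image.
The feature image of the deblurred (N-1)-th frame image contains a large number of sharp (that is, blur-free) pixels, but its pixels are displaced relative to the pixels of the current frame. The processing of 902 therefore adjusts the positions of the pixels of the feature image of the deblurred (N-1)-th frame image so that they are closer to their positions at the time of the N-th frame (the position here refers to the position of the photographed object in the N-th frame image). Subsequent processing can then use the information of the second feature image to remove the blur from the N-th frame image.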
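To see why a per-pixel convolution kernel can move content, consider a kernel whose only non-zero weight sits on a neighbor rather than on the center. The single-channel sketch below is an illustrative toy case, not the learned kernels of the network: every pixel takes its value from its left neighbor, so the content shifts one pixel to the right.

```python
def apply_per_pixel_kernels(image, kernels, k=3):
    """Convolve each pixel of a 2-D image with its own k x k kernel (zero padding)."""
    h, w, r = len(image), len(image[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += image[yy][xx] * kernels[y][x][i][j]
    return out


# Weight 1 on the left neighbor: out[y][x] = image[y][x - 1].
take_left = [[0, 0, 0],
             [1, 0, 0],
             [0, 0, 0]]
image = [[0, 0, 0, 0],
         [0, 9, 0, 0],
         [0, 0, 0, 0]]
kernels = [[take_left for _ in range(4)] for _ in range(3)]
shifted = apply_per_pixel_kernels(image, kernels)
print(shifted)  # [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
```

Because each pixel may carry a different kernel, different regions can be shifted by different amounts, which is what alignment under non-uniform motion requires.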
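It should be understood that 901 and 902 have no fixed order: 901 may be executed before 902, 902 may be executed before 901, or the two may be executed at the same time. Further, after the alignment convolution kernel is obtained in 504, 902 may be executed before 505 to 507, or 505 to 507 may be executed before 901 or 902. The embodiments of the present application do not limit this.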
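903. Fuse the first feature image and the second feature image to obtain a third feature image.
Fusing the first feature image with the second feature image makes it possible, on top of deblurring based on the motion information between the pixels of the (N-1)-th frame image and the pixels of the N-th frame image and on the deblurring information between the pixels of the (N-1)-th frame image and the pixels of the deblurred (N-1)-th frame image, to use the information of the (aligned) feature image of the (N-1)-th frame image to improve the deblurring effect.
In one possible implementation, the first feature image and the second feature image are stacked in the channel dimension (concatenate) to obtain the third feature image.
904. Decode the third feature image to obtain the deblurred N-th frame image.
In the embodiments of the present application, the decoding processing may be any one of deconvolution, transposed convolution, bilinear interpolation and unpooling, or a combination of any one of these with convolution processing; the present application does not limit this.
In one possible implementation, referring to Fig. 12, Fig. 12 shows the decoding module, which includes, in order: one deconvolution layer with 64 channels (kernel size 3*3); two residual blocks with 64 channels (each residual block contains two convolution layers with kernel size 3*3); one deconvolution layer with 32 channels (kernel size 3*3); and two residual blocks with 32 channels (each residual block contains two convolution layers with kernel size 3*3). Decoding the third feature image with the decoding module shown in Fig. 12 to obtain the deblurred N-th frame image includes the following steps: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain the decoded N-th frame image.
Optionally, after the decoded N-th frame image is obtained, the pixel value of a first pixel of the N-th frame image may be added to the pixel value of a second pixel of the decoded N-th frame image to obtain the deblurred N-th frame image, where the position of the first pixel in the N-th frame image is the same as the position of the second pixel in the decoded N-th frame image. This makes the deblurred N-th frame image look more natural.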
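The optional last step is a pixel-wise addition of the input frame and the decoded image. A minimal sketch on H*W*C nested lists follows; the function name and the sample values are made up for illustration.

```python
def add_residual(frame, decoded):
    """Add two H x W x C images value by value at identical positions."""
    return [[[p + d for p, d in zip(pix_f, pix_d)]
             for pix_f, pix_d in zip(row_f, row_d)]
            for row_f, row_d in zip(frame, decoded)]


frame = [[[10, 20, 30], [40, 50, 60]]]    # blurry N-th frame, 1 x 2 x 3
decoded = [[[1, -2, 3], [0, 5, -6]]]      # decoder output, same shape
print(add_residual(frame, decoded))  # [[[11, 18, 33], [40, 55, 54]]]
```

Read this way, the decoder only has to produce a correction to the input frame rather than the whole image.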
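In this embodiment, the deblurring convolution kernel obtained in the foregoing embodiment is used to deblur the feature image of the N-th frame image, and the alignment convolution kernel obtained in the foregoing embodiment is used to align the feature image of the (N-1)-th frame image. Decoding the third feature image, which is obtained by fusing the first feature image from the deblurring with the second feature image from the alignment, improves the deblurring of the N-th frame image and makes the deblurred N-th frame image look more natural. Moreover, both the deblurring and the alignment in this embodiment operate on feature images, so the amount of data to be processed is small and processing is fast, enabling real-time deblurring of video images.
The present application further provides a video image deblurring neural network for implementing the methods of the foregoing embodiments.
Referring to Fig. 13, Fig. 13 is a schematic structural diagram of a video image deblurring neural network provided by an embodiment of the present application. As shown in Fig. 13, the video image deblurring neural network includes an encoding module, an alignment convolution kernel and deblurring convolution kernel generation module, and a decoding module. The encoding module in Fig. 13 is the same as the encoding module shown in Fig. 7, and the decoding module in Fig. 13 is the same as the decoding module shown in Fig. 12; they are not described again here.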
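Referring to Fig. 14, the alignment convolution kernel and deblurring convolution kernel generation module shown in Fig. 14 includes an encoding module, an alignment convolution kernel generation module and a deblurring convolution kernel generation module. Between the alignment convolution kernel generation module and the deblurring convolution kernel generation module there is a convolution layer with 128 channels and a kernel size of 1*1, followed by a fusion (concatenate) layer.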
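It should be noted that the adaptive convolution layer shown in Fig. 14 is the module shown in Fig. 11. Through adaptive convolution layers, the alignment convolution kernel and the deblurring convolution kernel generated by the module shown in Fig. 14 convolve the pixels of the feature image of the (N-1)-th frame image and the pixels of the feature image of the N-th frame image respectively (that is, alignment processing and deblurring processing), yielding the aligned feature image of the (N-1)-th frame image and the deblurred feature image of the N-th frame image.
The aligned feature image and the deblurred feature image are concatenated in the channel dimension to obtain the fused feature image of the N-th frame. The fused feature image of the N-th frame is input to the decoding module and also serves as an input when the video image deblurring neural network processes the (N+1)-th frame image.
The decoding module decodes the fused feature image of the N-th frame to obtain the decoded N-th frame image. The pixel value of a first pixel of the N-th frame image is then added to the pixel value of a second pixel of the decoded N-th frame image to obtain the deblurred N-th frame image, where the position of the first pixel in the N-th frame image is the same as the position of the second pixel in the decoded N-th frame image. The N-th frame image and the deblurred N-th frame image also serve as inputs when the video image deblurring neural network processes the (N+1)-th frame image.
As the above process shows, the video image deblurring neural network needs 4 inputs to deblur each frame of a video. Taking the N-th frame image as the deblurring target, the 4 inputs are: the (N-1)-th frame image, the deblurred (N-1)-th frame image, the N-th frame image, and the feature image of the deblurred (N-1)-th frame image (that is, the fused feature image of the N-th frame described above).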
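The way the four inputs are carried from one frame to the next can be sketched as a simple loop. The function `step` below stands in for one pass of the network and is purely hypothetical, as is the handling of the first frame, which the text above does not specify; here the first frame simply serves as its own "previous" frame.

```python
def deblur_video(frames, step, initial_features=None):
    """Run `step` over a frame sequence, carrying state from frame N-1 to frame N."""
    prev_blurry = frames[0]
    prev_deblurred = frames[0]          # assumption: bootstrap with the first frame
    prev_features = initial_features
    outputs = []
    for frame in frames:
        # The four inputs: current frame, previous frame, previous deblurred
        # frame, and the fused feature image produced for the previous frame.
        deblurred, features = step(frame, prev_blurry, prev_deblurred, prev_features)
        outputs.append(deblurred)
        prev_blurry, prev_deblurred, prev_features = frame, deblurred, features
    return outputs


calls = []


def toy_step(frame, prev_blurry, prev_deblurred, prev_features):
    """Stand-in for the network: records its inputs and returns labeled strings."""
    calls.append((frame, prev_blurry, prev_deblurred, prev_features))
    return "sharp_" + frame, "feat_" + frame


outputs = deblur_video(["f0", "f1", "f2"], toy_step)
print(outputs)   # ['sharp_f0', 'sharp_f1', 'sharp_f2']
print(calls[1])  # ('f1', 'f0', 'sharp_f0', 'feat_f0')
```

Only the results of the immediately preceding frame are kept, so the state held between frames stays constant in size regardless of video length.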
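The video image deblurring neural network provided by this embodiment can deblur video images, and the whole process needs only 4 inputs to produce the deblurred image directly, so processing is fast. Generating a deblurring convolution kernel and an alignment convolution kernel for every pixel of the image with the deblurring convolution kernel generation module and the alignment convolution kernel generation module improves the network's deblurring of non-uniformly blurred images across different frames of a video.
Based on the video image deblurring neural network provided by the embodiments, an embodiment of the present application provides a method for training the video image deblurring neural network.
This embodiment uses a mean squared error loss function to determine the error between the deblurred N-th frame image output by the video image deblurring neural network and the sharp image of the N-th frame image (that is, the supervision data (ground truth) of the N-th frame image). The mean squared error loss function is expressed as follows: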
$$\mathcal{L}_{\mathrm{mse}} = \frac{1}{CHW}\,\lVert R - S \rVert_2^2 \tag{1}$$
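Here, C, H and W are respectively the number of channels, the height and the width of the N-th frame image (assuming the video image deblurring neural network deblurs the N-th frame image), R is the deblurred N-th frame image output by the video image deblurring neural network, and S is the supervision data of the N-th frame image.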
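A perceptual loss function is used to determine the Euclidean distance between the features that a VGG-19 network outputs for the deblurred N-th frame image and the features it outputs for the supervision data of the N-th frame image. The perceptual loss function is expressed as follows: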
$$\mathcal{L}_{\mathrm{perceptual}} = \frac{1}{C_j H_j W_j}\,\lVert \Phi_j(R) - \Phi_j(S) \rVert_2^2 \tag{2}$$
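Here, $\Phi_j(\cdot)$ is the feature image output by the j-th layer of a pre-trained VGG-19 network, $C_j$, $H_j$ and $W_j$ are respectively the number of channels, the height and the width of that feature image, R is the deblurred N-th frame image output by the video image deblurring neural network, and S is the supervision data (ground truth) of the N-th frame image.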
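Finally, this embodiment takes a weighted sum of formula (1) and formula (2) to obtain the loss function of the video image deblurring neural network, expressed as follows: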
$$\mathcal{L} = \mathcal{L}_{\mathrm{mse}} + \lambda\,\mathcal{L}_{\mathrm{perceptual}} \tag{3}$$
Here, λ is a weight; optionally, λ is a positive real number.
Optionally, j may be 15 and λ may be 0.01.
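The weighted sum of the two losses can be sketched in plain Python on flattened images. The feature extractor is passed in as a function; the `toy_features` used below is only a stand-in for the j-th layer of a pre-trained VGG-19 network and is not a real model. Normalizing by the number of elements corresponds to dividing by C*H*W (or by the size of the feature image).

```python
def mean_squared_difference(a, b):
    """Squared Euclidean distance between two flat lists, divided by their length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(restored, target, features, lam=0.01):
    """Mean squared error plus lam times the perceptual (feature-space) error."""
    mse = mean_squared_difference(restored, target)
    perceptual = mean_squared_difference(features(restored), features(target))
    return mse + lam * perceptual


def toy_features(pixels):
    """Hypothetical stand-in for a VGG-19 feature layer."""
    return [2.0 * p for p in pixels]


loss = total_loss([1.0, 2.0], [0.0, 2.0], toy_features)
print(round(loss, 6))  # 0.52
```

With λ = 0.01 the pixel-wise term dominates, and the feature-space term acts as a small correction.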
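The video image deblurring neural network of this embodiment can be trained with the loss function provided by this embodiment.
Based on the video image processing method and the video image deblurring neural network provided by the foregoing embodiments, the embodiments of the present application provide several possible application scenarios.
When applied to an unmanned aerial vehicle (UAV), the embodiments of the present application can remove blur from the video captured by the UAV in real time and provide the user with clearer video. In addition, when the UAV's flight control system works from the deblurred video images to control the UAV's attitude and motion, control accuracy improves, which strongly supports the UAV in carrying out various aerial tasks.
The embodiments of the present application can also be applied to mobile terminals (such as mobile phones and action cameras). When a user captures video of a fast-moving object with the terminal, the terminal can run the method provided by the embodiments to process the captured video in real time, reducing the blur caused by the intense motion of the photographed object and improving the user experience. Here, the intense motion of the photographed object refers to relative motion between the terminal and the photographed object.
The video image processing method provided by the embodiments of the present application is fast and works well in real time. The neural network provided by the embodiments has few weights and requires few processing resources to run, so it can be applied to mobile terminals.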
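The methods of the embodiments of the present application have been described in detail above; the apparatus of the embodiments is described below.
Referring to Fig. 15, Fig. 15 is a schematic structural diagram of a video image processing apparatus provided by an embodiment of the present application. The apparatus 1 includes an acquisition unit 11, a first processing unit 12 and a second processing unit 13, where:
the acquisition unit 11 is configured to acquire multiple consecutive video frames, where the multiple consecutive video frames include an N-th frame image, an (N-1)-th frame image and a deblurred (N-1)-th frame image, and N is a positive integer;
the first processing unit 12 is configured to obtain a deblurring convolution kernel of the N-th frame image based on the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image;
the second processing unit 13 is configured to deblur the N-th frame image with the deblurring convolution kernel to obtain a deblurred N-th frame image.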
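In one possible implementation, the first processing unit 12 includes a first convolution processing subunit 121 configured to perform convolution processing on the pixels of a to-be-processed image to obtain the deblurring convolution kernel, where the to-be-processed image is obtained by stacking the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension.
In another possible implementation, the first convolution processing subunit 121 is configured to: perform convolution processing on the to-be-processed image to extract the motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image, obtaining an alignment convolution kernel, where the motion information includes speed and direction; and perform encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
In yet another possible implementation, the second processing unit 13 includes: a second convolution processing subunit 131 configured to perform convolution processing on the pixels of the feature image of the N-th frame image with the deblurring convolution kernel to obtain a first feature image; and a decoding processing subunit 132 configured to decode the first feature image to obtain the deblurred N-th frame image.
In yet another possible implementation, the second convolution processing subunit 131 is configured to: adjust the dimensions of the deblurring convolution kernel so that its number of channels equals the number of channels of the feature image of the N-th frame image; and perform convolution processing on the pixels of the feature image of the N-th frame image with the reshaped deblurring convolution kernel to obtain the first feature image.
In yet another possible implementation, the first convolution processing subunit 121 is further configured to: after the to-be-processed image is convolved to extract the motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image and the alignment convolution kernel is obtained, perform convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain a second feature image.
In yet another possible implementation, the first convolution processing subunit 121 is further configured to: adjust the dimensions of the alignment convolution kernel so that its number of channels equals the number of channels of the feature image of the (N-1)-th frame image; and perform convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the reshaped alignment convolution kernel to obtain the second feature image.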
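In yet another possible implementation, the second processing unit 13 is configured to: fuse the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the deblurred N-th frame image.
In yet another possible implementation, the first convolution processing subunit 121 is further configured to: stack the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension to obtain the to-be-processed image; encode the to-be-processed image to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
In yet another possible implementation, the first convolution processing subunit 121 is further configured to: adjust the number of channels of the alignment convolution kernel to the second preset value through convolution processing to obtain a sixth feature image; fuse the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract the deblurring information of the pixels of the deblurred (N-1)-th frame image relative to the pixels of the (N-1)-th frame image, obtaining the deblurring convolution kernel.
In yet another possible implementation, the first convolution processing subunit 121 is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
In yet another possible implementation, the second processing unit 13 is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain the decoded N-th frame image; and add the pixel value of a first pixel of the N-th frame image to the pixel value of a second pixel of the decoded N-th frame image to obtain the deblurred N-th frame image, where the position of the first pixel in the N-th frame image is the same as the position of the second pixel in the decoded N-th frame image.
In some embodiments, the functions or units of the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, it is not repeated here.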
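An embodiment of the present application further provides an electronic device, including a processor, an input device, an output device and a memory, which are connected to one another. The memory stores program instructions; when executed by the processor, the program instructions cause the processor to execute the method described in the embodiments of the present application.
An embodiment of the present application further provides a processor configured to execute the method described in the embodiments of the present application.
Fig. 16 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application. The electronic device 2 includes a processor 21, a memory 22 and a camera 23. The processor 21, the memory 22 and the camera 23 are coupled through a connector, which includes various interfaces, transmission lines or buses; the embodiments of the present application do not limit this. It should be understood that, in the embodiments of the present application, coupling means interconnection in a specific manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines or buses.
The processor 21 may be one or more graphics processing units (Graphics Processing Unit, GPU). When the processor 21 is a single GPU, the GPU may be a single-core GPU or a multi-core GPU. Optionally, the processor 21 may be a processor group composed of multiple GPUs coupled to one another through one or more buses. Optionally, the processor may also be another type of processor; the embodiments of the present application do not limit this.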
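The memory 22 may be used to store computer program instructions and various kinds of computer program code, including the program code for executing the solution of the present application. Optionally, the memory includes but is not limited to random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM) or compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM); the memory is used to store related instructions and data.
The camera 23 may be used to capture related videos or images.
It can be understood that, in the embodiments of the present application, the memory may be used to store not only related instructions but also related images and videos. For example, the memory may store the video captured by the camera 23 or the deblurred images generated by the processor 21; the embodiments of the present application do not limit the specific videos or images stored in the memory.
It can be understood that Fig. 16 shows only a simplified design of the video image processing apparatus. In practice, the video image processing apparatus may also contain other necessary components, including but not limited to any number of input/output devices, processors, controllers and memories, and all apparatuses that can implement the embodiments of the present application fall within the protection scope of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program includes program instructions which, when executed by a processor of an electronic device, cause the processor to execute the method described in the embodiments of the present application.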
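A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here. A person skilled in the art will also clearly understand that each embodiment of the present application has its own emphasis; for convenience and brevity of description, the same or similar parts may not be repeated in different embodiments, so for parts not described or not described in detail in one embodiment, refer to the records of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit.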
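The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (for example coaxial cable, optical fiber or digital subscriber line (Digital Subscriber Line, DSL)) or wirelessly (for example infrared, radio or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk or magnetic tape), an optical medium (for example a digital versatile disc (Digital Versatile Disc, DVD)) or a semiconductor medium (for example a solid state disk (Solid State Disk, SSD)).
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disks or optical discs.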

Claims (27)

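  1. A video image processing method, comprising:
    acquiring multiple consecutive video frames, wherein the multiple consecutive video frames include an N-th frame image, an (N-1)-th frame image and a deblurred (N-1)-th frame image, and N is a positive integer;
    obtaining a deblurring convolution kernel of the N-th frame image based on the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image;
    deblurring the N-th frame image with the deblurring convolution kernel to obtain a deblurred N-th frame image.
  2. The method according to claim 1, wherein obtaining the deblurring convolution kernel of the N-th frame image based on the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image comprises:
    performing convolution processing on pixels of a to-be-processed image to obtain the deblurring convolution kernel, wherein the to-be-processed image is obtained by stacking the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension.
  3. The method according to claim 2, wherein performing convolution processing on the pixels of the to-be-processed image to obtain the deblurring convolution kernel comprises:
    performing convolution processing on the to-be-processed image to extract motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image, obtaining an alignment convolution kernel, wherein the motion information includes speed and direction;
    performing encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.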
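  4. The method according to claim 2 or 3, wherein deblurring the N-th frame image with the deblurring convolution kernel to obtain the deblurred N-th frame image comprises:
    performing convolution processing on pixels of a feature image of the N-th frame image with the deblurring convolution kernel to obtain a first feature image;
    decoding the first feature image to obtain the deblurred N-th frame image.
  5. The method according to claim 4, wherein performing convolution processing on the pixels of the feature image of the N-th frame image with the deblurring convolution kernel to obtain the first feature image comprises:
    adjusting the dimensions of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel equals the number of channels of the feature image of the N-th frame image;
    performing convolution processing on the pixels of the feature image of the N-th frame image with the reshaped deblurring convolution kernel to obtain the first feature image.
  6. The method according to claim 3, wherein, after performing convolution processing on the to-be-processed image to extract the motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image and obtaining the alignment convolution kernel, the method further comprises:
    performing convolution processing on pixels of a feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain a second feature image.
  7. The method according to claim 6, wherein performing convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain the second feature image comprises:
    adjusting the dimensions of the alignment convolution kernel so that the number of channels of the alignment convolution kernel equals the number of channels of the feature image of the (N-1)-th frame image;
    performing convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the reshaped alignment convolution kernel to obtain the second feature image.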
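  8. The method according to claim 7, wherein decoding the first feature image to obtain the deblurred N-th frame image comprises:
    fusing the first feature image and the second feature image to obtain a third feature image;
    decoding the third feature image to obtain the deblurred N-th frame image.
  9. The method according to claim 3, wherein performing convolution processing on the to-be-processed image to extract the motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image and obtaining the alignment convolution kernel comprises:
    stacking the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension to obtain the to-be-processed image;
    encoding the to-be-processed image to obtain a fourth feature image;
    performing convolution processing on the fourth feature image to obtain a fifth feature image;
    adjusting the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.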
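  10. The method according to claim 9, wherein performing encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel comprises:
    adjusting the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image;
    fusing the fourth feature image and the sixth feature image to obtain a seventh feature image;
    performing convolution processing on the seventh feature image to extract deblurring information of the pixels of the deblurred (N-1)-th frame image relative to the pixels of the (N-1)-th frame image, obtaining the deblurring convolution kernel.
  11. The method according to claim 10, wherein performing convolution processing on the seventh feature image to extract the deblurring information of the deblurred (N-1)-th frame image relative to the pixels of the (N-1)-th frame image and obtaining the deblurring convolution kernel comprises:
    performing convolution processing on the seventh feature image to obtain an eighth feature image;
    adjusting the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.
  12. The method according to claim 8, wherein decoding the third feature image to obtain the deblurred N-th frame image comprises:
    performing deconvolution processing on the third feature image to obtain a ninth feature image;
    performing convolution processing on the ninth feature image to obtain a decoded N-th frame image;
    adding the pixel value of a first pixel of the N-th frame image to the pixel value of a second pixel of the decoded N-th frame image to obtain the deblurred N-th frame image, wherein the position of the first pixel in the N-th frame image is the same as the position of the second pixel in the decoded N-th frame image.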
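  13. A video image processing apparatus, comprising:
    an acquisition unit configured to acquire multiple consecutive video frames, wherein the multiple consecutive video frames include an N-th frame image, an (N-1)-th frame image and a deblurred (N-1)-th frame image, and N is a positive integer;
    a first processing unit configured to obtain a deblurring convolution kernel of the N-th frame image based on the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image;
    a second processing unit configured to deblur the N-th frame image with the deblurring convolution kernel to obtain a deblurred N-th frame image.
  14. The apparatus according to claim 13, wherein the first processing unit comprises:
    a first convolution processing subunit configured to perform convolution processing on pixels of a to-be-processed image to obtain the deblurring convolution kernel, wherein the to-be-processed image is obtained by stacking the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension.
  15. The apparatus according to claim 14, wherein the first convolution processing subunit is configured to: perform convolution processing on the to-be-processed image to extract motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image, obtaining an alignment convolution kernel, wherein the motion information includes speed and direction; and perform encoding processing on the alignment convolution kernel to obtain the deblurring convolution kernel.
  16. The apparatus according to claim 14 or 15, wherein the second processing unit comprises: a second convolution processing subunit configured to perform convolution processing on pixels of a feature image of the N-th frame image with the deblurring convolution kernel to obtain a first feature image;
    a decoding processing subunit configured to decode the first feature image to obtain the deblurred N-th frame image.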
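  17. The apparatus according to claim 16, wherein the second convolution processing subunit is configured to: adjust the dimensions of the deblurring convolution kernel so that the number of channels of the deblurring convolution kernel equals the number of channels of the feature image of the N-th frame image; and perform convolution processing on the pixels of the feature image of the N-th frame image with the reshaped deblurring convolution kernel to obtain the first feature image.
  18. The apparatus according to claim 15, wherein the first convolution processing subunit is further configured to: after the to-be-processed image is convolved to extract the motion information of the pixels of the (N-1)-th frame image relative to the pixels of the N-th frame image and the alignment convolution kernel is obtained, perform convolution processing on pixels of a feature image of the deblurred (N-1)-th frame image with the alignment convolution kernel to obtain a second feature image.
  19. The apparatus according to claim 18, wherein the first convolution processing subunit is further configured to: adjust the dimensions of the alignment convolution kernel so that the number of channels of the alignment convolution kernel equals the number of channels of the feature image of the (N-1)-th frame image; and perform convolution processing on the pixels of the feature image of the deblurred (N-1)-th frame image with the reshaped alignment convolution kernel to obtain the second feature image.
  20. The apparatus according to claim 19, wherein the second processing unit is configured to: fuse the first feature image and the second feature image to obtain a third feature image; and decode the third feature image to obtain the deblurred N-th frame image.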
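  21. The apparatus according to claim 15, wherein the first convolution processing subunit is further configured to: stack the N-th frame image, the (N-1)-th frame image and the deblurred (N-1)-th frame image in the channel dimension to obtain the to-be-processed image; encode the to-be-processed image to obtain a fourth feature image; perform convolution processing on the fourth feature image to obtain a fifth feature image; and adjust the number of channels of the fifth feature image to a first preset value through convolution processing to obtain the alignment convolution kernel.
  22. The apparatus according to claim 21, wherein the first convolution processing subunit is further configured to: adjust the number of channels of the alignment convolution kernel to a second preset value through convolution processing to obtain a sixth feature image; fuse the fourth feature image and the sixth feature image to obtain a seventh feature image; and perform convolution processing on the seventh feature image to extract deblurring information of the pixels of the deblurred (N-1)-th frame image relative to the pixels of the (N-1)-th frame image, obtaining the deblurring convolution kernel.
  23. The apparatus according to claim 22, wherein the first convolution processing subunit is further configured to: perform convolution processing on the seventh feature image to obtain an eighth feature image; and adjust the number of channels of the eighth feature image to the first preset value through convolution processing to obtain the deblurring convolution kernel.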
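  24. The apparatus according to claim 20, wherein the second processing unit is further configured to: perform deconvolution processing on the third feature image to obtain a ninth feature image; perform convolution processing on the ninth feature image to obtain a decoded N-th frame image; and add the pixel value of a first pixel of the N-th frame image to the pixel value of a second pixel of the decoded N-th frame image to obtain the deblurred N-th frame image, wherein the position of the first pixel in the N-th frame image is the same as the position of the second pixel in the decoded N-th frame image.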
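  25. A processor configured to execute the method according to any one of claims 1 to 12.
  26. An electronic device, comprising a processor, an input device, an output device and a memory, wherein the processor, the input device, the output device and the memory are connected to one another, and the memory stores program instructions; when executed by the processor, the program instructions cause the processor to execute the method according to any one of claims 1 to 12.
  27. A computer-readable storage medium storing a computer program, wherein the computer program includes program instructions which, when executed by a processor of an electronic device, cause the processor to execute the method according to any one of claims 1 to 12.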
PCT/CN2019/114139 2019-04-22 2019-10-29 视频图像处理方法及装置 WO2020215644A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021520271A JP7123256B2 (ja) 2019-04-22 2019-10-29 ビデオ画像処理方法及び装置
KR1020217009399A KR20210048544A (ko) 2019-04-22 2019-10-29 비디오 이미지 처리 방법 및 장치
SG11202108197SA SG11202108197SA (en) 2019-04-22 2019-10-29 Video image processing method and apparatus
US17/384,910 US20210352212A1 (en) 2019-04-22 2021-07-26 Video image processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910325282.5 2019-04-22
CN201910325282.5A CN110062164B (zh) 2019-04-22 2019-04-22 视频图像处理方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/384,910 Continuation US20210352212A1 (en) 2019-04-22 2021-07-26 Video image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2020215644A1 true WO2020215644A1 (zh) 2020-10-29

Family

ID=67319990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114139 WO2020215644A1 (zh) 2019-04-22 2019-10-29 视频图像处理方法及装置

Country Status (7)

Country Link
US (1) US20210352212A1 (zh)
JP (1) JP7123256B2 (zh)
KR (1) KR20210048544A (zh)
CN (3) CN113992848A (zh)
SG (1) SG11202108197SA (zh)
TW (1) TWI759668B (zh)
WO (1) WO2020215644A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023523502A (ja) * 2021-04-07 2023-06-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド モデルトレーニング方法、歩行者再識別方法、装置および電子機器

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992848A (zh) * 2019-04-22 2022-01-28 深圳市商汤科技有限公司 视频图像处理方法及装置
CN112465698A (zh) 2019-09-06 2021-03-09 华为技术有限公司 一种图像处理方法和装置
CN111241985B (zh) * 2020-01-08 2022-09-09 腾讯科技(深圳)有限公司 一种视频内容识别方法、装置、存储介质、以及电子设备
CN112200732B (zh) * 2020-04-30 2022-10-21 南京理工大学 一种清晰特征融合的视频去模糊方法
CN113409209B (zh) * 2021-06-17 2024-06-21 Oppo广东移动通信有限公司 图像去模糊方法、装置、电子设备与存储介质
US20230034727A1 (en) * 2021-07-29 2023-02-02 Rakuten Group, Inc. Blur-robust image segmentation
CN116362976A (zh) * 2021-12-22 2023-06-30 北京字跳网络技术有限公司 一种模糊视频修复方法及装置
CN114708166A (zh) * 2022-04-08 2022-07-05 Oppo广东移动通信有限公司 图像处理方法、装置、存储介质以及终端
CN116132798B (zh) * 2023-02-02 2023-06-30 深圳市泰迅数码有限公司 一种智能摄像头的自动跟拍方法
CN116128769B (zh) * 2023-04-18 2023-06-23 聊城市金邦机械设备有限公司 摇摆运动机构的轨迹视觉记录系统

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100201865A1 (en) * 2009-02-09 2010-08-12 Samsung Electronics Co., Ltd. Imaging method for use with variable coded aperture device and imaging apparatus using the imaging method
US20120033096A1 (en) * 2010-08-06 2012-02-09 Honeywell International, Inc. Motion blur modeling for image formation
CN102576454A (zh) * 2009-10-16 2012-07-11 伊斯曼柯达公司 利用空间图像先验的图像去模糊法
CN104103050A (zh) * 2014-08-07 2014-10-15 重庆大学 一种基于局部策略的真实视频复原方法
CN108109121A (zh) * 2017-12-18 2018-06-01 深圳市唯特视科技有限公司 一种基于卷积神经网络的人脸模糊快速消除方法
CN108875900A (zh) * 2017-11-02 2018-11-23 北京旷视科技有限公司 视频图像处理方法和装置、神经网络训练方法、存储介质
CN109345449A (zh) * 2018-07-17 2019-02-15 西安交通大学 一种基于融合网络的图像超分辨率及去非均匀模糊方法
CN110062164A (zh) * 2019-04-22 2019-07-26 深圳市商汤科技有限公司 视频图像处理方法及装置

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8654201B2 (en) * 2005-02-23 2014-02-18 Hewlett-Packard Development Company, L.P. Method for deblurring an image
EP2153407A1 (en) * 2007-05-02 2010-02-17 Agency for Science, Technology and Research Motion compensated image averaging
KR101574733B1 (ko) * 2008-11-19 2015-12-04 삼성전자 주식회사 고화질 컬러 영상을 획득하기 위한 영상 처리 장치 및 방법
WO2010093040A1 (ja) 2009-02-13 2010-08-19 国立大学法人静岡大学 モーションブラー制御装置、方法、及びプログラム
US8379120B2 (en) * 2009-11-04 2013-02-19 Eastman Kodak Company Image deblurring using a combined differential image
JP5204165B2 (ja) * 2010-08-05 2013-06-05 パナソニック株式会社 画像復元装置および画像復元方法
CN102073993B (zh) * 2010-12-29 2012-08-22 清华大学 一种基于摄像机自标定的抖动视频去模糊方法和装置
CN102158730B (zh) * 2011-05-26 2014-04-02 威盛电子股份有限公司 影像处理系统及方法
KR101844332B1 (ko) * 2012-03-13 2018-04-03 삼성전자주식회사 블러 영상 및 노이즈 영상으로 구성된 멀티 프레임을 이용하여 비균일 모션 블러를 제거하는 방법 및 장치
CN103049891B (zh) * 2013-01-25 2015-04-08 西安电子科技大学 基于自适应窗口选择的视频图像去模糊方法
US9392173B2 (en) * 2013-12-13 2016-07-12 Adobe Systems Incorporated Image deblurring based on light streaks
CN104932868B (zh) * 2014-03-17 2019-01-15 联想(北京)有限公司 一种数据处理方法及电子设备
CN104135598B (zh) * 2014-07-09 2017-05-17 清华大学深圳研究生院 一种视频图像稳定方法及装置
CN106033595B (zh) * 2015-03-13 2021-06-22 中国科学院西安光学精密机械研究所 一种基于局部约束的图像盲去模糊方法
CN105405099A (zh) * 2015-10-30 2016-03-16 北京理工大学 一种基于点扩散函数的水下图像超分辨率重建方法
CN105957036B (zh) * 2016-05-06 2018-07-10 电子科技大学 一种加强字符先验的视频去运动模糊方法
CN106251297A (zh) * 2016-07-19 2016-12-21 四川大学 一种改进的基于多幅图像模糊核估计的盲超分辨率重建算法
CN106791273B (zh) * 2016-12-07 2019-08-20 重庆大学 一种结合帧间信息的视频盲复原方法
CN107273894A (zh) * 2017-06-15 2017-10-20 珠海习悦信息技术有限公司 车牌的识别方法、装置、存储介质及处理器
CN108875486A (zh) * 2017-09-28 2018-11-23 北京旷视科技有限公司 目标对象识别方法、装置、系统和计算机可读介质
CN107944416A (zh) * 2017-12-06 2018-04-20 成都睿码科技有限责任公司 一种通过视频进行真人验证的方法
CN108256629B (zh) * 2018-01-17 2020-10-23 厦门大学 基于卷积网络和自编码的eeg信号无监督特征学习方法
CN108629743B (zh) * 2018-04-04 2022-03-25 腾讯科技(深圳)有限公司 图像的处理方法、装置、存储介质和电子装置
CN108846861B (zh) * 2018-06-12 2020-12-29 广州视源电子科技股份有限公司 图像单应矩阵计算方法、装置、移动终端及存储介质
CN108830221A (zh) * 2018-06-15 2018-11-16 北京市商汤科技开发有限公司 图像的目标对象分割及训练方法和装置、设备、介质、产品
CN109410130B (zh) * 2018-09-28 2020-12-04 华为技术有限公司 图像处理方法和图像处理装置
CN109472837A (zh) * 2018-10-24 2019-03-15 西安电子科技大学 基于条件生成对抗网络的光电图像转换方法
CN109360171B (zh) * 2018-10-26 2021-08-06 北京理工大学 一种基于神经网络的视频图像实时去模糊方法

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100201865A1 (en) * 2009-02-09 2010-08-12 Samsung Electronics Co., Ltd. Imaging method for use with variable coded aperture device and imaging apparatus using the imaging method
CN102576454A (zh) * 2009-10-16 2012-07-11 伊斯曼柯达公司 利用空间图像先验的图像去模糊法
US20120033096A1 (en) * 2010-08-06 2012-02-09 Honeywell International, Inc. Motion blur modeling for image formation
CN104103050A (zh) * 2014-08-07 2014-10-15 重庆大学 一种基于局部策略的真实视频复原方法
CN108875900A (zh) * 2017-11-02 2018-11-23 北京旷视科技有限公司 视频图像处理方法和装置、神经网络训练方法、存储介质
CN108109121A (zh) * 2017-12-18 2018-06-01 深圳市唯特视科技有限公司 一种基于卷积神经网络的人脸模糊快速消除方法
CN109345449A (zh) * 2018-07-17 2019-02-15 西安交通大学 一种基于融合网络的图像超分辨率及去非均匀模糊方法
CN110062164A (zh) * 2019-04-22 2019-07-26 深圳市商汤科技有限公司 视频图像处理方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023523502A (ja) * 2021-04-07 2023-06-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド モデルトレーニング方法、歩行者再識別方法、装置および電子機器
JP7403673B2 (ja) 2021-04-07 2023-12-22 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド モデルトレーニング方法、歩行者再識別方法、装置および電子機器

Also Published As

Publication number Publication date
CN110062164B (zh) 2021-10-26
CN113992847A (zh) 2022-01-28
TW202040986A (zh) 2020-11-01
KR20210048544A (ko) 2021-05-03
SG11202108197SA (en) 2021-08-30
US20210352212A1 (en) 2021-11-11
CN110062164A (zh) 2019-07-26
JP2021528795A (ja) 2021-10-21
JP7123256B2 (ja) 2022-08-22
TWI759668B (zh) 2022-04-01
CN113992848A (zh) 2022-01-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19926716

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021520271

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217009399

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.02.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19926716

Country of ref document: EP

Kind code of ref document: A1