CN114612312A - Video noise reduction method, intelligent terminal and computer readable storage medium - Google Patents
- Publication number: CN114612312A (application CN202011423723.4A)
- Authority: CN (China)
- Prior art keywords: frame image, image, component, pyramid, generate
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T5/70—Denoising; Smoothing
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/269—Analysis of motion using gradient-based methods
- G06T7/90—Determination of colour characteristics
- H04N5/2622—Signal amplitude transition in the zone between image portions, e.g. soft edges
- H04N5/265—Mixing
- G06T2207/10016—Video; Image sequence
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
- G06T2207/20028—Bilateral filtering
- G06T2207/20032—Median filtering
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a video noise reduction method, an intelligent terminal and a computer readable storage medium for performing image noise reduction on the frame images in a video file. The method comprises the following steps: acquiring a video file to be processed; calculating, from the T-1-th frame image in the video file, an aligned image corresponding to the T-th frame image; calculating, from the T-1-th frame image and the aligned image, the probability value that each pixel point in the T-th frame image is a noise pixel; and fusing the T-th frame image and the aligned image according to the probability value to generate a T-th noise-reduction frame image corresponding to the T-th frame image. By calculating the probability value that each pixel point is a noise pixel, the invention fuses the current frame image and its corresponding aligned image more accurately, so that large object motion between frame images is not filtered out during fusion. This avoids blurring of the denoised video and improves its sharpness.
Description
Technical Field
The invention relates to the technical field of video noise reduction, in particular to a video noise reduction method, an intelligent terminal and a computer readable storage medium.
Background
Video is an important carrier of data and consists of two parts: sound and images. A video can be viewed as a sequence of images, each with a specific timestamp, played back together with the sound according to those timestamps. Because the human eye exhibits persistence of vision, a sensation of continuous motion is produced once the number of frames per second reaches a certain value. Image quality degrades because every stage an image passes through (acquisition, editing, encoding, transcoding, transmission, display and so on) is disturbed by various kinds of noise, and the quality of image processing directly affects the playback, editing and other downstream processing of the video. To obtain high-quality video, it is therefore usually necessary to denoise the images in the video, removing unwanted noise while preserving the integrity of the original information; noise reduction has consequently long been a hot topic in image and video processing and in computer vision research. The most common noise is that introduced during acquisition. Reducing the noise intensity improves the subjective appearance of the image, reduces the bits spent encoding noise when images and video are compressed, improves bit-rate utilization, and makes motion estimation in video coding more accurate and entropy coding faster. Noise has many sources, the most significant being photon shot noise. When an image is captured, the photosensitive element collects photons and converts them into electrons; the electrons form a voltage, which is amplified and quantized to finally produce a digital image. Photon shot noise arises at the photon-collection step, so the main principle of conventional noise reduction methods is to increase the number of photons received per unit pixel area in order to reduce the noise intensity perceived by the human eye.
Noise can be understood in the spatial domain and in the temporal domain, and current noise reduction methods accordingly fall into two categories: spatial-domain methods and temporal-domain methods. Spatial-domain noise reduction exploits the spatial correlation among pixels of a video image to filter out noise, using, for example, the common median, mean, Gaussian, bilateral and wavelet filters; however, it tends to over-smooth, lose detail and consume time. Temporal-domain noise reduction exploits the relationship between video frames, denoising according to the noise distribution of the current frame and the previous frame; it usually aligns pixels across frames by motion estimation or motion compensation and then fuses the corresponding pixels by weighted averaging.
Disclosure of Invention
The invention mainly aims to provide a video noise reduction method, an intelligent terminal and a computer readable storage medium, and aims to solve the problem of poor noise reduction quality of a dynamic scene picture in the prior art.
In order to achieve the above object, the present invention provides a video denoising method for performing image denoising on a frame image in a video file, including:
acquiring a T-th frame image in a video file to be processed, wherein T is a natural number which is greater than zero and less than or equal to the number of frame images in the video file;
when T is more than or equal to 2, calculating an alignment image corresponding to the T frame image in the video file according to the T-1 frame image in the video file;
calculating, according to the T-1-th frame image and the alignment image, the probability value that each pixel point in the T-th frame image is a noise pixel;
and fusing the T frame image and the alignment image according to the probability value to generate a T noise reduction frame image corresponding to the T frame image.
Optionally, in the video denoising method, when T is equal to 1, the T frame image is filtered according to a preset filter, so as to generate a T denoising frame image corresponding to the T frame image.
Optionally, the video denoising method, wherein when T is greater than or equal to 2, calculating an aligned image corresponding to a T-th frame image in the video file according to a T-1-th frame image in the video file, specifically includes:
calculating optical flow values of a T-1 frame image and a T frame image in the video file to generate a motion vector;
and calculating an alignment image corresponding to the T frame image according to the motion vector.
Optionally, the video denoising method, wherein the calculating optical flow values of a T-1 th frame image and a T-th frame image in the video file to generate a motion vector specifically includes:
splitting the T-1 th frame image and the T-th frame image respectively according to a preset splitting rule to generate a first pyramid corresponding to the T-1 th frame image and a second pyramid corresponding to the T-th frame image;
calculating a first optical flow value between a first top level image in the first pyramid and a second top level image in the second pyramid;
and calculating a second optical flow value between the bottom-layer images of the first pyramid and the second pyramid according to the first optical flow value, and using the second optical flow value as a motion vector between the T-1-th frame image and the T-th frame image.
Optionally, in the video denoising method, the splitting rule includes a gaussian pyramid splitting rule and a laplacian pyramid splitting rule.
Optionally, the video denoising method, wherein the first optical flow values include a first sparse optical flow value and a first dense optical flow value, and the calculating first optical flow values between the first top-level image in the first pyramid and the second top-level image in the second pyramid specifically includes:
calculating a block-shaped integral image corresponding to the first top-level image and the second top-level image;
performing reverse search solution on the block-shaped integral graph to generate a first sparse optical flow value corresponding to the first top-level image and the second top-level image;
and calculating a first dense optical flow value corresponding to the first top-level image and the second top-level image according to the first sparse optical flow value.
Optionally, the video denoising method, wherein the splitting the T-1 th frame image and the T-th frame image according to a preset splitting rule to generate a first pyramid corresponding to the T-1 th frame image and a second pyramid corresponding to the T-th frame image respectively includes:
respectively carrying out YUV channel separation on the T-1-th frame image and the T-th frame image to generate a first Y component, a first U component and a first V component corresponding to the T-1-th frame image, and a second Y component, a second U component and a second V component of the T-th frame image;
and respectively performing downsampling on the first Y component and the second Y component according to a preset pyramid rule to generate a first pyramid corresponding to the T-1-th frame image and a second pyramid corresponding to the T-th frame image.
Optionally, the video denoising method, wherein the aligned image includes a third Y component, a third U component, and a third V component; the calculating an aligned image corresponding to the T-th frame image according to the motion vector specifically includes:
according to the motion vector, correcting the first pyramid to generate a third Y component;
correcting the first U component according to the motion vector to generate a third U component;
and correcting the first V component according to the motion vector to generate a third V component.
Optionally, the video denoising method, wherein the calculating, according to the T-1 th frame image and the aligned image, a probability value that each pixel point in the T-th frame image is a noise pixel specifically includes:
calculating the average value of each pixel in the T-1 frame image and the T frame image to generate an average frame image;
calculating a difference value corresponding to the T frame image according to the average frame image;
and calculating the probability value of each pixel in the T frame image as a noise pixel according to the difference value and the alignment image.
Optionally, the video denoising method, wherein the fusing the T-th frame image and the aligned image according to the probability value to generate a T-th denoising frame image corresponding to the T-th frame image, specifically includes:
determining a first weight value corresponding to the T frame image and a second weight value corresponding to the aligned image according to the probability value;
and fusing the T frame image and the aligned image according to the first weight value and the second weight value to generate a T noise reduction frame image corresponding to the T frame image.
Optionally, the video denoising method, wherein the fusing the tth frame image and the aligned image according to the first weight value and the second weight value to generate a tth denoised frame image corresponding to the tth frame image specifically includes:
according to the first weight value and the second weight value, respectively fusing the second Y component and the third Y component, the second U component and the third U component, and the second V component and the third V component to respectively generate a fourth Y component, a fourth U component and a fourth V component;
filtering the fourth Y component according to a preset bilateral filter to generate a fifth Y component, wherein the bilateral filter does not include a spatial domain kernel;
filtering the fourth U component and the fourth V component according to a preset nonlinear filter to generate a fifth U component and a fifth V component;
and fusing the fifth Y component, the fifth U component and the fifth V component to generate a T-th noise reduction frame image corresponding to the T-th frame image.
Optionally, the video denoising method, wherein the video denoising method further includes:
replacing the Tth frame image in the video file with the Tth noise reduction frame image;
and recursively circulating the image noise reduction on the replaced video file until T is equal to the number of frame images in the video file, and generating and outputting a noise reduction video.
In addition, to achieve the above object, the present invention further provides an intelligent terminal, wherein the intelligent terminal includes: a memory, a processor and a video noise reduction method program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the video noise reduction method as described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a video noise reduction method program, which when executed by a processor, implements the steps of the video noise reduction method as described above.
After the video file is obtained, an aligned image corresponding to the current frame image is calculated from the previous frame image. Because the aligned image is an ideal image, any difference between it and the current frame image is caused either by noise or by object motion. The probability value that each pixel point of the current frame image is a noise pixel is therefore judged from the aligned image and the previous frame image, and the current frame and the aligned image are then fused according to that probability value to eliminate the interference caused by noise, so that object motion between frame images is not filtered out during fusion and the denoised video does not become blurred.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the video denoising method of the present invention;
FIG. 2 is a general flowchart of a video denoising method according to a preferred embodiment of the present invention;
FIG. 3 is a flowchart of a recursion cycle of a preferred embodiment of the video denoising method of the present invention;
fig. 4 is a schematic operating environment diagram of an intelligent terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
As shown in fig. 1 and 2, the video denoising method according to the preferred embodiment of the present invention is used for performing image denoising on a frame image in a video file, and specifically includes the following steps:
step S10, a T-th frame image in the video file to be processed is obtained, where T is a natural number greater than zero and less than or equal to the number of frame images in the video file.
In this embodiment, the video denoising program installed on the intelligent terminal is used for executing the video denoising method. And when the user shoots the video through the intelligent terminal, generating a video file. The intelligent terminal can start the video noise reduction program in a default starting or manual starting mode, and the program acquires the video file from a preset video file storage address as a video file to be processed.
Specifically, since a video file is essentially a fusion of multiple images, it contains a number of frame images and can be divided, in order, into a first frame image, a second frame image, ..., up to a T-th frame image, where T is a natural number greater than zero and less than or equal to the number of frame images in the video file. The frame images in the video file are acquired sequentially in temporal order.
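For illustration only (the patent itself contains no code), sequential frame acquisition can be sketched in Python with OpenCV; the file path and the list-based buffering are assumptions, not part of the disclosure:

```python
import cv2

# Minimal sketch of sequential frame acquisition; "input.mp4" is an
# illustrative path, not taken from the patent text.
cap = cv2.VideoCapture("input.mp4")
frames = []
while True:
    ok, frame = cap.read()  # frames arrive in temporal order
    if not ok:
        break
    frames.append(frame)    # frames[T-1] is the T-th frame image, T = 1..len(frames)
cap.release()
```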
And step S20, when T is more than or equal to 2, calculating an alignment image corresponding to the T frame image in the video file according to the T-1 frame image in the video file.
When T is equal to 1, that is, when the currently acquired frame image is the first frame image, the first frame image is filtered according to a preset filter to generate a first filtered image. The filter may be a conventional spatial-domain image noise reduction method such as a Gaussian filter, a bilateral filter or a mean filter; its main function is to pre-filter the first frame image, so skipping this filtering step does not affect the implementation of the present scheme. After the first filtered image is generated, it replaces the first frame image in the video file, the replaced video file is taken as the video file to be processed, and the subsequent video noise reduction is performed.
When T is 2, that is, when the currently acquired frame image is the second frame image, the aligned image corresponding to the second frame image is calculated from the T-1-th frame image, that is, the first frame image. In the first implementation of this embodiment, a conventional homography matrix, a random sample consensus (RANSAC) algorithm or the like may be used to correct the first frame image toward the second frame image, generating the aligned image corresponding to the second frame image.
Further, in a second implementation manner of this embodiment, an optical flow algorithm is used to calculate an optical flow value between two frame images, so as to correct the first frame image and generate an aligned image corresponding to the second frame image, and step S20 includes:
step S21, calculating the optical flow value of the T-1 frame image and the T frame image in the video file, and generating a motion vector.
Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane. The optical flow method uses the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby computes the motion information of objects between adjacent frames.
Common optical flow estimation algorithms include differential methods, matching-based methods, phase-based methods, neurodynamic methods and so on. Besides this classification by principle, optical flow methods can also be divided into dense optical flow and sparse optical flow according to the density of the two-dimensional vectors in the resulting optical flow field.
Further, when the video file is shot for a moving scene, a large displacement may exist between frame images, so that the embodiment adopts an optical flow value calculation method based on an image pyramid, which specifically includes:
step S211, splitting the T-1 th frame image and the T-th frame image respectively according to a preset splitting rule, and generating a first pyramid corresponding to the T-1 th frame image and a second pyramid corresponding to the T-th frame image.
Specifically, the pyramids used in the algorithm for calculating the optical flow value based on the image pyramid are generally a gaussian pyramid and a laplacian pyramid. Therefore, the splitting rule in this embodiment is a process of splitting the input image as an original image in a sampling manner to generate a gaussian pyramid and a laplacian pyramid.
The Gaussian pyramid is a series of down-sampled images obtained by Gaussian smoothing and sub-sampling, and it comprises a series of low-pass filters whose cut-off frequency increases by a factor of 2 from the upper layer to the lower layer. The splitting rule presets a number of sampling iterations: the input T-1-th frame image is taken as the original image Y0; Y0 is down-sampled to generate a smaller image, named Y1; Y1 is then down-sampled to generate a still smaller image Y2; and the down-sampling is repeated until the preset number of iterations is reached. Because the images become smaller and smaller after down-sampling, arranging all of them by size forms a pyramid shape, and filtering this pyramid-shaped image set with a preset low-pass filter generates the Gaussian pyramid. The down-sampling can be implemented in various ways, such as deleting the even rows and even columns or deleting fixed rows, which are not described here.
The Laplacian pyramid can be regarded as a residual pyramid, used to store the difference between a down-sampled image and the original image. For any image Yi in the Gaussian pyramid, first down-sample it to obtain an image Ydown, then up-sample Ydown to obtain an image Yup. Since up-sampling cannot completely restore the original image information, there is a difference between Yup and Yi. To restore the image Yi completely from Ydown, the difference between Yup and Yi must be recorded; this recorded residual is the Laplacian pyramid. Thus each layer of the Gaussian pyramid constructed above has a corresponding layer in the Laplacian pyramid.
By adopting the above mode, the T-1 th frame image and the T-th frame image, that is, the first frame image and the second frame image are split, so that a first pyramid corresponding to the T-1 th frame image and a second pyramid corresponding to the T-th frame image can be generated, wherein the first pyramid comprises a first gaussian pyramid and a first laplacian pyramid, and the second pyramid comprises a second gaussian pyramid and a second laplacian pyramid.
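As an illustrative sketch of the splitting rule described above, the two pyramids can be built with OpenCV's pyrDown/pyrUp; the number of levels is an assumed parameter:

```python
import cv2

def build_pyramids(img, levels=4):
    """Sketch of Gaussian + Laplacian pyramid construction (levels is assumed).

    pyrDown performs Gaussian smoothing followed by 2x sub-sampling; each
    Laplacian layer stores the residual between a Gaussian layer and the
    up-sampled version of the next (smaller) layer, as described above.
    """
    gauss = [img]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    lap = []
    for i in range(levels):
        up = cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
        lap.append(cv2.subtract(gauss[i], up))  # residual Yi - Yup
    return gauss, lap
```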
Further, the optical flow method rests on the premises of constant brightness and temporal continuity, i.e. "small motion". Because the time interval between video frames is very short, two consecutive frames can be regarded as temporally continuous. To ensure that the optical flow computation is not affected by brightness changes between frame images, this embodiment converts the first frame image and the second frame image into grayscale images when creating the pyramids, which specifically includes:
respectively carrying out YUV channel separation on the T-1-th frame image and the T-th frame image to generate a first Y component, a first U component and a first V component corresponding to the T-1-th frame image, and a second Y component, a second U component and a second V component of the T-th frame image;
and respectively performing downsampling on the first Y component and the second Y component according to a preset pyramid rule to generate a first pyramid corresponding to the T-1-th frame image and a second pyramid corresponding to the T-th frame image.
Specifically, YUV is a color encoding method used mainly in television systems and analog video. "Y" represents luminance (luma), that is, the gray value; "U" and "V" represent chrominance (chroma), which describes the color and saturation of the image and specifies the color of each pixel. Separating the YUV channels of an image yields its Y, U and V components, where the Y component is the grayscale map of the image and the U and V components retain its colors. Compared with the conventional approach of converting the image to grayscale, splitting the channels also makes it fast to recombine the different components into one image afterwards. Since image formats can be converted by formula, if the frame images in the video file are originally in RGB or another format, each frame image can be converted into YUV format for video noise reduction. There are two YUV storage layouts: one stores the Y component of each pixel independently while the U and V components of several consecutive pixels are stored together; the other stores the Y, U and V components of an image in three independent arrays. In either case, channel separation is performed with existing techniques to produce the three components. YUV channel separation is therefore performed on the first frame image and the second frame image respectively, generating a first Y component, a first U component and a first V component corresponding to the first frame image, and a second Y component, a second U component and a second V component of the second frame image.
Then, based on the Y components of the first frame image and the second frame image, that is, their grayscale maps, the first Y component and the second Y component are each down-sampled according to a preset pyramid rule (containing the generation rules for the Gaussian pyramid and the Laplacian pyramid), generating a first pyramid corresponding to the T-1-th frame image and a second pyramid corresponding to the T-th frame image.
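A minimal sketch of the channel-separation step, assuming the decoded frames arrive as BGR (OpenCV's default) so that a conversion to YUV is needed first:

```python
import cv2

def split_yuv(frame_bgr):
    """Separate a frame into Y, U, V components. BGR input is an assumption;
    frames already stored in a YUV format would skip the conversion."""
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)
    y, u, v = cv2.split(yuv)
    return y, u, v  # y is the grayscale map used to build the pyramids
```

The first and second pyramids are then built from the two Y components with the pyramid sketch given above.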
Step S212, calculating a first optical flow value between the first top level image in the first pyramid and the second top level image in the second pyramid.
Specifically, starting from the first top-level image in the first pyramid and the second top-level image in the second pyramid, that is, the layer with the smallest resolution, the optical flow of each point in the top-level image is obtained by minimizing the sum of matching errors over the neighborhood of that point. This step mainly solves the residual function from the pyramid-construction process, that is, the Laplacian-pyramid construction, and is therefore not described in detail. The method used to calculate the first optical flow value in this embodiment is the Lucas-Kanade (LK) optical flow algorithm, a conventional algorithm that is likewise not detailed here.
Step S213, calculating a second optical flow value between the bottom-layer images of the first pyramid and the second pyramid according to the first optical flow value, and using the second optical flow value as the motion vector between the T-1-th frame image and the T-th frame image.
Specifically, assume that the image is scaled to half its size at each level and that there are L levels in total, with level 0 being the original image; if the displacement in the original image is d, the first optical flow value at the top level can be represented as dL = d/2^L. The method can therefore trace back from the top-level first optical flow value to level L-1 and work its way down the pyramid, repeatedly estimating the optical flow value of each level until the bottom level yields the estimate for each pixel point of the original image. Because the true optical flow value of each level equals the estimated value plus the residual, and because for each level the optical flow value of each pixel point is obtained by minimizing the matching error over all pixel points in its neighborhood, the estimate approaches the true value arbitrarily closely. Since the second optical flow value is obtained by going from the first frame image to the second frame image, it has directionality; it is calibrated in the coordinate system corresponding to the pyramid and taken as the motion vector between the T-1-th frame image and the T-th frame image.
Further, in order to increase the calculation rate of the optical flow value and reduce the time for video noise reduction, the embodiment uses a Dense Inverse Search optical flow algorithm (DIS) to calculate the optical flow value. The DIS algorithm comprises:
calculating a block-shaped integral image corresponding to the first top-level image and the second top-level image;
performing reverse search solution on the block-shaped integral graph to generate a first sparse optical flow value corresponding to the first top-level image and the second top-level image;
and calculating a first dense optical flow value corresponding to the first top-layer image and the second top-layer image according to the first sparse optical flow value.
Specifically, the first top-level image of the first pyramid and the second top-level image of the second pyramid are first split and convolved to generate block-shaped integral graphs. The block-shaped integral graphs are then taken as input values, and an inverse search yields the sparse optical flow field. In this step the calculation object is not each pixel point but each block integral graph, which produces a sparse optical flow value, namely the first sparse optical flow value. As noted above, calculating an optical flow value requires finding the minimum residual, which normally means solving for the first and second derivatives many times; in the DIS algorithm they need to be solved only once, so the DIS algorithm is faster than conventional algorithms. Then, with each pixel point in the block-shaped integral graph as the calculation object, the corresponding first dense optical flow value is calculated on the basis of the first sparse optical flow value. Finally, using the iterative process of step S213, the optical flow values of the other layers of the first and second pyramids are calculated from the first sparse and first dense optical flow values, ultimately yielding the second optical flow value between the bottom-layer images.
The greatest advantage of the DIS algorithm is its speed. The DIS optical flow algorithm is integrated into the video module of OpenCV 4.0 and can therefore be called conveniently. The whole process can be expressed by the formula (Fx, Fy) = IDIS(Gt-1, Gt), where (Fx, Fy) is the second optical flow value, i.e. the motion vector; Fx is the motion value along the horizontal axis of the coordinate system corresponding to the first or second pyramid; Fy is the motion value along the vertical axis of that coordinate system; IDIS is the functional representation of the DIS optical flow algorithm; Gt-1 is the first pyramid; and Gt is the second pyramid. On this basis, real-time processing of the video file can be achieved.
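Because the text states that DIS ships in OpenCV 4.0's video module, the call can be sketched directly; the preset choice and the variable names prev_y/cur_y are assumptions. Note that OpenCV's implementation builds its own coarse-to-fine pyramid internally, so it is applied here to the full-resolution Y components:

```python
import cv2

# Sketch of (Fx, Fy) = IDIS(Gt-1, Gt) with OpenCV's DIS implementation.
# prev_y and cur_y: 8-bit grayscale Y components of the T-1-th and T-th
# frame images (assumed names); the FAST preset is an assumed trade-off.
dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)
flow = dis.calc(prev_y, cur_y, None)   # float32 array of shape (H, W, 2)
fx, fy = flow[..., 0], flow[..., 1]    # per-pixel motion vector components
```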
And step S22, calculating the alignment image corresponding to the T frame image according to the motion vector.
After the motion vector between the first frame image and the second frame image is obtained, the first frame image may be corrected according to the motion vector, thereby generating the aligned image. The correction may be performed by offsetting the coordinates of each pixel point in the T-1-th frame image according to the motion vector, generating the aligned image corresponding to the T-th frame image.
Further, since in an implementation manner implemented by this embodiment, each image in the video file is an image in a YUV format, step S22 specifically includes:
according to the motion vector, correcting the first pyramid to generate a third Y component;
correcting the first U component according to the motion vector to generate a third U component;
and correcting the first V component according to the motion vector to generate a third V component.
Specifically, according to the motion vector, each pixel in the Gaussian pyramid within the first pyramid corresponding to the first frame image is remapped, so that the first pyramid is corrected and a corresponding grayscale image set is generated, which serves as the third Y component of the aligned image. This can be expressed by the formula LAy = R(Gt-1, Fx, Fy), where LAy is the third Y component, R() denotes remapping, Gt-1 denotes the first pyramid corresponding to the T-1-th frame image, and Fx and Fy are the motion vector. In the same way, the first U component and the first V component are corrected according to the motion vector to generate the third U component and the third V component, expressed by the formulas LAu = R(Ut-1, Fx, Fy) and LAv = R(Vt-1, Fx, Fy), where LAu is the third U component, Ut-1 is the first U component, Vt-1 is the first V component, and LAv is the third V component. Since an image can be split into Y, U and V components, those components can likewise be represented as one image; in this embodiment the aligned image comprises the third Y component, the third U component and the third V component, so the aligned image is generated once these three components are obtained.
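The remapping R(·) can be sketched with cv2.remap; building the sampling map as the base pixel grid plus the flow is a common formulation and an assumption here (the sign convention depends on the direction in which the flow was computed):

```python
import cv2
import numpy as np

def warp_previous(prev_plane, fx, fy):
    """Sketch of LA = R(prev, Fx, Fy): resample the previous frame's plane
    according to the motion vector. Whether the flow is added or subtracted
    depends on the flow direction convention; addition is assumed here."""
    h, w = prev_plane.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    map_x, map_y = grid_x + fx, grid_y + fy
    return cv2.remap(prev_plane, map_x, map_y, cv2.INTER_LINEAR)

# Aligned components of the previous frame (assumed names):
# la_y = warp_previous(prev_y, fx, fy)
# la_u = warp_previous(prev_u, fx, fy)
# la_v = warp_previous(prev_v, fx, fy)
```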
And step S30, calculating the probability value of each pixel point in the T frame image as a noise pixel according to the T-1 frame image and the alignment image.
Specifically, in the first implementation of this embodiment, the noise of the later frame can be preliminarily determined from the difference between the previous frame image and the next frame image, so the difference between the first frame image and the second frame image is calculated first to obtain an initial noise value. The aligned image is obtained from the motion vector between the first and second frame images and can therefore be regarded as the ideal image corresponding to the first frame image; accordingly, pixel points in the second frame image whose pixel values deviate greatly from the aligned image can be screened out as noise pixels. Combining these noise pixels with the initial noise value yields the true noise value of the second frame image.
Further, in order to more accurately calculate the probability value that each pixel point in the T-th frame image is a noise pixel, step S30 includes:
step S31, calculating an average value of each pixel in the T-1 th frame image and the T-th frame image, and generating an average frame image.
Specifically, the average value of each pixel in the first frame image and the second frame image is first calculated according to the formula Fm = (Ft-1 + Ft)/2, where Fm is the pixel value of the average frame, Ft-1 is the pixel value of the T-1-th frame image, and Ft is the pixel value of the T-th frame image.
And step S32, calculating a difference value corresponding to the T-th frame image according to the average frame image.
Specifically, the difference between two values can be measured by, for example, the square of their difference or the absolute value of their difference; this embodiment preferably adopts the square of the difference, expressed as Fdiff = (Ft - Fm)^2, where Fdiff is the difference value.
Step S33, calculating a probability value of each pixel in the T-th frame image being a noise pixel according to the difference value and the alignment image.
Specifically, if the video file captures a moving object, pixels with large second optical flow values inevitably appear between the first and second frame images. These pixels are not noise, so no noise reduction should be applied to them, and they must be excluded so that they are not counted among the noise pixels. Excluding them with a hard rule would be too coarse and would transition poorly, so the approach adopted in this embodiment is to set an activation function. Taking the common sigmoid function as an example, the activation function is a nonlinear function that maps all variables to between 0 and 1 and is insensitive to input values beyond a certain range. The difference between the second frame image and the aligned image, together with the difference value corresponding to the second frame image, is fed as input into the preset activation function, and the probability value that a pixel point is classified as a noise pixel is obtained from the output. The formula adopted in this embodiment is of this sigmoid form (given as an equation in the original publication), where Wnoise is the probability value and e is the natural constant.
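The exact activation formula is elided in this text (it appears as an equation image in the original publication), so the sketch below only assumes a sigmoid-shaped mapping over the squared differences; sigma is an assumed scale parameter:

```python
import numpy as np

def noise_probability(cur, aligned, fdiff, sigma=10.0):
    """Sketch only: the patent's exact formula is not reproduced here. This
    assumes a sigmoid-shaped mapping of the squared current/aligned
    difference plus Fdiff; sigma is an assumed scale parameter."""
    d_align = (cur.astype(np.float32) - aligned.astype(np.float32)) ** 2
    x = (d_align + fdiff) / (sigma ** 2)       # combined evidence (assumption)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0      # 0 at x = 0, approaches 1 for large x
```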
And step S40, fusing the T frame image and the alignment image according to the probability value, and generating a T noise reduction frame image corresponding to the T frame image.
Specifically, the probability value that each pixel point in the second frame image is a noise pixel has been determined, and the aligned image and the T-th frame image have the same dimensions, so every pixel coordinate has a corresponding pixel value in both. The T-th frame image and the aligned image are fused pixel by pixel according to the probability value. For example, with a probability threshold of 80%: if the probability value of the pixel point at a given coordinate is 90%, the pixel value in the aligned image is taken as the correct pixel value; if the probability value is 10%, the pixel value in the T-th frame image is taken as the correct pixel value. Each pixel point in the T-th frame image is traversed, the step of determining the correct pixel value is repeated, and finally the T-th noise-reduction frame image corresponding to the T-th frame image is generated from all the correct pixel values.
Further, step S40 includes:
step S41, determining a first weight value corresponding to the T-th frame image and a second weight value corresponding to the aligned image according to the probability value.
In a second implementation of this embodiment, a first weight value corresponding to the T-th frame image and a second weight value corresponding to the aligned image are first calculated according to the probability value, where the first weight value equals the probability value, i.e. w1 = Wnoise, and the second weight value equals one minus the probability value, i.e. w2 = 1 - Wnoise.
Step S42, according to the first weight value and the second weight value, fusing the T-th frame image and the alignment image, and generating a T-th noise reduction frame image corresponding to the T-th frame image.
Specifically, for a pixel point with a fixed coordinate, a first product of its pixel value in the T-th frame image and the first weight value, and a second product of its pixel value in the aligned image and the second weight value, are calculated; the first product and the second product are then added to complete the fusion of that pixel point, i.e. O = LA*w2 + w1*Ft, where LA is the pixel value of the aligned image and O is the pixel value of the T-th noise-reduction frame image. Each pixel point in the T-th frame image is traversed and the formula applied, fusing the T-th frame image and the aligned image to generate the T-th noise-reduction frame image corresponding to the T-th frame image.
Further, in order to further improve the noise reduction effect, in this embodiment, when performing fusion, a filtering operation is also performed, and step S42 includes:
step S421, the first weight value and the second weight value respectively fuse the second Y component and the third Y component, the second U component and the third U component, and the second V component and the third V component to generate a fourth Y component, a fourth U component, and a fourth V component.
Specifically, the Y component, the U component, and the V component of the second frame image and the aligned image are fused, and a fourth Y component, a fourth U component, and a fourth V component are generated by adding after multiplying the second frame image and the aligned image respectively by the weight values, where the formula may be:
Oy=LAy*w2+w1*Gt;
Ou=LAu*w2+w1*Ut;
Ov=LAv*w2+w1*Vt。
where Oy is the fourth Y component after fusion, Ou is the fourth U component after fusion, Ov is the fourth V component after fusion, Ut is the second U component, Vt is the second V component, w1 is the first weight value, and w2 is the second weight value.
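The three fusion equations translate directly into per-pixel array arithmetic; a sketch assuming planes of equal shape:

```python
import numpy as np

def fuse(cur_plane, aligned_plane, w_noise):
    """Per-pixel fusion O = LA*w2 + w1*F with w1 = Wnoise, w2 = 1 - Wnoise.
    All inputs are arrays of equal shape; the output is floating point."""
    w1 = np.asarray(w_noise, dtype=np.float32)
    return aligned_plane * (1.0 - w1) + w1 * cur_plane

# o_y = fuse(cur_y, la_y, w_noise)   # fourth Y component
# o_u = fuse(cur_u, la_u, w_noise)   # fourth U component
# o_v = fuse(cur_v, la_v, w_noise)   # fourth V component
```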
Step S422, filtering the fourth Y component according to a preset bilateral filter, and generating a fifth Y component, where the bilateral filter does not include a spatial domain kernel.
Specifically, the kernel function in bilateral filtering comprises a spatial-domain kernel and a pixel-range-domain kernel. In regions of the image where pixel values change little, the range-domain weight is close to one and the spatial-domain kernel acts much like Gaussian blur; in edge regions, where pixel values change greatly, the spatial-domain weight is close to one and the pixel-range-domain weight takes effect, which preserves edge detail. In this embodiment the Y component is a grayscale map, and the presence of the spatial-domain kernel could cause detail loss, so a bilateral filter with the spatial-domain kernel removed is used to filter the fourth Y component and generate the fifth Y component.
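With the spatial-domain kernel removed, the bilateral filter reduces to a range-only weighted average over the window; a minimal sketch, with window radius and range sigma as assumed parameters:

```python
import numpy as np

def range_only_bilateral(y, radius=2, sigma_r=12.0):
    """Sketch of a bilateral filter whose spatial-domain kernel is removed:
    every neighbour in the window is weighted only by pixel similarity.
    radius and sigma_r are assumed parameters, not given by the patent."""
    y = y.astype(np.float32)
    pad = np.pad(y, radius, mode="reflect")
    num = np.zeros_like(y)
    den = np.zeros_like(y)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            shifted = pad[dy:dy + y.shape[0], dx:dx + y.shape[1]]
            w = np.exp(-((shifted - y) ** 2) / (2 * sigma_r ** 2))  # range kernel only
            num += w * shifted
            den += w
    return num / den
```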
Step S423, filtering the fourth U component and the fourth V component according to a preset nonlinear filter, and generating a fifth U component and a fifth V component.
Specifically, for more stubborn noise such as salt-and-pepper noise, conventional linear filtering can only suppress the noise and cannot eliminate it, so a nonlinear filter is required. Nonlinear filters include median filtering and the like. This embodiment is described using median filtering, which sorts all pixels in a neighborhood window and takes the median to represent the pixel value at the window center; it strongly reduces salt-and-pepper noise and impulse noise while retaining edge details. For better detail handling, the neighborhood window used by the median filter in this embodiment is a small 3×3 window.
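For the chroma planes, the 3×3 median filtering named above maps directly onto OpenCV (o_u and o_v are the fused fourth components from the sketch above, assumed names):

```python
import cv2

# 3x3 median filtering of the fused chroma planes (step S423).
o5_u = cv2.medianBlur(o_u.astype("uint8"), 3)   # fifth U component
o5_v = cv2.medianBlur(o_v.astype("uint8"), 3)   # fifth V component
```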
Step S424, fusing the fifth Y component, the fifth U component, and the fifth V component, and generating a T-th noise reduction frame image corresponding to the T-th frame image.
Specifically, the fifth Y component, the fifth U component, and the fifth V component are finally packed, so as to be fused into one image, that is, a second noise reduction frame image corresponding to the second frame image.
Further, after the first frame image is processed, the steps S10-S40 are repeatedly performed until the noise reduction of all the frame images in the video file is completed, and finally the noise-reduced video is output. If the above operation is directly performed on each frame image in the original video file, there may be accumulation of noise, and therefore, in this embodiment, referring to fig. 3, step S40 further includes:
replacing the Tth frame image in the video file with the Tth noise reduction frame image;
and recursively circulating the steps S10-S40 for the replaced video file until T is equal to the number of frame images in the video file, and generating and outputting the noise-reduced video.
Specifically, the first frame image in the video file is replaced by the first noise-reduction frame image, and the replaced video file is taken as the video file to be processed. The recursive cycle means continuously incrementing the value of T: the current T is 1; after the replacement, T is assigned 2, and steps S10-S40 are repeated to obtain a second noise-reduction frame image, which then replaces the second frame image in the video file; this continues until all frame images have been processed, finally yielding the noise-reduced video.
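Putting the pieces together, the recursive loop of steps S10-S40 can be sketched as follows; the helper functions refer to the earlier sketches, and all parameter choices are assumptions rather than the patent's specification:

```python
import cv2
import numpy as np

def denoise_video(frames):
    """Sketch of the recursive loop over steps S10-S40. split_yuv,
    warp_previous, noise_probability, fuse and range_only_bilateral refer
    to the earlier sketches; parameters are illustrative assumptions."""
    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)
    out = []
    for t, frame in enumerate(frames, start=1):          # t is the 1-based index T
        y, u, v = split_yuv(frame)
        if t == 1:                                       # first frame: spatial pre-filter only
            out.append(cv2.GaussianBlur(frame, (3, 3), 0))
            continue
        prev_y, prev_u, prev_v = split_yuv(out[-1])      # T-1-th frame, already denoised
        flow = dis.calc(prev_y, y, None)                 # step S21: motion vector
        fx, fy = flow[..., 0], flow[..., 1]
        la_y = warp_previous(prev_y, fx, fy)             # step S22: aligned image
        la_u = warp_previous(prev_u, fx, fy)
        la_v = warp_previous(prev_v, fx, fy)
        y_f = y.astype(np.float32)
        fm = (prev_y.astype(np.float32) + y_f) / 2       # step S31: average frame
        fdiff = (y_f - fm) ** 2                          # step S32: difference value
        w = noise_probability(y, la_y, fdiff)            # step S33: Wnoise
        o_y = range_only_bilateral(fuse(y_f, la_y, w))   # steps S421-S422
        o_u = cv2.medianBlur(fuse(u, la_u, w).astype(np.uint8), 3)  # step S423
        o_v = cv2.medianBlur(fuse(v, la_v, w).astype(np.uint8), 3)
        o_y8 = np.clip(o_y, 0, 255).astype(np.uint8)
        out.append(cv2.cvtColor(cv2.merge([o_y8, o_u, o_v]),
                                cv2.COLOR_YUV2BGR))      # step S424: repack components
    return out
```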
Further, as shown in fig. 4, based on the above video denoising method, the present invention also provides an intelligent terminal, which includes a processor 10, a memory 20, and a display 30. Fig. 4 shows only some of the components of the smart terminal, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the intelligent terminal, such as a hard disk or memory of the intelligent terminal. In other embodiments the memory 20 may also be an external storage device of the intelligent terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used to store the application software installed on the intelligent terminal and all kinds of data, such as the program code of the installed applications, and may also be used to temporarily store data that has been or will be output. In one embodiment, the memory 20 stores a video denoising method program 40, and the video denoising method program 40 can be executed by the processor 10 to implement the video denoising method in the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or other data Processing chip, which is used to run program codes stored in the memory 20 or process data, such as executing the video noise reduction method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the intelligent terminal and for displaying a visual user interface. The components 10-30 of the intelligent terminal communicate with each other via a system bus.
In one embodiment, when the processor 10 executes the video noise reduction method program 40 in the memory 20, the following steps are implemented:
acquiring a T-th frame image in a video file to be processed, wherein T is a natural number which is greater than zero and less than or equal to the number of frame images in the video file;
when T is more than or equal to 2, calculating an alignment image corresponding to the T frame image in the video file according to the T-1 frame image in the video file;
calculating, according to the T-1-th frame image and the alignment image, the probability value that each pixel point in the T-th frame image is a noise pixel;
and fusing the T frame image and the alignment image according to the probability value to generate a T noise reduction frame image corresponding to the T frame image.
Optionally, in the video denoising method, when T is equal to 1, the T-th frame image is filtered according to a preset pre-filter, so as to generate a T-th denoised frame image corresponding to the T-th frame image.
Optionally, the video denoising method, wherein when T is greater than or equal to 2, calculating an aligned image corresponding to a T-th frame image in the video file according to a T-1-th frame image in the video file, specifically includes:
calculating optical flow values of a T-1 frame image and a T frame image in the video file to generate a motion vector;
and calculating an alignment image corresponding to the T frame image according to the motion vector.
Optionally, the video denoising method, wherein the calculating optical flow values of a T-1 th frame image and a T-th frame image in the video file to generate a motion vector specifically includes:
splitting the T-1 th frame image and the T-th frame image respectively according to a preset splitting rule to generate a first pyramid corresponding to the T-1 th frame image and a second pyramid corresponding to the T-th frame image;
calculating a first optical flow value between a first top level image in the first pyramid and a second top level image in the second pyramid;
and calculating a second optical flow value between the bottom-layer images of the first pyramid and the second pyramid according to the first optical flow value, and using the second optical flow value as a motion vector between the T-1-th frame image and the T-th frame image.
Optionally, in the video denoising method, the splitting rule includes a gaussian pyramid splitting rule and a laplacian pyramid splitting rule.
Optionally, the video denoising method, wherein the first optical flow values include a first sparse optical flow value and a first dense optical flow value, and the calculating first optical flow values between the first top-level image in the first pyramid and the second top-level image in the second pyramid specifically includes:
calculating block integral images corresponding to the first top-level image and the second top-level image;
performing an inverse-search solution on the block integral images to generate a first sparse optical flow value corresponding to the first top-level image and the second top-level image;
and calculating a first dense optical flow value corresponding to the first top-level image and the second top-level image according to the first sparse optical flow value, as in the sketch below.
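This block-integral-image and inverse-search description closely matches the published DIS ("Dense Inverse Search") optical flow algorithm; whether this disclosure intends that exact algorithm is an assumption, but OpenCV's implementation can stand in for the step:

```python
import cv2
import numpy as np

# DIS solves sparse inverse-search matches on patch (block) integrals and
# then densifies them -- the same sparse-then-dense structure as above.
dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_MEDIUM)

# Stand-in top-level pyramid images (8-bit, single channel).
prev_top = np.random.randint(0, 256, (90, 160), dtype=np.uint8)
curr_top = np.roll(prev_top, 2, axis=1)    # simulated 2-pixel horizontal shift
flow = dis.calc(prev_top, curr_top, None)  # sparse inverse search, then densified
print(flow.shape)                          # (90, 160, 2)
```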
Optionally, in the video noise reduction method, splitting the T-1-th frame image and the T-th frame image respectively according to the preset splitting rule to generate the first pyramid corresponding to the T-1-th frame image and the second pyramid corresponding to the T-th frame image specifically includes:
respectively performing YUV channel separation on the T-1-th frame image and the T-th frame image to generate a first Y component, a first U component and a first V component corresponding to the T-1-th frame image, and a second Y component, a second U component and a second V component corresponding to the T-th frame image;
and respectively downsampling the first Y component and the second Y component according to a preset pyramid rule to generate the first pyramid corresponding to the T-1-th frame image and the second pyramid corresponding to the T-th frame image, for example as follows.
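A short sketch of the channel separation and luma-only pyramid, assuming OpenCV's BGR-to-YUV conversion and pyrDown as the preset pyramid rule:

```python
import cv2

def build_luma_pyramid(frame_bgr, levels=3):
    # YUV channel separation; only the Y (luma) plane is downsampled into
    # a Gaussian pyramid, while the U/V planes stay at full resolution for
    # the later correction step.
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)
    y, u, v = cv2.split(yuv)
    pyramid = [y]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid, u, v
```

Restricting the pyramid to the luma plane keeps the flow estimation cheap while still using the channel that carries most of the structure.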
Optionally, in the video noise reduction method, the aligned image includes a third Y component, a third U component and a third V component, and calculating the aligned image corresponding to the T-th frame image according to the motion vector specifically includes:
correcting the first Y component (the bottom layer of the first pyramid) according to the motion vector to generate the third Y component;
correcting the first U component according to the motion vector to generate the third U component;
and correcting the first V component according to the motion vector to generate the third V component; a warping sketch follows.
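A hypothetical warping helper: the same dense motion field is applied to each plane by backward resampling. That the "correction" is a backward warp is an assumption; the disclosure only says the components are corrected according to the motion vector.

```python
import cv2
import numpy as np

def warp_by_flow(plane, flow):
    # Backward warping: sample the previous frame's plane along the motion
    # vectors so it lines up with the current frame.
    h, w = plane.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    return cv2.remap(plane, grid_x + flow[..., 0], grid_y + flow[..., 1],
                     cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)

# third_y = warp_by_flow(first_y, flow)  # and likewise for the U and V planes
```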
Optionally, in the video noise reduction method, calculating, according to the T-1-th frame image and the aligned image, the probability value that each pixel point in the T-th frame image is a noise pixel specifically includes:
calculating the per-pixel average of the T-1-th frame image and the T-th frame image to generate an average frame image;
calculating a difference value corresponding to the T-th frame image according to the average frame image;
and calculating, according to the difference value and the aligned image, the probability value that each pixel in the T-th frame image is a noise pixel, for example as sketched below.
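The disclosure gives no concrete formula for this step, so the following is a hypothetical mapping in which a pixel is judged noisy when the alignment residual is small (the motion is explained) but the frame-to-average difference is still visible:

```python
import cv2
import numpy as np

def noise_probability(prev_y, curr_y, aligned_y, sigma=10.0):
    # Hypothetical realisation of the three sub-steps: average frame,
    # difference value, then a probability from difference and alignment.
    avg = cv2.addWeighted(prev_y, 0.5, curr_y, 0.5, 0).astype(np.float32)
    diff = np.abs(curr_y.astype(np.float32) - avg)
    residual = np.abs(curr_y.astype(np.float32) - aligned_y.astype(np.float32))
    # A small alignment residual together with a visible frame difference
    # suggests noise rather than motion (a chosen mapping, not the patent's).
    motion_term = np.exp(-(residual ** 2) / (2.0 * sigma ** 2))
    change_term = 1.0 - np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
    return motion_term * change_term
```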
Optionally, in the video noise reduction method, fusing the T-th frame image and the aligned image according to the probability value to generate the T-th noise reduction frame image corresponding to the T-th frame image specifically includes:
determining a first weight value corresponding to the T-th frame image and a second weight value corresponding to the aligned image according to the probability value;
and fusing the T-th frame image and the aligned image according to the first weight value and the second weight value to generate the T-th noise reduction frame image corresponding to the T-th frame image.
Optionally, in the video noise reduction method, fusing the T-th frame image and the aligned image according to the first weight value and the second weight value to generate the T-th noise reduction frame image corresponding to the T-th frame image specifically includes:
according to the first weight value and the second weight value, fusing the second Y component with the third Y component, the second U component with the third U component, and the second V component with the third V component, to respectively generate a fourth Y component, a fourth U component and a fourth V component;
filtering the fourth Y component with a preset bilateral filter to generate a fifth Y component, wherein the bilateral filter does not include a spatial-domain kernel;
filtering the fourth U component and the fourth V component according to a preset nonlinear filter to generate a fifth U component and a fifth V component;
and fusing the fifth Y component, the fifth U component and the fifth V component to generate the T-th noise reduction frame image corresponding to the T-th frame image; a combined sketch follows.
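A combined sketch of the fusion and filtering, under several stated assumptions: the aligned image is weighted by the noise probability (the disclosure does not give the weight formula), a bilateral filter with a very large spatial sigma over a small window approximates the "no spatial-domain kernel" (range-only) behaviour, and a median filter stands in for the unspecified nonlinear chroma filter.

```python
import cv2
import numpy as np

def fuse_and_filter(curr_y, curr_u, curr_v, al_y, al_u, al_v, prob):
    # Probability-weighted per-channel fusion (weight assignment is a
    # sketch: the aligned image gets weight prob, frame T gets 1 - prob).
    w2 = prob.astype(np.float32)
    w1 = 1.0 - w2

    def fuse(a, b):
        return np.clip(w1 * a + w2 * b, 0, 255).astype(np.uint8)

    y4, u4, v4 = fuse(curr_y, al_y), fuse(curr_u, al_u), fuse(curr_v, al_v)

    # Range-only stand-in for a bilateral filter without a spatial kernel:
    # a huge sigmaSpace makes the spatial weights near-uniform in the window.
    y5 = cv2.bilateralFilter(y4, d=5, sigmaColor=30, sigmaSpace=1e6)
    # Median filtering is one common nonlinear choice for the chroma planes.
    u5 = cv2.medianBlur(u4, 3)
    v5 = cv2.medianBlur(v4, 3)
    return cv2.cvtColor(cv2.merge([y5, u5, v5]), cv2.COLOR_YUV2BGR)
```

Filtering luma and chroma differently fits the claims' structure: edges live mostly in Y, so only Y gets the edge-preserving range filter.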
Optionally, the video noise reduction method further includes:
replacing the T-th frame image in the video file with the T-th noise reduction frame image;
and iteratively repeating the image noise reduction on the replaced video file until T equals the number of frame images in the video file, then generating and outputting the noise-reduced video; an end-to-end driver sketch follows.
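An end-to-end driver sketch tying the steps together. It reuses the hypothetical denoise_step routine from the first sketch above; the file names are placeholders, and colour handling is reduced to luma only for brevity.

```python
import cv2

cap = cv2.VideoCapture("input.mp4")                      # placeholder name
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("denoised.mp4",                    # placeholder name
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

prev_denoised = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if prev_denoised is None:                  # T == 1: spatial pre-filter only
        denoised = cv2.bilateralFilter(frame, 5, 25, 5)
    else:                                      # T >= 2: temporal fusion with T-1
        prev_y = cv2.cvtColor(prev_denoised, cv2.COLOR_BGR2GRAY)
        curr_y = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        denoised = cv2.cvtColor(denoise_step(prev_y, curr_y),
                                cv2.COLOR_GRAY2BGR)
    out.write(denoised)
    prev_denoised = denoised   # frame T is replaced by its denoised version

cap.release()
out.release()
```

Feeding each denoised frame back in as the next step's previous frame is what makes the scheme recursive: the temporal averaging accumulates across the whole video.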
The present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a video noise reduction method program which, when executed by a processor, implements the steps of the video noise reduction method described above.
Of course, those skilled in the art will understand that all or part of the processes of the methods in the embodiments described above can be implemented by instructing the relevant hardware (such as a processor or a controller) through a computer program, that the program can be stored in a computer-readable storage medium, and that the program, when executed, can include the processes of the method embodiments described above. The computer-readable storage medium may be a memory, a magnetic disk, an optical disk, or the like.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.
Claims (14)
1. A video denoising method for performing image denoising on frame images in a video file, the method comprising:
acquiring a T-th frame image in a video file to be processed, wherein T is a natural number which is greater than zero and less than or equal to the number of frame images in the video file;
when T is greater than or equal to 2, calculating an aligned image corresponding to the T-th frame image in the video file according to the T-1-th frame image in the video file;
calculating, according to the T-1-th frame image and the aligned image, the probability value that each pixel point in the T-th frame image is a noise pixel;
and fusing the T-th frame image and the aligned image according to the probability value to generate a T-th noise reduction frame image corresponding to the T-th frame image.
2. The video denoising method of claim 1, further comprising:
and when T is equal to 1, filtering the T-th frame image according to a preset filter to generate a T-th noise reduction frame image corresponding to the T-th frame image.
3. The method according to claim 1, wherein when T is greater than or equal to 2, calculating an aligned image corresponding to a T-th frame image in the video file according to a T-1-th frame image in the video file specifically includes:
calculating optical flow values between the T-1-th frame image and the T-th frame image in the video file to generate a motion vector;
and calculating the aligned image corresponding to the T-th frame image according to the motion vector.
4. The video denoising method according to claim 3, wherein calculating the optical flow values between the T-1-th frame image and the T-th frame image in the video file to generate the motion vector specifically comprises:
splitting the T-1-th frame image and the T-th frame image respectively according to a preset splitting rule to generate a first pyramid corresponding to the T-1-th frame image and a second pyramid corresponding to the T-th frame image;
calculating a first optical flow value between a first top level image in the first pyramid and a second top level image in the second pyramid;
and calculating, according to the first optical flow value, a second optical flow value between the bottom-layer image in the first pyramid and the bottom-layer image in the second pyramid, and using the second optical flow value as the motion vector between the T-1-th frame image and the T-th frame image.
5. The method of claim 4, wherein the splitting rule comprises a Gaussian pyramid splitting rule and a Laplacian pyramid splitting rule.
6. The method according to claim 4, wherein the first optical flow value comprises a first sparse optical flow value and a first dense optical flow value, and wherein calculating the first optical flow value between the first top-level image in the first pyramid and the second top-level image in the second pyramid specifically comprises:
calculating block integral images corresponding to the first top-level image and the second top-level image;
performing an inverse-search solution on the block integral images to generate a first sparse optical flow value corresponding to the first top-level image and the second top-level image;
and calculating a first dense optical flow value corresponding to the first top-level image and the second top-level image according to the first sparse optical flow value.
7. The method of claim 4, wherein splitting the T-1-th frame image and the T-th frame image according to the preset splitting rule to generate the first pyramid corresponding to the T-1-th frame image and the second pyramid corresponding to the T-th frame image specifically includes:
respectively performing YUV channel separation on the T-1-th frame image and the T-th frame image to generate a first Y component, a first U component and a first V component corresponding to the T-1-th frame image, and a second Y component, a second U component and a second V component corresponding to the T-th frame image;
and respectively downsampling the first Y component and the second Y component according to a preset pyramid rule to generate the first pyramid corresponding to the T-1-th frame image and the second pyramid corresponding to the T-th frame image.
8. The video denoising method of claim 6, wherein the aligned image comprises a third Y component, a third U component and a third V component, and calculating the aligned image corresponding to the T-th frame image according to the motion vector specifically includes:
according to the motion vector, correcting the first Y component (the bottom layer of the first pyramid) to generate the third Y component;
correcting the first U component according to the motion vector to generate a third U component;
and correcting the first V component according to the motion vector to generate a third V component.
9. The method of claim 1, wherein calculating, according to the T-1-th frame image and the aligned image, the probability value that each pixel point in the T-th frame image is a noise pixel specifically includes:
calculating the per-pixel average of the T-1-th frame image and the T-th frame image to generate an average frame image;
calculating a difference value corresponding to the T-th frame image according to the average frame image;
and calculating, according to the difference value and the aligned image, the probability value that each pixel in the T-th frame image is a noise pixel.
10. The video denoising method according to claim 7, wherein fusing the T-th frame image and the aligned image according to the probability value to generate the T-th noise reduction frame image corresponding to the T-th frame image specifically comprises:
determining a first weight value corresponding to the T-th frame image and a second weight value corresponding to the aligned image according to the probability value;
and fusing the T-th frame image and the aligned image according to the first weight value and the second weight value to generate the T-th noise reduction frame image corresponding to the T-th frame image.
11. The video denoising method according to claim 9, wherein fusing the T-th frame image and the aligned image according to the first weight value and the second weight value to generate the T-th noise reduction frame image corresponding to the T-th frame image specifically comprises:
according to the first weight value and the second weight value, fusing the second Y component with the third Y component, the second U component with the third U component, and the second V component with the third V component, to respectively generate a fourth Y component, a fourth U component and a fourth V component;
filtering the fourth Y component with a preset bilateral filter to generate a fifth Y component, wherein the bilateral filter does not include a spatial-domain kernel;
filtering the fourth U component and the fourth V component according to a preset nonlinear filter to generate a fifth U component and a fifth V component;
and fusing the fifth Y component, the fifth U component and the fifth V component to generate a T-th noise reduction frame image corresponding to the T-th frame image.
12. The video denoising method according to any one of claims 1 to 10, further comprising:
replacing the T-th frame image in the video file with the T-th noise reduction frame image;
and iteratively repeating the image noise reduction on the replaced video file until T equals the number of frame images in the video file, then generating and outputting a noise-reduced video.
13. An intelligent terminal, characterized in that the intelligent terminal comprises: a memory, a processor, and a video noise reduction method program stored on the memory and executable on the processor, wherein the video noise reduction method program, when executed by the processor, implements the steps of the video noise reduction method according to any one of claims 1-12.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a video noise reduction method program which, when executed by a processor, implements the steps of the video noise reduction method according to any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011423723.4A CN114612312A (en) | 2020-12-08 | 2020-12-08 | Video noise reduction method, intelligent terminal and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011423723.4A CN114612312A (en) | 2020-12-08 | 2020-12-08 | Video noise reduction method, intelligent terminal and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114612312A (en) | 2022-06-10 |
Family
ID=81856374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011423723.4A Pending CN114612312A (en) | 2020-12-08 | 2020-12-08 | Video noise reduction method, intelligent terminal and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114612312A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060232712A1 (en) * | 2005-04-14 | 2006-10-19 | Samsung Electronics Co., Ltd. | Method of motion compensated temporal noise reduction |
US20080112649A1 (en) * | 2006-11-14 | 2008-05-15 | Siemens Corporate Research, Inc. | Method and System for Dual Energy Image Registration |
JP2013041565A (en) * | 2011-07-21 | 2013-02-28 | Sharp Corp | Image processor, image display device, image processing method, computer program, and recording medium |
CN103985106A (en) * | 2014-05-16 | 2014-08-13 | 三星电子(中国)研发中心 | Equipment and method used for multi-frame fusion of strong noise images |
CN106355559A (en) * | 2016-08-29 | 2017-01-25 | 厦门美图之家科技有限公司 | Image sequence denoising method and device |
CN108694705A (en) * | 2018-07-05 | 2018-10-23 | 浙江大学 | A kind of method multiple image registration and merge denoising |
CN111127347A (en) * | 2019-12-09 | 2020-05-08 | Oppo广东移动通信有限公司 | Noise reduction method, terminal and storage medium |
CN111935425A (en) * | 2020-08-14 | 2020-11-13 | 字节跳动有限公司 | Video noise reduction method and device, electronic equipment and computer readable medium |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115841425A (en) * | 2022-07-21 | 2023-03-24 | 爱芯元智半导体(上海)有限公司 | Video noise reduction method and device, electronic equipment and computer readable storage medium |
CN115841425B (en) * | 2022-07-21 | 2023-11-17 | 爱芯元智半导体(宁波)有限公司 | Video noise reduction method and device, electronic equipment and computer readable storage medium |
CN116132652A (en) * | 2023-01-31 | 2023-05-16 | 格兰菲智能科技有限公司 | Text image processing method, device, equipment and storage medium |
CN118365554A (en) * | 2024-06-19 | 2024-07-19 | 深圳市超像素智能科技有限公司 | Video noise reduction method, device, electronic equipment and computer readable storage medium |
CN118365554B (en) * | 2024-06-19 | 2024-08-23 | 深圳市超像素智能科技有限公司 | Video noise reduction method, device, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bao et al. | Memc-net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement | |
Tai et al. | Correction of spatially varying image and video motion blur using a hybrid camera | |
Maggioni et al. | Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms | |
US8208565B2 (en) | Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering | |
Rao et al. | A Survey of Video Enhancement Techniques. | |
US7809207B2 (en) | Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering | |
Tai et al. | Image/video deblurring using a hybrid camera | |
CN109819321B (en) | Video super-resolution enhancement method | |
US9202263B2 (en) | System and method for spatio video image enhancement | |
CN114612312A (en) | Video noise reduction method, intelligent terminal and computer readable storage medium | |
JP2010218547A (en) | Method for up-sampling of image | |
JP2010218548A (en) | Method for synthesizing virtual image | |
Zheng et al. | Ultra-high-definition image hdr reconstruction via collaborative bilateral learning | |
RU2683857C2 (en) | Enhancing motion pictures with accurate motion information | |
CN116894770A (en) | Image processing method, image processing apparatus, and computer program | |
Jeong et al. | Multi-frame example-based super-resolution using locally directional self-similarity | |
CN114449181B (en) | Image and video processing method and system, data processing device and medium | |
Buades et al. | CFA video denoising and demosaicking chain via spatio-temporal patch-based filtering | |
CN116468636A (en) | Low-illumination enhancement method, device, electronic equipment and readable storage medium | |
CN111311498B (en) | Image ghost eliminating method and device, storage medium and terminal | |
Peng et al. | Efficient image resolution enhancement using edge-directed unsharp masking sharpening for real-time ASIC applications | |
Martinello et al. | Depth estimation from a video sequence with moving and deformable objects | |
CN108174056A (en) | A kind of united low-light vedio noise reduction method in time-space domain | |
WO2005020584A1 (en) | Method and system for pre-processing of video sequences to achieve better compression | |
Buades et al. | Denoising of Noisy and Compressed Video Sequences. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||