CN105791980B - Method for renovating film and television works based on resolution improvement

Info

Publication number: CN105791980B
Application number: CN201610109909.XA
Authority: CN
Inventors: 张宏志; 赵秋实; 左旺孟; 石坚; 张垒磊
Original assignee: Harbin Super-Resolution Fx Technology Co Ltd
Current assignee: Harbin Super-Resolution Fx Technology Co Ltd
Priority date: 2016-02-29
Filing date: 2016-02-29
Publication date: 2018-09-14
Anticipated expiration: 2036-02-29
Also published as: CN105791980A

Abstract

For film and television works with low resolution and low definition, the invention proposes a renovation method based on resolution improvement. The scheme is as follows: first, obtain the resolution of the original video and the target resolution, and calculate the scaling factor; second, divide the input video into image frames according to a certain division scheme; then, transform the frames according to a pre-stored mapping relation to obtain high-resolution video frames; finally, combine the high-resolution video frames into a high-resolution video. The pre-stored mapping relation is learned with a hybrid expert (mixture-of-experts) model, and this learning is completed offline on a computer. The method has the advantages of good adaptivity, high speed, good results and extensibility.

Description

Movie and television work renovation method based on resolution improvement

Technical Field

The invention belongs to the field of computer vision and image processing. It relates to the renovation of film and television works, and in particular to a method and a system for renovating film and television works based on resolution improvement.

Background

With the development of video acquisition, transmission, storage and display technologies, film and television works keep moving toward higher resolution. Viewers' expectations keep rising, and they increasingly seek high-resolution, high-definition film and television works. Meanwhile, the advent of high-resolution display devices (such as 4K and 5K televisions and monitors) has made the popularization of high-resolution film and television works possible.

On the other hand, many older classic film and television works have low resolution, low definition and poor visual quality because of the technical limitations of their time. In addition, because of their age and long storage time, and because of damage from external factors such as natural disasters and war, the films may suffer various kinds of quality degradation, such as damage, flicker, noise and jitter. People want to revisit classic film and television works, yet they also have new expectations for picture quality. Film and television renovation technology arose to meet both demands. In essence, renovating a film or television work means applying image and video processing techniques to the original video to eliminate the various kinds of quality degradation and thereby improve its visual effect.

Like books, film and television works are important cultural carriers of human society, and some classic works retain irreplaceable cultural value despite their age. Renovating and remastering old film and television works is therefore of great significance. Specifically, the significance of renovating film and television works includes the following aspects:

1. Some classic film and television works, such as documentaries, are precious historical records. Renovating them allows these records to be better preserved and disseminated.

2. Renovating classic film and television works so that more modern viewers can appreciate them is an important form of cultural and artistic inheritance.

3. Renovating classic film and television works gives classic works of art a new lease of life, and is the greatest respect and tribute to the artists.

Existing methods for improving the visual effect of film and television works mainly rely on video enhancement, such as denoising, deblurring, deinterlacing, contrast enhancement and color enhancement. These methods enhance the visual effect of the original video but do not raise its resolution, so they do not fundamentally meet the demand for renovating classic film and television works.

Resolution improvement means generating a high-resolution video from a low-resolution video (or video frame) quickly and effectively by some method. The difficulty lies in breaking through the pixel-count limit of the original low-resolution video: filling in pixels that did not originally exist while preserving the structure and texture of the original video, so that the result looks natural and plausible to the human eye.

Traditional resolution improvement methods are mainly interpolation-based, reconstruction-based or learning-based. Interpolation-based methods compute each missing pixel as a linear combination of existing pixels; they are simple and fast but prone to mosaic artifacts or over-smoothing. Reconstruction-based algorithms perform registration and reconstruction by exploiting the similarity of multiple frames, but they are usually only a simple combination of those frames, and the results are not ideal. Learning-based algorithms use a certain amount of training data and a specific training algorithm to obtain the mapping from low-resolution video to high-resolution video; they place high demands on the model, are prone to over-fitting or under-fitting, and are computationally expensive, slow and of limited practicality. The video resolution improvement problem has therefore long troubled users.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the invention provides a method and a system for renewing film and television works based on video resolution improvement, which convert low-resolution film and television works (usually lower than 720P) into higher-resolution videos (such as 1080P, 4K and the like) through a resolution improvement technology to realize the renewal of the film and television works.

The technical solution of the invention is as follows: first, acquire the resolution of the original video and the target resolution, and calculate the scaling factor; second, divide the input video into image frames according to a certain division scheme; then, transform the frames according to a pre-stored mapping relation to obtain high-resolution video frames; finally, combine the high-resolution video frames into a high-resolution video. The pre-stored mapping relation model is learned with a hybrid expert model, and the model training is completed offline on a computer. The specific steps are as follows:

Learning the mapping relation model:

(1) preprocessing the training video

(1.1) selecting a high-resolution video as a training sample, and splitting the high-resolution video into high-resolution video frames;

(1.2) convolving the high-resolution video frames obtained in step (1.1) with a Gaussian kernel;

(1.3) calculating the magnification factor from the original low-resolution video and the target high-resolution video, and sampling the convolved frames at intervals of that factor (downsampling) to obtain the corresponding low-resolution video frames;

(1.4) dividing the high-resolution video frames and the sampled low-resolution video frames into blocks, which serve as the training data.
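Steps (1.2) and (1.3) can be sketched as follows. This is a minimal illustration, not the patented implementation: frames are assumed to be single-channel lists of rows, and the kernel radius, the border clamping and the function names (`gaussian_blur`, `decimate`, `make_low_res`) are hypothetical choices that the text does not specify.

```python
import math

def gaussian_kernel_1d(sigma=1.0, radius=3):
    """Normalized zero-mean 1-D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border (assumption)."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(c + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for c in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma=1.0, radius=3):
    """Separable 2-D Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def decimate(image, factor):
    """Keep every `factor`-th row and column."""
    return [row[::factor] for row in image[::factor]]

def make_low_res(frame, factor, sigma=1.0):
    """Blur a high-resolution frame, then sample it at intervals of `factor`."""
    return decimate(gaussian_blur(frame, sigma), factor)

high = [[float((r + c) % 7) for c in range(16)] for r in range(8)]
low = make_low_res(high, 4)
print(len(low), len(low[0]))  # 2 4
```

Blurring before sampling suppresses the aliasing that plain decimation would introduce, which is presumably why the convolution step precedes the sampling step.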

(2) Obtaining a mapping relation model based on a hybrid expert model

(2.1) initializing the hybrid expert model. The hybrid expert model consists of two parts, experts and gate functions, arranged in a tree structure, as shown in Fig. 2. The leaf nodes of the tree are called experts and are responsible for mapping and transforming the data; the root node is called the gate function and is responsible for selecting the appropriate expert for the data. The present invention uses a linear function as the expert function:

$y = Wx$

where W is an expert function parameter and x and y represent a block of low resolution video frames and a corresponding block of high resolution video frames, respectively.

The gate function is responsible for deciding which expert to select for transforming the data, and in the present invention, the ith gate function is expressed as:

where x and y represent a low-resolution video frame block and the corresponding high-resolution video frame block, respectively, $v_i$ denotes the i-th gate function parameter, $v_j$ denotes the j-th gate function parameter, and K is the number of experts in the hybrid expert model, that is, the number of leaf nodes in the tree structure.

The initialization of the hybrid expert model comprises the following steps:

(2.1.1) specifying a number K of experts;

(2.1.2) assuming that the probability distribution of each expert follows a Gaussian distribution: $p(y \mid x, W_i) = N(y(x, W_i), \sigma)$, where $W_i$ denotes the parameter of the i-th expert and σ is the standard deviation of the Gaussian distribution. The parameter $W_i$ is also assumed to follow a Gaussian distribution: $p(W_i) = N(0, \mu)$, where μ denotes the mean of the Gaussian distribution.

(2.1.3) clustering the training data into K clusters with the K-means algorithm, where the initial value $W_i^{(0)}$ of each expert parameter is set to the within-class slope and the initial value $v_i^{(0)}$ of each gate function parameter is set to the cluster center;

(2.1.4) calculate the initial value of each gate function:

where x denotes a low-resolution video frame block, $v_i^{(0)}$ denotes the initial value of the i-th gate function parameter, and K is the number of experts in the hybrid expert model, that is, the number of leaf nodes in the tree structure.
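The gate function formula itself is not reproduced in this text. A form consistent with the symbols defined above ($v_i$, $v_j$ and a sum over K experts) is the softmax gate that is standard for mixture-of-experts models, $g_i(x, v_i) = \exp(v_i^T x) / \sum_{j=1}^{K} \exp(v_j^T x)$. The sketch below assumes that form; the function name `gate_outputs` and the example values are hypothetical.

```python
import math

def gate_outputs(x, gate_params):
    """Assumed softmax gate: g_i = exp(v_i . x) / sum_j exp(v_j . x)."""
    scores = [sum(vi * xi for vi, xi in zip(v, x)) for v in gate_params]
    top = max(scores)  # subtract the maximum score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

x = [1.0, 0.0]                                # a flattened low-resolution block
v = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]     # K = 3 gate parameters v_i
g = gate_outputs(x, v)
best = g.index(max(g))                        # index of the preferred expert
```

Under this assumption the K outputs are positive and sum to 1, so they can be read as the probability that each expert is responsible for the block x.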

(2.2) Using the training data obtained in step (1.4), iteratively optimize the hybrid expert model until the iteration converges; the model parameters finally obtained constitute the mapping relation model. The mapping relation model includes the gate function parameters and the expert parameters.

(2.2.1) specifying an allowable error epsilon at the termination of the iteration;

(2.2.2) calculating the posterior probability of each gate function in the current iteration:

where k is the iteration index, $p_i(y \mid x, W_i^{(k)})$ and $p_j(y \mid x, W_j^{(k)})$ denote the probability distributions of the experts, and $g_i^{(k)}(x, v_i^{(k)})$ denotes the value of the i-th gate function at the k-th iteration.

(2.2.3) updating each expert parameter:

where k is the iteration index, X is the vector formed by all low-resolution video frame blocks x in the training data, Y is the vector formed by all high-resolution video frame blocks y in the training data, $X^T$ denotes the transpose of X, I denotes the identity matrix, and $H_i^{(k+1)}$ denotes the vector formed by the posterior probabilities, for the i-th expert at step k + 1, of all low-resolution video frame blocks x.

(2.2.4) updating each gate function parameter:

where $v_i^{(k)}$ denotes the i-th gate function parameter at the k-th iteration, $h_i^{(k)}$ is the posterior probability of the i-th gate function at the k-th iteration, and $x^{(t)}$ denotes the t-th low-resolution video frame block.

(2.2.5) calculating the output of each gate function in the current iteration:

(2.2.6) calculating the likelihood probability in the current iteration:

where $p_i(y \mid x, W_i^{(k+1)})$ denotes the probability distribution of the experts and $p(W_i^{(k+1)})$ denotes the probability distribution of the expert parameters.

(2.2.7) judging whether the iteration has converged. The iteration ends when the absolute value of the difference between the likelihood of the current iteration and that of the previous iteration is smaller than the allowable error ε; otherwise, steps (2.2.2) to (2.2.7) are repeated.
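The formulas for steps (2.2.2), (2.2.3) and (2.2.6) are not reproduced in this text. As a hedged sketch, assuming the usual expectation-maximization updates for a mixture-of-experts model with Gaussian experts and a Gaussian prior on $W_i$, they could take the form below. The ridge weight $\lambda$ (which would be determined by σ and μ), the stacking of X and Y with one block per row, the placement of $H_i^{(k+1)}$ on a diagonal, and the indexing of the posterior by k + 1 are all assumptions rather than statements from the source. The gate update of step (2.2.4) is left out because its exact form cannot be inferred from the surrounding definitions.

```latex
% Step (2.2.2): posterior responsibility of expert i for a training pair (x, y)
h_i^{(k+1)} = \frac{g_i^{(k)}(x, v_i^{(k)})\, p_i(y \mid x, W_i^{(k)})}
                   {\sum_{j=1}^{K} g_j^{(k)}(x, v_j^{(k)})\, p_j(y \mid x, W_j^{(k)})}

% Step (2.2.3): weighted least squares with a ridge term arising from the prior p(W_i)
\left( W_i^{(k+1)} \right)^{T}
  = \left( X^{T} H_i^{(k+1)} X + \lambda I \right)^{-1} X^{T} H_i^{(k+1)} Y

% Step (2.2.6): log-likelihood plus log-prior, compared between iterations in step (2.2.7)
L^{(k+1)} = \sum_{t} \log \sum_{i=1}^{K}
              g_i^{(k+1)}(x^{(t)}, v_i^{(k+1)})\,
              p_i(y^{(t)} \mid x^{(t)}, W_i^{(k+1)})
          + \sum_{i=1}^{K} \log p(W_i^{(k+1)})
```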

The gate function parameters $v_i$ obtained at the end of the iteration, together with the number of experts K, the expert parameters $W_i$, the standard deviation σ of the experts' probability distribution, and the mean μ of the expert parameters' probability distribution, are stored on disk as the final mapping relation model.
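One way to store and reload such a model is sketched below. The text only says that the parameters are stored on disk; the JSON layout, the field names and the functions `save_model` and `load_model` are hypothetical illustrations.

```python
import json
import os
import tempfile

def save_model(path, gate_params, expert_params, sigma, mu):
    """Write v_i, W_i, K, sigma and mu to a JSON file (hypothetical layout)."""
    model = {
        "K": len(expert_params),
        "gate_params": gate_params,      # one vector v_i per expert
        "expert_params": expert_params,  # one matrix W_i per expert
        "sigma": sigma,
        "mu": mu,
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)

def load_model(path):
    """Read the model back and check that K matches the parameter counts."""
    with open(path, "r", encoding="utf-8") as handle:
        model = json.load(handle)
    if model["K"] != len(model["expert_params"]) or model["K"] != len(model["gate_params"]):
        raise ValueError("inconsistent model file: K does not match the parameter count")
    return model

path = os.path.join(tempfile.mkdtemp(), "mapping_model.json")
save_model(path,
           gate_params=[[1.0, 0.0], [0.0, 1.0]],
           expert_params=[[[2.0, 0.0]], [[0.0, 3.0]]],
           sigma=0.32, mu=0.58)
restored = load_model(path)
```

Because training happens offline, the resolution-improvement stage only needs to call something like `load_model` once and can then keep the parameters in memory.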

After the mapping relation model is learned and stored, the resolution of the video is improved by using the stored mapping relation model:

(3) pre-processing low resolution video to be processed

(3.1) splitting the low-resolution video into low-resolution video frames;

(3.2) dividing the low-resolution video frame obtained in the step (3.1) into low-resolution video frame blocks;

(4) upgrading the low-resolution video to a high-resolution video according to the mapping relation model obtained in step (2), comprising the following steps:

(4.1) taking the low-resolution video frame block obtained in the step (3) as an input of a gate function, and calculating the output of each gate function by using gate function parameters in the mapping relation model obtained in the step (2):

where x is the input low-resolution video frame block, K is the number of experts in the hybrid expert model, and $v_i$ denotes the i-th gate function parameter obtained in step (2.2).

(4.2) calculating the corresponding high-resolution video frame block by using the parameter of the expert function corresponding to the gate function with the maximum output value, wherein the parameter of the expert function is obtained in the step (2);

(4.2.1) calculating the number of the gate function that attains the maximum output value: $i = \arg\max(g_i)$,

where $g_i$, the output of the i-th gate function, is obtained in step (4.1).

(4.2.2) calculating the high-resolution video frame block using the i-th expert function: $y = W_i x$,

where $W_i$ is the parameter of the i-th expert function and y is the high-resolution video frame block corresponding to the input low-resolution video frame block x.
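Steps (4.2.1) and (4.2.2) amount to a hard selection of one expert followed by a matrix-vector product. The sketch below assumes the gate is a softmax over the linear scores $v_i^T x$; since a softmax preserves the ordering of its inputs, the arg max can then be taken over the scores directly. The names `select_expert`, `matvec` and `upscale_block`, and the toy parameters, are hypothetical.

```python
def select_expert(x, gate_params):
    """Index i of the gate with the largest score v_i . x (same arg max as a softmax gate)."""
    scores = [sum(vi * xi for vi, xi in zip(v, x)) for v in gate_params]
    return max(range(len(scores)), key=scores.__getitem__)

def matvec(matrix, vector):
    """Plain matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, vector)) for row in matrix]

def upscale_block(x, gate_params, expert_params):
    """Pick the expert with the largest gate output and apply its linear map."""
    i = select_expert(x, gate_params)
    return i, matvec(expert_params[i], x)

x = [1.0, 2.0]                                   # flattened low-resolution block
gates = [[1.0, 0.0], [0.0, 1.0]]                 # v_0, v_1
experts = [
    [[9.0, 9.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0]],   # W_0 (not selected here)
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],   # W_1
]
index, y = upscale_block(x, gates, experts)
print(index, y)  # 1 [1.0, 2.0, 3.0, 1.5]
```

Only one expert is evaluated per block, so the per-block cost is a single matrix-vector product regardless of K, apart from the K gate scores.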

(4.3) carrying out resolution enhancement on each low-resolution video frame block according to the steps of (4.1) and (4.2) to obtain a corresponding high-resolution video frame block, and splicing all the high-resolution video frame blocks into corresponding high-resolution video frames according to the positions of the corresponding low-resolution video frame blocks in the low-resolution video frames;

(4.4) obtaining the high-resolution video frames corresponding to all the low-resolution video frames, and combining them into a high-resolution video.
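The splicing in step (4.3) can be sketched as follows, assuming non-overlapping blocks and an integer scale factor, so that a low-resolution block whose top-left pixel is at (top, left) lands at (top × scale, left × scale) in the high-resolution frame. The function `stitch_blocks` and its arguments are hypothetical.

```python
def stitch_blocks(blocks, lr_height, lr_width, block, scale):
    """Assemble upscaled blocks into one high-resolution frame.

    `blocks` maps the (top, left) position of each low-resolution block
    to its upscaled block, given as a list of rows.
    """
    frame = [[0.0] * (lr_width * scale) for _ in range(lr_height * scale)]
    out_size = block * scale
    for (top, left), hr_block in blocks.items():
        for r in range(out_size):
            row = frame[top * scale + r]
            row[left * scale:left * scale + out_size] = hr_block[r]
    return frame

# Four 2 x 2 low-resolution blocks, each upscaled by 2 to a constant 4 x 4 block.
blocks = {
    (0, 0): [[0.0] * 4 for _ in range(4)],
    (0, 2): [[1.0] * 4 for _ in range(4)],
    (2, 0): [[2.0] * 4 for _ in range(4)],
    (2, 2): [[3.0] * 4 for _ in range(4)],
}
frame = stitch_blocks(blocks, lr_height=4, lr_width=4, block=2, scale=2)
```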

In the step (4), there is no dependency relationship between video frame blocks and between video frames, so that the step can be accelerated in parallel by using a GPU processor.
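The independence between frames can be illustrated with a worker pool. The sketch below uses a CPU thread pool from the Python standard library as a stand-in for the GPU mentioned above, and `upscale_frame` is a placeholder (pixel replication) rather than the expert mapping; both are assumptions made only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def upscale_frame(frame, scale=2):
    """Placeholder per-frame transform: replicate each pixel scale x scale times."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out

def upscale_video(frames, workers=4):
    """Process frames independently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upscale_frame, frames))

frames = [[[i, i + 1], [i + 2, i + 3]] for i in range(6)]
result = upscale_video(frames)
```

Because no frame reads another frame's output, the frames can be handed to workers in any order and reassembled by index. The same property holds for blocks within a frame.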

The method for renewing the film and television works based on the resolution improvement can be used in the form of a computer software player, and can also be integrated into a hardware platform (such as a set top box, an intelligent television and the like) for use.

The method for renovating film and television works based on resolution improvement can be combined with other video enhancement methods, used as preprocessing or post-processing steps, to further improve the visual effect.

Compared with the prior art, the invention has the advantages that: from the visual effect, the high-resolution video obtained by implementing the scheme of the invention has complete details, clear edges, good texture maintenance, and is fast and stable. Specifically, the features of the present invention include:

1. Adaptivity. The scheme of the invention calculates the scaling factor adaptively and can meet different resolution improvement requirements.

2. High speed. Since there is no dependency between video frames, the processing speed can be increased by parallel processing. In addition, the algorithm processes the video frames by linear mapping, and the mapping parameters can be stored in memory in advance, which further improves the processing speed.

3. Good results. The mapping parameters used for resolution improvement are learned with the hybrid expert model, which overcomes the drawback of traditional learning-based resolution improvement algorithms, in which data partitioning and sub-model learning are carried out separately. The method also combines the robustness of statistical methods with the accuracy of learning-based algorithms, overcomes the inability of earlier learning-based algorithms to exploit large amounts of data, achieves higher precision than purely statistical methods, and handles even high-resolution video well and quickly.

4. Extensibility. Because there is no dependency between video frames, parallel processing can be realized with techniques such as GPU acceleration to increase the processing speed. In addition, the algorithm provided by the invention can be applied directly to image resolution improvement.

Drawings

Fig. 1 is a flowchart of a method for refreshing a movie or television work based on resolution enhancement according to the present invention.

FIG. 2 is a schematic diagram of a hybrid expert model according to the present invention.

Fig. 3 is a schematic diagram illustrating the division of a video frame into video frame blocks according to the present invention.

Detailed Description

The process according to the invention is illustrated in the following detailed description by way of example.

According to the method for refreshing the film and television works based on resolution enhancement, the process of enhancing a part of video with the original resolution of 768 × 432 to 3072 × 1728 comprises the following steps:

(1) preprocessing training video

(1.1) selecting a high-resolution film or television work, reading in its video stream with video processing software, and storing each frame of the stream as a video frame; in this embodiment, the work is 1200 seconds long with a frame rate of 25 frames/second, so the total number of video frames obtained is 1200 × 25 = 30000;

(1.2) performing convolution on the video frame obtained in the step (1.1) by using a Gaussian kernel with the average value of 0 and the standard deviation of 1;

(1.3) acquiring the resolution of the original low-resolution video and the target resolution, and calculating the magnification from them. The original resolution is 768 × 432 and the target resolution is 3072 × 1728, so the magnification is 3072/768 = 4. Accordingly, the convolved video frames are downsampled to 1/4 of their original size to obtain the corresponding low-resolution video frames.
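The magnification calculation can be written out as a small check. The function name `magnification` is hypothetical, and so is the requirement that both axes share the same integer factor. The text only computes the width ratio, but the height ratio 1728/432 gives the same value.

```python
def magnification(source, target):
    """Integer scale factor between two (width, height) resolutions."""
    (src_w, src_h), (dst_w, dst_h) = source, target
    if dst_w % src_w or dst_h % src_h:
        raise ValueError("target resolution is not an integer multiple of the source")
    factor_w, factor_h = dst_w // src_w, dst_h // src_h
    if factor_w != factor_h:
        raise ValueError("width and height would be scaled by different factors")
    return factor_w

factor = magnification((768, 432), (3072, 1728))
print(factor)  # 4
```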

(1.4) each of the low resolution video frames obtained is divided into non-overlapping small blocks of 10 × 10 pixels by the existing division standard, as shown in fig. 3, and 1,000,000 blocks are selected as training data.
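The block division can be sketched as follows for a 768 × 432 low-resolution frame, the size used in this embodiment. Neither dimension is a multiple of 10, and the text does not say how border pixels are handled. Dropping incomplete border blocks is an assumption of this sketch, and `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(frame, size=10):
    """Return {(top, left): block} for every full size x size block.

    Incomplete blocks at the right and bottom borders are dropped (assumption).
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            blocks[(top, left)] = [row[left:left + size]
                                   for row in frame[top:top + size]]
    return blocks

# Synthetic 768 x 432 frame whose pixel value encodes its own position.
frame = [[r * 768 + c for c in range(768)] for r in range(432)]
blocks = split_into_blocks(frame)
print(len(blocks))  # 3268, i.e. 43 rows of blocks times 76 columns of blocks
```

Keying each block by its top-left position keeps the information needed later to splice the upscaled blocks back into a frame.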

(2) Obtaining a mapping relation model based on a hybrid expert model

(2.1) initializing the hybrid expert model

(2.1.1) the number of experts K is specified. In this embodiment, K is taken to be 100;

(2.1.2) specifying the parameters σ and μ of the experts' probability distribution and of the expert parameters' probability distribution; in this embodiment, σ = 0.32 and μ = 0.58;

(2.1.3) clustering the training data into K clusters with the K-means algorithm, where the parameter $W_i^{(0)}$ of each expert is initialized to the within-class slope and the gate function parameter $v_i^{(0)}$ is initialized to the cluster center;

(2.1.4) calculating the initial value of each gate function according to:

(2.2) using the training data obtained in the step (1.4) to carry out iterative optimization on the hybrid expert model obtained in the step (2.1):

(2.2.1) specifying the allowable error ε for terminating the iteration. In this embodiment, ε = 0.005.

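(2.2.2) calculating the posterior probability of each gate function in the current iteration:

where k is the iteration index, $p_i(y \mid x, W_i^{(k)})$ and $p_j(y \mid x, W_j^{(k)})$ denote the probability distributions of the experts, and $g_i^{(k)}(x, v_i^{(k)})$ denotes the value of the i-th gate function at the k-th iteration.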

(2.2.3) updating each expert parameter:

(2.2.4) updating each gate function parameter:

(2.2.5) calculating the output of each gate function in the current iteration:

(2.2.6) calculating the likelihood probability in the current iteration:

The gate function parameters $v_i$ obtained at the end of the iteration, together with the number of experts K, the expert parameters $W_i$, the standard deviation σ of the experts' probability distribution, and the mean μ of the expert parameters' probability distribution, are stored on disk as the final mapping relation model. Here $v_i$, K, σ and μ are referred to as the gate function parameters of the mapping relation model, and $W_i$ as its expert parameters.

(3) Pre-processing low resolution video to be processed

(3.1) splitting the low-resolution video to be processed into low-resolution video frames; in this embodiment, the work is 2000 seconds long with a frame rate of 25 frames/second, so the total number of video frames obtained is 2000 × 25 = 50000;

(3.2) dividing the low resolution video frame obtained in step (3.1) into 10 × 10 video frame blocks, as shown in fig. 3;

(4) mapping the low-resolution video to a high-resolution video, comprising:

(4.2) calculating a corresponding high-resolution video frame block by using the expert function parameter corresponding to the gate function with the maximum output value;

(4.2.1) calculating the number of the gate function that attains the maximum output value: $i = \arg\max(g_i)$, where $g_i$, the output of the i-th gate function, is obtained in step (4.1).

(4.2.2) calculating the high-resolution video frame block using the i-th expert function:

$y = W_i x$

where $W_i$, the i-th expert function parameter, is obtained in step (2); x is the input low-resolution video frame block of size 10 × 10, and y is the upscaled high-resolution video frame block of size 40 × 40.
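For these block sizes, the dimensions of $W_i$ follow directly once each block is flattened into a vector. The sketch below assumes single-channel blocks and row-major flattening, neither of which is stated in the text; the helper names are hypothetical, and K = 100 is the value chosen in step (2.1.1) of this embodiment.

```python
def expert_matrix_shape(lr_block=10, scale=4):
    """Rows and columns of W_i when blocks are flattened to vectors."""
    in_dim = lr_block * lr_block      # 10 x 10 block -> 100 values
    hr_side = lr_block * scale        # 10 * 4 -> 40 pixels per side
    out_dim = hr_side * hr_side       # 40 x 40 block -> 1600 values
    return out_dim, in_dim

def flatten(block):
    """Row-major flattening of a 2-D block (assumption)."""
    return [value for row in block for value in row]

def unflatten(vector, side):
    """Inverse of flatten for a square block with `side` pixels per row."""
    return [vector[r * side:(r + 1) * side] for r in range(side)]

rows, cols = expert_matrix_shape()
params_per_expert = rows * cols                 # 1600 * 100
params_all_experts = params_per_expert * 100    # K = 100 experts
print(rows, cols, params_per_expert, params_all_experts)
# 1600 100 160000 16000000
```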

Claims

1. A method for renewing film and television works based on resolution improvement is characterized in that: the method comprises two parts of learning a mapping relation model and carrying out resolution improvement according to the mapping relation model;

the method for learning the mapping relation model comprises the following two steps:

(1) pre-processing a training video, comprising:

(1.2) convolving the high-resolution video frame obtained in the step (1.1) by using a Gaussian kernel;

(1.4) respectively dividing the high-resolution video frame and the sampled low-resolution video frame into blocks as training data;

(2) obtaining a mapping relation model based on a hybrid expert model, comprising:

(2.1) initializing a hybrid expert model, comprising the steps of:

① specifying the number of experts K;

② assuming that the probability distribution of each expert follows a Gaussian distribution: $p(y \mid x, W_i) = N(y(x, W_i), \sigma)$, where x denotes a low-resolution video frame block, y denotes a high-resolution video frame block, $W_i$ denotes the parameter of the i-th expert, and σ is the standard deviation of the Gaussian distribution; assuming that the parameter $W_i$ also follows a Gaussian distribution: $p(W_i) = N(0, \mu)$, where μ denotes the mean of the Gaussian distribution;

③ clustering the training data into K clusters with the K-means algorithm, setting the initial value $W_i^{(0)}$ of each expert parameter to the within-class slope and the initial value $v_i^{(0)}$ of each gate function parameter to the cluster center;

④ calculating the initial value of each gate function:

where $v_i^{(0)}$ denotes the initial value of the i-th gate function parameter, and K is the number of experts in the hybrid expert model, that is, the number of leaf nodes in the tree structure;

(2.2) using the training data obtained in the step (1.4) to perform iterative optimization on the hybrid expert model until the iterative process is converged, wherein the finally obtained model parameters are the mapping relation model; the mapping relation model comprises a gate function parameter and an expert parameter; the iterative optimization of the model comprises the following steps:

① specifying the allowable error ε for terminating the iteration;

② calculating the posterior probability of each gate function in the current iteration:

where k is the iteration index, $p_i(y \mid x, W_i^{(k)})$ and $p_j(y \mid x, W_j^{(k)})$ denote the probability distributions of the experts, and $g_i^{(k)}(x, v_i^{(k)})$ denotes the value of the i-th gate function at the k-th iteration;

③ updating each expert parameter:

where k is the iteration index, X is the vector formed by all low-resolution video frame blocks x in the training data, Y is the vector formed by all high-resolution video frame blocks y in the training data, $X^T$ denotes the transpose of X, I denotes the identity matrix, and $H_i^{(k+1)}$ denotes the vector formed by the posterior probabilities, for the i-th expert at step k + 1, of all low-resolution video frame blocks x;

④ updating each gate function parameter:

where $v_i^{(k)}$ denotes the i-th gate function parameter at the k-th iteration, $h_i^{(k)}$ is the posterior probability of the i-th gate function at the k-th iteration, and $x^{(t)}$ denotes the t-th low-resolution video frame block;

⑤ calculating the output of each gate function in the current iteration:

⑥ calculating the likelihood in the current iteration:

where $p_i(y \mid x, W_i^{(k+1)})$ denotes the probability distribution of the experts and $p(W_i^{(k+1)})$ denotes the probability distribution of the expert parameters;

⑦ judging whether the iteration has converged: the iteration ends when the absolute value of the difference between the likelihood of the current iteration and that of the previous iteration is smaller than the allowable error ε; otherwise, steps ② to ⑦ are repeated;

the gate function parameters $v_i$ obtained at the end of the iteration, together with the number of experts K, the expert parameters $W_i$, the standard deviation σ of the experts' probability distribution, and the mean μ of the expert parameters' probability distribution, are stored on disk as the final mapping relation model;

the resolution improvement according to the mapping relation model comprises the following two steps:

(3) pre-processing low resolution video to be processed, comprising:

(3.1) splitting a low-resolution video to be processed into low-resolution video frames;

(3.2) partitioning the low resolution video frame into blocks;

(4.1) taking the low-resolution video frame block obtained in the step (3) as the input of a mixed expert model gate function, and calculating the output of each gate function by using the gate function parameters in the mapping relation model obtained in the step (2);

2. The method for refreshing a movie or television work based on resolution enhancement according to claim 1, wherein the hybrid expert model of step (2.1) comprises two parts of an expert and a gate function;

the expert is responsible for mapping and transforming the data, and the mapping and transforming in the invention uses a linear function as an expert function:

$y = Wx$

wherein W is an expert parameter, and x and y represent a low resolution video frame block and a corresponding high resolution video frame block, respectively;

the gate function is responsible for deciding which expert to select for transforming the data, and the ith gate function in the invention is expressed as:

where $v_i$ denotes the i-th gate function parameter, $v_j$ denotes the j-th gate function parameter, and K is the number of experts in the hybrid expert model.

3. The method for refreshing a movie or television work based on resolution enhancement according to claim 1, wherein: the step (4.2) of calculating the corresponding high resolution video frame block by using the parameter of the expert function corresponding to the gate function with the maximum output value comprises the following steps:

① calculating the number of the gate function that attains the maximum output value: $i = \arg\max(g_i)$,

where $g_i$, the output of the i-th gate function, is obtained in step (4.1);

② calculating the high-resolution video frame block using the i-th expert function: $y = W_i x$.