CN108028938A

CN108028938A - Video coding method and device

Info

Publication number: CN108028938A
Application number: CN201680054222.9A
Authority: CN
Inventors: 张金雷; 邹天玱; 王妙锋; 石中博; 王世通; 薛东; 罗巍
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2016-07-21
Filing date: 2016-07-21
Publication date: 2018-05-11
Also published as: WO2018014301A1

Abstract

The embodiments of the present invention relate to a video coding method and device. The method includes: performing a discrete cosine transform on the prediction residual of a current prediction mode to obtain transform coefficients; quantizing the transform coefficients to obtain quantized coefficients; performing inverse quantization on the quantized coefficients to obtain inverse quantized coefficients; obtaining a distortion value of the current prediction mode according to the difference between the transform coefficients and the inverse quantized coefficients; obtaining a cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode; selecting, from a plurality of prediction modes, the prediction mode with the minimum cost value as the optimal prediction mode; and performing an inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual. Because the distortion value is obtained from the difference between the transform coefficients and the inverse quantized coefficients, only a single inverse transform, on the optimal prediction mode, is needed to obtain the reconstructed residual. The embodiments of the present invention have the advantages of low complexity, low power consumption, and high reliability.

Description

Video coding method and device

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a video encoding method and apparatus.

Background

The H.264 standard and the High Efficiency Video Coding (HEVC) standard are two video coding schemes in current use; the HEVC standard is also referred to as the H.265 standard.

Fig. 1 shows a prior art video coding framework. In both the H.264 and HEVC standards, the overall coding framework consists of prediction, transform, quantization, and entropy coding. Prediction serves to compress the image: by exploiting the spatial correlation and the temporal correlation of the video content, the prediction part obtains a predicted value for the image currently being coded, and then obtains the residual between the original image and the predicted image. The residual is transformed and quantized. Information such as the quantized coefficients of the prediction residual and the prediction mode is then entropy coded to form the output code stream.

Prediction modes are divided into inter-frame prediction and intra-frame prediction. Inter-frame prediction predicts the next frame image from a temporally adjacent previous frame, whereas intra-frame prediction predicts the current frame image by using spatial correlation within the current frame. Because the inter-frame prediction mode can use information from adjacent frames of the video, its compression rate is higher. The intra-frame prediction mode has a lower compression rate than the inter-frame prediction mode, but it removes the spatial redundancy between neighboring blocks inside the current frame.

The accuracy of the prediction plays a decisive role in coding performance. During encoding, a plurality of prediction modes are available under each prediction type. Each prediction mode yields a different prediction value and prediction residual, so the code rate after entropy encoding (R for short) and the distortion after video reconstruction (D for short) differ from mode to mode. Rate distortion optimization (RDO for short) calculates the cost of each mode, and the mode with the minimum cost is selected as the optimal mode. RDO combines the two quantities R and D into a cost function that gives the cost value J of each prediction mode, as shown in formula (1):

J = D + λR (1)
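As a minimal sketch of how formula (1) drives mode selection, the following Python computes J for a few candidate modes and keeps the smallest. The mode names, the (D, R) values, and the value of λ are hypothetical and chosen only for illustration.

```python
def rd_cost(distortion, rate, lam):
    """Formula (1): J = D + lambda * R."""
    return distortion + lam * rate


def select_best_mode(candidates, lam):
    """candidates maps a mode name to (D, R); returns the mode with the smallest J."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate) pairs, for illustration only.
modes = {
    "intra_dc": (120.0, 30),
    "intra_planar": (90.0, 45),
    "inter_2Nx2N": (60.0, 70),
}
best = select_best_mode(modes, lam=1.5)
print(best)
```

A larger λ penalizes rate more heavily, so the same candidates can yield a different winner when λ changes.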

The prior art outputs the encoded data of the optimal prediction mode, and stores the reconstructed residual and the reconstructed video of the optimal prediction mode. In the prior art, transform, quantization, inverse quantization, and inverse transform operations are performed on the prediction residual to obtain a reconstructed residual, and the distortion D is obtained from the difference between the prediction residual and the reconstructed residual. The prediction residual is spatial domain data, and the transform operation converts spatial domain data into frequency domain data. Because the prior art compares two pieces of spatial domain data, it is also called a spatial domain distortion calculation method.

In the prior art, the optimal prediction mode can be selected only after transform, quantization, inverse quantization, and inverse transform have been carried out on the prediction residual of every prediction mode. The complexity of the prior art is therefore high. The complexity of the transform and the inverse transform grows rapidly with the transform size; the transform sizes of the HEVC standard are larger than those of H.264, and the number of prediction modes in the HEVC standard is also several times that of the H.264 standard. Applying the prior art to the HEVC standard therefore greatly increases complexity.

In a hardware implementation, the transform and the inverse transform are high-power-consumption modules. Because the prior art performs both a transform and an inverse transform on the prediction residual of every prediction mode, its power consumption is too high.

Disclosure of Invention

The embodiment of the invention relates to a video coding method and a video coding device, and solves the problems of high complexity and high hardware implementation power consumption in the prior art.

In a first aspect, an embodiment of the present invention provides a video encoding method. The method includes: performing a discrete cosine transform on the prediction residual of a current prediction mode to obtain transform coefficients, wherein the prediction residual is the difference between the pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode predicts the next frame image by using the temporally adjacent previous frame image, and the intra-frame prediction mode predicts the current frame image by using the spatial correlation within the current frame; quantizing the transform coefficients to obtain quantized coefficients; performing inverse quantization on the quantized coefficients to obtain inverse quantized coefficients; obtaining a distortion value of the current prediction mode according to the difference between the transform coefficients and the inverse quantized coefficients; obtaining a cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode; selecting, from a plurality of prediction modes, the prediction mode with the minimum cost value as the optimal prediction mode; and performing an inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual.

Specifically, the discrete cosine transform converts spatial domain data into frequency domain data, and both the transform coefficients and the inverse quantized coefficients are frequency domain data. Compared with the existing spatial domain distortion calculation method, the embodiment of the present invention adopts a frequency domain distortion estimation method: the distortion can be calculated, and the optimal prediction mode then selected, without performing an inverse discrete cosine transform for every prediction mode. This gives low complexity and low power consumption.
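The reason the frequency domain error can stand in for the spatial domain error is that an orthonormal transform preserves the sum of squared differences. The sketch below checks this numerically for a 4 × 4 block. It assumes a real-valued orthonormal DCT-II and a plain uniform quantizer with a hypothetical step size; neither is the fixed-point arithmetic of an actual codec.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


A = dct_matrix(4)
A_T = transpose(A)

residual = [[5, -3, 2, 0], [1, 4, -2, 1], [0, -1, 3, 2], [2, 0, -1, -4]]
step = 4.0  # hypothetical quantization step

coeffs = matmul(matmul(A, residual), A_T)                          # forward transform
dequant = [[round(c / step) * step for c in row] for row in coeffs]  # quantize + dequantize
recon_residual = matmul(matmul(A_T, dequant), A)                   # inverse transform

freq_distortion = ssd(coeffs, dequant)              # frequency domain estimate
spatial_distortion = ssd(residual, recon_residual)  # spatial domain (prior art) value
print(freq_distortion, spatial_distortion)
```

The two printed values agree up to floating-point error, so the inverse transform is not needed just to measure distortion.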

In one possible design, obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode includes: calculating the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used to trade off the code rate value against the distortion value.

In one possible design, performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficients includes: performing the discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficients, wherein the discrete cosine transform matrix is orthogonal and invertible.

In particular, the method provided by the embodiment of the invention can be applied to the HEVC/H.265 standard.

In one possible design, performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficients includes: performing an integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain integer discrete cosine transform coefficients, wherein the integer discrete cosine transform matrix is non-orthogonal and invertible; and performing dot multiplication on the integer discrete cosine transform coefficients to obtain the transform coefficients.

Specifically, the method provided by the embodiment of the invention can be applied to the H.264 standard.

In a second aspect, an embodiment of the present invention provides an apparatus for estimating distortion in video coding. The apparatus includes: a discrete cosine transform unit, configured to perform a discrete cosine transform on the prediction residual of a current prediction mode to obtain transform coefficients, wherein the prediction residual is the difference between the pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode predicts the next frame image by using the temporally adjacent previous frame image, and the intra-frame prediction mode predicts the current frame image by using the spatial correlation within the current frame; a quantization unit, configured to quantize the transform coefficients to obtain quantized coefficients; an inverse quantization unit, configured to perform inverse quantization on the quantized coefficients to obtain inverse quantized coefficients; a distortion value calculation unit, configured to obtain a distortion value of the current prediction mode according to the difference between the transform coefficients and the inverse quantized coefficients; a cost value calculation unit, configured to obtain a cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode; an optimal prediction mode selection unit, configured to select, from a plurality of prediction modes, the prediction mode with the minimum cost value as the optimal prediction mode; and a reconstructed residual unit, configured to perform an inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual.

Specifically, there are a plurality of prediction modes, and the predicted image differs for each prediction mode. Entropy coding is performed on the quantized coefficients of the current prediction mode to obtain entropy coding information, and the code rate value is determined according to the entropy coding information, wherein the current prediction mode is any one of the plurality of prediction modes.

In one possible design, the cost value calculation unit is specifically configured to: calculate the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used to trade off the code rate value against the distortion value.

In one possible design, the discrete cosine transform unit is specifically configured to: perform the discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficients, wherein the discrete cosine transform matrix is orthogonal and invertible.

In one possible design, the discrete cosine transform unit is specifically configured to: perform an integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain integer discrete cosine transform coefficients, wherein the integer discrete cosine transform matrix is non-orthogonal and invertible; and perform dot multiplication on the integer discrete cosine transform coefficients to obtain the transform coefficients.

In a third aspect, an embodiment of the present invention provides an apparatus for estimating video coding distortion, including:

a memory, configured to store program instructions; and a processor, configured to perform the following operations according to the program instructions stored in the memory: performing a discrete cosine transform on the prediction residual of a current prediction mode to obtain transform coefficients, wherein the prediction residual is the difference between the pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode predicts the next frame image by using the temporally adjacent previous frame image, and the intra-frame prediction mode predicts the current frame image by using the spatial correlation within the current frame; quantizing the transform coefficients to obtain quantized coefficients; performing inverse quantization on the quantized coefficients to obtain inverse quantized coefficients; obtaining a distortion value of the current prediction mode according to the difference between the transform coefficients and the inverse quantized coefficients; obtaining a cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode; selecting, from a plurality of prediction modes, the prediction mode with the minimum cost value as the optimal prediction mode; and performing an inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual.

In one possible design, the processor obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode includes: calculating the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used to trade off the code rate value against the distortion value.

In one possible design, the processor performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficients includes: performing the discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficients, wherein the discrete cosine transform matrix is orthogonal and invertible.

In one possible design, the processor performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficients includes: performing an integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain integer discrete cosine transform coefficients, wherein the integer discrete cosine transform matrix is non-orthogonal and invertible; and performing dot multiplication on the integer discrete cosine transform coefficients to obtain the transform coefficients.

In a fourth aspect, an embodiment of the present invention provides a computer storage medium for storing a program, where the steps executed by the program include the steps of the first aspect.

The embodiments of the present invention provide a video coding method and a video coding device, in which a distortion value is obtained by using a frequency domain distortion calculation method, the optimal prediction mode is selected according to a rate distortion optimization function, and an inverse transform is performed on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual. Compared with the prior art, the method and the device have the advantages of low complexity, low power consumption, and high reliability.

Drawings

FIG. 1 is a prior art video coding framework diagram;

FIG. 2 is a flowchart illustrating a video encoding method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating another video encoding method according to an embodiment of the present invention;

FIG. 4 is a flowchart of an implementation of a prior art video encoding method;

FIG. 5 is a flowchart of an implementation of a video encoding method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram illustrating the comparison between the distortion calculated by the distortion estimation method of the present invention and the distortion calculated by the prior art;

FIG. 7 is a schematic diagram illustrating the comparison between the distortion calculated by the distortion estimation method of the present invention and the distortion calculated by the prior art;

FIG. 8 is a schematic structural diagram of an apparatus for estimating video coding distortion according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of another video coding distortion estimation apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that, in video coding, the transform operation is usually Discrete Cosine Transform (DCT).

It should be understood that the transform coefficients, quantized coefficients, and inverse quantized coefficients referred to in this application are, in English, respectively: transform coefficients, quantized coefficients, and dequantized coefficients. This will not be repeated below.

Fig. 2 is a schematic flow chart of a video encoding method according to an embodiment of the present invention, and referring to fig. 2, the method includes:

Step 201, performing DCT transformation on the prediction residual of the current prediction mode to obtain transform coefficients, where the prediction residual is the difference between the pixel values of an original image and a predicted image, and the predicted image is obtained by prediction according to the current prediction mode based on the spatial correlation and temporal correlation of the original image.

The predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode is to predict the next frame image by using a temporally adjacent previous frame image, and the intra-frame prediction mode is to predict the current frame image by using the spatial correlation of the current frame.

Specifically, in one implementation, DCT transformation is performed on the prediction residual of the current prediction mode through a DCT transformation matrix to obtain the transform coefficients, where the DCT transformation matrix is orthogonal and invertible.

In another implementation, integer DCT transformation is performed on the prediction residual of the current prediction mode through an integer DCT transformation matrix to obtain integer DCT coefficients, where the integer DCT transformation matrix is non-orthogonal and invertible; dot multiplication is then performed on the integer DCT coefficients to obtain the transform coefficients.

The prediction residual is spatial domain data. The DCT transform converts the spatial domain data into frequency domain data, and quantization compresses the data. The DCT transform is a lossless transform, so video coding distortion is introduced by the quantization operation. The following description takes a transform size of 4 × 4 as an example to illustrate that the video coding method provided by the embodiment of the present invention is applicable to both the HEVC standard and the H.264 standard.

First, for the HEVC standard, the DCT transform comprises the following steps:

Step 201a, in order to realize the DCT transformation, the precision of the integer DCT transformation matrix A is first raised: each matrix element in A is multiplied by 128 (2^7) and approximately rounded to obtain the DCT transform matrix C.

(The matrices A and C are given as images in the original publication.)

Step 201b, denoting the prediction residual as X, DCT transformation is performed on the prediction residual X through the DCT transformation matrix C to obtain the transform coefficients Y, as shown in formula (2):

Y = (C X C^T) >> 9 (2)

Formula (2) indicates that the prediction residual X is transformed by the DCT transformation matrix C and the result is shifted right by 9 bits, that is, divided by 2^9.

Specifically, X is spatial domain data, and Y is frequency domain data.

Step 201c, since the DCT transformation matrix C is an orthogonal, invertible matrix, the inverse transformation of Y is lossless, as shown in formula (3):

Y′ = C^-1 (Y << 9) (C^-1)^T = X (3)

It can be understood from formula (2) and formula (3) that, in the HEVC standard, the DCT transform introduces no transform error anywhere in the encoding process. Since the encoding process consists of prediction, transform, quantization, and entropy coding, only quantization introduces errors in the HEVC standard. HEVC can therefore estimate the distortion D by using the transform coefficients before quantization and the inverse quantized coefficients after inverse quantization.
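Steps 201a and 201b can be sketched numerically as follows. The matrix below is the 4 × 4 core transform matrix of the HEVC standard, which is assumed here to be the matrix C that the text describes; the real-valued matrix it is compared against is an orthonormal DCT-II, standing in for A. The helper names are hypothetical.

```python
import math

# 4x4 core transform matrix of the HEVC standard (assumed to be matrix C).
C = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def real_dct_matrix(n):
    """Orthonormal real-valued DCT-II matrix (stand-in for A)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def forward_hevc(x):
    """Formula (2): Y = (C X C^T) >> 9, in integer arithmetic."""
    y = matmul(matmul(C, x), transpose(C))
    return [[v >> 9 for v in row] for row in y]


# Step 201a: scale the real DCT by 128 and compare with the integer matrix.
scaled = [[128 * a for a in row] for row in real_dct_matrix(4)]
max_rounding_gap = max(abs(s - c) for rs, rc in zip(scaled, C) for s, c in zip(rs, rc))

# Rows of C are mutually orthogonal; their squared norms are close to 2^14.
gram = matmul(C, transpose(C))

# Step 201b on a flat residual: all energy lands in the DC coefficient.
dc_only = forward_hevc([[1] * 4 for _ in range(4)])
print(max_rounding_gap, [gram[i][i] for i in range(4)], dc_only[0][0])
```

The Gram matrix comes out diagonal with entries 16384 and 16370, so the orthogonality that step 201c relies on holds to within the rounding of the integer approximation.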

Second, for the H.264 standard, the DCT transform comprises an integer DCT transform and a dot multiplication (an element-wise scaling of the coefficients).

The integer DCT transform, which may also be termed a kernel (core) transform, uses an integer DCT transform matrix C_f. (The matrix C_f is given as an image in the original publication.)

It can be understood that, in the H.264 standard, the integer DCT transform matrix C_f is non-orthogonal and invertible.

It should be noted that, as known to those skilled in the art, when the encoding method is actually applied to a hardware module under the H.264 standard, the transform module contains the kernel transform, and the quantization module contains both the dot multiplication and the quantization. Because the integer DCT transform matrix C_f is non-orthogonal and invertible, the lossless transform property described by formula (2) and formula (3) is not satisfied if the kernel transform coefficients are used directly as the transform coefficients in the H.264 standard. It is therefore necessary to assign the dot multiplication to the transform module, so that the kernel transform together with the dot multiplication forms the complete DCT transform. In this way, the H.264 standard can also estimate the distortion D by using the transform coefficients before quantization and the inverse quantized coefficients after inverse quantization.

It is to be understood that the dot-multiplied coefficients obtained by performing dot multiplication on the kernel transform coefficients are the transform coefficients described in step 201.

It is to be understood that the kernel transform mentioned in the embodiment of the present invention is equivalent to the integer DCT transform, and the dot-multiplied coefficients mentioned in the embodiment of the present invention are equivalent to the transform coefficients. These names are determined by the operations actually performed and are not intended to limit any operation step.

It is understood that the transforms mentioned in the embodiments of the present invention are all abbreviated as DCT transforms.
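The role of the dot multiplication can be illustrated with a short numerical sketch. The matrix below is the 4 × 4 forward core transform matrix of the H.264 standard, assumed here to be the C_f that the text refers to. The scaling factors are derived from the row norms of C_f for illustration; they are not the fixed-point tables an actual encoder would use, and the function names are hypothetical.

```python
import math

# 4x4 forward core transform matrix of H.264 (assumed to be C_f).
C_F = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def energy(m):
    return sum(v * v for row in m for v in row)


# Rows are mutually orthogonal but have unequal norms: diag(4, 10, 4, 10).
gram = matmul(C_F, transpose(C_F))
norms = [math.sqrt(gram[i][i]) for i in range(4)]

# Element-wise scaling factors (illustrative, derived from the row norms).
SCALE = [[1.0 / (norms[i] * norms[j]) for j in range(4)] for i in range(4)]


def kernel_transform(x):
    """Integer (kernel) transform only: W = C_f X C_f^T."""
    return matmul(matmul(C_F, x), transpose(C_F))


def full_transform(x):
    """Kernel transform followed by dot multiplication with SCALE."""
    w = kernel_transform(x)
    return [[w[i][j] * SCALE[i][j] for j in range(4)] for i in range(4)]


residual = [[5, -3, 2, 0], [1, 4, -2, 1], [0, -1, 3, 2], [2, 0, -1, -4]]
print(energy(residual), energy(kernel_transform(residual)), energy(full_transform(residual)))
```

Only after the dot multiplication does the coefficient energy match the residual energy, which is why the kernel transform coefficients alone cannot be compared directly with inverse quantized coefficients to estimate distortion.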

It should be noted that the embodiment of the present invention takes a transform size of 4 × 4 only as an example to illustrate that the video coding method provided by the embodiment of the present invention is applicable to the HEVC standard and the H.264 standard; the embodiment of the present invention is equally applicable to video coding with other transform sizes.

Step 202, quantizing the transform coefficient to obtain a quantized coefficient.

Specifically, the transform coefficients are frequency domain data, and the quantized coefficients obtained by quantization (compression) are also frequency domain data.

Step 203, performing inverse quantization on the quantized coefficients to obtain inverse quantized coefficients.

In particular, the inverse quantization is for decompression. The distortion is further estimated using the transform coefficients before quantization and the inverse quantized coefficients after inverse quantization.
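Steps 202 and 203 can be sketched with a plain uniform scalar quantizer. This is a simplification: the step size `qstep` is a hypothetical parameter, whereas H.264 and HEVC derive their scaling from a quantization parameter with integer multiplier tables.

```python
import math


def quantize(coeffs, qstep):
    """Step 202: map each coefficient to an integer level (round half away from zero)."""
    def level(c):
        magnitude = math.floor(abs(c) / qstep + 0.5)
        return magnitude if c >= 0 else -magnitude
    return [[level(c) for c in row] for row in coeffs]


def dequantize(levels, qstep):
    """Step 203: scale the integer levels back to coefficient magnitudes."""
    return [[l * qstep for l in row] for row in levels]


coeffs = [[10.0, -3.0], [4.9, 0.0]]
levels = quantize(coeffs, qstep=4)
dequant = dequantize(levels, qstep=4)
print(levels, dequant)
```

The gap between `coeffs` and `dequant` is exactly the quantization error that step 204 measures.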

Step 204, estimating a distortion value of the current prediction mode according to a difference between the transform coefficient and the inverse quantization coefficient.

Specifically, an inverse quantization operation is performed on the quantized coefficients to obtain the corresponding inverse quantized coefficients Q′, and the video coding distortion D′ is estimated directly from the error between the inverse quantized coefficients Q′ and the transform coefficients T before quantization. The specific calculation method is shown in formula (4):

where blocksize represents the size of the current block, and (i, j) represents the coordinates of a position within the current block.
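Formula (4) appears as an image in the original publication and is not reproduced in this text. A form consistent with the surrounding description is given below; it assumes a sum of squared differences (matching the sum of squares used for the prior art distortion) and zero-based indices, both of which are assumptions rather than a transcription.

```latex
D' = \sum_{i=0}^{\mathrm{blocksize}-1} \; \sum_{j=0}^{\mathrm{blocksize}-1}
     \bigl( T(i,j) - Q'(i,j) \bigr)^{2} \qquad (4)
```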

It should be noted that the inverse DCT transform converts the frequency domain data into spatial domain data. In the prior art, coding distortion is calculated by using two spatial domain data of a prediction residual and a reconstruction residual. The embodiment of the invention estimates the distortion by using two frequency domain data of the transformation coefficient and the inverse quantization coefficient. Therefore, the distortion calculation method provided in the embodiment of the present invention is also called a frequency domain distortion estimation method.

Step 205, obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode.

The cost value of the current prediction mode is calculated by a rate-distortion optimization function as described in equation (1) according to the code rate value and the distortion value of the current prediction mode, and the rate-distortion optimization function is used for weighing the code rate value and the distortion value.

Step 206, selecting the prediction mode with the minimum cost value from the multiple prediction modes as the optimal prediction mode.

Step 207, performing inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual.

Compared with the existing spatial domain distortion calculation method, the method disclosed by the embodiment of the invention can estimate the distortion of each prediction mode without performing inverse transformation operation on each prediction mode by adopting a frequency domain distortion calculation method. The invention only carries out inverse discrete cosine transform once on the optimal prediction mode to obtain and store the reconstructed residual error. In addition, the sum of the predicted image and the reconstructed residual in the optimal prediction mode is saved as a reconstructed video. The invention greatly reduces the complexity and power consumption of video coding.
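Steps 201 to 207 can be put together in a short sketch. It assumes an orthonormal real-valued DCT, a uniform quantizer, and a crude bit-count function in place of entropy coding; the mode names, block values, `qstep`, and `lam` are hypothetical. A counter records how often the inverse transform runs.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


A = dct_matrix(4)
A_T = transpose(A)
inverse_calls = 0  # counts how often the inverse transform runs


def inverse_dct(y):
    global inverse_calls
    inverse_calls += 1
    return matmul(matmul(A_T, y), A)


def quantize(coeffs, qstep):
    # Round half away from zero; qstep is a hypothetical uniform step size.
    return [[int(math.copysign(math.floor(abs(c) / qstep + 0.5), c)) for c in row]
            for row in coeffs]


def estimate_rate(levels):
    # Hypothetical stand-in for entropy coding: a rough bit count per level.
    return sum(1 + 2 * abs(l).bit_length() for row in levels for l in row)


def encode_block(original, predictions, qstep, lam):
    best = None
    for mode, pred in predictions.items():
        residual = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(original, pred)]
        coeffs = matmul(matmul(A, residual), A_T)               # step 201
        levels = quantize(coeffs, qstep)                        # step 202
        dequant = [[l * qstep for l in row] for row in levels]  # step 203
        distortion = sum((t - q) ** 2                           # step 204
                         for rt, rq in zip(coeffs, dequant) for t, q in zip(rt, rq))
        cost = distortion + lam * estimate_rate(levels)         # step 205
        if best is None or cost < best[0]:                      # step 206
            best = (cost, mode, dequant)
    cost, mode, dequant = best
    return mode, cost, inverse_dct(dequant)                     # step 207, run once


original = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
predictions = {
    "mode_a": [[60] * 4 for _ in range(4)],   # flat prediction
    "mode_b": [row[:] for row in original],   # perfect prediction
}
best_mode, best_cost, recon_residual = encode_block(original, predictions, qstep=8.0, lam=1.0)
print(best_mode, best_cost, inverse_calls)
```

However many candidate modes are evaluated, `inverse_calls` stays at 1, which is the saving the text describes.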

Fig. 3 is a flowchart illustrating another video encoding method according to an embodiment of the present invention, and referring to fig. 3, the method includes:

By using the frequency domain distortion estimation method shown in fig. 2, the distortion value D of each prediction mode is obtained from the frequency domain error.

Specifically, entropy coding is performed on a quantization coefficient of a current prediction mode to obtain entropy coding information, and a code rate value R is determined according to the entropy coding information, wherein the current prediction mode is any one of a plurality of prediction modes. And obtaining the cost value of the current prediction mode according to the code rate value R and the distortion value D of the current prediction mode.
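The text does not specify how the code rate value R is derived from the entropy coding information. As a rough sketch only, the bit length of a signed Exp-Golomb code can serve as a stand-in for the bits spent on each quantized level; actual H.264 and HEVC encoders use CAVLC or CABAC for residual coefficients, so this function is an assumption made for illustration.

```python
def signed_exp_golomb_bits(level):
    """Bit length of the signed Exp-Golomb code for one integer level."""
    # Map the signed level to a non-negative code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * level - 1 if level > 0 else -2 * level
    # An unsigned Exp-Golomb code for v uses 2 * floor(log2(v + 1)) + 1 bits.
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def estimate_rate(levels):
    """Hypothetical rate estimate R: total code length over all levels in a block."""
    return sum(signed_exp_golomb_bits(l) for row in levels for l in row)


rate = estimate_rate([[0, 1], [-1, 2]])
print(rate)
```

Zero levels cost a single bit under this model, so modes whose residual quantizes mostly to zero obtain a small R.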

Specifically, the cost value J of each prediction mode is calculated by a rate-distortion optimization function. The rate-distortion optimization function is used to trade-off the code rate value R and the distortion value D.

The prediction modes include a plurality of types, and the prediction image differs for each prediction mode, and therefore the prediction residual X differs for each prediction mode. Therefore, the rate value R and the distortion value D are different for each prediction mode, and the cost value J is different for each prediction mode.

Specifically, the prediction mode with the minimum cost value J in multiple prediction modes is selected as the optimal prediction mode; and performing inverse DCT (discrete cosine transformation) on the inverse quantization coefficient of the optimal prediction mode to obtain a reconstructed residual error.

Specifically, the reconstructed residual is added to the prediction value of the optimal prediction mode to form a reconstructed video.
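The reconstruction step amounts to an element-wise addition. In the sketch below, the rounding and the clipping to the valid sample range are assumptions (the text only states that the two are added), and `bit_depth` is a hypothetical parameter.

```python
def reconstruct_block(prediction, recon_residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, then round and clip (assumed)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(p + r)), 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, recon_residual)]


block = reconstruct_block([[250, 3], [100, 100]], [[10.2, -7.6], [0.4, -0.4]])
print(block)
```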

Specifically, mode information and quantization coefficients of the optimal prediction mode are entropy-encoded and output.

According to the embodiment of the present invention, the inverse transform operation needs to be performed only on the optimal prediction mode to obtain the reconstructed residual and the reconstructed video. The embodiment of the present invention adopts a frequency domain distortion estimation algorithm, so the distortion can be obtained without performing an inverse DCT transformation for every prediction mode; only one inverse DCT transformation, on the optimal prediction mode, is needed to obtain the reconstructed residual. This gives the advantages of low power consumption and low complexity.

Fig. 4 and fig. 5 are implementation flowcharts of the video encoding methods provided by the prior art and by the embodiment of the present invention, respectively. Both figures take the parallel implementation of 5 prediction modes only as an example to illustrate the differences between the present invention and the prior art.

As shown in fig. 4, in the prior art the prediction residual of every prediction mode must be transformed, quantized, inverse quantized, and inverse transformed to obtain the reconstructed residual of that mode. Specifically, for each prediction mode, the residual between the predicted value and the original value is transformed and quantized; the quantized coefficients are entropy coded to obtain the number of coded bits (code rate) R of the current block; the quantized coefficients are inverse quantized and inverse transformed to obtain the decoded (reconstructed) residual; and the distortion D is calculated as the sum of squared differences between the reconstructed residual after the inverse transform and the prediction residual before the transform. The specific calculation is shown in formula (5):

where p_org(i, j) represents the pixel value at position (i, j) of the original block, p_rec(i, j) represents the pixel value at position (i, j) of the reconstructed block, and p_pred(i, j) represents the pixel value at position (i, j) of the prediction block.
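Consistent with these definitions, the spatial-domain distortion can be written as a sum of squared differences between the reconstructed residual and the prediction residual. The form below is a sketch inferred from the symbol definitions above; the block dimensions N × M and the zero-based indices are assumptions.

```latex
D = \sum_{i=0}^{N-1}\sum_{j=0}^{M-1}
    \Big[\big(p_{rec}(i,j) - p_{pred}(i,j)\big) - \big(p_{org}(i,j) - p_{pred}(i,j)\big)\Big]^{2}
  = \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} \big(p_{rec}(i,j) - p_{org}(i,j)\big)^{2}
```

The prediction term cancels inside the bracket, so the value depends only on the difference between the reconstructed block and the original block, but computing p_rec still requires an inverse transform for every candidate mode.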

A distortion value D is calculated for each prediction mode from the prediction residual and the reconstructed residual, shown as D₁₁ to D₁₅ in FIG. 4. The rate-distortion cost of the current mode is then calculated from R and D according to formula (1).

The invention mainly concerns the use of rate-distortion optimization in the prediction process to select the optimal prediction mode and thereby obtain the optimal predicted value, and more particularly concerns how the distortion D used in rate-distortion optimization is acquired.

As shown in fig. 5, the code rate R is obtained in the same manner as in the prior-art scheme of fig. 4. In the present invention, an inverse quantization operation is performed on the quantized coefficients to obtain the corresponding inverse quantized coefficients Q', and the error between the inverse quantized coefficients Q' and the transform coefficients T before quantization is used directly to estimate D, shown as D₂₁ to D₂₅ in FIG. 5. The specific calculation is shown in formula (3).
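The frequency-domain estimate can be sketched as follows, assuming that formula (3) takes the form of a sum of squared differences between T and Q'. The uniform quantizer with step `qstep` is a stand-in for the codec's actual quantizer, and all function names and sample values are hypothetical.

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization (a stand-in for the codec's quantizer)."""
    return [[int(round(t / qstep)) for t in row] for row in coeffs]


def dequantize(levels, qstep):
    """Inverse quantization: scale each level back by the step size."""
    return [[q * qstep for q in row] for row in levels]


def frequency_domain_distortion(coeffs, dequantized):
    """Sum of squared differences between T and Q' (no inverse transform)."""
    return sum(
        (t - d) ** 2
        for t_row, d_row in zip(coeffs, dequantized)
        for t, d in zip(t_row, d_row)
    )


# Hypothetical 2 x 2 block of transform coefficients T.
T = [[52.0, -7.0], [3.0, 0.5]]
levels = quantize(T, qstep=4.0)
T_deq = dequantize(levels, qstep=4.0)
D_est = frequency_domain_distortion(T, T_deq)
```

Both inputs of the estimate are already available after quantization and inverse quantization, so no inverse transform is involved at this stage.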

The embodiment of the invention replaces the original spatial-domain distortion calculation with a frequency-domain distortion estimation algorithm, so that the encoding end does not need to perform the inverse transform for all prediction modes and only needs to carry out the complete encoding process for the selected optimal mode. The video coding method provided by the embodiment of the invention therefore has the advantages of low complexity and low power consumption.

The reliability of the frequency-domain distortion estimation method provided by the embodiment of the present invention is illustrated by fig. 6 and fig. 7, which compare the distortion estimated by the method of the present invention with the distortion calculated by the prior art. In the embodiment of the invention, a frame of video is divided into blocks, and the D calculated exactly by the prior art is compared with the D' obtained by the frequency-domain distortion estimation method. The details are as follows:

Fig. 6 is a schematic diagram comparing the distortion calculated by the distortion estimation method of the present invention with the distortion calculated by the prior art. Referring to fig. 6, the image resolution is 1920 × 1080, processing is performed on 4 × 4 blocks, and the distortion values of some video blocks are selected for comparison to illustrate an embodiment of the present invention.

As shown in fig. 6, the abscissa of each point represents the D value of the current block obtained by the existing spatial domain distortion calculation method in the h.264 standard, and the ordinate represents the D' value of the current block obtained by the frequency domain distortion estimation method provided by the embodiment of the present invention.

Fig. 7 is a schematic diagram illustrating a comparison between a distortion calculated by using the distortion estimation method of the present invention and a distortion calculated by using the prior art.

As shown in fig. 7, the abscissa of each point represents the D value of the current block obtained by the existing spatial distortion calculation method in the HEVC standard, and the ordinate represents the D' value of the current block obtained by the frequency domain distortion estimation method provided in the embodiment of the present invention.

Specifically, the selected video block points in fig. 6 and fig. 7 lie substantially along the straight line y = x, which indicates that the distortion value D' obtained by the frequency-domain distortion estimation method provided by the embodiment of the present invention is substantially the same as the distortion value D obtained by the conventional spatial-domain distortion calculation method. Therefore, the distortion value estimated by the frequency-domain method provided by the embodiment of the invention is very close to the distortion value calculated by the prior art.
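One way to see why the points lie close to y = x is that an orthonormal transform preserves the sum of squares (Parseval's relation), so the squared error measured between coefficients equals the squared error measured between residuals after the inverse transform. The sketch below checks this for a floating-point 4 × 4 DCT-II, ignoring the integer rounding and clipping that a real codec applies; the helper names, the quantizer step, and the sample data are hypothetical.

```python
import math

N = 4
# Orthonormal DCT-II basis: row k, column n.
C = [
    [
        math.sqrt((1 if k == 0 else 2) / N)
        * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        for n in range(N)
    ]
    for k in range(N)
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """2-D forward DCT: T = C * X * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """2-D inverse DCT: X = C^T * T * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical prediction residual X and a coarse quantizer step.
X = [[5, -3, 2, 0], [1, 4, -2, 1], [0, -1, 3, 2], [-4, 2, 1, -1]]
qstep = 3.0
T = dct2(X)
T_deq = [[round(t / qstep) * qstep for t in row] for row in T]

d_freq = ssd(T, T_deq)            # frequency-domain estimate
d_spatial = ssd(X, idct2(T_deq))  # spatial-domain value
```

In this idealized setting the two values agree up to floating-point error; the small scatter around y = x in practice would come from effects the sketch leaves out, such as integer arithmetic and rounding.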

It should be noted that fig. 6 and fig. 7 compare the distortion values of only some video blocks. A large number of calculations and plots show that, for any video block, the distortion estimated by the method provided by the embodiment of the present invention and the distortion calculated by the prior art follow the pattern shown in fig. 6 and fig. 7; these results are not described in detail herein. Therefore, the frequency-domain distortion estimation method provided by the embodiment of the invention has high reliability.

It can be understood that, as can be seen from comparing fig. 6 and fig. 7, the distortion value obtained by the frequency domain distortion estimation method provided in the embodiment of the present invention is closer to the calculation result of the HEVC standard than the calculation result of the h.264 standard. Compared with the H.264 standard, the HEVC standard is more complex, and further verification shows that the distortion estimation method provided by the embodiment of the invention can be well applied to the HEVC standard with more transform sizes and prediction modes.

The embodiment of the invention replaces the original spatial-domain distortion calculation with a frequency-domain distortion estimation algorithm, so that the encoding end does not need to perform the inverse transform for all prediction modes and only needs to carry out the complete encoding process for the selected optimal mode, and the distortion value estimated by the embodiment of the invention is very close to the distortion value obtained by the existing spatial-domain distortion calculation method. Therefore, the embodiment of the invention has the advantages of low complexity, low power consumption, and high reliability.

Fig. 8 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention; referring to fig. 8, the apparatus includes:

the discrete cosine transform unit 801 is configured to perform discrete cosine transform on a prediction residual of a current prediction mode to obtain a transform coefficient, where the prediction residual is a pixel value residual of an original image and a predicted image, and the predicted image is obtained by prediction according to a spatial correlation and a temporal correlation of the original image and the current prediction mode.

Specifically, the predicted image is predicted according to an inter prediction mode or an intra prediction mode, the inter prediction mode is to predict a next frame image by using a temporally adjacent previous frame image, and the intra prediction mode is to predict a current frame image by using spatial correlation in the current frame.

Specifically, the discrete cosine transform unit 801 is configured to: under the HEVC standard, perform a discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficient, where the discrete cosine transform matrix is orthogonal and invertible; or, under the H.264 standard, perform an integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain an integer discrete cosine transform coefficient, where the integer discrete cosine transform matrix is non-orthogonal but invertible, and then perform dot (element-wise) multiplication on the integer discrete cosine transform coefficient to obtain the transform coefficient.
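The two-step H.264 path can be sketched as follows. The core matrix below is the 4 × 4 integer transform matrix commonly cited for H.264; its rows are mutually orthogonal but do not have unit norm, so the matrix times its transpose is not the identity and a separate scaling step is needed. The scaling factors are derived here from the row norms, which is an illustrative assumption rather than the embodiment's exact scaling table, and the helper names and sample residual are hypothetical.

```python
import math

# 4 x 4 forward core transform matrix commonly cited for H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def integer_dct(block):
    """Step 1: integer core transform W = CF * X * CF^T (integers only)."""
    return matmul(matmul(CF, block), transpose(CF))


# Step 2: element-wise scaling factors E[i][j] = 1 / (|row_i| * |row_j|).
ROW_NORMS = [math.sqrt(sum(v * v for v in row)) for row in CF]
E = [[1.0 / (ri * rj) for rj in ROW_NORMS] for ri in ROW_NORMS]


def scale(w):
    """Element-wise (dot) multiplication of W by E gives the coefficients."""
    return [[w[i][j] * E[i][j] for j in range(4)] for i in range(4)]


# Hypothetical 4 x 4 prediction residual.
X = [[5, -3, 2, 0], [1, 4, -2, 1], [0, -1, 3, 2], [-4, 2, 1, -1]]
W = integer_dct(X)
T = scale(W)

energy_x = sum(v * v for row in X for v in row)
energy_t = sum(v * v for row in T for v in row)
```

After scaling, the coefficients carry the same energy as the residual, which is the property a frequency-domain distortion estimate relies on; before scaling, the integer coefficients W do not have this property.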

A quantization unit 802, configured to quantize the transform coefficient to obtain a quantized coefficient.

An inverse quantization unit 803 is configured to perform inverse quantization on the quantized coefficients to obtain inverse quantized coefficients.

A distortion value calculating unit 804, configured to estimate a distortion value of the current prediction mode according to a difference between the transform coefficient and the dequantized coefficient.

The cost value calculating unit 805 is configured to obtain a cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode.

Specifically, the cost value calculating unit 805 is configured to: entropy code the quantized coefficients of the current prediction mode to obtain entropy coding information, and determine the code rate value according to the entropy coding information, where the current prediction mode is any one of the multiple prediction modes; and calculate the cost value of the current prediction mode through a rate-distortion optimization function according to the code rate value and the distortion value of the current prediction mode, where the rate-distortion optimization function is used to trade off the code rate value against the distortion value.

The optimal prediction mode selecting unit 806 is configured to select a prediction mode with the smallest cost value from the multiple prediction modes as the optimal prediction mode.

A reconstructed residual unit 807, configured to perform inverse discrete cosine transform on the inverse quantized coefficients of the optimal prediction mode to obtain a reconstructed residual.

The specific work flow of each unit can refer to the description of the above method embodiment, and is not described herein again.
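The cooperation of units 801 to 807 can be sketched end to end as follows. This is a simplified illustration, not the embodiment's implementation: it uses a 1-D four-sample block instead of a 2-D block, a floating-point orthonormal DCT, a uniform quantizer, and a crude rate proxy in place of entropy coding, and all names and sample values are hypothetical. The counter `inverse_calls` makes visible that the inverse transform runs once, for the selected mode only.

```python
import math

N = 4
# Orthonormal 1-D DCT-II basis: row k, column n.
C = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]

inverse_calls = 0  # counts how often the inverse transform is executed


def dct(x):
    """Forward transform (role of unit 801)."""
    return [sum(C[k][n] * x[n] for n in range(N)) for k in range(N)]


def idct(t):
    """Inverse transform (role of unit 807)."""
    global inverse_calls
    inverse_calls += 1
    return [sum(C[k][n] * t[k] for k in range(N)) for n in range(N)]


def encode_block(original, predictions, qstep=2.0, lam=1.0):
    """Pick the minimum-cost mode, then reconstruct only that mode."""
    best = None
    for mode, pred in predictions.items():
        residual = [o - p for o, p in zip(original, pred)]
        t = dct(residual)
        levels = [round(v / qstep) for v in t]            # unit 802
        t_deq = [q * qstep for q in levels]               # unit 803
        d = sum((a - b) ** 2 for a, b in zip(t, t_deq))   # unit 804
        r = sum(1 + abs(q) for q in levels if q != 0)     # rate proxy (assumption)
        j = d + lam * r                                   # unit 805
        if best is None or j < best[0]:                   # unit 806
            best = (j, mode, pred, t_deq)
    _, mode, pred, t_deq = best
    recon_residual = idct(t_deq)                          # single inverse DCT
    recon = [p + e for p, e in zip(pred, recon_residual)]
    return mode, recon


# Hypothetical original samples and two candidate predictions.
original = [10, 12, 14, 16]
predictions = {"flat": [13, 13, 13, 13], "ramp": [10, 12, 14, 15]}
mode, recon = encode_block(original, predictions)
```

In a prior-art style loop the call to `idct` would sit inside the `for` loop and run once per candidate mode; here it runs after the selection, so its cost does not grow with the number of modes.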

It can be understood that, to realize the functions of the above method embodiments, the units 801 to 807 include corresponding hardware structures and/or software modules for executing each function. Those skilled in the art will readily appreciate that, with the exemplary units and algorithm steps described in connection with the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiment of the present invention, functional modules such as 801 to 807 may be divided according to the above method embodiments, for example, each functional module may be divided according to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Fig. 9 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention, as shown in fig. 9, including: network card 901, memory 902, processor 903, and bus 904.

Specifically, the network card 901 is configured with a plurality of communication interfaces, and the terminal collects or receives video through the communication interfaces to perform video encoding and decoding. The memory 902 is used to store program instructions. Network card 901, memory 902, and processor 903 communicate over bus 904.

In one example, the processor 903 is configured to perform the following operations according to the program instructions stored in the memory 902: performing discrete cosine transform on the prediction residual of the current prediction mode to obtain a transform coefficient, where the prediction residual is the difference between the pixel values of the original image and the predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode predicts the next frame image by using the temporally adjacent previous frame image, and the intra-frame prediction mode predicts the current frame image by using the spatial correlation within the current frame; quantizing the transform coefficient to obtain a quantized coefficient; performing inverse quantization on the quantized coefficient to obtain an inverse quantized coefficient; obtaining the distortion value of the current prediction mode according to the difference between the transform coefficient and the inverse quantized coefficient; obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode; selecting the prediction mode with the minimum cost value among the multiple prediction modes as the optimal prediction mode; and performing inverse discrete cosine transform on the inverse quantized coefficient of the optimal prediction mode to obtain a reconstructed residual.

The memory 902 may be a single storage device or a collective term for a plurality of storage elements, and is used to store information such as the programs and data necessary for operating the video encoding apparatus. The memory 902 may include one or a combination of storage media such as Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), registers, a hard disk, a removable disk, a compact disc Read Only Memory (CD-ROM), or any other form of storage medium known in the art.

The processor 903 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, units, and circuits described in connection with the present disclosure. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 904 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 904 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 9, but this does not mean that there is only one bus or one type of bus.

In one example, the processor 903 calculates the current prediction mode cost value from the code rate value and the distortion value of the current prediction mode through a rate-distortion optimization function that is used to trade off the code rate value and the distortion value.

In one example, the processor 903 performs discrete cosine transform on the prediction residual of the current prediction mode to obtain transform coefficients, including:

performing discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficient, where the discrete cosine transform matrix is orthogonal and invertible; or performing integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain an integer discrete cosine transform coefficient, where the integer discrete cosine transform matrix is non-orthogonal but invertible, and then performing dot (element-wise) multiplication on the integer discrete cosine transform coefficient to obtain the transform coefficient.

Further, the bus 904 may be used to connect the units in FIG. 8, the processor 903 may be used to perform the functions of the units 801 to 807, and the memory 902 may be used to store data for the units 801 to 807.

Compared with the existing spatial-domain distortion calculation method, the video coding method and device provided by the embodiment of the invention adopt a frequency-domain distortion estimation method and therefore have the advantages of low complexity, low power consumption, and high reliability.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method of video encoding, the method comprising:

performing discrete cosine transform on a prediction residual of a current prediction mode to obtain a transform coefficient, wherein the prediction residual is the difference between pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode is to predict a next frame image by using a temporally adjacent previous frame image, and the intra-frame prediction mode is to predict a current frame image by using spatial correlation in the current frame;

quantizing the transformation coefficient to obtain a quantized coefficient;

carrying out inverse quantization on the quantization coefficient to obtain an inverse quantization coefficient;

obtaining a distortion value of the current prediction mode according to the difference between the transformation coefficient and the inverse quantization coefficient;

obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode;

selecting the prediction mode with the minimum cost value in the multiple prediction modes as the optimal prediction mode;

and performing inverse discrete cosine transform on the inverse quantization coefficient of the optimal prediction mode to obtain a reconstructed residual error.

2. The method of claim 1, wherein the obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode comprises:

and calculating the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used for balancing the code rate value and the distortion value.

3. The method of claim 1, wherein the performing discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficient comprises:

performing discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficient, wherein the discrete cosine transform matrix is orthogonal and invertible.

4. The method of claim 1, wherein the performing discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficient comprises:

performing integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain an integer discrete cosine transform coefficient, wherein the integer discrete cosine transform matrix is non-orthogonal but invertible;

and performing point multiplication on the integer discrete cosine transform coefficient to obtain the transform coefficient.

5. A video encoding apparatus, characterized in that the apparatus comprises:

the discrete cosine transform unit is used for performing discrete cosine transform on the prediction residual error of the current prediction mode to obtain a transform coefficient; the prediction residual is the difference between the pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode is to predict a next frame image by using a temporally adjacent previous frame image, and the intra-frame prediction mode is to predict a current frame image by using spatial correlation in the current frame;

the quantization unit is used for quantizing the transformation coefficient to obtain a quantization coefficient;

the inverse quantization unit is used for carrying out inverse quantization on the quantization coefficient to obtain an inverse quantization coefficient;

a distortion value calculation unit, configured to obtain a distortion value of the current prediction mode according to a difference between the transform coefficient and the dequantization coefficient;

the cost value calculation unit is used for obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode;

the optimal prediction mode selecting unit is used for selecting the prediction mode with the minimum cost value in the multiple prediction modes as the optimal prediction mode;

and the reconstructed residual error unit is used for carrying out inverse discrete cosine transform on the inverse quantization coefficient of the optimal prediction mode to obtain a reconstructed residual error.

6. The apparatus of claim 5, wherein the cost value calculation unit is specifically configured to:

and calculating the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used for balancing the code rate value and the distortion value.

7. The apparatus of claim 5, wherein the discrete cosine transform unit is specifically configured to:

perform discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficient, wherein the discrete cosine transform matrix is orthogonal and invertible.

8. The apparatus of claim 5, wherein the discrete cosine transform unit is specifically configured to:

perform integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain an integer discrete cosine transform coefficient, wherein the integer discrete cosine transform matrix is non-orthogonal but invertible;

and performing point multiplication on the integer discrete cosine transform coefficient to obtain the transform coefficient.

9. A video encoding apparatus, characterized in that the apparatus comprises:

a memory for storing program instructions;

a processor for performing the following operations according to program instructions stored in the memory:

performing discrete cosine transform on the prediction residual of the current prediction mode to obtain a transform coefficient; the prediction residual is the difference between the pixel values of an original image and a predicted image, the predicted image is obtained by prediction according to an inter-frame prediction mode or an intra-frame prediction mode, the inter-frame prediction mode is to predict a next frame image by using a temporally adjacent previous frame image, and the intra-frame prediction mode is to predict a current frame image by using spatial correlation in the current frame;

quantizing the transformation coefficient to obtain a quantized coefficient;

carrying out inverse quantization on the quantization coefficient to obtain an inverse quantization coefficient;

obtaining a distortion value of the current prediction mode according to the difference between the transformation coefficient and the inverse quantization coefficient;

obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode;

selecting the prediction mode with the minimum cost value in the multiple prediction modes as the optimal prediction mode;

and performing inverse discrete cosine transform on the inverse quantization coefficient of the optimal prediction mode to obtain a reconstructed residual error.

10. The apparatus of claim 9, wherein the processor performing the obtaining the cost value of the current prediction mode according to the code rate value and the distortion value of the current prediction mode comprises:

and calculating the cost value of the current prediction mode through a rate distortion optimization function according to the code rate value and the distortion value of the current prediction mode, wherein the rate distortion optimization function is used for balancing the code rate value and the distortion value.

11. The apparatus of claim 9, wherein the processor performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficient comprises:

performing discrete cosine transform on the prediction residual of the current prediction mode through a discrete cosine transform matrix to obtain the transform coefficient, wherein the discrete cosine transform matrix is orthogonal and invertible.

12. The apparatus of claim 9, wherein the processor performing the discrete cosine transform on the prediction residual of the current prediction mode to obtain the transform coefficient comprises:

performing integer discrete cosine transform on the prediction residual of the current prediction mode through an integer discrete cosine transform matrix to obtain an integer discrete cosine transform coefficient, wherein the integer discrete cosine transform matrix is non-orthogonal but invertible;

and performing point multiplication on the integer discrete cosine transform coefficient to obtain the transform coefficient.

13. A computer storage medium, characterized in that the computer storage medium stores a program, and the program, when executed, performs the steps of the method of any one of claims 1 to 4.