WO2023220970A1

WO2023220970A1 - Video coding method and apparatus, and device, system and storage medium

Info

Publication number: WO2023220970A1
Application number: PCT/CN2022/093582
Authority: WO
Inventors: 唐桐
Original assignee: Oppo广东移动通信有限公司
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2023-11-23

Abstract

Provided in the present application are a video coding method and apparatus, and a device, a system and a storage medium. The method comprises: determining, by means of a violent motion parameter, whether the current video is a translational motion video, and if the current video is a translational motion video, skipping the affine motion compensation prediction mode. This avoids the waste of computing resources caused when the affine motion compensation prediction mode is used to predict a translational motion video, thereby shortening the coding time, improving the video coding efficiency, and saving computing resources.

Description

Video coding method, device, equipment, system, and storage medium

Technical field

The present application relates to the field of video coding and decoding technology, and in particular, to a video coding method, device, equipment, system, and storage medium.

Background

Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, or video players. With the development of video technology, video data includes a larger amount of data. In order to facilitate the transmission of video data, video devices implement video compression technology to make the video data more efficiently transmitted or stored.

Redundancy in video data is reduced through prediction. For example, the affine motion compensation prediction (AFF) mode predicts irregular motion such as zooming in or out, rotation, and perspective motion. However, using the affine motion compensation prediction mode can currently generate a lot of unnecessary overhead, which wastes computing resources and increases encoding time.

Summary of the invention

Embodiments of the present application provide a video encoding method, device, equipment, system, and storage medium that determine whether to skip the affine motion compensation prediction mode by determining violent motion parameters, thereby avoiding waste of computing resources and reducing encoding time.

In a first aspect, embodiments of the present application provide a video decoding method, including:

Decode the code stream and determine the residual value of the current coding tree unit CTU;

Determine the block division method and prediction mode of the current CTU. The block division method and prediction mode of the current CTU are determined based on the violent motion parameter. The violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode;

Perform block division on the current CTU using the block division method of the current CTU to obtain at least one coding unit CU;

For the current CU in the at least one CU, use the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU to obtain a prediction value of the current CU;

According to the residual value of the current CTU, the residual value of the current CU is determined, and based on the residual value and the prediction value of the current CU, the reconstruction value of the current CU is obtained.

In a second aspect, embodiments of the present application provide a video encoding method, including:

Determine the block division method and prediction mode of the current coding tree unit CTU according to the violent motion parameter;

According to the predicted value of the current CU, the residual value of the current CU is determined, and based on the residual value of the current CU, a code stream is obtained.

In a third aspect, the present application provides a video decoding device for performing the method in the above first aspect or its respective implementations. Specifically, the decoding device includes a functional unit for executing the method in the above-mentioned first aspect or its respective implementations.

In a fourth aspect, the present application provides a video encoding device for performing the method in the above second aspect or its respective implementations. Specifically, the encoding device includes a functional unit for executing the method in the above-mentioned second aspect or its respective implementations.

In a fifth aspect, a video decoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above first aspect or its respective implementations.

A sixth aspect provides a video encoder, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above second aspect or its respective implementations.

A seventh aspect provides a video encoding and decoding system, including a video encoder and a video decoder. The video encoder is used to perform the method in the above-mentioned second aspect or its various implementations, and the video decoder is used to perform the method in the above-mentioned first aspect or its various implementations.

An eighth aspect provides a chip for implementing the method in any one of the above-mentioned first to second aspects or in each implementation manner thereof. Specifically, the chip includes: a processor, configured to call and run a computer program from a memory, so that a device installed with the chip executes the method in any one of the above-mentioned first to second aspects or implementations thereof.

A ninth aspect provides a computer-readable storage medium for storing a computer program that causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.

In a tenth aspect, a computer program product is provided, including computer program instructions, which enable a computer to execute any one of the above-mentioned first to second aspects or the methods in each implementation thereof.

An eleventh aspect provides a computer program that, when run on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation thereof.

A twelfth aspect provides a code stream, which is generated by any one of the above-mentioned first aspects or implementations thereof.

Based on the above technical solution, the violent motion parameter is used to determine whether the current video is a translational motion video. If the current video is a translational motion video, the affine motion compensation prediction mode is skipped. This avoids the waste of computing resources caused by predicting a translational motion video with the affine motion compensation prediction mode, which reduces encoding time, improves video encoding efficiency, and saves computing resources.

Description of the drawings

Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application;

Figure 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application;

Figure 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application;

Figures 4A and 4B are schematic diagrams of control point selection in the affine motion compensation prediction mode according to embodiments of the present application;

Figure 5 is a schematic diagram of the motion vector of each sub-block determined using the affine motion compensation prediction mode;

Figure 6 is a schematic flow chart of a video decoding method provided by an embodiment of the present application;

Figure 7 is a schematic flow chart of a video encoding method provided by an embodiment of the present application;

Figure 8 is a schematic block diagram of a video decoding device provided by an embodiment of the present application;

Figure 9 is a schematic block diagram of a video encoding device provided by an embodiment of the present application;

Figure 10 is a schematic block diagram of an electronic device provided by an embodiment of the present application;

Figure 11 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.

Detailed description

This application can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, dedicated circuit video encoding and decoding, real-time video encoding and decoding, etc. For example, the solution of this application can be combined with the audio video coding standard (AVS for short), the H.264/advanced video coding (AVC for short) standard, the H.265/high efficiency video coding (HEVC for short) standard and the H.266/versatile video coding (VVC for short) standard. Alternatively, the solution of this application can be operated in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the technology of this application is not limited to any specific codec standard or technology.

For ease of understanding, the video encoding and decoding system involved in the embodiment of the present application is first introduced with reference to Figure 1.

Figure 1 is a schematic block diagram of a video encoding and decoding system related to an embodiment of the present application. It should be noted that Figure 1 is only an example, and the video encoding and decoding system in the embodiment of the present application includes but is not limited to what is shown in Figure 1. As shown in Figure 1, the video encoding and decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode the video data (which can be understood as compression) to generate a code stream, and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.

The encoding device 110 in the embodiment of the present application can be understood as a device with a video encoding function, and the decoding device 120 can be understood as a device with a video decoding function. That is, the encoding device 110 and the decoding device 120 in the embodiments of the present application cover a wide range of devices, for example smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.

In some embodiments, the encoding device 110 may transmit the encoded video data (e.g., a code stream) to the decoding device 120 via the channel 130. Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120.

In one example, channel 130 includes one or more communication media that enables encoding device 110 to transmit encoded video data directly to decoding device 120 in real time. In this example, encoding device 110 may modulate the encoded video data according to the communication standard and transmit the modulated video data to decoding device 120. The communication media includes wireless communication media, such as radio frequency spectrum. Optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.

In another example, channel 130 includes a storage medium that can store video data encoded by encoding device 110. Storage media include a variety of local access data storage media, such as optical disks, DVDs, flash memories, etc. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.

In another example, channel 130 may include a storage server that may store video data encoded by encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and may transmit the encoded video data to the decoding device 120; examples include a web server (e.g., for a website), a File Transfer Protocol (FTP) server, etc.

In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.

In some embodiments, the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.

Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, and a computer graphics system for generating video data.

The video encoder 112 encodes the video data from the video source 111 to generate a code stream. Video data may include one or more images (pictures) or image sequences (sequences of pictures). The code stream contains the encoding information of an image or image sequence in the form of a bit stream. Encoded information may include encoded image data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS) and other syntax structures. An SPS can contain parameters that apply to one or more sequences. A PPS can contain parameters that apply to one or more images. A syntax structure refers to a collection of zero or more syntax elements arranged in a specified order in a code stream.

The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.

In some embodiments, decoding device 120 includes input interface 121 and video decoder 122.

In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may also include a display device 123.

The input interface 121 includes a receiver and/or a modem. Input interface 121 may receive encoded video data over channel 130.

The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.

The display device 123 displays the decoded video data. Display device 123 may be integrated with decoding device 120 or external to decoding device 120. Display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.

In addition, Figure 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to Figure 1. For example, the technology of the present application can also be applied to unilateral video encoding or unilateral video decoding.

The video coding framework involved in the embodiments of this application is introduced below.

The video encoder involved in the embodiment of the present application is introduced below.

Figure 2 is a schematic block diagram of a video encoder provided by an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression of images, or can also be used to perform lossless compression of images. The lossless compression can be visually lossless compression or mathematically lossless compression.

The video encoder 200 can be applied to image data in a luminance-chrominance (YCbCr, YUV) format. For example, the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y represents luminance (luma), Cb (U) represents blue chroma, and Cr (V) represents red chroma; U and V together represent chrominance (chroma), which describes color and saturation. For example, in terms of color format, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full-resolution chroma, with every 4 pixels having 4 luminance components and 8 chrominance components (YYYYCbCrCbCrCbCrCbCr).
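The sample counts above can be checked with a short sketch. The function and table names are illustrative only and do not come from the application, and the sketch assumes even frame dimensions.

```python
# Factors by which each chroma plane is reduced: (horizontal, vertical).
CHROMA_DIVISORS = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}

def plane_sizes(width, height, fmt):
    """Return (luma_samples, chroma_samples_per_plane) for one frame."""
    div_x, div_y = CHROMA_DIVISORS[fmt]
    luma = width * height
    chroma = (width // div_x) * (height // div_y)
    return luma, chroma

def samples_per_four_pixels(fmt):
    """Count Y, Cb and Cr samples for a 2x2 group of pixels."""
    luma, chroma = plane_sizes(2, 2, fmt)
    return luma, chroma, chroma
```

For a 2×2 group of pixels, 4:2:0 gives 4 Y samples plus 1 Cb and 1 Cr sample, which is the YYYYCbCr pattern described above.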

For example, the video encoder 200 reads video data and, for each frame of image in the video data, divides the frame into several coding tree units (CTUs). In some examples, a CTU may be called a "tree block", a "largest coding unit" (LCU for short) or a "coding tree block" (CTB for short). Each CTU can be associated with an equal-sized block of pixels within the image. Each pixel can correspond to one luminance (luma) sample and two chrominance (chroma) samples. Therefore, each CTU can be associated with one block of luma samples and two blocks of chroma samples. A CTU size is, for example, 128×128, 64×64 or 32×32. A CTU can be further divided into several coding units (CUs) for encoding. CUs can be rectangular blocks or square blocks. A CU can be further divided into prediction units (PU for short) and transform units (TU for short), which separates coding, prediction, and transformation and makes processing more flexible. In an example, the CTU is divided into CUs in a quad-tree manner, and the CU is divided into TUs and PUs in a quad-tree manner.
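As a minimal sketch of the division described above, the following code counts the CTUs that cover a frame and splits one CTU into CUs in a quad-tree manner. The `should_split` callback is a hypothetical stand-in for whatever decision rule an encoder applies, and the minimum CU size of 8 is an assumption.

```python
import math

def ctu_grid(width, height, ctu_size=128):
    """Number of CTU columns and rows covering a frame (edge CTUs may be partial)."""
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)

def quadtree_split(x, y, size, should_split, min_size=8):
    """Recursively split a square block into quadrants; return leaf CUs as (x, y, size)."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half, should_split, min_size)
    return leaves
```

For instance, `quadtree_split(0, 0, 64, lambda x, y, s: s > 32)` splits a 64×64 CTU once and returns four 32×32 CUs.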

Video encoders and video decoders can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, the video encoder and video decoder can support a PU size of 2N×2N or N×N for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N or similar sized symmetric PU for inter prediction. The video encoder and video decoder can also support 2N×nU, 2N×nD, nL×2N and nR×2N asymmetric PUs for inter prediction.
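The PU shapes listed above can be enumerated as follows. This is a sketch under one assumption not stated in this text: the asymmetric modes split the CU at one quarter of its side, as in the HEVC convention. The function name is illustrative.

```python
def pu_partitions(n):
    """Map partition mode names to lists of (width, height) PUs for a 2N x 2N CU."""
    s, q = 2 * n, n // 2  # CU side length and one quarter of it
    return {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n)] * 2,
        "Nx2N": [(n, s)] * 2,
        "NxN": [(n, n)] * 4,
        # Asymmetric modes (assumed 1:3 split): U/D = upper/lower, L/R = left/right.
        "2NxnU": [(s, q), (s, s - q)],
        "2NxnD": [(s, s - q), (s, q)],
        "nLx2N": [(q, s), (s - q, s)],
        "nRx2N": [(s - q, s), (q, s)],
    }
```

With N = 16, for example, the 2N×nU mode yields a 32×8 PU above a 32×24 PU, and every mode covers the full 32×32 CU.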

In some embodiments, as shown in Figure 2, the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image cache 270 and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.

Optionally, in this application, the current block may be called the current coding unit (CU) or the current prediction unit (PU), etc. The prediction block may also be called a predicted image block or an image prediction block, and the reconstructed image block may also be called a reconstruction block or an image reconstruction block.

In some embodiments, prediction unit 210 includes inter prediction unit 211 and intra prediction unit 212. Since there is a strong correlation between adjacent pixels in a video frame, the intra-frame prediction method is used in video encoding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent frames in the video, the interframe prediction method is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.

The inter-frame prediction unit 211 can be used for inter-frame prediction. Inter-frame prediction can refer to image information of different frames: it uses motion information to find reference blocks in reference frames and generates prediction blocks based on the reference blocks, in order to eliminate temporal redundancy. The frames used in inter-frame prediction may be P frames and/or B frames. P frames refer to forward prediction frames, and B frames refer to bidirectional prediction frames. The motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector. The motion vector can have whole-pixel or sub-pixel precision. If the motion vector has sub-pixel precision, interpolation filtering needs to be applied in the reference frame to produce the required sub-pixel block. Here, the block of whole pixels or sub-pixels found in the reference frame based on the motion vector is called a reference block. Some technologies use the reference block directly as the prediction block, and some technologies further process the reference block to generate the prediction block. Further processing a reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
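The following sketch illustrates how a motion vector with sub-pixel precision selects a reference block. Bilinear interpolation is used here purely as a simple stand-in; practical codecs use longer interpolation filters, and the function names are illustrative rather than taken from the application.

```python
def sample_at(ref, x, y):
    """Bilinearly interpolate reference frame `ref` (a list of rows) at fractional (x, y)."""
    h, w = len(ref), len(ref[0])
    # Clamp coordinates so that positions outside the frame reuse edge pixels.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = ref[y0][x0] * (1 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (1 - fx) + ref[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def predict_block(ref, x, y, size, mv):
    """Build a size x size prediction block at (x, y), displaced by mv = (mvx, mvy) in pixels."""
    mvx, mvy = mv
    return [[sample_at(ref, x + i + mvx, y + j + mvy) for i in range(size)]
            for j in range(size)]
```

A whole-pixel motion vector such as (1, 0) simply copies shifted pixels, while a half-pixel vector such as (0.5, 0) averages horizontally adjacent pixels.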

The intra prediction unit 212 refers only to information of the same frame and predicts the pixel information in the current coded image block to eliminate spatial redundancy. The frames used in intra prediction may be I frames. For example, as shown in Figure 5, the white 4×4 block is the current block, and the gray pixels in the column to the left of and the row above the current block are the reference pixels of the current block. Intra-frame prediction uses these reference pixels to predict the current block. These reference pixels may all be available, that is, all of them may have been encoded and decoded. Some of them may also be unavailable. For example, if the current block is at the leftmost edge of the frame, the reference pixels on the left side of the current block are unavailable. Or, when the current block is being encoded or decoded, the lower left part of the current block has not yet been encoded or decoded, so the reference pixels in the lower left are also unavailable. Where reference pixels are unavailable, they can be filled using the available reference pixels, certain values, or certain methods, or no filling can be performed.

In some embodiments, the intra prediction method also includes a multiple reference line intra prediction method (multiple reference line, MRL). MRL can use more reference pixels to improve coding efficiency.

There are multiple prediction modes for intra prediction. In H.264, there are 9 modes for intra prediction of 4×4 blocks. Among them, mode 0 copies the pixels above the current block into the current block in the vertical direction as the prediction value; mode 1 copies the reference pixels on the left into the current block in the horizontal direction as the prediction value; mode 2 (DC) uses the average value of the eight points A~D and I~L as the prediction value of all points. Modes 3 to 8 copy the reference pixels to the corresponding positions of the current block at a certain angle. Because some positions of the current block cannot exactly correspond to reference pixels, it may be necessary to use a weighted average of the reference pixels, or sub-pixels interpolated from the reference pixels.
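Modes 0 to 2 described above can be sketched as follows, assuming all eight reference pixels are available. Here `above` holds the points A~D and `left` holds the points I~L; the rounding offset in the DC mean is an assumption of this sketch, and the angular modes 3 to 8 are omitted.

```python
def intra4x4_predict(mode, above, left):
    """Predict a 4x4 block from 4 reference pixels above (A-D) and 4 to the left (I-L)."""
    if mode == 0:  # vertical: each column repeats the pixel above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row repeats the pixel to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the eight reference pixels
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")
```

The function returns the prediction block as a list of four rows, so the residual can be formed by subtracting it from the original block sample by sample.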

The intra-frame prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, for a total of 35 prediction modes. The intra-frame modes used by VVC include Planar, DC and 65 angle modes, for a total of 67 prediction modes. The intra-frame modes used by AVS3 include DC, Plane, Bilinear and 63 angle modes, for a total of 66 prediction modes.

It should be noted that with the increase of angle modes, intra-frame prediction will be more accurate and more in line with the development needs of high-definition and ultra-high-definition digital videos.

Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU of the CU. For example, residual unit 220 may generate a residual block of a CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of the PU of the CU.

Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with a TU of the CU based on a quantization parameter (QP) value associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with the CU by adjusting the QP value associated with the CU.
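The effect of the QP value on the degree of quantization can be illustrated with a sketch. It assumes the H.264/HEVC-style relation in which the quantization step size doubles for every increase of 6 in QP; this text does not give a formula, and real codecs use integer approximations of it. The function names are illustrative.

```python
def qstep(qp):
    """Quantization step size; doubles for every increase of 6 in QP (assumed relation)."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (the lossy step)."""
    step = qstep(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficients."""
    step = qstep(qp)
    return [lvl * step for lvl in levels]
```

A larger QP gives a larger step, so more coefficients collapse to zero and the dequantized values deviate further from the originals.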

Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct the residual block from the quantized transform coefficients.

Reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by prediction unit 210 to produce a reconstructed image block associated with the TU. By reconstructing blocks of samples for each TU of a CU in this manner, video encoder 200 can reconstruct blocks of pixels of the CU.

Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.

In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit, a sample adaptive offset (SAO) unit, and an adaptive loop filter (ALF) unit.

Decoded image cache 270 may store reconstructed pixel blocks. Inter prediction unit 211 may perform inter prediction on PUs of other images using reference images containing reconstructed pixel blocks. Additionally, intra prediction unit 212 may use the reconstructed pixel blocks in decoded image cache 270 to perform intra prediction on other PUs in the same image as the CU.

Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230. Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.

The basic process of video encoding involved in this application is as follows: at the encoding end, the current image is divided into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate a prediction block of the current block. The residual unit 220 may calculate a residual block based on the prediction block and the original block of the current block, that is, the difference between the prediction block and the original block of the current block. The residual block may also be called residual information. The residual block is transformed and quantized by the transform/quantization unit 230 to remove information to which human eyes are insensitive, thereby eliminating visual redundancy. Optionally, the residual block before transformation and quantization by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after transformation and quantization by the transform/quantization unit 230 may be called a frequency residual block or frequency domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy encoding on the quantized transform coefficients to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and the probability information of the binary code stream.

In addition, the video encoder performs inverse quantization and inverse transformation on the quantized transform coefficients output by the transform/quantization unit 230 to obtain the residual block of the current block, and then adds the residual block of the current block to the prediction block of the current block to obtain the reconstructed block of the current block. As the encoding proceeds, reconstructed blocks corresponding to other image blocks in the current image can be obtained, and these reconstructed blocks are spliced to obtain a reconstructed image of the current image. Since errors are introduced during the encoding process, the reconstructed image is filtered to reduce the error. For example, ALF is used to filter the reconstructed image to reduce the difference between the pixel values of pixels in the reconstructed image and the original pixel values of the corresponding pixels in the current image. The filtered reconstructed image is stored in the decoded image cache 270 and can be used as a reference frame for inter-frame prediction of subsequent frames.
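The encoder-side reconstruction path described above can be sketched for a single block. This is a simplification under stated assumptions: the transform and loop filtering are omitted, a plain scalar quantization step stands in for the transform/quantization unit, samples are assumed to be 8-bit, and the function name is illustrative.

```python
def encode_block(original, prediction, step):
    """Return (levels, reconstruction) for one block given as flat lists of samples."""
    residual = [o - p for o, p in zip(original, prediction)]
    # The transform is omitted in this sketch; the residual is quantized directly.
    levels = [int(round(r / step)) for r in residual]
    # Mirror the decoder: dequantize, add the prediction back, clip to the 8-bit range.
    recon_residual = [lvl * step for lvl in levels]
    reconstruction = [min(255, max(0, p + r)) for p, r in zip(prediction, recon_residual)]
    return levels, reconstruction
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, the block it stores for later reference matches what a decoder would obtain from the same levels and prediction.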

It should be noted that the block division information determined by the encoding end, as well as mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, etc., is carried in the code stream when necessary. By parsing the code stream and analyzing existing information, the decoding end determines the same block division information and the same mode information or parameter information for prediction, transformation, quantization, entropy coding, loop filtering, etc. as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.

Figure 3 is a schematic block diagram of a video decoder provided by an embodiment of the present application.

As shown in Figure 3, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filtering unit 350 and a decoded image cache 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.

Video decoder 300 can receive the code stream. Entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the code stream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the code stream. The prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340 and the loop filtering unit 350 may decode the video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.

In some embodiments, prediction unit 320 includes inter prediction unit 321 and intra prediction unit 322.

Intra prediction unit 322 may perform intra prediction to generate predicted blocks for the PU. Intra prediction unit 322 may use an intra prediction mode to generate predicted blocks for a PU based on pixel blocks of spatially neighboring PUs. Intra prediction unit 322 may also determine the intra prediction mode of the PU based on one or more syntax elements parsed from the code stream.

Inter prediction unit 321 may construct a first reference image list (List 0) and a second reference image list (List 1) based on syntax elements parsed from the code stream. Additionally, if the PU uses inter-prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 321 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 321 may generate a predicted block for the PU based on one or more reference blocks of the PU.

Inverse quantization/transform unit 330 may inversely quantize (i.e., dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use the QP value associated with the CU of the TU to determine the degree of quantization.

After inversely quantizing the transform coefficients, inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients to produce a residual block associated with the TU.

Reconstruction unit 340 uses the residual blocks associated with the TU of the CU and the prediction blocks of the PU of the CU to reconstruct the pixel blocks of the CU. For example, reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain a reconstructed image block.
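
The sample-wise addition performed by the reconstruction unit can be sketched as follows. This is only an illustrative sketch: the function name `reconstruct_block`, the list-of-rows block layout and the clipping of the result to the range allowed by an assumed `bit_depth` are assumptions and are not taken from the present application.

```python
def reconstruct_block(pred, res, bit_depth=8):
    """Add residual samples to prediction samples, sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1]; the text above
    only states that the residual and prediction samples are added.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, res)
    ]


# Hypothetical 2x2 prediction block and residual block.
pred_block = [[100, 250], [3, 128]]
res_block = [[5, 10], [-7, 0]]
recon_block = reconstruct_block(pred_block, res_block)
print(recon_block)
```

In this sketch the two out-of-range sums (260 and -4) are clipped to the ends of the 8-bit range, while the other samples are plain sums.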

Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts for blocks of pixels associated with the CU.

In some embodiments, the loop filtering unit 350 includes a deblocking filtering unit, a sample adaptive offset (SAO) unit, and an adaptive loop filtering (ALF) unit.

Video decoder 300 may store the reconstructed image of the CU in decoded image cache 360. The video decoder 300 may use the reconstructed image in the decoded image cache 360 as a reference image for subsequent prediction, or transmit the reconstructed image to a display device for presentation.

The basic process of video decoding involved in this application is as follows: the entropy decoding unit 310 parses the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block. The prediction unit 320 applies intra prediction or inter prediction to the current block based on the prediction information to generate a prediction block of the current block. The inverse quantization/transform unit 330 performs inverse quantization and inverse transformation on the quantization coefficient matrix obtained from the code stream to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block. The reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image, on an image basis or on a block basis, to obtain a decoded image. The decoded image can also be called a reconstructed image. On the one hand, the reconstructed image can be displayed by the display device; on the other hand, it can be stored in the decoded image cache 360 and used as a reference frame for inter-frame prediction of subsequent frames.

The above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.

It can be seen from the above that, in the current video encoding and decoding process, the prediction methods used include intra-frame prediction and inter-frame prediction, and ordinary inter-frame prediction mainly considers translational motion. However, in the real world, besides translational motion, there are many other kinds of motion, such as zooming in/out, rotation, perspective motion and other irregular motions. Therefore, in some embodiments, a block-based affine motion compensation prediction mode is used for prediction. The affine motion compensation prediction mode is introduced below.

As shown in Figures 4A and 4B, the affine motion field of a block is described by the motion vectors of two control points (4 parameters) or three control points (6 parameters).

The block-based affine motion compensation prediction mode includes the following steps:

Step 1: Divide the block into 4x4 luma sub-blocks.

Step 2: For each luma sub-block, calculate the motion vector of its center pixel from the affine motion model, and then round it to 1/16-pixel accuracy. The 4-parameter affine motion model and the 6-parameter affine motion model derive the motion vectors using different formulas.

For example, for a 4-parameter affine motion model, the motion vector calculation of the sub-block whose central pixel is (x, y) is as shown in formula (1):

For example, for a 6-parameter affine motion model, the motion vector calculation of the sub-block whose central pixel is (x, y) is as shown in formula (2):

Here, (mv0x, mv0y), (mv1x, mv1y) and (mv2x, mv2y) are the motion vectors of the control points at the upper left, upper right and lower left corners of the block, respectively.
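
For reference, the 4-parameter and 6-parameter affine models are commonly written in the following form in the VVC literature. This is given as an assumption about what formulas (1) and (2) denote, not as a quotation of them. The symbols W and H, denoting the width and height of the current block, are introduced here for illustration and are not defined in the passage above.

```latex
% 4-parameter model, formula (1): two control points (upper left, upper right)
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x - \dfrac{mv_{1y} - mv_{0y}}{W}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{1x} - mv_{0x}}{W}\,y + mv_{0y}
\end{cases}

% 6-parameter model, formula (2): three control points (upper left, upper right, lower left)
\begin{cases}
mv_x = \dfrac{mv_{1x} - mv_{0x}}{W}\,x + \dfrac{mv_{2x} - mv_{0x}}{H}\,y + mv_{0x} \\[2ex]
mv_y = \dfrac{mv_{1y} - mv_{0y}}{W}\,x + \dfrac{mv_{2y} - mv_{0y}}{H}\,y + mv_{0y}
\end{cases}
```

When all control point motion vectors are equal, every coefficient of x and y vanishes and both models reduce to a single translational motion vector (mv0x, mv0y).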

Step 3: After calculating the motion vector for each sub-block (as shown in Figure 5), motion compensation interpolation filtering is performed based on the motion vector to obtain the predicted value of each sub-block.

Step 4: The chroma component is also divided into 4x4 sub-blocks, and its motion vector is equal to the average of the motion vectors of the four 4x4 luma sub-blocks related to it.
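
Steps 1 and 2 above can be sketched as follows. The sketch assumes the affine model form commonly used in VVC (stated in the code comments), floating-point motion vectors expressed in pixels, and the sub-block center located at an offset of half the sub-block size. The function name `affine_subblock_mvs` and its parameters are hypothetical.

```python
def affine_subblock_mvs(cpmvs, width, height, sub=4):
    """Derive one motion vector per sub x sub luma sub-block.

    cpmvs: [(mv0x, mv0y), (mv1x, mv1y)] for the 4-parameter model, or with a
    third entry (mv2x, mv2y) for the 6-parameter model (upper-left,
    upper-right, lower-left control points). Assumed model:
        mvx = a*x + c*y + mv0x,  mvy = b*x + d*y + mv0y
    with a = (mv1x-mv0x)/W, b = (mv1y-mv0y)/W and
        4-parameter: c = -b, d = a
        6-parameter: c = (mv2x-mv0x)/H, d = (mv2y-mv0y)/H
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmvs[0], cpmvs[1]
    a = (mv1x - mv0x) / width
    b = (mv1y - mv0y) / width
    if len(cpmvs) == 3:
        mv2x, mv2y = cpmvs[2]
        c = (mv2x - mv0x) / height
        d = (mv2y - mv0y) / height
    else:
        c, d = -b, a

    mvs = {}
    for y0 in range(0, height, sub):
        for x0 in range(0, width, sub):
            # Center of the sub-block (assumed to be at offset sub/2).
            x, y = x0 + sub / 2, y0 + sub / 2
            mvx = a * x + c * y + mv0x
            mvy = b * x + d * y + mv0y
            # Round to 1/16-pixel accuracy.
            mvs[(x0, y0)] = (round(mvx * 16) / 16, round(mvy * 16) / 16)
    return mvs


# Pure translation: both control points carry the same motion vector.
translation = affine_subblock_mvs([(1.5, -2.0), (1.5, -2.0)], 16, 16)
# Zoom: control point vectors grow with the distance from the upper-left corner.
zoom = affine_subblock_mvs([(0.0, 0.0), (16.0, 0.0), (0.0, 16.0)], 16, 16)
```

For the purely translational input every sub-block receives the same motion vector, which illustrates why the per-sub-block derivation adds little for translational content.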

Like the traditional inter-frame motion vector prediction method, there are two prediction methods for affine motion vectors: affine merge mode and affine AMVP mode.

Affine merge prediction: the AF_MERGE mode can be used for CUs whose width and height are both greater than or equal to 8. In this mode, the control point motion vectors (CPMVs) of the current CU are generated from the motion information of its spatially neighboring CUs. At most 5 CPMV prediction candidates are generated, and an index is transmitted to indicate which candidate is ultimately used.

Affine AMVP prediction: the affine AMVP mode can be used for CUs whose width and height are both greater than or equal to 16. In merge mode the predicted CPMVs are used directly, whereas in AMVP mode the difference between the optimal CPMVs of the current CU and the predicted CPMVs is transmitted. The candidate list for affine AMVP prediction has 2 candidates.
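
The size constraints of the two affine prediction methods can be expressed as a simple check. The function name `allowed_affine_modes` and the mode labels are hypothetical and are used here only for illustration.

```python
def allowed_affine_modes(width, height):
    """Return the affine prediction methods permitted for a CU of this size."""
    modes = []
    if width >= 8 and height >= 8:
        modes.append("affine_merge")  # width and height both >= 8
    if width >= 16 and height >= 16:
        modes.append("affine_amvp")   # width and height both >= 16
    return modes


for size in [(4, 16), (8, 8), (16, 8), (16, 16)]:
    print(size, allowed_affine_modes(*size))
```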

In some embodiments, the AFF-based encoding method is as follows. First, the input image is divided into multiple non-overlapping CTUs. Then, each CTU is processed in raster scan order and is divided into several CUs in different ways. The main steps for determining the optimal block division method are as follows:

Step 11: For the i-th division Split[i], calculate the minimum prediction cost CurBestCostInter[i] and the optimal mode CurBestModeInter[i] in the inter prediction mode. Specifically, the traditional inter-frame prediction method (without affine prediction) is first used for motion estimation, the prediction cost CurBestCostNoAffine is calculated, and the prediction mode CurBestModeNoAffine is saved. If sps_affine_enable_flag=1 and fixed constraints such as block size are met, the affine motion compensation prediction mode is then used for motion estimation, the prediction cost CurBestCostAffine is calculated, and the prediction mode CurBestModeAffine is saved. The smaller of the two costs is taken as the minimum prediction cost CurBestCostInter[i] in the inter prediction mode, and the corresponding mode is saved as the optimal mode CurBestModeInter[i].

Step 12: Calculate the minimum prediction cost CurBestCostOther[i] and the optimal prediction mode CurBestModeOther[i] in other prediction modes such as intra prediction. Compare CurBestCostInter[i] and CurBestCostOther[i], and select the optimal prediction mode bestMode[i] and prediction cost bestCost[i] of the i-th division method.

Step 13: Traverse all preset block division methods, and select the block division method Split[opt] and the corresponding prediction mode bestMode[opt] that minimize the current CTU prediction cost.

Step 14: Divide the current CTU according to the optimal block division method to obtain multiple CUs, and use the prediction mode corresponding to the optimal block division method to predict the CUs and obtain predicted values. Obtain the residual value from the predicted value and the original value, and then transform, quantize, and entropy encode the residual value. In addition, encode the prediction information, including the identifier cu.affine indicating whether the CU uses AFF, the motion vector, etc., and output the code stream.
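
The selection logic of Steps 11 to 13 can be sketched as follows. This is a simplified illustration: costs are passed in as plain numbers, the affine cost is supplied as a callable so that it is only evaluated when affine prediction is permitted, and ties are resolved in favor of the option evaluated first. All function names and mode labels are hypothetical; only the roles of CurBestCostNoAffine, CurBestCostAffine and sps_affine_enable_flag come from the text.

```python
def best_inter(cost_no_affine, affine_cost_fn, sps_affine_enable_flag, size_ok):
    """Step 11: pick the cheaper of non-affine and affine inter prediction."""
    best = (cost_no_affine, "inter_no_affine")
    if sps_affine_enable_flag and size_ok:
        affine = (affine_cost_fn(), "inter_affine")  # evaluated only if allowed
        if affine[0] < best[0]:
            best = affine
    return best


def best_for_split(inter, other):
    """Step 12: compare the inter result with the best other mode (e.g. intra)."""
    return inter if inter[0] <= other[0] else other


def choose_split(per_split):
    """Step 13: return (opt, bestCost[opt], bestMode[opt]) over all divisions."""
    opt = min(range(len(per_split)), key=lambda i: per_split[i][0])
    return (opt, *per_split[opt])


# Two hypothetical divisions with made-up costs.
split0 = best_for_split(best_inter(120, lambda: 100, 1, True), (130, "intra"))
split1 = best_for_split(best_inter(90, lambda: 95, 1, True), (80, "intra"))
result = choose_split([split0, split1])
print(split0, split1, result)
```

In this example division 0 selects affine inter prediction with cost 100, division 1 selects intra prediction with cost 80, and division 1 is therefore chosen for the CTU.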

In some embodiments, the AFF-based decoding method is: performing entropy decoding, inverse quantization, and inverse transformation on the input code stream to obtain the residual value Res, and further decoding the code stream to obtain block division information and prediction information. Next, follow these steps to reconstruct the image:

Step 21: Determine the partition tree of the current CTU according to the block partition information.

Step 22: Process each CU in the partition tree sequentially according to the raster scanning order, and use the prediction information of each CU, such as prediction mode bestMode[opt], identifier cu.affine, motion vector, etc., to calculate the prediction value Pred.

Step 23: Superpose the residual value Res and the prediction value Pred of the current CU to obtain the reconstructed CU. Finally, the reconstructed image is sent to the DBF/SAO/ALF filter, and the filtered image is sent to the buffer to wait for video playback.

As can be seen from the above, AFF supplements inter-frame prediction so that irregular motions such as zooming in/out and rotation can be expressed more efficiently. However, for each prediction unit, AFF needs to perform motion estimation on each 4x4 sub-block, which results in more rate-distortion optimization calculations and greatly increases the coding complexity. According to statistics, in low-delay mode AFF brings an average gain of 2.95% at the cost of a 27% increase in coding complexity. The fundamental reason why AFF achieves a performance improvement lies in irregular motions such as zooming in/out and rotation. In other words, for general translational motion videos, the performance gain brought by AFF is very limited, while the coding complexity is still significantly increased.

In order to solve the above technical problem, the embodiment of the present application determines a violent motion parameter and uses it to determine whether the current video is a translational motion video. If the current video is a translational motion video, the affine motion compensation prediction mode is skipped, thereby avoiding the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, which reduces the encoding time, improves the encoding efficiency of the video, and saves computing resources.

The video encoding and decoding method provided by the embodiment of the present application will be introduced below with reference to specific embodiments.

First, with reference to Figure 6, the video decoding method provided by the embodiment of the present application is introduced, taking the decoding end as an example.

FIG. 6 is a schematic flowchart of a video decoding method provided by an embodiment of the present application. The embodiment of the present application is applied to the video decoders shown in FIGS. 1 and 3.

As shown in Figure 6, the method in the embodiment of this application includes:

S601. Decode the code stream and determine the residual value of the current CTU.

During video decoding, the decoder receives the code stream, decodes the code stream, determines the block division method of the current CTU, and divides the current CTU into blocks using the block division method of the current CTU to obtain at least one CU. Decode the code stream to obtain the quantization coefficient of the current CU in at least one CU, perform inverse quantization on the quantization coefficient of the current CU to obtain the transformation coefficient of the current CU, and perform inverse transformation on the transformation coefficient of the current CU to obtain the residual value of the current CU. Decode the code stream, determine the prediction mode of the current CU, use the prediction mode of the current CU to predict the current CU, and obtain the prediction value of the current CU. According to the prediction value and residual value of the current CU, the reconstruction value of the current CU is obtained. The reconstructed values of each CU in the current frame form the reconstructed image.

In some embodiments, loop filtering is performed on the reconstructed image on an image basis or on a block basis to obtain a decoded image. The decoded image may also be called a reconstructed image, and the reconstructed image may be used as a reference frame for inter-frame prediction for subsequent frames.

In some embodiments, the current CU is also called a current block, a current image block, a current decoding block, a current coding unit, a current block to be decoded, a current image block to be decoded, etc.

In some embodiments, the current CU in the embodiment of the present application only includes chrominance components, which can be understood as chrominance blocks.

In some embodiments, the current CU in the embodiment of the present application only includes the luminance component, which can be understood as a luminance block.

In some embodiments, the current CU includes both luma and chrominance components.

In the embodiment of this application, the decoder determines the residual value of the current CTU in the following two situations:

Case 1: If the encoding end transforms the residual values of each CU in the current CTU to obtain the transformation coefficients of each CU, it directly encodes the transformation coefficients of each CU to obtain a code stream. Correspondingly, the decoder decodes the code stream to obtain the transform coefficient of the current CTU, and performs inverse transformation on the transform coefficient of the current CTU to obtain the residual value of the current CTU.

Case 2: If the encoding end transforms the residual value of each CU in the current CTU to obtain the transform coefficient of each CU, and then quantizes the transform coefficient of each CU to obtain the quantized coefficient of each CU, the quantized coefficient of each CU is encoded to obtain the code stream. Correspondingly, the decoding end decodes the code stream to obtain the quantization coefficient of the current CTU, inversely quantizes the quantization coefficient of the current CTU to obtain the transform coefficient of the current CTU, and inversely transforms the transform coefficient of the current CTU to obtain the residual value of the current CTU.
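
The difference between Case 1 and Case 2 can be illustrated with a small one-dimensional sketch. The sketch assumes a floating-point orthonormal DCT and a uniform quantizer with step size `qstep`; practical codecs use integer transforms and QP-dependent scaling, so this stands in for the actual transform and quantizer only for illustration. All names are hypothetical.

```python
import math


def _scale(k, n):
    # Normalization factor of the orthonormal DCT-II.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(x):
    """Forward orthonormal DCT-II (stand-in for the encoder-side transform)."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() (stand-in for the decoder-side inverse transform)."""
    n = len(c)
    return [
        sum(
            _scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def decode_residual(values, qstep=None):
    """Case 1 (qstep is None): inverse transform only.
    Case 2: inverse quantization followed by inverse transform."""
    coeffs = values if qstep is None else [v * qstep for v in values]
    return idct(coeffs)


residual = [4.0, -3.0, 2.0, 0.0]

# Case 1: transform coefficients are coded directly.
case1 = decode_residual(dct(residual))

# Case 2: transform coefficients are quantized before coding.
qstep = 0.5
levels = [round(c / qstep) for c in dct(residual)]
case2 = decode_residual(levels, qstep)
```

In Case 1 the residual is recovered up to floating-point error, whereas in Case 2 the recovered residual differs from the original by an error that depends on the quantization step.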

S602. Determine the block division method and prediction mode of the current CTU.

Among them, the block division method and prediction mode of the current CTU are determined based on the violent motion parameter, and the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode.

It should be noted that the above-mentioned S602 may be executed before the above-mentioned S601, after the above-mentioned S601, or simultaneously with the above-mentioned S601, and the embodiment of the present application does not limit this.

In the embodiment of the present application, since the affine motion compensation prediction mode is more complex and takes up more computing resources, the coding efficiency is low. In addition, the affine motion compensation prediction mode is mainly used to efficiently express irregular motions such as zooming in/out and rotation. For translational motion videos, the performance gain of the affine motion compensation prediction mode is limited. It can be seen that when the affine motion compensation prediction mode is used to predict translational motion videos, the compression effect is not significant, but a large amount of computing resources is occupied and the encoding time increases. Therefore, before using the affine motion compensation prediction mode for prediction, the embodiment of the present application first determines the violent motion parameter, and uses the violent motion parameter to indicate whether the current video is a translational motion video. When it is determined that the current video is a translational motion video, the affine motion compensation prediction mode is skipped, thereby avoiding the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, improving the coding efficiency of the video and saving computing resources.

The violent motion parameter in the embodiment of the present application is used to indicate whether the current video is a translational motion video. If it is a translational motion video, the affine motion compensation prediction mode is skipped. Therefore, the violent motion parameter in the embodiment of the present application can also be used directly to indicate whether to skip the affine motion compensation prediction mode. For example, if the violent motion parameter is less than a preset value, it is determined that the current video is a translational motion video, and the affine motion compensation prediction mode is skipped. If the violent motion parameter is greater than or equal to the preset value, it is determined that the current video is not a translational motion video, and the affine motion compensation prediction mode can be used for prediction.
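
The threshold rule in the preceding example can be written as a one-line predicate. The function name `skip_affine` and the numeric values are hypothetical; how the violent motion parameter and the preset value are obtained is not addressed by this sketch.

```python
def skip_affine(violent_motion_param, preset_value):
    """True if the affine motion compensation prediction mode is skipped.

    A parameter below the preset value is treated as translational motion;
    a parameter equal to or above it allows affine prediction to be tried.
    """
    return violent_motion_param < preset_value


# Hypothetical values: below, equal to, and above a preset value of 0.3.
decisions = [skip_affine(p, 0.3) for p in (0.1, 0.3, 0.8)]
print(decisions)
```

Note that a parameter exactly equal to the preset value does not trigger skipping, consistent with the "greater than or equal to" condition above.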

In some embodiments, the block division method of the current CTU can be understood as the optimal block division method of the current CTU. For example, the block division method of the current CTU is the block division method with the lowest cost among multiple preset block division methods.

In some embodiments, the above prediction mode of the current CTU can be understood as a set, including the prediction mode of each CU in at least one CU included in the current CTU, where the prediction mode of the CU can be understood as the optimal prediction mode of the CU.

In this embodiment of the present application, the prediction mode of the current CTU is the prediction mode corresponding to the block division method of the current CTU. For example, the current CTU is divided into at least one CU using the block division method of the current CTU, and the set of the prediction modes of the CUs in the at least one CU is determined as the prediction mode of the current CTU.

In the embodiment of the present application, the block division method and prediction mode of the current CTU are determined based on the violent motion parameter. In this way, when the current video is judged to be a translational motion video based on the violent motion parameter, the affine motion compensation prediction mode can be skipped when determining the block division method and prediction mode of the current CTU. This reduces the workload of determining the block division method and prediction mode of the current CTU, saves computing resources, and improves the efficiency of the determination.

The specific methods for determining the block division method and prediction mode of the current CTU in S602 include but are not limited to the following:

Method 1: The encoding end indicates the determined block division method and prediction mode of the current CTU to the decoder, so that the decoder can determine the block division method and prediction mode of the current CTU based on the indication information. Specifically, the above-mentioned S602 includes the following S602-A1 and S602-A2:

S602-A1. Decode the code stream to obtain at least one of first information and second information, where the first information is used to indicate the block division method of the current CTU, and the second information is used to indicate the prediction mode of the current CTU;

S602-A2: Determine at least one of the block division method and the prediction mode of the current CTU according to at least one of the first information and the second information.

In Method 1, after determining the block division method and prediction mode of the current CTU based on the violent motion parameter, the encoding end writes at least one of the first information and the second information into the code stream, where the first information is used to indicate the block division method of the current CTU, and the second information is used to indicate the prediction mode of the current CTU. In this way, the decoder obtains at least one of the first information and the second information by decoding the code stream, and then determines the block division method of the current CTU based on the first information, and/or determines the prediction mode of the current CTU based on the second information.

The embodiments of this application do not limit the specific forms of the first information and the second information.

In some embodiments, the first information is an index of the block division mode of the current CTU.

In some embodiments, the second information is an index of the prediction mode of the current CTU.

In addition to using the above steps of S602-A1 and S602-A2 to determine the block division method and prediction mode of the current CTU, the decoder can also determine the block division method and prediction mode of the current CTU according to the following method 2.

Method 2: The decoder determines the violent motion parameter, and determines the block division method and prediction mode of the current CTU based on the violent motion parameter. For example, the code stream carries the violent motion parameter determined by the encoding end, so that the decoding end can obtain the violent motion parameter by decoding the code stream. Next, the decoder determines the block division method and prediction mode of the current CTU based on the violent motion parameter. For example, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, the decoder skips the affine motion compensation prediction mode when determining the block division method and prediction mode of the CTU. For another example, if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, the decoder may try the affine motion compensation prediction mode when determining the block division method and prediction mode of the CTU. It should be noted that when the decoder determines the block division method and prediction mode of the current CTU, the CUs in the current CTU have not yet been reconstructed. In a possible implementation, the template of a CU can be used instead of the CU itself to determine the prediction mode of the CU in the current CTU, where the template of the CU includes the decoded area above the CU and/or the decoded area to the left of the CU.

The embodiments of this application do not limit the specific method of determining the block division method and prediction mode of the current CTU based on the violent motion parameters.

In some embodiments, the block division method of the current CTU is a preset block division method, and the prediction mode of the CTU is determined based on the violent motion parameter. For example, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, a prediction mode other than the affine motion compensation prediction mode is determined as the prediction mode of the CTU. If the violent motion parameter indicates not skipping the affine motion compensation prediction mode, the affine motion compensation prediction mode is determined as the prediction mode of the CTU.

In some embodiments, the block division method and prediction mode of the current CTU are determined based on the optimal prediction modes corresponding to N block division methods, where the optimal prediction mode corresponding to the i-th block division method among the N block division methods is determined based on the violent motion parameter. Specifically, for the i-th block division method among the N block division methods, the optimal prediction mode corresponding to the i-th block division method is determined according to the violent motion parameter, so that the optimal prediction mode corresponding to each of the N block division methods can be determined. The block division method and prediction mode of the current CTU are then determined based on the optimal prediction modes corresponding to the N block division methods. For example, the block division method whose optimal prediction mode has the smallest cost among the N block division methods is determined as the block division method of the current CTU, and the optimal prediction mode corresponding to that block division method is determined as the prediction mode of the current CTU.

The embodiment of the present application does not limit the specific method of determining the optimal prediction mode corresponding to the i-th block division method.

In some embodiments, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, one or more prediction modes are selected from the prediction modes other than the affine motion compensation prediction mode and determined as the optimal prediction mode corresponding to the i-th block division method.

In some embodiments, if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, the affine motion compensation prediction mode is determined to be the optimal prediction mode corresponding to the i-th block division method.

In some embodiments, the optimal prediction mode corresponding to the i-th block division method is determined based on the optimal prediction modes of M CUs, where the M CUs are obtained by dividing the current CTU using the i-th block division method, the optimal prediction mode of the j-th CU among the M CUs is determined from at least one candidate prediction mode of the j-th CU, and the at least one candidate prediction mode of the j-th CU is determined based on the violent motion parameter.

Specifically, for the i-th block division method among the N block division methods, the current CTU is divided into blocks using the i-th block division method to obtain M CUs. Next, based on the violent motion parameters, the optimal prediction mode corresponding to each of the M CUs is determined. Specifically, for each CU among the M CUs, for example, the j-th CU, at least one candidate prediction mode of the j-th CU is determined based on the violent motion parameter.

In some embodiments, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode.

In some embodiments, if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.

Next, a prediction mode is determined from at least one candidate prediction mode of the j-th CU as the optimal prediction mode of the j-th CU.

In some embodiments, a default prediction mode among at least one candidate prediction mode of the j-th CU is determined as the optimal prediction mode of the j-th CU.

In some embodiments, the optimal prediction mode of the j-th CU is determined from the at least one candidate prediction mode of the j-th CU according to the cost of predicting the j-th CU with each of the candidate prediction modes.

Taking as an example the case in which the optimal prediction mode of the j-th CU is the candidate prediction mode with the smallest cost among the at least one candidate prediction mode of the j-th CU, the specific process of determining the block division method and prediction mode of the CTU in the embodiment of the present application is as follows.

For the i-th block division method among the preset N block division methods, the current CTU is divided using the i-th block division method to obtain M CUs, and the optimal prediction mode of each of the M CUs is determined based on the violent motion parameter. Specifically, for the j-th CU among the M CUs, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode; if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.

Next, each candidate prediction mode of the j-th CU is used to predict the j-th CU to obtain the prediction value corresponding to that candidate prediction mode, and the cost of each candidate prediction mode is determined according to its prediction value and the original value of the j-th CU. In the embodiment of the present application, in order to reduce the cost calculation workload, an approximate cost can be used. For example, based on the prediction value corresponding to each candidate prediction mode and the original value of the j-th CU, the sum of absolute differences (SAD) or the sum of absolute transformed differences (SATD), that is, the sum of absolute values after a Hadamard transform, is calculated. According to the cost of each candidate prediction mode of the j-th CU, one candidate prediction mode is determined as the optimal prediction mode of the j-th CU; for example, the candidate prediction mode with the smallest cost is taken as the optimal prediction mode of the j-th CU.

In the same way, the optimal prediction mode of each of the M CUs under the i-th block division method can be determined. The optimal prediction modes of the M CUs are determined as the optimal prediction mode corresponding to the i-th block division method, and the sum of the costs corresponding to the optimal prediction modes of the M CUs is determined as the cost corresponding to the i-th block division method. Following this procedure, the optimal prediction mode and cost corresponding to each of the N block division methods are determined. Finally, the block division method with the smallest cost among the N block division methods is determined as the block division method of the current CTU, and the optimal prediction mode corresponding to that block division method is determined as the optimal prediction mode of the current CTU.

For example, if block division method 1 has the lowest cost among the N block division methods, block division method 1 is determined as the block division method of the current CTU. Assuming that block division method 1 divides the current CTU into 4 CUs whose optimal prediction modes are prediction mode 1, prediction mode 2, prediction mode 3 and prediction mode 4 respectively, then prediction mode 1, prediction mode 2, prediction mode 3 and prediction mode 4 are determined as the optimal prediction mode of the current CTU.
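
The process described above can be sketched as follows, using SAD as the approximate cost. This is a minimal illustration under several assumptions: each CU is represented by a flat list of original samples, each candidate prediction mode by a callable that returns predicted samples, and ties are resolved in favor of the first candidate. The names `sad`, `best_mode_for_cu` and `choose_partition`, the mode labels and the sample values are all hypothetical.

```python
def sad(pred, orig):
    """Sum of absolute differences between predicted and original samples."""
    return sum(abs(p - o) for p, o in zip(pred, orig))


def best_mode_for_cu(orig, predictors, skip_affine):
    """Pick the lowest-cost candidate mode for one CU.

    predictors: dict mapping a mode name to a callable returning predicted
    samples. When skip_affine is True, the "affine" entry is removed from the
    candidate list before any prediction is run.
    """
    candidates = {
        mode: fn for mode, fn in predictors.items()
        if not (skip_affine and mode == "affine")
    }
    costs = {mode: sad(fn(), orig) for mode, fn in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


def choose_partition(splits, skip_affine):
    """splits: dict mapping a division name to a list of (orig, predictors),
    one entry per CU. Returns (division, modes per CU, total cost)."""
    results = {}
    for name, cus in splits.items():
        picks = [best_mode_for_cu(orig, preds, skip_affine) for orig, preds in cus]
        results[name] = ([m for m, _ in picks], sum(c for _, c in picks))
    best = min(results, key=lambda n: results[n][1])
    return best, results[best][0], results[best][1]


# Hypothetical CTU with two candidate divisions: undivided, or two halves.
splits = {
    "none": [([10, 12, 14, 16], {
        "translational": lambda: [10, 12, 14, 15],
        "affine": lambda: [10, 12, 14, 16],
        "intra": lambda: [9, 9, 9, 9],
    })],
    "halves": [
        ([10, 12], {"translational": lambda: [10, 12], "affine": lambda: [10, 13]}),
        ([14, 16], {"translational": lambda: [13, 18], "affine": lambda: [14, 17]}),
    ],
}

with_skip = choose_partition(splits, skip_affine=True)
without_skip = choose_partition(splits, skip_affine=False)
print(with_skip, without_skip)
```

When `skip_affine` is True the affine predictor is never invoked, which is where the saving in computation arises; the price is that a slightly more expensive mode may be chosen, as in the example where the cost rises from 0 to 1.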

It can be seen from the above that in the embodiment of the present application, when determining the block division method and prediction mode of the current CTU, for each CU under each block division method, the optimal prediction mode is determined based on the violent motion parameters. In this way, when the CU is a translational motion video, the affine motion compensation prediction mode can be skipped, thereby greatly saving computing resources, effectively improving the efficiency of determining the block division method and prediction mode of the current CTU, and thus improving coding performance.

The following is an introduction to the process of determining the violent motion parameters in the embodiment of the present application.

In some embodiments, the above violent motion parameters include at least one of the violent motion parameter of the current frame, the violent motion parameter of the current CTU, and the violent motion parameter of the current CU. That is to say, the violent motion parameters in this embodiment of the present application include at least one of a frame-level violent motion parameter, a CTU-level violent motion parameter, and a CU-level violent motion parameter.

Among them, the violent motion parameter of the current frame is used to indicate whether the affine motion compensation prediction mode is skipped for the current frame. That is to say, if the violent motion parameter of the current frame is less than the first threshold, the current frame is a translational motion video. In this case, the affine motion compensation prediction mode is skipped when determining the prediction mode of each CU in the current frame, which improves the efficiency of determining the prediction mode and reduces the amount of calculation involved. If the violent motion parameter of the current frame is greater than or equal to the first threshold, the current frame is not a translational motion video. In this case, the affine motion compensation prediction mode is tried when determining the prediction mode, which improves the accuracy of the determined prediction mode and thus the coding performance.

The embodiments of the present application do not limit the specific value of the above-mentioned first threshold.

The embodiment of the present application does not limit the specific method of determining the violent motion parameters of the current frame.

In some embodiments, the violent motion parameter of the current frame is determined based on the position changes of the pixels between the current frame and the previous frame, for example, based on the translational displacement of the pixels in the current frame relative to the pixels in the previous frame.

In some embodiments, the violent motion parameter of the current frame is determined based on the violent motion parameters of the K CTUs included in the current frame. For the k-th CTU among the K CTUs, the violent motion parameter of the k-th CTU is determined based on the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame. K is a positive integer, and k is a positive integer less than or equal to K.

Specifically, the current frame is divided into K CTUs, the violent motion parameter of each of the K CTUs is determined, and the violent motion parameter of the current frame is determined based on them. For the k-th CTU among the K CTUs, the reference CTU of the k-th CTU is determined in the previous frame of the current frame, and the violent motion parameter of the k-th CTU is then determined based on the k-th CTU and its reference CTU. In the same way, the violent motion parameter of each of the K CTUs included in the current frame can be determined, and the violent motion parameter of the current frame is then determined based on the violent motion parameters of the K CTUs.

The embodiments of this application do not limit the specific method of determining the violent motion parameter of the k-th CTU.

In some embodiments, the violent motion parameter of the k-th CTU is determined based on the pixel values of the k-th CTU and the pixel values of the reference CTU.

For example, based on the pixel values of the pixels in the k-th CTU and the pixel values of the pixels in the reference CTU, the translational displacement of the k-th CTU relative to the reference CTU is determined, and the violent motion parameter of the k-th CTU is then determined based on that translational displacement.

For another example, the violent motion parameter of the k-th CTU is determined based on the absolute differences between the pixel values of the k-th CTU and the pixel values of the reference CTU. That is, for each pixel in the k-th CTU, the absolute difference between its pixel value and the pixel value of the corresponding pixel in the reference CTU is determined, and the violent motion parameter of the k-th CTU is then determined based on these absolute differences.

In one example, the violent motion parameter of the k-th CTU is determined according to the following formula (3):

Where $MS\_CTU_k$ is the violent motion parameter of the k-th CTU, $CTU_k^{Ori}(x, y)$ is the pixel value of the k-th CTU at position (x, y), $CTU_k^{Ref}(x, y)$ is the pixel value at position (x, y) of the reference CTU of the k-th CTU in the previous frame, $CTU_k\text{-}H$ is the height of the k-th CTU and of its reference CTU, and $CTU_k\text{-}W$ is the width of the k-th CTU and of its reference CTU.
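
Formula (3) is not reproduced in this text. One form consistent with the symbol definitions above and with the absolute-difference description that precedes it is given below as a reconstruction. It assumes a plain sum over all pixel positions with zero-based indices; whether the original formula also normalizes by the CTU area cannot be determined from the text.

$$
MS\_CTU_k = \sum_{y=0}^{CTU_k\text{-}H - 1} \; \sum_{x=0}^{CTU_k\text{-}W - 1} \left| CTU_k^{Ori}(x, y) - CTU_k^{Ref}(x, y) \right|
$$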

The embodiments of the present application do not limit the specific method of determining the violent motion parameters of the current frame based on the violent motion parameters of the K CTUs included in the current frame.

In some embodiments, the sum of violent motion parameters of K CTUs included in the current frame is determined as the violent motion parameter of the current frame.

In some embodiments, the violent motion parameter of the current frame is determined based on the violent motion parameters of the P CTUs, among the K CTUs, whose violent motion parameters are greater than the first preset value, where P is a positive integer less than or equal to K. Specifically, according to the violent motion parameter of each of the K CTUs included in the current frame, the P CTUs whose violent motion parameters are greater than the first preset value are selected from the K CTUs, and the violent motion parameter of the current frame is then determined according to the violent motion parameters of the P CTUs.

In one possible implementation, the sum of the violent motion parameters of the P CTUs is determined as the violent motion parameter of the current frame.

In another possible implementation, the violent motion parameter of the current frame is determined based on the sum of the violent motion parameters of P CTUs and the total area of P CTUs. For example, the violent motion parameter of the current frame is the ratio of the sum of the violent motion parameters of P CTUs to the total area of P CTUs. For example, determine the violent motion parameters of the current frame according to the following formula (4):

Where MS is the violent motion parameter of the current frame, and $\sum_{k=1}^{P} MS\_CTU_k$ is the sum of the violent motion parameters of the P CTUs.
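
The ratio described above (sum of the violent motion parameters of the P selected CTUs divided by their total area) can be sketched as follows. The function name `frame_motion_parameter`, the tuple layout, and the return value of 0.0 when no CTU exceeds the first preset value are assumptions for illustration; the text does not say how that last case is handled.

```python
from typing import List, Tuple


def frame_motion_parameter(
    ctus: List[Tuple[float, int]], first_preset_value: float
) -> float:
    """ctus holds (violent motion parameter, area in pixels) for each of the K CTUs.

    Only the P CTUs whose parameter is greater than first_preset_value are
    used. Returns sum(parameters) / sum(areas) over those P CTUs, or 0.0
    when no CTU qualifies (assumption, not stated in the source).
    """
    selected = [(ms, area) for ms, area in ctus if ms > first_preset_value]
    if not selected:
        return 0.0
    total_ms = sum(ms for ms, _ in selected)
    total_area = sum(area for _, area in selected)
    return total_ms / total_area
```

For instance, with three 16-pixel CTUs whose parameters are 100, 5 and 300 and a first preset value of 10, the second CTU is excluded and the result is 400 / 32.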

The embodiments of this application do not limit the specific value of the above-mentioned first preset value.

The specific process of determining the violent motion parameters of the current frame is introduced above. The process of determining the violent motion parameters of the current CTU is introduced below.

The violent motion parameter of the current CTU in this embodiment of the present application is used to indicate whether the affine motion compensation prediction mode is skipped for the current CTU. That is to say, if the violent motion parameter of the current CTU is less than the second threshold, the current CTU is a translational motion video. In this case, the affine motion compensation prediction mode is skipped when determining the prediction mode of each CU in the current CTU, which improves the efficiency of determining the prediction mode and reduces the amount of calculation involved. If the violent motion parameter of the current CTU is greater than or equal to the second threshold, the current CTU is not a translational motion video. In this case, the affine motion compensation prediction mode is tried when determining the prediction mode, which improves the accuracy of the determined prediction mode and thus the coding performance.

The embodiment of the present application does not limit the specific value of the above-mentioned second threshold.

The embodiments of this application do not limit the specific method of determining the violent motion parameter of the current CTU.

In some embodiments, the violent motion parameter of the current CTU is determined based on the current CTU and the reference CTU of the current CTU in the previous frame of the current frame, for example, based on the translational displacement of the pixels in the current CTU relative to the pixels of the reference CTU in the previous frame.

The embodiments of this application do not limit the specific method of determining the violent motion parameter of the current CTU from the current CTU and its reference CTU.

In some embodiments, the violent motion parameter of the current CTU is determined based on the pixel values of the current CTU and the pixel values of the reference CTU.

For example, based on the pixel values of the pixels in the current CTU and the pixel values of the pixels in the reference CTU, the translational displacement of the current CTU relative to the reference CTU is determined, and the violent motion parameter of the current CTU is then determined based on that translational displacement.

For another example, the violent motion parameter of the current CTU is determined based on the absolute differences between the pixel values of the current CTU and the pixel values of the reference CTU. That is, for each pixel in the current CTU, the absolute difference between its pixel value and the pixel value of the corresponding pixel in the reference CTU is determined, and the violent motion parameter of the current CTU is then determined based on these absolute differences.

In an example, the violent motion parameter of the current CTU is determined according to the following formula (5):

Where $MS\_CTU$ is the violent motion parameter of the current CTU, $CTU^{Ori}(x, y)$ is the pixel value of the current CTU at position (x, y), $CTU^{Ref}(x, y)$ is the pixel value at position (x, y) of the reference CTU of the current CTU in the previous frame, $CTU\text{-}H$ is the height of the current CTU and of the corresponding reference CTU, and $CTU\text{-}W$ is the width of the current CTU and of the corresponding reference CTU.

The specific process of determining the violent motion parameter of the current CTU is introduced above. The process of determining the violent motion parameter of the current CU is introduced below.
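
A block-level parameter of this kind can be sketched in a few lines. The sketch assumes an unnormalized sum of absolute differences between co-located pixels, which is one reading of the description above; the function name `block_motion_parameter` is hypothetical, and blocks are plain nested lists indexed as `block[y][x]`.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values, indexed as block[y][x]


def block_motion_parameter(current: Block, reference: Block) -> int:
    """Sum over all (x, y) of |current(x, y) - reference(x, y)|.

    Assumes both blocks have the same height and width, as the symbol
    definitions in the text imply.
    """
    if len(current) != len(reference) or any(
        len(c) != len(r) for c, r in zip(current, reference)
    ):
        raise ValueError("current and reference blocks must have the same size")
    return sum(
        abs(c - r)
        for row_c, row_r in zip(current, reference)
        for c, r in zip(row_c, row_r)
    )
```

A block that is identical to its reference yields 0, so small values correspond to content that changes little between frames. The same function applies unchanged to a CU and its reference CU.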

The violent motion parameter of the current CU in this embodiment of the present application is used to indicate whether the affine motion compensation prediction mode is skipped for the current CU. That is to say, if the violent motion parameter of the current CU is less than the third threshold, the current CU is a translational motion video. In this case, the affine motion compensation prediction mode is skipped when determining the prediction mode of the current CU, which improves the efficiency of determining the prediction mode and reduces the amount of calculation involved. If the violent motion parameter of the current CU is greater than or equal to the third threshold, the current CU is not a translational motion video. In this case, the affine motion compensation prediction mode is tried when determining the prediction mode, which improves the accuracy of the determined prediction mode and thus the coding performance.

The embodiments of the present application do not limit the specific value of the above third threshold.

The embodiment of the present application does not limit the specific method of determining the violent motion parameter of the current CU.

In some embodiments, the violent motion parameter of the current CU is determined based on the current CU and the reference CU of the current CU in the previous frame of the current frame, for example, based on the translational displacement of the pixels in the current CU relative to the pixels of the reference CU in the previous frame.

The embodiments of this application do not limit the specific method of determining the violent motion parameter of the current CU from the current CU and its reference CU.

In some embodiments, the violent motion parameter of the current CU is determined based on the pixel values of the current CU and the pixel values of the reference CU.

For example, based on the pixel values of the pixels in the current CU and the pixel values of the pixels in the reference CU, the translational displacement of the current CU relative to the reference CU is determined, and the violent motion parameter of the current CU is then determined based on that translational displacement.

For another example, the violent motion parameter of the current CU is determined based on the absolute differences between the pixel values of the current CU and the pixel values of the reference CU. That is, for each pixel in the current CU, the absolute difference between its pixel value and the pixel value of the corresponding pixel in the reference CU is determined, and the violent motion parameter of the current CU is then determined based on these absolute differences.

In an example, the violent motion parameter of the current CU is determined according to the following formula (6):

Where $MS\_CU$ is the violent motion parameter of the current CU, $CU^{Ori}(x, y)$ is the pixel value of the current CU at position (x, y), $CU^{Ref}(x, y)$ is the pixel value at position (x, y) of the reference CU of the current CU in the previous frame, $CU\text{-}H$ is the height of the current CU and of the corresponding reference CU, and $CU\text{-}W$ is the width of the current CU and of the corresponding reference CU.

The above describes in detail the violent motion parameters of the current frame, the violent motion parameters of the current CTU, and the violent motion parameters of the current CU in the embodiment of the present application. In this way, during prediction, it can be determined whether to skip the affine motion compensation prediction mode based on the violent motion parameters, thereby saving computing resources and improving prediction efficiency.
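
The threshold comparisons at the three levels can be sketched together as follows. The text states that the violent motion parameters include at least one of the three levels but does not say how several levels are combined; this sketch assumes that the affine mode is skipped as soon as any supplied level falls below its threshold. The function name `skip_affine` and its argument names are hypothetical.

```python
from typing import Optional


def skip_affine(
    *,
    first_threshold: float,
    second_threshold: float,
    third_threshold: float,
    ms_frame: Optional[float] = None,
    ms_ctu: Optional[float] = None,
    ms_cu: Optional[float] = None,
) -> bool:
    """Return True when the affine motion compensation prediction mode is skipped.

    A level that is not supplied (None) is ignored. A level indicates
    "skip" when its parameter is strictly less than its threshold; a value
    greater than or equal to the threshold means the affine mode is tried.
    Combining levels with "any" is an assumption of this sketch.
    """
    checks = [
        (ms_frame, first_threshold),
        (ms_ctu, second_threshold),
        (ms_cu, third_threshold),
    ]
    return any(ms is not None and ms < thr for ms, thr in checks)
```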

In some embodiments, the above violent motion parameters are determined under preset conditions. That is to say, before the violent motion parameters are determined, it is first determined whether the preset conditions are met; if they are met, the violent motion parameters are determined. Therefore, the preset conditions in the embodiments of the present application can be understood as conditions that need to be met in order to execute the affine motion compensation prediction mode.

The embodiments of this application do not limit the specific content of the above preset conditions.

In a possible implementation, the preset conditions include at least one of the following: the value of the first flag is a first value, and the size of the current CU meets a preset size.

The first flag is used to indicate whether the current sequence is allowed to use the affine motion compensation prediction mode, and the first value is used to indicate that the current sequence is allowed to use the affine motion compensation prediction mode.

For example, when the value of the first flag is the first value, it indicates that the current sequence is allowed to use the affine motion compensation prediction mode; when the value of the first flag is the second value, it indicates that the current sequence is not allowed to use the affine motion compensation prediction mode.

The embodiments of the present application do not limit the specific values of the above-mentioned first numerical value and second numerical value.

Optionally, the first value is 1.

Optionally, the second value is 0.

The above-mentioned first flag can be obtained by decoding the code stream. That is to say, if the encoding end determines that the current sequence is not allowed to use the affine motion compensation prediction mode, it sets the value of the first flag to 0 and writes the flag into the code stream; if the encoding end determines that the current sequence is allowed to use the affine motion compensation prediction mode, it sets the value of the first flag to 1 and writes the flag into the code stream. The decoder then decodes the code stream to obtain the first flag and determines, based on the first flag, whether the current sequence is allowed to use the affine motion compensation prediction mode. For example, if the value of the first flag is 0, it is determined that the current sequence is not allowed to use the affine motion compensation prediction mode; in this case there is no need to determine the violent motion parameter, and the affine motion compensation prediction mode is skipped directly. If the value of the first flag is 1, it is determined that the current sequence is allowed to use the affine motion compensation prediction mode; in this case it is judged whether the other restrictions on the affine motion compensation prediction mode are met, for example, whether the size of the current CU meets the preset size. If the size of the current CU meets the preset size, the violent motion parameter is determined.

The embodiment of the present application does not limit the specific preset size that the current CU must meet. For example, the preset size is that the length and/or width of the CU is greater than or equal to a preset value, such as 8, or that the area of the CU is greater than a certain preset value.

In some embodiments, if the violent motion parameter is the violent motion parameter of the current frame, the preset condition includes that the value of the first flag is the first value. That is to say, when the value of the first flag is the first value, the violent motion parameter of the current frame is determined.

In some embodiments, if the violent motion parameter is the violent motion parameter of the current CTU, the preset condition includes that the value of the first flag is the first value. That is to say, when the value of the first flag is the first value, the violent motion parameter of the current CTU is determined.

In some embodiments, if the violent motion parameter is the violent motion parameter of the current CU, the preset condition includes that the value of the first flag is the first value and that the size of the current CU meets the preset size. That is to say, when the value of the first flag is the first value and the size of the current CU meets the preset size, the violent motion parameter of the current CU is determined.

For example, sps_affine_enable_flag can be used to represent the first flag.
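
The per-level preset conditions can be sketched as a single check. The sketch assumes the optional values given above (first value 1) and, for the CU level, takes the preset size to mean that both width and height are at least 8; the text allows "length and/or width" or an area criterion, so this is only one of the permitted readings. The function name `preset_condition_met` and the `level` strings are hypothetical.

```python
def preset_condition_met(
    level: str,
    sps_affine_enable_flag: int,
    cu_width: int = 0,
    cu_height: int = 0,
    min_side: int = 8,
) -> bool:
    """Decide whether the violent motion parameter of the given level is determined.

    level is 'frame', 'ctu' or 'cu'. The frame and CTU levels only require
    the flag to equal the first value (assumed to be 1). The CU level also
    requires the CU size to meet the preset size (assumed here: both sides
    at least min_side).
    """
    if level not in ("frame", "ctu", "cu"):
        raise ValueError(f"unknown level: {level}")
    if sps_affine_enable_flag != 1:
        return False
    if level == "cu":
        return cu_width >= min_side and cu_height >= min_side
    return True
```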

In some embodiments, when determining the optimal prediction mode of the current CU, if the value of the first flag is the first value, the violent motion parameter indicates that the affine motion compensation prediction mode is not to be skipped, and the size of the current CU meets the preset size, the affine motion compensation prediction mode is used to perform motion estimation on the current CU, the resulting prediction cost CurBestCostAffine is obtained, and the prediction mode CurBestModeAffine is saved. At the same time, the prediction cost corresponding to each of the other candidate prediction modes of the current CU is calculated, the prediction cost corresponding to the affine motion compensation prediction mode is compared with the prediction costs corresponding to the other candidate prediction modes, and the prediction mode with the smallest cost is selected as the optimal prediction mode of the current CU.

In some embodiments, if the value of the first flag is not equal to the first value, and/or the violent motion parameter indicates that the affine motion compensation prediction mode is to be skipped, and/or the size of the current CU does not meet the preset size, the affine motion compensation prediction mode is skipped. Prediction modes other than the affine motion compensation prediction mode are used to predict the current CU, the cost corresponding to each of these prediction modes is obtained, and the prediction mode with the smallest cost is determined as the optimal prediction mode of the current CU.
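
The two cases above can be sketched as one mode-decision routine in which the affine cost is evaluated only when it is allowed. Costs are supplied as zero-argument callables so that a skipped mode is never evaluated, which is where the saving comes from. The function name `choose_cu_mode`, the mode label `"affine"` and the `affine_allowed` argument (true only when the flag equals the first value, the violent motion parameter does not indicate skipping, and the CU size meets the preset size) are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple


def choose_cu_mode(
    cost_fns: Dict[str, Callable[[], float]],
    affine_allowed: bool,
) -> Tuple[str, float]:
    """Pick the candidate prediction mode with the smallest cost.

    cost_fns maps a mode name to a function that runs that mode's
    prediction and returns its cost. The 'affine' entry is evaluated only
    when affine_allowed is True.
    """
    best_mode, best_cost = None, float("inf")
    for mode, cost_fn in cost_fns.items():
        if mode == "affine" and not affine_allowed:
            continue  # skipped: the affine motion estimation never runs
        cost = cost_fn()
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    if best_mode is None:
        raise ValueError("no candidate prediction mode available")
    return best_mode, best_cost
```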

The above describes the determination process of violent motion parameters and how the current CTU block division method and prediction mode are determined based on violent motion parameters. After determining the block division method and prediction mode of the current CTU, the decoding end performs reconstruction according to the block division method and prediction mode of the current CTU. For details, refer to the following descriptions of S603 to S605.

S603. Use the block division method of the current CTU to divide the current CTU into blocks to obtain at least one CU.

S604. For the current CU in at least one CU, use the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU to obtain the prediction value of the current CU.

In this embodiment of the present application, the decoder determines the block division method and prediction mode of the current CTU based on the above-mentioned S602, and then uses the block division method of the current CTU to divide the current CTU into blocks, for example, into at least one CU. For the current CU in the at least one CU, the prediction mode corresponding to the current CU in the prediction mode of the current CTU is used to predict the current CU and obtain the prediction value of the current CU. That is to say, the prediction mode of the current CTU includes the prediction mode of each CU in the at least one CU. In this way, when determining the prediction value of each CU in the at least one CU, the prediction mode corresponding to that CU in the prediction mode of the CTU can be used to predict the CU and obtain the prediction value of the CU.

S605. Determine the residual value of the current CU based on the residual value of the current CTU, and obtain the reconstruction value of the current CU based on the residual value and predicted value of the current CU.

The decoding end determines the residual value of the current CTU by decoding the code stream according to the above steps of S601. In this way, according to the block division method of the current CTU, the residual value of the current CU in the current CTU can be determined.

Next, the reconstruction value of the current CU is determined based on the residual value and the prediction value of the current CU. For example, the sum of the residual value and the prediction value of the current CU is determined as the reconstruction value of the current CU.
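
Step S605 can be sketched as follows, under the assumption that the CTU residual is stored as a 2-D array and that a CU is identified by its top-left position and size inside the CTU. The function names `cu_residual` and `reconstruct` are hypothetical; practical decoders additionally clip the result to the valid sample range, which the text does not mention and the sketch omits.

```python
from typing import List

Block = List[List[int]]  # rows of values, indexed as block[y][x]


def cu_residual(ctu_residual: Block, x0: int, y0: int, width: int, height: int) -> Block:
    """Cut the residual of one CU out of the CTU residual (CU top-left at x0, y0)."""
    return [row[x0:x0 + width] for row in ctu_residual[y0:y0 + height]]


def reconstruct(residual: Block, prediction: Block) -> Block:
    """Reconstruction value = residual value + prediction value, per position."""
    return [
        [r + p for r, p in zip(row_r, row_p)]
        for row_r, row_p in zip(residual, prediction)
    ]
```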

In some embodiments, the reconstruction value of the current CU is filtered to obtain a filtered reconstruction value. For example, filtering such as deblocking filtering (DBF), sample adaptive offset (SAO) and adaptive loop filtering (ALF) is performed on the reconstruction value of the current CU, and the filtered image is sent to the buffer to await video playback.

The video decoding method provided by the embodiment of the present application decodes the code stream to determine the residual value of the current coding tree unit (CTU); determines the block division method and prediction mode of the current CTU, where the block division method and prediction mode of the current CTU are determined based on the violent motion parameter, and the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode; uses the block division method of the current CTU to divide the current CTU into blocks to obtain at least one coding unit (CU); for the current CU in the at least one CU, uses the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU and obtain the prediction value of the current CU; and determines the residual value of the current CU based on the residual value of the current CTU, and obtains the reconstruction value of the current CU based on the residual value and prediction value of the current CU. That is, in the embodiment of the present application, the violent motion parameter is used to determine whether the current video is a translational motion video. If it is, the affine motion compensation prediction mode is skipped, which avoids the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, thereby reducing encoding time, improving video encoding efficiency, and saving computing resources.

The decoding method in the embodiment of the present application is introduced above. On this basis, the encoding method provided by the embodiment of the present application is introduced below.

FIG. 7 is a schematic flowchart of a video encoding method provided by an embodiment of the present application. The embodiment of the present application is applied to the encoders shown in FIGS. 1 and 2. As shown in FIG. 7, the method in the embodiment of this application includes:

S701. Determine the violent motion parameter.

Among them, the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode.

The encoding process in this embodiment of the present application is to divide the current image frame into blocks to obtain the current CU, determine the prediction mode of the current CU, use the prediction mode of the current CU to predict the current CU, and obtain the prediction value of the current CU. The original value of the current CU is subtracted from the predicted value to obtain the residual value of the current CU. Transform the residual value of the current CU to obtain the transformation coefficient. Optionally, the transform coefficients are quantized, and the quantized transform coefficients are encoded to obtain a code stream.

The embodiment of the present application relates to the prediction process in the above encoding process.

In the embodiment of the present application, since the affine motion compensation prediction mode is more complex and takes up more computing resources, the coding efficiency is low. In addition, the affine motion compensation prediction mode is mainly used to efficiently express irregular motions such as zooming in/out and rotation. For translational motion videos, the performance gain of the affine motion compensation prediction mode is limited. It can be seen that when the affine motion compensation prediction mode is used to predict translational motion videos, the compression gain is not significant, but a large amount of computing resources is occupied and the encoding time increases. Therefore, before using the affine motion compensation prediction mode for prediction, the embodiment of the present application first determines the violent motion parameter and uses it to indicate whether the current video is a translational motion video. If it is determined that the current video is a translational motion video, the affine motion compensation prediction mode is skipped, which avoids the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, thereby improving the coding efficiency of the video and saving computing resources.

The violent motion parameter in the embodiment of the present application is used to indicate whether the current video is a translational motion video; if it is, the affine motion compensation prediction mode is skipped. Therefore, the violent motion parameter in the embodiment of the present application can also be used directly to indicate whether to skip the affine motion compensation prediction mode. For example, if the violent motion parameter is less than the preset value, it is determined that the current video is a translational motion video, and the affine motion compensation prediction mode is skipped. This avoids the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, thereby improving the video encoding efficiency and saving computing resources.

If the violent motion parameter is greater than or equal to the preset value, it is determined that the current video is not a translational motion video. At this time, you can try to use the affine motion compensation prediction mode for prediction, thereby improving the coding effect of the video.

The following is an introduction to the process of determining the violent motion parameter in S701 mentioned above.

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current frame, the encoding end determines the violent motion parameter of the current frame based on the translational displacement of the pixel point in the current frame relative to the pixel point in the previous frame.

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current frame, the encoding end determines the violent motion parameter of the current frame according to the following steps S701-A1 and S701-A21:

S701-A1. For the k-th CTU among the K CTUs included in the current frame, determine the intensity of the k-th CTU based on the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame. Motion parameters, K is a positive integer, k is a positive integer less than or equal to K;

S701-A2: Determine the violent motion parameters of the current frame according to the violent motion parameters of the K CTUs included in the current frame.

The embodiment of the present application does not limit the specific method of determining the violent motion parameter of the k-th CTU in S701-A1.

In some embodiments, the above S701-A1 includes the following S701-A11:

S701-A11. Determine the violent motion parameter of the k-th CTU based on the pixel value of the k-th CTU and the pixel value of the reference CTU.

For example, for each pixel in the k-th CTU, determine the absolute difference between the pixel value of that pixel and the pixel value of the corresponding pixel in the reference CTU, and then determine the violent motion parameter of the k-th CTU based on the absolute differences of all pixels in the k-th CTU.

In one example, the violent motion parameter of the k-th CTU is determined according to the above formula (3).
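The pixel-difference variant of S701-A11 can be sketched as follows. Formula (3) is not reproduced in this passage, so the sketch assumes that the parameter is the plain, un-normalised sum of absolute differences between co-located pixels; the function name and the list-of-rows block layout are hypothetical and chosen only for illustration.

```python
from typing import Sequence

Block = Sequence[Sequence[int]]  # rows of sample values (assumed layout)


def ctu_motion_parameter(ctu: Block, ref_ctu: Block) -> int:
    """Sum of absolute differences between co-located pixels.

    Assumption: the violent motion parameter is the un-normalised SAD;
    the text only states that it is based on per-pixel absolute differences.
    """
    if len(ctu) != len(ref_ctu):
        raise ValueError("CTU and reference CTU must have the same height")
    total = 0
    for row, ref_row in zip(ctu, ref_ctu):
        if len(row) != len(ref_row):
            raise ValueError("CTU and reference CTU must have the same width")
        total += sum(abs(a - b) for a, b in zip(row, ref_row))
    return total
```

A CTU that is identical to its reference CTU yields 0 under this assumption, and the value grows as more pixels differ.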

The embodiment of the present application does not limit the specific method of determining the violent motion parameters of the current frame based on the violent motion parameters of K CTUs included in the current frame in S701-A2.

In some embodiments, the above-mentioned S701-A2 includes: determining the sum of the violent motion parameters of the K CTUs included in the current frame as the violent motion parameter of the current frame.

In some embodiments, the above S701-A2 includes the following steps of S701-A21 and S701-A22:

S701-A21. Select P CTUs whose violent motion parameters are greater than the first preset value from K CTUs, where P is a positive integer less than or equal to K;

S701-A22. Determine the violent motion parameters of the current frame based on the violent motion parameters of P CTUs.

In one example, the sum of the violent motion parameters of the P CTUs is determined as the violent motion parameter of the current frame.

In another example, the violent motion parameter of the current frame is determined based on the sum of violent motion parameters of P CTUs and the total area of P CTUs.

For example, the ratio of the sum of the violent motion parameters of P CTUs to the total area of P CTUs is determined as the violent motion parameter of the current frame.

For example, the violent motion parameter of the current frame is determined according to the above formula (4).
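Steps S701-A21 and S701-A22, in the ratio variant, might look like the sketch below. Formula (4) is not shown in this passage, so the code follows only the prose: keep the P CTUs whose parameter exceeds the first preset value, then divide the sum of their parameters by their total area. The names `frame_motion_parameter`, `ctu_params`, `ctu_areas` and `first_preset_value` are hypothetical, and the return value for the case where no CTU is selected is an assumption, since the text requires P to be at least 1.

```python
from typing import Sequence


def frame_motion_parameter(
    ctu_params: Sequence[float],
    ctu_areas: Sequence[int],
    first_preset_value: float,
) -> float:
    """Sum of the selected CTU parameters divided by the selected CTU area."""
    if len(ctu_params) != len(ctu_areas):
        raise ValueError("need one area per CTU parameter")
    # S701-A21: keep the P CTUs whose parameter exceeds the first preset value.
    selected = [
        (p, a) for p, a in zip(ctu_params, ctu_areas) if p > first_preset_value
    ]
    if not selected:
        # Assumption: this case is not defined in the text (P is a positive integer).
        return 0.0
    # S701-A22: ratio of the parameter sum to the total area of the P CTUs.
    param_sum = sum(p for p, _ in selected)
    area_sum = sum(a for _, a in selected)
    return param_sum / area_sum
```

Dividing by the area of the selected CTUs makes the result comparable across frames in which a different number of CTUs pass the threshold.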

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current CTU, the encoding end determines the violent motion parameter of the current CTU based on the translational displacement of the pixels in the current CTU relative to the pixels of the reference CTU in the previous frame.

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current CTU, the encoding end determines the violent motion parameter of the current CTU through the following step S701-B:

S701-B: Determine the violent motion parameters of the current CTU based on the current CTU and the reference CTU of the current CTU in the frame preceding the current frame.

The embodiments of this application do not limit the specific implementation of the above S701-B.

In some embodiments, S701-B includes determining the violent motion parameter of the current CTU according to the pixel value of the current CTU and the pixel value of the reference CTU.

In one example, the translational displacement of the current CTU relative to the reference CTU is determined based on the pixel values of the pixels in the current CTU and the pixel values of the pixels in the reference CTU, and then the violent motion parameter of the current CTU is determined based on that translational displacement.

In another example, the violent motion parameter of the current CTU is determined based on the absolute difference between the pixel value of the current CTU and the pixel value of the reference CTU.

For example, for each pixel in the current CTU, determine the absolute difference between the pixel value of that pixel and the pixel value of the corresponding pixel in the reference CTU, and then determine the violent motion parameter of the current CTU based on the absolute differences of all pixels in the current CTU.

For example, the violent motion parameter of the current CTU is determined according to the above formula (5).

In some embodiments, the encoding end determines the violent motion parameter of the current CU based on the translational displacement of the pixels in the current CU relative to the pixels of the reference CU in the previous frame.

In some embodiments, the encoding end determines the violent motion parameters of the current CU according to the following step S701-C:

S701-C: Determine the violent motion parameters of the current CU according to the pixel value of the current CU and the pixel value of the reference CU.

In one example, the translational displacement of the current CU relative to the reference CU is determined based on the pixel values of the pixels in the current CU and the pixel values of the pixels in the reference CU, and then the violent motion parameter of the current CU is determined based on that translational displacement.

In another example, the violent motion parameter of the current CU is determined based on the absolute difference between the pixel value of the current CU and the pixel value of the reference CU.

For example, for each pixel in the current CU, determine the absolute difference between the pixel value of that pixel and the pixel value of the corresponding pixel in the reference CU, and then determine the violent motion parameter of the current CU based on the absolute differences of all pixels in the current CU.

For example, the violent motion parameter of the current CU is determined according to the above formula (6).

In some embodiments, the above-mentioned S701 is executed to determine the violent motion parameter only when a preset condition is met. That is to say, before determining the violent motion parameter, it is first determined whether the preset condition is met, and the violent motion parameter is determined only if it is. Therefore, the preset condition in the embodiments of the present application can be understood as the condition that needs to be met to execute the affine motion compensation prediction mode.

Optionally, the first value is 1.

Optionally, the second value is 0.

That is to say, when the encoding end determines that the current sequence does not allow the use of the affine motion compensation prediction mode, it sets the value of the first flag to 0 and writes the first flag into the code stream; when the encoding end determines that the current sequence allows the use of the affine motion compensation prediction mode, it sets the value of the first flag to 1 and writes the first flag into the code stream. In this way, the decoder decodes the code stream to obtain the first flag and determines, based on the first flag, whether the current sequence is allowed to use the affine motion compensation prediction mode. For example, if the value of the first flag is 0, it is determined that the current sequence is not allowed to use the affine motion compensation prediction mode.

The embodiment of the present application does not limit the specific value of the preset size that the size of the current CU must satisfy. For example, the preset size is that the length and/or width of the CU is greater than or equal to a preset value, such as greater than or equal to 16, or that the area of the CU is greater than a certain preset value.

For example, sps_affine_enable_flag can be used to represent the first flag.

The above describes the process of determining the violent motion parameter. After determining the violent motion parameter according to the above steps, the encoding end performs the following step S702.

S702. Determine the block division method and prediction mode of the current CTU according to the violent motion parameters.

In the embodiment of the present application, when the current video is judged to be a translational motion video based on the violent motion parameter, the affine motion compensation prediction mode can be skipped when determining the block division method and prediction mode of the current CTU. This reduces the workload of determining the block division method and prediction mode of the current CTU, saves computing resources, and improves the efficiency of that determination.

This embodiment of the present application does not limit the specific method of determining the block division method and prediction mode of the current CTU based on the violent motion parameters in S702.

In some embodiments, the block division method of the current CTU is a preset block division method, and the prediction mode of the CTU is determined based on the violent motion parameter. For example, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, one or several prediction modes other than the affine motion compensation prediction mode are determined as the prediction modes of the CTU. If the violent motion parameter indicates not skipping the affine motion compensation prediction mode, the affine motion compensation prediction mode is determined as the prediction mode of the CTU.

In some embodiments, the above-mentioned S702 includes the following steps of S702-A1 to S702-A2:

S702-A1. For the i-th block division method among the preset N block division methods, determine the optimal prediction mode corresponding to the i-th block division method according to the violent motion parameters, where N is a positive integer and i is a positive integer less than or equal to N;

S702-A2: Determine the block division method and prediction mode of the current CTU according to the optimal prediction modes corresponding to the N block division methods.

Specifically, for the i-th block division method among the N block division methods, the optimal prediction mode corresponding to the i-th block division method is determined according to the violent motion parameter, so that the optimal prediction mode corresponding to each of the N block division methods can be determined. Next, the block division method and prediction mode of the current CTU are determined according to the optimal prediction modes corresponding to the N block division methods. For example, the block division method whose optimal prediction mode has the smallest cost among the N block division methods is determined as the block division method of the current CTU, and the optimal prediction mode corresponding to that block division method is determined as the prediction mode of the current CTU.
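The selection in S702-A2 can be illustrated with a minimal sketch. It assumes that S702-A1 has already produced one cost and one list of per-CU modes for each of the N block division methods; the function name, the argument names and the mode labels used in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_partition(
    partition_costs: Sequence[float],
    partition_modes: Sequence[List[str]],
) -> Tuple[int, List[str]]:
    """Return (index of the cheapest block division method, its optimal modes)."""
    if not partition_costs or len(partition_costs) != len(partition_modes):
        raise ValueError("need one cost and one mode list per block division method")
    # min() keeps the first index on ties, so earlier methods win when costs are equal.
    best = min(range(len(partition_costs)), key=partition_costs.__getitem__)
    return best, partition_modes[best]
```

For instance, `select_partition([12.0, 7.5, 9.0], [["merge"], ["inter", "intra"], ["intra"]])` returns the second method (index 1) together with its two per-CU modes.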

In some embodiments, determining the optimal prediction mode corresponding to the i-th block division method according to the violent motion parameter in S702-A1 includes the following steps:

S702-A11. Use the i-th block division method to divide the current CTU into blocks, and obtain M CUs, where M is a positive integer;

S702-A12. For the j-th CU among the M CUs, determine at least one candidate prediction mode of the j-th CU based on the violent motion parameters, where j is a positive integer less than or equal to M;

S702-A13. Determine the optimal prediction mode of the j-th CU from at least one candidate prediction mode of the j-th CU;

S702-A14: Determine the optimal prediction mode corresponding to the i-th block division method based on the optimal prediction modes of the M CUs.

In some embodiments, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode.

In some embodiments, if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.

For example, a default prediction mode among at least one candidate prediction mode of the j-th CU is determined as the optimal prediction mode of the j-th CU.

For another example, the optimal prediction mode of the j-th CU is determined from the at least one candidate prediction mode according to the cost of predicting the j-th CU with each of the at least one candidate prediction mode of the j-th CU.
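Steps S702-A12 and S702-A13 can be sketched as below, under the assumption that a cost is available for every candidate mode. The mode labels ("merge", "inter", "affine") and both function names are placeholders for illustration only and are not taken from any codec implementation.

```python
from typing import Callable, List


def candidate_modes(base_modes: List[str], skip_affine: bool) -> List[str]:
    """Candidate list of one CU; 'affine' is included only when it is not skipped."""
    return list(base_modes) if skip_affine else list(base_modes) + ["affine"]


def best_mode(candidates: List[str], cost_of: Callable[[str], float]) -> str:
    """Pick the candidate with the smallest cost (the first one wins on ties)."""
    if not candidates:
        raise ValueError("at least one candidate prediction mode is required")
    return min(candidates, key=cost_of)


# Usage with made-up costs: affine is cheapest, but it is only tried when allowed.
costs = {"merge": 40.0, "inter": 35.0, "affine": 30.0}
skipped = best_mode(candidate_modes(["merge", "inter"], skip_affine=True), costs.__getitem__)
allowed = best_mode(candidate_modes(["merge", "inter"], skip_affine=False), costs.__getitem__)
```

With the affine mode skipped, no cost is ever requested for it, which is where the saving in encoder work comes from.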

In some embodiments, the specific process of determining the block division method and prediction mode of the current CTU is as follows. For the i-th block division method among the preset N block division methods, use the i-th block division method to divide the current CTU into blocks to obtain M CUs, and determine the optimal prediction mode of each of the M CUs based on the violent motion parameter. Specifically, for the j-th CU among the M CUs, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode; if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.

Next, each candidate prediction mode of the j-th CU is used to predict the j-th CU to obtain the prediction value corresponding to that candidate prediction mode, and the cost of each candidate prediction mode is determined according to its prediction value and the original value of the j-th CU. In the embodiment of the present application, in order to reduce the cost calculation workload, an approximate cost can be used. For example, based on the prediction value corresponding to each candidate prediction mode and the original value of the j-th CU, an approximate cost such as the sum of absolute differences (Sum of Absolute Difference, SAD) or the sum of absolute values after Hadamard transformation (Sum of Absolute Transformed Difference, SATD) is calculated. According to the cost of each candidate prediction mode, one candidate prediction mode is determined as the optimal prediction mode of the j-th CU; for example, the candidate prediction mode with the smallest cost is taken as the optimal prediction mode of the j-th CU.

In the same way, the optimal prediction mode of each of the M CUs under the i-th block division method can be determined. The optimal prediction modes of the M CUs are determined as the optimal prediction mode corresponding to the i-th block division method, and the sum of the costs corresponding to the optimal prediction modes of the M CUs is determined as the cost corresponding to the i-th block division method. Following this method, the optimal prediction mode and cost corresponding to each of the N block division methods are determined.

Finally, among the N block division methods, the block division method with the smallest cost is determined as the block division method of the current CTU, and the optimal prediction mode corresponding to that block division method is determined as the optimal prediction mode of the current CTU. For example, if block division method 1 has the lowest cost among the N block division methods, block division method 1 is determined as the block division method of the current CTU. Assuming that block division method 1 divides the current CTU into 4 CUs whose optimal prediction modes are prediction mode 1, prediction mode 2, prediction mode 3 and prediction mode 4 respectively, then prediction mode 1, prediction mode 2, prediction mode 3 and prediction mode 4 are determined as the optimal prediction mode of the current CTU.
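The two approximate costs named above can be illustrated for a 4x4 block. This is a sketch under stated assumptions: the Hadamard matrix is the standard 4x4 Sylvester form, no normalisation factor is applied (encoders differ on this point), and the helper names are hypothetical.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (Sylvester construction, entries +1/-1); it is symmetric.
H4: Block = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul(a: Block, b: Block) -> Block:
    """Plain integer matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def _residual(orig: Block, pred: Block) -> Block:
    """Original samples minus predicted samples."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def sad(orig: Block, pred: Block) -> int:
    """Sum of absolute differences between original and prediction."""
    return sum(abs(v) for row in _residual(orig, pred) for v in row)


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute values of the Hadamard-transformed 4x4 residual."""
    d = _residual(orig, pred)
    # H4 is symmetric, so H4 * D * H4 equals H4 * D * H4^T.
    t = _matmul(_matmul(H4, d), H4)
    return sum(abs(v) for row in t for v in row)
```

Both costs avoid a full encode of the CU, which is why they are cheaper than an exact rate-distortion cost; SATD additionally accounts for how the residual spreads over the transform coefficients.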

In some embodiments, after the encoding end determines the block division method and prediction mode of the current CTU according to the above steps, in order to maintain consistency between the encoding and decoding ends, it indicates the block division method and prediction mode of the current CTU to the decoder. Specifically, at least one of first information and second information is written into the code stream, where the first information is used to indicate the block division method of the current CTU, and the second information is used to indicate the prediction mode of the current CTU. In this way, the decoder obtains at least one of the first information and the second information by decoding the code stream, and then determines the block division method of the current CTU based on the first information, and/or determines the prediction mode of the current CTU based on the second information.

After determining the block division method and prediction mode of the current CTU according to the above steps of S702, the encoding end performs the following steps of S703.

S703: Use the block division method of the current CTU to divide the current CTU into blocks to obtain at least one CU.

S704. For the current CU in at least one CU, use the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU to obtain the prediction value of the current CU.

S705. Determine the residual value of the current CU according to the predicted value of the current CU, and obtain the code stream based on the residual value of the current CU.

Specifically, the encoding end determines the residual value of the current CU based on the predicted value of the current CU and the current CU. For example, the difference between the current CU and the predicted value of the current CU is determined as the residual value of the current CU.

In some embodiments, the residual value of the current CU is transformed to obtain the transformation coefficient of the current CU.

In some embodiments, the above-mentioned transform coefficients of the current CU are directly encoded to obtain a code stream.

In some embodiments, the transform coefficient of the current CU is quantized to obtain the quantized coefficient of the current CU. Then, the quantization coefficient of the current CU is encoded to obtain a code stream.
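The residual and quantization steps of S705 can be sketched as follows. The transform is deliberately left out because the text does not name one, so the quantizer is applied directly to whatever coefficient block it is given. The uniform step size and the round-half-away-from-zero rule are assumptions made for illustration, and all names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def residual(orig: Block, pred: Block) -> Block:
    """Residual value: original samples minus predicted samples."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def quantize(coeffs: Block, step: int) -> Block:
    """Uniform scalar quantisation, rounding half away from zero (assumed rule)."""
    if step <= 0:
        raise ValueError("step must be positive")

    def q(v: int) -> int:
        magnitude = (abs(v) + step // 2) // step
        return magnitude if v >= 0 else -magnitude

    return [[q(v) for v in row] for row in coeffs]
```

In a real encoder the residual would pass through a transform before quantization, and the quantized coefficients would then be entropy coded to form the code stream.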

In order to further describe the encoding method provided by the embodiment of the present application, the implementation process of a video encoding method using the affine motion compensation prediction mode is introduced below, taking the case where the violent motion parameter includes the violent motion parameter of the current frame as an example:

First, according to the above-mentioned step S701, the violent motion parameter MS of the current frame is determined. Next, the current frame is divided into multiple non-overlapping CTU blocks. Then, each CTU is processed sequentially in raster scanning order to determine the block division method and prediction mode of each CTU. Taking determining the block division method and prediction mode of the current CTU as an example, determining the optimal block division method of the current CTU mainly includes the following steps:

Step 31: For the i-th block division method Split[i] among the preset N block division methods, use the i-th block division method to divide the current CTU to obtain at least one CU. For the current CU in the at least one CU, calculate the optimal prediction mode CurBestModeInter[i] of the current CU in inter prediction mode and the minimum prediction cost CurBestCostInter[i] corresponding to that mode. Specifically, the traditional inter prediction mode (that is, without affine prediction) is first used to perform motion estimation on the current CU, the prediction cost CurBestCostNoAffine is calculated, and the prediction mode CurBestModeNoAffine is saved. Next, it is determined whether the current CU meets the conditions for affine prediction: if sps_affine_enable_flag=1, MS≥T1, and the fixed constraints such as the size of the current CU are met, the affine motion compensation prediction mode is used to perform motion estimation on the current CU, the prediction cost CurBestCostAffine is calculated, and the prediction mode CurBestModeAffine is saved. The smaller of the costs of the traditional inter prediction mode and the affine motion compensation prediction mode is taken as the minimum prediction cost CurBestCostInter[i] in inter prediction mode, and the corresponding mode is saved as the optimal inter prediction mode CurBestModeInter[i].

Step 32: Calculate the minimum prediction cost CurBestCostOther[i] and the optimal prediction mode CurBestModeOther[i] in other prediction modes such as intra prediction. Compare CurBestCostInter[i] and CurBestCostOther[i] to select the optimal prediction mode bestMode[i] and prediction cost bestCost[i] under the i-th block division method.

Step 33: Traverse all of the N block division methods, and select the block division method Split[opt] and the corresponding prediction mode bestMode[opt] that minimize the prediction cost of the current CTU. Finally, use the block division method of the current CTU to divide the current CTU into blocks to obtain at least one coding unit CU; for the current CU in the at least one CU, use the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU and obtain the predicted value of the current CU. According to the predicted value of the current CU, determine the residual value of the current CU, and transform, quantize, and entropy encode the residual value. Optionally, the prediction information (including the flag cu.affine indicating whether the CU uses AFF, the motion vector, etc.) is also encoded, and the code stream is output.
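The inter-mode decision of Step 31 can be sketched as below. The affine cost is passed as a callable so that affine motion estimation is only run when all conditions hold, which is the point of the MS≥T1 test. The function name, the `size_ok` argument and the mode labels are hypothetical; only `sps_affine_enable_flag`, MS and T1 come from the text.

```python
from typing import Callable, Tuple


def best_inter_mode(
    cost_no_affine: float,
    affine_cost_fn: Callable[[], float],
    sps_affine_enable_flag: int,
    ms: float,
    t1: float,
    size_ok: bool,
) -> Tuple[str, float]:
    """Return (CurBestModeInter, CurBestCostInter) for one CU."""
    best_mode, best_cost = "no_affine", cost_no_affine
    if sps_affine_enable_flag == 1 and ms >= t1 and size_ok:
        # Affine motion estimation runs only when every condition holds.
        cost_affine = affine_cost_fn()
        if cost_affine < best_cost:
            best_mode, best_cost = "affine", cost_affine
    return best_mode, best_cost
```

When MS is below T1 the callable is never invoked, so the cost of affine motion estimation is not paid for that CU.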

Referring to the above steps, the embodiment of the present application modifies the usage conditions of the affine motion compensation prediction mode; that is, the usage conditions of the affine motion compensation prediction mode in the coding unit syntax are modified. The modified coding unit syntax is shown in Table 1:

Table 1

In this embodiment of the present application, the condition that the violent motion parameter MS is greater than or equal to T1 is added to the usage conditions of the affine motion compensation prediction mode. That is to say, in the embodiment of the present application, the affine motion compensation prediction mode can be used when the value of the first flag is the first value, the width of the current CU is greater than or equal to 16, the height of the current CU is greater than or equal to 16, and the value of the violent motion parameter is greater than or equal to T1; otherwise, the affine motion compensation prediction mode is skipped. This avoids the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, thereby reducing encoding time, improving video encoding efficiency, and saving computing resources.

The video coding method provided by the embodiment of the present application determines the violent motion parameter, which is used to indicate whether to skip the affine motion compensation prediction mode; determines the block division method and prediction mode of the current coding tree unit CTU according to the violent motion parameter; uses the block division method of the current CTU to divide the current CTU into blocks to obtain at least one coding unit CU; for the current CU in the at least one CU, uses the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU and obtain the predicted value of the current CU; and determines the residual value of the current CU based on the predicted value of the current CU and obtains the code stream based on the residual value of the current CU. That is, in the embodiment of the present application, the violent motion parameter is used to determine whether the current video is a translational motion video. If the current video is a translational motion video, the affine motion compensation prediction mode is skipped, which avoids the waste of computing resources caused by using the affine motion compensation prediction mode to predict translational motion videos, thereby reducing encoding time, improving video encoding efficiency, and saving computing resources.
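The modified usage condition can be written as a single predicate. Table 1 is not reproduced in this passage, so the sketch mirrors only the prose condition (first flag equal to the first value, CU width and height of at least 16, MS at least T1); the function and parameter names are hypothetical.

```python
def affine_allowed(
    first_flag: int,
    cu_width: int,
    cu_height: int,
    ms: float,
    t1: float,
    first_value: int = 1,
    min_size: int = 16,
) -> bool:
    """True when the affine motion compensation prediction mode may be tried."""
    return (
        first_flag == first_value
        and cu_width >= min_size
        and cu_height >= min_size
        and ms >= t1  # condition added by this embodiment
    )
```

Compared with the unmodified condition, only the last term is new, so a CU that already failed the flag or size test behaves exactly as before.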

It should be understood that Figures 4 to 7 are only examples of the present application and should not be understood as limitations of the present application.

The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings. However, the present application is not limited to the specific details of the above-mentioned embodiments. Within the scope of the technical concept of the present application, various simple modifications can be made to the technical solutions of the present application, and these simple modifications all belong to the protection scope of this application. For example, the specific technical features described in the above-mentioned specific embodiments can be combined in any suitable way without conflict. In order to avoid unnecessary repetition, this application does not separately describe the various possible combinations. For another example, the various embodiments of the present application can also be combined arbitrarily; as long as they do not violate the idea of the present application, such combinations should also be regarded as content disclosed in the present application.

It should also be understood that in the various method embodiments of the present application, the size of the sequence numbers of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application. In addition, in the embodiments of this application, the term "and/or" only describes an association relationship between associated objects, indicating that three relationships can exist. Specifically, A and/or B can represent three situations: A exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document generally indicates that the associated objects are in an "or" relationship.

The method embodiments of the present application are described in detail above with reference to Figures 4 to 7, and the device embodiments of the present application are described in detail below with reference to Figures 8 to 11.

Figure 8 is a schematic block diagram of a video decoding device provided by an embodiment of the present application.

As shown in Figure 8, the video decoding device 10 includes:

Decoding unit 11, used to decode the code stream and determine the residual value of the current coding tree unit CTU;

Determining unit 12, used to determine the block division method and prediction mode of the current CTU, where the block division method and prediction mode of the current CTU are determined based on the violent motion parameter, and the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode;

The dividing unit 13 is configured to use the block dividing method of the current CTU to block divide the current CTU to obtain at least one coding unit CU;

The prediction unit 14 is configured to predict the current CU in the at least one CU using the prediction mode corresponding to the current CU in the prediction mode of the current CTU, to obtain a prediction value of the current CU;

The reconstruction unit 15 is configured to determine the residual value of the current CU according to the residual value of the current CTU, and obtain the reconstruction value of the current CU based on the residual value and the prediction value of the current CU.
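The work of the reconstruction unit can be sketched as adding the residual to the prediction. The text only states that the reconstruction value is obtained from the residual value and the prediction value; the addition and the clipping to the sample range given by a bit depth are assumptions based on common codec practice, and the names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Prediction plus residual, clipped to [0, 2**bit_depth - 1] (assumed range)."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(pred, resid)
    ]
```

Clipping keeps the reconstructed samples representable even when quantization error pushes the sum outside the valid range.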

In some embodiments, the determining unit 12 is specifically configured to decode the code stream to obtain at least one of first information and second information, where the first information is used to indicate the block division method of the current CTU and the second information is used to indicate the prediction mode of the current CTU; and to determine at least one of the block division method and the prediction mode of the current CTU based on at least one of the first information and the second information.

In some embodiments, the block division method and prediction mode of the current CTU are determined based on the optimal prediction modes corresponding to N block division methods, and the optimal prediction mode corresponding to the i-th block division method among the N block division methods is determined based on the violent motion parameter, where i is a positive integer less than or equal to N.

In some embodiments, the optimal prediction mode corresponding to the i-th block division method is determined based on the optimal prediction modes of M CUs, where the M CUs are obtained by dividing the current CTU into blocks using the i-th block division method, the optimal prediction mode of the j-th CU among the M CUs is determined from at least one candidate prediction mode of the j-th CU, and the at least one candidate prediction mode of the j-th CU is determined based on the violent motion parameter.

In some embodiments, the optimal prediction mode of the j-th CU is a candidate prediction mode determined from the at least one candidate prediction mode according to the cost of predicting the j-th CU with each of the at least one candidate prediction mode of the j-th CU.

In some embodiments, the violent motion parameter includes at least one of the violent motion parameter of the current frame, the violent motion parameter of the current CTU, and the violent motion parameter of the current CU,

Wherein, the violent motion parameter of the current frame is used to indicate whether the current frame skips the affine motion compensation prediction mode, the violent motion parameter of the current CTU is used to indicate whether the current CTU skips the affine motion compensation prediction mode, and the violent motion parameter of the current CU is used to indicate whether the current CU skips the affine motion compensation prediction mode.

In some embodiments, the violent motion parameter of the current frame is determined based on the violent motion parameters of K CTUs included in the current frame. For the k-th CTU among the K CTUs included in the current frame, the violent motion parameter of the k-th CTU is determined based on the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame, where K is a positive integer and k is a positive integer less than or equal to K.

In some embodiments, the violent motion parameter of the k-th CTU is determined based on the pixel value of the k-th CTU and the pixel value of the reference CTU.

In some embodiments, the violent motion parameter of the k-th CTU is determined based on the absolute difference between the pixel value of the k-th CTU and the pixel value of the reference CTU.
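As an illustration of the absolute-difference embodiment above, the following is a minimal sketch, not the definitive formula of this application. It assumes that the violent motion parameter of a block is the sum of the absolute differences between its sample values and those of its reference block; the summation, the function name `violent_motion_parameter`, and the `Block` type are assumptions made for this example.

```python
from typing import List

# Assumption: a block (CTU or CU) is a list of rows of integer sample values.
Block = List[List[int]]


def violent_motion_parameter(block: Block, ref_block: Block) -> int:
    """Sum of absolute sample differences between a block and its reference block."""
    same_shape = len(block) == len(ref_block) and all(
        len(row) == len(ref_row) for row, ref_row in zip(block, ref_block)
    )
    if not same_shape:
        raise ValueError("block and reference block must have identical dimensions")
    return sum(
        abs(a - b)
        for row, ref_row in zip(block, ref_block)
        for a, b in zip(row, ref_row)
    )


# Toy 2x2 "CTU" and its reference CTU from the previous frame.
current_ctu = [[10, 12], [14, 16]]
reference_ctu = [[10, 10], [20, 16]]
print(violent_motion_parameter(current_ctu, reference_ctu))  # 0 + 2 + 6 + 0 = 8
```

The same helper could be applied to a CU and its reference CU, since the CU-level embodiments are described in the same terms.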

In some embodiments, the violent motion parameter of the current frame is determined based on the violent motion parameters of P CTUs among the K CTUs whose violent motion parameters are greater than the first preset value, where P is a positive integer less than or equal to K.

In some embodiments, the violent motion parameter of the current frame is determined based on the sum of the violent motion parameters of the P CTUs and the total area of the P CTUs.

In some embodiments, the violent motion parameter of the current frame is the ratio of the sum of the violent motion parameters of the P CTUs to the total area of the P CTUs.
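The frame-level ratio described above can be sketched as follows. This is a hedged example: the function name, the `(parameter, area)` tuple layout, and the return value of 0.0 when no CTU exceeds the first preset value are assumptions, since the text does not state how that case is handled.

```python
from typing import List, Tuple


def frame_violent_motion_parameter(
    ctus: List[Tuple[float, int]], first_preset_value: float
) -> float:
    """Ratio of summed CTU parameters to summed CTU area over the selected CTUs.

    ctus holds one (violent motion parameter, area in samples) pair per CTU.
    """
    # Keep only the P CTUs whose parameter is greater than the first preset value.
    selected = [(param, area) for param, area in ctus if param > first_preset_value]
    if not selected:
        return 0.0  # assumption: no CTU qualifies, so report no violent motion
    total_param = sum(param for param, _ in selected)
    total_area = sum(area for _, area in selected)
    return total_param / total_area


# Three CTUs of 64 samples each; the second one falls below the threshold.
ctus = [(100.0, 64), (5.0, 64), (300.0, 64)]
print(frame_violent_motion_parameter(ctus, first_preset_value=10.0))  # 400 / 128 = 3.125
```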

In some embodiments, the violent motion parameter of the current CTU is determined based on the current CTU and a reference CTU of the current CTU in the frame preceding the current frame.

In some embodiments, the violent motion parameter of the current CTU is determined based on the pixel value of the current CTU and the pixel value of the reference CTU.

In some embodiments, the violent motion parameter of the current CTU is determined based on the absolute difference between the pixel value of the current CTU and the pixel value of the reference CTU.

In some embodiments, the violent motion parameter of the current CU is determined based on the current CU and a reference CU of the current CU in a frame preceding the current frame.

In some embodiments, the violent motion parameter of the current CU is determined based on the pixel value of the current CU and the pixel value of the reference CU.

In some embodiments, the violent motion parameter of the current CU is determined based on the absolute difference between the pixel value of the current CU and the pixel value of the reference CU.

In some embodiments, the violent motion parameter is determined under a preset condition.

In some embodiments, the preset condition includes at least one of the following: the value of a first flag is a first value, and the size of the current CU satisfies a preset size. The first flag is used to indicate whether the current sequence is allowed to use the affine motion compensation prediction mode, and the first value is used to indicate that the current sequence is allowed to use the affine motion compensation prediction mode.
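A possible check of the preset condition is sketched below. The text does not specify the first value or the preset size, so the default `first_value=1` and the minimum side length of 16 samples are purely hypothetical, as are the function and parameter names. The `require_all` switch reflects that an embodiment may use either condition alone or both together.

```python
from typing import Tuple


def preset_condition_met(
    first_flag: int,
    cu_size: Tuple[int, int],
    first_value: int = 1,   # hypothetical value meaning "affine allowed for the sequence"
    min_side: int = 16,     # hypothetical preset size: both sides at least 16 samples
    require_all: bool = True,
) -> bool:
    """Return True when the violent motion parameter should be determined."""
    flag_ok = first_flag == first_value
    width, height = cu_size
    size_ok = width >= min_side and height >= min_side
    # Either both sub-conditions are required, or at least one is enough.
    return (flag_ok and size_ok) if require_all else (flag_ok or size_ok)


print(preset_condition_met(1, (16, 16)))                    # True
print(preset_condition_met(0, (32, 32)))                    # False
print(preset_condition_met(1, (8, 16), require_all=False))  # True
```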

It should be understood that the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they are not repeated here. Specifically, the device 10 shown in Figure 8 can execute the decoding method of the embodiments of the present application, and the aforementioned and other operations and/or functions of each unit in the device 10 respectively implement the corresponding processes of the methods such as the above decoding method. For the sake of brevity, no further details are given here.

Figure 9 is a schematic block diagram of a video encoding device provided by an embodiment of the present application.

As shown in Figure 9, the video encoding device 20 includes:

The first determination unit 21 is used to determine a violent motion parameter, where the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode;

The second determination unit 22 is configured to determine the block division method and prediction mode of the current CTU according to the violent motion parameter;

The dividing unit 23 is configured to use the block dividing method of the current CTU to block divide the current CTU to obtain at least one CU;

The prediction unit 24 is configured to, for the current CU in the at least one CU, predict the current CU using the prediction mode corresponding to the current CU in the prediction mode of the current CTU, to obtain a prediction value of the current CU;

The encoding unit 25 is configured to determine the residual value of the current CU according to the predicted value of the current CU, and obtain a code stream based on the residual value of the current CU.
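The residual step performed by the encoding unit 25 can be illustrated with a short sketch. It assumes, as is usual in hybrid video coding but not spelled out in this passage, that the residual value is the sample-wise difference between the original CU and its prediction value; transform, quantization, and entropy coding are omitted, and the function name is an assumption.

```python
from typing import List

Block = List[List[int]]  # rows of integer sample values


def residual_block(original: Block, prediction: Block) -> Block:
    """Sample-wise difference between the original CU and its predicted value."""
    return [
        [orig - pred for orig, pred in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, prediction)
    ]


original_cu = [[52, 55], [61, 59]]
predicted_cu = [[50, 56], [60, 60]]
print(residual_block(original_cu, predicted_cu))  # [[2, -1], [1, -1]]
```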

In some embodiments, the second determination unit 22 is specifically configured to: for the i-th block division method among the preset N block division methods, determine the optimal prediction mode corresponding to the i-th block division method according to the violent motion parameter, where N is a positive integer and i is a positive integer less than or equal to N; and determine the block division method and prediction mode of the current CTU according to the optimal prediction modes corresponding to the N block division methods.

In some embodiments, the second determination unit 22 is specifically configured to: perform block division on the current CTU using the i-th block division method to obtain M CUs, where M is a positive integer; for the j-th CU among the M CUs, determine at least one candidate prediction mode of the j-th CU according to the violent motion parameter, where j is a positive integer less than or equal to M; determine the optimal prediction mode of the j-th CU from the at least one candidate prediction mode of the j-th CU; and determine the optimal prediction mode corresponding to the i-th block division method according to the optimal prediction modes of the M CUs.

In some embodiments, the second determination unit 22 is specifically configured to, if the violent motion parameter indicates skipping the affine motion compensation prediction mode, determine that the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode.

In some embodiments, the second determination unit 22 is specifically configured to, if the violent motion parameter indicates not skipping the affine motion compensation prediction mode, determine that the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.

In some embodiments, the second determination unit 22 is specifically configured to: determine the cost corresponding to the at least one candidate prediction mode when the j-th CU is predicted using the at least one candidate prediction mode of the j-th CU; and determine the optimal prediction mode of the j-th CU from the at least one candidate prediction mode of the j-th CU according to the cost corresponding to the at least one candidate prediction mode.
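The mode decision performed by the second determination unit 22 can be sketched as below. This is an illustrative sketch under several assumptions: the mode names, the toy cost table, and the rule of choosing the block division method with the lowest summed CU cost are not taken from the text, which only states that the decision is based on the optimal prediction modes of the CUs.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

AFFINE = "affine"  # hypothetical label for the affine motion compensation prediction mode


def candidate_modes(all_modes: Sequence[str], skip_affine: bool) -> List[str]:
    """Drop the affine mode from the candidates when the parameter says to skip it."""
    return [m for m in all_modes if not (skip_affine and m == AFFINE)]


def best_mode(cu: str, modes: Sequence[str], cost_fn: Callable[[str, str], int]) -> Tuple[str, int]:
    """Pick the candidate prediction mode with the lowest cost for one CU."""
    costs = {mode: cost_fn(cu, mode) for mode in modes}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


def choose_division(
    ctu: str,
    divisions: Dict[str, Callable[[str], List[str]]],
    all_modes: Sequence[str],
    skip_affine: bool,
    cost_fn: Callable[[str, str], int],
) -> Optional[Tuple[str, int, List[str]]]:
    """Try each block division method; keep the one with the lowest total cost (assumption)."""
    modes = candidate_modes(all_modes, skip_affine)
    best: Optional[Tuple[str, int, List[str]]] = None
    for name, split in divisions.items():
        picks = [best_mode(cu, modes, cost_fn) for cu in split(ctu)]
        total = sum(cost for _, cost in picks)
        if best is None or total < best[1]:
            best = (name, total, [mode for mode, _ in picks])
    return best


def toy_cost(cu: str, mode: str) -> int:
    """Made-up costs: one table for the undivided CTU, one for each quarter CU."""
    whole = {"intra": 40, "inter": 20, AFFINE: 12}
    part = {"intra": 9, "inter": 4, AFFINE: 5}
    return (whole if cu == "ctu" else part)[mode]


divisions = {
    "none": lambda ctu: [ctu],
    "quad": lambda ctu: [f"{ctu}_{i}" for i in range(4)],
}
modes = ["intra", "inter", AFFINE]

print(choose_division("ctu", divisions, modes, True, toy_cost))
# ('quad', 16, ['inter', 'inter', 'inter', 'inter'])
print(choose_division("ctu", divisions, modes, False, toy_cost))
# ('none', 12, ['affine'])
```

When the affine mode is skipped, its cost is never evaluated, which is where the encoding-time saving in this sketch comes from.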

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current frame, the first determination unit 21 is specifically configured to: for the k-th CTU among the K CTUs included in the current frame, determine the violent motion parameter of the k-th CTU according to the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame, where K is a positive integer and k is a positive integer less than or equal to K; and determine the violent motion parameter of the current frame according to the violent motion parameters of the K CTUs included in the current frame.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the k-th CTU based on the pixel value of the k-th CTU and the pixel value of the reference CTU.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the k-th CTU based on the absolute difference between the pixel value of the k-th CTU and the pixel value of the reference CTU.

In some embodiments, the first determination unit 21 is specifically configured to: select, from the K CTUs, P CTUs whose violent motion parameters are greater than the first preset value, where P is a positive integer less than or equal to K; and determine the violent motion parameter of the current frame according to the violent motion parameters of the P CTUs.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current frame based on the sum of the violent motion parameters of the P CTUs and the total area of the P CTUs.

In some embodiments, the first determination unit 21 is specifically configured to determine the ratio of the sum of the violent motion parameters of the P CTUs to the total area of the P CTUs as the violent motion parameter of the current frame.

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current CTU, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current CTU according to the current CTU and the reference CTU of the current CTU in the previous frame of the current frame.

In some embodiments, the first determining unit 21 is specifically configured to determine the violent motion parameter of the current CTU according to the pixel value of the current CTU and the pixel value of the reference CTU.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current CTU based on the absolute difference between the pixel value of the current CTU and the pixel value of the reference CTU.

In some embodiments, if the violent motion parameter includes the violent motion parameter of the current CU, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current CU according to the current CU and the reference CU of the current CU in the previous frame of the current frame.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current CU according to the pixel value of the current CU and the pixel value of the reference CU.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter of the current CU based on the absolute difference between the pixel value of the current CU and the pixel value of the reference CU.

In some embodiments, the first determination unit 21 is specifically configured to determine the violent motion parameter under a preset condition.

In some embodiments, the encoding unit 25 is further configured to write at least one of first information and second information into the code stream, where the first information is used to indicate the block division method of the current CTU, and the second information is used to indicate the prediction mode of the current CTU.

It should be understood that the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they are not repeated here. Specifically, the device 20 shown in Figure 9 may correspond to the subject that executes the encoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of each unit in the device 20 respectively implement the corresponding processes of the methods such as the encoding method. For the sake of brevity, details are not repeated here.

The device and system of the embodiments of the present application are described above from the perspective of functional units in conjunction with the accompanying drawings. It should be understood that a functional unit can be implemented in the form of hardware, in the form of software instructions, or as a combination of hardware and software units. Specifically, each step of the method embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or by instructions in the form of software. The steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software units in the decoding processor. Optionally, the software unit may be located in a storage medium that is mature in this field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.

Figure 10 is a schematic block diagram of an electronic device provided by an embodiment of the present application.

As shown in Figure 10, the electronic device 30 may be the video encoder or video decoder described in the embodiments of the present application. The electronic device 30 may include:

A memory 33 and a processor 32, where the memory 33 is used to store the computer program 34 and transmit the program code to the processor 32. In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiments of the present application.

For example, the processor 32 may be configured to perform the steps in the above method 200 according to instructions in the computer program 34.

In some embodiments of the present application, the processor 32 may include but is not limited to:

A general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.

In some embodiments of the present application, the memory 33 includes but is not limited to:

Volatile memory and/or non-volatile memory. The non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. The volatile memory may be Random Access Memory (RAM), which is used as an external cache. By way of illustration, but not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synch link DRAM, SLDRAM) and direct Rambus random access memory (Direct Rambus RAM, DR RAM).

In some embodiments of the present application, the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the methods provided by this application. The one or more units may be a series of computer program instruction segments capable of completing specific functions. The instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30.

As shown in Figure 10, the electronic device 30 may also include:

A transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.

The processor 32 can control the transceiver 33 to communicate with other devices. Specifically, it can send information or data to other devices, or receive information or data sent by other devices. Transceiver 33 may include a transmitter and a receiver. The transceiver 33 may further include an antenna, and the number of antennas may be one or more.

It should be understood that various components in the electronic device 30 are connected through a bus system, where in addition to the data bus, the bus system also includes a power bus, a control bus and a status signal bus.

As shown in Figure 11, the video encoding and decoding system 40 may include a video encoder 41 and a video decoder 42, where the video encoder 41 is used to perform the video encoding method involved in the embodiments of the present application, and the video decoder 42 is used to perform the video decoding method involved in the embodiments of the present application.

This application also provides a computer storage medium on which a computer program is stored. When the computer program is executed by a computer, the computer can perform the method of the above method embodiment. In other words, embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to perform the method of the above method embodiments.

This application also provides a code stream, which is generated by the above encoding method. Optionally, the code stream includes a first flag.

When implemented using software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (such as floppy disks, hard disks, magnetic tapes), optical media (such as digital video discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of the units is only a logical function division, and in actual implementation there may be other division methods: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical or other forms.

A unit described as a separate component may or may not be physically separate. A component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, each functional unit in various embodiments of the present application can be integrated into a processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.

The above contents are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or replacements that a person familiar with the technical field can easily think of within the technical scope disclosed in the present application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A video coding method, characterized by including:

Determine a violent motion parameter, the violent motion parameter being used to indicate whether to skip the affine motion compensation prediction mode;

Determine the block division method and prediction mode of the current coding tree unit CTU according to the violent motion parameter;

Perform block division on the current CTU using the block division method of the current CTU to obtain at least one coding unit CU;

For the current CU in the at least one CU, use the prediction mode corresponding to the current CU in the prediction mode of the current CTU to predict the current CU to obtain a prediction value of the current CU;

According to the predicted value of the current CU, the residual value of the current CU is determined, and based on the residual value of the current CU, a code stream is obtained.
The method according to claim 1, wherein determining the block division method and prediction mode of the current coding tree unit CTU according to the violent motion parameter includes:

For the i-th block division method among the preset N block division methods, determine the optimal prediction mode corresponding to the i-th block division method according to the violent motion parameter, where the N is a positive integer, and the i is a positive integer less than or equal to N;

The block division method and prediction mode of the current CTU are determined according to the optimal prediction modes corresponding to the N block division methods.
The method of claim 2, wherein determining the optimal prediction mode corresponding to the i-th block division method according to the violent motion parameter includes:

Use the i-th block division method to perform block division on the current CTU to obtain M CUs, where M is a positive integer;

For the j-th CU among the M CUs, determine at least one candidate prediction mode of the j-th CU according to the violent motion parameter, where j is a positive integer less than or equal to M;

Determine the optimal prediction mode of the j-th CU from at least one candidate prediction mode of the j-th CU;

According to the optimal prediction modes of the M CUs, the optimal prediction mode corresponding to the i-th block division mode is determined.
The method of claim 3, wherein determining at least one candidate prediction mode of the j-th CU according to the violent motion parameter includes:

If the violent motion parameter indicates skipping the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU does not include the affine motion compensation prediction mode.
The method of claim 3, wherein determining at least one candidate prediction mode of the j-th CU according to the violent motion parameter includes:

If the violent motion parameter indicates not to skip the affine motion compensation prediction mode, it is determined that the at least one candidate prediction mode of the j-th CU includes the affine motion compensation prediction mode.
The method of claim 3, wherein determining the optimal prediction mode of the j-th CU from at least one candidate prediction mode of the j-th CU includes:

Determine the cost corresponding to the at least one candidate prediction mode when predicting the j-th CU using at least one candidate prediction mode of the j-th CU;

According to the cost corresponding to the at least one candidate prediction mode, the optimal prediction mode of the jth CU is determined from the at least one candidate prediction mode of the jth CU.
The method according to any one of claims 1 to 6, characterized in that the violent motion parameter includes at least one of the violent motion parameter of the current frame, the violent motion parameter of the current CTU, and the violent motion parameter of the current CU,

Wherein, the violent motion parameter of the current frame is used to indicate whether the current frame skips the affine motion compensation prediction mode, the violent motion parameter of the current CTU is used to indicate whether the current CTU skips the affine motion compensation prediction mode, and the violent motion parameter of the current CU is used to indicate whether the current CU skips the affine motion compensation prediction mode.
The method according to claim 7, wherein if the violent motion parameter includes the violent motion parameter of the current frame, then determining the violent motion parameter includes:

For the k-th CTU among the K CTUs included in the current frame, determine the violent motion parameter of the k-th CTU according to the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame, where K is a positive integer and k is a positive integer less than or equal to K;

The violent motion parameters of the current frame are determined according to the violent motion parameters of K CTUs included in the current frame.
The method according to claim 8, characterized in that determining the violent motion parameter of the k-th CTU according to the k-th CTU and the reference CTU of the k-th CTU in the previous frame of the current frame includes:

According to the pixel value of the k-th CTU and the pixel value of the reference CTU, the violent motion parameter of the k-th CTU is determined.
The method of claim 9, wherein determining the violent motion parameter of the k-th CTU based on the pixel value of the k-th CTU and the pixel value of the reference CTU includes:

The violent motion parameter of the k-th CTU is determined based on the absolute difference between the pixel value of the k-th CTU and the pixel value of the reference CTU.
The method of claim 8, wherein determining the violent motion parameters of the current frame based on the violent motion parameters of K CTUs included in the current frame includes:

Select P CTUs whose violent motion parameters are greater than the first preset value from the K CTUs, where P is a positive integer less than or equal to K;

According to the violent motion parameters of the P CTUs, the violent motion parameters of the current frame are determined.
The method according to claim 11, wherein determining the violent motion parameters of the current frame based on the violent motion parameters of the P CTUs includes:

The violent motion parameter of the current frame is determined according to the sum of the violent motion parameters of the P CTUs and the total area of the P CTUs.
The method according to claim 12, wherein determining the violent motion parameters of the current frame based on the sum of the violent motion parameters of the P CTUs and the total area of the P CTUs includes:

The ratio of the sum of the violent motion parameters of the P CTUs to the total area of the P CTUs is determined as the violent motion parameter of the current frame.
The method according to claim 7, characterized in that, if the violent motion parameter includes the violent motion parameter of the current CTU, determining the violent motion parameter includes:

According to the current CTU and the reference CTU of the current CTU in the previous frame of the current frame, the violent motion parameter of the current CTU is determined.
The method according to claim 14, characterized in that determining the violent motion parameter of the current CTU according to the current CTU and the reference CTU of the current CTU in the previous frame of the current frame includes:

According to the pixel value of the current CTU and the pixel value of the reference CTU, the violent motion parameter of the current CTU is determined.
The method of claim 15, wherein determining the violent motion parameters of the current CTU based on the pixel values of the current CTU and the pixel values of the reference CTU includes:

According to the absolute difference between the pixel value of the current CTU and the pixel value of the reference CTU, the violent motion parameter of the current CTU is determined.
The method according to claim 7, characterized in that, if the violent motion parameter includes the violent motion parameter of the current CU, determining the violent motion parameter includes:

According to the current CU and the reference CU of the current CU in the previous frame of the current frame, the violent motion parameter of the current CU is determined.
The method of claim 17, wherein determining the violent motion parameter of the current CU according to the current CU and the reference CU of the current CU in the previous frame of the current frame includes:

According to the pixel value of the current CU and the pixel value of the reference CU, the violent motion parameter of the current CU is determined.
The method of claim 18, wherein determining the violent motion parameters of the current CU based on the pixel values of the current CU and the pixel values of the reference CU includes:

According to the absolute difference between the pixel value of the current CU and the pixel value of the reference CU, the violent motion parameter of the current CU is determined.
The method according to any one of claims 1 to 6, characterized in that determining the violent motion parameter includes:

Under a preset condition, the violent motion parameter is determined.
The method according to claim 20, wherein the preset condition includes at least one of the following: the value of a first flag is a first value, and the size of the current CU satisfies a preset size, where the first flag is used to indicate whether the current sequence is allowed to use the affine motion compensation prediction mode, and the first value is used to indicate that the current sequence is allowed to use the affine motion compensation prediction mode.
The method according to any one of claims 1-6, characterized in that the method further includes:

At least one of first information and second information is written into the code stream, where the first information is used to indicate the block division method of the current CTU, and the second information is used to indicate the prediction mode of the current CTU.
A video encoding device, characterized by including:

A first determination unit configured to determine a violent motion parameter, where the violent motion parameter is used to indicate whether to skip the affine motion compensation prediction mode;

A second determination unit, configured to determine the block division method and prediction mode of the current CTU according to the violent motion parameter;

A dividing unit, configured to divide the current CTU into blocks using the block dividing method of the current CTU to obtain at least one CU;

A prediction unit, configured to, for the current CU in the at least one CU, predict the current CU using the prediction mode corresponding to the current CU in the prediction mode of the current CTU, to obtain a prediction value of the current CU;

An encoding unit, configured to determine the residual value of the current CU based on the predicted value of the current CU, and obtain a code stream based on the residual value of the current CU.
A video encoder, characterized by including a processor and a memory;

The memory is used to store a computer program;

The processor is configured to call and run the computer program stored in the memory to implement the method as described in any one of claims 1 to 22 above.
A coding and decoding system, characterized by including:

The video encoder of claim 24.
A computer-readable storage medium, characterized in that it is used to store a computer program;

The computer program causes the computer to perform the method as claimed in any one of claims 1 to 22 above.
A code stream, characterized in that the code stream is generated by the method described in any one of claims 1 to 22.