
CN117560502A - Video encoding method, video encoding device, computer readable medium and electronic equipment

Info

Publication number: CN117560502A
Application number: CN202311600332.9A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: motion vector, mode, corrected, video, mmvd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Inventors: 张洪彬, 唐敏豪, 曹志强, 蔡斌斌, 谢阳杰, 范佑, 王玉伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Events: application filed by Tencent Technology Shenzhen Co Ltd; priority claimed from CN202311600332.9A; published as CN117560502A; legal status pending.

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors
    • H04N19/567: Motion estimation based on rate distortion criteria

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide a video encoding method, apparatus, computer readable medium, and electronic device. The video encoding method includes: performing interpolation processing on a first motion vector in a candidate motion vector list of a current coding block to obtain a corrected motion vector corresponding to the first motion vector; optimizing the interpolation result of the first motion vector through the combined inter and intra prediction (CIIP) mode to obtain a corrected motion vector corresponding to the CIIP mode; generating a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode; and determining the motion vector of the current coding block according to the motion vector set, and performing encoding based on the determined motion vector. The technical solution of the embodiments of the present application can reduce hardware circuit overhead and helps improve the codec performance of the system.

Description

Video encoding method, video encoding device, computer readable medium and electronic equipment
Technical Field
The present invention relates to the field of audio and video coding technologies, and in particular, to a video coding method, a video coding device, a computer readable medium and an electronic device.
Background
Among video coding techniques, the Merge coding technique offers excellent compression performance and has been one of the important techniques in video codec standards since HEVC (High Efficiency Video Coding). The H.266/VVC (Versatile Video Coding) standard made many improvements to the Merge technique, including History-based Motion Vector Prediction (HMVP), Combined Inter and Intra Prediction (CIIP), and Merge Mode with Motion Vector Difference (MMVD). However, these new coding techniques increase the overhead of hardware circuits to some extent and pose new challenges for the hardware architecture design of video inter coding.
Disclosure of Invention
The embodiments of the present application provide a video coding method, a video coding device, a computer readable medium, and an electronic device, which can reduce hardware circuit overhead and help improve the codec performance of the system.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a video encoding method, including: performing interpolation processing on a first motion vector in a candidate motion vector list of a current coding block to obtain a corrected motion vector corresponding to the first motion vector; optimizing the interpolation result of the first motion vector through the combined inter and intra prediction (CIIP) mode to obtain a corrected motion vector corresponding to the CIIP mode; generating a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode; and determining a motion vector of the current coding block according to the motion vector set, and performing encoding processing based on the determined motion vector.
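As a rough illustration of the claimed flow, the sketch below strings the four steps together; every name (interpolate, refineWithCiip, cost, encodeBlockMv) is a hypothetical placeholder rather than any real codec API, and the cost function stands in for the SATD/RDO selection detailed later.

```cpp
#include <cstdlib>
#include <vector>

// All names below are hypothetical placeholders; the real interpolation,
// CIIP blending, and cost functions are far more involved.
struct MV { int x, y; };

MV interpolate(MV mv) { return mv; }                 // stand-in for 8-tap interpolation (S810)
MV refineWithCiip(MV mv) { return mv; }              // stand-in for CIIP refinement of that result (S820)
int cost(MV mv) { return std::abs(mv.x) + std::abs(mv.y); } // stand-in for SATD/RDO cost

// Assumes a non-empty candidate list.
MV encodeBlockMv(const std::vector<MV>& candidates) {
    std::vector<MV> mvSet;                           // S830: motion vector set of the current block
    for (MV first : candidates) {
        MV corrected = interpolate(first);           // corrected MV for the first motion vector
        mvSet.push_back(corrected);
        mvSet.push_back(refineWithCiip(corrected));  // CIIP reuses the interpolation output
    }
    MV best = mvSet.front();                         // S840: keep the minimum-cost MV
    for (MV mv : mvSet) {
        if (cost(mv) < cost(best)) best = mv;
    }
    return best;
}
```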
According to an aspect of an embodiment of the present application, there is provided a video encoding apparatus including: an interpolation unit configured to perform interpolation processing on a first motion vector in a candidate motion vector list of a current coding block to obtain a corrected motion vector corresponding to the first motion vector; the processing unit is configured to optimize the result of the interpolation processing of the first motion vector through the inter-frame and intra-frame combined prediction CIIP mode to obtain a corrected motion vector corresponding to the CIIP mode; a generating unit configured to generate a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode; and a determining unit configured to determine a motion vector of the current coding block from the motion vector set, and perform coding processing based on the determined motion vector.
In some embodiments of the present application, based on the foregoing solution, the processing unit is further configured to: optimizing a second motion vector in the candidate motion vector list through a bidirectional coding mode to obtain a corrected motion vector corresponding to the second motion vector; and adding the corrected motion vector corresponding to the second motion vector to the candidate motion vector list.
In some embodiments of the present application, based on the foregoing solution, the process of performing, by the processing unit, optimization processing on a second motion vector in the candidate motion vector list through a bi-directional coding mode to obtain a corrected motion vector corresponding to the second motion vector includes:
optimizing the second motion vector through a decoding end motion vector refinement DMVR mode to obtain a corrected motion vector corresponding to the second motion vector; or alternatively
Optimizing the second motion vector through a two-way optical flow technology BDOF mode to obtain a corrected motion vector corresponding to the second motion vector; or alternatively
And optimizing the second motion vector through the DMVR mode and the BDOF mode to obtain a corrected motion vector corresponding to the second motion vector.
In some embodiments of the present application, based on the foregoing scheme, performing optimization processing on the second motion vector by using the DMVR mode and the BDOF mode includes: optimizing the second motion vector through the DMVR mode to obtain a corrected motion vector corresponding to the DMVR mode; and carrying out interpolation processing on the corrected motion vector corresponding to the DMVR mode, and carrying out optimization processing on an interpolation result through the BDOF mode to obtain the corrected motion vector corresponding to the second motion vector.
In some embodiments of the present application, based on the foregoing solution, the processing unit is further configured to: determining a motion vector capable of adopting a bidirectional coding mode from the candidate motion vector list; and taking the motion vector which can adopt the bidirectional coding mode in the candidate motion vector list as the second motion vector, and taking the motion vector which cannot adopt the bidirectional coding mode in the candidate motion vector list as the first motion vector.
In some embodiments of the present application, based on the foregoing scheme, the determining unit is configured to: selecting a target motion vector for performing a merge mode MMVD with a motion vector difference from the set of motion vectors; optimizing the selected target motion vector through an MMVD mode to obtain a corrected motion vector corresponding to the MMVD mode; and selecting the motion vector of the current coding block from the motion vector set and the corrected motion vector corresponding to the MMVD mode.
In some embodiments of the present application, based on the foregoing solution, the process by which the determining unit selects, from the motion vector set, a target motion vector for performing the merge mode with motion vector difference (MMVD) includes: calculating the sum of absolute transformed differences (SATD) cost corresponding to each motion vector in the motion vector set; and selecting, according to the SATD costs, the motion vector with the minimum SATD cost from the motion vector set as the target motion vector.
In some embodiments of the present application, based on the foregoing solution, the process by which the determining unit optimizes the selected target motion vector through the MMVD mode to obtain the corrected motion vector corresponding to the MMVD mode includes: taking the target motion vector as the search starting point and searching with a 1/4-pixel precision step size to determine an optimal search direction; and continuing the search in the optimal search direction with the other pixel-precision step sizes of the MMVD mode, and determining the corrected motion vector corresponding to the MMVD mode based on the search result.
In some embodiments of the present application, based on the foregoing solution, searching is performed with the target motion vector as a search starting point according to a 1/4 pixel precision step size to determine an optimal search direction, including: searching in each searching direction according to 1/4 pixel precision step length by taking the target motion vector as a searching starting point to obtain searching results in each searching direction; and selecting the search direction with the minimum SATD cost as the optimal search direction according to the SATD cost corresponding to the search result in each search direction.
In some embodiments of the present application, based on the foregoing solution, the process by which the determining unit selects the motion vector of the current coding block from the motion vector set and the corrected motion vector corresponding to the MMVD mode includes: calculating the rate-distortion optimization (RDO) cost of the corrected motion vector corresponding to the MMVD mode and of each motion vector in the motion vector set; and selecting the motion vector with the minimum RDO cost, from among the corrected motion vector corresponding to the MMVD mode and the motion vector set, as the motion vector of the current coding block.
In some embodiments of the present application, based on the foregoing scheme, the determining unit is configured to: calculating rate distortion optimization RDO cost corresponding to each motion vector in the motion vector set; and selecting the motion vector with the minimum RDO cost from the motion vector set as the motion vector of the current coding block.
In some embodiments of the present application, based on the foregoing scheme, the candidate motion vector list of the current coding block is a candidate motion vector list determined by the current coding block in the Merge mode.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a video encoding method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and storage means for storing one or more computer programs which, when executed by the one or more processors, cause the electronic device to implement the video encoding method as described in the above embodiments.
According to one aspect of embodiments of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the electronic device reads and executes the computer program from the computer-readable storage medium to cause the electronic device to perform the video encoding method provided in the various alternative embodiments described above.
In the technical solutions provided in some embodiments of the present application, interpolation processing is performed on a first motion vector in the candidate motion vector list of the current coding block to obtain a corrected motion vector corresponding to the first motion vector; the interpolation result of the first motion vector is then optimized through the CIIP mode to obtain a corrected motion vector corresponding to the CIIP mode; and a motion vector set corresponding to the current coding block is generated from the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode, so that the motion vector of the current coding block is determined from the motion vector set and encoding is performed based on the determined motion vector. In this way the CIIP mode can reuse the interpolation circuit, which reduces the computation required for interpolation and lowers the hardware circuit overhead, helping to improve the codec performance of the system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of embodiments of the present application may be applied;
fig. 2 shows a schematic diagram of the placement of a video encoding device and a video decoding device in a streaming system;
FIG. 3 shows a basic flow diagram of a video encoder;
FIG. 4 shows a schematic diagram of inter prediction;
FIG. 5 shows a schematic diagram of determining candidate MVs;
FIG. 6 shows a schematic diagram of adjacent blocks on which CIIP mode depends;
FIG. 7 shows a search schematic of MMVD patterns;
FIG. 8 illustrates a flow chart of a video encoding method according to one embodiment of the present application;
FIG. 9 shows a schematic diagram of a video encoding process according to one embodiment of the present application;
FIG. 10 illustrates a block diagram of a video encoding apparatus according to one embodiment of the present application;
fig. 11 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
Detailed Description
Example embodiments are now described in a more complete manner with reference being made to the figures. However, the illustrated embodiments may be embodied in various forms and should not be construed as limited to only these examples; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present application. However, it will be recognized by one skilled in the art that the present application may be practiced without all of the specific details of the embodiments, that one or more specific details may be omitted, or that other methods, components, devices, steps, etc. may be used.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function, and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that: references herein to "a plurality" means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., a and/or B may represent: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 includes a plurality of terminal devices that can communicate with each other through, for example, a network 150. For example, the system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected by a network 150. In the embodiment of fig. 1, the first terminal apparatus 110 and the second terminal apparatus 120 perform unidirectional data transmission.
For example, the first terminal device 110 may encode video data (e.g., a stream of video pictures collected by the terminal device 110) for transmission over the network 150 to the second terminal device 120, the encoded video data transmitted in one or more encoded video code streams, the second terminal device 120 may receive the encoded video data from the network 150, decode the encoded video data to recover the video data, and display the video pictures in accordance with the recovered video data.
In one embodiment of the present application, the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 that perform bi-directional transmission of encoded video data, such as may occur during a video conference. For bi-directional data transmission, each of the third terminal device 130 and the fourth terminal device 140 may encode video data (e.g., a stream of video pictures collected by the terminal device) for transmission over the network 150 to the other of the third terminal device 130 and the fourth terminal device 140. Each of the third terminal device 130 and the fourth terminal device 140 may also receive encoded video data transmitted by the other of the third terminal device 130 and the fourth terminal device 140, and may decode the encoded video data to restore the video data, and may display a video picture on an accessible display device according to the restored video data.
In the embodiment shown in fig. 1, the first, second, third and fourth terminal apparatuses 110, 120, 130 and 140 may be servers or terminals, but the principles disclosed herein may not be limited thereto.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart voice interaction device, a smart watch, a smart home appliance, a vehicle-mounted terminal, an aircraft, etc.
The network 150 shown in fig. 1 represents any number of networks that transfer encoded video data between the first terminal device 110, the second terminal device 120, the third terminal device 130, and the fourth terminal device 140, including, for example, wired and/or wireless communication networks. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For the purposes of this application, the architecture and topology of network 150 may be irrelevant to the operation disclosed herein, unless explained below.
In one embodiment of the present application, fig. 2 illustrates the placement of video encoding devices and video decoding devices in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV (television), storing compressed video on digital media including CDs, DVDs, memory sticks, etc.
The streaming system may include an acquisition subsystem 213, and the acquisition subsystem 213 may include a video source 201, such as a digital camera, that creates an uncompressed video picture stream 202. In an embodiment, the video picture stream 202 includes samples taken by a digital camera. The video picture stream 202 is depicted as a bold line to emphasize a high data volume video picture stream compared to the encoded video data 204 (or the encoded video code stream 204), the video picture stream 202 may be processed by an electronic device 220, the electronic device 220 comprising a video encoding device 203 coupled to the video source 201. The video encoding device 203 may include hardware, software, or a combination of hardware and software to implement or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data 204 (or encoded video stream 204) is depicted as a thin line compared to the video picture stream 202 to emphasize a lower amount of encoded video data 204 (or encoded video stream 204), which may be stored on the streaming server 205 for future use. One or more streaming client subsystems, such as client subsystem 206 and client subsystem 208 in fig. 2, may access streaming server 205 to retrieve copies 207 and 209 of encoded video data 204. Client subsystem 206 may include video decoding device 210, for example, in electronic device 230. Video decoding device 210 decodes an incoming copy 207 of the encoded video data and generates an output video picture stream 211 that may be presented on a display 212 (e.g., a display screen) or another presentation device. In some streaming systems, the encoded video data 204, 207, and 209 (e.g., video streams) may be encoded according to some video encoding/compression standard.
It should be noted that electronic device 220 and electronic device 230 may include other components not shown in the figures. For example, electronic device 220 may comprise a video decoding device, and electronic device 230 may also comprise a video encoding device.
In one embodiment of the present application, taking the international video coding standards HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding) and the national video coding standard AVS as examples, after a video frame image is input, it is divided into several non-overlapping processing units according to a block size, and each processing unit undergoes a similar compression operation. This processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit). The CTU may be further divided, more finely, into one or more basic coding units (CU, Coding Unit); the CU is the most basic element in the coding chain.
In another embodiment, the processing unit may also be a coding slice (i.e., tile), a rectangular region of a frame of multimedia data that can be decoded and encoded independently. In the AV1 standard, a coding slice is further divided into one or more maximum coding blocks (SB, Superblock); the SB is the starting point of block division and may be further divided into multiple sub-blocks, down to one or more blocks (Block). Each block is the most basic element in the coding chain. Accordingly, one SB may contain several blocks.
The above-described division manner for video frame images may be referred to as a block partition structure (block partition structure), and some concepts in the encoding process are described below:
predictive coding (Predictive Coding): the predictive coding comprises modes of intra-frame prediction, inter-frame prediction and the like, and the residual video signal is obtained after the original video signal is predicted by the selected reconstructed video signal. The encoding end needs to decide which predictive coding mode to select for the current coding unit (or coding block) and inform the decoding end. Wherein, intra-frame prediction refers to that a predicted signal comes from a region which is already coded and reconstructed in the same image; inter prediction refers to a predicted signal from an already encoded picture (referred to as a reference picture) other than the current picture.
Transform & Quantization): the residual video signal is subjected to transformation operations such as DFT (Discrete Fourier Transform ), DCT (Discrete CosineTransform, discrete cosine transform), etc., and then the signal is converted into a transform domain, which is called a transform coefficient. The transformation coefficient is further subjected to lossy quantization operation, and certain information is lost, so that the quantized signal is favorable for compression expression. In some video coding standards, there may be more than one transform mode to choose, so the coding end also needs to choose one of the transform modes for the current coding unit (or coding block) and inform the decoding end. The quantization refinement is generally determined by a quantization parameter (Quantization Parameter, QP for short), and the QP is larger, so that the coefficients representing a larger range of values will be quantized to the same output, and thus will generally bring more distortion and lower code rate; conversely, a smaller QP value will represent a smaller range of coefficients to be quantized to the same output, and therefore will typically result in less distortion, while corresponding to a higher code rate.
Entropy coding (Entropy Coding) or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed code stream is output. Encoding also produces other information, such as the selected coding mode and motion vector data, which likewise requires entropy coding to reduce the code rate. Statistical coding is a lossless coding method that can effectively reduce the code rate needed to express the same signal; common statistical coding methods are Variable Length Coding (VLC) or Context-based Adaptive Binary Arithmetic Coding (CABAC).
The context-based adaptive binary arithmetic coding (CABAC) process mainly includes 3 steps: binarization, context modeling, and binary arithmetic coding. After an input syntax element is binarized, the binary data can be encoded in either the regular coding mode or the bypass coding mode (Bypass Coding Mode). The bypass coding mode does not need to assign a specific probability model to each bin; the input bin values are coded directly with a simple bypass coder to speed up the overall encoding and decoding. In general, different syntax elements are not completely independent, and the same syntax element has a certain memory; therefore, according to conditional entropy theory, conditioning on other already-coded syntax elements can further improve coding performance compared with independent or memoryless coding. The coded symbol information used as conditions is called the context. In the regular coding mode, the bins of a syntax element enter the context modeler sequentially, and the coder assigns an appropriate probability model to each input bin based on the values of previously coded syntax elements or bins; this process is context modeling. The context model corresponding to a syntax element can be located through ctxIdxInc (context index increment) and ctxIdxStart (context start index). After a bin value is fed into the binary arithmetic coder together with its assigned probability model, the context model must be updated according to the bin value; this is the adaptive process in coding.
Loop filtering (Loop Filtering): the transformed and quantized signal is inverse-quantized, inverse-transformed, and prediction-compensated to obtain a reconstructed image. Because of quantization, part of the information of the reconstructed image differs from the original image, i.e. the reconstructed image is distorted (Distortion). Therefore, filtering operations such as deblocking (DB, Deblocking filter), SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter) can be applied to the reconstructed image to effectively reduce the distortion caused by quantization. Since these filtered reconstructed images serve as references for subsequently coded images to predict future image signals, the above filtering operation is also called loop filtering, i.e. a filtering operation within the encoding loop.
In one embodiment of the present application, fig. 3 shows a basic flow chart of a video encoder, using intra prediction as an example. The original image signal s_k[x,y] and the predicted image signal ŝ_k[x,y] are subtracted to obtain a residual signal u_k[x,y]. The residual signal u_k[x,y] is transformed and quantized to obtain quantized coefficients, which are entropy encoded to produce the coded bit stream on the one hand, and inverse quantized and inverse transformed to obtain a reconstructed residual signal u'_k[x,y] on the other. The predicted image signal ŝ_k[x,y] and the reconstructed residual signal u'_k[x,y] are superimposed to generate the image signal s*_k[x,y]. The signal s*_k[x,y] is, on the one hand, fed into the intra mode decision and intra prediction modules for intra prediction processing, and, on the other hand, loop filtered to output the reconstructed image signal s'_k[x,y], which can serve as the reference image of the next frame for motion estimation and motion-compensated prediction. The predicted image signal ŝ_k[x,y] of the next frame is then obtained from the motion-compensated prediction result s'_r[x+m_x, y+m_y] and the intra prediction result f(s*_k[x,y]), and the process repeats until encoding is complete.
Based on the above-described encoding process, at the decoding end, for each encoding unit (or encoding block), after a compressed code stream (i.e., bit stream) is acquired, entropy decoding is performed to obtain various mode information and quantization coefficients. And then the quantized coefficients are subjected to inverse quantization and inverse transformation to obtain residual signals. On the other hand, according to the known coding mode information, a prediction signal corresponding to the coding unit (or the coding block) can be obtained, then, after the residual signal and the prediction signal are added, a reconstructed signal can be obtained, and the reconstructed signal is subjected to operations such as loop filtering and the like to generate a final output signal.
Currently mainstream video coding standards, such as HEVC (High Efficiency Video Coding), VVC (Versatile Video Coding), AVS3, AV1 (Alliance for Open Media Video 1, the first-generation video coding standard established by the Alliance for Open Media), and AV2 (Alliance for Open Media Video 2, the second-generation standard from the same alliance), all employ a block-based hybrid coding framework: the original video data is divided into a series of coding blocks, and video coding methods such as prediction, transform, and entropy coding are combined to compress the video data.
Among them, motion compensation is a type of prediction method commonly used for video coding, and motion compensation derives a prediction value of a current coding block from a coded region based on redundancy characteristics of video content in a time domain or a space domain. Such prediction methods include: inter prediction, intra block copy prediction, intra string copy prediction, etc., these prediction methods may be used alone or in combination in a particular coding implementation. For coded blocks using these prediction methods, it is often necessary to explicitly or implicitly encode one or more two-dimensional displacement vectors in the code stream, indicating the displacement of the current block (or co-located blocks of the current block) relative to its one or more reference blocks.
It should be noted that under different prediction modes and different implementations, the displacement vectors may have different names, for example, the displacement vectors in inter-frame prediction are called Motion Vectors (MVs); the displacement vector in intra block copy is called block displacement vector (BV); the displacement vector in intra-frame string replication is referred to as a string displacement vector.
As shown in fig. 4, inter prediction predicts the pixels of the current image from the pixels of adjacent coded images by exploiting the temporal correlation of video, effectively removing temporal redundancy and saving the bits used for coding residual data. Here, P denotes the current frame, Pr the reference frame, B the current coding block, and Br the reference block of B. B' is the block in the reference frame whose coordinates are the same as those of B in the current frame; the coordinates of Br are (x_r, y_r) and the coordinates of B' are (x, y). The displacement between the current coding block and its reference block is called the motion vector (i.e., MV), where MV = (x_r - x, y_r - y).
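The displacement computation is trivial to state in code; the sketch below is a direct transcription of MV = (x_r - x, y_r - y) with illustrative types.

```cpp
// Pos and MV are illustrative types, not from any codec API.
struct Pos { int x, y; };
struct MV  { int x, y; };

// Displacement from the current block at (x, y) to its reference block at (xr, yr).
MV motionVector(Pos cur, Pos ref) {
    return MV{ ref.x - cur.x, ref.y - cur.y };
}
```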
Considering that temporally or spatially neighboring blocks are strongly correlated, MV prediction techniques can be used to further reduce the bits required to encode MVs. In some codec standards, inter prediction includes two MV prediction techniques: Merge and AMVP (Advanced Motion Vector Prediction). The Merge mode is a motion vector prediction technique, specifically an inter coding mode that directly uses the motion information (i.e., MVs) of neighboring blocks as candidate MVPs and takes the best motion information as the motion information of the current block; that is, for a coding block using the Merge mode, MV = MVP (Motion Vector Prediction).
The Merge mode establishes a candidate list of MVs (called the Merge list) for the current PU (prediction unit), traverses the candidate MVs in the list, and selects the MV with the minimum rate-distortion optimization (RDO) cost as the optimal MV. Since the encoder and decoder build the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the candidate list. It should be noted that HEVC's MV prediction also includes the skip mode, a special case of the Merge mode: after the optimal MV is found through the Merge mode, if the current block is essentially identical to the reference block, no residual data needs to be transmitted; only the index of the MV and a skip flag are sent.
The Merge mode contains 5 candidate MVP types in total, and the VVC standard defines the order in which they are used to construct the Merge list: spatial, temporal, HMVP, pairwise average, and zero MVP. The Merge mode contains 5 spatial candidate MVPs, which are added to the Merge list in the order {A1, A0, B2, B0, B1}, as shown in fig. 5(a); it contains 2 temporal candidate MVPs, {D1, D0}, as shown in fig. 5(b). The VVC standard also allows the length of the Merge list to be customized. During Merge list construction, once the Merge list is full, construction stops.
In video, mutual occlusion between moving objects is a common phenomenon and can cause the same object to appear in non-adjacent coded blocks. Although these non-adjacent blocks have the same motion trajectory, HEVC does not allow the Merge list to be built from non-adjacent coded blocks. To address this, VVC introduces the HMVP technique to extend the applicable range of the Merge mode. The principle of HMVP is to construct and maintain a queue of size 5 that records the motion information of previously coded blocks. When coding the current block, HMVP entries are filled into the Merge list in tail-to-head order until the list is full. HMVP stores the motion information of previously coded blocks in a first-in first-out (FIFO, First Input First Output) queue: if a stored candidate is identical to the motion information just coded, the duplicate is first removed, all remaining HMVP candidates are moved forward, and the motion information of the current block is appended at the end of the FIFO queue. If the motion information of the current block differs from every candidate in the FIFO queue, the newest motion information is appended at the end. When new motion information is added and the list has already reached its maximum length, the first candidate in the FIFO queue is removed and the newest motion information is appended at the end of the FIFO queue.
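The FIFO maintenance described above can be sketched as follows; MotionInfo is a stand-in type for the full motion data, and the queue size of 5 follows the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// MotionInfo is a stand-in for the full motion data (MV, reference indexes, etc.).
struct MotionInfo {
    int mvx, mvy, refIdx;
    bool operator==(const MotionInfo& o) const {
        return mvx == o.mvx && mvy == o.mvy && refIdx == o.refIdx;
    }
};

constexpr std::size_t kHmvpSize = 5; // the queue of size 5 mentioned above

void updateHmvp(std::deque<MotionInfo>& table, const MotionInfo& justCoded) {
    // Duplicate of the just-coded motion info: remove it so the remaining
    // entries move forward, then re-append it at the tail.
    auto it = std::find(table.begin(), table.end(), justCoded);
    if (it != table.end()) {
        table.erase(it);
    } else if (table.size() >= kHmvpSize) {
        table.pop_front(); // FIFO full: drop the oldest entry
    }
    table.push_back(justCoded); // newest motion info goes to the tail
}
```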
CIIP is a joint intra and inter prediction technique and a branch of the Merge technique. CIIP first computes the intra prediction value of the current prediction block, i.e., the pixel values of the current block are predicted with a conventional intra prediction mode and stored; it then predicts the inter prediction value of the current block with an inter prediction mode; finally, the intra prediction value and the inter prediction value are weighted in a certain way to obtain the final prediction value of the current block. The specific weighting formula is as follows:
P_CIIP = ((4 - wt) × P_inter + wt × P_intra + 2) >> 2
where P_CIIP is the final CIIP prediction value, P_intra is the intra prediction value, P_inter is the inter prediction value, and wt is the weight. To reduce algorithm complexity, the VVC standard defines that P_intra can only be generated by the Planar prediction mode and P_inter can only be generated by the Merge mode. The weight wt is determined by the prediction mode M_u of the upper neighboring block of the current block and the prediction mode M_l of the left neighboring block (both shown in fig. 6): if both neighboring blocks are intra-coded, wt = 3; if exactly one of them is intra-coded, wt = 2; otherwise, wt = 1.
the VVC encoder decides whether to use the CIIP mode through RDO and transmits an enable flag of the CIIP to the decoding side through a code stream.
MMVD, the merge mode with motion vector difference in inter prediction, also belongs to the family of Merge-based techniques and falls under the Merge branch in the decoder-side syntax elements. The MMVD technique works roughly as follows: first, using the Merge list of the regular Merge mode, the first two candidate motion vectors in the Merge list are selected as initial motion vectors; each initial motion vector is then extended, mainly in motion magnitude and motion direction, to form a new motion vector. Specifically, as shown in fig. 7, starting from the position to which an initial motion vector points in a reference image, a search of 8 step sizes (i.e., motion magnitudes) is performed in the four directions up, down, left, and right (in fig. 7, the L0 reference denotes the forward reference image, whose play order precedes the current image, and the L1 reference denotes the backward reference image, whose play order follows it).
VVC defines 2 MMVD amplitude sets, I and II, each containing 8 amplitudes in pixel units. Amplitude set I allows MVDs in the fractional-pixel domain, while amplitude set II contains only integer-pixel MVDs, which suits screen-content video coding. The MMVD amplitude values and their indexes are listed in Table 1 below. A flag bit for the amplitude set may be sent in the Picture Header to specify the MMVD amplitude set used for the current picture.
Table 1 (MMVD amplitudes):

Amplitude index           0     1     2     3     4     5     6     7
Amplitude set I (pixel)   1/4   1/2   1     2     4     8     16    32
Amplitude set II (pixel)  1     2     4     8     16    32    64    128

The mapping of MMVD search direction indexes is shown in Table 2 below.

Table 2 (MMVD search directions):

Direction index   00    01    10    11
Angle             0°    180°  90°   270°
x-axis            +     -     N/A   N/A
y-axis            N/A   N/A   +     -
To reduce MMVD algorithm complexity, the VVC standard defines that the MMVD technique can only be used for the 1st and 2nd candidate MVs in the Merge list. The encoder side decides the magnitude and direction of MMVD through rate-distortion optimization (RDO) and transmits their indexes to the decoding side in the code stream.
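Putting Tables 1 and 2 together, an MMVD candidate is just the base MV plus a signed offset; the sketch below assumes 1/4-pel MV units and illustrative type names.

```cpp
#include <array>

// MMVD candidate expansion: base MV plus (direction x magnitude). Magnitudes
// are amplitude set I of Table 1 in 1/4-pel units; directions follow Table 2.
struct MV { int x, y; };

constexpr std::array<int, 8> kMagnitudesQpel = {1, 2, 4, 8, 16, 32, 64, 128}; // 1/4 .. 32 pel
constexpr std::array<MV, 4>  kDirections = {{ {+1, 0}, {-1, 0}, {0, +1}, {0, -1} }}; // 0, 180, 90, 270 degrees

MV expandMmvd(MV base, int directionIdx, int magnitudeIdx) {
    const MV d = kDirections[directionIdx];
    const int m = kMagnitudesQpel[magnitudeIdx];
    return MV{ base.x + d.x * m, base.y + d.y * m };
}
```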
DMVR is a technique proposed to improve the accuracy of bi-prediction in the Merge mode. Its starting point is that inter coding generally performs residual coding based on block-matching prediction, so to obtain the predicted pixels of the current block more accurately and reduce the code rate spent on coding the residual signal, video coding standards adopt MVs with sub-pixel precision. As the resolution of coded video increases, the code rate required to encode the MVs also grows substantially.
VVC divides the decoded frames into two groups, the reference frame lists L0 and L1, which serve as reference frames for predicting the blocks to be decoded in the current frame. Frames in L0 precede the current frame in play order and are called forward reference frames; frames in L1 follow the current frame in play order and are called backward reference frames. The reference frame indexes refIdxL0 and refIdxL1 indicate which frames in L0 and L1, respectively, are used to predict the current block. In bi-prediction, two prediction blocks are obtained from L0 and L1 through MV0 and MV1, respectively, and combined into one prediction signal for the current block. As mentioned above, if the current block is coded in Merge mode, the reference frame indexes refIdxL0 and refIdxL1 and the motion vectors MV0 and MV1 of neighboring blocks are used directly to obtain the prediction signal of the current block; the code rate for coding the motion information is low, but the prediction may not be accurate.
The purpose of DMVR is to solve the low prediction accuracy of such bi-predicted Merge-mode coding blocks. Specifically, DMVR does not use MV0 and MV1 directly but searches for more accurate MV0' and MV1' around them. In detail, calling the MVs derived in Merge mode the Initial MVs, DMVR refines the Initial MVs by block matching: in the two reference frames, it traverses and searches the blocks surrounding the block pointed to by the Initial MVs; the two best-matching blocks are used to generate the final prediction signal, and the new MVs thus formed are called the Refined MVs.
The derivation procedure of DMVR Merge is as follows: assuming the current coding block size is W×H, 2-tap bilinear interpolation is performed with the Initial MVs as the MVs, and the interpolation output block size is (W+4)×(H+4); the sub-blocks are then traversed, integer-pixel template matching is performed within a ±2 area, searching 25 points in total, and the SAD (Sum of Absolute Differences) value is computed at each point; sub-pixel interpolation is then applied to the SAD values to obtain the sub-pixel-precision MV with the minimum SAD; finally, 8-tap interpolation is performed at that sub-pixel precision to generate the final motion compensation block of the sub-block.
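The integer-pel matching stage can be sketched as a 25-point minimum-SAD search; the bilinear pre-interpolation, sub-pel SAD fitting, and final 8-tap interpolation are deliberately omitted, and the sad callback is a hypothetical block-matching cost.

```cpp
#include <climits>
#include <cstdint>

// Sketch of the DMVR integer search stage only: around the Initial MV, a
// 5x5 grid of integer offsets (25 points) is evaluated and the minimum-SAD
// offset kept.
struct Offset { int dx, dy; };

template <typename SadFn> // SadFn: uint32_t(int dx, int dy)
Offset dmvrIntegerSearch(SadFn sad) {
    Offset best{0, 0};
    uint32_t bestCost = UINT32_MAX;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) { // 5 x 5 = 25 search points
            const uint32_t c = sad(dx, dy);
            if (c < bestCost) { bestCost = c; best = {dx, dy}; }
        }
    }
    return best;
}
```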
The bi-directional optical flow technique (i.e., BDOF) is adopted in VVC to correct bi-directionally predicted pixel values. BDOF is applied only to the luminance component. The BDOF mode is based on the optical flow concept, which assumes that object motion is smooth; BDOF is used to correct the bi-prediction signal of each 4×4 sub-block of the current block. For each 4×4 sub-block, a motion correction value is calculated by minimizing the difference between the L0 and L1 prediction pixels and is then used to adjust the bi-prediction sample values in that 4×4 sub-block.
Meanwhile, the VVC standard also defines that DMVR and BDOF apply only to bi-directional prediction in which the forward and backward reference frames are equidistant from the current frame. Furthermore, the DMVR technique does not apply to the CIIP or MMVD Merge modes, while BDOF does not apply to the CIIP Merge mode.
Even if only the conventional Merge mode is considered, the search process in VVC already carries a high computational load, which further aggravates the high IO-bandwidth problem. The CTU size in VVC grows up to 128×128, and with the search window SW (Search Window) unchanged, the work required of hardware integer motion estimation (IME, Integer Motion Estimation) increases by a factor of 1.77; if the search window is enlarged as well, the overall computation becomes very large. Second, the addition of tools such as DMVR, BDOF, and CIIP further increases the computational load and bandwidth requirements of bi-directional searches and poses new challenges for the design of inter-frame hardware architectures. Based on the above, the technical solution of the embodiments of the present application proposes a new video coding scheme that can reduce the computational load and the hardware circuit overhead, helping to improve the codec performance of the system.
The implementation details of the technical solutions of the embodiments of the present application are described in detail below:
fig. 8 shows a flow chart of a video encoding method according to an embodiment of the present application, which may be performed by an electronic device, which may be a terminal device or a server. Referring to fig. 8, the video encoding method at least includes steps S810 to S840, and is described in detail as follows:
In step S810, interpolation processing is performed on the first motion vector in the candidate motion vector list of the current coding block, so as to obtain a modified motion vector corresponding to the first motion vector.
In some alternative embodiments, the candidate motion vector list of the current coding block is used to determine the motion vector of the current coding block. Optionally, the candidate motion vector list may be the one determined by the current coding block in the Merge mode; that is, it may include spatial and temporal candidate motion vectors, and may also include HMVP, pairwise-average, and zero-value MVP candidates, among others.
It should be noted that the number of candidate motion vectors included in the candidate motion vector list of the current coding block may be determined according to the coding standard actually adopted; for example, in the VVC standard, the candidate motion vector list may include at most 6 candidate motion vectors.
In some alternative embodiments, the first motion vector in the candidate motion vector list may be any one or more candidate motion vectors. Alternatively, a motion vector in the candidate motion vector list, which cannot employ the bi-directional coding mode, may be used as the first motion vector. When the interpolation processing is performed on the first motion vector, 8-tap interpolation processing may be performed.
In step S820, the interpolation processing result is optimized by the CIIP mode, so as to obtain a corrected motion vector corresponding to the CIIP mode.
In the embodiment of the application, the interpolation processing result of the first motion vector can be optimized through the CIIP mode to obtain the corrected motion vector corresponding to the CIIP mode, so that the CIIP mode can multiplex the interpolation processing circuit, the calculation amount of the interpolation processing can be reduced, and the expenditure of a hardware circuit can be reduced.
In step S830, a motion vector set corresponding to the current coding block is generated according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode.
In some alternative embodiments, the modified motion vector corresponding to the first motion vector and the modified motion vector corresponding to the CIIP mode may be used as a set to obtain a set of motion vectors corresponding to the current coding block.
In some alternative embodiments, the second motion vector in the candidate motion vector list may be further optimized by the bi-directional coding mode, so as to obtain a modified motion vector corresponding to the second motion vector, and then the modified motion vector corresponding to the second motion vector is added to the candidate motion vector list. Wherein the second motion vector may be any one or more candidate motion vectors in the candidate motion vector list. Alternatively, a motion vector in the candidate motion vector list, which can employ the bi-directional coding mode, may be used as the second motion vector. In other words, if the candidate motion vector in the candidate motion vector list can employ the bi-directional coding mode (e.g., DMVR mode, BDOF mode, etc.), the optimization processing in the above-described embodiment is performed as the second motion vector by the bi-directional coding mode, and if the candidate motion vector in the candidate motion vector list cannot employ the bi-directional coding mode, the optimization processing in the above-described embodiment is performed as the first motion vector.
In some alternative embodiments, when the second motion vector in the candidate motion vector list is optimized through the bi-directional coding mode, the second motion vector may be optimized through the DMVR mode, so as to obtain a modified motion vector corresponding to the second motion vector. Or the second motion vector can be optimized through BDOF mode, so as to obtain a corrected motion vector corresponding to the second motion vector. Or the second motion vector can be optimized through a DMVR mode and a BDOF mode, so that a corrected motion vector corresponding to the second motion vector is obtained.
In some alternative embodiments, when the second motion vector is optimized through the DMVR mode and the BDOF mode, the second motion vector may be optimized through the DMVR mode to obtain a corrected motion vector corresponding to the DMVR mode, then the corrected motion vector corresponding to the DMVR mode is interpolated, and the interpolation result is optimized through the BDOF mode to obtain a corrected motion vector corresponding to the second motion vector. Alternatively, interpolation processing may be performed on the corrected motion vector corresponding to the DMVR mode.
With continued reference to fig. 8, in step S840, a motion vector of the current encoded block is determined from the set of motion vectors, and encoding processing is performed based on the determined motion vector.
In some alternative embodiments, determining the motion vector of the current coding block from the motion vector set may specifically include: selecting, from the motion vector set, a target motion vector for performing the MMVD mode; optimizing the selected target motion vector through the MMVD mode to obtain a corrected motion vector corresponding to the MMVD mode; and selecting the motion vector of the current coding block from among the motion vector set and the corrected motion vector corresponding to the MMVD mode.
In some alternative embodiments, when the target motion vector for performing MMVD is selected from the motion vector set, the SATD cost corresponding to each motion vector in the motion vector set may be calculated first, and the motion vector with the minimum SATD cost is then selected from the motion vector set as the target motion vector.
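For reference, SATD is conventionally computed by applying a Hadamard transform to the residual between the source block and the candidate prediction and summing the absolute transform coefficients. The 4x4 formulation below is one common variant (some encoders additionally halve or otherwise normalize the sum); it is a sketch of the general technique, not this application's exact cost module.

```cpp
#include <cstdint>
#include <cstdlib>

// SATD of a 4x4 residual via a Hadamard transform: butterfly the rows,
// butterfly the columns, then sum the absolute coefficients.
int satd4x4(const int16_t org[4][4], const int16_t pred[4][4]) {
    int d[4][4], m[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            d[i][j] = org[i][j] - pred[i][j];

    for (int i = 0; i < 4; ++i) {  // horizontal pass
        int s01 = d[i][0] + d[i][1], s23 = d[i][2] + d[i][3];
        int d01 = d[i][0] - d[i][1], d23 = d[i][2] - d[i][3];
        m[i][0] = s01 + s23; m[i][1] = s01 - s23;
        m[i][2] = d01 + d23; m[i][3] = d01 - d23;
    }
    int sum = 0;
    for (int j = 0; j < 4; ++j) {  // vertical pass + absolute sum
        int s01 = m[0][j] + m[1][j], s23 = m[2][j] + m[3][j];
        int d01 = m[0][j] - m[1][j], d23 = m[2][j] - m[3][j];
        sum += std::abs(s01 + s23) + std::abs(s01 - s23)
             + std::abs(d01 + d23) + std::abs(d01 - d23);
    }
    return sum;
}
```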
In some alternative embodiments, the optimization of the selected target motion vector through the MMVD mode may proceed as follows: with the target motion vector as the search starting point, a search is performed at the 1/4-pixel-precision step size to determine the optimal search direction; the search then continues in the optimal search direction at the other pixel-precision step sizes of the MMVD mode, and the corrected motion vector corresponding to the MMVD mode is determined from the search result. Specifically, with the target motion vector as the starting point, a search is performed in each search direction at the 1/4-pixel-precision step size to obtain a search result per direction, and the search direction whose result has the minimum SATD cost is selected as the optimal search direction.
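A compact sketch of this two-stage search follows. The four search directions, the step table in quarter-pel units, and the satdCost callback are illustrative assumptions.

```cpp
#include <functional>
#include <limits>

struct Mv { int x = 0, y = 0; };  // components in quarter-pel units

// Stage 1: probe every direction at the 1/4-pel step and keep the best
// direction by SATD. Stage 2: walk the remaining step sizes only along
// that direction and keep the overall best position.
Mv mmvdSearch(const Mv& start, const std::function<int(const Mv&)>& satdCost) {
    static const int dir[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    static const int steps[] = {1, 2, 4, 8, 16, 32, 64, 128};  // 1/4-pel .. 32-pel
    static const int numSteps = 8;

    int bestDir = 0;
    int bestCost = std::numeric_limits<int>::max();
    Mv best = start;
    for (int k = 0; k < 4; ++k) {  // stage 1: direction decision at 1/4 pel
        Mv cand{start.x + dir[k][0] * steps[0], start.y + dir[k][1] * steps[0]};
        int c = satdCost(cand);
        if (c < bestCost) { bestCost = c; bestDir = k; best = cand; }
    }
    for (int s = 1; s < numSteps; ++s) {  // stage 2: refine along best direction
        Mv cand{start.x + dir[bestDir][0] * steps[s],
                start.y + dir[bestDir][1] * steps[s]};
        int c = satdCost(cand);
        if (c < bestCost) { bestCost = c; best = cand; }
    }
    return best;
}
```

Restricting the larger steps to the single best direction sharply reduces the number of cost evaluations compared with scanning every direction at every step.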
In some alternative embodiments, the motion vector of the current coding block may be selected from the motion vector set and the corrected motion vector corresponding to the MMVD mode as follows: the RDO cost is calculated for the corrected motion vector corresponding to the MMVD mode and for each motion vector in the motion vector set, and the motion vector with the minimum RDO cost among them is selected as the motion vector of the current coding block.
In some alternative embodiments, the motion vector of the current coding block may be determined from the motion vector set by calculating the RDO cost corresponding to each motion vector in the motion vector set and selecting the motion vector with the minimum RDO cost as the motion vector of the current coding block. That is, in this embodiment the MMVD optimization may be omitted, and the motion vector of the current coding block is selected directly according to the RDO costs of the motion vectors in the set.
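The selection itself is the standard rate-distortion comparison J = D + λ·R. The sketch below assumes a distortion value and a rate estimate are available per candidate; the Candidate fields and λ are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

struct Candidate {
    double distortion;  // e.g., SSE between source and reconstruction
    double bits;        // estimated coding rate of the candidate
};

// Return the index of the candidate minimizing J = D + lambda * R.
// Assumes cands is non-empty.
std::size_t selectByRdo(const std::vector<Candidate>& cands, double lambda) {
    std::size_t best = 0;
    double bestCost = cands[0].distortion + lambda * cands[0].bits;
    for (std::size_t i = 1; i < cands.size(); ++i) {
        double j = cands[i].distortion + lambda * cands[i].bits;
        if (j < bestCost) { bestCost = j; best = i; }
    }
    return best;
}
```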
In summary, the technical solution of the embodiments of the present application mainly proposes a fast mode-selection scheme that merges the Merge-related modes during encoding, so as to reduce the computation of the processing and the overhead of the hardware circuit as far as possible, thereby improving the coding and decoding performance of the system.
Specifically, as shown in fig. 9, when the motion vectors in the Merge list of the current block are optimized, the DMVR & BDOF Merge modes and the CIIP mode may be made mutually exclusive; that is, if DMVR or BDOF is available for a given Merge index, the CIIP mode is not checked for it. In the example shown in fig. 9, it is assumed that merge_idx0, merge_idx1, and merge_idx2 in the Merge list use the DMVR and BDOF modes, while merge_idx3, merge_idx4, and merge_idx5 use the CIIP mode.
With continued reference to fig. 9, the motion vectors in the Merge list undergo DMVR search processing, then 8-tap interpolation processing, and then BDOF search processing, so that at most six optimized motion vectors can be obtained.
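For context, the 8-tap stage is a separable fractional-sample filter. The sketch below applies an HEVC-style half-pel 8-tap kernel to one padded row, as an illustration of the kind of datapath the Merge and CIIP branches can share; the kernel values, the padding convention, and the 8-bit clipping are assumptions rather than the exact filter of this application.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One row of half-pel luma interpolation with an 8-tap kernel.
// The input row must carry 3 samples of left padding and 4 of right
// padding, i.e., row.size() >= width + 7.
std::vector<uint8_t> halfPelRow(const std::vector<uint8_t>& row, int width) {
    static const int k[8] = {-1, 4, -11, 40, 40, -11, 4, -1};  // sums to 64
    std::vector<uint8_t> out(width);
    for (int x = 0; x < width; ++x) {
        int acc = 0;
        for (int t = 0; t < 8; ++t)
            acc += k[t] * row[x + t];  // taps at positions x-3 .. x+4
        int v = (acc + 32) >> 6;       // round and normalize by 64
        out[x] = static_cast<uint8_t>(std::clamp(v, 0, 255));
    }
    return out;
}
```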
In one embodiment of the present application, the ordinary Merge mode (i.e., when DMVR or BDOF is turned off) and the CIIP mode can also compensate the motion vectors in series. Since the timing path of DMVR & BDOF is longer than that of the CIIP mode, the serial scheme does not raise the timing upper limit of the system; and because the CIIP mode multiplexes the 8-tap interpolation circuit with the ordinary Merge mode, the hardware circuit overhead is reduced.
In one embodiment of the present application, the MMVD mode may also be cascaded with the DMVR & BDOF / ordinary Merge & CIIP modes. Specifically, the best candidates Merge Idx0 and Merge Idx1 (i.e., the first two candidate motion vectors in the rough-selection result) may first be determined from the rough-selection results of DMVR & BDOF / ordinary Merge & CIIP according to their SATD cost values. These best candidates then serve as the reference Merge modes of MMVD, which halves the number of MMVD candidate modes to be processed and reduces the computation. Meanwhile, when the MMVD processing is performed, the MMVD search at the 1/4-pixel-precision step size is run first, the optimal MMVD search direction is determined according to the SATD costs, the MMVD modes at the other step sizes are then searched in the optimal direction, and finally the optimal motion vector is selected according to RDO as the motion vector of the current block.
It can be seen that, in the embodiment shown in fig. 9: in the first aspect, the motion vectors in the Merge list may undergo DMVR search processing, then 8-tap interpolation processing, and then BDOF search processing, after which the optimal motion vector is selected according to RDO; in the second aspect, the ordinary Merge mode and the CIIP mode may compensate the motion vectors in series, after which the optimal motion vector is selected according to RDO; the third aspect may combine the first and second aspects and then select the optimal motion vector according to RDO; the fourth aspect may further apply the MMVD optimization on top of the third aspect and then select the optimal motion vector according to RDO; and the fifth aspect may combine the third and fourth aspects to obtain the candidate motion vectors and then select the optimal motion vector according to RDO.
According to the above technical solution, arranging the various coding modes in this way reduces the computation of the processing and the overhead of the hardware circuit, thereby improving the coding and decoding performance of the system.
The following describes an embodiment of an apparatus of the present application, which may be used to perform the video encoding method in the above embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the video encoding method described in the present application.
Fig. 10 shows a block diagram of a video encoding apparatus according to one embodiment of the present application.
Referring to fig. 10, a video encoding apparatus 1000 according to an embodiment of the present application includes: interpolation unit 1002, processing unit 1004, generation unit 1006, and determination unit 1008.
The interpolation unit 1002 is configured to perform interpolation processing on a first motion vector in a candidate motion vector list of a current coding block to obtain a corrected motion vector corresponding to the first motion vector; the processing unit 1004 is configured to optimize the result of the interpolation processing of the first motion vector through the combined inter- and intra-frame prediction (CIIP) mode to obtain a corrected motion vector corresponding to the CIIP mode; the generating unit 1006 is configured to generate a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode; and the determining unit 1008 is configured to determine a motion vector of the current coding block from the motion vector set and perform encoding processing based on the determined motion vector.
In some embodiments of the present application, based on the foregoing solution, the processing unit 1004 is further configured to: optimizing a second motion vector in the candidate motion vector list through a bidirectional coding mode to obtain a corrected motion vector corresponding to the second motion vector; and adding the corrected motion vector corresponding to the second motion vector to the candidate motion vector list.
In some embodiments of the present application, based on the foregoing solution, the process of performing, by the processing unit 1004, optimization processing on a second motion vector in the candidate motion vector list through a bi-directional coding mode to obtain a modified motion vector corresponding to the second motion vector includes:
optimizing the second motion vector through a decoder-side motion vector refinement (DMVR) mode to obtain a corrected motion vector corresponding to the second motion vector; or
optimizing the second motion vector through a bi-directional optical flow (BDOF) mode to obtain a corrected motion vector corresponding to the second motion vector; or
optimizing the second motion vector through the DMVR mode and the BDOF mode to obtain a corrected motion vector corresponding to the second motion vector.
In some embodiments of the present application, based on the foregoing scheme, performing optimization processing on the second motion vector by using the DMVR mode and the BDOF mode includes: optimizing the second motion vector through the DMVR mode to obtain a corrected motion vector corresponding to the DMVR mode; and carrying out interpolation processing on the corrected motion vector corresponding to the DMVR mode, and carrying out optimization processing on an interpolation result through the BDOF mode to obtain the corrected motion vector corresponding to the second motion vector.
In some embodiments of the present application, based on the foregoing solution, the processing unit 1004 is further configured to: determining a motion vector capable of adopting a bidirectional coding mode from the candidate motion vector list; and taking the motion vector which can adopt the bidirectional coding mode in the candidate motion vector list as the second motion vector, and taking the motion vector which cannot adopt the bidirectional coding mode in the candidate motion vector list as the first motion vector.
In some embodiments of the present application, based on the foregoing scheme, the determining unit 1008 is configured to: select, from the motion vector set, a target motion vector for performing a merge mode with motion vector difference (MMVD); optimize the selected target motion vector through the MMVD mode to obtain a corrected motion vector corresponding to the MMVD mode; and select the motion vector of the current coding block from among the motion vector set and the corrected motion vector corresponding to the MMVD mode.
In some embodiments of the present application, based on the foregoing solution, the process in which the determining unit 1008 selects, from the motion vector set, the target motion vector for performing the merge mode with motion vector difference (MMVD) includes: calculating a sum of absolute transformed differences (SATD) cost corresponding to each motion vector in the motion vector set; and selecting, according to the SATD cost corresponding to each motion vector, the motion vector with the minimum SATD cost from the motion vector set as the target motion vector.
In some embodiments of the present application, based on the foregoing solution, the process in which the determining unit 1008 optimizes the selected target motion vector through the MMVD mode to obtain the corrected motion vector corresponding to the MMVD mode includes: taking the target motion vector as a search starting point and searching at a 1/4-pixel-precision step size to determine an optimal search direction; and continuing the search in the optimal search direction at the other pixel-precision step sizes of the MMVD mode, and determining the corrected motion vector corresponding to the MMVD mode based on a search result.
In some embodiments of the present application, based on the foregoing solution, searching with the target motion vector as the search starting point at the 1/4-pixel-precision step size to determine the optimal search direction includes: searching in each search direction at the 1/4-pixel-precision step size, with the target motion vector as the search starting point, to obtain a search result in each search direction; and selecting, according to the SATD cost corresponding to the search result in each search direction, the search direction with the minimum SATD cost as the optimal search direction.
In some embodiments of the present application, based on the foregoing solution, the process in which the determining unit 1008 selects the motion vector of the current coding block from the motion vector set and the corrected motion vector corresponding to the MMVD mode includes: calculating a rate-distortion optimization (RDO) cost for the corrected motion vector corresponding to the MMVD mode and for each motion vector in the motion vector set; and selecting, as the motion vector of the current coding block, the motion vector with the minimum RDO cost from among the corrected motion vector corresponding to the MMVD mode and the motion vector set.
In some embodiments of the present application, based on the foregoing scheme, the determining unit 1008 is configured to: calculate a rate-distortion optimization (RDO) cost corresponding to each motion vector in the motion vector set; and select the motion vector with the minimum RDO cost from the motion vector set as the motion vector of the current coding block.
In some embodiments of the present application, based on the foregoing scheme, the candidate motion vector list of the current coding block is a candidate motion vector list determined by the current coding block in the Merge mode.
Fig. 11 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
It should be noted that, the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 11, the computer system 1100 may include a central processing unit (CPU) 1101, which may perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores various programs and data required for system operation. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
The following components may be connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN (local area network) card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program being used to perform the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109 and/or installed from the removable medium 1111. When the computer program is executed by the central processing unit (CPU) 1101, the various functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a computer program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless or wired media, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. Each block in a flowchart or block diagram may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer programs.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more computer programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of the embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which comprises several instructions to cause an electronic device to perform the method according to the embodiments of the present application. For example, the electronic device may perform the video encoding method shown in fig. 8.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (16)

1. A video encoding method, comprising:
interpolation processing is carried out on a first motion vector in a candidate motion vector list of a current coding block, and a corrected motion vector corresponding to the first motion vector is obtained;
optimizing the result of the interpolation processing of the first motion vector through a combined inter- and intra-frame prediction (CIIP) mode to obtain a corrected motion vector corresponding to the CIIP mode;
generating a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode;
and determining the motion vector of the current coding block according to the motion vector set, and performing encoding processing based on the determined motion vector.
2. The video coding method of claim 1, wherein prior to determining the motion vector of the current coding block from the set of motion vectors, the method further comprises:
optimizing a second motion vector in the candidate motion vector list through a bidirectional coding mode to obtain a corrected motion vector corresponding to the second motion vector;
and adding the corrected motion vector corresponding to the second motion vector to the candidate motion vector list.
3. The video coding method according to claim 2, wherein optimizing the second motion vector in the candidate motion vector list by the bi-directional coding mode to obtain a modified motion vector corresponding to the second motion vector, comprises:
optimizing the second motion vector through a decoder-side motion vector refinement (DMVR) mode to obtain a corrected motion vector corresponding to the second motion vector; or
optimizing the second motion vector through a bi-directional optical flow (BDOF) mode to obtain a corrected motion vector corresponding to the second motion vector; or
optimizing the second motion vector through the DMVR mode and the BDOF mode to obtain a corrected motion vector corresponding to the second motion vector.
4. The video coding method according to claim 3, wherein optimizing the second motion vector by the DMVR mode and the BDOF mode includes:
optimizing the second motion vector through the DMVR mode to obtain a corrected motion vector corresponding to the DMVR mode;
and carrying out interpolation processing on the corrected motion vector corresponding to the DMVR mode, and carrying out optimization processing on an interpolation result through the BDOF mode to obtain the corrected motion vector corresponding to the second motion vector.
5. The video coding method of claim 2, wherein the video coding method further comprises:
determining a motion vector capable of adopting a bidirectional coding mode from the candidate motion vector list;
and taking the motion vector which can adopt the bidirectional coding mode in the candidate motion vector list as the second motion vector, and taking the motion vector which cannot adopt the bidirectional coding mode in the candidate motion vector list as the first motion vector.
6. The video coding method according to any one of claims 1 to 5, wherein determining a motion vector of the current coding block from the set of motion vectors comprises:
selecting, from the motion vector set, a target motion vector for performing a merge mode with motion vector difference (MMVD);
optimizing the selected target motion vector through an MMVD mode to obtain a corrected motion vector corresponding to the MMVD mode;
and selecting the motion vector of the current coding block from the motion vector set and the corrected motion vector corresponding to the MMVD mode.
7. The video coding method of claim 6, wherein selecting, from the motion vector set, the target motion vector for performing the merge mode with motion vector difference (MMVD) comprises:
calculating a sum of absolute transformed differences (SATD) cost corresponding to each motion vector in the motion vector set;
and selecting a motion vector with the minimum SATD cost from the motion vector set according to the SATD cost corresponding to each motion vector to serve as the target motion vector.
8. The video coding method according to claim 6, wherein optimizing the selected target motion vector by an MMVD mode to obtain a corrected motion vector corresponding to the MMVD mode, comprises:
taking the target motion vector as a search starting point and searching at a 1/4-pixel-precision step size to determine an optimal search direction;
and continuing the search in the optimal search direction at the other pixel-precision step sizes of the MMVD mode, and determining the corrected motion vector corresponding to the MMVD mode based on a search result.
9. The video coding method according to claim 8, wherein searching at the 1/4-pixel-precision step size with the target motion vector as the search starting point to determine the optimal search direction comprises:
searching in each search direction at the 1/4-pixel-precision step size, with the target motion vector as the search starting point, to obtain a search result in each search direction;
and selecting, according to the SATD cost corresponding to the search result in each search direction, the search direction with the minimum SATD cost as the optimal search direction.
10. The video coding method of claim 6, wherein selecting the motion vector of the current coding block from the motion vector set and the corrected motion vector corresponding to the MMVD mode comprises:
calculating a rate-distortion optimization (RDO) cost for the corrected motion vector corresponding to the MMVD mode and for each motion vector in the motion vector set;
and selecting, as the motion vector of the current coding block, the motion vector with the minimum RDO cost from among the corrected motion vector corresponding to the MMVD mode and the motion vector set.
11. The video coding method according to any one of claims 1 to 5, wherein determining a motion vector of the current coding block from the set of motion vectors comprises:
calculating a rate-distortion optimization (RDO) cost corresponding to each motion vector in the motion vector set;
and selecting the motion vector with the minimum RDO cost from the motion vector set as the motion vector of the current coding block.
12. The video coding method according to any one of claims 1 to 5, wherein the candidate motion vector list of the current coding block is a candidate motion vector list determined by the current coding block in a Merge mode.
13. A video encoding apparatus, comprising:
an interpolation unit configured to perform interpolation processing on a first motion vector in a candidate motion vector list of a current coding block to obtain a corrected motion vector corresponding to the first motion vector;
a processing unit configured to optimize the result of the interpolation processing of the first motion vector through a combined inter- and intra-frame prediction (CIIP) mode to obtain a corrected motion vector corresponding to the CIIP mode;
a generating unit configured to generate a motion vector set corresponding to the current coding block according to the corrected motion vector corresponding to the first motion vector and the corrected motion vector corresponding to the CIIP mode;
and a determining unit configured to determine a motion vector of the current coding block from the motion vector set, and perform coding processing based on the determined motion vector.
14. A computer readable medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the video encoding method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
a memory for storing one or more computer programs that, when executed by the one or more processors, cause the electronic device to implement the video encoding method of any of claims 1-12.
16. A computer program product, characterized in that the computer program product comprises a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the video encoding method according to any one of claims 1 to 12.
CN202311600332.9A 2023-11-24 2023-11-24 Video encoding method, video encoding device, computer readable medium and electronic equipment Pending CN117560502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311600332.9A CN117560502A (en) 2023-11-24 2023-11-24 Video encoding method, video encoding device, computer readable medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN117560502A 2024-02-13

Family

ID=89810758



Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination