
WO2020006969A1 - Motion vector prediction method and related apparatus - Google Patents


Info

Publication number: WO2020006969A1
Authority: WO, WIPO (PCT)
Prior art keywords: current block, motion vector, block, control points, pixel point
Application number: PCT/CN2018/116984
Other languages: English (en), French (fr)
Inventors: 陈焕浜, 杨海涛, 陈建乐, 傅佳莉
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to JP2020573324A (patent JP7368396B2, ja)
Priority to KR1020237040062A (patent KR20230162152A, ko)
Application filed by 华为技术有限公司
Priority to KR1020217002027A (patent KR102606146B1, ko)
Priority to EP18925643.1A (patent EP3809704A4, en)
Priority to SG11202013202YA (patent SG11202013202YA, en)
Priority to CN202211226203.3A (patent CN115733974A, zh)
Priority to BR112020026992-1A (patent BR112020026992A2, pt)
Priority to CN202211258218.8A (patent CN115695791A, zh)
Priority to MX2021000171A (patent MX2021000171A, es)
Priority to CN201880002952.3A (patent CN110876282B, zh)
Publication of WO2020006969A1 (zh)
Priority to US17/140,041 (patent US11206408B2, en)
Priority to US17/525,944 (patent US11683496B2, en)
Priority to US18/318,731 (patent US12108048B2, en)
Priority to US18/318,730 (patent US12120310B2, en)
Priority to JP2023176813A (patent JP2023184560A, ja)


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
              • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
                • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
                • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
              • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
                • H04N19/136 Incoming video signal characteristics or properties
                  • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
                    • H04N19/139 Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
              • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
                  • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
            • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
              • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
                • H04N19/51 Motion estimation or motion compensation
                  • H04N19/513 Processing of motion vectors
                    • H04N19/517 Processing of motion vectors by encoding
                      • H04N19/52 Processing of motion vectors by encoding by predictive encoding
                  • H04N19/537 Motion estimation other than block-based
                    • H04N19/54 Motion estimation other than block-based using feature points or meshes
            • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
              • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
            • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • Embodiments of the present invention relate to the technical field of video coding and decoding, and in particular, to a method and a device for predicting a motion vector of a video image, and a corresponding encoder and decoder.
  • Video coding (video encoding and decoding) is widely used in digital video applications, such as broadcast digital television, video transmission over the Internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
  • AVC: Advanced Video Coding
  • ITU-T H.265: High Efficiency Video Coding
  • 3D: three-dimensional
  • HEVC: High Efficiency Video Coding
  • Embodiments of the present invention provide a motion vector prediction method and related devices, so as to improve coding efficiency and meet user requirements.
  • An embodiment of the present invention provides a motion vector prediction method, described from the perspective of either the decoding end or the encoding end.
  • The method can be used to predict an image block to be processed, which is obtained by partitioning a video image. At the encoding end, the image block to be processed is the current affine coding block, and a decoded image block spatially adjacent to it is a neighboring affine coding block.
  • At the decoding end, the image block to be processed is the current affine decoding block, and a decoded image block spatially adjacent to it is a neighboring affine decoding block.
  • For ease of description, the image block to be processed is referred to as the current block, and a reference block spatially adjacent to it is referred to as a neighboring block.
  • The method includes: parsing a bitstream to obtain an index value of a candidate motion vector list; constructing the candidate motion vector list, where the list includes candidate motion vectors of K control points of the current block, the candidate motion vectors of the K control points are obtained according to a 2×N-parameter affine transformation model adopted by a neighboring block of the current block, and the 2×N-parameter affine transformation model is obtained from the motion vectors of N control points of the neighboring block, where N and K are integers from 2 to 4 inclusive and N is not equal to K; determining target candidate motion vectors of the K control points from the candidate motion vector list according to the index value; and obtaining the predicted motion vector of each sub-block position in the current block according to the target candidate motion vectors of the K control points.
  • In the stage of constructing the candidate list for the current block (for example, constructing the candidate motion vector list in the affine-model-based AMVP mode or Merge mode), the decoder uses the affine transformation model of the neighboring block to construct an affine transformation model for the current block itself, and the two models may differ. Because the current block's own model better matches its actual motion and requirements, this solution can improve the coding efficiency and prediction accuracy for the current block and meet user needs.
  • The availability of neighboring blocks at one or more preset spatial positions of the current block may be checked in a preset order, and the available neighboring blocks are then obtained in that order.
  • The preset neighboring blocks may include the neighboring image blocks located directly above, directly to the left, above-right, below-left, or above-left of the image block to be processed.
  • For example, availability is checked in the following order: the left neighbor, the above neighbor, the above-right neighbor, the below-left neighbor, and the above-left neighbor.
  • When K = 3, the candidate motion vectors of the three control points of the current block include the motion vector (vx0, vy0) at the position (x0, y0) of the upper-left pixel of the current block (also called the upper-left vertex; likewise below), the motion vector (vx1, vy1) at the upper-right vertex (x1, y1), and the motion vector (vx2, vy2) at the lower-left vertex (x2, y2).
  • If these candidate motion vectors are obtained according to the 4-parameter affine transformation model adopted by a neighboring block of the current block, the motion vectors (vx0, vy0), (vx1, vy1), and (vx2, vy2) of the upper-left, upper-right, and lower-left vertices of the current block are first calculated according to the following formulas:
    $$vx_i = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_i - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_i - y_4)$$
    $$vy_i = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_i - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_i - y_4), \quad i = 0, 1, 2$$
    where:
  • vx0 is the horizontal component of the motion vector at the upper-left pixel position of the current block
  • vy0 is the vertical component of the motion vector at the upper-left pixel position of the current block
  • vx1 is the horizontal component of the motion vector at the upper-right pixel position of the current block
  • vy1 is the vertical component of the motion vector at the upper-right pixel position of the current block
  • vx2 is the horizontal component of the motion vector at the lower-left pixel position of the current block
  • vy2 is the vertical component of the motion vector at the lower-left pixel position of the current block
  • vx4 is the horizontal component of the motion vector at the upper-left pixel position of the neighboring block
  • vy4 is the vertical component of the motion vector at the upper-left pixel position of the neighboring block
  • vx5 is the horizontal component of the motion vector at the upper-right pixel position of the neighboring block
  • vy5 is the vertical component of the motion vector at the upper-right pixel position of the neighboring block
  • x0 is the abscissa of the upper-left pixel position of the current block
  • y0 is the ordinate of the upper-left pixel position of the current block
  • x1 is the abscissa of the upper-right pixel position of the current block
  • y1 is the ordinate of the upper-right pixel position of the current block
  • x2 is the abscissa of the lower-left pixel position of the current block
  • y2 is the ordinate of the lower-left pixel position of the current block
  • x4 is the abscissa of the upper-left pixel position of the neighboring block
  • y4 is the ordinate of the upper-left pixel position of the neighboring block
  • x5 is the abscissa of the upper-right pixel position of the neighboring block
  • When K = 2, the candidate motion vectors of the two control points of the current block include the motion vector (vx0, vy0) at the upper-left vertex (x0, y0) and the motion vector (vx1, vy1) at the upper-right vertex (x1, y1) of the current block.
  • If these candidate motion vectors are obtained according to the 6-parameter affine transformation model adopted by a neighboring block of the current block, they are calculated according to the following formulas, in which (vx6, vy6) denotes the motion vector at the lower-left vertex (x6, y6) of the neighboring block:
    $$vx_i = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_i - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_i - y_4)$$
    $$vy_i = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_i - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_i - y_4), \quad i = 0, 1$$
    where:
  • vx0 is the horizontal component of the motion vector at the upper-left pixel position of the current block
  • vy0 is the vertical component of the motion vector at the upper-left pixel position of the current block
  • vx1 is the horizontal component of the motion vector at the upper-right pixel position of the current block
  • vy1 is the vertical component of the motion vector at the upper-right pixel position of the current block
  • vx4 is the horizontal component of the motion vector at the upper-left pixel position of the neighboring block
  • vy4 is the vertical component of the motion vector at the upper-left pixel position of the neighboring block
  • vx5 is the horizontal component of the motion vector at the upper-right pixel position of the neighboring block
  • vy5 is the vertical component of the motion vector at the upper-right pixel position of the neighboring block
  • With the embodiments of the present invention, during the parsing stage of the current block (for example, while constructing the candidate motion vector list), the affine transformation model of the neighboring block is used to construct an affine transformation model for the current block itself, and the two models may differ. Because the current block's own model better matches its actual motion and requirements, this solution can improve the coding efficiency and prediction accuracy for the current block and meet user needs.
  • Obtaining the predicted motion vector of each sub-block in the current block may include: obtaining a 2×K-parameter affine transformation model of the current block according to the target candidate motion vectors of the K control points, and obtaining the predicted motion vector of each sub-block in the current block according to that 2×K-parameter affine transformation model.
  • The motion vector obtained at the preset pixel coordinates of a sub-block is used as the motion vector (vx(i,j), vy(i,j)) of all pixels in that sub-block.
  • If the current affine decoding block uses a 4-parameter affine motion model, the coordinates (x(i,j), y(i,j)) of the preset pixel in each sub-block are substituted into the 4-parameter affine motion model formula to obtain the motion vector at that pixel, and this motion vector is used as the motion vector (vx(i,j), vy(i,j)) of all pixels in the sub-block.
  • Obtaining the predicted motion vector of each sub-block in the current block may also include: obtaining a 6-parameter affine transformation model of the current block according to the target candidate motion vectors of its K control points, and obtaining the predicted motion vector of each sub-block in the current block according to that 6-parameter affine transformation model.
  • In other words, the reconstruction stage uniformly uses the 6-parameter affine transformation model to obtain the motion vector information of each sub-block of the current block, from which each sub-block is reconstructed. For example, if a 4-parameter affine transformation model or an 8-parameter bilinear model is used in the parsing stage, a 6-parameter affine transformation model of the current block is further constructed; if a 6-parameter affine transformation model is used in the parsing stage, it continues to be used in the reconstruction stage.
  • For example, suppose the current block uses a 4-parameter affine transformation model in the parsing stage, while the neighboring block may use a 4-parameter affine transformation model or an affine model with a different number of parameters.
  • After the motion vectors of the 2 control points of the current block are obtained, for example the motion vector (vx0, vy0) of the upper-left control point (x0, y0) and the motion vector (vx1, vy1) of the upper-right control point (x1, y1) of the current block, a 6-parameter affine transformation model needs to be constructed from these 2 control points.
  • To do so, the motion vector of a third control point, for example the motion vector (vx2, vy2) of the lower-left vertex (x2, y2) of the current block, is first obtained with the following formula, where W = x1 − x0 and H = y2 − y0 are the width and height of the current block:
    $$vx_2 = -\frac{vy_1 - vy_0}{W}H + vx_0, \qquad vy_2 = \frac{vx_1 - vx_0}{W}H + vy_0$$
  • The motion vector (vx0, vy0) of the upper-left control point (x0, y0), the motion vector (vx1, vy1) of the upper-right control point (x1, y1), and the motion vector (vx2, vy2) of the lower-left vertex (x2, y2) are then used to obtain the 6-parameter affine model of the current block for the reconstruction stage.
  • The 6-parameter affine model is as follows, where (x, y) is a position relative to the upper-left vertex of the current block:
    $$vx = \frac{vx_1 - vx_0}{W}x + \frac{vx_2 - vx_0}{H}y + vx_0, \qquad vy = \frac{vy_1 - vy_0}{W}x + \frac{vy_2 - vy_0}{H}y + vy_0$$
  • The coordinates (x(i,j), y(i,j)) of the preset pixel (for example, the center point) of each sub-block (or each motion compensation unit) of the current block, taken relative to the upper-left vertex (or another reference point) of the current block, are substituted into the 6-parameter affine model formula to obtain the motion information of that pixel, and each sub-block (or motion compensation unit) is then reconstructed.
  • With the embodiments of the present invention, a unified 6-parameter affine transformation model can be used to predict the current block in its reconstruction stage. A motion model with more parameters describes the affine motion of the current block more accurately, but at a higher computational complexity.
  • The 6-parameter affine transformation model constructed in the reconstruction stage of this solution can describe affine transformations of an image block such as translation, scaling, and rotation, and strikes a good balance between model complexity and modeling capability. This solution can therefore improve the coding efficiency and prediction accuracy for the current block and meet user needs.
  • Obtaining the 2×K-parameter affine transformation model according to the target candidate motion vectors of the K control points includes: obtaining the motion vectors of the K control points from their target candidate motion vectors and the motion vector differences of the K control points, where the motion vector differences are obtained by parsing the bitstream; and obtaining the 2×K-parameter affine transformation model of the current block from the motion vectors of the K control points.
  • When the encoding end and the decoding end perform inter prediction in the AMVP mode based on the affine transformation model, the constructed list is the candidate motion vector list of the affine AMVP mode.
  • The first motion-model-based motion vector prediction method described herein may be used to obtain candidate motion vectors of the control points of the current block, which are added to the candidate motion vector list of the AMVP mode.
  • Alternatively, both the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may be used, and the candidate control-point motion vectors of the current block obtained by each method are added separately to the candidate motion vector list of the AMVP mode.
  • When the encoding end and the decoding end perform inter prediction in the Merge mode based on the affine transformation model, the constructed list is the candidate motion vector list of the affine Merge mode.
  • The first motion-model-based motion vector prediction method described herein may also be used to obtain candidate motion vectors of the control points of the current block, which are added to the candidate motion vector list of the Merge mode.
  • Alternatively, both the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may be used, and the candidate control-point motion vectors of the current block obtained by each method are added separately to the candidate motion vector list of the Merge mode.
  • Affine decoding blocks whose models have the same number of parameters as the current block's model may be used to obtain candidate motion vectors of the control points of the current block, which are added to the candidate motion vector list of the AMVP mode.
  • An affine decoding block whose model has a different number of parameters from the current block's model may likewise be used to obtain candidate motion vectors of the control points of the current block, which are added to the candidate motion vector list of the AMVP mode.
  • While the decoding end derives the candidate motion vectors of the control points of the current block, it may need to obtain flag information of the affine transformation model of the affine decoding block.
  • The flag is stored locally at the decoding end in advance and indicates the affine transformation model actually used by the affine decoding block for its own sub-block prediction.
  • When the decoding end determines from the flag of the affine decoding block that the number of model parameters of the affine transformation model actually used by that block differs from (or equals) that of the affine transformation model used by the current block, the decoder is triggered to use the affine decoding block's actual affine transformation model to derive the candidate motion vectors of the control points of the current block.
  • Alternatively, the flag of the affine transformation model of the affine decoding block may not be required.
  • In that case, the decoder determines the affine transformation model used by the current block, obtains a specific number of control points of the affine decoding block (the specific number may be the same as or different from the number of control points of the current block), forms an affine transformation model from these control points, and uses that model to derive the candidate motion vectors of the control points of the current block.
  • An embodiment of the present invention provides another motion vector prediction method.
  • The method includes: parsing a bitstream to obtain an index value of a candidate motion vector list; constructing the candidate motion vector list, where the list includes candidate motion vectors of K control points of the current block, these candidate motion vectors are obtained according to a 2×N-parameter affine transformation model adopted by a neighboring block of the current block, and the 2×N-parameter affine transformation model is obtained from the motion vectors of N control points of the neighboring block, where N and K are integers from 2 to 4 inclusive, the neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks; determining target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value; obtaining a 6-parameter affine transformation model of the current block according to the target candidate motion vectors of the K control points; and obtaining the predicted motion vector of each sub-block in the current block according to the 6-parameter affine transformation model of the current block.
  • This method likewise uses a unified 6-parameter affine transformation model to predict the current block in its reconstruction stage. A model with more parameters describes affine motion more accurately at higher computational cost; the 6-parameter model can describe translation, scaling, rotation, and similar affine transformations of an image block, balancing model complexity against modeling capability. This solution can therefore improve the coding efficiency and prediction accuracy for the current block and meet user needs.
  • The candidate motion vectors of the 2 control points of the current block may be obtained according to a 4-parameter affine transformation model adopted by a neighboring block of the current block.
  • Alternatively, the candidate motion vectors of the 2 control points of the current block may be obtained according to a 6-parameter affine transformation model adopted by a neighboring block of the current block.
  • Obtaining the 6-parameter affine transformation model of the current block according to the target candidate motion vectors of its K control points may include: obtaining a 4-parameter affine transformation model of the current block according to the target candidate motion vectors of its 2 control points; obtaining the motion vector of a third control point according to the 4-parameter affine transformation model; and obtaining the 6-parameter affine transformation model of the current block according to the target candidate motion vectors of the 2 control points and the motion vector of the third control point.
  • The candidate motion vectors of the 3 control points of the current block may also be obtained according to a 2×N-parameter affine transformation model adopted by a neighboring block of the current block.
  • an embodiment of the present invention provides a decoding device.
  • the device includes: a storage unit for storing video data in the form of a code stream; and an entropy decoding unit for parsing the code stream to obtain a list of candidate motion vectors.
  • An index value a prediction processing unit configured to construct the candidate motion vector list; wherein the candidate motion vector list includes candidate motion vectors of K control points of the current block; candidate motion vectors of K control points of the current block
  • the vector is obtained according to an affine transformation model of 2 by N parameters adopted by neighboring blocks of the current block, and the affine transformation model of 2 by N parameters is based on N control points of the neighboring blocks ,
  • N is an integer greater than or equal to 2 and less than or equal to 4
  • K is an integer greater than or equal to 2 and less than or equal to 4
  • N is not equal to K
  • the neighboring block is a decoded image block spatially adjacent to the current block
  • the current block includes a plurality of sub-blocks; the prediction processing unit is further configured to determine target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value,
  • and to obtain, according to the target candidate motion vectors of the K control points of the current block, the predicted motion vector of each sub-block in the current block.
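A minimal sketch of deriving a predicted motion vector for each sub-block from the control-point motion vectors. It assumes 4x4 sub-blocks sampled at their centers and handles only K = 2 (treated as a 4-parameter model) and K = 3 (treated as a 6-parameter model); the sub-block size, the center sampling, and all names are illustrative assumptions rather than details taken from the source.

```python
from typing import Dict, List, Tuple

MV = Tuple[float, float]


def affine_mv(cps: List[MV], width: int, height: int, x: float, y: float) -> MV:
    """MV at (x, y), measured from the block's upper-left corner.

    Two control points -> assumed 4-parameter model; three -> 6-parameter model.
    """
    (v0x, v0y), (v1x, v1y) = cps[0], cps[1]
    a = (v1x - v0x) / width
    b = (v1y - v0y) / width
    if len(cps) == 2:
        return (a * x - b * y + v0x, b * x + a * y + v0y)
    if len(cps) == 3:
        v2x, v2y = cps[2]
        return (a * x + (v2x - v0x) / height * y + v0x,
                b * x + (v2y - v0y) / height * y + v0y)
    raise ValueError("this sketch supports only 2 or 3 control-point motion vectors")


def subblock_mvs(cps: List[MV], width: int, height: int,
                 sub: int = 4) -> Dict[Tuple[int, int], MV]:
    """Predicted MV of each sub x sub sub-block, keyed by its upper-left offset."""
    result: Dict[Tuple[int, int], MV] = {}
    for sy in range(0, height, sub):
        for sx in range(0, width, sub):
            # Sample the affine field at the sub-block center (an assumption).
            result[(sx, sy)] = affine_mv(cps, width, height, sx + sub / 2, sy + sub / 2)
    return result
```

Every pixel inside a sub-block then shares that sub-block's motion vector during motion compensation, which is what makes the per-sub-block approach cheaper than a per-pixel affine warp.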
  • each module of the device may be used to implement the method described in the first aspect.
  • an embodiment of the present invention provides a decoding device.
  • the device includes: a storage unit configured to store video data in the form of a code stream; an entropy decoding unit configured to parse the code stream to obtain an index value of a candidate motion vector list; and a prediction processing unit configured to construct the candidate motion vector list, where the candidate motion vector list includes candidate motion vectors of K control points of the current block;
  • the candidate motion vectors of the K control points of the current block are obtained according to the 2N-parameter affine transformation model adopted by a neighboring block of the current block, and the 2N-parameter affine transformation model is obtained based on the motion vectors of N control points of the neighboring block, where N is an integer greater than or equal to 2 and less than or equal to 4, and K is an integer greater than or equal to 2 and less than or equal to 4; the neighboring block is a decoded image block spatially adjacent to the current block.
  • the current block includes a plurality of sub-blocks
  • the prediction processing unit is further configured to: determine the target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value; obtain a 6-parameter affine transformation model of the current block according to the target candidate motion vectors of the K control points of the current block; and obtain the predicted motion vector of each sub-block in the current block according to the 6-parameter affine transformation model of the current block.
  • each module of the device may be used to implement the method described in the second aspect.
  • an embodiment of the present invention provides a device for decoding a video, and the device includes:
  • a memory configured to store video data in the form of a code stream;
  • a decoder configured to construct the candidate motion vector list, where the candidate motion vector list includes candidate motion vectors of K control points of the current block; the candidate motion vectors of the K control points of the current block are obtained based on the 2N-parameter affine transformation model adopted by a neighboring block of the current block, and the 2N-parameter affine transformation model is obtained based on the motion vectors of N control points of the neighboring block, where N is an integer greater than or equal to 2 and less than or equal to 4, K is an integer greater than or equal to 2 and less than or equal to 4, and N is not equal to K; the neighboring block is a decoded image block spatially adjacent to the current block.
  • the current block includes a plurality of sub-blocks; the decoder is further configured to determine target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value, and to obtain the predicted motion vector of each sub-block in the current block according to the target candidate motion vectors of the K control points of the current block.
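To make the role of the index value concrete, here is a hypothetical sketch of a candidate motion vector list: candidates already derived from neighboring blocks are appended in scan order, unavailable neighbors and duplicates are skipped, and the decoder picks the entry addressed by the parsed index. The duplicate-pruning rule and the maximum list length of 5 are assumptions for illustration; the source does not specify them.

```python
from typing import List, Optional, Sequence, Tuple

MV = Tuple[float, float]
CPMVs = Tuple[MV, ...]  # candidate MVs of the K control points of the current block


def build_candidate_list(derived: Sequence[Optional[CPMVs]],
                         max_len: int = 5) -> List[CPMVs]:
    """Keep available, non-duplicate candidates in scan order, up to max_len."""
    out: List[CPMVs] = []
    for cand in derived:
        if cand is None or cand in out:  # unavailable neighbor or duplicate candidate
            continue
        out.append(cand)
        if len(out) == max_len:
            break
    return out


def select_target(cands: List[CPMVs], index: int) -> CPMVs:
    """Return the target candidate addressed by the parsed index value."""
    if not 0 <= index < len(cands):
        raise IndexError("index value outside the candidate motion vector list")
    return cands[index]
```

Because encoder and decoder build the list with the same rules, only the small index value needs to be transmitted to identify the chosen control-point motion vectors.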
  • N is equal to 2 and K is equal to 3; accordingly, the candidate motion vectors of the 3 control points of the current block are obtained based on the 4-parameter affine transformation model adopted by a neighboring block of the current block.
  • the candidate motion vectors of the three control points of the current block include the motion vector at the position of the upper-left pixel point, the motion vector at the position of the upper-right pixel point, and the motion vector at the position of the lower-left pixel point in the current block.
  • the decoder is configured to calculate candidate motion vectors of 3 control points of the current block according to the following formula:
  • vx0 is the horizontal component of the motion vector corresponding to the position of the upper-left pixel point in the current block
  • vy0 is the vertical component of the motion vector corresponding to the position of the upper-left pixel point in the current block
  • vx1 is the horizontal component of the motion vector corresponding to the position of the upper-right pixel point in the current block
  • vy1 is the vertical component of the motion vector corresponding to the position of the upper-right pixel point in the current block
  • vx2 is the horizontal component of the motion vector corresponding to the position of the lower-left pixel point in the current block
  • vy2 is the vertical component of the motion vector corresponding to the position of the lower-left pixel point in the current block
  • vx4 is the horizontal component of the motion vector corresponding to the position of the upper-left pixel point in the neighboring block
  • vy4 is the vertical component of the motion vector corresponding to the position of the upper-left pixel point in the neighboring block
  • vx5 is the horizontal component of the motion vector corresponding to the position of the upper-right pixel point in the neighboring block
  • vy5 is the vertical component of the motion vector corresponding to the position of the upper-right pixel point in the neighboring block
  • x0 is the abscissa of the position of the upper-left pixel point in the current block
  • y0 is the ordinate of the position of the upper-left pixel point in the current block
  • x1 is the abscissa of the position of the upper-right pixel point in the current block
  • y1 is the ordinate of the position of the upper-right pixel point in the current block
  • x2 is the abscissa of the position of the lower-left pixel point in the current block
  • y2 is the ordinate of the position of the lower-left pixel point in the current block
  • x4
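The formula referenced for the N = 2, K = 3 case is not preserved in this extract. The sketch below assumes the standard 4-parameter affine form, evaluated at each control point (x, y) of the current block using the neighboring block's upper-left control point (x4, y4) and upper-right control point (x5, y5), which lie on the same row:
vx = vx4 + (vx5 - vx4) / (x5 - x4) * (x - x4) - (vy5 - vy4) / (x5 - x4) * (y - y4)
vy = vy4 + (vy5 - vy4) / (x5 - x4) * (x - x4) + (vx5 - vx4) / (x5 - x4) * (y - y4)
Treat it as an illustration under that assumption, not as the patent's exact equation; the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def inherit_from_4param(p4: Point, mv4: MV, p5: Point, mv5: MV,
                        targets: List[Point]) -> List[MV]:
    """Evaluate a neighbor's (assumed) 4-parameter model at the target points.

    p4/p5: upper-left/upper-right control points of the neighboring block.
    targets: control-point positions of the current block, e.g. upper-left,
    upper-right, and lower-left for K = 3.
    """
    (x4, y4), (x5, _) = p4, p5
    w = x5 - x4  # horizontal distance between the neighbor's two control points
    a = (mv5[0] - mv4[0]) / w
    b = (mv5[1] - mv4[1]) / w
    return [(mv4[0] + a * (x - x4) - b * (y - y4),
             mv4[1] + b * (x - x4) + a * (y - y4)) for x, y in targets]
```

For example, a neighbor whose motion is a pure rotation (mv4 = (0, 0), mv5 = (0, 16) over a 16-pixel span starting at (0, 0)) yields (-16, 16), (-16, 24), and (-24, 16) at the corners (16, 16), (24, 16), and (16, 24) of an 8x8 current block.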
  • N is equal to 3 and K is equal to 2.
  • the candidate motion vectors of the 2 control points of the current block are obtained based on the 6-parameter affine transformation model adopted by a neighboring block of the current block.
  • the candidate motion vectors of the two control points of the current block include the motion vector at the position of the upper-left pixel point and the motion vector at the position of the upper-right pixel point in the current block.
  • the decoder is configured to calculate the candidate motion vectors of the 2 control points of the current block according to the following formula:
  • vx0 is the horizontal component of the motion vector corresponding to the position of the upper-left pixel point in the current block
  • vy0 is the vertical component of the motion vector corresponding to the position of the upper-left pixel point in the current block
  • vx1 is the horizontal component of the motion vector corresponding to the position of the upper-right pixel point in the current block
  • vy1 is the vertical component of the motion vector corresponding to the position of the upper-right pixel point in the current block
  • vx4 is the horizontal component of the motion vector corresponding to the position of the upper-left pixel point in the neighboring block, and vy4 is the vertical component of the motion vector corresponding to the position of the upper-left pixel point in the neighboring block
  • vx5 is the horizontal component of the motion vector corresponding to the position of the upper-right pixel point in the neighboring block, and vy5 is the vertical component of the motion vector corresponding to the position of the upper-right pixel point in the neighboring block
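The formula for the N = 3, K = 2 case is likewise missing from this extract, and its list of definitions stops at the upper-right neighbor control point. The sketch below assumes the standard 6-parameter affine form and a third neighbor control point at the lower-left pixel (x6, y6) with motion vector (vx6, vy6); those names are hypothetical, chosen only to continue the numbering used above:
vx = vx4 + (vx5 - vx4) / (x5 - x4) * (x - x4) + (vx6 - vx4) / (y6 - y4) * (y - y4)
vy = vy4 + (vy5 - vy4) / (x5 - x4) * (x - x4) + (vy6 - vy4) / (y6 - y4) * (y - y4)

```python
from typing import List, Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def inherit_from_6param(p4: Point, mv4: MV, p5: Point, mv5: MV,
                        p6: Point, mv6: MV, targets: List[Point]) -> List[MV]:
    """Evaluate a neighbor's (assumed) 6-parameter model at the target points.

    p4/p5/p6: upper-left/upper-right/lower-left control points of the neighbor
    (p6 and mv6 are hypothetical names). targets: current-block control points.
    """
    (x4, y4), (x5, _), (_, y6) = p4, p5, p6
    w = x5 - x4  # neighbor width between its upper control points
    h = y6 - y4  # neighbor height between its left control points
    return [(mv4[0] + (mv5[0] - mv4[0]) / w * (x - x4) + (mv6[0] - mv4[0]) / h * (y - y4),
             mv4[1] + (mv5[1] - mv4[1]) / w * (x - x4) + (mv6[1] - mv4[1]) / h * (y - y4))
            for x, y in targets]
```

With K = 2, only the upper-left and upper-right corners of the current block are passed as targets, so the current block ends up with a 4-parameter description even though its neighbor used six parameters.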
  • the decoder is specifically configured to:
  • the motion vector differences of the K control points are obtained by parsing the code stream
  • a 2K-parameter affine transformation model of the current block is obtained according to the motion vectors of the K control points of the current block.
  • after determining the target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value, the decoder is further configured to:
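In AMVP mode the parsed motion vector differences are added to the selected candidate motion vectors before the 2K-parameter model is formed. A minimal sketch, assuming motion vectors are plain (horizontal, vertical) pairs; the function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def apply_mvds(predictors: List[MV], mvds: List[MV]) -> List[MV]:
    """Control-point MV = target candidate MV (predictor) + parsed MVD."""
    if len(predictors) != len(mvds):
        raise ValueError("expected one motion vector difference per control point")
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(predictors, mvds)]


def model_param_count(control_point_mvs: List[MV]) -> int:
    """K control points with 2 components each give a 2K-parameter model."""
    return 2 * len(control_point_mvs)
```

For K = 2 this gives a 4-parameter model and for K = 3 a 6-parameter model, matching the 2K relationship stated above.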
  • an embodiment of the present invention provides another device for decoding a video.
  • the device includes:
  • a memory configured to store video data in the form of a code stream;
  • the candidate motion vectors of the K control points of the current block are obtained according to the 2N-parameter affine transformation model adopted by a neighboring block of the current block, and the 2N-parameter affine transformation model is obtained based on the motion vectors of N control points of the neighboring block, where N is an integer greater than or equal to 2 and less than or equal to 4, and K is an integer greater than or equal to 2 and less than or equal to 4; the neighboring block is a decoded image block spatially adjacent to the current block;
  • the current block includes a plurality of sub-blocks; target candidate motion vectors of the K control points of the current block are determined from the candidate motion vector list according to the index value;
  • a 6-parameter affine transformation model of the current block is obtained according to the target candidate motion vectors of the K control points of the current block; and the predicted motion vector of each sub-block in the current block is obtained according to the 6-parameter affine transformation model of the current block.
  • an embodiment of the present invention provides a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to encode video data.
  • the instructions cause the one or more processors to perform a method according to any possible embodiment of the first aspect.
  • an embodiment of the present invention provides a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to encode video data.
  • the instructions cause the one or more processors to perform a method according to any possible embodiment of the second aspect.
  • an embodiment of the present invention provides a computer program including a program code that, when run on a computer, executes a method according to any possible embodiment of the first aspect.
  • an embodiment of the present invention provides a computer program including a program code that, when run on a computer, performs a method according to any possible embodiment of the second aspect.
  • in the process of encoding and decoding the current block, in the parsing stage of the current block (for example, the stage of constructing the candidate motion vector list for the AMVP mode or the Merge mode), the affine transformation model of a neighboring block can be used to construct the affine transformation model of the current block itself,
  • and the affine transformation models of the two can be different. Because the affine transformation model of the current block better matches the actual motion and actual requirements of the current block, implementing this solution can improve the coding efficiency and accuracy of encoding the current block, and meet user needs.
  • in addition, during the reconstruction stage of an image block, the decoding end can uniformly use the 6-parameter affine transformation model to predict the image block, so that the process of reconstructing the current block in the embodiments of the present invention
  • strikes a good balance between model complexity and modeling capability. Therefore, implementing this solution can improve the coding efficiency and prediction accuracy for the current block, and meet user needs.
  • FIG. 1 is a block diagram of an example structure of a video coding system for implementing an embodiment of the present invention
  • FIG. 2A is a block diagram of an example structure of a video encoder for implementing an embodiment of the present invention
  • FIG. 2B is a block diagram of an example structure of a video decoder for implementing an embodiment of the present invention
  • FIG. 3 is a block diagram of an example of a video coding device for implementing an embodiment of the present invention.
  • FIG. 4 is a block diagram of an example of an encoding device or a decoding device for implementing an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a scenario operation on the current block
  • FIG. 6 is a schematic diagram of another scenario operation on the current block
  • FIG. 7 is a schematic diagram of another scenario operation on the current block
  • FIG. 9 is a flowchart of a motion vector prediction method according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of another scenario operation on the current block
  • FIG. 11A is a schematic diagram of a current block and a motion compensation unit of the current block according to an embodiment of the present invention
  • FIG. 11B is a schematic diagram of another current block and a motion compensation unit of the current block according to an embodiment of the present invention.
  • FIG. 12 is a flowchart of another motion vector prediction method according to an embodiment of the present invention.
  • FIG. 13 is a flowchart of another motion vector prediction method according to an embodiment of the present invention.
  • the technical solutions involved in the embodiments of the present invention may not only be applied to existing video coding standards (such as H.264, HEVC and other standards), but also to future video coding standards (such as H.266 standard).
  • Video coding generally refers to processing a sequence of pictures that form a video or a video sequence.
  • in the field of video coding, the terms “picture”, “frame”, and “image” can be used as synonyms.
  • Video coding as used herein means video encoding or video decoding.
  • Video encoding is performed on the source side and typically involves processing (e.g., compressing) the original video picture to reduce the amount of data required to represent the video picture, for more efficient storage and/or transmission.
  • Video decoding is performed on the destination side and usually involves inverse processing relative to the encoder to reconstruct the video picture.
  • “coding” of video pictures in the embodiments should be understood as “encoding” or “decoding” of a video sequence.
  • the combination of the encoding part and the decoding part is also called codec (encoding and decoding).
  • the video sequence includes a series of pictures.
  • the pictures are further divided into slices, and the slices are divided into blocks.
  • Video coding is performed block by block.
  • the concept of blocks is further expanded.
  • the macroblock (MB) can be further divided into multiple prediction blocks (partitions) that can be used for predictive coding.
  • in High Efficiency Video Coding (HEVC), basic concepts such as coding unit (CU), prediction unit (PU), and transform unit (TU) are used; multiple kinds of block units are defined by function and described with a new tree-based structure.
  • a CU can be divided into smaller CUs according to a quad tree, and smaller CUs can be further divided to form a quad tree structure.
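The recursive quad-tree division of a CU can be sketched as below. This is a simplified illustration: the split decision is passed in as a callback (in a real encoder it would come from rate-distortion decisions, and in a decoder from parsed split flags), and the 8-sample minimum size and all names are assumptions.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square CU


def quadtree_split(x: int, y: int, size: int,
                   should_split: Callable[[int, int, int], bool],
                   min_size: int = 8) -> List[Block]:
    """Return the leaf CUs produced by recursively splitting a square block."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Visit the four quadrants in raster order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half, should_split, min_size)
        return leaves
    return [(x, y, size)]
```

For example, splitting a 64x64 block once and then splitting only its top-left 32x32 quadrant again yields seven leaf CUs: four 16x16 CUs and three 32x32 CUs.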
  • a CU is a basic unit for dividing and encoding a coded image.
  • the PU can correspond to a prediction block and is the basic unit of predictive coding.
  • the CU is further divided into multiple PUs according to the division mode.
  • TU can correspond to a transform block and is a basic unit for transforming prediction residuals.
  • CUs, PUs, and TUs all belong to the concept of block (or image block).
  • a CTU is split into multiple CUs by using a quad-tree structure represented as a coding tree.
  • a decision is made at the CU level whether to use inter-picture (temporal) or intra-picture (spatial) prediction to encode a picture region.
  • Each CU can be further split into one, two, or four PUs according to the PU split type.
  • the same prediction process is applied in a PU, and related information is transmitted to the decoder on the basis of the PU.
  • a CU may be partitioned into transform units (TUs) according to another quad-tree structure similar to the coding tree used for the CU.
  • the quad-tree and binary-tree (QTBT) partitioning framework is used to partition coding blocks.
  • a CU can have a square or rectangular shape.
  • the image block to be encoded in the currently encoded image may be referred to as the current block, for example, in encoding, it means the block currently being encoded; in decoding, it means the block currently being decoded.
  • the decoded image block in the reference image used to predict the current block is referred to as a reference block, that is, the reference block is a block that provides a reference signal for the current block, where the reference signal represents a pixel value within the image block.
  • a block in the reference image that provides a prediction signal for the current block may be a prediction block, where the prediction signal represents a pixel value or a sampling value or a sampling signal within the prediction block. For example, after traversing multiple reference blocks, the best reference block is found. This best reference block will provide prediction for the current block. This block can be called a prediction block.
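The search for the best reference block can be illustrated with the simplest form of motion estimation, an exhaustive (full) search that minimizes the sum of absolute differences (SAD). This is a generic textbook technique shown only to make "traversing multiple reference blocks" concrete; the source does not say which search or cost function its encoder uses, and the names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between an n x n current block and a reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Return (best_cost, (dx, dy)) over all in-bounds displacements in the window."""
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    h, w = len(ref), len(ref[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx and 0 <= ry and rx + n <= w and ry + n <= h:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))  # (dx, dy) is the candidate motion vector
    return best
```

The winning displacement is the motion vector of the current block, and the reference block it points to serves as the prediction block.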
  • FIG. 1 is a block diagram of a video coding system according to an example described in the embodiment of the present invention.
  • video coder generally refers to both video encoders and video decoders.
  • video coding or “coding” may generally refer to video encoding or video decoding.
  • the video encoder 100 and the video decoder 200 of the video coding system are used to predict the motion information (for example, the motion vector) of the currently coded image block or its sub-blocks according to the various method examples described in any of the new inter prediction modes proposed in the embodiments of the present invention,
  • so that the predicted motion vector is as close as possible to the motion vector obtained by motion estimation. In this way, a motion vector difference need not be transmitted during encoding, which further improves coding performance.
  • the video coding system includes a source device 10 and a destination device 20.
  • the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
  • the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
  • Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
  • the source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
  • the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
  • the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
  • the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
  • the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
  • the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
  • the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 10 to the destination device 20.
  • the encoded data may be output from the output interface 140 to the storage device 40.
  • the encoded data can be accessed from the storage device 40 through the input interface 240.
  • the storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
  • the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
  • the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
  • Example file servers include a web server (e.g., for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
  • the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
  • This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
  • the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
  • the motion vector prediction technology of the embodiments of the present invention can be applied to video encoding and decoding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • a video coding system may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
  • the video coding system illustrated in FIG. 1 is only an example, and the techniques of the embodiments of the present invention can be applied to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
  • data is retrieved from local storage, streamed over a network, and so on.
  • the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and / or retrieve data from memory and decode data.
  • the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
  • the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
  • the video encoder 100 may encode video data from the video source 120.
  • the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
  • the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
  • the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
  • the input interface 240 includes a receiver and / or a modem.
  • the input interface 240 may receive encoded video data via the link 30 and / or from the storage device 40.
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
  • the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
  • Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the embodiments of the present invention are implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may execute the instructions in hardware using one or more processors to implement the techniques of the embodiments of the present invention. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated in a corresponding device as part of a combined encoder/decoder (codec).
  • Embodiments of the invention may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
  • the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transfer can occur in real time or almost real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in an encoded bitstream on a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after it has been stored on this medium.
  • the video encoder 100 and the video decoder 200 may operate according to a video compression standard such as High Efficiency Video Coding (HEVC) or an extension thereof, and may conform to the HEVC test model (HM).
  • the video encoder 100 and video decoder 200 may also operate according to other industry standards, such as the ITU-T H.264, H.265 standards, or extensions of such standards.
  • the technology of the embodiments of the present invention is not limited to any particular codec standard.
  • the video encoder 100 is configured to encode syntax elements related to a current image block to be encoded into a digital video output bitstream (referred to as a bitstream or code stream); the syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short.
  • the inter prediction data includes, for example, indication information of an inter prediction mode.
  • the inter prediction mode in the embodiment of the present invention includes at least one of an AMVP mode based on an affine transformation model and a Merge mode based on an affine transformation model.
  • if the inter prediction data includes indication information of an AMVP mode based on an affine transformation model, the inter prediction data may further include an index value (or index number) of the candidate motion vector list corresponding to the AMVP mode, and the motion vector differences (MVDs) of the control points of the current block; if the inter prediction data includes indication information of a Merge mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the Merge mode.
  • the inter prediction data in the above example may further include indication information of an affine transformation model (the number of model parameters) of the current block.
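The inter prediction data described above can be pictured as a small record. The sketch below is hypothetical: the field names and mode labels are illustrative, not syntax element names from any standard. It encodes two constraints stated in the text: Merge mode carries only a candidate index, while AMVP mode also carries one MVD per control point (K MVDs for a 2K-parameter model).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InterPredictionData:
    mode: str                               # "affine_amvp" or "affine_merge" (hypothetical labels)
    candidate_index: int                    # index value into the candidate motion vector list
    num_model_params: Optional[int] = None  # e.g. 4 or 6, if the model type is signaled
    mvds: List[Tuple[int, int]] = field(default_factory=list)  # AMVP only

    def validate(self) -> None:
        """Raise ValueError if the fields are inconsistent with the chosen mode."""
        if self.mode not in ("affine_amvp", "affine_merge"):
            raise ValueError("unknown inter prediction mode")
        if self.mode == "affine_merge" and self.mvds:
            raise ValueError("Merge mode carries no motion vector differences")
        if self.mode == "affine_amvp" and self.num_model_params is not None:
            # A 2K-parameter model has K control points, hence K MVDs.
            if len(self.mvds) != self.num_model_params // 2:
                raise ValueError("expected one MVD per control point")
```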
  • the video encoder 100 may be configured to execute the embodiment of FIG. 13 described later to implement the motion vector prediction method described in the present invention at the encoding end.
  • the video decoder 200 is configured to decode a syntax element related to the image block to be decoded from the bit stream (S401).
  • the syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short, and the inter prediction data includes, for example, indication information of an inter prediction mode.
  • the inter prediction mode in the embodiment of the present invention includes at least one of an AMVP mode based on an affine transformation model and a Merge mode based on an affine transformation model.
  • if the inter prediction data includes indication information of an AMVP mode based on an affine transformation model, the inter prediction data may further include an index value (or index number) of the candidate motion vector list corresponding to the AMVP mode, and the motion vector differences (MVDs) of the control points of the current block; if the inter prediction data includes indication information of a Merge mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the Merge mode.
  • the inter prediction data in the above example may further include indication information of an affine transformation model (the number of model parameters) of the current block.
  • the video decoder 200 may be configured to execute the embodiments shown in FIG. 9 or FIG. 12 described later, so as to implement the application of the motion vector prediction method described in the present invention at the decoding end.
  • FIG. 2A is a block diagram of a video encoder 100 according to an example described in the embodiment of the present invention.
  • the video encoder 100 is configured to output a video to the post-processing entity 41.
  • the post-processing entity 41 represents an example of a video entity that can process encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
  • the post-processing entity 41 may be an instance of a network entity.
  • in some cases, the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
  • the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
  • the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded image buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
  • the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
  • the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • although the filter unit 106 is shown as an in-loop filter in FIG. 2A, in other implementations the filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
  • the video data memory may store video data to be encoded by the components of the video encoder 100.
  • the video data stored in the video data memory may be obtained from the video source 120.
  • the DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame coding mode.
  • the video data memory and the DPB 107 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the video data memory and the DPB 107 may be provided by the same memory device or by separate memory devices.
  • the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
  • the video encoder 100 receives video data and stores the video data in a video data memory.
  • the segmentation unit divides the video data into several image blocks, which may be further divided into smaller blocks, for example by quad-tree or binary-tree partitioning. The segmentation may also include division into slices, tiles, or other larger units.
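The recursive quad-tree split mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: square, power-of-two blocks and a caller-supplied split decision (`should_split` and `demo_split` are hypothetical names). A real encoder makes these decisions by rate-distortion search and may also use binary-tree splits, which the sketch omits.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block


def quadtree_partition(x: int, y: int, size: int, min_size: int,
                       should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four equal quadrants.

    Leaves are returned in Z-order: top-left, top-right, bottom-left, bottom-right.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    for dy in (0, half):          # top row of quadrants first
        for dx in (0, half):      # left quadrant before right quadrant
            leaves.extend(quadtree_partition(x + dx, y + dy, half, min_size, should_split))
    return leaves


# Hypothetical decision rule: split the 64x64 block, then only its top-left 32x32 quadrant.
def demo_split(x: int, y: int, size: int) -> bool:
    return size == 64 or (size == 32 and x == 0 and y == 0)


print(quadtree_partition(0, 0, 64, 8, demo_split))
```

Because every split produces four equal quadrants, the leaf areas always add up to the area of the original block.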
  • the video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into multiple image blocks (and possibly into sets of image blocks referred to as tiles).
  • the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
  • the inter predictor 110 within the prediction processing unit 108 may perform inter predictive encoding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
  • the inter predictor 110 may be configured to determine an inter prediction mode for encoding a current image block.
  • the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in a set of candidate inter prediction modes, and select from among them the inter prediction mode with the best rate-distortion characteristics.
  • rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce it, as well as the bit rate (that is, the number of bits) used to produce the encoded block.
  • the inter predictor 110 may determine that the inter prediction mode in the candidate set with the lowest rate-distortion cost for encoding the current image block is the inter prediction mode used for inter prediction of the current image block.
  • the inter-predictive encoding process is described in detail below, in particular the process of predicting the motion information of one or more sub-blocks (specifically, each sub-block or all sub-blocks) of the current image block under the various inter prediction modes for non-directional or directional motion fields in the embodiments of the present invention.
  • the inter predictor 110 is configured to predict motion information (such as motion vectors) of one or more sub-blocks of the current image block based on the determined inter prediction mode, and to obtain or generate the prediction block of the current image block using that motion information.
  • the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
  • the inter predictor 110 may also generate syntax elements associated with the image blocks and video slices for use by the video decoder 200 when decoding the image blocks of the video slices.
  • the inter predictor 110 performs motion compensation using the motion information of each sub-block to generate a prediction block for each sub-block, thereby obtaining the prediction block of the current image block. It should be understood that the inter predictor 110 here performs both motion estimation and motion compensation.
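As a rough illustration of motion compensation with one motion vector for a whole block, the sketch below copies the reference samples that the vector points to. It assumes integer-pel motion vectors and handles out-of-picture positions by clamping coordinates to the picture edge. Fractional-pel interpolation, which practical codecs require, is not modeled, and `motion_compensate` is a hypothetical helper rather than part of any codec API.

```python
from typing import List

Picture = List[List[int]]


def motion_compensate(ref: Picture, x0: int, y0: int, w: int, h: int,
                      mvx: int, mvy: int) -> Picture:
    """Return the w x h prediction block for the block whose top-left corner is (x0, y0).

    Every pixel uses the same integer motion vector (mvx, mvy); reference
    positions outside the picture are clamped to the nearest edge sample.
    """
    height, width = len(ref), len(ref[0])
    pred: Picture = []
    for y in range(h):
        ry = min(max(y0 + y + mvy, 0), height - 1)
        row = []
        for x in range(w):
            rx = min(max(x0 + x + mvx, 0), width - 1)
            row.append(ref[ry][rx])
        pred.append(row)
    return pred


ref = [[10 * r + c for c in range(4)] for r in range(4)]   # toy 4x4 reference picture
print(motion_compensate(ref, 0, 0, 2, 2, 1, 1))             # [[11, 12], [21, 22]]
```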
  • the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
  • the video encoder 100 may include, in the transmitted bitstream, inter prediction data related to the current image block.
  • the inter prediction data includes, for example, indication information of an inter prediction mode.
  • the inter-prediction mode includes at least one of an AMVP mode based on an affine transformation model and a Merge mode based on an affine transformation model.
  • if the inter prediction data include indication information of an AMVP mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the AMVP mode and the motion vector differences (MVDs) of the control points of the current block; if the inter prediction data include indication information of a Merge mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the Merge mode.
  • the inter prediction data in the above example may further include indication information of an affine transformation model (the number of model parameters) of the current block.
  • the inter predictor 110 may be configured to perform the related steps of the embodiment of FIG. 13 described later to implement the application of the motion vector prediction method described in the present invention at the encoding end.
  • the intra predictor 109 may perform intra prediction on the current image block.
  • the intra predictor 109 may determine an intra prediction mode used to encode the current block.
  • the intra predictor 109 may use rate-distortion analysis to calculate rate-distortion values for the various intra prediction modes under test, and select from among them the intra prediction mode with the best rate-distortion characteristics.
  • the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • the summer 112 represents one or more components that perform this subtraction operation.
  • the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
  • the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
  • the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
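The transform step can be illustrated with a floating-point, orthonormal 2-D DCT-II applied to a square residual block. This is a sketch for intuition only: codecs such as HEVC use integer approximations of the DCT, which this code does not reproduce. For a flat residual block, all of the energy lands in the DC coefficient.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: C[k][i] = a_k * cos(pi * (2i + 1) * k / (2n))."""
    c = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([a * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return c


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def dct2d(block: Matrix) -> Matrix:
    """Forward 2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2d(coeffs: Matrix) -> Matrix:
    """Inverse 2-D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


residual = [[4.0] * 4 for _ in range(4)]   # a flat 4x4 residual block
coeffs = dct2d(residual)
print(round(coeffs[0][0], 6))               # 16.0
```

Since the basis is orthonormal, `idct2d(dct2d(x))` recovers `x` up to floating-point rounding.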
  • the transformer 101 may send the obtained transform coefficients to a quantizer 102.
  • a quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
  • the quantizer 102 may then perform a scan of a matrix containing the quantized transform coefficients.
  • alternatively, the entropy encoder 103 may perform the scan.
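Quantization followed by a coefficient scan can be sketched as below. The sketch assumes a single uniform step size with round-to-nearest and the classic zig-zag scan. HEVC itself uses QP-dependent integer scaling and diagonal, horizontal, or vertical scans over 4x4 sub-blocks, none of which is modeled here, and the coefficient values are made up for the example.

```python
from typing import List, Tuple


def quantize(coeffs: List[List[float]], step: float) -> List[List[int]]:
    """Uniform scalar quantization with round-to-nearest: sign(c) * floor(|c| / step + 0.5)."""
    def q(c: float) -> int:
        level = int(abs(c) / step + 0.5)   # int() truncates, i.e. floors a non-negative value
        return -level if c < 0 else level
    return [[q(c) for c in row] for row in coeffs]


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in classic zig-zag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):                                     # s = row + col: one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]   # top-right to bottom-left
        order.extend(cells if s % 2 else reversed(cells))          # even diagonals run upward
    return order


coeffs = [[16.0, 5.2, 0.4, 0.0],
          [-6.1, 1.4, 0.0, 0.0],
          [0.3, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
levels = quantize(coeffs, step=2.0)
scanned = [levels[r][c] for r, c in zigzag_order(4)]
print(scanned)  # non-zero low-frequency levels first, then a run of zeros
```

The scan puts the low-frequency levels, which survive quantization most often, at the front, so the trailing zeros form a long run that entropy coding can represent cheaply.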
  • after quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
  • after entropy encoding, the encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission to or retrieval by the video decoder 200.
  • the entropy encoder 103 may also entropy encode the syntax elements of the current image block to be encoded.
  • the inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image.
  • the summer 111 adds the reconstructed residual block to a prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
  • the filter unit 106 may be applied to the reconstructed image block to reduce distortion such as blocking artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer 107 and may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or images.
  • the video encoder 100 may quantize the residual signal directly, without processing by the transformer 101 and correspondingly without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data, so no processing by the transformer 101, quantizer 102, inverse quantizer 104, or inverse transformer 105 is needed; or, the video encoder 100 may store reconstructed image blocks directly as reference blocks without processing by the filter unit 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be combined.
  • the video encoder 100 is configured to implement a motion vector prediction method described in the embodiments described later.
  • FIG. 2B is a block diagram of a video decoder 200 according to an example described in the embodiment of the present invention.
  • the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207.
  • the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
  • video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from FIG. 2A.
  • video decoder 200 receives from video encoder 100 an encoded video bitstream representing image blocks of the encoded video slice and associated syntax elements.
  • the video decoder 200 may receive video data from the network entity 42; optionally, the video data may also be stored in a video data memory (not shown in the figure).
  • the video data memory may store video data, such as an encoded video bitstream, to be decoded by components of the video decoder 200.
  • the video data stored in the video data memory may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
  • the video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 2B, the video data memory and the DPB 207 may be the same memory or separately provided memories. The video data memory and the DPB 207 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on a chip with other components of the video decoder 200 or provided off-chip relative to those components.
  • the network entity 42 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
  • the network entity 42 may or may not include a video encoder, such as video encoder 100.
  • the network entity 42 may implement some of the techniques described in the embodiments of the present invention.
  • the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
  • the network entity 42 may be an example of the storage device 40 of FIG. 1.
  • the entropy decoder 203 of the video decoder 200 entropy decodes the bit stream to generate quantized coefficients and some syntax elements.
  • the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
  • Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
  • the intra predictor 209 of the prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
  • the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, the inter prediction mode used to decode the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode.
  • the inter predictor 210 may determine whether to use a new inter prediction mode to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used, the inter predictor 210 predicts the motion information of the current image block, or of a sub-block of the current image block, based on that new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), and then uses the predicted motion information to obtain or generate, through motion compensation, the prediction block of the current image block or of its sub-block.
  • the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
  • a prediction block may be generated from one of the reference pictures within one of the reference picture lists.
  • the video decoder 200 may construct a reference picture list, that is, a list 0 and a list 1, based on the reference pictures stored in the DPB 207.
  • the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
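For intuition about list 0 and list 1, the sketch below builds default lists from the POCs of available short-term reference pictures, ordered the way HEVC does by default. That ordering is an assumption, since this section does not spell it out: list 0 places preceding pictures first (closest first) followed by following pictures, and list 1 reverses the two groups. Long-term pictures, list modification, and list-size limits are omitted, and `default_ref_lists` is a hypothetical name.

```python
from typing import List, Tuple


def default_ref_lists(cur_poc: int, ref_pocs: List[int]) -> Tuple[List[int], List[int]]:
    """Return (list0, list1) as lists of reference POCs."""
    before = sorted((p for p in ref_pocs if p < cur_poc), reverse=True)  # closest preceding first
    after = sorted(p for p in ref_pocs if p > cur_poc)                   # closest following first
    return before + after, after + before


list0, list1 = default_ref_lists(8, [0, 4, 6, 12, 16])
print(list0)  # [6, 4, 0, 12, 16]
print(list1)  # [12, 16, 6, 4, 0]
```

A reference index then simply selects a position in one of these lists.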
  • the inter predictor 210 here performs a motion compensation process. In the following, the inter prediction process for using the motion information of the reference block to predict the motion information of the current image block or a sub-block of the current image block under various new inter prediction modes will be explained in detail.
  • the inter predictor 210 may use the syntax elements, decoded from the bitstream (S401), that relate to the image block currently to be decoded in order to predict that image block.
  • the syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short.
  • the inter prediction data includes, for example, indication information of an inter prediction mode.
  • the inter prediction mode in the embodiments of the present invention includes at least one of an AMVP mode based on an affine transformation model and a Merge mode based on an affine transformation model.
  • if the inter prediction data include indication information of an AMVP mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the AMVP mode and the motion vector differences (MVDs) of the control points of the current block; if the inter prediction data include indication information of a Merge mode based on an affine transformation model, the inter prediction data may further include the index value (or index number) of the candidate motion vector list corresponding to the Merge mode.
  • the inter prediction data in the above example may further include indication information of an affine transformation model (the number of model parameters) of the current block.
  • the inter predictor 210 may be configured to execute the related steps of the embodiments in FIG. 9 or FIG. 12 described later, so as to implement the application of the motion vector prediction method described in the present invention at the decoding end.
  • the inverse quantizer 204 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203.
  • the inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
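The link between the quantization parameter and the degree of (inverse) quantization can be illustrated with the well-known H.264/AVC and HEVC approximation Qstep ≈ 2^((QP - 4)/6), under which the step size doubles every 6 QP. The floating-point sketch below is only an approximation for illustration, not the integer scaling process of any standard, and the function names are hypothetical.

```python
from typing import List


def qstep(qp: int) -> float:
    """Approximate quantization step size for a given QP: 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)


def dequantize(levels: List[List[int]], qp: int) -> List[List[float]]:
    """Inverse quantization: scale each level by the step size implied by QP."""
    step = qstep(qp)
    return [[lv * step for lv in row] for row in levels]


print(qstep(22), qstep(28))               # 8.0 16.0
print(dequantize([[2, -1], [0, 3]], 22))  # [[16.0, -8.0], [0.0, 24.0]]
```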
  • the inverse transformer 205 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process in order to generate a residual block in the pixel domain.
  • the video decoder 200 obtains the reconstructed block, that is, the decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210.
  • the summer 211 represents a component that performs this summing operation.
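The summing step can be sketched as adding the residual to the prediction sample by sample and clipping the result to the valid sample range [0, 2^bitDepth - 1]. The clipping is an assumption typical of video codecs rather than something this section states, and `reconstruct` is a hypothetical helper.

```python
from typing import List


def reconstruct(pred: List[List[int]], resid: List[List[int]],
                bit_depth: int = 8) -> List[List[int]]:
    """Reconstructed sample = clip(prediction + residual) to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]


print(reconstruct([[250, 10]], [[10, -20]]))  # [[255, 0]]
```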
  • a loop filter (either in the decoding loop or after the decoding loop) may also be used to improve video quality.
  • the filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • although the filter unit 206 is shown as an in-loop filter in FIG. 2B, in other implementations the filter unit 206 may be implemented as a post-loop filter.
  • the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as the decoded video stream.
  • a decoded image block in a given frame or image may also be stored in a decoded image buffer 207, and the decoded image buffer 207 stores a reference image for subsequent motion compensation.
  • the decoded image buffer 207 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 220 of FIG. 1, or may be separate from such memory.
  • the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, so no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
  • the video decoder 200 is configured to implement the motion vector prediction method described in the following embodiments.
  • FIG. 3 is a schematic structural diagram of a video coding device 400 (for example, a video encoding device 400 or a video decoding device 400) according to an embodiment of the present invention.
  • the video coding device 400 is adapted to implement various embodiments described herein.
  • the video coding device 400 may be a video decoder (such as video decoder 200 of FIG. 1) or a video encoder (such as video encoder 100 of FIG. 1).
  • alternatively, the video coding device 400 may be one or more components of the video decoder 200 of FIG. 1 or the video encoder 100 of FIG. 1 described above.
  • the video coding device 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and an egress port 450 for transmitting data; and a memory 460 for storing data.
  • the video coding device 400 may further include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450, for the egress or ingress of optical or electrical signals.
  • the processor 430 is implemented by hardware and software.
  • the processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 is in communication with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460.
  • the processor 430 includes a coding module 470 (for example, an encoding module 470 or a decoding module 470).
  • the encoding / decoding module 470 implements the embodiments disclosed herein to implement the chroma block prediction method provided by the embodiments of the present invention.
  • the encoding / decoding module 470 implements, processes, or provides various encoding operations.
  • the encoding/decoding module 470 therefore substantially improves the functionality of the video coding device 400 and effects the transformation of the video coding device 400 into different states.
  • the encoding / decoding module 470 is implemented with instructions stored in the memory 460 and executed by the processor 430.
  • the memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • the processing result of a given step may be further processed before being output to the next step; for example, after interpolation filtering, motion vector derivation, loop filtering, or similar steps, the result of the corresponding step may be further clipped or shifted.
  • the motion vector of the control point of the current image block derived from the motion vector of the adjacent affine coding block may be further processed, which is not limited in this application.
  • the value range of the motion vector is constrained to a certain bit width. Assuming that the allowed bit width of a motion vector is bitDepth, the motion vector ranges from $-2^{bitDepth-1}$ to $2^{bitDepth-1}-1$. If bitDepth is 16, the range is -32768 to 32767; if bitDepth is 18, the range is -131072 to 131071. The constraint can be implemented in either of two ways:
  • Method 1: remove the overflowing high-order bits of the motion vector:
    $ux = (vx + 2^{bitDepth}) \,\%\, 2^{bitDepth}$
    $vx = (ux \ge 2^{bitDepth-1}) \,?\, (ux - 2^{bitDepth}) : ux$
    $uy = (vy + 2^{bitDepth}) \,\%\, 2^{bitDepth}$
    $vy = (uy \ge 2^{bitDepth-1}) \,?\, (uy - 2^{bitDepth}) : uy$
  • for example, with bitDepth equal to 16, if vx is -32769, the formulas above yield 32767. This agrees with how a computer handles the value: in two's complement, -32769 is 1,0111,1111,1111,1111 (17 bits), and on overflow the high-order bit is discarded, leaving 0111,1111,1111,1111, which is 32767.
  • Method 2: clip the motion vector:
    $vx = \mathrm{Clip3}(-2^{bitDepth-1},\ 2^{bitDepth-1}-1,\ vx)$
    $vy = \mathrm{Clip3}(-2^{bitDepth-1},\ 2^{bitDepth-1}-1,\ vy)$
  • where Clip3 clamps the value of z to the interval [x, y]:
    $\mathrm{Clip3}(x, y, z) = \begin{cases} x, & z < x \\ y, & z > y \\ z, & \text{otherwise} \end{cases}$
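The two motion vector range constraints described above, wrapping away the overflowing high-order bits and clamping with Clip3, can be sketched for one component as follows. The `+ 2^bitDepth` term keeps the dividend non-negative for motion vectors no smaller than -2^bitDepth, which matters in C; Python's `%` is non-negative for a positive modulus in any case. The function names are hypothetical.

```python
def wrap_mv(v: int, bit_depth: int) -> int:
    """Wrap-around: keep the low bit_depth bits and reinterpret them as two's complement."""
    u = (v + (1 << bit_depth)) % (1 << bit_depth)
    return u - (1 << bit_depth) if u >= (1 << (bit_depth - 1)) else u


def clip3(x: int, y: int, z: int) -> int:
    """Clamp z to the interval [x, y]."""
    return x if z < x else y if z > y else z


def clip_mv(v: int, bit_depth: int) -> int:
    """Clipping: saturate v to [-2**(bit_depth-1), 2**(bit_depth-1) - 1]."""
    return clip3(-(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1, v)


print(wrap_mv(-32769, 16))  # 32767: overflow wraps around
print(clip_mv(-32769, 16))  # -32768: overflow saturates
```

The two methods agree for in-range values and differ only on overflow: wrapping mimics fixed-width integer hardware, while clipping keeps the vector as close as possible to its original value.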
  • FIG. 4 is a schematic block diagram of an implementation of an encoding device or a decoding device (referred to as a coding device 1200) according to an embodiment of the present invention.
  • the coding device 1200 may include a processor 1210, a memory 1230, and a bus system 1250.
  • the processor and the memory are connected through a bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory.
  • the memory of the coding device stores program code, and the processor may call the program code stored in the memory to perform the various video encoding or decoding methods described in the embodiments of the present invention, in particular the video encoding or decoding methods in the various new inter prediction modes and the methods for predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here again.
  • the processor 1210 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 1230 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 1230.
  • the memory 1230 may include code and data 1231 accessed by the processor 1210 using the bus 1250.
  • the memory 1230 may further include an operating system 1233 and an application program 1235.
  • the application program 1235 includes at least one program that allows the processor 1210 to perform the video encoding or decoding method described in the embodiments of the present invention (in particular, the motion vector prediction method described in the embodiments of the present invention).
  • for example, the application program 1235 may include applications 1 to N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding method described in the embodiments of the present invention.
  • the bus system 1250 may include a data bus, a power bus, a control bus, and a status signal bus. However, for the sake of clarity, various buses are marked as the bus system 1250 in the figure.
  • the coding device 1200 may further include one or more output devices, such as a display 1270.
  • in one example, the display 1270 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input.
  • the display 1270 may be connected to the processor 1210 via a bus 1250.
  • to better understand the technical solutions of the embodiments of the present invention, the inter prediction modes, non-translational motion models, inherited control point motion vector prediction method, and constructed control point motion vector prediction method involved in the embodiments of the present invention are further described below.
  • Inter prediction modes: HEVC uses two inter prediction modes, namely advanced motion vector prediction (AMVP) mode and merge mode.
  • AMVP mode: the encoder first traverses the spatially or temporally adjacent coded blocks of the current block (denoted as neighboring blocks) and builds a candidate motion vector list (also referred to as a motion information candidate list) from the motion information of each neighboring block. It then determines the optimal motion vector from the candidate motion vector list based on the rate-distortion cost, and uses the candidate motion information with the lowest rate-distortion cost as the motion vector predictor (MVP) of the current block. The positions of the neighboring blocks and their traversal order are all predefined.
  • the rate-distortion cost is calculated by formula (1): $J = SAD + \lambda R \quad (1)$, where J is the rate-distortion cost (RD cost); SAD is the sum of absolute differences between the original pixel values and the predicted pixel values obtained by motion estimation using the candidate motion vector predictor; R is the bit rate; and λ is the Lagrange multiplier.
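Selecting the MVP with formula (1) can be sketched as computing J = SAD + λ·R for each candidate and keeping the minimum. In this sketch the predicted samples for each candidate are passed in directly instead of being produced by motion estimation, and the sample values and rates R are hypothetical. The example shows how λ trades distortion against rate.

```python
from typing import List, Sequence


def sad(pred: Sequence[int], orig: Sequence[int]) -> int:
    """Sum of absolute differences between predicted and original samples."""
    return sum(abs(p - o) for p, o in zip(pred, orig))


def select_mvp(preds: List[Sequence[int]], bits: List[int],
               orig: Sequence[int], lam: float) -> int:
    """Return the index of the candidate with the smallest J = SAD + lam * R."""
    costs = [sad(p, orig) + lam * r for p, r in zip(preds, bits)]
    return min(range(len(costs)), key=lambda i: costs[i])


orig = [10, 12, 14, 16]
preds = [[10, 12, 15, 18],   # candidate 0: SAD = 3
         [10, 12, 14, 16]]   # candidate 1: SAD = 0
bits = [1, 3]                # hypothetical rate R of each candidate
print(select_mvp(preds, bits, orig, lam=2.0))  # 0 (J = 5 vs 6)
print(select_mvp(preds, bits, orig, lam=0.5))  # 1 (J = 3.5 vs 1.5)
```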
  • the encoder passes the index value of the selected motion vector predictor in the candidate motion vector list, together with the reference frame index value, to the decoder. In addition, the encoder performs a motion search in a neighborhood centered on the MVP to obtain the actual motion vector of the current block, and transmits the difference between the MVP and the actual motion vector (the motion vector difference, MVD) to the decoder.
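The MVD round trip between encoder and decoder can be sketched as follows, assuming per-component integer motion vectors. The candidate list, index, and vectors are hypothetical values for illustration.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def encode_mvd(mv: MV, mvp: MV) -> MV:
    """Encoder side: MVD = actual MV - MVP, computed per component."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(candidates: List[MV], mvp_idx: int, mvd: MV) -> MV:
    """Decoder side: look up the MVP by its signaled index, then add the MVD."""
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


candidates = [(12, -4), (8, 0)]   # hypothetical candidate motion vector list
mvp_idx = 0                       # index chosen by the encoder
mv = (15, -6)                     # actual MV found by the motion search
mvd = encode_mvd(mv, candidates[mvp_idx])
print(mvd)                                  # (3, -2)
print(decode_mv(candidates, mvp_idx, mvd))  # (15, -6)
```

Only the index and the small MVD are transmitted. The decoder can rebuild the same candidate list, so it recovers the exact motion vector.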
  • Merge mode: a candidate motion vector list is first constructed from the motion information of the spatially or temporally adjacent coded blocks of the current block; the optimal motion information is then determined from the candidate motion vector list as the motion information of the current block, and the index value of the position of the optimal motion information in the candidate motion vector list (referred to as the merge index, the same applies hereinafter) is passed to the decoder.
  • the spatial and temporal candidate motion information of the current block is shown in Figure 5.
  • the spatial candidate motion information comes from five spatially adjacent blocks (A0, A1, B0, B1, and B2).
  • if a neighboring block is unavailable (for example, it does not exist or is not inter-coded), its motion information is not added to the candidate motion vector list.
  • the temporal candidate motion information of the current block is obtained by scaling the MV of the co-located block in the reference frame according to the picture order counts (POCs) of the reference frame and the current frame.
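The temporal scaling can be sketched with the fixed-point arithmetic that HEVC uses to scale co-located motion vectors. Using this exact procedure here is an assumption, since this section only says the MV is scaled according to the POCs. tb and td are the POC distances defined in the docstring, and the result approximates mv · tb / td.

```python
def clip3(x: int, y: int, z: int) -> int:
    """Clamp z to the interval [x, y]."""
    return x if z < x else y if z > y else z


def scale_temporal_mv(mv: int, tb: int, td: int) -> int:
    """Scale one MV component of the co-located block by approximately tb / td.

    tb: POC(current picture) - POC(reference picture of the current block)
    td: POC(co-located picture) - POC(reference picture of the co-located block)
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    # tx = (16384 + |td| / 2) / td, with division truncating toward zero as in C
    num = 16384 + (abs(td) >> 1)
    tx = num // abs(td) if td > 0 else -(num // abs(td))
    dist_scale_factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)   # about 256 * tb / td
    prod = dist_scale_factor * mv
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))


print(scale_temporal_mv(64, 1, 2))  # 32
print(scale_temporal_mv(64, 4, 2))  # 128
```

Fixed-point arithmetic keeps the result bit-exact across encoders and decoders. Floating-point division could round differently on different platforms.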
  • POC picture order count
  • the positions of the neighboring blocks of the Merge mode and their traversal order are also predefined, and the positions of the neighboring blocks and their traversal order may be different in different modes.
  • With a translational motion model, all pixels in the coding block use the same motion information, that is, all pixels in the coding block move consistently. Motion compensation is performed with that motion information to obtain the predicted values of the pixels of the coding block.
  • not all pixels in a coded block have the same motion characteristics. Using the same motion information may cause inaccurate motion compensation predictions, which may increase residual information.
  • the existing video coding standards use block-matched motion estimation based on translational motion models.
  • However, many scenes contain objects with non-translational motion. Examples include rotating objects, roller coasters turning in different directions, fireworks, some special effects in movies, and especially moving objects in UGC (user-generated content) scenes. Coding such content with the translational block motion compensation of current coding standards greatly reduces coding efficiency. Non-translational motion models, such as the affine transformation model, were therefore introduced to further improve coding efficiency.
  • AMVP modes can be divided into AMVP modes based on translational models and AMVP modes based on non-translational models;
  • Merge modes can be divided into Merge modes based on translational motion models and Merge modes based on non-translational motion models.
  • Non-translational motion model prediction means that the encoder and the decoder use the same motion model to derive the motion information of each sub-motion compensation unit in the current block. Motion compensation is then performed with that motion information to obtain the prediction block, which improves prediction efficiency.
  • The sub-motion compensation unit in the embodiments of the present invention may be a single pixel, or an N1×N2 pixel block divided according to a specific method. N1 and N2 are both positive integers, and N1 may or may not be equal to N2.
  • Commonly used non-translational motion models are the 4-parameter and 6-parameter affine transformation models. Some application scenarios also use an 8-parameter bilinear model. Each model is described below.
  • The 4-parameter affine transformation model can be represented by the motion vectors of two pixels and their coordinates relative to the top-left vertex pixel of the current block.
  • the pixels used to represent the parameters of the motion model are called control points. If the upper-left vertex (0,0) and upper-right vertex (W, 0) pixels are used as control points, the motion vectors (vx0, vy0) and (vx1, vy1) of the upper-left vertex and upper-right vertex control points of the current block are determined first.
  • the motion information of each sub-motion compensation unit in the current block is obtained according to the following formula (3), where (x, y) is the coordinates of the sub-motion compensation unit relative to the top left vertex pixel of the current block, and W is the width of the current block.
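  • Formula (3) is not shown here. Assume it takes the usual 4-parameter form: vx = (vx1 - vx0)/W·x - (vy1 - vy0)/W·y + vx0 and vy = (vy1 - vy0)/W·x + (vx1 - vx0)/W·y + vy0. Under that assumption, a floating-point sketch follows. Real codecs use fixed-point arithmetic with shifts, and the function name is hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float]


def affine4_mv(cp0: Vec, cp1: Vec, w: float, x: float, y: float) -> Vec:
    """MV at (x, y), measured from the top-left pixel of the block, under the
    4-parameter model defined by the top-left (cp0) and top-right (cp1)
    control point MVs of a block of width w (assumed form of formula (3))."""
    (vx0, vy0), (vx1, vy1) = cp0, cp1
    a = (vx1 - vx0) / w  # scaling component
    b = (vy1 - vy0) / w  # rotation component
    return (a * x - b * y + vx0, b * x + a * y + vy0)


# The model reproduces both control points and extrapolates elsewhere.
print(affine4_mv((1, 2), (3, 4), 16, 0, 0))   # (1.0, 2.0)
print(affine4_mv((1, 2), (3, 4), 16, 16, 0))  # (3.0, 4.0)
```

Only two MVs are needed because the 4-parameter model describes translation, uniform scaling and rotation. The same two numbers, a and b, govern both components.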
  • the 6-parameter affine transformation model is shown in the following formula (4):
  • The 6-parameter affine transformation model can be represented by the motion vectors of three pixels and their coordinates relative to the top-left vertex pixel of the current block. If the upper-left vertex (0, 0), upper-right vertex (W, 0), and lower-left vertex (0, H) pixels are used as control points, the motion vectors of these three vertex control points of the current block are determined first.
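  • Formula (5) is likewise not shown here. Assume the common 6-parameter form: vx = (vx1 - vx0)/W·x + (vx2 - vx0)/H·y + vx0 and vy = (vy1 - vy0)/W·x + (vy2 - vy0)/H·y + vy0. In this form the top-right and bottom-left control points each act independently along their own edge. The per-position MV can then be sketched as follows, in floating point, with a hypothetical function name.

```python
from typing import Tuple

Vec = Tuple[float, float]


def affine6_mv(cp0: Vec, cp1: Vec, cp2: Vec, w: float, h: float,
               x: float, y: float) -> Vec:
    """MV at (x, y) under the 6-parameter model defined by the top-left (cp0),
    top-right (cp1) and bottom-left (cp2) control point MVs (assumed form of (5))."""
    (vx0, vy0), (vx1, vy1), (vx2, vy2) = cp0, cp1, cp2
    vx = (vx1 - vx0) / w * x + (vx2 - vx0) / h * y + vx0
    vy = (vy1 - vy0) / w * x + (vy2 - vy0) / h * y + vy0
    return (vx, vy)


print(affine6_mv((1, 2), (3, 4), (5, -2), 16, 8, 0, 8))  # (5.0, -2.0)
```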
  • the 8-parameter bilinear model is shown in the following formula (6):
  • The 8-parameter bilinear model can be represented by the motion vectors of four pixels and their coordinates relative to the top-left vertex pixel of the current coding block. The top-left vertex (0, 0), top-right vertex (W, 0), bottom-left vertex (0, H), and bottom-right vertex (W, H) may be used as control points.
  • In that case, the motion vectors (vx0, vy0), (vx1, vy1), (vx2, vy2), and (vx3, vy3) of these four vertex control points of the current coding block are determined first.
  • The motion information of each sub-motion compensation unit is then derived according to formula (7). In that formula, (x, y) are the coordinates of the top-left pixel of the sub-motion compensation unit relative to the current coding block, and W and H are the width and height of the current coding block, respectively.
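  • Formula (7) is not shown here. A natural reading of the 8-parameter bilinear model is bilinear interpolation of the four corner MVs. For each component, this expands to v = v0 + (v1 - v0)·x/W + (v2 - v0)·y/H + (v3 + v0 - v1 - v2)·xy/(WH). The sketch below assumes that form and uses a hypothetical function name.

```python
from typing import Tuple

Vec = Tuple[float, float]


def bilinear8_mv(cp0: Vec, cp1: Vec, cp2: Vec, cp3: Vec,
                 w: float, h: float, x: float, y: float) -> Vec:
    """MV at (x, y) by bilinear interpolation of the top-left (cp0),
    top-right (cp1), bottom-left (cp2) and bottom-right (cp3) MVs."""
    u, v = x / w, y / h
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    cps = (cp0, cp1, cp2, cp3)
    vx = sum(wt * cp[0] for wt, cp in zip(weights, cps))
    vy = sum(wt * cp[1] for wt, cp in zip(weights, cps))
    return (vx, vy)


# At the block centre every corner has weight 0.25.
print(bilinear8_mv((0, 0), (4, 0), (0, 4), (8, 8), 16, 16, 8, 8))  # (3.0, 3.0)
```

The xy cross term is what the two extra parameters buy over the 6-parameter model. It lets the bottom-right corner move independently of the other three.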
  • the coding block predicted by the affine transformation model can also be called an affine coding block.
  • the affine transformation model is directly related to the motion information of the control points of the affine coding block.
  • the AMVP mode based on the affine transformation model or the Merge mode based on the affine transformation model can be used to obtain the motion information of the control points of the affine coding block.
  • the motion information of the control points of the current coding block can be obtained by the inherited control point motion vector prediction method or the constructed control point motion vector prediction method.
  • the inherited control point motion vector prediction method refers to using an affine transformation model of an adjacent affine-coded block of the current block to determine a candidate control point motion vector of the current block.
  • the number of parameters (such as 4 parameters, 6 parameters, 8 parameters, etc.) of the affine transformation model of the affine coding block is consistent with the number of parameters of the affine transformation model of the current block.
  • the adjacent position blocks are not limited to A1, B1, B0, A0, and B2.
  • An adjacent position block may be a single pixel, or a pixel block of a preset size divided according to a specific method, for example a 4x4 or 4x2 pixel block. Blocks of other sizes are not excluded.
  • the affine coding block is a coded block adjacent to the current block (also referred to as an adjacent affine coding block) which is predicted by using an affine transformation model during the encoding stage.
  • Suppose the affine coding block containing A1 is a 4-parameter affine coding block, that is, it was predicted with a 4-parameter affine transformation model. Then the motion vector (vx4, vy4) of its upper-left vertex (x4, y4) and the motion vector (vx5, vy5) of its upper-right vertex (x5, y5) are obtained.
  • From these, the motion vector (vx0, vy0) of the upper-left vertex (x0, y0) and the motion vector (vx1, vy1) of the upper-right vertex (x1, y1) of the current block are derived. Their combination is a candidate control point motion vector of the current block.
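  • The formulas that map the neighboring block's model onto the current block are not shown here. Assume the neighbor's 4-parameter model has the same form as formula (3), anchored at its upper-left vertex (x4, y4) with width x5 - x4. Then the current block's control point MVs are simply that model evaluated at the current block's vertex coordinates, in picture coordinates. The sketch below uses hypothetical names. Passing a third target (x2, y2) also yields a lower-left control point.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Pt = Tuple[float, float]


def inherit_from_affine4(p4: Pt, v4: Vec, p5: Pt, v5: Vec,
                         targets: List[Pt]) -> List[Vec]:
    """Evaluate the neighbour's 4-parameter model, defined by its upper-left
    (p4, v4) and upper-right (p5, v5) control points, at each target position."""
    (x4, y4), (x5, _y5) = p4, p5
    (vx4, vy4), (vx5, vy5) = v4, v5
    wn = x5 - x4                 # width of the neighbouring affine block
    a = (vx5 - vx4) / wn
    b = (vy5 - vy4) / wn
    out = []
    for x, y in targets:
        dx, dy = x - x4, y - y4
        out.append((vx4 + a * dx - b * dy, vy4 + b * dx + a * dy))
    return out


# Neighbour spans x = 0..16 on row y = 0; current block's top edge spans x = 16..24.
print(inherit_from_affine4((0, 0), (1, 2), (16, 0), (3, 4), [(16, 0), (24, 0)]))
# [(3.0, 4.0), (4.0, 5.0)]
```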
  • Suppose instead that the affine coding block containing A1 is a 6-parameter affine coding block. Then three motion vectors are obtained: (vx4, vy4) of its upper-left vertex (x4, y4), (vx5, vy5) of its upper-right vertex (x5, y5), and (vx6, vy6) of its lower-left vertex (x6, y6).
  • From these, the motion vectors (vx0, vy0), (vx1, vy1), and (vx2, vy2) of the current block's upper-left vertex (x0, y0), upper-right vertex (x1, y1), and lower-left vertex (x2, y2) are derived. Their combination is a candidate control point motion vector of the current block.
  • The constructed control point motion vector prediction method combines the motion vectors of the coded blocks neighboring the control points of the current block. The result is used as the control point motion vectors of the current affine coding block, regardless of whether those neighboring coded blocks are affine-coded.
  • The constructed control point motion vector prediction method differs depending on the prediction mode (the AMVP mode or the Merge mode based on the affine transformation model). Each case is described below.
  • The motion vector of the coded block A2, B2, or B3 adjacent to the top-left vertex can be used as a candidate motion vector for the top-left vertex of the current block. Likewise, the motion vector of the coded block B1 or B0 adjacent to the top-right vertex is used as a candidate motion vector for the top-right vertex of the current block.
  • the candidate motion vectors of the upper left vertex and the upper right vertex are combined to form a plurality of two-tuples.
  • the motion vectors of two coded blocks included in the tuple can be used as candidate control point motion vectors of the current block.
  • the two-tuple can be seen in the following (13A):
  • v A2 represents the motion vector of A2
  • v B1 represents the motion vector of B1
  • v B0 represents the motion vector of B0
  • v B2 represents the motion vector of B2
  • v B3 represents the motion vector of B3.
  • The motion vector of the coded block A2, B2, or B3 adjacent to the top-left vertex can be used as a candidate motion vector for the top-left vertex of the current block.
  • The motion vector of the coded block B1 or B0 adjacent to the top-right vertex can be used as a candidate motion vector for the top-right vertex of the current block.
  • The motion vector of the coded block A0 or A1 adjacent to the bottom-left vertex serves as a candidate motion vector for the bottom-left vertex of the current block.
  • the above candidate motion vectors of the upper left vertex, the upper right vertex, and the lower left vertex are combined to form multiple triples.
  • the motion vectors of the three coded blocks included in the triple can be used as candidate control point motion vectors for the current block.
  • the multiple triples can be seen in the following formulas (13B) and (13C):
  • v A2 indicates the motion vector of A2
  • v B1 indicates the motion vector of B1
  • v B0 indicates the motion vector of B0
  • v B2 indicates the motion vector of B2
  • v B3 indicates the motion vector of B3
  • v A0 indicates the motion vector of A0
  • v A1 represents the motion vector of A1.
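  • Formulas (13A)-(13C) are not shown here. Assume they list every pairing of one top-left candidate (A2, B2, B3) with one top-right candidate (B1, B0). For triples, assume one bottom-left candidate (A0, A1) is added, with the A0 group listed before the A1 group. The enumeration is then a Cartesian product with unavailable positions skipped. The ordering and names below are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[int, int]

TOP_LEFT = ("A2", "B2", "B3")
TOP_RIGHT = ("B1", "B0")
BOTTOM_LEFT = ("A0", "A1")


def two_tuples(mvs: Dict[str, Vec]) -> List[Tuple[Vec, Vec]]:
    """Candidate (top-left, top-right) MV pairs; mvs holds only available positions."""
    return [(mvs[tl], mvs[tr]) for tl, tr in product(TOP_LEFT, TOP_RIGHT)
            if tl in mvs and tr in mvs]


def triples(mvs: Dict[str, Vec]) -> List[Tuple[Vec, Vec, Vec]]:
    """Candidate (top-left, top-right, bottom-left) MV triples, A0 group first."""
    return [(mvs[tl], mvs[tr], mvs[bl])
            for bl in BOTTOM_LEFT
            for tl, tr in product(TOP_LEFT, TOP_RIGHT)
            if tl in mvs and tr in mvs and bl in mvs]


mvs = {"A2": (1, 0), "B2": (1, 1), "B1": (2, 0), "B0": (2, 1), "A0": (0, 3), "A1": (0, 4)}
print(len(two_tuples(mvs)), len(triples(mvs)))  # B3 unavailable -> 4 8
```

With all seven positions available, this gives 3 × 2 = 6 two-tuples and 3 × 2 × 2 = 12 triples.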
  • FIG. 7 is only used as an example. Other methods of combining control point motion vectors may also be applied to the embodiments of the present invention, and details are not described herein.
  • The following describes the constructed control point motion vector prediction method for the Merge mode based on the affine transformation model in an embodiment of the present invention.
  • the motion vectors of the upper left vertex and the upper right vertex of the current block are determined by using the motion information of the coded blocks adjacent to the current coded block. It should be noted that FIG. 8 is only used as an example.
  • A0, A1, A2, B0, B1, B2, and B3 are the spatially adjacent positions of the current block and are used to predict CP1, CP2, or CP3;
  • T is the temporally adjacent position of the current block and is used to predict CP4.
  • The coordinates of CP1, CP2, CP3, and CP4 are (0, 0), (W, 0), (0, H), and (W, H), respectively, where W and H are the width and height of the current block. The motion information of each control point of the current block is obtained in the following order:
  • For CP1, the check sequence is B2 -> A2 -> B3. If B2 is available, the motion information of B2 is used; otherwise, A2 and then B3 are checked. If motion information is unavailable at all three positions, the motion information of CP1 cannot be obtained.
  • For CP2, the check sequence is B0 -> B1. If B0 is available, CP2 uses the motion information of B0; otherwise, B1 is checked. If motion information is unavailable at both positions, the motion information of CP2 cannot be obtained.
  • For CP3, the check sequence is A0 -> A1.
  • Here, "X is available" means that the block containing position X (X being A0, A1, A2, B0, B1, B2, B3, or T) has been coded and uses an inter prediction mode; otherwise, position X is unavailable. Other methods of obtaining the motion information of the control points may also be applicable to the embodiments of the present invention, and details are not described herein.
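  • The per-control-point scan can be sketched as a first-available search over fixed check orders. CP4's single position T follows from T being used to predict CP4. Two simplifying assumptions apply: motion information is represented by a bare MV (instead of MVs, reference indexes and prediction direction), and the names below are hypothetical.

```python
from typing import Dict, Optional, Tuple

Vec = Tuple[int, int]

CHECK_ORDER = {
    "CP1": ("B2", "A2", "B3"),
    "CP2": ("B0", "B1"),
    "CP3": ("A0", "A1"),
    "CP4": ("T",),
}


def control_point_motion(available: Dict[str, Vec]) -> Dict[str, Optional[Vec]]:
    """available maps a position to its MV only if that position is available
    (coded and inter-predicted). A control point is None if nothing is found."""
    return {cp: next((available[p] for p in order if p in available), None)
            for cp, order in CHECK_ORDER.items()}


print(control_point_motion({"A2": (1, 1), "B3": (2, 2), "B1": (3, 3), "T": (4, 4)}))
# {'CP1': (1, 1), 'CP2': (3, 3), 'CP3': None, 'CP4': (4, 4)}
```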
  • The motion information of the control points of the current block is combined to obtain the constructed control point motion information.
  • If the current block uses a 4-parameter affine transformation model, the motion information of two control points of the current block is combined into a two-tuple. This two-tuple is used to construct the 4-parameter affine transformation model.
  • The two control points may be combined as {CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, or {CP3, CP4}.
  • For example, a 4-parameter affine transformation model constructed from control points CP1 and CP2 is denoted as Affine (CP1, CP2).
  • If the current block uses a 6-parameter affine transformation model, the motion information of three control points of the current block is combined into a triple. This triple is used to construct the 6-parameter affine transformation model.
  • The three control points may be combined as {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, or {CP1, CP3, CP4}.
  • a 6-parameter affine transformation model constructed using a triplet composed of CP1, CP2, and CP3 control points can be described as Affine (CP1, CP2, CP3).
  • a quadruple formed by combining the motion information of the four control points of the current block is used to construct an 8-parameter bilinear model.
  • An 8-parameter bilinear model constructed from the quadruple of control points CP1, CP2, CP3, and CP4 is denoted as Bilinear (CP1, CP2, CP3, CP4).
  • A combination of the motion information of four control points (or four coded blocks) is referred to as a quadruple.
  • CurPoc represents the POC number of the current frame
  • DesPoc represents the POC number of the reference frame of the current block
  • SrcPoc represents the POC number of the reference frame of the control point
  • MV_s represents the motion vector obtained by scaling
  • MV represents the motion vector of the control point.
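  • The scaling formula that uses these symbols is not shown here. Assume it is the usual POC-distance ratio, MV_s = (CurPoc - DesPoc) / (CurPoc - SrcPoc) × MV. A floating-point sketch follows. Real codecs use clipped fixed-point arithmetic, and the function name is hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float]


def scale_mv(mv: Vec, cur_poc: int, des_poc: int, src_poc: int) -> Vec:
    """Scale a control point MV that points to SrcPoc so that it points to DesPoc."""
    if cur_poc == src_poc:
        raise ValueError("zero temporal distance between current and source frame")
    factor = (cur_poc - des_poc) / (cur_poc - src_poc)
    return (mv[0] * factor, mv[1] * factor)


print(scale_mv((8, -4), cur_poc=8, des_poc=4, src_poc=6))  # (16.0, -8.0)
```

The target reference frame in this example is twice as far away in display order as the source reference frame, so the MV is doubled.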
  • Combinations of different control points can also be converted into control points at the same positions.
  • A 4-parameter affine transformation model obtained from {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, or {CP3, CP4} is converted so that it is represented by the control points {CP1, CP2} or {CP1, CP2, CP3}.
  • The conversion works in two steps. First, the motion vectors and coordinates of the control points are substituted into formula (2) above to obtain the model parameters. Then the coordinates of {CP1, CP2} are substituted into formula (3) above to obtain their motion vectors.
  • The conversion can be performed according to the following formulas (15)-(23). In these formulas, W is the width and H the height of the current block. (vx0, vy0) denotes the motion vector of CP1, (vx1, vy1) that of CP2, (vx2, vy2) that of CP3, and (vx3, vy3) that of CP4.
  • The conversion from {CP1, CP3} to {CP1, CP2} or {CP1, CP2, CP3} can be realized by the following formula (16):
  • The conversion from {CP2, CP4} to {CP1, CP2} can be realized by the following formula (20), and the conversion from {CP2, CP4} to {CP1, CP2, CP3} by formulas (20) and (21):
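  • Formulas (15)-(23) are not shown here, but the method above can be sketched generically: fit the 4-parameter model to two control points, then evaluate it at CP1, CP2 and CP3. Assume formula (3)'s form and write positions z = x + iy and MVs as complex numbers. The 4-parameter model is then v(z) = v(CP1) + β·z, so two control points determine β by a single complex division. This complex-number formulation is our own device, not the patent's notation, and the names are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float]


def cp_positions(w: float, h: float) -> Dict[str, complex]:
    """Control point coordinates as complex numbers x + iy."""
    return {"CP1": 0j, "CP2": complex(w, 0), "CP3": complex(0, h), "CP4": complex(w, h)}


def convert_affine4(cps: Dict[str, Vec], w: float, h: float) -> Dict[str, Vec]:
    """Re-express a 4-parameter model given by exactly two control points
    as the MVs of CP1, CP2 and CP3."""
    pos = cp_positions(w, h)
    (n1, v1), (n2, v2) = cps.items()
    z1, z2 = pos[n1], pos[n2]
    m1, m2 = complex(*v1), complex(*v2)
    beta = (m2 - m1) / (z2 - z1)  # a + ib: scaling and rotation terms
    alpha = m1 - beta * z1        # MV at CP1 (z = 0)
    out = {}
    for name in ("CP1", "CP2", "CP3"):
        m = alpha + beta * pos[name]
        out[name] = (m.real, m.imag)
    return out


print(convert_affine4({"CP1": (1, 2), "CP3": (-3, 4)}, w=16, h=8))
# CP2 is approximately (5, 10), matching vx1 = vx0 + (vy2 - vy0)*W/H, vy1 = vy0 - (vx2 - vx0)*W/H
```

Complex multiplication by β = a + ib reproduces exactly the rotation-plus-scaling structure of the 4-parameter model. That is why one division replaces a separate case for each control point pair.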
  • The 6-parameter affine transformation models of {CP1, CP2, CP4}, {CP2, CP3, CP4}, and {CP1, CP3, CP4} are converted so that they are represented by the control points {CP1, CP2, CP3}.
  • The conversion works in two steps. First, the motion vectors and coordinates of the control points are substituted into formula (4) above to obtain the model parameters. Then the coordinates of {CP1, CP2, CP3} are substituted into formula (5) above to obtain their motion vectors.
  • The conversion can be performed according to the following formulas (24)-(26). In these formulas, W is the width and H the height of the current block. (vx0, vy0) denotes the motion vector of CP1, (vx1, vy1) that of CP2, (vx2, vy2) that of CP3, and (vx3, vy3) that of CP4.
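  • Formulas (24)-(26) are not shown here. Because a 6-parameter affine model is linear in (x, y), the MVs at the four corners of the block satisfy v(CP1) + v(CP4) = v(CP2) + v(CP3). So whichever of CP1, CP2 or CP3 is missing from a triple containing CP4 can be recovered with one addition and one subtraction. The sketch below relies only on that identity and uses hypothetical names. It should agree with fitting formula (4) and evaluating formula (5).

```python
from typing import Dict, Tuple

Vec = Tuple[float, float]

# Corners on the same diagonal of the block: CP1-CP4 and CP2-CP3.
DIAGONAL = {"CP1": "CP4", "CP4": "CP1", "CP2": "CP3", "CP3": "CP2"}


def convert_affine6(cps: Dict[str, Vec]) -> Dict[str, Vec]:
    """Given MVs at exactly three block corners, return the MVs of CP1, CP2, CP3."""
    if len(cps) != 3:
        raise ValueError("expected exactly three control points")
    (missing,) = set(DIAGONAL) - set(cps)
    opposite = DIAGONAL[missing]
    a, b = (n for n in cps if n != opposite)  # the two corners adjacent to `missing`
    full = dict(cps)
    full[missing] = (cps[a][0] + cps[b][0] - cps[opposite][0],
                     cps[a][1] + cps[b][1] - cps[opposite][1])
    return {n: full[n] for n in ("CP1", "CP2", "CP3")}


print(convert_affine6({"CP1": (1, 2), "CP2": (3, 4), "CP4": (7, 0)}))
# {'CP1': (1, 2), 'CP2': (3, 4), 'CP3': (5, -2)}
```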
  • If the candidate motion vector list is empty at this point, the candidate motion information of the control points is added to the list. Otherwise, the motion information already in the list is traversed to check whether identical motion information exists. If none does, the candidate motion information of the control points is added to the list.
  • The combinations are traversed in a preset order, for example: Affine (CP1, CP2, CP3) -> Affine (CP1, CP2, CP4) -> Affine (CP1, CP3, CP4) -> Affine (CP2, CP3, CP4) -> Affine (CP1, CP2) -> Affine (CP1, CP3) -> Affine (CP2, CP3) -> Affine (CP1, CP4) -> Affine (CP2, CP4) -> Affine (CP3, CP4). That is 10 combinations in total.
  • If the control point motion information required by a combination is unavailable, the combination is considered unavailable. If a combination is available, its reference frame index is determined as follows:
    - When two control points are selected, the smaller reference frame index is used as the reference frame index of the combination.
    - When more than two control points are selected, the reference frame index that occurs most often is used. If several reference frame indexes occur equally often, the smallest of them is used.
  • The motion vectors of the control points are then scaled. If the motion information of all control points is identical after scaling, the combination is illegal.
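  • The reference-index rule and the legality check above can be sketched as follows. Reducing each control point's motion information to a single reference index and a scaled MV is a simplification, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Vec = Tuple[int, int]


def combination_ref_idx(ref_indices: Sequence[int]) -> int:
    """Reference frame index of a control point combination."""
    if len(ref_indices) == 2:
        return min(ref_indices)                 # two control points: smallest index
    counts = Counter(ref_indices)
    top = max(counts.values())                  # most frequent index wins...
    return min(i for i, c in counts.items() if c == top)  # ...ties -> smallest


def is_illegal(scaled_mvs: Sequence[Vec]) -> bool:
    """A combination whose scaled control point MVs are all identical is discarded."""
    return all(mv == scaled_mvs[0] for mv in scaled_mvs)


print(combination_ref_idx([1, 1, 0]), combination_ref_idx([2, 0, 1]))  # 1 0
```

An all-identical set of control point MVs describes pure translation. An affine candidate of that kind adds nothing over a translational one, which is presumably why it is rejected.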
  • The embodiment of the present invention may also pad the candidate motion vector list.
  • For example, if the list is shorter than the maximum list length (such as MaxAffineNumMrgCand) after the above traversal, it is padded until its length equals the maximum list length.
  • Padding can be done by adding zero motion vectors, or by combining or taking weighted averages of the motion information of candidates already in the list. Other methods of padding the candidate motion vector list may also be applied to the embodiments of the present invention, and details are not described herein.
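  • The duplicate check and the zero-MV padding described above might look like this. A candidate is modelled as a reference index plus a tuple of control point MVs. Padding always repeats the same zero candidate with reference index 0; this is an assumption, since a codec may vary the reference index. All names are hypothetical, including max_len, which stands in for MaxAffineNumMrgCand.

```python
from typing import List, Tuple

Vec = Tuple[int, int]
Candidate = Tuple[int, Tuple[Vec, ...]]  # (reference index, control point MVs)


def add_if_new(cands: List[Candidate], cand: Candidate) -> bool:
    """Append cand unless identical motion information is already in the list."""
    for existing in cands:          # an empty list skips straight to append
        if existing == cand:
            return False
    cands.append(cand)
    return True


def pad_with_zero(cands: List[Candidate], max_len: int, num_cps: int) -> List[Candidate]:
    """Fill the list up to max_len with zero-MV candidates."""
    zero: Candidate = (0, tuple((0, 0) for _ in range(num_cps)))
    while len(cands) < max_len:
        cands.append(zero)
    return cands


lst: List[Candidate] = []
add_if_new(lst, (0, ((1, 1), (2, 2))))
add_if_new(lst, (0, ((1, 1), (2, 2))))   # duplicate, rejected
print(len(pad_with_zero(lst, max_len=5, num_cps=2)))  # 5
```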
  • In the inherited control point motion vector prediction method described above, the non-translational motion model used within an image sequence is fixed. Affine transformation models used by different blocks in an image therefore have the same number of parameters: the affine coding block and the current block use models with the same number of parameters.
  • Consequently, the number of control points of the affine coding block and their positions in that block are the same as the number of control points of the current block and their positions in the current block.
  • For example, if the affine decoding block uses a 4-parameter affine transformation model, the current block also uses a 4-parameter affine transformation model. The decoder obtains the motion vector information of each sub-block in the current block from the current block's 4-parameter model and thereby reconstructs each sub-block.
  • Likewise, if the affine decoding block uses an 8-parameter bilinear model, the current block also uses an 8-parameter bilinear model. The decoder obtains the motion vector information of each sub-block in the current block from the current block's 8-parameter bilinear model and thereby reconstructs each sub-block.
  • the affine motion of different blocks in the image may be different (that is, the affine motion of the current block may be different from the affine motion of the affine coded block).
  • Tying the current block's affine transformation model to that of the affine coding block therefore has a cost. Parsing the current block (for example, constructing the candidate motion vector list) and reconstructing it with that tied model lowers its coding efficiency and prediction accuracy, and in some scenarios it remains difficult to meet user needs.
  • The embodiment of the present invention improves the inherited control point motion vector prediction method described above with two improvement schemes: a first improvement scheme and a second improvement scheme.
  • The first improvement scheme may also be referred to as the first motion vector prediction method based on a motion model.
  • The second improvement scheme may also be referred to as the second motion vector prediction method based on a motion model. Both are described below.
  • In the first motion vector prediction method based on a motion model, the affine transformation models used by different blocks in an image sequence are not required to be the same. Different blocks may use different affine transformation models.
  • The affine transformation model used by the current block may be predefined. Alternatively, it may be selected from multiple affine transformation models according to the actual motion or actual requirements of the current block.
  • Consider a neighboring block of the current block, called an affine coding block at the encoding end and an affine decoding block at the decoding end. Suppose it uses a 2×N-parameter affine transformation model while the current block uses a 2×K-parameter affine transformation model, with N ≠ K. Then the motion vectors (candidate motion vectors) of the K control points of the current block are obtained by interpolation using the neighboring block's 2×N-parameter affine transformation model.
  • the process of determining the candidate motion vector of the control point of the current block is described below using A1 as an example shown in FIG. 10.
  • the determination process is mainly described from the perspective of the decoding end.
  • the adjacent block where A1 is located is an affine decoding block.
  • The encoding end is implemented by analogy, that is, for the case where the neighboring block of the current block at the encoding end is an affine coding block, and is not described again here.
  • Suppose the affine decoding block containing A1 uses a 6-parameter affine transformation model. Then three motion vectors are obtained: (vx4, vy4) of its upper-left vertex (x4, y4), (vx5, vy5) of its upper-right vertex (x5, y5), and (vx6, vy6) of its lower-left vertex (x6, y6).
  • The motion vectors of these three control points of the affine decoding block form a 6-parameter affine transformation model. Interpolating with this model according to formulas (27) and (28) below yields the motion vector (vx0, vy0) of the current block's upper-left vertex (x0, y0) and the motion vector (vx1, vy1) of its upper-right vertex (x1, y1):
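  • Formulas (27)-(28) and (34)-(36) are not shown here. Assume the neighbor's 6-parameter model has the same form as formula (5), anchored at its upper-left vertex (x4, y4) with width x5 - x4 and height y6 - y4. Then the current block's control point MVs are that model evaluated at the current block's vertices. Passing two target positions gives the 4-parameter case above; passing three gives the 6-parameter case. Names are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Pt = Tuple[float, float]


def inherit_from_affine6(p4: Pt, v4: Vec, p5: Pt, v5: Vec, p6: Pt, v6: Vec,
                         targets: List[Pt]) -> List[Vec]:
    """Evaluate the neighbour's 6-parameter model, defined by its upper-left (p4),
    upper-right (p5) and lower-left (p6) control points, at each target position."""
    (x4, y4), (x5, _), (_, y6) = p4, p5, p6
    wn, hn = x5 - x4, y6 - y4          # neighbour width and height
    out = []
    for x, y in targets:
        u, v = (x - x4) / wn, (y - y4) / hn
        out.append((v4[0] + (v5[0] - v4[0]) * u + (v6[0] - v4[0]) * v,
                    v4[1] + (v5[1] - v4[1]) * u + (v6[1] - v4[1]) * v))
    return out


print(inherit_from_affine6((0, 0), (1, 2), (16, 0), (3, 4), (0, 8), (5, -2),
                           [(16, 8), (8, 4)]))
# [(7.0, 0.0), (4.0, 1.0)]
```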
  • Suppose the affine decoding block containing A1 uses a 4-parameter affine transformation model. Then the motion vectors of its two control points are obtained: (vx4, vy4) of the upper-left control point (x4, y4) and (vx5, vy5) of the upper-right control point (x5, y5).
  • These two control points of the affine decoding block form a 4-parameter affine transformation model. Interpolating with this model according to formulas (29), (30), and (31) below yields three motion vectors of the current block: (vx0, vy0) of its upper-left vertex (x0, y0), (vx1, vy1) of its upper-right vertex (x1, y1), and (vx2, vy2) of its lower-left vertex (x2, y2):
  • The current block may also use the same number of model parameters as the neighboring block.
  • If the affine decoding block uses a 4-parameter affine transformation model, the motion vectors of its two control points form that model. Interpolating with it according to formulas (32) and (33) below yields the motion vector (vx0, vy0) of the current block's upper-left vertex (x0, y0) and the motion vector (vx1, vy1) of its upper-right vertex (x1, y1):
  • If the affine decoding block uses a 6-parameter affine transformation model, three motion vectors are obtained: (vx4, vy4) of its upper-left vertex (x4, y4), (vx5, vy5) of its upper-right vertex (x5, y5), and (vx6, vy6) of its lower-left vertex (x6, y6).
  • The motion vectors of these three control points form a 6-parameter affine transformation model. Interpolating with it according to formulas (34), (35), and (36) below yields three motion vectors of the current block: (vx0, vy0) of its upper-left vertex (x0, y0), (vx1, vy1) of its upper-right vertex (x1, y1), and (vx2, vy2) of its lower-left vertex (x2, y2):
  • The first motion vector prediction method based on a motion model lets the current block build its own affine transformation model from the neighboring block's model during the parsing phase (for example, when constructing the candidate motion vector list). The two models may differ. Because the current block's own model better matches its actual motion or requirements, this scheme can improve the coding efficiency and prediction accuracy for the current block and meet user needs.
  • In the second motion vector prediction method based on a motion model, the affine transformation models used by different blocks in an image sequence are not restricted. Different blocks may use the same or different affine transformation models. Suppose a neighboring block of the current block (called an affine coding block at the encoding end and an affine decoding block at the decoding end) uses a 2×N-parameter affine transformation model and the current block uses a 2×K-parameter one. Then N may or may not be equal to K.
  • In the parsing phase, the motion vectors of the control points of the current block (for example, 2, 3, or 4 control points) may be obtained in either of two ways: with the control point motion vector prediction method described in "3)" above, or with the first motion vector prediction method based on a motion model described in "5)" above.
  • Then, in the reconstruction phase, a 6-parameter affine transformation model is uniformly used to obtain the motion vector information of each sub-block in the current block, and each sub-block is reconstructed.
  • The following again uses A1 shown in FIG. 6 as an example. It describes, from the perspective of the decoding end, how the candidate motion vectors of the control points of the current block are determined; other cases follow by analogy.
  • Suppose the current block uses a 4-parameter affine transformation model in the parsing phase. The neighboring block may use a 4-parameter affine transformation model or an affine model with another number of parameters.
  • The motion vectors of two control points of the current block are then obtained, for example the motion vector (vx0, vy0) of its upper-left control point (x0, y0) and the motion vector (vx1, vy1) of its upper-right control point (x1, y1).
  • a 6-parameter affine transformation model needs to be constructed according to the motion vectors of the 2 control points of the current block.
  • In the reconstruction phase, formula (40) below is used to obtain the motion vector of a third control point, for example the motion vector (vx2, vy2) of the lower-left vertex (x2, y2) of the current block:
  • W represents the width of the current block
  • H represents the height of the current block
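  • Formula (40) is not shown here. Evaluating the 4-parameter model at the lower-left vertex (0, H), assuming formula (3)'s form, gives vx2 = vx0 - (vy1 - vy0)·H/W and vy2 = vy0 + (vx1 - vx0)·H/W. This is one plausible reading of formula (40), and the sketch below implements that derivation with a hypothetical function name. The three resulting MVs, fed into a 6-parameter model, reproduce exactly the motion field of the original 4-parameter model. The conversion therefore loses nothing.

```python
from typing import Tuple

Vec = Tuple[float, float]


def third_control_point(cp0: Vec, cp1: Vec, w: float, h: float) -> Vec:
    """Lower-left control point MV implied by the 4-parameter model of the
    upper-left (cp0) and upper-right (cp1) MVs of a w x h block."""
    (vx0, vy0), (vx1, vy1) = cp0, cp1
    vx2 = vx0 - (vy1 - vy0) * h / w
    vy2 = vy0 + (vx1 - vx0) * h / w
    return (vx2, vy2)


print(third_control_point((1, 2), (3, 4), 16, 16))  # (-1.0, 4.0)
```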
  • Three motion vectors of the current block are then combined: (vx0, vy0) of its upper-left control point (x0, y0), (vx1, vy1) of its upper-right control point (x1, y1), and (vx2, vy2) of its lower-left vertex (x2, y2). The result is the current block's 6-parameter affine model for the reconstruction phase.
  • the formula of the 6-parameter affine model is shown in the following formula (37):
  • The second motion vector prediction method based on a motion model predicts the current block uniformly with a 6-parameter affine transformation model in its reconstruction phase. The more parameters a motion model uses to describe the affine motion of the current block, the higher its accuracy, but also the higher its computational complexity.
  • the 6-parameter affine transformation model constructed in the reconstruction phase of this solution can describe the affine transformation of image blocks such as translation, scaling, rotation, etc., and achieves a good balance between model complexity and modeling capabilities. Therefore, the implementation of this solution can improve the coding efficiency and accuracy of prediction of the current block, and meet user needs.
  • first improvement scheme and the second improvement scheme may also be implemented together.
  • For example, the first motion vector prediction method based on a motion model described in "5)" above can be used to obtain the motion vectors of 2 control points of the current block. Then, following the second motion vector prediction method based on a motion model described in "6)" above, these 2 motion vectors are unified into a 6-parameter affine transformation model, which is used to reconstruct each sub-block of the current block.
  • Alternatively, the first method described in "5)" above can be used to obtain the motion vectors of 3 control points of the current block. Then, according to formula (32) of the second method described in "6)" above, these 3 motion vectors are combined into a 6-parameter affine transformation model, which is used to reconstruct each sub-block of the current block.
  • the comprehensive implementation scheme may also be other embodiments, which are not described in detail here.
  • the AMVP mode (Affine AMVP mode) based on the affine transformation model and the Merge mode (Affine Merge mode) based on the affine transformation model are further described below.
  • In the affine AMVP mode, the candidate motion vector list for the AMVP mode (also called the control point motion vector predictor candidate list) may be constructed with the first motion vector prediction method based on a motion model and/or the constructed control point motion vector prediction method.
  • Alternatively, the same list may be constructed with the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method.
  • Each control point motion vector predictor in the list contains as many candidate control point motion vectors as the current block's model has control points:
    - 2, if the current block uses a 4-parameter affine transformation model;
    - 3, if it uses a 6-parameter affine transformation model;
    - 4, if it uses an 8-parameter bilinear model.
  • the control point motion vector predictor candidate list may also be pruned and sorted according to a specific rule, and truncated or padded to a specific number of entries.
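The list post-processing described above (prune, sort, then truncate or pad) can be sketched as follows. This is a minimal sketch under stated assumptions. Each candidate is a tuple of integer (vx, vy) control point motion vectors, and duplicates are pruned by exact equality. The sorting rule is supplied by the caller, and short lists are padded with zero-motion tuples. The text specifies neither the sorting rule nor the fill value, so `finalize_candidates` and the zero padding are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MV = Tuple[int, int]          # (vx, vy)
Candidate = Tuple[MV, ...]    # 2, 3 or 4 control point motion vectors


def finalize_candidates(
    candidates: List[Candidate],
    max_len: int,
    num_cps: int,
    sort_key: Optional[Callable[[Candidate], float]] = None,
) -> List[Candidate]:
    """Prune duplicates, optionally sort, then truncate or pad to max_len."""
    pruned: List[Candidate] = []
    for cand in candidates:
        if cand not in pruned:             # keep only the first copy of each candidate
            pruned.append(cand)
    if sort_key is not None:
        pruned.sort(key=sort_key)          # hypothetical caller-supplied ordering rule
    pruned = pruned[:max_len]              # truncate to the target length
    zero: Candidate = tuple((0, 0) for _ in range(num_cps))
    while len(pruned) < max_len:           # pad with zero-motion tuples (assumed fill value)
        pruned.append(zero)
    return pruned


cands = [((1, 2), (3, 4)), ((1, 2), (3, 4)), ((5, 6), (7, 8))]
print(finalize_candidates(cands, max_len=3, num_cps=2))
# [((1, 2), (3, 4)), ((5, 6), (7, 8)), ((0, 0), (0, 0))]
```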
  • for each control point motion vector predictor in the candidate list, the encoder uses formula (3), (5), or (7) above to obtain the motion vector of each sub-motion compensation unit in the current coding block. It then takes the pixel value at the corresponding position in the reference frame pointed to by that motion vector as the unit's predicted value, thereby performing motion compensation with the affine transformation model.
  • the control point motion vector predictor is used as the search starting point for a motion search within a certain search range to obtain the control point motion vectors (CPMV). The difference between the control point motion vectors and the control point motion vector predictor (the control point motion vector difference, CPMVD) is then calculated.
  • the encoder encodes the CPMVD and the index value indicating the position of the control point motion vector predictor in the candidate list into the code stream, and passes the code stream to the decoding end.
  • the decoder (such as the aforementioned video decoder 200) parses the index value and the control point motion vector difference (CPMVD) from the code stream. According to the index value, it determines the control point motion vector predictor (CPMVP) from the control point motion vector predictor candidate list.
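The AMVP encoder steps above (search outward from the predictor, then transmit the difference) can be illustrated with a deliberately simple sketch. It rests on several assumptions:

- motion vectors are integers;
- each control point is searched independently with an exhaustive window of ±range;
- the matching cost is a caller-supplied function.

Practical affine encoders search the control points jointly and use faster strategies. `search_around` and `mv_difference` are hypothetical names.

```python
from itertools import product
from typing import Callable, Tuple

MV = Tuple[int, int]


def search_around(start: MV, search_range: int, cost: Callable[[MV], float]) -> MV:
    """Exhaustively test every integer offset within +/-search_range of start."""
    best, best_cost = start, cost(start)
    for dx, dy in product(range(-search_range, search_range + 1), repeat=2):
        cand = (start[0] + dx, start[1] + dy)
        c = cost(cand)
        if c < best_cost:                  # keep the lowest-cost motion vector
            best, best_cost = cand, c
    return best


def mv_difference(mv: MV, mvp: MV) -> MV:
    """Difference for one control point: CPMV - CPMVP."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def toy_cost(mv: MV) -> float:
    """Toy cost with its minimum at (3, -1); a real encoder measures distortion."""
    return (mv[0] - 3) ** 2 + (mv[1] + 1) ** 2


cpmvp = (1, 0)
cpmv = search_around(cpmvp, 4, toy_cost)
print(cpmv, mv_difference(cpmv, cpmvp))  # (3, -1) (2, -1)
```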
  • the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method may be used to construct the candidate motion vector list for the Merge mode (also called the control point motion vector merge candidate list).
  • the control point motion vector merge candidate list may be pruned and sorted according to a specific rule, and truncated or padded to a specific number of entries.
  • for each control point motion vector in the merge candidate list, the encoder uses formula (3), (5), or (7) to obtain the motion vector of each sub-motion compensation unit in the current coding block. A sub-motion compensation unit is a pixel, or an M×N pixel block obtained by a specific partitioning method. The encoder takes the pixel value at the position in the reference frame pointed to by each sub-motion compensation unit's motion vector as that unit's predicted value, performing affine motion compensation.
  • the encoder calculates the average difference between the original value and the predicted value of each pixel in the current coding block. It then selects the control point motion vectors with the smallest average difference as the motion vectors of the 2, 3, or 4 control points of the current coding block.
  • the index value indicating the position of the control point motion vector in the candidate list is encoded into a code stream and sent to the decoding end.
  • the decoder (such as the aforementioned video decoder 200) parses the index value. According to the index value, it determines the control point motion vectors (CPMV) from the control point motion vector merge candidate list.
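The Merge-mode encoder decision above can be sketched as a minimum-cost selection. This assumes the affine prediction for each candidate has already been computed. It also assumes that the "average difference" is the mean absolute difference; the text does not say whether the difference is absolute or squared. `mean_abs_diff` and `select_merge_index` are hypothetical helpers.

```python
from typing import List, Sequence

Block = List[List[int]]  # rows of pixel values


def mean_abs_diff(orig: Block, pred: Block) -> float:
    """Mean absolute difference between original and predicted pixels (assumed metric)."""
    total = 0
    count = 0
    for row_o, row_p in zip(orig, pred):
        for o, p in zip(row_o, row_p):
            total += abs(o - p)
            count += 1
    return total / count


def select_merge_index(orig: Block, predictions: Sequence[Block]) -> int:
    """Index of the candidate whose affine prediction has the smallest mean difference."""
    costs = [mean_abs_diff(orig, pred) for pred in predictions]
    return min(range(len(costs)), key=costs.__getitem__)


orig = [[10, 10], [10, 10]]
preds = [[[12, 12], [12, 12]], [[10, 11], [10, 10]]]
print(select_merge_index(orig, preds))  # 1
```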
  • “At least one” means one or more, and “multiple” means two or more.
  • “And/or” describes an association between related objects and indicates that three relationships may exist. For example, A and/or B may indicate that A exists alone, that A and B both exist, or that B exists alone, where A and B may be singular or plural.
  • the character “/” generally indicates an “or” relationship between the associated objects.
  • “At least one of the following items” or a similar expression refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where each of a, b, and c may be single or multiple.
  • an embodiment of the present invention provides a motion vector prediction method.
  • the method may be executed by the video decoder 200, and specifically by the inter predictor 210 of the video decoder 200.
  • based on a video data stream containing multiple video frames, the video decoder 200 may perform some or all of the following steps. These steps predict the motion information of each sub-block of the block currently being decoded (referred to as the current block) in the current video frame, and perform motion compensation.
  • the method includes, but is not limited to, the following steps:
  • Step 601 Parse the bitstream to determine the inter prediction mode of the current decoded block.
  • the video decoder 200 at the decoding end may parse syntax elements in the code stream transmitted from the encoding end to obtain indication information of the inter prediction mode. It then determines the inter prediction mode of the current block according to that indication information.
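  • if the inter prediction mode of the current block is the AMVP mode based on the affine transformation model, steps 602a to 606a are performed subsequently.
  • if the inter prediction mode of the current block is the Merge mode based on the affine transformation model, steps 602b to 605b are performed subsequently.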
  • Step 602a Construct a candidate motion vector list of the AMVP mode of the affine transformation model.
  • the first motion-model-based motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block and add them to the candidate motion vector list corresponding to the AMVP mode.
  • the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may also both be used. The candidate motion vectors that each method obtains for the control points of the current block are added to the candidate motion vector list corresponding to the AMVP mode.
  • the candidate motion vector list of the AMVP mode may be a two-tuple list, and the two-tuple list includes one or more two-tuples for constructing a 4-parameter affine transformation model.
  • the candidate motion vector list of the AMVP mode may be a triple list, and the triple list includes one or more triples for constructing a 6-parameter affine transformation model.
  • the candidate motion vector list of the AMVP mode may be a list of quads, and the list of quads includes one or more quads used to construct the 8-parameter bilinear model.
  • the candidate motion vector 2-tuple/triple/quad list may be pruned and sorted according to a specific rule, and truncated or padded to a specific number of entries.
  • for the first motion-model-based motion vector prediction method, taking FIG. 10 as an example, the neighboring position blocks around the current block may be traversed in the order A1→B1→B0→A0→B2 shown in FIG. 10. The traversal finds the affine decoding block in which each neighboring position block is located (for example, the affine decoding block containing A1 in FIG. 10). The control points of that affine decoding block are used to construct its affine transformation model. That model is then used to derive candidate motion vectors (for example, a candidate motion vector 2-tuple, triple, or quad) for the control points of the current block, and these are added to the candidate motion vector list corresponding to the AMVP mode. It should be noted that other search orders may also be applicable to the embodiments of the present invention, and details are not described herein.
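The neighbor traversal above can be sketched as a scan over named positions. The FIG. 10 scan order is taken from the text. The `NeighborBlock` record, its fields, and the rule that a block reached from two positions is used only once are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SCAN_ORDER = ("A1", "B1", "B0", "A0", "B2")  # traversal order shown in FIG. 10


@dataclass
class NeighborBlock:
    block_id: int      # hypothetical identifier of the coded block covering a position
    is_affine: bool    # True if the block was coded with an affine model
    num_params: int    # 4, 6 or 8 for affine blocks


def scan_affine_neighbors(
    neighbors: Dict[str, Optional[NeighborBlock]],
) -> List[NeighborBlock]:
    """Return affine decoding blocks in scan order, skipping unavailable or non-affine ones."""
    found: List[NeighborBlock] = []
    for pos in SCAN_ORDER:
        blk = neighbors.get(pos)
        if blk is None or not blk.is_affine:
            continue
        if any(b.block_id == blk.block_id for b in found):
            continue   # assumed: a block reached from two positions is used once
        found.append(blk)
    return found


a = NeighborBlock(1, True, 6)
b = NeighborBlock(2, True, 4)
c = NeighborBlock(3, False, 0)
print([blk.block_id for blk in scan_affine_neighbors({"A1": a, "B1": b, "B0": c, "A0": a})])  # [1, 2]
```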
  • both the encoding end and the decoding end may first use affine decoding blocks whose model has the same number of parameters as the current block's model. These blocks yield candidate motion vectors for the control points of the current block, which are added to the candidate motion vector list corresponding to the AMVP mode. Affine decoding blocks whose model has a different number of parameters are then used to obtain further candidate motion vectors, which are added to the same list.
  • for example, suppose the current decoding block uses a 4-parameter affine transformation model. After the neighboring blocks around the current block are traversed, it is determined that the affine decoding block containing B1 uses a 4-parameter affine transformation model, while the affine decoding block containing A1 uses a 6-parameter affine transformation model.
  • in that case, the motion vectors of the two control points of the current block may first be derived from the affine decoding block containing B1 and added to the list. They are then derived from the affine decoding block containing A1 and added to the list.
  • as another example, suppose the current decoding block uses a 6-parameter affine transformation model. After the neighboring blocks around the current block are traversed, it is determined that the affine decoding block containing A1 uses a 6-parameter affine transformation model, while the affine decoding block containing B1 uses a 4-parameter affine transformation model. The motion vectors of the three control points of the current block may then first be derived from the affine decoding block containing A1 and added to the list. They are then derived from the affine decoding block containing B1 and added to the list.
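The ordering rule in the two examples above can be expressed in a few lines. Neighbors whose model has the same number of parameters as the current block come first, the rest follow, and each group keeps its scan order. The `(position, num_params)` pair is a hypothetical simplification of a neighbor block.

```python
from typing import List, Tuple

Neighbor = Tuple[str, int]  # (position, number of model parameters), in scan order


def order_by_model_match(neighbors: List[Neighbor], current_params: int) -> List[Neighbor]:
    """Matching parameter count first; scan order is preserved within each group."""
    same = [nb for nb in neighbors if nb[1] == current_params]
    different = [nb for nb in neighbors if nb[1] != current_params]
    return same + different


# First example above: the current block is 4-parameter, A1 uses 6 parameters, B1 uses 4.
print(order_by_model_match([("A1", 6), ("B1", 4)], 4))  # [('B1', 4), ('A1', 6)]
```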
  • the affine transformation models used by different blocks are not restricted. That is, the affine transformation model used by the current block may have the same number of parameters as that of the affine decoding block, or a different number.
  • the affine transformation model used by the current block may be determined by parsing the code stream, in which case the code stream includes indication information of the current block's affine transformation model. In one embodiment, the affine transformation model used by the current block may be pre-configured. In one embodiment, it may be selected from multiple affine transformation models according to the actual motion or actual requirements of the current block.
  • flag information (flag) of the affine decoding block's affine transformation model is stored locally at the decoding end in advance. The flag indicates the affine transformation model actually used when the affine decoding block performs its own sub-block prediction.
  • the decoding end may identify, from the flag of the affine decoding block, that the affine transformation model actually used by the affine decoding block has a different (or the same) number of model parameters as the affine transformation model used by the current block. In that case, the decoder is triggered to use the affine transformation model actually used by the affine decoding block to derive candidate motion vectors for the control points of the current block.
  • for example, the current block uses a 4-parameter affine transformation model. The decoding end identifies the flag of the affine decoding block and determines that the affine transformation model actually used by the affine decoding block has a different number of model parameters from the current block's model; for example, the affine decoding block uses a 6-parameter affine transformation model. The decoder then obtains the motion vectors of the three control points of the affine decoding block: the motion vector (vx4, vy4) of the upper-left vertex (x4, y4), the motion vector (vx5, vy5) of the upper-right vertex (x5, y5), and the motion vector (vx6, vy6) of the lower-left vertex (x6, y6).
  • the candidate motion vectors of the upper-left vertex and upper-right vertex control points of the current block are then derived according to the 6-parameter affine transformation model formulas (27) and (28), respectively.
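Formulas (27) and (28) are not reproduced in this section. The sketch below assumes they take the usual 6-parameter form. In that form the neighbor's control points sit at (x4, y4), (x4 + W, y4), and (x4, y4 + H), and each motion vector component is a linear function of x and y. The sketch uses floating-point arithmetic and omits the fixed-point shifts and rounding a real codec would apply. `six_param_mv` is a hypothetical name.

```python
from typing import Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def six_param_mv(p4: Point, v4: MV, p5: Point, v5: MV, p6: Point, v6: MV,
                 x: float, y: float) -> MV:
    """Evaluate a 6-parameter affine model defined by three control points at (x, y).

    Assumes p5 = (x4 + W, y4) (upper right) and p6 = (x4, y4 + H) (lower left).
    """
    x4, y4 = p4
    w = p5[0] - x4
    h = p6[1] - y4
    vx = v4[0] + (v5[0] - v4[0]) * (x - x4) / w + (v6[0] - v4[0]) * (y - y4) / h
    vy = v4[1] + (v5[1] - v4[1]) * (x - x4) / w + (v6[1] - v4[1]) * (y - y4) / h
    return (vx, vy)


# Neighbor: 16x16 block at (0, 0) with a pure zoom field (motion vector equals position).
nb = dict(p4=(0, 0), v4=(0, 0), p5=(16, 0), v5=(16, 0), p6=(0, 16), v6=(0, 16))
# Current 8-wide block at (16, 0): upper-left (16, 0) and upper-right (24, 0) control points.
print(six_param_mv(**nb, x=16, y=0), six_param_mv(**nb, x=24, y=0))  # (16.0, 0.0) (24.0, 0.0)
```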
  • as another example, the current block uses a 4-parameter affine transformation model, and the decoder identifies the flag of the affine decoding block. It determines that the affine transformation model actually used by the affine decoding block has the same number of model parameters as the current block's model, that is, the affine decoding block also uses a 4-parameter affine transformation model. The decoder then obtains the motion vectors of the two control points of the affine decoding block: the motion vector (vx4, vy4) of the upper-left control point (x4, y4) and the motion vector (vx5, vy5) of the upper-right control point (x5, y5).
  • the 4-parameter affine transformation model formed by these two control points of the affine decoding block is then used to derive the candidate motion vectors of the upper-left vertex and upper-right vertex control points of the current block, according to the 4-parameter affine transformation model formulas (32) and (33), respectively.
  • in one embodiment, the flag of the affine decoding block's affine transformation model may not be required.
  • after determining the affine transformation model used by the current block, the decoder obtains a specific number of control points of the affine decoding block; this number may be the same as or different from the number of control points of the current block. The decoder forms an affine transformation model from these control points and uses it to derive candidate motion vectors for the control points of the current block.
  • the decoder does not need to determine the affine transformation model actually used by the affine decoding block, which may be a 4-parameter, 6-parameter, or 8-parameter model. Instead, it directly obtains the motion vectors of two control points of the affine decoding block: the motion vector (vx4, vy4) of the upper-left control point (x4, y4) and the motion vector (vx5, vy5) of the upper-right control point (x5, y5).
  • the motion vectors of the control points of the upper left vertex and upper right vertex of the current block are derived according to the 4-parameter affine model formulas (32) and (33), respectively.
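Formulas (32) and (33) are likewise not shown here. This sketch assumes the usual 4-parameter (rotation plus zoom) form built from the affine decoding block's upper-left control point (x4, y4) and upper-right control point (x5, y5), with W = x5 - x4. Floating-point arithmetic is used and rounding is omitted. `four_param_mv` is a hypothetical name.

```python
from typing import Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def four_param_mv(p4: Point, v4: MV, p5: Point, v5: MV, x: float, y: float) -> MV:
    """Evaluate a 4-parameter (rotation + zoom) affine model at (x, y).

    Assumes p5 = (x4 + W, y4), i.e. both control points lie on the top edge.
    """
    x4, y4 = p4
    w = p5[0] - x4
    a = (v5[0] - v4[0]) / w   # horizontal MV change per pixel along the top edge
    b = (v5[1] - v4[1]) / w   # vertical MV change per pixel along the top edge
    vx = v4[0] + a * (x - x4) - b * (y - y4)
    vy = v4[1] + b * (x - x4) + a * (y - y4)
    return (vx, vy)


# Neighbor control points (0, 0) -> (0, 0) and (8, 0) -> (0, 4): a rotation-like field.
# Evaluate it at two hypothetical current-block control points (0, 8) and (8, 8).
print(four_param_mv((0, 0), (0, 0), (8, 0), (0, 4), 0, 8),
      four_param_mv((0, 0), (0, 0), (8, 0), (0, 4), 8, 8))  # (-4.0, 0.0) (-4.0, 4.0)
```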
  • Step 603a Determine the optimal motion vector prediction value of the control point according to the index value.
  • an index value of the candidate motion vector list is obtained by parsing the code stream. According to the index value, the optimal control point motion vector predictor is determined from the candidate motion vector list constructed in step 602a.
  • if the affine motion model used by the current block is a 4-parameter affine motion model, an index value is parsed, and the optimal motion vector predictors of 2 control points are determined from the candidate motion vector two-tuple list according to the index value.
  • if the affine motion model used by the current block is a 6-parameter affine motion model, an index value is parsed, and the optimal motion vector predictors of 3 control points are determined from the candidate motion vector triple list according to the index value.
  • if the current block uses an 8-parameter bilinear model, an index value is parsed, and the optimal motion vector predictors of 4 control points are determined from the candidate motion vector quad list according to the index value.
  • Step 604a Determine the actual motion vector of the control point according to the motion vector difference.
  • the motion vector difference of each control point is obtained by parsing the code stream. The motion vector of the control point is then obtained from that difference and the optimal control point motion vector predictor determined in step 603a.
  • if the affine motion model used by the current block is a 4-parameter affine motion model, the motion vector differences of its two control points, namely the upper-left and upper-right control points, are decoded from the code stream.
  • the motion vector difference and the motion vector predictor of each control point are added to obtain that control point's actual motion vector. This yields the motion vectors of the upper-left and upper-right control points of the current block.
  • if the affine motion model used by the current block is a 6-parameter affine motion model, the motion vector differences of its three control points, namely the upper-left, upper-right, and lower-left control points, are decoded from the code stream.
  • the motion vector difference and the motion vector predictor of each control point are added to obtain that control point's actual motion vector. This yields the motion vectors of the upper-left, upper-right, and lower-left control points of the current block.
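On the decoding side, steps 603a and 604a amount to an index lookup followed by a component-wise addition for each control point. This is a minimal sketch that assumes integer motion vectors and a list whose entries are 2-, 3-, or 4-tuples of control point MVPs. `reconstruct_cpmvs` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]


def reconstruct_cpmvs(candidate_list: Sequence[Tuple[MV, ...]], index: int,
                      mvds: Sequence[MV]) -> List[MV]:
    """Pick the MVP tuple at the parsed index and add one decoded MVD per control point."""
    mvps = candidate_list[index]
    if len(mvps) != len(mvds):
        raise ValueError("expected one MVD per control point")
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(mvps, mvds)]


# 4-parameter case: a two-tuple list, parsed index 1, two decoded MVDs.
two_tuple_list = [((0, 0), (4, 0)), ((2, -1), (6, 1))]
print(reconstruct_cpmvs(two_tuple_list, 1, [(1, 1), (-2, 0)]))  # [(3, 0), (4, 1)]
```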
  • embodiments of the present invention may also use other affine motion models and other control point positions, and details are not described herein.
  • Step 605a Obtain a motion vector value of each sub-block of the current block according to the affine transformation model adopted by the current block.
  • the motion information of the pixel at a preset position in a motion compensation unit is used to represent the motion information of all pixels in that unit. Assuming the size of the motion compensation unit is M×N, the preset-position pixel may be the center point (M/2, N/2), the upper-left pixel (0, 0), or the upper-right pixel (M-1, 0) of the unit, or a pixel at another position.
  • the following description takes the center point of the motion compensation unit as an example, with reference to FIG. 11A and FIG. 11B.
  • FIG. 11A exemplarily shows a current block and a motion compensation unit of the current block.
  • each small box in the figure represents a motion compensation unit, and each motion compensation unit in the figure is 4x4.
  • the gray point in the motion compensation unit indicates the center point of the motion compensation unit.
  • V0 represents the motion vector of the upper-left control point of the current block
  • V1 represents the motion vector of the upper-right control point of the current block
  • V2 represents the motion vector of the lower-left control point of the current block.
  • FIG. 11B exemplarily shows another current block and a motion compensation unit of the current block.
  • Each small box in the figure represents a motion compensation unit, and the size of each motion compensation unit in the figure is 8x8.
  • the gray point in each motion compensation unit represents the center point of the motion compensation unit.
  • V0 represents the motion vector of the upper left control point of the current block
  • V1 represents the motion vector of the upper right control point of the current block
  • V2 represents the motion vector of the lower left control point of the current block.
  • the coordinates of the center point of each motion compensation unit relative to the top-left pixel of the current block can be calculated using formula (38): x(i,j) = M × i + M/2, y(i,j) = N × j + N/2, where i = 0, 1, ... and j = 0, 1, ... index the motion compensation units horizontally and vertically.
  • (x(i,j), y(i,j)) represents the coordinates of the center point of the (i, j)-th motion compensation unit relative to the pixel at the upper-left control point of the current affine decoding block.
  • if the current affine decoding block uses a 6-parameter affine motion model, (x(i,j), y(i,j)) is substituted into the aforementioned 6-parameter affine motion model formula (37) to obtain the motion vector of the center point of each motion compensation unit. That motion vector is used as the motion vector (vx(i,j), vy(i,j)) of all pixels in the motion compensation unit.
  • if the current affine decoding block uses a 4-parameter affine motion model, (x(i,j), y(i,j)) is substituted into the 4-parameter affine motion model formula (39) to obtain the motion vector of the center point of each motion compensation unit. That motion vector is used as the motion vector (vx(i,j), vy(i,j)) of all pixels in the motion compensation unit.
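The per-unit derivation of step 605a can be sketched as follows. Formulas (37) and (39) are not reproduced in this section, so the sketch assumes the common 6-parameter and 4-parameter affine forms. The control points are taken to be at the upper-left (0, 0), upper-right (W, 0), and lower-left (0, H) corners of a W×H block. Unit centers are computed as x = M·i + M/2 and y = N·j + N/2, and fixed-point rounding is omitted. `subblock_mvs` is a hypothetical name.

```python
from typing import List, Optional, Tuple

MV = Tuple[float, float]


def subblock_mvs(width: int, height: int, m: int, n: int,
                 v0: MV, v1: MV, v2: Optional[MV] = None) -> List[List[MV]]:
    """Return mvs[j][i], the MV of the (i, j)-th m x n unit, evaluated at its center.

    v0, v1, v2: upper-left, upper-right, lower-left control point MVs.
    v2 is None selects the 4-parameter model, otherwise the 6-parameter model.
    """
    mvs: List[List[MV]] = []
    for j in range(height // n):
        row: List[MV] = []
        for i in range(width // m):
            x = m * i + m / 2          # unit center relative to the block's top-left pixel
            y = n * j + n / 2
            if v2 is None:             # 4-parameter model
                vx = (v1[0] - v0[0]) / width * x - (v1[1] - v0[1]) / width * y + v0[0]
                vy = (v1[1] - v0[1]) / width * x + (v1[0] - v0[0]) / width * y + v0[1]
            else:                      # 6-parameter model
                vx = (v1[0] - v0[0]) / width * x + (v2[0] - v0[0]) / height * y + v0[0]
                vy = (v1[1] - v0[1]) / width * x + (v2[1] - v0[1]) / height * y + v0[1]
            row.append((vx, vy))
        mvs.append(row)
    return mvs


# 8x8 block, 4x4 units, zoom field: each unit's MV equals its center coordinates.
print(subblock_mvs(8, 8, 4, 4, (0, 0), (8, 0), (0, 8)))
# [[(2.0, 2.0), (6.0, 2.0)], [(2.0, 6.0), (6.0, 6.0)]]
```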
  • Step 606a Perform motion compensation for each sub-block according to the determined motion vector value of the sub-block to obtain a pixel prediction value of the sub-block.
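In simplified form, step 606a has each sub-block copy the reference-frame pixels displaced by its motion vector. The sketch assumes integer-pel motion vectors and clamps coordinates at the frame boundary. Real codecs use fractional-sample interpolation filters, which are omitted here. `predict_subblock` is a hypothetical name.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of sample values


def predict_subblock(ref: Frame, x0: int, y0: int, m: int, n: int,
                     mv: Tuple[int, int]) -> Frame:
    """Copy the m x n block at (x0 + mvx, y0 + mvy) from ref, clamping at frame edges."""
    h, w = len(ref), len(ref[0])
    pred: Frame = []
    for dy in range(n):
        ry = min(max(y0 + dy + mv[1], 0), h - 1)      # clamp row to the frame
        row = []
        for dx in range(m):
            rx = min(max(x0 + dx + mv[0], 0), w - 1)  # clamp column to the frame
            row.append(ref[ry][rx])
        pred.append(row)
    return pred


ref = [[10 * r + c for c in range(4)] for r in range(4)]
print(predict_subblock(ref, 0, 0, 2, 2, (1, 1)))  # [[11, 12], [21, 22]]
```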
  • Step 602b Construct a candidate motion vector list of the Merge mode of the affine transformation model.
  • the first motion-model-based motion vector prediction method may also be used to obtain candidate motion vectors of the control points of the current block and add them to the candidate motion vector list corresponding to the Merge mode.
  • the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may also both be used. The candidate motion vectors that each method obtains for the control points of the current block are added to the candidate motion vector list corresponding to the Merge mode.
  • the candidate motion vector list may be a two-tuple list, and the two-tuple list includes one or more two-tuples for constructing a 4-parameter affine transformation model.
  • the candidate motion vector list may be a triple list, and the triple list includes one or more triples for constructing a 6-parameter affine transformation model.
  • the candidate motion vector list may be a quad list, and the quad list includes one or more quads used to construct the 8-parameter bilinear model.
  • the candidate motion vector 2-tuple/triple/quad list may be pruned and sorted according to a specific rule, and truncated or padded to a specific number of entries.
  • for the first motion-model-based motion vector prediction method, taking FIG. 10 as an example, the neighboring position blocks around the current block may be traversed in the order A1→B1→B0→A0→B2 shown in FIG. 10 to find the affine decoding block in which each neighboring position block is located. The control points of that affine decoding block are used to construct its affine transformation model. That model is then used to derive candidate motion vectors (for example, a candidate motion vector 2-tuple, triple, or quad) for the control points of the current block, and these are added to the candidate motion vector list corresponding to the Merge mode. It should be noted that other search orders may also be applicable to the embodiments of the present invention, and details are not described herein.
  • if the candidate motion vector list is empty, the candidate motion information of the control point is added to the candidate list. Otherwise, the motion information in the candidate motion vector list is traversed in sequence to check whether the list already contains motion information identical to the candidate motion information of the control point. If no identical motion information exists in the list, the candidate motion information of the control point is added to the candidate motion vector list.
  • determining whether two pieces of candidate motion information are the same requires checking their forward and backward reference frames and the horizontal and vertical components of each forward and backward motion vector. The two pieces of motion information are considered the same only when all of these elements are the same; if any element differs, they are considered different.
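  • if the number of motion information entries in the candidate motion vector list reaches the maximum list length, construction of the candidate list is complete; otherwise, the next neighboring position block is traversed.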
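The duplicate check and early termination described above can be sketched with a frozen dataclass. The dataclass's generated equality compares every field, which matches the rule that two pieces of motion information are the same only when all their elements match. The field layout of `MotionInfo` and the `try_add` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CPMVs = Tuple[Tuple[int, int], ...]   # motion vectors of the control points


@dataclass(frozen=True)
class MotionInfo:
    ref_fwd: Optional[int]     # forward reference frame index, None if unused
    ref_bwd: Optional[int]     # backward reference frame index, None if unused
    mv_fwd: Optional[CPMVs]    # forward control point motion vectors
    mv_bwd: Optional[CPMVs]    # backward control point motion vectors


def try_add(cand_list: List[MotionInfo], cand: MotionInfo, max_len: int) -> bool:
    """Append cand unless an identical entry exists; return True once the list is full."""
    if cand not in cand_list:  # generated __eq__ compares every field
        cand_list.append(cand)
    return len(cand_list) >= max_len


merge_list: List[MotionInfo] = []
a = MotionInfo(0, None, ((1, 2), (3, 4)), None)
b = MotionInfo(0, None, ((1, 2), (3, 5)), None)   # differs in one component
for cand in (a, a, b):
    if try_add(merge_list, cand, max_len=2):
        break   # list complete: stop traversing neighboring position blocks
print(len(merge_list))  # 2
```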
  • for example, suppose different blocks in the image all use the 6-parameter affine transformation model.
  • the motion vectors of the three control points of the affine decoding block containing A1 are obtained: the motion vector (vx4, vy4) of the upper-left control point (x4, y4), the motion vector (vx5, vy5) of the upper-right control point (x5, y5), and the motion vector (vx6, vy6) of the lower-left vertex (x6, y6).
  • Step 603b Determine a motion vector value of the control point according to the index value.
  • an index value of the candidate motion vector list is obtained by parsing the code stream. According to the index value, the actual motion vectors of the control points are determined from the candidate motion vector list constructed in step 602b.
  • if the affine motion model used by the current block is a 4-parameter affine motion model, an index value is parsed, and the motion vectors of the two control points are determined from the candidate motion vector two-tuple list according to the index value.
  • if the affine motion model used by the current block is a 6-parameter affine motion model, an index value is parsed, and the motion vectors of the three control points are determined from the candidate motion vector triple list according to the index value.
  • if the current block uses an 8-parameter bilinear model, an index value is parsed, and the motion vectors of the four control points are determined from the candidate motion vector quad list according to the index value.
  • Step 604b Obtain a motion vector value of each sub-block according to the affine transformation model adopted by the current block.
  • for the detailed implementation of this step, refer to the description of step 605a above; for brevity, details are not repeated here.
  • Step 605b Each sub-block performs motion compensation according to a corresponding motion vector value to obtain a pixel prediction value of the sub-block.
  • the decoder uses the first motion-model-based motion vector prediction method when predicting the current block. It can therefore construct an affine transformation model for the current block itself in the parsing stage of the current block, for example the stage of constructing the candidate motion vector list for the AMVP mode or the Merge mode, by using the affine transformation model of a neighboring block. The two models may be the same or different. Because the current block's own affine transformation model better matches its actual motion or actual requirements, this solution can improve the coding efficiency and the accuracy of prediction for the current block, meeting user needs.
  • an embodiment of the present invention provides another motion vector prediction method.
  • the method may be performed by the video decoder 200, and specifically by the inter predictor 210 of the video decoder 200.
  • based on a video data stream containing multiple video frames, the video decoder 200 may perform some or all of the following steps. These steps predict the motion information of each sub-block of the block currently being decoded (referred to as the current block) in the current video frame, and perform motion compensation. As shown in FIG. 12, the method includes but is not limited to the following steps:
  • Step 701 Parse the bitstream to determine the inter prediction mode of the current decoded block.
  • the video decoder 200 at the decoding end may parse syntax elements in the code stream transmitted from the encoding end to obtain indication information of the inter prediction mode. It then determines the inter prediction mode of the current block according to that indication information.
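  • if the inter prediction mode of the current block is the AMVP mode based on the affine transformation model, steps 702a to 706a are performed subsequently.
  • if the inter prediction mode of the current block is the Merge mode based on the affine transformation model, steps 702b to 705b are performed subsequently.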
  • Step 702a Construct a candidate motion vector list of the AMVP mode of the affine transformation model.
  • the affine transformation model used for different blocks of an image in an image sequence is not limited, that is, different blocks may use the same or different affine transformation models.
  • the inherited control point motion vector prediction method can be used to obtain the candidate motion vectors of the control points of the current block to be added to the candidate motion vector list corresponding to the AMVP mode.
  • the second motion-model-based motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block and add them to the candidate motion vector list corresponding to the AMVP mode.
  • the constructed control point motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block and add them to the candidate motion vector list corresponding to the AMVP mode.
  • any two of the following methods may also be used to obtain candidate motion vectors of the control points of the current block: the inherited control point motion vector prediction method, the second motion-model-based motion vector prediction method, and the constructed control point motion vector prediction method. The resulting candidates are added to the candidate motion vector list corresponding to the AMVP mode.
  • the inherited control point motion vector prediction method, the second motion-model-based motion vector prediction method, and the constructed control point motion vector prediction method may also all be used to obtain candidate motion vectors of the control points of the current block. The resulting candidates are added to the candidate motion vector list corresponding to the AMVP mode.
  • the candidate motion vector list of the AMVP mode may be a two-tuple list, and the two-tuple list includes one or more two-tuples for constructing a 4-parameter affine transformation model.
  • the candidate motion vector list of the AMVP mode may be a triple list, and the triple list includes one or more triples for constructing a 6-parameter affine transformation model.
  • the candidate motion vector list of the AMVP mode may be a list of quads, and the list of quads includes one or more quads used to construct the 8-parameter bilinear model.
  • the candidate motion vector 2-tuple/triple/quad list may also be pruned and sorted according to a specific rule, and truncated or padded to a specific number of entries.
  • Step 703a Determine the optimal motion vector prediction value of the control point according to the index value. For specific content, reference may be made to the related description in step 603a of the foregoing FIG. 9 embodiment, and details are not described herein again.
  • Step 704a Determine the motion vector values of the three control points of the current block according to the motion vector difference.
  • the motion vector difference of each control point is obtained by parsing the code stream. The motion vector of the control point is obtained from that difference and the optimal control point motion vector predictor determined in step 703a. The motion vectors of the three control points of the current block are then determined from the obtained control point motion vectors.
  • for example, the candidate motion vector list constructed by the decoder in step 702a may be a two-tuple list. In that case, in step 703a the index value is parsed, and the motion vector predictors (MVP) of two control points (that is, a two-tuple) are determined from the candidate motion vector list according to the index value.
  • in step 704a, the motion vector differences (MVD) of the two control points of the current block are decoded from the code stream. The motion vectors (MV) of the two control points are then obtained from their respective MVPs and MVDs.
  • the motion vectors of the two control points are, for example, the motion vector (vx0, vy0) of the upper-left control point (x0, y0) of the current block and the motion vector (vx1, vy1) of the upper-right control point (x1, y1) of the current block. A 4-parameter affine transformation model is then formed from these two motion vectors, and the motion vector of the third control point is obtained with the 4-parameter affine transformation model formula (40).
  • the third control point's motion vector is, for example, the motion vector (vx2, vy2) of the lower-left vertex (x2, y2) of the current block. In this way, the motion vectors of the upper-left, upper-right, and lower-left vertices of the current block are determined.
  • the candidate motion vector list constructed by the decoder in step 702a is a triple list.
  • The index value is parsed, and the motion vector predictions (MVPs) of three control points (i.e., a 3-tuple) are determined from the candidate motion vector list according to the index value.
  • In step 704a, the motion vector differences (MVDs) of the three control points of the current block are decoded from the code stream, and the motion vector values (MVs) of the three control points are obtained from their respective MVPs and MVDs.
  • The motion vector values of the three control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block, the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block, and the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block.
  • In this way, the motion vector values of the top left vertex, top right vertex, and bottom left vertex of the current block are determined.
  • the candidate motion vector list constructed by the decoder in step 702a is a quadruple list.
  • The index value is parsed, and the motion vector predictions (MVPs) of four control points (i.e., a 4-tuple) are determined from the candidate motion vector list according to the index value.
  • In step 704a, the motion vector differences (MVDs) of the four control points of the current block are decoded from the code stream, and the motion vector values (MVs) of the four control points are obtained from their respective MVPs and MVDs.
  • The motion vector values of the four control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block, the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block, the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block, and the motion vector (vx3, vy3) of the lower right vertex (x3, y3) of the current block. The decoding end may then use only the motion vector values of three control points: the upper left vertex, the upper right vertex, and the lower left vertex of the current block.
  • Step 705a According to the three control points of the current block, a 6-parameter affine transformation model is used to obtain the motion vector value of each sub-block.
  • a 6-parameter affine transformation model can be formed based on the motion vector values of the 3 control points of the current block.
  • The 6-parameter affine transformation model is then used to obtain the motion vector value of each sub-block.
  • The motion vector values of the three control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block, the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block, and the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block.
  • The motion vector value of each sub-block is obtained by substituting into formula (37) above the coordinates $(x_{(i,j)}, y_{(i,j)})$ of the preset-position pixel of each sub-block (or each motion compensation unit) in the current block, taken relative to the upper left vertex (or another reference point) of the current block.
  • The preset-position pixel may be the center point of each sub-block (or each motion compensation unit). The coordinates $(x_{(i,j)}, y_{(i,j)})$ of that center point relative to the upper left vertex pixel of the current block can be calculated using formula (38).
  • For details on formula (38), reference may also be made to the related descriptions of the embodiments of FIG. 11A and FIG. 11B, and details are not described herein again.
  • Step 706a Each sub-block performs motion compensation according to a corresponding motion vector value to obtain a pixel prediction value of the sub-block.
  • Step 702b Construct a candidate motion vector list of the affine-transformed Merge mode.
  • the affine transformation model used for different blocks of an image in an image sequence is not limited, that is, different blocks may use the same or different affine transformation models.
  • the inherited control point motion vector prediction method can be used to obtain the candidate motion vectors of the control points of the current block to be added to the candidate motion vector list corresponding to the Merge mode.
  • a first motion model-based motion vector prediction method may be used to obtain a candidate motion vector of a control point of a current block to be added to a candidate motion vector list corresponding to a Merge mode.
  • The constructed control point motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block, which are added to the candidate motion vector list corresponding to the Merge mode.
  • Any two of the following methods may also be used to obtain candidate motion vectors of the control points of the current block, which are respectively added to the candidate motion vector list corresponding to the Merge mode: the inherited control point motion vector prediction method, the second motion-model-based motion vector prediction method, and the constructed control point motion vector prediction method.
  • All three of these methods may also be used to obtain candidate motion vectors of the control points of the current block, which are respectively added to the candidate motion vector list corresponding to the Merge mode.
  • The candidate motion vector list established by the decoding end may be a candidate motion vector 2-tuple/3-tuple/4-tuple list.
  • This 2-tuple/3-tuple/4-tuple list can be pruned and sorted according to a specific rule, and it can be truncated or padded to a specific number of candidates.
  • Step 703b Obtain a motion vector value of the control point according to the index value.
  • The index value of the candidate motion vector list is obtained by parsing the code stream. According to that index value, the actual motion vectors of the control points are determined from the candidate motion vector list constructed in step 702b.
  • For this step, reference may also be made to the related description of step 603b in the embodiment of FIG. 9, and details are not described herein again.
  • Step 704b Determine the motion vector values of the three control points of the current block according to the obtained motion vectors of the control points.
  • The decoder obtains the motion vector values of two control points (i.e., a 2-tuple) in step 703b.
  • The motion vector values of the two control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block and the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block. A 4-parameter affine transformation model is formed from them, and the 4-parameter affine transformation model formula (31) is used to obtain the motion vector value of the third control point.
  • The motion vector value of the third control point is, for example, the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block. In this way, the motion vector values of the top left vertex, top right vertex, and bottom left vertex of the current block are determined.
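  • The decoder obtains the motion vector values of 3 control points (i.e., a 3-tuple) in step 703b.
  • The motion vector values of the 3 control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block, the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block, and the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block.
  • In this way, the motion vector values of the top left vertex, top right vertex, and bottom left vertex of the current block are determined.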
  • The decoder obtains the motion vector values of the four control points (i.e., a 4-tuple) in step 703b.
  • The motion vector values of the four control points are, for example, the motion vector value (vx0, vy0) of the upper left control point (x0, y0) of the current block, the motion vector value (vx1, vy1) of the upper right control point (x1, y1) of the current block, the motion vector (vx2, vy2) of the lower left vertex (x2, y2) of the current block, and the motion vector (vx3, vy3) of the lower right vertex (x3, y3) of the current block.
  • The decoding end may then use only the motion vector values of three control points: the upper left vertex, the upper right vertex, and the lower left vertex of the current block.
  • Step 705b According to the three control points of the current block, a 6-parameter affine transformation model is used to obtain the motion vector value of each sub-block.
  • a 6-parameter affine transformation model is used to obtain the motion vector value of each sub-block.
  • Step 706b Each sub-block performs motion compensation according to a corresponding motion vector value to obtain a pixel prediction value of the sub-block.
  • In the prediction process of the current block, the decoder uses the second motion-model-based motion vector prediction method. In the parsing phase, the number of parameters of the affine transformation model used by the current block may therefore be different from, or the same as, that of the neighboring blocks.
  • In the reconstruction phase, a 6-parameter affine transformation model is uniformly used to predict the current block.
  • the 6-parameter affine transformation model constructed in the reconstruction phase of this solution can describe the affine transformation of the image block such as translation, scaling, rotation, etc., and achieves a good balance between model complexity and modeling capabilities. Therefore, the implementation of this solution can improve the coding efficiency and accuracy of prediction of the current block, and meet user needs.
  • FIG. 13 shows a flowchart of another motion vector prediction method according to an embodiment of the present invention.
  • the method may be performed by the video encoder 100.
  • the method may be performed by the inter predictor 110 of the video encoder 100.
  • the video encoder 100 may perform part or all of the following steps according to a video data stream having multiple video frames to encode a current encoding block (referred to as a current block) of a current video frame.
  • the method includes, but is not limited to, the following steps:
  • multiple inter prediction modes may also be preset.
  • The multiple inter prediction modes include, for example, the AMVP mode based on the affine motion model and the Merge mode based on the affine motion model, both described above. The encoder traverses these inter prediction modes to determine which one is optimal for predicting the current block.
  • only one inter prediction mode may be preset, that is, in this case, the encoding end directly determines that the default inter prediction mode is currently used.
  • For example, the default inter prediction mode is the AMVP mode based on the affine motion model or the Merge mode based on the affine motion model.
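  • If the AMVP mode based on the affine motion model is determined, steps 802a to 804a are performed subsequently.
  • If the Merge mode based on the affine motion model is determined, steps 802b to 804b are performed subsequently.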
  • If the coding end adopts the design solution of the first motion-model-based motion vector prediction method, see the description of step 602a in the foregoing embodiment of FIG. 9 for the specific implementation of this step. Details are not described herein again.
  • the coding end adopts a design scheme of a second motion vector prediction method based on a motion model. Then, for specific implementation of this step, reference may be made to the description of step 702a in the foregoing embodiment of FIG. 12, and details are not described herein again.
  • For each control point motion vector prediction in the candidate motion vector list (such as a candidate motion vector 2-tuple/3-tuple/4-tuple), the encoding end may use formula (3), (5), or (7) to obtain the motion vector of each sub motion compensation unit in the current block. It then takes the pixel value at the corresponding position in the reference frame pointed to by each sub motion compensation unit's motion vector as that unit's predicted value, which performs motion compensation with the affine motion model. The encoding end calculates the average of the differences between the original value and the predicted value of each pixel in the current coding block. The control point motion vector prediction with the smallest average is selected as the optimal control point motion vector prediction and is used as the motion vector predictions of the 2, 3, or 4 control points of the current block.
  • The encoder can use the optimal control point motion vector prediction as the starting point of a motion search within a certain search range. This yields the control point motion vectors (CPMVs), and the encoder calculates the differences between the control point motion vectors and the control point motion vector predictions (CPMVDs). The encoding end then encodes into the code stream the CPMVDs and the index value that indicates the position of the control point motion vector prediction in the candidate motion vector list. The indication information of the inter prediction mode can also be coded into the code stream for subsequent transmission to the decoding end.
  • The encoding end may also encode into the code stream indication information of the affine transformation model (number of parameters) used by the current block and pass it to the decoding end. The decoding end then determines, according to the indication information, the affine transformation model used by the current block.
  • the encoding end adopts a design solution of the first motion vector prediction method based on a motion model. Then, for specific implementation of this step, reference may be made to the description of step 602b in the foregoing embodiment of FIG. 9, and details are not described herein again.
  • the coding end adopts a design scheme of a second motion vector prediction method based on a motion model. Then, for specific implementation of this step, reference may be made to the description of step 702b in the foregoing embodiment of FIG. 12, and details are not described herein again.
  • For each control point motion vector in the candidate motion vector list (such as a candidate motion vector 2-tuple/3-tuple/4-tuple), the encoding end may use formula (3), (5), or (7) to obtain the motion vector of each sub motion compensation unit in the current coding block. It then uses the pixel value at the position in the reference frame pointed to by each sub motion compensation unit's motion vector as that unit's prediction value, which performs affine motion compensation.
  • The optimal control point motion vector is used as the motion vectors of the 2, 3, or 4 control points of the current coding block.
  • the encoding end may encode an index value indicating the position of the control point motion vector in the candidate list into a code stream, and the indication information of the inter prediction mode is encoded into the code stream for subsequent transmission to the decoding end.
  • The encoding end may also encode into the code stream indication information of the affine transformation model (number of parameters) used by the current block and pass it to the decoding end. The decoding end then determines, according to the indication information, the affine transformation model used by the current block.
  • the above embodiment only describes the process of encoding and code stream sending by the encoding end. According to the foregoing description, those skilled in the art understand that the encoding end can also implement other methods described in the embodiments of the present invention in other links.
  • the specific implementation of the reconstruction process of the current block can refer to the related method described above at the decoding end (as shown in the embodiment of FIG. 9 or FIG. 12), which will not be described again here.
  • When the encoding end implements the design scheme of the first motion-model-based motion vector prediction method while encoding the current block, the affine transformation model of the neighboring block is used in the parsing phase of the current block to build an affine transformation model for the current block itself. The parsing phase includes, for example, the stage of constructing the candidate motion vector list of the AMVP mode or the Merge mode. The two affine transformation models may be different or the same. The current block's own affine transformation model better matches its actual motion and actual requirements, so implementing this solution can improve the coding efficiency and accuracy of encoding the current block and meet user needs.
  • When the encoder is implemented according to the design scheme of the second motion-model-based motion vector prediction method, the decoder can uniformly use the 6-parameter affine transformation model to predict image blocks in the reconstruction phase. Therefore, implementing this solution can improve the coding efficiency and accuracy of prediction of the current block, and meet user needs.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media. They may also include communication media, meaning any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
  • computer-readable media may generally correspond to (1) tangible computer-readable storage media that is non-transitory, or (2) a communication medium such as a signal or carrier wave.
  • a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and / or data structures used to implement the techniques described in embodiments of the present invention.
  • the computer program product may include a computer-readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, or flash memory. They may also be any other medium that can store required program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, instructions may be transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave. In that case, the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology is included in the definition of medium.
  • the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other temporary media, but are instead directed to non-transitory tangible storage media.
  • Magnetic disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), and Blu-ray discs. Magnetic disks typically reproduce data magnetically, while optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The instructions may be executed by one or more processors, such as digital signal processors (DSPs), application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).
  • processor may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
  • The functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the techniques can be fully implemented in one or more circuits or logic elements.
  • Embodiments of the present invention may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in the embodiments of the present invention to emphasize the functional aspects of the apparatus for performing the disclosed technology, but they do not necessarily need to be implemented by different hardware units.
  • The various units may be combined in a codec hardware unit together with suitable software and/or firmware, or provided by a collection of interoperable hardware units (including one or more processors as described above).
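The encoder-side AMVP choice of the optimal control point motion vector prediction works as follows. The encoder motion-compensates the block with each candidate, averages the difference between original and predicted pixel values, and keeps the candidate with the smallest average. A minimal Python sketch is shown below. It is not the reference encoder. The function name `select_best_candidate` is hypothetical. The `predict` callback is a hypothetical stand-in for affine motion compensation with formula (3), (5), or (7). Reading "average of the difference" as the mean absolute difference is also an assumption.

```python
from typing import Callable, Sequence, TypeVar

Candidate = TypeVar("Candidate")


def select_best_candidate(
    original: Sequence[float],
    candidates: Sequence[Candidate],
    predict: Callable[[Candidate], Sequence[float]],
) -> int:
    """Return the index of the candidate whose prediction is closest to `original`.

    `predict` (hypothetical) maps a candidate, e.g. a 2-, 3- or 4-tuple of
    control point MVPs, to the predicted pixels of the current block.
    The cost is the mean absolute difference between original and predicted pixels.
    """
    if not candidates:
        raise ValueError("candidate list is empty")
    best_idx, best_cost = 0, float("inf")
    for idx, cand in enumerate(candidates):
        pred = predict(cand)
        cost = sum(abs(o - p) for o, p in zip(original, pred)) / len(original)
        if cost < best_cost:  # strict '<' keeps the earliest candidate on ties
            best_idx, best_cost = idx, cost
    return best_idx
```

Consider a toy `predict` that only offsets a flat block. With it, `select_best_candidate([10, 10, 10, 10], [3, -1, 2], lambda c: [10 + c] * 4)` returns `1`. In the AMVP flow described above, the selected predictor then serves as the starting point of the motion search.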

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

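This application provides a motion vector prediction method and a related apparatus. The method includes the following steps:
- Parse a bitstream to obtain an index value of a candidate motion vector list.
- Construct the candidate motion vector list. The list includes candidate motion vectors of K control points of a current block. These candidate motion vectors are obtained based on a 2N-parameter affine transformation model used by a neighboring block of the current block. N is an integer from 2 to 4 inclusive, K is an integer from 2 to 4 inclusive, and N is not equal to K.
- Determine target candidate motion vectors of the K control points from the candidate motion vector list according to the index value.
- Obtain a predicted motion vector of each sub-block in the current block according to the target candidate motion vectors of the K control points.

Implementing this application helps improve coding efficiency in audio and video coding and meet user needs.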

Description

Motion Vector Prediction Method and Related Apparatus
Technical Field
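Embodiments of the present invention relate to the field of video coding technologies, and in particular, to a motion vector prediction method and apparatus for video images, and a corresponding encoder and decoder.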
Background
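Video coding (video encoding and decoding) is widely used in digital video applications. Examples include broadcast digital TV, video transmission over the Internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVDs and Blu-ray discs, video content acquisition and editing systems, and security applications of camcorders.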
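The H.261 standard introduced block-based hybrid video coding in 1990. Since then, new video coding techniques and tools have been developed and have formed the basis of new video coding standards. Other video coding standards include:
- MPEG-1 Video and MPEG-2 Video
- ITU-T H.262/MPEG-2
- ITU-T H.263
- ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC)
- ITU-T H.265/High Efficiency Video Coding (HEVC) …
- extensions of such standards, such as scalability and/or 3D (three-dimensional) extensions

As video creation and use become increasingly widespread, video traffic has become the largest burden on communication networks and data storage. One goal of most video coding standards is therefore to reduce the bit rate relative to the previous standard without sacrificing picture quality. The latest standard, HEVC, can compress video roughly twice as much as AVC without sacrificing picture quality. Even so, new technologies are urgently needed to compress video further than HEVC.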
Summary
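Embodiments of the present invention provide a motion vector prediction method and a related apparatus, to improve coding efficiency and meet user needs.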
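According to a first aspect, an embodiment of the present invention provides a motion vector prediction method, described from the perspective of a decoder side or an encoder side. The method may be used to predict a to-be-processed image block obtained by partitioning a video image.
- On the encoder side, the to-be-processed image block is the current affine coding block, and a decoded image block spatially adjacent to it is a neighboring affine coding block.
- On the decoder side, the to-be-processed image block is the current affine decoding block, and a decoded image block spatially adjacent to it is a neighboring affine decoding block.

For ease of description, the to-be-processed image block is called the current block, and a reference block spatially adjacent to it is called a neighboring block. The method includes the following steps:
- Parse a bitstream to obtain an index value of a candidate motion vector list.
- Construct the candidate motion vector list. The list includes candidate motion vectors of K control points of the current block. These candidate motion vectors are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block. That 2×N-parameter model is obtained based on the motion vectors of N control points of the neighboring block. N is an integer from 2 to 4 inclusive, K is an integer from 2 to 4 inclusive, and N is not equal to K.
- Determine target candidate motion vectors of the K control points from the candidate motion vector list according to the index value.
- Obtain the predicted motion vector at each sub-block position in the current block according to the target candidate motion vectors of the K control points.

The predicted motion vector at each sub-block position may be used for motion compensation of the corresponding sub-block.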
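As these steps show, in this embodiment the decoder side builds an affine transformation model for the current block itself from the affine transformation model of a neighboring block. It does this while constructing the candidate list for the current block, for example the candidate motion vector list of the affine-transformation-model-based AMVP mode or Merge mode. The two affine transformation models may differ. The current block's own model better matches its actual motion and actual requirements, so implementing this solution can improve the coding efficiency and accuracy of predicting the current block and meet user needs.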
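Based on the first aspect, in a possible implementation, the availability of neighboring blocks at one or more preset spatial positions of the current block may be checked in a preset order, and the available neighboring blocks are then obtained in that order. The available neighboring blocks may include neighboring image blocks located directly above, directly to the left, to the upper right, to the lower left, or to the upper left of the to-be-processed image block. For example, availability may be checked in the following order: left, above, upper right, lower left, upper left.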
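Based on the first aspect, in a possible implementation, N = 2 and K = 3. Suppose the affine decoding block (an affine coding block on the encoder side) uses a 4-parameter affine transformation model and the current block uses a 6-parameter affine transformation model. In that case, the candidate motion vectors of the 3 control points of the current block are obtained based on the 4-parameter affine transformation model used by the neighboring block.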
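For example, the candidate motion vectors of the 3 control points of the current block include the following motion vectors, where each pixel position is also called a vertex (the same below):
- the motion vector (vx0, vy0) at the top-left pixel position (x0, y0) in the current block
- the motion vector (vx1, vy1) at the top-right pixel position (x1, y1) in the current block
- the motion vector (vx2, vy2) at the bottom-left pixel position (x2, y2) in the current block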
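Obtaining these candidate motion vectors from the neighboring block's 4-parameter affine transformation model involves first calculating the following, using the formulas below: the motion vector (vx0, vy0) of the top-left vertex (x0, y0), the motion vector (vx1, vy1) of the top-right vertex (x1, y1), and the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block: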
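$$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4)$$
$$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4)$$
$$vx_2 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_2 - y_4), \qquad vy_2 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_2 - y_4)$$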
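where:
- $vx_0$ and $vy_0$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the current block;
- $vx_1$ and $vy_1$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the current block;
- $vx_2$ and $vy_2$ are the horizontal and vertical components of the motion vector at the bottom-left pixel position in the current block;
- $vx_4$ and $vy_4$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the neighboring block;
- $vx_5$ and $vy_5$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the neighboring block;
- $x_0$ and $y_0$ are the horizontal and vertical coordinates of the top-left pixel position in the current block;
- $x_1$ and $y_1$ are the horizontal and vertical coordinates of the top-right pixel position in the current block;
- $x_2$ and $y_2$ are the horizontal and vertical coordinates of the bottom-left pixel position in the current block;
- $x_4$ and $y_4$ are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block;
- $x_5$ is the horizontal coordinate of the top-right pixel position in the neighboring block.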
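The N = 2, K = 3 derivation above evaluates the neighboring block's 4-parameter model at each of the current block's three corners. A minimal Python sketch follows, under two assumptions. Motion vectors are plain floats, with none of the rounding or clipping a real codec would apply. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def eval_neighbor_4param(p4: Point, mv4: MV, x5: float, mv5: MV, p: Point) -> MV:
    """Evaluate the neighboring block's 4-parameter affine model at point p.

    p4, mv4: top-left control point (x4, y4) of the neighbor and its MV (vx4, vy4).
    x5, mv5: x coordinate of the neighbor's top-right control point (same row) and its MV.
    """
    (x4, y4), (vx4, vy4), (vx5, vy5) = p4, mv4, mv5
    a = (vx5 - vx4) / (x5 - x4)  # coefficient shared by vx (in x) and vy (in y)
    b = (vy5 - vy4) / (x5 - x4)  # rotation coefficient, enters vx with a minus sign
    vx = vx4 + a * (p[0] - x4) - b * (p[1] - y4)
    vy = vy4 + b * (p[0] - x4) + a * (p[1] - y4)
    return vx, vy


def current_cps_from_4param(
    p4: Point, mv4: MV, x5: float, mv5: MV, corners: List[Point]
) -> List[MV]:
    """Candidate MVs for the current block's control points, e.g. [(x0, y0), (x1, y1), (x2, y2)]."""
    return [eval_neighbor_4param(p4, mv4, x5, mv5, c) for c in corners]
```

As an example, take a neighbor at (0, 0) whose control point MVs are (1, 2) and (3, 2), with x5 = 16 (a pure zoom). A current block with corners (16, 0), (32, 0), and (16, 16) then receives candidate MVs (3, 2), (5, 2), and (3, 4).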
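Based on the first aspect, in a possible implementation, N = 3 and K = 2. Suppose the affine decoding block (an affine coding block on the encoder side) uses a 6-parameter affine transformation model and the current block uses a 4-parameter affine transformation model. In that case, the candidate motion vectors of the 2 control points of the current block are obtained based on the 6-parameter affine transformation model used by the neighboring block.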
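For example, the candidate motion vectors of the 2 control points of the current block include the following, where each pixel position is also called a vertex (the same below):
- the motion vector (vx0, vy0) at the top-left pixel position (x0, y0) in the current block
- the motion vector (vx1, vy1) at the top-right pixel position (x1, y1) in the current block

Obtaining these candidate motion vectors from the neighboring block's 6-parameter affine transformation model involves calculating them using the following formulas: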
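$$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4)$$
$$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4)$$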
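where:
- $vx_0$ and $vy_0$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the current block;
- $vx_1$ and $vy_1$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the current block;
- $vx_4$ and $vy_4$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the neighboring block;
- $vx_5$ and $vy_5$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the neighboring block;
- $vx_6$ and $vy_6$ are the horizontal and vertical components of the motion vector at the bottom-left pixel position in the neighboring block;
- $x_0$ and $y_0$ are the horizontal and vertical coordinates of the top-left pixel position in the current block;
- $x_1$ and $y_1$ are the horizontal and vertical coordinates of the top-right pixel position in the current block;
- $x_4$ and $y_4$ are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block;
- $x_5$ is the horizontal coordinate of the top-right pixel position in the neighboring block;
- $y_6$ is the vertical coordinate of the bottom-left pixel position in the neighboring block.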
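The N = 3, K = 2 case works the other way round. The neighbor's 6-parameter model, with independent horizontal and vertical gradients, is evaluated only at the current block's two top corners. In the sketch below, motion vectors are assumed to be floats without rounding, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def eval_neighbor_6param(
    p4: Point, mv4: MV, x5: float, mv5: MV, y6: float, mv6: MV, p: Point
) -> MV:
    """Evaluate the neighboring block's 6-parameter affine model at point p.

    Neighbor control points: top-left (x4, y4), top-right (x5, y4), bottom-left (x4, y6).
    """
    (x4, y4), (vx4, vy4), (vx5, vy5), (vx6, vy6) = p4, mv4, mv5, mv6
    dx, dy = p[0] - x4, p[1] - y4
    vx = vx4 + (vx5 - vx4) / (x5 - x4) * dx + (vx6 - vx4) / (y6 - y4) * dy
    vy = vy4 + (vy5 - vy4) / (x5 - x4) * dx + (vy6 - vy4) / (y6 - y4) * dy
    return vx, vy


def current_cps_from_6param(
    p4: Point, mv4: MV, x5: float, mv5: MV, y6: float, mv6: MV, corners: List[Point]
) -> List[MV]:
    """Candidate MVs for the current block's control points, e.g. [(x0, y0), (x1, y1)]."""
    return [eval_neighbor_6param(p4, mv4, x5, mv5, y6, mv6, c) for c in corners]
```

Consider a neighbor at (0, 0) with zero MV at its top-left, (2, 0) at x5 = 8, and (0, 4) at y6 = 8. A current block whose top corners are (8, 8) and (16, 8) then receives the candidates (2, 4) and (4, 4).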
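Implementing the embodiments of the present invention thus makes it possible to use the affine transformation model of a neighboring block to build an affine transformation model for the current block itself. This happens in the parsing phase of the current block, for example while the candidate motion vector list is constructed. The two affine transformation models may differ. The current block's own model better matches its actual motion and actual requirements, so implementing this solution can improve the coding efficiency and accuracy of predicting the current block and meet user needs.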
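Based on the first aspect, in one embodiment of the reconstruction phase of the current block, the predicted motion vector of each sub-block in the current block is obtained as follows:
1. Obtain a 2×K-parameter affine transformation model of the current block according to the target candidate motion vectors of the K control points.
2. Obtain the predicted motion vector of each sub-block in the current block according to the 2×K-parameter affine transformation model.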
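For example, suppose the current affine decoding block uses a 6-parameter affine motion model. A 6-parameter affine transformation model of the current block is then formed from the target candidate motion vectors of its 3 (that is, K = 3) control points. The pixel coordinates $(x_{(i,j)}, y_{(i,j)})$ in each sub-block are substituted into the 6-parameter affine motion model formula. This gives the motion vector at those coordinates, which is used as the motion vector $(vx_{(i,j)}, vy_{(i,j)})$ of all pixels in that sub-block: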
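$$vx_{(i,j)} = \frac{vx_1 - vx_0}{x_1 - x_0}x_{(i,j)} + \frac{vx_2 - vx_0}{y_2 - y_0}y_{(i,j)} + vx_0, \qquad vy_{(i,j)} = \frac{vy_1 - vy_0}{x_1 - x_0}x_{(i,j)} + \frac{vy_2 - vy_0}{y_2 - y_0}y_{(i,j)} + vy_0$$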
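As another example, suppose the current affine decoding block uses a 4-parameter affine motion model. A 4-parameter affine transformation model of the current block is then formed from the target candidate motion vectors of its 2 (that is, K = 2) control points. The pixel coordinates $(x_{(i,j)}, y_{(i,j)})$ in each sub-block are substituted into the 4-parameter affine motion model formula. This gives the motion vector at those coordinates, which is used as the motion vector $(vx_{(i,j)}, vy_{(i,j)})$ of all pixels in that sub-block: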
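$$vx_{(i,j)} = \frac{vx_1 - vx_0}{x_1 - x_0}x_{(i,j)} - \frac{vy_1 - vy_0}{x_1 - x_0}y_{(i,j)} + vx_0, \qquad vy_{(i,j)} = \frac{vy_1 - vy_0}{x_1 - x_0}x_{(i,j)} + \frac{vx_1 - vx_0}{x_1 - x_0}y_{(i,j)} + vy_0$$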
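The two sub-block formulas above can be sketched in Python as follows. Several details are assumptions made for illustration. The block width $x_1 - x_0$ and height $y_2 - y_0$ are passed as `w` and `h`. Sub-blocks are 4×4. The substituted pixel is the sub-block centre. Motion vectors are floats without rounding. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

MV = Tuple[float, float]


def mv_6param(cp: List[MV], w: float, h: float, x: float, y: float) -> MV:
    """6-parameter model; cp = [top-left, top-right, bottom-left] MVs, (x, y) relative to the top-left vertex."""
    (vx0, vy0), (vx1, vy1), (vx2, vy2) = cp
    vx = (vx1 - vx0) / w * x + (vx2 - vx0) / h * y + vx0
    vy = (vy1 - vy0) / w * x + (vy2 - vy0) / h * y + vy0
    return vx, vy


def mv_4param(cp: List[MV], w: float, x: float, y: float) -> MV:
    """4-parameter model; cp = [top-left, top-right] MVs, (x, y) relative to the top-left vertex."""
    (vx0, vy0), (vx1, vy1) = cp
    vx = (vx1 - vx0) / w * x - (vy1 - vy0) / w * y + vx0
    vy = (vy1 - vy0) / w * x + (vx1 - vx0) / w * y + vy0
    return vx, vy


def subblock_mvs(
    cp: List[MV], w: int, h: int, sub_w: int = 4, sub_h: int = 4
) -> Dict[Tuple[int, int], MV]:
    """MV at the centre of every sub_w x sub_h sub-block; the model is chosen by len(cp)."""
    if len(cp) not in (2, 3):
        raise ValueError("expected 2 (4-parameter) or 3 (6-parameter) control point MVs")
    mvs: Dict[Tuple[int, int], MV] = {}
    for j in range(h // sub_h):          # sub-block row index
        for i in range(w // sub_w):      # sub-block column index
            x = sub_w * i + sub_w / 2    # assumed preset position: sub-block centre
            y = sub_h * j + sub_h / 2
            mvs[(i, j)] = mv_6param(cp, w, h, x, y) if len(cp) == 3 else mv_4param(cp, w, x, y)
    return mvs
```

Consider an 8×8 block with control point MVs (0, 0), (8, 0), and (0, 8), a uniform zoom. Here the 6-parameter and 4-parameter sketches give the same sub-block field. For example, the sub-block at column 1, row 0 gets (6.0, 2.0).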
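Based on the first aspect, in another embodiment of the reconstruction phase of the current block, the predicted motion vector of each sub-block in the current block is obtained as follows:
1. Obtain a 6-parameter affine transformation model of the current block according to the target candidate motion vectors of its K control points.
2. Obtain the predicted motion vector of each sub-block in the current block according to that 6-parameter affine transformation model.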
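In this solution, the current block may use any affine transformation model in the parsing phase (the list construction phase). In the reconstruction phase, however, the 6-parameter affine transformation model is always used to obtain the motion vector information of each sub-block in the current block, and each sub-block is reconstructed from that information.
- If the parsing phase uses a 4-parameter affine transformation model or an 8-parameter bilinear model, a 6-parameter affine transformation model of the current block is additionally constructed.
- If the parsing phase uses a 6-parameter affine transformation model, the reconstruction phase continues to use that model.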
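For example, suppose the current block uses a 4-parameter affine transformation model in the parsing phase. The neighboring block may use a 4-parameter affine transformation model or an affine model with a different number of parameters. First, the motion vectors of the 2 control points of the current block are obtained. These are, for example, the motion vector (vx0, vy0) of the top-left control point (x0, y0) and the motion vector (vx1, vy1) of the top-right control point (x1, y1). In the reconstruction phase, a 6-parameter affine transformation model must then be constructed from these 2 motion vectors.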
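For example, the motion vector of a 3rd control point may be obtained with the following formula. The inputs are the motion vector (vx0, vy0) of the top-left control point (x0, y0) and the motion vector (vx1, vy1) of the top-right control point (x1, y1) of the current block. The 3rd control point is, for example, the bottom-left vertex (x2, y2) of the current block, with motion vector (vx2, vy2):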
$$vx_2 = -\frac{vy_1 - vy_0}{x_1 - x_0}(y_2 - y_0) + vx_0, \qquad vy_2 = \frac{vx_1 - vx_0}{x_1 - x_0}(y_2 - y_0) + vy_0$$
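The 6-parameter affine model for the reconstruction phase of the current block is then obtained from three motion vectors: (vx0, vy0) of the top-left control point (x0, y0), (vx1, vy1) of the top-right control point (x1, y1), and (vx2, vy2) of the bottom-left vertex (x2, y2). The model is: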
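$$vx_{(i,j)} = \frac{vx_1 - vx_0}{x_1 - x_0}x_{(i,j)} + \frac{vx_2 - vx_0}{y_2 - y_0}y_{(i,j)} + vx_0, \qquad vy_{(i,j)} = \frac{vy_1 - vy_0}{x_1 - x_0}x_{(i,j)} + \frac{vy_2 - vy_0}{y_2 - y_0}y_{(i,j)} + vy_0$$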
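Next, for each sub-block (or each motion compensation unit) in the current block, take the coordinates $(x_{(i,j)}, y_{(i,j)})$ of its preset-position pixel, such as the center point. These coordinates are measured relative to the top-left vertex (or another reference point) of the current block. Substituting them into the 6-parameter affine model formula above gives the motion information of that preset-position pixel, from which each sub-block is subsequently reconstructed.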
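The conversion above derives the third control point from the two top control points and then builds a 6-parameter model from all three. The Python sketch below checks a useful property of this conversion. The resulting 6-parameter model reproduces the original 4-parameter model at every point, because both are linear and agree at three non-collinear corners. In the sketch, `w` and `h` stand for $x_1 - x_0$ and $y_2 - y_0$, coordinates are relative to the top-left vertex, and all names are hypothetical.

```python
from typing import Callable, Tuple

MV = Tuple[float, float]
Model = Callable[[float, float], MV]


def third_control_point(mv0: MV, mv1: MV, w: float, h: float) -> MV:
    """(vx2, vy2) at the bottom-left vertex implied by the 4-parameter model of the two top CPs."""
    (vx0, vy0), (vx1, vy1) = mv0, mv1
    vx2 = -(vy1 - vy0) / w * h + vx0
    vy2 = (vx1 - vx0) / w * h + vy0
    return vx2, vy2


def four_param_model(mv0: MV, mv1: MV, w: float) -> Model:
    """4-parameter model built from the top-left and top-right CP MVs."""
    (vx0, vy0), (vx1, vy1) = mv0, mv1
    a, b = (vx1 - vx0) / w, (vy1 - vy0) / w

    def model(x: float, y: float) -> MV:
        return a * x - b * y + vx0, b * x + a * y + vy0

    return model


def six_param_model(mv0: MV, mv1: MV, mv2: MV, w: float, h: float) -> Model:
    """6-parameter model built from the top-left, top-right and bottom-left CP MVs."""
    (vx0, vy0), (vx1, vy1), (vx2, vy2) = mv0, mv1, mv2

    def model(x: float, y: float) -> MV:
        vx = (vx1 - vx0) / w * x + (vx2 - vx0) / h * y + vx0
        vy = (vy1 - vy0) / w * x + (vy2 - vy0) / h * y + vy0
        return vx, vy

    return model
```

Take mv0 = (1, 1) and mv1 = (1, 3) on a 16×8 block. For these inputs, `third_control_point` gives (0.0, 1.0). The 6-parameter model built from the three control points then returns (0.5, 1.5) at (4, 4), matching the 4-parameter model there. Converting to 6 parameters in the reconstruction phase therefore leaves the motion field of a block parsed with a 4-parameter model unchanged.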
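Implementing the embodiments of the present invention makes it possible to use the 6-parameter affine transformation model uniformly to predict the current block in the reconstruction phase. A motion model with more parameters describes the affine motion of the current block more precisely, but at higher computational complexity. The 6-parameter model constructed in the reconstruction phase of this solution can describe affine transformations of an image block such as translation, scaling, and rotation, and strikes a good balance between model complexity and modeling capability. Implementing this solution can therefore improve the coding efficiency and accuracy of predicting the current block and meet user needs.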
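Based on the first aspect, in a possible implementation, the current block uses the AMVP mode based on the affine transformation model. In that mode, the 2×K-parameter affine transformation model is obtained from the target candidate motion vectors of the K control points as follows:
1. Obtain the motion vectors of the K control points from their target candidate motion vectors and their motion vector differences. The motion vector differences are obtained by parsing the bitstream.
2. Obtain the 2×K-parameter affine transformation model of the current block from the motion vectors of the K control points.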
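In the AMVP variant above, each control point's motion vector is its predictor plus its parsed difference, and K control points then fix a 2×K-parameter model. The short Python sketch below assumes integer motion vectors and uses hypothetical function names. It covers only this arithmetic, not bitstream parsing.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def control_point_mvs(mvps: List[MV], mvds: List[MV]) -> List[MV]:
    """MV of each control point = its predictor from the list + its MVD parsed from the bitstream."""
    if len(mvps) != len(mvds):
        raise ValueError("need exactly one MVD per control point")
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(mvps, mvds)]


def model_parameter_count(num_control_points: int) -> int:
    """K control points determine a 2*K-parameter model (K = 2 -> 4, K = 3 -> 6, K = 4 -> 8)."""
    if num_control_points not in (2, 3, 4):
        raise ValueError("K must be 2, 3 or 4")
    return 2 * num_control_points
```

For example, predictors [(1, 2), (3, 4)] with differences [(0, -1), (2, 2)] give control point MVs [(1, 1), (5, 6)]. These two MVs define a 4-parameter model.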
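Based on the first aspect, in a possible implementation, the encoder side and the decoder side perform inter prediction using the AMVP mode based on the affine transformation model. The constructed list is then the candidate motion vector list of that AMVP mode.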
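In some specific embodiments of the present invention, the first motion-model-based motion vector prediction method described herein may be used to obtain candidate motion vectors of the control points of the current block. These candidates are added to the candidate motion vector list corresponding to the AMVP mode.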
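In some other specific embodiments of the present invention, both of the following methods may be used to obtain candidate motion vectors of the control points of the current block: the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method. The candidates from each method are added to the candidate motion vector list corresponding to the AMVP mode.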
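Based on the first aspect, in a possible implementation, the encoder side and the decoder side perform inter prediction using the Merge mode based on the affine transformation model. The constructed list is then the candidate motion vector list of that Merge mode.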
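In some specific embodiments of the present invention, the first motion-model-based motion vector prediction method described herein may also be used to obtain candidate motion vectors of the control points of the current block. These candidates are added to the candidate motion vector list corresponding to the Merge mode.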
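In some other specific embodiments of the present invention, both of the following methods may be used to obtain candidate motion vectors of the control points of the current block: the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method. The candidates from each method are added to the candidate motion vector list corresponding to the Merge mode.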
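Based on the first aspect, in a possible implementation, there may be multiple neighboring position blocks, that is, multiple affine decoding blocks adjacent to the current block. In one possible embodiment, both the encoder side and the decoder side then build the candidate motion vector list for the AMVP mode in two passes:
1. Use the affine decoding blocks whose models have the same number of parameters as the current block's model to obtain candidate motion vectors of the control points of the current block, and add them to the list.
2. Use the affine decoding blocks whose models have a different number of parameters to obtain further candidate motion vectors, and add them to the list.

As a result, candidates derived from affine decoding blocks with the same number of model parameters as the current block appear near the front of the list, which helps reduce the number of bits transmitted in the bitstream.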
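The ordering rule above can be sketched in Python together with the neighbor scan given earlier. Neighbors are visited in a preset order (for example left, above, upper right, lower left, upper left). Only available affine-coded neighbors are kept, and those whose model has the same number of parameters as the current block are moved to the front. The `NeighborBlock` record, its field names, and the function name are hypothetical, and the sketch only orders neighbors without deriving any motion vectors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeighborBlock:
    name: str          # hypothetical position label, e.g. "left" or "above"
    available: bool    # result of the availability check
    is_affine: bool    # whether the block was coded with an affine model
    num_params: int    # number of parameters of its affine model: 4, 6 or 8


def order_affine_neighbors(neighbors: List[NeighborBlock], current_params: int) -> List[NeighborBlock]:
    """Keep available affine neighbors in scan order, same-parameter-count ones first."""
    usable = [n for n in neighbors if n.available and n.is_affine]
    same = [n for n in usable if n.num_params == current_params]
    other = [n for n in usable if n.num_params != current_params]
    return same + other
```

Suppose the current block uses a 4-parameter model, the left neighbor is 6-parameter, and the above and upper-left neighbors are 4-parameter. The resulting order is then above, upper left, left. Unavailable and non-affine neighbors are skipped.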
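Based on the first aspect, in a possible implementation, the decoder side may need flag information (a flag) for the affine transformation model of the affine decoding block when it derives the candidate motion vectors of the control points of the current block. The flag is stored locally on the decoder side in advance. It indicates the affine transformation model that the affine decoding block actually used to predict its own sub-blocks.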
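For example, in one application scenario, the decoder side reads the flag of the affine decoding block. It derives the candidate motion vectors of the control points of the current block from the affine decoding block's actual model only under one condition: the number of parameters of the affine decoding block's actual model differs from (or, alternatively, equals) that of the current block's affine transformation model.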
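Based on the first aspect, in a possible implementation, the decoder side may also derive the candidate motion vectors of the control points of the current block without needing the flag of the affine decoding block's affine transformation model.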
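For example, in one application scenario, the decoder side proceeds as follows:
1. Determine the affine transformation model used by the current block.
2. Obtain a specific number of control points of the affine decoding block. This number may be the same as or different from the number of control points of the current block.
3. Form an affine transformation model from those control points.
4. Use that model to derive the candidate motion vectors of the control points of the current block.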
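According to a second aspect, an embodiment of the present invention provides another motion vector prediction method. The method includes the following steps:
- Parse a bitstream to obtain an index value of a candidate motion vector list.
- Construct the candidate motion vector list. The list includes candidate motion vectors of K control points of a current block. These candidate motion vectors are obtained based on a 2N-parameter affine transformation model used by a neighboring block of the current block. That 2N-parameter model is obtained based on the motion vectors of N control points of the neighboring block. N is an integer from 2 to 4 inclusive, and K is an integer from 2 to 4 inclusive. The neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks.
- Determine target candidate motion vectors of the K control points of the current block from the candidate motion vector list according to the index value.
- Obtain a 6-parameter affine transformation model of the current block according to those target candidate motion vectors.
- Obtain the predicted motion vector of each sub-block in the current block according to the 6-parameter affine transformation model.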
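Implementing these embodiments therefore makes it possible to use the 6-parameter affine transformation model uniformly to predict the current block in the reconstruction phase. A motion model with more parameters describes the affine motion of the current block more precisely, but at higher computational complexity. The 6-parameter model constructed in the reconstruction phase of this solution can describe affine transformations of an image block such as translation, scaling, and rotation, and strikes a good balance between model complexity and modeling capability. Implementing this solution can therefore improve the coding efficiency and accuracy of predicting the current block and meet user needs.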
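Based on the second aspect, in a possible implementation, N = 2 and K = 2. Correspondingly, the candidate motion vectors of the 2 control points of the current block are obtained based on the 4-parameter affine transformation model used by the neighboring block of the current block.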
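Based on the second aspect, in a possible implementation, N = 3 and K = 2. Correspondingly, the candidate motion vectors of the 2 control points of the current block are obtained based on the 6-parameter affine transformation model used by the neighboring block of the current block.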
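Based on the second aspect, in a possible implementation, the 6-parameter affine transformation model of the current block is obtained from the target candidate motion vectors of its K control points as follows:
obtaining a 4-parameter affine transformation model of the current block according to the target candidate motion vectors of the 2 control points of the current block;
obtaining the motion vector of the 3rd control point of the current block according to the 4-parameter affine transformation model of the current block; and
obtaining the 6-parameter affine transformation model of the current block according to the target candidate motion vectors of the 2 control points of the current block and the motion vector of the 3rd control point.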
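Based on the second aspect, in a possible implementation, the 4-parameter affine transformation model of the current block is obtained from the target candidate motion vectors of its 2 control points as follows:
obtaining the motion vectors of the 2 control points of the current block according to their target candidate motion vectors and their motion vector differences, where the motion vector differences are obtained by parsing the bitstream; and
obtaining the 4-parameter affine transformation model of the current block according to the motion vectors of the 2 control points.
Correspondingly, obtaining the 6-parameter affine transformation model of the current block according to the target candidate motion vectors of the 2 control points and the motion vector of the 3rd control point specifically includes:
obtaining the 6-parameter affine transformation model of the current block according to the motion vectors of the 2 control points and the motion vector of the 3rd control point.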
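Based on the second aspect, in a possible implementation, N = 2 and K = 3. Correspondingly, the candidate motion vectors of the 3 control points of the current block are obtained based on the 4-parameter affine transformation model used by the neighboring block of the current block.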
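According to a third aspect, an embodiment of the present invention provides a decoding device. The device includes:
- a storage unit, configured to store video data in the form of a bitstream;
- an entropy decoding unit, configured to parse the bitstream to obtain an index value of a candidate motion vector list; and
- a prediction processing unit, configured to construct the candidate motion vector list. The list includes candidate motion vectors of K control points of a current block. These candidate motion vectors are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block, and that model is obtained based on the motion vectors of N control points of the neighboring block. N is an integer from 2 to 4 inclusive, K is an integer from 2 to 4 inclusive, and N is not equal to K. The neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks. The prediction processing unit is further configured to determine target candidate motion vectors of the K control points from the list according to the index value, and to obtain the predicted motion vector of each sub-block in the current block from those target candidate motion vectors.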
In a specific embodiment, the modules of the device may be configured to implement the method described in the first aspect.
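According to a fourth aspect, an embodiment of the present invention provides a decoding device. The device includes:
- a storage unit, configured to store video data in the form of a bitstream;
- an entropy decoding unit, configured to parse the bitstream to obtain an index value of a candidate motion vector list; and
- a prediction processing unit, configured to parse the bitstream to obtain the index value of the candidate motion vector list and to construct the candidate motion vector list. The list includes candidate motion vectors of K control points of a current block. These candidate motion vectors are obtained based on a 2N-parameter affine transformation model used by a neighboring block of the current block, and that model is obtained based on the motion vectors of N control points of the neighboring block. N is an integer from 2 to 4 inclusive, and K is an integer from 2 to 4 inclusive. The neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks. The prediction processing unit is further configured to determine target candidate motion vectors of the K control points from the list according to the index value, obtain a 6-parameter affine transformation model of the current block from those target candidate motion vectors, and obtain the predicted motion vector of each sub-block in the current block from the 6-parameter model.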
In a specific embodiment, the modules of the device may be configured to implement the method described in the second aspect.
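According to a fifth aspect, an embodiment of the present invention provides a device for decoding video. The device includes:
a memory, configured to store video data in the form of a bitstream; and
a decoder, configured to construct the candidate motion vector list. The list includes candidate motion vectors of K control points of a current block. These candidate motion vectors are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block, and that model is obtained based on the motion vectors of N control points of the neighboring block. N is an integer from 2 to 4 inclusive, K is an integer from 2 to 4 inclusive, and N is not equal to K. The neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks. The decoder is further configured to determine target candidate motion vectors of the K control points from the list according to the index value, and to obtain the predicted motion vector of each sub-block in the current block from those target candidate motion vectors.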
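Based on the fifth aspect, in some implementations, N is equal to 2 and K is equal to 3. Correspondingly, the candidate motion vectors of the 3 control points of the current block are obtained based on the 4-parameter affine transformation model used by the neighboring block of the current block.
Based on the fifth aspect, in some implementations, the candidate motion vectors of the 3 control points of the current block include the motion vectors at the top-left, top-right, and bottom-left pixel positions in the current block.
The decoder is configured to calculate the candidate motion vectors of the 3 control points of the current block according to the following formulas: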
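$$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4)$$
$$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4)$$
$$vx_2 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_2 - y_4), \qquad vy_2 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_2 - y_4)$$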
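where:
- $vx_0$ and $vy_0$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the current block;
- $vx_1$ and $vy_1$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the current block;
- $vx_2$ and $vy_2$ are the horizontal and vertical components of the motion vector at the bottom-left pixel position in the current block;
- $vx_4$ and $vy_4$ are the horizontal and vertical components of the motion vector at the top-left pixel position in the neighboring block;
- $vx_5$ and $vy_5$ are the horizontal and vertical components of the motion vector at the top-right pixel position in the neighboring block;
- $x_0$ and $y_0$ are the horizontal and vertical coordinates of the top-left pixel position in the current block;
- $x_1$ and $y_1$ are the horizontal and vertical coordinates of the top-right pixel position in the current block;
- $x_2$ and $y_2$ are the horizontal and vertical coordinates of the bottom-left pixel position in the current block;
- $x_4$ and $y_4$ are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block;
- $x_5$ is the horizontal coordinate of the top-right pixel position in the neighboring block.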
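With reference to the fifth aspect, in some implementations, N is equal to 3 and K is equal to 2; correspondingly, the candidate motion vectors of the two control points of the current block are obtained based on a 6-parameter affine transformation model used by the neighboring block of the current block.
With reference to the fifth aspect, in some implementations, the candidate motion vectors of the two control points of the current block include the motion vector at the top-left pixel position in the current block and the motion vector at the top-right pixel position in the current block; and
the decoder is configured to calculate the candidate motion vectors of the two control points of the current block according to the following formulas:
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4) \end{cases}$$
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4) \end{cases}$$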
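where vx0 and vy0 are the horizontal and vertical components of the motion vector at the top-left pixel position in the current block; vx1 and vy1 are the horizontal and vertical components of the motion vector at the top-right pixel position in the current block; vx4 and vy4 are the horizontal and vertical components of the motion vector at the top-left pixel position in the neighboring block; vx5 and vy5 are the horizontal and vertical components of the motion vector at the top-right pixel position in the neighboring block; vx6 and vy6 are the horizontal and vertical components of the motion vector at the bottom-left pixel position in the neighboring block; x0 and y0 are the horizontal and vertical coordinates of the top-left pixel position in the current block; x1 and y1 are the horizontal and vertical coordinates of the top-right pixel position in the current block; x4 and y4 are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block; x5 is the horizontal coordinate of the top-right pixel position in the neighboring block; and y6 is the vertical coordinate of the bottom-left pixel position in the neighboring block.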
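The 6-parameter case can be sketched the same way. This is a hedged floating-point illustration, assuming the neighbor's top-right control point shares the top-left's row and its bottom-left control point shares the top-left's column (so only x5 and y6 are needed). The function name `inherit_from_6param` is hypothetical, and fixed-point rounding is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MV = Tuple[float, float]


def inherit_from_6param(p4: Point, mv4: MV, p5: Point, mv5: MV,
                        p6: Point, mv6: MV, targets: List[Point]) -> List[MV]:
    """Evaluate a neighbouring block's 6-parameter affine model at target points.

    p4, p5, p6: top-left, top-right and bottom-left control points of the
    neighbouring block; mv4, mv5, mv6: their motion vectors.
    """
    (x4, y4), (vx4, vy4) = p4, mv4
    x5, (vx5, vy5) = p5[0], mv5
    y6, (vx6, vy6) = p6[1], mv6
    dx = x5 - x4                     # neighbour width between CPs
    dy = y6 - y4                     # neighbour height between CPs
    result = []
    for x, y in targets:
        vx = vx4 + (vx5 - vx4) / dx * (x - x4) + (vx6 - vx4) / dy * (y - y4)
        vy = vy4 + (vy5 - vy4) / dx * (x - x4) + (vy6 - vy4) / dy * (y - y4)
        result.append((vx, vy))
    return result


# K = 2: only the current block's top-left and top-right corners are requested.
print(inherit_from_6param((0, 0), (0, 0), (16, 0), (4, 0), (0, 16), (0, 8),
                          [(16, 8), (24, 8)]))
```

Because horizontal and vertical gradients are independent here, the 6-parameter model can describe shear, which the 4-parameter model cannot.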
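With reference to the fifth aspect, in some implementations, the decoder is specifically configured to:
obtain a 2×K-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and
obtain the predicted motion vector of each sub-block in the current block based on the 2×K-parameter affine transformation model of the current block.
With reference to the fifth aspect, in some implementations, the decoder is specifically configured to:
obtain motion vectors of the K control points of the current block based on the target candidate motion vectors of the K control points of the current block and motion vector differences of the K control points of the current block, where the motion vector differences of the K control points of the current block are obtained by parsing the bitstream; and
obtain the 2×K-parameter affine transformation model of the current block based on the motion vectors of the K control points of the current block.
With reference to the fifth aspect, in some implementations, after determining, based on the index value, the target candidate motion vectors of the K control points of the current block from the candidate motion vector list, the decoder is further configured to:
obtain a 6-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and
obtain the predicted motion vector of each sub-block in the current block based on the 6-parameter affine transformation model of the current block.
For a specific implementation of the functions of the decoder, refer to the related descriptions in the first aspect.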
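According to a sixth aspect, an embodiment of the present invention provides another video decoding device, including:
a memory, configured to store video data in the form of a bitstream; and
a decoder, configured to: parse the bitstream to obtain an index value of a candidate motion vector list; construct the candidate motion vector list, where the candidate motion vector list includes candidate motion vectors of K control points of a current block, the candidate motion vectors of the K control points of the current block are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block, the 2×N-parameter affine transformation model is obtained based on motion vectors of N control points of the neighboring block, N is an integer greater than or equal to 2 and less than or equal to 4, and K is an integer greater than or equal to 2 and less than or equal to 4, the neighboring block is a decoded image block spatially adjacent to the current block, and the current block includes a plurality of sub-blocks; determine, based on the index value, target candidate motion vectors of the K control points of the current block from the candidate motion vector list; obtain a 6-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and obtain a predicted motion vector of each sub-block in the current block based on the 6-parameter affine transformation model of the current block.
For a specific implementation of the functions of the decoder, refer to the related descriptions in the second aspect.
According to a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions. When executed, the instructions cause one or more processors to code video data; specifically, the instructions cause the one or more processors to perform the method according to any possible embodiment of the first aspect.
According to an eighth aspect, an embodiment of the present invention provides a computer-readable storage medium storing instructions. When executed, the instructions cause one or more processors to code video data; specifically, the instructions cause the one or more processors to perform the method according to any possible embodiment of the second aspect.
According to a ninth aspect, an embodiment of the present invention provides a computer program including program code. When the program code runs on a computer, the method according to any possible embodiment of the first aspect is performed.
According to a tenth aspect, an embodiment of the present invention provides a computer program including program code. When the program code runs on a computer, the method according to any possible embodiment of the second aspect is performed.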
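It can be seen that, in an embodiment of the present invention, during encoding and decoding of the current block, the affine transformation model of a neighboring block can be used in the parsing phase of the current block (for example, when the candidate motion vector list for the AMVP mode or the Merge mode is constructed) to construct an affine transformation model for the current block itself, and the two affine transformation models may differ. Because the current block's own affine transformation model better matches the actual motion and actual requirements of the current block, implementing this solution can improve the coding efficiency and accuracy of encoding the current block and meet user requirements.
It can further be seen that, during encoding and decoding of the current block, the decoder side can uniformly use a 6-parameter affine transformation model to predict image blocks in the reconstruction phase, so that the reconstruction process of the current block in the embodiments of the present invention achieves a good balance between model complexity and modeling capability. Therefore, implementing this solution can improve the coding efficiency and accuracy of predicting the current block and meet user requirements.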
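BRIEF DESCRIPTION OF DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the background more clearly, the following describes the accompanying drawings required for the embodiments of the present invention or the background.
FIG. 1 is a block diagram of an example structure of a video coding system for implementing an embodiment of the present invention;
FIG. 2A is a block diagram of an example structure of a video encoder for implementing an embodiment of the present invention;
FIG. 2B is a block diagram of an example structure of a video decoder for implementing an embodiment of the present invention;
FIG. 3 is a block diagram of an example of a video coding device for implementing an embodiment of the present invention;
FIG. 4 is a block diagram of an example of an encoding apparatus or a decoding apparatus for implementing an embodiment of the present invention;
FIG. 5 is a schematic diagram of a scenario of an example operation on a current block;
FIG. 6 is a schematic diagram of another scenario of an example operation on a current block;
FIG. 7 is a schematic diagram of another scenario of an example operation on a current block;
FIG. 8 is a schematic diagram of another scenario of an example operation on a current block;
FIG. 9 is a flowchart of a motion vector prediction method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another scenario of an example operation on a current block;
FIG. 11A is a schematic diagram of a current block and motion compensation units of the current block according to an embodiment of the present invention;
FIG. 11B is a schematic diagram of another current block and motion compensation units of the current block according to an embodiment of the present invention;
FIG. 12 is a flowchart of another motion vector prediction method according to an embodiment of the present invention;
FIG. 13 is a flowchart of still another motion vector prediction method according to an embodiment of the present invention.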
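DESCRIPTION OF EMBODIMENTS
The following first briefly introduces some concepts that may be involved in the embodiments of the present invention. The technical solutions in the embodiments may be applied not only to existing video coding standards (such as H.264 and HEVC) but also to future video coding standards (such as H.266).
Video coding usually refers to processing a sequence of pictures that form a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms. Video coding as used herein means video encoding or video decoding. Video encoding is performed on the source side and usually includes processing (for example, compressing) original video pictures to reduce the amount of data required to represent them, for more efficient storage and/or transmission. Video decoding is performed on the destination side and usually includes inverse processing relative to the encoder to reconstruct the video pictures. "Coding" of video pictures in the embodiments should be understood as "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called a codec (encoding and decoding).
A video sequence includes a series of pictures; a picture is further divided into slices, and a slice is further divided into blocks. Video coding is performed block by block, and some newer video coding standards further extend the concept of a block. For example, H.264 has macroblocks (MB), and a macroblock may be further divided into multiple partitions that can be used for predictive coding. The high efficiency video coding (HEVC) standard uses basic concepts such as the coding unit (CU), prediction unit (PU), and transform unit (TU), functionally divides blocks into several kinds of units, and describes them with a new tree-based structure. For example, a CU may be split into smaller CUs according to a quadtree, and the smaller CUs may be split further, forming a quadtree structure; the CU is the basic unit for dividing and encoding a coded picture. PUs and TUs have similar tree structures. A PU may correspond to a prediction block and is the basic unit of predictive coding; a CU is further divided into multiple PUs according to a partition mode. A TU may correspond to a transform block and is the basic unit for transforming the prediction residual. Nevertheless, CUs, PUs, and TUs are all essentially blocks (or image blocks).
For example, in HEVC, a CTU is split into multiple CUs by using a quadtree structure denoted as a coding tree. The decision whether to code a picture region using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU may be further split into one, two, or four PUs according to the PU split type. The same prediction process is applied within one PU, and the related information is transmitted to the decoder on a PU basis. After the residual block is obtained by applying the prediction process based on the PU split type, the CU may be partitioned into transform units (TU) according to another quadtree structure similar to the coding tree used for the CU. In recent developments of video compression technology, a quad-tree and binary tree (QTBT) partitioning structure is used to partition coding blocks. In the QTBT block structure, a CU may be square or rectangular.
Herein, for ease of description and understanding, the image block to be coded in the current coded picture may be referred to as the current block: in encoding, it is the block currently being encoded; in decoding, it is the block currently being decoded. A decoded image block in a reference picture that is used to predict the current block is referred to as a reference block; that is, a reference block provides a reference signal for the current block, where the reference signal represents pixel values in the image block. A block in a reference picture that provides a prediction signal for the current block may be referred to as a prediction block, where the prediction signal represents pixel values, sample values, or sample signals in the prediction block. For example, after multiple reference blocks are traversed and the best reference block is found, that best reference block provides the prediction for the current block and may be referred to as the prediction block.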
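The following describes the video coding system involved in the embodiments of the present invention. Refer to FIG. 1, which is a block diagram of an example video coding system described in the embodiments of the present invention. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In the embodiments, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system are configured to predict motion information, such as motion vectors, of the currently coded image block or its sub-blocks according to any of the method examples described for the new inter prediction modes proposed in the embodiments, so that the predicted motion vectors are as close as possible to the motion vectors obtained by a motion estimation method. In that case no motion vector difference needs to be transmitted during encoding, which further improves coding performance.
As shown in FIG. 1, the video coding system includes a source apparatus 10 and a destination apparatus 20. The source apparatus 10 generates encoded video data and may therefore be referred to as a video encoding apparatus. The destination apparatus 20 may decode the encoded video data generated by the source apparatus 10 and may therefore be referred to as a video decoding apparatus. Various implementations of the source apparatus 10, the destination apparatus 20, or both may include one or more processors and a memory coupled to the one or more processors. As described herein, the memory may include but is not limited to RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.
The source apparatus 10 and the destination apparatus 20 may include various apparatuses, including desktop computers, mobile computing apparatuses, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display apparatuses, digital media players, video game consoles, in-vehicle computers, and the like.
The destination apparatus 20 may receive the encoded video data from the source apparatus 10 over a link 30. The link 30 may include one or more media or apparatuses capable of moving the encoded video data from the source apparatus 10 to the destination apparatus 20. In one example, the link 30 may include one or more communication media that enable the source apparatus 10 to transmit the encoded video data directly to the destination apparatus 20 in real time. In this example, the source apparatus 10 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol) and transmit the modulated video data to the destination apparatus 20. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source apparatus 10 to the destination apparatus 20.
In another example, the encoded data may be output from an output interface 140 to a storage apparatus 40. Similarly, the encoded data may be accessed from the storage apparatus 40 through an input interface 240. The storage apparatus 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a DVD, a CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
In another example, the storage apparatus 40 may correspond to a file server or another intermediate storage apparatus that can hold the encoded video generated by the source apparatus 10. The destination apparatus 20 may access the stored video data from the storage apparatus 40 by streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting it to the destination apparatus 20. Example file servers include a web server (for example, for a website), an FTP server, a network attached storage (NAS) apparatus, or a local disk drive. The destination apparatus 20 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL or a cable modem), or a combination of both that is suitable for accessing encoded video data stored on a file server. Transmission of the encoded video data from the storage apparatus 40 may be streaming transmission, download transmission, or a combination of both.
The motion vector prediction technology of the embodiments can be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, over the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video coding system may be configured to support one-way or two-way video transmission for applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system illustrated in FIG. 1 is merely an example, and the techniques of the embodiments are applicable to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between an encoding apparatus and a decoding apparatus. In other examples, data is retrieved from a local memory, streamed over a network, and so on. A video encoding apparatus may encode data and store it in memory, and/or a video decoding apparatus may retrieve data from memory and decode it. In many examples, encoding and decoding are performed by apparatuses that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.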
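In the example of FIG. 1, the source apparatus 10 includes a video source 120, a video encoder 100, and an output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture apparatus (for example, a video camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
The video encoder 100 may encode video data from the video source 120. In some examples, the source apparatus 10 transmits the encoded video data directly to the destination apparatus 20 through the output interface 140. In other examples, the encoded video data may also be stored on the storage apparatus 40 for later access by the destination apparatus 20 for decoding and/or playback.
In the example of FIG. 1, the destination apparatus 20 includes an input interface 240, a video decoder 200, and a display apparatus 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data over the link 30 and/or from the storage apparatus 40. The display apparatus 220 may be integrated with the destination apparatus 20 or may be external to it. Generally, the display apparatus 220 displays decoded video data, and may be any of a variety of display apparatuses, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display apparatus.
Although not shown in FIG. 1, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams. In some examples, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the user datagram protocol (UDP).
The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the embodiments are implemented partially in software, an apparatus may store instructions for the software in a suitable non-volatile computer-readable storage medium and execute the instructions in hardware using one or more processors to implement the techniques of the embodiments. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) may be regarded as one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding apparatus.
The embodiments may generally describe the video encoder 100 as "signaling" or "transmitting" certain information to another apparatus, such as the video decoder 200. The term "signaling" or "transmitting" may generally refer to the transfer of syntax elements and/or other data used to decode compressed video data. This transfer may occur in real time or near real time. Alternatively, it may occur after a period of time, for example when syntax elements are stored in an encoded bitstream on a computer-readable storage medium at encoding time; the decoding apparatus may then retrieve the syntax elements at any time after they are stored on that medium.
The video encoder 100 and the video decoder 200 may operate according to a video compression standard such as high efficiency video coding (HEVC) or an extension thereof, and may conform to the HEVC test model (HM). Alternatively, they may operate according to other industry standards, such as ITU-T H.264 and H.265, or extensions of such standards. However, the techniques of the embodiments are not limited to any particular coding standard.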
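In one example, the video encoder 100 is configured to encode syntax elements related to the image block currently to be encoded into a digital video output bitstream (bitstream for short). The syntax elements used for inter prediction of the current image block are referred to as inter prediction data for short. The inter prediction data includes, for example, indication information of an inter prediction mode; in the embodiments, the inter prediction modes include at least one of an affine-transformation-model-based AMVP mode and an affine-transformation-model-based Merge mode. When the inter prediction data includes indication information of the affine-model-based AMVP mode, it may further include an index value (or index number) in the candidate motion vector list corresponding to the AMVP mode, and the motion vector differences (MVD) of the control points of the current block. When the inter prediction data includes indication information of the affine-model-based Merge mode, it may further include an index value (or index number) in the candidate motion vector list corresponding to the Merge mode. In addition, in an optional embodiment, the inter prediction data in the foregoing examples may further include indication information of the affine transformation model (number of model parameters) of the current block.
It should be understood that if the difference (that is, the residual) between the prediction block generated from the motion information predicted by the new inter prediction modes proposed in the embodiments and the image block currently to be encoded (that is, the original block) is 0, the video encoder 100 only needs to encode the syntax elements related to the current image block into the bitstream; otherwise, the corresponding residual must be encoded into the bitstream in addition to the syntax elements.
In a specific embodiment, the video encoder 100 may be configured to perform the embodiment of FIG. 13 described below, to apply the motion vector prediction method described in the present invention on the encoder side.
In one example, the video decoder 200 is configured to decode, from the bitstream, the syntax elements related to the image block currently to be decoded (S401). The syntax elements used for inter prediction of the current image block are again referred to as inter prediction data, and their content is the same as described above for the video encoder 100: indication information of the affine-model-based AMVP mode or Merge mode, the corresponding candidate motion vector list index value, the control-point MVDs in the AMVP case, and optionally indication information of the affine transformation model (number of model parameters) of the current block.
In a specific embodiment, the video decoder 200 may be configured to perform the embodiment of FIG. 9 or FIG. 12 described below, to apply the motion vector prediction method described in the present invention on the decoder side.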
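FIG. 2A is a block diagram of an example video encoder 100 described in the embodiments of the present invention. The video encoder 100 is configured to output video to a post-processing entity 41. The post-processing entity 41 represents an example of a video entity that can process encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a splicing/editing apparatus. In some cases, the post-processing entity 41 may be an example of a network entity. In some video encoding systems, the post-processing entity 41 and the video encoder 100 may be parts of separate apparatuses, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same apparatus that includes the video encoder 100. In one example, the post-processing entity 41 is an example of the storage apparatus 40 in FIG. 1.
In the example of FIG. 2A, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded picture buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. The filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 106 is shown in FIG. 2A as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown in the figure).
The video data memory may store video data to be encoded by the components of the video encoder 100. The video data stored in the video data memory may be obtained from the video source 120. The DPB 107 may be a reference picture memory that stores reference video data used by the video encoder 100 to encode video data in intra or inter coding modes. The video data memory and the DPB 107 may be formed by any of a variety of memory apparatuses, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory apparatuses. The video data memory and the DPB 107 may be provided by the same memory apparatus or by separate memory apparatuses. In various examples, the video data memory may be on-chip together with other components of the video encoder 100, or off-chip relative to those components.
As shown in FIG. 2A, the video encoder 100 receives video data and stores it in the video data memory. The partitioning unit partitions the video data into several image blocks, and these image blocks may be further partitioned into smaller blocks, for example by quadtree-based or binary-tree-based partitioning. The partitioning may also include partitioning into slices, tiles, or other larger units. The video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into multiple image blocks (and possibly into sets of image blocks called tiles).
The intra predictor 109 in the prediction processing unit 108 may perform intra predictive encoding of the current image block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to remove spatial redundancy. The inter predictor 110 in the prediction processing unit 108 may perform inter predictive encoding of the current image block relative to one or more prediction blocks in one or more reference pictures, to remove temporal redundancy.
Specifically, the inter predictor 110 may be configured to determine the inter prediction mode used to encode the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the inter prediction modes in a candidate set and select the inter prediction mode with the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce it, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine, as the inter prediction mode for the current image block, the mode in the candidate set that has the lowest rate-distortion cost for encoding the current image block. The inter predictive encoding process is described in detail below, in particular the process of predicting the motion information of one or more sub-blocks (specifically, each sub-block or all sub-blocks) of the current image block in the various inter prediction modes for non-directional or directional motion fields in the embodiments.
The inter predictor 110 is configured to predict motion information (for example, motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to obtain or generate the prediction block of the current image block by using that motion information. The inter predictor 110 may locate, in one of the reference picture lists, the prediction block to which the motion vector points. The inter predictor 110 may also generate syntax elements associated with the image blocks and the video slice, for use by the video decoder 200 when decoding the image blocks of the video slice. Alternatively, in one example, the inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block for each sub-block, thereby obtaining the prediction block of the current image block. It should be understood that the inter predictor 110 here performs both motion estimation and motion compensation.
Specifically, after selecting the inter prediction mode for the current image block, the inter predictor 110 may provide information indicating the selected inter prediction mode to the entropy encoder 103, so that the entropy encoder 103 encodes that information. In the embodiments, the video encoder 100 may include, in the transmitted bitstream, inter prediction data related to the current image block. The content of the inter prediction data is as described above: indication information of the affine-model-based AMVP mode or Merge mode, the index value in the corresponding candidate motion vector list, the motion vector differences (MVD) of the control points of the current block in the AMVP case, and optionally indication information of the affine transformation model (number of model parameters) of the current block.
In a specific embodiment, the inter predictor 110 may be configured to perform the related steps of the embodiment of FIG. 13 described below, to apply the motion vector prediction method described in the present invention on the encoder side.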
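The intra predictor 109 may perform intra prediction on the current image block. Specifically, the intra predictor 109 may determine the intra prediction mode used to encode the current block. For example, it may use rate-distortion analysis to calculate rate-distortion values for the intra prediction modes to be tested and select the one with the best rate-distortion characteristics. In any case, after selecting an intra prediction mode for an image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes that information.
After the prediction processing unit 108 generates the prediction block of the current image block through inter prediction or intra prediction, the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. The summer 112 represents the one or more components that perform this subtraction. The residual video data in the residual block may be included in one or more TUs and applied to the transformer 101. The transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. The transformer 101 may convert the residual video data from the pixel value domain to a transform domain, such as the frequency domain.
The transformer 101 may send the resulting transform coefficients to the quantizer 102, which quantizes them to further reduce the bit rate. In some examples, the quantizer 102 may then scan the matrix containing the quantized transform coefficients. Alternatively, the entropy encoder 103 may perform the scan.
After quantization, the entropy encoder 103 entropy-encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 103, the encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200. The entropy encoder 103 may also entropy-encode the syntax elements of the current image block to be encoded.
The inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference picture. The summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded picture buffer 107 and may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or pictures.
It should be understood that other structural variants of the video encoder 100 can be used to encode a video stream. For example, for some image blocks or frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101 and, correspondingly, without processing by the inverse transformer 105; or, for some image blocks or frames, the video encoder 100 generates no residual data, so no processing by the transformer 101, the quantizer 102, the inverse quantizer 104, or the inverse transformer 105 is needed; or the video encoder 100 may store the reconstructed image block directly as a reference block without processing by the filter unit 106; or the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be combined.
Specifically, in the embodiments of the present invention, the video encoder 100 is configured to implement the motion vector prediction method described in the following embodiments.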
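FIG. 2B is a block diagram of an example video decoder 200 described in the embodiments of the present invention. In the example of FIG. 2B, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded picture buffer 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process that is generally the inverse of the encoding process described with respect to the video encoder 100 of FIG. 2A.
During decoding, the video decoder 200 receives from the video encoder 100 an encoded video bitstream that represents the image blocks of an encoded video slice and the associated syntax elements. The video decoder 200 may receive video data from a network entity 42 and, optionally, store it in a video data memory (not shown in the figure). The video data memory may store video data to be decoded by the components of the video decoder 200, such as the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage apparatus 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 2B, the video data memory and the DPB 207 may be the same memory or separately provided memories. The video data memory and the DPB 207 may be formed by any of a variety of memory apparatuses, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory apparatuses. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200, or placed off-chip relative to those components.
The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such apparatus for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder, such as the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, the network entity 42 may implement some of the techniques described in the embodiments. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate apparatuses, while in other cases the functionality described with respect to the network entity 42 may be performed by the same apparatus that includes the video decoder 200. In some cases, the network entity 42 may be an example of the storage apparatus 40 of FIG. 1.
The entropy decoder 203 of the video decoder 200 entropy-decodes the bitstream to produce quantized coefficients and some syntax elements, and forwards the syntax elements to the prediction processing unit 208. The video decoder 200 may receive syntax elements at the video slice level and/or the image block level.
When the video slice is decoded as an intra-decoded (I) slice, the intra predictor 209 of the prediction processing unit 208 may generate the prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video slice is decoded as an inter-decoded (that is, B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, the inter prediction mode used to decode the current image block of the current video slice, and decode (for example, perform inter prediction on) the current image block based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether a new inter prediction mode is used to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used, the inter predictor 210 predicts the motion information of the current image block or its sub-blocks based on the new inter prediction mode (for example, a new mode specified by the syntax elements or a default new mode), and then uses the predicted motion information in a motion compensation process to obtain or generate the prediction block of the current image block or its sub-blocks. The motion information here may include reference picture information and motion vectors, where the reference picture information may include but is not limited to uni-/bi-directional prediction information, a reference picture list number, and a reference picture index corresponding to the reference picture list. For inter prediction, the prediction block may be generated from one of the reference pictures in one of the reference picture lists. The video decoder 200 may construct the reference picture lists, list 0 and list 1, based on the reference pictures stored in the DPB 207. The reference frame index of the current picture may be included in one or more of reference frame list 0 and list 1. It should be understood that the inter predictor 210 here performs the motion compensation process. The inter prediction process of using the motion information of a reference block to predict the motion information of the current image block or its sub-blocks in the various new inter prediction modes is described in detail below.
In one example, the inter predictor 210 may predict the image block currently to be decoded by using the syntax elements related to that block decoded from the bitstream (S401). The syntax elements used for inter prediction of the current image block are referred to as inter prediction data, whose content is as described above for the video encoder 100: indication information of the affine-model-based AMVP mode or Merge mode, the index value in the corresponding candidate motion vector list, the motion vector differences (MVD) of the control points of the current block in the AMVP case, and optionally indication information of the affine transformation model (number of model parameters) of the current block.
In a specific embodiment, the inter predictor 210 may be configured to perform the related steps of the embodiment of FIG. 9 or FIG. 12 described below, to apply the motion vector prediction method described in the present invention on the decoder side.
The inverse quantizer 204 inversely quantizes, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203. The inverse quantization process may include using the quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that was applied and, likewise, the degree of inverse quantization that should be applied. The inverse transformer 205 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients to produce residual blocks in the pixel domain.
After the inter predictor 210 generates the prediction block for the current image block or its sub-block, the video decoder 200 sums the residual block from the inverse transformer 205 and the corresponding prediction block generated by the inter predictor 210 to obtain a reconstructed block, that is, a decoded image block. The summer 211 represents the component that performs this summation. When needed, a loop filter (in or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 206 is shown in FIG. 2B as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as a decoded video stream. Decoded image blocks in a given frame or picture may also be stored in the decoded picture buffer 207, which stores reference pictures used for subsequent motion compensation. The decoded picture buffer 207 may be part of a memory that also stores decoded video for later presentation on a display apparatus (such as the display apparatus 220 of FIG. 1), or it may be separate from such memory.
It should be understood that other structural variants of the video decoder 200 can be used to decode the encoded video bitstream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for some image blocks or frames, the entropy decoder 203 of the video decoder 200 does not decode any quantized coefficients, so no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
Specifically, in the embodiments of the present invention, the video decoder 200 is configured to implement the motion vector prediction method described in the following embodiments.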
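Refer to FIG. 3, which is a schematic structural diagram of a video coding device 400 (for example, a video encoding device 400 or a video decoding device 400) according to an embodiment of the present invention. The video coding device 400 is suitable for implementing the embodiments described herein. In one embodiment, the video coding device 400 may be a video decoder (for example, the video decoder 200 of FIG. 1) or a video encoder (for example, the video encoder 100 of FIG. 1). In another embodiment, the video coding device 400 may be one or more components of the video decoder 200 or the video encoder 100 of FIG. 1.
The video coding device 400 includes: ingress ports 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and egress ports 450 for transmitting data; and a memory 460 for storing data. The video coding device 400 may further include optical-to-electrical (OE) and electrical-to-optical (EO) components coupled to the ingress ports 410, the receiver unit 420, the transmitter unit 440, and the egress ports 450, for the egress or ingress of optical or electrical signals.
The processor 430 is implemented by hardware and software. The processor 430 may be implemented as one or more CPU chips, cores (for example, a multi-core processor), FPGAs, ASICs, and DSPs. The processor 430 communicates with the ingress ports 410, the receiver unit 420, the transmitter unit 440, the egress ports 450, and the memory 460. The processor 430 includes a coding module 470 (for example, an encoding module 470 or a decoding module 470). The encoding/decoding module 470 implements the embodiments disclosed herein, to implement the motion vector prediction method provided in the embodiments of the present invention. For example, the encoding/decoding module 470 implements, processes, or provides various coding operations. The encoding/decoding module 470 therefore substantially improves the functions of the video coding device 400 and affects the transition of the video coding device 400 to a different state. Alternatively, the encoding/decoding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).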
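It should be understood that, in the video encoder 100 and the video decoder 200 of this application, the processing result of a given stage may be further processed before it is output to the next stage. For example, after stages such as interpolation filtering, motion vector derivation, or loop filtering, operations such as clipping (Clip) or shifting (shift) are further performed on the processing result of the corresponding stage.
For example, the motion vectors of the control points of the current image block derived from the motion vectors of a neighboring affine coding block may be further processed; this is not limited in this application. For example, the value range of a motion vector is constrained to a certain bit width. Assuming that the allowed bit width of a motion vector is bitDepth, the motion vector range is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where "^" denotes exponentiation. If bitDepth is 16, the value range is -32768 to 32767; if bitDepth is 18, the value range is -131072 to 131071. The constraint can be applied in either of the following two ways:
Method 1: remove the overflowing high-order bits of the motion vector:
$$ux = \left(vx + 2^{bitDepth}\right) \bmod 2^{bitDepth}$$
$$vx = \left(ux \ge 2^{bitDepth-1}\right) \;?\; \left(ux - 2^{bitDepth}\right) : ux$$
$$uy = \left(vy + 2^{bitDepth}\right) \bmod 2^{bitDepth}$$
$$vy = \left(uy \ge 2^{bitDepth-1}\right) \;?\; \left(uy - 2^{bitDepth}\right) : uy$$
For example, if vx is -32769, the formulas above give 32767. This is because a computer stores values in two's complement form: the two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits), and the computer handles overflow by discarding the high-order bit, so vx becomes 0111,1111,1111,1111, that is, 32767, which matches the result of the formulas.
Method 2: clip the motion vector, as shown in the following formulas:
$$vx = \mathrm{Clip3}\left(-2^{bitDepth-1},\ 2^{bitDepth-1}-1,\ vx\right)$$
$$vy = \mathrm{Clip3}\left(-2^{bitDepth-1},\ 2^{bitDepth-1}-1,\ vy\right)$$
where Clip3 clamps the value of z to the interval [x, y] and is defined as:
$$\mathrm{Clip3}(x, y, z) = \begin{cases} x, & z < x \\ y, & z > y \\ z, & \text{otherwise} \end{cases}$$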
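The two constraint methods can be compared with a short Python sketch (the function names `wrap_mv`, `clip3`, and `clip_mv` are hypothetical). Note that Python's `%` already returns a non-negative result for a positive modulus, so Method 1 behaves correctly here even for inputs far below the range; in C, adding 2^bitDepth before `%` is what keeps the operand non-negative, and only for values within one wrap.

```python
def wrap_mv(v: int, bit_depth: int) -> int:
    """Method 1: keep the low `bit_depth` bits and read them as two's complement."""
    full = 1 << bit_depth                 # 2^bitDepth
    u = (v + full) % full                 # ux = (vx + 2^bitDepth) % 2^bitDepth
    return u - full if u >= (full >> 1) else u


def clip3(lo: int, hi: int, z: int) -> int:
    """Clip3(x, y, z): clamp z to the closed interval [x, y]."""
    return lo if z < lo else hi if z > hi else z


def clip_mv(v: int, bit_depth: int) -> int:
    """Method 2: saturate to [-2^(bitDepth-1), 2^(bitDepth-1) - 1]."""
    half = 1 << (bit_depth - 1)
    return clip3(-half, half - 1, v)


print(wrap_mv(-32769, 16), clip_mv(-32769, 16))
```

The design difference is visible at the boundary: Method 1 wraps an overflowing value to the opposite end of the range (-32769 becomes 32767), whereas Method 2 saturates it to the nearest representable value (-32768).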
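FIG. 4 is a schematic block diagram of an implementation of an encoding device or a decoding device (coding device 1200 for short) according to an embodiment of the present invention. The coding device 1200 may include a processor 1210, a memory 1230, and a bus system 1250. The processor and the memory are connected through the bus system; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory. The memory of the coding device stores program code, and the processor may invoke the program code stored in the memory to perform the various video encoding or decoding methods described in the embodiments, in particular the video encoding or decoding methods in the various new inter prediction modes and the methods for predicting motion information in those modes. To avoid repetition, details are not described herein again.
In the embodiments, the processor 1210 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 1230 may include a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 1230. The memory 1230 may include code and data 1231 accessed by the processor 1210 through the bus 1250. The memory 1230 may further include an operating system 1233 and application programs 1235, where the application programs 1235 include at least one program that allows the processor 1210 to perform the video encoding or decoding methods described in the embodiments (in particular, the motion vector prediction method described in the embodiments). For example, the application programs 1235 may include applications 1 to N, which further include a video encoding or decoding application (video coding application for short) that performs the video encoding or decoding methods described in the embodiments.
In addition to a data bus, the bus system 1250 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus system 1250 in the figure.
Optionally, the coding device 1200 may further include one or more output devices, such as a display 1270. In one example, the display 1270 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input. The display 1270 may be connected to the processor 1210 through the bus 1250.
To better understand the technical solutions of the embodiments of the present invention, the following further describes the inter prediction modes, non-translational motion models, the inherited control point motion vector prediction method, and the constructed control point motion vector prediction method involved in the embodiments.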
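1) Inter prediction modes. HEVC uses two inter prediction modes: the advanced motion vector prediction (AMVP) mode and the merge mode.
In the AMVP mode, the spatially or temporally neighboring encoded blocks of the current block (denoted as neighboring blocks) are traversed first, and a candidate motion vector list (also called a motion information candidate list) is constructed from the motion information of the neighboring blocks. The optimal motion vector is then determined from the candidate motion vector list by rate-distortion cost: the candidate motion information with the lowest rate-distortion cost is used as the motion vector predictor (MVP) of the current block. The positions of the neighboring blocks and their traversal order are predefined. The rate-distortion cost is calculated by formula (1), where J denotes the rate-distortion cost (RD cost), SAD is the sum of absolute differences between the original pixel values and the predicted pixel values obtained through motion estimation using a candidate motion vector predictor, R denotes the bit rate, and λ denotes the Lagrange multiplier. The encoder side transmits to the decoder side the index value of the selected motion vector predictor in the candidate motion vector list and the reference frame index value. Further, a motion search is performed in a neighborhood centered on the MVP to obtain the actual motion vector of the current block, and the encoder side transmits the difference between the MVP and the actual motion vector (motion vector difference, MVD) to the decoder side.
$$J = SAD + \lambda R \qquad (1)$$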
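The selection rule of formula (1) can be sketched in Python. This is a hedged illustration only: it assumes the SAD and the bit cost R for each candidate have already been measured (how an encoder estimates R is not specified here), and the function names are hypothetical.

```python
from typing import Sequence


def rd_cost(sad: float, rate_bits: float, lam: float) -> float:
    """Formula (1): J = SAD + lambda * R."""
    return sad + lam * rate_bits


def select_mvp_index(sads: Sequence[float], rates: Sequence[float],
                     lam: float) -> int:
    """Return the index of the candidate predictor with the lowest RD cost.

    sads[i]:  SAD after motion estimation using candidate i as the predictor.
    rates[i]: bits needed to signal candidate i (assumed precomputed).
    """
    costs = [rd_cost(s, r, lam) for s, r in zip(sads, rates)]
    return min(range(len(costs)), key=costs.__getitem__)


# A larger lambda penalises bits more, so the cheaper-to-signal candidate wins.
print(select_mvp_index([100, 90], [2, 8], lam=4.0))   # J = 108 vs 122
print(select_mvp_index([100, 90], [2, 8], lam=1.0))   # J = 102 vs 98
```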
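In the Merge mode, a candidate motion vector list is first constructed from the motion information of the spatially or temporally neighboring encoded blocks of the current block. The optimal motion information is then determined from the candidate motion vector list by calculating rate-distortion costs and used as the motion information of the current block, and the index value of the position of the optimal motion information in the candidate motion vector list (denoted as merge index, the same below) is transmitted to the decoder side. The spatial and temporal candidate motion information of the current block is shown in FIG. 5. The spatial candidates come from five spatially neighboring blocks (A0, A1, B0, B1, and B2). If a neighboring block is unavailable (it does not exist, it has not been encoded, or it does not use an inter prediction mode), its motion information is not added to the candidate motion vector list. The temporal candidate motion information of the current block is obtained by scaling the MV of the co-located block in the reference frame according to the picture order counts (POC) of the reference frame and the current frame. The block at position T in the reference frame is checked first; if it is unavailable, the block at position C is used instead.
As in the AMVP mode, the positions of the neighboring blocks and their traversal order in the Merge mode are also predefined, and they may differ between modes.
As can be seen, both the AMVP mode and the Merge mode maintain a candidate motion vector list. Before new motion information is added to the candidate list, the list is checked for identical motion information; if identical motion information already exists, the new motion information is not added. This check is called pruning of the candidate motion vector list. Pruning prevents identical motion information from appearing in the list and avoids redundant rate-distortion cost calculations.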
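The pruning check described above can be sketched in a few lines of Python. The representation of motion information as an `(mv_x, mv_y, ref_idx)` tuple is a simplifying assumption for illustration (real motion information also carries prediction direction and a second list), and `add_with_pruning` is a hypothetical name.

```python
from typing import List, Tuple

# Simplified motion information: (mv_x, mv_y, ref_idx). Assumed layout.
MotionInfo = Tuple[int, int, int]


def add_with_pruning(cands: List[MotionInfo], new: MotionInfo,
                     max_len: int) -> bool:
    """Append `new` unless the list is full or identical info already exists.

    Returns True if the candidate was added.
    """
    if len(cands) >= max_len or new in cands:   # pruning: skip duplicates
        return False
    cands.append(new)
    return True


lst: List[MotionInfo] = []
for mi in [(1, 2, 0), (1, 2, 0), (3, 4, 0)]:
    add_with_pruning(lst, mi, max_len=5)
print(lst)   # the duplicate (1, 2, 0) is kept only once
```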
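In HEVC inter prediction, all pixels in a coding block use the same motion information (that is, all pixels in the coding block move consistently), and motion compensation is then performed according to that motion information to obtain the predicted pixel values of the coding block. However, not all pixels in a coding block have the same motion characteristics, and using the same motion information may make motion compensation prediction inaccurate and thereby increase the residual information.
In other words, existing video coding standards use block-matching motion estimation based on a translational motion model. In the real world, however, motion is diverse and many objects move non-translationally, such as rotating objects, roller coasters turning in different directions, launched fireworks, and some stunts in films, and especially moving objects in user-generated content (UGC) scenarios. If these are encoded using the translational-model-based block motion compensation of current coding standards, coding efficiency suffers greatly. Therefore, non-translational motion models, such as affine transformation models, have been introduced to further improve coding efficiency.
Accordingly, depending on the motion model, the AMVP mode can be divided into a translational-model-based AMVP mode and a non-translational-model-based AMVP mode, and the Merge mode can be divided into a translational-model-based Merge mode and a non-translational-model-based Merge mode.
2) Non-translational motion models. Non-translational motion model prediction means that the encoder side and the decoder side use the same motion model to derive the motion information of each sub motion compensation unit in the current block, and perform motion compensation according to that motion information to obtain the prediction block, thereby improving prediction efficiency. The sub motion compensation unit in the embodiments may be a single pixel or an N1×N2 pixel block partitioned according to a specific method, where N1 and N2 are both positive integers and N1 may or may not be equal to N2.
Commonly used non-translational motion models include the 4-parameter affine transformation model and the 6-parameter affine transformation model; in possible application scenarios, there is also the 8-parameter bilinear model. They are described separately below.
The 4-parameter affine transformation model is shown in the following formula (2):
$$\begin{cases} vx = a_1 + a_3 x + a_4 y \\ vy = a_2 - a_4 x + a_3 y \end{cases} \qquad (2)$$
The 4-parameter affine transformation model can be represented by the motion vectors of two pixels and their coordinates relative to the top-left vertex pixel of the current block; pixels used to represent the motion model parameters are called control points. If the top-left vertex (0, 0) and the top-right vertex (W, 0) are used as control points, the motion vectors (vx0, vy0) and (vx1, vy1) of the top-left and top-right control points of the current block are determined first, and the motion information of each sub motion compensation unit in the current block is then obtained according to the following formula (3), where (x, y) are the coordinates of the sub motion compensation unit relative to the top-left vertex pixel of the current block and W is the width of the current block.
$$\begin{cases} vx = \dfrac{vx_1 - vx_0}{W} x - \dfrac{vy_1 - vy_0}{W} y + vx_0 \\ vy = \dfrac{vy_1 - vy_0}{W} x + \dfrac{vx_1 - vx_0}{W} y + vy_0 \end{cases} \qquad (3)$$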
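Formula (3) can be applied to every sub motion compensation unit with a short Python sketch. It assumes 4×4 sub-blocks and evaluates the model at each sub-block's center; both choices are illustrative assumptions, as is the hypothetical name `affine4_subblock_mvs`. Floating-point arithmetic is used in place of a codec's fixed-point math.

```python
from typing import Dict, Tuple

MV = Tuple[float, float]


def affine4_subblock_mvs(cp0: MV, cp1: MV, w: int, h: int,
                         sb: int = 4) -> Dict[Tuple[int, int], MV]:
    """Formula (3): MV of each sb x sb sub-block, keyed by its top-left corner.

    cp0, cp1: MVs of the top-left (0, 0) and top-right (W, 0) control points.
    """
    (vx0, vy0), (vx1, vy1) = cp0, cp1
    a = (vx1 - vx0) / w              # zoom term
    b = (vy1 - vy0) / w              # rotation term
    field = {}
    for y0 in range(0, h, sb):
        for x0 in range(0, w, sb):
            x, y = x0 + sb / 2, y0 + sb / 2   # sub-block centre (assumption)
            field[(x0, y0)] = (a * x - b * y + vx0, b * x + a * y + vy0)
    return field


# Pure rotation: the right edge moves down relative to the left edge.
g = affine4_subblock_mvs((0, 0), (0, 8), 16, 16)
print(g[(0, 0)], g[(12, 0)])
```

When cp0 equals cp1, every sub-block receives the same MV, which is exactly the translational case described for HEVC above.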
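The 6-parameter affine transformation model is shown in the following formula (4):
$$\begin{cases} vx = a_1 + a_3 x + a_4 y \\ vy = a_2 + a_5 x + a_6 y \end{cases} \qquad (4)$$
The 6-parameter affine transformation model can be represented by the motion vectors of three pixels and their coordinates relative to the top-left vertex pixel of the current block. If the top-left vertex (0, 0), the top-right vertex (W, 0), and the bottom-left vertex (0, H) are used as control points, the motion vectors (vx0, vy0), (vx1, vy1), and (vx2, vy2) of the top-left, top-right, and bottom-left control points of the current block are determined first, and the motion information of each sub motion compensation unit in the current block is then obtained according to the following formula (5), where (x, y) are the coordinates of the sub motion compensation unit relative to the top-left vertex pixel of the current block, and W and H are the width and height of the current block, respectively.
$$\begin{cases} vx = \dfrac{vx_1 - vx_0}{W} x + \dfrac{vx_2 - vx_0}{H} y + vx_0 \\ vy = \dfrac{vy_1 - vy_0}{W} x + \dfrac{vy_2 - vy_0}{H} y + vy_0 \end{cases} \qquad (5)$$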
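A minimal Python sketch of formula (5), evaluated at a single point (the name `affine6_mv` is hypothetical and floating-point arithmetic is an assumption). Evaluating it at the three control-point positions returns the control-point MVs themselves, which is a quick sanity check on the formula.

```python
from typing import Tuple

MV = Tuple[float, float]


def affine6_mv(cp0: MV, cp1: MV, cp2: MV, w: float, h: float,
               x: float, y: float) -> MV:
    """Formula (5): MV at (x, y) from the top-left, top-right and
    bottom-left control-point MVs of a W x H block."""
    (vx0, vy0), (vx1, vy1), (vx2, vy2) = cp0, cp1, cp2
    vx = (vx1 - vx0) / w * x + (vx2 - vx0) / h * y + vx0
    vy = (vy1 - vy0) / w * x + (vy2 - vy0) / h * y + vy0
    return (vx, vy)


cps = ((1, 1), (5, 1), (1, 9))
print(affine6_mv(*cps, 8, 16, 8, 0))    # top-right CP reproduced
print(affine6_mv(*cps, 8, 16, 4, 8))    # block centre
```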
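The 8-parameter bilinear model is shown in the following formula (6):
$$\begin{cases} vx = a_1 + a_3 x + a_4 y + a_7 xy \\ vy = a_2 + a_5 x + a_6 y + a_8 xy \end{cases} \qquad (6)$$
The 8-parameter bilinear model can be represented by the motion vectors of four pixels and their coordinates relative to the top-left vertex pixel of the current coding block. If the top-left vertex (0, 0), the top-right vertex (W, 0), the bottom-left vertex (0, H), and the bottom-right vertex (W, H) are used as control points, the motion vectors (vx0, vy0), (vx1, vy1), (vx2, vy2), and (vx3, vy3) of the top-left, top-right, bottom-left, and bottom-right control points of the current coding block are determined first, and the motion information of each sub motion compensation unit in the current coding block is then derived according to the following formula (7), where (x, y) are the coordinates of the sub motion compensation unit relative to the top-left vertex pixel of the current coding block, and W and H are the width and height of the current coding block, respectively.
$$\begin{cases} vx = \dfrac{vx_1 - vx_0}{W} x + \dfrac{vx_2 - vx_0}{H} y + \dfrac{vx_3 + vx_0 - vx_1 - vx_2}{WH} xy + vx_0 \\ vy = \dfrac{vy_1 - vy_0}{W} x + \dfrac{vy_2 - vy_0}{H} y + \dfrac{vy_3 + vy_0 - vy_1 - vy_2}{WH} xy + vy_0 \end{cases} \qquad (7)$$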
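Formula (7) is ordinary bilinear interpolation between the four corner MVs. The hedged Python sketch below (hypothetical name `bilinear_mv`, floating point) shows that at the block center the result is the average of the four corner MVs, and that the bottom-right corner reproduces (vx3, vy3).

```python
from typing import Tuple

MV = Tuple[float, float]


def bilinear_mv(cp0: MV, cp1: MV, cp2: MV, cp3: MV, w: float, h: float,
                x: float, y: float) -> MV:
    """Formula (7): MV at (x, y) from the four corner control-point MVs
    (top-left, top-right, bottom-left, bottom-right)."""
    (vx0, vy0), (vx1, vy1), (vx2, vy2), (vx3, vy3) = cp0, cp1, cp2, cp3
    vx = ((vx1 - vx0) / w * x + (vx2 - vx0) / h * y
          + (vx3 + vx0 - vx1 - vx2) / (w * h) * x * y + vx0)
    vy = ((vy1 - vy0) / w * x + (vy2 - vy0) / h * y
          + (vy3 + vy0 - vy1 - vy2) / (w * h) * x * y + vy0)
    return (vx, vy)


corners = ((0, 0), (4, 0), (0, 4), (8, 8))
print(bilinear_mv(*corners, 4, 4, 2, 2))   # centre: mean of the corners
```

Unlike the affine models, the xy term lets opposite edges of the block deform differently, which is why four control points are needed.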
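A coding block predicted using an affine transformation model may also be called an affine coding block. As can be seen from the above, the affine transformation model is directly related to the motion information of the control points of the affine coding block.
Generally, the motion information of the control points of an affine coding block can be obtained using the affine-transformation-model-based AMVP mode or the affine-transformation-model-based Merge mode. In either mode, the motion information of the control points of the current coding block can be obtained using the inherited control point motion vector prediction method or the constructed control point motion vector prediction method. These two methods are further described below.
3) Inherited control point motion vector prediction method. The inherited control point motion vector prediction method uses the affine transformation model of a neighboring encoded affine coding block of the current block to determine candidate control point motion vectors of the current block. The number of parameters of the affine coding block's affine transformation model (for example, 4, 6, or 8) is the same as that of the current block's affine transformation model.
Taking the current block shown in FIG. 6 as an example, the neighboring position blocks around the current block are traversed in a set order, such as A1→B1→B0→A0→B2, to find the affine coding block in which a neighboring position block of the current block is located and obtain the control point motion information of that affine coding block. The motion vectors of the control points of the current block (for the Merge mode) or the motion vector predictors of the control points of the current block (for the AMVP mode) are then derived from the affine transformation model constructed from the control point motion information of the affine coding block. The order A1→B1→B0→A0→B2 is only an example; other orders are also applicable to the embodiments. In addition, the neighboring position blocks are not limited to A1, B1, B0, A0, and B2. A neighboring position block may be a single pixel or a pixel block of a preset size partitioned according to a specific method, for example a 4x4 pixel block, a 4x2 pixel block, or a pixel block of another size; this is not limited. The affine coding block is an encoded block adjacent to the current block that was predicted using an affine transformation model in the encoding phase (also called a neighboring affine coding block for short).
The following uses A1 shown in FIG. 6 as an example to describe how the candidate control point motion vectors of the current block are determined; other cases are handled similarly.
If the affine coding block in which A1 is located is a 4-parameter affine coding block (that is, it is predicted using a 4-parameter affine transformation model), the motion vector (vx4, vy4) of its top-left vertex (x4, y4) and the motion vector (vx5, vy5) of its top-right vertex (x5, y5) are obtained.
Then the motion vector (vx0, vy0) of the top-left vertex (x0, y0) of the current block is calculated using the following formula (8):
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4) \end{cases} \qquad (8)$$
The motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block is calculated using the following formula (9):
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4) \end{cases} \qquad (9)$$
The combination of the motion vector (vx0, vy0) of the top-left vertex (x0, y0) and the motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block, obtained as above from the affine coding block in which A1 is located, is a candidate control point motion vector of the current block.
If the coding block in which A1 is located is a 6-parameter affine coding block (that is, it is predicted using a 6-parameter affine transformation model), the motion vector (vx4, vy4) of its top-left vertex (x4, y4), the motion vector (vx5, vy5) of its top-right vertex (x5, y5), and the motion vector (vx6, vy6) of its bottom-left vertex (x6, y6) are obtained.
Then the motion vector (vx0, vy0) of the top-left vertex (x0, y0) of the current block is calculated using the following formula (10):
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4) \end{cases} \qquad (10)$$
The motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block is calculated using the following formula (11):
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4) \end{cases} \qquad (11)$$
The motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block is calculated using the following formula (12):
$$\begin{cases} vx_2 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_2 - y_4) \\ vy_2 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_2 - y_4) \end{cases} \qquad (12)$$
The combination of the motion vector (vx0, vy0) of the top-left vertex (x0, y0), the motion vector (vx1, vy1) of the top-right vertex (x1, y1), and the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block, obtained as above from the affine coding block in which A1 is located, is a candidate control point motion vector of the current block.
It should be noted that other motion models, candidate positions, and search/traversal orders are also applicable to the embodiments; details are not described herein.
It should be noted that methods that use other control points to represent the motion models of the neighboring and current coding blocks are also applicable to the embodiments; details are not described herein.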
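4) Constructed control point motion vector prediction method. The constructed control point motion vector prediction method combines the motion vectors of encoded blocks neighboring the control points of the current block and uses them as the motion vectors of the control points of the current affine coding block, regardless of whether those neighboring encoded blocks are affine coding blocks. The method differs between prediction modes (the affine-transformation-model-based AMVP mode and the affine-transformation-model-based Merge mode), which are described separately below.
First, the constructed control point motion vector prediction method for the affine-transformation-model-based AMVP mode is described.
The method is described using FIG. 7 as an example, in which the motion information of encoded blocks neighboring the current coding block is used to determine the motion vectors of the top-left and top-right vertices of the current block.
If the current block is a 4-parameter affine coding block (that is, it is predicted using a 4-parameter affine transformation model), the motion vector of the encoded block A2, B2, or B3 adjacent to the top-left vertex may be used as a candidate motion vector for the top-left vertex of the current block, and the motion vector of the encoded block B1 or B0 adjacent to the top-right vertex may be used as a candidate motion vector for the top-right vertex of the current block. The candidate motion vectors of the top-left and top-right vertices are combined into multiple 2-tuples; the motion vectors of the two encoded blocks in a 2-tuple may be used as candidate control point motion vectors of the current block. The 2-tuples are shown in (13A):
$$\{v_{A2}, v_{B1}\},\ \{v_{A2}, v_{B0}\},\ \{v_{B2}, v_{B1}\},\ \{v_{B2}, v_{B0}\},\ \{v_{B3}, v_{B1}\},\ \{v_{B3}, v_{B0}\} \qquad (13A)$$
where v_A2, v_B1, v_B0, v_B2, and v_B3 denote the motion vectors of A2, B1, B0, B2, and B3, respectively.
If the current block is a 6-parameter affine coding block (that is, it is predicted using a 6-parameter affine transformation model), the motion vector of the encoded block A2, B2, or B3 adjacent to the top-left vertex may be used as a candidate motion vector for the top-left vertex of the current block, the motion vector of the encoded block B1 or B0 adjacent to the top-right vertex may be used as a candidate motion vector for the top-right vertex, and the motion vector of the encoded block A0 or A1 adjacent to the bottom-left vertex may be used as a candidate motion vector for the bottom-left vertex. The candidate motion vectors of the top-left, top-right, and bottom-left vertices are combined into multiple 3-tuples; the motion vectors of the three encoded blocks in a 3-tuple may be used as candidate control point motion vectors of the current block. The 3-tuples are shown in formulas (13B) and (13C):
$$\{v_{A2}, v_{B1}, v_{A0}\},\ \{v_{A2}, v_{B0}, v_{A0}\},\ \{v_{B2}, v_{B1}, v_{A0}\},\ \{v_{B2}, v_{B0}, v_{A0}\},\ \{v_{B3}, v_{B1}, v_{A0}\},\ \{v_{B3}, v_{B0}, v_{A0}\} \qquad (13B)$$
$$\{v_{A2}, v_{B1}, v_{A1}\},\ \{v_{A2}, v_{B0}, v_{A1}\},\ \{v_{B2}, v_{B1}, v_{A1}\},\ \{v_{B2}, v_{B0}, v_{A1}\},\ \{v_{B3}, v_{B1}, v_{A1}\},\ \{v_{B3}, v_{B0}, v_{A1}\} \qquad (13C)$$
where v_A2, v_B1, v_B0, v_B2, v_B3, v_A0, and v_A1 denote the motion vectors of A2, B1, B0, B2, B3, A0, and A1, respectively.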
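The enumeration in (13A) to (13C) is a Cartesian product of the per-vertex neighbor lists. The Python sketch below reproduces the listed order; skipping unavailable neighbors (represented as `None`) is an assumption added for illustration, and `constructed_amvp_candidates` is a hypothetical name.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

MV = Tuple[int, int]

TOP_LEFT = ("A2", "B2", "B3")
TOP_RIGHT = ("B1", "B0")
BOTTOM_LEFT = ("A0", "A1")


def constructed_amvp_candidates(mvs: Dict[str, Optional[MV]],
                                six_param: bool) -> List[Tuple[MV, ...]]:
    """Combine neighbour MVs in the order of (13A), or (13B) then (13C).

    mvs maps a neighbour label to its MV, or None if it is unavailable
    (skipping unavailable neighbours is an assumption of this sketch).
    """
    def avail(names: Sequence[str]) -> List[MV]:
        return [mvs[n] for n in names if mvs.get(n) is not None]

    tl, tr = avail(TOP_LEFT), avail(TOP_RIGHT)
    if not six_param:
        return list(product(tl, tr))                 # 2-tuples, (13A)
    return [(a, b, c) for c in avail(BOTTOM_LEFT)    # A0 rows (13B), then A1 (13C)
            for a, b in product(tl, tr)]


mvs = {"A2": (1, 0), "B2": (2, 0), "B3": (3, 0), "B1": (4, 0),
       "B0": (5, 0), "A0": (6, 0), "A1": (7, 0)}
print(len(constructed_amvp_candidates(mvs, False)),
      len(constructed_amvp_candidates(mvs, True)))   # 6 and 12
```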
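It should be noted that FIG. 7 is only an example. Other methods of combining control point motion vectors are also applicable to the embodiments; details are not described herein.
It should be noted that methods that use other control points to represent the motion models of the neighboring and current coding blocks are also applicable to the embodiments; details are not described herein.
The following describes the constructed control point motion vector prediction method for the affine-transformation-model-based Merge mode in the embodiments of the present invention.
The method is described using FIG. 8 as an example, in which the motion information of encoded blocks neighboring the current coding block is used to determine the motion vectors of the control points of the current block. It should be noted that FIG. 8 is only an example.
As shown in FIG. 8, CPk (k = 1, 2, 3, 4) denotes the k-th control point. A0, A1, A2, B0, B1, B2, and B3 are spatially neighboring positions of the current block and are used to predict CP1, CP2, or CP3; T is a temporally neighboring position of the current block and is used to predict CP4. Assume that the coordinates of CP1, CP2, CP3, and CP4 are (0, 0), (W, 0), (0, H), and (W, H), respectively, where W and H are the width and height of the current block. The motion information of each control point of the current block is then obtained in the following order:
1. For CP1, the check order is B2->A2->B3. If B2 is available, the motion information of B2 is used; otherwise, A2 and then B3 are checked. If motion information is available at none of the three positions, the motion information of CP1 cannot be obtained.
2. For CP2, the check order is B0->B1. If B0 is available, CP2 uses the motion information of B0; otherwise, B1 is checked. If motion information is available at neither position, the motion information of CP2 cannot be obtained.
3. For CP3, the check order is A0->A1.
4. For CP4, the motion information of T is used.
Here, "X is available" means that the block at position X (X being A0, A1, A2, B0, B1, B2, B3, or T) has been encoded and uses an inter prediction mode; otherwise, position X is unavailable. It should be noted that other methods of obtaining the motion information of the control points are also applicable to the embodiments; details are not described herein.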
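The check order above amounts to "first available position wins" for each control point. A minimal Python sketch, assuming an unavailable position is represented by `None` or a missing key (the name `derive_cp_motion` and the MV-only representation are hypothetical simplifications):

```python
from typing import Dict, Optional, Tuple

MV = Tuple[int, int]

# Check order per control point, as listed in steps 1 to 4 above.
CHECK_ORDER = {
    "CP1": ("B2", "A2", "B3"),
    "CP2": ("B0", "B1"),
    "CP3": ("A0", "A1"),
    "CP4": ("T",),
}


def derive_cp_motion(neigh: Dict[str, Optional[MV]]) -> Dict[str, Optional[MV]]:
    """For each control point, take the first available position in order.

    neigh[X] is None (or absent) when position X is unavailable, i.e. the
    block there is not yet coded or is not inter-coded.
    """
    return {cp: next((neigh[x] for x in order if neigh.get(x) is not None), None)
            for cp, order in CHECK_ORDER.items()}


print(derive_cp_motion({"B2": None, "A2": (1, 1), "B3": (2, 2),
                        "A0": (3, 3), "T": (4, 4)}))
```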
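Then, the motion information of the control points of the current block is combined to obtain the constructed control point motion information.
If the current block uses a 4-parameter affine transformation model, the motion information of two control points of the current block is combined into a 2-tuple to construct the 4-parameter affine transformation model. The two control points may be combined as {CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3}, or {CP3, CP4}. For example, a 4-parameter affine transformation model constructed from the 2-tuple of control points CP1 and CP2 may be denoted Affine(CP1, CP2).
If the current block uses a 6-parameter affine transformation model, the motion information of three control points of the current block is combined into a 3-tuple to construct the 6-parameter affine transformation model. The three control points may be combined as {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, or {CP1, CP3, CP4}. For example, a 6-parameter affine transformation model constructed from the 3-tuple of control points CP1, CP2, and CP3 may be denoted Affine(CP1, CP2, CP3).
If the current block uses an 8-parameter bilinear model, the motion information of the four control points of the current block is combined into a 4-tuple to construct the 8-parameter bilinear model. The 8-parameter bilinear model constructed from the 4-tuple of control points CP1, CP2, CP3, and CP4 is denoted Bilinear(CP1, CP2, CP3, CP4).
Herein, for ease of description, a combination of the motion information of two control points (or two encoded blocks) is referred to as a 2-tuple, a combination of the motion information of three control points (or three encoded blocks) as a 3-tuple, and a combination of the motion information of four control points (or four encoded blocks) as a 4-tuple.
These models are traversed in a preset order. If the motion information of any control point corresponding to a combined model is unavailable, the model is considered unavailable. Otherwise, the reference frame index of the model is determined and the motion vectors of the control points are scaled; if the scaled motion information of all control points is identical, the model is invalid. If the motion information of all control points that constitute the model is available and the model is valid, the motion information of those control points is added to the motion information candidate list.
The motion vector of a control point is scaled as shown in the following formula (14):
$$MV_s = \frac{CurPoc - DesPoc}{CurPoc - SrcPoc} \times MV \qquad (14)$$
where CurPoc denotes the POC number of the current frame, DesPoc denotes the POC number of the reference frame of the current block, SrcPoc denotes the POC number of the reference frame of the control point, MVs denotes the scaled motion vector, and MV denotes the motion vector of the control point.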
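Formula (14) scales an MV by the ratio of two POC distances. The Python sketch below uses plain floating-point division for clarity; standardized codecs instead compute a clipped fixed-point scale factor with rounding, which this sketch deliberately omits. The name `scale_mv` is hypothetical.

```python
from typing import Tuple

MV = Tuple[float, float]


def scale_mv(mv: MV, cur_poc: int, des_poc: int, src_poc: int) -> MV:
    """Formula (14): MVs = (CurPoc - DesPoc) / (CurPoc - SrcPoc) * MV.

    des_poc: POC of the current block's reference frame.
    src_poc: POC of the control point's reference frame.
    """
    if cur_poc == src_poc:
        raise ValueError("source reference frame must differ from the current frame")
    factor = (cur_poc - des_poc) / (cur_poc - src_poc)
    return (factor * mv[0], factor * mv[1])


# The target reference is twice as far away, so the MV doubles.
print(scale_mv((3, -5), cur_poc=8, des_poc=4, src_poc=6))
```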
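It should be noted that combinations of different control points may also be converted into control points at the same positions.
For example, 4-parameter affine transformation models obtained from the combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3}, and {CP3, CP4} are converted so that they are represented by {CP1, CP2} or {CP1, CP2, CP3}. The conversion method is to substitute the motion vectors and coordinate information of the control points into formula (2) above to obtain the model parameters, and then substitute the coordinate information of {CP1, CP2} into formula (3) above to obtain their motion vectors.
More directly, the conversion may be performed according to the following formulas (15) to (23), where W denotes the width of the current block and H denotes its height. In formulas (15) to (23), (vx0, vy0) denotes the motion vector of CP1, (vx1, vy1) that of CP2, (vx2, vy2) that of CP3, and (vx3, vy3) that of CP4.
{CP1, CP2} can be converted into {CP1, CP2, CP3} by the following formula (15); that is, the motion vector of CP3 in {CP1, CP2, CP3} can be determined by formula (15):
$$\begin{cases} vx_2 = -\dfrac{vy_1 - vy_0}{W} H + vx_0 \\ vy_2 = \dfrac{vx_1 - vx_0}{W} H + vy_0 \end{cases} \qquad (15)$$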
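{CP1, CP3} can be converted into {CP1, CP2} or {CP1, CP2, CP3} by the following formula (16):
$$\begin{cases} vx_1 = \dfrac{vy_2 - vy_0}{H} W + vx_0 \\ vy_1 = -\dfrac{vx_2 - vx_0}{H} W + vy_0 \end{cases} \qquad (16)$$
{CP2, CP3} can be converted into {CP1, CP2} or {CP1, CP2, CP3} by the following formula (17):
$$\begin{cases} vx_0 = \dfrac{H^2 vx_1 + W^2 vx_2 + WH\,(vy_1 - vy_2)}{W^2 + H^2} \\ vy_0 = \dfrac{H^2 vy_1 + W^2 vy_2 - WH\,(vx_1 - vx_2)}{W^2 + H^2} \end{cases} \qquad (17)$$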
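{CP1, CP4} can be converted into {CP1, CP2} by the following formula (18), and into {CP1, CP2, CP3} by formulas (18) and (19):
$$\begin{cases} vx_1 = vx_0 + W\,\dfrac{W(vx_3 - vx_0) + H(vy_3 - vy_0)}{W^2 + H^2} \\ vy_1 = vy_0 + W\,\dfrac{W(vy_3 - vy_0) - H(vx_3 - vx_0)}{W^2 + H^2} \end{cases} \qquad (18)$$
$$\begin{cases} vx_2 = vx_0 - H\,\dfrac{W(vy_3 - vy_0) - H(vx_3 - vx_0)}{W^2 + H^2} \\ vy_2 = vy_0 + H\,\dfrac{W(vx_3 - vx_0) + H(vy_3 - vy_0)}{W^2 + H^2} \end{cases} \qquad (19)$$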
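{CP2, CP4} can be converted into {CP1, CP2} by the following formula (20), and into {CP1, CP2, CP3} by formulas (20) and (21):
$$\begin{cases} vx_0 = vx_1 - \dfrac{vy_3 - vy_1}{H} W \\ vy_0 = vy_1 + \dfrac{vx_3 - vx_1}{H} W \end{cases} \qquad (20)$$
$$\begin{cases} vx_2 = vx_0 + vx_3 - vx_1 \\ vy_2 = vy_0 + vy_3 - vy_1 \end{cases} \qquad (21)$$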
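{CP3, CP4} can be converted into {CP1, CP2} or {CP1, CP2, CP3} by the following formulas (22) and (23):
$$\begin{cases} vx_0 = vx_2 + \dfrac{vy_3 - vy_2}{W} H \\ vy_0 = vy_2 - \dfrac{vx_3 - vx_2}{W} H \end{cases} \qquad (22)$$
$$\begin{cases} vx_1 = vx_0 + vx_3 - vx_2 \\ vy_1 = vy_0 + vy_3 - vy_2 \end{cases} \qquad (23)$$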
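For example, 6-parameter affine transformation models of the combinations {CP1, CP2, CP4}, {CP2, CP3, CP4}, and {CP1, CP3, CP4} are converted so that they are represented by the control points {CP1, CP2, CP3}. The conversion method is to substitute the motion vectors and coordinate information of the control points into formula (4) above to obtain the model parameters, and then substitute the coordinate information of {CP1, CP2, CP3} into formula (5) above to obtain their motion vectors.
More directly, the conversion may be performed according to the following formulas (24) to (26), where W denotes the width of the current block and H denotes its height. In formulas (24) to (26), (vx0, vy0) denotes the motion vector of CP1, (vx1, vy1) that of CP2, (vx2, vy2) that of CP3, and (vx3, vy3) that of CP4.
{CP1, CP2, CP4} can be converted into {CP1, CP2, CP3} by formula (24):
$$\begin{cases} vx_2 = vx_3 + vx_0 - vx_1 \\ vy_2 = vy_3 + vy_0 - vy_1 \end{cases} \qquad (24)$$
{CP2, CP3, CP4} can be converted into {CP1, CP2, CP3} by formula (25):
$$\begin{cases} vx_0 = vx_1 + vx_2 - vx_3 \\ vy_0 = vy_1 + vy_2 - vy_3 \end{cases} \qquad (25)$$
{CP1, CP3, CP4} can be converted into {CP1, CP2, CP3} by formula (26):
$$\begin{cases} vx_1 = vx_3 + vx_0 - vx_2 \\ vy_1 = vy_3 + vy_0 - vy_2 \end{cases} \qquad (26)$$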
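The conversions above can also be carried out generically, following the "fit the model, then re-evaluate it" route described first. The Python sketch below fits the 4-parameter model (2) to any two control points by solving the 2×2 system in closed form, and fills in the missing corner of a 6-parameter triple using the parallelogram identity CP1 + CP4 = CP2 + CP3, which is what formulas (24) to (26) express. Function names are hypothetical and floating-point arithmetic is an assumption.

```python
from typing import Dict, Tuple

MV = Tuple[float, float]


def cp_coords(w: float, h: float) -> Dict[str, Tuple[float, float]]:
    """Positions of CP1..CP4 relative to the block's top-left pixel."""
    return {"CP1": (0, 0), "CP2": (w, 0), "CP3": (0, h), "CP4": (w, h)}


def to_cp123_from_pair(cps: Dict[str, MV], w: float, h: float) -> Dict[str, MV]:
    """Fit model (2) vx = a1 + a3*x + a4*y, vy = a2 - a4*x + a3*y to two CPs,
    then evaluate it at CP1, CP2 and CP3."""
    pos = cp_coords(w, h)
    (ni, (vxi, vyi)), (nj, (vxj, vyj)) = sorted(cps.items())
    (xi, yi), (xj, yj) = pos[ni], pos[nj]
    dx, dy, dvx, dvy = xj - xi, yj - yi, vxj - vxi, vyj - vyi
    d2 = dx * dx + dy * dy
    a3 = (dvx * dx + dvy * dy) / d2
    a4 = (dvx * dy - dvy * dx) / d2
    a1 = vxi - a3 * xi - a4 * yi
    a2 = vyi + a4 * xi - a3 * yi
    return {n: (a1 + a3 * x + a4 * y, a2 - a4 * x + a3 * y)
            for n, (x, y) in pos.items() if n != "CP4"}


def to_cp123_from_triple(cps: Dict[str, MV]) -> Dict[str, MV]:
    """Formulas (24)-(26): an affine map sends the block to a parallelogram."""
    def p_plus_q_minus_r(p: MV, q: MV, r: MV) -> MV:
        return (p[0] + q[0] - r[0], p[1] + q[1] - r[1])

    out = dict(cps)
    if "CP1" not in out:                                   # (25)
        out["CP1"] = p_plus_q_minus_r(out["CP2"], out["CP3"], out["CP4"])
    elif "CP2" not in out:                                 # (26)
        out["CP2"] = p_plus_q_minus_r(out["CP1"], out["CP4"], out["CP3"])
    elif "CP3" not in out:                                 # (24)
        out["CP3"] = p_plus_q_minus_r(out["CP1"], out["CP4"], out["CP2"])
    return {n: out[n] for n in ("CP1", "CP2", "CP3")}


print(to_cp123_from_pair({"CP3": (0, 4), "CP4": (4, 6)}, w=8, h=4))
```

For an 8×4 block whose model has CP1 = (1, 2), the pair {CP3, CP4} = {(0, 4), (4, 6)} recovers CP1 = (1, 2) and CP2 = (5, 4), matching formulas (22) and (23).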
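In a specific embodiment, after the currently constructed control point motion information is added to the candidate motion vector list, if the length of the candidate list is less than the maximum list length (for example, MaxAffineNumMrgCand), these combinations are traversed in a preset order to obtain valid combinations as candidate control point motion information. If the candidate motion vector list is empty, the candidate control point motion information is added to the list; otherwise, the motion information in the candidate motion vector list is traversed in turn to check whether motion information identical to the candidate control point motion information already exists in the list. If no identical motion information exists, the candidate control point motion information is added to the candidate motion vector list.
For example, one preset order is as follows: Affine(CP1, CP2, CP3) → Affine(CP1, CP2, CP4) → Affine(CP1, CP3, CP4) → Affine(CP2, CP3, CP4) → Affine(CP1, CP2) → Affine(CP1, CP3) → Affine(CP2, CP3) → Affine(CP1, CP4) → Affine(CP2, CP4) → Affine(CP3, CP4), 10 combinations in total.
If the control point motion information corresponding to a combination is unavailable, the combination is considered unavailable. If the combination is available, its reference frame index is determined (with two control points, the smaller reference frame index is selected as the reference frame index of the combination; with more than two control points, the reference frame index that occurs most often is selected, and if several reference frame indexes occur equally often, the smallest of them is selected), and the motion vectors of the control points are scaled. If the scaled motion information of all control points is identical, the combination is invalid.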
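The reference-index rule in parentheses above can be written as "most frequent index, ties broken by the smallest". A minimal Python sketch (hypothetical name `combination_ref_idx`):

```python
from collections import Counter
from typing import Sequence


def combination_ref_idx(ref_idxs: Sequence[int]) -> int:
    """Most frequent reference index among the control points; ties go to
    the smallest index."""
    counts = Counter(ref_idxs)
    best = max(counts.values())
    return min(r for r, c in counts.items() if c == best)


print(combination_ref_idx([2, 0]), combination_ref_idx([1, 1, 0]))
```

With two control points this single rule reproduces the "smaller index" case: two different indexes each occur once, so the tie-break picks the smaller one.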
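Optionally, the embodiments may further pad the candidate motion vector list. For example, if after the traversal above the length of the candidate motion vector list is less than the maximum list length (for example, MaxAffineNumMrgCand), the list may be padded until its length equals the maximum list length.
Padding may be performed by adding zero motion vectors, or by combining or weighted-averaging the motion information of candidates already in the list. It should be noted that other methods of padding the candidate motion vector list are also applicable to the embodiments; details are not described herein.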
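Zero-MV padding, the first option above, can be sketched as follows. Representing a candidate as a tuple of control-point MVs and not pruning the padded entries are assumptions of this sketch; `pad_with_zero_mvs` is a hypothetical name.

```python
from typing import List, Tuple

MV = Tuple[int, int]
Candidate = Tuple[MV, ...]   # one MV per control point (assumed layout)


def pad_with_zero_mvs(cands: List[Candidate], max_len: int,
                      num_cps: int) -> List[Candidate]:
    """Append zero-MV candidates until the list reaches max_len."""
    zero: Candidate = tuple((0, 0) for _ in range(num_cps))
    while len(cands) < max_len:
        cands.append(zero)       # padded entries are not pruned here
    return cands


print(pad_with_zero_mvs([((1, 1), (2, 2))], max_len=3, num_cps=2))
```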
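In existing solutions, for the inherited control point motion vector prediction method, the non-translational motion model used by a given picture sequence is fixed, and different blocks in a picture use affine transformation models with the same number of parameters; that is, the affine transformation model used by the affine coding block has the same number of parameters as the one used by the current block. Therefore, the number of control points of the affine coding block and their positions in the affine coding block are the same as the number of control points of the current block and their positions in the current block, respectively.
For example, if the affine coding block uses a 4-parameter affine transformation model, the current block also uses a 4-parameter affine transformation model, and the decoder side obtains the motion vector information of each sub-block in the current block according to the current block's 4-parameter affine transformation model, thereby reconstructing each sub-block.
For another example, if the affine coding block uses an 8-parameter bilinear model, the current block also uses an 8-parameter bilinear model, and the decoder side obtains the motion vector information of each sub-block in the current block according to the current block's 8-parameter bilinear model, thereby reconstructing each sub-block.
Practice shows that the affine motion of different blocks in a picture may differ (that is, the affine motion of the current block may differ from that of the affine coding block). Therefore, parsing (for example, building the candidate motion vector list) and reconstructing the current block based on an affine transformation model of the same order as that of the affine coding block, as existing solutions do, results in low coding efficiency and accuracy when predicting the current block, and in some scenarios still fails to meet user requirements.
To overcome the defects of existing solutions and improve the coding efficiency and accuracy of prediction during encoding and decoding, the embodiments of the present invention improve the inherited control point motion vector prediction method described above with two improvement solutions: a first improvement solution and a second improvement solution. The first improvement solution may also be called the first motion-model-based motion vector prediction method, and the second improvement solution may also be called the second motion-model-based motion vector prediction method. They are described separately below.
5) First motion-model-based motion vector prediction method. In this method, the affine transformation models used by different blocks of pictures in a picture sequence are not restricted; that is, different blocks may use different affine transformation models. During encoding and decoding of the current block, the affine transformation model used by the current block is determined first. It may be predefined, or selected from multiple affine transformation models according to the actual motion or actual requirements of the current block. If a neighboring block of the current block (called an affine coding block on the encoder side and an affine decoding block on the decoder side) uses a 2×N-parameter affine transformation model, the current block uses a 2×K-parameter affine transformation model, and N≠K, then the motion vectors (candidate motion vectors) of the K control points of the current block are obtained by interpolation from the 2×N-parameter affine transformation model used by the neighboring block.
The following uses A1 shown in FIG. 10 as an example to describe how the candidate motion vectors of the control points of the current block are determined. The process is described mainly from the perspective of the decoder side, in which case the neighboring block in which A1 is located is an affine decoding block. The encoder side can be implemented similarly, with the neighboring block of the current block being an affine coding block; details are not repeated herein.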
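For example, if the affine decoding block in which A1 is located uses a 6-parameter affine transformation model and the current block uses a 4-parameter affine transformation model, the motion vector (vx4, vy4) of the top-left vertex (x4, y4), the motion vector (vx5, vy5) of the top-right vertex (x5, y5), and the motion vector (vx6, vy6) of the bottom-left vertex (x6, y6) of the affine decoding block are obtained. Using the 6-parameter affine transformation model formed by the motion vectors of these three control points, interpolation is performed according to the following 6-parameter formulas (27) and (28), respectively, to obtain the motion vector (vx0, vy0) of the top-left vertex (x0, y0) and the motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block:
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4) \end{cases} \qquad (27)$$
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4) \end{cases} \qquad (28)$$
For another example, if the affine decoding block in which A1 is located uses a 4-parameter affine transformation model and the current block uses a 6-parameter affine transformation model, the motion vectors of the two control points of the affine decoding block are obtained: the motion vector (vx4, vy4) of the top-left control point (x4, y4) and the motion vector (vx5, vy5) of the top-right control point (x5, y5). Using the 4-parameter affine transformation model formed by these two control points, interpolation is performed according to the following 4-parameter formulas (29), (30), and (31), respectively, to obtain the motion vector (vx0, vy0) of the top-left vertex (x0, y0), the motion vector (vx1, vy1) of the top-right vertex (x1, y1), and the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block:
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4) \end{cases} \qquad (29)$$
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4) \end{cases} \qquad (30)$$
$$\begin{cases} vx_2 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_2 - y_4) \\ vy_2 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_2 - y_4) \end{cases} \qquad (31)$$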
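It should be noted that the foregoing examples are only used to explain the technical solutions of the present invention and are not limiting. Other combinations of affine transformation models for the current block and the neighboring block (for example, the current block uses a 4-parameter affine transformation model and the neighboring block uses an 8-parameter bilinear model, or the current block uses a 6-parameter affine transformation model and the neighboring block uses an 8-parameter bilinear model) can also be implemented with reference to the foregoing examples; details are not described herein.
It should further be noted that, because this solution does not restrict whether the current block and the neighboring block have the same number of model parameters, in some implementation scenarios the current block may also use the same number of model parameters as the neighboring block.
For example, if the affine decoding block in which A1 is located uses a 4-parameter affine transformation model and the current block also uses a 4-parameter affine transformation model, the motion vector (vx4, vy4) of the top-left vertex (x4, y4) and the motion vector (vx5, vy5) of the top-right vertex (x5, y5) of the affine decoding block are obtained. Using the 4-parameter affine transformation model formed by the motion vectors of these two control points, interpolation is performed according to the following 4-parameter formulas (32) and (33), respectively, to obtain the motion vector (vx0, vy0) of the top-left vertex (x0, y0) and the motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block:
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4) \end{cases} \qquad (32)$$
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \dfrac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4) \end{cases} \qquad (33)$$
For another example, if the affine decoding block in which A1 is located uses a 6-parameter affine transformation model and the current block also uses a 6-parameter affine transformation model, the motion vector (vx4, vy4) of the top-left vertex (x4, y4), the motion vector (vx5, vy5) of the top-right vertex (x5, y5), and the motion vector (vx6, vy6) of the bottom-left vertex (x6, y6) of the affine decoding block are obtained. Using the 6-parameter affine transformation model formed by the motion vectors of these three control points, interpolation is performed according to the following 6-parameter formulas (34), (35), and (36), respectively, to obtain the motion vector (vx0, vy0) of the top-left vertex (x0, y0), the motion vector (vx1, vy1) of the top-right vertex (x1, y1), and the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block:
$$\begin{cases} vx_0 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4) \\ vy_0 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4) \end{cases} \qquad (34)$$
$$\begin{cases} vx_1 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4) \\ vy_1 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4) \end{cases} \qquad (35)$$
$$\begin{cases} vx_2 = vx_4 + \dfrac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) + \dfrac{vx_6 - vx_4}{y_6 - y_4}(y_2 - y_4) \\ vy_2 = vy_4 + \dfrac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \dfrac{vy_6 - vy_4}{y_6 - y_4}(y_2 - y_4) \end{cases} \qquad (36)$$
It should be noted that the foregoing examples are only used to explain the technical solutions of the present invention and are not limiting. Other combinations of affine transformation models for the current block and the neighboring block (for example, both the current block and the neighboring block use an 8-parameter bilinear model) can also be implemented with reference to the foregoing examples; details are not described herein.
Implementing the first motion-model-based motion vector prediction method of the present invention makes it possible, in the parsing phase of the current block (for example, when the candidate motion vector list is constructed), to use the affine transformation model of a neighboring block to construct an affine transformation model for the current block itself, and the two models may differ. Because the current block's own affine transformation model better matches the actual motion and actual requirements of the current block, implementing this solution can improve the coding efficiency and accuracy of predicting the current block and meet user requirements.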
6)第二种基于运动模型的运动矢量预测方法。第二种基于运动模型的运动矢量预测方法是指,对图像序列中的图像的不同块,不限定不同块采用的仿射变换模型,不同块可以采用相同或者不同的仿射变换模型。也即是说,如果当前块的相邻块(在编码端又称为仿射编码块,在解码端又称为仿射解码块)采用的2乘N个参数仿射变换模型,而当前块采用的2乘K个参数仿射变换模型,那么,N可能等于K,N也可能不等于K。在解析阶段(如构造候选运动矢量列表的阶段),可根据上文“3)”中描述的继承的控制点运动矢量预测方法,或者上文“5)”中描述的第一种基于运动模型的运动矢量预测方法,获得当前块的控制点(如2个控制点、或3个控制点,或4个控制点,等等)。然后,在当前块的重建阶段,根据当前块的控制点,统一采用6参数仿射变换模型来获得当前块中的每个子块的运动矢量信息,从而实现对每个子块的重建。
下面同样以如图6所示出的A1为例描述当前块的控制点的候选运动矢量的确定过程(以解码端的角度进行描述),其他情况以此类推。
举例来说,若当前块在解析阶段采用4参数仿射变换模型,而相邻块可能采用4参数仿射变换模型,也可能采用其他参数仿射模型。那么,可根据上文“3)”中描述的继承的控制点运动矢量预测方法,或者上文“5)”中描述的第一种基于运动模型的运动矢量预测方法,获得当前块的2个控制点的运动矢量,例如,当前块的左上控制点(x4,y4)的运动矢量值(vx4,vy4)和右上控制点(x5,y5)的运动矢量值(vx5,vy5)。那么,在当前块的重建阶段,需根据当前块的2个控制点的运动矢量,构建6参数仿射变换模型。
比如,可根据当前块的左上控制点(x0,y0)的运动矢量(vx0,vy0)和右上控制点(x1,y1)的运动矢量(vx1,vy1),采用如下公式(40)获得第3个控制点的运动矢量值,所述第3个控制点的运动矢量值例如为当前块左下顶点(x2,y2)的运动矢量(vx2,vy2):
Figure PCTCN2018116984-appb-000050
其中,W表示所述当前块的宽度,H表示所述当前块的高度。
然后,利用当前块的左上控制点(x0,y0)的运动矢量(vx0,vy0、右上控制点(x1,y1)的运动矢量(vx1,vy1)和左下顶点(x2,y2)的运动矢量(vx2,vy2),获得当前块重建阶段的6参数仿射模型,该6参数仿射模型公式如下式(37)所示:
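$$vx = \frac{vx_1 - vx_0}{W}x + \frac{vx_2 - vx_0}{H}y + vx_0, \qquad vy = \frac{vy_1 - vy_0}{W}x + \frac{vy_2 - vy_0}{H}y + vy_0 \tag{37}$$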
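Substituting the coordinates (x(i,j), y(i,j)) of the center point of each sub-block (or motion compensation unit) into formula (37) gives the motion information at that center point. The coordinates are measured relative to the top-left vertex (or another reference point) of the current block. Each sub-block is then reconstructed.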
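The reconstruction-stage computation, which evaluates formula (37) at the sub-block centers, can be sketched as follows. The sketch assumes the block width and height are multiples of the sub-block size M×N. It uses floating point rather than the fixed-point arithmetic and MV rounding a real decoder would apply. The function name is hypothetical.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def subblock_mvs_6param(v0: MV, v1: MV, v2: MV, width: int, height: int,
                        m: int, n: int) -> List[List[MV]]:
    """Return field[j][i], the MV of the (i, j)-th M x N sub-block.

    Each sub-block is represented by its center point, measured relative to
    the block's top-left vertex. The MV at that point is evaluated with the
    6-parameter model built from v0 (top-left), v1 (top-right), and
    v2 (bottom-left).
    """
    vx0, vy0 = v0
    vx1, vy1 = v1
    vx2, vy2 = v2
    field = []
    for j in range(height // n):            # sub-block rows, top to bottom
        row = []
        for i in range(width // m):         # sub-block columns, left to right
            x = m * i + m / 2               # center x of the sub-block
            y = n * j + n / 2               # center y of the sub-block
            vx = (vx1 - vx0) / width * x + (vx2 - vx0) / height * y + vx0
            vy = (vy1 - vy0) / width * x + (vy2 - vy0) / height * y + vy0
            row.append((vx, vy))
        field.append(row)
    return field
```

With a 16×16 block, 4×4 sub-blocks, and a pure zoom (v0 = (0, 0), v1 = (16, 0), v2 = (0, 16)), each sub-block's MV equals its own center coordinates. For example, field[0][0] is (2.0, 2.0).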
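It should be noted that the above examples only explain the technical solutions of the present invention and do not limit them. The current block may use another affine transformation model in the parsing stage, for example a 6-parameter affine transformation model or an 8-parameter bilinear model. Such cases can be implemented in the same way as the above examples, and details are not repeated here.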
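The second motion-model-based motion vector prediction method of the present invention always uses a 6-parameter affine transformation model to predict the current block in its reconstruction stage. A motion model with more parameters describes the affine motion of the current block more precisely but costs more computation. The 6-parameter model constructed in the reconstruction stage can describe affine transformations of an image block such as translation, scaling, and rotation. It strikes a good balance between model complexity and modeling capability. This solution therefore improves the coding efficiency and accuracy of predicting the current block and meets user requirements.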
In some embodiments of the present invention, the first and second improved solutions described above may also be combined.
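For example, suppose the current block uses a 4-parameter affine transformation model in the parsing stage and the neighboring block uses a 6-parameter affine transformation model. The motion vectors of two control points of the current block can be obtained with the first motion-model-based motion vector prediction method described in "5)" above. In the reconstruction stage, the second motion-model-based motion vector prediction method described in "6)" above then unifies these two control-point motion vectors into a 6-parameter affine transformation model. Each sub-block of the current block is then reconstructed.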
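As another example, suppose the current block uses a 6-parameter affine transformation model in the parsing stage and the neighboring block uses a 4-parameter affine transformation model. The motion vectors of three control points of the current block can be obtained with the first motion-model-based motion vector prediction method described in "5)" above. In the reconstruction stage, formula (37) of the second motion-model-based motion vector prediction method described in "6)" above then combines these three control-point motion vectors into a 6-parameter affine transformation model. Each sub-block of the current block is then reconstructed.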
The first and second improved solutions may also be combined in other embodiments, which are not described in detail here.
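Based on the above description, the following further describes two modes in the embodiments of the present invention: the AMVP mode based on the affine transformation model (Affine AMVP mode) and the Merge mode based on the affine transformation model (Affine Merge mode).
The affine transformation model-based AMVP mode is described first.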
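For the affine transformation model-based AMVP mode, the candidate motion vector list (also called the control point motion vector predictor candidate list) can be built in either of two ways:
- In one embodiment, it is built with the first motion-model-based motion vector prediction method and/or the constructed control point motion vector prediction method.
- In another embodiment, it is built with the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method.

Each control point motion vector predictor in the list may contain 2, 3, or 4 candidate control point motion vectors. There are 2 when the current block uses a 4-parameter affine transformation model, 3 when it uses a 6-parameter affine transformation model, and 4 when it uses an 8-parameter bilinear model.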
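In a possible application scenario, the control point motion vector predictor candidate list may also be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.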
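At the encoder side, the encoder (for example, the video encoder 100 described above) processes the control point motion vector predictor candidate list as follows:
1. For each control point motion vector predictor in the list, it obtains the motion vector of each sub motion compensation unit in the current coding block with formula (3), (5), or (7) above.
2. It takes the pixel value at the corresponding position in the reference frame that each sub motion compensation unit's motion vector points to. This value is the unit's predicted value, and the step performs motion compensation with the affine transformation model.
3. It computes the average difference between the original and predicted values of the pixels in the current coding block.
4. The predictor with the smallest average becomes the optimal control point motion vector predictor. It serves as the motion vector predictors of the 2, 3, or 4 control points of the current coding block.
5. Using this predictor as the starting point, it runs a motion search within a certain search range to obtain the control point motion vectors (CPMV).
6. It computes the differences between the control point motion vectors and the predictor, the control point motion vector differences (CPMVD).
7. It encodes two items into the bitstream and transmits them to the decoder side: the index value giving the predictor's position in the candidate list, and the CPMVD.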
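The sketch below illustrates the encoder-side selection rule: choose the candidate whose affine prediction has the smallest mean difference from the original block. It assumes the prediction block for each candidate has already been produced by affine motion compensation, which is not shown. It uses absolute differences so that positive and negative errors do not cancel. The text above only says "average difference", so this choice is an assumption. The function names are hypothetical.

```python
from typing import List, Sequence

Block = Sequence[Sequence[int]]


def mean_abs_diff(original: Block, predicted: Block) -> float:
    """Average absolute difference between original and predicted samples."""
    total = 0
    count = 0
    for row_o, row_p in zip(original, predicted):
        for o, p in zip(row_o, row_p):
            total += abs(o - p)
            count += 1
    return total / count


def select_best_candidate(original: Block, predictions: List[Block]) -> int:
    """Index of the candidate whose affine prediction has the lowest mean difference."""
    return min(range(len(predictions)),
               key=lambda k: mean_abs_diff(original, predictions[k]))
```

`min` returns the first minimum, so ties go to the smaller index. A smaller index typically costs fewer bits to signal.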
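At the decoder side, the decoder (for example, the video decoder 200 described above) parses the index value and the control point motion vector differences (CPMVD) from the bitstream. It uses the index value to look up the control point motion vector predictor (CPMVP) in the control point motion vector predictor candidate list. It then adds the CPMVD to the CPMVP to obtain the control point motion vectors.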
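On the decoder side, AMVP reconstruction is an addition for each control point. The sketch below assumes integer MVs and uses hypothetical names.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]


def reconstruct_cpmvs(candidates: Sequence[Sequence[MV]], index: int,
                      cpmvds: Sequence[MV]) -> List[MV]:
    """CPMV = CPMVP + CPMVD for each control point (affine AMVP decoding)."""
    cpmvps = candidates[index]              # 2-, 3-, or 4-tuple of predictors
    if len(cpmvps) != len(cpmvds):
        raise ValueError("one MVD is expected per control point")
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(cpmvps, cpmvds)]
```

For example, with candidate 1 being ((4, 2), (6, 2)) and parsed MVDs ((1, -1), (0, 3)), the CPMVs are ((5, 1), (6, 5)).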
Next, the affine transformation model-based Merge mode is described.
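For the affine transformation model-based Merge mode, the candidate motion vector list (also called the control point motion vector merge candidate list) can be built in either of two ways:
- In one embodiment, it is built with the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method.
- In another embodiment, it is built with the first motion-model-based motion vector prediction method and/or the constructed control point motion vector prediction method.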
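In a possible application scenario, the control point motion vector merge candidate list may be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.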
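At the encoder side, the encoder (for example, the video encoder 100 described above) processes the merge candidate list as follows:
1. For each set of control point motion vectors in the list, it obtains the motion vector of each sub motion compensation unit in the current coding block with formula (3), (5), or (7). A sub motion compensation unit is a pixel or an M×N pixel block obtained by a specific partitioning method.
2. It takes the pixel value at the position in the reference frame that each sub motion compensation unit's motion vector points to. This value is the unit's predicted value, and the step performs affine motion compensation.
3. It computes the average difference between the original and predicted values of the pixels in the current coding block.
4. The control point motion vectors with the smallest average difference become the motion vectors of the 2, 3, or 4 control points of the current coding block.
5. It encodes the index value giving the position of these control point motion vectors in the candidate list into the bitstream and sends it to the decoder side.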
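At the decoder side, the decoder (for example, the video decoder 200 described above) parses the index value. It then uses the index value to determine the control point motion vectors (CPMV) from the control point motion vector merge candidate list.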
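In addition, the following terms apply in the embodiments of the present invention:
- "At least one" means one or more, and "a plurality of" means two or more.
- "And/or" describes an association between associated objects and covers three cases. For example, "A and/or B" means A alone, both A and B, or B alone, where A and B may be singular or plural.
- The character "/" generally indicates an "or" relationship between the associated objects.
- "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of several items. For example, "at least one of a, b, or c" may mean a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.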
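Referring to FIG. 9, an embodiment of the present invention provides a motion vector prediction method based on the design of the first motion-model-based motion vector prediction method. The method may be performed by the video decoder 200, specifically by its inter predictor 210. Given a video data stream with a plurality of video frames, the video decoder 200 may perform some or all of the following steps. These steps predict the motion information of each sub-block in the current decoding block (the current block for short) of the current video frame and perform motion compensation. As shown in FIG. 9, the method includes, but is not limited to, the following steps: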
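Step 601: Parse the bitstream to determine the inter prediction mode of the current decoding block.
Specifically, the video decoder 200 at the decoder side may parse syntax elements in the bitstream transmitted from the encoder side to obtain information indicating the inter prediction mode. It then determines the inter prediction mode of the current block from that information.
If the inter prediction mode of the current block is the affine transformation model-based AMVP mode, steps 602a to 606a are performed next.
If the inter prediction mode of the current block is the affine transformation model-based Merge mode, steps 602b to 605b are performed next.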
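Step 602a: Construct the candidate motion vector list for the affine transformation model-based AMVP mode.
In some specific embodiments of the present invention, the first motion-model-based motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block. These are added to the candidate motion vector list of the AMVP mode.
In other specific embodiments of the present invention, both the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block. Each method's candidates are added to the candidate motion vector list of the AMVP mode.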
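If the current block uses a 4-parameter affine transformation model, the candidate motion vector list of the AMVP mode may be a 2-tuple list containing one or more 2-tuples used to construct the 4-parameter affine transformation model.
If the current block uses a 6-parameter affine transformation model, the candidate motion vector list of the AMVP mode may be a 3-tuple list containing one or more 3-tuples used to construct the 6-parameter affine transformation model.
If the current block uses an 8-parameter bilinear model, the candidate motion vector list of the AMVP mode may be a 4-tuple list containing one or more 4-tuples used to construct the 8-parameter bilinear model.
In a possible application scenario, the candidate motion vector 2-tuple/3-tuple/4-tuple list may be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.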
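The first motion-model-based motion vector prediction method works as follows, taking FIG. 10 as an example:
1. The neighboring position blocks around the current block are traversed in the order A1→B1→B0→A0→B2 shown in FIG. 10.
2. For each neighboring position block, the affine decoding block containing it is found, for example the affine decoding block containing A1 in FIG. 10.
3. The affine transformation model of that affine decoding block is constructed from its control points.
4. That model is used to derive the candidate motion vectors of the control points of the current block, for example a candidate motion vector 2-tuple, 3-tuple, or 4-tuple.
5. These candidates are added to the candidate motion vector list of the AMVP mode.

Other search orders are also applicable to the embodiments of the present invention and are not described here.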
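There may be several neighboring position blocks, that is, several affine decoding blocks adjacent to the current block. In that case, one possible embodiment has both the encoder side and the decoder side build the AMVP candidate list in two passes:
1. First, use the affine decoding blocks whose model has the same number of parameters as the current block's model. Derive candidate motion vectors of the control points of the current block from them and add these to the list.
2. Then, use the affine decoding blocks whose model has a different number of parameters. Derive further candidate motion vectors from them and add these to the list.

Candidates derived from affine decoding blocks with the same number of model parameters therefore sit at the front of the list. This helps reduce the number of bits transmitted in the bitstream, because candidates near the front have smaller index values, which generally cost fewer bits to signal.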
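Taking FIG. 10 as an example, suppose the current decoding block uses a 4-parameter affine transformation model. After traversing the neighboring position blocks around the current block, the decoder finds that the affine decoding block containing B1 uses a 4-parameter model and the affine decoding block containing A1 uses a 6-parameter model. The decoder first uses the B1 block to derive the motion vectors of the two control points of the current block and adds them to the list. It then uses the A1 block to derive the motion vectors of the two control points and adds them to the list.
Now suppose instead that the current decoding block uses a 6-parameter affine transformation model. After the traversal, the decoder finds that the affine decoding block containing A1 uses a 6-parameter model and the affine decoding block containing B1 uses a 4-parameter model. The decoder first uses the A1 block to derive the motion vectors of the three control points of the current block and adds them to the list. It then uses the B1 block to derive the motion vectors of the three control points and adds them to the list.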
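The ordering rule in the two examples above is a stable partition of the scanned neighbors. In the Python sketch below, each neighbor is represented by a hypothetical (position label, parameter count) pair.

```python
from typing import List, Tuple

Neighbor = Tuple[str, int]   # (position label, number of model parameters)


def order_affine_neighbors(neighbors: List[Neighbor],
                           current_params: int) -> List[Neighbor]:
    """Put neighbors whose model has the same parameter count as the current block first.

    `sorted` is stable, so the scan order (e.g. A1, B1, B0, A0, B2) is kept
    within each group. False (same count) sorts before True (different count).
    """
    return sorted(neighbors, key=lambda nb: nb[1] != current_params)
```

With a 4-parameter current block, [("A1", 6), ("B1", 4)] becomes [("B1", 4), ("A1", 6)], which matches the first example.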
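The technical solutions of the present invention are not limited to the above examples. Other neighboring position blocks, motion models, and search orders are also applicable and are not described in detail here.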
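Step 602a does not restrict the affine transformation models used by different blocks. The number of parameters of the current block's affine transformation model may therefore differ from, or equal, that of the affine decoding block. The current block's model may be determined in one of three ways:
- It may be determined by parsing the bitstream. In this case, the bitstream contains information indicating the current block's affine transformation model.
- It may be preconfigured.
- It may be selected from several affine transformation models according to the actual motion or requirements of the current block.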
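Obtaining candidate motion vectors of the control points of the current block with the constructed control point motion vector prediction method is described in detail in "4)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the first motion-model-based motion vector prediction method is described in detail in "5)" above. For brevity, it is not repeated here.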
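In some embodiments that use the first motion-model-based motion vector prediction method, the decoder side may need flag information (flag) about the affine decoding block's affine transformation model in order to derive the candidate motion vectors of the control points of the current block. The flag is stored locally at the decoder side in advance. It indicates which affine transformation model the affine decoding block actually used to predict its own sub-blocks.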
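In one application scenario, the decoder side reads the affine decoding block's flag and compares the number of model parameters of the block's model with that of the current block's model. Only when the two differ (or, in another variant, when they are the same) does the decoder side derive the candidate motion vectors of the control points of the current block from the affine transformation model the affine decoding block actually used.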
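For instance, suppose the current block uses a 4-parameter affine transformation model. From the flag, the decoder side determines that the affine decoding block's model has a different number of parameters, for example a 6-parameter affine transformation model. The decoder side then obtains three motion vectors of the affine decoding block: (vx4, vy4) at the top-left vertex (x4, y4), (vx5, vy5) at the top-right vertex (x5, y5), and (vx6, vy6) at the bottom-left vertex (x6, y6). These three control points form a 6-parameter affine transformation model. With this model, formulas (27) and (28) give the candidate motion vectors of the top-left and top-right control points of the current block.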
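As another instance, suppose the current block uses a 4-parameter affine transformation model. From the flag, the decoder side determines that the affine decoding block's model has the same number of parameters, that is, it also uses a 4-parameter affine transformation model. The decoder side then obtains two motion vectors of the affine decoding block: (vx4, vy4) at the top-left control point (x4, y4) and (vx5, vy5) at the top-right control point (x5, y5). These two control points form a 4-parameter affine transformation model. With this model, formulas (32) and (33) give the candidate motion vectors of the top-left and top-right control points of the current block.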
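In other embodiments that use the first motion-model-based motion vector prediction method, the decoder side can derive the candidate motion vectors of the control points of the current block without the flag of the affine decoding block's affine transformation model.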
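In one application scenario, the decoder side first determines the affine transformation model used by the current block. It then obtains a specific number of control points of the affine decoding block; this number may be the same as or different from the number of control points of the current block. These control points form an affine transformation model, which the decoder side uses to derive the candidate motion vectors of the control points of the current block.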
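For instance, suppose the current block uses a 4-parameter affine transformation model. The decoder side does not check which affine transformation model the affine decoding block actually used; that model could have 4, 6, or 8 parameters. Instead, it directly obtains two motion vectors of the affine decoding block: (vx4, vy4) at the top-left control point (x4, y4) and (vx5, vy5) at the top-right control point (x5, y5). These two control points form a 4-parameter affine model. With this model, formulas (32) and (33) give the motion vectors of the top-left and top-right control points of the current block.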
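The technical solutions of the present invention are not limited to the above examples. Other control points, motion models, candidate positions, and search orders are also applicable and are not described in detail here.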
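Step 603a: Determine the optimal motion vector predictors of the control points from the index value.
Specifically, the index value into the candidate motion vector list is parsed from the bitstream. The index value then selects the optimal control point motion vector predictor from the candidate motion vector list constructed in step 602a.
For example, if the current block uses a 4-parameter affine motion model, the decoder parses the index value and uses it to select the optimal motion vector predictors of the two control points from the candidate motion vector 2-tuple list.
As another example, if the current block uses a 6-parameter affine motion model, the decoder parses the index value and uses it to select the optimal motion vector predictors of the three control points from the candidate motion vector 3-tuple list.
As a further example, if the current block uses an 8-parameter bilinear model, the decoder parses the index value and uses it to select the optimal motion vector predictors of the four control points from the candidate motion vector 4-tuple list.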
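Step 604a: Determine the actual motion vectors of the control points from the motion vector differences.
Specifically, the motion vector differences of the control points are parsed from the bitstream. Each control point's motion vector is then obtained from its motion vector difference and the optimal control point motion vector predictor determined in step 603a.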
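For example, suppose the current block uses a 4-parameter affine motion model. The motion vector differences of two control points of the current block are decoded from the bitstream, for example those of the top-left and top-right control points. For each control point, the motion vector difference is added to the motion vector predictor to give the actual motion vector. This yields the motion vectors of the top-left and top-right control points of the current block.
As another example, suppose the current block uses a 6-parameter affine motion model. The motion vector differences of three control points of the current block are decoded from the bitstream, for example those of the top-left, top-right, and bottom-left control points. For each control point, the motion vector difference is added to the motion vector predictor to give the actual motion vector. This yields the motion vectors of the top-left, top-right, and bottom-left control points of the current block.
The embodiments of the present invention may also use other affine motion models and other control point positions, which are not described here.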
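Step 605a: Obtain the motion vector of each sub-block of the current block from the affine transformation model used by the current block.
The current block has size P×Q and is divided into M×N sub-blocks. A sub-block can equivalently be treated as a motion compensation unit, and at least one of the M×N sub-block's length and width is smaller than the corresponding dimension of the current block. For each sub-block, the motion information of a pixel at a preset position in the motion compensation unit can represent the motion information of every pixel in that unit. For a motion compensation unit of size M×N, the preset pixel may be the center point (M/2, N/2), the top-left pixel (0, 0), the top-right pixel (M-1, 0), or a pixel at another position.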
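The following uses the center point of the motion compensation unit as an example, with reference to FIG. 11A and FIG. 11B.
FIG. 11A shows an example of a current block and its motion compensation units. Each small square is a 4×4 motion compensation unit, and the gray dot in each unit marks its center point. In FIG. 11A, V0, V1, and V2 are the motion vectors of the top-left, top-right, and bottom-left control points of the current block, respectively.
FIG. 11B shows another example of a current block and its motion compensation units. Each small square is an 8×8 motion compensation unit, and the gray dot in each unit marks its center point. In FIG. 11B, V0, V1, and V2 are the motion vectors of the top-left, top-right, and bottom-left control points of the current block, respectively.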
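The coordinates of the center point of a motion compensation unit, relative to the top-left vertex pixel of the current block, can be calculated with formula (38):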
$$x_{(i,j)} = M \times i + \frac{M}{2}, \qquad y_{(i,j)} = N \times j + \frac{N}{2} \tag{38}$$
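Here, i is the index of the motion compensation unit in the horizontal direction (counted from left to right), and j is its index in the vertical direction (counted from top to bottom). (x(i,j), y(i,j)) are the coordinates of the center point of the (i, j)-th motion compensation unit relative to the top-left control point pixel of the current affine decoding block.
If the current affine decoding block uses a 6-parameter affine motion model, substituting (x(i,j), y(i,j)) into the 6-parameter affine motion model formula (37) above gives the motion vector at the center point of each motion compensation unit. This motion vector, (vx(i,j), vy(i,j)), is used for all pixels in that unit: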
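$$vx_{(i,j)} = \frac{vx_1 - vx_0}{W}x_{(i,j)} + \frac{vx_2 - vx_0}{H}y_{(i,j)} + vx_0, \qquad vy_{(i,j)} = \frac{vy_1 - vy_0}{W}x_{(i,j)} + \frac{vy_2 - vy_0}{H}y_{(i,j)} + vy_0$$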
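If the current affine decoding block uses a 4-parameter affine motion model, substituting (x(i,j), y(i,j)) into the 4-parameter affine motion model formula (39) gives the motion vector at the center point of each motion compensation unit. This motion vector, (vx(i,j), vy(i,j)), is used for all pixels in that unit: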
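$$vx_{(i,j)} = \frac{vx_1 - vx_0}{W}x_{(i,j)} - \frac{vy_1 - vy_0}{W}y_{(i,j)} + vx_0, \qquad vy_{(i,j)} = \frac{vy_1 - vy_0}{W}x_{(i,j)} + \frac{vx_1 - vx_0}{W}y_{(i,j)} + vy_0 \tag{39}$$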
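For the 4-parameter branch, formulas (38) and (39) can be combined as in the sketch below, which uses floating point and hypothetical helper names. Evaluating `mv_4param` at (0, H) reproduces formula (40). This is a quick consistency check between the two formulas.

```python
from typing import Tuple

MV = Tuple[float, float]


def center(i: int, j: int, m: int, n: int) -> Tuple[float, float]:
    """Center of the (i, j)-th M x N motion compensation unit, per formula (38)."""
    return m * i + m / 2, n * j + n / 2


def mv_4param(v0: MV, v1: MV, width: int, x: float, y: float) -> MV:
    """Evaluate the 4-parameter model (39) at (x, y), relative to the block's top-left."""
    vx0, vy0 = v0
    vx1, vy1 = v1
    a = (vx1 - vx0) / width          # zoom/rotation coefficient
    b = (vy1 - vy0) / width          # rotation coefficient
    return a * x - b * y + vx0, b * x + a * y + vy0
```

Typical use: `mv_4param(v0, v1, W, *center(i, j, M, N))` gives the MV shared by all pixels of unit (i, j).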
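Step 606a: For each sub-block, perform motion compensation with the sub-block's motion vector to obtain its predicted pixel values.
Step 602b: Construct the candidate motion vector list for the affine transformation model-based Merge mode.
In some specific embodiments of the present invention, the first motion-model-based motion vector prediction method may also be used to obtain candidate motion vectors of the control points of the current block. These are added to the candidate motion vector list of the Merge mode.
In other specific embodiments of the present invention, both the first motion-model-based motion vector prediction method and the constructed control point motion vector prediction method may be used to obtain candidate motion vectors of the control points of the current block. Each method's candidates are added to the candidate motion vector list of the Merge mode.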
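The same applies to the candidate motion vector list of the Merge mode. If the current block uses a 4-parameter affine transformation model, the list may be a 2-tuple list containing one or more 2-tuples used to construct the 4-parameter affine transformation model.
If the current block uses a 6-parameter affine transformation model, the candidate motion vector list may be a 3-tuple list containing one or more 3-tuples used to construct the 6-parameter affine transformation model.
If the current block uses an 8-parameter bilinear model, the candidate motion vector list may be a 4-tuple list containing one or more 4-tuples used to construct the 8-parameter bilinear model.
In a possible application scenario, the candidate motion vector 2-tuple/3-tuple/4-tuple list may be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.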
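The first motion-model-based motion vector prediction method works the same way here, taking FIG. 10 as an example:
1. The neighboring position blocks around the current block are traversed in the order A1→B1→B0→A0→B2 shown in FIG. 10.
2. For each neighboring position block, the affine decoding block containing it is found.
3. The affine transformation model of that affine decoding block is constructed from its control points.
4. That model is used to derive the candidate motion vectors of the control points of the current block, for example a candidate motion vector 2-tuple, 3-tuple, or 4-tuple.
5. These candidates are added to the candidate motion vector list of the Merge mode.

Other search orders are also applicable to the embodiments of the present invention and are not described here.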
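Specifically, during this traversal, the candidate motion information of the control points is handled as follows:
- If the candidate motion vector list is empty, the candidate is added to it.
- Otherwise, the motion information already in the list is checked entry by entry for an identical entry. If the list contains no motion information identical to the candidate, the candidate is added.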
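Two pieces of candidate motion information are compared element by element: their forward and backward reference frames, and the horizontal and vertical components of each forward and backward motion vector. They are identical only if all of these elements are identical. If any element differs, they are different.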
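If the list then holds the maximum number of entries, construction of the candidate list is complete. Otherwise, the next neighboring position block is traversed.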
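The sketch below covers the traversal, duplicate check, and early stop described in the three preceding paragraphs. The `AffineMotionInfo` fields are an assumed, simplified representation of the motion information: a reference frame for each direction plus the control-point MVs. A frozen dataclass compares every field. This matches the rule that two candidates are identical only if all elements match.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

MV = Tuple[int, int]


@dataclass(frozen=True)
class AffineMotionInfo:
    """Hypothetical candidate motion information; equality compares every field."""
    ref_idx_fwd: int                 # forward reference frame (-1 if unused)
    ref_idx_bwd: int                 # backward reference frame (-1 if unused)
    cpmvs_fwd: Tuple[MV, ...]        # forward control-point MVs
    cpmvs_bwd: Tuple[MV, ...]        # backward control-point MVs


def build_merge_list(candidates: Iterable[Optional[AffineMotionInfo]],
                     max_len: int) -> List[AffineMotionInfo]:
    merge_list: List[AffineMotionInfo] = []
    for cand in candidates:              # one entry per neighbor, in scan order
        if cand is None:                 # neighbor unavailable or not affine-coded
            continue
        if cand not in merge_list:       # an empty list never contains cand
            merge_list.append(cand)
        if len(merge_list) == max_len:   # list full: stop the traversal
            break
    return merge_list
```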
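Obtaining candidate motion vectors of the control points of the current block with the constructed control point motion vector prediction method is described in detail in "4)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the first motion-model-based motion vector prediction method is described in detail in "5)" above. For brevity, it is not repeated here.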
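In some embodiments that use the first motion-model-based motion vector prediction method in the affine transformation model-based Merge mode, blocks in a picture need not be distinguished by model type (4-parameter, 6-parameter, 8-parameter, and so on). That is, different blocks may use affine transformation models with the same number of parameters.
For example, suppose every block in a picture uses a 6-parameter affine transformation model. Taking A1 in FIG. 10 as an example, three motion vectors of the affine decoding block containing A1 are obtained: (vx4, vy4) at the top-left control point (x4, y4), (vx5, vy5) at the top-right control point (x5, y5), and (vx6, vy6) at the bottom-left vertex (x6, y6). These three control points of the neighboring affine decoding block form a 6-parameter affine model. With this model, formulas (34), (35), and (36) give the motion vectors of the top-left, top-right, and bottom-left control points of the current block.
The technical solutions of the present invention are not limited to the above examples. Other control points, motion models, candidate positions, and search orders are also applicable and are not described in detail here.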
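Step 603b: Determine the motion vectors of the control points from the index value.
Specifically, the index value into the candidate motion vector list is parsed from the bitstream. The index value then selects the actual motion vectors of the control points from the candidate motion vector list constructed in step 602b.
For example, if the current block uses a 4-parameter affine motion model, the decoder parses the index value and uses it to select the motion vectors of the two control points from the candidate motion vector 2-tuple list.
As another example, if the current block uses a 6-parameter affine motion model, the decoder parses the index value and uses it to select the motion vectors of the three control points from the candidate motion vector 3-tuple list.
As a further example, if the current block uses an 8-parameter bilinear model, the decoder parses the index value and uses it to select the motion vectors of the four control points from the candidate motion vector 4-tuple list.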
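Step 604b: Obtain the motion vector of each sub-block from the affine transformation model used by the current block. This step is implemented as described in step 605a above. For brevity, details are not repeated here.
Step 605b: For each sub-block, perform motion compensation with its motion vector to obtain its predicted pixel values.
In this embodiment of the present invention, the decoder side uses the first motion-model-based motion vector prediction method to predict the current block. In the parsing stage of the current block, for example when the candidate motion vector list of the AMVP mode or the Merge mode is constructed, it uses the affine transformation model of a neighboring block to construct an affine transformation model for the current block itself. The two models may be the same or different. The current block's own model better matches the current block's actual motion and requirements. This solution therefore improves the coding efficiency and accuracy of predicting the current block and meets user requirements.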
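Referring to FIG. 12, an embodiment of the present invention provides another motion vector prediction method, based on the design of the second motion-model-based motion vector prediction method. The method may be performed by the video decoder 200, specifically by its inter predictor 210. Given a video data stream with a plurality of video frames, the video decoder 200 may perform some or all of the following steps. These steps predict the motion information of each sub-block in the current decoding block (the current block for short) of the current video frame and perform motion compensation. As shown in FIG. 12, the method includes, but is not limited to, the following steps: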
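Step 701: Parse the bitstream to determine the inter prediction mode of the current decoding block.
Specifically, the video decoder 200 at the decoder side may parse syntax elements in the bitstream transmitted from the encoder side to obtain information indicating the inter prediction mode. It then determines the inter prediction mode of the current block from that information.
If the inter prediction mode of the current block is the affine transformation model-based AMVP mode, steps 702a to 706a are performed next.
If the inter prediction mode of the current block is the affine transformation model-based Merge mode, steps 702b to 706b are performed next.
Step 702a: Construct the candidate motion vector list for the affine transformation model-based AMVP mode.
This embodiment of the present invention does not restrict the affine transformation models used by different blocks of the pictures in a picture sequence. Different blocks may use the same or different affine transformation models.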
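The candidate motion vectors of the control points of the current block can come from three methods: the inherited control point motion vector prediction method, the first motion-model-based motion vector prediction method, and the constructed control point motion vector prediction method. Depending on the embodiment, the candidates added to the candidate motion vector list of the AMVP mode are obtained with:
- the inherited method alone;
- the first motion-model-based method alone;
- the constructed method alone;
- any two of the three methods, each adding its own candidates; or
- all three methods, each adding its own candidates.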
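Obtaining candidate motion vectors of the control points of the current block with the inherited control point motion vector prediction method is described in detail in "3)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the constructed control point motion vector prediction method is described in detail in "4)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the first motion-model-based motion vector prediction method is described in detail in "5)" above and in step 602a of the FIG. 9 embodiment. For brevity, it is not repeated here.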
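For example, if the current block uses a 4-parameter affine transformation model, the candidate motion vector list of the AMVP mode may be a 2-tuple list containing one or more 2-tuples used to construct the 4-parameter affine transformation model.
If the current block uses a 6-parameter affine transformation model, the candidate motion vector list of the AMVP mode may be a 3-tuple list containing one or more 3-tuples used to construct the 6-parameter affine transformation model.
If the current block uses an 8-parameter bilinear model, the candidate motion vector list of the AMVP mode may be a 4-tuple list containing one or more 4-tuples used to construct the 8-parameter bilinear model.
In a possible application scenario, the candidate motion vector 2-tuple/3-tuple/4-tuple list may also be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.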
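Step 703a: Determine the optimal motion vector predictors of the control points from the index value. For details, see step 603a of the FIG. 9 embodiment; they are not repeated here.
Step 704a: Determine the motion vectors of the three control points of the current block from the motion vector differences.
Specifically, the motion vector differences of the control points are parsed from the bitstream. The motion vectors of the control points are obtained from these differences and the optimal control point motion vector predictors determined in step 703a. The motion vectors of the three control points of the current block are then determined from the control-point motion vectors obtained.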
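For example, suppose the candidate motion vector list built by the decoder side in step 702a is a 2-tuple list. The process is:
1. In step 703a, the decoder parses the index value and uses it to select the motion vector predictors (MVP) of two control points (a 2-tuple) from the list.
2. In step 704a, it decodes the motion vector differences (MVD) of the two control points from the bitstream.
3. It computes the motion vectors (MV) of the two control points from their MVPs and MVDs. These are, for example, the motion vector (vx0, vy0) of the top-left control point (x0, y0) and the motion vector (vx1, vy1) of the top-right control point (x1, y1) of the current block.
4. These two control-point motion vectors form a 4-parameter affine transformation model. Formula (40) of this model gives the motion vector of a third control point, for example the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block.

This determines the motion vectors of the three control points of the current block: the top-left, top-right, and bottom-left vertices.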
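As another example, suppose the candidate motion vector list built by the decoder side in step 702a is a 3-tuple list. The process is:
1. In step 703a, the decoder parses the index value and uses it to select the motion vector predictors (MVP) of three control points (a 3-tuple) from the list.
2. In step 704a, it decodes the motion vector differences (MVD) of the three control points from the bitstream.
3. It computes the motion vectors (MV) of the three control points from their MVPs and MVDs. These are, for example:
   - the motion vector (vx0, vy0) of the top-left control point (x0, y0) of the current block;
   - the motion vector (vx1, vy1) of the top-right control point (x1, y1);
   - the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2).

This determines the motion vectors of the three control points of the current block: the top-left, top-right, and bottom-left vertices.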
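As a further example, suppose the candidate motion vector list built by the decoder side in step 702a is a 4-tuple list. The process is:
1. In step 703a, the decoder parses the index value and uses it to select the motion vector predictors (MVP) of four control points (a 4-tuple) from the list.
2. In step 704a, it decodes the motion vector differences (MVD) of the four control points from the bitstream.
3. It computes the motion vectors (MV) of the four control points from their MVPs and MVDs. These are, for example:
   - the motion vector (vx0, vy0) of the top-left control point (x0, y0) of the current block;
   - the motion vector (vx1, vy1) of the top-right control point (x1, y1);
   - the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2);
   - the motion vector (vx3, vy3) of the bottom-right vertex (x3, y3).
4. The decoder side may then use only the motion vectors of the three control points at the top-left, top-right, and bottom-left vertices of the current block.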
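Whatever tuple size the list holds, step 704a ends with the three control points that the 6-parameter model needs. The sketch below shows that normalization. It uses formula (40) for the 2-tuple case and drops the bottom-right MV in the 4-tuple case. The function name is hypothetical, and floating-point arithmetic is assumed.

```python
from typing import List, Sequence, Tuple

MV = Tuple[float, float]


def to_three_control_points(cpmvs: Sequence[MV], width: float,
                            height: float) -> List[MV]:
    """Return the top-left, top-right, and bottom-left MVs of the current block."""
    if len(cpmvs) == 2:
        (vx0, vy0), (vx1, vy1) = cpmvs
        # Derive the bottom-left MV from the 4-parameter model, formula (40).
        vx2 = -(vy1 - vy0) * height / width + vx0
        vy2 = (vx1 - vx0) * height / width + vy0
        return [cpmvs[0], cpmvs[1], (vx2, vy2)]
    if len(cpmvs) in (3, 4):
        # Three CPs are used as-is; with four CPs the bottom-right one is dropped.
        return list(cpmvs[:3])
    raise ValueError("expected 2, 3 or 4 control-point motion vectors")
```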
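The technical solutions of the present invention are not limited to the above examples. Other control points and motion models are also applicable and are not described in detail here.
Step 705a: Obtain the motion vector of each sub-block with a 6-parameter affine transformation model built from the three control points of the current block.
Specifically, step 704a has already determined the motion vectors of the three control points of the current block. These three motion vectors form a 6-parameter affine transformation model, which is then used to obtain the motion vector of each sub-block.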
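For example, the three control-point motion vectors are (vx0, vy0) at the top-left control point (x0, y0), (vx1, vy1) at the top-right control point (x1, y1), and (vx2, vy2) at the bottom-left vertex (x2, y2) of the current block. Together they give the 6-parameter affine model for the reconstruction stage of the current block, shown in formula (37) above.
Each sub-block (or motion compensation unit) of the current block has a pixel at a preset position. Substituting that pixel's coordinates (x(i,j), y(i,j)) into formula (37) gives the sub-block's motion vector. The coordinates are measured relative to the top-left vertex (or another reference point) of the current block. The preset pixel may be the center point of the sub-block (or motion compensation unit), whose coordinates relative to the top-left vertex pixel of the current block can be calculated with formula (38). For details, see also the descriptions of the FIG. 11A and FIG. 11B embodiments; they are not repeated here.
Step 706a: For each sub-block, perform motion compensation with its motion vector to obtain its predicted pixel values.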
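Step 702b: Construct the candidate motion vector list for the affine transformation-based Merge mode.
Likewise, this embodiment of the present invention does not restrict the affine transformation models used by different blocks of the pictures in a picture sequence. Different blocks may use the same or different affine transformation models.
As in step 702a, the candidate motion vectors of the control points of the current block can come from three methods: the inherited control point motion vector prediction method, the first motion-model-based motion vector prediction method, and the constructed control point motion vector prediction method. Depending on the embodiment, the candidates added to the candidate motion vector list of the Merge mode are obtained with:
- the inherited method alone;
- the first motion-model-based method alone;
- the constructed method alone;
- any two of the three methods, each adding its own candidates; or
- all three methods, each adding its own candidates.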
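Obtaining candidate motion vectors of the control points of the current block with the inherited control point motion vector prediction method is described in detail in "3)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the constructed control point motion vector prediction method is described in detail in "4)" above. For brevity, it is not repeated here.
Obtaining candidate motion vectors of the control points of the current block with the first motion-model-based motion vector prediction method is described in detail in "5)" above and in step 602b of the FIG. 9 embodiment. For brevity, it is not repeated here.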
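In some other embodiments of the affine transformation model-based Merge mode, the candidate motion vector list built by the decoder side may be a candidate motion vector 2-tuple/3-tuple/4-tuple list. The list may also be pruned and sorted according to a specific rule, and then truncated or padded to a specific number of entries.
In some other embodiments of the affine transformation model-based Merge mode, blocks in a picture need not be distinguished by model type (4-parameter, 6-parameter, 8-parameter, and so on). That is, different blocks may use affine transformation models with the same number of parameters.
Step 703b: Obtain the motion vectors of the control points from the index value. Specifically, the index value into the candidate motion vector list is parsed from the bitstream. The index value then selects the actual motion vectors of the control points from the candidate motion vector list constructed in step 702b. For the implementation of this step, see also step 603b of the FIG. 9 embodiment; details are not repeated here.
Step 704b: Determine the motion vectors of the three control points of the current block from the control-point motion vectors obtained.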
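For example, suppose the decoder side obtains the motion vectors of two control points (a 2-tuple) in step 703b. These are, for example, (vx0, vy0) at the top-left control point (x0, y0) and (vx1, vy1) at the top-right control point (x1, y1) of the current block. These two motion vectors form a 4-parameter affine transformation model. Formula (31) of this model gives the motion vector of a third control point, for example the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block. This determines the motion vectors of the three control points of the current block: the top-left, top-right, and bottom-left vertices.
As another example, suppose the decoder side obtains the motion vectors of three control points (a 3-tuple) in step 703b. These are, for example, (vx0, vy0) at the top-left control point (x0, y0), (vx1, vy1) at the top-right control point (x1, y1), and (vx2, vy2) at the bottom-left vertex (x2, y2) of the current block. This already gives the motion vectors of the three control points of the current block: the top-left, top-right, and bottom-left vertices.
As a further example, suppose the decoder side obtains the motion vectors of four control points (a 4-tuple) in step 703b. These are, for example:
- the motion vector (vx0, vy0) of the top-left control point (x0, y0) of the current block;
- the motion vector (vx1, vy1) of the top-right control point (x1, y1);
- the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2);
- the motion vector (vx3, vy3) of the bottom-right vertex (x3, y3).

The decoder side may then use only the motion vectors of the three control points at the top-left, top-right, and bottom-left vertices of the current block.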
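The technical solutions of the present invention are not limited to the above examples. Other control points and motion models are also applicable and are not described in detail here.
Step 705b: Obtain the motion vector of each sub-block with a 6-parameter affine transformation model built from the three control points of the current block. For the implementation of this step, see step 705a above; details are not repeated here.
Step 706b: For each sub-block, perform motion compensation with its motion vector to obtain its predicted pixel values.
In this embodiment of the present invention, the decoder side uses the second motion-model-based motion vector prediction method to predict the current block. In the parsing stage, the current block's affine transformation model may have a different number of parameters from the neighboring block's model, or the same number. In the reconstruction stage of the current block, which includes predicting the motion vectors of the sub-blocks, a 6-parameter affine transformation model is always used. This model can describe affine transformations of an image block such as translation, scaling, and rotation, and strikes a good balance between model complexity and modeling capability. This solution therefore improves the coding efficiency and accuracy of predicting the current block and meets user requirements.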
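Referring to FIG. 13, FIG. 13 is a flowchart of another motion vector prediction method according to an embodiment of the present invention. The method may be performed by the video encoder 100, specifically by its inter predictor 110. Given a video data stream with a plurality of video frames, the video encoder 100 may perform some or all of the following steps to encode the current coding block (the current block for short) of the current video frame. As shown in FIG. 13, the method includes, but is not limited to, the following steps:
801: Determine the inter prediction mode of the current coding block.
In one specific implementation, several inter prediction modes may be preset for inter prediction at the encoder side, for example the affine motion model-based AMVP mode and the affine motion model-based Merge mode described above. The encoder side tries each of these inter prediction modes to find the one that is optimal for predicting the current block.
In another specific implementation, only one inter prediction mode may be preset at the encoder side. In this case, the encoder side directly uses the default inter prediction mode, which is either the affine motion model-based AMVP mode or the affine motion model-based Merge mode.
In this embodiment of the present invention, if the inter prediction mode of the current block is the affine motion model-based AMVP mode, steps 802a to 804a are performed next.
In this embodiment of the present invention, if the inter prediction mode of the current block is the affine motion model-based Merge mode, steps 802b to 804b are performed next.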
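802a: Construct the candidate motion vector list for the affine transformation-based AMVP mode.
In some embodiments, the encoder side follows the design of the first motion-model-based motion vector prediction method. This step is then implemented as described in step 602a of the FIG. 9 embodiment; details are not repeated here.
In other embodiments, the encoder side follows the design of the second motion-model-based motion vector prediction method. This step is then implemented as described in step 702a of the FIG. 12 embodiment; details are not repeated here.
803a: Determine the optimal motion vector predictors of the control points based on the rate-distortion cost.
In some examples, the encoder side proceeds as follows:
1. For each control point motion vector predictor in the candidate motion vector list (for example, a candidate motion vector 2-tuple, 3-tuple, or 4-tuple), it obtains the motion vector of each sub motion compensation unit in the current block with formula (3), (5), or (7).
2. It takes the pixel value at the corresponding position in the reference frame that each sub motion compensation unit's motion vector points to. This value is the unit's predicted value, and the step performs motion compensation with the affine motion model.
3. It computes the average difference between the original and predicted values of the pixels in the current coding block.
4. The predictor with the smallest average becomes the optimal control point motion vector predictor. It serves as the motion vector predictors of the 2, 3, or 4 control points of the current block.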
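804a: Encode the index value, the motion vector differences of the control points, and the inter prediction mode indication into the bitstream.
In some examples, the encoder side proceeds as follows:
1. Using the optimal control point motion vector predictor as the starting point, it runs a motion search within a certain search range to obtain the control point motion vectors (CPMV).
2. It computes the differences between the control point motion vectors and the predictor, the control point motion vector differences (CPMVD).
3. It encodes the index value giving the predictor's position in the candidate motion vector list, and the CPMVD, into the bitstream.
4. It may also encode the inter prediction mode indication into the bitstream for transmission to the decoder side.

In another possible example, the encoder side may encode into the bitstream information indicating the affine transformation model (number of parameters) used by the current block. This information is transmitted to the decoder side, which uses it to determine the affine transformation model of the current block.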
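802b: Construct the candidate motion vector list for the affine transformation-based Merge mode.
In some embodiments, the encoder side follows the design of the first motion-model-based motion vector prediction method. This step is then implemented as described in step 602b of the FIG. 9 embodiment; details are not repeated here.
In other embodiments, the encoder side follows the design of the second motion-model-based motion vector prediction method. This step is then implemented as described in step 702b of the FIG. 12 embodiment; details are not repeated here.
803b: Determine the optimal control point motion vectors.
In one example, the encoder side proceeds as follows:
1. For each set of control point motion vectors in the candidate motion vector list (for example, a candidate motion vector 2-tuple, 3-tuple, or 4-tuple), it obtains the motion vector of each sub motion compensation unit in the current coding block with formula (3), (5), or (7).
2. It takes the pixel value at the position in the reference frame that each sub motion compensation unit's motion vector points to. This value is the unit's predicted value, and the step performs affine motion compensation.
3. It computes the average difference between the original and predicted values of the pixels in the current coding block.
4. The control point motion vectors with the smallest average difference are the optimal control point motion vectors. They serve as the motion vectors of the 2, 3, or 4 control points of the current coding block.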
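804b: Encode the index value and the inter prediction mode indication into the bitstream.
In one example, the encoder side encodes two items into the bitstream for transmission to the decoder side: the index value giving the position of the control point motion vectors in the candidate list, and the inter prediction mode indication.
In another possible example, the encoder side may encode into the bitstream information indicating the affine transformation model (number of parameters) used by the current block. This information is transmitted to the decoder side, which uses it to determine the affine transformation model of the current block.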
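The above embodiments describe only how the encoder side encodes and sends the bitstream. Based on the foregoing description, a person skilled in the art understands that the encoder side may also apply other methods described in the embodiments of the present invention at other stages. For example, when the encoder side predicts the current block, it may reconstruct the current block using the methods described above for the decoder side (for example, the FIG. 9 or FIG. 12 embodiment). Details are not repeated here.
In one embodiment of the present invention, the encoder side encodes the current block following the design of the first motion-model-based motion vector prediction method. In the parsing stage of the current block, for example when the candidate motion vector list of the AMVP mode or the Merge mode is constructed, it uses the affine transformation model of a neighboring block to construct an affine transformation model for the current block itself. The two models may be the same or different. The current block's own model better matches the current block's actual motion and requirements. This solution therefore improves the efficiency and accuracy of encoding the current block and meets user requirements.
In another embodiment of the present invention, the encoder side encodes the current block following the design of the second motion-model-based motion vector prediction method. This allows the decoder side to always use a 6-parameter affine transformation model to predict image blocks in their reconstruction stage. This solution therefore improves the coding efficiency and accuracy of predicting the current block and meets user requirements.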
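A person skilled in the art can appreciate that the functions described with reference to the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, those functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer-readable medium may include:
- a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium; or
- a communication medium, including any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol).

In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or a carrier wave. The data storage medium may be any available medium that one or more computers or one or more processors can access to retrieve instructions, code, and/or data structures for implementing the techniques described in the embodiments of the present invention. A computer program product may include a computer-readable medium.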
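By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can store desired program code in the form of instructions or data structures and can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, instructions may be transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave. In that case, these cables, DSL, and wireless technologies are included in the definition of medium. However, computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media; they are directed to non-transitory tangible storage media. As used herein, disks and discs include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), and Blu-ray discs. Disks usually reproduce data magnetically, whereas discs reproduce data optically with lasers. Combinations of the above are also included within the scope of computer-readable media.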
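The instructions may be executed by one or more processors, such as one or more digital signal processors (DSP), general-purpose microprocessors, application-specific integrated circuits (ASIC), field-programmable logic arrays (FPGA), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. In some aspects, the functions of the illustrative logical blocks, modules, and steps described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. The techniques may also be fully implemented in one or more circuits or logic elements.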
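The techniques of the embodiments of the present invention may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). The embodiments describe various components, modules, or units to emphasize the functional aspects of apparatuses configured to perform the disclosed techniques. These components do not necessarily have to be implemented by different hardware units. Rather, as described above, the units may be combined in a codec hardware unit together with suitable software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.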
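The foregoing descriptions are merely exemplary specific implementations of the embodiments of the present invention, and the protection scope of the embodiments of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the embodiments of the present invention shall fall within their protection scope. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.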

Claims (16)

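  1. A motion vector prediction method, wherein the method comprises:
    parsing a bitstream to obtain an index value of a candidate motion vector list;
    constructing the candidate motion vector list, wherein the candidate motion vector list comprises candidate motion vectors of K control points of a current block; the candidate motion vectors of the K control points are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block, and the 2×N-parameter affine transformation model is obtained based on motion vectors of N control points of the neighboring block, wherein N is an integer greater than or equal to 2 and less than or equal to 4, K is an integer greater than or equal to 2 and less than or equal to 4, and N is not equal to K; and the neighboring block is a decoded image block spatially adjacent to the current block, and the current block comprises a plurality of sub-blocks;
    determining, from the candidate motion vector list based on the index value, target candidate motion vectors of the K control points; and
    obtaining a predicted motion vector of each sub-block in the current block based on the target candidate motion vectors of the K control points.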
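  2. The method according to claim 1, wherein N is equal to 2 and K is equal to 3, and correspondingly, the candidate motion vectors of the three control points of the current block are obtained based on a 4-parameter affine transformation model used by the neighboring block of the current block.
  3. The method according to claim 2, wherein the candidate motion vectors of the three control points of the current block comprise a motion vector at a top-left pixel position in the current block, a motion vector at a top-right pixel position in the current block, and a motion vector at a bottom-left pixel position in the current block; and
    that the candidate motion vectors of the three control points of the current block are obtained based on the 4-parameter affine transformation model used by the neighboring block of the current block comprises calculating the candidate motion vectors of the three control points of the current block according to the following formulas: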
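    $$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4)$$
    $$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4)$$
    $$vx_2 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_2 - y_4), \qquad vy_2 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_2 - y_4)$$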
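    where vx0 and vy0 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the current block; vx1 and vy1 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the current block; vx2 and vy2 are the horizontal and vertical components of the motion vector corresponding to the bottom-left pixel position in the current block; vx4 and vy4 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the neighboring block; vx5 and vy5 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the neighboring block; x0 and y0 are the horizontal and vertical coordinates of the top-left pixel position in the current block; x1 and y1 are the horizontal and vertical coordinates of the top-right pixel position in the current block; x2 and y2 are the horizontal and vertical coordinates of the bottom-left pixel position in the current block; x4 and y4 are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block; and x5 is the horizontal coordinate of the top-right pixel position in the neighboring block.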
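  4. The method according to claim 1, wherein N is equal to 3 and K is equal to 2, and correspondingly, the candidate motion vectors of the two control points of the current block are obtained based on a 6-parameter affine transformation model used by the neighboring block of the current block.
  5. The method according to claim 4, wherein the candidate motion vectors of the two control points of the current block comprise a motion vector at a top-left pixel position in the current block and a motion vector at a top-right pixel position in the current block; and that the candidate motion vectors of the two control points of the current block are obtained based on the 6-parameter affine transformation model used by the neighboring block of the current block comprises calculating the candidate motion vectors of the two control points of the current block according to the following formulas: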
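    $$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4)$$
    $$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4)$$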
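    where vx0 and vy0 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the current block; vx1 and vy1 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the current block; vx4 and vy4 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the neighboring block; vx5 and vy5 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the neighboring block; vx6 and vy6 are the horizontal and vertical components of the motion vector corresponding to the bottom-left pixel position in the neighboring block; x0 and y0 are the horizontal and vertical coordinates of the top-left pixel position in the current block; x1 and y1 are the horizontal and vertical coordinates of the top-right pixel position in the current block; x4 and y4 are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block; x5 is the horizontal coordinate of the top-right pixel position in the neighboring block; and y6 is the vertical coordinate of the bottom-left pixel position in the neighboring block.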
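  6. The method according to any one of claims 1 to 5, wherein the obtaining a predicted motion vector of each sub-block in the current block based on the target candidate motion vectors of the K control points comprises:
    obtaining a 2×K-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points; and
    obtaining the predicted motion vector of each sub-block in the current block based on the 2×K-parameter affine transformation model.
  7. The method according to claim 6, wherein the obtaining the 2×K-parameter affine transformation model based on the target candidate motion vectors of the K control points comprises:
    obtaining motion vectors of the K control points based on the target candidate motion vectors of the K control points and motion vector differences of the K control points, wherein the motion vector differences of the K control points are obtained by parsing the bitstream; and
    obtaining the 2×K-parameter affine transformation model of the current block based on the motion vectors of the K control points.
  8. The method according to any one of claims 1 to 5, wherein after the determining, from the candidate motion vector list based on the index value, the target candidate motion vectors of the K control points of the current block, the method further comprises: obtaining a 6-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and
    correspondingly, the obtaining a predicted motion vector of each sub-block in the current block based on the target candidate motion vectors of the K control points comprises:
    obtaining the predicted motion vector of each sub-block in the current block based on the 6-parameter affine transformation model of the current block.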
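  9. A decoding device, wherein the device comprises:
    a storage unit, configured to store video data in the form of a bitstream;
    an entropy decoding unit, configured to parse the bitstream to obtain an index value of a candidate motion vector list; and
    a prediction processing unit, configured to: construct the candidate motion vector list, wherein the candidate motion vector list comprises candidate motion vectors of K control points of a current block; the candidate motion vectors of the K control points of the current block are obtained based on a 2×N-parameter affine transformation model used by a neighboring block of the current block, and the 2×N-parameter affine transformation model is obtained based on motion vectors of N control points of the neighboring block, wherein N is an integer greater than or equal to 2 and less than or equal to 4, K is an integer greater than or equal to 2 and less than or equal to 4, and N is not equal to K; and the neighboring block is a decoded image block spatially adjacent to the current block, and the current block comprises a plurality of sub-blocks; determine, from the candidate motion vector list based on the index value, target candidate motion vectors of the K control points of the current block; and obtain a predicted motion vector of each sub-block in the current block based on the target candidate motion vectors of the K control points of the current block.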
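  10. The device according to claim 9, wherein N is equal to 2 and K is equal to 3, and correspondingly, the candidate motion vectors of the three control points of the current block are obtained based on a 4-parameter affine transformation model used by the neighboring block of the current block.
  11. The device according to claim 10, wherein the candidate motion vectors of the three control points of the current block comprise a motion vector at a top-left pixel position in the current block, a motion vector at a top-right pixel position in the current block, and a motion vector at a bottom-left pixel position in the current block; and
    the prediction processing unit is configured to calculate the candidate motion vectors of the three control points of the current block according to the following formulas: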
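    $$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_0 - y_4)$$
    $$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_1 - y_4)$$
    $$vx_2 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_2 - x_4) - \frac{vy_5 - vy_4}{x_5 - x_4}(y_2 - y_4), \qquad vy_2 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_2 - x_4) + \frac{vx_5 - vx_4}{x_5 - x_4}(y_2 - y_4)$$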
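    where vx0 and vy0 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the current block; vx1 and vy1 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the current block; vx2 and vy2 are the horizontal and vertical components of the motion vector corresponding to the bottom-left pixel position in the current block; vx4 and vy4 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the neighboring block; vx5 and vy5 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the neighboring block; x0 and y0 are the horizontal and vertical coordinates of the top-left pixel position in the current block; x1 and y1 are the horizontal and vertical coordinates of the top-right pixel position in the current block; x2 and y2 are the horizontal and vertical coordinates of the bottom-left pixel position in the current block; x4 and y4 are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block; and x5 is the horizontal coordinate of the top-right pixel position in the neighboring block.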
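  12. The device according to claim 9, wherein N is equal to 3 and K is equal to 2, and correspondingly, the candidate motion vectors of the two control points of the current block are obtained based on a 6-parameter affine transformation model used by the neighboring block of the current block.
  13. The device according to claim 12, wherein the candidate motion vectors of the two control points of the current block comprise a motion vector at a top-left pixel position in the current block and a motion vector at a top-right pixel position in the current block; and
    the prediction processing unit is configured to calculate the candidate motion vectors of the two control points of the current block according to the following formulas: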
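    $$vx_0 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_0 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_0 - y_4), \qquad vy_0 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_0 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_0 - y_4)$$
    $$vx_1 = vx_4 + \frac{vx_5 - vx_4}{x_5 - x_4}(x_1 - x_4) + \frac{vx_6 - vx_4}{y_6 - y_4}(y_1 - y_4), \qquad vy_1 = vy_4 + \frac{vy_5 - vy_4}{x_5 - x_4}(x_1 - x_4) + \frac{vy_6 - vy_4}{y_6 - y_4}(y_1 - y_4)$$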
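    where vx0 and vy0 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the current block; vx1 and vy1 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the current block; vx4 and vy4 are the horizontal and vertical components of the motion vector corresponding to the top-left pixel position in the neighboring block; vx5 and vy5 are the horizontal and vertical components of the motion vector corresponding to the top-right pixel position in the neighboring block; vx6 and vy6 are the horizontal and vertical components of the motion vector corresponding to the bottom-left pixel position in the neighboring block; x0 and y0 are the horizontal and vertical coordinates of the top-left pixel position in the current block; x1 and y1 are the horizontal and vertical coordinates of the top-right pixel position in the current block; x4 and y4 are the horizontal and vertical coordinates of the top-left pixel position in the neighboring block; x5 is the horizontal coordinate of the top-right pixel position in the neighboring block; and y6 is the vertical coordinate of the bottom-left pixel position in the neighboring block.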
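  14. The device according to any one of claims 9 to 13, wherein the prediction processing unit is specifically configured to:
    obtain a 2×K-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and
    obtain the predicted motion vector of each sub-block in the current block based on the 2×K-parameter affine transformation model of the current block.
  15. The device according to claim 14, wherein the prediction processing unit is specifically configured to:
    obtain motion vectors of the K control points of the current block based on the target candidate motion vectors of the K control points of the current block and motion vector differences of the K control points of the current block, wherein the motion vector differences of the K control points of the current block are obtained by parsing the bitstream; and
    obtain the 2×K-parameter affine transformation model of the current block based on the motion vectors of the K control points of the current block.
  16. The device according to any one of claims 9 to 13, wherein after the target candidate motion vectors of the K control points of the current block are determined from the candidate motion vector list based on the index value, the prediction processing unit is further configured to:
    obtain a 6-parameter affine transformation model of the current block based on the target candidate motion vectors of the K control points of the current block; and
    obtain the predicted motion vector of each sub-block in the current block based on the 6-parameter affine transformation model of the current block.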
PCT/CN2018/116984 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus WO2020006969A1 (zh)

Priority Applications (15)

Application Number Priority Date Filing Date Title
MX2021000171A MX2021000171A (es) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus.
CN202211258218.8A CN115695791A (zh) 2018-07-02 2018-11-22 Video image encoding method and device for encoding video data
KR1020217002027A KR102606146B1 (ko) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
KR1020237040062A KR20230162152A (ko) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
SG11202013202YA SG11202013202YA (en) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
CN202211226203.3A CN115733974A (zh) 2018-07-02 2018-11-22 Video image encoding method and device for encoding video data
CN201880002952.3A CN110876282B (zh) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
JP2020573324A JP7368396B2 (ja) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
EP18925643.1A EP3809704A4 (en) 2018-07-02 2018-11-22 METHOD AND RELATED DEVICE FOR MOTION VECTOR PREDICTION
BR112020026992-1A BR112020026992A2 (pt) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus
US17/140,041 US11206408B2 (en) 2018-07-02 2021-01-01 Motion vector prediction method and related apparatus
US17/525,944 US11683496B2 (en) 2018-07-02 2021-11-14 Motion vector prediction method and related apparatus
US18/318,731 US12108048B2 (en) 2018-07-02 2023-05-17 Video image encoding method and related computer-readable medium and apparatus
US18/318,730 US12120310B2 (en) 2018-07-02 2023-05-17 Motion vector prediction method and related apparatus
JP2023176813A JP2023184560A (ja) 2018-07-02 2023-10-12 Motion vector prediction method and related apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862693422P 2018-07-02 2018-07-02
US62/693,422 2018-07-02
US201862699733P 2018-07-18 2018-07-18
US62/699,733 2018-07-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/140,041 Continuation US11206408B2 (en) 2018-07-02 2021-01-01 Motion vector prediction method and related apparatus

Publications (1)

Publication Number Publication Date
WO2020006969A1 (zh) 2020-01-09

Family

ID=69060695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116984 WO2020006969A1 (zh) 2018-07-02 2018-11-22 Motion vector prediction method and related apparatus

Country Status (9)

Country Link
US (4) US11206408B2 (zh)
EP (1) EP3809704A4 (zh)
JP (2) JP7368396B2 (zh)
KR (2) KR102606146B1 (zh)
CN (3) CN115695791A (zh)
BR (1) BR112020026992A2 (zh)
MX (4) MX2021000171A (zh)
SG (1) SG11202013202YA (zh)
WO (1) WO2020006969A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
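WO2022078150A1 (zh) * 2020-10-18 2022-04-21 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for determining a candidate motion information list, electronic device, and storage medium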

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI835864B (zh) * 2018-09-23 2024-03-21 Beijing Bytedance Network Technology Co., Ltd. Simplified spatial-temporal motion vector prediction
US11418793B2 (en) * 2018-10-04 2022-08-16 Qualcomm Incorporated Adaptive affine motion vector coding
WO2020125754A1 (en) * 2018-12-21 2020-06-25 Beijing Bytedance Network Technology Co., Ltd. Motion vector derivation using higher bit-depth precision
CN112055207B (zh) * 2020-08-06 2024-05-31 Zhejiang Dahua Technology Co., Ltd. Temporal motion vector prediction method, device, and storage medium
US11425368B1 (en) * 2021-02-17 2022-08-23 Adobe Inc. Lossless image compression using block based prediction and optimized context adaptive entropy coding
CN114979627A (zh) * 2021-02-24 2022-08-30 Huawei Technologies Co., Ltd. Motion vector (MV) constraints and transform constraints in video coding

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
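CN102934440A (zh) * 2010-05-26 2013-02-13 LG Electronics Inc. Method and apparatus for processing a video signal
CN103402045A (zh) * 2013-08-20 2013-11-20 Changsha Chaochuang Electronic Technology Co., Ltd. Image de-rotation and stabilization method combining partition matching with an affine model
CN103561263A (zh) * 2013-11-06 2014-02-05 Beijing Peony Electronic Group Co., Ltd. Digital TV Technology Center Motion-compensated prediction method based on motion vector constraints and weighted motion vectors
CN104539966A (zh) * 2014-09-30 2015-04-22 Huawei Technologies Co., Ltd. Image prediction method and related apparatus
CN106375770A (zh) * 2011-01-21 2017-02-01 SK Telecom Co., Ltd. Video decoding method
CN106454378A (zh) * 2016-09-07 2017-02-22 Sun Yat-sen University Frame rate up-conversion video coding method and system based on a deformable motion model
CN106878749A (zh) * 2010-12-28 2017-06-20 Sun Patent Trust Image decoding method and image decoding apparatus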

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102883163B (zh) * 2012-10-08 2014-05-28 Huawei Technologies Co., Ltd. Method and apparatus for building a motion vector list for motion vector prediction
WO2016008157A1 (en) * 2014-07-18 2016-01-21 Mediatek Singapore Pte. Ltd. Methods for motion compensation using high order motion model
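CN106303543B (zh) * 2015-05-15 2018-10-30 Huawei Technologies Co., Ltd. Video image encoding and decoding method, encoding device, and decoding device
CN107925758B (zh) 2015-08-04 2022-01-25 LG Electronics Inc. Inter prediction method and apparatus in a video coding system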
US10582215B2 (en) * 2015-08-07 2020-03-03 Lg Electronics Inc. Inter prediction method and apparatus in video coding system
CN108600749B (zh) * 2015-08-29 2021-12-28 Huawei Technologies Co., Ltd. Image prediction method and device
GB2561507B (en) 2016-01-07 2021-12-22 Mediatek Inc Method and apparatus for affine merge mode prediction for video coding system
WO2017156705A1 (en) * 2016-03-15 2017-09-21 Mediatek Inc. Affine prediction for video coding
US10560712B2 (en) * 2016-05-16 2020-02-11 Qualcomm Incorporated Affine motion prediction for video coding
EP3449630B1 (en) 2016-05-28 2024-07-10 Mediatek Inc. Method and apparatus of current picture referencing for video coding
US10631002B2 (en) 2016-09-30 2020-04-21 Qualcomm Incorporated Frame rate up-conversion coding mode
US10448010B2 (en) 2016-10-05 2019-10-15 Qualcomm Incorporated Motion vector prediction for affine motion models in video coding
US10873744B2 (en) * 2017-01-03 2020-12-22 Lg Electronics Inc. Method and device for processing video signal by means of affine prediction
US10701390B2 (en) * 2017-03-14 2020-06-30 Qualcomm Incorporated Affine motion information derivation
US10602180B2 (en) * 2017-06-13 2020-03-24 Qualcomm Incorporated Motion vector prediction

Also Published As

Publication number Publication date
JP7368396B2 (ja) 2023-10-24
MX2022013784A (es) 2022-11-30
EP3809704A1 (en) 2021-04-21
US12120310B2 (en) 2024-10-15
CN110876282A (zh) 2020-03-10
US12108048B2 (en) 2024-10-01
CN115733974A (zh) 2023-03-03
EP3809704A4 (en) 2021-04-28
KR20210022101A (ko) 2021-03-02
US11683496B2 (en) 2023-06-20
US20220078443A1 (en) 2022-03-10
CN115695791A (zh) 2023-02-03
MX2022013782A (es) 2022-11-30
MX2021000171A (es) 2022-11-01
US11206408B2 (en) 2021-12-21
JP2023184560A (ja) 2023-12-28
CN110876282B (zh) 2022-10-18
KR102606146B1 (ko) 2023-11-23
US20210127116A1 (en) 2021-04-29
US20230370607A1 (en) 2023-11-16
SG11202013202YA (en) 2021-01-28
US20230370606A1 (en) 2023-11-16
BR112020026992A2 (pt) 2021-04-06
MX2022013781A (es) 2022-11-30
KR20230162152A (ko) 2023-11-28
JP2021529483A (ja) 2021-10-28

Similar Documents

Publication Publication Date Title
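WO2020006969A1 (zh) Motion vector prediction method and related apparatus
TWI786790B (zh) Inter prediction method and apparatus for video data
WO2020052304A1 (zh) Motion vector prediction method and device based on an affine motion model
WO2020088324A1 (zh) Video image prediction method and apparatus
WO2020088482A1 (zh) Inter prediction method based on an affine prediction mode, and related apparatus
CN117730535A (zh) Geometric partitioning for affine motion-compensated prediction in video coding
CN110876065A (zh) Method for constructing a candidate motion information list, inter prediction method, and apparatus
WO2023092256A1 (zh) Video encoding method and related apparatus
CN112055970B (zh) Method for constructing a candidate motion information list, inter prediction method, and apparatus
WO2020007093A1 (zh) Image prediction method and apparatus
CN110971899B (zh) Method for determining motion information, inter prediction method, and apparatus
CN110677645B (zh) Image prediction method and apparatus
WO2020042724A1 (zh) Inter prediction method and apparatus, video encoder, and video decoder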

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18925643
Country of ref document: EP
Kind code of ref document: A1

ENP Entry into the national phase
Ref document number: 2020573324
Country of ref document: JP
Kind code of ref document: A

NENP Non-entry into the national phase
Ref country code: DE

REG Reference to national code
Ref country code: BR
Ref legal event code: B01A
Ref document number: 112020026992
Country of ref document: BR

ENP Entry into the national phase
Ref document number: 20217002027
Country of ref document: KR
Kind code of ref document: A

ENP Entry into the national phase
Ref document number: 2018925643
Country of ref document: EP
Effective date: 20210114

ENP Entry into the national phase
Ref document number: 112020026992
Country of ref document: BR
Kind code of ref document: A2
Effective date: 20201230