WO2024153085A1 - Video coding method and chroma prediction apparatus (original title: Procédé de codage vidéo et appareil de prédiction de chrominance)
- Publication number
- WO2024153085A1 (PCT/CN2024/072601, CN2024072601W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cross-component, chroma, block, predictor
- Prior art date
Classifications
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
Definitions
- the conventional video coding standards generally adopt a block based coding technique to exploit spatial and temporal redundancy.
- the basic approach is to divide the whole source picture into a plurality of blocks, perform intra/inter prediction on each block, transform residues of each block, and perform quantization and entropy encoding.
- a reconstructed picture is generated in a coding loop to provide reference data used for coding following blocks.
- in-loop filter (s) may be used for enhancing the image quality of the reconstructed frame.
- the video decoder is used to perform an inverse operation of a video encoding operation performed by a video encoder.
- the video decoder may have a plurality of processing circuits, such as an entropy decoding circuit, an intra prediction circuit, a motion compensation circuit, an inverse quantization circuit, an inverse transform circuit, a reconstruction circuit, and in-loop filter (s) .
- chroma prediction may be determined under a non-cross-component mode or a cross-component mode.
- One of the objectives of the claimed invention is to provide a video coding method that determines intra prediction of a chroma sample by jointly considering (e.g., blending) a non-cross-component predictor (i.e., a non-cross-component intra-predicted chroma sample) and at least one cross-component predictor (i.e., at least one cross-component intra-predicted chroma sample) that is generated using a cross-component model candidate selected from a merge candidate list.
- an exemplary method for video coding includes: receiving data to be encoded or decoded at a current block of pixels of a current picture of a video, wherein the current block comprises at least one chroma block; and encoding or decoding the current block by a target intra prediction mode, which includes: obtaining a non-cross-component predictor of a chroma sample included in the at least one chroma block, wherein the non-cross-component predictor is an intra predictor; obtaining at least one cross-component predictor of the chroma sample; and determining intra prediction of the chroma sample by jointly considering the non-cross-component predictor and the at least one cross-component predictor.
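For illustration only (not part of the claimed invention), the joint consideration of the two predictors described above can be sketched as a weighted average in Python; the function name, the fixed integer weights, and the rounding shift are hypothetical choices, not taken from the application:

```python
def blend_chroma(pred_intra, pred_cc, w_intra=2, w_cc=2, shift=2):
    """Blend a non-cross-component intra predictor with a cross-component
    predictor, sample by sample, using integer weights with rounding.
    Assumes w_intra + w_cc == 1 << shift."""
    offset = 1 << (shift - 1)          # rounding offset for the right shift
    return [(w_intra * a + w_cc * b + offset) >> shift
            for a, b in zip(pred_intra, pred_cc)]
```

With equal weights, `blend_chroma([100, 100], [120, 140])` yields the per-sample average of the two hypotheses.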
- the at least one cross-component predictor is determined according to a cross-component model candidate from a previous-coded block.
- the intra chroma prediction circuit is arranged to obtain at least one cross-component predictor of the chroma sample according to a cross-component model candidate from a previous-coded block, and determine intra prediction of the chroma sample by jointly considering the non-cross-component predictor and the at least one cross-component predictor.
- an exemplary video decoder includes a video data memory and a decoding circuit.
- the video data memory is arranged to receive data to be decoded as a current block of pixels of a current picture of a video, wherein the current block comprises at least one chroma block.
- the decoding circuit is arranged to perform decoding of the current block by a target intra prediction mode.
- the decoding circuit includes an intra prediction circuit and an intra chroma prediction circuit.
- the intra prediction circuit is arranged to obtain a non-cross-component predictor of a chroma sample included in the at least one chroma block, wherein the non-cross-component predictor is an intra predictor.
- FIG. 1 is a diagram illustrating an example of four reference lines neighboring to a prediction block according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating locations of the samples used for the derivation of α and β according to an embodiment of the present invention.
- FIG. 3 is a diagram illustrating an example of classifying the neighboring samples into two groups according to an embodiment of the present invention.
- FIG. 4 is a diagram illustrating the effect of the slope adjustment parameter “u” according to an embodiment of the present invention.
- FIG. 6 is a diagram illustrating a reference area (with its paddings) used to derive the filter coefficients according to an embodiment of the present invention.
- FIG. 7 is a diagram illustrating 16 gradient patterns for GLM according to an embodiment of the present invention.
- FIG. 8 is a diagram illustrating HoG computation from a template of width 3 pixels according to an embodiment of the present invention.
- FIG. 9 is a diagram illustrating prediction fusion by weighted averaging of two HoG modes and planar according to an embodiment of the present invention.
- FIG. 10 is a diagram illustrating a template and its reference samples used in TIMD according to an embodiment of the present invention.
- FIG. 11 is a diagram illustrating positions of spatial merge candidate according to an embodiment of the present invention.
- FIG. 12 is a diagram illustrating candidate pairs considered for redundancy check of spatial merge candidates according to an embodiment of the present invention.
- FIG. 13 is a diagram illustrating spatial neighboring blocks used to derive the spatial merge candidates according to an embodiment of the present invention.
- FIG. 14 is a diagram illustrating motion vector scaling for temporal merge candidate.
- FIG. 15 is a diagram illustrating candidate positions for temporal merge candidate, C 0 and C 1 , according to an embodiment of the present invention.
- FIG. 16 is a diagram illustrating top and left neighboring blocks used in CIIP weight derivation according to an embodiment of the present invention.
- FIG. 17 is a diagram illustrating a linear predicting method of linearly predicting chroma samples from luma samples according to an embodiment of the present invention.
- FIG. 23 is a diagram illustrating a pattern being 5x5 diamond, including or excluding (iL, jL) , according to an embodiment of the present invention.
- FIG. 24 is a diagram illustrating different Sobel filters according to an embodiment of the present invention.
- FIG. 25 is a diagram illustrating a pattern being any subset of a window region M2 x N2 around/including the position (iC, jC) according to an embodiment of the present invention.
- FIG. 26 is a diagram illustrating a pattern being 5x5 cross, including or excluding (iC, jC) , according to an embodiment of the present invention.
- FIG. 27 is a diagram illustrating a pattern being 5x5 diamond, including or excluding (iC, jC) , according to an embodiment of the present invention.
- FIG. 28 is a diagram illustrating a spatial neighboring region of the current block that includes above reference region, left reference region, above-left reference region, and/or any subset of the above according to an embodiment of the present invention.
- FIG. 29 is a diagram illustrating scanning orders when adding the spatial model information from spatial neighbour blocks into the list according to an embodiment of the present invention.
- FIG. 30 is a diagram illustrating an example of the right-bottom region for the current chroma block according to an embodiment of the present invention.
- FIG. 31 is a block diagram illustrating a video encoder that supports the proposed intra chroma prediction mode according to an embodiment of the present invention.
- FIG. 33 is a flowchart illustrating a video coding method according to an embodiment of the present invention.
- MMLM Multiple model CCLM
- VVC Versatile Video Coding
- HEVC High Efficiency Video Coding
- denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.
- the index of selected reference line (mrl_idx) is signalled and used to generate intra predictor.
- for a reference line index greater than 0, only the additional reference line modes are included in the MPM list, and only the MPM index is signalled without the remaining modes.
- the reference line index is signalled before intra prediction modes, and Planar mode is excluded from intra prediction modes in case a nonzero reference line index is signalled.
- Max − Min is greater than or equal to 62:
- TBC Truncated Binary Code
- CCLM cross-component linear model
- pred C (i, j) represents the predicted chroma samples in a CU and rec L ′ (i, j) represents the downsampled reconstructed luma samples of the same or collocated CU.
- CCLM parameters (α and β) are derived with at most four neighbouring chroma samples and their corresponding down-sampled luma samples.
- W ⁇ H the current chroma block dimensions
- the downsampling filter can be from the mentioned example and/or is not limited to the mentioned example.
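As an informal sketch of the α/β derivation from at most four neighbouring sample pairs described above: the two pairs with the smallest luma values and the two with the largest are averaged, and a line is fitted through the two averaged points. Floating-point division is used here for clarity, whereas the standard uses a divide-free integer approximation; the function name is hypothetical:

```python
def derive_cclm_params(luma, chroma):
    """Derive (alpha, beta) from up to four neighbouring (luma, chroma)
    pairs by the min/max averaging idea."""
    pairs = sorted(zip(luma, chroma))      # sort pairs by luma value
    lo = pairs[:2]                         # two smallest-luma pairs
    hi = pairs[-2:]                        # two largest-luma pairs
    l_min = sum(l for l, _ in lo) / 2.0
    c_min = sum(c for _, c in lo) / 2.0
    l_max = sum(l for l, _ in hi) / 2.0
    c_max = sum(c for _, c in hi) / 2.0
    if l_max == l_min:
        return 0.0, c_min                  # degenerate case: flat luma
    alpha = (c_max - c_min) / (l_max - l_min)
    beta = c_min - alpha * l_min
    return alpha, beta
```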
- a single binarization table is used regardless of the value of sps_cclm_enabled_flag as shown in Table 1-2.
- the first bin indicates whether it is a regular (0) or CCLM mode (1) . If it is a CCLM mode, then the next bin indicates whether it is LM_LA (0) or not. If it is not LM_LA, the next bin indicates whether it is LM_L (0) or LM_A (1) . In this case, when sps_cclm_enabled_flag is 0, the first bin of the binarization table for the corresponding intra_chroma_pred_mode can be discarded prior to the entropy coding. In other words, the first bin is inferred to be 0 and hence not coded.
- This single binarization table is used for both sps_cclm_enabled_flag equal to 0 and 1 cases.
- the first two bins in Table 1-2 are context coded with their own context models, and the remaining bins are bypass coded.
- the chroma intra mode coding can use this example and/or the chroma intra mode coding is not limited to this example.
- the chroma CUs in a 32x32 / 32x16 chroma coding tree node are allowed to use CCLM in the following way:
- MMLM multiple model CCLM
- neighbouring luma samples and neighbouring chroma samples of the current block are classified into two groups, each group is used as a training set to derive a linear model (i.e., a particular α and β are derived for a particular group) .
- the samples of the current luma block are also classified based on the same rule for the classification of neighbouring luma samples.
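The two-group classification of MMLM can be sketched as follows; the use of the mean neighbouring luma value as the classification threshold is one common choice and is assumed here, as is the function name:

```python
def mmlm_classify(neigh_luma, neigh_chroma):
    """Split neighbouring (luma, chroma) pairs into two groups by the mean
    luma value; each group would then train its own (alpha, beta) model,
    and current luma samples are classified by the same threshold."""
    threshold = sum(neigh_luma) / len(neigh_luma)
    g0 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l <= threshold]
    g1 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l > threshold]
    return threshold, g0, g1
```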
- FIG. 4 illustrates the process, where the sub-diagram (A) illustrates a model created with the current CCLM, and the sub-diagram (B) illustrates a model updated as proposed.
- Slope adjustment parameter is provided as an integer between -4 and 4, inclusive, and signaled in the bitstream.
- the unit of the slope adjustment parameter is 1/8 th of a chroma sample value per one luma sample value (for 10-bit content) .
- adjustment is available for the CCLM models that are using reference samples both above and left of the block ( “LM_CHROMA_IDX” and “MMLM_CHROMA_IDX” ) , but not for the “single side” modes. This selection is based on coding efficiency vs. complexity trade-off considerations.
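A minimal sketch of the slope adjustment described above, under the assumption that the adjusted model is re-anchored at an average luma value so that predictions near that value are unchanged (the anchoring point and function name are assumptions for illustration):

```python
def adjust_slope(alpha, beta, u, luma_avg):
    """Apply a signalled slope adjustment u in [-4, 4]: the slope changes
    in units of 1/8 chroma sample per luma sample (10-bit content), and
    beta is updated so the model still passes through (luma_avg, pred)."""
    assert -4 <= u <= 4
    delta = u / 8.0                      # one step = 1/8 per the text above
    return alpha + delta, beta - delta * luma_avg
```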
- CCCM Convolutional cross-component model
- the convolutional model has a 7-tap filter consisting of a 5-tap plus-sign-shaped spatial component, a nonlinear term, and a bias term.
- the input to the spatial 5-tap component of the filter consists of a center (C) luma sample which is collocated with the chroma sample to be predicted and its above/north (N) , below/south (S) , left/west (W) and right/east (E) neighbors as illustrated in FIG. 5.
- the horizontal and vertical Sobel filters are applied on all 3 ⁇ 3 window positions, centered on the pixels of the middle line of the template.
- Sobel filters calculate the intensity of pure horizontal and vertical directions as G x and G y , respectively.
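The per-window gradient computation can be sketched with the standard 3x3 Sobel kernels; the kernel orientation convention and function name are assumptions for illustration:

```python
# Standard horizontal/vertical Sobel kernels
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradients(win):
    """Apply both Sobel filters to one 3x3 window of samples and return
    the gradient pair (Gx, Gy)."""
    gx = sum(SOBEL_X[r][c] * win[r][c] for r in range(3) for c in range(3))
    gy = sum(SOBEL_Y[r][c] * win[r][c] for r in range(3) for c in range(3))
    return gx, gy
```

A purely horizontal luma ramp produces a nonzero Gx and a zero Gy.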
- the SATD between the prediction and reconstruction samples of the template is calculated.
- the first two intra prediction modes with the minimum SATD are selected as the TIMD modes. These two TIMD modes are fused with weights after applying the PDPC process, and such weighted intra prediction is used to code the current CU.
- Position dependent intra prediction combination (PDPC) is included in the derivation of the TIMD modes.
- weight1 = costMode2 / (costMode1 + costMode2)
- weight2 = 1 − weight1
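The two weight formulas above translate directly into code; the cheaper (better) mode receives the larger weight. The function name is hypothetical:

```python
def timd_weights(cost_mode1, cost_mode2):
    """TIMD fusion weights: weight1 = costMode2 / (costMode1 + costMode2),
    weight2 = 1 - weight1, so the lower-cost mode is weighted higher."""
    total = cost_mode1 + cost_mode2
    w1 = cost_mode2 / total
    return w1, 1.0 - w1
```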
- VVC includes a number of new and refined inter prediction coding tools listed as follows:
- MMVD Merge mode with MVD
- AMVR Adaptive motion vector resolution
- the merge candidate list is constructed by including the following five types of candidates in order:
- the size of merge list is signalled in sequence parameter set header and the maximum allowed size of merge list is 6.
- an index of best merge candidate is encoded using truncated unary binarization (TU) .
- the first bin of the merge index is coded with context and bypass coding is used for other bins.
- VVC also supports parallel derivation of the merge candidate lists (also called merging candidate lists) for all CUs within a certain size of area.
- the derivation of spatial merge candidates in VVC is the same as that in HEVC, except that the positions of the first two merge candidates are swapped.
- a maximum of four merge candidates are selected among candidates located in the positions depicted in FIG. 11.
- the order of derivation is B 0 , A 0 , B 1 , A 1 and B 2 .
- position B 2 is considered only when one or more CUs at positions B 0 , A 0 , B 1 , A 1 are not available (e.g. because they belong to another slice or tile) or are intra coded.
- the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with same motion information are excluded from the list so that coding efficiency is improved.
- a scaled motion vector is derived based on co-located CU belonging to the collocated reference picture.
- the reference picture list and the reference index to be used for derivation of the co-located CU is explicitly signalled in the slice header.
- the scaled motion vector for the temporal merge candidate is obtained as illustrated by the dotted line in FIG. 14.
- tb is defined to be the POC difference between the reference picture of the current picture and the current picture
- td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture.
- the reference picture index of temporal merge candidate is set equal to zero.
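Given the POC distances tb and td defined above, the temporal scaling amounts to multiplying the co-located motion vector by tb/td. Floating point is used here for clarity (the standard uses a fixed-point divide with clipping); the function name is hypothetical:

```python
def scale_temporal_mv(mv, tb, td):
    """Scale a co-located CU's motion vector (mvx, mvy) by the ratio of
    POC distances tb/td to obtain the temporal merge candidate's MV."""
    return (mv[0] * tb / td, mv[1] * tb / td)
```

For example, if the co-located MV spans twice the temporal distance of the current prediction, the scaled MV is halved.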
- the position for the temporal candidate is selected between candidates C 0 and C 1 , as depicted in FIG. 15. If CU at position C 0 is not available, is intra coded, or is outside of the current row of CTUs, position C 1 is used. Otherwise, position C 0 is used in the derivation of the temporal merge candidate.
- HMVP candidates could be used in the merge candidate list construction process.
- the latest several HMVP candidates in the table are checked in order and inserted into the candidate list after the TMVP candidate. A redundancy check is applied between the HMVP candidates and the spatial or temporal merge candidates.
- Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates.
- the first merge candidate is defined as p0Cand and the second merge candidate is defined as p1Cand.
- the averaged motion vectors are calculated according to the availability of the motion vector of p0Cand and p1Cand separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and its reference picture is set to the one of p0Cand; if only one motion vector is available, use the one directly; if no motion vector is available, keep this list invalid. Also, if the half-pel interpolation filter indices of p0Cand and p1Cand are different, it is set to 0.
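The per-list availability rules above can be sketched as follows; a simple truncating integer average is shown for illustration (the exact rounding in the standard may differ), and `None` stands in for an unavailable motion vector:

```python
def pairwise_average(mv0, mv1):
    """Average p0Cand's and p1Cand's motion vectors for one reference list:
    average both if available (even if they point to different reference
    pictures), use the single available one directly, else keep the list
    invalid (None)."""
    if mv0 is not None and mv1 is not None:
        return ((mv0[0] + mv1[0]) // 2, (mv0[1] + mv1[1]) // 2)
    return mv0 if mv0 is not None else mv1
```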
- the zero MVPs are inserted in the end until the maximum merge candidate number is encountered.
- Merge estimation region allows independent derivation of merge candidate list for the CUs in the same merge estimation region (MER) .
- a candidate block that is within the same MER to the current CU is not included for the generation of the merge candidate list of the current CU.
- the history-based motion vector predictor candidate list is updated only if (xCb + cbWidth) >> Log2ParMrgLevel is greater than xCb >> Log2ParMrgLevel and (yCb + cbHeight) >> Log2ParMrgLevel is greater than yCb >> Log2ParMrgLevel, where (xCb, yCb) is the top-left luma sample position of the current CU in the picture and (cbWidth, cbHeight) is the CU size.
- the MER size is selected at encoder side and signalled as log2_parallel_merge_level_minus2 in the sequence parameter set.
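The HMVP update condition stated above checks that the CU crosses a merge-estimation-region boundary in both x and y; it can be written directly as:

```python
def hmvp_update_allowed(x_cb, y_cb, cb_w, cb_h, log2_mer):
    """Return True when the HMVP table may be updated for this CU:
    (xCb + cbWidth) >> Log2ParMrgLevel > xCb >> Log2ParMrgLevel, and
    likewise in the vertical direction."""
    return ((x_cb + cb_w) >> log2_mer > (x_cb >> log2_mer) and
            (y_cb + cb_h) >> log2_mer > (y_cb >> log2_mer))
```

With a 16x16 MER (log2_mer = 4), a 16x16 CU at the MER origin qualifies while an 8x8 CU fully inside the MER does not.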
- CIIP Combined inter and intra prediction
- the CIIP prediction combines an inter prediction signal with an intra prediction signal.
- the inter prediction signal in the CIIP mode P inter is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal P intra is derived following the regular intra prediction process with the planar mode or TIMD. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value is calculated depending on the coding modes of the top and left neighbouring blocks (depicted in FIG. 16) as follows:
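One plausible instantiation of the neighbour-dependent weighting, following the VVC-style CIIP design in which the intra weight grows with the number of intra-coded neighbours (wt = 1, 2, or 3) and the blend is ((4 − wt)·P_inter + wt·P_intra + 2) >> 2; the function name is hypothetical:

```python
def ciip_blend(p_inter, p_intra, top_is_intra, left_is_intra):
    """Combine one inter and one intra predicted sample with a weight
    derived from the coding modes of the top and left neighbours."""
    wt = 1 + int(top_is_intra) + int(left_is_intra)   # wt in {1, 2, 3}
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2
```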
- the prediction of the current block which is coded with the non-cross-component intra mode (for example, DC, planar, angular, TIMD, and/or DIMD modes) , is generated using one or more non-cross-component intra modes, one or more cross-component modes, or both.
- the cross-component modes in this invention can refer to any one or more mentioned cross-component modes and/or the cross-component modes in this invention are not limited to the mentioned cross-component modes.
- the one or more cross-component modes are selected from a merge candidate list.
- the cross-component prediction is generated using multiple cross-component modes. For example, the cross-component prediction is generated using MH CCLM.
- the cross-component modes can be used to improve inter and/or IBC chroma prediction.
- inter CCLM.
- template-derivation regression-based weighting is used when generating the chroma prediction for the current block.
- luma reconstructed samples are used to derive the predictors in the chroma block.
- inverse LM is proposed to use chroma information to derive the predictors in the luma block.
- chroma are encoded/decoded (signalled/parsed) before luma.
- the chroma information refers to the chroma reconstructed samples.
- reconstructed neighbouring chroma samples are used as X (in place of the neighbouring luma samples used for model derivation in traditional CCLM) , and reconstructed neighbouring luma samples are used as Y (in place of the neighbouring chroma samples used for model derivation in traditional CCLM) .
- the reconstructed samples in the chroma block (collocated to the current luma block) and the derived parameters are used to generate the predictors in the current luma block.
- “information” in this embodiment can refer to predicted samples.
- chroma refers to cb and/or cr component (s) .
- the chroma information is from both cb and cr.
- the neighbouring reconstructed cb and cr samples are weighted and then used as the inputs of deriving model parameters.
- the reconstructed cb and cr samples in the chroma block are weighted and then used to derive the predictors in the current luma block.
- weighting for each hypothesis can be fixed or adaptively changed. For example, equal weights are applied to each hypothesis. In another example, weights vary with neighbouring coding information, sample position, block width, height, prediction mode or area. Some examples of neighbouring coding information usage are shown as follows:
- n can be any positive integer
- the first component is luma.
- the predicted samples for the first component are down-sampled with the downsampling filters.
- the downsampling filters follow the original LM design.
- the downsampling filters will not access neighboring predicted/reconstructed samples.
- when the neighboring samples are required to be the input samples of the downsampling filters, padded predicted values from the boundary of the current block are used instead.
- the third component is Cr.
- the following shows a flow of prediction-based inter CCLM, which means using cross-component prediction to form the inter chroma prediction.
- the linear predicting method can be one of MMLM_LT, MMLM_L, MMLM_T, or any cross-component modes.
- Step 1: Derive the linear model from neighboring luma and chroma reconstructed samples.
- Step 2: Apply the derived linear model to the current luma predicted samples to get the current chroma predicted samples:
- pred CCLM (i, j) = α·pred L ′ (i, j) + β
- padding is used. For example, when the down-sampling process references any samples outside of the current luma predicted samples of the luma block, padding is used.
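Step 2 of the flow above, applying the derived linear model to the (down-sampled) luma predicted samples with clipping to the valid chroma range, can be sketched as; the function name and bit-depth default are assumptions:

```python
def apply_cclm(luma_pred, alpha, beta, bit_depth=10):
    """Apply pred_CCLM = alpha * pred_L' + beta to each down-sampled luma
    predicted sample and clip the result to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [min(max(int(alpha * l + beta), 0), max_val) for l in luma_pred]
```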
- CCLM for cb, deriving model parameters from luma and cb; for cr, deriving model parameters from luma and cr
- CCLM for cb, deriving model parameters from luma and cb
- cr deriving model parameters from luma and cr
- Another variation is MMLM.
- Another variation is that for cb (or cr) , deriving model parameters from multiple collocated luma blocks.
- One modelIdx is signalled to select a candidate from the candidate list (modelList) and the selected candidate is used for the current block.
- the modelList contains one or more candidates where each candidate refers to a model (or cross-component mode) information. If only one candidate is in the list (the size of the list is only 1) , the modelIdx is not signalled and/or can be inferred as 0 or a default value.
- the weights for predictions from different to-be-blended CCLM methods are pre-defined at encoder and/or decoder.
- the weights for predictions from different to-be-blended CCLM methods vary based on the distance between the sample (or region) positions and the reference sample positions.
- the weights for predictions from different to-be-blended CCLM methods depend on signalling.
- a weight index is signaled/parsed.
- the code words can be fixed or vary adaptively.
- the code words vary with template-based methods.
- the template refers to the spatial neighboring region of the current block.
- Example 1: A large block uses the same coding mode.
- Intra prediction is highly related to neighboring reference samples.
- the intra prediction mode may be suitable for those samples which are close to the reference samples but may not be good for those samples which are far away from the reference samples.
- the performance of the different coding modes is decided. Then, the better mode is used for the remaining component (s) (subsequently encoded and decoded component (s) ) .
- for example, if the prediction from traditional intra prediction modes (e.g. angular intra prediction modes, DC, planar) is better than the prediction from the LM mode for cb,
- the traditional intra prediction mode is preferable for cr.
- the proposed method can be subblock based.
- a chroma block is divided into several sub-blocks.
- for example, if the subblock’s prediction from the LM mode is better than the subblock’s prediction from traditional intra prediction modes (e.g. angular intra prediction modes, DC, planar) for cb,
- the LM mode is preferable for the corresponding subblock of cr.
- the adaptive changing rule can be performed at both encoder and/or decoder and doesn’t need additional syntax.
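The sub-block-based adaptive rule above can be sketched as follows. This is an illustrative decision only: the SAD criterion and the 2x2 sub-block grid are assumptions, and the decision reuses the cb reconstruction (available at both encoder and decoder, so no additional syntax is needed):

```python
def per_subblock_mode_for_cr(cb_rec, cb_pred_lm, cb_pred_intra, sub=2):
    """Per-sub-block decision reused for cr: for each sub x sub sub-block,
    measure which cb prediction (LM vs. traditional intra) is closer to the
    cb reconstruction; the better mode is used for the corresponding cr
    sub-block.
    """
    h, w = len(cb_rec), len(cb_rec[0])
    sh, sw = h // sub, w // sub
    modes = [["lm"] * sub for _ in range(sub)]
    for by in range(sub):
        for bx in range(sub):
            sad_lm = sad_intra = 0
            for y in range(by * sh, (by + 1) * sh):
                for x in range(bx * sw, (bx + 1) * sw):
                    sad_lm += abs(cb_rec[y][x] - cb_pred_lm[y][x])
                    sad_intra += abs(cb_rec[y][x] - cb_pred_intra[y][x])
            modes[by][bx] = "lm" if sad_lm <= sad_intra else "intra"
    return modes
```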
- CCLM for Inter Block can also be named inter CCLM, and “CCLM” can be extended to any LM mode (or any cross-component mode, for example, (*) ) or replaced with any LM mode (or any cross-component mode, for example, (*) ) .
- CCLM is used for intra blocks to improve chroma intra prediction.
- chroma prediction may not be as accurate as luma prediction. Possible reasons are listed below:
- Motion vectors for chroma components are inherited from luma (chroma doesn't have its own motion vectors) .
- chroma prediction for inter block can be improved according to luma.
- (*) is shown as follows.
- the one or more LM mode (s) (or cross-component mode (s) ) which will be used to generate the one or more hypotheses of predictions for LM assisted Angular/Planar Mode and/or inter CCLM and/or MH CCLM are selected from a pre-defined merging candidate list (called modelList) .
- One modelIdx is signalled to select a candidate from the candidate list (modelList) and the selected candidate is used for the current block.
- the modelList contains one or more candidates, where each candidate refers to model (or cross-component mode) information. If only one candidate is in the list (the size of the list is only 1) , the modelIdx is not signalled and/or can be inferred as 0 or a default value.
- one or more hypotheses of predictions are used to form the current prediction.
- the current prediction is the weighted sum of inter prediction and cross-component prediction, for example, CCLM prediction.
- Weights are designed according to neighboring coding information, sample position, block width, height, mode or area.
- weights for CCLM prediction are higher than weights for inter prediction.
- weights are fixed values for the whole block.
- the inter prediction can be generated by any inter mode mentioned in the above introduction/documents.
- the inter mode can be regular merge mode.
- the inter mode can be CIIP mode.
- the inter mode can be CIIP PDPC.
- the inter mode can be GPM or any GPM variations (like GPM intra which uses intra prediction for one of the GPM partitions) .
- in regular merge mode, a merge candidate is selected from the merge candidate list with a signalled merge index.
- regular merge mode can be MMVD.
- the LM mode used in inter CCLM is prediction-based LM.
- inter CCLM is supported only when any one (or more than one) of the pre-defined inter mode is used for the current block, or inter CCLM is supported when any one (or more than one) of the enabling flag (s) of the pre-defined inter mode is (are) indicated as enabled.
- the meaning of supporting inter CCLM is that the prediction of the current block can be chosen between applying inter CCLM or not applying inter CCLM.
- Predfinal = (wInter * PredInter + wLM * PredLM + 2) >> 2
- Weighting rule for wInter and wLM, for example:
- the weighting follows CIIP weighting.
- predInter is the inter prediction after OBMC (if OBMC is used)
- predInter is the inter prediction before OBMC (OBMC can be applied after blending)
- the prediction of current block is from original inter prediction.
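The blending formula Predfinal = (wInter * PredInter + wLM * PredLM + 2) >> 2 with CIIP-style weights might be sketched as follows; the mapping from the number of intra-coded neighbouring blocks (0..2) to wLM mirrors the CIIP weighting rule and is an assumption here:

```python
def blend_inter_cclm(pred_inter, pred_lm, n_intra_neighbours):
    """Blend inter and LM predictors:
    Predfinal = (wInter * PredInter + wLM * PredLM + 2) >> 2.

    CIIP-style weight rule (assumption): the LM weight grows with the
    number of intra-coded neighbouring blocks (0, 1, or 2).
    """
    w_lm = {0: 1, 1: 2, 2: 3}[n_intra_neighbours]
    w_inter = 4 - w_lm
    return [[(w_inter * pi + w_lm * pl + 2) >> 2
             for pi, pl in zip(row_i, row_l)]
            for row_i, row_l in zip(pred_inter, pred_lm)]
```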
- the choice between applying inter CCLM or not applying inter CCLM depends on signaling.
- a flag is signalled in the bitstream to indicate whether to apply inter CCLM or not.
- the flag is context coded.
- only one context is used to encode and/or decode the flag.
- multiple contexts are used to encode and/or decode the flag and the selection of the contexts depends on block width, block height, block area, or neighboring mode information.
- additional signaling is used to select one or more than one LM from total candidate LM modes (e.g. CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, MMLM_T, or any subset/extension from the above mentioned modes) .
- the LM prediction is generated by the selected one LM.
- the LM prediction is generated by blending hypotheses of predictions from multiple LM modes.
- the additional signaling refers to an index in the bitstream which can be truncated unary coding with and/or without contexts.
- one or more than one LM from total candidate LM modes (e.g. CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, MMLM_T, or any subset/extension from the above mentioned modes) is (are) implicitly selected (or predefined) to be used in inter CCLM.
- CCLM_LT is used to generate LM prediction for inter CCLM.
- MMLM_LT is used to generate LM prediction for inter CCLM.
- the predefined rule depends on block width, block height, or block area.
- Boundary matching setting (used as the predefined rule) can be applied only when the block width, block height, or block area is larger than a threshold.
- Boundary matching setting (used as the predefined rule) can be applied only when the block width, block height, or block area is smaller than a threshold.
- the selected LM mode (s) is (are) inferred as any one (more than one) LM mode (s) from total candidate LM modes.
- the selected LM mode is fixed as CCLM_LT.
- the selected LM mode is fixed as MMLM_LT.
- the predefined rule depends on boundary matching setting. (Details of boundary matching setting can be found in the section of boundary matching setting.
- the candidate mode used in the section of boundary matching setting refers to each candidate LM mode for inter CCLM.
- the prediction from a candidate mode used in the section of boundary matching setting refers to the prediction generated by each candidate LM mode or refers to the blended prediction from each candidate LM mode and original inter. )
- inter CCLM can be supported only when the size conditions of the current block are satisfied.
- the size condition is that the block width, block height, or block area is larger than a pre-defined threshold.
- the pre-defined threshold can be a positive integer like 8, 16, 32, 64, 128, 256, ....
- the size condition is satisfied when any one of the block width and block height of the current chroma block is larger than the pre-defined threshold.
- the size condition is that the block width, block height, or block area is smaller than a pre-defined threshold.
- the pre-defined threshold can be a positive integer like 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, .... For example, when both the block width and block height of the current chroma block are smaller than the pre-defined threshold, the size condition is satisfied.
- the inter mode used in the inter block depends on an enabling flag.
- the enabling flag is called the regular merge flag.
- the enabling flag is called the CIIP flag.
- the enabling flag is called the CIIP PDPC flag.
- the enabling flag indicated as enabled (1) means the corresponding inter mode is applied to the current block.
- the enabling flag indicated as disabled (0) means the corresponding inter mode is not applied to the current block.
- the enabling flag is signalled in the bitstream and/or inferred in some cases.
- the signaling of the enabling flag depends on block width, block height, or block area.
- the prediction from inter can be adjusted by the neighboring reconstructed samples and a pre-defined weighting scheme. For example, when the current block is merge, the prediction from merge is blended with the neighboring reconstructed samples.
- the proposed scheme is enabled depending on CIIP PDPC flag. (The CIIP PDPC flag may be signalled when the CIIP flag is indicated as enabled. )
- the pre-defined weighting scheme follows PDPC weighting.
- The inter-predictor of regular merge mode is refined using the above R (x, -1) and left R (-1, y) reconstructed samples, as illustrated in FIG. 19
- nScale = (floorLog2 (width) + floorLog2 (height) - 2) >> 2;
- inter-predictor is computed in mapped domain
- Pred (x, y) = ( ( ( (wT × R (x, -1) + wL × R (-1, y) + 32) >> 6) << 6) + (64 - wT - wL) × Fwd (predInter (x, y) ) + 32) >> 6
- inter-predictor is computed in original domain
- Pred (x, y) = ( ( ( (wT × R (x, -1) + wL × R (-1, y) + 32) >> 6) << 6) + (64 - wT - wL) × predInter (x, y) + 32) >> 6
- CIIP PDPC flag is further signaled to indicate whether to use CIIP PDPC
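The original-domain refinement above can be sketched as follows. The wT/wL decay rule (32 >> ((2*y) >> nScale), 32 >> ((2*x) >> nScale)) follows standard PDPC weighting and is an assumption here, since this description only gives nScale:

```python
def pdpc_refine_inter(pred_inter, rec_top, rec_left):
    """Refine an inter predictor with the above and left reconstructed samples:
    Pred(x, y) = ((((wT*R(x,-1) + wL*R(-1,y) + 32) >> 6) << 6)
                  + (64 - wT - wL) * predInter(x, y) + 32) >> 6
    in the original (non-mapped) domain.
    """
    h, w = len(pred_inter), len(pred_inter[0])
    # nScale = (floorLog2(width) + floorLog2(height) - 2) >> 2
    n_scale = ((w.bit_length() - 1) + (h.bit_length() - 1) - 2) >> 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            wT = 32 >> ((2 * y) >> n_scale)  # decays with distance from the top row
            wL = 32 >> ((2 * x) >> n_scale)  # decays with distance from the left column
            out[y][x] = ((((wT * rec_top[x] + wL * rec_left[y] + 32) >> 6) << 6)
                         + (64 - wT - wL) * pred_inter[y][x] + 32) >> 6
    return out
```

Near (0, 0) the reconstructed neighbours dominate; far from both boundaries the weights decay to zero and the inter predictor passes through unchanged.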
- original inter prediction (generated by motion compensation) is used for luma and the predictions of chroma components are generated by CCLM and/or any other LM modes.
- the current CU is viewed as an inter CU, intra CU, or a new type of prediction mode (neither intra nor inter) .
- the above proposed methods can be also applied to IBC blocks. ( “inter” in this section can be changed to IBC. ) That is, for chroma components, the block vector prediction can be combined or replaced with CCLM (or any cross-component mode) prediction.
- CCLM cross-component mode
- a boundary matching cost for a candidate mode refers to the discontinuity measurement (including top boundary matching and/or left boundary matching) between the current prediction (the predicted samples within the current block) , generated from the candidate mode, and the neighboring reconstruction (the reconstructed samples within one or more neighboring blocks) , as illustrated in FIG. 20.
- Top boundary matching means the comparison between the current top predicted samples and the neighboring top reconstructed samples
- left boundary matching means the comparison between the current left predicted samples and the neighboring left reconstructed samples.
- the candidate mode with the smallest boundary matching cost is applied to the current block.
- the boundary matching cost for Cb and Cr can be added to be the boundary matching cost for chroma, so the selected candidate mode for Cb and Cr will be shared. (That is, the selected candidate mode for Cb and Cr will be the same.)
- the selected candidate modes for Cb and Cr depend on the boundary matching costs for Cb and Cr, respectively, so the selected candidate modes for Cb and Cr can be the same or different.
- a pre-defined subset of the current prediction is used to calculate the boundary matching cost.
- n line (s) of top boundary within the current block and/or m line (s) of left boundary within the current block are used.
- n2 line (s) of top neighboring reconstruction and/or m2 line (s) of left neighboring reconstruction are used.
- the rules for n and m can also be applied to n2 and m2.
- n can be any positive integer such as 1, 2, 3, 4, etc.
- m can be any positive integer such as 1, 2, 3, 4, etc.
- n and/or m vary with block width, height, or area.
- Threshold2 = 64, 128, or 256.
- Threshold2 = 1, 2, or 4.
- n gets larger.
- Threshold2 = 64, 128, or 256.
- n is increased to 2. (Originally, n is 1. )
- n is increased to 4. (Originally, n is 1 or 2. )
- n gets larger and/or m gets smaller.
- Threshold2 = 1, 2, or 4.
- n is increased to 2. (Originally, n is 1. )
- n is increased to 4. (Originally, n is 1 or 2. )
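The boundary matching cost above might be computed as in the following sketch; the SAD discontinuity measure over n top lines and m left lines is an illustrative choice:

```python
def boundary_matching_cost(pred, rec_top, rec_left, n=1, m=1):
    """Discontinuity measure between the current prediction and the
    neighbouring reconstruction: SAD between the first n predicted rows and
    the adjacent reconstructed top line (top boundary matching), plus SAD
    between the first m predicted columns and the adjacent reconstructed
    left column (left boundary matching).
    """
    cost = 0
    for y in range(n):          # top boundary matching
        cost += sum(abs(pred[y][x] - rec_top[x]) for x in range(len(pred[0])))
    for x in range(m):          # left boundary matching
        cost += sum(abs(pred[y][x] - rec_left[y]) for y in range(len(pred)))
    return cost

def select_candidate(preds, rec_top, rec_left):
    """Apply the candidate mode with the smallest boundary matching cost."""
    costs = [boundary_matching_cost(p, rec_top, rec_left) for p in preds]
    return costs.index(min(costs))
```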
- LM cross-component mode
- the current block’s prediction is formed by a weighted sum of one or more hypotheses of predictions from traditional intra prediction mode (s) and one or more hypotheses of predictions from LM mode (s) (or cross-component mode (s) ) .
- equal weights are applied to both.
- weights vary with neighboring coding information, sample position, block width, height, mode or area.
- More weighting schemes can be found in the “inverse LM” section.
- Mode A refers to a specific cross-component mode such as CCCM_LT or a specific cross-component family such as CCCM family (including CCCM_LT, CCCM_L, and/or CCCM_T) and/or CCLM family (including CCLM_LT, CCLM_L, and/or CCLM_T) and/or MMLM family (including MMLM_LT, MMLM_L, and/or MMLM_T) .
- CCCM_LT a specific cross-component family
- CCCM family including CCCM_LT, CCCM_L, and/or CCCM_T
- CCLM family including CCLM_LT, CCLM_L, and/or CCLM_T
- MMLM family including MMLM_LT, MMLM_L, and/or MMLM_T
- a weighting set is pre-defined to include multiple weighting candidates such as ⁇ 1, 3 ⁇ , ⁇ 3, 1 ⁇ , and/or equal weighting ⁇ 2, 2 ⁇ for one prediction from a traditional intra prediction mode and the other prediction from a cross-component mode.
- the weighting candidate with a higher weight for the prediction from a cross-component mode is used.
- the weighting candidate with an equal weighting is used.
- the weighting candidate with a smaller weight for the prediction from a cross-component mode is used.
- the neighbouring blocks include any subset of the coded blocks which are spatially adjacent to the top boundary or left boundary of the current block.
- the neighbouring blocks may refer to the top neighbouring block (located on the top of the top-right corner of the current block) and the left neighbouring block (located on the left of the bottom-left corner of the current block) .
- the neighbouring blocks include any subset of the coded blocks located in a pre-defined range spatially nearing the top boundary or left boundary of the current block. In this case, the neighbouring blocks can be adjacent or non-adjacent to the current block.
- the current block is partitioned into several regions.
- the sample positions in the same region share the same weighting. If the current region is close to the reference L neighbour, the weight for prediction from other intra prediction modes (non-cross-component modes) is higher than the weight for prediction from CCLM (or a cross-component mode) .
- CCLM or a cross-component mode
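Selection among the pre-defined weighting candidates {1, 3}, {2, 2}, and {3, 1} can be sketched as follows. The rule tying the choice to how many neighbouring blocks are coded with a cross-component mode is an illustrative assumption:

```python
# (w_intra, w_cc) candidates from the pre-defined weighting set.
WEIGHT_SET = {0: (3, 1), 1: (2, 2), 2: (1, 3)}

def fuse_intra_and_cc(pred_intra, pred_cc, n_cc_neighbours):
    """Weighted sum of one hypothesis from a traditional intra prediction
    mode and one from a cross-component mode.

    Illustrative rule (assumption): the more neighbouring blocks (0..2) use
    a cross-component mode, the higher the cross-component weight.
    """
    w_intra, w_cc = WEIGHT_SET[n_cc_neighbours]
    return [[(w_intra * a + w_cc * b + 2) >> 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(pred_intra, pred_cc)]
```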
- the current chroma block’s prediction is formed by a weighted sum using template-derivation regression-based weighting. Specifically, the prediction of the current block is formed by combining one or more proposed source terms and a proposed weighting setting.
- pred (i, j) is a target (predicted) sample in the current block which can be obtained after our proposed mechanism
- sourceTermSet0 includes one or more source terms from luma component
- sourceTermSet1 includes one or more source terms from chroma components
- biasTermSet includes one or more bias terms.
- the expression (1) is just an example and our proposed mechanism can use any subset or extension of sourceTermSet0, sourceTermSet1, and biasTermSet. Each sample or any subset of samples in the current block gets its target (predicted) sample according to the expression (1) .
- the content of sourceTermSet0 is described in some embodiments
- the content of sourceTermSet1 is described in some embodiments
- the content of biasTerm is described in some embodiments
- the predictor derivation using the proposed source terms and the proposed weighting setting is described in some embodiments. Examples for fusion of chroma intra prediction modes (or named chroma fusion mode) , with our proposed mechanism are shown in some embodiments.
- pred (i, j) = (sourceTermSet0 (i, j) + sourceTermSet1 (i, j) + ... + biasTerm) with the proposed weighting setting, where (i, j) is a sample position in the current block.
- Some embodiments are for content of sourceTermSet0 (i, j) .
- SourceTermSet0 (i, j) includes one or more luma source terms denoted as sourceTerm0 0 , sourceTerm0 1 , ..., and/or sourceTerm0 n-1 .
- the value of n means the number of taps for the source term set.
- the source terms can be linear terms and/or non-linear terms, only linear terms, and/or only non-linear terms.
- n is a pre-defined value such as 1, 2, ...or any positive integer. For example, the pre-defined value is fixed in the standard.
- the pre-defined value is smaller than or equal to a maximum threshold indicated by a syntax in the bitstream where the syntax is at block, CTU, CTB, slice, tile, picture, sps, pps, picture, and/or sequence level.
- n is determined by coding information of the current block and/or sample position (i, j) .
- n is (1) fixed at a pre-defined value, (2) determined according to block width, block height, block area, coding information and/or sample information for the current block, (3) determined according to coding information and/or sample information for the adjacent/non-adjacent spatial neighboring reference region of the current block, and/or (4) determined according to coding information and/or sample information for the temporal reference region of the current block.
- the pattern of the n taps refers to a pattern defined as any subset of a window region M x N around (referring to excluding) or including the position (iL, jL) , as illustrated in FIG. 21. If the target sample is chroma (cb or cr) , (iL, jL) is the collocated luma position from (i, j) .
- pattern being 5x5 cross (including or not excluding (iL, jL) ) , as illustrated in FIG. 22.
- pattern being 5x5 diamond (including or not excluding (iL, jL) ) , as illustrated in FIG. 23.
- different taps refer to the source terms from different prediction modes or different mode types.
- one or more taps are from mode type intra, another one or more taps are from mode type inter, and/or another one or more taps are from mode type IBC.
- one or more taps are from MIP intra prediction modes, another one or more taps are from non-MIP intra prediction modes.
- the following embodiments are used to determine generation of source content.
- the source content is based on a predicted sample generated by a prediction mode and/or a reconstructed sample generated based on the predicted sample by a prediction mode and a reconstructed residual.
- the prediction mode belongs to mode type intra, mode type inter, or a third mode type (e.g. mode type IBC) .
- the prediction mode refers to planar, DC, horizontal, vertical, other angular (directional) prediction mode, any intra prediction modes specified in 67/131 intra prediction mode domain, wide-angle intra prediction (WAIP) modes, TIMD derived modes, DIMD derived modes, Intra template matching prediction (intraTMP) , and/or any intra prediction modes specified in the standard.
- the prediction mode refers to skip mode, regular merge modes, MMVD modes, affine modes, sbTMVP, AMVR, any merge mode specified in the standard, any AMVP (referring to inter non-merge) mode specified in the standard, or any inter mode specified in the standard.
- the prediction mode belonging to mode type IBC refers to IBC merge, IBC AMVP, or any IBC mode specified in the standard. Note that any possible combination between the prediction mode and the mode type is supported in this invention. That is, any mentioned prediction mode can be under any mode type according to the standard definition. For example, following the standard definition, if IBC mode belongs to mode type inter, the prediction mode belonging to mode type inter in the embodiments can refer to an IBC mode.
- the source content is the filtered source or the source with any pre-processing.
- the source content is the predicted/reconstructed sample after filtering with a pre-defined model or filter.
- the source content is gradient information from the predicted samples and/or reconstructed samples. If the target sample (i, j) belongs to chroma, gradient information of the collocated luma sample (as the center circle) is calculated with any one of the Sobel filters illustrated in FIG. 24 or any pre-defined filter. Each value around the center circle is multiplied with the corresponding predicted/reconstructed samples in the collocated luma block and then added with each other to form the gradient information for the source term of the target sample (i, j) .
- the predicted sample and/or the reconstructed sample is located within the collocated (luma) block from the current (chroma) block.
- the predicted sample and/or the reconstructed sample is treated as an initial sample and used as source content to generate the target sample.
- the values of the source terms are further adjusted (added or subtracted) by a pre-defined offset.
- the target sample refers to chroma
- several embodiments are used to generate the offset of the source term.
- the offset is determined as the averaging value of each (or any subset of) predicted or reconstructed samples in the collocated luma block from the current (chroma) block or in the reference region of the collocated luma block.
- the offset is determined as a sample value of a pre-defined predicted or reconstructed samples in the collocated luma block or in the reference region of the collocated luma block. For example, the sample value is from the top-left position (just outside of the top-left corner of the collocated luma block) .
- the source term may further include location information. For example, if the target sample refers to chroma (not luma) , the horizontal location of the collocated luma block from the sample (i, j) is used in a source term and the vertical location of the collocated luma block from the sample (i, j) is used in a source term.
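The gradient source term above can be sketched as follows; 3x3 Sobel kernels are used as a concrete instance of the pre-defined filters (the exact kernels in FIG. 24 may differ):

```python
# Illustrative 3x3 Sobel kernels; any pre-defined filter may be substituted.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def gradient_source_term(luma, iL, jL, kernel=SOBEL_X):
    """Gradient information for the source term of a chroma target sample:
    each kernel value around the collocated luma centre (iL, jL) is
    multiplied with the corresponding reconstructed luma sample, and the
    products are summed.
    """
    return sum(kernel[dy + 1][dx + 1] * luma[jL + dy][iL + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))
```

On a horizontal luma ramp the horizontal kernel responds while the vertical kernel cancels to zero, as expected of a gradient filter.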
- Some embodiments are for content of sourceTermSet1 (i, j) .
- SourceTermSet1 (i, j) includes one or more chroma (cb or cr) source terms denoted as sourceTerm1 0 , sourceTerm1 1 , ..., and/or sourceTerm1 m-1 .
- the value of m means the number of taps for the source term set.
- the source terms can be linear terms and/or non-linear terms, only linear terms, and/or only non-linear terms.
- m is a pre-defined value such as 1, 2, ...or any positive integer. For example, the pre-defined value is fixed in the standard.
- the pre-defined value is smaller than or equal to a maximum threshold indicated by a syntax in the bitstream where the syntax is at block, CTU, CTB, slice, tile, picture, sps, pps, picture, and/or sequence level.
- m is determined by coding information of the current block and/or sample position (i, j) .
- m is (1) fixed at a pre-defined value, (2) determined according to block width, block height, block area, coding information and/or sample information for the current block, (3) determined according to coding information and/or sample information for the adjacent/non-adjacent spatial neighboring reference region of the current block, and/or (4) determined according to coding information and/or sample information for the temporal reference region of the current block.
- the pattern of the m taps refers to a pattern defined as any subset of a window region M2 x N2 around (referring to excluding) or including the position (iC, jC) , as illustrated in FIG. 25. If the target sample is chroma (cb or cr) , (iC, jC) is (i, j) .
- pattern being 5x5 cross (including or not excluding (iC, jC) ) , as illustrated in FIG. 26.
- pattern being 5x5 diamond (including or not excluding (iC, jC) ) , as illustrated in FIG. 27.
- different taps refer to the source terms from different prediction modes or different mode types.
- one or more taps are from mode type intra, another one or more taps are from mode type inter, and/or another one or more taps are from mode type IBC.
- one or more taps are from MIP intra prediction modes, another one or more taps are from non-MIP intra prediction modes.
- the following embodiments are used to determine generation of source content.
- the source content is based on a predicted sample generated by a prediction mode and/or a reconstructed sample generated based on the predicted sample by a prediction mode and a reconstructed residual.
- the prediction mode belongs to mode type intra, mode type inter, or a third mode type (e.g. mode type IBC) .
- the prediction mode refers to planar, DC, horizontal, vertical, other angular (directional) prediction mode, any intra prediction modes specified in 67/131 intra prediction mode domain, wide-angle intra prediction (WAIP) modes, TIMD derived modes, DIMD derived modes, intraTMP, DBV, any one of cross-component modes (CCLM (including CCLM_LT, CCLM_L, and/or CCLM_T) , MMLM (including MMLM_LT, MMLM_L, and/or MMLM_T) , CCCM (including CCCM_LT, CCCM_L, and/or CCCM_T) , GLM, and/or any variation/extension of the above modes) , and/or any intra prediction modes specified in the standard.
- CCLM including CCLM_LT, CCLM_L, and/or CCLM_T
- the prediction mode refers to skip mode, regular merge modes, MMVD modes, affine modes, sbTMVP, AMVR, any merge mode specified in the standard, any AMVP mode specified in the standard, or any inter mode specified in the standard.
- the prediction mode belonging to mode type IBC refers to IBC merge, IBC AMVP, or any IBC mode specified in the standard. Note that any possible combination between the prediction mode and the mode type is supported in this invention. That is, any mentioned prediction mode can be under any mode type according to the standard definition. For example, following the standard definition, if IBC mode belongs to mode type inter, the prediction mode belonging to mode type inter in the embodiments can refer to an IBC mode.
- the source content is the filtered source or the source with any pre-processing.
- the source content is the predicted/reconstructed sample after filtering with a pre-defined model or filter.
- the source content is gradient information from the predicted samples and/or reconstructed samples.
- the predicted sample and/or the reconstructed sample is located within the current block.
- the values of the source terms are further adjusted (added or subtracted) by a pre-defined offset.
- the target sample refers to chroma
- several embodiments are used to generate the offset of the source term.
- the offset is determined as the averaging value of each (or any subset of) predicted or reconstructed samples in the current block or in the reference region of the current block.
- the offset is determined as a sample value of a pre-defined predicted or reconstructed samples in the current block or in the reference region of the current block. For example, the sample value is from the top-left position (just outside of the top-left corner of the current block) .
- the source term may further include location information. For example, if the target sample refers to chroma, the horizontal location (i) of (i, j) is used in a source term and the vertical location (j) of (i, j) is used in a source term.
- Some embodiments are for content of biasTerm
- Bias term is any pre-defined value.
- the bias term is a midValue according to bitDepth specified in the standard.
- the bias term is set as (1 << (bitDepth - 1) ) .
- the bias term is the same for each sample in the current block. That is, the bias term is independent of the position (i, j) .
- Some embodiments are for predictor derivation for sample (i, j)
- the proposed weighting setting is to estimate the relationship (minimize the distortion) between the combining results of those source terms and the reconstructed samples on the reference region of the current block (for example, a current chroma block) by a pre-defined regression method, to generate a weighting (referring to model parameters) according to the regression method, and then to apply the weighting on the source terms to get the target (predicted) samples in the current block.
- the pre-defined regression method can be linear minimum mean square error (LMMSE) method as CCLM or can be any unified method with the regression method used for CCLM.
- the pre-defined regression method can be the LDL decomposition method as CCCM or can be any unified method with the regression method used for CCCM.
- the pre-defined regression method can be Gaussian elimination.
- the reference region of the current block is the spatial neighboring region of the current block.
- the spatial neighboring region of the current block includes above reference region, left reference region, above-left reference region, and/or any subset of the above, as illustrated in FIG. 28.
- the size of the above reference region is AW x AH
- the size of the left reference region is LW x LH
- the size of the above-left reference is ALW x ALH, where
- AW = block width of the current block (W) , k*W, W + block height of the current block (H) , any pre-defined value, or any adaptive value depending on the block position, block width, block height, and/or block area of the current block.
- AH or ALH = H, any pre-defined value (1, 2, 4, ...) , or any adaptive value depending on the block position, block width, block height, and/or block area of the current block.
- ALW = W, any pre-defined value (1, 2, 4, ...) , or any adaptive value depending on the block position, block width, block height, and/or block area of the current block.
- LH = H, k*H, H + W, any pre-defined value, or any adaptive value depending on the block position, block width, block height, and/or block area of the current block.
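The regression step above (estimate the weighting that minimizes the distortion on the reference region, then apply it to the source terms inside the block) can be sketched with Gaussian elimination on the normal equations, one of the regression methods listed above. Floating point is used here for clarity; a real codec would use a fixed-point method such as the LDL decomposition used for CCCM:

```python
def derive_weights(source_rows, ref_samples):
    """Least-squares weighting over the reference region.

    source_rows[k] holds the source-term values (e.g. luma taps plus a bias
    tap) at reference position k; ref_samples[k] is the reconstructed chroma
    sample there. Solves (S^T S) w = S^T r by Gaussian elimination with
    partial pivoting.
    """
    n = len(source_rows[0])
    # Augmented normal equations: rows of [S^T S | S^T r].
    a = [[sum(row[i] * row[j] for row in source_rows) for j in range(n)]
         + [sum(row[i] * r for row, r in zip(source_rows, ref_samples))]
         for i in range(n)]
    for col in range(n):                       # forward elimination
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):             # back substitution
        w[i] = (a[i][n] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w

def predict_sample(weights, source_terms):
    """Apply the derived weighting to the source terms of one target sample."""
    return sum(w * s for w, s in zip(weights, source_terms))
```

With reference data that is exactly linear in the luma tap, the derived weights recover the underlying slope and offset.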
- sourceTermSet0 includes two taps as G (i, j) and rec’L (i, j) , sourceTermSet1 is not used, and biasTerm refers to another one tap as midValue.
- G (i, j) is the gradient information generated from a selected gradient filter and rec′ L (i, j) is down-sampled reconstructed luma sample.
- the model parameters (a0, a1, and a2) of the weighting are derived based on
- sourceTermSet0 includes six taps as C, Gy (i, j) , Gx (i, j) , Y, X, and P, sourceTermSet1 is not used, and biasTerm refers to another one tap as midValue.
- Gy (i, j) is the gradient information generated from a vertical gradient filter.
- Gx (i, j) is the gradient information generated from a horizontal gradient filter.
- Y and X are the vertical and horizontal locations of the collocated luma sample.
- sourceTermSet1 includes s taps as Pintra 0 to Pintra s-1 (or named as Pintra_0 to P_intra_s-1) and t taps as Pibc 0 to Pibc t-1 (or named as Pibc_0 to P_ibc_t-1) , sourceTermSet0 is not used, and biasTerm refers to another one tap as midValue.
- Pintra_s-1 (i, j) is the predicted sample from the s-1 mode which is selected from all or any subset of the candidate intra prediction modes for the coding mode.
- Pibc_t-1 (i, j) is the predicted sample from the t-1 mode which is selected from all or any subset of the candidate IBC prediction modes for the coding mode.
- s is set to 1 when there is only one candidate intra prediction mode for the coding mode.
- the coding mode is IBC (or Direct Block Vector, DBV)
- DBV Direct Block Vector
- the only candidate intra prediction mode is planar, the TIMD derived intra prediction mode, the chroma DM mode, or the DIMD derived intra prediction mode
- t is set to 1 when there is only one selected IBC prediction mode for the coding mode.
- the coding mode is IBC (or DBV)
- only one IBC prediction mode (indicated by a signaled/parsed IBC index) is used.
- the proposed mechanism is treated as an optional mode (of IBC (or DBV) coded block) . That is, a flag is signalled/parsed at encoder/decoder to indicate whether to use the proposed mechanism for the current block.
- the flag is at block-level, CTU-level, slice-level, SPS-level, tile-level, PPS-level, and/or picture-level.
- the flag is context-coded. For example, only one context is used for signalling the flag.
- the context selection of the flag depends on the coding information, block width, block height, and/or block area of the current block and/or the coding information, block width, block height, and/or block area of the neighboring block.
- the proposed mechanism is a replacement method.
- the generation of the predictor is inferred to follow the proposed mechanisms.
- s is fixed at the pre-defined value in the standard.
- s is adaptive according to the coding information, block width, block height, and/or block area of the current block and/or the coding information, block width, block height, and/or block area of the neighboring block. For example, if the block width, height, or area is larger than a pre-defined threshold, s/t is a larger number; otherwise, s/t is a smaller number.
- the to-be-minimized distortion is between the combined result, including (1) predictors generated from the modes 0 to s-1 on the reference region of the current block, (2) predictors generated from the modes 0 to t-1 on the reference region of the current block, (3) the bias, and (4) the weighting setting, and the reconstructed samples on the reference region of the current block.
- sourceTermSet0 can be used in the expression. That is, the corresponding luma information can be used to generate the target chroma samples. For example, rec′ L (i, j) , G (i, j) , and/or Gy (i, j) , Gx (i, j) are added as the source terms in sourceTermSet0.
- sourceTermSet0 includes s taps, denoted Rluma_0 to Rluma_(s-1).
- sourceTermSet1 includes t taps, denoted Pibc_0 to Pibc_(t-1).
- biasTerm refers to one additional tap, midValue.
- Rluma_(s-1)(i, j) is the reconstructed luma sample from the (s-1)-th collocated luma block of the current (chroma) block.
- Rluma_(s-1)(i, j) is the (s-1)-th reconstructed luma sample from a pre-defined set (including s samples) in one pre-defined collocated luma block of the current (chroma) block.
- the s reconstructed luma samples correspond to a pre-defined pattern, and the center of the pre-defined pattern is collocated with (i, j) of the current (chroma) block.
- Pibc_(t-1)(i, j) is the predicted sample from the (t-1)-th mode, which is selected from all or any subset of the candidate IBC prediction modes for the coding mode.
- s is set as 1 when only one luma tap is used for the coding mode.
- t is set to 1 when only one selected IBC prediction mode is used for the coding mode.
- the coding mode is IBC (or DBV)
- only one IBC prediction mode (indicated by a signaled/parsed IBC index) is used.
- the proposed mechanism is treated as an optional mode (of IBC (or DBV) coded block) . That is, a flag is signalled/parsed at encoder/decoder to indicate whether to use the proposed mechanism for the current block.
- the flag is at block-level, CTU-level, slice-level, SPS-level, tile-level, PPS-level, and/or picture-level.
- the flag is context-coded. For example, only one context is used for signalling the flag.
- the context selection of the flag depends on the coding information, block width, block height, and/or block area of the current block and/or the coding information, block width, block height, and/or block area of the neighboring block.
- the proposed mechanism is a replacement method.
- the generation of the predictor is inferred to follow the proposed mechanisms.
- s is fixed at the pre-defined value in the standard.
- s is adaptive according to the coding information, block width, block height, and/or block area of the current block and/or the coding information, block width, block height, and/or block area of the neighboring block. For example, if the block width, height, or area is larger than a pre-defined threshold, s/t is a larger number; otherwise, s/t is a smaller number.
- the to-be-minimized distortion is between (a) the combined result, which includes (1) luma reconstructed samples on the reference region of the collocated luma block of the current (chroma) block, (2) predictors generated from modes 0 to t-1 on the reference region of the current block, (3) a bias, and (4) a weighting setting, and (b) the reconstructed samples on the reference region of the current block.
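The weighting derivation above can be sketched, purely as a non-normative illustration, as an ordinary least-squares problem solved over the reference region; the tap layout, floating-point solver, and matrix shapes below are assumptions and not the normative derivation.

```python
import numpy as np

def derive_weights(template_taps, template_recon):
    """Least-squares sketch of the distortion minimization described above.

    template_taps  : (N, s + t + 1) matrix; each row holds, for one
                     reference-region position, the s luma source terms,
                     the t IBC predictor terms, and a final bias term
                     (e.g. midValue).
    template_recon : (N,) reconstructed chroma samples on the reference
                     region of the current block.
    Returns the weighting setting that minimizes the squared distortion.
    """
    w, *_ = np.linalg.lstsq(template_taps, template_recon, rcond=None)
    return w

def predict(taps, w):
    # Combine the source terms of each target sample with the derived weights.
    return taps @ w
```

The same weights derived on the reference region are then applied to the source terms of the target samples inside the current block.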
- sourceTermSet0 may use gradient information instead of rec′_L(i, j) or may further use gradient information in addition to rec′_L(i, j).
- G(i, j), Gy(i, j), and/or Gx(i, j) are added as the source terms in sourceTermSet0.
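As a non-normative illustration of the gradient source terms, a simple central-difference filter can produce Gx, Gy, and a combined magnitude G for each interior luma sample; the actual gradient filter taps used by the codec are not specified here, so the 3-tap difference below is assumed purely for illustration.

```python
import numpy as np

def luma_gradients(rec_luma):
    """Illustrative gradient source terms for each interior luma sample.

    rec_luma : 2-D array of reconstructed luma samples.
    Returns (Gx, Gy, G) for the interior region, where Gx/Gy are assumed
    central differences and G is an assumed |Gx| + |Gy| magnitude.
    """
    gx = rec_luma[1:-1, 2:] - rec_luma[1:-1, :-2]   # horizontal gradient
    gy = rec_luma[2:, 1:-1] - rec_luma[:-2, 1:-1]   # vertical gradient
    g = np.abs(gx) + np.abs(gy)                     # combined magnitude
    return gx, gy, g
```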
- a long-tap post-filter is applied when generating the target predictors of the current block and/or when generating the template predictors on the reference region of the current block.
- the filtering shape can be any pattern proposed in the above invention.
- the proposed methods can be enabled and/or disabled according to implicit rules (for example, block width, height, or area) or according to explicit rules (for example, syntax on block, tile, slice, picture, sps, or pps level) .
- the proposed method is applied when the block area is smaller/larger than a threshold.
- the one or more LM mode (s) (or cross-component mode (s) ) which will be used to generate the one or more hypotheses of predictions for LM assisted Angular/Planar Mode and/or inter CCLM and/or MH CCLM are selected from a pre-defined merging candidate list (called modelList) .
- One modelIdx is signalled to select a candidate from the candidate list (modelList) and the selected candidate is used for the current block.
- the modelList contains one or more candidates where each candidate refers to model (or cross-component mode) information. If only one candidate is in the list (the size of the list is only 1) , the modelIdx is not signalled and/or can be inferred as 0 or a default value.
- when building modelList, one or more pre-defined candidates are added.
- the pre-defined candidates can include any subset/extension of the following candidates.
- CCLM_LT, CCLM_L, CCLM_T
- MMLM_LT, MMLM_L, MMLM_T
- CCCM_LT, CCCM_L, CCCM_T
- Spatial model information from spatial neighbour blocks (corresponding to “Spatial MVP from spatial neighbour CUs” for inter)
- Temporal model information from collocated blocks (corresponding to “Temporal MVP from collocated CUs” for inter)
- History-based model information from a FIFO table (corresponding to “History-based MVP from a FIFO table” for inter)
- Pairwise average model information (corresponding to “Pairwise average MVP” for inter)
- a valid spatial neighboring block can be from one of spatial adjacent and/or non-adjacent neighbors (or any subset of the blocks in a neighboring search region for the current block) which satisfies a pre-defined condition.
- the pre-defined condition is that the neighbor is encoded and/or decoded by or using a cross-component mode (such as CCLM, MMLM, CCCM, GLM, the mode with mode information inherited from a merge-like candidate list, MH CCLM, and/or any cross-component mode with syntax not belonging to traditional intra prediction modes) or a cross-component-related mode (such as chroma fusion (or named LM assisted Angular/Planar Mode) , inter CCLM, and/or any traditional mode with syntax not belonging to cross-component modes but using the cross-component information to generate the prediction) .
- the pre-defined condition is that the cross-component mode of the neighbor belongs to a subset of above-mentioned cross-component modes which can be multi-model cross-component modes such as MMLM and/or multi-model CCCM.
- a candidate is added into the list if the candidate is valid.
- the following show some scanning orders when adding the Spatial model information from spatial neighbour blocks into the list.
- the scanning order follows B1 (above) → A1 (left) → B0 (right-above) → A0 (left-bottom) → B2 (left-above) or any pre-defined order.
- in the scanning order, adjacent candidates are placed before (or after) the non-adjacent candidates.
- the collocated block is from the block in the reference picture or collocated picture, as in inter mode.
- valid temporal model information is added into the list as a candidate.
- the pre-defined condition in some embodiments can reference the examples in “Spatial model information from spatial neighbour blocks” with the neighbor replaced with the reference position of temporal model information.
- a history-based table (the FIFO table) is built and stores the model information from the previous coded blocks.
- valid history-based model information is added into the table as a candidate.
- the pre-defined condition in some embodiments can reference the examples in “Spatial model information from spatial neighbour blocks” with the neighbor replaced with the previous coded block.
- the table can be reset at the beginning and/or end of a CTU, slice, picture, tile, and/or sequence. One or more candidates can be added into the list according to the table.
- the model information of this candidate is derived based on the model information from more than one of the previous candidates in the list.
- the default model information is added if the list is not full after inserting all pre-defined candidates.
- the default alpha values (also named α, a, or scaling parameters) are {0, 1/8, -1/8, 2/8, -2/8, 3/8}
- the beta (also named β, b, or offset parameter) is based on the selected default alpha, average neighboring reconstructed luma sample value, and average neighboring reconstructed chroma (Cb/Cr) sample value.
- only a subset of the model information is inherited. For example, only the alpha is inherited.
- the beta is obtained for the current block through the inherited alpha, average neighboring reconstructed luma sample value, and/or average neighboring reconstructed chroma (Cb/Cr) sample value.
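The offset derivation described above can be sketched as follows; the floating-point arithmetic is an illustrative simplification of the fixed-point (shift-based) arithmetic a real codec would use. The idea is that beta is chosen so the inherited linear model maps the average neighboring reconstructed luma value onto the average neighboring reconstructed chroma value.

```python
def derive_beta(alpha, neighbor_luma, neighbor_chroma):
    """Offset derivation sketch for an inherited scaling parameter.

    alpha           : inherited scaling parameter of the linear model.
    neighbor_luma   : neighboring reconstructed luma sample values.
    neighbor_chroma : neighboring reconstructed chroma (Cb/Cr) sample values.
    Returns beta = avgChroma - alpha * avgLuma, so that
    pred = alpha * luma + beta passes through the template averages.
    """
    avg_l = sum(neighbor_luma) / len(neighbor_luma)
    avg_c = sum(neighbor_chroma) / len(neighbor_chroma)
    return avg_c - alpha * avg_l
```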
- the scaling parameters and/or the classification threshold are inherited.
- the offset parameter in each class is derived according to the inherited classification threshold and the average neighboring reconstructed luma sample value, and/or average neighboring reconstructed chroma (Cb/Cr) sample value in each class. If no neighboring reconstructed samples are available in a class, the offset parameter is directly inherited from the candidate. For example, when inheriting CCCM model information, all convolution parameters, offsets, and/or the classification threshold are inherited.
- if the GLM candidate is the 3-parameter GLM mode, all the gradient pattern indices and model parameters are inherited; otherwise, if the GLM candidate is the 2-parameter GLM mode, the offset parameter is derived by using the inherited scaling parameter, average neighboring reconstructed luma sample value, and/or average neighboring reconstructed chroma (Cb/Cr) sample value.
- the derived MMLM parameters are inherited and used when inheriting an MMLM candidate for the current block.
- all model information is inherited. For example, both the alpha and beta are inherited.
- the to-be-propagated mode for the current block (using chroma fusion or named as LM assisted Angular/Planar Mode, inter CCLM, or MH CCLM) is set/stored as the inherited mode, referring to any cross-component mode and/or any cross-component model and/or any subset of cross-component model information used for the current block, such as CCLM, MMLM, CCCM, and/or GLM.
- the candidate types are aligned with the candidate types for the merge mode.
- pruning operations are applied to avoid duplicated candidates in the list.
- the signalling of the modelIdx for the current block depends on context coding, block width, block height, block area, and/or explicit syntax such as SPS, PPS, slice, CTU, picture, sequence, and/or tile level signalling.
- the candidate selection (from the list) for the current block depends on a pre-defined process.
- the pre-defined process is a TIMD or DIMD-like method.
- (1) a cost for each candidate in the list is calculated based on the distortion between the reconstructed samples on the neighboring template and the predicted samples from this candidate on the neighboring template; (2) one or more candidates with smaller costs (or smaller distortions) are selected; and/or (3) the one or more selected candidates are used to generate the cross-component prediction of the current block.
- the pre-defined process depends on the neighboring template of the current block.
- the pre-defined process depends on the mode information on the neighboring blocks. If most of the pre-defined neighboring blocks use a specific mode, the first candidate (in the list) referring to the specific mode is selected.
- the signalling of the modelIdx is bypassed (not required for signalling) .
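The TIMD/DIMD-like selection described above can be sketched as follows, with each candidate simplified to an (alpha, beta) linear model and the cost to a SAD on the neighboring template; both simplifications are assumptions for illustration only.

```python
def select_candidate(model_list, template_luma, template_chroma_recon):
    """Template-based selection sketch: each candidate model predicts the
    template chroma from the template luma, and the candidate with the
    smallest SAD against the reconstructed template chroma is chosen, so
    no modelIdx needs to be signalled.

    model_list            : list of (alpha, beta) linear-model candidates.
    template_luma         : luma samples on the neighboring template.
    template_chroma_recon : reconstructed chroma samples on the template.
    Returns the index of the minimum-cost candidate.
    """
    best_idx, best_cost = 0, float("inf")
    for idx, (alpha, beta) in enumerate(model_list):
        pred = [alpha * l + beta for l in template_luma]
        cost = sum(abs(p - r) for p, r in zip(pred, template_chroma_recon))
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx
```

Because encoder and decoder run the same selection on the same template, both sides arrive at the same candidate without any index in the bitstream.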
- the reference samples can be based on not only original left and top neighboring reconstructed samples but also proposed right and bottom LM-predicted samples. The following shows an example.
- the collocated luma block is reconstructed.
- “the neighboring luma reconstructed samples of the collocated luma block” and “the neighboring chroma reconstructed samples of the current chroma block” are used for deriving LM parameters.
- the reconstructed samples of the collocated luma block with the derived parameters are used for obtaining the right-bottom LM-predicted samples of the current chroma block.
- the right-bottom region of the current chroma block can be any subset of the region in FIG. 30.
- the prediction of the current block is generated bi-directionally by referencing the original L-shaped neighboring region (original top and left region, obtained using a traditional intra prediction mode) and the proposed inverse-L region (obtained using LM) .
- the predictors from the original top and left region and the predictors from the bottom and right region are combined with weighting.
- equal weights are applied to both.
- weights vary with neighboring coding information, sample position, block width, height, mode or area.
- the weight for the prediction from the traditional intra prediction mode decays as the sample position moves away from the original top and left region.
- this proposed method can be applied to inverse LM. Then, when doing luma intra prediction, the final prediction is bi-directional (similar to the above example for a chroma block) .
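The bi-directional weighted combination described above can be sketched for a single chroma column; the linear weight decay below is an assumed example, since the exact weighting function is a design choice not fixed by the description.

```python
def blend_bidirectional(intra_pred, lm_pred, height):
    """Position-dependent blending sketch for one chroma column.

    intra_pred : predictor column from the traditional intra mode
                 (referencing the original top and left region).
    lm_pred    : predictor column from the LM-predicted inverse-L
                 (right and bottom) region.
    The weight of the traditional intra predictor decays toward the
    bottom rows, where the inverse-L region is closer; the linear decay
    is an illustrative assumption.
    """
    out = []
    for y in range(height):
        w_intra = (height - y) / (height + 1)   # decays with row index
        w_lm = 1.0 - w_intra
        out.append(w_intra * intra_pred[y] + w_lm * lm_pred[y])
    return out
```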
- the proposed LM assisted Angular/Planar Mode helps the chroma prediction capture the correct curved angle.
- the proposed methods in this invention can be enabled and/or disabled according to implicit rules (e.g. block width, height, or area) or according to explicit rules (e.g. syntax on block, slice, picture, sps, or pps level) .
- block in this invention can refer to TU/TB, CU/CB, PU/PB, or CTU/CTB.
- LM in this invention can be viewed as one kind of CCLM/MMLM modes or any other extension/variation of CCLM (e.g. the proposed CCLM extension/variation in this invention) .
- the variations of CCLM here mean that some optional modes can be selected when the block indication refers to using one of cross-component modes (e.g. CCLM_LT, MMLM_LT, CCLM_L, CCLM_T, MMLM_L, MMLM_T, and/or an intra prediction mode, which is not one of traditional DC, planar, and angular modes) for the current block.
- CCCM refers to the convolutional cross-component mode.
- the CCCM family includes CCCM_LT, CCCM_L, and/or CCCM_T.
- any of the foregoing proposed methods can be implemented in encoders and/or decoders.
- any of the proposed methods can be implemented in an intra/inter/IBC coding module of an encoder and/or decoder, and/or a motion compensation module and/or a merge candidate derivation module of an encoder and/or decoder.
- any of the proposed methods can be implemented as a circuit coupled to the intra/inter/IBC coding module of an encoder and/or decoder and/or motion compensation module and/or a merge candidate derivation module of the encoder and/or decoder.
- FIG. 31 is a block diagram illustrating a video encoder that supports the proposed intra chroma prediction mode according to an embodiment of the present invention.
- the video encoder 100 may be a VVC encoder.
- the video encoder 100 may perform intra and inter predictive coding of video blocks within video frames.
- Intra predictive coding relies on spatial prediction to reduce or remove spatial redundancy in video data within a given video frame or picture.
- Inter predictive coding relies on temporal prediction to reduce or remove temporal redundancy in video data within adjacent video frames or pictures of a video sequence.
- the proposed intra chroma prediction mode is a coding tool for chroma samples in a chroma block, and uses at least one cross-component predictor (which is an intra predictor) to improve accuracy of a non-cross-component predictor (which is also an intra predictor) .
- the video encoder 100 includes an encoding circuit 101 and a video data memory 102.
- the video data memory 102 is arranged to receive data to be encoded as a current block of pixels of a current picture of a video, wherein the current block includes at least one chroma block.
- the encoding circuit 101 is arranged to perform encoding of the current block by a target intra prediction mode.
- the encoding circuit 101 may include a prediction processing circuit 104, a residual generation circuit 106, a transform circuit (labeled by “T” ) 108, a quantization circuit (labeled by “Q” ) 110, an entropy encoding circuit (e.g., a variable-length code (VLC) encoder) 112, an inverse quantization circuit (labeled by “IQ” ) 114, an inverse transform circuit (labeled by “IT” ) 116, a reconstruction circuit 118, one or more in-loop filters 120, and a decoded picture buffer (DPB) 122.
- the prediction processing circuit 104 may include a partition circuit 124, a motion estimation circuit (labeled by “ME” ) 126, a motion compensation circuit (labeled by “MC” ) 128, an intra prediction circuit (labeled by “IP” ) 130, and an intra chroma prediction circuit (labeled by “ICP” ) 132.
- the proposed intra chroma prediction mode is supported by the prediction processing circuit 104 (particularly, intra chroma prediction circuit 132 of prediction processing circuit 104) .
- the intra prediction circuit 130 of the prediction processing circuit 104 is arranged to obtain an intra predictor of a chroma sample included in a chroma block (e.g., a Cr block or a Cb block) .
- the intra predictor is a non-cross-component predictor Non-CCP determined under a non-cross-component intra mode such as a DC mode, a planar mode, any angular mode, a DM mode, or a DIMD mode.
- any intra prediction means capable of obtaining the non-cross-component predictor Non-CCP of a chroma sample included in a chroma block can be employed by the prediction processing circuit 104.
- the intra chroma prediction circuit 132 is arranged to obtain at least one (i.e., one or more) cross-component predictor CCP of the chroma sample, and determine intra prediction of the chroma sample (e.g., a final predictor P_CB/CR) by jointly considering the non-cross-component predictor Non-CCP and the cross-component predictor (s) CCP. For example, the intra chroma prediction circuit 132 blends the non-cross-component predictor Non-CCP and the cross-component predictor (s) CCP to generate a weighted predictor as the intra prediction of the chroma sample.
- the cross-component predictor (s) CCP to be blended with the non-cross-component predictor Non-CCP may include a CCLM-based predictor and/or a predictor generated using any cross-component mode.
- the non-cross-component predictor Non-CCP and the cross-component predictor CCP may be blended using pre-defined (fixed) weighting such as {3:1} or {1:3} .
- the non-cross-component predictor Non-CCP and the cross-component predictor CCP may be blended using template-derivation regression-based weighting.
- the manner of determining the template-derivation regression-based weighting is similar to that employed by CCCM for determining filter coefficients.
- weighting of the non-cross-component predictor Non-CCP and the cross-component predictor CCP is calculated by minimizing an error between predicted and reconstructed chroma samples in a neighboring template.
- different weighting settings of the non-cross-component predictor Non-CCP and the cross-component predictor CCP are applied to reference samples in the neighboring template to generate a plurality of sets of predicted chroma samples of the neighboring template.
- a regression-based equation may be used to evaluate similarity between each set of predicted chroma samples of the neighboring template and the reconstructed chroma samples of the neighboring template.
- One of the weighting settings that leads to a minimum error between predicted and reconstructed chroma samples in the neighboring template is selected and used for blending the non-cross-component predictor Non-CCP and the cross-component predictor CCP to generate the final predictor P_CB/CR.
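The template-derivation regression-based weighting described above can be sketched as follows; the candidate weight set and the normalization by the weight sum are assumptions for illustration, not the normative weighting design.

```python
def select_weighting(noncc_tpl, cc_tpl, recon_tpl,
                     settings=((3, 1), (2, 2), (1, 3))):
    """Weighting-selection sketch: each candidate weight pair is applied
    to the non-cross-component and cross-component template predictors,
    and the pair with the smallest squared error against the
    reconstructed template chroma is kept for blending the final
    predictor P_CB/CR.

    noncc_tpl : Non-CCP predicted samples on the neighboring template.
    cc_tpl    : CCP predicted samples on the neighboring template.
    recon_tpl : reconstructed chroma samples on the neighboring template.
    """
    best, best_err = settings[0], float("inf")
    for w0, w1 in settings:
        err = sum(((w0 * p + w1 * q) / (w0 + w1) - r) ** 2
                  for p, q, r in zip(noncc_tpl, cc_tpl, recon_tpl))
        if err < best_err:
            best, best_err = (w0, w1), err
    return best
```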
- the intra chroma prediction circuit 132 may determine the cross-component predictor CCP according to a cross-component model candidate from a previous-coded block. For example, the intra chroma prediction circuit 132 may construct a merge candidate list MCL for cross-component prediction, and determine the cross-component predictor CCP according to a cross-component model candidate selected from the merge candidate list MCL.
- the candidate types may be aligned with those used by the inter motion merge candidate list, or may be a subset of them. The difference between the merge candidate list MCL and the inter motion merge candidate list is that the merge candidate list MCL includes inherited cross-component models, while the inter motion merge candidate list includes inherited motion information.
- the merge candidate list MCL may include spatial candidates.
- the intra chroma prediction circuit 132 may add a cross-component model located at a spatial neighboring position to the merge candidate list MCL.
- the spatial neighboring position and a sample position of the chroma sample are located in the same frame.
- the spatial neighboring position may be an adjacent position or a non-adjacent position with respect to the boundary of the current chroma block.
- the merge candidate list MCL may include temporal candidates.
- the intra chroma prediction circuit 132 may add a cross-component model located at a temporal collocated position to the merge candidate list MCL.
- the temporal collocated position and a sample position of the chroma sample are located at the same position in different frames.
- the merge candidate list MCL may include history-based candidates.
- the intra chroma prediction circuit 132 may add a cross-component model from a history table to the merge candidate list MCL, wherein the history table may be implemented by a first-in first-out (FIFO) buffer that contains cross-component models of previously coded chroma blocks (i.e., previous chroma blocks that are encoded before the current chroma block) .
- the merge candidate list MCL may include default candidates.
- the intra chroma prediction circuit 132 may add a default cross-component model (i.e., pre-defined cross-component model) to the merge candidate list MCL.
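The candidate list construction described above (spatial, temporal, history-based, and default candidates, with pruning of duplicates) can be sketched as follows; the insertion order, list size, and tuple representation of a cross-component model are illustrative assumptions.

```python
def build_model_list(spatial, temporal, history, defaults, max_size=5):
    """Merge candidate list (MCL) construction sketch.

    Each argument is a sequence of cross-component model candidates from
    one source; a candidate is None when the corresponding position has
    no valid cross-component model. Models are represented as hashable
    tuples of parameters so duplicates can be pruned. Candidates are
    appended in the assumed order spatial -> temporal -> history-based ->
    default, until the list reaches max_size.
    """
    mcl = []
    for group in (spatial, temporal, history, defaults):
        for model in group:
            if model is not None and model not in mcl:  # validity + pruning
                mcl.append(model)
            if len(mcl) == max_size:
                return mcl
    return mcl
```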
- the intra chroma prediction circuit 132 may output a mode index IDX to the entropy encoding circuit 112, such that the mode index IDX is encoded into the encoded video bitstream output from the video encoder 100.
- the data to be encoded as the current block of pixels of the current picture of the video includes a syntax for a mode index to indicate selection of the cross-component model candidate.
- the mode index IDX is signalled to indicate selection of the cross-component model candidate at the video encoder 100.
- selection of the cross-component model candidate may be implicitly derived without any mode index signalling.
- the intra chroma prediction circuit 132 may employ an implicit derivation manner similar to that employed by TIMD. Specifically, the intra chroma prediction circuit 132 uses reference samples of a neighboring template to calculate a cost for each cross-component model candidate included in the merge candidate list MCL, and selects the cross-component model candidate in the merge candidate list MCL that has a minimum cost as the one used for obtaining the cross-component predictor CCP.
- the intra chroma prediction circuit 132 uses the cross-component predictor CCP to improve accuracy of the non-cross-component predictor Non-CCP.
- the proposed intra chroma prediction mode is not enabled by the prediction processing circuit 104 unless all enabling conditions are satisfied.
- the enabling conditions may include a size condition of the chroma block, where the size condition may be defined by a block height, a block width, and/or a block area. Hence, when the block height and/or width and/or area is smaller than (or larger than) a pre-defined threshold, the size condition is satisfied.
- the enabling conditions may include supported prediction modes of the chroma block, where the supported prediction modes may include all non-cross-component modes or any sub-mode of non-cross-component modes. Hence, when the prediction mode of the chroma block is one of the supported prediction modes, the prediction mode condition is satisfied.
- the data to be encoded as the current block of pixels of the current picture of the video includes a flag to indicate whether to apply the target intra prediction mode.
- the flag EN is not signalled by the encoder and/or is inferred at the encoder.
- FIG. 32 is a block diagram illustrating a video decoder that supports the proposed intra chroma prediction mode according to an embodiment of the present invention.
- the video decoder 200 may be a VVC decoder.
- the video decoder 200 includes a decoding circuit 201 and a video data memory 202.
- the video data memory 202 is arranged to receive data to be decoded as a current block of pixels of a current picture of a video, wherein the current block includes at least one chroma block.
- the decoding circuit 201 is arranged to perform decoding of the current block by a target intra prediction mode.
- the decoding circuit 201 may include an entropy decoding circuit (e.g., a VLC decoder) 204, an inverse quantization circuit (labeled by “IQ” ) 206, an inverse transform circuit (labeled by “IT” ) 208, a reconstruction circuit 210, a prediction processing circuit 212, one or more in-loop filters 214, and a decoded picture buffer (DPB) 216.
- the prediction processing circuit 212 may include a motion compensation circuit (labeled by “MC” ) 218, an intra prediction circuit (labeled by “IP” ) 220, and an intra chroma prediction circuit (labeled by “ICP” ) 222.
- the proposed intra chroma prediction mode is supported by the prediction processing circuit 212 (particularly, intra chroma prediction circuit 222 of prediction processing circuit 212) .
- since the present invention is focused on the proposed intra chroma prediction mode and a person skilled in the art should readily understand details of the other circuit components included in the video decoder 200, further description of the principles of those circuit components is omitted here for brevity.
- the intra prediction circuit 220 of the prediction processing circuit 212 is arranged to obtain a non-cross-component predictor (which is an intra predictor) Non-CCP of a chroma sample included in a chroma block (e.g., a Cr block or a Cb block) .
- the non-cross-component predictor Non-CCP used by the encoder-side intra chroma prediction circuit 132 for determining the final predictor P_CB/CR is the same as that used by the decoder-side intra chroma prediction circuit 222 for determining the final predictor P_CB/CR.
- the intra chroma prediction circuit 222 is arranged to obtain at least one (i.e., one or more) cross-component predictor (which is an intra predictor) CCP of the chroma sample, and determine intra prediction of the chroma sample (e.g., a final predictor P_CB/CR) by jointly considering the non-cross-component predictor Non-CCP and the cross-component predictor (s) CCP.
- the intra chroma prediction circuit 222 blends the non-cross-component predictor Non-CCP and the cross-component predictor (s) CCP to generate a weighted predictor as the intra prediction of the chroma sample.
- the cross-component predictor (s) CCP may include a CCLM-based predictor and/or a predictor generated using any cross-component mode.
- the non-cross-component predictor Non-CCP and the cross-component predictor CCP may be blended using pre-defined (fixed) weighting such as {3:1} or {1:3} .
- the non-cross-component predictor Non-CCP and the cross-component predictor CCP may be blended using template-derivation regression-based weighting.
- the intra chroma prediction circuit 222 may construct a merge candidate list MCL for cross-component prediction, and determine the cross-component predictor CCP according to a cross-component model candidate selected from the merge candidate list MCL.
- the candidate types may be aligned with those used by the inter motion merge candidate list, or may be a subset of them.
- the merge candidate list MCL constructed by the decoder-side intra chroma prediction circuit 222 is the same as that constructed by the encoder-side intra chroma prediction circuit 132.
- the merge candidate list MCL may include spatial candidates.
- the intra chroma prediction circuit 222 may add a cross-component model located at a spatial neighboring position to the merge candidate list MCL.
- the spatial neighboring position and a sample position of the chroma sample are located in the same frame.
- the spatial neighboring position may be an adjacent position or a non-adjacent position with respect to the boundary of the current chroma block.
- the merge candidate list MCL may include temporal candidates.
- the intra chroma prediction circuit 222 may add a cross-component model located at a temporal collocated position to the merge candidate list MCL.
- the temporal collocated position and a sample position of the chroma sample are located at the same position in different frames.
- the merge candidate list MCL may include history-based candidates.
- the intra chroma prediction circuit 222 may add a cross-component model from a history table to the merge candidate list MCL, wherein the history table may be implemented by a first-in first-out (FIFO) buffer that contains cross-component models of previously coded chroma blocks (i.e., previous chroma blocks that are decoded before the current chroma block) .
- the merge candidate list MCL may include default candidates.
- the intra chroma prediction circuit 222 may add a default cross-component model (i.e., pre-defined cross-component model) to the merge candidate list MCL.
- the entropy decoding circuit 204 may parse the mode index IDX from the encoded video bitstream, and inform the prediction processing circuit 212 (particularly, intra chroma prediction circuit 222 of prediction processing circuit 212) of the mode index IDX.
- the data to be decoded as the current block of pixels of the current picture of the video includes a syntax for a mode index to indicate selection of the cross-component model candidate.
- the intra chroma prediction circuit 222 refers to the signalled mode index IDX to select the same cross-component model candidate used by the video encoder (e.g., video encoder 100) from the merge candidate list MCL.
- selection of the cross-component model candidate may be implicitly derived without parsing any signalled mode index from the encoded video bitstream.
- the intra chroma prediction circuit 222 may employ an implicit derivation manner similar to that employed by TIMD. Specifically, the intra chroma prediction circuit 222 uses reference samples of a neighboring template to calculate a template matching (TM) cost for each cross-component model candidate included in the merge candidate list MCL, and selects the cross-component model candidate in the merge candidate list MCL that has a minimum TM cost as the one used for obtaining the cross-component predictor CCP.
- since both the video encoder and the video decoder follow the same template-based manner to select a cross-component model candidate from the same merge candidate list MCL constructed at both sides, no signalling of the mode index IDX from the video encoder to the video decoder is needed, and no parsing of the signalled mode index IDX at the video decoder is needed. In this way, signalling overhead can be reduced.
- the intra chroma prediction circuit 222 uses the cross-component predictor CCP to improve accuracy of the non-cross-component predictor Non-CCP.
- the proposed intra chroma prediction mode is not enabled by the prediction processing circuit 212 unless all enabling conditions are satisfied.
- the enabling conditions may include a size condition of the chroma block, where the size condition may be defined by a block height, a block width, and/or a block area. When the block height, block width, and/or block area is smaller than (or larger than) a pre-defined threshold, the size condition is satisfied.
- the enabling conditions may include a prediction mode condition defined by supported prediction modes of the chroma block, where the supported prediction modes may include all non-cross-component modes or any sub-mode of the non-cross-component modes. When the prediction mode of the chroma block is one of the supported prediction modes, the prediction mode condition is satisfied.
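A toy illustration of the enabling-condition check follows. The area threshold and the set of supported modes here are made-up placeholders; the description above only states that such size and prediction-mode conditions exist, not their values.

```python
MIN_AREA = 16  # hypothetical pre-defined threshold on block area
SUPPORTED_MODES = {"DC", "PLANAR", "ANGULAR"}  # hypothetical non-cross-component sub-modes

def mode_is_enabled(width, height, chroma_mode):
    """Enable the proposed intra chroma prediction mode only when ALL
    enabling conditions are satisfied."""
    size_ok = (width * height) >= MIN_AREA      # size condition
    mode_ok = chroma_mode in SUPPORTED_MODES    # prediction mode condition
    return size_ok and mode_ok
```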
- the data to be decoded as the current block of pixels of the current picture of the video includes a flag to indicate whether to apply the target intra prediction mode.
- the flag EN is not signalled by the video encoder, and/or is not parsed from the encoded video bitstream, and/or is inferred at the video decoder.
- FIG. 33 is a flowchart illustrating a video coding method according to an embodiment of the present invention.
- the video coding method may be employed by the video encoder 100 shown in FIG. 31 for encoding of video data or the video decoder 200 shown in FIG. 32 for decoding of encoded video bitstream. Provided that the result is substantially the same, the steps are not required to be executed in the exact order shown in FIG. 33.
- data to be encoded or decoded is received as a current block of pixels of a current picture of a video, wherein the current block includes at least one chroma block.
- encoding or decoding of the current block is performed by a target intra prediction mode.
- the step 2504 includes sub-steps 2506, 2508, and 2510.
- a non-cross-component predictor of a chroma sample included in the at least one chroma block is obtained, wherein the non-cross-component predictor is an intra predictor.
- at least one cross-component predictor of the chroma sample is obtained according to a cross-component model candidate from a previous-coded block.
- intra prediction of the chroma sample is determined by jointly considering (e.g., blending) the non-cross-component predictor and the at least one cross-component predictor.
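Steps 2506 to 2510 can be sketched as a per-sample blend. The equal weights and right-shift rounding below are assumptions chosen for illustration; the blending example named in step 2510 is only one way of jointly considering the two predictors.

```python
def blend_predictors(non_ccp, ccp, w_non=1, w_ccp=1, shift=1):
    """Step 2510 (sketch): per-sample weighted blend of the non-cross-component
    intra predictor and the cross-component predictor, with integer rounding:
    out = (w_non * p + w_ccp * q + round) >> shift."""
    rounding = 1 << (shift - 1)
    return [(w_non * p + w_ccp * q + rounding) >> shift
            for p, q in zip(non_ccp, ccp)]
```

With the default weights this reduces to a rounded average of the two predictors, so the cross-component predictor CCP refines the non-cross-component predictor Non-CCP rather than replacing it.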
Abstract
A video coding method includes: receiving data to be encoded or decoded as a current block of pixels of a current picture of a video, the current block including at least one chroma block; and encoding or decoding the current block by a target intra prediction mode, which includes: obtaining a non-cross-component predictor of a chroma sample included in the at least one chroma block, the non-cross-component predictor being an intra predictor; obtaining at least one cross-component predictor of the chroma sample; and determining an intra prediction of the chroma sample by jointly considering the non-cross-component predictor and the at least one cross-component predictor. The at least one cross-component predictor is determined according to a cross-component model candidate from a previously coded block.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363439259P | 2023-01-16 | 2023-01-16 | |
US63/439,259 | 2023-01-16 | ||
US202363490804P | 2023-03-17 | 2023-03-17 | |
US63/490,804 | 2023-03-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2024153085A1 true WO2024153085A1 (fr) | 2024-07-25 |
WO2024153085A8 WO2024153085A8 (fr) | 2024-08-22 |
Family
ID=91955319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/072601 WO2024153085A1 (fr) | Video coding method and chroma prediction apparatus |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024153085A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4485922A1 (fr) * | 2023-06-30 | 2025-01-01 | Sharp Kabushiki Kaisha | Device and method for decoding video data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201509938D0 (en) * | 2015-06-08 | 2015-07-22 | Canon Kk | Schemes for handling an AMVP flag when implementing intra block copy coding mode |
US20200404312A1 (en) * | 2019-06-21 | 2020-12-24 | Panasonic Intellectual Property Corporation of America | System and method for video coding |
US20210321111A1 (en) * | 2018-08-24 | 2021-10-14 | Samsung Electronics Co., Ltd. | Method and apparatus for image encoding, and method and apparatus for image decoding |
US20220030269A1 (en) * | 2018-09-21 | 2022-01-27 | Canon Kabushiki Kaisha | Video coding and decoding |
US20220124324A1 (en) * | 2019-06-25 | 2022-04-21 | Nippon Hoso Kyokai | Intra prediction device, image decoding device and program |
US20220191487A1 (en) * | 2019-09-03 | 2022-06-16 | Panasonic Intellectual Property Corporation Of America | System and method for video coding |
- 2024-01-16: WO application PCT/CN2024/072601 filed, published as WO2024153085A1 (fr); status unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024153085A8 (fr) | 2024-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11153559B2 (en) | Usage of LUTs | |
WO2024153085A1 (fr) | Video coding method and chroma prediction apparatus | |
US20250039371A1 (en) | Video signal processing method and apparatus therefor | |
US20250039400A1 (en) | Video signal processing method using obmc, and device therefor | |
WO2024153079A1 (fr) | Video coding method and apparatus for chroma prediction | |
WO2024222760A1 (fr) | Video coding method and apparatus for improving chroma prediction by fusion | |
CN116193125A (zh) | Method and device for prediction-dependent residual scaling for video coding | |
WO2025007974A1 (fr) | Methods and apparatus of adaptive cross-component prediction for chroma coding | |
WO2025007952A1 (fr) | Methods and apparatus of video coding improvement by model derivation | |
WO2023241637A1 (fr) | Method and apparatus of cross-component prediction with blending in video coding systems | |
WO2024193428A1 (fr) | Method and apparatus of chroma prediction in a video coding system | |
WO2025026397A1 (fr) | Methods and apparatus of video coding using multi-hypothesis cross-component prediction for chroma coding | |
WO2023198142A1 (fr) | Method and apparatus of implicit cross-component prediction in a video coding system | |
WO2024193386A1 (fr) | Method and apparatus of template-based intra luma mode fusion in a video coding system | |
WO2024217574A1 (fr) | Video coding method and apparatus for determining the prediction of a current block by multi-hypothesis blending | |
WO2024193431A1 (fr) | Method and apparatus of combined prediction in a video coding system | |
WO2024213017A1 (fr) | Method, apparatus and medium for video processing | |
WO2024222399A1 (fr) | Refinement for motion vector difference in merge mode | |
WO2024120307A1 (fr) | Method and apparatus of reordering inherited cross-component model candidates in a video coding system | |
WO2024235144A1 (fr) | Method and apparatus of adaptive interpolation filtering in a video coding system | |
WO2024149384A1 (fr) | Regression-based coding modes | |
WO2024140853A1 (fr) | Method, apparatus and medium for video processing | |
WO2023274302A1 (fr) | Recursive prediction unit in video coding | |
WO2025021011A1 (fr) | Combined prediction mode | |
WO2024169882A1 (fr) | Methods and apparatus of prioritized initial motion vector for decoder-side motion refinement in video coding | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24744260 Country of ref document: EP Kind code of ref document: A1 |