
WO2024116691A1 - Video decoding device and video encoding device

Video decoding device and video encoding device

Info

Publication number
WO2024116691A1
Authority
WO
WIPO (PCT)
Prior art keywords
mode
template
timd
intra
prediction
Application number
PCT/JP2023/039040
Other languages
English (en)
Japanese (ja)
Inventor
Masanobu Yasugi
Tomohiro Ikai
Tomoko Aono
Original Assignee
Sharp Kabushiki Kaisha (Sharp Corporation)
Priority claimed from JP2022189022A (published as JP2024077148A)
Priority claimed from JP2022208682A (published as JP2024092612A)
Priority claimed from JP2023038230A (published as JP2024129196A)
Application filed by Sharp Kabushiki Kaisha
Publication of WO2024116691A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/11: Adaptive coding characterised by selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N 19/136: Adaptive coding characterised by incoming video signal characteristics or properties
    • H04N 19/147: Adaptive coding characterised by data rate or code amount at the encoder output, according to rate distortion criteria
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N 19/593: Predictive coding involving spatial prediction techniques
    • H04N 19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • Embodiments of the present invention relate to a video decoding device and a video encoding device.
  • In order to transmit or record video efficiently, a video encoding device that generates encoded data by encoding the video and a video decoding device that generates decoded images by decoding the encoded data are used.
  • video coding methods include H.264/AVC, HEVC (High-Efficiency Video Coding), and VVC (Versatile Video Coding).
  • the images (pictures) that make up a video are managed in a hierarchical structure consisting of slices obtained by dividing the images, coding tree units (CTUs) obtained by dividing the slices, coding units (sometimes called coding units: CUs) obtained by dividing the coding tree units, and transform units (TUs) obtained by dividing the coding units, and are coded/decoded for each CU.
  • A predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction error (sometimes called a "difference image" or "residual image") obtained by subtracting the predicted image from the input image (original image) is coded.
  • Methods for generating predicted images include inter-frame prediction (inter prediction) and intra-frame prediction (intra prediction).
  • Non-Patent Document 1 discloses template-based intra mode derivation (TIMD) prediction, in which a decoder generates a predicted image by deriving an intra direction prediction mode number using pixels in adjacent regions.
  • In TIMD prediction, an image of a template reference region near the target block is used to generate, for each intra prediction mode candidate, a template predicted image of the adjacent region (template image) of the target block. Then, the intra prediction mode candidate that minimizes the cost between the template image and the template predicted image is selected as the intra prediction mode for the target block.
  • However, this requires generating template predicted images and calculating costs for a large number of intra prediction mode candidates, which poses an issue of extremely large computational complexity.
  • the present invention aims to reduce the complexity of template-based intra-mode derivation.
  • The video decoding device of this embodiment is characterized in that, in TIMD prediction, which uses the images of the template region and the template reference region to generate a template image and a template predicted image for a predetermined intra prediction mode candidate and selects an intra prediction mode for the target block based on a cost derived from the template image and the template predicted image, it determines the positions of the template region and the template reference region by referring to a template-based intra mode derivation flag.
  • It is also characterized in that it divides the predetermined intra prediction mode candidates into a plurality of groups based on the prediction direction, generates a template predicted image for each intra prediction mode candidate selected from each group and evaluates its cost, and then, within the group containing the intra prediction mode candidate with the lowest cost, derives the costs of the remaining candidates and selects a mode from among them.
  • the complexity of template-based intra-mode derivation can be reduced.
  • FIG. 1 is a schematic diagram showing a configuration of an image transmission system according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the hierarchical structure of data in an encoded stream.
  • FIG. 3 is a schematic diagram showing the types of intra prediction modes (mode numbers).
  • FIG. 4 is a schematic diagram showing the configuration of a video decoding device.
  • FIG. 5 is a diagram illustrating the configuration of an intra-prediction image generating unit.
  • FIG. 6 is a diagram showing the relationship between a target block, a template region, and a template reference region.
  • FIG. 7 is a diagram illustrating the details of a TIMD prediction unit; FIG. 8 shows examples of syntax for TIMD prediction.
  • FIGS. 10 to 14 are diagrams showing further examples of the relationship between a target block, a template region, and a template reference region.
  • A block diagram showing the configuration of a video encoding device, and an example of the syntax for the ISP mode.
  • FIG. 18 is a diagram showing an example of the grouping of intra prediction modes used in TIMD.
  • FIG. 1 is a schematic diagram showing the configuration of an image transmission system 1 according to this embodiment.
  • the image transmission system 1 is a system that transmits an encoded stream obtained by encoding an image to be encoded, and decodes the transmitted encoded stream to display an image.
  • the image transmission system 1 is composed of a video encoding device (image encoding device) 11, a network 21, a video decoding device (image decoding device) 31, and a video display device (image display device) 41.
  • An image T is input to the video encoding device 11.
  • the network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31.
  • the network 21 is the Internet, a wide area network (WAN), a local area network (LAN), or a combination of these.
  • the network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting.
  • the network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc: registered trademark) or a BD (Blu-ray Disc: registered trademark).
  • the video decoding device 31 decodes each of the encoded streams Te transmitted by the network 21 and generates one or more decoded images Td.
  • the video display device 41 displays all or part of one or more decoded images Td generated by the video decoding device 31.
  • the video display device 41 is equipped with a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Display forms include stationary, mobile, HMD, etc. Furthermore, when the video decoding device 31 has high processing power, it displays high quality images, and when it has only lower processing power, it displays images that do not require high processing power or display power.
  • x?y:z is a ternary operator that takes y if x is true (non-zero) and z if x is false (0).
  • BitDepthY is the luminance bit depth.
  • abs(a) is a function that returns the absolute value of a.
  • Int(a) is a function that returns the integer value of a.
  • Floor(a) is a function that returns the largest integer less than or equal to a.
  • Log2(a) is a function that returns the base-2 logarithm of a.
  • Ceil(a) is a function that returns the smallest integer greater than or equal to a.
  • a/d represents the division of a by d, with the result rounded down to the nearest integer.
  • Min(a,b) is a function that returns the smaller of a and b.
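  • For illustration, these operators can be written as the following C++ helpers (a minimal sketch; the function names mirror the definitions above):

        #include <cmath>

        inline int Abs(int a)        { return a < 0 ? -a : a; }     // absolute value of a
        inline int Floor(double a)   { return (int)std::floor(a); } // largest integer <= a
        inline int Ceil(double a)    { return (int)std::ceil(a); }  // smallest integer >= a
        inline int Log2(int a)       { int n = 0; while (a > 1) { a >>= 1; ++n; } return n; } // base-2 log (exact for powers of two)
        inline int Min(int a, int b) { return a < b ? a : b; }      // smaller of a and b
        // x ? y : z is the ordinary C/C++ conditional operator; integer a/d
        // truncates, which matches "rounded down" for non-negative operands.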
  • FIG. 2 is a diagram showing the hierarchical structure of data in an encoded stream Te.
  • the encoded stream Te illustratively includes a sequence and a number of pictures that make up the sequence.
  • FIG. 2 shows a coded video sequence that defines a sequence SEQ, a coded picture that specifies a picture PICT, a coded slice that specifies a slice S, coded slice data that specifies slice data, a coding tree unit included in the coded slice data, and a coding unit included in the coding tree unit.
  • the coded video sequence defines a set of data to be referred to by the video decoding device 31 in order to decode the sequence SEQ to be processed.
  • the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information).
  • the video parameter set VPS specifies a set of coding parameters that are common to multiple videos, as well as a set of coding parameters related to multiple layers and individual layers included in the video.
  • the sequence parameter set SPS specifies a set of coding parameters that the video decoding device 31 references in order to decode the target sequence. For example, the width and height of a picture are specified. Note that there may be multiple SPSs. In that case, one of the multiple SPSs is selected from the PPS.
  • the picture parameter set PPS specifies a set of coding parameters that the video decoding device 31 references in order to decode each picture in the target sequence. For example, it includes the reference value of the quantization width used in decoding the picture (pic_init_qp_minus26) and a flag indicating the application of weighted prediction (weighted_pred_flag). Note that there may be multiple PPSs. In that case, one of the multiple PPSs is selected for each picture in the target sequence.
  • a coded picture defines a set of data to be referenced by the video decoding device 31 in order to decode a picture PICT to be processed. As shown in the coded picture of FIG. 2, the picture PICT includes slices 0 to NS-1 (NS is the total number of slices included in the picture PICT).
  • An encoded slice defines a set of data to be referenced by the video decoding device 31 in order to decode a slice S to be processed. As shown in the encoded slice of Fig. 2, a slice includes a slice header and slice data.
  • the slice header includes a set of coding parameters that the video decoding device 31 refers to in order to determine the decoding method for the target slice.
  • Slice type designation information (slice_type) that specifies the slice type is an example of a coding parameter included in the slice header.
  • Slice types that can be specified by the slice type specification information include (1) an I slice that uses only intra prediction when encoding, (2) a P slice that uses unidirectional prediction or intra prediction when encoding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction when encoding.
  • inter prediction is not limited to unidirectional or bidirectional prediction, and a predicted image may be generated using more reference pictures.
  • When we refer to a P or B slice below, it means a slice that includes blocks for which inter prediction can be used.
  • the slice header may include a reference to the picture parameter set PPS (pic_parameter_set_id).
  • the coded slice data specifies a set of data to be referenced by the video decoding device 31 in order to decode the slice data to be processed.
  • The slice data includes CTUs, as shown in the coded slice data in Fig. 2.
  • a CTU is a block of a fixed size (e.g., 64x64) that constitutes a slice, and is also called a Largest Coding Unit (LCU).
  • In the coding tree unit in Fig. 2, a set of data that the video decoding device 31 refers to in order to decode the CTU to be processed is specified.
  • the CTU is divided into coding units CU, which are basic units of the coding process, by recursive quad tree division (QT (Quad Tree) division), binary tree division (BT (Binary Tree) division), or ternary tree division (TT (Ternary Tree) division).
  • BT division and TT division are collectively called multi tree division (MT (Multi Tree) division).
  • a node of a tree structure obtained by recursive quad tree division is called a coding node.
  • the intermediate nodes of the quad tree, binary tree, and ternary tree are coding nodes, and the CTU itself is specified as the top coding node.
  • the CU is composed of a CU header CUH, prediction parameters, transformation parameters, quantization transformation coefficients, etc.
  • the CU header defines a prediction mode, etc.
  • the prediction process may be performed on a CU basis, or on a sub-CU basis, which is a further division of a CU. If the size of the CU and sub-CU are equal, there is one sub-CU in the CU. If the size of the CU is larger than the size of the sub-CU, the CU is divided into sub-CUs. For example, if the CU is 8x8 and the sub-CU is 4x4, the CU is divided into 2 parts horizontally and 2 parts vertically, into 4 sub-CUs.
  • Intra prediction is a prediction within the same picture, while inter prediction refers to a prediction process performed between different pictures (for example, between display times or between layer images).
  • the transform and quantization process is performed on a CU basis, but the quantized transform coefficients may be entropy coded on a subblock basis, such as 4x4.
  • the predicted image is derived from prediction parameters associated with the block, which include intra-prediction and inter-prediction parameters.
  • the intra prediction parameters consist of a luminance prediction mode IntraPredModeY and a chrominance prediction mode IntraPredModeC.
  • Figure 3 is a schematic diagram showing the types of intra prediction modes (mode numbers). As shown in the figure, there are, for example, 67 types of intra prediction modes (0 to 66). These include planar prediction (0), DC prediction (1), and angular prediction (2 to 66).
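  • As a small illustration, a mode number can be classified as follows (a C++ sketch; the constant names are ours, not from the original):

        enum IntraMode { INTRA_PLANAR = 0, INTRA_DC = 1 };  // angular modes are 2..66

        inline bool isAngular(int intraPredMode) {          // true for angular prediction
            return intraPredMode >= 2 && intraPredMode <= 66;
        }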
  • linear model (LM: Linear Model) prediction such as cross component linear model (CCLM: Cross Component Linear Model) prediction and multi-mode linear model (MMLM: Multi Mode Linear Model) prediction may also be used.
  • an LM mode may be added for chrominance.
  • the video decoding device 31 includes an entropy decoding unit 301, a parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generating unit (prediction image generating device) 308, an inverse quantization and inverse transform unit 311, an addition unit 312, and a prediction parameter derivation unit 320.
  • the video decoding device 31 may also be configured not to include the loop filter 305.
  • the parameter decoding unit 302 further includes a header decoding unit 3020, a CT information decoding unit 3021, and a CU decoding unit 3022 (prediction mode decoding unit), and the CU decoding unit 3022 further includes a TU decoding unit 3024. These may be collectively referred to as a decoding module.
  • the header decoding unit 3020 decodes parameter set information such as VPS, SPS, PPS, and APS, and slice headers (slice information) from the encoded data.
  • the CT information decoding unit 3021 decodes the CT from the encoded data.
  • the CU decoding unit 3022 decodes the CU from the encoded data.
  • the TU decoding unit 3024 decodes QP update information (quantization correction value) and quantization prediction error (residual_coding) from the encoded data when a prediction error is included in the TU.
  • the predicted image generating unit 308 includes an inter predicted image generating unit 309 and an intra predicted image generating unit 310.
  • the prediction parameter derivation unit 320 includes an inter prediction parameter derivation unit 303 and an intra prediction parameter derivation unit 304.
  • CTU and CU are used as processing units, but this is not limiting and processing may be performed in sub-CU units.
  • CTU and CU may be interpreted as blocks and sub-CU as sub-blocks, and processing may be performed in block or sub-block units.
  • the entropy decoding unit 301 performs entropy decoding on the externally input encoded stream Te to decode individual codes (syntax elements).
  • There are two types of entropy coding: CABAC (Context Adaptive Binary Arithmetic Coding), which performs variable-length coding of syntax elements using a context (probability model) adaptively selected according to the type of syntax element and the surrounding circumstances, and a method that performs variable-length coding of syntax elements using a predetermined table or formula.
  • the entropy decoding unit 301 initializes all CABAC states at the beginning of a segment (tile, CTU row, slice).
  • the entropy decoding unit 301 converts syntax elements to a binary string (Bin String) and decodes each bit of the Bin String.
  • For each bit of a syntax element, a context index ctxInc is derived, the bit is decoded using that context, and the CABAC state of the used context is updated. Bits without a context are decoded with equal probability (EP, bypass), in which case the ctxInc derivation and the CABAC state update are omitted.
  • the entropy decoding unit 301 outputs the decoded code to the parameter decoding unit 302.
  • the control of which code to decode is performed based on instructions from the parameter decoding unit 302.
  • the intra prediction parameter derivation unit 304 decodes intra prediction parameters, for example, an intra prediction mode IntraPredMode, with reference to prediction parameters stored in the prediction parameter memory 307, based on the code input from the entropy decoding unit 301.
  • the intra prediction parameter derivation unit 304 outputs the decoded intra prediction parameters to the predicted image generation unit 308, and also stores them in the prediction parameter memory 307.
  • the intra prediction parameter derivation unit 304 may derive different intra prediction modes for luminance and chrominance.
  • the loop filter 305 is a filter provided in the encoding loop that removes block distortion and ringing distortion and improves image quality.
  • the loop filter 305 applies filters such as a deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF) to the decoded image of the CU generated by the adder 312.
  • the reference picture memory 306 stores the decoded image of the CU generated by the adder 312 in a predetermined location for each target picture and target CU.
  • the prediction parameter memory 307 stores prediction parameters at a predetermined location for each CTU or CU to be decoded. Specifically, the prediction parameter memory 307 stores the parameters decoded by the parameter decoding unit 302 and the prediction mode predMode separated by the entropy decoding unit 301.
  • the prediction mode predMode, prediction parameters, etc. are input to the prediction image generation unit 308.
  • the prediction image generation unit 308 also reads a reference picture from the reference picture memory 306.
  • the prediction image generation unit 308 generates a prediction image of a block or sub-block using the prediction parameters and the read reference picture (reference picture block) in the prediction mode indicated by the prediction mode predMode.
  • a reference picture block is a set of pixels on the reference picture (usually rectangular, so called a block), and is the area referenced to generate a prediction image.
  • The intra prediction image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra prediction parameter derivation unit 304 and reference pixels read from the reference picture memory 306.
  • the intra-prediction image generation unit 310 reads adjacent blocks in a predetermined range from the target block on the target picture from the reference picture memory 306.
  • the predetermined range refers to the adjacent blocks to the left, upper left, upper, and upper right of the target block, and the area to be referenced differs depending on the intra-prediction mode.
  • the intra-prediction image generation unit 310 generates a prediction image of the target block by referring to the read decoded pixel values and the prediction mode indicated by IntraPredMode.
  • the intra-prediction image generation unit 310 outputs the generated prediction image of the block to the addition unit 312.
  • In intra prediction, a decoded surrounding area adjacent (close) to the block to be predicted is set as reference region R. A predicted image is then generated by extrapolating the pixels in reference region R in a specific direction.
  • reference region R may be set as an L-shaped region that includes the left and top of the block to be predicted (or further, the top left, top right, and bottom left).
  • the intra-prediction image generation unit 310 includes a reference sample filter unit 3103 (second reference image setting unit), a prediction unit 3104, and a prediction image correction unit 3105 (prediction image correction unit, filter switching unit, and weighting coefficient changing unit).
  • the prediction unit 3104 generates a prediction image (pre-correction prediction image) of the block to be predicted based on each reference pixel (reference image) in the reference region R, a filtered reference image generated by applying a reference pixel filter (first filter), and the intra prediction mode, and outputs the prediction image to the prediction image correction unit 3105.
  • the prediction image correction unit 3105 corrects the prediction image according to the intra prediction mode, and generates and outputs a prediction image (corrected prediction image).
  • the following describes each component of the intra-prediction image generation unit 310.
  • the reference sample filter unit 3103 derives reference samples recSamples[x][y] at each position (x, y) on the reference region R by referring to the reference image.
  • the reference sample filter unit 3103 applies a reference pixel filter (first filter) to the reference samples recSamples[x][y] according to the intra prediction mode to update the reference samples recSamples[x][y] at each position (x, y) on the reference region R (derives filtered reference images recSamples[x][y]).
  • a low-pass filter is applied to the position (x, y) and the reference images therearound to derive a filtered reference image.
  • a low-pass filter may be applied to some intra prediction modes.
  • the filter applied to the reference image on the reference region R in the reference sample filter unit 3103 is referred to as the "reference pixel filter (first filter),” whereas the filter that corrects the predicted image in the predicted image correction unit 3105 described below is referred to as the "position-dependent filter (second filter).”
  • the prediction unit 3104 generates a prediction image of a prediction target block based on an intra prediction mode, a reference image, and a filtered reference pixel value, and outputs the prediction image to a prediction image correction unit 3105.
  • the prediction unit 3104 includes a planar prediction unit 31041, a DC prediction unit 31042, an angular prediction unit 31043, an LM prediction unit 31044, a matrix-based intra prediction unit 31045, and a template-based intra mode derivation (TIMD) prediction unit 31047.
  • the prediction unit 3104 selects a specific prediction unit according to the intra prediction mode, and inputs the reference image and the filtered reference image.
  • the relationship between the intra prediction mode and the corresponding prediction unit is as follows.
  • planar prediction unit 31041 generates a predicted image by linearly adding the reference samples recSamples[x][y] according to the distance between the pixel position to be predicted and the reference pixel position, and outputs the predicted image to the predicted image correction unit 3105.
  • the DC prediction unit 31042 derives a DC predicted value equivalent to the average value of the reference samples recSamples[x][y], and outputs a predicted image predSamples[x][y] whose pixel values are the DC predicted values.
  • refIdx may be set by decoding the syntax of the encoded data.
  • refIdx may be fixed at 0.
  • refIdx may be set to 2 or 4.
  • the LM prediction unit 31044 predicts pixel values of chrominance based on pixel values of luminance. Specifically, this is a method of generating a predicted image of a chrominance image (Cb, Cr) using a linear model based on a decoded luminance image.
  • LM prediction is a prediction method that uses a linear model to predict chrominance from luminance for one block.
  • the MIP unit 31045 generates a predicted image predSamples[x][y] by performing a product-sum operation on the reference samples recSamples[x][y] derived from the adjacent region and a weighting matrix, and outputs the generated image to the predicted image correction unit 3105.
  • the predicted image correction unit 3105 corrects the predicted image output from the prediction unit 3104 according to the intra prediction mode. Specifically, the predicted image correction unit 3105 derives a position-dependent weighting coefficient for each pixel of the predicted image according to the reference region R and the position of the target predicted pixel. Then, the predicted image correction unit 3105 derives a corrected predicted image (corrected predicted image) Pred[][] by performing weighted addition (weighted averaging) of the reference samples recSamples[][] and the predicted images predSamples[x][y]. Note that in some intra prediction modes, the predicted image correction unit 3105 may set the predicted image predSamples[x][y] to Pred without correcting it.
  • the intra prediction parameter derivation unit 304 derives a Most Probable Mode (MPM) list candModeList as follows: The upper left coordinates of the target block are (x0, y0), the block width is cbWidth, and the block height is cbHeight.
  • The intra prediction parameter derivation unit 304 may derive the intra prediction mode of block A adjacent to the left of the target block as candIntraPredModeA, and that of block B adjacent above as candIntraPredModeB, as follows.
  • Block A (hereinafter A) is a block having coordinates (x0-1, y0+cbHeight-1).
  • Block B (hereinafter B) is a block having coordinates (x0+cbWidth-1, y0-1).
  • candModeList[] may be derived from candIntraPredModeA and candIntraPredModeB as follows:
  • candModeList[0] = candIntraPredModeA
  • candModeList[1] = candIntraPredModeB
  • minAB = Min(candIntraPredModeA, candIntraPredModeB)
  • candModeList[2] = 2 + ((minAB + 61) % 64)
  • candModeList[3] = 2 + ((minAB - 1) % 64)
  • candModeList[4] = 2 + ((minAB + 60) % 64)
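  • A minimal C++ sketch of this derivation (assuming candIntraPredModeA/B are angular modes; the handling of non-angular or identical neighbor modes is outside these formulas):

        #include <algorithm>

        void deriveCandModeList(int candIntraPredModeA, int candIntraPredModeB,
                                int candModeList[5]) {
            const int minAB = std::min(candIntraPredModeA, candIntraPredModeB);
            candModeList[0] = candIntraPredModeA;
            candModeList[1] = candIntraPredModeB;
            // Neighboring angular modes of minAB, with wrap-around over modes 2..65.
            candModeList[2] = 2 + ((minAB + 61) % 64);
            candModeList[3] = 2 + ((minAB - 1) % 64);
            candModeList[4] = 2 + ((minAB + 60) % 64);
        }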
  • TIMD prediction is a prediction method that generates a predicted image using an intra prediction mode that is not explicitly signaled.
  • the TIMD prediction unit 31047 derives an intra prediction mode suitable for predicting an image (template image tempSamples) of a template region RT that is an adjacent region of the target block.
  • a predicted image tpredSamples (template predicted image) of the template region RT for multiple intra prediction mode candidates is generated using an image of a reference region (template reference region) RTRS near the template region.
  • the intra prediction mode candidate used to derive tpredSamples that minimizes the cost (e.g., the sum of absolute differences) between tempSamples and tpredSamples is selected as the intra prediction mode of the TIMD of the target block.
  • the TIMD prediction unit 31047 generates a predicted image predSamples using the intra prediction mode of this TIMD.
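  • The selection can be sketched as follows (C++; generateTemplatePred and the flat sample layout are illustrative assumptions, not names from the original):

        #include <cstdlib>
        #include <cstddef>
        #include <climits>
        #include <vector>

        // For each candidate mode, predict the template region from the template
        // reference region and keep the mode with the smallest SAD against the
        // reconstructed template image tempSamples.
        int selectTimdMode(const std::vector<int>& timdModeList,
                           const std::vector<int>& tempSamples,
                           std::vector<int> (*generateTemplatePred)(int mode)) {
            int bestMode = timdModeList.empty() ? 0 : timdModeList[0];
            long bestCost = LONG_MAX;
            for (int mode : timdModeList) {
                const std::vector<int> tpredSamples = generateTemplatePred(mode);
                long cost = 0;                       // sum of absolute differences
                for (std::size_t i = 0; i < tempSamples.size(); ++i)
                    cost += std::abs(tempSamples[i] - tpredSamples[i]);
                if (cost < bestCost) { bestCost = cost; bestMode = mode; }
            }
            return bestMode;
        }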
  • FIG. 8(a) shows an example of the syntax of the coded data related to TIMD.
  • the parameter decoding unit 302 decodes a flag timd_flag indicating whether or not TIMD is used for each block from the coded data.
  • the block may be a CU, a TU, a subblock, or the like.
  • When timd_flag of the target block is 0, the parameter decoding unit 302 decodes a syntax element intra_luma_mpm_flag related to the intra prediction mode, and when intra_luma_mpm_flag is 0, it further decodes intra_luma_mpm_remainder.
  • intra_luma_mpm_flag is a flag indicating whether or not an intra prediction mode is derived from the intra prediction candidate list candModeList[].
  • intra_luma_mpm_idx is an index specifying an intra prediction candidate when candModeList[] is used.
  • intra_luma_mpm_remainder is an index for selecting an intra prediction candidate not included in candModeList[].
  • When timd_flag is 1, the parameter decoding unit 302 does not need to decode the syntax elements related to the intra prediction mode (intra_luma_mpm_flag, intra_luma_mpm_idx, intra_luma_mpm_remainder) from the encoded data.
  • In that case, the parameter decoding unit 302 further decodes the timd_ref_mode of the current block.
  • the timd_ref_mode indicates the position of the template reference region used to derive the intra prediction mode in TIMD prediction.
  • the meaning of the timd_ref_mode may be as follows:
  • Figure 6 shows the template region RT and template reference region (template reference sample region) RTRS referenced in TIMD prediction.
  • the template region is the region corresponding to the template image.
  • the template reference region RTRS is the region referenced when generating a template predicted image, which is a predicted image of the template image.
  • When timd_flag is 1, the TIMD prediction unit 31047 generates a template prediction image for each intra prediction mode candidate using an image of the template reference region RTRS near the target block, and selects a suitable intra prediction mode for the target block.
  • the entropy decoding unit 301 may parse, for example, the syntax element timd_ref_mode shown in the syntax table of FIG. 8 as follows.
  • timd_ref_mode is a syntax element that selects the template region of the TIMD.
  • timd_ref_mode takes one of TIMD_MODE_TOP_LEFT, TIMD_MODE_TOP, and TIMD_MODE_LEFT, whose values may be 0, 1, and 2, respectively.
  • TIMD_MODE_TOP_LEFT uses the regions above and to the left of the target block as the template region.
  • TIMD_MODE_TOP mode uses the region above the target block as the template region.
  • TIMD_MODE_LEFT uses the region to the left of the target block as the template region.
  • Figure 9(a) shows an example of binarization of timd_ref_mode.
  • Bin0 is a bit that selects between TIMD_MODE_TOP_LEFT and the other modes. If it is 0, it indicates TIMD_MODE_TOP_LEFT; if it is 1, it indicates a mode other than TIMD_MODE_TOP_LEFT.
  • timd_ref_mode = timd_mode_flag + timd_mode_dir
  • One bit (e.g., "0") is assigned to TIMD_MODE_TOP_LEFT.
  • One bit (e.g., "1") is assigned jointly to TIMD_MODE_TOP and TIMD_MODE_LEFT,
  • and another bit distinguishes between TIMD_MODE_TOP and TIMD_MODE_LEFT (e.g., "0" indicates TIMD_MODE_TOP and "1" indicates TIMD_MODE_LEFT).
  • Since TIMD_MODE_TOP_LEFT has a high selection rate, it is assigned fewer bits in the binarization of timd_ref_mode than TIMD_MODE_LEFT or TIMD_MODE_TOP, which has the effect of shortening the average code length and improving coding efficiency.
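  • A sketch of the corresponding debinarization (C++; the value assignments follow the 0/1/2 mapping above):

        enum TimdRefMode { TIMD_MODE_TOP_LEFT = 0, TIMD_MODE_TOP = 1, TIMD_MODE_LEFT = 2 };

        // Bin0 == 0 selects TIMD_MODE_TOP_LEFT (the short code for the frequent mode);
        // otherwise Bin1 distinguishes TOP from LEFT. This is consistent with
        // timd_ref_mode = timd_mode_flag + timd_mode_dir.
        int decodeTimdRefMode(int bin0, int bin1 /* read only when bin0 == 1 */) {
            if (bin0 == 0) return TIMD_MODE_TOP_LEFT;
            return (bin1 == 0) ? TIMD_MODE_TOP : TIMD_MODE_LEFT;
        }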
  • Figures 9(b) to (f) show the setting of the context (ctxInc) of timd_ref_mode in parsing.
  • A context is a variable area for holding the probability (state) of CABAC, and is identified by the value of the context index ctxIdx (0, 1, 2, ...). The case where 0 and 1 are always equally probable, in other words 0.5 and 0.5, is called EP (Equal Probability) or bypass. In this case, no context is used because there is no need to hold a state for that syntax element.
  • ctxIdx is derived by referring to ctxInc.
  • Bin0 is a bit indicating whether or not the mode is TIMD_MODE_TOP_LEFT.
  • Bin1 is a bit indicating whether the mode is TIMD_MODE_LEFT or TIMD_MODE_TOP.
  • the formulas and values are not limited to those described above, and the order of determination and values may be changed. For example, the following may be used.
  • ctxIdx = (cbWidth > cbHeight) ? 1 : (cbWidth < cbHeight) ? 2 : 3
  • the formulas and values are not limited to those described above, and the order of determination and values may be changed. For example, the following may be used.
  • ctxIdx = (cbWidth > cbHeight) ? 1 : (cbWidth < cbHeight) ? 2 : bypass
  • the TIMD mode (timd_ref_mode) is composed of a first bit and a second bit, where the first bit may select whether the TIMD template region is both above and to the left of the target block, and the second bit may select whether the TIMD template region is to the left or above the target block.
  • When the shape of the block is square, a predetermined context (e.g., 1) may be used to parse timd_ref_mode; otherwise, a context different from the square case (e.g., 2) may be used to parse timd_ref_mode.
  • timd_ref_mode is decoded using the value obtained by swapping the binary value of Bin1 (1 to 0 and 0 to 1, for example, 1 - Bin1) depending on whether cbWidth > cbHeight or cbWidth < cbHeight.
  • timd_ref_mode may be derived as follows.
  • The above configuration uses different contexts depending on the shape of the target block, for example, whether the target block is square or not (and/or whether it is horizontal or vertical), making it possible to adaptively encode the block with a short code according to its characteristics, improving performance. In addition, if, for example, no context is used for square blocks, this has the further effect of reducing memory.
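  • A sketch of this shape-dependent context selection (C++; returning 0 to mean "use bypass" is an illustrative convention of ours):

        // Context index for Bin1 of timd_ref_mode, following the variants above.
        int ctxIdxForBin1(int cbWidth, int cbHeight, bool squareUsesBypass) {
            if (cbWidth > cbHeight) return 1;   // wide block
            if (cbWidth < cbHeight) return 2;   // tall block
            return squareUsesBypass ? 0 : 3;    // square: bypass or a dedicated context
        }

        // One described variant additionally swaps Bin1 for one orientation, e.g.:
        //   bin1 = (cbHeight > cbWidth) ? 1 - bin1 : bin1;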
  • timd_ref_mode = TIMD_MODE_TOP
  • timd_ref_mode = TIMD_MODE_LEFT
  • refIdxW and refIdxH may be set using intra_luma_ref_idx as shown in FIG. 8(c).
  • intra_luma_ref_idx is a syntax indicating the position of the reference sample, and is decoded from the encoded data by the parameter decoding unit 302.
  • timd_ref_mode = TIMD_MODE_TOP
  • timd_ref_mode = TIMD_MODE_LEFT
  • (timd_ref_mode decoding decision condition 1) The entropy decoding unit 301 may decode timd_ref_mode when the condition shown in FIG. 8(b) is satisfied.
  • timd_ref_mode is a syntax element for selecting the template region of TIMD.
  • The entropy decoding unit 301 decodes timd_ref_mode when the product of cbWidth and cbHeight is equal to or greater than a predetermined threshold THSIZE. Otherwise, when timd_ref_mode does not appear (when the product of cbWidth and cbHeight is less than THSIZE), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE may be, for example, 16, 32, 64, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • Furthermore, when timd_ref_mode does not appear because the product of cbWidth and cbHeight is greater than a predetermined threshold THSIZE4, the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE4 may be, for example, 128, 256, 512, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • When timd_ref_mode does not appear (when the product of cbWidth and cbHeight is less than THSIZE or greater than THSIZE4), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows:
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
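  • A sketch of decoding decision condition 1 (C++; decodeSyntaxElement stands in for the entropy-decoder call and, like the TIMD_MODE_TOP_LEFT constant reused from the earlier sketch, is an assumption):

        // timd_ref_mode is decoded only when the block area lies in [THSIZE, THSIZE4];
        // otherwise it is inferred to be TIMD_MODE_TOP_LEFT.
        int parseTimdRefMode(int cbWidth, int cbHeight, int THSIZE, int THSIZE4,
                             int (*decodeSyntaxElement)()) {
            const int area = cbWidth * cbHeight;
            if (area >= THSIZE && area <= THSIZE4)
                return decodeSyntaxElement();  // parse timd_ref_mode from the bitstream
            return TIMD_MODE_TOP_LEFT;         // inferred when the syntax element is absent
        }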
  • (timd_ref_mode decoding decision condition 2) Instead of the above configuration, the following may be used.
  • the entropy decoding unit 301 decodes timd_ref_mode when the sum of cbWidth and cbHeight is equal to or greater than a predetermined threshold THSIZE2. Otherwise, when timd_ref_mode does not appear (when the sum of cbWidth and cbHeight is less than THSIZE2), the entropy decoding unit 301 estimates timd_ref_mode as TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE2 may be, for example, 8, 12, 16, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • Furthermore, when timd_ref_mode does not appear because the sum of cbWidth and cbHeight is greater than a predetermined threshold THSIZE5, the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE5 may be, for example, 48, 64, 96, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • When timd_ref_mode does not appear (when the sum of cbWidth and cbHeight is less than THSIZE2 or greater than THSIZE5), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows:
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • (timd_ref_mode decoding decision condition 3) The condition may be determined based on the logarithmic values of the width and height of the target block as follows.
  • The entropy decoding unit 301 decodes timd_ref_mode when the sum of Log2(cbWidth) and Log2(cbHeight) is equal to or greater than a predetermined threshold THSIZE3. Otherwise, when timd_ref_mode does not appear (when the sum of Log2(cbWidth) and Log2(cbHeight) is less than THSIZE3), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE3 may be, for example, 4, 5, 6, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • Furthermore, when timd_ref_mode does not appear (when the sum of Log2(cbWidth) and Log2(cbHeight) is greater than a predetermined threshold THSIZE6), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE6 may be, for example, 7, 8, 9, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • When timd_ref_mode does not appear (when the sum of Log2(cbWidth) and Log2(cbHeight) is less than THSIZE3 or greater than THSIZE6), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows:
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • (timd_ref_mode decoding decision condition 4) The condition may be determined based on the width and height of the target block as follows. When both cbWidth and cbHeight are equal to or greater than a predetermined threshold THSIZE7, the entropy decoding unit 301 decodes timd_ref_mode. Otherwise, when timd_ref_mode does not appear (when either cbWidth or cbHeight is less than THSIZE7), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. For THSIZE7, 8, 16, 32, etc. may be used.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • Furthermore, when both cbWidth and cbHeight are equal to or smaller than a predetermined threshold THSIZE8, the entropy decoding unit 301 decodes timd_ref_mode.
  • Otherwise, when timd_ref_mode does not appear (when either cbWidth or cbHeight is greater than THSIZE8), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE8 may be, for example, 32, 48, 64, etc.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • When cbWidth and cbHeight are both equal to or larger than the predetermined threshold THSIZE7 and equal to or smaller than the predetermined threshold THSIZE8, the entropy decoding unit 301 may decode timd_ref_mode. For example, this process may be expressed as follows.
  • timd_ref_mode = TIMD_MODE_TOP_LEFT
  • (timd_ref_mode decoding decision condition 5) The condition may be determined based on the coordinates of the target block as follows.
  • The entropy decoding unit 301 decodes timd_ref_mode when both the top left coordinates x0 and y0 of the target block are greater than a predetermined threshold THSIZE9. Otherwise, when timd_ref_mode does not appear (when either x0 or y0 is equal to or smaller than THSIZE9), the entropy decoding unit 301 estimates timd_ref_mode to be TIMD_MODE_TOP_LEFT. For example, this process may be expressed as follows. THSIZE9 may be, for example, 4, 8, 16, etc.
  • In that case, timd_ref_mode may be fixed to TIMD_MODE_TOP_LEFT. Note that (x0, y0) may be set as coordinates within a segment (picture, slice, tile) instead of coordinates within the picture, and the determination may be made based on the distance from the segment boundary.
  • isp_mode_dir indicates the direction of block division (direction of the boundary line) when ISP is used.
  • the meaning of isp_mode_dir may be as follows:
  • the areas within each subblock can be used as the template area in TIMD_MODE_TOP. These areas are different from the upper adjacent area of the target block when ISP division is not performed, which may lead to more accurate derivation of a predicted image.
  • the template area used in TIMD_MODE_LEFT for horizontal division of ISP is the same as the left adjacent area when ISP is not used, and the predicted image is unlikely to differ from when ISP is not used, i.e., it is a redundant option. Therefore, limiting isp_mode_dir according to the value of timd_ref_mode has the effect of improving coding efficiency. The same can be said for vertical division of ISP.
  • isp_mode_dir may be either HOR_INTRA_PARTITIONS, indicating horizontal partitioning, or VER_INTRA_PARTITIONS, indicating vertical partitioning, and may take the values 0 and 1, respectively.
  • the syntax elements isp_mode_flag and isp_mode_dir may be combined into one syntax element isp_mode.
  • The value of isp_mode may be as follows:
  • isp_mode = 0: Do not use ISP (ISP_NO_SPLIT)
  • isp_mode = 1: Use ISP horizontal split (ISP_HOR_SPLIT)
  • isp_mode = 2: Use ISP vertical split (ISP_VER_SPLIT)
  • the entropy decoding unit 301 may derive isp_mode from isp_mode_flag and isp_mode_dir by the following formula.
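  • The formula itself is not reproduced in this text; the following reconstruction (an assumption consistent with the 0/1/2 value list above) shows one way it could look:

        // isp_mode = 0 when ISP is not used; otherwise 1 + isp_mode_dir, giving
        // ISP_HOR_SPLIT (1) for horizontal and ISP_VER_SPLIT (2) for vertical division.
        int deriveIspMode(int isp_mode_flag, int isp_mode_dir) {
            return isp_mode_flag ? 1 + isp_mode_dir : 0;
        }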
  • the intra-prediction image generation unit 310 divides the current block into sub-blocks (e.g., into four) along horizontal or vertical boundaries, and derives a prediction image for each sub-block in sequence using common intra-prediction parameters within the current block.
  • When isp_mode is 0 (ISP_NO_SPLIT), intra prediction is performed without further division of the CU.
  • When isp_mode is 1 (ISP_HOR_SPLIT), the CU is divided into 2 to 4 subblocks vertically, and intra prediction, transform coefficient decoding, inverse quantization, and inverse transform are performed on a subblock basis.
  • When isp_mode is 2 (ISP_VER_SPLIT), the CU is divided into 2 to 4 subblocks horizontally, and intra prediction, transform coefficient decoding, inverse quantization, and inverse transform are performed on a subblock basis.
  • the number of subblock divisions, NumIntraSubPart is derived using the following formula.
  • the width nW and height nH of the subblock, and the number of horizontal and vertical divisions numPartsX and numPartY are derived as follows.
  • nTbW and nTbH are the width and height of a CU (or TU).
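  • A sketch of the subblock-dimension derivation (C++). The NumIntraSubPart value used here (2 for 4x8 and 8x4 blocks, otherwise 4) follows the usual VVC ISP convention and is an assumption, since the formula is not reproduced above:

        void deriveIspPartitions(int isp_mode, int nTbW, int nTbH,
                                 int& nW, int& nH, int& numPartsX, int& numPartY) {
            const int NumIntraSubPart =
                ((nTbW == 4 && nTbH == 8) || (nTbW == 8 && nTbH == 4)) ? 2 : 4;
            if (isp_mode == 1) {            // ISP_HOR_SPLIT: horizontal boundaries
                nW = nTbW;  nH = nTbH / NumIntraSubPart;
                numPartsX = 1;  numPartY = NumIntraSubPart;
            } else if (isp_mode == 2) {     // ISP_VER_SPLIT: vertical boundaries
                nW = nTbW / NumIntraSubPart;  nH = nTbH;
                numPartsX = NumIntraSubPart;  numPartY = 1;
            } else {                        // ISP_NO_SPLIT
                nW = nTbW;  nH = nTbH;  numPartsX = 1;  numPartY = 1;
            }
        }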
  • the parameter decoding unit 302 informs the entropy decoding unit 301 which syntax elements to parse.
  • The entropy decoding unit 301 outputs the parsed syntax elements to the prediction parameter derivation unit 320.
  • the above configuration increases prediction efficiency and improves coding efficiency by signaling appropriate template regions.
  • the generation of inappropriate template prediction images and cost calculations can be omitted, reducing the amount of calculations.
  • FIG. 7 shows the configuration of the TIMD prediction unit 31047 in this embodiment.
  • the TIMD prediction unit 31047 is composed of a reference sample derivation unit 4701, a template derivation unit 4702, an intra prediction unit 4703, an intra prediction mode candidate derivation unit 4711, a template predicted image generation unit 4712, a template cost derivation unit 4713, and an intra prediction mode selection unit 4714.
  • the intra prediction mode candidate derivation unit 4711, the template predicted image generation unit 4712, and the template cost derivation unit 4713 may be collectively referred to as a template intra prediction mode derivation device 4710.
  • When timd_flag is 1, the TIMD prediction unit 31047 generates a template predicted image tpredSamples using an intra prediction mode that accurately predicts the template image tempSamples generated from the image of the template region RT. Specifically, the TIMD prediction unit 31047 executes the following procedure.
  • the predetermined prediction mode may include an angular prediction mode, a planar prediction mode, a DC prediction mode, and an intra prediction mode derived by decoder-side intra mode derivation (DIMD, Decoder-side Intra Mode Derivation) included in the MPM.
  • the intra prediction mode corresponding to the tpredSamples that are determined to have a small cost value, i.e., high prediction accuracy, is selected as the prediction mode for the TIMD.
  • the amount of code can be reduced by signaling the appropriate template reference region RTRS using a syntax element.
  • the amount of processing can be reduced because it is no longer necessary to generate a template prediction image using pixels from an inappropriate template reference region or to calculate costs.
  • the intra-prediction mode candidate derivation unit 4711 first derives a list of intra-prediction mode candidates timdModeList[] from the intra-prediction modes of blocks adjacent to a target block.
  • the MPM list candModeList[] may be used as timdModeList.
  • the template derivation unit 4702 derives a template image tempSamples of the current block. As shown in Fig. 6, the template image may be derived from a template region RT adjacent to the current block.
  • the template region RT is an L-shaped set of decoded pixels recSamples with a width of one pixel.
  • the area on recSamples used to derive tempSamples is called the template area RT and is expressed as a set of coordinates (i, j).
  • (Template image example 1) An example that does not include the upper left area of the target block
  • FIGS. 10 to 14 are diagrams showing examples of the template region RT and the template reference region RTRS that differ from those in FIG. 6.
  • the template image may be shaped other than L-shaped.
  • Figure 10 shows an example where the template region is the region RL on the left side of the target block (middle of the figure) or the region RT above the target block (right side of the figure) in addition to the L-shape (left side of the figure).
  • the template image is set using the following formula.
  • (Template image example 2) An example including the upper left area of the target block
  • Figure 11 also shows an example where the template region is an area RL including the left side and upper left corner of the target block (middle of the figure), and an area RT including the upper side and upper left corner of the target block (right side of the figure).
  • the template image is set by the following formula.
  • (Template image example 3) An example of extending to the lower left and upper right of the target block without including the upper left area
  • Figure 12 also shows an example where the template region is the region RL_EXT (middle of the figure) that includes the left side and bottom left of the target block, and the region RT_EXT (right side of the figure) that includes the top side and top right of the target block.
  • the template image is set by the following formula.
  • tExtW and tExtH are the width and height of the template image extended to the upper right and lower left, respectively.
  • Fig. 13 shows an example where the template region is an area RL_EXT (middle of the figure) that includes the left side, upper left, and lower left of the target block, and an area RT_EXT (right side of the figure) that includes the upper side, upper left, and upper right of the target block.
  • the template image is set by the following formula.
  • timd_ref_mode indicates any of TIMD_MODE_TOP_LEFT, TIMD_MODE_LEFT, and TIMD_MODE_TOP.
  • the template derivation unit 4702 refers to timd_ref_mode, selects one of the three template regions, and generates a template image.
  • the template derivation unit 4702 generates a template image as follows:
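  • A sketch of the three cases (C++). It assumes a one-pixel-wide template as in FIG. 6; Picture is a hypothetical stand-in for the decoded picture, and the TimdRefMode values match the binarization sketch above:

        #include <vector>

        enum TimdRefMode { TIMD_MODE_TOP_LEFT = 0, TIMD_MODE_TOP = 1, TIMD_MODE_LEFT = 2 };

        struct Picture {                          // minimal stand-in for recSamples
            int width;
            std::vector<int> samples;
            int at(int x, int y) const { return samples[y * width + x]; }
        };

        // Collect the template image: the row above and/or the column left of the
        // target block, depending on timd_ref_mode.
        void deriveTemplate(int timd_ref_mode, int x0, int y0,
                            int cbWidth, int cbHeight,
                            const Picture& recSamples, std::vector<int>& tempSamples) {
            tempSamples.clear();
            if (timd_ref_mode != TIMD_MODE_LEFT)  // TOP_LEFT or TOP: row above
                for (int x = 0; x < cbWidth; ++x)
                    tempSamples.push_back(recSamples.at(x0 + x, y0 - 1));
            if (timd_ref_mode != TIMD_MODE_TOP)   // TOP_LEFT or LEFT: left column
                for (int y = 0; y < cbHeight; ++y)
                    tempSamples.push_back(recSamples.at(x0 - 1, y0 + y));
        }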
  • the reference sample derivation unit 4701 derives a reference sample refUnit from the template reference region RTRS. Note that the operation of the reference sample derivation unit 4701 may be performed by the reference sample filter unit 3103.
  • refUnit[x][y] = recSamples[x0+x][y0+y]
  • (Example 1 of template reference region RTRS) An example that does not include the top left corner of the target block
  • When timd_ref_mode is TIMD_MODE_LEFT or TIMD_MODE_TOP, the template reference region RTRS changes according to the template image.
  • When timd_ref_mode is TIMD_MODE_LEFT, the template reference region RTRS is only the region to the left of the target block.
  • When timd_ref_mode is TIMD_MODE_TOP, the template reference region RTRS is only the region above the target block.
  • the reference sample derivation unit 4701 sets the template reference region as follows.
  • the reference sample derivation unit 4701 may use the following formula.
  • (Example 2 of template reference region RTRS) The template reference region may include both the left and upper regions of the target block.
  • the reference sample derivation unit 4701 sets the template reference region as follows: rW is the width of the template reference region when timd_ref_mode is TIMD_MODE_LEFT, and rH is the height of the template reference region when timd_ref_mode is TIMD_MODE_TOP.
  • the reference sample derivation unit 4701 may use the following formula.
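  • A sketch of setting the reference line just outside the template (C++, reusing the Picture stand-in and TimdRefMode values from the previous sketch; the index ranges are assumptions of ours, chosen to be consistent with FIG. 6):

        // ox/oy are the offsets introduced by a left/top template of width tempW and
        // height tempH (e.g., 1); the reference line is the L-shape just outside them.
        void deriveRefUnit(int timd_ref_mode, int x0, int y0,
                           int cbWidth, int cbHeight,
                           int tempW, int tempH, const Picture& rec,
                           std::vector<int>& refUnit) {
            const int ox = (timd_ref_mode != TIMD_MODE_TOP)  ? tempW : 0; // left template?
            const int oy = (timd_ref_mode != TIMD_MODE_LEFT) ? tempH : 0; // top template?
            refUnit.clear();
            for (int x = -ox - 1; x < 2 * cbWidth; ++x)  // top row, incl. above-left pixel
                refUnit.push_back(rec.at(x0 + x, y0 - oy - 1));
            for (int y = -oy; y < 2 * cbHeight; ++y)     // left column
                refUnit.push_back(rec.at(x0 - ox - 1, y0 + y));
        }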
  • the above configuration expands the template reference area used to generate the template predicted image, thereby improving prediction accuracy, particularly from the upper left direction, and improving coding efficiency.
  • In (Example 1 of template reference region RTRS) and (Example 2 of template reference region RTRS), we have explained examples where the template reference region RTRS touches the template image.
  • In (Example 3 of template reference region RTRS), we will use Figure 14 to explain an example where the template reference region RTRS does not touch the template image.
  • Figure 14 is a diagram in which Figure 11(b) is applied so that the template reference region RTRS does not touch the template image. Therefore, this example can also be applied to each of Figures 10 to 13 other than Figure 11(b).
  • the reference sample derivation unit 4701 derives the image refUnit of the template reference area from a position that is refIdxW and refIdxH away from the template image in the horizontal and vertical directions, respectively.
  • the position of the template reference area near the target block is such that if the template area is located only to the left of the target block, the vertical position is equal to or greater than the minimum vertical coordinate of the template area, and if the template area is located only to the upper side of the target block, the horizontal position is equal to or greater than the minimum horizontal coordinate of the template area.
  • the location of the template reference region near the target block includes adjacent pixels located above and to the upper left of the template region, even if the template region is located only to the left of the target block, if adjacent blocks are available. Similarly, even if the template region is located only to the top of the target block, it includes adjacent pixels located to the left and to the upper left of the template region, if adjacent blocks are available. In other words, the template reference region includes the top left pixel positions of the 8 neighbors of the template region.
  • the position of the template reference area near the target block is located a predetermined distance away from the template area.
  • the reference sample derivation unit 4701 may filter the reference sample refUnit[x][y] to derive the reference sample p[x][y].
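  • the filter itself is not specified here; a minimal sketch, assuming a simple [1 2 1]/4 smoothing applied while treating the reference region as a one-dimensional line of numRef samples, is:

      /* Hypothetical [1 2 1]/4 smoothing of the reference samples;
         whether and when filtering applies is an assumption. */
      p[0] = refUnit[0];
      for (int i = 1; i < numRef - 1; i++)
          p[i] = (refUnit[i - 1] + 2 * refUnit[i] + refUnit[i + 1] + 2) >> 2;
      p[numRef - 1] = refUnit[numRef - 1];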
  • the template predicted image generation unit 4712 generates a predicted image (template predicted image tpredSamples) for the intra prediction mode IntraPredMode from the template reference region RTRS.
  • the operation of deriving a predicted image by the template predicted image generation unit 4712 may be performed by the prediction unit 3104.
  • the planar prediction unit 31041, the DC prediction unit 31042, and the angular prediction unit 31043 may perform both the derivation of a template predicted image and the derivation of a predicted image of a target block.
  • the TIMD prediction unit 31047 divides predetermined intra prediction mode candidates into a plurality of groups, generates a template predicted image and evaluates the cost for each intra prediction mode candidate selected from each group, and then derives and compares the costs of the remaining candidates in the group to which the lowest-cost candidate belongs. For example, the TIMD prediction unit 31047 divides the intra prediction modes into numGr groups (numGr > 1) and tests only the representative mode of each group (that is, derives a template predicted image, calculates its cost, and compares the costs). The intra prediction modes included in the best group, i.e., the group whose representative mode has the smallest cost, may then be tested in more detail.
  • in this way, the search is divided into numGr tests of intra prediction modes at coarse granularity and (M - 1) tests at fine granularity, where:
  • numGr is the number of groups
  • M is the number of intra-modes contained in the group.
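  • as an illustrative calculation (the numbers are assumptions, not normative): with numGr = 9 groups of M = 7 modes each, an exhaustive search would test 9 × 7 = 63 modes, whereas the two-stage search tests at most numGr + (M - 1) = 9 + 6 = 15 modes.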
  • Figure 18(a) shows an example of dividing intra prediction modes into groups.
  • the intra prediction modes included in timdModeList which is a candidate list used in TIMD prediction, are classified into nine groups, Gr[0] to Gr[8].
  • the grouping of intra prediction modes may be determined using a certain threshold gTH[] and the magnitude relationship between the intra prediction modes.
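  • as a sketch of such threshold-based grouping (the contents of gTH[] and the boundary handling are assumptions), a mode can be mapped to a group index by scanning ascending thresholds:

      /* Map an angular intra prediction mode to a group index using
         ascending thresholds gTH[0..numGr-2] (illustrative). */
      int groupOf(int mode, const int gTH[], int numGr)
      {
          for (int m = 0; m < numGr - 1; m++)
              if (mode <= gTH[m])
                  return m;
          return numGr - 1;
      }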
  • the intra prediction mode selection unit 4714 may divide all angular prediction modes into groups. Alternatively, it may divide only candidates included in intra prediction modes timdModeList derived from the intra prediction modes of blocks adjacent to the target block into groups. An example of the latter is shown below.
  • the intra prediction mode selection unit 4714 initializes numGrList and nGrList.
  • the second search searches all intra prediction modes in the best group, which has the effect of performing a complete search.
  • a limited set of candidates such as intra prediction modes included in timdModeList
  • all intra prediction modes in the identified group are searched in the second search. Therefore, the effect of being able to search candidates in a specific direction without omission is achieved.
  • the groups are configured as exclusive groups with no overlaps, which allows for a waste-free search and reduces complexity.
  • modes that are equal to or close to the threshold value may be included in both groups on either side of the threshold value.
  • An appropriate number of overlaps is 1 to 2.
  • the representative mode may be the mode with the highest position in timdModeList among the intra prediction modes of each group, or the first mode. In other words, one representative mode is set from the modes stored in GrList[m][0].
  • the representative mode for each group is not limited to the above.
  • it may be the intra prediction mode with the smallest mode number within each group.
  • the representative mode of each group may be the intra prediction mode with the largest mode number within each group.
  • the representative mode of each group may be the intra-prediction mode having the median value within each group.
  • if the group includes a specific direction such as horizontal, vertical, 45 degrees, or 135 degrees, that direction can be set as the representative mode; otherwise, another method, such as one of the methods mentioned above, can be used to set the representative mode.
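  • a minimal sketch combining the options above (the priority of the checks and the helper name are assumptions; grListM stands for the mode list of group m):

      /* Pick a representative mode for group m: a preferred direction
         (e.g. horizontal or vertical) if present in the group's mode
         list grListM, else the first stored mode grListM[0]. */
      int pickRepMode(const int grListM[], int numModes, int preferredDir)
      {
          for (int k = 0; k < numModes; k++)
              if (grListM[k] == preferredDir)
                  return preferredDir;
          return grListM[0];
      }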
  • the template prediction image generating unit 4712 generates ref[], for example, as follows (expression TIMD-ANGULAR-REF). The following generates tpredSamples[][] from p[][], but refUnit[][] may be used instead of p[][].
  • the template predicted image generation unit 4712 (intra prediction unit 4703) generates a template predicted image tpredSamples[][] corresponding to tIntraPredMode as in the following (equation TIMD-ANGULAR-PRED), for example.
  • filt is an interpolation filter coefficient for the template predicted image.
  • the number of taps MTAP of the interpolation filter used to derive the template predicted image is, for example, 4.
  • intraPredAngle is an angle parameter defined for each intra prediction mode.
  • filt may be derived from iFact as follows: filtG[phase][j] is the coefficient of the interpolation filter for generating the template predicted image.
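  • the lookup itself is not reproduced in this text; a plausible sketch (the phase computation is an assumption modeled on common angular prediction with 32 fractional phases) is:

      /* Hypothetical: the fractional sample phase selects one row of the
         interpolation filter table filtG. */
      int iFact = ((y + 1) * intraPredAngle) & 31;  /* fractional position */
      for (int j = 0; j < MTAP; j++)
          filt[j] = filtG[iFact][j];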
  • the template derivation unit 4702 derives the template image tempSamples of the target block using the above method.
  • the template predicted image generating unit 4712 generates a predicted image (template predicted image tpredSamplesR) of the representative mode repMode[m] from the template reference region RTRS using the above method.
  • the template cost derivation unit 4713 derives the cost tempGrCost[m] of the intra prediction mode candidate from the difference between the template predicted image tpredSamplesR of the representative mode candidate repMode[m] and the template image tempSamples.
  • the cost may be SAD (Sum of Absolute Difference).
  • tW and tH are the width and height of the template image.
  • the template cost derivation unit 4713 accumulates tempGrCost[m] over the template using the following formula.
  • tempGrCost[m] += abs(tpredSamplesR[i][j] - tempSamples[i][j])
  • MAX_COST is a value that indicates that the cost value and its group are invalid, and is a value that will never be smaller than other cost values. For example, the maximum value of double type may be used.
  • the cost may be SATD (Sum of Absolute Transformed Difference) instead of SAD, or may be the weighted sum of multiple costs.
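  • as a sketch, the SAD accumulation above can be written as the following double loop (initialization shown for clarity; tW, tH and the array layout are as defined earlier):

      /* SAD between the representative-mode template prediction and the
         template image; groups not evaluated keep the cost MAX_COST. */
      tempGrCost[m] = 0;
      for (int j = 0; j < tH; j++)
          for (int i = 0; i < tW; i++)
              tempGrCost[m] += abs(tpredSamplesR[i][j] - tempSamples[i][j]);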
  • the intra-prediction mode selection unit 4714 derives the intra-prediction mode corresponding to the minimum value of tempGrCost as the intra-prediction mode IntraPredMode of the TIMD mode.
  • steps (STEP1-1) and (STEP1-2) are executed for all intra prediction modes in group mm. If a cost has already been derived for an intra prediction mode, steps (STEP1-1) and (STEP1-2) can be omitted for it. The smallest cost costMode1 and the second smallest cost costMode2 in group mm, and the corresponding intra prediction mode candidates kMode1 and kMode2, are then selected.
  • the prediction directions represented by kMode1 and kMode2 may be adjusted to more accurate direction representation.
  • for the two derived modes kMode1 and kMode2, derive the weights weight1 and weight2 to be used in generating the predicted image according to the costs costMode1 and costMode2. For example, they are derived as follows:
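  • the weight formula itself does not appear in this text; a hedged sketch consistent with the >> 8 normalization used in (STEP 2) below, so that weight1 + weight2 = 256, is:

      /* Hypothetical cost-proportional weights summing to 256 (1 << 8);
         the lower-cost mode receives the larger weight
         (assumes costMode1 + costMode2 > 0). */
      weight1 = (costMode2 << 8) / (costMode1 + costMode2);
      weight2 = 256 - weight1;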
  • the group mm may be further recursively divided into smaller groups.
  • for example, if Gr[1] includes the 33 intra prediction modes from 2 to 34, the number of searches in the second search becomes too large.
  • after such a subdivision (for example, splitting Gr[1] into Gr[3] and Gr[4]), the second search is performed on the best of the resulting groups. If the number of intra prediction modes in Gr[3] or Gr[4] is still large, the division may be repeated. By limiting the number of modes to be searched in this stepwise manner, a highly accurate intra prediction mode IntraPredMode can be derived with a small amount of processing.
  • the intra prediction unit 4703 derives an intra predicted image predSamples[][] corresponding to IntraPredMode. Then, it outputs it as an intra predicted image of the current block. Note that these processes may be derived by the prediction unit 3104.
  • iIdx = (((y + 1) * intraPredAngle) >> 5)
  • ref[] is as explained in the template predicted image generation unit.
  • fT is the interpolation filter coefficient for the intra-prediction image.
  • MTAP is the number of taps of the interpolation filter filt used to derive the template prediction image tpredSamples.
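  • putting the pieces together, a sketch of the directional interpolation might read as follows (the rounding offset, the shift, the table layout fT[iFact][j], and the clamping function Clip1 are assumptions modeled on common angular prediction; nTbW and nTbH are the block width and height):

      /* 4-tap interpolation along the prediction direction (illustrative);
         Clip1 clamps to the valid sample range (assumption). */
      for (int y = 0; y < nTbH; y++) {
          int iIdx  = ((y + 1) * intraPredAngle) >> 5;
          int iFact = ((y + 1) * intraPredAngle) & 31;
          for (int x = 0; x < nTbW; x++) {
              int sum = 0;
              for (int j = 0; j < MTAP; j++)
                  sum += fT[iFact][j] * ref[x + iIdx + j];
              predSamples1[x][y] = Clip1((sum + 32) >> 6);
          }
      }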
  • predSamples[x][y] = (weight1*predSamples1[x][y] + weight2*predSamples2[x][y]) >> 8
  • otherwise, if the first condition of (STEP 2) is not satisfied, predSamples[][] is generated using the intra predicted image predSamples1[][] corresponding to kMode1. predSamples1[][] is generated by (equation INTRA-ANGULAR-PRED).
  • predSamples[x][y] = predSamples1[x][y]
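  • a sketch of the selection above as a single loop (useFusion is a hypothetical flag standing for the first condition of (STEP 2)):

      /* Blend the two candidate predictions, or copy kMode1's prediction. */
      for (int y = 0; y < nTbH; y++)
          for (int x = 0; x < nTbW; x++)
              predSamples[x][y] = useFusion
                  ? (weight1 * predSamples1[x][y] + weight2 * predSamples2[x][y]) >> 8
                  : predSamples1[x][y];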
  • (Modification 1 of Embodiment 1) As another example of the first embodiment, another group division method will be described. In this method, the groups are divided so that the numbers of elements in the groups containing angular prediction modes are approximately equal, so that no large difference in the total number of modes to be processed, i.e., in the amount of processing, arises depending on which group is selected. In other words, the intra prediction mode candidates are grouped such that the difference in the number of elements between any two groups containing angular prediction modes is at most one.
  • the intra prediction mode selection unit 4714 may further separate non-directional modes (Planar or DC) from directional modes, and perform group division using the directional mode candidate timdAngModeList.
  • numAngCand is the number of elements in timdAngModeList.
  • the intra prediction mode selection unit 4714 divides timdAngModeList into (numGr-2) groups (Gr[1] to Gr[numGr-1]) so that the number of elements in each group is approximately numAngCand/(numGr-2).
  • for example, numAngCand = 23 and numGr = 5.
  • the number of elements in each of the groups Gr[1] to Gr[numGr-1] split from timdModeList differs from the other groups by at most 1.
  • the number of elements is almost the same, which has the effect of steadily reducing the amount of processing.
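  • a sketch of the near-equal split (the index arithmetic and the helper array grSize are assumptions; timdAngModeList is assumed sorted by prediction direction):

      /* Split numAngCand directional candidates into (numGr - 2) groups
         whose sizes differ by at most one; groups start at Gr[1]. */
      int base = numAngCand / (numGr - 2);
      int rem  = numAngCand % (numGr - 2); /* first 'rem' groups get one extra */
      int pos  = 0;
      for (int g = 0; g < numGr - 2; g++) {
          int size = base + (g < rem ? 1 : 0);
          for (int k = 0; k < size; k++)
              GrList[g + 1][k] = timdAngModeList[pos++];
          grSize[g + 1] = size;
      }

  with numAngCand = 23 and numGr = 5 this yields group sizes 8, 8 and 7.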
  • the intra prediction modes are not limited to this.
  • the intra prediction modes may take values from -14 to 80, as in the case of wide angle.
  • a new group may be assigned to the intra prediction modes from -14 to -1 or 67 to 80. Then, a representative mode or the best group may be identified.
  • the video decoding device of this embodiment includes a parameter decoding unit that decodes a template-based intra mode derivation flag from encoded data, a template derivation unit that generates a template image using an image of a template region adjacent to a target block, a reference sample derivation unit that generates a reference sample using an image of a template reference region near the target block, a template predicted image generation unit that generates a template predicted image for a predetermined intra prediction mode candidate using the reference sample, a template cost derivation unit that derives a cost from the template predicted image and the template image, and an intra prediction mode selection unit that selects the intra prediction mode of the target block based on the cost, the template derivation unit specifies the position of the template region by referring to the template-based intra mode derivation flag, and the reference sample derivation unit derives the reference sample by referring to the template-based intra mode derivation flag.
  • the template-based intra mode derivation flag indicates whether to use the upper and left sides of the target block, the left side of the target block, or the upper side of the target block as the template region.
  • the template area to the left of the target block has a vertical position that is greater than or equal to the minimum and less than or equal to the maximum vertical coordinate of the target block, and the template area above the target block has a horizontal position that is greater than or equal to the minimum and less than or equal to the maximum horizontal coordinate of the target block.
  • the template region to the left of the target block has a vertical position that is equal to or less than the maximum vertical coordinate of the target block, and the template region above the target block has a horizontal position that is equal to or less than the maximum horizontal coordinate of the target block; in other words, the template region to the left of the target block includes an area whose vertical position is equal to or less than the minimum vertical coordinate of the target block, and the template region above the target block includes an area whose horizontal position is equal to or less than the minimum horizontal coordinate of the target block.
  • the template region to the left of the target block has a vertical position equal to or greater than the minimum vertical coordinate of the target block, and the template region above the target block has a horizontal position equal to or greater than the minimum horizontal coordinate of the target block; in other words, the template region to the left of the target block includes an area whose vertical position is equal to or greater than the maximum vertical coordinate of the target block, and the template region above the target block includes an area whose horizontal position is equal to or greater than the maximum horizontal coordinate of the target block.
  • the template area to the left of the target block has no vertical position restrictions relative to the target block, and the template area above the target block has no horizontal position restrictions relative to the target block.
  • the distance from the target block may be changed in the horizontal and vertical directions.
  • the template predicted image generator 4712 may perform the following.
  • the above configuration allows the area referenced by the template predicted image to be variable depending on the size of the target block, improving the accuracy of the template predicted image and increasing the accuracy of the TIMD intra predicted image.
  • the adder 312 adds, for each pixel, the predicted image of the block input from the predicted image generation unit 308 and the prediction error input from the inverse quantization and inverse transform unit 311 to generate a decoded image of the block.
  • the adder 312 stores the decoded image of the block in the reference picture memory 306, and also outputs it to the loop filter 305.
  • FIG. 15 is a block diagram showing the configuration of the video encoding device 11 according to this embodiment.
  • the video encoding device 11 includes a prediction image generating unit 101, a subtraction unit 102, a transformation/quantization unit 103, an inverse quantization/inverse transformation unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory (prediction parameter storage unit, frame memory) 108, a reference picture memory (reference image storage unit, frame memory) 109, an encoding parameter determining unit 110, a parameter encoding unit 111, a prediction parameter derivation unit 120, and an entropy encoding unit 104.
  • the predicted image generating unit 101 generates a predicted image for each CU, which is an area obtained by dividing each picture of the image T.
  • the predicted image generating unit 101 includes an intra predicted image generating unit 122 and an inter predicted image generating unit 123.
  • the predicted image generating unit 101 operates in the same way as the predicted image generating unit 308 already explained, and so a description thereof will be omitted.
  • the subtraction unit 102 subtracts the pixel values of the predicted image of the block input from the predicted image generation unit 101 from the pixel values of image T to generate a prediction error.
  • the subtraction unit 102 outputs the prediction error to the transformation and quantization unit 103.
  • the transform/quantization unit 103 calculates transform coefficients by frequency transforming the prediction error input from the subtraction unit 102, and derives quantized transform coefficients by quantizing the prediction error.
  • the transform/quantization unit 103 outputs the quantized transform coefficients to the parameter coding unit 111 and the inverse quantization/inverse transform unit 105.
  • the parameter coding unit 111 includes a header coding unit 1110, a CT information coding unit 1111, and a CU coding unit 1112 (prediction mode coding unit).
  • the CU coding unit 1112 further includes a TU coding unit 1114. The general operation of each module is explained below.
  • the header encoding unit 1110 performs encoding processing of parameters such as header information, splitting information, prediction information, and quantization transformation coefficients.
  • the CT information encoding unit 1111 encodes QT, MT (BT, TT) division information, etc.
  • the CU encoding unit 1112 encodes CU information, prediction information, split information, etc.
  • the TU encoding unit 1114 encodes the QP update information and the quantized prediction error.
  • the CT information encoding unit 1111 and the CU encoding unit 1112 supply syntax elements such as inter prediction parameters, intra prediction parameters, and quantized transform coefficients to the parameter encoding unit 111.
  • the entropy coding unit 104 receives the quantized transform coefficients and coding parameters from the parameter coding unit 111. The entropy coding unit 104 entropy codes these to generate and output the coded stream Te.
  • the prediction parameter derivation unit 120 includes an inter-prediction parameter coding unit 112 and an intra-prediction parameter coding unit 113, and derives inter-prediction parameters and intra-prediction parameters from the parameters input from the coding parameter determination unit 110.
  • the derived inter-prediction parameters and intra-prediction parameters are output to the parameter coding unit 111.
  • the intra prediction parameter encoding unit 113 encodes the IntraPredMode and the like input from the encoding parameter determination unit 110.
  • the intra prediction parameter encoding unit 113 includes a part of the same configuration as the configuration in which the intra prediction parameter derivation unit 304 derives intra prediction parameters.
  • the adder 106 generates a decoded image by adding, for each pixel, the pixel value of the predicted block input from the predicted image generation unit 101 and the prediction error input from the inverse quantization and inverse transform unit 105.
  • the adder 106 stores the generated decoded image in the reference picture memory 109.
  • the loop filter 107 applies a deblocking filter, SAO, and ALF to the decoded image generated by the adder 106.
  • SAO: Sample Adaptive Offset
  • ALF: Adaptive Loop Filter
  • the loop filter 107 does not necessarily have to include the above three types of filters, and may be configured, for example, as only a deblocking filter.
  • the prediction parameter memory 108 stores the prediction parameters generated by the encoding parameter determination unit 110 in a predetermined location for each target picture and CU.
  • the reference picture memory 109 stores the decoded image generated by the loop filter 107 in a predetermined location for each target picture and CU.
  • the coding parameter determination unit 110 selects one set from among multiple sets of coding parameters.
  • the coding parameters are the above-mentioned QT, BT or TT division information, prediction parameters, or parameters to be coded that are generated in relation to these.
  • the predicted image generation unit 101 generates a predicted image using these coding parameters.
  • the coding parameter determination unit 110 calculates an RD cost value indicating the amount of information and the coding error for each of the multiple sets.
  • the coding parameter determination unit 110 selects the set of coding parameters that minimizes the calculated cost value.
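  • as a reference point (a standard formulation, not quoted from this document), the RD cost is commonly computed as J = D + λ·R, where D is the coding error (for example the sum of squared differences), R is the number of coded bits, and λ is a Lagrange multiplier; the parameter set minimizing J is selected.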
  • the entropy coding unit 104 outputs the selected set of coding parameters as the coding stream Te.
  • the coding parameter determination unit 110 stores the determined coding parameters in the prediction parameter memory 108.
  • the video decoding device of this embodiment includes a template derivation unit that generates a template image using an image of a template region adjacent to a target block, a reference sample derivation unit that generates a reference sample using an image of a template reference region near the target block, a template predicted image generation unit that generates a template predicted image for a predetermined intra prediction mode candidate using the reference sample, a template cost derivation unit that derives a cost from the template predicted image and the template image, an intra prediction mode selection unit that selects the intra prediction mode of the target block based on the cost, and a parameter encoding unit that encodes a template-based intra mode derivation flag, the template derivation unit specifies the position of the template region by referring to the template-based intra mode derivation flag, and the reference sample derivation unit derives the reference sample by referring to the template-based intra mode derivation flag.
  • a part of the video encoding device 11 and the video decoding device 31 in the above-described embodiments, for example the entropy decoding unit 301, the parameter decoding unit 302, the loop filter 305, the predicted image generation unit 308, the inverse quantization and inverse transform unit 311, the addition unit 312, the predicted image generation unit 101, the subtraction unit 102, the transform and quantization unit 103, the entropy encoding unit 104, the inverse quantization and inverse transform unit 105, the loop filter 107, the encoding parameter determination unit 110, and the parameter encoding unit 111, may be realized by a computer.
  • a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system.
  • the "computer system” referred to here is a computer system built into either the video encoding device 11 or the video decoding device 31, and includes hardware such as an OS and peripheral devices.
  • “computer-readable recording media” refers to portable media such as flexible disks, optical magnetic disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems.
  • “computer-readable recording media” may also include devices that dynamically store a program for a short period of time, such as a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line, and devices that store a program for a certain period of time, such as volatile memory within a computer system that serves as a server or client in such cases.
  • the above-mentioned program may be one that realizes part of the functions described above, or may be one that can realize the functions described above in combination with a program already recorded in the computer system.
  • part or all of the video encoding device 11 and video decoding device 31 in the above-mentioned embodiments may be realized as an integrated circuit such as an LSI (Large Scale Integration).
  • Each functional block of the video encoding device 11 and video decoding device 31 may be individually made into a processor, or part or all of them may be integrated into a processor.
  • the integrated circuit method is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Furthermore, if an integrated circuit technology that can replace LSI appears due to advances in semiconductor technology, an integrated circuit based on that technology may be used.
  • Embodiments of the present invention can be suitably applied to a video decoding device that decodes coded data in which image data has been coded, and a video coding device that generates coded data in which image data has been coded.
  • the present invention can also be suitably applied to the data structure of coded data that is generated by a video coding device and referenced by the video decoding device.
  • 31 Image decoding device
  301 Entropy decoding unit
  302 Parameter decoding unit
  308 Predicted image generation unit
  310 Intra predicted image generation unit
  31047 TIMD prediction unit
  4701 Reference sample derivation unit
  4702 Template derivation unit
  4703 Intra prediction unit
  4711 Intra prediction mode candidate derivation unit
  4712 Template predicted image generation unit
  4713 Template cost derivation unit
  4714 Intra prediction mode selection unit
  311 Inverse quantization and inverse transform unit
  312 Addition unit
  11 Image encoding device
  101 Predicted image generation unit
  102 Subtraction unit
  103 Transform and quantization unit
  104 Entropy encoding unit
  105 Inverse quantization and inverse transform unit
  107 Loop filter
  110 Encoding parameter determination unit
  111 Parameter encoding unit
  1110 Header encoding unit
  1111 CT information encoding unit
  1112 CU encoding unit (prediction mode encoding unit)
  1114 TU encoding unit

Abstract

Conventional prediction-mode derivation of the template-based intra mode derivation (TIMD) type, based on the difference between a template predicted image for each intra prediction mode and a template image, is problematic because a large amount of computation is required for the template image and cost derivations. To address this problem, in TIMD prediction, in order to select an intra prediction mode of a target block based on a cost derived from a template predicted image and a template image, the video decoding device of this example divides prescribed intra prediction mode candidates into a plurality of groups based on prediction direction, generates a template predicted image and evaluates the cost for each intra prediction mode candidate selected from each group, and performs cost derivation and selection for the modes other than the selected candidate in the group to which the intra prediction mode candidate having the lowest cost belongs.
PCT/JP2023/039040 2022-11-28 2023-10-30 Dispositif de décodage vidéo et dispositif de codage vidéo WO2024116691A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2022-189022 2022-11-28
JP2022189022A JP2024077148A (ja) 2022-11-28 2022-11-28 Video decoding device and video encoding device
JP2022208682A JP2024092612A (ja) 2022-12-26 2022-12-26 Video decoding device and video encoding device
JP2022-208682 2022-12-26
JP2023-038230 2023-03-13
JP2023038230A JP2024129196A (ja) 2023-03-13 2023-03-13 Video decoding device and video encoding device

Publications (1)

Publication Number Publication Date
WO2024116691A1 (fr)

Family

ID=91323742

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/039040 WO2024116691A1 (fr) 2022-11-28 2023-10-30 Dispositif de décodage vidéo et dispositif de codage vidéo

Country Status (1)

Country Link
WO (1) WO2024116691A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190166370A1 (en) * 2016-05-06 2019-05-30 Vid Scale, Inc. Method and system for decoder-side intra mode derivation for block-based video coding
US20220224913A1 (en) * 2021-01-13 2022-07-14 Lemon Inc. Techniques for decoding or coding images based on multiple intra-prediction modes
WO2022232784A1 (fr) * 2021-04-26 2022-11-03 Tencent America LLC Prédiction intra basée sur la mise en correspondance de modèles

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 23897347
Country of ref document: EP
Kind code of ref document: A1