WO2022158147A1 - Video encoding device and video encoding method - Google Patents
Video encoding device and video encoding method
- Publication number
- WO2022158147A1 (PCT/JP2021/044716)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- prediction mode
- transform coefficients
- value
- transform
- prediction
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
- H04N19/124—Quantisation
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
- H04N19/18—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Definitions
- the present invention relates to a video encoding device and a video encoding method for encoding moving images.
- Non-Patent Document 1 discloses a video coding method called VVC (Versatile Video Coding).
- VVC is also called ITU-T H.266.
- in VVC, the maximum size of a coding tree unit (CTU: Coding Tree Unit), which is 64 × 64 pixels in H.265/HEVC (High Efficiency Video Coding) (hereinafter simply expressed as 64 × 64), is expanded to 128 × 128.
- here, size means [vertical × horizontal] pixels.
- the encoding process in VVC is performed in units of encoding units (CU: Coding Unit) defined from CTU.
- a CU corresponds to a block generated by dividing a CTU using a quad-tree (QT) structure or a multi-type tree (MTT) structure, or to the CTU itself.
- QT: quad-tree
- MTT: multi-type tree
- in partitioning using a quad-tree structure, the CTU is equally partitioned horizontally and vertically.
- in partitioning using a multi-type tree structure, a CTU is partitioned into two or three in the horizontal or vertical direction.
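As a rough illustration of the two partitioning styles, the following Python sketch computes the child block sizes produced by a single split. The function name `split_block` and the mode labels are hypothetical. The 1:2:1 ratio used for the three-way split is the ratio VVC uses for ternary splits; it is not stated in this text and is included here as an assumption.

```python
def split_block(width, height, mode):
    """Return the (width, height) of each child block for one split.

    The mode labels are illustrative names, not identifiers from the text.
    """
    if mode == "quad":                # equal split in both directions
        return [(width // 2, height // 2)] * 4
    if mode == "binary_vertical":     # one vertical split line: width is halved
        return [(width // 2, height)] * 2
    if mode == "binary_horizontal":   # one horizontal split line: height is halved
        return [(width, height // 2)] * 2
    if mode == "ternary_vertical":    # 1:2:1 split of the width (assumed ratio)
        q = width // 4
        return [(q, height), (2 * q, height), (q, height)]
    if mode == "ternary_horizontal":  # 1:2:1 split of the height (assumed ratio)
        q = height // 4
        return [(width, q), (width, 2 * q), (width, q)]
    raise ValueError(f"unknown split mode: {mode}")


print(split_block(128, 128, "quad"))
print(split_block(64, 64, "ternary_vertical"))
```

In every mode the child blocks tile the parent exactly, so the child areas sum to the parent area.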
- Predictive coding includes intra prediction and inter prediction with motion compensation (hereinafter simply referred to as inter prediction).
- the prediction error of each CU is frequency-transformed into transform coefficients.
- PU Prediction Unit
- TU Transform Unit
- VVC extends the maximum TU size to 64 × 64.
- a discrete cosine transform is mainly used for frequency transformation.
- the energy of the transform coefficients obtained by frequency transform is concentrated in the low frequency region, so that the values of the transform coefficients in the low frequency region are large and the values of the transform coefficients in the high frequency region are small.
- FIG. 7 is an explanatory diagram for explaining DCT, which is an example of frequency conversion.
- the TU size is 64 × 64 (see FIG. 7(a)).
- FIG. 7(b) shows the frequency component distribution (two-dimensional matrix) after DCT.
- the value of the transform coefficient is large in the upper left region (low frequency region).
- the values of transform coefficients are small in the lower right region (high frequency region). That is, most of the frequency components are concentrated in the low frequency region.
- the low-frequency region and the high-frequency region are schematically shown by ellipses rather than strictly.
- in VVC, in order to reduce the amount of calculation, if at least one of the TU width (horizontal size M) and height (vertical size N) exceeds 32, the portion exceeding 32, that is, the transform coefficients of the high-frequency components, is excluded. Therefore, both M and N of a block composed of the transform coefficients are 32 or less. Note that excluding a transform coefficient is equivalent to setting its value to zero.
- FIG. 7(c) shows an example in which 32 × 32 transform coefficients are generated from a 64 × 64 size TU.
- the excluded transform coefficients are the transform coefficients in the high frequency domain as described above.
- the values of the transform coefficients in the high frequency range are generally small, in other words the energy concentration is low. Therefore, even if the transform coefficients in the high-frequency region are excluded, the quality of the image decoded by the video decoding device is not significantly affected (deteriorated).
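The exclusion described above can be sketched in a few lines of Python. This is an illustration only: the function name `zero_out_high_freq` and the list-of-lists representation are assumptions, and the coefficient values are a toy pattern rather than the output of a real transform.

```python
def zero_out_high_freq(coeffs, limit=32):
    """Return a copy of a 2-D coefficient matrix in which every entry
    outside the top-left limit x limit region is set to zero."""
    return [
        [c if (r < limit and col < limit) else 0 for col, c in enumerate(row)]
        for r, row in enumerate(coeffs)
    ]


# Toy 64 x 64 "coefficient" matrix filled with ones (not real DCT output).
block = [[1] * 64 for _ in range(64)]
kept = zero_out_high_freq(block)
nonzero = sum(1 for row in kept for c in row if c != 0)
print(nonzero)  # 1024, i.e. the 32 x 32 low-frequency region
```

Setting the excluded entries to zero, rather than shrinking the matrix, mirrors the statement that excluding a coefficient is equivalent to setting its value to zero.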
- a video encoding device selects an optimum prediction mode from among a large number of prediction modes and performs predictive encoding on a block-by-block basis.
- the video encoding device executes frequency transform processing, quantization processing, inverse quantization processing, inverse frequency transform processing, arithmetic coding processing, and the like on the prediction error signal generated for each usable prediction mode candidate.
- as described above, when the width or height of the TU exceeds 32, the transform coefficients in the portion exceeding 32 are excluded.
- in general, image quality does not degrade much even if transform coefficients in the high frequency region are excluded.
- however, if the values of the transform coefficients in the high frequency region are relatively large, the quality of the image decoded by the video decoding device degrades.
- a prediction mode including the TU on which such a two-dimensional matrix is based is highly unlikely to be selected as the optimum prediction mode.
- nevertheless, a general video encoding device runs the processing for selecting the optimum prediction mode on all candidates, including prediction modes that are highly unlikely to be selected as the optimum prediction mode. That is, the video encoding device executes inverse quantization processing, inverse frequency transform processing, arithmetic coding processing, and the like, even for prediction modes that are highly unlikely to be selected as the optimum prediction mode.
- an object of the present invention is to provide a video encoding device and a video encoding method that reduce the possibility of executing unnecessary processing when the video encoding device selects the optimum prediction mode, thereby shortening the processing time.
- a video encoding device according to the present invention includes: transform means for orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients; quantization means for quantizing the transform coefficients to generate quantized transform coefficients; arithmetic coding means for arithmetically coding the quantized transform coefficients; local decoding means for locally decoding the quantized transform coefficients; prediction mode selection means for evaluating a plurality of prediction mode candidates and selecting an optimum prediction mode for the block to be processed; pre-transform energy calculation means for calculating a first value representing the energy of the prediction error signal of the block to be processed; and post-transform energy calculation means for calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients. When the second value is smaller than the first value by a predetermined degree or more, the prediction mode selection means decides not to make the prediction mode candidate the optimum prediction mode, without the quantization means, the arithmetic coding means, and the local decoding means performing quantization, arithmetic coding, and local decoding.
- a video encoding method according to the present invention includes: orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients; calculating a first value representing the energy of the prediction error signal of the block to be processed; excluding predetermined transform coefficients if at least one of the width and the height of the block to be processed exceeds a predetermined value; calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients; quantizing the transform coefficients to generate quantized transform coefficients; arithmetically coding the quantized transform coefficients; locally decoding the quantized transform coefficients; and evaluating a plurality of prediction mode candidates for the block to be processed and selecting an optimum prediction mode. When the optimum prediction mode is selected, if the second value is smaller than the first value by a predetermined degree or more, it is decided not to make the prediction mode candidate the optimum prediction mode, without performing quantization, arithmetic coding, and local decoding.
- a video encoding program according to the present invention causes a computer to execute: a process of orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients; a process of calculating a first value representing the energy of the prediction error signal of the block to be processed; a process of excluding predetermined transform coefficients if at least one of the width and the height of the block to be processed exceeds a predetermined value; a process of calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients; a process of quantizing the transform coefficients to generate quantized transform coefficients; a process of arithmetically coding the quantized transform coefficients; a process of locally decoding the quantized transform coefficients; and a process of evaluating a plurality of prediction mode candidates for the block to be processed and selecting an optimum prediction mode. When the optimum prediction mode is selected, if the second value is smaller than the first value by a predetermined degree or more, the program causes the computer to decide not to make the prediction mode candidate the optimum prediction mode, without performing quantization, arithmetic coding, and local decoding.
- the processing load on the video encoding device is reduced and the processing time is shortened.
- FIG. 1 is a block diagram showing an example of a computer having a CPU;
- FIG. 1 is a block diagram showing the main parts of a video encoding device;
- FIG. 7 is an explanatory diagram for explaining DCT.
- FIG. 1 is a block diagram showing a configuration example of a video encoding device.
- the video encoding device shown in the figure includes, among other components, an arithmetic encoder 113, a mode determination unit 114, and a code string generation unit 115.
- the prediction unit 110 includes an intra predictor 111 and an inter predictor 112 .
- the video encoding device further includes a control unit 120 having an information loss determination unit 121.
- the block division unit 101 divides a picture into a plurality of CTUs. Furthermore, the block division unit 101 either defines the CTU as a CU without dividing it, or divides the CTU using a quadtree structure, a multi-type tree structure, or both, and defines each resulting block as a CU. In subsequent processes, the CU is treated as a block to be processed. Further, the block division unit 101 defines a CU as a PU as it is without dividing it, or defines a block obtained by dividing a CU as a PU. Similarly, the block division unit 101 defines a CU as a TU as it is without dividing it, or defines a block obtained by dividing a CU as a TU.
- the subtractor 102 subtracts the prediction signal from the input signal (input pixel value) for each block generated by the block division unit 101 to generate a prediction error signal.
- the prediction error signal is also called prediction residual or prediction residual signal.
- the transform unit 103 frequency-transforms the prediction error signal of the block to be processed to obtain transform coefficients.
- the transform unit 103 has multiple types of orthogonal transform functions including type II DCT (DCT-II), selects an appropriate orthogonal transform function according to the size of the block to be processed and the like, and performs the frequency transform using the selected orthogonal transform function.
- DCT-II: type II DCT
- the quantization unit 104 quantizes the transform coefficients into quantized coefficients (transformed quantized values).
- the transform quantized values are used in arithmetic encoder 113 and inverse quantizer 105 .
- the inverse quantization unit 105 restores the transform coefficients by inversely quantizing the transform quantized values.
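The quantization and inverse quantization steps can be pictured with a uniform scalar quantizer. This is only a sketch under stated assumptions: the step size `step`, the rounding rule, and the function names are chosen for illustration, and the quantizer actually used in VVC is more elaborate.

```python
import math


def quantize(coeff, step):
    """Map a transform coefficient to an integer level
    (uniform quantizer, rounding half away from zero; an assumed rule)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(math.floor(abs(coeff) / step + 0.5))


def dequantize(level, step):
    """Restore an approximate coefficient from its integer level."""
    return level * step


coeffs = [103.0, -47.5, 12.2, -3.9, 0.8]   # toy transform coefficients
step = 8.0                                  # hypothetical quantization step
levels = [quantize(c, step) for c in coeffs]
restored = [dequantize(q, step) for q in levels]
print(levels)
print(restored)
```

The restored values differ from the originals by at most half a step, which is why the encoder needs the local decoding path to see the same reconstruction the decoder will see.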
- the inverse transform unit 106 restores the prediction error signal by inverse frequency transforming the transform coefficients based on the orthogonal transform method selected by the transform unit 103 .
- the adder 107 adds the restored prediction error signal and the prediction signal to generate a reconstructed signal (reconstructed image).
- the intra predictor 111, loop filter 108 and mode determination section 114 receive the reconstructed signal.
- a block memory for storing reference blocks in the current picture to be encoded is generally provided in the preceding stage of the prediction unit 110 or in the intra predictor 111, but is omitted from the figure.
- the intra predictor 111 refers to the reference block, performs intra prediction on the encoding target block, and generates a prediction signal (in this case, an intra prediction signal).
- a loop filter 108 includes, for example, a deblocking filter, a sample adaptive offset filter and an adaptive loop filter, and performs appropriate filtering.
- the reconstructed signal filtered by loop filter 108 is input to inter predictor 112 .
- a frame memory for storing reference pictures is generally provided in the preceding stage of the prediction unit 110 or in the inter predictor 112, but is omitted from the figure.
- the inter predictor 112 refers to a reference picture different from the picture to be coded, performs inter prediction on the block to be coded, and generates a prediction signal (in this case, an inter prediction signal).
- the arithmetic encoder 113 generates an encoded signal (code string: bitstream) by arithmetically encoding the transformed quantized values. Arithmetic encoder 113 binarizes the transformed quantized value and arithmetically encodes the binary signal to generate a binary arithmetic code.
- the mode determination unit 114 selects the optimum prediction mode.
- the code string generator 115 selects the binary arithmetic code in the optimum prediction mode and outputs it as a bitstream. For example, the bitstream is transmitted to an image decoding device. The bitstream may be output to and stored on a storage medium (not shown).
- the information loss determination unit 121 evaluates the degree of information loss when the prediction error signal is frequency-converted in each prediction mode when multiple types of prediction mode candidates are evaluated in order for the block to be processed. Evaluating the degree of information loss corresponds to evaluating the degree of energy bias toward the low-frequency region.
- the inverse quantization unit 105, the inverse transform unit 106, and the adder 107 may be collectively referred to as a local decoding unit.
- as described above, when the width or height of the TU exceeds 32, the transform coefficients in the portion exceeding 32 are excluded.
- the portion exceeding 32 corresponds to the high frequency region.
- if the degree of energy concentration in the low-frequency components is not so large, the image quality after decoding deteriorates when the transform coefficients in the high-frequency region are excluded. In other words, the amount of information contained in the original image is reduced by removing the transform coefficients in the high-frequency region during the frequency transform. That is, information is lost.
- the information loss determination unit 121 utilizes the fact that the energy of the signal is preserved (does not change) before and after the orthogonal transform (for example, DCT) as the frequency transform performed by the transform unit 103 .
- the information loss determination unit 121 calculates a value representing the energy of the prediction error signal and a value representing the energy of the transform coefficients. Then, the information loss determination unit 121 compares the two. Note that, for the transform coefficients, the information loss determination unit 121 calculates a value representing the energy of the region from which the high-frequency region (the region exceeding 32 in the TU width or height) has been excluded, that is, the region where the TU width and height are 32 or less. In other words, it calculates a value representing the energy of the low-frequency components.
- when the information loss determination unit 121 determines, while a prediction mode candidate is being evaluated, that the energy value of the transform coefficients is smaller than the energy value of the prediction error signal by a predetermined degree or more, it performs control to terminate the evaluation of that prediction mode.
- specifically, letting T be the sum of squares of the prediction error signal and S be the sum of squares of the retained transform coefficients, the information loss determination unit 121 performs control to terminate the prediction mode evaluation when the value of S/T is equal to or less than a predetermined threshold value (th).
- the threshold th is predetermined. For example, when it is required to shorten the processing time, a large value is used as the threshold th.
- determination formulas other than S/T may be used as the information loss determination formula.
- for example, √S/√T may be used as the determination formula.
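The S/T threshold test can be written as a small predicate. This is a minimal sketch: the function name `should_terminate` and the handling of a zero-energy residual are assumptions, not details given in the text.

```python
def should_terminate(s, t, th):
    """Return True when the candidate should be dropped early (S/T <= th).

    s:  energy of the retained (low-frequency) transform coefficients
    t:  energy of the prediction error signal
    th: predetermined threshold
    """
    if t == 0:
        # No residual energy at all, so nothing can be lost (assumed handling).
        return False
    return s / t <= th


print(should_terminate(s=400.0, t=1000.0, th=0.5))  # True
print(should_terminate(s=990.0, t=1000.0, th=0.5))  # False
```

Raising `th` makes the test fire for more candidates, which matches the remark that a large threshold is used when shorter processing time is required.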
- the video encoding device includes a storage unit (not shown) that stores a table in which data that can specify each of a plurality of types of prediction mode candidates is set.
- the control unit 120 sets the prediction mode to be evaluated in the prediction unit 110 .
- regarding intra prediction, the following prediction modes (see Non-Patent Document 1) may be added as prediction mode candidates.
- the following prediction modes (see Non-Patent Document 1) may be added as prediction mode candidates.
- the control unit 120 selects one prediction mode from a plurality of evaluation target prediction modes (prediction mode candidates) (step S100).
- the intra predictor 111 or the inter predictor 112 generates a prediction signal for the block input from the block division unit 101 (step S101). Also, the subtractor 102 generates a prediction error signal (step S102).
- the transform unit 103 frequency-transforms the prediction error signal to generate transform coefficients (step S103). Note that when at least one of the width and height of the TU exceeds 32, the transform unit 103 excludes the transform coefficients in the portion exceeding 32 (that is, the high frequency region). That is, assuming a two-dimensional matrix whose elements are transform coefficients, the transform result of the transform unit 103 has 32 or less rows and columns.
- the transform unit 103 may exclude the transform coefficients in the high-frequency region from the transform result.
- the quantization unit 104 may quantize the transform coefficients in the region of 32 or less rows and columns, and discard the other transform coefficients.
- when neither the width nor the height of the TU exceeds 32, the process proceeds to step S107 (step S104).
- the information loss determination unit 121 calculates the sum of squares T of the prediction error signal and the sum of squares S of the transform coefficients (those in the region of the two-dimensional matrix where both rows and columns are 32 or less) (step S105).
- the information loss determination unit 121 compares the value of S/T with the threshold value th (step S106).
- if the value of S/T is equal to or less than the threshold value th, the process returns to step S100, and the control unit 120 selects the next prediction mode from the table. Therefore, the candidate prediction mode under evaluation is determined not to be the optimum prediction mode at this stage. If the value of S/T is greater than the threshold value th, the process proceeds to step S107.
- a determination formula other than S/T may be used as the determination formula for information loss. For example, when √S/√T is used as the determination formula, the control unit 120 compares the value of √S/√T with the threshold value th in the process of step S106.
- in step S107, the quantization unit 104 quantizes the transform coefficients from the transform unit 103 to generate transform quantized values.
- Inverse quantization section 105 and arithmetic encoder 113 receive transform quantized values.
- the inverse quantization unit 105 inversely quantizes the transform quantized values to restore the transform coefficients, and the inverse transform unit 106 performs an inverse frequency transform on the restored transform coefficients to restore the prediction error signal (step S108).
- the arithmetic encoder 113 arithmetically encodes the transformed quantized values to generate an encoded signal (step S109).
- the mode determination unit 114 determines coding efficiency as an evaluation result. Coding efficiency means how much coding can be performed with a small amount of code and with little deterioration of image quality.
- the mode determination unit 114 determines, for example, the coding efficiency using the coding cost J represented by Equation (1) below as an index; the smaller the coding cost J, the higher the coding efficiency.
- $J = D + \lambda R$ (1)
- D represents coding distortion, for example, the sum of squares of the difference between the original image (input signal) and the reconstructed image (reconstructed signal).
- R is, for example, the amount of code generated by the arithmetic encoder 113.
- $\lambda$ is a Lagrangian multiplier determined based on quantization parameters and the like.
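The cost comparison can be illustrated as follows, assuming the usual Lagrangian form J = D + λR, which matches the definitions of D, R, and the multiplier given here. The mode names, distortion values, code amounts, and the multiplier value are all made up for the example.

```python
def rd_cost(distortion, rate, lam):
    """Coding cost J = D + lambda * R (assumed Lagrangian form)."""
    return distortion + lam * rate


# Hypothetical candidates: name -> (distortion D, code amount R in bits).
candidates = {
    "mode_a": (1200.0, 96),
    "mode_b": (900.0, 160),
    "mode_c": (1500.0, 40),
}
lam = 4.0  # hypothetical Lagrangian multiplier

costs = {name: rd_cost(d, r, lam) for name, (d, r) in candidates.items()}
best = min(costs, key=costs.get)
print(costs)
print(best)
```

With these numbers the candidate with the lowest distortion wins, but a larger multiplier would shift the choice toward candidates that spend fewer bits.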
- the mode determination unit 114 may use an index other than Equation (1) as an index for determining the coding efficiency. As an example, the mode determination unit 114 may use only one of R and D. If only D is used, the arithmetic coding process (the process of step S109) is unnecessary. Also, the mode determination unit 114 may use, for example, a cumulative sum of prediction error signals instead of the sum of squares of the difference between the original image (input signal) and the reconstructed image (reconstructed signal). The mode determination unit 114 may use the amount of code input to the arithmetic encoder 113 or the amount of code estimated by some method instead of the amount of code generated by the arithmetic encoder 113.
- when the evaluation (the processing of steps S101 to S110) has been completed for all the prediction mode candidates set in the table, the process ends (step S111). If unevaluated prediction modes remain, the process returns to step S100.
- the mode determination unit 114 temporarily stores the coding cost (the index of coding efficiency) of each prediction mode candidate in the process of step S110.
- the prediction mode exhibiting the lowest coding cost among the stored values, that is, the highest coding efficiency, is determined as the prediction mode to be used in the actual encoding process.
- the mode determination unit 114 may store only the minimum coding cost and the prediction mode that exhibits it, instead of storing the coding costs of all prediction mode candidates. In that case, in the processing of step S110, when the coding cost calculated at that time is smaller than the stored coding cost, the mode determination unit 114 updates the stored coding cost and prediction mode with the calculated coding cost and the prediction mode that exhibits it.
- in this way, the mode determination unit 114 selects, for example, the prediction mode candidate exhibiting the highest coding efficiency as the optimum prediction mode.
- the video encoding device uses the optimum prediction mode to perform the actual encoding process (the process of generating the output bitstream).
- when at least one of the width and height of the TU exceeds a predetermined size, the video encoding device performs control to exclude the transform coefficients of the region exceeding the predetermined size.
- the video encoding device does not perform local decoding by the local decoding unit when it is determined that the information loss in the frequency transform by the transform unit 103 is large, so the processing load when determining the optimum prediction mode becomes smaller. As a result, the processing time is shortened.
- when the value of S/T is equal to or less than the threshold value th in the process of step S106 (that is, when it is determined that the information loss in the frequency transform by the transform unit 103 is large), the process proceeds to step S100. Therefore, quantization by the quantization unit 104 and arithmetic coding by the arithmetic encoder 113 are not performed either. This also reduces the processing load when determining the optimum prediction mode, and shortens the processing time.
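The overall candidate loop with early termination can be sketched as follows. This is an illustration only: `select_mode`, the `(name, T, S)` tuples, and the `full_evaluate` callback (standing in for quantization, local decoding, arithmetic coding, and cost calculation in steps S107 to S110) are hypothetical, and the energies and costs are invented numbers.

```python
def select_mode(candidates, th, full_evaluate):
    """Pick the lowest-cost mode, skipping the heavy steps for lossy candidates.

    candidates:    iterable of (name, T, S), where T is the residual energy and
                   S is the energy of the retained transform coefficients
    th:            threshold compared against S/T (step S106)
    full_evaluate: callable(name) -> coding cost; stands in for steps S107-S110
    """
    best_name, best_cost = None, float("inf")
    skipped = []
    for name, t, s in candidates:
        if t > 0 and s / t <= th:
            skipped.append(name)      # large information loss: back to step S100
            continue
        cost = full_evaluate(name)    # quantize, locally decode, encode, score
        if cost < best_cost:          # keep only the running minimum
            best_name, best_cost = name, cost
    return best_name, skipped


# Hypothetical candidates and precomputed costs.
candidates = [
    ("intra_dc", 1000.0, 990.0),
    ("intra_planar", 1000.0, 400.0),
    ("inter", 800.0, 790.0),
]
costs = {"intra_dc": 52.0, "intra_planar": 10.0, "inter": 47.5}
calls = []


def full_evaluate(name):
    calls.append(name)                # record which modes paid the full cost
    return costs[name]


best, skipped = select_mode(candidates, th=0.5, full_evaluate=full_evaluate)
print(best, skipped, calls)
```

Note that the skipped candidate is never scored at all, so the saving comes with the assumption that a candidate losing that much energy would not have been chosen anyway.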
- when at least one of the width and height of the TU exceeds a predetermined value, the inverse transform unit 106 sets 0 as the transform coefficients in the region exceeding the predetermined value among the transform coefficients restored by the inverse quantization unit 105, and then performs the inverse frequency transform.
- FIG. 3 is a block diagram showing a configuration example of the transform unit 103. The transform unit 103 shown in FIG. 3 includes a horizontal frequency transformer 131, a horizontal high-frequency component removing unit 132, a first right bit shifter 133, a vertical frequency transformer 134, a vertical high-frequency component removing unit 135, and a second right bit shifter 136.
- the transformation unit 103 executes DCT with integer precision. Let M be the width (horizontal size) and N be the height (vertical size) of the TU.
- the horizontal frequency transformer 131 horizontally frequency-transforms the prediction error signal using a predetermined basis to generate transform coefficients. That is, one-dimensional DCT is performed in the horizontal direction.
- the horizontal high-frequency component removing unit 132 removes transform coefficients in areas exceeding 32.
- the first right bit shifter 133 shifts the transform coefficients rightward by log2 N − 1 bits.
- the vertical frequency transformer 134 vertically frequency-transforms the prediction error signal using a predetermined basis to generate transform coefficients. That is, one-dimensional DCT is performed in the vertical direction.
- the vertical high-frequency component removing unit 135 removes transform coefficients in areas exceeding 32. Note that when N ≤ 32, the processing of the vertical high-frequency component removing unit 135 is skipped.
- the second right bit shifter 136 bit-shifts the transform coefficients rightward by log 2 N + 6 bits.
- the transformation unit 103 executes DCT (two-dimensional DCT) by sequentially executing horizontal one-dimensional DCT and vertical one-dimensional DCT. Note that the transform unit 103 may first perform a one-dimensional DCT in the vertical direction. Also in the first embodiment (Embodiment 1), the conversion unit 103 executes the above process.
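The two-stage structure described above (horizontal one-dimensional DCT, removal of high-frequency columns, vertical one-dimensional DCT, removal of high-frequency rows) can be sketched as below. Note the substitution: this sketch uses a floating-point orthonormal DCT-II in place of the integer-precision basis, so the two right bit shifters 133 and 136 have no counterpart here. The function names and the `limit` parameter are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence (floating point, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def transform_with_zero_out(block, limit=32):
    """block: list of N rows, each with M samples. Returns N x M coefficients."""
    n_rows, n_cols = len(block), len(block[0])
    # Horizontal 1-D DCT on every row, then zero the columns at index >= limit.
    rows = [dct_1d(r) for r in block]
    rows = [[c if j < limit else 0.0 for j, c in enumerate(r)] for r in rows]
    # Vertical 1-D DCT on every column, then zero the rows at index >= limit.
    cols = [dct_1d([rows[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    return [[cols[j][i] if i < limit else 0.0 for j in range(n_cols)]
            for i in range(n_rows)]


# Tiny demonstration with limit=2 on a 4x4 block instead of 32 on a 64x64 block.
coeffs = transform_with_zero_out([[1.0] * 4 for _ in range(4)], limit=2)
```

With `limit=32` and a 64×64 block, only the top-left 32×32 coefficients can be non-zero, which corresponds to the removal performed by units 132 and 135.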
- the information loss determination unit 121 calculates the sum of squares S of the transform coefficients output from the transform unit 103 . Specifically, the information loss determination unit 121 calculates the sum of squares S of the transform coefficients output by the second right bit shifter 136 . In the second embodiment (embodiment 2), the information loss determination unit 121 calculates the sum of squares S of the horizontal one-dimensional DCT transform coefficients output from the first right bit shifter 133 . When the transform unit 103 is configured to execute the vertical one-dimensional DCT first, the information loss determination unit 121 calculates the square sum S of transform coefficients of the vertical one-dimensional DCT.
- the information loss determination unit 121 calculates the sum of squares S of the transform coefficients of the one-dimensional DCT.
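The quantities compared in the information loss determination can be written out as follows. This is a sketch under assumptions: e(x, y) denotes the prediction error signal of an M×N TU, c(u, v) the transform coefficients, and R the set of coefficient positions that remain after the high-frequency region is removed. These symbols are introduced here for illustration, and T is taken to be the sum of squares of the prediction error signal, as in the claims.

```latex
T = \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} e(x, y)^2 ,
\qquad
S = \sum_{(u, v) \in R} c(u, v)^2 ,
\qquad
\text{large information loss} \iff \frac{S}{T} \le th .
```

If the transform is orthonormal, the sum of squares over all coefficients equals T, so S/T lies between 0 and 1 and 1 − S/T is the fraction of energy discarded. With an integer-precision DCT and right bit shifts, the two sums would first have to be brought to a common scale.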
- the configuration of the video encoding device is basically the same as the configuration illustrated in FIG.
- when the transform unit 103 is configured as shown in FIG. 3 and the horizontal TU size exceeds 32, the horizontal one-dimensional DCT is performed and high frequency components are excluded. If the degree of information loss is already large at that stage, the prediction mode being evaluated will not be selected as the optimum prediction mode. That is, there is no significance in performing the vertical one-dimensional DCT in addition to the horizontal one-dimensional DCT.
- the process for evaluating the prediction mode is terminated at an earlier stage, so the processing load when determining the optimum prediction mode becomes smaller. As a result, processing time is further reduced.
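The idea of checking the information loss after only the first one-dimensional DCT can be sketched as follows. As an assumption, a floating-point orthonormal DCT-II replaces the integer-precision DCT, so that S and T are directly comparable; `ratio_after_first_stage` and `limit` are hypothetical names.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence (floating point)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def ratio_after_first_stage(block, limit=32):
    """S/T computed from the horizontal 1-D DCT alone.

    T: sum of squares of the prediction error samples.
    S: sum of squares of the horizontal coefficients at index < limit.
    """
    T = sum(v * v for row in block for v in row)
    if T == 0:
        return 1.0  # assumption: treat a zero residual as lossless
    S = 0.0
    for row in block:
        coeffs = dct_1d(row)
        S += sum(c * c for c in coeffs[:limit])
    return S / T


# A flat row keeps all of its energy; an alternating row loses most of it
# when only the two lowest horizontal frequencies are retained.
flat = ratio_after_first_stage([[1.0, 1.0, 1.0, 1.0]], limit=2)
alternating = ratio_after_first_stage([[1.0, -1.0, 1.0, -1.0]], limit=2)
```

If the ratio is already at or below the threshold at this point, the vertical one-dimensional DCT for that candidate can be skipped.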
- FIG. 4 is a flow chart showing operations related to evaluation of prediction mode candidates of the video encoding device of the third embodiment (Embodiment 3).
- the configuration of the video encoding device of the third embodiment is the same as the configuration illustrated in FIG. 1.
- in the embodiments described above, the threshold th to be compared with the value of S/T is a fixed value, but in the third embodiment the threshold th is a variable value.
- the threshold th is set according to one or more of the prediction method (e.g., intra prediction/inter prediction, unidirectional prediction/bidirectional prediction), the TU size, and the prediction mode.
- the control unit 120 lowers the threshold th corresponding to the prediction mode so that a specific prediction mode is more likely to be selected when encoding P slices or B slices.
- the degree of information loss when high frequency regions are excluded is greater than in 64×32 blocks and 32×64 blocks. may be set according to the area of
- the control unit 120 sets a threshold according to the TU size or the prediction mode (step S101).
- the control unit 120 may set the threshold according to the size of the TU and the prediction mode.
- the control unit 120 need not execute the process of step S101 each time after executing the process of step S100; it may instead set the threshold only once, at the start of the process shown in FIG.
- the threshold th is set according to the prediction method, TU size, or prediction mode, so an appropriate threshold can be used according to the encoding situation.
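A variable threshold of this kind could be organized as a simple lookup, as in the sketch below. Every numeric value, the table names, and the mode label `"merge"` are illustrative placeholders rather than values from the source. The sketch only shows the mechanism of choosing th from the TU size and adjusting it per prediction mode.

```python
# All numbers below are illustrative placeholders, not values from the source.
SIZE_THRESHOLDS = {(64, 64): 0.6, (64, 32): 0.5, (32, 64): 0.5}
MODE_OFFSETS = {"merge": -0.1}  # a lower threshold makes a mode less likely to be skipped


def threshold_for(width: int, height: int, mode: str, default: float = 0.5) -> float:
    """Return the threshold th for a TU of the given size and prediction mode."""
    th = SIZE_THRESHOLDS.get((width, height), default)
    th += MODE_OFFSETS.get(mode, 0.0)
    return min(max(th, 0.0), 1.0)  # keep th within the range of S/T


th_large = threshold_for(64, 64, "planar")
th_merge = threshold_for(64, 32, "merge")
```

Because the lookup depends only on the TU size and the mode, it can be evaluated once per TU or once per candidate, matching the two variants described above.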
- in VVC (Versatile Video Coding), SBT (Sub-block Transform) can be used. SBT is a method of dividing a block into two sub-blocks in the horizontal or vertical direction and performing frequency conversion on only one of the sub-blocks. All prediction error signals in the other sub-block are replaced with zeros. Since information loss also occurs in SBT, it is conceivable to apply each of the above embodiments.
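Applied to SBT, the same energy comparison could be made directly on the prediction error signal, since the discarded sub-block is zeroed before the transform. The sketch below assumes an even split into two halves; the function name and the labels `"left_right"` and `"top_bottom"` are hypothetical.

```python
def sbt_retained_ratio(residual, split="left_right", keep=0):
    """Fraction of residual energy inside the sub-block that is kept.

    residual: list of rows of prediction error samples.
    split:    "left_right" or "top_bottom" (assumed even halves).
    keep:     0 for the first half, 1 for the second half.
    """
    h, w = len(residual), len(residual[0])
    total = sum(v * v for row in residual for v in row)
    if total == 0:
        return 1.0  # assumption: a zero residual loses nothing
    if split == "left_right":
        lo, hi = (0, w // 2) if keep == 0 else (w // 2, w)
        kept = sum(row[j] * row[j] for row in residual for j in range(lo, hi))
    else:
        lo, hi = (0, h // 2) if keep == 0 else (h // 2, h)
        kept = sum(v * v for row in residual[lo:hi] for v in row)
    return kept / total


residual = [[3.0, 0.0], [4.0, 0.0]]
left = sbt_retained_ratio(residual, "left_right", keep=0)
right = sbt_retained_ratio(residual, "left_right", keep=1)
top = sbt_retained_ratio(residual, "top_bottom", keep=0)
```

A candidate whose retained ratio is at or below the threshold could then be dropped before quantization, in the same way as for the high-frequency removal.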
- in VVC, LFNST (Low-Frequency Non-Separable Transform) can be used. LFNST is a method of re-transforming transform coefficients using an orthogonal transform matrix defined for LFNST when encoding by intra prediction. Up to 48 coefficients are subject to retransformation. All coefficients other than those to be retransformed (976 coefficients in the case of 32×32) are set to 0. Because the coefficients of the high-frequency components are thereby excluded, information loss also occurs in LFNST, and it is conceivable to apply each of the above-described embodiments.
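The coefficient count quoted above for the 32×32 case follows from simple arithmetic:

```latex
32 \times 32 = 1024, \qquad 1024 - 48 = 976, \qquad \frac{976}{1024} \approx 95.3\% .
```

In other words, when the full 48 coefficients are retransformed, roughly 95% of the coefficient positions of a 32×32 block are set to 0.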
- the video encoding device of each of the above embodiments can be configured with individual hardware circuits or integrated circuits, but it can also be realized with a computer having a processor such as a CPU (Central Processing Unit) and memory.
- each function may be realized by storing a program for implementing the method (processing) in the above embodiments in a storage device (storage medium) and executing the program by a CPU.
- FIG. 5 is a block diagram showing an example of a computer having a CPU.
- the computer is implemented in the video encoding device.
- the CPU 1000 implements each function in the above embodiments by executing processes according to programs stored in the storage device 1001. That is, in the video encoding device shown in FIG., the CPU 1000 realizes the functions of units including the prediction unit 110 (the intra predictor 111 and the inter predictor 112), the arithmetic encoder 113, the mode determination unit 114, the code string generation unit 115, and the control unit 120 including the information loss determination unit 121.
- the storage device 1001 is, for example, a non-transitory computer readable medium.
- Non-transitory computer readable media include various types of tangible storage media. Specific examples of non-transitory computer-readable media include magnetic recording media (e.g., hard disks), CD-ROMs (Compact Disc-Read Only Memory), CD-Rs (Compact Disc-Recordable), CD-R/Ws (Compact Disc-ReWritable), and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), and flash ROM).
- the program may also be stored on various types of transitory computer readable medium.
- a program is supplied to the computer through a transitory computer-readable medium, for example, via a wired or wireless communication path, i.e., via an electrical signal, an optical signal, or an electromagnetic wave.
- the memory 1002 is, for example, RAM (Random Access Memory), and is storage means for temporarily storing data when the CPU 1000 executes processing.
- a mode in which a program held by the storage device 1001 or a transitory computer-readable medium is transferred to the memory 1002 and the CPU 1000 executes processing based on the program in the memory 1002 is also conceivable.
- FIG. 6 is a block diagram showing the main parts of the video encoding device.
- the video encoding device 10 shown in FIG. 6 includes: transform means 11 (implemented by the transform unit 103 in the embodiment) that orthogonally transforms a prediction error signal of a block to be processed (for example, a TU to be processed) to generate transform coefficients; quantization means 12 (implemented by the quantization unit 104 in the embodiment) that quantizes the transform coefficients to generate quantized transform coefficients; arithmetic coding means 13 that arithmetically encodes the quantized transform coefficients; local decoding means 14 that locally decodes the quantized transform coefficients; prediction mode selection means 17 (realized in part by the information loss determination unit 121 in the embodiment) that evaluates a plurality of prediction mode candidates for the block to be processed and selects the optimum prediction mode; pre-transformation energy calculation means 15 that calculates a first value representing the energy of the prediction error signal of the block to be processed; and post-transformation energy calculation means 16 (realized by the information loss determination unit 121 in the embodiment) that, when at least one of the width and height of the block to be processed exceeds a predetermined value (e.g., 32), excludes predetermined transform coefficients and calculates a second value representing the energy of the transform coefficients other than the excluded ones. When the second value is smaller than the first value by a predetermined degree or more (for example, S/T ≤ th), the quantization means 12, the arithmetic coding means 13, and the local decoding means 14 do not perform quantization, arithmetic coding, and local decoding, and the prediction mode selection means 17 decides not to set the prediction mode candidate as the optimum prediction mode.
- the video encoding device 10 may include threshold setting means (implemented by the control unit 120 in the embodiment) that sets a threshold according to one or more of the prediction mode and the size of the block to be processed.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
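FIG. 1 is a block diagram showing a configuration example of the video encoding device. The video encoding device shown in FIG. 1 includes a block division unit 101, a subtractor 102, a transform unit 103, a quantization unit 104, an inverse quantization unit 105, an inverse transform unit 106, an adder 107, a loop filter 108, a prediction unit 110, an arithmetic encoder 113, a mode determination unit 114, and a code string generation unit 115. The prediction unit 110 includes an intra predictor 111 and an inter predictor 112.
・Planar prediction
・Each of the angular predictions
・Merge coding
・MIP (Matrix-based Intra Prediction)
・GPM (Geometric Partitioning Mode)
・CIIP (Combined inter merge / intra prediction)
FIG. 3 is a block diagram showing a configuration example of the transform unit 103. The transform unit 103 shown in FIG. 3 includes a horizontal frequency transformer 131, a horizontal high-frequency component removing unit 132, a first right bit shifter 133, a vertical frequency transformer 134, a vertical high-frequency component removing unit 135, and a second right bit shifter 136.
FIG. 4 is a flowchart showing operations related to the evaluation of prediction mode candidates in the video encoding device of the third embodiment (Embodiment 3). The configuration of the video encoding device of the third embodiment is the same as the configuration illustrated in FIG. 1.
In VVC, SBT (Sub-block Transform) can be used. SBT is a method of dividing a block into two sub-blocks in the horizontal or vertical direction and performing frequency conversion on only one of the sub-blocks. All prediction error signals in the other sub-block are replaced with 0. Since information loss also occurs in SBT, it is conceivable to apply each of the above embodiments.
In VVC, LFNST (Low-Frequency Non-Separable Transform) can be used. LFNST is a method of re-transforming transform coefficients using an orthogonal transform matrix defined for LFNST when encoding by intra prediction. Up to 48 coefficients are subject to retransformation. All coefficients other than those to be retransformed (976 coefficients in the case of 32×32) are set to 0. Because coefficients of the high-frequency components are thereby excluded, information loss also occurs in LFNST, and it is conceivable to apply each of the above embodiments.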
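11 Transform means
12 Quantization means
13 Arithmetic coding means
14 Local decoding means
15 Pre-transformation energy calculation means
16 Post-transformation energy calculation means
17 Prediction mode selection means
101 Block division unit
102 Subtractor
103 Transform unit
104 Quantization unit
105 Inverse quantization unit
106 Inverse transform unit
107 Adder
108 Loop filter
110 Prediction unit
111 Intra predictor
112 Inter predictor
113 Arithmetic encoder
114 Mode determination unit
115 Code string generation unit
120 Control unit
121 Information loss determination unit
131 Horizontal frequency transformer
132 Horizontal high-frequency component removing unit
133 First right bit shifter
134 Vertical frequency transformer
135 Vertical high-frequency component removing unit
136 Second right bit shifter
1000 CPU
1001 Storage device
1002 Memory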
Claims (10)
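- A video encoding device having an intra prediction function and an inter prediction function, comprising:
transform means for orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients;
quantization means for quantizing the transform coefficients to generate quantized transform coefficients;
arithmetic coding means for arithmetically encoding the quantized transform coefficients;
local decoding means for locally decoding the quantized transform coefficients;
prediction mode selection means for evaluating a plurality of prediction mode candidates for the block to be processed and selecting an optimum prediction mode;
pre-transformation energy calculation means for calculating a first value representing the energy of the prediction error signal of the block to be processed; and
post-transformation energy calculation means for, when at least one of the width and height of the block to be processed exceeds a predetermined value, excluding predetermined transform coefficients and calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients,
wherein, when the second value is smaller than the first value by a predetermined degree or more, the quantization means, the arithmetic coding means, and the local decoding means do not execute the quantization, the arithmetic coding, and the local decoding, and the prediction mode selection means decides not to set the prediction mode candidate as the optimum prediction mode.
- The video encoding device according to claim 1, wherein the pre-transformation energy calculation means calculates the sum of squares of the prediction error signal as the first value, and
the post-transformation energy calculation means calculates the sum of squares of the transform coefficients as the second value.
- The video encoding device according to claim 1 or claim 2, wherein the transform means sequentially executes a horizontal discrete cosine transform and a vertical discrete cosine transform of the block to be processed, and
the post-transformation energy calculation means calculates the second value from the transform coefficients obtained when the discrete cosine transform executed first is completed.
- The video encoding device according to any one of claims 1 to 3, wherein the predetermined value is 32.
- The video encoding device according to any one of claims 1 to 4, wherein the prediction mode selection means determines whether or not the ratio of the second value to the first value is equal to or less than a threshold.
- The video encoding device according to any one of claims 1 to 5, wherein the threshold is a fixed value.
- The video encoding device according to any one of claims 1 to 5, further comprising threshold setting means for setting the threshold according to one or more of the prediction mode and the size of the block to be processed.
- The video encoding device according to any one of claims 1 to 7, wherein the means for realizing the local decoding function includes inverse quantization means for inversely quantizing the quantized transform values to restore the transform coefficients, and inverse transform means for performing an inverse frequency transform on the restored transform coefficients, and
the inverse transform means performs the inverse frequency transform after setting predetermined transform coefficients to 0 when at least one of the width and height of the block to be processed exceeds a predetermined value.
- A video encoding method for executing intra prediction and inter prediction, comprising:
orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients;
calculating a first value representing the energy of the prediction error signal of the block to be processed;
when at least one of the width and height of the block to be processed exceeds a predetermined value, excluding predetermined transform coefficients and calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients;
quantizing the transform coefficients to generate quantized transform coefficients;
arithmetically encoding the quantized transform coefficients;
locally decoding the quantized transform coefficients;
evaluating a plurality of prediction mode candidates for the block to be processed and selecting an optimum prediction mode; and
when selecting the optimum prediction mode, if the second value is smaller than the first value by a predetermined degree or more, deciding not to set the prediction mode candidate as the optimum prediction mode, without executing the quantization, the arithmetic coding, and the local decoding.
- A computer-readable recording medium storing a video encoding program for causing a computer that executes intra prediction and inter prediction to execute:
a process of orthogonally transforming a prediction error signal of a block to be processed to generate transform coefficients;
a process of calculating a first value representing the energy of the prediction error signal of the block to be processed;
a process of, when at least one of the width and height of the block to be processed exceeds a predetermined value, excluding predetermined transform coefficients and calculating a second value representing the energy of the transform coefficients other than the excluded transform coefficients;
a process of quantizing the transform coefficients to generate quantized transform coefficients;
a process of arithmetically encoding the quantized transform coefficients;
a process of locally decoding the quantized transform coefficients; and
a process of evaluating a plurality of prediction mode candidates for the block to be processed and selecting an optimum prediction mode,
the program further causing the computer, when selecting the optimum prediction mode, if the second value is smaller than the first value by a predetermined degree or more, to decide not to set the prediction mode candidate as the optimum prediction mode, without executing the quantization, the arithmetic coding, and the local decoding.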
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/273,150 US20240107009A1 (en) | 2021-01-25 | 2021-12-06 | Video coding device and video coding method |
JP2022577019A JP7473017B2 (ja) | 2021-01-25 | 2021-12-06 | Video coding device and video coding method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021-009539 | 2021-01-25 | ||
JP2021009539 | 2021-01-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022158147A1 true WO2022158147A1 (ja) | 2022-07-28 |
Family
ID=82549734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/044716 WO2022158147A1 (ja) | Video coding device and video coding method | 2021-12-06 | 2021-12-06 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240107009A1 (ja) |
JP (1) | JP7473017B2 (ja) |
WO (1) | WO2022158147A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160373739A1 (en) * | 2015-06-16 | 2016-12-22 | Microsoft Technology Licensing, Llc | Intra/inter decisions using stillness criteria and information from previous pictures |
JP2017513342A (ja) * | 2014-03-17 | 2017-05-25 | Qualcomm Incorporated | Systems and methods for low-complexity forward transforms using zeroed-out coefficients |
US20190387241A1 (en) * | 2018-06-03 | 2019-12-19 | Lg Electronics Inc. | Method and apparatus for processing video signals using reduced transform |
-
2021
- 2021-12-06 WO PCT/JP2021/044716 patent/WO2022158147A1/ja active Application Filing
- 2021-12-06 JP JP2022577019A patent/JP7473017B2/ja active Active
- 2021-12-06 US US18/273,150 patent/US20240107009A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017513342A (ja) * | 2014-03-17 | 2017-05-25 | Qualcomm Incorporated | Systems and methods for low-complexity forward transforms using zeroed-out coefficients |
US20160373739A1 (en) * | 2015-06-16 | 2016-12-22 | Microsoft Technology Licensing, Llc | Intra/inter decisions using stillness criteria and information from previous pictures |
US20190387241A1 (en) * | 2018-06-03 | 2019-12-19 | Lg Electronics Inc. | Method and apparatus for processing video signals using reduced transform |
Non-Patent Citations (2)
Title |
---|
BROSS BENJAMIN; WANG YE-KUI; YE YAN; LIU SHAN; CHEN JIANLE; SULLIVAN GARY J.; OHM JENS-RAINER: "Overview of the Versatile Video Coding (VVC) Standard and its Applications", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 31, no. 10, 2 August 2021 (2021-08-02), USA, pages 3736 - 3764, XP011880906, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2021.3101953 * |
J. CHEN, Y. YE, S. KIM: "Algorithm description for Versatile Video Coding and Test Model 10 (VTM 10)", 19. JVET MEETING; 20200622 - 20200701; TELECONFERENCE; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 12 August 2020 (2020-08-12), pages 1 - 97, XP030289619 * |
Also Published As
Publication number | Publication date |
---|---|
US20240107009A1 (en) | 2024-03-28 |
JPWO2022158147A1 (ja) | 2022-07-28 |
JP7473017B2 (ja) | 2024-04-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21921249 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022577019 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18273150 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21921249 Country of ref document: EP Kind code of ref document: A1 |