
CN118947113A - Image encoding/decoding method and apparatus - Google Patents


Info

Publication number
CN118947113A
Authority
CN
China
Prior art keywords
inverse
block
transform
inverse quantization
inverse transform
Prior art date
Legal status
Pending
Application number
CN202380030687.0A
Other languages
Chinese (zh)
Inventor
李英烈
金明峻
林洙连
宋贤周
崔民炅
Current Assignee
Industry Academy Cooperation Foundation of Sejong University
Original Assignee
Industry Academy Cooperation Foundation of Sejong University
Priority date
Filing date
Publication date
Application filed by Industry Academy Cooperation Foundation of Sejong University
Publication of CN118947113A


Abstract

In the present disclosure, there is provided a video decoding method including: a step of obtaining the number of non-zero coefficients of an inverse quantized block; a step of determining an inverse transform method for the inverse quantized block according to the number of non-zero coefficients; and a step of performing the inverse transform of the inverse quantized block according to the determined inverse transform method.

Description

Image encoding/decoding method and apparatus
Technical Field
The present invention relates to an image encoding/decoding method and apparatus, and more particularly, to an image encoding/decoding method and apparatus that perform inverse transform using linearity.
Background
Recently, demand for high-resolution and high-quality images, such as HD (High Definition) and UHD (Ultra High Definition) images, is increasing in various application fields. Because the amount of data increases greatly as the resolution and quality of image data improve relative to conventional image data, transmission and storage costs rise when the image data is transmitted over conventional wired and wireless broadband networks or stored on conventional storage media. To solve these problems accompanying the high resolution and high quality of image data, highly efficient image compression techniques can be used.
Various video compression techniques exist, such as an inter-prediction technique that predicts pixel values included in a current image from a previous or subsequent image, an intra-prediction technique that predicts pixel values included in the current image using pixel information within the current image, and an entropy encoding technique that assigns shorter codes to frequently occurring values and longer codes to rarely occurring values. With such video compression techniques, video data can be efficiently compressed and then transmitted or stored.
In addition, with the increase in demand for high-resolution images, demand for stereoscopic image content as a new image service is also increasing. For this reason, video compression techniques for effectively providing high-resolution and ultra-high-resolution stereoscopic image content are being actively discussed.
Disclosure of Invention
Technical problem
An object of the present invention is to provide an image encoding/decoding method and apparatus for performing an inverse transform using linearity.
Further, the present invention aims to provide a storage medium for storing a bitstream generated by the video encoding method or apparatus according to the present invention.
The technical problems to be achieved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those having ordinary skill in the art to which the present invention pertains from the following description.
Technical Solution
In the present disclosure, there is provided a video decoding method including: a step of obtaining the number of non-zero coefficients of an inverse quantized block; a step of determining an inverse transform method for the inverse quantized block according to the number of non-zero coefficients; and a step of performing the inverse transform of the inverse quantized block according to the determined inverse transform method.
In an embodiment, the step of determining the inverse transform method of the inverse quantized block may include: a step of comparing the number of non-zero coefficients with a specific threshold value; and determining an inverse transform method of the inverse quantization block based on the comparison result.
In an embodiment, the step of determining the inverse transform method of the inverse quantized block may include: a step of determining the number of multiplication operations required for linear inverse transformation from the number of non-zero coefficients; comparing the number of multiplication operations with a specific threshold value; and determining an inverse transform method of the inverse quantization block based on the comparison result.
In one embodiment, the method is characterized by: the number of multiplications may be determined based on the number of non-zero coefficients and the size of the inverse quantization block.
In one embodiment, the method is characterized by: the specific threshold may be determined based on the size of the inverse quantization block.
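The decision procedure of the embodiments above can be sketched as follows. The multiplication-count model and the threshold derived from the block size are illustrative assumptions for the sketch, not the exact formulas of the disclosure:

```python
def linear_inverse_multiplications(num_nonzero, width, height):
    # Illustrative cost model: each non-zero coefficient contributes one
    # scaled basis block of width*height samples.
    return num_nonzero * width * height

def use_linear_inverse(num_nonzero, width, height):
    # Hypothetical threshold: a full separable inverse transform needs
    # roughly width*height*(width+height) multiplications, so the linear
    # inverse transform is chosen only when it is cheaper.
    threshold = width * height * (width + height)
    return linear_inverse_multiplications(num_nonzero, width, height) < threshold
```

Under these assumptions, a 4×4 block has a threshold of 128 multiplications, so the linear inverse transform would be selected only when fewer than 8 coefficients are non-zero.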
In one embodiment, the method is characterized by: the video decoding method may further include: determining a vertical kernel and a horizontal kernel for the inverse quantization block; the particular threshold may be determined based on the sizes of the vertical kernel, the horizontal kernel, and the inverse quantization block.
In one embodiment, the method is characterized by: the vertical core and the horizontal core may be determined from at least one of discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII).
In one embodiment, the method is characterized by: the vertical kernel and the horizontal kernel may be determined based on the size of the inverse quantization block and a prediction method used for the inverse quantization block.
In one embodiment, the method is characterized by: the inverse transform method of the inverse quantized block may be determined based on the image type of the inverse quantized block.
In one embodiment, the method is characterized by: the step of determining the inverse transform method of the inverse quantized block may include: a step of determining whether to apply a linear inverse transform to the inverse quantized block according to the number of non-zero coefficients when the image type of the inverse quantized block is an all-intra (AI) type or a random access (RA) type.
In one embodiment, the method is characterized by: the step of determining the inverse transform method of the inverse quantized block may include: a step of deciding not to apply a linear inverse transform to the inverse quantized block when the image type of the inverse quantized block is neither an all-intra (AI) type nor a random access (RA) type.
In one embodiment, the method is characterized by: the inverse transform method of the inverse quantized block may be determined based on quantization parameters applicable to inverse quantization of the inverse quantized block.
In one embodiment, the method may include: a step of determining whether to apply a linear inverse transform to the inverse quantization block according to the number of non-zero coefficients when the quantization parameter is greater than a threshold quantization parameter value.
In one embodiment, the method is characterized by: when the quantization parameter is less than the threshold quantization parameter value, it is determined not to apply a linear inverse transform to the inverse quantization block.
In one embodiment, the method is characterized by: the video decoding method may further include: a step of acquiring linear inverse transformation allowable information indicating whether or not to permit linear inverse transformation from the parameter set; in the step of determining the inverse transform method of the inverse quantized block, if the linear inverse transform allowable information indicates that the linear inverse transform is allowable, it may be determined whether the inverse transform method of the inverse quantized block is a linear inverse transform method.
In one embodiment, the method is characterized by: the parameter set may be at least one of a video parameter set, a sequence parameter set, an image parameter set, and an adaptation parameter set.
In one embodiment, the method is characterized by: an inverse transform method of the inverse quantization block may be determined based on color components of the inverse quantization block.
In one embodiment, the method is characterized by: the step of performing the inverse transform of the inverse quantized block according to the decided inverse transform method may include: a step of dividing the inverse quantized block into a plurality of sub-blocks, each of which includes only one non-zero coefficient while its remaining coefficients are zero, when the inverse transform method is a linear inverse transform; a step of performing the inverse transform on each of the plurality of sub-blocks; and a step of obtaining an inverse transform block of the inverse quantized block based on the plurality of inverse-transformed sub-blocks.
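The splitting step above relies on the linearity of the inverse transform: the inverse transform of a block equals the sum of the inverse transforms of single-coefficient sub-blocks. A minimal sketch using an orthonormal inverse DCT-II (the kernel choice is an assumption for illustration; the disclosure also allows other kernels):

```python
import math

def idct_1d(coeffs):
    # 1-D inverse DCT-II (i.e. DCT-III) with orthonormal scaling.
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for k in range(n):
            ck = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += ck * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out

def idct_2d(block):
    # Separable 2-D inverse DCT: rows first, then columns.
    h, w = len(block), len(block[0])
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[y][x] for y in range(h)]) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]

def linear_inverse(block):
    # Linear inverse transform: split the block into sub-blocks holding one
    # non-zero coefficient each, inverse transform each, and sum the results.
    h, w = len(block), len(block[0])
    total = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if block[y][x] != 0:
                single = [[block[y][x] if (r, c) == (y, x) else 0.0
                           for c in range(w)] for r in range(h)]
                part = idct_2d(single)
                for r in range(h):
                    for c in range(w):
                        total[r][c] += part[r][c]
    return total
```

Because only the non-zero coefficients trigger work, this path is cheap for sparse blocks, which is why the method is gated on the non-zero coefficient count.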
In the present disclosure, there is provided a video encoding method including: encoding the block and inversely quantizing the encoded block; a step of obtaining the number of non-zero coefficients of the inverse quantization block; determining an inverse transform method of the inverse quantization block according to the number of the non-zero coefficients; a step of performing inverse transform of the inverse quantization block according to the decided inverse transform method; and reconstructing the block by using the inverse transformation block, and encoding other blocks based on the reconstructed block.
In the present disclosure, there is provided a computer-readable storage medium characterized in that: as a computer-readable storage medium storing a bitstream related to an encoded video, a bitstream generated by a video encoding method, the video encoding method comprising: encoding the block and inversely quantizing the encoded block; a step of obtaining the number of non-zero coefficients of the inverse quantization block; determining an inverse transform method of the inverse quantization block according to the number of the non-zero coefficients; a step of performing inverse transform of the inverse quantization block according to the decided inverse transform method; and reconstructing the block by using the inverse transformation block, and encoding other blocks based on the reconstructed block.
Effects of the invention
In the present invention, an image encoding/decoding method and apparatus for performing an inverse transform using linearity can be provided.
Further, in the present invention, a method and apparatus for transmitting or storing a bitstream generated by the image encoding method/apparatus according to the present invention may be provided.
Further, in the present invention, a computer-readable recording medium for storing a bitstream generated by the image encoding method/apparatus according to the present invention may be provided.
In addition, in the present invention, image data can be efficiently encoded and decoded by the image encoding method/apparatus according to the present invention.
Drawings
Fig. 1 is an explanatory diagram schematically illustrating the configuration of an image encoding apparatus.
Fig. 2 is an exemplary diagram illustrating an embodiment of a prediction unit of the video encoding device.
Fig. 3 illustrates an example of representing a block as sub-blocks each containing a single non-zero coefficient.
Fig. 4 illustrates an embodiment of an inverse transform method that performs the inverse transform using linearity by splitting the block.
Fig. 5 illustrates a scanning method of rearranging coefficients within an inverse quantization block into a 1-dimensional vector.
Fig. 6 illustrates an example of rearranging the coefficients within a 4×4 block into a 1-dimensional vector using horizontal scanning.
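The rearrangement of Fig. 6 amounts to reading the block row by row; a minimal sketch (other scan orders, such as vertical or diagonal scanning, differ only in the visiting order):

```python
def horizontal_scan(block):
    # Read the 2-D coefficient block row by row into a 1-D vector.
    return [coeff for row in block for coeff in row]
```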
Fig. 7 illustrates an example of separating a 1-dimensional vector into sub-vectors.
Fig. 8 illustrates an example of performing inverse transformation on each sub-vector.
Fig. 9 provides an embodiment of a video decoding method that may utilize the linear inverse transform method.
Fig. 10 provides an embodiment of a video encoding method that may utilize the linear inverse transform method.
FIG. 11 illustrates the magnitude response at the 16/32-pixel position of a 4-tap discrete cosine transform (DCT) interpolation filter and an 8-tap discrete sine transform (DST) interpolation filter.
Fig. 12 illustrates one embodiment of the coefficients of an 8-tap Discrete Sine Transform (DST) interpolation filter.
FIG. 13 illustrates the magnitude response at the 16/32-pixel position of a 4-tap DCT interpolation filter, a 4-tap Gaussian interpolation filter, an 8-tap DCT interpolation filter, and an 8-tap Gaussian interpolation filter.
Fig. 14 illustrates one embodiment of a method for selecting an interpolation filter using frequency information.
Fig. 15 and 16 illustrate embodiments of the coefficients of an 8-tap DCT interpolation filter and an 8-tap smoothing interpolation filter, respectively.
Fig. 17 illustrates an embodiment of an interpolation filter selected according to a boundary correlation threshold.
Detailed Description
In the present disclosure, there is provided a video decoding method including: a step of obtaining the number of non-zero coefficients of an inverse quantized block; a step of determining an inverse transform method for the inverse quantized block according to the number of non-zero coefficients; and a step of performing the inverse transform of the inverse quantized block according to the determined inverse transform method.
Next, embodiments according to the present invention will be described in detail with reference to the drawings of the present specification so that those having ordinary skill in the art to which the present invention pertains can easily implement the present invention. However, the present invention can be realized in many different forms and is not limited to the embodiments described in the present specification. In the drawings, parts irrelevant to the description are omitted for the purpose of clearly explaining the present invention, and like parts are assigned like reference numerals throughout the specification.
Throughout the specification, the term "connected" between a certain portion and another portion includes not only a case of direct connection but also a case of electrical connection with other elements interposed therebetween.
In addition, in the entire specification, when a certain component is described as "including" a certain component, unless explicitly stated to the contrary, it is not meant to exclude other components, but it is meant that other components may also be included.
In addition, although terms such as 1 st and 2 nd may be used in describing the different components, the components are not limited by the terms. The terms are merely used to distinguish one component from other components.
In addition, in the embodiments related to the apparatus and method described in the present specification, some of the components in the apparatus or some of the steps in the method may be omitted. Further, the order of some of the components in the apparatus or some of the steps in the method may be changed. In addition, other components or other steps may be inserted into a portion of the components or steps of a method in an apparatus.
Furthermore, a part of the constitution or a part of the steps in embodiment 1 according to the present invention may be added to embodiment 2 according to the present invention or a part of the constitution or a part of the steps in embodiment 2 may be replaced.
Further, the constituent elements included in the embodiments according to the present invention are shown separately for the purpose of showing specific functions different from each other, and are not shown as separate hardware or software constituent elements. That is, although the respective constituent parts are described in the column for convenience of description, at least two constituent parts among the respective constituent parts may be combined into one constituent part, or one constituent part may be divided into a plurality of constituent parts and the corresponding functions may be executed. The combination and division embodiments of the respective constituent parts described above are also included in the scope of the claims of the present invention without departing from the essence of the present invention.
First, terms used in the present application will be briefly described as follows.
The video decoding apparatus (Video Decoding Apparatus) described in the following may be included in devices such as a civilian security camera, a civilian security system, a military security camera, a military security system, a personal computer (PC), a notebook computer, a portable multimedia player (PMP), a wireless communication terminal, a smart phone, a television (TV) application server, and a service server, and may be provided with various components such as a communication modem for communicating with user terminals over a wired/wireless communication network, a memory for storing various programs and data for decoding images or performing inter-picture or intra-picture prediction for decoding, and a microprocessor for performing computation and control by executing the programs.
In addition, a video encoded into a bitstream by the encoder may be transmitted to the video decoding apparatus in real time or non-real time through a wired or wireless communication network such as the Internet, a short-range wireless communication network, a wireless local area network, a WiBro (Wireless Broadband) network, or a mobile communication network, or through various communication interfaces such as a cable or a universal serial bus (USB), and then decoded, reconstructed into a video, and played back. Alternatively, the bitstream generated by the encoder may be stored in a memory. The memory may include volatile memory and non-volatile memory. In this specification, the memory may be represented as a storage medium storing the bitstream.
In general, a video may be composed of a series of images (pictures), and each image may be divided into coding units such as blocks. Those having ordinary skill in the art will understand that the term "image" described below may be replaced with other terms having the same meaning, such as "picture" and "frame". They will likewise understand that the term "coding unit" may be replaced with other terms having the same meaning, such as "unit block" and "block".
Next, embodiments according to the present invention will be described in more detail with reference to the accompanying drawings. In describing the present invention, repeated descriptions of the same constituent elements will be omitted.
Fig. 1 is an explanatory diagram schematically illustrating the configuration of an image encoding apparatus.
The video encoding device 100 may include a video dividing unit 101, an intra-frame predicting unit 102, an inter-frame predicting unit 103, a subtracting unit 104, a transforming unit 105, a quantizing unit 106, an entropy encoding unit 107, an inverse quantizing unit 108, an inverse transforming unit 109, an adding unit 110, a filtering unit 111, and a memory 112.
In each unit, rate-distortion costs (RD-Cost) may be compared in order to select the most appropriate information. The rate-distortion cost refers to a cost value calculated from the distortion between the original block and the reconstructed block and the number of bits generated when transmitting the prediction mode. To calculate the cost value, for example, the sum of absolute differences (SAD), the sum of absolute transformed differences (SATD), or the sum of squared errors (SSE) may be used.
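The cost measures named above can be sketched as follows. The Lagrangian form J = D + λ·R is the usual formulation of the RD cost; SATD is computed like SAD but on the Hadamard-transformed difference block, which is omitted here for brevity:

```python
def sad(orig, recon):
    # Sum of Absolute Differences between original and reconstructed samples.
    return sum(abs(o - r) for o, r in zip(orig, recon))

def sse(orig, recon):
    # Sum of Squared Errors.
    return sum((o - r) ** 2 for o, r in zip(orig, recon))

def rd_cost(distortion, bits, lam):
    # Lagrangian rate-distortion cost J = D + lambda * R.
    return distortion + lam * bits
```

The encoder evaluates `rd_cost` for each candidate mode and keeps the candidate with the smallest cost.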
In Fig. 1, the constituent units are illustrated separately to show their distinct functions in the video encoding apparatus, but this does not mean that each constituent unit is a separate hardware or software unit. That is, although the constituent units are listed separately for convenience of description, at least two of them may be combined into one, or one may be divided into several units that perform the corresponding functions. Embodiments in which the constituent units are combined and embodiments in which they are separated are included in the scope of the claims of the present invention as long as they do not depart from its essence.
In addition, some constituent elements are not essential for performing the core functions of the present invention but are merely optional elements for improving performance. The present invention may be implemented with only the essential elements, excluding those used merely to improve performance, and such a structure is also included in the scope of the claims of the present invention.
The image dividing unit 101 may divide an input image into at least one block. At this time, the input image may have various forms and sizes, such as a picture, a slice, a tile, or a segment. A block may refer to a coding unit (CU), a prediction unit (PU), or a transform unit (TU). The partitioning may be performed based on at least one of a quadtree, a binary tree, and a ternary tree. The quadtree divides an upper block into four lower blocks whose width and height are half those of the upper block. The binary tree divides an upper block into two lower blocks by halving either its width or its height. The ternary tree divides an upper block into three lower blocks along either its width or its height. Through partitioning based on the binary tree and the ternary tree as described above, a block may have a non-square as well as a square shape.
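The three partitioning rules can be sketched as follows. The 1:2:1 ratio for the ternary split follows common practice (e.g. VVC) and is an assumption here, since the text only states that the split is made along either the width or the height:

```python
def quad_split(w, h):
    # Quadtree: four sub-blocks at half width and half height.
    return [(w // 2, h // 2)] * 4

def binary_split(w, h, horizontal):
    # Binary tree: two sub-blocks, halving either the height (horizontal
    # split) or the width (vertical split).
    return [(w, h // 2)] * 2 if horizontal else [(w // 2, h)] * 2

def ternary_split(w, h, horizontal):
    # Ternary tree: three sub-blocks in an assumed 1:2:1 ratio along one
    # direction, yielding non-square shapes from a square parent.
    if horizontal:
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    return [(w // 4, h), (w // 2, h), (w // 4, h)]
```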
The prediction units 102, 103 may include an inter-picture prediction unit 103 for performing inter-picture prediction and an intra-picture prediction unit 102 for performing intra-picture prediction. Specific information (e.g., intra prediction mode, motion vector, reference image, etc.) based on each prediction method may be determined after determining whether inter prediction or intra prediction is used for the prediction unit. At this time, the processing unit for performing prediction and the processing unit for determining the prediction method and the specific content may be different from each other. For example, the prediction method, the prediction mode, and the like may be determined in units of prediction, and prediction may be performed in units of transform.
The residual value (residual block) between the generated prediction block and the original block may be input to the transform unit 105. In addition, the prediction mode information, motion vector information, and the like used for prediction may be encoded by the entropy encoding unit 107 together with the residual value and then transmitted to the decoder. When a specific encoding mode is used, the original block may be encoded as it is and transmitted to the decoder without generating a prediction block through the prediction units 102 and 103.
The intra-picture prediction unit 102 may generate the prediction block based on pixel information in the current image, that is, reference pixel information around the current block. When the prediction mode of the peripheral block of the current block for which intra-picture prediction is required to be performed is inter-picture prediction, reference pixels included in the peripheral block to which inter-picture prediction is applicable may be replaced with reference pixels in other peripheral blocks to which intra-picture prediction is applicable. That is, in the case where the reference pixels are not available, the unavailable reference pixel information may be replaced with at least one of the available reference pixels for use.
In intra prediction, the prediction modes may include a directional prediction mode using reference pixel information according to a prediction direction and a non-directional mode not using direction information. The mode for predicting the luminance information and the mode for predicting the color difference information may be different from each other. In predicting the color difference information, intra prediction mode information used in predicting the luminance information or predicted luminance signal information may be used.
The intra prediction part 102 may include an adaptive intra smoothing (AIS) filter, a reference pixel interpolation part, and a DC filter. The AIS filter filters the reference pixels of the current block, and whether to apply the filter may be adaptively determined according to the prediction mode of the current prediction unit. When the prediction mode of the current block is a mode in which AIS filtering is not performed, the AIS filter may not be applied.
When the intra prediction mode of the prediction unit is a mode in which intra prediction is performed on the basis of a pixel value by which reference pixels are interpolated, the reference pixel interpolation section of the intra prediction section 102 can generate reference pixels at fractional unit positions by interpolating the reference pixels. When the prediction mode of the current prediction unit is a prediction mode in which the prediction block is generated without interpolating the reference pixel, the reference pixel may not be interpolated. When the prediction mode of the current block is a mean (DC) mode, a mean (DC) filter may generate a prediction block through filtering.
The inter-picture prediction unit 103 generates a prediction block using the reconstructed reference picture and the motion information stored in the memory 112. The motion information may include, for example, a motion vector, a reference picture index, a list 1 predictor, a list 0 predictor, etc.
Further, a Residual block including the prediction units generated in the prediction units 102 and 103 and Residual value (Residual) information, which is a difference value between the prediction units and the original block, may be generated. The generated residual block may be input to the transform unit 105 for transformation.
The inter-picture prediction unit 103 may derive the prediction block based on information of at least one of a previous image or a subsequent image of the current image. In addition, the prediction block of the current block may be derived based on information of a part of the field in the current image that has been encoded. The inter prediction section 103 according to one embodiment of the present invention may include a reference image interpolation section, a motion prediction section, and a motion compensation section.
In the reference image interpolation section, reference image information may be received from the memory 112, and pixel information at sub-integer positions may be generated in the reference image. For luminance pixels, an 8-tap DCT-based interpolation filter with varying filter coefficients may be used to generate pixel information at sub-integer positions in 1/4-pixel units. For color difference pixels, a 4-tap DCT-based interpolation filter with varying filter coefficients may be used to generate pixel information at sub-integer positions in 1/8-pixel units.
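As an illustration of such a DCT-based interpolation filter, the sketch below applies the well-known 8-tap half-sample luminance filter used in HEVC-style codecs; these specific coefficients are an example, not necessarily those of the disclosure:

```python
# HEVC-style 8-tap half-sample luma filter (taps sum to 64).
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]

def filter_half_pel(samples, pos):
    # Compute the half-sample value between samples[pos] and samples[pos+1];
    # the filter needs 3 samples of context on the left and 4 on the right.
    acc = sum(t * samples[pos - 3 + i] for i, t in enumerate(HALF_PEL_TAPS))
    return (acc + 32) >> 6  # round and normalize by 64
```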
The motion prediction section may perform motion prediction based on the reference image interpolated by the reference image interpolation section. As methods for calculating the motion vector, various methods such as a full search-based block matching algorithm (FBMA), a three-step search (TSS), and a new three-step search algorithm (NTS) can be used. The motion vector may have a value in 1/2- or 1/4-pixel units based on the interpolated pixels. The motion prediction unit may predict the prediction block of the current block using different motion prediction methods. As the motion prediction method, various methods such as a skip method, a merge method, and an advanced motion vector prediction (AMVP) method can be used.
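The three-step search (TSS) mentioned above can be sketched as follows. The cost function is supplied by the caller (for example, the SAD between the current block and the candidate reference block), and this simplified version ignores search-range clipping:

```python
def three_step_search(cost, center=(0, 0), step=4):
    # Three Step Search: evaluate the 3x3 neighborhood at the current step
    # size, move to the best candidate, halve the step, and repeat.
    # `cost` maps a (dx, dy) motion vector to a matching cost.
    best = center
    while step >= 1:
        candidates = [(best[0] + dx * step, best[1] + dy * step)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        best = min(candidates, key=cost)
        step //= 2
    return best
```

With the default step of 4 the search covers a ±7 range around the start point in three refinement rounds, evaluating far fewer candidates than a full search.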
The subtraction unit 104 subtracts the prediction block generated by the intra-picture prediction unit 102 or the inter-picture prediction unit 103 from the block currently to be encoded, thereby generating a residual block of the current block.
The transform unit 105 may transform a residual block including residual data using a transform method such as the discrete cosine transform (DCT), the discrete sine transform (DST), or the Karhunen-Loève transform (KLT). In this case, the transform method may be determined based on the intra prediction mode of the prediction unit used to generate the residual block. For example, depending on the intra prediction mode, the DCT may be used in the horizontal direction and the DST in the vertical direction. Alternatively, different transform techniques may be used in the horizontal and vertical directions depending on the aspect ratio, size, and the like of the current block.
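A minimal sketch of building orthonormal DCT-II and DST-VII kernels and choosing one per direction; the mode-based selection rule shown is a simplified assumption for illustration, not the disclosure's exact rule:

```python
import math

def dct2_kernel(n):
    # Orthonormal DCT-II basis matrix; row k is the k-th frequency basis.
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]

def dst7_kernel(n):
    # Orthonormal DST-VII basis matrix.
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * k + 1) * (x + 1) / (2 * n + 1))
             for x in range(n)] for k in range(n)]

def pick_kernels(intra_mode, size):
    # Hypothetical selection rule: small intra-predicted blocks use DST-VII
    # vertically and DCT-II horizontally; otherwise DCT-II in both directions.
    if intra_mode is not None and size <= 8:
        return dst7_kernel(size), dct2_kernel(size)  # (vertical, horizontal)
    return dct2_kernel(size), dct2_kernel(size)
```

Because both kernels are orthonormal, the inverse transform is simply the transpose of the forward matrix, which is what makes the per-coefficient linear inverse transform well defined.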
The quantization unit 106 may quantize the values transformed into the frequency domain by the transform unit 105. The quantization coefficient may be changed per block or according to the importance of the image. The values calculated in the quantization unit 106 may be supplied to the inverse quantization unit 108 and the entropy encoding unit 107.
The transform unit 105 and/or the quantization unit 106 may be optionally included in the video encoding device 100. That is, the video encoding device 100 may perform at least one of transform and quantization on the residual data of the residual block, or may encode the residual block by skipping both transform and quantization. Even when only one of transform and quantization is performed, or neither is performed, in the video encoding device 100, the block input to the entropy encoding section 107 is generally referred to as a transform block.
The entropy encoding section 107 performs entropy encoding on the input data. The entropy encoding may use various encoding methods such as exponential golomb code (Exponential Golomb), context-adaptive variable length coding (CAVLC, context-Adaptive Variable Length Coding), and Context-adaptive binary arithmetic coding (CABAC, context-Adaptive Binary Arithmetic Coding).
The entropy encoding section 107 may encode a variety of information such as coefficient information of a transform block, block type information, prediction mode information, partition unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, interpolation information of a block, filter information, and the like. The coefficients of the transform block may be encoded in sub-block units within the transform block.
In order to encode the coefficients of a transform block, various syntax elements (syntax elements) may be encoded, such as last_sig, which indicates the position of the first non-zero coefficient in the reverse scan order; coded_sub_blk_flag, a flag indicating whether a sub-block includes at least one non-zero coefficient; sig_coeff_flag, a flag indicating whether a coefficient is non-zero; abs_greater1_flag, a flag indicating whether the absolute value of a coefficient is greater than 1; abs_greater2_flag, a flag indicating whether the absolute value of a coefficient is greater than 2; and sign_flag, a flag indicating the sign of a coefficient. The remaining values of the coefficients not encoded by the above syntax elements may be encoded by the syntax element remaining_coeff.
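As an illustration of how the per-coefficient flags above relate to coefficient values, the following sketch derives them for a single coefficient (the function name and dictionary layout are hypothetical, not from any codec source):

```python
def coeff_flags(coeff):
    # Hypothetical derivation of the per-coefficient syntax elements named
    # above from a single quantized coefficient value.
    return {
        "sig_coeff_flag": int(coeff != 0),         # coefficient is non-zero
        "abs_greater1_flag": int(abs(coeff) > 1),  # |coefficient| > 1
        "abs_greater2_flag": int(abs(coeff) > 2),  # |coefficient| > 2
        "sign_flag": int(coeff < 0),               # 1 for negative coefficients
    }

flags = coeff_flags(-3)
```

In a real entropy coder these flags are context-coded, and abs_greater1_flag/abs_greater2_flag are only signaled when the preceding flag is 1; that conditionality is omitted here.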
The inverse quantization unit 108 and the inverse transform unit 109 inversely quantize the values quantized by the quantization unit 106 and inversely transform the values transformed by the transform unit 105. The residual (Residual) generated in the inverse quantization unit 108 and the inverse transform unit 109 may be combined with the prediction block predicted by the motion estimation unit, the motion compensation unit, and the intra-picture prediction unit 102 included in the prediction units 102 and 103 to generate a reconstructed block (Reconstructed Block). The adder 110 adds the prediction block generated by the prediction units 102 and 103 and the residual block generated by the inverse transform unit 109 to generate the reconstructed block.
The filtering section 111 may include at least one of a deblocking filter, an offset correction section, and an adaptive loop filter (ALF, adaptive Loop Filter).
The deblocking filter may remove block distortion caused by boundaries between blocks from the reconstructed image. To determine whether deblocking needs to be performed, whether to apply the deblocking filter to the current block may be determined based on the pixels included in several columns or rows of the block. When applying the deblocking filter to a block, a strong filter (Strong Filter) or a weak filter (Weak Filter) may be applied according to the required deblocking filtering strength. Further, when both vertical filtering and horizontal filtering are performed in applying the deblocking filter, the horizontal direction filtering and the vertical direction filtering may be processed in parallel.
The offset correction unit may correct the offset between the deblocked image and the original image in pixel units. For offset correction of a specific image, a method of dividing the pixels included in the image into a predetermined number of regions, determining the regions in which offset is to be performed, and applying the offset to the corresponding regions, or a method of applying the offset in consideration of the edge information of each pixel, may be used.
Adaptive loop filtering (ALF, Adaptive Loop Filtering) may be performed based on values obtained by comparing the filtered reconstructed image with the original image. After dividing the pixels included in the image into specified groups, one filter to be applied to each group may be determined, and different filtering may thereby be performed for each group. Information on whether to apply the adaptive loop filter (ALF) may be transmitted per coding unit (CU) for the luminance signal, and the shape and filter coefficients of the applied adaptive loop filter (ALF) may be changed for each block. Alternatively, an adaptive loop filter (ALF) of the same configuration (fixed configuration) may be applied regardless of the characteristics of the target block.
The memory 112 may store the reconstructed block or image calculated by the filtering section 111, and the stored reconstructed block or image may be provided to the prediction sections 102, 103 when inter-picture prediction is performed.
Fig. 2 is an exemplary diagram illustrating an embodiment of a prediction unit of the video encoding device.
In the case where the prediction mode of the current block is the intra prediction mode, the intra prediction section 201 may generate reference pixels by deriving them from the periphery of the current block and filtering them. The reference pixels are determined from the reconstructed pixels surrounding the current block. In case some or all of the reconstructed pixels surrounding the current block are unavailable, the unavailable positions may be filled with available reference pixels or with the middle value of the range of values that a pixel may have. After deriving all reference pixels, the reference pixels may be filtered using an adaptive intra smoothing (AIS, Adaptive Intra Smoothing) filter.
The intra prediction mode search unit 202 may determine any one of M intra prediction modes. Where M represents the total number of intra prediction modes. The intra prediction modes include directional prediction modes and non-directional prediction modes.
The prediction block may be generated using the determined prediction mode and the filtered reference pixels. The least costly one intra prediction mode may be selected by comparing the rate-distortion costs (RD-Cost) of the different intra prediction modes.
The inter-picture prediction unit 203 may be divided into a merge (Merge) candidate search unit 204 and an advanced motion vector prediction (AMVP) candidate search unit 206 according to the method of deriving motion information. The merge candidate search unit 204 sets reference blocks that use inter prediction, among the reconstructed blocks surrounding the current block, as merge candidates. The merge candidates are derived in the encoding and decoding devices using the same method, and the same number is used. The number of merge candidates may be transferred from the encoding apparatus to the decoding apparatus, or a pre-agreed number may be used. In the case where the prescribed number of merge candidates cannot be derived from the reconstructed reference blocks surrounding the current block, motion information of the block at the same position as the current block in an image different from the current image may be used as a merge candidate. Alternatively, the insufficient merge candidates may be derived by combining motion information in the past direction and motion information in the future direction with respect to the current image, or blocks at the same position in other reference pictures may be set as merge candidates.
The Advanced Motion Vector Prediction (AMVP) candidate search unit 206 may determine the motion information of the current block by the motion estimation unit 207. The motion estimation section 207 searches for a prediction block similar to the current block from the reconstructed image.
When inter-picture prediction is performed, motion information of the current block may be determined by one of the Merge (Merge) candidate search unit 204 and the Advanced Motion Vector Prediction (AMVP) candidate search unit 206, and the motion compensation unit 208 may generate a prediction block based on the determined motion information.
In hybrid block-based video coding, transforms play an important role in energy compaction. Transform coding concentrates energy into the low frequency band by converting residual data of the spatial domain into frequency domain data. Considering that discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII) are linear transforms, an inverse transform that reduces the number of computations using the linearity of the transform can be used in video encoding as well as decoding. By applying the proposed inverse transform using linearity to video encoding as well as decoding, run time can be saved without degrading coding performance. In particular, the average decoding time can be greatly reduced under the All Intra (AI) and Random Access (RA) conditions.
In hybrid block-based video coding, the spatial domain residual signal acquired after intra/inter prediction is transformed into a frequency domain residual signal. With an efficient transform method, more energy of the residual signal can be concentrated into the low frequency components in the frequency domain. The Karhunen-Loève transform (KLT) is a very efficient transform method in terms of data decorrelation and compression. However, the Karhunen-Loève transform (KLT) is not used in actual transform coding because the complexity (complexity) of computing the eigenvectors (eigenvectors) of the signal-dependent covariance matrix (signal-dependent covariance matrix) is high, and in general no fast computation algorithm exists.
Because discrete cosine transform-II (DCT-II) provides an excellent approximation of the Karhunen-Loève transform (KLT) under the first-order Markov condition, video coding standards use discrete cosine transform-II (DCT-II) instead of the Karhunen-Loève transform (KLT). However, because of the diverse nature of images and video, discrete cosine transform-II (DCT-II) is not always the best transform in terms of energy compaction and decorrelation. To solve the above-described problem, alternative transform schemes such as discrete cosine transform-II/discrete sine transform-VII (DCT-II/DST-VII) and the enhanced multiple transform (EMT, Enhanced Multiple Transform) for video coding may be used.
Furthermore, in some cases, the discrete sine transform-VII (DST-VII) can also be brought closer to the Karhunen-Loève transform (KLT) using a first-order Gauss-Markov (Gauss-Markov) model associated with the image signal. Thus, a video codec may be set to use discrete cosine transform-II (DCT-II) based transforms for 4×4, 8×8, 16×16, and 32×32 prediction residual blocks, while using a discrete sine transform-VII (DST-VII) based alternative transform for 4×4 intra prediction residual blocks.
With the recent increase in demand for high-quality and high-resolution images and the development of services such as video streaming, more efficient image compression techniques are required. To improve the efficiency of the transform, a combination of separable transforms (separable transform) and non-separable transforms (non-separable transform) may be used.
The enhanced multiple transform (EMT), which uses the separable property of the transform, may select the horizontal and vertical transforms that are most excellent in terms of coding efficiency from among the previously defined transforms discrete cosine transform-II (DCT-II), discrete cosine transform-V (DCT-V), discrete cosine transform-VIII (DCT-VIII), discrete sine transform-I (DST-I), and discrete sine transform-VII (DST-VII). In addition, a non-separable secondary transform (Non-separable Secondary Transform) can be applied as a 2-level transform after the enhanced multiple transform (EMT).
The transform can be largely divided into two processes, a 1-level transform and a 2-level transform. A simplified version of the enhanced multiple transform (EMT) for the prediction residual signal is also referred to as multiple transform selection (MTS, Multiple Transform Selection). In addition to discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) may additionally be used as transforms for multiple transform selection (MTS). However, discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) can only be applied to luminance blocks.
The maximum transform size may be set to 64×64. Discrete cosine transform-II (DCT-II) may be applied to transforms of blocks from 4×4 to 64×64, while discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) may be applied to transforms of blocks from 4×4 to 32×32.
Transforms of large blocks can be useful for high resolution video, but can result in increased computational complexity. In order to solve the above-described problem, in the transform of a large block, the high frequency transform coefficients may be zeroed out (zero out). For example, in the case of the 64-point discrete cosine transform-II (DCT-II), only the first 32 low frequency coefficients may be kept and the remaining high frequency coefficients may be set to 0 (zero out). Similarly, in the case of the 32-point discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII), only the first 16 low frequency coefficients may be kept and the remaining high frequency coefficients are set to 0 (zero out). The zeroing out described above may also be used in last coefficient position coding (last coefficient position coding) and coefficient group scanning (coefficient group scanning).
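The zero-out rule above can be sketched as follows (a minimal illustration; the function name is ours, and real codecs apply this inside the transform rather than as a separate pass):

```python
def zero_out(coeffs, keep):
    # Keep only the first `keep` low-frequency coefficients; zero the rest.
    return [c if i < keep else 0 for i, c in enumerate(coeffs)]

row64 = list(range(1, 65))    # stand-in for a 64-point DCT-II output row
row32 = list(range(1, 33))    # stand-in for a 32-point DST-VII/DCT-VIII output row

kept64 = zero_out(row64, 32)  # 64-point DCT-II keeps 32 coefficients
kept32 = zero_out(row32, 16)  # 32-point DST-VII/DCT-VIII keeps 16 coefficients
```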
The 2-level transform refers to an additional transform process after the 1-level transform. In one embodiment, a low frequency non-separable transform (LFNST, Low Frequency Non-Separable Transform) may be used in a video codec. The low frequency non-separable transform (LFNST) may be applied to the region of interest (ROI, Region Of Interest) of the primary transform coefficients. The region of interest (ROI) may be the upper-left low frequency region. When the low frequency non-separable transform (LFNST) is applied, all primary transform coefficients outside the region of interest (ROI) become 0, and the output of the low frequency non-separable transform (LFNST) is additionally quantized and entropy encoded.
The definition of the 1-dimensional (1-D) N-point transform and its inverse is shown in the following equations 1 and 2.
[Equation 1]

F(u) = Σ_{x=0}^{N-1} v_{u,x} · p(x), u = 0, 1, …, N-1

[Equation 2]

p(x) = Σ_{u=0}^{N-1} v_{u,x} · F(u), x = 0, 1, …, N-1
Where F(u) is the N-point transformed signal and p(x) is the original signal. In addition, v_{u,x} is the x-th element of the N×1 basis vector v_u associated with each u in discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII), where u, x = 0, 1, 2, …, N-1. The v_{u,x} for discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII) are defined as shown in equations 3, 4, and 5, respectively.
[Equation 3]

v_{u,x} = ω_u · √(2/N) · cos(π·u·(2x+1) / (2N)), where ω_u = √(1/2) for u = 0 and ω_u = 1 otherwise

[Equation 4]

v_{u,x} = √(4/(2N+1)) · sin(π·(2u+1)·(x+1) / (2N+1))

[Equation 5]

v_{u,x} = √(4/(2N+1)) · cos(π·(2u+1)·(2x+1) / (4N+2))
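The basis elements can be evaluated numerically as below, assuming the standard orthonormal forms of the DCT-II, DST-VII, and DCT-VIII bases (real codecs use scaled integer approximations of these):

```python
import math

def dct2_basis(u, x, N):
    # DCT-II basis element v_{u,x} (orthonormal form).
    w = math.sqrt(0.5) if u == 0 else 1.0
    return w * math.sqrt(2.0 / N) * math.cos(math.pi * u * (2 * x + 1) / (2 * N))

def dst7_basis(u, x, N):
    # DST-VII basis element v_{u,x}.
    return math.sqrt(4.0 / (2 * N + 1)) * math.sin(
        math.pi * (2 * u + 1) * (x + 1) / (2 * N + 1))

def dct8_basis(u, x, N):
    # DCT-VIII basis element v_{u,x}.
    return math.sqrt(4.0 / (2 * N + 1)) * math.cos(
        math.pi * (2 * u + 1) * (2 * x + 1) / (4 * N + 2))
```

With these normalizations every basis vector has unit norm, which is easy to verify numerically for small N.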
In the present disclosure, an inverse transform using the separable linear property is proposed in order to reduce computational complexity. The proposed inverse transform method can be applied to the primary transform and primary inverse transform of encoders and decoders. In the inverse transform step of the encoder and decoder, the inverse quantized transform coefficients after the low frequency non-separable transform (LFNST) are input to the 2-dimensional (2-D) inverse transform. In most video codecs, the 2-dimensional transform and inverse transform are implemented as separable transforms by applying the 1-dimensional inverse transform of equation 2 to each row and column in order to reduce computational complexity. The separable inverse transform for a possibly non-square block size is represented by equation 6.
[Equation 6]

X′ = B^T Y A
Where X′ is the (n×m) inverse transform block, Y is the (n×m) inverse quantized transform block, A is the (m×m) transform matrix, and B^T is the (n×n) transform matrix. Here, n and m are the height and width of the block, respectively. Through the quantization and inverse quantization processes, most transform coefficients become 0 when the quantization parameter is large. In the case where Y consists of N non-zero coefficients, Y may be expressed, as shown in equation 7, as the sum of N sub-blocks of the same size as Y, each having only one non-zero coefficient. Here, y_i represents the i-th sub-block of Y.
[Equation 7]

Y = y_0 + y_1 + … + y_{N-1}
Fig. 3 provides an example of representing a 4×4 block consisting of 3 non-zero coefficients as a plurality of sub-blocks using equation 7. In fig. 3, the 4×4 block Y 300 includes 3 non-zero coefficients. Thus, the 4×4 block Y 300 may be partitioned into 3 sub-blocks y_0 302, y_1, and y_2 306, each containing only one non-zero coefficient. Further, the sub-blocks y_0 302, y_1, and y_2 306 may each be inverse transformed, and the results generated by the inverse transforms may be accumulated using the linearity of the inverse transform, thereby performing the inverse transform of the block.
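The decomposition of fig. 3 can be sketched end to end: the separable inverse transform X′ = B^T Y A computed with dense matrix products must match the accumulation of one scaled outer product per non-zero coefficient. The helper names below are ours, and the DCT-II matrices use the floating-point orthonormal form rather than a codec's integer approximation:

```python
import math

def dct2_matrix(N):
    # Rows are the DCT-II basis vectors v_u (orthonormal form).
    M = []
    for u in range(N):
        w = math.sqrt(0.5) if u == 0 else 1.0
        M.append([w * math.sqrt(2.0 / N) *
                  math.cos(math.pi * u * (2 * x + 1) / (2 * N)) for x in range(N)])
    return M

def inverse_transform_full(Y, B, A):
    # X' = B^T Y A via two dense matrix products (B is n x n, A is m x m).
    n, m = len(Y), len(Y[0])
    BtY = [[sum(B[k][i] * Y[k][j] for k in range(n)) for j in range(m)]
           for i in range(n)]
    return [[sum(BtY[i][k] * A[k][j] for k in range(m)) for j in range(m)]
            for i in range(n)]

def inverse_transform_sparse(Y, B, A):
    # Linearity: accumulate Y[i][j] * v_i * w_j^T for each non-zero coefficient,
    # where v_i is the i-th basis vector of B^T and w_j the j-th of A.
    n, m = len(Y), len(Y[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if Y[i][j] != 0:
                vi, wj = B[i], A[j]
                for r in range(n):
                    for c in range(m):
                        X[r][c] += Y[i][j] * vi[r] * wj[c]
    return X
```

With 3 non-zero coefficients in a 4×4 block, both paths produce the same reconstruction, which is exactly the property fig. 3 relies on.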
Discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII) have linear characteristics as shown in mathematical formula 8.
[Equation 8]

T(αx + βy) = αT(x) + βT(y)
Where T (x) refers to the transform, x and y are inputs to the transform, and α and β are arbitrary scalar values. The inverse transformation can be expressed as equation 9 by equations 7 and 8.
[Equation 9]

X′ = B^T Y A = Σ_{l=0}^{N-1} B^T y_l A
Assuming that the non-zero transform coefficient is located at the (i, j)-th element of y_l, B^T y_l A (0 ≤ l ≤ N-1) can be expressed as equation 10 by using the basis vectors of the transforms B^T and A. In equation 10, y_l is a matrix in which the non-zero coefficient is located at the (i, j)-th element and the transform coefficients of the remaining elements are zero.
[Equation 10]

B^T y_l A = X_{i,j} · v_i · w_j^T
X_{i,j} is the non-zero transform coefficient at the (i, j)-th element of the (n×m) inverse quantized transform block. v_i is the i-th basis vector of transform B^T and w_j is the j-th basis vector of transform A. By calculating B^T y_l in equation 10, equation 11 can be obtained.
[Equation 11]

B^T y_l = X_{i,j} · v_i · e_j^T, where e_j is the j-th standard unit vector
[Equation 12]

B^T y_l A = (X_{i,j} · v_i) · w_j^T
When the proposed (n×m) inverse transform is applied to one non-zero coefficient, the number of multiplications is n + (n×m). Therefore, the total number of multiplications of the inverse transform using linearity for an (n×m) transform block having N non-zero coefficients is calculated by equation 13.
[Equation 13]

N × (n + (n × m))
Therefore, the proposed fast inverse transform reduces the computational complexity of the inverse transform only when the number of non-zero coefficients is small, and is accordingly applied to the inverse quantized block only in that case.
Whether to perform the inverse transform using the existing method having the separable property or the proposed method having the separable linear property is determined based on a threshold value obtained by comparing the number of multiplication operations of discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII) with equation 13. The threshold is calculated in advance as the maximum number of non-zero coefficients in the inverse quantized transform block for which, at each block size, N×(n+(n×m)) does not exceed the number of multiplication operations of the existing inverse transform.
The proposed method is performed in the manner shown in fig. 4. First, the number of non-zero coefficients of the inverse quantized transform block Y is calculated before the inverse transform process. Second, when the number of non-zero coefficients does not exceed the threshold value, the proposed inverse transform using the separable linearity is performed. Otherwise, the inverse transform is performed by the conventional method described below.
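The two-step decision above can be sketched as follows. For illustration we compare against the multiplication count of the separable inverse implemented with direct matrix products, n×m×(n+m); an actual implementation would compare against the fast-algorithm counts of tables 1 and 2:

```python
def count_nonzero(Y):
    # Step 1: count the non-zero coefficients of the inverse quantized block.
    return sum(1 for row in Y for c in row if c != 0)

def use_linear_inverse(Y):
    # Step 2: use the linearity-based inverse transform only when its cost
    # N*(n + n*m) (equation 13) does not exceed the cost of the separable
    # inverse via direct matrix products, assumed here to be n*m*(n + m).
    n, m = len(Y), len(Y[0])
    N = count_nonzero(Y)
    return N * (n + n * m) <= n * m * (n + m)
```

For a 4×4 block this yields a threshold of 6 non-zero coefficients (6 × 20 = 120 ≤ 128, while 7 × 20 = 140 > 128) under the direct-product assumption.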
The transform may be implemented using a fast (Fast) method or direct matrix multiplication. When direct matrix multiplication is used, the number of multiplications of a 1-D N-point transform is N². Not only discrete cosine transform-II (DCT-II) but also discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) can be implemented using fast algorithms. Discrete cosine transform-II (DCT-II) uses a fast algorithm that exploits the symmetric and anti-symmetric properties of discrete cosine transform-II (DCT-II): the even basis vectors of discrete cosine transform-II (DCT-II) are symmetric and the odd basis vectors are anti-symmetric. For the even and odd parts of the N-point input, the N-point output is generated by performing addition and subtraction operations between the even and odd parts after computation using subset matrices obtained from the even and odd columns of the respective inverse transform matrices. This fast method is also called a partial butterfly structure.
Discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) may be used as primary transform solutions. The fast method for discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) exploits properties inherited (inherited) from the discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) basis functions in order to reduce the number of operations. There are three properties useful for reducing the number of computations in the discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII) transform matrices. First, some transform vectors contain all N elements when sign changes are ignored. Second, some contain a subset of the N elements when sign changes are ignored. Third, some transform vectors contain only a single non-zero element when sign changes are ignored.
[ Table 1]
[ Table 2]
The number of multiplication operations required for each (n×m) block size in the case where both the horizontal kernel and the vertical kernel are discrete cosine transform-II (DCT-II) is shown in table 1. The number of multiplication operations required for each (n×m) block size in the case where the horizontal and vertical transforms are combinations of discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII), i.e., (DST-VII, DST-VII), (DST-VII, DCT-VIII), (DCT-VIII, DST-VII), and (DCT-VIII, DCT-VIII), is shown in table 2.
[ Table 3]
[ Table 4]
The threshold value of each (n×m) block is determined by comparing the number of multiplications in tables 1 and 2 with the number of multiplications calculated in advance by equation 13, for each combination of horizontal and vertical transforms in tables 3 and 4. The threshold values related to the number of non-zero coefficients in each (n×m) block size when the horizontal and vertical kernels are discrete cosine transform-II/discrete cosine transform-II (DCT-II/DCT-II) or other combinations of kernels are shown in table 3. The threshold values related to the number of non-zero coefficients in each (n×m) block size when the horizontal and vertical kernels are combinations of discrete sine transform-VII (DST-VII) and discrete cosine transform-VIII (DCT-VIII), i.e., (DST-VII/DST-VII), (DST-VII/DCT-VIII), (DCT-VIII/DST-VII), and (DCT-VIII/DCT-VIII), are shown in table 4. For example, in the case where the horizontal and vertical transforms are the combination discrete sine transform-VII/discrete cosine transform-VIII (DST-VII/DCT-VIII) and the number of non-zero coefficients in an 8×8 block is 14 or less, as indicated in bold in table 4, the inverse transform may be performed by the inverse transform proposed in the present disclosure.
The average selection ratio of the proposed method for the Y component may gradually increase with an increase in the QP value. This occurs because the higher the QP value, the smaller the number of non-zero coefficients remaining after the quantization process.
The proposed inverse transform using linearity can be implemented in the encoder as well as in the decoder. Since, in the separable transform according to an embodiment, 16-bit precision is used after the vertical and horizontal transforms, an encoder-decoder mismatch may occur if the proposed linear transform is applied only to the decoder. Since the decoder is much less complex than the encoder, the average decoding time can be shortened more effectively than the encoding time under Random Access (RA).
Finally, by applying the proposed inverse transform with the separable property using linearity to both the encoder and the decoder, run time can be saved while maintaining coding performance, as compared to applying it only to the decoder.
For video encoding, as described above, a low frequency inseparable transform (LFNST, low Frequency Non-Separable Transform) may be performed as a 2-level transform (Secondary Transform). The 2-level transform refers to a transform that is additionally performed after the 1-level transform (Primary Transform). For 2-level transformation, the 1-level transformed coefficients (Primary transformed coefficients) represented by the 2-dimensional matrix may be rearranged into a 1-dimensional vector. Furthermore, the 2-level transform may be performed by direct matrix multiplication with non-separable (non-separable) cores based on the rearranged 1-dimensional vectors.
In one embodiment, the amount of computation can be reduced compared to when direct matrix multiplication is applied to a 1-dimensional vector by separating the 1-dimensional vector into sub-vectors having only one non-zero coefficient using the linearity of the transformation and performing the transformation on the respective sub-vectors using direct matrix multiplication. The following equation 14 represents a forward secondary transformation equation, and equation 15 represents a reverse secondary transformation equation.
[Equation 14]

F = T · p

[Equation 15]

p′ = T^T · F

Here, T is the (R×N) non-separable transform kernel, p is the rearranged (N×1) input vector, F is the (R×1) transformed vector, and p′ is the (N×1) inverse transformed vector.
When direct matrix multiplication is applied to the 1-dimensional vector, a total of N×R multiplications and (N-1)×R additions are required in the forward transform. In the inverse transform, a total of R×N multiplications and (R-1)×N additions are required. In the forward transform, N represents the size of the input vector (input vector) and R represents the size of the output vector (output vector). Conversely, in the inverse transform, R represents the size of the input vector and N represents the size of the output vector.
In the case of using the linearity of the transform, when n non-zero coefficients (forward: 0 ≤ n ≤ N, inverse: 0 ≤ n ≤ R) exist in the 1-dimensional vector that is the input of the transform, a total of R×n multiplications and R×(n-1) additions are required in the forward transform. In the inverse transform, N×n multiplications and N×(n-1) additions are required. Therefore, by applying the transform scheme using the linearity of the transform to 1-dimensional vectors having few non-zero coefficients, the amount of computation can be significantly reduced.
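The same saving is easy to see for the non-separable case: multiplying an R×N kernel by a sparse vector reduces to scaling and accumulating only the kernel columns at the non-zero positions. The function names below are illustrative:

```python
def nonsep_transform_full(T, p):
    # Direct matrix multiplication F = T p (T is R x N, p has length N):
    # always N*R multiplications.
    return [sum(T[r][i] * p[i] for i in range(len(p))) for r in range(len(T))]

def nonsep_transform_sparse(T, p):
    # Linearity: accumulate p[i] * (i-th column of T) only for non-zero p[i],
    # costing R*n multiplications for n non-zero inputs.
    R = len(T)
    F = [0.0] * R
    for i, v in enumerate(p):
        if v != 0:
            for r in range(R):
                F[r] += v * T[r][i]
    return F
```

Both paths produce identical outputs; only the operation count differs.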
Accordingly, not only in the case of the separable transform (Separable Transform), but also in the case of the Non-separable transform (Non-Separable Transform), the amount of computation can be drastically reduced by applying the transform based on linearity. The non-separable transformation method using linearity can be applied not only to the secondary transformation but also to the primary transformation.
Fig. 5 illustrates scanning methods for rearranging the coefficients within an inverse quantized block into a 1-dimensional vector. The coefficients within an inverse quantized block may be rearranged into a 1-dimensional vector by a block scanning method. Starting from the left side of fig. 5, a diagonal direction scan, a horizontal direction scan, and a vertical direction scan are depicted. Further, fig. 6 illustrates an example of rearranging the coefficients within a 4×4 block into a 1-dimensional vector with the horizontal scan. As shown in fig. 5 and 6, for the 2-dimensional non-separable transform, the 2-dimensional input matrix is rearranged into a 1-dimensional input vector. Further, the 2-dimensional non-separable transform is realized by dividing the rearranged 1-dimensional input vector into a plurality of sub-vectors each containing only one non-zero coefficient, multiplying each of the sub-vectors by the 2-dimensional non-separable (non-separable) kernel, and then accumulating the results of the multiplications according to the linearity of the transform.
The 1-dimensional vector may be split into sub-vectors having only one non-zero coefficient and the inverse transform performed on each sub-vector. Fig. 7 illustrates an example of separating a 1-dimensional vector into sub-vectors. Further, fig. 8 illustrates an example of performing inverse transformation on each sub-vector.
The final inverse transform vector can be generated by accumulating all the vectors resulting from the inverse transforms of the respective sub-vectors in fig. 8. The final inverse transform vector, which is a 1-dimensional vector, may be rearranged into a 2-dimensional block according to the block scanning method used in the transform.
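The rearrangement between 2-D blocks and 1-D vectors can be sketched for the horizontal scan of fig. 6 (the helper names are ours; the diagonal and vertical scans of fig. 5 would only change the index order):

```python
def horizontal_scan(block):
    # Rearrange a 2-D block into a 1-D vector, row by row (horizontal scan).
    return [c for row in block for c in row]

def inverse_horizontal_scan(vec, n, m):
    # Rearrange the 1-D vector back into an n x m block after the inverse
    # transform, inverting the scan used on the way in.
    return [vec[r * m:(r + 1) * m] for r in range(n)]
```

Applying the inverse scan to a scanned block must reproduce the original block, which is the round-trip the text describes.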
Fig. 9 provides an embodiment of a video decoding method that may utilize the linear inverse transform method.
In step 902, the number of non-zero coefficients of the inverse quantization block is obtained.
In step 904, an inverse transform method of the inverse quantized block is determined according to the number of non-zero coefficients.
In one embodiment, the number of non-zero coefficients may be compared with a specific threshold value, and an inverse transform method of the inverse quantized block may be determined based on the comparison result.
In an embodiment, the number of multiplication operations required for the linear inverse transform may be determined from the number of non-zero coefficients. In addition, the number of multiplication operations may be compared with a specific threshold value, and an inverse transform method of the inverse quantization block may be determined based on the comparison result. The number of multiplications may be determined based on the number of non-zero coefficients and/or the size of the inverse quantization block.
In the embodiment, the specific threshold may be determined based on the size of the inverse quantization block.
In one embodiment, a vertical kernel and a horizontal kernel applicable to the inverse quantization block may be determined. Furthermore, the specific threshold may be determined based on the size of the vertical kernel, the horizontal kernel, and/or the inverse quantization block. The vertical core and the horizontal core may be determined from at least one of discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII). Furthermore, the vertical kernel and the horizontal kernel may be determined based on the size of the inverse quantization block and a prediction method used for the inverse quantization block.
In an embodiment, the inverse transform method of the inverse quantization block may be determined based on the image type of the inverse quantization block. For example, when the image type of the inverse quantization block is an All Intra (AI) type or a Random Access (RA) type, whether to apply the linear inverse transform to the inverse quantization block may be determined according to the number of non-zero coefficients. In contrast, when the image type of the inverse quantization block is neither an All Intra (AI) type nor a Random Access (RA) type, it may be decided not to apply the linear inverse transform to the inverse quantization block.
In an embodiment, the inverse transform method of the inverse quantized block may be determined based on quantization parameters applicable to the inverse quantization of the inverse quantized block. When the quantization parameter is greater than a critical quantization parameter value, it may be determined whether to apply a linear inverse transform to the inverse quantization block according to the number of non-zero coefficients. In contrast, when the quantization parameter is smaller than a critical quantization parameter value, it may be decided not to apply a linear inverse transform to the inverse quantization block.
In an embodiment, linear inverse transform allowance information indicating whether the linear inverse transform is allowed may be acquired from a parameter set. In the case where the allowance information indicates that the linear inverse transform is allowed, it may be decided whether the inverse transform method of the inverse quantization block is the linear inverse transform method. In contrast, in the case where the allowance information indicates that the linear inverse transform is not allowed, it may be decided that the inverse quantization block is inverse-transformed by an inverse transform method other than the linear inverse transform method. The parameter set may be at least one of a video parameter set, a sequence parameter set, an image parameter set, and an adaptation parameter set.
In an embodiment, the inverse transform method of the inverse quantized block may be determined based on the color components of the inverse quantized block.
In step 906, an inverse transform of the inverse quantized block is performed according to the determined inverse transform method. In the case where the inverse transform method is a linear inverse transform method, the inverse transform of the inverse quantized block may be performed as follows.
First, the inverse quantization block may be divided into a plurality of sub-blocks, each including only one non-zero coefficient, with the remaining coefficients being zero. Next, an inverse transform may be performed on each of the plurality of sub-blocks. The inverse transform block of the inverse quantization block may be acquired based on the plurality of inverse-transformed sub-blocks.
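The 2-dimensional procedure can be sketched as follows, with the actual inverse transform abstracted behind a callable; the identity transform used in the example is a placeholder, not the codec's kernel.

```python
def split_into_subblocks(block):
    # Split an inverse-quantized block into sub-blocks that each keep
    # exactly one non-zero coefficient, all other positions being zero.
    h, w = len(block), len(block[0])
    subs = []
    for r in range(h):
        for c in range(w):
            if block[r][c] != 0:
                sub = [[0] * w for _ in range(h)]
                sub[r][c] = block[r][c]
                subs.append(sub)
    return subs

def linear_inverse_transform_2d(block, inverse_tf):
    # Inverse-transform each sub-block and accumulate the results to
    # obtain the inverse transform block (linearity of the transform).
    h, w = len(block), len(block[0])
    acc = [[0] * w for _ in range(h)]
    for sub in split_into_subblocks(block):
        out = inverse_tf(sub)
        for r in range(h):
            for c in range(w):
                acc[r][c] += out[r][c]
    return acc
```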
Fig. 10 provides an embodiment of a video encoding method that may utilize the linear inverse transform method.
In the video encoding method, after a block of a video is encoded, the block is decoded and stored in a decoded picture buffer (Decoded Picture Buffer) so that another block of the video, or another video, can be encoded. Therefore, applying an appropriate inverse transform method in the video encoding method can increase the speed of video encoding. A video encoding method using the linear inverse transform method may be implemented as follows.
In step 1002, a block may be encoded and the encoded block is inverse quantized.
In step 1004, the number of non-zero coefficients of the inverse quantization block may be obtained.
In step 1006, an inverse transform method of the inverse quantized block may be determined according to the number of non-zero coefficients.
In step 1008, an inverse transform of the inverse quantized block may be performed according to the determined inverse transform method.
In steps 1004 to 1008, the constitution related to steps 902 to 906 may be applied.
In step 1010, the block may be reconstructed using the inverse transform block and other blocks may be encoded based on the reconstructed block.
A computer-readable storage medium storing a bitstream generated by the video encoding method of fig. 10 may be provided. In addition, the bitstream generated by the video encoding method of fig. 10 may be transferred from a video encoding apparatus to a video decoding apparatus.
The bitstream related to video data stored in the computer-readable storage medium may be decoded by the video decoding method of fig. 9. In addition, the bitstream transferred from the video encoding apparatus to the video decoding apparatus may be decoded by the video decoding method of fig. 9.
8-Tap discrete sine transform (8-tap DST) interpolation filter
The existing method applies a 4-tap discrete cosine transform (4-tap DCT) interpolation filter to all blocks regardless of block size. In the present disclosure, in order to generate reference samples at fractional angles (Fractional angle) in intra prediction, a method of applying an 8-tap discrete sine transform (8-tap DST) interpolation filter to 4×4, 4×n, and n×4 blocks is proposed.
For the higher-resolution class A sequences (sequences), an 8-tap discrete sine transform (8-tap DST) interpolation filter is applied to 4×4 blocks instead of the 4-tap discrete cosine transform (4-tap DCT) interpolation filter, while for the relatively lower-resolution class B, C, and D sequences, the 8-tap DST interpolation filter is applied to 4×n and n×4 blocks (n=4, 8, 16, 32, 64) instead of the 4-tap DCT interpolation filter.
The 8-tap discrete sine transform (8-tap DST) interpolation filter coefficients are derived from discrete sine transform-VII (DST-VII) and inverse discrete sine transform-VII (IDST-VII, Inverse DST-VII). Table 5 shows the coefficients at the 16/32-pixel position among the 1/32-pixel interpolation filter coefficients.
[ Table 5]
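Since the Table 5 coefficients are not reproduced above, the sketch below shows only the mechanics of applying an 8-tap interpolation filter at a fractional position; the coefficient set used is the widely published HEVC half-sample 8-tap DCT-IF, standing in for the proposed DST-derived coefficients.

```python
def interpolate(refs, coeffs, shift=6):
    # Weighted sum of 8 consecutive reference samples, followed by
    # rounding and normalization; coefficients sum to 1 << shift (64).
    acc = sum(r * c for r, c in zip(refs, coeffs))
    return (acc + (1 << (shift - 1))) >> shift

# HEVC half-sample (16/32) 8-tap DCT-IF coefficients, used as stand-ins.
HALF_PEL_8TAP = [-1, 4, -11, 40, 40, -11, 4, -1]
```

On a flat row of samples the filter is transparent, and on a linear ramp the half-sample output lands between the two center samples, as expected of a half-pel filter.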
Interpolation filter analysis
FIG. 11 illustrates the magnitude response (Magnitude response) at the 16/32-pixel (pixel) location of a 4-tap discrete cosine transform (4-tap DCT) interpolation filter and an 8-tap discrete sine transform (8-tap DST) interpolation filter. In fig. 11, the x-axis represents the Frequency (Frequency) normalized to a value between 0 and 1, and the y-axis represents the amplitude response (Magnitude response).
In fig. 11, the blue graph shows the magnitude response (Magnitude response) of the existing 4-tap discrete cosine transform (4-tap DCT) interpolation filter, while the red graph shows the magnitude response of the proposed method, i.e., the 8-tap discrete sine transform (8-tap DST) interpolation filter. As shown in fig. 11, it can be confirmed that the low-frequency responses of the two interpolation filters are similar, but the 8-tap DST interpolation filter has a higher high-frequency response (High frequency response) than the 4-tap DCT interpolation filter.
Fig. 12 illustrates the coefficients of an 8-tap Discrete Sine Transform (DST) interpolation filter.
The following equation 16 shows the coefficient derivation method of the 8-tap discrete sine transform (DST) interpolation filter. In equation 16, equation (3) can be derived by substituting equation (1) into equation (2).
[ Math.16 ]
In the present disclosure, a method is proposed in which an 8-tap discrete cosine transform (8-tap DCT) interpolation filter is used instead of a 4-tap discrete cosine transform (4-tap DCT) interpolation filter for a block having nTbS of 2, and an 8-tap Gaussian (8-tap Gaussian) interpolation filter is used instead of a 4-tap Gaussian (4-tap Gaussian) interpolation filter for a block having nTbS of 5 or more.
Table 6 shows the coefficients at the 16/32-pixel position among the 1/32-pixel 8-tap discrete cosine transform (8-tap DCT) interpolation filter coefficients. Table 7 shows the coefficients at the 16/32-pixel position among the 1/32-pixel 8-tap Gaussian interpolation filter coefficients.
[ Table 6]
[ Table 7]
FIG. 13 illustrates the magnitude response (Magnitude response) at the 16/32-pixel position of the 4-tap discrete cosine transform (4-tap DCT), 4-tap Gaussian, 8-tap discrete cosine transform (8-tap DCT), and 8-tap Gaussian interpolation filters. The x-axis represents the frequency normalized to a value between 0 and 1, while the y-axis represents the magnitude response.
By comparing the blue graph, i.e., the 4-tap discrete cosine transform (4-tap DCT) interpolation filter, with the yellow graph, i.e., the 8-tap discrete cosine transform (8-tap DCT) interpolation filter, it can be confirmed that the 8-tap DCT interpolation filter has a higher high-frequency response (High frequency response) than the 4-tap DCT interpolation filter. Further, by comparing the red graph, i.e., the 4-tap Gaussian interpolation filter, with the purple graph, i.e., the 8-tap Gaussian interpolation filter, it can be confirmed that the 8-tap Gaussian interpolation filter has a stronger low-frequency (smoothing) response than the 4-tap Gaussian interpolation filter.
In the present disclosure, performance can be improved by additionally using an 8-tap discrete cosine transform (8-tap DCT) interpolation filter and an 8-tap Gaussian interpolation filter, selected according to block size and directional mode, in addition to the 4-tap DCT and 4-tap Gaussian interpolation filters used for generating reference samples in intra prediction. In the present disclosure, the 8-tap DCT interpolation filter may be applied to blocks with nTbS of 2 (4×4, 4×8, 8×4, …), and the 8-tap Gaussian interpolation filter may be applied to blocks with nTbS of 5 or more (32×32, 32×64, 64×32, 64×64).
Frequency-based adaptive interpolation filter in intra Prediction (Frequency-based Adaptive Interpolation FILTER IN INTRA Prediction)
An 8-tap discrete cosine transform-based interpolation filter (8-tap Discrete Cosine Transform-based interpolation filter, DCT-IF) and an 8-tap smoothing interpolation filter (8-tap Smoothing interpolation filter, SIF) are used instead of the 4-tap DCT-IF and the 4-tap SIF currently used in VVC intra prediction. Since the 8-tap DCT-IF has a stronger high-frequency (high frequency) characteristic than the 4-tap DCT-IF, and the 8-tap SIF has a stronger low-frequency (low frequency) characteristic than the 4-tap SIF, which 8-tap interpolation filter to use is selected according to the characteristics of the block.
The characteristics of the block are determined using the size of the block and the frequency (frequency) characteristics of the reference samples, and the type of interpolation filter used in the corresponding block is selected.
The characteristic utilized is that the smaller a block, the lower its correlation (correlation) and the more high-frequency content it has, while the larger a block, the higher its correlation and the more low-frequency content it has.
The frequency (frequency) characteristics of the reference samples can be calculated by applying a transform (transform) to the reference samples of the block using a discrete cosine transform-II (DCT-II). According to the intra prediction mode, an upper reference sample is used in the case of a vertical direction, and a left reference sample is used in the case of a horizontal direction. The higher the percentage of high frequency energy (high frequency energy), the higher the high frequency (high frequency) characteristic of the block.
The frequency (frequency) characteristics of the block are determined by comparing the percentage of high frequency energy (high frequency energy) to a threshold (threshold) based on the block size, and an interpolation filter (interpolation filter) is selected that needs to be applied to the block.
According to the frequency information, the 8-tap discrete cosine transform interpolation filter (8-tap DCT-IF), a strong high-pass filter (HPF), is applied to blocks with much high-frequency content, while the 8-tap smoothing interpolation filter (8-tap SIF), a strong low-pass filter (LPF), is applied to blocks with much low-frequency content.
Based on the characteristic that smaller blocks have lower correlation, a strong high-pass filter is applied to blocks with much high-frequency content: when the block size is small, the 8-tap DCT-IF, a strong high-pass filter (HPF), is applied. When the high-frequency content is small, the 4-tap SIF, a low-pass filter (LPF), is applied.
Based on the characteristic that larger blocks have higher correlation, together with the frequency information, a strong low-pass filter is applied to blocks with much low-frequency content: when the block size is large, the 8-tap SIF, a strong low-pass filter (LPF), is applied. When the high-frequency content is large, the 4-tap DCT-IF, a weak high-pass filter (HPF), is applied.
Illustrative example of calculating the percentage of high frequency energy (High frequency energy)
In the case where the intra prediction mode is horizontal, N is the height of the block; in the vertical case, N is the width of the block. When fewer or more reference samples are used, the value of N may become correspondingly smaller or larger. X represents a reference sample. In the case described above, the high-frequency region uses reference samples over a length of N/4; when the high frequency is calculated using fewer or more reference samples, the length of this region may be reduced or increased. The method of calculating the ratio of high-frequency energy (high frequency energy) is shown in equation 17.
[ Math 17 ]
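With the exact form of equation 17 not reproduced above, the sketch below assumes the ratio is the high-frequency energy over the total energy of the DCT-II of the reference samples, with the high-frequency region spanning the top quarter of the bins as described.

```python
import math

def dct2(x):
    # Unnormalized 1-D DCT-II; scaling is irrelevant to an energy ratio.
    n = len(x)
    return [sum(xi * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, xi in enumerate(x)) for k in range(n)]

def high_freq_energy_ratio(refs):
    # Energy in the top quarter of DCT-II bins over total energy.
    energy = [c * c for c in dct2(refs)]
    n = len(refs)
    total = sum(energy)
    return sum(energy[n - n // 4:]) / total if total else 0.0
```

A constant row of reference samples yields a ratio near zero, while an alternating row concentrates its energy in the highest bins.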
Fig. 14 illustrates an example of a method of selecting an interpolation filter using Frequency (Frequency) information.
In the case where the percentage of high frequency energy (high frequency energy) of the block with nTbS of 2 is smaller than the threshold (threshold), a 4-tap smoothing interpolation filter (4-tap SIF) is applied, while in other cases, an 8-tap discrete cosine transform interpolation filter (8-tap DCT-IF) is applied. In the case where the percentage of high frequency energy (high frequency energy) of the block having nTbS of 5 or more is smaller than the threshold (threshold), an 8-tap smoothing interpolation filter (8-tap SIF) is applied, and in other cases, a 4-tap discrete cosine transform interpolation filter (4-tap DCT-IF) is applied.
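That selection rule can be sketched directly; the fallback for intermediate block sizes (nTbS of 3 or 4) is an assumption, as the text specifies only the nTbS 2 and nTbS ≥ 5 cases.

```python
def select_interpolation_filter(ntbs, hf_ratio, threshold):
    # Small blocks default to the sharp 8-tap DCT-IF unless the block is
    # clearly low-frequency; large blocks default to the smoothing 8-tap
    # SIF unless the block is clearly high-frequency.
    if ntbs == 2:
        return "4-tap SIF" if hf_ratio < threshold else "8-tap DCT-IF"
    if ntbs >= 5:
        return "8-tap SIF" if hf_ratio < threshold else "4-tap DCT-IF"
    return "4-tap DCT-IF"  # assumed default for intermediate sizes
```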
Fig. 15 and 16 illustrate coefficients of an 8-tap discrete cosine transform interpolation filter (8-tap DCT interpolation filter) and an 8-tap smooth interpolation filter (8-tap smoothing interpolation filter), respectively.
The present disclosure can improve coding efficiency by calculating a threshold for each image based on correlation (correlation), high_freq_ratio, and block size (nTbS). Using only the boundary correlation threshold (boundary correlation threshold) when selecting the filter length, for all block sizes, may also improve performance. The correlation threshold may be applied independently to the long/short-tap discrete cosine transform interpolation filters (long/short tap DCT-IF) and the smoothing interpolation filter (SIF), or may be applied together with high_freq_ratio. Fig. 17 illustrates an embodiment of an interpolation filter selected according to a boundary correlation threshold.
The various embodiments of the present disclosure are not intended to be exhaustive or to list all possible combinations, but are merely illustrative of representative aspects of the present disclosure, and the matters described in the various embodiments may be applied independently or in combination of two or more.
Furthermore, various embodiments of the present disclosure may be implemented in hardware, firmware (firmware), software, or a combination thereof. When implemented in hardware, they may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors (general processor), controllers, microcontrollers, microprocessors, and the like.
The scope of the present disclosure includes software or device-executable instructions (e.g., operating system, application programs, firmware, programs, etc.) for execution on an apparatus or computer for the actions of the methods of the various embodiments, as well as non-transitory computer-readable media (non-transitory computer-readable medium) on which the software or instructions, etc., as described above, are stored.

Claims (20)

1. A video decoding method, comprising:
A step of obtaining the number of non-zero coefficients of the inverse quantization block;
determining an inverse transform method of the inverse quantization block according to the number of the non-zero coefficients; and
And performing inverse transformation of the inverse quantization block according to the determined inverse transformation method.
2. The video decoding method of claim 1,
The step of determining the inverse transform method of the inverse quantization block includes:
a step of comparing the number of non-zero coefficients with a specific threshold value; and
And determining an inverse transform method of the inverse quantization block based on the comparison result.
3. The video decoding method of claim 1,
The step of determining the inverse transform method of the inverse quantization block includes:
A step of determining the number of multiplication operations required for linear inverse transformation from the number of non-zero coefficients;
comparing the number of multiplication operations with a specific threshold value; and
And determining an inverse transform method of the inverse quantization block based on the comparison result.
4. The video decoding method of claim 3, wherein:
the number of the multiplication operations described above,
Is determined based on the number of non-zero coefficients and the size of the inverse quantization block.
5. The video decoding method according to one of claims 2 and 3, characterized in that:
the specific threshold is determined based on the size of the inverse quantization block.
6. The video decoding method according to one of claims 2 and 3, characterized in that:
The video decoding method further comprises the steps of:
determining a vertical kernel and a horizontal kernel applicable to the inverse quantization block;
the specific threshold is determined based on the sizes of the vertical kernel, the horizontal kernel, and the inverse quantization block.
7. The video decoding method of claim 6, wherein:
The vertical kernel and the horizontal kernel are determined from at least one of discrete cosine transform-II (DCT-II), discrete sine transform-VII (DST-VII), and discrete cosine transform-VIII (DCT-VIII).
8. The video decoding method of claim 6, wherein:
The vertical core and the horizontal core,
Is determined based on the size of the inverse quantization block and a prediction method applicable to the inverse quantization block.
9. The video decoding method of claim 1, wherein:
an inverse transform method of the inverse quantized block is determined based on the image type of the inverse quantized block.
10. The video decoding method of claim 9, wherein:
The step of determining the inverse transform method of the inverse quantization block includes:
And determining whether to apply linear inverse transform to the inverse quantization block according to the number of the non-zero coefficients when the image type of the inverse quantization block is an All Intra (AI) type or a Random Access (RA) type.
11. The video decoding method of claim 9, wherein:
the step of determining an inverse transform method of an inverse quantized block,
When the image type of the inverse quantization block is neither an All Intra (AI) type nor a Random Access (RA) type, it is determined that linear inverse transform is not applied to the inverse quantization block.
12. The video decoding method of claim 1, wherein:
an inverse transform method of the inverse quantized block is determined based on quantization parameters applicable to inverse quantization of the inverse quantized block.
13. The video decoding method of claim 12, comprising:
And when the quantization parameter is greater than a critical quantization parameter value, determining whether to apply linear inverse transformation to the inverse quantization block according to the number of non-zero coefficients.
14. The video decoding method of claim 12, wherein:
When the quantization parameter is less than a critical quantization parameter value, it is determined not to apply a linear inverse transform to the inverse quantization block.
15. The video decoding method of claim 1, wherein:
The video decoding method further comprises the steps of:
a step of acquiring linear inverse transformation allowable information indicating whether or not to permit linear inverse transformation from the parameter set;
the step of determining an inverse transform method of an inverse quantized block,
In a case where the linear inverse transform allowable information indicates that the linear inverse transform is allowable, it is determined whether an inverse transform method of the inverse quantization block is a linear inverse transform method.
16. The video decoding method of claim 15, wherein:
the set of parameters is such that,
Is at least one of a video parameter set, a sequence parameter set, an image parameter set, and an adaptation parameter set.
17. The video decoding method of claim 1, wherein:
an inverse transform method of the inverse quantization block is determined based on color components of the inverse quantization block.
18. The video decoding method of claim 1, wherein:
The step of performing the inverse transform of the inverse quantized block according to the decided inverse transform method includes:
In case the inverse transformation method is a linear inverse transformation,
Dividing the inverse quantization block into a plurality of sub-blocks each including only one non-zero coefficient and the remaining coefficients being zero coefficients;
a step of performing inverse transform on each of the plurality of sub-blocks; and
And obtaining an inverse transform block of the inverse quantization block based on a plurality of element blocks of each inverse transform.
19. A video encoding method, comprising:
encoding the block and inversely quantizing the encoded block;
a step of obtaining the number of non-zero coefficients of the inverse quantization block;
determining an inverse transform method of the inverse quantization block according to the number of the non-zero coefficients;
a step of performing inverse transform of the inverse quantization block according to the decided inverse transform method; and
Reconstructing the block by using the inverse transformation block, and encoding other blocks based on the reconstructed block.
20. A computer-readable storage medium, characterized by:
as a computer-readable storage medium storing a bitstream related to an encoded video, the storage medium stores a bitstream generated by a video encoding method,
The video encoding method comprises the following steps:
encoding the block and inversely quantizing the encoded block;
a step of obtaining the number of non-zero coefficients of the inverse quantization block;
determining an inverse transform method of the inverse quantization block according to the number of the non-zero coefficients;
A step of performing inverse transform of the inverse quantization block according to the decided inverse transform method;
And
Reconstructing the block by using the inverse transformation block, and encoding other blocks based on the reconstructed block.
CN202380030687.0A 2022-01-27 2023-01-27 Image encoding/decoding method and apparatus Pending CN118947113A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR10-2022-0012631 2022-01-27
KR10-2022-0021078 2022-02-17
KR10-2022-0073743 2022-06-16

Publications (1)

Publication Number Publication Date
CN118947113A true CN118947113A (en) 2024-11-12

