GB2501495A - Selection of image encoding mode based on preliminary prediction-based encoding stage - Google Patents
- Publication number
- GB2501495A GB2501495A GB1207182.5A GB201207182A GB2501495A GB 2501495 A GB2501495 A GB 2501495A GB 201207182 A GB201207182 A GB 201207182A GB 2501495 A GB2501495 A GB 2501495A
- Authority
- GB
- United Kingdom
- Prior art keywords
- encoding
- coding mode
- prediction
- quantizers
- coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/124—Quantisation
- H04N19/126—Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/19—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding, using optimisation based on Lagrange multipliers
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/625—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
Abstract
A method for encoding blocks of image pixels, which may form an ultra-high definition image 101, comprises: providing a preliminary prediction-based encoding 106 of original blocks, comprising obtaining a preliminary prediction residual image portion resulting from the difference between the original blocks and corresponding predictors; segmenting the preliminary prediction residual image portion into preliminary prediction residuals; and providing a second prediction-based encoding of original blocks. The second prediction-based encoding involves choosing a coding mode to apply to each prediction residual, from among several coding modes including a static coding mode 111 and a probabilistic coding mode 112A. The static coding mode implements quantization using only predefined quantizers, which may be user-defined, and the probabilistic coding mode implements quantization using quantizers selected based on statistics 117 on the preliminary prediction residuals. Selection of the optimal coding mode may be based on encoding costs or rate-distortion criteria. Obtaining the statistics may involve a probabilistic model such as a generalised Gaussian distribution. Identities of the coding modes used may be stored in a quad-tree in the bit stream. A corresponding decoding method is also disclosed.
Description
METHODS FOR ENCODING AND DECODING AN IMAGE WITH COMPETITION OF
CODING MODES, AND CORRESPONDING DEVICES
FIELD OF THE INVENTION
The present invention concerns methods for encoding and decoding an image comprising coding units of pixels such as blocks or macroblocks of pixels, and associated encoding and decoding devices.
The invention is particularly useful for the encoding of digital video sequences made of images or "frames".
BACKGROUND OF THE INVENTION
Video compression algorithms, such as those standardized by the standardization organizations ITU, ISO, and SMPTE, exploit the spatial and temporal redundancies of images in order to generate bit streams of data of smaller size than original video sequences. These powerful video compression tools, known as spatial (or intra) and temporal (or inter) predictions, make the transmission and/or the storage of video sequences more efficient.
Video encoders and/or decoders (codecs) are often embedded in portable devices with limited resources, such as cameras or camcorders. Conventional embedded codecs can at best process high definition (HD) digital videos, i.e. 1080x1920 pixel frames.
Real time encoding is limited by the limited resources of the portable devices, especially regarding slow access to the working memory (e.g. random access memory, or RAM) and regarding the central processing unit (CPU).
This is particularly striking for the encoding of ultra-high definition (UHD) digital videos that are about to be handled by the latest cameras. This is because the amount of pixel data to encode or to consider for spatial or temporal prediction is huge.
UHD typically has four times the definition (4k2k pixels) of HD video, which is the current standard. Furthermore, very ultra-high definition, which is sixteen times that definition (i.e. 8k4k pixels), is even being considered for the longer term.
In the same vein, some encoders simplify the processing to reduce computational costs, to the detriment of the compression efficiency, in particular in terms of rate-distortion ratio.
Figures 1 and 2 respectively represent the scheme for a conventional block-based video encoder 10 and the scheme for a conventional block-based video decoder 20 in accordance with video compression standards, such as H.264/MPEG-4 AVC ("Advanced Video Coding").
The latter is the result of the collaboration between the "Video Coding Expert Group" (VCEG) of the ITU and the "Moving Picture Experts Group" (MPEG) of the ISO, in particular in the form of a publication "Advanced Video Coding for Generic Audiovisual Services" (March 2005).
More advanced standards are being developed by VCEG and MPEG. In particular, the next-generation standard intended to replace the H.264/MPEG-4 AVC standard is still being drafted and is known as the HEVC standard (standing for "High Efficiency Video Coding").
This HEVC standard introduces new coding tools and new coding units that are generalizations of the coding units defined in H.264/AVC.
Figure 1 schematically represents a scheme for a block-based video encoder 10.
The original video sequence 101 is a succession of digital images I. As is known per se, a digital image is represented by one or more matrices of which the coefficients represent pixels.
The value of a pixel can in particular correspond to luminance information.
In the case where several components are associated with each pixel (for example red-green-blue components or luminance-chrominance components), each of these components can be processed separately.
An image I is split (block 102) into a plurality of coding units of pixels, in particular into macroblocks, generally blocks of size 16 pixels x 16 pixels up to 64x64 pixels in HEVC, which macroblocks may in turn be divided into different sizes of blocks of pixels, for example 4x4, 4x8, 8x4, 8x8, 8x16, 16x8.
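The splitting step (block 102) can be sketched as follows. This is an illustrative Python fragment, not part of any standard or of the patent, and it assumes frame dimensions that are exact multiples of the block size (a real encoder would pad the borders):

```python
import numpy as np

def split_into_blocks(image, block_size=16):
    """Split a 2-D pixel array into coding units of block_size x block_size.

    Assumes the dimensions are multiples of block_size; a real encoder
    would pad the image borders first.
    """
    h, w = image.shape
    return [image[y:y + block_size, x:x + block_size]
            for y in range(0, h, block_size)
            for x in range(0, w, block_size)]

frame = np.zeros((64, 64), dtype=np.uint8)
macroblocks = split_into_blocks(frame)
# a 64x64 frame yields 16 macroblocks of 16x16 pixels
```

The same helper, called recursively with smaller sizes, would produce the 8x8 or 4x4 sub-blocks mentioned above.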
During block-based video compression, a first phase consists in predicting the coding units spatially (block 103) by an "Intra" predictor or temporally (block 104) by an "Inter" predictor. Each predictor is a set of pixels of the same size as the coding unit to be predicted, not necessarily aligned on the grid decomposing the image into coding units, and is taken from the same image or another image (block 105).
While the prediction is usually estimated at the macroblock level, the video coding standards also provide for partitioning of the macroblock into smaller regions used for the prediction, for example sub-blocks of 4x4, 4x8, 8x4, 8x8, 8x16 or 16x8 pixels. As a consequence, any of the macroblock and of the sub-blocks can be seen as a coding unit to be coded.
From the obtained predictor and from the coding unit to be predicted, a difference block (or "residual") is derived (block 107). Identification of the predictor and coding of the residual make it possible to reduce the quantity of information to be actually encoded.
It should be noted that, in certain cases, the predictor can be chosen in an interpolated version of the reference image in order to reduce the prediction differences and therefore improve the compression in certain cases.
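The residual derivation of block 107 amounts to a signed pixel-wise difference. As an illustrative sketch (not the patent's implementation), note that the result must be stored in a wider signed type than the 8-bit pixels:

```python
import numpy as np

def residual(original, predictor):
    # Difference block (block 107): widen to a signed type so that
    # negative values and the full dynamic range are preserved.
    return original.astype(np.int16) - predictor.astype(np.int16)

original = np.full((4, 4), 120, dtype=np.uint8)
predictor = np.full((4, 4), 117, dtype=np.uint8)
res = residual(original, predictor)
# res is a 4x4 block of the constant value 3, dtype int16
```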
In the "Intra" prediction module 103, the current coding unit is predicted by continuity from its neighbouring blocks (see Figure 3), by means of an "Intra" predictor, which is a block of pixels constructed from information on the current image already encoded.
With regard to "Inter" coding by temporal prediction, motion estimation 106 between the current coding unit and reference images 105 (past or future) is performed in order to identify, in one of those reference images, the set of pixels closest to the current block, to be used as a predictor of that current coding unit. This is illustrated in Figure 4. The reference images used consist of images in the video sequence that have already been coded and then reconstructed (by decoding).
The predictor obtained by the temporal prediction is next generated and then subtracted (block 107) from the current coding unit to be processed so as to obtain a difference block (or "residual"). This step is called "motion compensation" in the conventional compression algorithms.
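A minimal sketch of the motion estimation of block 106 is an exhaustive block-matching search minimizing the sum of absolute differences (SAD). This illustrative version is integer-pel only and performs no interpolation of the reference image, whereas real encoders use sub-pel refinement and much faster search strategies:

```python
import numpy as np

def full_search(current_block, reference, top, left, search_range=8):
    """Exhaustive integer-pel block matching around (top, left).

    Returns the motion vector (dy, dx) of the best SAD match in the
    reference image, together with the SAD itself.
    """
    bh, bw = current_block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > reference.shape[0] or x + bw > reference.shape[1]:
                continue  # candidate predictor falls outside the image
            cand = reference[y:y + bh, x:x + bw]
            sad = np.abs(current_block.astype(np.int32) - cand.astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
current = reference[10:18, 12:20].copy()   # block known to sit at (10, 12)
mv, sad = full_search(current, reference, top=8, left=16)
# recovers the displacement (dy, dx) = (2, -4) with SAD 0
```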
These two types of prediction thus supply several prediction texture residuals (the difference between the current coding unit and the predictor) that are compared in a selection module in order to determine the prediction mode that optimizes a rate-distortion criterion.
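Such a rate-distortion selection is conventionally performed with a Lagrangian cost J = D + lambda * R, where D is the distortion, R the rate, and lambda trades one against the other. A minimal sketch (the candidate values below are made up for illustration):

```python
def select_mode(candidates, lam):
    """Local competition between modes: pick the candidate minimising
    the Lagrangian cost J = D + lam * R.

    candidates: iterable of (mode_name, distortion, rate) triples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# Hypothetical (distortion, rate) figures for two competing modes.
candidates = [("16x16", 100.0, 10.0), ("8x8", 40.0, 30.0)]
best_low_rate = select_mode(candidates, lam=10.0)   # rate is expensive
best_low_dist = select_mode(candidates, lam=1.0)    # distortion dominates
```

A large lambda favours the cheaper-rate "16x16" candidate; a small lambda favours the lower-distortion "8x8" one.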
If "Intra" prediction is selected, prediction information describing the "Intra" predictor is coded (block 108) before being inserted into the bit stream 190.
If the module for selecting the best prediction mode chooses "Inter" prediction, prediction information such as motion information is coded (block 108) and inserted into the bit stream 190. This motion information is in particular composed of a motion vector (indicating the position of the predictor in the reference image relative to the position of the coding unit to be predicted) and appropriate information to identify the reference image among the reference images (for example an image index).
In a second phase of the block-based video compression, the residual selected by the choice module is then transformed (109) into the frequency domain by means of a discrete cosine transform (DCT), resulting in transform (residual) units of DCT coefficients. Next, the DCT coefficients are quantized (110) using quantizers that are static, i.e. predetermined based on a user-specified quantization parameter (known as the "QP" parameter), to obtain a quantized block of quantized DCT coefficients.
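The transform-and-quantize pair of blocks 109 and 110 can be sketched as below. This floating-point version only illustrates the principle: real codecs use integer approximations of the DCT and derive the quantizer step from the QP parameter, neither of which is reproduced here:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: coeffs = D @ block @ D.T.
    m = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
                  for k in range(n)])
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def transform_and_quantize(residual, qstep):
    """2-D DCT (109) followed by static uniform quantization (110)."""
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T
    return np.round(coeffs / qstep).astype(np.int32)

flat = np.full((8, 8), 8.0)        # a flat residual: all energy in the DC term
q = transform_and_quantize(flat, qstep=4.0)
# q[0, 0] == 16 and every AC coefficient quantizes to 0
```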
In some video coding standards, the DCT transform and the quantization usually use blocks of the same size as the residual or of the next lower size. For example, a residual resulting from the prediction with the size of 16x16 pixels can be transformed and quantized based on 16x16 blocks or 8x8 blocks. In particular, this is shown in Figure 1 through the selection of the blocks 111 and 112.
Using one or the other size for the transform and the quantization does not provide the same result in terms of rate-distortion ratio of the resulting encoded data.
That is why most of the video coding standards provide a competition process between the several sizes of transforms and quantization, each size being used to define a corresponding static coding mode. This competition is local since only information on the residual to be encoded is used.
Of course, this is only an example taken from some video coding standards, and more possibilities than the 16x16 and 8x8 sizes can be offered for the second phase of the block-based video compression, including non-square (for instance rectangular) sizes.
The coefficients of the residual once transformed and quantized (i.e. of the quantized block) are next coded by means of entropy or arithmetic coding (108) and then inserted into the compressed bit stream 190 as part of the useful data coding the image.
An indication of how the image is split into 16x16 blocks and/or 8x8 blocks is also added to the bit stream 190, so as to make it possible for any decoder to know which coding mode has been used. For example, in the specific case of the HEVC standard, a quadtree is provided in the bitstream 190 to reflect a recursive breakdown of the image into square-shaped regions of pixels (the quantized blocks or transform units) wherein each leaf node is a transform unit (or the corresponding quantized block).
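Such a quadtree breakdown can be sketched as a recursive competition: code the block whole, or split it into four quadrants when that is cheaper, emitting one split flag per node. The cost function below is a made-up illustration, not the cost model of any standard or of the patent:

```python
import numpy as np

def build_quadtree(block, cost_fn, min_size=8):
    """Emit HEVC-style split flags: 1 = split into four quadrants, 0 = leaf.

    cost_fn(block) is a hypothetical rate-distortion cost estimate.
    Returns (flags, total_cost).
    """
    whole = cost_fn(block)
    if block.shape[0] <= min_size:
        return [0], whole
    h = block.shape[0] // 2
    quads = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    flags, cost = [], 0.0
    for quad in quads:
        f, c = build_quadtree(quad, cost_fn, min_size)
        flags += f
        cost += c
    if cost < whole:
        return [1] + flags, cost
    return [0], whole

# Toy cost: a block costs its area if it contains any signal, else 1.
cost_fn = lambda b: float(b.size) if b.any() else 1.0
blk = np.zeros((16, 16))
blk[0, 0] = 1.0
flags, cost = build_quadtree(blk, cost_fn)
# flags == [1, 0, 0, 0, 0]: split once, then code the four 8x8 leaves whole
```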
The bit stream 190 thus comprises the encoded image, i.e. this indication of the image breakdown, the prediction information for each coding unit (making it possible to retrieve the corresponding predictor) and the quantized transformed residual for each coding unit, all the information being entropy or arithmetic coded.
Furthermore, the QP parameter is also inserted in the bit stream.
In order to calculate the "Intra" predictors or to perform the motion estimation for the "Inter" predictors, the encoder performs decoding of the blocks already encoded by means of a so-called "decoding" loop in order to obtain reference images 105 for the future motion estimations. This decoding loop makes it possible to reconstruct the coding units and images from the quantized transformed residuals.
It ensures that the coder and decoder use the same reference images.
Thus each quantized transformed residual is dequantized (113) by application of a quantization operation which is inverse to the one provided at step 110, and is then transformed into the spatial/pixel domain by application of the transformation (114) that is the inverse of the one at step 109.
If the quantized transformed residual comes from an "Intra" coding 103, the "Intra" predictor used is added (115) to the residual after inverse transformation 114 in order to obtain a reconstructed coding unit corresponding to the original coding unit modified by the losses resulting from the quantization operations.
If on the other hand the quantized transformed residual comes from an "Inter" coding 104, the block of pixels pointed to by the current motion vector (this block belongs to the reference image 105 referred to in the coded motion information) is added (115) to the residual after inverse transformation 114. In this way a reconstructed coding unit is obtained that is the same as the original coding unit modified by the losses resulting from the quantization operations.
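The reconstruction chain of blocks 113 to 115 can be sketched as follows. As before this is an illustrative floating-point version, with clipping to the 8-bit pixel range added so the result is a valid pixel block:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis (forward convention: coeffs = D @ block @ D.T).
    m = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
                  for k in range(n)])
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def reconstruct(qcoeffs, qstep, predictor):
    """Decoding loop: inverse quantization (113), inverse transform (114),
    addition of the predictor (115), clipping to the 8-bit pixel range."""
    d = dct_matrix(qcoeffs.shape[0])
    coeffs = qcoeffs.astype(np.float64) * qstep      # dequantization
    res = d.T @ coeffs @ d                           # inverse 2-D DCT
    return np.clip(np.round(res + predictor), 0, 255).astype(np.uint8)

qc = np.zeros((8, 8), dtype=np.int32)
qc[0, 0] = 16                        # a single quantized DC coefficient
out = reconstruct(qc, qstep=4.0, predictor=np.full((8, 8), 100.0))
# the dequantized DC of 64 maps back to a flat residual of 8 -> pixels of 108
```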
In order to attenuate, within the same image, the block effects created by strong quantization of the obtained reconstructed coding units, the encoder includes a "deblocking" filter 116 or other post-processing tools such as SAO or ALF. The objective of the deblocking filter 116 is to eliminate these block effects, in particular the artificial high frequencies introduced at the boundaries between blocks in the reconstructed coding units. The deblocking filter 116 smooths the borders between the reconstructed coding units in order to visually attenuate these high frequencies created by the coding. As such a filter is known from the art, it will not be described in further detail here.
The deblocking filter 116 is thus applied to an image when all the coding units of pixels of that image have been decoded and reconstructed.
The filtered images, also referred to as reconstructed images, are then stored as reference images 105 in order to allow subsequent "Inter" predictions based on these images to take place during the compression of the following images in the current video sequence.
Figure 2 shows a general scheme of a video block-based decoder 20.
The decoder 20 receives as an input a bit stream 201, in particular the bit stream 190 generated by the encoder 10 of Figure 1.
During the decoding process, the bit stream 201 is first of all entropy decoded (202), which makes it possible to retrieve the indication of the image breakdown (quadtree of the coding modes), the prediction information for each coding unit (making it possible to retrieve the corresponding predictor) and the quantized transformed residual for each coding unit. The QP parameter is also retrieved from the bit stream.
The quantized transformed residual of a current coding unit to decode is dequantized (203) using the inverse quantization to that provided at 110 corresponding to the associated coding mode (block 204 or 205) as specified in the breakdown information (quadtree). The quantizers defining the quantization 110 can be retrieved by the decoder using the QP parameter transmitted in the bit stream 190, in the same way as the encoder.
The dequantized residual is then transformed in the spatial domain (206) by means of the inverse transformation to that provided at 109.
Decoding of the data in the video sequence is then performed image by image and, within an image, coding unit by coding unit.
The "Inter" or "Intra" prediction mode for the current coding unit is extracted from the corresponding prediction information in the bit stream 201.
If the prediction mode for the current coding unit is of the "Intra" type, the index of the prediction direction is extracted from the prediction information. The pixels of the decoded adjacent blocks most similar to the current coding unit according to this prediction direction are used for regenerating the "Intra" predictor (block 207).
If the prediction mode for the current coding unit indicates it is of the "Inter" type, then the motion vector, and possibly the identifier of the reference image used, is extracted from the prediction information. This motion information is used in the motion compensation module 208 in order to determine the "Inter" predictor contained in the reference images 209 of the decoder 20. In a similar fashion to the encoder, these reference images 209 may be past or future images with respect to the image currently being decoded and are reconstructed from the bit stream (and are therefore decoded beforehand).
The predictor (either Intra or Inter) is added (block 210) to the decoded residual after inverse transformation 206 in order to obtain a decoded coding unit.
At the end of the decoding of all the coding units of the current image, the same deblocking filter 211 as the one (116) provided at the encoder is used to eliminate the block effects so as to obtain the reference images 209.
The images thus decoded constitute the output video signal 290 of decoder 20, which can then be displayed and used.
One may note that these decoding operations are similar to the decoding loop of the coder 10 and thus also feed the set of reference images 209.
Quick analysis of coding performance reveals that the coding modes for transformation and quantization of the residuals are far from optimal. The inventors have inferred that this is because the coding modes are static, i.e. do not depend on the image content. Indeed, the quantizers used for said quantization are predefined based only on the user-specified QP.
SUMMARY OF THE INVENTION
In this context, the inventors have contemplated using statistics on the image content to define and select more efficient quantizers.
They have defined probabilistic coding modes where the quantizers of a quantization therein are selected based on statistics on the quantized residuals resulting from a previous prediction. Statistics are for example on DCT coefficients of a residual frame once segmented.
This is for example the case in co-pending GB application No 1203698 where the probabilistic coding modes are used to encode a residual enhancement layer during scalable encoding.
To optimize compression, the probabilistic coding modes and possible static coding modes (where the quantizers are predefined) are put in competition.
Still with a view to obtaining the best rate-distortion performance, the inventors have faced a coupling difficulty.
On one hand, implementing probabilistic coding modes requires that statistics on a large number of coding units (e.g. a whole image) are gathered in order to select appropriate quantizers, i.e. the statistics on the residuals deriving from these coding units. This is to be sure that the statistics accurately reflect the content to be coded.
But the segmentation of the image into coding units (and thus into residuals) appears essential to obtain the statistics.
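The kind of global statistics in question can be illustrated as follows. The sketch gathers, over all residual blocks of a segmented frame portion, the spread of each DCT coefficient position, and derives per-frequency quantizer steps from it; the scaling rule is a hypothetical example for illustration, not the rule defined by the patent:

```python
import numpy as np

def per_frequency_std(coeff_blocks):
    """Global statistics over a segmented residual frame portion: standard
    deviation of each DCT coefficient position across all blocks."""
    return np.stack(coeff_blocks).std(axis=0)

def scaled_qsteps(stds, base_qstep):
    # Hypothetical derivation rule: make each frequency's quantizer step
    # proportional to its observed spread, so every coefficient is coded
    # with comparable relative precision.
    peak = max(float(stds.max()), 1e-3)
    return base_qstep * np.maximum(stds, 1e-3) / peak

# Synthetic 2x2 "coefficient blocks" whose per-position spread is known.
rng = np.random.default_rng(1)
blocks = [rng.normal(scale=[[8.0, 2.0], [2.0, 0.5]], size=(2, 2))
          for _ in range(5000)]
stds = per_frequency_std(blocks)
steps = scaled_qsteps(stds, base_qstep=16.0)
```

The dominant (largest-spread) frequency keeps the base step while the others receive proportionally smaller steps.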
On the other hand, the choice of a coding mode between the various competing coding modes is a question of local competition as introduced above, i.e. a competition of the efficiency of the coding modes when considering a specific coding unit in the image.
But the quantizers for those coding modes are required when performing this local competition, since they are needed to compute a compression cost based on which the most efficient coding mode is chosen.
As a consequence, there is a coupling problem when several coding modes including at least one probabilistic coding mode are put in competition: the local optimization to choose the most appropriate coding mode and the global quantizers automatically impact each other in a coupling relationship.
The present invention has been devised to address at least one of the foregoing concerns, in particular to solve the above coupling problem.
According to a first aspect, the invention provides a method for encoding at least one original coding unit of pixels of an original image, the method comprising: providing a preliminary prediction-based encoding of original coding units, comprising obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors; segmenting the preliminary prediction residual image portion into preliminary prediction residuals; providing a second prediction-based encoding of at least one original coding unit, comprising obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit; determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including at least one static coding mode and at least one probabilistic coding mode, the or each static coding mode implementing quantization using only predefined quantizers and the or each probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and encoding the or each prediction residual using the corresponding determined coding mode.
The invention provides an efficient way of solving the above-defined local/global coupling problem.
This is achieved by providing a dummy first encoding pass (the above "preliminary" encoding) from which statistics, used to define the probabilistic coding modes for the competition of the second and actual encoding pass, are obtained.
Such statistics thus reflect quite closely for instance the distribution of coefficients within the actual prediction residuals, since the two segmentations of the residual frame portion (i.e. all or part of the residual frame) during the dummy encoding and the actual encoding are highly likely to be similar.
Corresponding probabilistic coding modes, where the quantizers are optimized given the statistics, can thus be implemented in order to efficiently encode the coefficients of the actual prediction residuals.
Unbiased competition between the static coding modes and the probabilistic coding modes thus defined is therefore conducted during the actual prediction-based encoding (the second pass).
According to a second aspect, the invention provides a device for encoding at least one original coding unit of pixels of an original image, the device comprising: a first block-based coder configured to provide a preliminary prediction-based encoding of original coding units, by obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors and by segmenting the preliminary prediction residual image portion into preliminary prediction residuals; a second block-based coder configured to provide at least one static coding mode and at least one probabilistic coding mode, the or each static coding mode implementing quantization using only predefined quantizers and the or each probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and configured to provide a second prediction-based encoding of at least one original coding unit, by obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit, by determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including the at least one static coding mode and the at least one probabilistic coding mode, and by encoding the or each prediction residual using the corresponding determined coding mode.
Optional features of the encoding method or of the encoding device are defined in the appended claims.
According to a particular feature, determining a coding mode to apply to a prediction residual comprises encoding the prediction residual according to several coding modes of the plurality and selecting a coding mode based on a rate-distortion criterion on the obtained encoded prediction residuals. This forms part of the local optimization involved in the above-defined coupling issue.
In particular, encoding costs expressing a trade-off between rate and distortion are computed for each obtained encoded prediction residual, the selection being based on the encoding costs. For the purposes of illustration, such an encoding cost can be the well-known Lagrangian cost D + λ·R, where D is the distortion, R the rate and λ the so-called Lagrange parameter.
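For purposes of illustration only, the local rate-distortion competition between coding modes can be sketched as follows; the candidate distortion/rate figures and the λ value are hypothetical and not taken from the application:

```python
# Illustrative sketch of rate-distortion mode selection using the Lagrangian
# cost J = D + lambda * R. The candidate figures below are made up.

def lagrangian_cost(distortion, rate, lam):
    """Lagrangian encoding cost J = D + lambda * R."""
    return distortion + lam * rate

def select_coding_mode(candidates, lam):
    """candidates: (mode_name, distortion, rate) triples for one prediction
    residual, one triple per competing coding mode. Returns the cheapest."""
    return min(candidates, key=lambda c: lagrangian_cost(c[1], c[2], lam))

candidates = [
    ("static", 120.0, 300),         # predefined quantizers
    ("probabilistic", 95.0, 340),   # statistics-based quantizers
]
best = select_coding_mode(candidates, lam=0.1)
```

In this hypothetical example the probabilistic mode wins (95 + 0.1·340 = 129 against 120 + 0.1·300 = 150), even though it spends more rate.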
In one embodiment of the invention, each probabilistic coding mode is defined by a block type defined for the prediction residuals in the preliminary segmentation and by a set of quantizers selected from the statistics on the preliminary prediction residuals for that block type. This provision ensures the segmentation during the preliminary encoding is as close as possible to the segmentation used during the actual encoding (the second pass).
In particular, the block type for a prediction residual is a function of a level of energy of the prediction residual.
According to a particular feature, the segmentation associates a block type from among a plurality of block types with each created preliminary prediction residual; and the encoding method further comprises obtaining statistics on the preliminary prediction residuals for each block type of said plurality, wherein the quantizers for a probabilistic coding mode defined by a block type are selected based on the statistics obtained for that block type.
This also contributes to having close segmentation during the preliminary encoding and the actual encoding.
In particular, each coding mode applies to a respective unit having coefficients in the frequency domain, each having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and selecting the quantizers of a probabilistic coding mode based on the obtained statistics is performed by selecting, from among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter. Such user-specified quantization parameter can be the well-known QP parameter.
By pre-storing the set of optimal quantizers for a variety of possible statistics, this provision makes it possible to retrieve the optimal quantizers at low computational cost.
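A minimal sketch of this low-cost retrieval, assuming a hypothetical pre-computed table; the application only requires that optimal quantizers, each associated with a rate and a distortion, are pre-stored for a variety of possible statistics:

```python
# Hedged sketch: retrieving a pre-stored optimal quantizer per coefficient type.
# The table keying (by a distribution shape parameter) and its contents are
# hypothetical placeholders, not taken from the application.

# Off-line pre-computed: shape parameter -> list of (distortion, quantizer id).
QUANTIZER_TABLE = {
    0.5: [(1.0, "q_a"), (4.0, "q_b"), (16.0, "q_c")],
    1.0: [(1.0, "q_d"), (4.0, "q_e"), (16.0, "q_f")],
}

def pick_quantizer(shape_param, target_distortion):
    """Select, for one coefficient type, the pre-stored quantizer whose
    distortion is closest to the target derived from the user-specified QP."""
    key = min(QUANTIZER_TABLE, key=lambda k: abs(k - shape_param))
    entries = QUANTIZER_TABLE[key]
    return min(entries, key=lambda e: abs(e[0] - target_distortion))[1]
```

At encoding time this reduces quantizer selection to two table lookups instead of an on-line rate-distortion optimization.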
In one embodiment of the invention, the encoding method further comprises providing a user-specified quantization parameter, wherein the quantizers of each static coding mode are predefined based on the user-specified quantization parameter, and the quantizers of each probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
This provision makes it possible to define the coding modes in competition given a quantization parameter provided by a user.
In another embodiment of the invention, the encoding method further comprises iterating the second prediction-based encoding, wherein the statistics for a next iteration are obtained based on at least one encoded coding unit resulting from the second prediction-based encoding of the previous iteration, and the quantizers for the at least one probabilistic coding mode of the second prediction-based encoding of the next iteration are selected based on the statistics obtained for the next iteration. The iterations provide statistics that are each time closer to the real statistics of the residual coded during the second prediction-based encoding. Therefore, the iterations progressively converge to an optimal coding where the probabilistic coding modes of the second pass are optimal. A rate-distortion threshold of the encoded data or a number of iterations can be predefined to stop the iterative loop.
In yet another embodiment of the invention, the predefined quantizers are independent from the original image. This contrasts with the probabilistic approach of the second stage of the new coding mode.
In yet another embodiment of the invention, the statistics are obtained by transforming the preliminary prediction residuals into preliminary transform units of coefficients in the frequency domain, each coefficient having a coefficient type and by determining, for at least one coefficient type, a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type. In particular, said probabilistic model may be a Generalized Gaussian Distribution.
This is a particularly convenient way for having reliable statistics on the content of the preliminary prediction residuals.
In particular, the quantizers of a probabilistic coding mode are each associated with one respective coefficient type of the coefficient types in the frequency domain; and the quantizer associated with each given coefficient type is selected based on the probabilistic model for that given coefficient type.
Again, in yet another embodiment of the invention, the encoding method comprises providing a bit stream comprising the statistics used to select the quantizers, and comprising, for the or each original coding unit, information on the prediction of said original coding unit during the second prediction-based encoding (e.g. prediction direction or motion vector), an indication of the coding mode used for encoding the prediction residual and the resulting encoded prediction residual. This is the relevant information for a decoder to decode coding units that have been coded using the competition-based encoding according to the invention.
In particular, the identities of the coding modes used for the original coding units are stored in a quad-tree in the bit stream. This is an efficient and compact way to store these indications.
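A minimal sketch of such quad-tree storage, under the assumption of a hypothetical (x, y, size) keying of coding units; the application only states that the mode identities are stored in a quad-tree:

```python
# Hedged sketch: one coding-mode id per coding unit, organised as a quad-tree.
# The node layout and the (x, y, size) keying are illustrative placeholders.

def build_mode_quadtree(modes, x0, y0, size):
    """modes: dict mapping (x, y, size) coding units to a coding-mode id,
    describing a complete segmentation of the area. Returns a mode id for a
    leaf, or a list of four child sub-trees in raster order."""
    if (x0, y0, size) in modes:
        return modes[(x0, y0, size)]
    half = size // 2
    return [build_mode_quadtree(modes, x0 + dx, y0 + dy, half)
            for dy in (0, half) for dx in (0, half)]
```

A decoder can walk the same structure to retrieve, for each coding unit, which of the competing coding modes was chosen, without any per-unit addressing overhead.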
In one embodiment, at least one of the probabilistic coding modes comprises: a step of prediction-based encoding of an original coding unit to obtain a pre-coded coding unit, this step including predicting an original coding unit and performing a first encoding stage by quantizing, in the frequency domain and using predefined quantizers, a prediction residual resulting from said prediction, and a second encoding stage including quantizing, in the frequency domain and using the quantizers selected for the probabilistic coding mode, a quantization residual resulting from the difference between the original coding unit and a decoded version of the obtained pre-coded coding unit.
This provision defines a new two-stage coding mode that advantageously combines features of conventional static coding modes and features of probabilistic coding modes.
The first stage provides prediction-based encoding using a conventional static sub-coding stage, from which a pre-coded coding unit is obtained.
The second stage supplements the first stage with a probabilistic sub-coding mode of the residual resulting from the quantization of the first stage, i.e. of a quantization residual that is the difference between the original coding unit and the reconstructed coding unit using only the encoded data of the first stage (the pre-coded coding unit).
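As a purely illustrative sketch of these two stages (plain scalar quantization on sample values, transforms omitted, and both step sizes hypothetical):

```python
# Hedged sketch of the two-stage coding mode: a static first stage on the
# prediction residual, then a second stage on the quantization residual coded
# with statistics-based quantizers (modelled here by a simple step size).

def quantize(values, step):
    return [round(v / step) for v in values]

def dequantize(levels, step):
    return [l * step for l in levels]

def two_stage_encode(original, predictor, static_step, prob_step):
    # First stage: static quantization of the prediction residual.
    prediction_residual = [o - p for o, p in zip(original, predictor)]
    stage1_levels = quantize(prediction_residual, static_step)
    # Decoded version of the pre-coded coding unit.
    pre_coded = [p + r for p, r in
                 zip(predictor, dequantize(stage1_levels, static_step))]
    # Second stage: quantization residual = original minus pre-coded unit,
    # coded with the quantizers selected for the probabilistic coding mode.
    quantization_residual = [o - c for o, c in zip(original, pre_coded)]
    stage2_levels = quantize(quantization_residual, prob_step)
    return stage1_levels, stage2_levels
```

The second stage thus only carries what the static first stage failed to reconstruct, which is exactly the signal on which the gathered statistics bear.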
In that case, a sub-embodiment of the invention provides that the bit stream further comprises, for an original coding unit encoded using the two-stage probabilistic coding mode, a quantized prediction residual resulting from the first encoding stage of the original coding unit, and the quantized quantization residual resulting from the second encoding stage. This is the relevant information for the decoder to decode coding units that have been coded using a two-stage coding mode according to the invention.
According to a particular feature, the preliminary prediction-based encoding comprises quantizing preliminary prediction residuals several times after the segmentation using, each time, quantizers predefined based on at least two quantization offsets and on a user-specified quantization parameter, and at least two probabilistic coding modes are two-stage probabilistic coding modes that differ by respective quantization offsets from the at least two quantization offsets, the quantizers of their first encoding stage being predefined based on their respective quantization offset and on the user-specified quantization parameter, and the quantizers of their second encoding stage being selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using their respective quantization offset.
This provision makes it possible to split the available rate (directly linked to the user-specified QP parameter) to encode the coding units into a first rate assigned to convey the encoded data resulting from the first encoding stage (i.e. prediction information and encoded prediction residual) and a complementary rate assigned to convey the encoded data resulting from the second encoding stage (i.e. encoded quantization residual), said splitting being controlled by the quantization offset used.
Thus, playing on the quantization offset makes it possible to adjust the splitting.
Given this provision, the new coding mode of the invention can be directly put in competition with other coding modes that meet the same QP parameter, for example static coding modes or any other new coding mode having a different quantization offset splitting the available rate differently.
According to a variant, the preliminary prediction-based encoding comprises competing between static coding modes to segment the preliminary prediction residual image portion while quantizing each preliminary prediction residual using a corresponding competing static coding mode, the competing static coding modes implementing quantization using predefined quantizers based on one quantization offset and a user-specified quantization parameter, and wherein at least one probabilistic coding mode for the second prediction-based encoding is a two-stage probabilistic coding mode, and, for each two-stage probabilistic coding mode, the quantizers of its first encoding stage are predefined based on the quantization offset and on the user-specified quantization parameter, and the quantizers of its second encoding stage are selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using said quantization offset.
This variant may be applied when a single quantization offset is used. This is because the statistics for each two-stage probabilistic coding mode (they are all defined by this quantization offset) can be obtained through a single encoding pass from which the segmentation results. This avoids performing quantization after the segmentation to obtain the statistics, as implemented in the other variant above.
Where the quantization offset or the various quantization offsets are provided, the bit stream may further comprise the statistics obtained for the or each quantization offset. This is relevant information for the decoder to decode coding units that have been coded using probabilistic two-stage coding modes that are in competition according to the invention.
According to a third aspect, the invention provides a method for decoding a bit stream comprising data representing at least one coding unit of pixels in an image, the method comprising: obtaining statistics from the bit stream; selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; obtaining, from the bit stream and for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; decoding part of the data using the indicated coding mode, to obtain at least one decoded residual; obtaining prediction information from the bit stream; and reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
According to this solution, coding units encoded by competition-based encoding of the invention can efficiently be decoded at the decoder side. In particular, this is possible since the statistics to define the quantizers for the probabilistic coding modes are provided in the bit stream, as well as the coding mode in competition actually chosen for each coding unit to decode.
According to a fourth aspect, the invention provides a device for decoding a bit stream comprising data representing at least one coding unit of pixels in an image, the device comprising: a statistics module for obtaining statistics from the bit stream; a quantizer selecting module for selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; a coding mode retrieving module for obtaining, from the bit stream and for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; a decoder to decode part of the data using the indicated coding mode, to obtain at least one decoded residual; a prediction information retrieving module for obtaining prediction information from the bit stream; and a frame reconstruction module for reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
Optional features of the decoding method or of the decoding device are defined in the appended claims.
In particular, the decoding method may further comprise obtaining a plurality of sets of statistics corresponding to a respective plurality of probabilistic coding modes, from the bit stream; wherein quantizers are selected for each probabilistic coding mode based on the respective obtained set of statistics. This makes it possible for the decoder to reconstruct several probabilistic coding modes in competition at the encoder.
According to a particular feature, the decoding method further comprises obtaining a user-specified quantization parameter from the bit stream, wherein the quantizers of a probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
In particular, each coding mode applies to a respective unit having coefficients in the frequency domain, each having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and selecting the quantizers of a probabilistic coding mode based on the obtained statistics is performed by selecting, from among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter.
In one embodiment, the predefined quantizers are independent from the data representing the at least one coding unit in the bit stream.
In case the above two-stage probabilistic coding mode is implemented as one of the probabilistic coding modes, the method may further comprise: decoding part of the data using quantizers selected based on the obtained statistics, to obtain at least one decoded quantization residual; decoding part of the data using predefined quantizers, to obtain at least one decoded prediction residual; and reconstructing the at least one coding unit by adding a predictor obtained from the prediction information with the at least one decoded prediction residual and the at least one decoded quantization residual.
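A hedged sketch of this reconstruction, using the same illustrative scalar-quantization convention as above (transforms omitted, step sizes hypothetical):

```python
# Illustrative decoder-side reconstruction for the two-stage probabilistic
# coding mode: predictor + decoded prediction residual (first stage, predefined
# quantizers) + decoded quantization residual (second stage, statistics-based
# quantizers).

def two_stage_decode(predictor, stage1_levels, stage2_levels,
                     static_step, prob_step):
    decoded_prediction_residual = [l * static_step for l in stage1_levels]
    decoded_quantization_residual = [l * prob_step for l in stage2_levels]
    return [p + r1 + r2 for p, r1, r2 in
            zip(predictor, decoded_prediction_residual,
                decoded_quantization_residual)]
```

The two residual contributions are simply added on top of the predictor, so the decoder needs no knowledge of how the rate was split between the two stages.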
This makes it possible for the decoder to decode coding units encoded by the two-stage coding mode according to the invention.
The invention also provides a non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in an apparatus, causes the apparatus to perform the steps of: providing a preliminary prediction-based encoding of original coding units of pixels of an original image, comprising obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors; segmenting the preliminary prediction residual image portion into preliminary prediction residuals, based on at least one static coding mode implementing a quantization using only predefined quantizers; providing a second prediction-based encoding of at least one original coding unit of pixels of the original image, comprising obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit; determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including the at least one static coding mode and at least one probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and encoding the or each prediction residual using the corresponding determined coding mode.
The invention also provides a non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in an apparatus, causes the apparatus to perform the steps of: obtaining statistics from a bit stream comprising data representing at least one coding unit of pixels in an image; selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; obtaining, for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; decoding part of the data using the indicated coding mode, to obtain at least one decoded residual; obtaining prediction information from the bit stream; and reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
The invention also provides an encoding device for encoding an image substantially as herein described with reference to, and as shown in, Figure 5; Figure 5A; Figures 5 and 10 or Figures 5 and 19 of the accompanying drawings.
The invention also provides a decoding device for decoding an image substantially as herein described with reference to, and as shown in, Figure 6 or Figure 6A of the accompanying drawings.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects which may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium, for example a tangible carrier medium or a transient carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device or the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
- Figure 1 schematically shows a conventional block-based video encoder;
- Figure 2 schematically shows the corresponding decoder;
- Figure 3 schematically illustrates the intra prediction in the conventional block-based video encoder of Figure 1;
- Figure 4 schematically illustrates the inter prediction in the conventional block-based video encoder of Figure 1;
- Figure 5 schematically shows a block-based video encoder according to an embodiment of the invention;
- Figure 5A schematically shows a block-based video encoder according to another embodiment of the invention;
- Figure 6 schematically shows a decoder corresponding to the encoder of Figure 5;
- Figure 6A schematically shows a decoder corresponding to the encoder of Figure 5A;
- Figure 7 illustrates a conventional static coding mode including an inter/intra prediction followed by a quantization-based encoding using predefined quantizers;
- Figure 8 illustrates a single-step probabilistic coding mode using statistics-based quantizers;
- Figure 9 illustrates a two-step probabilistic coding mode using statistics-based quantizers;
- Figure 10 shows an exemplary embodiment of an encoding process according to the teachings of the invention at the frame level;
- Figure 11 illustrates an example of a quantizer based on Voronoi cells;
- Figure 12 shows the relationship between data in the spatial domain (pixels) and data in the frequency domain;
- Figure 13 illustrates an exemplary distribution over two quanta;
- Figure 14 shows exemplary rate-distortion curves, each curve corresponding to a specific number of quanta;
- Figure 15 shows the rate-distortion curve obtained by taking the upper envelope of the curves of Figure 14;
- Figure 16 shows several rate-distortion curves obtained for various possible parameters of the DCT coefficient distribution;
- Figure 17 shows a merit-distortion curve for a DCT coefficient;
- Figure 18 illustrates a bottom-to-top algorithm used in the frame of the encoding process of Figure 10;
- Figure 19 shows a simplified exemplary embodiment of an encoding process according to the teachings of the invention at the frame level; and
- Figure 20 shows a particular hardware configuration of a device able to implement methods according to the invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
For the detailed description below, focus is made on the encoding of a UHD video as introduced above with reference to Figures 1 to 4. It is however to be recalled that the invention applies to the encoding of any image from which a probabilistic distribution of transformed block coefficients in a frequency domain can be obtained (e.g. statistically).
Figures 5 and 5A represent, on the same basis as Figure 1, the scheme for a block-based video encoder 10 according to embodiments of the invention.
The new scheme according to the invention implements a competition between the static coding modes 111, 112 of Figure 1 and one or more probabilistic coding modes 111A-B, 112A to provide segmentation and encoding of an original image.
In Figure 5, the competition involves a particular kind of probabilistic coding mode made of two stages. The coding modes referenced 111A and 111B in the Figure are two-stage probabilistic coding modes that differ from each other by a quantization offset used for a sub-quantization therein as further described below.
Still with reference to Figure 5, coding mode 112A is also a two-stage probabilistic coding mode. For the purpose of competition between coding modes, other coding modes can be provided in the encoder, in particular single-stage probabilistic coding modes can compete against the two-stage probabilistic coding modes.
Figure 5A illustrates a simpler encoder where only single-stage probabilistic coding modes 111A, 112A are implemented and put in competition with the static coding modes 111, 112.
Figures 6 and 6A represent the corresponding decoders, respectively for decoding data encoded using competition between static and probabilistic coding modes including two-stage coding modes as in Figure 5 and for decoding data encoded using competition involving only one-stage probabilistic coding modes as in Figure 5A.
As shown in Figures 5 and 5A, a statistics block 117 is provided to gather statistics, in particular a probabilistic distribution of transformed (e.g. DCT but not necessarily) block coefficients, for defining the probabilistic coding modes (either single-stage and/or two-stage coding modes) as described below.
Any of the coding modes, be they static or probabilistic, comprises a first operation composed of a conventional spatial or temporal prediction of the original image, from which a segmentation of the original image and a prediction residual texture frame made of (prediction) residual blocks (or "residuals") are obtained, as described above with reference to Figure 1.
A second operation of the static coding mode consists in applying, to each prediction residual block or "prediction residual", a DCT transform (to obtain transform [residual] units) and then quantizing the DCT coefficients using quantizers known a priori (to obtain quantized [residual] blocks) as described above with reference to Figure 1. Such a conventional approach already involves competition between static coding modes that decides on the segmentation of the residual frame into prediction residual blocks.
Figure 7 illustrates such a static coding mode where the intra/inter prediction (first operation) is controlled by a user-specified quantization parameter QP (or quality level) and the quantization of the second operation is also controlled by QP (from which the quantizers are defined). Such a QP parameter (as defined in HEVC) drives the quality of the encoding, through the evaluation of a mean square error between any original coding unit and the corresponding decoded coding unit.
As provided by the inventors, a probabilistic coding mode is an adaptive mode which adapts its quantization of the transform (residual) unit after the prediction to statistics (e.g. probabilistic distribution) on the transformed coefficients of this transform (residual) unit (and others of the image). For instance, in the case of DCT, a quantizer intended to quantize one DCT coefficient type is chosen depending on the distribution of this coefficient type among the transformed coefficients.
Figure 8 illustrates a single-stage probabilistic coding mode.
After the first conventional prediction operation, the probabilistic encoding involves a quantization of the transform unit corresponding to the prediction residual using quantizers selected based on the statistics on the transform units (from the prediction residuals).
Preferably, the statistics or the probability distribution may be a statistical distribution taken over a large amount of residual blocks. This implies that the quantizers of a probabilistic mode:
- are not determined a priori;
- are determined a posteriori after computing statistics;
- cannot be determined from one block only but from a large portion of the residual frame (preferably from all the blocks on which the coding mode is to be applied);
- should only be determined once one knows on which blocks the mode must be applied.
This is the reason why a local/global coupling problem arises as defined above: the quantizers are determined once the transform units or blocks on which the mode is applied are chosen, while the choice of the transform units or blocks on which the mode is applied is made through a competition which uses the quantizers as further described below.
Figure 9 illustrates a two-stage probabilistic coding mode that differs from the single-stage probabilistic coding mode by two stages composing the second operation.
In detail, after the conventional intra/inter prediction, first a conventional static encoding of each prediction residual block is conducted that involves (in addition to DCT transform) a static quantization, i.e. a quantization of transformed coefficients of the prediction residual block using predefined quantizers, and then a second encoding is conducted on the quantization residual resulting from the difference between the original block and a decoded version of the obtained encoded prediction residual, which second encoding is a probabilistic encoding (a probabilistic sub-coding mode) in that it includes quantizing in the frequency domain the quantization residual using quantizers selected based on statistics on the transform units corresponding to the quantization residuals.
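A minimal sketch of this two-stage second operation, in Python with a 1-D DCT standing in for the block transform. All function names and the uniform static quantizer below are illustrative assumptions, not the notation of any standard codec:

```python
import numpy as np
from scipy.fftpack import dct, idct

def static_quantize(coeffs, step):
    # Stage 1: conventional static quantization with a predefined step
    return np.round(coeffs / step)

def static_dequantize(symbols, step):
    return symbols * step

def two_stage_encode(residual_block, step1, prob_quantizer):
    # First (static) stage: quantize the transform of the prediction residual
    c = dct(residual_block, norm='ortho')
    s1 = static_quantize(c, step1)
    decoded1 = idct(static_dequantize(s1, step1), norm='ortho')
    # Second (probabilistic) stage: quantize, in the frequency domain,
    # the quantization residual left by the first stage
    quant_residual = residual_block - decoded1
    c2 = dct(quant_residual, norm='ortho')
    s2 = prob_quantizer(c2)   # statistics-driven quantizer, supplied by caller
    return s1, s2
```

The `prob_quantizer` argument is a placeholder for the statistics-based quantizer that the rest of the text explains how to select.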
The two-stage probabilistic coding mode thus comprises, after the first prediction operation, a second operation made of a first static sub-coding mode and a second probabilistic sub-coding mode.
One may note that the first conventional inter/intra prediction and the conventional encoding stage (based on predefined quantizers) can be seen as a first conventional prediction-based encoding of the original coding unit to obtain at least one pre-coded coding unit, wherein the at least one pre-coded coding unit comprises at least one prediction residual quantized in a frequency domain using predefined quantizers.
It is to be recalled here that a conventional approach when encoding an original image is to obtain a segmentation of the original image through the prediction and thus to apply the DCT transform and the quantization within each coding unit of this obtained segmentation, by competition between coding modes based on the same block size as the coding unit and on half that block size. In other words, the segmentation used for the second encoding of the quantization residuals derives from the segmentation obtained through the prediction and the quantization of the first prediction-based encoding.
Of course, more extended approaches can be implemented.
Each two-stage coding mode as shown in Figure 9, for example 111A, 111B and 112A of Figure 5, is defined, in addition to the above-defined QP, by a positive quantization offset x that splits the rate available for coding the blocks into a first rate portion assigned to the conventional first encoding stage (i.e. static sub-coding mode) through the value QP+x and a complementary second rate portion assigned to the probabilistic second encoding stage (i.e. probabilistic sub-coding mode).
That means that the predefined quantizers for the conventional first encoding stage are defined based on the value QP'=QP+x.
As further described below, the quantizers for the probabilistic encoding stage (either in a single-stage coding mode or in a two-stage coding mode) are optimally selected in that each encoding stage applies to a respective unit having coefficients in the frequency domain (DCT blocks of DCT coefficients), each having a coefficient type; the obtained statistics for a probabilistic encoding stage comprises statistics associated with each of the coefficient types; and selecting the quantizers of a probabilistic encoding stage based on the obtained statistics is performed by selecting, among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter QP.
This is described below with more details with reference to Figures 15 and 16.
Figure 10 illustrates an embodiment of a coding mode competition process according to the invention. The process shown in this Figure applies to a specific original image and is to be applied to each image of a video sequence concerned.
The process is split into two passes: the first one, also associated with the word "preliminary" below, being designed to gather statistics on the original image signal (for instance the distribution of DCT coefficients in transform [residual] units) with a view to selecting the appropriate optimal quantizers to define the probabilistic coding modes; and the second one to actually encode the original image using the competition between the conventional static coding modes and the thus defined probabilistic coding modes. The second pass may thus lead to a segmentation of the image that is different from the segmentation obtained at the first pass.
The choice of the coding modes used for encoding the original coding units is performed after obtaining the statistics and defining all the probabilistic coding modes.
As shown by the dotted arrow, a loop can be provided to iterate the actual encoding using statistics resulting from a previous actual encoding to adjust the probabilistic coding modes (i.e. selection of the corresponding optimal quantizers) for the next encoding.
In addition, blocks 1040 and 1035 are provided for obtaining statistics specific to the new two-stage probabilistic coding modes. This is because, as set out above, in this new mode, the probabilistic sub-coding mode (second stage) is applied to quantization residuals and not to prediction residuals. 1035 thus makes it possible to obtain corresponding statistics after applying a conventional first stage (static sub-coding mode) as described below.
The straight arrow between 1030 and 1050 (i.e. the one that avoids block 1035) is for the single-stage probabilistic coding modes, where the prediction residuals obtained at step 1030 are sufficient to gather statistics for that kind of probabilistic coding mode, as explained below.
Thus depending on whether one or more two-stage probabilistic coding modes or single-stage probabilistic coding modes are provided in competition, block 1035 and the direct arrow between 1030 and 1050 are respectively implemented or not.
Starting from the original image 1000, a preliminary conventional block-based encoding is performed (1005) using competition between intra/inter prediction and competition between conventional static coding modes. A conventional block-based encoder 1010 is used.
The prediction competition introduces a first segmentation of the original image into coding units, for example from 16x16 pixels up to 64x64 pixels. The static mode competition introduces a second segmentation of the resulting coding units into transform units, generally of the same size as the blocks resulting from the prediction or half that size, ranging between 8x8 pixels (or less) and 32x32 pixels.
The static coding modes are defined by quantizers predefined based on the user-specified parameter QP 1015. As is well known, a look-up table in memory of the encoder can tabulate each value of QP with a value of the Lagrange parameter λ (1020), from which the quantizers are deduced.
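As an illustration of such a look-up table, the sketch below uses the λ(QP) relation commonly found in H.264/HEVC reference encoders; the actual tabulated values meant by the text are not specified, so this relation is only an assumption standing in for them:

```python
# Illustrative only: lambda = 0.85 * 2^((QP - 12) / 3) is the relation
# commonly used in H.264/HEVC reference encoders, not necessarily the
# exact table meant by the text.
LAMBDA_TABLE = {qp: 0.85 * 2.0 ** ((qp - 12) / 3.0) for qp in range(52)}

def lagrange_parameter(qp):
    # A table look-up avoids recomputing the power function at encode time
    return LAMBDA_TABLE[qp]
```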
In the example of Figure 5, the static coding modes are the mode 111 based on 16x16 DCT blocks and the mode 112 based on 8x8 blocks.
Resulting from step 1005, each original coding unit in the obtained segmentations has preliminary prediction information associated with it, if any. Indeed, some original coding units may be predicted by inter predictors in which case prediction information relates to motion vectors (and possible image reference indexes). Some original coding units may be predicted by intra predictors in which case prediction information relates to prediction direction in the same original image. Some other original coding units may not be predicted if prediction is not efficient in terms of rate-distortion.
Based on the preliminary prediction information obtained for all the original coding units of the segmentation, a preliminary prediction residual frame is reconstructed (1025) as the difference between the original coding units of the image and corresponding predictors referred to in the prediction information.
One may note that the same approach may be implemented slice by slice as defined in H.264/AVC rather than on the entire frame, since the original coding units are encoded independently in each slice. That means that statistics can be gathered for each slice, and thus the probabilistic coding modes may change from one slice to another. However, for the purpose of illustration of the invention, the entire frame is considered below.
The preliminary prediction residual frame is segmented at step 1030 into preliminary prediction residuals having each a block type from among a plurality of possible block types.
Various approaches to segmentation can be implemented.
A first approach consists in keeping the same second segmentation as obtained at step 1005. This may be the case in Figure 5A where only two probabilistic coding modes will be created corresponding to the two static coding modes.
A second approach consists in applying a predefined segmentation, for example alternating 8x8 blocks and 16x16 blocks.
Another approach combines the second segmentation as obtained during step 1005 with an energy or morphological analysis of the preliminary prediction residual frame. This is to provide frame segmentation according to a criterion of residual activity. This is because it is known from standard codecs that, most of the time, a part of a frame with a lot of activity is better compressed by using small blocks. On the contrary, less active (flat region) parts are better compressed by using big blocks.
In addition, it is possible to force this segmentation to provide at least one residual block for each possible block type, for instance by forcing some blocks to have the block types not encountered by use of the segmentation method based on residual activity, whatever the content of these blocks. As will be understood from the following description, forcing the presence of each and every possible block type in the segmentation makes it possible to obtain statistics and optimal quantizers for each and every block type and thus to enlarge the field of the optimization process.
Lastly, the segmentation of the image associates a block type from among a plurality of block types with each created preliminary prediction residual.
As an example, the preliminary prediction residual frame has been divided into blocks Bk, each having a particular block type. Several block types may be considered, owing in particular to various possible sizes for the block. Other parameters than size may be used to distinguish between block types.
It is proposed for instance to use the following block types for luminance residual frames, each block type being defined by a size and an index of energy:
- 16x16 bottom;
- 16x16 low;
- 16x16;
- 8x8 low;
- 8x8;
- 8x8 high.
The choice of the block size is performed here by computing the integral of a morphological gradient (measuring residual activity) on each 16x16 block, before applying the DCT transform. (Such a morphological gradient corresponds to the difference between a dilation and an erosion of the luminance residual frame, as explained for instance in "Image Analysis and Mathematical Morphology", Vol. 1, by Jean Serra, Academic Press, February 11, 1984.) If the integral computed for a block is higher than a predetermined threshold, the concerned block is divided into four smaller, here 8x8, blocks.
Once the block size of a given block is decided, the block type of this block is determined based on the morphological integral computed for this block, for instance here by comparing the morphological integral with thresholds defining three bands of residual activity (i.e. three indices of energy) for each possible size (as noted above, bottom, low or normal residual activity for 16x16 blocks and low, normal or high residual activity for 8x8 blocks).
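The size decision described above can be sketched as follows; the threshold value and the 3x3 structuring element are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def residual_activity(block):
    # Integral of the morphological gradient (dilation minus erosion)
    # over the block, used as the measure of residual activity
    grad = grey_dilation(block, size=3) - grey_erosion(block, size=3)
    return grad.sum()

def split_16x16(block16, threshold):
    # Keep the 16x16 block if its activity is at most the threshold,
    # otherwise split it into four 8x8 blocks, as described in the text
    if residual_activity(block16) <= threshold:
        return [block16]
    return [block16[i:i + 8, j:j + 8] for i in (0, 8) for j in (0, 8)]
```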
Of course other block sizes may be used, even rectangular shaped blocks, and other criteria.
It may be noted that the morphological gradient is used in the present example to measure the residual activity but that other measures of the residual activity may be used, instead or in combination, such as local energy or Laplace's operator.
It is proposed here that chrominance blocks each have a block type inferred from the block type of the corresponding luminance block in the frame. For instance, chrominance block types can be inferred by dividing in each direction the size of luminance block types by a factor depending on the resolution ratio between the luminance and the chrominance. In addition, it is proposed here to define the block type as a function of its size and an index of the energy.
The breakdown of the luminance residual frame into block types of the luminance residual blocks may be stored in a quad-tree. The chrominance block types may be entirely inferred from this quad-tree in such a way that no chrominance quad-tree is needed.
A DCT transform is then applied to each of the blocks concerned in order to obtain a corresponding preliminary prediction residual block of DCT coefficients (i.e. a transform unit according to the HEVC standard).
Within a preliminary prediction residual block, the DCT coefficients are associated with an index i (e.g. i = 1 to 64), following an ordering used for successive handling when encoding, for example.
Blocks are grouped into macroblocks MBk. A very common case for so-called 4:2:0 YUV video streams is a macroblock made of 4 blocks of luminance Y, 1 block of chrominance U and 1 block of chrominance V. Here too, other configurations may be considered.
Using the obtained segmentation, statistics on the preliminary prediction residuals can be gathered for each block type as defined above, in particular statistics on each type i of DCT coefficient per block type.
Such statistics will be used to define the single-stage probabilistic coding modes since the latter are bound to apply to the prediction residuals from which the statistics are gathered, and not to the quantization residuals resulting from the first stage. This part of the process corresponds to the direct arrow between 1030 and 1050 where quantization residuals are not considered.
One may note that the statistics for each block type may lead to defining a single-stage coding mode for each compatible (in terms of block size) definition of single-stage coding mode. In the example of Figure 5A, only one compatible definition is available for each block size 16x16 and for each block size 8x8.
As to the two-stage probabilistic coding modes (i.e. where the probabilistic sub-coding mode or second stage applies to quantization residuals), step 1035 is performed to obtain quantization residuals on which statistics are gathered.
It is based on predefined quantization offsets x that are added to the QP parameter (1040) to provide a plurality of intermediate quantization parameters QP'=QP+x.
From each QP', a Lagrange parameter λ' is obtained from the table in memory (1045).
As defined above, the two-stage probabilistic coding modes are defined by a block type and a quantization offset. In the simplified example of Figure 5, the mode 111A is defined by the block type "16x16" and by the quantization offset "1". The mode 111B is defined by the block type "16x16" and by the quantization offset "2". The mode 112A is defined by the block type "8x8" and by the quantization offset "1". Of course other coding modes may be defined as a combination of any block type from:
- 16x16 bottom;
- 16x16 low;
- 16x16;
- 8x8 low;
- 8x8;
- 8x8 high;
with any quantization offset from the predefined offsets.
For example, if there are N predefined quantization offsets and M block types, NxM two-stage probabilistic coding modes can be generated.
The goal of step 1035 is to obtain statistics for each two-stage probabilistic coding mode to define, i.e. for each combination of a block type as obtained with a quantization offset. Of course, some combinations may be forbidden, in which case the corresponding statistics are not computed and gathered, and no corresponding two-stage probabilistic coding mode is defined. To obtain such statistics, a conventional encoding of the preliminary prediction residuals is carried out for each block type and for each possible quantization offset (i.e. for each QP' as defined in 1045), using a compatible (in terms of block size) static coding mode.
A static coding mode for each quantization offset is defined using conventional mechanisms based on λ', i.e. by using predefined quantizers which are a function of the corresponding λ'. In other words, the preliminary prediction residuals are quantized several times in the frequency domain using, for each time, quantizers predefined based on a respective quantization offset and the user-specified quantization parameter; and statistics are obtained respectively for each quantization offset from the respective preliminary quantization residuals once quantized.
At step 1035, the preliminary prediction residuals of a given block type are thus encoded using each compatible static coding mode and each quantization offset.
In the simplified example of Figure 5, a generic definition of the two-stage probabilistic 16x16-block-based coding mode is provided with two quantization offsets, "1" and "2", thus resulting in potentially two two-stage probabilistic coding modes for each 16x16 block type: new mode 16x16x1 (111A) and new mode 16x16x2 (111B).
Similarly, a generic definition of the two-stage probabilistic 8x8-block-based coding mode is provided with only one quantization offset, "1", thus resulting in potentially only one two-stage probabilistic coding mode for each 8x8 block type: new mode 8x8x1 (112A).
Considering the six energy-based block types defined above (16x16 bottom; 16x16 low; 16x16; 8x8 low; 8x8; 8x8 high), six two-stage probabilistic 16x16-block-based coding modes are potentially created based on the statistics obtained for the block types 16x16 bottom, 16x16 low and 16x16; and three two-stage probabilistic 8x8-block-based coding modes are potentially created based on the statistics obtained for the block types 8x8 low, 8x8 and 8x8 high.
Step 1035 thus consists in encoding twice each 16x16 preliminary prediction residual block having the same block type (16x16 bottom, 16x16 low or 16x16): first, using the static 16x16 coding mode with quantization offset equal to 1 (111A), i.e. with quantizers predefined using QP'=QP+1; and then, using the static 16x16 coding mode with quantization offset equal to 2 (111B), i.e. with quantizers predefined using QP'=QP+2.
And step 1035 consists in encoding only once each 8x8 preliminary prediction residual block having the same block type (8x8 low, 8x8 or 8x8 high), using the static 8x8 coding mode with quantization offset equal to 1 (112A), i.e. with quantizers predefined using QP'=QP+1.
Considering a given combination of the available combinations of block type and quantization offset, the encoded prediction residuals thus obtained are decoded and then compared to the corresponding preliminary prediction residual before step 1035. The difference between an encoded block and its corresponding preliminary residual results only from the losses due to the quantization QP'. It is thus referred to as "preliminary quantization residual" below.
Preliminary quantization residuals are thus obtained for each considered combination of block type and quantization offset, i.e. for each two-stage probabilistic coding mode to define.
Similarly to the direct arrow between 1030 and 1050, the preliminary quantization residuals of a given combination {block type; quantization offset} are DCT transformed and statistics on them are gathered from the DCT coefficients.
Before starting step 1050, a set of statistics have been gathered for each single-stage probabilistic coding mode (i.e. for each block type at step 1030) and for each two-stage probabilistic coding mode (i.e. for each combination of block type and quantization offset at step 1035).
The statistics of a single-stage coding mode are on the preliminary prediction residuals having, at the end of step 1030, the block type defining that single-stage coding mode. Similarly, the statistics of a two-stage coding mode are on the preliminary quantization residuals having, at the end of step 1035, the block type defining that two-stage coding mode and resulting from the first encoding stage based on the quantization offset defining the two-stage coding mode.
As described below, each set of statistics is thus used to define the corresponding probabilistic coding mode, i.e. to select the associated quantizers.
This is performed at step 1050 for each set of statistics. It is to be recalled here that the segmentation at step 1030 can be forced to provide at least one residual block for each possible block type. This is to ensure that a corresponding set of statistics is gathered and that a corresponding probabilistic coding mode can be defined.
Step 1050 that ends the first pass of Figure 10 is now described. It consists in finding optimal quantizers for each probabilistic coding mode (i.e. for each block type in case of a single-stage coding mode and for each available combination of block type and quantization offset in case of a two-stage coding mode) given the corresponding set of statistics gathered.
Reference is now made to DCT blocks that are the transform units resulting from DCT transforming the preliminary prediction residuals or the preliminary quantization residuals having the block type of the considered probabilistic coding mode to define and, where appropriate, resulting from a first encoding stage using the quantization offset of the considered probabilistic coding mode.
For a considered probabilistic coding mode, the corresponding DCT blocks obtained at step 1030 or 1035 are considered.
Starting from the DCT blocks, a probabilistic distribution P of each DCT coefficient is determined using a parametric probabilistic model.
Since, in the present example, the DCT blocks are from residual blocks, i.e. the information is about a noise residual, it is efficiently modelled by Generalized Gaussian Distributions (GGD) having a zero mean: DCT(X) ~ GGD(α, β), where α, β are two parameters to be determined and the GGD follows the following two-parameter distribution:

GGD(α, β, x) := β / (2αΓ(1/β)) · exp(−|x/α|^β),

and where Γ is the well-known Gamma function: Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt.
The DCT coefficients cannot all be modelled by the same parameters and, practically, the two parameters α, β depend on:
- the video content. This means that the parameters must be computed for each image or for every group of n images for instance;
- the index i of the DCT coefficient within a DCT block Bk. Indeed, each DCT coefficient has its own behaviour. A DCT channel is thus defined for the DCT coefficients collocated (i.e. having the same index) within the plurality of considered DCT blocks. A DCT channel can therefore be identified by the corresponding coefficient index i. For illustrative purposes, if the residual DCT blocks are 8x8 pixel blocks, the modelling has to determine the parameters of 64 DCT channels for each coding mode.
As described above, the DCT coefficients are furthermore considered per each block type defined above and, if appropriate, per each quantization offset. Indeed, the content of the image, and thus the statistics of the DCT coefficients, may be strongly related to the block type because, as explained above, the block type is selected as a function of the image content, for instance to use large blocks for parts of the image containing little information.
In addition, since the luminance component Y and the chrominance components U and V have dramatically different source contents, they must be encoded in different DCT channels. For example, if it is decided to encode the luminance component Y on one channel and to encode jointly the chrominance components UV on another channel, 64 channels are needed for the luminance of a block type of size 8x8 and 16 channels are needed for the joint UV chrominance (made of 4x4 blocks) in a case of a 4:2:0 video where the chrominance is down-sampled by a factor two in each direction compared to the luminance. Alternatively, one may choose to encode U and V separately, and 64 channels are needed for Y, 16 for U and 16 for V. At least 64 pairs of parameters for each block type (and quantization offset where appropriate) may appear as a substantial amount of data to transmit to the decoder (see parameter bit-stream 21). However, experience proves that this is quite negligible compared to the volume of data needed to encode the residuals of Ultra High Definition (4k2k or more) videos. As a consequence, one may understand that such a technique is preferably implemented on large videos, rather than on very small videos, because the parametric data would take too much volume in the encoded bitstream.
For the sake of simplicity of explanation, a set of DCT blocks as obtained at step 1030 or 1035 for the same block type (and quantization offset where appropriate) is now considered. The invention may then be applied to each set corresponding to each block type (and quantization offset where appropriate).
To obtain the two parameters αi, βi defining the probabilistic distribution Pi for a DCT channel i, the Generalized Gaussian Distribution model is fitted onto the DCT block coefficients of the DCT channel, i.e. the DCT coefficients collocated within the DCT blocks of the same block type (and quantization offset where appropriate).
Since this fitting is based on the values of the DCT coefficients, the probabilistic distribution is a statistical distribution of the DCT coefficients within a considered channel i.
For example, the fitting may be simply and robustly obtained using the moment of order k of the absolute value of a GGD:

Mk := E_GGD(αi,βi)(|x|^k) = ∫ |x|^k GGD(αi, βi, x) dx = αi^k Γ((1+k)/βi) / Γ(1/βi).

Determining the moments of order 1 and of order 2 from the DCT coefficients of channel i makes it possible to directly obtain the value of parameter βi through the ratio:

M1² / M2 = Γ(2/βi)² / (Γ(1/βi) Γ(3/βi)).

The value of the parameter βi can thus be estimated by computing the above ratio of the first and second moments, and then the inverse of the above function of βi. Practically, this inverse function may be tabulated in memory of the encoder instead of computing Gamma functions in real time, which is costly.
The second parameter αi may then be determined from the first parameter βi and the second moment, using the equation: M2 = αi² Γ(3/βi) / Γ(1/βi).
The two parameters αi, βi being determined for the DCT coefficients i, the probabilistic distribution Pi of each DCT coefficient i is defined by: Pi(x) = GGD(αi, βi, x) = βi / (2αiΓ(1/βi)) · exp(−|x/αi|^βi).
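The moment-based fitting described above can be sketched as follows; the numerical inversion by `brentq` stands in for the tabulated inverse function mentioned in the text, and the bracket [0.1, 10] is an illustrative assumption on the range of β:

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd(coeffs):
    # Method-of-moments fit of a zero-mean Generalized Gaussian:
    # beta from the ratio M1^2/M2, then alpha from M2 (see the text)
    m1 = np.mean(np.abs(coeffs))
    m2 = np.mean(coeffs ** 2)
    ratio = m1 * m1 / m2   # = Gamma(2/b)^2 / (Gamma(1/b) * Gamma(3/b))

    def f(b):
        return gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b)) - ratio

    beta = brentq(f, 0.1, 10.0)           # numerical inverse of the ratio
    alpha = np.sqrt(m2 * gamma(1.0 / beta) / gamma(3.0 / beta))
    return alpha, beta
```

For Gaussian data (β = 2 with standard deviation σ), the fit should recover β ≈ 2 and α ≈ σ√2.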
In the probabilistic coding modes to be defined, a quantization of the DCT coefficients is to be performed in order to obtain quantized symbols or values. As explained below, it is proposed here to first determine a quantizer per DCT channel so as to optimize a rate-distortion criterion.
Figure 11 illustrates an exemplary Voronoi cell based quantizer.
A quantizer is made of M Voronoi cells distributed along the values of the DCT coefficients. Each cell corresponds to an interval [tm, tm+1], called quantum Qm.
Each cell has a centroid cm, as shown in the Figure.
The intervals are used for quantization: a DCT coefficient comprised in the interval [tm, tm+1] is quantized to a symbol am associated with that interval.
For their part, the centroids are used for de-quantization: a symbol am associated with an interval is de-quantized into the centroid value cm of that interval.
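The quantization and de-quantization rules above can be sketched with a hypothetical 3-cell quantizer; the thresholds and centroids below are arbitrary illustrative values, not an optimal quantizer:

```python
import numpy as np

# Hypothetical cell boundaries t_m and centroids c_m of a 3-cell quantizer
thresholds = np.array([-np.inf, -1.0, 1.0, np.inf])
centroids = np.array([-2.0, 0.0, 2.0])

def quantize(x):
    # A coefficient in the interval [t_m, t_{m+1}) is mapped to symbol m
    return int(np.searchsorted(thresholds, x, side='right')) - 1

def dequantize(m):
    # A symbol m is decoded into the centroid of its cell
    return centroids[m]
```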
The quality of a video or still image may be measured by the so-called Peak-Signal-to-Noise-Ratio or PSNR, which is dependent upon a measure of the L2-norm of the error of encoding in the pixel domain, i.e. the sum over the pixels of the squared difference between the original pixel value and the decoded pixel value. It may be recalled in this respect that the PSNR may be expressed in dB as: 10·log10(M²/MSE), where M is the maximal pixel value (in the spatial domain) and MSE is the mean squared error (i.e. the above sum divided by the number of pixels concerned).
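A direct transcription of this PSNR definition:

```python
import numpy as np

def psnr(original, decoded, max_value=255.0):
    # PSNR = 10 * log10(MAX^2 / MSE), MSE averaged over all pixels
    diff = np.asarray(original, float) - np.asarray(decoded, float)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)
```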
However, as noted above, most video codecs compress the data in the DCT-transformed domain, in which the energy of the signal is much better compacted.
The direct link between the PSNR and the error on DCT coefficients is now explained.
For a residual block, we denote by ψn its inverse DCT (or IDCT) pixel base in the pixel domain, as shown on Figure 12. If one uses the so-called IDCT III for the inverse transform, this base is orthonormal: ‖ψn‖2 = 1.
On the other hand, in the DCT domain, the unity coefficient values form a base φn, which is orthogonal. One writes the DCT transform of the pixel block X as follows: X_DCT = Σn d^n φn, where d^n is the value of the n-th DCT coefficient. A simple base change leads to the expression of the pixel block as a function of the DCT coefficient values: X = IDCT(X_DCT) = IDCT(Σn d^n φn) = Σn d^n IDCT(φn) = Σn d^n ψn.
If the value of the de-quantized coefficient d^n after decoding is denoted d̂^n, one sees that (by linearity) the pixel error block is given by: X − X̂ = Σn (d^n − d̂^n) ψn. The mean L2-norm error on all blocks is thus: E(‖X − X̂‖²) = E(Σn |d^n − d̂^n|²) = Σn E(|d^n − d̂^n|²) = Σn Dn, where Dn is the mean quadratic error of quantization on the n-th DCT coefficient, or squared distortion for this type of coefficient. The distortion is thus a measure of the distance between the original coefficient (here the coefficient before quantization) and the decoded coefficient (here the de-quantized coefficient).
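This equality of the pixel-domain and DCT-domain squared errors (a consequence of the orthonormality of the IDCT base, i.e. Parseval's relation) can be checked numerically with an orthonormal 1-D transform; the coefficient values and the perturbation below are arbitrary:

```python
import numpy as np
from scipy.fftpack import idct

rng = np.random.default_rng(1)
d = rng.normal(size=64)                       # original DCT coefficients d^n
d_hat = d + rng.normal(scale=0.1, size=64)    # de-quantized coefficients

# Error block in the pixel domain, through the orthonormal inverse transform
pixel_error = idct(d - d_hat, norm='ortho')

# Orthonormality of the base: the squared L2 error is the same in both
# domains, i.e. it equals sum_n |d^n - d_hat^n|^2
assert np.allclose(np.sum(pixel_error ** 2), np.sum((d - d_hat) ** 2))
```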
It is thus proposed below to control the video quality by controlling the sum of the quadratic errors on the DCT coefficients. In particular, this control is preferable compared to the individual control of each of the DCT coefficients, which is a priori a sub-optimal control.
To fully define a probabilistic coding mode, it is proposed to determine a set of quantizers (each to be used for a corresponding DCT channel), the use of which results in a mean quadratic error having a target value D while minimising the rate obtained.
In view of the above correspondence between the PSNR and the mean quadratic error D on DCT coefficients, these constraints can be written as follows:

minimize R = Σn Rn(Dn) s.t. Σn Dn = D (A),

where R is the total rate made of the sum of individual rates Rn for each DCT coefficient. In case the quantization is made independently for each DCT coefficient, the rate Rn depends only on the distortion Dn of the associated n-th DCT coefficient.
It may be noted that the above minimization problem (A) may only be fulfilled by optimal quantizers which are solutions of the problem:

minimize Rn(Dn) s.t. E(|d^n − d̂^n|²) = Dn (B).
This statement is simply proven by the fact that, assuming a first quantizer would not be optimal following (B) but would fulfil (A), then a second quantizer with less rate but the same distortion can be constructed (or obtained). So, if one uses this second quantizer, the total rate R has been diminished without changing the total distortion D; this is in contradiction with the first quantizer being a minimal solution of the problem (A).
As a consequence, the rate-distortion minimization problem (A) can be split into two consecutive sub-problems without losing the optimality of the solution:
- first, determining optimal quantizers and their associated rate-distortion curves Rn(Dn) following the problem (B), which will be done in the present case for GGD channels as explained below;
- second, by using optimal quantizers, the problem (A) is changed into the problem (A_opt):

minimize R = Σn Rn(Dn) s.t. Σn Dn = D and Rn(Dn) is optimal (A_opt).
Based on this analysis, it is proposed as further explained below:
- to compute off-line optimal quantizers adapted to possible probabilistic distributions of each DCT channel (thus resulting in a pool of optimal quantizers);
- to select one of these pre-computed optimal quantizers for each DCT channel (i.e. each type of DCT coefficient) such that using the set of selected quantizers results in a global distortion corresponding to the target distortion D with a minimal rate (i.e. a set of quantizers which solves the problem A_opt).
A possible embodiment is now described for the computation of optimal quantizers for possible probabilistic distributions, here Generalised Gaussian Distributions.
It is proposed to change the previous complex formulation of problem (B) into the so-called Lagrange formulation of the problem: for a given parameter λ > 0, we determine the quantization in order to minimize a cost function such as D² + λR.
We thus get an optimal rate-distortion couple (D_λ, R_λ). In case of rate control (i.e. rate minimisation) for a given target distortion Δ, the optimal parameter λ_Δ > 0 is determined by λ_Δ = argmin_{λ: D_λ = Δ} R_λ (i.e. the value of λ for which the rate is minimum while fulfilling the constraint on distortion) and the associated minimum rate is R_Δ = R_{λ_Δ}.
As a consequence, by solving the problem in its Lagrange formulation, for instance following the method proposed below, it is possible to plot a rate-distortion curve associating a resulting minimum rate to each distortion value (Δ ↦ R_Δ), which may be computed off-line, as well as the associated quantization, i.e. quantizer, making it possible to obtain this rate-distortion pair.
It is precisely proposed here to formulate problem (B) as a continuum of problems (B_lambda) having the following Lagrange formulation: minimize D_n² + λR_n(D_n) s.t. E(|x − x̂|²) = D_n² (B_lambda).
The well-known Chou-Lookabaugh-Gray algorithm is a good practical way to perform the required minimisation. It may be used with any distortion distance d; we describe here a simplified version of the algorithm for the L2-distance. This is an iterative process starting from any given guessed quantization.
As noted above, this algorithm is performed here for each of a plurality of possible probabilistic distributions (in order to obtain the pre-computed optimal quantizers for the possible distributions to be encountered in practice), and for a plurality of possible numbers M of quanta. It is described below when applied for a given probabilistic distribution P and a given number M of quanta.
In this respect, as the parameter α (or equivalently the standard deviation σ of the Generalized Gaussian Distribution) can be moved out of the distortion parameter D because it is a homothetic parameter, only optimal quantizers with unity standard deviation σ = 1 need to be determined in the pool of quantizers.
Taking advantage of this remark, in the proposed embodiment, the GGD representing a given DCT channel will be normalized before quantization (i.e. homothetically transformed into a unity standard deviation GGD), and will be de-normalized after de-quantization. Of course, this is possible because the parameters (in particular here the parameter α, or equivalently the standard deviation σ) of the concerned GGD model are sent to the decoder in the video bit-stream.
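A minimal sketch of this normalize/de-normalize step (the function names and the plain round-to-nearest quantizer are illustrative only, not the optimal quantizers described here):

```python
def encode_coeff(x, sigma, quantize):
    """Normalize a DCT coefficient to unit standard deviation before applying a
    quantizer designed for sigma = 1."""
    return quantize(x / sigma)

def decode_coeff(q, sigma, dequantize):
    """De-quantize, then de-normalize using the sigma transmitted to the decoder
    in the bit-stream."""
    return dequantize(q) * sigma
```

Only the unit-variance quantizer needs to be stored; the channel-specific scale travels in the model parameters.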
Before describing the algorithm itself, the following should be noted.
The position of the centroids c_m is such that they minimize the distortion δ_m inside a quantum; in particular, one must verify that ∂_{c_m} δ_m = 0 (as the derivative is zero at a minimum).
As the distortion δ_m of the quantization on the quantum Q_m is the mean error E(d(x; c_m)) for a given distortion function or distance d, the distortion on one quantum when using the L2-distance is given by δ_m = ∫_{Q_m} |x − c_m|² P(x) dx, and the nullification of the derivative thus gives: c_m = ∫_{Q_m} x P(x) dx / P_m, where P_m is the probability of x to be in the quantum Q_m and is simply the following integral: P_m = ∫_{Q_m} P(x) dx.
Turning now to minimisation of the cost function C = Σ_m δ_m + λR, and considering that the rate reaches the entropy of the quantized data: R = −Σ_m P_m log₂ P_m, the nullification of the derivatives of the cost function for an optimal solution can be written as: 0 = ∂_{t_{m+1}} C = ∂_{t_{m+1}} [δ_m + δ_{m+1} − λ(P_m log₂ P_m + P_{m+1} log₂ P_{m+1})].
Let us set P* = P(t_{m+1}), the value of the probability distribution at the point t_{m+1}. From simple variational considerations, see Figure 13, we get ∂_{t_{m+1}} P_m = P* and ∂_{t_{m+1}} P_{m+1} = −P*.
Then, a bit of calculation leads to ∂_{t_{m+1}} δ_m = ∂_{t_{m+1}} ∫_{t_m}^{t_{m+1}} |x − c_m|² P(x) dx = P*·|t_{m+1} − c_m|² − 2 (∂_{t_{m+1}} c_m) ∫_{t_m}^{t_{m+1}} (x − c_m) P(x) dx = P*·|t_{m+1} − c_m|², as well as ∂_{t_{m+1}} δ_{m+1} = −P*·|t_{m+1} − c_{m+1}|².
As the derivative of the cost is now explicitly calculated, its cancellation gives: 0 = P*·|t_{m+1} − c_m|² − P*·|t_{m+1} − c_{m+1}|² − λP* ln P_m + λP* ln P_{m+1}, which leads to a useful relation between the quantum boundaries t_{m+1} and the centroids c_m: t_{m+1} = (c_m + c_{m+1})/2 − λ(ln P_{m+1} − ln P_m)/(2(c_{m+1} − c_m)).
Thanks to these formulae, the Chou-Lookabaugh-Gray algorithm can be implemented by the following iterative process: 1. Start with arbitrary quanta Q_m defined by a plurality of limits t_m 2. Compute the probabilities P_m by the formula P_m = ∫_{Q_m} P(x) dx 3. Compute the centroids c_m by the formula c_m = ∫_{Q_m} x P(x) dx / P_m 4. Compute the limits t_{m+1} of new quanta by the formula t_{m+1} = (c_m + c_{m+1})/2 − λ(ln P_{m+1} − ln P_m)/(2(c_{m+1} − c_m)) 5. Compute the cost C = Σ_m δ_m + λR by the formula C = Σ_m [δ_m − λP_m log₂ P_m] 6. Loop to 2. until convergence of the cost C
When the cost C has converged, the current values of limits t_m and centroids c_m define a quantization, i.e. a quantizer, with M quanta, which solves the problem (B_lambda), i.e. minimises the cost function for a given value λ, and has an associated rate value R_λ and a distortion value D_λ.
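The six-step loop above can be sketched numerically as follows (a simplified illustration only: the function name, the truncation of the density to a finite interval and the discretisation are assumptions, not part of the described method; the entropy is taken in bits, so the base-2 logarithm is used consistently in the boundary update):

```python
import bisect
import math

def clg_quantizer(pdf, lam, M, lo=-8.0, hi=8.0, steps=4000, iters=100, tol=1e-9):
    """Chou-Lookabaugh-Gray: entropy-constrained scalar quantizer minimising
    C = sum(delta_m) + lam * R, with R the entropy in bits, for a density pdf
    truncated to [lo, hi]."""
    xs = [lo + (hi - lo) * (i + 0.5) / steps for i in range(steps)]
    mass = [pdf(x) * (hi - lo) / steps for x in xs]
    # step 1: arbitrary starting quanta (interior limits t_1 .. t_{M-1})
    t = [lo + (hi - lo) * m / M for m in range(1, M)]
    prev = float("inf")
    dist = rate = 0.0
    for _ in range(iters):
        # step 2: probabilities P_m (plus first and second moments per quantum)
        P, S, Sxx = [0.0] * M, [0.0] * M, [0.0] * M
        for x, p in zip(xs, mass):
            m = bisect.bisect(t, x)              # index of the quantum containing x
            P[m] += p; S[m] += p * x; Sxx[m] += p * x * x
        # step 3: centroids c_m = E[x | x in Q_m]
        c = [S[m] / P[m] for m in range(M)]
        # step 5: cost C = sum(delta_m) + lam * R
        dist = sum(Sxx[m] - P[m] * c[m] ** 2 for m in range(M))
        rate = -sum(P[m] * math.log2(P[m]) for m in range(M) if P[m] > 0)
        cost = dist + lam * rate
        if abs(prev - cost) < tol:               # step 6: convergence of the cost
            break
        prev = cost
        # step 4: new limits t_{m+1} between consecutive centroids
        t = [(c[m] + c[m + 1]) / 2
             - lam * (math.log2(P[m + 1]) - math.log2(P[m])) / (2 * (c[m + 1] - c[m]))
             for m in range(M - 1)]
    return t, c, dist, rate
```

With lam = 0 there is no rate constraint, which recovers the Lloyd quantizer mentioned below.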
Such a process is implemented for many values of the Lagrange parameter λ (for instance 100 values comprised between 0 and 50). It may be noted that for λ equal to 0, there is no rate constraint, which corresponds to the so-called Lloyd quantizer.
In order to obtain optimal quantizers for a given parameter β of the corresponding GGD, the problems (B_lambda) are to be solved for various odd (by symmetry) values of the number M of quanta and for the many values of the parameter λ. A rate-distortion diagram for the optimal quantizers with varying M is thus obtained, as shown on Figure 14.
It turns out that, for a given distortion, there is an optimal number M of needed quanta for the quantization associated to an optimal parameter λ. In brief, one may say that optimal quantizers of the general problem (B) are those associated to a point of the upper envelope of the rate-distortion curves making this diagram, each point being associated with a number of quanta (i.e. the number of quanta of the quantizer leading to this point of the rate-distortion curve). This upper envelope is illustrated on Figure 15. At this stage, we have now lost the dependency on λ of the optimal quantizers: to a given rate (or a given distortion) corresponds only one optimal quantizer, whose number of quanta M is fixed.
Based on observations that the GGD modelling provides a value of β almost always between 0.5 and 2 in practice, and that only a few discrete values are enough for the precision of encoding, it is proposed here to tabulate β every 0.1 in the interval between 0.2 and 2.5. Considering these values of β (i.e. here for each of the 24 values of β taken in consideration between 0.2 and 2.5), rate-distortion curves, depending on β, are obtained as shown on Figure 16. It is of course possible to obtain according to the same process rate-distortion curves for a larger number of possible values of β.
Each curve may in practice be stored in the encoder in a table containing, for a plurality of points on the curve, the rate and distortion (coordinates) of the point concerned, as well as features defining the associated quantizer (here the number of quanta and the values of limits t_m and centroids c_m for the various quanta). For instance, a few hundred quantizers may be stored for each β up to a maximum rate, e.g. of 5 bits per DCT coefficient, thus forming the pool of quantizers mentioned above. It may be noted that a maximum rate of 5 bits per coefficient in the prediction/quantization residual frame makes it possible to obtain good quality in the decoded image. Generally speaking, it is proposed to use a maximum rate per DCT coefficient equal to or less than 10 bits, for which value near-lossless coding is provided.
Before turning to the selection of quantizers for the various DCT channels, among these optimal quantizers stored in association with their corresponding rate and distortion when applied to the concerned distribution (GGD with a specific parameter β), it is proposed here to select which part of the DCT channels are to be encoded.
Based on the observation that the rate decreases monotonically as a function of the distortion induced by the quantizer, precisely in each case in the manner shown by the curves just mentioned, it is possible to write the relationship between rate and distortion as follows: R_n = f_n(−ln(D_n/σ_n)), where σ_n is the normalization factor of the DCT coefficient, i.e. the GGD model associated to the DCT coefficient has σ_n for standard deviation, and where f_n' ≥ 0 in view of the monotonicity just mentioned.
In particular, no encoding (equivalently zero rate) leads to a quadratic distortion of value σ_n², and we deduce that 0 = f_n(0).
Finally, one observes that the curves are convex for parameters β lower than two: β ≤ 2. It is proposed here to consider the merit of encoding a DCT coefficient.
More encoding basically results in more rate R (in other words, the corresponding cost) and less distortion D (in other words the resulting gain or advantage).
Thus, when dedicating a further bit to the encoding of the video (rate increase), it should be determined on which DCT coefficient this extra rate is the most efficient. In view of the analysis above, an estimation of the merit M of encoding may be obtained by computing the ratio of the benefit on distortion to the cost of encoding: M := −ΔD²/ΔR.
Considering the distortion decreases by an amount ε, a first order development of distortion and rate gives: (D − ε)² = D² − 2Dε + o(ε) and R(D − ε) = f(−ln((D − ε)/σ)) = f(−ln(D/σ) − ln(1 − ε/D)) = f(−ln(D/σ) + ε/D + o(ε)) = f(−ln(D/σ)) + (ε/D)·f'(−ln(D/σ)) + o(ε). As a consequence, the ratio of the first order variations provides an explicit formula for the merit of encoding: M_n(D_n) = 2D_n²/f_n'(−ln(D_n/σ_n)). If the initial merit M_n⁰ is defined as the merit of encoding at zero rate, i.e. before any encoding, this initial merit M_n⁰ can thus be expressed as follows using the preceding formula: M_n⁰ := M_n(σ_n) = 2σ_n²/f_n'(0) (because, as noted above, no encoding leads to a quadratic distortion of value σ_n²).
It is thus possible, starting from the pre-computed and stored rate-distortion curves, to determine the function f_n associated with a given DCT channel and to compute the initial merit M_n⁰ of encoding the corresponding DCT coefficient (the value f_n'(0) being determined by approximation thanks to the stored coordinates of rate-distortion curves).
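For illustration only (the function name and the two-point finite difference are assumptions, not part of the described method), f_n'(0) may be approximated from the first two stored points of the unit-variance rate-distortion curve:

```python
import math

def initial_merit(points, sigma2):
    """Initial encoding merit M0 = 2*sigma^2 / f'(0), where f maps -ln(D) to the
    rate on the stored unit-variance curve; f'(0) is approximated by a finite
    difference on the first two stored points (rate, distortion), the first
    point being (0.0, 1.0) for the unquantized channel."""
    (r0, d0), (r1, d1) = points[0], points[1]
    fprime0 = (r1 - r0) / (math.log(d0) - math.log(d1))  # dR / d(-ln D)
    return 2.0 * sigma2 / fprime0
```

Channels whose initial merit falls below the target block merit introduced below would then simply be left unencoded.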
It may further be noted that, for β lower than two (which is in practice almost always true), the convexity of the rate-distortion curves teaches us that the merit is an increasing function of the distortion.
In particular, the initial merit is thus an upper bound of the merit: M_n(D_n) ≤ M_n⁰.
It will now be shown that, when satisfying the optimisation criteria defined above, all encoded DCT coefficients in the block have the same merit after encoding.
Furthermore, this does not apply to one block only but, as long as the various functions f_n used in each DCT channel are unchanged, to all blocks of a given block type (given a quantization offset where appropriate). Hence the common merit value for encoded DCT coefficients will now be referred to as the merit of the block type (and quantization offset where appropriate).
The above property of equal merit after encoding may be shown for instance using the Karush-Kuhn-Tucker (KKT) necessary conditions of optimality. To this end, the quality constraint Σ_n D_n² = D² can be rewritten as h = 0 with h := Σ_n D_n² − D². The distortion of each DCT coefficient is upper bounded by the distortion without coding: D_n ≤ σ_n, and the domain of definition of the problem is thus a multi-dimensional box Ω = {(D_1, D_2, ...); D_n ≤ σ_n} = {(D_1, D_2, ...); g_n ≤ 0}, defined by the functions g_n(D_n) := D_n − σ_n.
Thus, the problem can be restated as follows: minimize R(D_1, D_2, ...) s.t. h = 0, g_n ≤ 0 (A_opt').
Such an optimization problem under inequality constraints can effectively be solved using the so-called Karush-Kuhn-Tucker (KKT) necessary conditions of optimality.
To this end, the relevant KKT function Λ is defined as follows: Λ := R − λh − Σ_n μ_n g_n. The KKT necessary conditions of minimization are: -stationarity: dΛ = 0, -equality: h = 0, -inequality: g_n ≤ 0, -dual feasibility: μ_n ≥ 0, -saturation: μ_n g_n = 0.
It may be noted that the parameter λ in the KKT function above is unrelated to the parameter λ used above in the Lagrange formulation of the optimisation problem meant to determine optimal quantizers.
When g_n = 0, the n-th condition is said to be saturated. In the present case, it indicates that the n-th DCT coefficient is not encoded.
By using the specific formulation R_n = f_n(−ln(D_n/σ_n)) of the rate depending on the distortion discussed above, the stationarity condition gives: 0 = ∂_{D_n} Λ = ∂_{D_n} R_n − λ ∂_{D_n} h − μ_n ∂_{D_n} g_n = −f_n'/D_n − 2λD_n − μ_n, i.e. 2λD_n² = −μ_n D_n − f_n'.
By summing on n and taking benefit of the equality condition, this leads to: 2λD² = −Σ_n μ_n D_n − Σ_n f_n' (*). In order to take into account the possible encoding of only part of the coefficients as proposed above, the various possible indices n are distributed into two subsets: -the set I⁰ = {n; μ_n = 0} of non-saturated DCT coefficients (i.e. of encoded DCT coefficients), for which we have μ_n D_n = 0 and D_n² = −f_n'/(2λ), and -the set I⁺ = {n; μ_n > 0} of saturated DCT coefficients (i.e. of DCT coefficients not encoded), for which we have D_n = σ_n and μ_n D_n = −f_n' − 2λσ_n².
From (*), we deduce 2λD² = −Σ_{n∈I⁺}(−f_n' − 2λσ_n²) − Σ_n f_n', and by gathering the λ's: 2λ(D² − Σ_{n∈I⁺} σ_n²) = −Σ_{n∈I⁰} f_n'(−ln(D_n/σ_n)). As a consequence, for a non-saturated coefficient (n ∈ I⁰), i.e. a coefficient to be encoded, we obtain: D_n² = [D² − Σ_{m∈I⁺} σ_m²]·f_n'(−ln(D_n/σ_n)) / Σ_{m∈I⁰} f_m'(−ln(D_m/σ_m)). This formula for the distortion makes it possible to rewrite the above formula giving the merit M_n(D_n) as follows for non-saturated coefficients: M_n(D_n) = 2·[D² − Σ_{m∈I⁺} σ_m²] / Σ_{m∈I⁰} f_m'(−ln(D_m/σ_m)). Clearly, the right side of the equality does not depend on the DCT channel n concerned. Thus, for a block type k (and quantization offset where appropriate), for any DCT channel n for which coefficients are encoded, the merit associated with said channel after encoding is the same. In other words, all encoded DCT coefficients (i.e. n ∈ I⁰) of a block type k (and quantization offset where appropriate) have the same merit after encoding. The common merit is called the merit of the block type (and quantization offset where appropriate) and noted m_k.
Furthermore, all non-encoded coefficients (i.e. n ∈ I⁺) have a merit smaller than the merit of the block type (and quantization offset where appropriate).
Another proof of the property of common merit after encoding is the following: supposing that there are two encoded DCT coefficients with two different merits M1 < M2, if an infinitesimal amount of rate from coefficient 1 is put on coefficient 2 (which is possible because coefficient 1 is one of the encoded coefficients, and this does not change the total rate), the distortion gain on coefficient 2 would be strictly bigger than the distortion loss on coefficient 1 (because M1 < M2). This would thus provide a better distortion with the same rate, which is in contradiction with the optimality of the initial condition with two different merits.
As a conclusion, if the two coefficients 1 and 2 are encoded and if their respective merits M1 and M2 are such that M1 < M2, then the solution is not optimal.
Furthermore, all non-coded coefficients have a merit smaller than the merit of the block type (and quantization offset where appropriate), i.e. the merit of coded coefficients after encoding.
In view of the property of equal merits of encoded coefficients when optimisation is satisfied, it is proposed here to encode only coefficients for which the initial encoding merit M_n⁰ = M_n(σ_n) is greater than a predetermined target block merit m_k. The DCT coefficients with an initial encoding merit M_n⁰ lower than the predetermined target block merit m_k are not encoded.
For each coefficient to be encoded, a quantizer is selected to obtain the target block merit as the merit of the coefficient after encoding: first, the corresponding distortion, which is thus such that M_n(D_n) = 2D_n²/f_n'(−ln(D_n/σ_n)) = m_k, can be found by dichotomy using stored merit-distortion curves; the quantizer associated with the distortion found is then selected.
Figure 17 illustrates such a stored merit-distortion curve for coefficient n. Either the initial merit of the coefficient is lower than the target block merit and the coefficient is not encoded; or there is a unique distortion D_n such that M_n(D_n) = m_k. Knowing the target block merit m_k for the block type k (and quantization offset where appropriate), it is possible to deduce a target distortion D_n from the curve associated with a particular DCT coefficient.
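A minimal sketch of this dichotomy (curve layout and function name are illustrative only; merit is assumed increasing with distortion, which holds for β ≤ 2 as noted above):

```python
import bisect

def target_distortion(curve, block_merit):
    """curve: (distortion, merit) points for one DCT channel, sorted by
    increasing distortion (merit increases with distortion).  Returns the stored
    distortion achieving the target block merit, or None when even the initial
    merit (last point, zero rate) is below the target, i.e. the coefficient is
    not encoded."""
    merits = [m for _, m in curve]
    if merits[-1] < block_merit:
        return None                               # initial merit below target
    i = bisect.bisect_left(merits, block_merit)   # dichotomy on the merit axis
    return curve[i][0]
```

The quantizer stored alongside the returned distortion point is then the one selected for the channel.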
Then, the parameter β of the DCT channel model for the considered DCT coefficient (obtained from the considered set of statistics on preliminary residuals) makes it possible to select one of the curves of Figure 16, for example the curve of Figure 15.
The target distortion D_n in that curve thus provides a unique optimal quantizer for DCT coefficient n, having M_n quanta Q_m.
This is done for each DCT coefficient to encode (i.e. belonging to 1°).
Step 1050 results in a set of optimal quantizers having been selected for the considered block type k (and quantization offset where appropriate), i.e. for the considered probabilistic coding mode to define.
This may be done for the luminance frame Y and for the chrominance frames U, V. In particular, in this process, the computation of DCT coefficients and GGD statistics is performed for the luminance frame Y and for the chrominance frames U, V (both using the same current segmentation associating a block type to each block of the segmentation).
Of course, step 1050 performs this selection of optimal quantizers for each probabilistic coding mode to define, i.e. for each block type (and quantization offset where appropriate).
A description is now given of the obtaining of the target block merit mk for block type k (and quantization offset where appropriate). This is done through blocks 1015, 1020 and 1055 in Figure 10.
The target block merit m_k for the block type k currently considered is computed based on a predetermined frame merit m^F and on a number of blocks v_k of the given block type per area unit, here according to the formula: m_k = v_k·m^F. The frame merit m^F may be directly deduced from the user-specified QP parameter, as follows: -get the QP parameter from 1015 (Figure 10); -deduce λ_standard for the standard modes from a table (see 1020); -take the same λ_proba = λ_standard for the probabilistic coding modes (with either a single or two stages); -deduce the frame merit m^F through the direct (proportionality, as both m^F and λ_proba measure the rate-distortion slope) correspondence between the frame merit m^F and λ_proba. In particular, this makes it possible to obtain the frame merit m^Y for the luminance frame Y and the frame merits m^U, m^V for the chrominance frames U, V. One may choose the area unit as being the area of a 16x16 block, i.e. 256 pixels. In this case, v_k = 1 for block types of size 16x16, v_k = 4 for block types of size 8x8, etc. One also understands that the method is not limited to square blocks; for instance v_k = 2 for block types of size 16x8.
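As an illustration of this computation (the function name is hypothetical; the 16x16 area unit reproduces the example above):

```python
AREA_UNIT = 16 * 16  # area unit chosen as a 16x16 block, i.e. 256 pixels

def block_type_merit(frame_merit, block_w, block_h):
    """Target block merit m_k = v_k * m_F, where v_k is the number of blocks of
    the given block type per area unit."""
    v_k = AREA_UNIT // (block_w * block_h)
    return v_k * frame_merit
```

Non-square block types are handled the same way, e.g. a 16x8 type gives v_k = 2.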
This type of computation makes it possible to obtain a balanced encoding between block types (possibly given a quantization offset where appropriate), i.e. here a common merit of encoding per pixel (equal to the frame merit m^F) for all block types, possibly given a quantization offset.
This is because the variation of the pixel distortion Δδ²_{P,k} for the block type k is the sum of the distortion variations provided by the various encoded DCT coefficients, and can thus be rewritten as follows thanks to the (common) block merit: Δδ²_{P,k} = Σ_{n coded} ΔD_n² = m_k·Σ_{n coded} ΔR_n = m_k·ΔR_k (where ΔR_k is the rate variation for a block of type k). Thus, the merit of encoding per pixel is: Δδ²_{P,k}/ΔU_k = m_k·ΔR_k/(v_k·ΔR_k) = m^F (where U_k = v_k·R_k is the rate per area unit for the block type concerned, and possibly quantization offset) and has a common value over the various block types.
Once all the optimal quantizers have been selected for the various probabilistic coding modes (i.e. each block type and/or each possible combination of block type and quantization offset), the first pass in Figure 10 ends.
At this time of the general processing, all the coding modes, including static coding modes and probabilistic coding modes, of Figure 10 are fully defined.
The second pass thus consists in actual encoding of the original image devoid of any a priori segmentation, using competition between the conventional static coding modes and the probabilistic coding modes defined by the quantizers selected at step 1050. This second pass will result in defining a new segmentation of the original image, associating a coding mode with each transform unit or block of the segmentation (i.e. with each prediction/quantization residual), that is a block type possibly together with a quantization offset and corresponding selected optimal quantizers. This is step 1060.
Each coding mode in competition includes a first step of intra/inter prediction followed by one or two encoding stages of OCT transform and quantization using quantizers that are predefined or selected as described above.
The competition involves determining a segmentation of the original image and the coding modes corresponding to each transform unit or block resulting from the segmentation amongst the available coding modes, that provides the best rate-distortion result for the image. Such rate-distortion information is computed through an encoding cost for each possible block.
During the competition, each tested block in the original image is encoded according to several available coding modes and a coding mode is selected based on a rate-distortion criterion on the obtained encoded blocks.
The competition between coding modes is very simple and may be summarized as follows: -compute the cost of all available coding modes; -choose the mode with the smallest cost.
Costs for static coding modes and probabilistic coding modes have to be computed.
Costs for static coding modes are well known for one skilled in the art.
Therefore, they are not described here in further detail.
Costs for probabilistic coding modes are slightly different from a single-stage coding mode to a two-stage coding mode.
For a single-stage coding mode referenced k, the combined cost C_{k,YUV} of a block, taking into account both luminance Y and chrominance UV, can be written: C_{k,YUV} = C^{proba}_{k,YUV} + R_pred, where R_pred is the cost for encoding the prediction information resulting from the actual inter/intra prediction and C^{proba}_{k,YUV} is the cost for encoding the prediction residual resulting from that prediction using the probabilistic encoding stage, i.e. the corresponding quantizers selected at step 1050 based on statistics.
For a two-stage coding mode referenced k (k thus refers to the combination of a block type and a quantization offset), the combined cost C_{k,YUV} of a block, taking into account both luminance Y and chrominance UV, can be written: C_{k,YUV} = C^{proba}_{k,YUV} + R_residual + R_pred, where R_pred is the cost for encoding the prediction information resulting from the actual inter/intra prediction, R_residual is the cost for encoding the prediction residual through the static sub-coding stage (first stage) of DCT transform and quantization using predefined quantizers, and C^{proba}_{k,YUV} is the cost for encoding the quantization residual resulting from the static first stage, using the probabilistic sub-coding stage (second stage), i.e. the corresponding quantizers selected at step 1050 based on statistics.
R_residual and R_pred are easily obtained by one skilled in the art as the bitrate of the prediction information and the bitrate of the prediction residual encoded using the first static coding stage.
Below is described how C^{proba}_{k,YUV} is computed, i.e. the cost of encoding a prediction residual using a single-stage probabilistic coding mode or the cost of encoding a quantization residual using a two-stage probabilistic coding mode.
It is proposed here to use a Lagrangian cost of the type λ·δ² + R (as an encoding cost in the encoding cost competition), computed from the bit rate needed for encoding by using the quantizers of the concerned (competing) probabilistic coding or sub-coding mode, and the distortion after quantization and dequantization by using the quantizers of the concerned (competing) probabilistic coding or sub-coding mode.
The Lagrangian cost generated by encoding blocks having a particular associated coding mode will be estimated as follows.
The cost of encoding for the luminance is λ^Y·δ²_{P,k,Y} + R_{k,Y}, where δ²_{P,k,Y} is the pixel distortion for the block type of the probabilistic coding mode k introduced above and R_{k,Y} is the associated rate.
It is known that, as rate and distortion values are constrained on a given rate-distortion curve, Lagrange's parameter can be written as follows: λ^Y = −ΔR_{k,Y}/Δδ²_{P,k,Y}, and thus approximated as follows: 1/λ^Y ≈ −Δδ²_{P,k,Y}/ΔR_{k,Y} = m_k = v_k^Y·m^Y (where v_k^Y is the number of blocks of the considered block type per area unit in the luminance frame).
It is thus proposed to estimate the luminance cost as follows: C_{k,Y} = δ²_{P,k,Y}/(v_k^Y·m^Y) + R_{k,Y} + R_QT, where R_QT is the bit rate associated to the parsing of the generalized quad-tree (representing the segmentation) to mark the coding mode (i.e. block type and quantization offset where appropriate) of the concerned block in the bitstream.
In a similar manner, if we write the cost of encoding for the chrominance λ^{UV}·δ²_{P,k,UV} + R_{k,UV}, Lagrange's parameter is given by λ^{UV} = −ΔR_{k,UV}/Δδ²_{P,k,UV} and can thus be approximated as: 1/λ^{UV} ≈ −Δδ²_{P,k,UV}/ΔR_{k,UV}. Assuming that the quality of luminance frames is the same as the quality of chrominance frames: D_Y = D_UV = (D_U + D_V)/2, which gives at the block level: δ²_{P,k,UV} = (δ²_{P,k,U} + δ²_{P,k,V})/2. Also assuming that the merit of U chrominance frames is the same as the merit of V chrominance frames: m^U = m^V = m^{UV}, which results in an equal merit v_k^{UV}·m^{UV} for U and V frames at the block level (where v_k^{UV} is the number of blocks of the given block type per area unit in the chrominance frame).
Thus, Lagrange's parameter can be estimated (based in particular on the definition of the merit) as: Δδ²_{P,k,UV}/ΔR_{k,UV} = Δ(δ²_{P,k,U} + δ²_{P,k,V})/(2·ΔR_{k,UV}) = v_k^{UV}·m^{UV}·Δ(R_{k,U} + R_{k,V})/(2·ΔR_{k,UV}) = v_k^{UV}·m^{UV}/2. It is thus proposed to estimate the chrominance cost as follows: C_{k,UV} = 2·δ²_{P,k,UV}/(v_k^{UV}·m^{UV}) + R_{k,UV}. It may be noted that no rate is dedicated to a chrominance quad-tree, as it is considered here that the segmentation for chrominance frames follows the segmentation for luminance frames.
The combined cost, taking into account both luminance and chrominance, is the sum of the two associated costs. However, in order to also take into consideration the coupling between luminance and chrominance, the merit of chrominance is computed such that the quality (on the whole frame) of the chrominance matches the quality of the luminance.
As a consequence, a variation of the luminance distortion in one block has a global impact on the average distortion of the chrominance on the whole frame. Due to the quality equality, this impact is Δδ²_{k,UV} = Δδ²_{k,Y}, and it is thus proposed to introduce a corresponding coupling term in the combined cost, which can thus be estimated by the following formula: C^{proba}_{k,YUV} = δ²_{P,k,Y}/(v_k^Y·m^Y) + 2·(δ²_{P,k,UV} + δ²_{P,k,Y})/(v_k^{UV}·m^{UV}) + R_{k,Y} + R_{k,UV} + R_QT. This formula thus makes it possible to compute the Lagrangian cost in the competition between possible segmentations mentioned above and described in more detail below.
Indeed, in practice, the distortions δ²_{P,k,Y} and δ²_{P,k,UV} are computed by applying the quantizers selected at step 1050 for the concerned coding mode, then by applying the associated dequantization and finally by comparing the result with the original residual. This last step can e.g. be done in the DCT transform domain because the IDCT is an L2 isometry and the total distortion in the DCT domain is the same as the total pixel distortion, as already explained above.
Bit-rates R_{k,Y} and R_{k,UV} can be evaluated without performing the entropy encoding of the quantized coefficients. This is because one knows the rate cost of each quantum of the quantizers; this rate is simply computed from the probability of falling into this quantum, and the probability is provided by the GGD channel modelling associated with the concerned block type k.
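A minimal sketch of this rate evaluation (names are illustrative; in practice the quantum probabilities would come from the GGD channel model of block type k):

```python
import math

def block_rate(quantum_indices, quantum_probs):
    """Rate estimate without entropy coding: each quantized coefficient costs
    -log2 of the probability of the quantum it falls into."""
    return sum(-math.log2(quantum_probs[q]) for q in quantum_indices)
```

Since an ideal entropy coder approaches this bound, the estimate is accurate enough for cost comparison without running the coder itself.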
Lastly, the size (more precisely the area) of a block impacts the cost formula through the geometrical parameters v_k^Y and v_k^{UV}.
For instance, in the case of a 16x16-pixel unit area and a 4:2:0 YUV colour format, the number of blocks per area unit for 16x16 blocks is v_k^Y = 1 for luminance blocks and v_k^{UV} = 2 for chrominance blocks. This last value comes from the fact that one needs two couples of 4x4 UV blocks to cover a unit area of size 16x16 pixels.
Similarly, the number of blocks per area unit for 8x8 blocks is v_k^Y = 4 for luminance blocks and v_k^{UV} = 8 for chrominance blocks.
In the case considered here, where possible block sizes are 16x16 and 8x8, the competition between possible coding modes performed at step 1060 (already mentioned above) seeks to determine for each 16x16 area both: -the segmentation of this area into 16x16 or 8x8 blocks, -the choice of the block type and quantization offset, where appropriate, for each block, such that the cost is minimized.
This may lead to a very big number of possible configurations to evaluate.
Fortunately, by using the classical so-called bottom-to-top competition technique (based on the additivity of costs), one can dramatically decrease the number of configurations to deal with.
As shown in Figure 18 (left part), a 16x16 block is segmented into four 8x8 blocks. By using 8x8 cost competition (where the cost for each 8x8 block is computed based on the above formula for each possible coding mode of size 8x8), the most competitive coding mode (i.e. the coding mode with the smallest cost) can be selected for each 8x8 block. Then, the cost C_{16,best8} associated with the 8x8 (best) segmentation is just the addition of the four underlying best 8x8 costs: C_{16,best8} = Σ_{i=1..4} C_{8,best,i}. Now, one can start the bottom-to-top process by comparing this best cost C_{16,best8}, using 8x8 blocks for the 16x16 block, to costs computed for coding modes of size 16x16.
Figure 18 is based on the assumption (for clarity of presentation) that there are two possible 16x16 coding modes. Three costs are then to be compared: -the best 8x8 cost C_{16,best8} deduced from cost additivity; -the 16x16 cost C_{16,mode1} using 16x16 coding mode 1; -the 16x16 cost C_{16,mode2} using 16x16 coding mode 2.
The smallest cost among these 3 costs decides the segmentation and the block type and quantization offset to associate with the 16x16 block.
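The comparison for one 16x16 area can be sketched as follows (the data layout and function name are illustrative only):

```python
def best_16x16(costs_8x8, costs_16x16):
    """Bottom-to-top competition for one 16x16 area.
    costs_8x8: four dicts {mode: cost}, one per 8x8 sub-block;
    costs_16x16: dict {mode: cost} for the available 16x16 coding modes."""
    # cost of the best 8x8 segmentation: sum of the four best sub-block costs
    c_best8 = sum(min(c.values()) for c in costs_8x8)
    mode16 = min(costs_16x16, key=costs_16x16.get)
    if c_best8 < costs_16x16[mode16]:
        return "split", [min(c, key=c.get) for c in costs_8x8], c_best8
    return "16x16", mode16, costs_16x16[mode16]
```

Cost additivity is what makes this bottom-to-top pruning valid: the best 8x8 choice inside each sub-block is independent of the other three.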
It may be noted that the bottom-to-top process could be continued at a larger scale (in case 32x32 blocks are to be considered), or could have started at a lower scale (considering first 4x4 blocks). In this respect, the bottom-to-top competition is not limited to two different sizes, nor even to square blocks.
By doing so for each 16x16 block of the frame, it is thus possible to define a new segmentation, defining a coding mode (with a block type) for each block of the segmentation.
The coding mode competition improves the compression performance by about 10%.
When the bottom-to-top competition has been carried out from the very first block of the original image to encode until the last block of the image, the optimal segmentation and encoding of the image have been found.
Each block (i.e. transform unit according to HEVC) as obtained by the segmentation of the original image through this competition is encoded using the coding mode determined at step 1060. And the segmentation of the image into coding modes is saved in a quad-tree.
At the end of step 1060, the encoder has the following information: -a quad tree segmenting the image into blocks to which the associated coding mode used for the encoding is indicated; -the prediction information for each such block; -the quantized prediction residual of each block, supplemented with a quantized quantization residual for the same block in case a two-stage probabilistic coding mode has been used.
A loop from 1060 to 1025 may be provided to update statistics based on the quantized prediction residuals and/or the quantized quantization residuals. In that case, new optimal quantizers can be selected at step 1060 of the next iteration for the probabilistic coding modes based on these updated statistics.
One expects a convergence of the image segmentation and the statistics after several iterations, thus ensuring the optimality of the encoding method, because segmentation and statistics are compatible at the end of the iterative process.
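The iterative loop from step 1060 back to step 1025 can be sketched as follows. This is a minimal sketch, not part of the patent; `encode_pass`, `update_stats` and `max_iters` are hypothetical names standing for one full second prediction-based encoding pass, the re-estimation of statistics from the quantized residuals, and a safety bound on the number of iterations.

```python
def iterative_encode(image, initial_stats, encode_pass, update_stats, max_iters=10):
    """Iterate encoding until segmentation and statistics are compatible.

    encode_pass(image, stats) -> (segmentation, residuals): one pass of the
        second prediction-based encoding, with quantizers of the probabilistic
        coding modes selected from the current statistics.
    update_stats(residuals) -> new statistics estimated from the quantized
        prediction and/or quantization residuals of that pass.
    """
    stats = initial_stats
    segmentation, prev_segmentation = None, None
    for _ in range(max_iters):
        segmentation, residuals = encode_pass(image, stats)
        if segmentation == prev_segmentation:
            break  # converged: segmentation and statistics are compatible
        prev_segmentation = segmentation
        stats = update_stats(residuals)
    return segmentation, stats
```

In practice the loop terminates either on convergence of the segmentation or after the fixed number of iterations.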
When it is considered that the encoding is finished (end of the iterative process, if implemented), the above information as well as the statistics used to design the probabilistic coding modes are entropy encoded into a bit stream (step 1065). In particular, the statistics for a given probabilistic coding mode (i.e. block type and possible quantization offset) as provided in the bit stream comprise the parameters defining the distribution for each DCT channel, i.e. the parameter α (or equivalently the standard deviation σ) and the parameter β computed at the encoder side for each DCT channel.
Based on these parameters received in the bit stream, the decoder may deduce the quantizers to be used (a quantizer for each DCT channel) for each probabilistic coding mode thanks to the selection process explained above at the encoder side (the only difference being that the parameters β for instance are computed from the original data at the encoder side whereas they are received at the decoder side).
Based on the quad-tree received in the bit stream, the decoder may deduce the coding mode used to encode each block. Decoding of this block, involving dequantization, can thus be performed with the selected quantizers (which are the same as those used at encoding because they are selected in the same way).
Residuals are thus decoded (a prediction residual for each encoded block and, in addition, a quantization residual in the case of a two-stage probabilistic coding mode) and obtained at the decoder.
Thanks to the corresponding prediction information received in the bit stream, the decoder is thus able to reconstruct the original coding units, less the losses due to the quantization process, by adding the decoded prediction residual, possibly supplemented with the decoded quantization residual in the case of a two-stage probabilistic coding mode, to the predictor defined in the prediction information.
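The reconstruction step just described can be summarised by the following sketch. It is illustrative only (not the patent's implementation); dequantization and inverse transform are assumed to have already produced the decoded residuals as pixel-domain arrays, and the function name is hypothetical.

```python
import numpy as np

def reconstruct_block(predictor, prediction_residual, quantization_residual=None):
    """Reconstruct a coding unit, up to quantization losses.

    predictor: block given by the prediction information in the bit stream.
    prediction_residual: decoded prediction residual for the block.
    quantization_residual: decoded quantization residual, present only when a
        two-stage probabilistic coding mode was used; otherwise None.
    """
    block = predictor + prediction_residual
    if quantization_residual is not None:
        # Two-stage probabilistic coding mode: add the second-stage residual.
        block = block + quantization_residual
    return block
```

For a static or single-stage coding mode only the predictor and prediction residual are summed; the quantization residual term exists only for two-stage probabilistic coding modes.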
With reference to Figure 19, a simplified process is described in the case where only one single quantization offset is used and no single-stage probabilistic coding mode is implemented.
Instead of using, at step 1005, the conventional static coding modes defined based on QP only, it is decided to use, at step 1005', static coding modes defined based on QP+x0 (see blocks 1040' and 1045').
This makes it possible to directly have, after step 1030, statistics on the DCT coefficients at QP+x0 to define corresponding quantizers for a two-stage probabilistic coding mode.
As shown in the Figure, step 1035 is avoided, thus saving computational costs.
With reference now to Figure 20, a particular hardware configuration of a device for encoding or decoding images able to implement methods according to the invention is now described by way of example.
A device implementing the invention is for example a microcomputer 50, a workstation, a personal digital assistant, or a mobile telephone connected to various peripherals. According to yet another embodiment of the invention, the device is in the form of a photographic apparatus provided with a communication interface for allowing connection to a network.
The peripherals connected to the device comprise for example a digital camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying image data to the device.
The device 50 comprises a communication bus 51 to which there are connected: -a central processing unit CPU 52 taking for example the form of a microprocessor; -a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM; -a random access memory 54, which, after powering up of the device 50, contains the executable code of the programs of the invention necessary for the implementation of the invention. As this memory 54 is of random access type (RAM), it provides fast access compared to the read only memory 53. This RAM memory 54 stores in particular the various images and the various blocks of pixels as the processing is carried out (transform, quantization, storage of the reference images) on the video sequences; -a screen 55 for displaying data, in particular video, and/or serving as a graphical interface with the user, who may thus interact with the programs according to the invention, using a keyboard 56 or any other means such as a pointing device, for example a mouse 57 or an optical stylus; -a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention; -an optional diskette drive 59, or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to be processed in accordance with the invention; and -a communication interface 60 connected to the telecommunications network 61, the interface 60 being adapted to transmit and receive data.
In the case of audio data, the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62.
The communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it. The representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.
The diskettes 63 can be replaced by any information carrier such as a compact disc (CD-ROM) rewritable or not, a ZIP disk or a memory card. Generally, an information storage means, which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.
The executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier. According to a variant, the executable code of the programs is received by the intermediary of the telecommunications network 61, via the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.
The central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means. On powering up of the device 50, the program or programs which are stored in a non-volatile memory, for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention.
It will also be noted that the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus. For example, such a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC). The device described here and, particularly, the central processing unit 52, may implement all or part of the processing operations described in relation with Figures 1 to 19, to implement methods according to the present invention and constitute devices according to the present invention.
The above examples are merely embodiments of the invention, which is not limited thereby.
Claims (55)
CLAIMS
- 1. A method for encoding at least one original coding unit of pixels of an original image, the method comprising: providing a preliminary prediction-based encoding of original coding units, comprising obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors; segmenting the preliminary prediction residual image portion into preliminary prediction residuals; providing a second prediction-based encoding of at least one original coding unit, comprising obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit; determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including at least one static coding mode and at least one probabilistic coding mode, the or each static coding mode implementing quantization using only predefined quantizers and the or each probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and encoding the or each prediction residual using the corresponding determined coding mode.
- 2. The encoding method of Claim 1, wherein determining a coding mode for a prediction residual comprises encoding the prediction residual according to several coding modes of the plurality and selecting a coding mode based on a rate-distortion criterion on the obtained encoded prediction residuals.
- 3. The encoding method of Claim 2, wherein encoding costs as a ratio between rate and distortion are computed for each obtained encoded prediction residual, the selection being based on the encoding costs.
- 4. The encoding method of any of Claims 1 to 3, wherein each probabilistic coding mode is defined by a block type defined for the prediction residuals in the preliminary segmentation and by a set of quantizers selected from the statistics on the preliminary prediction residuals for that block type.
- 5. The encoding method of Claim 4, wherein the block type for a prediction residual is function of a level of energy of the prediction residual.
- 6. The encoding method of any of Claims I to 5, wherein the segmentation associates a block type from among a plurality of block types with each created preliminary prediction residual; and the encoding method further comprises obtaining statistics on the preliminary prediction residuals for each block type of said plurality, wherein the quantizers for a probabilistic coding mode defined by a block type are selected based on the statistics obtained for that block type.
- 7. The encoding method of any of Claims 1 to 6, wherein each coding mode applies to a respective unit having coefficients in the frequency domain, each having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and selecting the quantizers of a probabilistic coding mode based on the obtained statistics is performed by selecting, from among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter.
- 8. The encoding method of any of Claims 1 to 7, further comprising providing a user-specified quantization parameter, wherein the quantizers of each static coding mode are predefined based on the user-specified quantization parameter, and the quantizers of each probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
- 9. The encoding method of any of Claims 1 to 8, further comprising iterating the second prediction-based encoding, wherein the statistics for a next iteration are obtained based on at least one encoded coding unit resulting from the second prediction-based encoding of the previous iteration, and the quantizers for the at least one probabilistic coding mode of the second prediction-based encoding of the next iteration are selected based on the statistics obtained for the next iteration.
- 10. The encoding method of any of Claims 1 to 9, wherein the predefined quantizers are independent from the original image.
- 11. The encoding method of any of Claims 1 to 10, wherein the statistics are obtained by transforming the preliminary prediction residuals into transform units of coefficients in the frequency domain, each coefficient having a coefficient type, and by determining, for at least one coefficient type, a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type.
- 12. The encoding method of Claim 11, wherein said probabilistic model is a Generalized Gaussian Distribution.
- 13. The encoding method of Claim 11, wherein the quantizers of a probabilistic coding mode are each associated with one respective coefficient type of the coefficient types in the frequency domain; and the quantizer associated with each given coefficient type is selected based on the probabilistic model for that given coefficient type.
- 14. The encoding method of any of Claims 1 to 13, further comprising providing a bit stream comprising the statistics used to select the quantizers, and comprising, for the or each original coding unit, information on the prediction of said original coding unit during the second prediction-based encoding, an indication of the coding mode used for encoding the prediction residual, and the resulting encoded prediction residual.
- 15. The encoding method of Claim 14, wherein the identities of the coding modes used for the original coding units are stored in a quad-tree in the bit stream.
- 16. The encoding method of any of Claims 1 to 15, wherein at least one of the probabilistic coding modes comprises: a step of prediction-based encoding of an original coding unit to obtain a pre-coded coding unit, this step including predicting an original coding unit and performing a first encoding stage by quantizing, in the frequency domain and using predefined quantizers, a prediction residual resulting from said prediction, and a second encoding stage including quantizing, in the frequency domain and using the quantizers selected for the probabilistic coding mode, a quantization residual resulting from the difference between the original coding unit and a decoded version of the obtained pre-coded coding unit.
- 17. The encoding method of Claim 16 combined with Claim 14 or 15, wherein the bit stream further comprises, for an original coding unit encoded using the two-stage probabilistic coding mode, a quantized prediction residual resulting from the first encoding stage of the original coding unit, and the quantized quantization residual resulting from the second encoding stage.
- 18. The encoding method of Claim 16 or 17, wherein the preliminary prediction-based encoding comprises quantizing preliminary prediction residuals several times after the segmentation using, each time, quantizers predefined based on at least two quantization offsets and on a user-specified quantization parameter, and wherein at least two probabilistic coding modes are two-stage probabilistic coding modes that differ by respective quantization offsets from the at least two quantization offsets, the quantizers of their first encoding stage being predefined based on the respective quantization offset and on the user-specified quantization parameter, and the quantizers of their second encoding stage being selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using the respective quantization offset.
- 19. The encoding method of Claim 16 or 17, wherein the preliminary prediction-based encoding comprises competing between static coding modes to segment the preliminary prediction residual image portion while quantizing each preliminary prediction residual using a corresponding competing static coding mode, the competing static coding modes implementing quantization using predefined quantizers based on one quantization offset and a user-specified quantization parameter, and wherein at least one probabilistic coding mode for the second prediction-based encoding is a two-stage probabilistic coding mode, and, for each two-stage probabilistic coding mode, the quantizers of its first encoding stage are predefined based on the quantization offset and on the user-specified quantization parameter, and the quantizers of its second encoding stage are selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using said quantization offset.
- 20. The encoding method of Claim 18 or 19, wherein the bit stream further comprises the statistics obtained for the or each quantization offset.
- 21. A method for decoding a bit stream comprising data representing at least one coding unit of pixels in an image, the method comprising: obtaining statistics from the bit stream; selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; obtaining, from the bit stream and for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; decoding part of the data using the indicated coding mode, to obtain at least one decoded residual; obtaining prediction information from the bit stream; and reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
- 22. The decoding method of Claim 21, further comprising obtaining a plurality of sets of statistics corresponding to a respective plurality of probabilistic coding modes, from the bit stream; wherein quantizers are selected for each probabilistic coding mode based on the respective obtained set of statistics.
- 23. The decoding method of Claim 21 or 22, further comprising obtaining a user-specified quantization parameter from the bit stream, wherein the quantizers of a probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
- 24. The decoding method of Claim 23, wherein each coding mode applies to a respective unit having coefficients in the frequency domain, each having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and selecting the quantizers of a probabilistic coding mode based on the obtained statistics is performed by selecting, from among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter.
- 25. The decoding method of any of Claims 21 to 24, wherein the predefined quantizers are independent from the data representing the at least one coding unit in the bit stream.
- 26. The decoding method of any of Claims 21 to 25, further comprising: decoding part of the data using quantizers selected based on the obtained statistics, to obtain at least one decoded quantization residual; decoding part of the data using predefined quantizers, to obtain at least one decoded prediction residual; and reconstructing the at least one coding unit by adding a predictor obtained from the prediction information with the at least one decoded prediction residual and the at least one decoded quantization residual.
- 27. A device for encoding at least one original coding unit of pixels of an original image, the device comprising: a first block-based coder configured to provide a preliminary prediction-based encoding of original coding units, by obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors and by segmenting the preliminary prediction residual image portion into preliminary prediction residuals; a second block-based coder configured to provide at least one static coding mode and at least one probabilistic coding mode, the or each static coding mode implementing quantization using only predefined quantizers and the or each probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and configured to provide a second prediction-based encoding of at least one original coding unit, by obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit, by determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including the at least one static coding mode and the at least one probabilistic coding mode, and by encoding the or each prediction residual using the corresponding determined coding mode.
- 28. The encoding device of Claim 27, wherein the second block-based coder determines a coding mode to apply to a prediction residual by encoding the prediction residual according to several coding modes of the plurality and by selecting a coding mode based on a rate-distortion criterion on the obtained encoded prediction residuals.
- 29. The encoding device of Claim 28, wherein the second block-based coder computes encoding costs as a ratio between rate and distortion for each obtained encoded prediction residual, and makes the selection of quantizers based on the encoding costs.
- 30. The encoding device of any of Claims 27 to 29, wherein each probabilistic coding mode is defined by a block type defined for the prediction residuals in the preliminary segmentation and by a set of quantizers selected from the statistics on the preliminary prediction residuals for that block type.
- 31. The encoding device of Claim 30, wherein the block type for a prediction residual is function of a level of energy of the prediction residual.
- 32. The encoding device of any of Claims 27 to 31, wherein the segmentation associates a block type from among a plurality of block types with each created preliminary prediction residual; and the encoding device further comprises a statistics module for obtaining statistics on the preliminary prediction residuals for each block type of said plurality, wherein the quantizers for a probabilistic coding mode defined by a block type are selected based on the statistics obtained for that block type.
- 33. The encoding device of Claim 32, wherein each coding mode applies to a respective unit having coefficients in the frequency domain, each having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and the second block-based coder selects the quantizers of a probabilistic coding mode by selecting, from among optimal quantizers each associated with a rate and corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value function of a user-specified quantization parameter.
- 34. The encoding device of any of Claims 27 to 33, further comprising a module for obtaining a user-specified quantization parameter, wherein the quantizers of each static coding mode are predefined based on the user-specified quantization parameter, and the quantizers of each probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
- 35. The encoding device of any of Claims 27 to 34, wherein the second block-based coder is configured to iterate the second prediction-based encoding, wherein the statistics for a next iteration are obtained based on at least one encoded coding unit resulting from the second prediction-based encoding of the previous iteration, and the quantizers for the at least one probabilistic coding mode of the second prediction-based encoding of the next iteration are selected based on the statistics obtained for the next iteration.
- 36. The encoding device of any of Claims 27 to 35, wherein the predefined quantizers are independent from the original image.
- 37. The encoding device of any of Claims 27 to 36, wherein the statistics module obtains statistics by transforming the preliminary prediction residuals into transform units of coefficients in the frequency domain, each coefficient having a coefficient type, and by determining, for at least one coefficient type, a probabilistic model for coefficients of said at least one coefficient type based on a plurality of values of coefficients of said at least one coefficient type.
- 38. The encoding device of Claim 37, wherein said probabilistic model is a Generalized Gaussian Distribution.
- 39. The encoding device of Claim 37, configured to associate each of the quantizers of a probabilistic coding mode with one respective coefficient type of the coefficient types in the frequency domain; wherein the second block-based coder is further configured to select the quantizer associated with each given coefficient type based on the probabilistic model for that given coefficient type.
- 40. The encoding device of any of Claims 27 to 39, further comprising a bit stream generator for generating a bit stream comprising the statistics used to select the quantizers, and comprising, for the or each original coding unit, information on the prediction of said original coding unit during the second prediction-based encoding, an indication of the coding mode used for encoding the prediction residual, and the resulting encoded prediction residual.
- 41. The encoding device of Claim 40, wherein the bit stream generator is configured to store the identities of the coding modes used for the original coding units in a quad-tree in the bit stream.
- 42. The encoding device of any of Claims 27 to 41, wherein at least one of the probabilistic coding modes comprises: a step of prediction-based encoding of an original coding unit to obtain a pre-coded coding unit, this step including predicting an original coding unit and performing a first encoding stage by quantizing, in the frequency domain and using predefined quantizers, a prediction residual resulting from said prediction, and a second encoding stage including quantizing, in the frequency domain and using the quantizers selected for the probabilistic coding mode, a quantization residual resulting from the difference between the original coding unit and a decoded version of the obtained pre-coded coding unit.
- 43. The encoding device of Claim 42 combined with Claim 40 or 41, wherein the bit stream generator adds, to the bit stream, for an original coding unit encoded using the two-stage probabilistic coding mode, a quantized prediction residual resulting from the first encoding stage of the original coding unit, and the quantized quantization residual resulting from the second encoding stage.
- 44. The encoding device of Claim 42 or 43, wherein the first block-based coder is configured to provide, for the preliminary prediction-based encoding, quantizing preliminary prediction residuals several times after the segmentation using, each time, quantizers predefined based on at least two quantization offsets and on a user-specified quantization parameter, and wherein at least two probabilistic coding modes are two-stage probabilistic coding modes that differ by respective quantization offsets from the at least two quantization offsets, the quantizers of their first encoding stage being predefined based on their respective quantization offset and on the user-specified quantization parameter, and the quantizers of their second encoding stage being selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using their respective quantization offset.
- 45. The encoding device of Claim 42 or 43, wherein the first block-based coder is configured to provide, for the preliminary prediction-based encoding, competition between static coding modes to segment the preliminary prediction residual image portion while quantizing each preliminary prediction residual using a corresponding competing static coding mode, the competing static coding modes implementing quantization using predefined quantizers based on one quantization offset and a user-specified quantization parameter, and wherein at least one probabilistic coding mode in the second block-based coder is a two-stage probabilistic coding mode, and, for each two-stage probabilistic coding mode, the quantizers of its first encoding stage are predefined based on the quantization offset and on the user-specified quantization parameter, and the quantizers of its second encoding stage are selected based on the user-specified quantization parameter and on statistics on preliminary quantization residuals resulting from the difference between the preliminary prediction residuals before quantization and the same preliminary prediction residuals once quantized using said quantization offset.
- 46. The encoding device of Claim 45, wherein the bit stream generator adds, to the bit stream, the statistics obtained for each of the quantization offsets.
- 47. A device for decoding a bit stream comprising data representing at least one coding unit of pixels in an image, the device comprising: a statistics module for obtaining statistics from the bit stream; a quantizer selecting module for selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; a coding mode retrieving module for obtaining, from the bit stream and for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; a decoder to decode part of the data using the indicated coding mode, to obtain at least one decoded residual; a prediction information retrieving module for obtaining prediction information from the bit stream; and a frame reconstruction module for reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
- 48. The decoding device of Claim 47, wherein the statistics module obtains a plurality of sets of statistics corresponding to a respective plurality of probabilistic coding modes, from the bit stream; and the quantizer selecting module selects quantizers for each probabilistic coding mode based on the respective obtained set of statistics.
- 49. The decoding device of Claim 47 or 48, further comprising a module for obtaining a user-specified quantization parameter from the bit stream, wherein the quantizers of a probabilistic coding mode are selected based on the obtained statistics and the user-specified quantization parameter.
- 50. The decoding device of Claim 49, wherein each coding mode applies to a respective unit having coefficients in the frequency domain, each coefficient having a coefficient type; the obtained statistics for a probabilistic coding mode comprise statistics associated with each of the coefficient types; and the quantizer selecting module selects the quantizers of a probabilistic coding mode by selecting, from among optimal quantizers each associated with a rate and a corresponding distortion, an optimal quantizer for each coefficient type given the associated statistics and a distortion value that is a function of a user-specified quantization parameter.
- 51. The decoding device of any of Claims 47 to 50, wherein the predefined quantizers are independent of the data representing the at least one coding unit in the bit stream.
- 52. A non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in an apparatus, causes the apparatus to perform the steps of: providing a preliminary prediction-based encoding of original coding units of pixels of an original image, comprising obtaining a preliminary prediction residual image portion resulting from the difference between the original coding units and corresponding predictors; segmenting the preliminary prediction residual image portion into preliminary prediction residuals, based on at least one static coding mode implementing a quantization using only predefined quantizers; providing a second prediction-based encoding of at least one original coding unit of pixels of the original image, comprising obtaining at least one prediction residual resulting from the prediction of the at least one original coding unit; determining a coding mode to apply to the or each prediction residual, from among a plurality of coding modes including the at least one static coding mode and at least one probabilistic coding mode implementing a quantization using quantizers selected based on statistics on the preliminary prediction residuals; and encoding the or each prediction residual using the corresponding determined coding mode.
- 53. A non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in an apparatus, causes the apparatus to perform the steps of: obtaining statistics from a bit stream comprising data representing at least one coding unit of pixels in an image; selecting, based on the obtained statistics, quantizers of at least one probabilistic coding mode implementing a quantization; obtaining, for the at least one coding unit, an identity of a coding mode from among a plurality of coding modes including at least one static coding mode implementing a quantization using predefined quantizers and the at least one probabilistic coding mode; decoding part of the data using the indicated coding mode, to obtain at least one decoded residual; obtaining prediction information from the bit stream; and reconstructing the at least one coding unit using the prediction information and the at least one decoded residual.
- 54. An encoding device for encoding an image substantially as herein described with reference to, and as shown in, Figure 5; Figure 5A; Figures 5 and 10 or Figures 5 and 19 of the accompanying drawings.
- 55. A decoding device for decoding an image substantially as herein described with reference to, and as shown in, Figure 6 or Figure 6A of the accompanying drawings.
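Claims 45 and 52 describe a two-pass structure: a preliminary pass quantizes prediction residuals with static (predefined) quantizers built from a quantization offset and the user-specified quantization parameter, and statistics on the resulting preliminary quantization residuals (the unquantized values minus their quantized versions) then drive quantizer selection for the probabilistic modes of the second pass. A minimal Python sketch of that preliminary pass, assuming uniform scalar quantization and an illustrative QP-to-step mapping, neither of which is specified by the claims:

```python
import numpy as np

def quantize_static(residual, qp, offset):
    """Static (predefined) uniform quantizer built from the user-specified
    quantization parameter qp and a fixed quantization offset.
    The QP-to-step mapping below is illustrative, not from the patent."""
    step = 2 ** (qp / 6.0)                       # hypothetical step size
    return (np.round(residual / step + offset) - offset) * step

def preliminary_pass(residuals, qp, offset):
    """Preliminary prediction-based encoding pass: quantize each preliminary
    prediction residual with the static mode, then gather statistics on the
    preliminary quantization residuals (unquantized minus quantized), which
    later drive quantizer selection in the probabilistic coding modes."""
    quantized = [quantize_static(r, qp, offset) for r in residuals]
    quant_residuals = [r - q for r, q in zip(residuals, quantized)]
    pooled = np.concatenate([qr.ravel() for qr in quant_residuals])
    stats = {"mean": float(pooled.mean()), "var": float(pooled.var())}
    return quantized, stats
```

Per Claim 46, the encoder would then write such statistics (one set per quantization offset) into the bit stream so the decoder can reproduce the same quantizer selection.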
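Claim 50 has the decoder pick, for each frequency-coefficient type, an optimal quantizer from a set of quantizers each characterised by a rate and a corresponding distortion, given the received statistics and a distortion value that is a function of the user-specified quantization parameter. The sketch below shows one plausible selection rule; the QP-to-target mapping and the "lowest rate meeting the distortion target" criterion are assumptions for illustration, not taken from the patent:

```python
def select_quantizer(candidates, coeff_var, target_distortion):
    """Pick, for one coefficient type, the lowest-rate quantizer whose
    expected distortion (its tabulated distortion scaled by the measured
    variance of that coefficient type) meets the distortion target.
    candidates: list of (rate, distortion) operating points."""
    feasible = [(rate, dist) for rate, dist in candidates
                if dist * coeff_var <= target_distortion]
    if not feasible:
        return min(candidates, key=lambda rd: rd[1])  # fall back: least distortion
    return min(feasible, key=lambda rd: rd[0])        # least rate among feasible

def select_all(per_type_stats, candidates, qp):
    """Select one quantizer per coefficient type, with a distortion target
    derived from the user-specified quantization parameter qp
    (the mapping below is hypothetical)."""
    target = 2 ** (qp / 3.0)
    return {ctype: select_quantizer(candidates, var, target)
            for ctype, var in per_type_stats.items()}
```

Because both encoder and decoder run this selection from the same transmitted statistics and the same QP, the chosen quantizers agree without being signalled explicitly.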
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1207182.5A GB2501495B (en) | 2012-04-24 | 2012-04-24 | Methods for encoding and decoding an image with competition of coding modes, and corresponding devices |
Publications (3)
Publication Number | Publication Date |
---|---|
GB201207182D0 GB201207182D0 (en) | 2012-06-06 |
GB2501495A true GB2501495A (en) | 2013-10-30 |
GB2501495B GB2501495B (en) | 2015-07-08 |
Family
ID=46261793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1207182.5A Active GB2501495B (en) | 2012-04-24 | 2012-04-24 | Methods for encoding and decoding an image with competition of coding modes, and corresponding devices |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2501495B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5978029A (en) * | 1997-10-10 | 1999-11-02 | International Business Machines Corporation | Real-time encoding of video sequence employing two encoders and statistical analysis |
US20050036699A1 (en) * | 2003-07-18 | 2005-02-17 | Microsoft Corporation | Adaptive multiple quantization |
WO2011084916A1 (en) * | 2010-01-06 | 2011-07-14 | Dolby Laboratories Licensing Corporation | Multiple-pass rate control for video coding applications |
- 2012-04-24: Application GB1207182.5A filed in GB; granted as GB2501495B (status: Active)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017015958A1 (en) * | 2015-07-30 | 2017-02-02 | 华为技术有限公司 | Video encoding and decoding method and device |
US10560719B2 (en) | 2015-07-30 | 2020-02-11 | Huawei Technologies Co., Ltd. | Video encoding and decoding method and apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2015417837B2 (en) | Method and apparatus for transform coding with block-level transform selection and implicit signaling within hierarchical partitioning | |
CN108632628B (en) | Method for deriving reference prediction mode values | |
KR101232420B1 (en) | Rate-distortion quantization for context-adaptive variable length coding (cavlc) | |
US6167162A (en) | Rate-distortion optimized coding mode selection for video coders | |
US9142036B2 (en) | Methods for segmenting and encoding an image, and corresponding devices | |
WO2013128010A2 (en) | Method and devices for encoding a sequence of images into a scalable video bit-stream, and decoding a corresponding scalable video bit-stream | |
US20120163473A1 (en) | Method for encoding a video sequence and associated encoding device | |
GB2488830A (en) | Encoding and decoding image data | |
CN113784126A (en) | Image encoding method, apparatus, device and storage medium | |
CA2943647A1 (en) | Method and apparatus for encoding rate control in advanced coding schemes | |
WO2013000975A1 (en) | Method for encoding and decoding an image, and corresponding devices | |
WO2013000575A1 (en) | Methods and devices for scalable video coding | |
Pang et al. | An analytic framework for frame-level dependent bit allocation in hybrid video coding | |
US9967585B2 (en) | Method for encoding and decoding images, device for encoding and decoding images and corresponding computer programs | |
US20130230096A1 (en) | Methods for encoding and decoding an image, and corresponding devices | |
GB2501493A (en) | Encoding an image signal using quantizers based on statistics relating to the image signal | |
GB2501495A (en) | Selection of image encoding mode based on preliminary prediction-based encoding stage | |
US20130230102A1 (en) | Methods for encoding and decoding an image, and corresponding devices | |
US10764577B2 (en) | Non-MPM mode coding for intra prediction in video coding | |
GB2506348A (en) | Image coding with residual quantisation using statistically-selected quantisers | |
GB2499864A (en) | Encoding and decoding methods and devices that select pixel transform coefficients based on image frame and block merits | |
Nissenbaum | Reduction of prediction side-information for image and video compression | |
GB2506593A (en) | Adaptive post-filtering of reconstructed image data in a video encoder | |
WO2013000973A2 (en) | Method for encoding and decoding an image, and corresponding devices |