WO2014165960A1 - Methods for reconstructing an encoded video at a bit-depth lower than at which it was encoded - Google Patents
- Publication number: WO2014165960A1 (application PCT/CA2013/000644)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bit
- samples
- inverse
- depth
- rounding
- Prior art date
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
Definitions
- the present application relates generally to video compression and more particularly to decoding videos encoded at a higher bit-depth using decoders designed for videos having a lower bit-depth.
- the process of video compression typically begins with the acquisition of a raw video signal, say when light strikes electronic components of a charge-coupled device (CCD) in a video camera.
- the camera is obtaining colour-component data for each pixel-position in each picture in a sequence of pictures that makes up the video; the colour components will be values of red, green, and blue if the CCD is based on the classic RGB colour space, or possibly with the addition of a fourth colour component that represents yellow or white light.
- various shortcuts may be taken.
- the CCD may detect only one colour component at each pixel location and extrapolate the missing components based on values from neighbouring pixels. (For example, green values - the most important for human visual perception - may be obtained at 50% of the pixel locations, while red and blue values are each obtained at 25% of the pixel locations.)
- based on the raw video signal, a video encoder makes further changes to the data to create a source video. RGB values are converted to co-ordinates in a colour space that allows the nature of human visual perception to be exploited to achieve greater compression efficiency.
- the colour components may be luma (an approximation of luminance) samples or chroma (short for "chrominance") samples.
- in HEVC (High-Efficiency Video Coding), the luma component is denoted Y and the chroma components are denoted Cr and Cb.
- the luma data is treated differently from the chroma data, but Cr and Cb data are treated equally.
- luma data is not down-sampled, but chroma data - of both types - may optionally be down-sampled; in other words, luma samples correspond to pixels on a one-to-one basis, but a chroma (Cr or Cb) sample might correspond to more than one pixel.
- Luma samples in a source video might be represented at one bit-depth while both Cr samples and Cb samples might be represented at another bit-depth; thus the HEVC standard provides two parameters, BitDepthY for luma (Y) samples and BitDepthC for chroma (Cr and Cb) samples.
- bit-depth can apply to other colour spaces, including those with additional colour components such as those based upon a supplementary yellow stimulus, or those that incorporate alpha channels.
- supplementary components may be based on a pre-existing parameter, or be provided in a new parameter.
- An encoder will compress a source video comprising samples (said to be in the pixel domain) by, amongst other things, (a) forming a prediction of a set of samples and computing the difference between the prediction and the source-video samples, (b) applying a transform (such as an integer approximation of a discrete cosine transform (DCT)) to the difference to generate transformed coefficients (said to be in the transform domain), and (c) quantizing those coefficients to generate quantized, transformed coefficients.
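Steps (a) through (c) can be sketched in miniature. The following toy is illustrative only: a 4-point Hadamard transform stands in for the integer DCT approximation, a single uniform quantization step stands in for the full quantizer, and all function names are invented for this sketch.

```python
def hadamard4(x):
    # 4-point Hadamard transform: a stand-in for the integer DCT approximation
    a, b, c, d = x
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]

def encode_block(source, prediction, qstep):
    # (a) compute the difference between the prediction and the source samples
    residual = [s - p for s, p in zip(source, prediction)]
    # (b) transform the residual into the transform domain
    coeffs = hadamard4(residual)
    # (c) quantize the transformed coefficients
    return [round(c / qstep) for c in coeffs]

# four 8-bit samples predicted by a flat mid-grey block, quantization step 4
print(encode_block([121, 130, 125, 135], [128, 128, 128, 128], 4))  # [0, -5, -2, 0]
```

A decoder reverses (c) and (b) approximately (quantization loses information) and then adds the prediction back, which is the subject of the figures below.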
- devices with limited resources such as mobile devices, may still have decoders designed to handle only coefficients encoded based on samples having bit-depth 8.
- Figure 1a shows a block diagram depicting a conventional sequence of (a) encoding a source video to generate a bit-stream and (b) decoding the bit-stream to generate a reconstructed video;
- Figure 1b compares two block diagrams depicting respective, conventional sequences, each for (a) encoding source samples having an original bit-depth (d or D) as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having the same original bit-depth (d or D, respectively);
- Figure 1c shows a block diagram comparing two sequences, each for (a) encoding source samples having an original bit-depth D as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having a lower bit-depth d than the original bit-depth D;
- Figures 2a and 2b each show a block diagram depicting a conventional method of decoding coefficients, corresponding to source samples having an original bit-depth (d in the case of Figure 2a and D in the case of Figure 2b), as reconstructed samples having the same original bit-depth (d and D, respectively);
- Figure 2c shows a block diagram depicting the impediments to employing a d-bit decoder to decode coefficients corresponding to source samples having an original bit-depth of D;
- Figure 3a shows a block diagram detailing a known method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having the same original bit-depth D, while accommodating a picture buffer intended for reference samples of lower bit-depth d;
- Figure 3b shows a block diagram detailing a known method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having the lower bit-depth d, by (a) maintaining the same precision as in the conventional method of Figure 2b throughout all computations and (b) rounding and/or truncating the resulting D-bit reconstructed samples to bit-depth d;
- Figures 4a through 7c each show a block diagram detailing an embodiment disclosed herein of a method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having a lower bit-depth d;
- Figures 4a through 5b depict an embodiment in which a residual process, including inverse-quantization and inverse-transformation processes, computes (d + 1)-bit residual samples.
- Figure 4a depicts an embodiment in which an inverse-quantization process comprises applying to a quantization parameter, input to the inverse-quantization process, an offset based on bit-depth d, rather than bit-depth D;
- Figure 4b depicts an embodiment in which an inverse-quantization process comprises an inverse-quantization operation configured for use in reconstructing d-bit samples
- Figure 4c depicts an embodiment in which the inverse-quantization process comprises applying an inverse-quantization operation configured for use in reconstructing D-bit samples and rescaling intermediate samples output by the inverse-quantization operation;
- Figure 5a depicts an embodiment in which an inverse-transformation process comprises an inverse-transformation operation configured to produce (d + 1)-bit intermediate samples, given the output of an inverse-quantization designed for use in reconstructing d-bit samples;
- Figure 5b depicts an embodiment in which an inverse-transformation process comprises (a) an inverse-transformation operation configured to produce (D + 1)-bit intermediate samples, given the output of an inverse-quantization designed for use in reconstructing D-bit samples, and (b) rounding and/or truncating those intermediate samples;
- Figures 6a through 7c, in contrast to Figures 4a through 5b, each depict an embodiment in which (a) a residual process, including inverse-quantization and inverse-transformation processes, computes (D + 1)-bit residual samples and (b) a subsequent prediction process includes rounding and/or truncating intermediate samples;
- Figures 6a through 6c depict an embodiment in which rounding and/or truncating is performed for both inter-prediction and intra-prediction modes.
- Figure 6a depicts an embodiment in which the d-bit output of a prediction operation is padded to produce D-bit prediction samples;
- Figure 6b depicts an embodiment in which d-bit intermediate samples are padded to produce a D-bit input to a prediction operation;
- Figure 6c depicts an embodiment that combines aspects of Figures 6a and 6b by (a) padding d-bit intermediate samples to produce a D-bit input to an intra-prediction operation and (b) padding the d-bit output of an inter-prediction operation to produce D-bit prediction samples;
- Figure 7a depicts an embodiment in which rounding and/or truncating is performed, only for inter-prediction, prior to an in-loop filtering operation;
- Figure 7b depicts an embodiment in which rounding and/or truncating is performed, only for inter-prediction, after an in-loop filtering operation and prior to a sample-adaptive offset;
- Figure 7c depicts an embodiment in which rounding and/or truncating is performed, only for inter-prediction, after both an in-loop filtering operation and a sample-adaptive offset;
- Figure 8a shows a flowchart depicting, at a high level, the methods depicted in block-diagram form in Figures 4a through 5b;
- Figure 8b shows a flowchart depicting, at a high level, the methods depicted in block-diagram form in Figures 6a through 7c;
- a method for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d.
- a residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples.
- a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
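At a high level, the claimed flow for a single sample is: a signed residual from the (d + 1)-bit residual process is added to a d-bit prediction, and the sum is clipped back to d bits. A minimal sketch, using the illustrative value d = 8 from elsewhere in this disclosure (function names invented here):

```python
d = 8  # target reconstruction bit-depth (illustrative)

def clip3(lo, hi, x):
    # clamp x into the inclusive range [lo, hi]
    return max(lo, min(hi, x))

def reconstruct_sample(residual, prediction):
    # residual: signed value from the (d + 1)-bit residual process
    # prediction: d-bit prediction sample
    return clip3(0, (1 << d) - 1, prediction + residual)

print(reconstruct_sample(-20, 10))   # sum is negative, clipped to 0
print(reconstruct_sample(100, 200))  # sum 300 exceeds 255, clipped to 255
print(reconstruct_sample(5, 100))    # 105, within range, unchanged
```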
- a computing device comprises one or more processors and a memory containing processor-executable instructions.
- the instructions when executed by the one or more processors, cause the device to perform a method for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d.
- a residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples.
- a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
- a non-transitory, processor-readable storage medium stores processor-executable instructions in a magnetic, optical, solid-state, or like format.
- the instructions when executed by the one or more processors, cause the device to perform a method for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d.
- a residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples.
- a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
- bit-depths shown in the figures indicate the bit-depth of the data flow at that point.
- the bit-depth indicates the bit-depth of the samples for which the component is designed; for inverse-quantization and inverse-transformation operations, the operation may accept an input and/or produce an output of bit-depth greater than the value of the subscript.
- the bit-depth d is always taken to be less than the bit-depth D.
- in Figure 1a, a block diagram is shown depicting a conventional sequence 1000 of encoding and decoding a video.
- Raw source video 1 is input to encoder 10.
- the encoding process generally comprises two major phases. First, in phase 11, the source video is compressed into symbols. This phase includes the encoding of source samples as coefficients at step 100.
- the source samples may be luma (an approximation of luminance) samples or chroma (short for "chrominance") samples.
- the symbols are binarized and packed into a bit-stream, together with parameters that indicate the particular way in which the compression took place.
- the parameters will be needed by a decoder, in order to correctly decode the bit-stream in a manner complementary to the way in which the source video was encoded.
- the skilled person appreciates that a wide variety of parameters specify, for example, what type of downsampling (if any) of pixels took place to obtain chroma source samples.
- the result is bit-stream 2, which expresses source video 1 in a compressed format.
- Bit-stream 2 is processed by decoder 20, which is compatible with encoder 10 in that its decoding process is complementary to the encoding process of the encoder.
- reconstructed video 3 produced by the decoder from bit-stream 2 is an acceptable facsimile of the source video, not a frame-by-frame, pixel-by-pixel duplicate.
- the extent to which reconstructed video 3 resembles source video 1 depends largely on the various parameters that determined how the compression took place. In particular, the parameters affect the compression ratio achieved by the encoding and the fidelity of the reconstructed video as a facsimile of the source video. In general, quality tends to suffer the more highly the video is compressed.
- the decoding process generally comprises two major phases. First, in phase 21, symbols and parameters, as described above, are unpacked from bit-stream 2. Second, in phase 23, the video is reconstructed from the symbols, according to the parameters. For example, if the parameters indicate that downsampling of pixels took place to obtain chroma source samples, this downsampling must be reversed (since each sample corresponds to more than one pixel). The second phase includes reconstructing samples from coefficients at step 200.
- D and d are consistently used to represent a higher bit-depth and a lower bit-depth, respectively.
- D may be 10 and d may be 8.
- teachings of this disclosure are not limited to any specific values of D and d, as long as D > d.
- in Figure 1b, two block diagrams are shown, side by side, depicting respective, conventional sequences, each for (a) encoding source samples having an original bit-depth as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having the same original bit-depth.
- d-bit source samples 12 are encoded at step 100
- D-bit source samples 12' are encoded at step 100'; the two encoding procedures are designed to process their respective inputs and to output coefficients - 194 for the left sequence and 194' for the right sequence - corresponding to the distinct bit-depths of the respective source samples.
- Decoder 20, designed for d-bit source samples, processes coefficients 194 in two major steps.
- Conventional residual process 210 - for d-bit source samples - produces (d + 1)-bit residual samples 219, which conventional prediction process 290 - also for d-bit source samples - turns into d-bit reconstructed samples 293.
- a reconstruction process or portion thereof is described as being "for d-bit samples" if it nominally performs as part of a sequence, such as steps 100 and 200, which is designed for d-bit samples 12 and d-bit reconstructed samples 293.
- intermediate results may have other bit-depths.
- the number of data bits at a data flow is depicted next to a diagonal slash through the flow. In some cases, where bit-depths may vary, they are not shown.
- Each sequence begins with the encoding of D-bit source samples at step 100' to produce coefficients 194' for D-bit samples.
- These coefficients are processed by two different types of novel decoders, each designed for d-bit samples; to reiterate, each decoder is designed primarily to process coefficients for d-bit samples. However, as disclosed herein, each is re-purposed to also handle coefficients for D-bit samples.
- decoder 20X features modified residual process 210X for d-bit samples, which reduces bit-depth so that its output is (d + 1)-bit residual samples 219X.
- Several ways of modifying conventional residual process 210 of Figure 1b are described later in reference to Figures 4a through 5b, in which reference numerals 210c through 210g correspond to reference numeral 210X of Figure 1c.
- Residual samples 219X, which will differ slightly depending on the particular embodiment of modified residual process 210X, are processed by prediction process 290c for d-bit samples, which is modified only so that a sample-adaptive offset (SAO) - an optional decoding step, to be described later - is adjusted for bit-depth d.
- This adjusted-SAO prediction process is the same for all embodiments of decoder 20X.
- the output of decoder 20X is d-bit reconstructed samples 293X for a reconstructed video that is a facsimile of source video 1.
- the reconstructed video has lower colour depth than the source video, though a human observer might not be aware that the reconstructed video having colour-depth d is less faithful to the source video than is a conventionally reconstructed video having colour-depth D, unless a side-by-side comparison is made.
- the various embodiments of decoder 20X produce slightly different outputs 293X due to the various versions of modified residual process 210X.
- a D-bit decoder may be capable of decoding d-bit videos, where d < D, as faithfully as would be done by a d-bit decoder; in fact, an HEVC-compliant 10-bit decoder is required to be able to decode 9-bit and 8-bit videos as they were intended to be decoded.
- the problem dealt with in the present disclosure is the opposite situation: a d-bit-only decoder - i.e., a decoder that can only decode coefficients corresponding to source samples having an original bit-depth of d - confronted with coefficients corresponding to source samples having an original bit-depth of D, where D > d.
- decoder 20Y features conventional residual process 210' for D-bit samples, whose output is (D + 1)-bit residual samples 219' (exactly as in the right sequence of Figure 1b).
- Residual samples 219' are processed by modified prediction process 290Y, which not only adjusts SAO if necessary, but, more importantly, reduces the bit-depth to d.
- the output of decoder 20Y is also d-bit reconstructed samples 293Y for a reconstructed video that is a facsimile of source video 1; as with decoder 20X, the output will vary according to which version of adjusted-SAO prediction process 290Y is employed.
- Figure 2a (2b) shows a block diagram depicting conventional method 200 (200') of decoding coefficients 194 (194'), corresponding to source samples 12 (12') of Figure 1b having an original bit-depth d (D), as reconstructed samples 293 (293') having the same original bit-depth d (D).
- Each method comprises two major portions.
- residual process 210 (210') produces (d + 1)-bit ((D + 1)-bit) residual samples 219 (219') as input to prediction process 290 (290'), which uses them to generate d-bit (D-bit) reconstructed samples 293 (293') as the output of method 200 (200').
- method 200 (200') takes as input (a) bit-depth value 192 (192') of the source samples 12 (12') - namely d (D) - and (b) quantization parameter (QP) 193 (193').
- Residual process 210 (210') comprises two sequential parts. First is inverse-quantization process 211 (211'); this part undoes a quantization process that took place during encoding 100 (100') of Figure 1b. Second is inverse-transformation process 216 (216'); this part undoes a discrete cosine transform (DCT) that took place during encoding 100 (100').
- inverse-transformation process 216 comprises inverse-transformation operation 217 (217').
- Inverse-quantization process 211 (211') is more complex.
- Inverse-quantization operation 214 (214') takes as input both coefficients 194 (194') and a quantization parameter, which is the basis for a factor by which the inverse-quantization operation will multiply the coefficients to generate dequantized coefficients (i.e., coefficients restored to their previous scale, though typically not identical in value to the original, pre-quantization coefficients).
- that quantization parameter is not always the same as QP 193.
- offset 212' is computed as 6 * (D - 8), and this offset is added to QP 193' at addition operation 213'.
- for example, when D = 10, addition operation 213' has the effect of adding 12 to QP 193'.
- in method 200, where d = 8, the value of 6 * (d - 8) is 0, so addition operation 213 amounts to adding 0, i.e., it has no effect on QP 193.
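The offset arithmetic just described can be sketched directly; the helper name is invented for this sketch and the QP value is arbitrary:

```python
def qp_bit_depth_offset(bit_depth):
    # offset 212': 6 * (bit_depth - 8), added to the QP unpacked from the bit-stream
    return 6 * (bit_depth - 8)

qp = 30  # arbitrary example QP
print(qp + qp_bit_depth_offset(10))  # D = 10: offset 12, effective QP 42
print(qp + qp_bit_depth_offset(8))   # d = 8: offset 0, QP unchanged at 30
```

In the embodiment of Figure 4a, the offset is computed from d rather than D, so for d = 8 the addition leaves the QP unchanged.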
- prediction process 290 (290') takes as input (d + 1)-bit ((D + 1)-bit) residual samples 219 (219'); each such sample comprises a d-bit (D-bit) magnitude and a single sign bit.
- Residual samples 219 (219') are combined at addition operation 220 with d-bit (D-bit) prediction samples 279 (279') output by prediction operation 275 (275') and generated during either (a) an intra-coding loop including storing reference samples in d-bit (D-bit) line buffer 235 (235') or (b) an inter-prediction loop including storing reference samples in d-bit (D-bit) picture buffer 265 (265').
- an n-bit buffer, in the context of video decoding, means a buffer comprising n-bit words - enough n-bit words to hold all the n-bit samples that need to be stored in one cycle of the reconstruction loop.
- prior to data flowing to either buffer, it must be clipped from bit-depth d + 2 (D + 2) to bit-depth d (D) at clip3 operation 225 (225'); since one bit of the input is a sign bit, the clipping operation will take signed input, but only output non-negative numbers.
- clip3 operation 225 (225') is not a universal function in the way that a cosine function or addition operation is; there are different versions for outputs of different desired bit-depths.
- the corresponding clip3_n(_) operation clips its input, say an intermediate binary integer x, to n bits by (a) outputting zero if the input x is negative and (b) otherwise outputting the minimum of x and the largest possible n-bit binary integer, namely 2^n - 1.
- clipping operation 225 will reduce to 255 all inputs greater than 255, raise to 0 all negative inputs, and leave unchanged all inputs that are already in the range [0, 255].
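A minimal sketch of the clip3_n(_) behaviour described above, assuming exactly the stated semantics (negative inputs become zero; the ceiling is 2^n - 1):

```python
def clip3_n(n, x):
    # clip a signed intermediate value x to the unsigned n-bit range [0, 2^n - 1]
    if x < 0:
        return 0
    return min(x, (1 << n) - 1)

# n = 8, as in clipping operation 225 when d = 8
print(clip3_n(8, 300))  # 255
print(clip3_n(8, -7))   # 0
print(clip3_n(8, 128))  # 128
```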
- Clip3 operation 225 of method 200, being for a d-bit-only method, is designed for a fixed output bit-depth of d; it does not need to be configured based on the video being decoded, as every processed video is assumed to be d-bit.
- clip3 operation 225' of method 200' is for not only D-bit samples, but also for samples of lower bit-depth; this is because that method is modeled on an HEVC-compliant decoder.
- because clip3 operation 225' of method 200' has a variable bit-depth output, it must be configured at run-time - based on the particular video being decoded - for the appropriate output bit-depth. Thus, unlike clip3 operation 225 of method 200, clip3 operation 225' of method 200' also taps off of bit-depth value 192' to determine its ceiling.
- mode selector 272, which is influenced by a parameter (not shown) unpacked from the bit-stream, indicates whether the current frame being decoded was inter-coded or intra-coded during encoding 100 (100'). (In fact, intra prediction is also used for inter-coded frames.)
- filtering (such as de-blocking to remove decoding artifacts) is performed at filtering operation 240 (240').
- Input to filtering operation 240 (240') is compared to one or more thresholds, each of which is designed for a particular bit-depth. Therefore, as with clip3 operation 225 (225') described above, filtering operation 240 (240') must be configured to filter appropriately for the input's bit-depth, namely d (D).
- bit-depth is assumed to be d, so filtering operation 240 of method 200 does not need to learn the bit-depth, d, from bit-stream 2 and to configure itself accordingly; on the other hand, filtering operation 240' of method 200' must learn the bit-depth, D or less, from bit-stream 2 by tapping off of bit-depth value 192', to determine which version of the filtering operation should be employed.
- a sample-adaptive offset may be performed at step 250 if and only if bit-stream 2 has an indicator that this is to be done.
- mode selection applies to the overall process, and how one loop or the other is selected is a matter of implementation choice.
- intra-prediction operation 275m and inter-prediction operation 275n are shown separately (in place of a single, place-holder prediction operation 275 (275')); this allows mode selector 272 to be depicted as following both prediction operators.
- prediction operation 275 (275') must be configured to produce prediction samples 279 (279') of the appropriate bit-depth, namely d (D). Also as with the clipping operation, in the d-bit-only method, the bit-depth is assumed to be d, so prediction operation 275 of method 200 does not need to learn the bit-depth, d, from bit-stream 2 and to configure itself accordingly; on the other hand, prediction operation 275' of method 200' needs to learn the bit-depth, D or less, from bit-stream 2 by tapping off of bit-depth value 192' to determine which version of the prediction operation should be employed.
- when prediction operation 275 (275') requires a default input to take the place of missing data, pre-determined d-bit (D-bit) input 271 (271') is provided to the prediction operation.
- This is conventionally a neutral value, such as 2^(d-1) (2^(D-1)) as shown in Figure 2a (2b), which is termed "mid-grey" (after the resulting colour of a pixel whose luma and chroma samples are assigned this value).
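The mid-grey default is simply half of full scale; a one-line sketch with an invented helper name:

```python
def mid_grey(bit_depth):
    # neutral default prediction input: 2^(bit_depth - 1), half of full scale
    return 1 << (bit_depth - 1)

print(mid_grey(8))   # 128, for d = 8
print(mid_grey(10))  # 512, for D = 10
```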
- the default value is generated differently in methods 200 and 200'.
- in d-bit-only method 200, the default value is generated internally by decoder 20, with no need to learn the bit-depth from bit-stream 2; again, all processed videos are assumed to be d-bit.
- in variable-bit-depth method 200', the default value is generated by decoder 20' based on bit-depth value 192'.
- Figures 2a and 2b depict nominal situations, in which a decoder is designed to handle videos of the type received in an efficient manner, in that source samples and reconstructed samples have the same bit-depth.
- Figures 3a through 7c depict decoders that are designed for the smaller bit-depth d, but receive a video of bit-depth D, where D > d, and provide some type of accommodation to that mismatch - the alternative is to fail to reconstruct a video at all.
- Each depiction reuses features of Figures 2a and/or 2b with unchanged feature numerals or with altered feature numerals, depending on whether the feature is unchanged or altered. Features that are changed by design are highlighted with bold lines and boldface text.
- each modified decoding method, generically denoted 200Z, produces a slightly different reconstructed video, denoted generically 293Z and denoted specifically with a distinct lower-case letter to match that for the method's feature numeral.
- Figure 2c shows a block diagram depicting the impediments to employing a d-bit-only decoder to decode coefficients corresponding to source samples having an original bit-depth of D, where D > d. More specifically, inoperable method 200* represents what happens when known method 200' of Figure 2b is confronted with coefficients for a D-bit video.
- Figure 2c is a hybrid of Figures 2a and 2b, as will be explained presently.
- the output of the first major portion of method 200* is (D + 1)-bit residual samples 219'.
- the first impediment comes from the fact that the second major portion, prediction process 290*, is designed for d-bit samples only. This is indicated by a large 'X' breaking the data flow from residual process 210' to prediction process 290*.
- prediction process 290* is not depicted as being identical to prediction process 290 in method 200 of Figure 2a. This is because of optional SAO 250*.
- bit-stream 2 contains an indication (not shown) that a sample-adaptive offset is to be applied to the output of filter 240 in the inter-prediction loop
- the bit-stream will also contain SAO table 195' - which is input to SAO 250* - specifically designed for the bit-depth of the current video.
- bit-stream 2 effectively causes SAO 250* to be configured so that it is compatible with D-bit samples.
- SAO 250* cannot function correctly for videos of bit-depth greater than d. This is because SAO 250* is configured to use a lookup table that requires values to have a specific bit-depth to successfully perform as intended by the D-bit encoder that encoded the video. In reference to the HEVC standard, no output for SAO 250* is defined under these circumstances, wherein the SAO is configured for D-bit input, but is fed d-bit input. This is indicated by shading of inoperable SAO 250* and by a large 'X' breaking the data flow out of the SAO (as its output would be spurious). This is a second impediment that must be solved, even if the first one is overcome, in the cases when bit-stream 2 indicates an SAO operation is to be performed.
- Figures 3a and 3b depict two known solutions, though they are not both directed to the problem of bit-depth overage.
- Figure 3a shows a block diagram detailing a known method 200a of using a D-bit decoder to decode coefficients 194', corresponding to source samples 12' having an original bit-depth D, as reconstructed samples 293a having the same original bit-depth D; as such, this method does not solve the bit-depth overage problem addressed by the present disclosure. Its goal is merely to ease the computation/storage burden posed by having to store one or more entire D-bit pictures at a time in picture buffer 265' in method 200' of Figure 2b for inter-prediction.
- d-bit picture buffer 265, identical to that in Figure 2a, is used in its place. This is made possible by added steps 264 and 266. Precision of the reference samples destined for it is reduced from D bits to d bits at step 264. A simple implementation is to shift each sample two bits right, effectively truncating the two least-significant bits; this is mathematically equivalent to integer division by 2^(D - d) (in which remainders are lost). Picture buffer 265 can then handle each sample as it would for a d-bit video.
- so that prediction operation 275 will produce D-bit prediction samples 279' that combine appropriately with (D + 1)-bit residual samples 219', precision of the reference samples destined for the prediction operation is increased from d bits to D bits at step 266, after the reference samples are moved from picture buffer 265.
- a simple implementation is to shift each sample two bits left, effectively padding the sample with two new least-significant bits, each equal to zero; this is mathematically equivalent to multiplication by 2^(D - d).
- the net result of the precision decrease before and the precision increase after storage in picture buffer 265 is equivalent to a preservation of precision, but loss of accuracy:
- the two least-significant bits of the D bits are cleared (i.e., set to zero), so that each sample is effectively rounded down to the nearest multiple of 2^(D - d). Consequently, D-bit source videos reconstructed as D-bit videos according to method 200a will suffer in visual fidelity compared to those produced by a true D-bit decoder according to method 200' of Figure 2b.
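The loss described above can be sketched in a few lines of Python (the function names are illustrative, not taken from the HEVC specification; D = 10 and d = 8 are assumed for concreteness):

```python
def reduce_precision(sample: int, D: int, d: int) -> int:
    """Step 264: truncate by dropping the D - d least-significant bits
    (integer division by 2**(D - d); remainders are lost)."""
    return sample >> (D - d)

def restore_precision(sample: int, D: int, d: int) -> int:
    """Step 266: pad with D - d zero bits (multiplication by 2**(D - d))."""
    return sample << (D - d)

D, d = 10, 8
# The round trip clears the two least-significant bits, rounding every
# sample down to the nearest multiple of 2**(D - d) = 4.
for s in (100, 101, 102, 103, 104):
    print(s, "->", restore_precision(reduce_precision(s, D, d), D, d))
# 100 -> 100, 101 -> 100, 102 -> 100, 103 -> 100, 104 -> 104
```

Precision (the number of bits) is preserved by the round trip, but accuracy is not: samples 101 through 103 all collapse to 100.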
- Figure 3b shows a block diagram detailing known method 200b of decoding coefficients 194', corresponding to source samples having an original bit-depth D, as reconstructed samples having the lower bit-depth d, by (a) maintaining the same precision as in the conventional method of Figure 2b throughout all computations in prediction process 290b - depicted by heavy data-flow lines - and (b) rounding and/or truncating the resulting D-bit reconstructed samples to bit-depth d at step 291; input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced.
- a rounding-and/or-truncation operation comprises one or both of rounding and truncating.
- Third, truncating can be implemented via a right-shift operation.
- rounding can be one of many different types, including: (a) rounding toward zero, (b) rounding toward negative infinity, (c) rounding toward positive infinity, (d) rounding half values toward zero, (e) rounding half values toward negative infinity, (f) rounding half values toward positive infinity, (g) rounding half values toward the nearest even value, (h) rounding half values toward the nearest odd value, (i) stochastic rounding, (j) spatial dithering, and (k) spatial dithering in combination with any one of (a) through (h).
- type (f) is favoured by the HEVC standard.
- the corresponding input is, instead, d-bit, filtered, reference samples.
- SAO makes use of a predetermined lookup table designed for the specific bit depth of the video, regardless of what bit-depth the decoder prefers. Moreover, even a d-bit-only decoder must input the table from bit-stream 2.
- SAO 250c must be configured to operate on d-bit intermediate samples. This can be done in several ways, which are not depicted in the figures.
- One approach is to configure SAO 250c to rescale d-bit intermediate samples to bit-depth D and then use the rescaled samples to perform lookups in a table configured for use with D-bit intermediate samples; the rescaling can be done in any known fashion, such as by padding (as in step 266 of Figure 2b).
- Another implementation is to configure SAO 250c to perform lookups in a replacement table, which is itself configured for use with d-bit intermediate samples. (For method 200l, described later in reference to Figure 7c, SAO 250' is exactly as it is in conventional method 200' of Figure 2b.)
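For the first approach (rescaling before lookup), note that zero-padding a d-bit sample preserves its band classification, since SAO band offset takes the band index from the five most-significant bits of the sample. A minimal Python sketch (the helper name band_index is illustrative, not the standard's pseudo-code):

```python
def band_index(sample: int, bit_depth: int) -> int:
    # SAO band offset splits the sample range into 32 equal bands;
    # the band index is the top five bits of the sample.
    return sample >> (bit_depth - 5)

d, D = 8, 10
sample_d = 200                   # a d-bit intermediate sample
sample_D = sample_d << (D - d)   # rescaled by zero-padding (as in step 266)
print(band_index(sample_d, d), band_index(sample_D, D))  # 25 25
```

Because the padded sample lands in the same band, a table configured for D-bit samples can be indexed by the rescaled value.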
- bit-stream unpacking operation 21 must be conducted with reference to the bit-depth signalled in bit-stream 2 due to the entropy-coding method employed in the packing of certain categories of symbol, wherein the bit-depth value may control the binarization process that converts the symbol into a string of binary digits.
- the symbol 0 would be represented by the string "0", the symbol 1 as "10", the symbol 2 as "110", etc.
- the terminating 0 in the representation of symbol 2 is redundant given a priori knowledge of the alphabet by the decoder.
- a truncated unary code may be employed that represents symbol 2 as "11".
- some parameters, in particular for SAO, are binarized in this manner, where the size of the alphabet for SAO offset values is determined by bit-depth value 192.
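The unary and truncated unary binarizations described above can be sketched as follows (a minimal illustration; c_max stands for the largest symbol in the alphabet, which for SAO offsets would be derived from bit-depth value 192):

```python
def unary(symbol: int) -> str:
    """Unary code: n ones followed by a terminating zero."""
    return "1" * symbol + "0"

def truncated_unary(symbol: int, c_max: int) -> str:
    """Truncated unary: the terminating zero is dropped for the largest
    symbol, since the decoder already knows the alphabet size."""
    return "1" * symbol if symbol == c_max else "1" * symbol + "0"

print(unary(0), unary(1), unary(2))   # 0 10 110
print(truncated_unary(2, 2))          # 11
```

This is why unpacking must be conducted with reference to the signalled bit-depth: a decoder assuming the wrong alphabet size will parse the wrong number of bits for the final symbol.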
- Method 200e and methods 200g through 200m all incorporate rounding-and/or-truncation operations and, as previously disclosed, the rounding method can be one of many types. Careful choice of the particular rounding method is necessary since, in the absence of a closed-loop system that accounts for the effects of the exact rounding-and/or-truncation operation, different rounding methods will introduce different types of error. For instance, the relatively straightforward method of rounding half values toward positive infinity is not only asymmetric for positive and negative numbers, but contains a systematic bias that, when influencing the reconstruction loop, will result in a gain greater than one.
- the preferred method of rounding in these methods is rounding half values toward the nearest even value, also known as bankers' rounding, which is unbiased for both positive and negative numbers, for sufficiently well-distributed values; a value n can be rounded with respect to its least-significant D - d bits by replacing n with (n + 1 + ((n >> (D - d)) & 1)) >> (D - d), where ">>" is the right-shift operator and "&" is the bitwise AND operator.
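A Python sketch of this rounding, written for a general shift of k = D - d bits (for k = 2, the term 2**(k - 1) - 1 below reduces to the constant 1 in the formula above; the generalization is illustrative, not quoted from the standard):

```python
def round_half_to_even(n: int, k: int) -> int:
    """Drop the k least-significant bits of n, rounding half values
    toward the nearest even result (bankers' rounding)."""
    half = 1 << (k - 1)
    return (n + half - 1 + ((n >> k) & 1)) >> k

# For k = 2 (D = 10, d = 8): 5/4 rounds down, 7/4 rounds up, and the
# half values 6/4 and 10/4 both round to the nearest even integer, 2.
print([round_half_to_even(n, 2) for n in (5, 6, 7, 10)])  # [1, 2, 2, 2]
```

The term ((n >> k) & 1) nudges only odd quotients upward, which is what breaks ties toward the even neighbour without biasing non-tie cases.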
- Figure 8a shows a flowchart depicting, at a high level, all of methods 200c through 200g.
- methods 200c through 200e of Figures 4a through 4c have in common that they each depict an embodiment in which inverse-transformation process 216' comprises an inverse-transformation operation for use in reconstructing D-bit samples (exactly as in Figure 2b) and in which conventional inverse-quantization process 211' of method 200' is modified so that its output, when processed by conventional inverse-transformation operation 217', results in (d + 1)-bit residual samples, rather than (D + 1)-bit samples, as it does in the case of conventional method 200' of Figure 2b.
- the modification in inverse-quantization process 211c comprises applying an offset based on bit-depth d - rather than bit-depth D - to QP 193'. Rather than using input 192' to generate an offset of 6 * (D - 8) as at step 212' of Figure 2c, that input is suppressed or ignored. Instead, the offset 6 * (d - 8) is generated locally (without regard to the bit-depth, D, indicated in bit-stream 2) at step 212c.
- method 200c of Figure 4a is the simplest to implement (since the minor change at step 212c merely ignores the value of the bit-depth of the source samples), but it is the least faithful in reconstructing videos.
- One reason is that no attempt is made to address the increased range of QP values that are potentially present in a bit-stream intended for D-bit decoders (to accommodate the greater fidelity of D-bit systems, the valid range of QP values increases with higher bit-depths); as such, the method is only applicable in systems that do not use the extended QP range (this would be typical of bit-streams that target low-bit-rate applications such as video streaming).
- the modification in inverse-quantization process 211d comprises modifying inverse-quantization operation 214d so that it produces inverse-quantized coefficients of the same magnitude as are produced by inverse-quantization operation 214 in Figure 2a, despite using a QP offset exactly as in Figure 2c.
- the dequantized samples produced by inverse-quantization operation 214d are such that conventional inverse-transformation operation 217' produces (d + 1)-bit residual samples 219d.
- the modification in inverse-quantization process 211e comprises adding a new step.
- QP-offsetting 212', 213 and inverse-quantization operation 214' are configured for use in reconstructing D-bit samples exactly as in Figure 2c.
- the intermediate samples output by inverse-quantization operation 214' are rescaled at step 215. As shown in Figure 4c, this may be done by a rounding-and/or-truncating operation.
- rescaling step 215 will reduce the magnitude of intermediate samples (flowing between inverse-quantization operation 214' and inverse-transformation operation 217') by D - d bits (even though the bit-depth of samples at this stage is greater than D); input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits the bit-depth must be reduced.
- methods 200f and 200g of Figures 5a and 5b have in common that they each depict an embodiment in which entire inverse-quantization process 211' is exactly as in Figure 2c and inverse-transformation process 216' of Figure 2c is modified.
- inverse-quantization process 211' comprises inverse-quantization operation 214' for use in reconstructing D-bit samples.
- Conventional inverse-transformation process 216' of method 200' is modified so as to produce (d + 1)-bit residual samples, given the output of conventional inverse-quantization process 211'.
- the modification in inverse-transformation process 216f comprises replacing conventional inverse-transformation operation 217' of Figure 2c with inverse-transformation operation 217f, which is configured to produce (d + 1)-bit residual samples 219f, given the output of conventional inverse-quantization process 211'.
- the modification in inverse-transformation process 216g comprises (a) retaining conventional inverse-transformation operation 217' configured, exactly as in Figure 2c, to produce (D + 1)-bit intermediate samples, given the output of conventional inverse-quantization process 211', and (b) adding subsequent rounding-and/or-truncating process 218 that, given said (D + 1)-bit intermediate samples, produces (d + 1)-bit residual samples 219g; input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced.
- Figure 8b shows a flowchart depicting, at a high level, all of methods 200h through 200m.
- conventional prediction process 290' of Figure 2c is modified to include a rounding-and/or-truncating operation (in addition to conventional clip3 operation 225 or 225'), which reduces a (D + h)-bit input to a (d + h)-bit output, where h equals 2 if the rounding-and/or-truncating operation occurs before clip3 operation 225 (as in Figures 6a through 6c) and equals 0 if the rounding-and/or-truncating operation occurs after clip3 operation 225' (as in Figures 7a through 7c). All six of these embodiments will have a distinct prediction process 290Y.
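The two placements can be sketched as follows (clip3 clamps its third argument to the inclusive range given by its first two, as in HEVC; truncate is an illustrative stand-in for the rounding-and/or-truncating operation):

```python
def clip3(lo: int, hi: int, v: int) -> int:
    """HEVC-style Clip3: clamp v to the inclusive range [lo, hi]."""
    return lo if v < lo else hi if v > hi else v

def truncate(v: int, k: int) -> int:
    """Drop the k least-significant bits."""
    return v >> k

D, d = 10, 8
s = 901  # an intermediate sample of up to D + 2 bits
# Figures 6a-6c (h = 2): truncate the (D + 2)-bit value to (d + 2) bits,
# then clip3 operation 225 clips to the d-bit range [0, 2**d - 1].
a = clip3(0, (1 << d) - 1, truncate(s, D - d))
# Figures 7a-7c (h = 0): clip3 operation 225' clips to the D-bit range
# first, then the result is truncated to d bits.
b = truncate(clip3(0, (1 << D) - 1, s), D - d)
print(a, b)  # 225 225
```

For in-range samples the two orderings agree; they differ in where the reduction to d bits happens relative to the clip.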
- the reference samples sent both to line buffer 235 for intra prediction and (perhaps via sample-adaptive offset 250c) to reference buffer 265 for inter prediction are d-bit samples.
- prediction samples must have D bits.
- the basic idea is to rescale d-bit intermediate samples to D-bit samples, prior to addition operation 220.
- a simple implementation is to shift each sample two bits left, effectively padding the sample with two least significant bits, each equal to zero; this is
- the padding can be performed before or after a prediction operation is performed.
- either solution can be applied independently for intra-prediction and for inter-prediction.
- This results in four ways to pad, three of which are shown in Figures 6a to 6c as methods 200h, 200i, and 200m, respectively, with different outputs 293h, 293i, and 293m, respectively.
- padding operation 276 follows prediction for each prediction mode, symbolized by generic prediction operation 275.
- padding operation 274 precedes generic prediction operation 275; the reconstructed d-bit samples 293i output by method 200i have better fidelity than corresponding output 293h from method 200h, because prediction performed by prediction operation 275 will be more accurate if it operates on higher-precision reference samples (even though the extra precision is D - d zero bits).
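A small sketch illustrates why padding before prediction is more accurate: an interpolating prediction retains a fractional step that would otherwise be truncated away. (half_pel_average is an illustrative stand-in for prediction operation 275, not the HEVC interpolation filter.)

```python
def pad(v: int, k: int) -> int:
    """Rescale by appending k zero bits (as in padding 274/276)."""
    return v << k

def half_pel_average(a: int, b: int) -> int:
    # stand-in interpolation between two reference samples
    return (a + b) >> 1

D, d = 10, 8
r0, r1 = 100, 101  # two d-bit reference samples; true midpoint is 100.5
# Method 200h (Figure 6a): predict on d-bit references, then pad.
after_pad = pad(half_pel_average(r0, r1), D - d)               # 400
# Method 200i (Figure 6b): pad the references first, then predict.
before_pad = half_pel_average(pad(r0, D - d), pad(r1, D - d))  # 402
print(after_pad, before_pad)  # 400 402 -- 402 matches 100.5 in D-bit units
```

Predicting on the padded references preserves the half-sample (402 rather than 400, in D-bit units), which is the fidelity advantage claimed for method 200i.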
- Method 200m of Figure 6c depicts an embodiment that combines aspects of Figures 6a and 6b. Padding 274 rescales d-bit intermediate samples to produce D-bit input to intra-prediction operation 275m, and padding 276 rescales the d-bit output of inter-prediction operation 275n to produce D-bit prediction samples.
- Generic (i.e., either intra- or inter-) prediction samples 279m will match prediction samples 279h of method 200h in inter-prediction mode and will match prediction samples 279i of method 200i in intra-prediction mode.
- the skilled person will realize from what has already been disclosed that a fourth variation can be obtained by switching the pre-prediction and post-prediction padding between the two prediction modes.
- Method 200m of Figure 6c represents a reasonable balance between, at one extreme, method 200c of Figure 4a - which performs all operations as would be done for a d-bit video, thereby losing precision from the beginning (which results in drift) - and, at the other extreme, known method 200b - which performs all operations as would be done conventionally for a D-bit video, thereby forcing the decoding device (with limited resources) to perform roughly double the work as it would to reconstruct d-bit videos encoded from d-bit source samples.
- Testing of method 200m has verified the advantages of this particular mix of d-bit and D-bit operations, in which the more computationally expensive inter-prediction operation 275n is performed for d-bit reference samples. Objective analysis shows a significant reduction in the distortion compared to method 200c. Visual inspection reveals that the DC drift observed using method 200c is not apparent with method 200m.
- Methods 200j through 200l of Figures 7a through 7c, respectively, have in common that they each depict an embodiment in which the rounding-and/or-truncating operation is performed only for inter-prediction mode.
- line buffer 235" must do "double duty" as in methods 200a and 200b of Figures 3a and 3b, respectively, to handle D-bit reference samples.
- the complete intra-prediction loop is shown all the way to the inputting of intra-prediction samples 279j to addition operation 220, but the inter-prediction loop is not shown in detail beyond reference-picture buffer 265; inter prediction of next picture at step 269 is a placeholder for two different endings of the inter-prediction loop for each of methods 200j through 200l.
- the three (incomplete) methods, 200j through 200l, depicted in Figures 7a through 7c correspond to three different placements of a rounding-and/or-truncating operation; each placement results in slightly different reconstructed d-bit samples 293j through 293l, respectively.
- rounding-and/or-truncating operation 281 applies prior to in-loop filtering operation 240 (for d-bit samples).
- rounding-and/or-truncating operation 282 applies after in-loop filtering operation 240' (for D-bit samples) and prior to modified optional SAO 250c (for d-bit samples).
- rounding-and/or-truncating operation 283 applies after conventional optional SAO 250' (for D-bit samples); its d-bit output is routed both to output 293l and to reference-picture buffer 265 for future prediction operations.
- input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced.
Abstract
Methods are provided for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d. A residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples. Then a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
Description
METHODS FOR RECONSTRUCTING AN ENCODED VIDEO AT A BIT-DEPTH LOWER THAN AT WHICH IT WAS ENCODED
FIELD OF TECHNOLOGY
[0001] The present application relates generally to video compression and more particularly to decoding videos encoded at a higher bit-depth using decoders designed for videos having a lower bit-depth.
BACKGROUND
[0002] The process of video compression typically begins with the acquisition of a raw video signal, say when light strikes electronic components of a charge-coupled device (CCD) in a video camera. Conceptually, the camera is obtaining colour-component data for each pixel-position in each picture in a sequence of pictures that makes up the video; the colour components will be values of red, green, and blue if the CCD is based on the classic RGB colour space, or possibly with the addition of a fourth colour component that represents yellow or white light. In practice, various shortcuts may be taken. The CCD may detect only one colour component at each pixel location and extrapolate the missing components based on values from neighbouring pixels. (For example, green values - the most important for human visual perception - may be obtained at 50% of the pixel locations, while red and blue values are each obtained at 25% of the pixel locations.)
[0003] Based on the raw video signal, a video encoder makes further changes to the data to create a source video. RGB values are converted to co-ordinates in a colour space that allows the nature of human visual perception to be exploited to achieve greater compression efficiency. The colour components may be luma (an approximation of luminance) samples or chroma (short for "chrominance") samples. In modern video standards, including High-Efficiency Video Coding
(HEVC), the luma component is denoted Y, while the chroma components are denoted Cr and Cb. Beyond this basic conversion, common to all profiles (i.e., sets of available features) of the standard, many different options can be invoked (even within one profile) to select alternative ways to balance two competing goals of video compression: fidelity of the video reconstructed by a video decoder on the one hand and compression efficiency on the other hand. The design decision to choose certain options will be influenced by usage considerations, such as storage size, transmission bandwidth and the computational resources to effectively exploit a particular option.
[0004] When invoking various options, the luma data is treated differently from the chroma data, but Cr and Cb data are treated equally. For example, luma data is not down-sampled, but chroma data - of both types - may optionally be down-sampled; in other words, luma samples correspond to pixels on a one-to-one basis, but a chroma (Cr or Cb) sample might correspond to more than one pixel. Luma samples in a source video might be represented at one bit-depth while both Cr samples and Cb samples might be represented at another bit-depth; thus the HEVC standard provides two parameters, BitDepthY for luma (Y) samples and BitDepthC for chroma samples (both Cr and Cb). It should be noted that the treatment of bit-depth can apply to other colour spaces, including those with additional colour components, such as those based upon a supplementary yellow stimulus, or those that incorporate alpha channels. The bit-depth of any such
supplementary components may be based on a pre-existing parameter, or be provided in a new parameter.
[0005] An encoder will compress a source video comprising samples (said to be in the pixel domain) by, amongst other things, (a) forming a prediction of a set of samples and computing the difference between the prediction and the source video samples, (b) applying a transform (such as an integer approximation of a discrete cosine transform (DCT)) to generate transformed coefficients (said to be in the transform domain), and (c) quantizing those coefficients to generate quantized, transformed coefficients. The coefficients will typically have more bits than the samples from which they were encoded.
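The bit growth can be seen even in a toy two-point integer transform (a Haar-like butterfly; this is an illustration, not the integer DCT approximation used by HEVC): the sum coefficient of two d-bit samples needs d + 1 bits.

```python
def forward_2pt(x0: int, x1: int):
    # low-pass (sum) and high-pass (difference) coefficients
    return x0 + x1, x0 - x1

d = 8
x0 = x1 = (1 << d) - 1          # two maximal 8-bit samples (255)
lo, hi = forward_2pt(x0, x1)
print(lo, hi, lo.bit_length())  # 510 0 9 -- the sum no longer fits in 8 bits
```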
[0006] Older standards specify and many current devices implement codecs based solely
on bit-depths of 8 for both luma and chroma samples, for both encoding and decoding. Increased display resolutions, processor speeds, transmission speeds, and consumers' expectations for ever-higher-quality viewing experiences on small screens have spurred the standardization of profiles, for example in HEVC, that support encoding/decoding of samples having 10-bit or even higher precision.
However, devices with limited resources, such as mobile devices, may still have decoders designed to handle only coefficients encoded based on samples having bit-depth 8.
[0007] In general, a problem arises when coefficients encoded based on samples of bit-depth D (e.g., 10) are encountered by a decoder designed to handle only samples of bit-depth d, with d < D (e.g., d = 8).
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] Reference will now be made, by way of example, to the accompanying drawings, which show example embodiments of the present application, and in which:
[0009] Figure 1a shows a block diagram depicting a conventional sequence of (a) encoding a source video to generate a bit-stream and (b) decoding the bit-stream to generate a
reconstruction of the video;
[0010] Figure 1b compares two block diagrams depicting respective, conventional sequences, each for (a) encoding source samples having an original bit-depth (d or D) as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having the same original bit-depth (d or D, respectively);
[0011] Figure 1c shows a block diagram comparing two sequences, each for (a) encoding source samples having an original bit-depth D as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having a lower bit-depth d than the original bit-depth D;
[0012] Figures 2a and 2b each show a block diagram depicting a conventional method of
decoding coefficients, corresponding to source samples having an original bit-depth (d in the case of Figure 2a and D in the case of Figure 2b), as reconstructed samples having the same original bit-depth (d and D, respectively);
[0013] Figure 2c shows a block diagram depicting the impediments to employing a d-bit decoder to decode coefficients corresponding to source samples having an original bit-depth of D;
[0014] Figure 3a shows a block diagram detailing a known method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having the same original bit-depth D, while accommodating a picture buffer intended for reference samples of lower bit-depth d;
[0015] Figure 3b shows a block diagram detailing a known method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having the lower bit-depth d, by (a) maintaining the same precision as in the conventional method of Figure 2b throughout all computations and (b) rounding and/or truncating the resulting D-bit reconstructed samples to bit-depth d;
[0016] Figures 4a through 7c each show a block diagram detailing an embodiment disclosed herein of a method of decoding coefficients, corresponding to source samples having an original bit-depth D, as reconstructed samples having a lower bit-depth d;
[0017] Figures 4a through 5b, more specifically, each depict an embodiment in which a residual process, including inverse-quantization and inverse-transformation processes, computes (d + 1)-bit residual samples;
[0018] Figure 4a, in particular, depicts an embodiment in which an inverse-quantization process comprises applying to a quantization parameter, input to the inverse-quantization process, an offset based on bit-depth d, rather than bit-depth D;
[0019] Figure 4b, in particular, depicts an embodiment in which an inverse-quantization process comprises an inverse-quantization operation configured for use in reconstructing d-bit samples;
[0020] Figure 4c, in particular, depicts an embodiment in which the inverse-quantization process comprises applying an inverse-quantization operation configured for use in reconstructing D-bit samples and rescaling intermediate samples output by the inverse-quantization operation;
[0021] Figure 5a, in particular, depicts an embodiment in which an inverse-transformation process comprises an inverse-transformation operation configured to produce (d + 1)-bit intermediate samples, given the output of an inverse-quantization designed for use in
reconstructing D-bit samples;
[0022] Figure 5b, in particular, depicts an embodiment in which an inverse-transformation process comprises (a) an inverse-transformation operation configured to produce (D + 1)-bit intermediate samples, given the output of an inverse-quantization designed for use in reconstructing D-bit samples and (b) a rounding-and/or-truncating process that, given said (D + 1)-bit intermediate samples, produces (d + 1)-bit residual samples;
[0023] Figures 6a through 7c, in contrast to Figures 4a through 5b, each depict an embodiment in which (a) a residual process, including inverse-quantization and inverse-transformation processes, computes (D + 1)-bit residual samples and (b) a subsequent prediction process includes rounding and/or truncating intermediate samples;
[0024] Figures 6a through 6c, more specifically, each depict an embodiment in which rounding and/or truncating is performed for both inter-prediction and intra-prediction modes;
[0025] Figure 6a, in particular, depicts an embodiment in which the d-bit output of a prediction operation is padded to produce D-bit prediction samples;
[0026] Figure 6b, in particular, depicts an embodiment in which d-bit intermediate samples are padded to produce a D-bit input to a prediction operation;
[0027] Figure 6c, in particular, depicts an embodiment that combines aspects of Figures 6a and 6b by (a) padding d-bit intermediate samples to produce a D-bit input to an intra-prediction operation and (b) padding the d-bit output of an inter-prediction operation to produce D-bit prediction samples;
[0028] Figure 7a, in particular, depicts an embodiment in which rounding and/or truncating is performed, only for inter-prediction, prior to an in-loop filtering operation;
[0029] Figure 7b, in particular, depicts an embodiment in which rounding and/or truncating is performed, only for inter prediction, after an in-loop filtering operation and prior to a sample- adaptive offset;
[0030] Figure 7c, in particular, depicts an embodiment in which rounding and/or truncating is performed, only for inter prediction, after both an in-loop filtering operation and a sample- adaptive offset;
[0031] Figure 8a shows a flowchart depicting, at a high level, the methods depicted in block-diagram form in Figures 4a through 5b;
[0032] Figure 8b shows a flowchart depicting, at a high level, the methods depicted in block-diagram form in Figures 6a through 7c; and
[0033] Similar reference numerals may have been used in different figures to denote similar components.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0034] In one aspect, a method is disclosed for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d. A residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples. Then a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
[0035] In another aspect, a computing device is disclosed. The device comprises one or more processors and a memory containing processor-executable instructions. The instructions, when executed by the one or more processors, cause the device to perform a method for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been
encoded based on source samples having a bit-depth D, where D > d. In the method, a residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples. Then a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
[0036] In yet another aspect, a non-transitory, processor-readable storage medium is disclosed. The medium stores processor-executable instructions in a magnetic, optical, solid-state, or like format. When executed by one or more processors of a computing device, the instructions cause the device to perform a method for reconstructing d-bit samples from coefficients, in an encoded video bit-stream, that had been encoded based on source samples having a bit-depth D, where D > d. In the method, a residual process, including inverse-quantization and inverse-transformation processes, is applied to the coefficients to compute (d + 1)-bit residual samples. Then a prediction process, including clipping intermediate samples, is applied to the residual samples to compute d-bit reconstructed samples.
[0037] Throughout the drawings, when used next to a data line, bit-depths indicate the bit- depth of the data flow at that point. On the other hand, when used as subscripts on a notation for a component, the bit-depth indicates the bit-depth of the samples for which the component is designed; for inverse-quantization and inverse-transformation operations, the operation may accept an input and/or produce an output of bit-depth greater than the value of the subscript. When appearing in the same figure, the bit-depth d is always taken to be less than the bit-depth D.
[0038] Referring first to Figure 1a, a block diagram is shown depicting a conventional sequence 1000 of encoding and decoding a video. Raw source video 1 is input to encoder 10. The encoding process generally comprises two major phases. First, in phase 11, the source video is compressed into symbols. This phase includes the encoding of source samples as coefficients at step 100. The source samples may be luma (an approximation of luminance) samples or chroma (short for "chrominance") samples. Second, in phase 13, the symbols are binarized and packed into a bit-stream, together with parameters that indicate the particular way in which the compression took place. The parameters will be needed by a decoder, in order to correctly decode the bit-
stream in a manner complementary to the way in which the source video was encoded. The skilled person appreciates that a wide variety of parameters specify, for example, what type of downsampling (if any) of pixels took place to obtain chroma source samples.
[0039] The result of the encoding process performed by encoder 10 on source video 1 is bit-stream 2, which expresses source video 1 in a compressed format.
[0040] Bit-stream 2 is processed by decoder 20, which is compatible with encoder 10 in that its decoding process is complementary to the encoding process of the encoder. This means that reconstructed video 3 produced by the decoder from bit-stream 2 is an acceptable facsimile of the source video, not a frame-by-frame, pixel-by-pixel duplicate. The extent to which reconstructed video 3 resembles source video 1 depends largely on the various parameters that determined how the compression took place. In particular, the parameters affect the compression ratio achieved by the encoding and the fidelity of the reconstructed video as a facsimile of the source video. In general, quality tends to suffer the more highly the video is compressed.
[0041] The decoding process generally comprises two major phases. First, in phase 21, symbols and parameters, as described above, are unpacked from bit-stream 2. Second, in phase 23, the video is reconstructed from the symbols, according to the parameters. For example, if the parameters indicate that downsampling of pixels took place to obtain chroma source samples, this downsampling must be reversed (since each sample corresponds to more than one pixel). The second phase includes reconstructing samples from coefficients at step 200.
[0042] The novel features disclosed herein are modifications to the conventional method of step 200, made to adapt a decoder, designed for video of lower bit-depth d, to handle a bit-stream produced by an encoder designed for video of higher bit-depth D. Throughout this disclosure, D and d are consistently used to represent a higher bit-depth and a lower bit-depth, respectively. For example, D may be 10 and d may be 8. However, the teachings of this disclosure are not limited to any specific values of D and d, as long as D > d.
[0043] Turning now to Figure 1b, two block diagrams are shown, side by side, depicting respective, conventional sequences, each for (a) encoding source samples having an original bit-depth as corresponding coefficients and (b) decoding those coefficients as reconstructed samples having the same original bit-depth. In the left sequence, d-bit source samples 12 are encoded at step 100, whereas in the right sequence, D-bit source samples 12' are encoded at step 100'; the two encoding procedures are designed to process their respective inputs and to output coefficients - 194 for the left sequence and 194' for the right sequence - corresponding to the distinct bit-depths of the respective source samples.
[0044] In these two conventional sequences, coefficients of each type are processed by decoders designed for them. Decoder 20, designed for d-bit source samples, processes coefficients 194 in two major steps. Conventional residual process 210 - for d-bit source samples - produces (d + 1)-bit residual samples 219, which conventional prediction process 290 - also for d-bit source samples - turns into d-bit reconstructed samples 293. Note that as meant herein, a reconstruction process or portion thereof is described as being "for d-bit samples" if it nominally performs as part of a sequence, such as steps 100 and 200, which is designed for d-bit samples 12 and d-bit reconstructed samples 293. During such an encoding-decoding sequence, intermediate results may have other bit-depths. For example, the residual samples 219, output by residual process 210 "for d-bit samples," each consist of d + 1 bits; in particular, they each comprise a d-bit magnitude and a single sign bit. Usually in the accompanying figures, the number of data bits at a data flow is depicted next to a diagonal slash through the flow. In some cases, where it may vary, the bit-depth is not shown. This is the case with inputs and outputs of quantization and inverse-quantization operations (discussed later); nevertheless, these operations are still described as being "for d-bit samples" or "for D-bit samples" as the case may be, since there is a difference between operations for one bit-depth and those for the other bit-depth.
[0045] Continuing in reference to Figure 1b, the right sequence is conceptually identical to the left sequence. The difference is that decoder 20' is designed to process coefficients 194' that correspond to D-bit - rather than d-bit - samples. Necessarily, conventional residual process 210' - for D-bit samples - produces (D + 1)-bit residual samples 219', which conventional prediction process 290' - also for D-bit samples - turns into D-bit reconstructed samples 293'.
[0046] Now in reference to Figure 1c, a block diagram is shown depicting, side by side, two sequences with a common starting point. Each sequence begins with the encoding of D-bit source samples at step 100' to produce coefficients 194' for D-bit samples. These coefficients are processed by two different types of novel decoders, each designed for d-bit samples; to reiterate, each decoder is designed primarily to process coefficients for d-bit samples. However, as disclosed herein, each is re-purposed to also handle coefficients for D-bit samples.
[0047] In the left sequence, decoder 20X features modified residual process 210X for d-bit samples, which reduces bit-depth so that its output is (d + 1)-bit residual samples 219X. Several ways of modifying conventional residual process 210 of Figure 1b are described later in reference to Figures 4a through 5b, in which reference numerals 210c through 210g correspond to reference numeral 210X of Figure 1c. Residual samples 219X, which will differ slightly depending on the particular embodiment of modified residual process 210X, are processed by prediction process 290c for d-bit samples, which is only modified so that a sample-adaptive offset (SAO) - an optional decoding step, to be described later - is adjusted for bit-depth d. This adjusted-SAO prediction process is the same for all embodiments of decoder 20X. The output of decoder 20X is d-bit reconstructed samples 293X for a reconstructed video that is a facsimile of source video 1.
Necessarily, the reconstructed video has lower colour depth than the source video, though a human observer might not be aware that the reconstructed video having colour-depth d is less faithful to the source video than is a conventionally reconstructed video having colour-depth D, unless a side-by-side comparison is made. Ultimately, the various embodiments of decoder 20X produce slightly different outputs 293X due to the various versions of modified residual process 210X.
[0048] It should be noted that a D-bit decoder may be capable of decoding d-bit videos, where d < D, as faithfully as would be done by a d-bit decoder; in fact, an HEVC-compliant 10-bit decoder is required to be able to decode 9-bit and 8-bit videos as they were intended to be decoded. The problem dealt with in the present disclosure is the opposite situation: a d-bit-only decoder - i.e., a decoder that can only decode coefficients corresponding to source samples having an original bit-depth of d - confronted with coefficients corresponding to source samples having an original bit-depth of D, where D > d.
[0049] The method employed by decoder 20X is depicted in flowchart format in Figure 8a.
[0050] In the right sequence, decoder 20Y features conventional residual process 210' for D-bit samples, whose output is (D + 1)-bit residual samples 219' (exactly as in the right sequence of Figure 1b). Residual samples 219' are processed by modified prediction process 290Y, which not only adjusts SAO if necessary, but, more importantly, reduces the bit-depth to d. Several ways of modifying conventional prediction process 290' of Figure 1b are described later in reference to Figures 6a through 7c, in which reference numerals 290h through 290l correspond to reference numeral 290Y of Figure 1c. The output of decoder 20Y is also d-bit reconstructed samples 293Y for a reconstructed video that is a facsimile of source video 1; as with decoder 20X, the output will vary according to which version of adjusted-SAO prediction process 290Y is employed.
[0051] The method employed by decoder 20Y is depicted in flowchart format in Figure 8b.
[0052] The inner workings of decoders 20 and 20' of Figure 1b are detailed in Figures 2a and 2b respectively; each of the novel decoding methods disclosed herein incorporates some features of each figure. Features of Figure 2b have the same numerals as corresponding features of Figure 2a, but with a prime sign appended. For efficiency of presentation, both figures will be discussed together, with feature numerals of Figure 2b shown parenthetically after feature numerals for corresponding features of Figure 2a.
[0053] Figure 2a (2b) shows a block diagram depicting conventional method 200 (200') of decoding coefficients 194 (194'), corresponding to source samples 12 (12') of Figure 1b having an original bit-depth d (D), as reconstructed samples 293 (293') having the same original bit-depth d (D). Each method comprises two major portions. In Figure 2a (2b), residual process 210 (210') produces (d + 1)-bit ((D + 1)-bit) residual samples 219 (219') as input to prediction process 290 (290'), which uses them to generate d-bit (D-bit) reconstructed samples 293 (293') as the output of method 200 (200').
[0054] In addition to the coefficients 194 (194') to be decoded, method 200 (200') takes as
input (a) bit-depth value 192 (192') of the source samples 12 (12') - namely d (D) - and (b) quantization parameter (QP) 193 (193').
[0055] Residual process 210 (210') comprises two sequential parts. First is inverse-quantization process 211 (211'); this part undoes a quantization process that took place during encoding 100 (100') of Figure 1b. Second is inverse-transformation process 216 (216'); this part undoes a discrete cosine transform (DCT) that took place during encoding 100 (100'). By the socks-and-shoes principle, inverse quantization precedes inverse transformation during decoding because the quantization followed the DCT during encoding.
[0056] In conventional method 200 (200'), inverse-transformation process 216 (216') comprises inverse-transformation operation 217 (217').
[0057] Inverse-quantization process 211 (211') is more complex. Inverse-quantization operation 214 (214') takes as input both coefficients 194 (194') and a quantization parameter, which is the basis for a factor by which the quantization operation will multiply the coefficients to generate dequantized coefficients (i.e., coefficients restored to their previous scale, though typically not identical in value to the original, pre-quantization coefficients). However, that quantization parameter is not always the same as QP 193. In the case of method 200', when the larger bit-depth D exceeds 8 (as it typically does), offset 212' is computed as 6 * (D - 8), and this offset is added to QP 193 at addition operation 213'. In the specific case of D = 10, addition operation 213' has the effect of adding 12 to QP 193. In the case of method 200, when the smaller bit-depth d is typically 8, the value of 6 * (d - 8) is 0, so addition operation 213 amounts to adding 0, i.e., it has no effect on QP 193. The skilled person recognizes that implementation details of steps 212 (212') and 213 may vary; in some embodiments, a test may be performed to avoid an unnecessary computation at step 212 when d = 8. Regardless of how inverse-quantization process 211 (211') is performed, the dequantized coefficients it generates are passed to inverse-transformation operation 217 (217'), already mentioned.
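The QP offsetting just described can be sketched as follows; the helper names are illustrative, not drawn from any codec implementation:

```python
def qp_offset(bit_depth):
    """Offset applied to the quantization parameter for videos whose
    bit-depth exceeds 8, computed as at step 212 (212')."""
    return 6 * (bit_depth - 8)

def effective_qp(qp, bit_depth):
    """QP actually used by inverse-quantization operation 214 (214')."""
    return qp + qp_offset(bit_depth)

# For D = 10 the offset is 12; for d = 8 it is 0, leaving QP unchanged.
```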
[0058] Still in reference to Figure 2a (2b), prediction process 290 (290') takes as input (d + 1)-bit ((D + 1)-bit) residual samples 219 (219'); each such sample comprises a d-bit (D-bit) magnitude and a single sign bit.
[0059] Residual samples 219 (219') are combined at addition operation 220 with d-bit (D-bit) prediction samples 279 (279'), output by prediction operation 275 (275') and generated during either (a) an intra-coding loop including storing reference samples in d-bit (D-bit) line buffer 235 (235') or (b) an inter-prediction loop including storing reference samples in d-bit (D-bit) picture buffer 265 (265'). The skilled person will understand that the term "n-bit buffer" in the context of video decoding means a buffer comprising n-bit words - enough n-bit words to hold all the n-bit samples needed to be stored in one cycle of the reconstruction loop. Prior to data flowing to either buffer, it must be clipped from bit-depth d + 2 (D + 2) to bit-depth d (D) at clip3 operation 225 (225'); since one bit of the input is a sign bit, the clipping operation will take signed input, but only output non-negative numbers. As the skilled person is aware, clip3 operation 225 (225') is not a universal function in the way that a cosine function or addition operation is; there are different versions for outputs of different desired bit-depths. For bit-depth n, the corresponding clip3_n(_) operation clips its input, say an intermediate binary integer x, to n bits by (a) outputting zero if the input x is negative and (b) otherwise outputting the minimum of x and the largest possible n-bit binary integer, namely 2^n - 1. For example, for n = 8, clipping operation 225 will reduce to 255 all inputs greater than 255, raise to 0 all negative inputs, and leave unchanged all inputs already in the range [0, 255].
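The clip3_n operation described above can be sketched as follows (illustrative Python; in practice the operation is typically inline integer code or fixed-function hardware):

```python
def clip3_n(n, x):
    """Clip a signed intermediate value x to an unsigned n-bit range:
    negative inputs become 0; inputs above 2**n - 1 are capped there."""
    if x < 0:
        return 0
    return min(x, (1 << n) - 1)

# For n = 8: clip3_n(8, 300) == 255, clip3_n(8, -7) == 0,
# and any input already in [0, 255] passes through unchanged.
```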
[0060] There is a significant difference between the respective clip3 operations in methods 200 and 200'. Clip3 operation 225 of method 200, being for a d-bit-only method, is designed for a fixed output bit-depth of d; it does not need to be configured based on the video being decoded, as every processed video is assumed to be d-bit. On the other hand, clip3 operation 225' of method 200' is not only for D-bit samples, but also for samples of lower bit-depth; this is because that method is modeled on an HEVC-compliant decoder. (Recall that a 10-bit HEVC-compliant decoder must be able to decode 9-bit and 8-bit videos.) Since clip3 operation 225' of method 200' has a variable bit-depth output, it must be configured at run-time - based on the particular video being decoded - for the appropriate output bit-depth. Thus, unlike clip3 operation 225 of method 200, clip3 operation 225' of method 200' also taps off of bit-depth value 192' to determine its ceiling.
[0061] Which prediction loop is active is determined by mode selector 272, which is influenced by a parameter (not shown) unpacked from the bit-stream, indicating whether the current frame being decoded was inter coded or intra coded during encoding 100 (100'). (In fact, intra prediction is also used for inter-coded frames.)
[0062] In inter-prediction mode, the reference samples must also undergo in-loop filtering
(such as de-blocking to remove decoding artifacts) at filtering operation 240 (240'). Input to filtering operation 240 (240') is compared to one or more thresholds, each of which is designed for a particular bit-depth. Therefore, as with clip3 operation 225 (225') described above, filtering operation 240 (240') must be configured to filter appropriately for the input's bit-depth, namely d (< D). Once again, in the d-bit-only method, the bit-depth is assumed to be d, so filtering operation 240 of method 200 does not need to learn the bit-depth, d, from bit-stream 2 and to configure itself accordingly; on the other hand, filtering operation 240' of method 200' must learn the bit- depth, D or less, from bit-stream 2 by tapping off of bit-depth value 192', to determine which version of the filtering operation should be employed.
[0063] Optionally, a sample-adaptive offset may be performed at step 250 if and only if bit- stream 2 has an indicator that this is to be done.
[0064] Regardless of whether the decoding of method 200 (200') is operating in inter- coding mode or intra-coding mode, reference samples that had been loaded into either line buffer 235 (235') (in intra-coding mode) or picture buffer 265 (265') (in inter-coding mode) at a preceding cycle are used by prediction operation 275 (275') at the current cycle. The skilled person realizes that diagrams such as Figures 2a and 2b are common short-hand in the art for two separate feedback processes; in particular, the functioning of prediction operation 275 (275') is necessarily different for the two different modes. Moreover, the positioning of mode selector 272 within overall prediction process 290 (290') is merely for pictorial completeness. Conceptually, mode selection applies to the overall process, and how one loop or the other is selected is a matter of implementation choice. For example, in Figure 6a, intra-prediction operation 275m and inter-
prediction operation 275n are shown separately (in place of a single, place-holder prediction operation 275 (275')); this allows mode selector 272 to be depicted as following both prediction operators.
[0065] As with clip3 operation 225 (225') described above, prediction operation 275 (275') must be configured to produce prediction samples 279 (279') of the appropriate bit-depth, namely d (D). Also as with the clipping operation, in the d-bit-only method, the bit-depth is assumed to be d, so prediction operation 275 of method 200 does not need to learn the bit-depth, d, from bit- stream 2 and to configure itself accordingly; on the other hand, prediction operation 275' of method 200' needs to learn the bit-depth, D or less, from bit-stream 2 by tapping off of bit-depth value 192' to determine which version of the prediction operation should be employed.
[0066] If prediction operation 275 (275') requires a default input to take the place of missing data, pre-determined d-bit (D-bit) input 271 (271') is provided to the prediction operation. This is conventionally a neutral value, such as 2^(d-1) (2^(D-1)) as shown in Figure 2a (2b), which is termed "mid-grey" (after the resulting colour of a pixel whose luma and chroma samples are assigned this value). Given what has been said already about prediction operation 275 (275'), the default value is generated differently in methods 200 and 200'. In d-bit-only method 200, the default value is generated internally by decoder 20, with no need to learn the bit-depth from bit-stream 2; again, all processed videos are assumed to be d-bit. In variable-bit-depth method 200', the default value is generated by decoder 20' based on bit-depth value 192'.
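The mid-grey default is simply half the representable range; a one-line sketch (name is illustrative):

```python
def mid_grey(bit_depth):
    """Neutral default prediction value, 2**(bit_depth - 1)."""
    return 1 << (bit_depth - 1)

# For d = 8 this gives 128; for D = 10 it gives 512.
```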
[0067] The output of method 200 (200'), namely d-bit (D-bit) reconstructed samples 293
(293'), for the current cycle is identical to the reference samples stored in picture buffer 265 (265') for the next cycle.
[0068] Figures 2a and 2b depict nominal situations, in which a decoder is designed to handle videos of the type received in an efficient manner, in that source samples and intermediate samples can be processed without wasted memory (due to D - d "overage" bits being stored in a second d-bit word) or wasted time and power (due to packing and unpacking to avoid wasted memory). Figures 3a through 7c, on the other hand, depict decoders that are designed for the smaller bit-depth d, but receive a video of bit-depth D, where D > d, and provide some type of accommodation for that mismatch - the alternative is to fail to reconstruct a video at all. Each depiction reuses features of Figures 2a and/or 2b with unchanged feature numerals or with altered feature numerals, depending on whether the feature is unchanged or altered. Features that are changed by design are highlighted with bold lines and boldface text. Data values that change as a consequence of design changes are not so highlighted. In particular, each modified decoding method, generically denoted 200Z, produces a slightly different reconstructed video, denoted generically 293Z and denoted specifically with a distinct lower-case letter to match that for the method's feature numeral.
[0069] Figure 2c shows a block diagram depicting the impediments to employing a d-bit-only decoder to decode coefficients corresponding to source samples having an original bit-depth of D, where D > d. More specifically, inoperable method 200* represents what happens when known method 200' of Figure 2b is confronted with coefficients for a D-bit video. Figure 2c is a hybrid of Figures 2a and 2b, as will be explained presently.
[0070] The first major portion, residual process 210', is shown as being identical to that in method 200' of Figure 2b (rather than method 200 of Figure 2a, as might seem logical for a d-bit decoder) because even 8-bit-only decoders are already capable of performing inverse-quantization and inverse-transformation operations 214' and 217', respectively, with bit-depth as high as 16 (double the native word-size of 8). This is because it is advantageous to preserve more bits of precision - in the transform domain - for the output of the DCT than for its input - in the pixel domain. (A cosine function applied to a non-zero integer input results in a non-terminating floating point output; the discrete cosine transform can be designed to produce any desired level of precision.)
[0071] Thus, the output of the first major portion of method 200* is (D + 1)-bit residual samples 219'. The first impediment comes from the fact that the second major portion, prediction process 290*, is designed for d-bit samples only. This is indicated by a large 'X' breaking the data from flowing from residual process 210' to prediction process 290*.
[0072] However, prediction process 290* is not depicted as being identical to prediction process 290 in method 200 of Figure 2a. This is because of optional SAO 250*. If bit-stream 2 contains an indication (not shown) that a sample-adaptive offset is to be applied to the output of filter 240 in the inter-prediction loop, the bit-stream will also contain SAO table 195' - which is input to SAO 250* - specifically designed for the bit-depth of the current video. (We will consider the case when that bit-depth is D, but it could be less than D; any bit-depth greater than d will be problematic.) In other words, regardless of the bit-depth for which the decoder is designed prior to its first decoding work, at the time of decoding any particular D-bit video, bit-stream 2 effectively causes SAO 250* to be configured so that it is compatible with D-bit samples. Thus, in the hypothetical context of Figure 2c (wherein d-bit data is flowing through prediction process 290*), SAO 250* cannot function correctly for videos of bit-depth greater than d. This is because SAO 250* is configured to use a lookup table that requires values to have a specific bit-depth to successfully perform as intended by the D-bit encoder that encoded the video. In reference to the HEVC standard, no output for SAO 250* is defined under these circumstances, wherein the SAO is configured for D-bit input, but is fed d-bit input. This is indicated by shading of inoperable SAO 250* and by a large 'X' breaking the data from flowing out of the SAO (as it would be spurious). This is a second impediment that must be solved, even if the first one is overcome, in the cases when bit-stream 2 indicates an SAO operation is to be performed.
[0073] The upshot of this is that there is no well-defined output of inoperable method
200*, as indicated by a large 'X' beside the hypothetical d-bit data line exiting prediction process 290*. The embodiments disclosed herein provide functioning solutions to the problems depicted in Figure 2c. Each of the disclosed solutions (methods 200c through 200m, described in reference to Figures 4a through 7c) as well as one known solution (method 200b, described in reference to Figure 3a) is depicted in reference to Figure 2c; that is, features that have changed from Figure 2c are shown with heavy lines and bold text.
[0074] Figures 3a and 3b depict two known solutions, though they are not both directed to the problem of bit-depth overage.
[0075] Figure 3a shows a block diagram detailing a known method 200a of using a D-bit decoder to decode coefficients 194', corresponding to source samples 12' having an original bit-depth D, as reconstructed samples 293a having the same original bit-depth D; as such, this method does not solve the bit-depth overage problem addressed by the present disclosure. Its goal is merely to ease the computation/storage burden posed by having to store one or more entire D-bit pictures at a time in picture buffer 265' in method 200' of Figure 2b for inter-prediction. (Line buffer 235' is much smaller, so full-precision use of it does not incur as big a computation/storage penalty.) Instead, d-bit picture buffer 265, identical to that in Figure 2a, is used in its place. This is made possible by added steps 264 and 266. Precision of the reference samples destined for it is reduced from D bits to d bits at step 264. A simple implementation is to shift each sample two bits right, effectively truncating the two least significant bits; this is mathematically equivalent to integer division by 2^(D-d) (in which remainders are lost). Picture buffer 265 can then handle each sample as it would for a d-bit video. So that prediction operation 275 will produce D-bit prediction samples 279' that combine appropriately with (D + 1)-bit residual samples 219', precision of the reference samples destined for the prediction operation is increased from d bits to D bits at step 266, after the reference samples are moved from picture buffer 265. A simple implementation is to shift each sample two bits left, effectively padding the sample with two new least-significant bits, each equal to zero; this is mathematically equivalent to multiplication by 2^(D-d).
The net result of the precision decrease before and the precision increase after storage in picture buffer 265 is a preservation of precision, but a loss of accuracy: the two least-significant bits of the D bits are cleared (i.e., set to zero), so that each sample is effectively rounded down to the nearest multiple of 2^(D-d). Consequently, D-bit source videos reconstructed as D-bit videos according to method 200a will suffer in visual fidelity compared to those produced by a true D-bit decoder according to method 200' of Figure 2b.
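The precision reduction at step 264 and restoration at step 266 amount to a right shift and a left shift; a minimal sketch (function names are illustrative):

```python
def reduce_precision(sample, shift):
    """Step 264: drop the 'shift' least-significant bits
    (integer division by 2**shift; remainders are lost)."""
    return sample >> shift

def restore_precision(sample, shift):
    """Step 266: pad with 'shift' zero least-significant bits
    (multiplication by 2**shift)."""
    return sample << shift

# Round trip for D = 10, d = 8 (shift = 2): 1023 -> 255 -> 1020,
# i.e. the sample is rounded down to a multiple of 2**(D - d).
```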
[0076] Figure 3b shows a block diagram detailing known method 200b of decoding coefficients 194', corresponding to source samples having an original bit-depth D, as reconstructed samples having the lower bit-depth d, by (a) maintaining the same precision as in the conventional method of Figure 2b throughout all computations in prediction process 290b - depicted by heavy data-flow lines - and (b) rounding and/or truncating the resulting D-bit reconstructed samples to bit-depth d at step 291; input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced. To maintain precision, (larger) picture buffer 265" and (smaller) line buffer 235" must now be "doubled up" by some means (depicted by doubled outlines), generally by using more memory; for 8-bit decoders based on 8-bit words (and no compact packing and unpacking of samples), each buffering step for a 10-bit video will take twice the storage as for an 8-bit video.
[0077] The skilled person is aware of certain practicalities in respect of the rounding-and/or-truncation operations mentioned in reference to Figure 3b or to any figure depicting a novel embodiment disclosed herein. First, a rounding-and/or-truncation operation comprises one or both of rounding and truncating. Second, when both are employed, it is advantageous to have the rounding operation followed by a truncation operation, rather than vice versa, for generating a more accurate prediction. Third, truncating can be implemented via a right-shift operation. Fourth, rounding can be one of many different types, including: (a) rounding toward zero, (b) rounding toward negative infinity, (c) rounding toward positive infinity, (d) rounding half values toward zero, (e) rounding half values toward negative infinity, (f) rounding half values toward positive infinity, (g) rounding half values toward the nearest even value, (h) rounding half values toward the nearest odd value, (i) stochastic rounding, (j) spatial dithering, and (k) spatial dithering in combination with any one of (a) through (h). Fifth, of these enumerated types, type (f) is favoured by the HEVC standard.
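As an illustration of the second and third points, rounding of type (f) (half values toward positive infinity) followed by truncation can be sketched as adding half the step size before the right shift; this is an assumed implementation, not text from any standard:

```python
def round_then_truncate(x, shift):
    """Round half values toward positive infinity, then truncate by
    a right shift of 'shift' bits (rounding precedes truncation)."""
    return (x + (1 << (shift - 1))) >> shift

def truncate_only(x, shift):
    """Plain truncation for comparison: the low bits are discarded."""
    return x >> shift

# round_then_truncate(5, 1) == 3 (2.5 rounds up), whereas
# truncate_only(5, 1) == 2 (the fractional half is simply dropped).
```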
[0078] Turning now to the novel embodiments disclosed herein, they will first be compared to conventional methods in respect of how optional SAO is handled.
[0079] In both of methods 200a and 200b of Figures 3a and 3b, respectively, if the bit-stream contains an indication that a sample-adaptive offset (SAO) is to be applied to D-bit intermediate samples for inter prediction, that SAO can be applied exactly as intended, without modification, as in method 200' of Figure 2b. This is because SAO 250' is configured, by the input of SAO table 195', for D-bit input and is fed D-bit, filtered, reference samples. As mentioned above, in inoperable method 200* of Figure 2c, SAO 250* is likewise configured, by the input of SAO table 195', for D-bit input, but is fed d-bit input.
[0080] In the case of all novel embodiments disclosed herein except the last one (described later in reference to Figure 7c), the corresponding input is, instead, d-bit, filtered, reference samples. SAO makes use of a predetermined lookup table designed for the specific bit-depth of the video, regardless of what bit-depth the decoder prefers. Moreover, even a d-bit-only decoder must input the table from bit-stream 2. (Contrast this with a d-bit-only decoder's assumption that certain operations will necessarily be used in their d-bit incarnations.) For a D-bit video, the encoder assumes the decoder will be a D-bit decoder; thus, the SAO table stored in bit-stream 2 is designed based on the assumption that D-bit samples will be used by the decoder to index into the table. However, in the novel embodiments mentioned, d-bit samples input to SAO 250c will lack D - d bits of data needed to perform a lookup. Therefore, in those methods providing d-bit input to SAO 250c, the SAO must be configured for use with d-bit intermediate samples. Otherwise, its output (not defined in the HEVC standard under such circumstances) will do serious, irreparable harm to the fidelity of the reconstructed video.
[0081] This can be done in several ways, which are not depicted in the figures. One approach is to configure SAO 250c to rescale d-bit intermediate samples to bit-depth D and then use the rescaled samples to perform lookups in a table configured for use with D-bit intermediate samples; the rescaling can be done in any known fashion, such as by padding (as in step 266 of Figure 3a). Another implementation is to configure SAO 250c to perform lookups in a replacement table, which is itself configured for use with d-bit intermediate samples. (For method 200l, described later in reference to Figure 7c, SAO 250' is exactly as it is in conventional method 200' of Figure 2b.)
[0082] In the case of all novel embodiments disclosed herein, bit-stream unpacking operation 21 must be conducted with reference to the bit-depth signalled in bit-stream 2 due to the entropy-coding method employed in the packing of certain categories of symbol, wherein the bit-depth value may control the binarization process that converts the symbol into a string of
binary digits. For example, in the case of a unary code, the symbol 0 would be represented by the string "0", the symbol 1 as "10", the symbol 2 as "110", etc. However, if it is known that there are only three symbols in the alphabet, the terminating 0 in the representation of symbol 2 is redundant given a priori knowledge of the alphabet by the decoder. In such cases, a truncated unary code may be employed that represents symbol 2 as "11". In the case of HEVC, some parameters, in particular for SAO, are binarized in this manner, where the size of the alphabet for SAO offset values is determined by bit-depth value 192.
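The unary and truncated-unary codes described above can be sketched as follows (illustrative helper names, not taken from the HEVC reference software):

```python
def unary(symbol):
    """Unary code: symbol k is k ones followed by a terminating zero."""
    return "1" * symbol + "0"

def truncated_unary(symbol, alphabet_size):
    """Truncated unary code: the largest symbol in the alphabet drops
    the terminating zero, since the decoder knows the alphabet size."""
    if symbol == alphabet_size - 1:
        return "1" * symbol
    return unary(symbol)

# unary(0) == "0", unary(1) == "10", unary(2) == "110";
# with a three-symbol alphabet, truncated_unary(2, 3) == "11".
```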
[0083] Details - other than regarding SAO - will now be described for novel decoders in reference to Figures 4a through 7c.
[0084] Method 200e and methods 200g through 200m all incorporate rounding-and/or-truncation operations and, as previously disclosed, the rounding method can be one of many types. Careful choice as to the particular rounding method is necessary, since, in the absence of a closed-loop system that accounts for the effects of the exact rounding-and/or-truncation operation, different rounding methods will introduce different types of error. For instance, the relatively straightforward method of rounding half values toward positive infinity is not only asymmetric for positive and negative numbers, but contains a systematic bias that, when influencing the reconstruction loop, will result in a gain greater than one. The preferred method of rounding in these methods is rounding half values toward the nearest even value, also known as bankers' rounding, which is unbiased for both positive and negative numbers, for sufficiently well distributed values; a value n can be rounded with respect to the least significant D - d bits by replacing n with (n + 1 + ((n >> (D - d)) & 1)) >> (D - d), where ">>" is the right-shift operator and "&" is the bitwise AND operator.
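A general form of this bankers'-rounding reduction can be sketched as follows; the additive constant is written as (1 << (shift - 1)) - 1, which equals the constant 1 appearing in the expression above when D - d = 2:

```python
def round_half_even(n, shift):
    """Reduce a non-negative integer n by 'shift' bits, rounding half
    values to the nearest even result (bankers' rounding)."""
    return (n + ((1 << (shift - 1)) - 1) + ((n >> shift) & 1)) >> shift

# With shift = 2 (D = 10, d = 8): 10 -> 2 (2.5 rounds to even 2)
# and 14 -> 4 (3.5 rounds to even 4), so half values do not drift
# consistently upward or downward.
```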
[0085] Figure 8a shows a flowchart depicting, at a high level, all of methods 200c through
200g, which are detailed in block-diagram format in Figures 4a through 5b, respectively. These five embodiments have in common that they each comprise a residual process 210X, including inverse-quantization and inverse-transformation processes, that computes (d + 1)-bit residual samples. Each of the five embodiments will have a distinct residual process 210X, but a common prediction
process 290c.
[0086] More particularly, methods 200c through 200e of Figures 4a through 4c, respectively, have in common that they each depict an embodiment in which inverse-transformation process 216' comprises an inverse-transformation operation for use in reconstructing D-bit samples (exactly as in Figure 2b) and in which conventional inverse-quantization process 211' of method 200' is modified so that its output, when processed by conventional inverse-transformation operation 217', results in (d + 1)-bit residual samples, rather than (D + 1)-bit samples, as it does in the case of conventional method 200' of Figure 2b. There are three different ways to modify conventional inverse-quantization process 211' of method 200', reflected in methods 200c, 200d, and 200e, which produce slightly different reconstructed d-bit samples 293c, 293d, and 293e, respectively.
[0087] As shown in Figure 4a, the modification in inverse-quantization process 211c comprises applying an offset based on bit-depth d - rather than bit-depth D - to QP 193'. Rather than using input 192' to generate an offset of 6 * (D - 8) as at step 212' of Figure 2c, that input is suppressed or ignored. Instead, the offset 6 * (d - 8) is generated locally (without regard to the bit-depth, D, indicated in bit-stream 2) at step 212c. By adding this smaller offset (which will equal zero when d = 8), rather than the larger 6 * (D - 8), to the QP at addition operation 213, subsequent inverse-quantization operation 214', which uses the offset QP from addition operation 213, produces dequantized samples such that conventional inverse-transformation operation 217' produces (d + 1)-bit residual samples 219c.
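The contrast between conventional QP-offsetting step 212' and the locally generated offset of step 212c can be sketched as follows (an illustration of the offset computation only, not of inverse-quantization operation 214'; the function names are hypothetical):

```python
def qp_offset_conventional(D):
    # Step 212' of Figure 2c: offset derived from the bit-depth D
    # signalled in the bit-stream (input 192').
    return 6 * (D - 8)

def qp_offset_method_200c(d):
    # Step 212c of Figure 4a: offset derived locally from the target
    # bit-depth d, ignoring the signalled bit-depth D.
    return 6 * (d - 8)

# Reconstructing at d = 8 a stream encoded from D = 10 source samples:
assert qp_offset_conventional(10) == 12
assert qp_offset_method_200c(8) == 0   # the offset vanishes when d = 8
```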
[0088] Of the various methods disclosed herein, method 200c of Figure 4a is the simplest to implement (since the minor change at step 212c merely ignores the value of the bit-depth of the source samples), but it is the least faithful in reconstructing videos. One reason is that no attempt is made to address the increased range of QP values that are potentially present in a bit-stream intended for D-bit decoders (to accommodate the greater fidelity of D-bit systems, the valid range of QP values increases with higher bit-depths); as such, method 200c is applicable only in systems that do not use the extended QP range (this would be typical of bit-streams that target low bit-rate
applications such as video streaming). Another reason is that the lower precision is used from the very beginning of the reconstruction process, whereas in the other disclosed methods, at least some computations are done with the higher precision. Specifically, 8-bit decodings of 10-bit source videos, performed according to method 200c, exhibit visually observable drift with the following traits. First, intra prediction suffers DC drift from block to subsequent block, which increases towards the bottom right. Second, this drift is more noticeable in colour and in saturation than in luminance. Third, inter prediction increases the drift from picture to subsequent picture. Fourth, the distortion introduced by the drift can exceed 11 dB in unfavourable conditions. Finally, the distortion is worse for smaller QP values.
[0089] As shown in Figure 4b, the modification in inverse-quantization process 211d comprises modifying inverse-quantization operation 214d so that it produces inverse-quantized coefficients of the same magnitude as are produced by inverse-quantization operation 214 in Figure 2a, despite using a QP offset exactly as in Figure 2c. The dequantized samples produced by inverse-quantization operation 214d are such that conventional inverse-transformation operation 217' produces (d + 1)-bit residual samples 219d.
[0090] As shown in Figure 4c, the modification in inverse-quantization process 211e comprises adding a new step. QP-offsetting steps 212', 213 and inverse-quantization operation 214' are configured for use in reconstructing D-bit samples exactly as in Figure 2c. In order to provide dequantized samples to conventional inverse-transformation operation 217' for it to produce, in turn, (d + 1)-bit residual samples 219e, the intermediate samples output by inverse-quantization operation 214' are rescaled at step 215. As shown in Figure 4c, this may be done by a rounding-and/or-truncating operation. In practice, rescaling step 215 will reduce the magnitude of intermediate samples (flowing from inverse-quantization operation 214' to inverse-transformation operation 217') by D - d bits (even though the bit-depth of samples at this stage is greater than D); input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits the bit-depth must be reduced.
[0091] On the other hand, methods 200f and 200g of Figures 5a and 5b, respectively,
have in common that they each depict an embodiment in which entire inverse-quantization process 211' is exactly as in Figure 2c and inverse-transformation process 216' of Figure 2c is modified. In particular, inverse-quantization process 211' comprises inverse-quantization operation 214' for use in reconstructing D-bit samples. Conventional inverse-transformation process 216' of method 200' is modified so as to produce (d + 1)-bit residual samples, given the output of conventional inverse-quantization process 211'. There are two ways to modify conventional inverse-transformation process 216' of Figure 2c, reflected in methods 200f and 200g, which produce slightly different reconstructed d-bit samples 293f and 293g, respectively.
[0092] As shown in Figure 5a, the modification in inverse-transformation process 216f comprises replacing conventional inverse-transformation operation 217' of Figure 2c with inverse-transformation operation 217f, which is configured to produce (d + 1)-bit residual samples 219f, given the output of conventional inverse-quantization process 211'.
[0093] As shown in Figure 5b, the modification in inverse-transformation process 216g comprises (a) retaining conventional inverse-transformation operation 217' configured, exactly as in Figure 2c, to produce (D + 1)-bit intermediate samples, given the output of conventional inverse-quantization process 211', and (b) adding subsequent rounding-and/or-truncating process 218 that, given said (D + 1)-bit intermediate samples, produces (d + 1)-bit residual samples 219g; input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced.
[0094] Figure 8b shows a flowchart depicting, at a high level, all of methods 200h through
200m, which are detailed in block-diagram format in Figures 6a through 7c, respectively. These six embodiments have in common that they each comprise residual process 210', including inverse-quantization process 211' and inverse-transformation process 216', which computes (D + 1)-bit residual samples 219', exactly as in Figure 2c. In each of these methods, conventional prediction process 290' of Figure 2c is modified to include a rounding-and/or-truncating operation (in addition to conventional clip3 operation 225 or 225'), which reduces a (D + h)-bit input to a (d + h)-bit output, where h equals 2 if the rounding-and/or-truncating operation occurs before clip3
operation 225 (as in Figures 6a through 6c) and equals 0 if the rounding-and/or-truncating operation occurs after clip3 operation 225' (as in Figures 7a through 7c). All six of these embodiments will have a distinct prediction process 290Y.
[0095] More particularly, methods 200h, 200i, and 200m of Figures 6a through 6c, respectively, have in common that they each depict an embodiment in which the aforementioned rounding-and/or-truncating operation is performed for both inter-prediction and intra-prediction modes. In each of these figures, round-and/or-truncate operation 224 reduces the bit-depth of intermediate values from D + 2 to d + 2 prior to clip3 operation 225; input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced. It is advantageous to perform rounding-and/or-truncating operation 224 followed by clip3 operation 225, rather than vice versa, as doing so generates a more accurate prediction.
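A brief sketch of why rounding before clipping differs from the reverse order; the clipping bounds, the shift of D − d = 2 bits, and the function names here are illustrative assumptions, not the exact figures' parameters:

```python
def clip3(lo, hi, x):
    # Clip3-style operation: constrain x to the range [lo, hi].
    return max(lo, min(hi, x))

def round_and_shift(n, shift):
    # Rounding-and/or-truncating by 'shift' bits, rounding half
    # values toward the nearest even value (bankers' rounding).
    return (n + (1 << (shift - 1)) - 1 + ((n >> shift) & 1)) >> shift

x = 1022  # an intermediate value near the top of the 10-bit range
# Round-then-clip (as in Figures 6a-6c): result stays within 8 bits.
a = clip3(0, 255, round_and_shift(x, 2))
# Clip-then-round: clipping to the 10-bit range first does not prevent
# the subsequent rounding from overshooting the 8-bit range.
b = round_and_shift(clip3(0, 1023, x), 2)
assert a == 255 and b == 256  # only round-then-clip stays in range
```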
[0096] As a result of operation 230, the reference samples sent both to line buffer 235 for intra prediction and (perhaps via sample-adaptive offset 250c) to reference buffer 265 for inter prediction are d-bit samples. However, to be combined with (D + 1)-bit residual samples 219' at addition operation 220, prediction samples must have D bits. The basic idea is to rescale d-bit intermediate samples to D-bit samples prior to addition operation 220. As with comparable step 266 in method 200a of Figure 3a, a simple implementation is to shift each sample two bits left, effectively padding the sample with two least significant bits, each equal to zero; this is
mathematically equivalent to multiplication by 2^(D - d).
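The padding step described above can be sketched as follows (illustrative, with D = 10 and d = 8; the function name is hypothetical):

```python
def pad_to_D_bits(sample, D, d):
    # Left-shift each d-bit sample by D - d bits, padding with
    # zero-valued least significant bits.
    return sample << (D - d)

# Equivalent to multiplication by 2^(D - d):
assert pad_to_D_bits(200, 10, 8) == 800 == 200 * 2 ** (10 - 8)
assert pad_to_D_bits(255, 10, 8) == 1020  # d-bit max stays within D bits
```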
[0097] In particular, the padding can be performed before or after a prediction operation is performed. Furthermore, either solution can be applied independently for intra-prediction and for inter-prediction. This results in four ways to pad, three of which are shown in Figures 6a to 6c as methods 200h, 200i, and 200m, respectively, with different outputs 293h, 293i, and 293m, respectively. In method 200h, depicted in Figure 6a, padding operation 276 follows prediction for each prediction mode, symbolized by generic prediction operation 275. In method 200i, depicted in Figure 6b, padding operation 274 precedes generic prediction operation 275; the reconstructed d-bit samples 293i output by method 200i have better fidelity than corresponding output 293h from method 200h,
because prediction performed by prediction operation 275 will be more accurate if it operates on higher-precision reference samples (even though the extra precision is D - d zero bits). Method 200m of Figure 6c depicts an embodiment that combines aspects of Figures 6a and 6b. Padding 274 rescales d-bit intermediate samples to produce D-bit input to intra-prediction operation 275m, and padding 276 rescales the d-bit output of inter-prediction operation 275n to produce D-bit prediction samples. Generic (i.e., either intra- or inter-) prediction samples 279m will match prediction samples 279h of method 200h in inter-prediction mode and will match prediction samples 279i of method 200i in intra-prediction mode. The skilled person will realize from what has already been disclosed that a fourth variation can be obtained by switching the pre-prediction and post-prediction padding between the two prediction modes.
[0098] Method 200m of Figure 6c represents a reasonable balance between, at one extreme, method 200c of Figure 4a - which performs all operations as would be done for a d-bit video, thereby losing precision from the beginning (which results in drift) - and, at the other extreme, known method 200b - which performs all operations as would be done conventionally for a D-bit video, thereby forcing the decoding device (with limited resources) to perform roughly double the work it would perform to reconstruct d-bit videos encoded from d-bit source samples. Testing of method 200m has verified the advantages of this particular mix of d-bit and D-bit operations, in which the more computationally expensive inter-prediction operation 275n is performed on d-bit reference samples. Objective analysis shows a significant reduction in distortion compared to method 200c. Visual inspection reveals that the DC drift observed using method 200c is not apparent with method 200m.
[0099] Methods 200j through 200l of Figures 7a through 7c, respectively, have in common that they each depict an embodiment in which the rounding-and/or-truncating operation is performed only for intra-prediction mode. In each of these figures, line buffer 235" must do "double duty" as in methods 200a and 200b of Figures 3a and 3b, respectively, to handle D-bit reference samples. The complete intra-prediction loop is shown all the way to the inputting of intra-prediction samples 279j to addition operation 220, but the inter-prediction loop is not shown in detail beyond reference-picture buffer 265; inter prediction of the next picture at step 269 is a
placeholder for the two different endings of the inter-prediction loop for each of methods 200j through 200l. The skilled person will realize from what has already been disclosed that (a) the d-bit output from buffer 265 is inadequate for producing prediction samples compatible with addition operation 220 without a rescaling to D bits somewhere along the path from that buffer, (b) padding by D - d bits can be performed either before (as with operation 274 in method 200i) or after (as with operation 276 in method 200h) an inter-prediction operation, and (c) therefore there are two variants of each of methods 200j through 200l.
[00100] The three (incomplete) methods, 200j through 200l, depicted in Figures 7a through 7c correspond to three different placements of a rounding-and/or-truncating operation; each placement results in slightly different reconstructed d-bit samples 293j through 293l, respectively. In method 200j of Figure 7a, rounding-and/or-truncating operation 281 applies prior to in-loop filtering operation 240 (for d-bit samples). In method 200k of Figure 7b, rounding-and/or-truncating operation 282 applies after in-loop filtering operation 240' (for D-bit samples) and prior to modified optional SAO 250c (for d-bit samples). In method 200l of Figure 7c, rounding-and/or-truncating operation 283 applies after conventional optional SAO 250' (for D-bit samples); its d-bit output is routed both to output 293l and to reference-picture buffer 265 for future prediction operations. In the case of these three embodiments, input of bit-depth value 192', which has the value D, is needed in order to determine by how many bits (D - d) the bit-depth must be reduced.
[00101] Certain adaptations of the described embodiments can be made. Therefore, the above-discussed embodiments are considered to be illustrative and not restrictive. Moreover, certain of the embodiments might not lend themselves to implementation in some systems, depending on which operations within the decoder are implemented in hardware. It is anticipated that the novel embodiments disclosed herein could be adapted for use in 3-D video systems or multi-view video systems.
Claims
1. A method of reconstructing d-bit samples from coefficients, in an encoded video bit-stream, encoded based on source samples having a bit-depth D, greater than d, the method comprising: applying to the coefficients a residual process, including inverse-quantization and inverse-transformation processes, to compute (d + 1)-bit residual samples; and
applying to the residual samples a prediction process, including clipping intermediate samples, to compute d-bit reconstructed samples.
2. The method of claim 1, wherein the inverse-transformation process comprises an inverse-transformation operation for use in reconstructing D-bit samples, and wherein the inverse-quantization process is configured so that its output, when processed by the inverse-transformation operation, results in (d + 1)-bit residual samples.
3. The method of claim 2, wherein the inverse-quantization process comprises applying to a quantization parameter, input to the inverse-quantization process, an offset based on bit-depth d, rather than bit-depth D.
4. The method of claim 2, wherein the inverse-quantization process comprises an inverse-quantization operation configured for use in reconstructing d-bit samples.
5. The method of claim 2, wherein the inverse-quantization process comprises:
applying an inverse-quantization operation configured for use in reconstructing D-bit samples; and
rescaling intermediate samples output by the inverse-quantization operation.
6. The method of claim 1, wherein the inverse-quantization process comprises an inverse-quantization operation for use in reconstructing D-bit samples, and wherein the inverse-transformation process is configured to produce (d + 1)-bit residual samples, given the output of the inverse-quantization operation.
7. The method of claim 6, wherein the inverse-transformation process comprises an inverse-transformation operation configured to produce (d + 1)-bit residual samples, given the output of the inverse-quantization operation.
8. The method of claim 6, wherein the inverse-transformation process comprises:
an inverse-transformation operation configured to produce (D + 1)-bit intermediate samples, given the output of the inverse-quantization operation; and
a rounding-and/or-truncating process that, given said (D + 1)-bit intermediate samples, produces (d + 1)-bit residual samples.
9. The method of claim 8, wherein the rounding-and/or-truncating process comprises:
a rounding operation; and
a right-shift operation subsequent to the rounding operation.
10. The method of claim 9, wherein the rounding operation comprises one of: (a) rounding toward zero, (b) rounding toward negative infinity, (c) rounding toward positive infinity, (d) rounding half values toward zero, (e) rounding half values toward negative infinity, (f) rounding half values toward positive infinity, (g) rounding half values toward the nearest even value, (h) rounding half values toward the nearest odd value, (i) stochastic rounding, (j) spatial dithering, and (k) spatial dithering in combination with any one of (a) through (h).
11. The method of any one of claims 1 to 10, wherein the prediction process comprises: if the bit-stream contains an indication that a sample-adaptive offset (SAO) is to be applied to D-bit
intermediate samples for inter prediction, applying instead an SAO configured for d-bit intermediate samples.
12. The method of claim 11, wherein the applied SAO is configured to:
rescale d-bit intermediate samples to bit-depth D; and
use the rescaled samples to perform lookups in a table configured for use with D-bit intermediate samples.
13. The method of claim 11, wherein the applied SAO is configured to perform lookups in a table configured for use with d-bit intermediate samples.
14. A computing device comprising:
one or more processors; and
a memory containing processor-executable instructions that, when executed by the one or more processors, cause the device to perform the method of any one of claims 1 to 13.
15. A non-transitory, processor-readable storage medium storing processor-executable instructions that, when executed by one or more processors of a computing device, cause the device to perform the method of any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CA2013/000644 WO2014165960A1 (en) | 2013-04-08 | 2013-04-08 | Methods for reconstructing an encoded video at a bit-depth lower than at which it was encoded |
EP13881580.8A EP2984842A1 (en) | 2013-04-08 | 2013-04-08 | Methods for reconstructing an encoded video at a bit-depth lower than at which it was encoded |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014165960A1 (en) | 2014-10-16 |
Also Published As
Publication number | Publication date |
---|---|
EP2984842A1 (en) | 2016-02-17 |