WO2006043753A1 - Method and apparatus for predecoding a hybrid bitstream - Google Patents
- Publication number
- WO2006043753A1 (PCT/KR2005/003030)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rate
- bit
- upper layer
- layer bitstream
- bitstream
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/162—User input
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/164—Feedback from the receiver or from the transmission channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
- H04N19/64—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
- H04N19/647—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission using significance based coding, e.g. Embedded Zerotrees of Wavelets [EZW] or Set Partitioning in Hierarchical Trees [SPIHT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/115—Selection of the code volume for a coding unit prior to coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
Definitions
- Apparatuses and methods consistent with the present invention relate to a multi-layer video coding technique, and more particularly, to predecoding a hybrid bitstream generated by a plurality of coding schemes.
- Multimedia data containing a variety of information, including text, pictures, music and the like, has been increasingly provided.
- Multimedia data is usually voluminous, so it requires a storage medium having a large capacity.
- In addition, a wide bandwidth is required for transmitting the multimedia data. For example, a picture in 24-bit true color having a resolution of 640x480 needs a capacity of 640x480x24 bits per frame, namely, approximately 7.37 Mbits.
- A bandwidth of approximately 221 Mbits per second is needed to transmit this data at 30 frames/second, and a storage space of approximately 1200 Gbits is needed to store a movie 90 minutes long.
- Accordingly, it is necessary to use a compressed coding scheme when transmitting multimedia data including text, pictures, or sound.
- Data redundancy implies three types of redundancy: spatial, temporal, and perceptual-visual redundancy.
- Spatial redundancy refers to duplication of identical colors or objects in an image.
- Temporal redundancy refers to little or no variation between adjacent frames in a moving picture, or successive repetition of the same sounds in audio.
- Perceptual-visual redundancy refers to the dullness of human vision and sensation to high frequencies.
- FIG. 1 shows an environment in which video compression is applied.
- Original video data is compressed by a video encoder 1.
- Video decoder 3 decodes the compressed video data to reconstruct original video data.
- The video encoder 1 compresses the original video data so as not to exceed the available bandwidth of the network 2, in order for the video decoder 3 to decode the compressed data.
- communication bandwidth may vary depending on the type of the network 2.
- the available communication bandwidth of an Ethernet is different from that of a wireless local area network (WLAN).
- A cellular communication network may have a very narrow bandwidth.
- Scalable video coding is a video compression technique that allows video data to provide scalability. Scalability is the ability to generate video sequences at different resolutions, frame rates, and qualities from the same compressed bitstream.
- Temporal scalability can be provided using Motion Compensation Temporal filtering (MCTF), Unconstrained MCTF (UMCTF), or Successive Temporal Approximation and Referencing (STAR) algorithm. Spatial scalability can be achieved by a wavelet transform algorithm or multi-layer coding that has been actively studied in recent years.
- Signal-to-Noise Ratio (SNR) scalability can be provided by embedded quantization techniques such as Embedded ZeroTrees Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), Embedded ZeroBlock Coding (EZBC), and Embedded Block Coding with Optimized Truncation (EBCOT).
- Multi-layer video coding algorithms have recently been adopted for scalable video coding. While conventional multi-layer video coding usually uses a single video coding algorithm, increasing attention has been recently directed to multi-layer video coding using a plurality of video coding algorithms.
- FIGS. 2 and 3 illustrate the structures of bitstreams generated by conventional multi-layer video coding schemes.
- FIG. 2 illustrates a method of generating and arranging a plurality of Advanced Video Coding (AVC) layers at different resolutions, frame rates, and bit-rates.
- each layer is efficiently predicted and compressed using information from another layer.
- multiple AVC layers are encoded at different resolutions of QCIF to SD, different frame rates of 15 Hz to 60 Hz, and different bit-rates of 32 Kbps to 3.0 Mbps, thereby achieving a wide variety of visual qualities.
- The method shown in FIG. 2 may reduce redundancy to some extent through interlayer prediction, but suffers an increase in bitstream size because an AVC layer is generated for each visual quality.
- FIG. 3 shows an example of a bitstream including an AVC base layer and a wavelet enhancement layer.
- the wavelet enhancement layer has different resolutions from QCIF to SD because wavelet transform supports decomposition of an original image at various resolutions.
- the wavelet enhancement layer that is subjected to embedded quantization can also be encoded at bit-rates of 32 Kbps to 3.0 Mbps by arbitrarily truncating a bitstream from the tail.
- a hierarchical method such as MCTF is used for temporal transformation
- the structure shown in FIG. 3 can provide various frame rates from 15 Hz to 60 Hz.
- the use of only two layers can achieve various visual qualities but not provide high video coding performance at each visual quality.
- FIG. 4 is a graph illustrating Peak Signal-to-Noise Ratio (PSNR) with respect to a bit-rate for AVC and wavelet coding.
- wavelet coding exhibits high performance at high bit-rate or resolution while providing low performance at low bit-rate or resolution.
- AVC provides good performance at a low bit-rate.
- In a bitstream including two layers for each resolution (hereinafter referred to as an 'AVC-wavelet hybrid bitstream'), an upper layer ('wavelet layer') is encoded using wavelet coding at a specific resolution, while a lower layer ('AVC layer') is encoded using AVC coding at the same resolution.
- The AVC layer is used for a low bit-rate while the wavelet layer is used for a high bit-rate.
- Because the wavelet layer is quantized using embedded quantization, it can be encoded at various bit-rates by randomly truncating the bitstream from the tail.
- However, a bit-rate must be suitably allocated to the lower layer, i.e., the AVC layer, to ensure the minimum data rate necessary for the circumstances.
- In addition, a critical bit-rate B_C can be allocated to provide optimum performance of an AVC-wavelet hybrid bitstream.
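The "truncating from the tail" operation enabled by embedded quantization can be sketched as a plain prefix cut over the embedded-coded texture bytes. The function name and its parameters below are hypothetical, chosen only for illustration:

```python
def truncate_to_bitrate(texture: bytes, target_kbps: float, duration_s: float) -> bytes:
    """Keep only the prefix of an embedded-quantized texture stream that
    fits the target bit-rate; the rest is simply cut from the tail."""
    budget_bytes = int(target_kbps * 1000 * duration_s / 8)
    return texture[:budget_bytes]

stream = bytes(range(200))               # stand-in for embedded-coded texture
clip = truncate_to_bitrate(stream, target_kbps=0.8, duration_s=1.0)  # toy numbers
print(len(clip))                         # 100 bytes survive the cut
```

Because the embedded coder emits the most significant information first, the surviving prefix is still a decodable (coarser) description of the same frame.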
- FIG. 5 illustrates a multi-layer coding method using two different coding algorithms for each resolution.
- a video encoder uses both an AVC coding algorithm offering excellent coding efficiency and a wavelet coding technique providing excellent scalability.
- the bitstream shown in FIG. 3 has only two layers, i.e., wavelet layer and AVC layer
- the bitstream shown in FIG. 5 includes complex layers, i.e., a wavelet layer and an AVC layer for each resolution.
- the wavelet layer is not used for implementation of resolution scalability but is used for implementation of SNR scalability.
- MCTF or UMCTF may be used.
Disclosure of Invention
- Texture data in a wavelet layer bitstream containing both texture data and motion data can be truncated from the tail.
- However, the motion data must be either kept whole or truncated in its entirety, because the motion data is not scalable.
- the present invention provides a method and apparatus for efficiently adjusting a signal-to-noise ratio (SNR) scale in a bitstream including two layers encoded using two different coding algorithms.
- The present invention also provides a method and apparatus for adjusting an SNR scale considering texture data as well as motion data.
- According to an aspect of the present invention, there is provided a method for predecoding, according to a target bit-rate, a hybrid bitstream including a lower layer bitstream and an upper layer bitstream obtained by encoding a video with a predetermined resolution, the method comprising: obtaining, from the input hybrid bitstream, a first bit-rate for a boundary between the lower layer bitstream and the upper layer bitstream and a second bit-rate for a boundary between motion information and texture information of the upper layer bitstream; determining the target bit-rate according to variable network circumstances; and, when the target bit-rate is between the first and second bit-rates, skipping the motion information of the upper layer bitstream and truncating all bits of the texture information of the upper layer bitstream from the tail, except bits corresponding to the difference between the target bit-rate and the first bit-rate.
- According to another aspect of the present invention, there is provided a method for predecoding, according to a target bit-rate, a hybrid bitstream including a lower layer bitstream and an upper layer bitstream obtained by encoding a video with a predetermined resolution, the method comprising: obtaining, from the input hybrid bitstream, a first bit-rate for a boundary between the lower layer bitstream and the upper layer bitstream; determining the target bit-rate according to variable network circumstances; determining a critical bit-rate used to determine whether to skip motion information of the upper layer bitstream; and, when the target bit-rate is between the first bit-rate and the critical bit-rate, skipping the motion information of the upper layer bitstream and truncating all bits of the texture information of the upper layer bitstream from the tail, except bits corresponding to the difference between the target bit-rate and the first bit-rate.
- According to still another aspect of the present invention, there is provided an apparatus for predecoding, according to a target bit-rate, a hybrid bitstream including a lower layer bitstream and an upper layer bitstream obtained by encoding a video with a predetermined resolution, the apparatus including: a bitstream parser obtaining, from the input hybrid bitstream, a first bit-rate for a boundary between the lower layer bitstream and the upper layer bitstream and a second bit-rate for a boundary between motion information and texture information of the upper layer bitstream; a target bit-rate determiner determining the target bit-rate according to variable network circumstances; and a predecoding unit skipping the motion information of the upper layer bitstream and truncating all bits of the texture information of the upper layer bitstream from the tail, except bits corresponding to the difference between the target bit-rate and the first bit-rate, when the target bit-rate is between the first and second bit-rates.
- According to yet another aspect of the present invention, there is provided an apparatus for predecoding, according to a target bit-rate, a hybrid bitstream including a lower layer bitstream and an upper layer bitstream obtained by encoding a video with a predetermined resolution, the apparatus including: a bitstream parser obtaining, from the input hybrid bitstream, a first bit-rate for a boundary between the lower layer bitstream and the upper layer bitstream; a target bit-rate determiner determining the target bit-rate according to variable network circumstances; and a predecoding unit determining a critical bit-rate used to determine whether to skip motion information of the upper layer bitstream, and skipping the motion information of the upper layer bitstream and truncating all bits of the texture information of the upper layer bitstream from the tail, except bits corresponding to the difference between the target bit-rate and the first bit-rate, when the target bit-rate is between the first bit-rate and the critical bit-rate.
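The decision rule in the method above can be sketched as follows. The section names, the simplifying "1 byte of payload equals 1 kbps" modeling, and `predecode_first` are illustrative assumptions, not the patent's implementation:

```python
# Toy bitstream sections: in this sketch 1 byte == 1 kbps of rate.
avc = b"A" * 32    # lower (AVC) layer; first bit-rate B1 = 32
mv2 = b"M" * 8     # upper-layer motion info; second bit-rate B2 = 40
t2  = b"T" * 60    # upper-layer texture info (embedded-quantized)

def predecode_first(lower, mv, tex, b1, b2, target):
    """Between B1 and B2, drop the non-scalable motion info entirely
    and spend the whole remaining budget on texture bits instead."""
    if target >= b2:
        return lower + mv + tex[: target - b2]   # cut texture tail only
    if target > b1:
        return lower + tex[: target - b1]        # skip MV, refill with texture
    return lower                                 # only the AVC layer survives

print(len(predecode_first(avc, mv2, t2, 32, 40, 50)))   # motion info kept
print(len(predecode_first(avc, mv2, t2, 32, 40, 36)))   # motion info skipped
```

Note that below the first bit-rate the lower layer is returned whole, matching the requirement that a minimum AVC layer bitstream always be preserved.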
- FIG. 1 shows an environment in which video compression is applied
- FIG. 2 illustrates conventional multi-layer video coding using a single coding algorithm
- FIG. 3 illustrates conventional multi-layer video coding using two coding algorithms
- FIG. 4 is a graph illustrating Peak Signal-to-Noise Ratio (PSNR) with respect to a bit-rate for Advanced Video Coding (AVC) and wavelet coding;
- PSNR Peak Signal-to-Noise Ratio
- FIG. 5 illustrates conventional multi-layer video coding using two different coding algorithms for each resolution
- FIG. 6 illustrates the structure of a hybrid bitstream according to an exemplary embodiment of the present invention
- FIGS. 7-10 illustrate a predecoding method according to a first exemplary embodiment of the present invention
- FIGS. 11-14 illustrate a predecoding method according to a second exemplary embodiment of the present invention
- FIG. 15 is a block diagram of a video encoder according to an exemplary embodiment of the present invention.
- FIG. 16 is a block diagram of a predecoder according to an exemplary embodiment of the present invention.
- FIG. 17 is a block diagram of a video decoder according to an exemplary embodiment of the present invention.
- FIG. 18 is a detailed flowchart illustrating a predecoding process according to a first exemplary embodiment of the present invention.
- FIG. 19 is a detailed flowchart illustrating a predecoding process according to a second exemplary embodiment of the present invention.
- the present invention proposes a method for efficiently predecoding or truncating a bitstream including a first coding layer (lower layer) and a second coding layer (upper layer) (hereinafter called a 'hybrid bitstream') for each resolution according to a target bit-rate selected depending on variable network situations.
- The predecoding or truncation refers to a process of cutting off a portion of a bitstream according to a target bit-rate, so that the remaining portion can represent video data at various bit-rates.
- the hybrid bitstream can be generated for a plurality of resolutions, respectively, as shown in FIG. 5, or the hybrid bitstream can be generated in a combined manner to represent multi-resolution video data. For convenience of explanation, it will be assumed throughout this specification that a single hybrid bitstream is generated.
- The lower layer may be encoded using a video coding scheme providing good coding performance at a low bit-rate, such as Advanced Video Coding (AVC) or MPEG-4 coding, while the upper layer may be encoded using a video coding scheme offering high coding performance and signal-to-noise ratio (SNR) at a high bit-rate, such as a wavelet coding technique.
- the lower layer may have motion vectors with pixel accuracy equal to or lower than those of the upper layer. For example, lower layer motion vectors and upper layer motion vectors may be searched at 1 and 1/4 pixel accuracies, respectively. Of course, because redundancy is present between the lower layer motion vector and the upper layer motion vector, the upper layer motion vector in which the redundancy has been removed will be actually encoded.
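The redundancy removal between layer motion vectors might look as follows; the prediction scheme, units, and function name are hypothetical, assuming the upper-layer vector is predicted from the lower-layer vector rescaled to quarter-pel units:

```python
def encode_upper_mv(lower_mv, upper_mv):
    """Code the quarter-pel upper-layer motion vector as a small
    difference from the full-pel lower-layer vector (illustrative)."""
    # Rescale full-pel lower-layer vector to quarter-pel units.
    predicted = (lower_mv[0] * 4, lower_mv[1] * 4)
    return (upper_mv[0] - predicted[0], upper_mv[1] - predicted[1])

# Lower layer found (3, -1) at full-pel accuracy; the upper layer
# refined the same block to (13, -3) in quarter-pel units.
residual_mv = encode_upper_mv((3, -1), (13, -3))
print(residual_mv)    # only the small refinement needs to be coded
```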
- FIG. 6 illustrates the structure of a hybrid bitstream 10 according to an exemplary embodiment of the present invention.
- the hybrid bitstream 10 consists of an AVC layer bitstream 20 that is a lower layer bitstream and a wavelet layer bitstream 30 that is an upper layer bitstream.
- The AVC layer bitstream 20 contains first motion information MV1 21 and first texture information T1 22, and the wavelet layer bitstream 30 contains second motion information MV2 31 and second texture information T2 32.
- The process of generating the first motion information MV1 21, the first texture information T1 22, the second motion information MV2 31, and the second texture information T2 32 will be described in detail later with reference to FIG. 15.
- Although the second texture information T2 32 can be arbitrarily truncated from the tail according to a target bit-rate, no portion of the second motion information MV2 31 can be randomly truncated, because it is not scalable.
- The AVC layer bitstream 20 also cannot be randomly truncated, in order to ensure a minimum AVC layer bitstream regardless of a change in the target bit-rate.
- FIG. 6 shows bit-rates defined according to positions in the hybrid bitstream 10.
- The bit-rate of the AVC layer bitstream 20 required to provide a minimum data rate is defined as the lowest bit-rate B_L, and the bit-rate at the boundary between the second motion information MV2 31 and the second texture information T2 32 is defined as the boundary bit-rate B_B.
- That is, the lower layer bitstream in the hybrid bitstream 10 ends at the lowest bit-rate B_L, and the boundary between motion information and texture information in the upper layer bitstream lies at the boundary bit-rate B_B.
- A critical bit-rate B_C, indicated within the second texture information T2 32, refers to a bit-rate used to determine whether to skip motion information in the upper layer bitstream in the predecoding method according to the second exemplary embodiment of the present invention that will be described below.
- A method for determining the critical bit-rate B_C will be described in detail later.
- FIGS. 7-10 illustrate a predecoding method according to a first exemplary embodiment of the present invention
- FIGS. 11-14 illustrate a predecoding method according to a second exemplary embodiment of the present invention.
- Reference numerals 10 and 40 respectively denote a hybrid bitstream and a predecoded bitstream, i.e., the bitstream remaining after predecoding the hybrid bitstream 10.
- In the first exemplary embodiment, the second texture information T2 32 of the hybrid bitstream 10 is truncated from the tail, as shown in FIG. 7.
- The truncation continues until the final bitstream meets the target bit-rate.
- When the target bit-rate falls below the boundary bit-rate B_B, the whole MV2 31 is skipped, as shown in FIG. 9, because MV2 31 is non-scalable, thereby saving bit-rate.
- The MV2 31 is replaced with T2b 32b corresponding to the saved bit-rate.
- T2b 32b corresponds to the front portion of the texture information T2 32 having the size of MV2 31.
- As the target bit-rate decreases further, the inserted T2b 32b can itself be truncated from the tail, and this truncation continues until all bits of T2b 32b are cut off.
- In the second exemplary embodiment, MV2 31 is skipped when the remaining bitstream meets the critical bit-rate B_C, before reaching the boundary bit-rate B_B.
- T2 32 is truncated from the tail in order to meet a target bit-rate until the critical bit-rate B_C is reached.
- At that point, MV2 31 is skipped and T2d 32d, corresponding to the size of MV2 31, is inserted into the truncated portion of T2, as shown in FIG. 12. Then the remaining portion of T2 has the size of T2c 32c plus T2d 32d, as shown in FIG. 13.
- As the target bit-rate decreases further, T2e 32e is truncated from the tail, as shown in FIG. 14. Of course, this truncation also continues until all bits of T2e 32e are cut off.
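The second embodiment differs from the first only in when the motion information is dropped: at the critical bit-rate B_C inside the texture region rather than at the boundary bit-rate B_B. A hedged sketch, reusing the illustrative "1 byte == 1 kbps" modeling (names and values are hypothetical):

```python
avc = b"A" * 32     # lower layer; lowest bit-rate B_L = 32
mv2 = b"M" * 8      # upper-layer motion info, so B_B = 40
t2  = b"T" * 60     # upper-layer texture info

def predecode_second(lower, mv, tex, b1, bc, target):
    """Drop motion info as soon as the target falls below the critical
    rate Bc, which lies inside the texture region (Bc > B_B)."""
    b2 = b1 + len(mv)                            # MV/texture boundary rate
    if target >= bc:
        return lower + mv + tex[: target - b2]   # above Bc: keep MV
    if target > b1:
        return lower + tex[: target - b1]        # below Bc: drop MV early
    return lower

print(len(predecode_second(avc, mv2, t2, 32, 55, 60)))  # MV kept
print(len(predecode_second(avc, mv2, t2, 32, 55, 50)))  # MV dropped
```

Dropping MV2 early trades motion precision for extra texture bits whenever, below B_C, the texture bits contribute more to quality than the motion data does.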
- FIG. 15 is a block diagram of a video encoder 100 according to an exemplary embodiment of the present invention.
- the video encoder 100 encodes an input video into a hybrid bitstream 10.
- The basic concept of generating the hybrid bitstream 10 is shown in the following Equation (1):

  E = O - A⁻¹(A(O))    (1)

- A(·) is a function used to encode an original input video O to have a minimum bit-rate using AVC coding, and A⁻¹(·) is a function used to decode an encoded video. Because the process of implementing the function A(·) involves lossy coding, the result of decoding an encoded video is not the same as the original input video O.
- The difference E defined by Equation (1) is encoded using wavelet coding, and the encoded result is represented by W(E).
- W(·) is a function used to encode a difference using wavelet coding.
- In this way, encoded texture information A(O) of the lower layer and encoded texture information W(E) of the upper layer can be obtained.
- The lower layer motion vectors and upper layer motion vectors are encoded using a different process (mainly lossless coding) from that used for the texture information.
- the motion information and texture information of the lower and upper layers are then combined into the hybrid bitstream 10.
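A toy model of the residual generation in Equation (1), where coarse uniform quantization stands in for the lossy AVC codec (the step size and function names are assumptions for illustration, not the actual codec):

```python
STEP = 16  # hypothetical quantization step standing in for AVC's loss

def A(frame):
    """Stand-in for lossy lower-layer (AVC) encoding: coarse quantization."""
    return [round(x / STEP) for x in frame]

def A_inv(code):
    """Stand-in decoder for A(): reconstruct from the quantized levels."""
    return [q * STEP for q in code]

original = [100, 37, 250, 5]                            # toy pixel values
decoded = A_inv(A(original))
residual = [o - d for o, d in zip(original, decoded)]   # E = O - A⁻¹(A(O))
print(residual)   # small values: only this residual is wavelet-coded, W(E)
```

The residual is bounded by half the quantization step, which is why the upper layer needs far fewer bits than coding the original outright.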
- a subtractor 110 calculates a difference between an original input video and a lower layer frame reconstructed by a lower layer decoding unit 135 in order to generate an upper layer frame.
- a motion estimator 121 performs motion estimation on the upper layer frame to obtain motion vectors of the upper layer frame.
- the motion estimation is the process of finding the closest block to a block in a current frame, i.e., a block with a minimum error.
- Various techniques including fixed-size block and hierarchical variable size block matching (HVSBM) may be used in the motion estimation.
- the motion estimator 121 uses motion vectors of the lower layer frame obtained by a motion estimator 131 to efficiently represent the motion vectors of the upper layer frame, in which redundancy has been removed.
- a temporal transformer 122 uses the motion vectors obtained by the motion estimator 121 and a frame at a temporally different position than the current frame to generate a predicted frame and subtracts the predicted frame from the current frame to generate a temporal residual frame, thereby removing temporal redundancy.
- the temporal transform may be performed using Motion Compensation Temporal filtering (MCTF) or Unconstrained MCTF (UMCTF).
- the wavelet transformer 123 performs wavelet transform on the temporal residual frame generated by the temporal transformer 122 or the upper layer frame output from the subtractor 110 to create a wavelet coefficient.
- Various wavelet filters such as a Haar filter, a 5/3 filter, and a 9/7 filter may be used for wavelet transform according to a transform method.
- An embedded quantizer 124 quantizes the wavelet coefficient generated by the wavelet transformer 123 and represents the quantization coefficients T2 in a form that can support SNR scalability. In this way, embedded quantization is used in wavelet coding to support SNR scalability.
- Embedded quantization is suitable for use in a wavelet-based codec employing wavelet transform for spatial transform.
- the embedded quantization may include encoding values above an initial threshold, encoding values above one-half the initial threshold, and repeating the above process by setting a new threshold equal to one-quarter the initial threshold.
- the quantization is performed using spatial correlation that is one of the main features of wavelet transform.
- Examples of embedded quantization techniques include Embedded ZeroTrees Wavelet (EZW), Embedded ZeroBlock Coding (EZBC), and Set Partitioning in Hierarchical Trees (SPIHT). The use of embedded quantization allows a user to arbitrarily truncate texture data from the tail according to circumstances.
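The successive-approximation behavior described above is what makes the texture tail truncatable. A toy Python sketch can illustrate the idea (this is not EZW, EZBC, or SPIHT themselves, which additionally exploit spatial correlation; function names and the restriction to nonnegative magnitudes are assumptions made for brevity):

```python
def embedded_encode(mags, passes=6):
    """Toy successive-approximation quantizer over nonnegative magnitudes.
    Each pass halves the threshold and emits one refinement bit per value."""
    t = float(max(mags))            # initial threshold = largest magnitude
    approx = [0.0] * len(mags)
    planes = []
    for _ in range(passes):
        t /= 2.0
        plane = [1 if m - a >= t else 0 for m, a in zip(mags, approx)]
        approx = [a + t * b for a, b in zip(approx, plane)]
        planes.append(plane)
    return planes

def embedded_decode(planes, n, t0):
    """Replaying any prefix of the planes yields a valid, coarser
    reconstruction, which is why the tail of embedded texture data
    can simply be truncated."""
    t, approx = t0, [0.0] * n
    for plane in planes:
        t /= 2.0
        approx = [a + t * b for a, b in zip(approx, plane)]
    return approx
```

Decoding only the first few planes still produces a usable (coarser) reconstruction, mirroring how the predecoder truncates upper layer texture data from the tail.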
- the lower layer frame is also subjected to motion estimation by the motion estimator 131 and temporal transform by a temporal transformer 132.
- the lower layer frame does not pass through the subtractor 110.
- the lower layer frame encoded using AVC coding can use an intra predictive mode defined in H.264 in combination with temporal transform.
- a Discrete Cosine Transform (DCT) unit 133 performs DCT on a temporal residual frame generated by the temporal transform or an original input frame to create a DCT coefficient.
- the DCT may be performed for each DCT block.
- a quantizer 134 applies quantization to the DCT coefficient.
- the quantization is the process of converting real-valued DCT coefficients into discrete values by dividing the range of coefficients into a limited number of intervals.
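The interval-partitioning idea can be shown with a minimal uniform scalar quantizer (a hedged sketch only; H.264 in practice uses a more elaborate scaled integer scheme, and the function names here are illustrative):

```python
def quantize(coeff, step):
    # Map a real-valued coefficient to the index of its interval.
    return round(coeff / step)

def dequantize(level, step):
    # Reconstruct the representative value of that interval.
    return level * step
```

The reconstruction error is bounded by half the step size, which is the trade-off the quantizer 134 makes between rate and distortion.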
- unlike for the upper layer frame, embedded quantization is not applied.
- the lower layer decoding unit 135 reconstructs a lower layer frame from the quantization coefficients T1 generated by the quantizer 134 and provides the lower layer frame to the subtractor 110.
- the process of reconstructing the lower layer frame may involve inverse quantization, inverse DCT, and inverse temporal transform.
- An entropy coding unit 150 losslessly encodes the quantization coefficients T1 generated by the quantizer 134, the quantization coefficients T2 generated by the embedded quantizer 124, the motion information MV1 including the lower layer motion vector generated by the motion estimator 131, and the motion information MV2 including an upper layer motion vector component generated by the motion estimator 121 into a hybrid bitstream 10.
- Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
- a visual quality comparator 160 compares the visual quality obtained when a portion of the texture information T2 of an upper layer bitstream 30 in the hybrid bitstream 10 is truncated with that obtained when the motion information of the upper layer bitstream 30 is skipped and the bits saved by skipping the motion information are allocated to the texture information T2, as shown in FIG. 12, and finds a critical bit-rate BC at which both visual qualities are the same. In the latter case, motion information of a lower layer bitstream is used in place of the skipped motion information of the upper layer bitstream 30. When the visual quality in the former case is better than that in the latter case, the texture information T2 is further truncated.
- the critical bit-rate BC can be recorded by a marker bit at a predetermined location of the hybrid bitstream 10 generated by the entropy coding unit 150.
- FIG. 16 is a block diagram of a predecoder 200 according to an exemplary embodiment of the present invention.
- the predecoder 200 predecodes the hybrid bitstream 10 provided by the video encoder (100 of FIG. 15) and adjusts the SNR or bit-rate of the hybrid bitstream 10.
- predecoding refers to a process by which resolution, frame rate, and SNR are adjusted by extracting or truncating a portion of a bitstream.
- the predecoding as used hereinafter refers to the process of adjusting the SNR of a bitstream.
- the predecoder 200 may actually be realized as a video stream server transmitting a scalable video stream suitably according to a variable network environment, and may be integrated into the video encoder 100.
- a bitstream parser 210 parses the hybrid bitstream 10 provided by the video encoder 100.
- the bitstream parser 210 obtains information about the positions of MV1 21, T1 22, MV2 31, and T2 32 in the hybrid bitstream 10 (hereinafter called 'position information') by parsing start bits of MV1 21, T1 22, MV2 31, and T2 32.
- the bitstream parser 210 also parses the lowest bit-rate BL and the boundary bit-rate BB using the position information.
- the bitstream parser 210 parses the critical bit-rate BC as well and sends BL, BB, and BC to a predecoding unit 220.
- a predecoding condition determiner 240 determines a predecoding condition, i.e., a target bit-rate to adapt to variable network circumstances or according to a user's input. To achieve this, the predecoding condition determiner 240 may receive feedback information about the available bit-rate from a video decoder receiving a bitstream from the predecoder 200. The video decoder reconstructing a video stream can be deemed a client device receiving a video streaming service.
- the predecoding unit 220 predecodes the bitstream according to the determined target bit-rate.
- the predecoding methods according to the first and second exemplary embodiments of the present invention described above will be described in more detail later with reference to FIGS. 18 and 19.
- the bitstream transmitter 230 transmits the hybrid bitstream whose bit-rate has been adjusted by the predecoding unit 220, i.e., a predecoded bitstream 40, to the video decoder while receiving feedback information from the video decoder.
- the feedback information may contain information about the available bit-rate measured when the video decoder receives the bitstream.
- FIG. 17 is a block diagram of a video decoder 300 according to an exemplary embodiment of the present invention.
- an entropy decoding unit 310 performs the inverse of entropy encoding and extracts lower layer data and upper layer data from an input bitstream (predecoded bitstream).
- the lower layer data may contain motion information MV1 21 and texture information T1 22, while the upper layer data may contain motion information MV2 31 and texture information T2 32. No upper layer data, or only the texture information T2 32 of the upper layer data, may exist according to the result of predecoding.
- An inverse quantizer 331 performs inverse quantization on the texture information T1 22 of the lower layer.
- the inverse quantization is the inverse of the quantization process performed by the video encoder 100 and reconstructs transform coefficients using a quantization table used during the quantization process.
- An inverse DCT unit 332 performs inverse DCT on the inversely quantized result.
- the inverse DCT is the inverse of the DCT performed by the video encoder 100.
- An inverse temporal transformer 333 reconstructs a lower layer video sequence from the inversely DCT-transformed result.
- the lower layer motion vector MV1 21 and the previously reconstructed lower layer frame are used to generate a motion-compensated frame that is then added to the inversely DCT-transformed result.
- an intra-frame that is not subjected to temporal transform at the video encoder 100 will be reconstructed by inverse intra prediction without undergoing inverse temporal transform.
- the reconstructed lower layer frame is then fed to an adder 340.
- the texture information T2 32 of the upper layer is fed to an inverse embedded quantizer 321.
- the inverse embedded quantizer 321 performs inverse embedded quantization on the texture information T2 32 of the upper layer.
- the inverse embedded quantization is the inverse of the quantization process performed by the video encoder 100.
- An inverse wavelet transformer 322 performs inverse wavelet transform on the result obtained by the inverse embedded quantization.
- the inverse wavelet transform is the inverse of the wavelet transform (filtering) performed by the video encoder 100.
- An inverse temporal transformer 323 reconstructs an upper layer video sequence from the inversely spatially transformed result.
- the upper layer motion vector is obtained to generate a motion-compensated frame that is then added to the inversely wavelet-transformed result.
- the inverse temporal transformer 323 determines whether MV2 31 exists. When MV2 31 does not exist, MV1 21 is used as such. Conversely, when MV2 31 exists, an upper layer motion vector reconstructed using MV1 21 and MV2 31 is used. If MV2 31 is generated at the video encoder 100 using the difference between the upper layer motion vector and the lower layer motion vector, the upper layer motion vector can be reconstructed by adding the upper layer motion vector component contained in MV2 31 to the lower layer motion vector MV1 21.
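The fallback-and-add rule for motion vectors can be sketched as follows (a hypothetical helper; the tuple representation of a motion vector is an assumption for illustration):

```python
def reconstruct_upper_mv(mv1, mv2_component=None):
    """If the upper layer motion component MV2 was skipped by the predecoder,
    fall back to the lower layer vector MV1; otherwise add the encoded
    difference component back to MV1."""
    if mv2_component is None:
        return mv1
    return (mv1[0] + mv2_component[0], mv1[1] + mv2_component[1])
```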
- the adder 340 adds the reconstructed upper layer video sequence to the reconstructed lower layer video sequence in order to reconstruct a final video sequence.
- FIG. 18 is a detailed flowchart illustrating a predecoding process according to a first exemplary embodiment of the present invention.
- the bitstream parser (210 of FIG. 16) parses a hybrid bitstream 10 provided by the video encoder (100 of FIG. 15) to obtain the lowest bit-rate BL and the boundary bit-rate BB, which are then sent to the predecoding unit 220.
- the predecoding condition determiner 240 determines a predecoding condition, i.e., a target bit-rate BT, according to variable network circumstances or a user's input.
- the predecoding unit 220 performs predecoding according to the bit-rates BL, BB, and BT.
- the predecoding process is performed in steps S30 to S80.
- In step S40, when BT is higher than BB (yes in step S30), the predecoding unit 220 truncates all bits of the upper layer texture information T2 32 contained in the hybrid bitstream 10 except bits corresponding to BT-BB. In other words, a portion of the upper layer texture information T2 32 corresponding to (BT-BB) subtracted from the size of T2 32 is truncated from the tail.
- when BT is not higher than BB but is higher than BL (yes in step S50), the predecoding unit 220 skips the upper layer motion information MV2 31 contained in the hybrid bitstream 10 in step S60 and truncates all bits of the upper layer texture information T2 32 except bits corresponding to BT-BL in step S70. In other words, a portion of the upper layer texture information T2 32 corresponding to (BT-BL) subtracted from the size of T2 32 is truncated from the tail. As a result of performing step S70, the remaining portion of T2 32 further contains bits corresponding to BB-BL saved by skipping the motion information MV2 31.
- In step S80, when BT is lower than BL (no in step S50), the predecoding unit 220 simply truncates all the upper layer data MV2 31 and T2 32, because it cannot truncate the lower layer data, to ensure a minimum AVC layer bitstream portion.
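The branch structure of steps S30 to S80 can be sketched as a small decision function (all names and the bit-level accounting are illustrative assumptions; rates and sizes are in the same units):

```python
def predecode_first(b_t, b_l, b_b, t2_bits):
    """Decision logic of steps S30-S80: returns (keep_mv2, t2_bits_kept)
    for target rate b_t, lowest rate b_l, and boundary rate b_b."""
    if b_t > b_b:                  # S30 yes -> S40: keep MV2, trim T2 tail
        return True, min(t2_bits, b_t - b_b)
    if b_t > b_l:                  # S50 yes -> S60/S70: skip MV2 and spend
        return False, min(t2_bits, b_t - b_l)  # its (b_b - b_l) bits on T2
    return False, 0                # S80: drop all upper layer data
```

For example, with a lowest rate of 100, a boundary rate of 140 (so MV2 occupies 40), and 200 bits of T2, a target of 120 falls between BL and BB, so MV2 is skipped and 20 bits of T2 are kept.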
- FIG. 19 is a detailed flowchart illustrating a predecoding process according to a second exemplary embodiment of the present invention.
- In step S110, the bitstream parser (210 of FIG. 16) parses a hybrid bitstream 10 provided by the video encoder (100 of FIG. 15) to obtain the lowest bit-rate BL and the boundary bit-rate BB, which are then sent to the predecoding unit 220.
- a critical bit-rate BC may be contained in the hybrid bitstream 10 and received from the video encoder 100, or calculated directly by the predecoding unit 220.
- the bitstream parser 210 parses BC as well and sends it to the predecoding unit 220.
- the predecoding condition determiner 240 determines a predecoding condition, i.e., a target bit-rate BT, according to a user's input or variable network circumstances.
- the predecoding unit 220 performs predecoding according to the bit-rates BL, BC, and BT.
- the predecoding process is performed in steps S130 to S180.
- the critical bit-rate BC may be received from the bitstream parser 210 or determined directly by the predecoding unit. For example, a ratio between MV2 31 and T2 32 is predetermined, and the bit-rate obtained when the ratio between MV2 31 and the portion of T2 32 remaining after truncation reaches the predetermined ratio is called the critical bit-rate.
- the predecoding unit 220 may determine a critical bit-rate using various other methods that will be apparent to those skilled in the art.
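Under the ratio rule described above, one assumed way to compute the critical bit-rate is the following (a hypothetical helper; interpreting the predetermined ratio as size(MV2) divided by size of the retained portion of T2 is an assumption):

```python
def critical_bitrate(b_l, mv2_bits, ratio):
    """If ratio = mv2_bits / retained_t2_bits, the retained texture must be
    mv2_bits / ratio, so the total rate at that point is the lowest rate
    plus the MV2 size plus the retained texture size."""
    return b_l + mv2_bits + mv2_bits / ratio
```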
- In step S140, when BT is higher than BC (yes in step S130), the predecoding unit 220 truncates all bits of the upper layer texture information T2 32 contained in the hybrid bitstream 10 except bits corresponding to BT-BB.
- when BT is not higher than BC but is higher than BL (yes in step S150), the predecoding unit 220 skips the upper layer motion information MV2 31 contained in the hybrid bitstream 10 in step S160 and truncates all bits of the upper layer texture information T2 32 except bits corresponding to BT-BL in step S170. In other words, a portion of the upper layer texture information T2 32 corresponding to (BT-BL) subtracted from the size of T2 32 is truncated from the tail. As a result of performing step S170, the remaining portion of T2 32 further contains bits corresponding to BB-BL saved by skipping the motion information MV2 31.
- In step S180, when BT is lower than BL (no in step S150), the predecoding unit 220 simply truncates all the upper layer data MV2 31 and T2 32 because it cannot truncate the lower layer data.
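The steps S130 to S180 can likewise be sketched as a decision function (all names and the bit accounting are illustrative assumptions): it has the same shape as the first embodiment, but the first branch compares the target rate with the critical rate rather than the boundary rate.

```python
def predecode_second(b_t, b_l, b_b, b_c, t2_bits):
    """Decision logic of steps S130-S180: returns (keep_mv2, t2_bits_kept)
    given target rate b_t, lowest rate b_l, boundary rate b_b, and
    critical rate b_c."""
    if b_t > b_c:                  # S130 yes -> S140: keep MV2, trim T2 tail
        return True, min(t2_bits, b_t - b_b)
    if b_t > b_l:                  # S150 yes -> S160/S170: skip MV2, spend
        return False, min(t2_bits, b_t - b_l)  # its bits on T2 instead
    return False, 0                # S180: drop all upper layer data
```

Because MV2 is dropped for all targets up to BC rather than only up to BB, the motion bits are reallocated to texture precisely in the rate range where doing so gives equal or better visual quality.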
- SNR scalability can be adjusted efficiently in a hybrid bitstream.
- exemplary embodiments of the present invention provide methods and apparatuses for adjusting SNR scalability considering both texture data and motion data.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05808648A EP1813114A4 (fr) | 2004-10-18 | 2005-09-13 | Procede et appareil de precodage de trains de bits hybride |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61902304P | 2004-10-18 | 2004-10-18 | |
US60/619,023 | 2004-10-18 | ||
KR10-2005-0006803 | 2005-01-25 | ||
KR1020050006803A KR100679030B1 (ko) | 2004-10-18 | 2005-01-25 | 하이브리드 비트스트림의 프리디코딩 방법 및 장치 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006043753A1 true WO2006043753A1 (fr) | 2006-04-27 |
Family
ID=36203154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2005/003030 WO2006043753A1 (fr) | 2004-10-18 | 2005-09-13 | Procede et appareil de precodage de trains de bits hybride |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1813114A4 (fr) |
WO (1) | WO2006043753A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6263022B1 (en) * | 1999-07-06 | 2001-07-17 | Philips Electronics North America Corp. | System and method for fine granular scalable video with selective quality enhancement |
US6580754B1 (en) * | 1999-12-22 | 2003-06-17 | General Instrument Corporation | Video compression for multicast environments using spatial scalability and simulcast coding |
US6728317B1 (en) * | 1996-01-30 | 2004-04-27 | Dolby Laboratories Licensing Corporation | Moving image compression quality enhancement using displacement filters with negative lobes |
US6771703B1 (en) * | 2000-06-30 | 2004-08-03 | Emc Corporation | Efficient scaling of nonscalable MPEG-2 Video |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6639943B1 (en) * | 1999-11-23 | 2003-10-28 | Koninklijke Philips Electronics N.V. | Hybrid temporal-SNR fine granular scalability video coding |
-
2005
- 2005-09-13 WO PCT/KR2005/003030 patent/WO2006043753A1/fr active Application Filing
- 2005-09-13 EP EP05808648A patent/EP1813114A4/fr not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See also references of EP1813114A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP1813114A4 (fr) | 2007-11-07 |
EP1813114A1 (fr) | 2007-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7839929B2 (en) | Method and apparatus for predecoding hybrid bitstream | |
US8031776B2 (en) | Method and apparatus for predecoding and decoding bitstream including base layer | |
US8331434B2 (en) | Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method | |
US20050226335A1 (en) | Method and apparatus for supporting motion scalability | |
WO2007064082A1 (fr) | Procédé et appareil de codage vidéo hiérarchique faisant appel à de multiples couches | |
CA2573843A1 (fr) | Procede de codage video echelonnable et appareil utilisant une couche de base | |
WO2006004331A1 (fr) | Procedes de codage et de decodage video, codeur et decodeur video | |
AU2004302413B2 (en) | Scalable video coding method and apparatus using pre-decoder | |
US20050157794A1 (en) | Scalable video encoding method and apparatus supporting closed-loop optimization | |
EP1709811A1 (fr) | Dispositif et procede de lecture de flux video adaptables | |
US20050163217A1 (en) | Method and apparatus for coding and decoding video bitstream | |
KR20050076160A (ko) | 스케일러블 비디오 스트림 재생 방법 및 장치 | |
EP1741297A1 (fr) | Procede et appareil permettant de mettre en oeuvre l'extensibilite de mouvement | |
CA2557312A1 (fr) | Procedes de codage et decodage video et systemes pour service de video en debit continu | |
WO2006043753A1 (fr) | Procede et appareil de precodage de trains de bits hybride | |
EP1803302A1 (fr) | Dispositif et procede de reglage du debit binaire d'un train binaire code evolutif par multicouches | |
Ji et al. | Architectures of incorporating MPEG-4 AVC into three dimensional subband video coding | |
WO2006006793A1 (fr) | Procede de codage et decodage de video et codeur et decodeur de video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2005808648 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580034019.7 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWP | Wipo information: published in national office |
Ref document number: 2005808648 Country of ref document: EP |