WO2007011085A1 - Apparatus and method of encoding and decoding audio signal - Google Patents

Info

Publication number
WO2007011085A1
WO2007011085A1 PCT/KR2005/002308 KR2005002308W WO2007011085A1 WO 2007011085 A1 WO2007011085 A1 WO 2007011085A1 KR 2005002308 W KR2005002308 W KR 2005002308W WO 2007011085 A1 WO2007011085 A1 WO 2007011085A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
block switching
audio file
blocks
channel
Prior art date
Application number
PCT/KR2005/002308
Other languages
French (fr)
Inventor
Tilman Liebchen
Original Assignee
Lg Electronics Inc.
Noll, Peter
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lg Electronics Inc., Noll, Peter
Priority to PCT/KR2005/002308 priority Critical patent/WO2007011085A1/en
Priority to US11/481,931 priority patent/US7411528B2/en
Priority to US11/481,915 priority patent/US7996216B2/en
Priority to US11/481,926 priority patent/US7949014B2/en
Priority to US11/481,932 priority patent/US8032240B2/en
Priority to US11/481,930 priority patent/US8032368B2/en
Priority to US11/481,942 priority patent/US7830921B2/en
Priority to US11/481,927 priority patent/US7835917B2/en
Priority to US11/481,940 priority patent/US8180631B2/en
Priority to US11/481,929 priority patent/US7991012B2/en
Priority to US11/481,939 priority patent/US8121836B2/en
Priority to US11/481,933 priority patent/US7966190B2/en
Priority to US11/481,941 priority patent/US8050915B2/en
Priority to US11/481,916 priority patent/US8108219B2/en
Priority to US11/481,917 priority patent/US7991272B2/en
Priority to EP06757768A priority patent/EP1913583A4/en
Priority to EP06769218A priority patent/EP1913589A4/en
Priority to PCT/KR2006/002688 priority patent/WO2007008010A1/en
Priority to JP2008521306A priority patent/JP2009500682A/en
Priority to JP2008521311A priority patent/JP2009500687A/en
Priority to EP06769219A priority patent/EP1913584A4/en
Priority to JP2008521309A priority patent/JP2009500685A/en
Priority to CNA200680024866XA priority patent/CN101218852A/en
Priority to JP2008521307A priority patent/JP2009500683A/en
Priority to PCT/KR2006/002685 priority patent/WO2007008007A1/en
Priority to PCT/KR2006/002678 priority patent/WO2007008000A2/en
Priority to CNA2006800294174A priority patent/CN101243489A/en
Priority to EP06769226A priority patent/EP1913588A4/en
Priority to JP2008521313A priority patent/JP2009500688A/en
Priority to PCT/KR2006/002690 priority patent/WO2007008012A2/en
Priority to JP2008521314A priority patent/JP2009500689A/en
Priority to CNA2006800289829A priority patent/CN101238510A/en
Priority to PCT/KR2006/002689 priority patent/WO2007008011A2/en
Priority to EP06769223A priority patent/EP1913587A4/en
Priority to EP06757767A priority patent/EP1913582A4/en
Priority to EP06757765A priority patent/EP1913580A4/en
Priority to JP2008521310A priority patent/JP2009500686A/en
Priority to JP2008521305A priority patent/JP2009500681A/en
Priority to CNA2006800304797A priority patent/CN101243493A/en
Priority to EP06769227A priority patent/EP1911020A4/en
Priority to PCT/KR2006/002677 priority patent/WO2007007999A2/en
Priority to JP2008521319A priority patent/JP2009500693A/en
Priority to PCT/KR2006/002682 priority patent/WO2007008004A2/en
Priority to CNA200680028892XA priority patent/CN101238509A/en
Priority to EP06769225A priority patent/EP1911021A4/en
Priority to JP2008521315A priority patent/JP2009500690A/en
Priority to CNA2006800305499A priority patent/CN101243495A/en
Priority to CNA2006800251395A priority patent/CN101218629A/en
Priority to PCT/KR2006/002687 priority patent/WO2007008009A1/en
Priority to EP06769220A priority patent/EP1913585A4/en
Priority to PCT/KR2006/002679 priority patent/WO2007008001A2/en
Priority to CN2006800252699A priority patent/CN101218630B/en
Priority to PCT/KR2006/002683 priority patent/WO2007008005A1/en
Priority to JP2008521317A priority patent/JP2009500691A/en
Priority to JP2008521318A priority patent/JP2009500692A/en
Priority to CNA2006800305111A priority patent/CN101243494A/en
Priority to EP06769224A priority patent/EP1913794A4/en
Priority to PCT/KR2006/002681 priority patent/WO2007008003A2/en
Priority to CNA2006800251376A priority patent/CN101218631A/en
Priority to JP2008521308A priority patent/JP2009500684A/en
Priority to PCT/KR2006/002680 priority patent/WO2007008002A2/en
Priority to CNA2006800304693A priority patent/CN101243492A/en
Priority to CN2006800251380A priority patent/CN101218628B/en
Priority to PCT/KR2006/002686 priority patent/WO2007008008A2/en
Priority to CNA2006800305412A priority patent/CN101243497A/en
Priority to PCT/KR2006/002691 priority patent/WO2007008013A2/en
Priority to EP06769222A priority patent/EP1908058A4/en
Priority to CN2006800294070A priority patent/CN101243496B/en
Priority to EP06757764A priority patent/EP1913579A4/en
Priority to EP06757766A priority patent/EP1913581A4/en
Priority to JP2008521316A priority patent/JP2009510810A/en
Publication of WO2007011085A1 publication Critical patent/WO2007011085A1/en
Priority to US12/232,526 priority patent/US8010372B2/en
Priority to US12/232,527 priority patent/US7962332B2/en
Priority to US12/232,595 priority patent/US8417100B2/en
Priority to US12/232,593 priority patent/US8326132B2/en
Priority to US12/232,590 priority patent/US8055507B2/en
Priority to US12/232,591 priority patent/US8255227B2/en
Priority to US12/232,662 priority patent/US8510120B2/en
Priority to US12/232,658 priority patent/US8510119B2/en
Priority to US12/232,659 priority patent/US8554568B2/en
Priority to US12/232,747 priority patent/US8149878B2/en
Priority to US12/232,748 priority patent/US8155153B2/en
Priority to US12/232,734 priority patent/US8155144B2/en
Priority to US12/232,739 priority patent/US8155152B2/en
Priority to US12/232,744 priority patent/US8032386B2/en
Priority to US12/232,740 priority patent/US8149876B2/en
Priority to US12/232,743 priority patent/US7987008B2/en
Priority to US12/232,741 priority patent/US8149877B2/en
Priority to US12/232,781 priority patent/US7930177B2/en
Priority to US12/232,783 priority patent/US8275476B2/en
Priority to US12/232,784 priority patent/US7987009B2/en
Priority to US12/232,782 priority patent/US8046092B2/en
Priority to US12/314,891 priority patent/US8065158B2/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022: Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and apparatus for encoding and decoding an audio file are disclosed. A channel of an audio data frame included in the audio file is subdivided hierarchically into a plurality of blocks at one or more block switching levels, where each block results from a subdivision of a superordinate block of double length. First block switching information, indicating that the audio file is block switched, is then generated. Second block switching information, indicating how the blocks are subdivided from the channel at the block switching levels, is generated afterwards.

Description

[DESCRIPTION]
APPARATUS AND METHOD OF ENCODING AND DECODING AUDIO SIGNAL
Technical Field
The present invention relates to a method for processing an audio signal, and
more particularly to a method and apparatus for encoding and decoding an audio signal.
Background Art
The storage and replaying of audio signals has been accomplished in different
ways in the past. For example, music and talk have been recorded and preserved by
phonographic technology (e.g. record players), magnetic technology (e.g. cassette
tapes), and digital technology (e.g. compact discs). As audio storage technology
progresses, many challenges need to be overcome to optimize the quality and
storability of audio signals.
For the archiving and broadband transmission of music signals, lossless
reconstruction is becoming a more important feature than high efficiency in
compression by means of perceptual coding as defined in MPEG standards such as
MP3 or AAC. Although DVD Audio and Super Audio CD include proprietary lossless
compression schemes, there is a demand for an open and general compression
scheme among content-holders and broadcasters. In response to this demand, a new
lossless coding scheme has been considered as an extension to the MPEG-4
Audio standard. Lossless audio coding permits the compression of digital audio data
without any loss in quality due to a perfect reconstruction of the original signal.
Disclosure of Invention
The present invention relates to a method for processing forward-adaptive
linear prediction, which offers remarkable compression even with low
predictor orders. Nevertheless, performance can be significantly improved by using
higher predictor orders, more efficient quantization and encoding of the predictor
coefficients, and adaptive block length switching.
It is an object of the invention to provide lossless audio coding
that permits the compression of digital audio data without any loss in quality due to a
perfect reconstruction of the original signal.
Another object of the invention is to provide lossless coding techniques for
high-definition audio signals. Audio Lossless Coding will define methods for
lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32
bits, and up to 256 channels. The lossless codec uses forward-adaptive Linear
Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the
optimization entirely to the encoder. Thus, various encoder implementations are
possible, offering a certain range in terms of efficiency and complexity.
Although remarkable compression is achieved even for low predictor orders,
still better compression becomes possible using high-order prediction. In this case,
more efficient coding of the predictor coefficients is necessary in order to limit the
amount of side information. This is achieved by applying a non-linear compander to
the most important coefficients, followed by linear quantization and entropy coding of
the quantized values. In addition, adaptive block length switching is used to account
for changing signal statistics. As a result, compression ratios are comparable to the
best high-order backward-adaptive prediction schemes, but with a significantly less
complex decoder, and while maintaining full random access to arbitrary parts of the
encoded signal.
The present invention relates to an encoder and/or decoder (including methods
of encoding and decoding) for data. Data may be encoded or decoded in a lossless
manner. Embodiments relate to a flexible, hierarchical block switching scheme,
allowing for up to six different block lengths within a frame. Embodiments relate to
independent block switching for each channel. Embodiments relate to a maximum
predictor order of 1023.
Additional advantages, objects, and features of the invention will be set forth
in part in the description which follows and in part will become apparent to those
having ordinary skill in the art upon examination of the following or may be learned
from practice of the invention. The objectives and other advantages of the invention
may be realized and attained by the structure particularly pointed out in the written
description and claims hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the
purpose of the invention, as embodied and broadly described herein, a method of
processing an audio file includes subdividing a channel of an audio data frame
included in an audio file into a plurality of blocks hierarchically at one or more block
switching levels, wherein at least two of the subdivided blocks have different lengths,
generating first block switching information indicating that the audio file is block
switched, and generating second block switching information indicating how the
blocks are subdivided from the channel at the block switching levels.
In another aspect of the present invention, a method of encoding an audio
file includes subdividing a channel of an audio data frame included in an audio file
into a plurality of blocks hierarchically at one or more block switching levels, wherein
the audio data frame is included in the audio file, each block resulting from a
subdivision of a superordinate block of double length, generating first block switching
information indicating that the audio file is block switched, and generating second
block switching information indicating how the blocks are subdivided from the
channel at the block switching levels.
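The constraint that each block results from halving a superordinate block of double length can be illustrated with a small validity check. This is only a sketch under assumptions: the function name `is_hierarchical_partition` is hypothetical, and the default of five halving levels is inferred from the "up to six different block lengths within a frame" mentioned earlier (N down to N/32); it is not the bit stream syntax itself.

```python
def is_hierarchical_partition(block_lengths, frame_length, max_levels=5):
    """Return True if block_lengths (in stream order) can be obtained by
    repeatedly halving blocks, starting from one block of frame_length.

    max_levels is an assumption: 5 halvings give six possible block lengths.
    """
    if frame_length % (1 << max_levels) != 0:
        raise ValueError("frame_length must be divisible by 2**max_levels")
    min_length = frame_length >> max_levels
    pos = 0  # index of the next block that has not been matched yet

    def consume(length):
        # Try to match a (sub)tree covering `length` samples at position pos.
        nonlocal pos
        if pos >= len(block_lengths):
            return False
        if block_lengths[pos] == length:
            pos += 1          # this node is a leaf: one block of this length
            return True
        if length <= min_length or block_lengths[pos] > length:
            return False      # cannot split further, or block does not fit
        half = length // 2
        return consume(half) and consume(half)

    return consume(frame_length) and pos == len(block_lengths)
```

For a frame of N = 1024 samples, `[256, 256, 512]` passes the check, while `[256, 512, 256]` does not, because the N/2 block would straddle the boundary between the two halves of the frame.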
In another aspect of the present invention, a method of decoding an audio
file includes receiving an audio file having an audio data frame which has at least one
channel, each channel being subdivided into a plurality of blocks hierarchically at one
or more block switching levels, wherein each block results from a subdivision of a
superordinate block of double length, parsing first block switching information from a
file header included in the audio file, the first block switching information indicating
that the audio file is block switched, parsing second block switching information from
the audio data frame to determine how the blocks are subdivided from the block
switching levels, and identifying and decoding the subdivided blocks using the parsed
first and second block switching information.
In another aspect of the present invention, an apparatus for encoding an
audio file includes an encoder configured to subdivide a channel of an audio data frame
included in an audio file into a plurality of blocks hierarchically at one or more block
switching levels, to generate first block switching information indicating that the audio
file is block switched, and to generate second block switching information indicating
how the blocks are subdivided from the channel at the block switching levels,
wherein each block results from a subdivision of a superordinate block of double
length.
In a further aspect of the present invention, an apparatus for decoding an
audio file includes a decoder configured to receive an audio file having an audio data
frame which has at least one channel, each channel being subdivided into a plurality
of blocks hierarchically at one or more block switching levels, each block resulting
from a subdivision of a superordinate block of double length, wherein the decoder is
further configured to parse first block switching information from a file header
included in the audio file, to parse second block switching information from the audio
data frame, and to identify and decode the subdivided blocks using the parsed first
and second block switching information, the first block switching information
indicating that the audio file is block switched, and the second block switching
information indicating how the blocks are subdivided from the block switching levels.
It is to be understood that both the foregoing general description and the
following detailed description of the present invention are exemplary and explanatory
and are intended to provide further explanation of the invention as claimed.
Brief Description of Drawings
The accompanying drawings, which are included to provide a further
understanding of the invention and are incorporated in and constitute a part of this
application, illustrate embodiment(s) of the invention and together with the
description serve to explain the principle of the invention. In the drawings:
Figure 1 is an example illustration of an audio signal encoder.
Figure 2 is an example illustration of an audio signal decoder.
Figure 3 shows measured distributions of parcor coefficients for 48 kHz, 16-bit
audio material.
Figure 4 shows the compander functions C(r) and -C(-r).
Figure 5 is an example of a block switching hierarchy structure.
Figure 6 shows block switching examples and the corresponding
block switching information codes.
Figure 7 is an example of a bit stream of the old block switching scheme.
Figure 8 is an example of a bit stream of the new block switching (BS) scheme:
No BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent
BS (bottom).
Figure 9 illustrates a switched difference coding scheme.
Figure 10 illustrates a partition of the residual distribution.
Best Mode for Carrying out the Invention
Reference will now be made in detail to the preferred embodiments of the
present invention, examples of which are illustrated in the accompanying drawings.
Wherever possible, the same reference numbers will be used throughout the
drawings to refer to the same or like parts.
Prior to describing the present invention, it should be noted that most terms
disclosed in the present invention correspond to general terms well known in the art,
but some terms have been selected by the applicant as necessary and will
hereinafter be disclosed in the following description of the present invention.
Therefore, it is preferable that the terms defined by the applicant be understood on
the basis of their meanings in the present invention.
In a lossless audio coding method, since the encoding process has to be
perfectly reversible without loss of information, several parts of both encoder and
decoder have to be implemented in a deterministic way.
[Structure of the codec]
Figure 1 shows the typical processing for one input channel of audio data. A
buffer stores one block of input samples, and an optimum set of parcor coefficients is
calculated for each block. The number of coefficients, i.e. the order of the predictor,
can be adaptively chosen as well. The quantized parcor values are entropy coded for
transmission, and converted to LPC coefficients for the prediction filter which
calculates the prediction residual. The residual is entropy coded using different
entropy codes. The indices of the chosen codes have to be transmitted as side
information.
Finally, a multiplexing unit combines coded residual, code indices, predictor
coefficients and other additional information to form the compressed bitstream. The
encoder also provides a CRC checksum, which is supplied mainly for the decoder to
verify the decoded data. On the encoder side, the CRC can be used to ensure that
the compressed data is losslessly decodable.
Additional encoder options comprise block length switching, random access
and joint channel coding. The encoder may use these options to offer several
compression levels with different complexities. The basic version of the encoder uses
a fixed block length. Optionally, the encoder can switch between different block
lengths to adapt to stationary regions as well as to transient segments of the audio
signal. The codec allows random access in defined intervals down to some
milliseconds, depending on the block length.
Furthermore, joint channel coding is used to exploit dependencies between
channels of stereo or multi-channel signals. This can be achieved by coding the
difference between two channels in those segments where this difference can be
coded more efficiently than one of the original channels.
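The selection between a channel and the difference signal can be sketched as follows. This is a simplified illustration, not the codec's actual decision rule: the function names, the mode labels and the cost measure (sum of absolute sample values as a rough stand-in for the coded size) are all assumptions made for the example.

```python
def abs_sum(samples):
    """Crude cost proxy (assumption): larger magnitudes cost more bits."""
    return sum(abs(s) for s in samples)


def encode_pair(left, right):
    """Pick two of {left, right, difference} for one segment.

    The most expensive of the three signals is dropped; the other two are
    enough to reconstruct both channels exactly.
    """
    diff = [r - l for l, r in zip(left, right)]
    cost = {"L": abs_sum(left), "R": abs_sum(right), "D": abs_sum(diff)}
    worst = max(("L", "R", "D"), key=lambda name: cost[name])
    if worst == "D":
        return "LR", list(left), list(right)   # code both channels directly
    if worst == "R":
        return "LD", list(left), diff          # right is replaced by diff
    return "DR", diff, list(right)             # left is replaced by diff


def decode_pair(mode, first, second):
    """Invert encode_pair and return (left, right)."""
    if mode == "LR":
        return list(first), list(second)
    if mode == "LD":
        return list(first), [l + d for l, d in zip(first, second)]
    # mode == "DR": left = right - diff
    return [r - d for d, r in zip(first, second)], list(second)
```

For two nearly identical channels the difference signal is small, so one channel is replaced by it; for unrelated channels both are kept as they are.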
The entropy coding part of the prediction residual provides two alternative
coding techniques with different complexities. Besides low complexity yet efficient
Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better
compression at the expense of a slightly increased complexity.
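As an illustration of the low-complexity option, the following sketch shows a textbook Golomb-Rice code for signed residuals. It is not the codec's exact bit layout: the mapping of signed values to non-negative integers, the unary convention (ones terminated by a zero) and the use of a character string as the bit container are assumptions chosen for readability.

```python
def zigzag(value):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * value if value >= 0 else -2 * value - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(residuals, k):
    """Encode residuals with Rice parameter k; returns a string of '0'/'1'."""
    bits = []
    for e in residuals:
        u = zigzag(e)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")           # unary-coded quotient
        if k:
            bits.append(format(remainder, "0{}b".format(k)))  # k-bit remainder
    return "".join(bits)


def rice_decode(bitstring, k, count):
    """Decode `count` residuals from a bit string produced by rice_encode."""
    out, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bitstring[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                     # skip the terminating '0'
        remainder = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((quotient << k) | remainder))
    return out
```

Small residuals produce short codewords, which is why the code index (here the parameter `k`) has to be matched to the residual statistics and sent as side information.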
Furthermore, The encoder will also offer efficient compression of floating-point
audio data in the 32-bit IEEE format. This codec extension employs an algorithm that
basically splits the floating-point signal into a truncated integer signal and a
difference signal which contains the remaining fractional part. The integer signal is
then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point
extension can be found.
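The splitting idea described above can be illustrated with a minimal sketch. It is not the codec's actual floating-point extension, which is not specified in this text; it only shows that a float can be separated into a truncated integer part and a fractional difference and then recombined exactly. The function names `split_float` and `join_float` are hypothetical.

```python
import math

def split_float(sample):
    """Split a float into (truncated integer part, fractional difference)."""
    frac, whole = math.modf(sample)  # both parts carry the sign of the sample
    return int(whole), frac

def join_float(integer_part, difference):
    """Recombine the two parts into the original floating-point sample."""
    return integer_part + difference

integer_part, difference = split_float(3.75)
restored = join_float(integer_part, difference)
```

In this sketch the integer part would go through the normal PCM path, while the difference would be coded separately.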
Figure 2 shows the lossless audio signal decoder, which is significantly
less complex than the encoder, since no adaptation has to be carried out. The
decoder merely decodes the entropy coded residual and the parcor values, converts
them into LPC coefficients, and applies the inverse prediction filter to calculate the
lossless reconstruction signal.
The computational effort of the decoder mainly depends on the predictor
orders chosen by the encoder. Since the average order is typically well below the
maximum order, prediction with greater maximum orders does not necessarily lead to
a significant increase of decoder complexity. In most cases, realtime decoding is
possible even on low-end systems.
[Linear Prediction]
Linear prediction is used in many applications for speech and audio signal
processing. In the following, only FIR predictors are considered.
Prediction with FIR Filters
The current sample of a time-discrete signal x(n) can be approximately
predicted from previous samples x(n - k). The prediction is given by
$$\hat{x}(n) = \sum_{k=1}^{K} h_k \, x(n-k), \qquad (1)$$
where K is the order of the predictor. If the predicted samples are close to the
original samples, the residual
$$e(n) = x(n) - \hat{x}(n) \qquad (2)$$
has a smaller variance than x(n) itself, hence e(n) can be encoded more
efficiently.
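As a minimal sketch of the prediction and residual defined above, the following computes e(n) for a given coefficient list. Treating samples before the start of the signal as zero is an assumption made only for this illustration, and `predict_residual` is a hypothetical name.

```python
def predict_residual(x, h):
    """Return e(n) = x(n) - sum_{k=1..K} h[k-1] * x(n-k) for every n.

    Samples before the start of x are treated as zero (illustrative assumption).
    """
    K = len(h)
    e = []
    for n in range(len(x)):
        pred = 0.0
        for k in range(1, K + 1):
            if n - k >= 0:
                pred += h[k - 1] * x[n - k]
        e.append(x[n] - pred)
    return e

# A linear ramp is predicted exactly by the second-order predictor h = [2, -1]
# once two previous samples are available.
x = [float(n) for n in range(10)]
e = predict_residual(x, [2.0, -1.0])
```

For the ramp, every residual from the third sample onward is zero, which is the small-variance case that makes e(n) cheaper to encode than x(n).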
The procedure of estimating the predictor coefficients from a segment of input
samples, prior to filtering that segment, is referred to as forward adaptation. In that
case, the coefficients have to be transmitted. If the coefficients are estimated from
previously processed segments or samples, e.g. from the residual, we speak of
backward adaptation. This procedure has the advantage that no transmission of the
coefficients is needed, since the data required to estimate the coefficients is available
to the decoder as well.
Forward-adaptive prediction with orders around 10 is widely used in speech
coding, and can be employed for lossless audio coding as well. The maximum order of most forward-adaptive lossless prediction schemes is still rather small, e.g. K = 32.
An exception is the special 1-bit lossless codec for the Super Audio CD, which uses
predictor orders of up to 128.
On the other hand, backward-adaptive FIR filters with some hundred
coefficients are commonly used in many areas, e.g. channel equalization and echo
cancellation. Most systems are based on the LMS algorithm or a variation thereof,
which has also been proposed for lossless audio coding. Such LMS-based coding
schemes with high orders are applicable since the predictor coefficients do not have
to be transmitted as side information, thus their number does not contribute to the
data rate. However, backward-adaptive codecs have the drawback that the
adaptation has to be carried out both in the encoder and the decoder, making the
decoder significantly more complex than in the forward-adaptive case.
Forward-Adaptive Prediction
In forward-adaptive linear prediction, the optimal predictor coefficients h_k (in
terms of a minimized variance of the residual) are usually estimated for
each block by the autocorrelation method or the covariance method. The
autocorrelation method, using the Levinson-Durbin algorithm, has the additional
advantage of providing a simple means to iteratively adapt the order of the predictor.
Furthermore, the algorithm inherently calculates the corresponding parcor coefficients as well.
Another crucial point in forward-adaptive prediction is to determine a suitable
predictor order. Increasing the order decreases the variance of the prediction error,
which leads to a smaller bit rate Re for the residual. On the other hand, the bit rate
Rc for the predictor coefficients will rise with the number of coefficients to be
transmitted. Thus, the task is to find the optimum order which minimizes the total bit
rate. This can be expressed by minimizing
Rtotal (K) = Re(K) + Rc(K) (3)
with respect to the prediction order K. As the prediction gain rises
monotonically with higher orders, Re decreases with K. On the other hand Rc rises
monotonically with K, since an increasing number of coefficients have to be
transmitted.
The search for the optimum order can be carried out efficiently by the
Levinson-Durbin algorithm, which determines recursively all predictors with
increasing order. For each order, a complete set of predictor coefficients is calculated.
Moreover, the variance σ_e^2 of the corresponding residual can be derived, resulting
in an estimate of the expected bit rate for the residual. Together with the bit rate for the coefficients, the total bit rate can be determined in each iteration, i.e. for each
predictor order. The optimum order is found at the point where the total bit rate no
longer decreases.
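The order search described above can be sketched as follows. The Levinson-Durbin recursion is standard; the bit-rate model is an assumption for illustration only (a Gaussian-style estimate of 0.5·log2 of the variance ratio per sample for the residual, plus a flat 4 bits per coefficient), and the sign convention of the parcor values may differ from the codec's. The names `levinson_durbin` and `choose_order` are hypothetical.

```python
import math

def levinson_durbin(r, max_order):
    """Return [(order, coeffs, parcor, residual_energy)] for orders 1..max_order.

    r is the autocorrelation sequence r[0..max_order]; coeffs are h_1..h_K of Eq. (1).
    """
    results, h, err = [], [], r[0]
    for m in range(1, max_order + 1):
        if err <= 0:
            break  # nothing left to predict
        acc = r[m] - sum(h[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err                      # parcor (reflection) coefficient of order m
        h = [h[i] - k * h[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k                 # residual energy after order m
        results.append((m, h, k, err))
    return results

def choose_order(r, n_samples, max_order, bits_per_coeff=4.0):
    """Pick the order at which the estimated total bit count stops decreasing."""
    best_order, best_bits = 0, 0.0         # bit counts are relative to order 0
    for order, _, _, err in levinson_durbin(r, max_order):
        ratio = max(err / r[0], 1e-12)
        bits = 0.5 * n_samples * math.log2(ratio) + bits_per_coeff * order
        if bits >= best_bits:
            break
        best_order, best_bits = order, bits
    return best_order

# Autocorrelation of a first-order process: one coefficient captures all the gain.
r = [1.0, 0.5, 0.25, 0.125]
order = choose_order(r, n_samples=1000, max_order=3)
```

Lowering `bits_per_coeff` in this model moves the stopping point toward higher orders, which is the effect of cheaper coefficient transmission discussed in the text.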
While it is obvious from equation (3) that the coefficient bit rate has a direct
effect on the total bit rate, a slower increase of Rc also allows the minimum
of Rtotal to shift to higher orders (where Re is smaller as well), which would lead to better
compression. Hence, efficient though accurate quantization of the predictor
coefficients plays an important role in achieving maximum compression.
Quantization of Predictor Coefficients
Direct quantization of the predictor coefficients hk is not very efficient for
transmission, since even small quantization errors may result in large deviations from
the desired spectral characteristics of the optimum prediction filter. For this reason,
the quantization of predictor coefficients is based on the parcor (reflection)
coefficients rk , which can be calculated by means of the Levinson-Durbin algorithm.
In that case, the resulting values are restricted to the interval [-1 , 1]. Although parcor
coefficients are less sensitive to quantization, they are still too sensitive when their
magnitude is close to unity. The first two parcor coefficients r_1 and r_2 are typically
very close to -1 and +1, respectively, while the remaining coefficients r_k, k > 2, usually have smaller magnitudes. The distributions of the first coefficients are very
different, but high-order coefficients tend to converge to a zero-mean gaussian-like
distribution (Figure 3).
Therefore, only the first two coefficients are companded based on the
following function:
Figure imgf000016_0001
This compander results in a significantly finer resolution at r_1 → -1, whereas
-C(-r_2) can be used to provide a finer resolution at r_2 → +1 (see Figure 4).
However, in order to simplify computation, +C(-r_2) is actually used for the
second coefficient, leading to an opposite sign of the companded value.
The two companded coefficients are then quantized using a simple 7-bit
uniform quantizer. This results in the following values:
$$a_1 = \lfloor 64\,(-1 + \sqrt{2}\,\sqrt{r_1 + 1}) \rfloor \qquad (5)$$
Figure imgf000016_0002
The remaining coefficients r_k, k > 2, are not companded but simply quantized
using a 7-bit uniform quantizer again:
$$a_k = \lfloor 64\, r_k \rfloor \qquad (7)$$
In all cases the resulting quantized values a_k are restricted to the range [-64,
+63]. These quantized coefficients are re-centered around their most probable values,
and then encoded using Golomb-Rice codes. As a result, the average bit rate of the
encoded parcor coefficients can be reduced to approximately 4 bits/coefficient,
without noticeable degradation of the spectral characteristics. Thus, it is possible to
employ very high orders up to K = 1023, preferably in conjunction with large block
lengths.
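The quantization steps above can be sketched as follows. The compander is assumed here to be C(r) = -1 + sqrt(2)·sqrt(r + 1), which is the form that appears inside Eq. (5); applying +C(-r_2) to the second coefficient follows the text, but the exact form of Eq. (6) is not reproduced in this document and is therefore an assumption. The re-centering and Golomb-Rice coding of the quantized values are not shown, and `quantize_parcor` is a hypothetical name.

```python
import math

def compand(r):
    """Assumed compander C(r) = -1 + sqrt(2) * sqrt(r + 1), taken from the form of Eq. (5)."""
    return -1.0 + math.sqrt(2.0) * math.sqrt(r + 1.0)

def quantize_parcor(parcor):
    """Map parcor values r_1..r_K (each in [-1, 1]) to integers a_k in [-64, 63]."""
    quantized = []
    for k, r in enumerate(parcor, start=1):
        if k == 1:
            value = 64.0 * compand(r)       # finer resolution near r_1 = -1
        elif k == 2:
            value = 64.0 * compand(-r)      # +C(-r_2): finer resolution near r_2 = +1
        else:
            value = 64.0 * r                # plain 7-bit uniform quantizer
        quantized.append(min(63, max(-64, math.floor(value))))
    return quantized

a = quantize_parcor([-1.0, 1.0, 0.5, -0.5])
```

The clamp to [-64, 63] reflects the stated 7-bit range; the extreme inputs r_1 = -1 and r_2 = +1 both land on -64 in this sketch.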
However, the direct form predictor filter uses predictor coefficients h_k
according to Eq. (1). In order to employ identical coefficients in the encoder and the
decoder, these h_k values have to be derived from the quantized a_k values in both
cases (see Figures 1 and 2). While it is up to the encoder how to determine a set of
suitable parcor coefficients, the lossless coding method specifies an integer-arithmetic
function for conversion between quantized values a_k and direct predictor
coefficients h_k which ensures their identical reconstruction in both encoder and decoder.
Block Length Switching
Embodiments relate to encoders, decoders, methods of encoding, and
methods of decoding. In embodiments, an encoder is at least one of an audio
encoder, and an Audio Lossless Coding encoder. In embodiments, a method of
encoding is implemented in at least one of an audio encoder, and an Audio Lossless
Coding encoder. In embodiments, a decoder is at least one of an audio decoder,
and an Audio Lossless Coding decoder. In embodiments, a method of decoding is
implemented in at least one of an audio decoder, and an Audio Lossless Coding
decoder.
<Hierarchical Block Switching>
Embodiments relate to a block switching mechanism which subdivides a
frame of audio data into four quarter-length blocks, instead of encoding it as one
single block. Switching between one long and four short blocks may be performed
adaptively on a frame-by-frame basis.
Even though this switching mechanism may enable a higher compression
ratio than using a constant block length, there may be some drawbacks. For example,
if only 1:4 switching is possible, 1:2 or 1:8 switching (and combinations thereof) may be more efficient in some cases, in accordance with embodiments. For example, if
switching is done identically for all channels, there may be challenges if different
channels require different switching, in accordance with embodiments. For example,
since a more flexible block switching scheme enables the use of a wide range of
block lengths (including very long ones), even higher maximum predictor orders may
be feasible, in accordance with embodiments.
In embodiments, a more flexible, hierarchical block switching scheme allows
for up to six different block lengths (differing by factors of two) within a frame. In
embodiments, independent block switching for each channel may be implemented
(e.g. each channel pair may be switched independently in the case of joint channel
coding). In embodiments, a maximum predictor order of 1023 may be implemented.
In embodiments, the same compression can be achieved with relatively low
decoder complexity, which also allows higher compression at the same complexity.
Audio Lossless Coding (ALS) includes a relatively simple block switching
mechanism. Each frame of N samples is either encoded using one full length block
(NB = N) or four blocks of length NB = N/4, where the same block partition applies
to all channels. Under some circumstances, this scheme may have some limitations.
For example, only 1:4 switching may be possible, although different switching (e.g.
1:2, 1:8, and combinations thereof) may be more efficient in some cases. For
example, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not
correlated).
In embodiments, a relatively flexible block switching scheme may be
implemented, where each frame can be hierarchically subdivided into many blocks.
For example, Figure 5 illustrates a frame which can be hierarchically subdivided up to
32 blocks. Arbitrary combinations of blocks with NB = N, N/2, N/4, N/8, N/16, and
N/32 may be possible within a frame, as long as each block results from a
subdivision of a superordinate block of double length, in accordance with
embodiments. For example, as illustrated in example Figure 2, a partition into N/4 +
N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible.
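The subdivision rule above can be checked mechanically. In the following sketch a partition is written as a list of divisors d, each meaning a block of length N/d; a partition is accepted only if every block starts at a multiple of its own length, which is equivalent to each block resulting from halving a superordinate block. The representation and the name `is_valid_partition` are assumptions made for illustration.

```python
def is_valid_partition(divisors, max_divisor=32):
    """Check whether blocks of length N/d (for d in divisors, in order) form a
    hierarchical partition of one frame. Lengths are counted in units of N/max_divisor."""
    allowed = {1, 2, 4, 8, 16, 32}
    position = 0
    for d in divisors:
        if d not in allowed or d > max_divisor:
            return False
        length = max_divisor // d
        if position % length != 0:     # block would straddle a superordinate boundary
            return False
        position += length
    return position == max_divisor     # blocks must cover the frame exactly

ok = is_valid_partition([4, 4, 2])      # N/4 + N/4 + N/2
bad = is_valid_partition([4, 2, 4])     # N/4 + N/2 + N/4
```

The second partition fails because the N/2 block would start at offset N/4, so it cannot be one half of the full frame.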
In embodiments, the actual partition may be signaled in an additional field,
block switching information (bs_info) (illustrated in the right column of Figure 6),
where the length depends on the number of block switching levels. Table 1
illustrates an example relationship of the maximum number of levels, the minimum
NB , and the number of bytes used for bs_info.
Table 1: Block switching levels.
Figure imgf000020_0001
Figure imgf000021_0001
The bs_info field may include up to 4 bytes, in accordance with embodiments.
The mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444
55555555 55555555]. The first bit may be reserved for indicating independent block
switching. In the example of Figure 26, there are three levels, thus the minimum
block length is NB = N/8, and bs_info consists of one byte. Starting at the maximum
block length NB = N, the bits of bs_info are set if a block is further subdivided. For
the topmost example there is no subdivision at all, thus the code is (0)0000000. The
frame in the second row is subdivided ((0)1...), where only the second block of length
N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split as in
the fourth row, it is indicated in the following bits ((0)111 0100).
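The bit mapping above can be turned into a small decoder. The sketch assumes that the split bit for a block of length N/2^d with index i among the blocks of that length sits at bit position 2^d + i, counted from the most significant bit, which reproduces the layout [(0)1223333 44444444 ...]. The function name `decode_bs_info` and the string representation of the field are assumptions for illustration.

```python
def decode_bs_info(bits, levels):
    """Decode a bs_info bit string (MSB first, 8, 16, or 32 characters of '0'/'1').

    Returns (independent_flag, divisors), where each divisor d denotes a block
    of length N/d, listed in temporal order.
    """
    independent = bits[0] == "1"

    def blocks(depth, index):
        position = (1 << depth) + index        # split bit for this block
        if depth < levels and position < len(bits) and bits[position] == "1":
            return blocks(depth + 1, 2 * index) + blocks(depth + 1, 2 * index + 1)
        return [1 << depth]                    # leaf: block length N / 2**depth

    return independent, blocks(0, 0)

no_split = decode_bs_info("00000000", levels=3)
second_half_split = decode_bs_info("01010000", levels=3)
quarter_split = decode_bs_info("01110100", levels=3)
```

With three levels, the code (0)101... yields N/2 + N/4 + N/4, and (0)111 0100 yields N/4 + N/8 + N/8 + N/4 + N/4, matching the examples in the text.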
In each frame, bs_info fields may be transmitted for all channel pairs (CPEs)
and all single channels (SCEs), enabling independent block switching for different
channels, in accordance with embodiments.
<Independent Block Switching>
In Independent Block Switching, while the frame length is identical for all
channels, block switching can be done individually for each channel, in accordance with embodiments. If difference coding is used, both channels of a channel pair
should be switched synchronously, but other channel pairs can still use different
block switching. If the two channels of a channel pair are not correlated with each
other, difference coding may not pay off, and thus there will be no need to switch
both channels synchronously. Accordingly, if the two channels of a channel pair are
not correlated with each other, switching the channels independently may be
more practical.
There may be a bs_info field for each CPE and SCE in a frame (e.g. the two
channels of a CPE are switched synchronously), in accordance with embodiments. If
they are switched independently, the first bit of bs_info may be set to 1, and the
information applies to the CPE's first channel. In this example, another bs_info field
for the second channel becomes necessary.
In embodiments, as a result of the increased flexibility, the
blocks in the bit stream can be arranged dynamically. As illustrated in example
Figure 7, all channels use the same partition (e.g. either one long or four short
blocks) and corresponding short blocks of different channels are arranged
successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure.
In embodiments illustrated in example Figure 8, short blocks are only
interleaved if they belong to a channel pair that uses difference coding and therefore
synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may
depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1), so these
previous blocks may need to be available prior to the current one. For channels
whose blocks are switched independently, channel data can be arranged separately
(e.g. bottom row of Figure 8).
<Higher Predictor Orders>
Embodiments relate to higher predictor orders. Absent hierarchical block
switching, there may be a factor of 4 between the long and the short block length (e.g.
4096 & 1024 or 8192 & 2048), in accordance with embodiments. In embodiments
(e.g. where hierarchical block switching is implemented), this factor can be increased
(e.g. up to 32), enabling a larger range (e.g. 16384 down to 512 or even 32768 to
1024 for high sampling rates).
In embodiments, in order to make better use of very long blocks, higher
maximum predictor orders may be employed. The maximum order may be K_max =
1023. In embodiments, K_max may be bound by the block length NB, where K_max
< NB / 8 (e.g. K_max = 255 for NB = 2048). Therefore, using K_max = 1023 may
require a block length of at least NB = 8192.
In embodiments, the max_order field in the file header is 10 bits. In embodiments, the opt_order field of the block data is 10 bits. The actual number of
bits in a particular block may depend on the maximum order allowed for a block. If
the block is short, this local maximum order may be smaller than the global maximum
order (stated in max_order in the file header). For example, if K_max = 1023, but NB =
2048, the opt_order field is 8 bits (instead of 10) due to a maximum local order of 255.
The opt_order is determined based on the following equation: opt_order = min
(global prediction order, local prediction order), where the global prediction order is
determined from max_order and the local prediction order is determined from the
length of the block. In detail, the global and local prediction orders are determined by
global prediction order = ceil(log2(maximum prediction order + 1)), and local
prediction order = max(ceil(log2((Nb >> 3) - 1)), 1).
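The two expressions above can be evaluated directly. The sketch below reads them as the number of bits used for the opt_order field, which is how the 8-bit versus 10-bit example in the text works out; reading the garbled operator as a right shift by 3 is an assumption consistent with the bound K_max < NB / 8. The name `opt_order_bits` is hypothetical, and block lengths of at least 16 samples are assumed so that the logarithm is defined.

```python
import math

def opt_order_bits(max_order, block_length):
    """Width of the opt_order field in bits for one block (block_length >= 16 assumed)."""
    global_bits = math.ceil(math.log2(max_order + 1))
    local_bits = max(math.ceil(math.log2((block_length >> 3) - 1)), 1)
    return min(global_bits, local_bits)

short_block = opt_order_bits(1023, 2048)   # local maximum order 255
long_block = opt_order_bits(1023, 8192)    # local maximum order 1023
```

For NB = 2048 the local limit of 255 fits in 8 bits, while NB = 8192 allows the full 10 bits of the global maximum order.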
In embodiments, the data samples of the blocks subdivided from a channel
are predicted. A first sample of a current block is predicted using the last K
samples of a previous block. The K value is determined from the opt_order, which is
derived from the above equation.
If the current block is a channel's first block, no samples from the previous
block may be used. In this case, prediction with progressive order is employed,
where the scaled parcor coefficients are converted progressively to LPC coefficients
inside the prediction filter.
Random Access
Random access stands for fast access to any part of the encoded audio signal
without costly decoding of previous parts. It is an important feature for applications
that employ seeking, editing, or streaming of the compressed data. In order to enable
random access, the encoder has to insert frames that can be decoded without
decoding previous frames. In those random access frames, no samples from
previous frames may be used for prediction.
The distance between random access frames can be chosen from 255 to one
frame. Depending on frame length and sampling rate, random access down to some
milliseconds is possible.
However, prediction at the beginning of random access frames still constitutes
a problem. A conventional K-th order predictor would normally need K samples from
the previous frame in order to predict the current frame's first sample. Since
samples from previous frames may not be used, the encoder has either to assume
zeros, or to transmit the first K original samples directly, starting the prediction at
position K + 1.
As a result, compression at the beginning of random access frames would be
poor. In order to minimize this problem, the codec uses progressive prediction, which
makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order
prediction for the second sample, second-order prediction for the third sample, and
so forth, until the samples from position K + 1 on are predicted using the full K-th
order predictor. Since the predictor coefficients hk are calculated recursively from
the quantized parcor coefficients ak anyway, it is possible to calculate each
coefficient set from orders 1 to K without additional costs.
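Progressive prediction as described above can be sketched with a step-up recursion that produces the coefficient set of every order from 1 to K as a by-product. The sketch uses floating-point arithmetic and unquantized parcor values for readability, whereas the text specifies an integer-arithmetic conversion; the sign convention and the names `step_up` and `progressive_residual` are assumptions.

```python
def step_up(parcor):
    """Return the coefficient sets [h^(1), ..., h^(K)] obtained recursively from parcor values."""
    sets, h = [], []
    for m, k in enumerate(parcor, start=1):
        h = [h[i] - k * h[m - 2 - i] for i in range(m - 1)] + [k]
        sets.append(h)
    return sets

def progressive_residual(x, parcor):
    """Residual of a random access frame: sample n is predicted with order min(n, K)."""
    sets = step_up(parcor)
    K = len(parcor)
    residual = [x[0]] if x else []             # the first sample cannot be predicted
    for n in range(1, len(x)):
        h = sets[min(n, K) - 1]                # order grows until K samples are available
        pred = sum(h[k - 1] * x[n - k] for k in range(1, len(h) + 1))
        residual.append(x[n] - pred)
    return residual

e = progressive_residual([8, 4, 2, 1], [0.5, 0.0])
```

Only the first sample is transmitted at full magnitude here; the second sample already benefits from first-order prediction.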
In the case of 500 ms random access intervals, this scheme produces an
absolute overhead of only 0.01-0.02% compared to continuous prediction without
random access.
Joint Channel Coding
Joint channel coding can be used to exploit dependencies between the two
channels of a stereo signal, or between any two channels of a multi-channel signal.
While it is straightforward to process two channels x_1(n) and x_2(n) independently,
a simple way to exploit dependencies between these channels is to encode the
difference signal
$$d(n) = x_2(n) - x_1(n) \qquad (8)$$
instead of x_1(n) or x_2(n). Switching between x_1(n), x_2(n) and d(n) in each
block can be carried out by comparison of the individual signals, depending on which
two signals can be coded most efficiently (see Figure 9). Such prediction with
switched difference coding is beneficial in cases where two channels are very similar.
In the case of multi-channel material, the channels can be rearranged by the encoder
in order to assign suitable channel pairs.
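The switching decision described above can be sketched as choosing the two cheapest of the three signals, since any two of x_1, x_2 and d are enough to reconstruct both channels. The cost measure used here, the sum of absolute sample values, is only a stand-in for the real coding cost, and the name `choose_pair` is hypothetical.

```python
def choose_pair(x1, x2):
    """Return the names of the two signals to encode, out of x1, x2 and d = x2 - x1."""
    d = [b - a for a, b in zip(x1, x2)]
    candidates = {"x1": x1, "x2": x2, "d": d}

    def cost(name):
        # Illustrative proxy for the bit cost of a signal.
        return sum(abs(v) for v in candidates[name])

    most_expensive = max(candidates, key=cost)
    return sorted(name for name in candidates if name != most_expensive)

similar = choose_pair([100, 101, 102], [101, 102, 103])
unrelated = choose_pair([1, -1, 1], [-50, 50, -50])
```

For two nearly identical channels the difference signal replaces one of the originals; for unrelated channels the difference is the most expensive signal and is dropped.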
Besides simple difference coding, the lossless audio codec also supports a more
complex scheme for exploiting interchannel redundancy between arbitrary channels
of multichannel signals.
Entropy Coding of The Residual
In simple mode, the residual values e(n) are entropy coded using Rice
codes. For each block, either all values can be encoded using the same Rice code,
or the block can be further divided into four parts, each encoded with a different Rice
code. The indices of the applied codes have to be transmitted, as shown in Figure 1.
Since there are different ways to determine the optimal Rice code for a given set of
data, it is up to the encoder to select suitable codes depending on the statistics of the
residual.
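One possible way of selecting a Rice code is an exhaustive search over the code parameter, as in the following sketch. The mapping of signed residuals to unsigned integers (interleaving positive and negative values) and the unary convention are assumptions for illustration and may differ from the codec's actual bit layout; `rice_encode` and `best_rice_parameter` are hypothetical names.

```python
def rice_encode(value, s):
    """Encode one signed residual with Rice parameter s; returns a string of bits."""
    u = 2 * value if value >= 0 else -2 * value - 1   # signed -> unsigned (assumed mapping)
    quotient, remainder = u >> s, u & ((1 << s) - 1)
    bits = "1" * quotient + "0"                       # unary prefix
    if s:
        bits += format(remainder, "0{}b".format(s))   # s low-order bits, fixed length
    return bits

def best_rice_parameter(residuals, max_s=15):
    """Return the parameter s that minimizes the total code length for the block."""
    return min(range(max_s + 1),
               key=lambda s: sum(len(rice_encode(e, s)) for e in residuals))

code = rice_encode(3, 1)
s_small = best_rice_parameter([0, 0, 0, 0])
s_large = best_rice_parameter([1000, -1000, 900, -900])
```

Small residuals favor a small parameter and large residuals a larger one, which is why the chosen index is sent as side information for each block or block part.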
Alternatively, the encoder can use a more complex and efficient coding
scheme called BGMC (Block Gilbert-Moore Codes). In BGMC mode, the encoding of residuals is accomplished by splitting the distribution into two categories (Figure 10):
residuals that belong to a central region of the distribution, |e(n)| < e_max, and ones
that belong to its tails.
The residuals in tails are simply re-centered (i.e. for e(n) > e_max we have
e_t(n) = e(n) - e_max) and encoded using Rice codes as described earlier. However, to
encode residuals in the center of the distribution, the BGMC encoder splits them into
LSB and MSB components first, then it encodes MSBs using block Gilbert-Moore
(arithmetic) codes, and finally it transmits LSBs using direct fixed-length codes. Both
parameters, e_max and the number of directly transmitted LSBs, are selected such that
they only slightly affect the coding efficiency of this scheme, while making it
significantly less complex.
[Compression Results]
In the following, the lossless audio codec is compared with two of the most
popular programs for lossless audio compression: The open-source codec FLAC,
which uses forward-adaptive prediction as well, and Monkey's Audio (MAC 3.97), a
backward-adaptive codec as the current state-of-the-art algorithm in terms of
compression. Both codecs were run with options providing maximum compression
(flac -8 and mac -c4000). The results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum
compression level (K ≤ 1023), both with random access of 500 ms. The tests were
conducted on a 1.7 GHz Pentium-M system, with 1024 MB of memory. The test material comprises
nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and
resolutions of 16 and 24 bits.
[Compression Ratio]
In the following, the compression ratio is defined as
$$C = \frac{\text{CompressedFileSize}}{\text{OriginalFileSize}} \cdot 100\%, \qquad (9)$$
where smaller values mean better compression. The results for the examined
audio formats are shown in Table 2 (192 kHz material is not supported by the
FLAC codec).
Table 2: Comparison of average compression ratios for different audio formats
(kHz/bits)
Figure imgf000029_0001
Figure imgf000030_0001
The results show that ALS at maximum level outperforms both FLAC and
Monkey's Audio for all formats, but particularly for high-definition material (i.e. 96 kHz
/ 24-bit and above). Even at medium level ALS delivers the best overall compression.
[Complexity]
The complexity of different codecs strongly depends on the actual
implementation, particularly that of the encoder. As mentioned earlier, the audio
signal encoder of the present invention is just a snapshot of an ongoing development.
Thus, we restrict our analysis to the decoder, a simple C code implementation with
no further optimizations. The compressed data was generated by the currently best
encoder implementation. The average CPU load for real-time decoding of various
audio formats, encoded at different complexity levels, is shown in Table 3. Even for
maximum complexity, the CPU load of the decoder is only around 20-25%, which in
return means that file based decoding is at least 4-5 times faster than real-time.
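The relation between CPU load and decoding speed used above is a simple reciprocal, assuming that file-based decoding can use the whole processor:

```latex
\text{speed factor} = \frac{100\%}{\text{CPU load}}, \qquad
\frac{100\%}{25\%} = 4, \qquad
\frac{100\%}{20\%} = 5 .
```

A load of 20-25% during real-time playback therefore corresponds to decoding 4 to 5 times faster than real time.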
Table 3: Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
Figure imgf000031_0001
The codec is designed to offer a large range of complexity levels. While the
maximum level achieves the highest compression at the expense of slowest
encoding and decoding speed, the faster medium level only slightly degrades
compression, but decoding is significantly less complex than for the maximum level
(around 5% CPU load for 48 kHz material). Using a low-complexity level (K ≤ 15,
Rice coding) degrades compression by only 1-1.5% compared to the medium level,
but the decoder complexity is further reduced by a factor of three (less than 2% CPU
load for 48 kHz material). Thus, audio data can be decoded even on hardware with
very low computing power.
While the encoder complexity may be increased by both higher maximum
orders and a more elaborate block switching algorithm (in accordance with
embodiments), the decoder may be affected by a higher average predictor order.
As the results for a scheme in accordance with embodiments with Zn
127,
The foregoing embodiments (e.g. hierarchical block switching) and advantages
are merely examples and are not to be construed as limiting the appended claims. The above teachings can be applied to other apparatuses and methods, as would be
appreciated by one of ordinary skill in the art. Many alternatives, modifications, and
variations will be apparent to those skilled in the art.
[Syntax]
The present invention relates to the syntax of the encoded bit
stream. The syntax is as follows:
File Header: The block_switching field is extended from 1 to 2 bits, the
max_order field is extended from 8 to 10 bits. The frame_length and
user_frame_length fields are merged, resulting in a frame_length field of 16 bits,
while the user_frame_length field is removed.
Table 4: Syntax of als_header
Figure imgf000032_0001
Figure imgf000033_0001
Frame Data: If block switching is used, the bs_info field is added. Depending
on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's
bs_info field holds the independent_bs flag. The number of blocks is implicitly derived
from bs_info as well. If block_switching is off, there is no bs_info field, thus blocks is
one and independent_bs is zero.
In order to improve readability, both new and old syntax are shown separately
in the following table, instead of mixing new with old syntax elements.
Table 5: Syntax of frame_data
Figure imgf000034_0001
Block Header: The short_blocks field is removed, since block switching
information is completely transmitted on frame level (bs_info, see previous
paragraph).
Table 6: Syntax of block_header
Figure imgf000035_0001
Block Data: The opt_order field is extended to a maximum of 10 bits
(previously 8 bits).
Table 7: Syntax of block_data
Figure imgf000036_0001
[Semantics]
File Header:
Table 8: Elements of als header
Figure imgf000036_0002
Figure imgf000037_0001
Frame Data:
Table 9: Elements of frame data
Figure imgf000038_0001
Table 10: Elements of block header
Figure imgf000038_0002
Table 11 : Elements of block data
Figure imgf000038_0003
Industrial Applicability
It will be apparent to those skilled in the art that various modifications and
variations can be made in the present invention without departing from the spirit or
scope of the invention. For example, the present invention can be adopted in another
audio signal codec, such as a lossy audio signal codec. Thus, it is intended that the
present invention covers the modifications and variations of this invention provided
they come within the scope of the appended claims and their equivalents.

Claims

[CLAIMS]
1. A method of processing an audio file, the method comprising:
subdividing a channel of an audio data frame included in an audio file into a
plurality of blocks hierarchically at one or more block switching levels, wherein at
least two of the subdivided blocks have different lengths;
generating first block switching information indicating that the audio file is
block switched; and
generating second block switching information indicating how the blocks are
subdivided from the channel at the block switching levels.
2. The method of claim 1, wherein the first block switching information is
included in a file header included in the audio file.
3. The method of claim 2, wherein the first block switching information is
indicated by 2 bits.
4. The method of claim 2, wherein the first block switching information is
defined by any one of "01", "10", and "11" to indicate that the audio file is block
switched.
5. The method of claim 1, wherein a total length of the second block
switching information is determined based on a total number of the block switching
levels.
6. The method of claim 1, wherein the second block switching information
includes a series of information bits representing how the blocks are subdivided from
the channel at the block switching levels, respectively.
7. The method of claim 6, wherein each information bit has a value of 1 when
a block is subdivided at a corresponding block switching level and has a value of 0
when the block is not subdivided at the corresponding block switching level.
8. The method of claim 1, further comprising transmitting the first block
switching information.
9. The method of claim 1, further comprising transmitting the second block
switching information.
10. The method of claim 1, further comprising predicting data samples of the
blocks subdivided from the channel, wherein a first sample of a current block is predicted using the last K samples of a previous block.
11. The method of claim 10, wherein a first sample of the current block is
predicted using prediction with progressive order when the current block is a
foremost block of the channel.
12. A method of encoding an audio file, the method comprising:
subdividing a channel of an audio data frame included in an audio file into a
plurality of blocks hierarchically at one or more block switching levels, wherein the
audio data frame is included in the audio file, each block resulting from a subdivision
of a superordinate block of double length;
generating first block switching information indicating that the audio file is
block switched; and
generating second block switching information indicating how the blocks are
subdivided from the channel at the block switching levels.
13. A method of decoding an audio file, the method comprising:
receiving an audio file having an audio data frame which has at least one
channel, each channel being subdivided into a plurality of blocks hierarchically at one
or more block switching levels, wherein each block results from a subdivision of a superordinate block of double length;
parsing first block switching information from a file header included in the
audio file, the first block switching information indicating that the audio file is block
switched;
parsing second block switching information from the audio data frame to
determine how the blocks are subdivided from the block switching levels; and
identifying and decoding the subdivided blocks using the parsed first and
second block switching information.
14. An apparatus of encoding an audio file, the apparatus comprising:
an encoder configured to subdivide a channel of audio data frame included in
an audio file into a plurality of blocks hierarchically at one or more block switching
levels, to generate first block switching information indicating that the audio file is
block switched, and to generate second block switching information indicating how
the blocks are subdivided from the channel at the block switching levels, wherein
each block results from a subdivision of a superordinate block of double length.
15. An apparatus of decoding an audio file, the apparatus comprising:
a decoder configured to receive an audio file having an audio data frame
which has at least one channel, each channel being subdivided into a plurality of
blocks hierarchically at one or more block switching levels, each block resulting from
a subdivision of a superordinate block of double length, wherein the decoder is
further configured to parse first block switching information from a file header
included in the audio file, to parse second block switching information from the audio
data frame, and to identify and decode the subdivided blocks using the parsed first
and second block switching information, the first block switching information
indicating that the audio file is block switched, and the second block switching
information indicating how the blocks are subdivided from the block switching levels.
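The hierarchical subdivision recited in claims 12 to 15 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed bitstream syntax: the names `encode_split_flags`, `decode_block_lengths`, `block_switching` and `split_flags` are hypothetical, the per-level flag layout is an assumed representation of the second block switching information, and the frame length is assumed to be divisible by two at every level.

```python
from typing import Dict, List


def encode_split_flags(lengths: List[int], frame_len: int, levels: int) -> List[List[int]]:
    """Turn a list of block lengths into per-level split flags (assumed layout).

    flags[k][i] == 1 means block i at level k (length frame_len >> k)
    is split into two blocks of half that length.
    """
    flags = [[0] * (1 << k) for k in range(levels)]
    pos = 0
    for n in lengths:
        if n <= 0 or frame_len % n or pos + n > frame_len:
            raise ValueError("block does not fit the channel frame")
        level = (frame_len // n).bit_length() - 1
        # The block must be frame_len / 2**level long and aligned to its own length.
        if (n << level) != frame_len or level > levels or pos % n:
            raise ValueError("not a valid hierarchical subdivision")
        # Mark every superordinate (ancestor) block as split.
        for k in range(level):
            flags[k][pos // (frame_len >> k)] = 1
        pos += n
    if pos != frame_len:
        raise ValueError("blocks must cover the whole channel frame")
    return flags


def decode_block_lengths(header: Dict, frame: Dict, frame_len: int) -> List[int]:
    """Recover block lengths from header and frame information.

    header["block_switching"] stands in for the first block switching
    information; frame["split_flags"] stands in for the second.
    """
    if not header["block_switching"]:
        return [frame_len]  # no block switching: one block per channel frame
    flags = frame["split_flags"]

    def walk(level: int, index: int) -> List[int]:
        if level < len(flags) and flags[level][index]:
            # A split block yields two subordinate blocks of half the length.
            return walk(level + 1, 2 * index) + walk(level + 1, 2 * index + 1)
        return [frame_len >> level]

    return walk(0, 0)
```

For example, a 4096-sample channel frame divided into blocks of 2048, 1024, 512 and 512 samples with three block switching levels round-trips through both functions, and each block is half the length of the superordinate block from which it was subdivided.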
PCT/KR2005/002308 2005-07-11 2005-07-18 Apparatus and method of encoding and decoding audio signal WO2007011085A1 (en)

Priority Applications (93)

Application Number Priority Date Filing Date Title
PCT/KR2005/002308 WO2007011085A1 (en) 2005-07-18 2005-07-18 Apparatus and method of encoding and decoding audio signal
US11/481,931 US7411528B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,915 US7996216B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,926 US7949014B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,932 US8032240B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,930 US8032368B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US11/481,942 US7830921B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,927 US7835917B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,940 US8180631B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal, utilizing a unique offset associated with each coded-coefficient
US11/481,929 US7991012B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,939 US8121836B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
US11/481,933 US7966190B2 (en) 2005-07-11 2006-07-07 Apparatus and method for processing an audio signal using linear prediction
US11/481,941 US8050915B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US11/481,916 US8108219B2 (en) 2005-07-11 2006-07-07 Apparatus and method of encoding and decoding audio signal
US11/481,917 US7991272B2 (en) 2005-07-11 2006-07-07 Apparatus and method of processing an audio signal
EP06757768A EP1913583A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769218A EP1913589A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002688 WO2007008010A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521306A JP2009500682A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521311A JP2009500687A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
EP06769219A EP1913584A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521309A JP2009500685A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
CNA200680024866XA CN101218852A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521307A JP2009500683A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
PCT/KR2006/002685 WO2007008007A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002678 WO2007008000A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800294174A CN101243489A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769226A EP1913588A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521313A JP2009500688A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
PCT/KR2006/002690 WO2007008012A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521314A JP2009500689A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
CNA2006800289829A CN101238510A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002689 WO2007008011A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769223A EP1913587A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06757767A EP1913582A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06757765A EP1913580A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521310A JP2009500686A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
JP2008521305A JP2009500681A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
CNA2006800304797A CN101243493A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769227A EP1911020A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
PCT/KR2006/002677 WO2007007999A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521319A JP2009500693A (en) 2005-07-11 2006-07-10 Apparatus and method for encoding and decoding audio signal
PCT/KR2006/002682 WO2007008004A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA200680028892XA CN101238509A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769225A EP1911021A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521315A JP2009500690A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
CNA2006800305499A CN101243495A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
CNA2006800251395A CN101218629A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002687 WO2007008009A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06769220A EP1913585A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002679 WO2007008001A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CN2006800252699A CN101218630B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002683 WO2007008005A1 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521317A JP2009500691A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
JP2008521318A JP2009500692A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
CNA2006800305111A CN101243494A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
EP06769224A EP1913794A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
PCT/KR2006/002681 WO2007008003A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800251376A CN101218631A (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
JP2008521308A JP2009500684A (en) 2005-07-11 2006-07-10 Audio signal processing method, audio signal encoding and decoding apparatus and method
PCT/KR2006/002680 WO2007008002A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
CNA2006800304693A CN101243492A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
CN2006800251380A CN101218628B (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding an audio signal
PCT/KR2006/002686 WO2007008008A2 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CNA2006800305412A CN101243497A (en) 2005-07-11 2006-07-10 Apparatus and method of coding and decoding an audio signal
PCT/KR2006/002691 WO2007008013A2 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06769222A EP1908058A4 (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
CN2006800294070A CN101243496B (en) 2005-07-11 2006-07-10 Apparatus and method of processing an audio signal
EP06757764A EP1913579A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
EP06757766A EP1913581A4 (en) 2005-07-11 2006-07-10 Apparatus and method of encoding and decoding audio signal
JP2008521316A JP2009510810A (en) 2005-07-11 2006-07-10 Audio signal processing apparatus and method
US12/232,526 US8010372B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,527 US7962332B2 (en) 2005-07-11 2008-09-18 Apparatus and method of encoding and decoding audio signal
US12/232,595 US8417100B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,593 US8326132B2 (en) 2005-07-11 2008-09-19 Apparatus and method of encoding and decoding audio signal
US12/232,590 US8055507B2 (en) 2005-07-11 2008-09-19 Apparatus and method for processing an audio signal using linear prediction
US12/232,591 US8255227B2 (en) 2005-07-11 2008-09-19 Scalable encoding and decoding of multichannel audio with up to five levels in subdivision hierarchy
US12/232,662 US8510120B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,658 US8510119B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with coded-coefficients
US12/232,659 US8554568B2 (en) 2005-07-11 2008-09-22 Apparatus and method of processing an audio signal, utilizing unique offsets associated with each coded-coefficients
US12/232,747 US8149878B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,748 US8155153B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,734 US8155144B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,739 US8155152B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,744 US8032386B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,740 US8149876B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,743 US7987008B2 (en) 2005-07-11 2008-09-23 Apparatus and method of processing an audio signal
US12/232,741 US8149877B2 (en) 2005-07-11 2008-09-23 Apparatus and method of encoding and decoding audio signal
US12/232,781 US7930177B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals using hierarchical block switching and linear prediction coding
US12/232,783 US8275476B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,784 US7987009B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signals
US12/232,782 US8046092B2 (en) 2005-07-11 2008-09-24 Apparatus and method of encoding and decoding audio signal
US12/314,891 US8065158B2 (en) 2005-07-11 2008-12-18 Apparatus and method of processing an audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2005/002308 WO2007011085A1 (en) 2005-07-18 2005-07-18 Apparatus and method of encoding and decoding audio signal

Publications (1)

Publication Number Publication Date
WO2007011085A1 (en) 2007-01-25

Family

ID=37668951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002308 WO2007011085A1 (en) 2005-07-11 2005-07-18 Apparatus and method of encoding and decoding audio signal

Country Status (1)

Country Link
WO (1) WO2007011085A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIEBCHEN T. ET AL.: "MPEG-4 ALS: an emerging standard for lossless audio coding", DATA COMPRESSION CONFERENCE, 2004. PROCEEDINGS: (DCC 2004), pages 439 - 448, XP010692571 *
LIEBCHEN T.: "An introduction to MPEG-4 audio lossless coding", 2004 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP '04), vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1012 - 1015, XP010718364 *
MORIYA T. ET AL.: "Extended linear prediction tools for lossless audio coding", ICASSP 2004, vol. 3, 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), pages 1008 - 1011, XP010718363 *

Similar Documents

Publication Publication Date Title
US7991272B2 (en) Apparatus and method of processing an audio signal
WO2007011080A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011083A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011085A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011084A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011078A1 (en) Apparatus and method of encoding and decoding audio signal
WO2007011079A1 (en) Apparatus and method of encoding and decoding audio signal

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 05761451; Country of ref document: EP; Kind code of ref document: A1)