WO2007011085A1

WO2007011085A1 - Apparatus and method of encoding and decoding audio signal

Info

Publication number: WO2007011085A1
Application number: PCT/KR2005/002308
Authority: WO
Inventors: Tilman Liebchen
Original assignee: Lg Electronics Inc.; Noll, Peter
Priority date: 2005-07-18
Filing date: 2005-07-18
Publication date: 2007-01-25

Abstract

A method and apparatus of encoding and decoding an audio file are disclosed. A channel of an audio data frame included in an audio file is subdivided hierarchically into a plurality of blocks at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length. First block switching information, indicating that the audio file is block switched, is then generated, followed by second block switching information indicating how the blocks are subdivided from the channel at the block switching levels.

Description

[DESCRIPTION]

APPARATUS AND METHOD OF ENCODING AND DECODING AUDIO SIGNAL

Technical Field

The present invention relates to a method for processing an audio signal, and more particularly to a method and apparatus of encoding and decoding an audio signal.

Background Art

The storage and replaying of audio signals has been accomplished in different ways in the past. For example, music and speech have been recorded and preserved by phonographic technology (e.g. record players), magnetic technology (e.g. cassette tapes), and digital technology (e.g. compact discs). As audio storage technology progresses, many challenges need to be overcome to optimize the quality and storability of audio signals.

For the archiving and broadband transmission of music signals, lossless reconstruction is becoming a more important feature than high compression efficiency by means of perceptual coding as defined in MPEG standards such as MP3 or AAC. Although DVD-Audio and Super Audio CD include proprietary lossless compression schemes, there is a demand among content holders and broadcasters for an open and general compression scheme. In response to this demand, a new lossless coding scheme has been considered as an extension to the MPEG-4 Audio standard. Lossless audio coding permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.

Disclosure of Invention

The present invention relates to a method for processing audio signals using forward-adaptive linear prediction, which offers remarkable compression even with low predictor orders. Nevertheless, performance can be significantly improved by using higher predictor orders, more efficient quantization and encoding of the predictor coefficients, and adaptive block length switching.

It is an object of the invention to provide lossless audio coding that permits the compression of digital audio data without any loss in quality due to a perfect reconstruction of the original signal.

Another object of the invention is to provide lossless coding techniques for high-definition audio signals. Audio Lossless Coding will define methods for lossless coding of audio signals with arbitrary sampling rates, resolutions of up to 32 bits, and up to 256 channels. The lossless codec uses forward-adaptive Linear Predictive Coding (LPC) to reduce bit rates compared to PCM, leaving the optimization entirely to the encoder. Thus, various encoder implementations are possible, offering a certain range in terms of efficiency and complexity. Although remarkable compression is achieved even with low predictor orders, still better compression becomes possible using high-order prediction. In this case, more efficient coding of the predictor coefficients is necessary in order to limit the amount of side information. This is achieved by applying a non-linear compander to the most important coefficients, followed by linear quantization and entropy coding of the quantized values. In addition, adaptive block length switching is used to account for changing signal statistics. As a result, compression ratios are comparable to those of the best high-order backward-adaptive prediction schemes, but with a significantly less complex decoder, while maintaining full random access to arbitrary parts of the encoded signal.

The present invention relates to encoders and/or decoders of data (including methods of encoding and decoding data). Data may be encoded or decoded in a lossless manner. Embodiments relate to a flexible, hierarchical block switching scheme, allowing for up to six different block lengths within a frame. Embodiments relate to independent block switching for each channel. Embodiments relate to a maximum predictor order of 1023.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, a method of processing an audio file includes subdividing a channel of an audio data frame included in an audio file into a plurality of blocks hierarchically at one or more block switching levels, wherein at least two of the subdivided blocks have different lengths, generating first block switching information indicating that the audio file is block switched, and generating second block switching information indicating how the blocks are subdivided from the channel at the block switching levels.

In another aspect of the present invention, a method of encoding an audio file includes subdividing a channel of an audio data frame included in an audio file into a plurality of blocks hierarchically at one or more block switching levels, wherein the audio data frame is included in the audio file, each block resulting from a subdivision of a superordinate block of double length, generating first block switching information indicating that the audio file is block switched, and generating second block switching information indicating how the blocks are subdivided from the channel at the block switching levels.

In another aspect of the present invention, a method of decoding an audio file includes receiving an audio file having an audio data frame which has at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, wherein each block results from a subdivision of a superordinate block of double length, parsing first block switching information from a file header included in the audio file, the first block switching information indicating that the audio file is block switched, parsing second block switching information from the audio data frame to determine how the blocks are subdivided from the block switching levels, and identifying and decoding the subdivided blocks using the parsed first and second block switching information.

In another aspect of the present invention, an apparatus of encoding an audio file includes an encoder configured to subdivide a channel of an audio data frame included in the audio file into a plurality of blocks hierarchically at one or more block switching levels, to generate first block switching information indicating that the audio file is block switched, and to generate second block switching information indicating how the blocks are subdivided from the channel at the block switching levels, wherein each block results from a subdivision of a superordinate block of double length.

In a further aspect of the present invention, an apparatus of decoding an audio file includes a decoder configured to receive an audio file having an audio data frame which has at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from a subdivision of a superordinate block of double length, wherein the decoder is further configured to parse first block switching information from a file header included in the audio file, to parse second block switching information from the audio data frame, and to identify and decode the subdivided blocks using the parsed first and second block switching information, the first block switching information indicating that the audio file is block switched, and the second block switching information indicating how the blocks are subdivided from the block switching levels.

It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.

Brief Description of Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:

Figure 1 is an example illustration of an audio signal encoder.

Figure 2 is an example illustration of an audio signal decoder.

Figure 3 illustrates measured distributions of parcor coefficients for 48 kHz, 16-bit audio material.

Figure 4 illustrates the compander functions C(r) and -C(-r).

Figure 5 is an example of a block switching hierarchy structure.

Figure 6 illustrates examples of block switching and the corresponding block switching information codes.

Figure 7 is an example of a bit stream of the old block switching scheme.

Figure 8 is an example of a bit stream of the new block switching (BS) scheme: no BS (top), synchronized BS between CPE channels 1 and 2 (middle), independent BS (bottom).

Figure 9 is a switched difference coding scheme.

Figure 10 is a partition of the residual distribution.

Best Mode for Carrying out the Invention

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.

Prior to describing the present invention, it should be noted that most terms disclosed in the present invention correspond to general terms well known in the art, but some terms have been selected by the applicant as necessary and will hereinafter be disclosed in the following description of the present invention. Therefore, it is preferable that the terms defined by the applicant be understood on the basis of their meanings in the present invention.

In a lossless audio coding method, since the encoding process has to be perfectly reversible without loss of information, several parts of both encoder and decoder have to be implemented in a deterministic way.

[Structure of the codec]

Figure 1 shows the typical processing for one input channel of audio data. A buffer stores one block of input samples, and an optimum set of parcor coefficients is calculated for each block. The number of coefficients, i.e. the order of the predictor, can be adaptively chosen as well. The quantized parcor values are entropy coded for transmission, and converted to LPC coefficients for the prediction filter which calculates the prediction residual. The residual is entropy coded using different entropy codes. The indices of the chosen codes have to be transmitted as side information.

Finally, a multiplexing unit combines the coded residual, code indices, predictor coefficients and other additional information to form the compressed bitstream. The encoder also provides a CRC checksum, which is supplied mainly for the decoder to verify the decoded data. On the encoder side, the CRC can be used to ensure that the compressed data is losslessly decodable.

Additional encoder options comprise block length switching, random access and joint channel coding. The encoder may use these options to offer several compression levels with different complexities. The basic version of the encoder uses a fixed block length. Optionally, the encoder can switch between different block lengths to adapt to stationary regions as well as to transient segments of the audio signal. The codec allows random access in defined intervals down to a few milliseconds, depending on the block length.

Furthermore, joint channel coding is used to exploit dependencies between channels of stereo or multi-channel signals. This can be achieved by coding the difference between two channels in those segments where this difference can be coded more efficiently than one of the original channels.

The entropy coding stage for the prediction residual provides two alternative coding techniques with different complexities. Besides low-complexity yet efficient Golomb-Rice coding, the BGMC arithmetic coding scheme offers even better compression at the expense of slightly increased complexity.
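As a rough illustration of the low-complexity option, a Golomb-Rice code with parameter k sends the quotient u >> k in unary and the k low bits of u in binary. The sketch below is not the codec's bitstream syntax: the zigzag mapping of signed residuals to non-negative integers, the unary convention (ones terminated by a zero), the brute-force choice of k, and all function names are assumptions made for illustration.

```python
def zigzag(v: int) -> int:
    """Assumed signed-to-unsigned mapping: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values: list[int], k: int) -> str:
    """Each value becomes a unary quotient ('1' * q + '0') followed by k remainder bits."""
    out = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        out.append("1" * q + "0" + (format(r, f"0{k}b") if k else ""))
    return "".join(out)


def rice_decode(bits: str, k: int, count: int) -> list[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # unary part
            q += 1
            pos += 1
        pos += 1  # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values


def best_rice_parameter(values: list[int], max_k: int = 15) -> int:
    """Hypothetical encoder choice: try every k and keep the shortest code."""
    return min(range(max_k + 1), key=lambda k: len(rice_encode(values, k)))


residual = [3, -2, 0, 5, -7, 1]
k = best_rice_parameter(residual)
assert rice_decode(rice_encode(residual, k), k, len(residual)) == residual
```

Small residuals favour small k, while a larger k keeps the unary part short for larger residuals; the brute-force search simply makes that trade-off explicit.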

Furthermore, the encoder will also offer efficient compression of floating-point audio data in the 32-bit IEEE format. This codec extension employs an algorithm that basically splits the floating-point signal into a truncated integer signal and a difference signal which contains the remaining fractional part. The integer signal is then compressed using the normal encoding scheme for PCM signals, while the difference signal is coded separately. A detailed description of the floating-point extension can be found.

Figure 2 shows the lossless audio signal decoder, which is significantly less complex than the encoder, since no adaptation has to be carried out. The decoder merely decodes the entropy-coded residual and the parcor values, converts them into LPC coefficients, and applies the inverse prediction filter to calculate the lossless reconstruction signal.

The computational effort of the decoder mainly depends on the predictor orders chosen by the encoder. Since the average order is typically well below the maximum order, prediction with greater maximum orders does not necessarily lead to a significant increase of decoder complexity. In most cases, real-time decoding is possible even on low-end systems.

[Linear Prediction]

Linear prediction is used in many applications for speech and audio signal processing. In the following, only FIR predictors are considered.

Prediction with FIR Filters

The current sample of a time-discrete signal x(n) can be approximately predicted from previous samples x(n - k). The prediction is given by

$$\hat{x}(n) = \sum_{k=1}^{K} h_k \, x(n-k), \qquad (1)$$

where K is the order of the predictor. If the predicted samples are close to the original samples, the residual

$$e(n) = x(n) - \hat{x}(n) \qquad (2)$$

has a smaller variance than x(n) itself, hence e(n) can be encoded more efficiently.
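The following sketch illustrates Eqs. (1) and (2) together with the requirement that encoder and decoder behave deterministically. If both sides compute the prediction with identical integer operations, the decoder can add the residual back and recover x(n) exactly. Several details are illustrative assumptions rather than the codec's specified arithmetic: the fixed-point precision SHIFT, the rounding rule, sending the first K samples unpredicted, and the second-order coefficients 2cos(ω) and -1, which predict a sampled sinusoid almost exactly.

```python
import math

SHIFT = 14  # hypothetical fixed-point precision of the coefficients h_k


def to_fixed(h: list[float]) -> list[int]:
    return [round(c * (1 << SHIFT)) for c in h]


def predict(history: list[int], h_fixed: list[int]) -> int:
    """Eq. (1) in integer arithmetic; history[-1] is x(n-1), history[-2] is x(n-2), ..."""
    acc = sum(c * history[-1 - k] for k, c in enumerate(h_fixed))
    return (acc + (1 << (SHIFT - 1))) >> SHIFT  # round to the nearest integer


def encode(x: list[int], h_fixed: list[int]) -> list[int]:
    K = len(h_fixed)
    e = list(x[:K])  # in this sketch the first K samples are sent unpredicted
    for n in range(K, len(x)):
        e.append(x[n] - predict(x[n - K:n], h_fixed))  # Eq. (2)
    return e


def decode(e: list[int], h_fixed: list[int]) -> list[int]:
    K = len(h_fixed)
    x = list(e[:K])
    for n in range(K, len(e)):
        x.append(e[n] + predict(x[n - K:n], h_fixed))  # inverse of Eq. (2)
    return x


w = 0.05
x = [round(1000 * math.sin(w * n)) for n in range(200)]
h_fixed = to_fixed([2 * math.cos(w), -1.0])  # 2nd-order predictor for a sinusoid
e = encode(x, h_fixed)
assert decode(e, h_fixed) == x  # lossless reconstruction
print(max(abs(v) for v in x), max(abs(v) for v in e[2:]))
```

The signal spans roughly ±1000, while the predicted residual stays within a few units, so it needs far fewer bits per sample. Any floating-point step that could round differently on the two sides would break the exact round trip.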

The procedure of estimating the predictor coefficients from a segment of input samples, prior to filtering that segment, is referred to as forward adaptation. In that case, the coefficients have to be transmitted. If the coefficients are estimated from previously processed segments or samples, e.g. from the residual, we speak of backward adaptation. This procedure has the advantage that no transmission of the coefficients is needed, since the data required to estimate the coefficients is available to the decoder as well.

Forward-adaptive prediction with orders around 10 is widely used in speech coding, and can be employed for lossless audio coding as well. The maximum order of most forward-adaptive lossless prediction schemes is still rather small, e.g. K = 32. An exception is the special 1-bit lossless codec for the Super Audio CD, which uses predictor orders of up to 128.

On the other hand, backward-adaptive FIR filters with several hundred coefficients are commonly used in many areas, e.g. channel equalization and echo cancellation. Most systems are based on the LMS algorithm or a variation thereof, which has also been proposed for lossless audio coding. Such LMS-based coding schemes with high orders are applicable since the predictor coefficients do not have to be transmitted as side information, so their number does not contribute to the data rate. However, backward-adaptive codecs have the drawback that the adaptation has to be carried out in both the encoder and the decoder, making the decoder significantly more complex than in the forward-adaptive case.

Forward-Adaptive Prediction

In forward-adaptive linear prediction, the optimal predictor coefficients h_k (in terms of a minimized variance of the residual) are usually estimated for each block by the autocorrelation method or the covariance method. The autocorrelation method, using the Levinson-Durbin algorithm, has the additional advantage of providing a simple means to iteratively adapt the order of the predictor. Furthermore, the algorithm inherently calculates the corresponding parcor coefficients as well.

Another crucial point in forward-adaptive prediction is to determine a suitable predictor order. Increasing the order decreases the variance of the prediction error, which leads to a smaller bit rate R_e for the residual. On the other hand, the bit rate R_c for the predictor coefficients will rise with the number of coefficients to be transmitted. Thus, the task is to find the optimum order which minimizes the total bit rate. This can be expressed by minimizing

$$R_{\mathrm{total}}(K) = R_e(K) + R_c(K) \qquad (3)$$

with respect to the prediction order K. As the prediction gain rises monotonically with higher orders, R_e decreases with K. On the other hand, R_c rises monotonically with K, since an increasing number of coefficients have to be transmitted.

The search for the optimum order can be carried out efficiently by the Levinson-Durbin algorithm, which determines recursively all predictors with increasing order. For each order, a complete set of predictor coefficients is calculated. Moreover, the variance σ_e² of the corresponding residual can be derived, resulting in an estimate of the expected bit rate for the residual. Together with the bit rate for the coefficients, the total bit rate can be determined in each iteration, i.e. for each predictor order. The optimum order is found at the point where the total bit rate no longer decreases.
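The following is a minimal floating-point sketch of the order search described above. It runs the Levinson-Durbin recursion on the autocorrelation of one block and records the residual energy for every order. It stops as soon as an estimated total R_e(K) + R_c(K) no longer decreases. The rate model is an assumption made only for illustration. R_e(K) is approximated by (N/2)·log2 of the residual variance, a Gaussian-style estimate valid up to a constant. R_c(K) is a flat 4 bits per coefficient, in line with the average of approximately 4 bits/coefficient quoted in the text for quantized parcor coefficients. The parcor sign convention r_m = -k_m is also assumed; it is chosen so that strongly lowpass input gives r_1 close to -1, as described in the text. No windowing is applied.

```python
import math
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], max_order: int):
    """Return parcor values, direct-form sets h (orders 1..K) and residual energies E_0..E_K.

    Prediction convention of Eq. (1): x_hat(n) = sum_k h_k x(n-k).
    Parcor sign convention (assumption): r_m = -k_m.
    """
    h: list[float] = []
    err = r[0]
    parcor, sets, errors = [], [], [err]
    for m in range(1, max_order + 1):
        if err <= 0.0:  # signal already perfectly predicted
            break
        k = (r[m] - sum(h[j] * r[m - 1 - j] for j in range(m - 1))) / err
        h = [h[j] - k * h[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
        parcor.append(-k)
        sets.append(h)
        errors.append(err)
    return parcor, sets, errors


def choose_order(x: list[float], max_order: int, coef_bits: float = 4.0):
    """Increase K until the estimated R_total(K) = R_e(K) + R_c(K) stops decreasing."""
    n = len(x)
    parcor, sets, errors = levinson_durbin(autocorrelation(x, max_order), max_order)

    def total_bits(order: int) -> float:
        variance = max(errors[order] / n, 1e-12)
        return 0.5 * n * math.log2(variance) + coef_bits * order

    best = 0
    for order in range(1, len(sets) + 1):
        if total_bits(order) >= total_bits(best):
            break
        best = order
    return best, parcor[:best], (sets[best - 1] if best else [])


# Usage: a synthetic AR(2) signal x(n) = 1.6 x(n-1) - 0.8 x(n-2) + noise
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(1.6 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
order, parcor, h = choose_order(x, max_order=16)
print(order, [round(v, 3) for v in parcor[:2]], [round(v, 3) for v in h[:2]])
```

For this lowpass test signal the first parcor value comes out close to -0.9, and the chosen predictor recovers coefficients near 1.6 and -0.8. Further orders add 4 bits each while barely reducing the residual energy, so the search stops early.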

While it is obvious from Eq. (3) that the coefficient bit rate has a direct effect on the total bit rate, a slower increase of R_c also shifts the minimum of R_total to higher orders (where R_e is smaller as well), which leads to better compression. Hence, efficient yet accurate quantization of the predictor coefficients plays an important role in achieving maximum compression.

Quantization of Predictor Coefficients

Direct quantization of the predictor coefficients h_k is not very efficient for transmission, since even small quantization errors may result in large deviations from the desired spectral characteristics of the optimum prediction filter. For this reason, the quantization of predictor coefficients is based on the parcor (reflection) coefficients r_k, which can be calculated by means of the Levinson-Durbin algorithm. In that case, the resulting values are restricted to the interval [-1, 1]. Although parcor coefficients are less sensitive to quantization, they are still too sensitive when their magnitude is close to unity. The first two parcor coefficients r_1 and r_2 are typically very close to -1 and +1, respectively, while the remaining coefficients r_k, k > 2, usually have smaller magnitudes. The distributions of the first coefficients are very different, but high-order coefficients tend to converge to a zero-mean Gaussian-like distribution (Figure 3).

Therefore, only the first two coefficients are companded, based on the following function:

$$C(r) = -1 + \sqrt{2}\,\sqrt{r + 1} \qquad (4)$$

This compander results in a significantly finer resolution as r_1 → -1, whereas -C(-r_2) can be used to provide a finer resolution as r_2 → +1 (see Figure 4). However, in order to simplify computation, +C(-r_2) is actually used for the second coefficient, leading to an opposite sign of the companded value.

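The two companded coefficients are then quantized using a simple 7-bit uniform quantizer. This results in the following values:

$$a_1 = \left\lfloor 64\left(-1 + \sqrt{2}\,\sqrt{r_1 + 1}\right)\right\rfloor \qquad (5)$$

$$a_2 = \left\lfloor 64\left(-1 + \sqrt{2}\,\sqrt{-r_2 + 1}\right)\right\rfloor \qquad (6)$$

The remaining coefficients r_k, k > 2, are not companded but simply quantized using a 7-bit uniform quantizer again:

$$a_k = \left\lfloor 64\, r_k \right\rfloor \qquad (7)$$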

In all cases the resulting quantized values a_k are restricted to the range [-64, +63]. These quantized coefficients are re-centered around their most probable values and then encoded using Golomb-Rice codes. As a result, the average bit rate of the encoded parcor coefficients can be reduced to approximately 4 bits/coefficient, without noticeable degradation of the spectral characteristics. Thus, it is possible to employ very high orders up to K = 1023, preferably in conjunction with large block lengths.
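Equations (4) to (7) translate directly into a few lines of code. This sketch quantizes a list of parcor values and clips the results to [-64, 63]. The inverse function is an assumption added only to visualize the resolution gained by companding: it uses floating-point mid-point reconstruction, whereas the codec specifies its own integer-arithmetic conversion. The re-centering and Golomb-Rice coding steps are omitted because their parameters are not given here.

```python
import math


def compand(r: float) -> float:
    """Eq. (4): C(r) = -1 + sqrt(2) * sqrt(r + 1), defined for -1 <= r <= 1."""
    return -1.0 + math.sqrt(2.0) * math.sqrt(r + 1.0)


def quantize_parcor(parcor: list[float]) -> list[int]:
    """Eqs. (5)-(7): companded first two coefficients, uniform 7-bit for the rest."""
    q = []
    for k, r in enumerate(parcor, start=1):
        if k == 1:
            v = 64.0 * compand(r)    # fine resolution near r_1 = -1
        elif k == 2:
            v = 64.0 * compand(-r)   # +C(-r_2): fine resolution near r_2 = +1
        else:
            v = 64.0 * r
        q.append(max(-64, min(63, math.floor(v))))  # restrict to [-64, 63]
    return q


def dequantize_parcor(q: list[int]) -> list[float]:
    """Illustrative float inverse with mid-point reconstruction (assumption)."""
    out = []
    for k, a in enumerate(q, start=1):
        y = (a + 0.5) / 64.0
        if k == 1:
            out.append((y + 1.0) ** 2 / 2.0 - 1.0)   # r_1 = C^-1(y)
        elif k == 2:
            out.append(1.0 - (y + 1.0) ** 2 / 2.0)   # r_2 = -C^-1(y)
        else:
            out.append(y)
    return out


parcor = [-0.99, 0.97, 0.3, -0.1]
q = quantize_parcor(parcor)            # [-55, -49, 19, -7]
approx = dequantize_parcor(q)
print(q, [round(abs(a - b), 4) for a, b in zip(parcor, approx)])
```

Near ±1 the companded coefficients are reconstructed to within about 0.001. A plain 7-bit uniform quantizer can be off by up to 1/128 (about 0.0078), which is where parcor values are most sensitive.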

However, the direct-form predictor filter uses the predictor coefficients h_k according to Eq. (1). In order to employ identical coefficients in the encoder and the decoder, these h_k values have to be derived from the quantized a_k values in both cases (see Figures 1 and 2). While it is up to the encoder how to determine a set of suitable parcor coefficients, the lossless coding method specifies an integer-arithmetic function for conversion between the quantized values a_k and the direct predictor coefficients h_k, which ensures their identical reconstruction in both encoder and decoder.

Block Length Switching

Embodiments relate to encoders, decoders, methods of encoding, and methods of decoding. In embodiments, an encoder is at least one of an audio encoder and an Audio Lossless Coding encoder. In embodiments, a method of encoding is implemented in at least one of an audio encoder and an Audio Lossless Coding encoder. In embodiments, a decoder is at least one of an audio decoder and an Audio Lossless Coding decoder. In embodiments, a method of decoding is implemented in at least one of an audio decoder and an Audio Lossless Coding decoder.

<Hierarchical Block Switching>

Embodiments relate to a block switching mechanism which subdivides a frame of audio data into four quarter-length blocks, instead of encoding it as one single block. Switching between one long and four short blocks may be performed adaptively on a frame-by-frame basis.

Even though this switching mechanism may enable a higher compression ratio than using a constant block length, there may be some drawbacks, in accordance with embodiments. For example, if only 1:4 switching is possible, 1:2 or 1:8 switching (and combinations thereof) may be more efficient in some cases. Likewise, if switching is done identically for all channels, problems may arise when different channels require different switching. Moreover, since a more flexible block switching scheme enables the use of a wide range of block lengths (including very long ones), even higher maximum predictor orders may become feasible.

In embodiments, a more flexible, hierarchical block switching scheme allows for up to six different block lengths (differing by factors of two) within a frame. In embodiments, independent block switching for each channel may be implemented (e.g. each channel pair may be switched independently in the case of joint channel coding). In embodiments, a maximum predictor order of 1023 may be implemented. In embodiments, the same compression can be achieved with relatively low decoder complexity, which also allows higher compression at the same complexity.

Audio Lossless Coding (ALS) includes a relatively simple block switching mechanism. Each frame of N samples is either encoded using one full-length block (N_B = N) or four blocks of length N_B = N/4, where the same block partition applies to all channels. Under some circumstances, this scheme may have some limitations. For example, only 1:4 switching may be possible, although different switching (e.g. 1:2, 1:8, and combinations thereof) may be more efficient in some cases. Also, switching is done identically for all channels, although different channels may require different switching (which is especially true if the channels are not correlated).

In embodiments, a relatively flexible block switching scheme may be implemented, where each frame can be hierarchically subdivided into many blocks. For example, Figure 5 illustrates a frame which can be hierarchically subdivided into up to 32 blocks. Arbitrary combinations of blocks with N_B = N, N/2, N/4, N/8, N/16, and N/32 may be possible within a frame, as long as each block results from a subdivision of a superordinate block of double length, in accordance with embodiments. For example, as illustrated in example Figure 6, a partition into N/4 + N/4 + N/2 may be possible, while a partition into N/4 + N/2 + N/4 may not be possible, because the N/2 block would straddle the boundary between the two halves of the frame.

In embodiments, the actual partition may be signaled in an additional field, the block switching information (bs_info), illustrated in the right column of Figure 6, whose length depends on the number of block switching levels. Table 1 illustrates an example relationship between the maximum number of levels, the minimum N_B, and the number of bytes used for bs_info.

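Table 1: Block switching levels.

Maximum number of levels | Minimum N_B | Bytes for bs_info
1 | N/2 | 1
2 | N/4 | 1
3 | N/8 | 1
4 | N/16 | 2
5 | N/32 | 4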

The bs_info field may include up to 4 bytes, in accordance with embodiments. The mapping of bits with respect to the levels 1 to 5 may be [(0)1223333 44444444 55555555 55555555]. The first bit may be reserved for indicating independent block switching. In the example of Figure 6, there are three levels, thus the minimum block length is N_B = N/8, and bs_info consists of one byte. Starting at the maximum block length N_B = N, the bits of bs_info are set if a block is further subdivided. For the topmost example there is no subdivision at all, thus the code is (0)0000000. The frame in the second row is subdivided ((0)1...), where only the second block of length N/2 is further split ((0)101...) into two blocks of length N/4. If an N/4 block is split, as in the fourth row, this is indicated in the following bits ((0)111 0100).
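The bs_info mapping above can be checked with a small sketch. Node j at depth d (the whole frame is depth 0) owns bit 2^d + j, counted from the most significant bit of the field. Bit 1 therefore belongs to level 1, bits 2-3 to level 2, bits 4-7 to level 3, and so on, while bit 0 stays reserved for the independent-switching flag. The encoder walks the partition depth-first and rejects partitions that break the double-length rule, such as N/4 + N/2 + N/4. The decoder inverts the mapping. Block lengths are given in samples. The function names, the frame length N = 4096, and the use of a plain Python integer for the field are assumptions for illustration.

```python
def bs_info_bytes(levels: int) -> int:
    """Field size: 1 reserved bit + (2**levels - 1) subdivision bits, rounded up to bytes."""
    if not 1 <= levels <= 5:
        raise ValueError("levels must be in 1..5")
    return max(1, (1 << levels) // 8)


def encode_bs_info(blocks: list[int], frame_len: int, levels: int,
                   independent: bool = False) -> int:
    width = 8 * bs_info_bytes(levels)
    value = (1 << (width - 1)) if independent else 0  # bit 0: independent switching
    idx = 0

    def walk(depth: int, pos: int, length: int) -> None:
        nonlocal value, idx
        if idx >= len(blocks):
            raise ValueError("blocks do not fill the frame")
        nxt = blocks[idx]
        if nxt == length:                      # leaf: block is not subdivided
            idx += 1
        elif nxt < length and depth < levels:  # split into two halves, set the bit
            value |= 1 << (width - 1 - ((1 << depth) + pos))
            walk(depth + 1, 2 * pos, length // 2)
            walk(depth + 1, 2 * pos + 1, length // 2)
        else:
            raise ValueError(f"block of {nxt} samples violates the hierarchy")

    walk(0, 0, frame_len)
    if idx != len(blocks):
        raise ValueError("blocks exceed the frame")
    return value


def decode_bs_info(value: int, frame_len: int, levels: int) -> list[int]:
    width = 8 * bs_info_bytes(levels)
    blocks: list[int] = []

    def walk(depth: int, pos: int, length: int) -> None:
        # depth < levels is checked first so no bit beyond the field is read
        if depth < levels and (value >> (width - 1 - ((1 << depth) + pos))) & 1:
            walk(depth + 1, 2 * pos, length // 2)
            walk(depth + 1, 2 * pos + 1, length // 2)
        else:
            blocks.append(length)

    walk(0, 0, frame_len)
    return blocks


N = 4096
for blocks in ([4096], [2048, 1024, 1024], [1024, 512, 512, 1024, 1024]):
    code = encode_bs_info(blocks, N, levels=3)
    assert decode_bs_info(code, N, levels=3) == blocks
    print(format(code, "08b"))  # 00000000, 01010000, 01110100
```

The three printed codes match the (0)0000000, (0)101... and (0)111 0100 examples described for Figure 6. The bs_info_bytes helper reproduces the byte counts implied by the bit mapping: 1, 1, 1, 2 and 4 bytes for 1 to 5 levels.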

In each frame, bs_info fields may be transmitted for all channel pairs (CPEs) and all single channels (SCEs), enabling independent block switching for different channels, in accordance with embodiments.

<Independent Block Switching>

With independent block switching, while the frame length is identical for all channels, block switching can be done individually for each channel, in accordance with embodiments. If difference coding is used, both channels of a channel pair should be switched synchronously, but other channel pairs can still use different block switching. If the two channels of a channel pair are not correlated with each other, difference coding may not pay off, and thus there is no need to switch both channels synchronously. Accordingly, if the two channels of a channel pair are not correlated with each other, it may be more practical to switch the channels independently.

There may be one bs_info field for each CPE and SCE in a frame (by default, the two channels of a CPE are switched synchronously), in accordance with embodiments. If they are switched independently, the first bit of bs_info may be set to 1, and the information applies to the CPE's first channel. In this case, another bs_info field for the second channel becomes necessary.

In embodiments, as a result of the increased flexibility, the blocks in the bit stream can be arranged dynamically. In the old scheme illustrated in example Figure 7, all channels use the same partition (either one long block or four short blocks), and corresponding short blocks of different channels are arranged successively (e.g. blocks 1.1, 2.1, and 3.1), leading to an interleaved structure.

In embodiments illustrated in example Figure 8, short blocks are only interleaved if they belong to a channel pair that uses difference coding and therefore synchronized block switching (e.g. the middle row of Figure 8). This interleaving may be beneficial, since in a channel pair a block of one channel (e.g. block 1.2) may depend on previous blocks from both channels (e.g. blocks 1.1 and 2.1), so these previous blocks may need to be available prior to the current one. For channels whose blocks are switched independently, channel data can be arranged separately (e.g. bottom row of Figure 8).
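The following hypothetical helper makes the ordering rule concrete. Its name and input format are assumptions and are not part of any specified syntax. It lists the transmission order of the blocks in one frame. Channel pairs that use difference coding, and therefore synchronized switching, are interleaved block by block. All other channels are written one after the other. It models only the ordering, not the bitstream contents.

```python
def arrange_blocks(blocks_per_channel: dict[int, int],
                   synced_pairs: list[tuple[int, int]]) -> list[str]:
    """Return labels such as '1.2' (channel 1, block 2) in transmission order.

    blocks_per_channel: channel number -> number of blocks in this frame.
    synced_pairs: channel pairs that use difference coding and therefore share
    one partition; their blocks are interleaved. Other channels are written
    one after another.
    """
    partner: dict[int, int] = {}
    for a, b in synced_pairs:
        if blocks_per_channel[a] != blocks_per_channel[b]:
            raise ValueError(f"channels {a} and {b} must share one partition")
        partner[a], partner[b] = b, a

    order: list[str] = []
    written: set[int] = set()
    for ch in sorted(blocks_per_channel):
        if ch in written:
            continue
        if ch in partner:
            mate = partner[ch]
            for i in range(1, blocks_per_channel[ch] + 1):
                # e.g. block 1.2 may depend on 1.1 and 2.1, so both come first
                order += [f"{ch}.{i}", f"{mate}.{i}"]
            written.update((ch, mate))
        else:
            order += [f"{ch}.{i}" for i in range(1, blocks_per_channel[ch] + 1)]
            written.add(ch)
    return order


print(arrange_blocks({1: 2, 2: 2, 3: 1}, [(1, 2)]))  # synchronized CPE + one SCE
print(arrange_blocks({1: 2, 2: 3}, []))              # independently switched channels
```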

Embodiments relate to higher predictor orders. Absent hierarchical block switching, there may be a factor of 4 between the long and the short block length (e.g. 4096 and 1024, or 8192 and 2048), in accordance with embodiments. In embodiments (e.g. where hierarchical block switching is implemented), this factor can be increased (e.g. up to 32), enabling a larger range (e.g. 16384 down to 512, or even 32768 down to 1024 for high sampling rates).

In embodiments, in order to make better use of very long blocks, higher maximum predictor orders may be employed. The maximum order may be K_max = 1023. In embodiments, K_max may be bounded by the block length N_B, where K_max < N_B/8 (e.g. K_max = 255 for N_B = 2048). Therefore, using K_max = 1023 may require a block length of at least N_B = 8192.

In embodiments, the max_order field in the file header is 10 bits. In embodiments, the opt_order field of the block data is at most 10 bits. The actual number of bits in a particular block may depend on the maximum order allowed for that block. If the block is short, this local maximum order may be smaller than the global maximum order (stated in max_order in the file header). For example, if K_max = 1023 but N_B = 2048, the opt_order field is 8 bits (instead of 10) due to a maximum local order of 255.

The number of bits of the opt_order field is determined by the following equation, where the global prediction order is determined from max_order, the local prediction order is determined from the block length N_B, and both are expressed in bits:

$$\text{bits}(\text{opt\_order}) = \min(\text{global prediction order},\ \text{local prediction order})$$

$$\text{global prediction order} = \lceil \log_2(\text{max\_order} + 1) \rceil$$

$$\text{local prediction order} = \max\left(\lceil \log_2((N_B \gg 3) - 1) \rceil,\ 1\right)$$
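The field-length rule can be evaluated directly. In this sketch, ceil(log2(v)) is computed exactly with int.bit_length() to avoid floating-point rounding near powers of two; that is a design choice of the sketch, not a requirement stated in the text. The formula is only defined for N_B ≥ 16, and the helper names are hypothetical.

```python
def ceil_log2(v: int) -> int:
    """Exact ceil(log2(v)) for integers v >= 1."""
    if v < 1:
        raise ValueError("v must be >= 1")
    return (v - 1).bit_length()


def local_max_order(block_len: int, max_order: int) -> int:
    """Largest order allowed in a block, using K_max < N_B / 8."""
    return min(max_order, block_len // 8 - 1)


def opt_order_bits(max_order: int, block_len: int) -> int:
    """Length in bits of the opt_order field for one block."""
    if block_len < 16:
        raise ValueError("the formula needs N_B >= 16")
    global_bits = ceil_log2(max_order + 1)
    local_bits = max(ceil_log2((block_len >> 3) - 1), 1)
    return min(global_bits, local_bits)


# With max_order = 1023 in the file header:
for n_b in (512, 2048, 8192, 16384):
    print(n_b, local_max_order(n_b, 1023), opt_order_bits(1023, n_b))
# 512 -> 63, 6 bits; 2048 -> 255, 8 bits; 8192 -> 1023, 10 bits; 16384 -> 1023, 10 bits
```

The 2048-sample case reproduces the example above (local maximum order 255, an 8-bit field). From 8192 samples on, the global 10-bit limit from max_order takes over.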

In embodiments, the data samples of each subdivided block of a channel are predicted. The first sample of a current block is predicted using the last K samples of the previous block. The value of K is determined from opt_order, which is derived using the above equations.

If the current block is a channel's first block, no samples from a previous block may be used. In this case, prediction with progressive order is employed, where the scaled parcor coefficients are converted progressively to LPC coefficients inside the prediction filter.

Random Access

Random access stands for fast access to any part of the encoded audio signal without costly decoding of previous parts. It is an important feature for applications that employ seeking, editing, or streaming of the compressed data. In order to enable random access, the encoder has to insert frames that can be decoded without decoding previous frames. In those random access frames, no samples from previous frames may be used for prediction.

The distance between random access frames can be chosen between 1 and 255 frames. Depending on frame length and sampling rate, random access down to a few milliseconds is possible.
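As a worked example of this arithmetic, take a frame length of N samples, a sampling rate f_s, and a random access frame every D frames; the specific values below are assumed for illustration only. The random access interval is

$$T_{\mathrm{RA}} = \frac{D \cdot N}{f_s}, \qquad 1 \le D \le 255.$$

With an assumed N = 2048 at f_s = 48 kHz, D = 1 gives 2048/48000 s ≈ 42.7 ms, and D = 12 gives 24576/48000 s = 512 ms. An assumed shorter frame of N = 256 at the same rate gives about 5.3 ms with D = 1.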

However, prediction at the beginning of random access frames still constitutes a problem. A conventional K-th order predictor would normally need K samples from the previous frame in order to predict the current frame's first sample. Since samples from previous frames may not be used, the encoder has either to assume zeros or to transmit the first K original samples directly, starting the prediction at position K + 1.

As a result, compression at the beginning of random access frames would be

poor. In order to minimize this problem, the codec uses progressive prediction, which

makes use of as many available samples as possible. While it is of course not feasible to predict the first sample of a random access frame, we can use first-order

prediction for the second sample, second-order prediction for the third sample, and

so forth, until the samples from position K + 1 on are predicted using the full K-th

order predictor. Since the predictor coefficients h_k are calculated recursively from

the quantized parcor coefficients a_k anyway, it is possible to calculate each

coefficient set from orders 1 to K without additional costs.
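Progressive prediction can be sketched in Python as follows. The step-up recursion yields the predictor coefficients for every order from 1 to K as a by-product. The predictor form x_hat(n) = sum over k of h_k x(n - k) and the sign convention h_m^(m) = a_m, h_k^(m) = h_k^(m-1) - a_m h_{m-k}^(m-1) are assumptions, and other conventions flip the sign of a_m. The codec's quantization and scaling of the parcor values are also left out. The function names are hypothetical.

```python
def parcor_to_lpc_sets(parcor):
    """Step-up recursion: return [h^(1), ..., h^(K)], the predictor
    coefficients of every order 1..K, from parcor values a_1..a_K."""
    sets, h = [], []
    for m, a in enumerate(parcor, start=1):
        # h_k^(m) = h_k^(m-1) - a_m * h_{m-k}^(m-1) for k < m, and h_m^(m) = a_m
        h = [h[k] - a * h[m - 2 - k] for k in range(m - 1)] + [a]
        sets.append(h)
    return sets


def progressive_predict(x, parcor):
    """Predictions for the samples of a random access frame: none for x[0],
    order n for x[n] while n < K, and the full order K from x[K] on."""
    K = len(parcor)
    sets = parcor_to_lpc_sets(parcor)
    preds = []
    for n in range(len(x)):
        order = min(n, K)
        if order == 0:
            preds.append(0.0)  # first sample: nothing to predict from
            continue
        h = sets[order - 1]
        preds.append(sum(h[k] * x[n - 1 - k] for k in range(order)))
    return preds
```

The inner loop uses the order-n coefficient set until n reaches K. Because the recursion must pass through every intermediate order anyway, keeping those sets costs nothing extra.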

In the case of 500 ms random access intervals, this scheme produces an

absolute overhead of only 0.01-0.02% compared to continuous prediction without

random access.

Joint Channel Coding

Joint channel coding can be used to exploit dependencies between the two

channels of a stereo signal, or between any two channels of a multi-channel signal.

While it is straightforward to process two channels x₁(n) and x₂(n) independently, a simple way to exploit dependencies between these channels is to encode the difference signal

$$d(n) = x_2(n) - x_1(n) \qquad (8)$$

instead of x₁(n) or x₂(n). Switching between x₁(n), x₂(n) and d(n) in each block can be carried out by comparison of the individual signals, depending on which two signals can be coded most efficiently (see Figure 9). Such prediction with switched difference coding is beneficial in cases where two channels are very similar.

In the case of multi-channel material, the channels can be rearranged by the encoder

in order to assign suitable channel pairs.
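The per-block selection among x₁(n), x₂(n) and d(n) can be sketched in Python. Any two of the three signals are enough to rebuild both channels. The cost function here is a sum of absolute values, which is only an assumed proxy. A real encoder would compare the actual coded sizes, typically of the prediction residuals. The function names are hypothetical.

```python
from itertools import combinations


def cost(signal):
    # Assumed proxy for coded size; a real encoder would count coded bits.
    return sum(abs(v) for v in signal)


def choose_pair(x1, x2):
    """Return the two of {x1, x2, d} that are cheapest to code, d = x2 - x1."""
    d = [b - a for a, b in zip(x1, x2)]
    signals = {"x1": x1, "x2": x2, "d": d}
    best = min(combinations(signals, 2),
               key=lambda pair: cost(signals[pair[0]]) + cost(signals[pair[1]]))
    return best, signals


def reconstruct(pair, a, b):
    """Recover (x1, x2) from the two transmitted signals a and b."""
    if pair == ("x1", "x2"):
        return a, b
    if pair == ("x1", "d"):
        return a, [u + v for u, v in zip(a, b)]  # x2 = x1 + d
    return [u - v for u, v in zip(a, b)], a      # ("x2", "d"): x1 = x2 - d
```

For two nearly identical channels such as [10, 20, 30] and [11, 21, 31], the difference signal is [1, 1, 1], so the pair (x1, d) wins.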

Besides simple difference coding, the lossless audio codec also supports a more

complex scheme for exploiting interchannel redundancy between arbitrary channels

of multichannel signals.

Entropy Coding of The Residual

In simple mode, the residual values e(n) are entropy coded using Rice codes. For each block, either all values can be encoded using the same Rice code, or the block can be further divided into four parts, each encoded with a different Rice code. The indices of the applied codes have to be transmitted, as shown in Figure 1. Since there are different ways to determine the optimal Rice code for a given set of data, it is up to the encoder to select suitable codes depending on the statistics of the residual.
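A generic Rice coder in Python shows how the parameter affects code length. Several details here are illustrative assumptions rather than the codec's exact bit layout: the zig-zag mapping of signed residuals, the unary convention (q ones followed by a zero), and the exhaustive parameter search. The function names are hypothetical.

```python
def zigzag(e):
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * e if e >= 0 else -2 * e - 1


def rice_encode(e, s):
    """Rice code with parameter s: unary quotient (q ones, then a zero)
    followed by the s low-order bits, returned as a bit string."""
    u = zigzag(e)
    bits = "1" * (u >> s) + "0"
    if s > 0:
        bits += format(u & ((1 << s) - 1), f"0{s}b")
    return bits


def rice_bits(residuals, s):
    """Total code length in bits for a set of residuals with parameter s."""
    return sum((zigzag(e) >> s) + 1 + s for e in residuals)


def best_rice_param(residuals, max_s=15):
    """Exhaustive search for the cheapest parameter (one possible encoder choice)."""
    return min(range(max_s + 1), key=lambda s: rice_bits(residuals, s))
```

To use the four-part option, an encoder could run `best_rice_param` on each quarter of the block. It would then keep the split only if the saving outweighs the cost of transmitting three extra code indices.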

Alternatively, the encoder can use a more complex and efficient coding scheme called BGMC (Block Gilbert-Moore Codes). In BGMC mode, the encoding of residuals is accomplished by splitting the distribution into two categories (Figure 10): residuals that belong to a central region of the distribution, |e(n)| < e_max, and ones that belong to its tails.

The residuals in the tails are simply re-centered (i.e. for e(n) > e_max we have e_t(n) = e(n) - e_max) and encoded using Rice codes as described earlier. However, to encode residuals in the center of the distribution, the BGMC encoder first splits them into LSB and MSB components, then encodes the MSBs using block Gilbert-Moore (arithmetic) codes, and finally transmits the LSBs using direct fixed-length codes. Both parameters e_max and the number of directly transmitted LSBs are selected such that they only slightly affect the coding efficiency of this scheme, while making it significantly less complex.
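The partitioning step of BGMC mode can be sketched in Python. The Gilbert-Moore arithmetic coder itself is not included. Several details are assumptions not stated in the text: the negative tail mirrors the positive one (e_t(n) = e(n) + e_max for e(n) <= -e_max), residuals with |e(n)| = e_max are treated as tail values, and the MSB/LSB split uses an arithmetic shift by k bits. The function name is hypothetical.

```python
def bgmc_partition(residuals, e_max, k):
    """Split residuals into central (index, msb, lsb) triples and
    re-centered tail (index, e_t) pairs; k is the number of direct LSBs."""
    center, tails = [], []
    for n, e in enumerate(residuals):
        if abs(e) < e_max:
            # MSBs would go to the arithmetic coder, LSBs are sent as k raw bits.
            msb, lsb = e >> k, e & ((1 << k) - 1)
            center.append((n, msb, lsb))
        else:
            # Re-center tail values; mirroring the negative tail is an assumption.
            e_t = e - e_max if e > 0 else e + e_max
            tails.append((n, e_t))
    return center, tails
```

A central value is recovered as `(msb << k) + lsb`, so `e >> k` and `e & ((1 << k) - 1)` round-trip correctly even for negative residuals.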

[Compression Results]

In the following, the lossless audio codec is compared with two of the most popular programs for lossless audio compression: the open-source codec FLAC, which uses forward-adaptive prediction as well, and Monkey's Audio (MAC 3.97), a backward-adaptive codec that represents the current state of the art in terms of compression. Both codecs were run with options providing maximum compression (flac -8 and mac -c4000). The results for the encoder were determined for a medium compression level (with the prediction order restricted to K ≤ 60) and a maximum compression level (K ≤ 1023), both with random access of 500 ms. The tests were conducted on a 1.7 GHz Pentium-M system with 1024 MB of memory. The test set comprises nearly 1 GB of stereo waveform data with sampling rates of 48, 96, and 192 kHz, and resolutions of 16 and 24 bits.

[Compression Ratio]

In the following, the compression ratio is defined as

$$C = \frac{\text{CompressedFileSize}}{\text{OriginalFileSize}} \cdot 100\% \qquad (9)$$

where smaller values mean better compression. The results for the examined

audio formats are shown in Table 2 (192 kHz material is not supported by the

FLAC codec).

Table 2: Comparison of average compression ratios for different audio formats (kHz/bits)
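Equation (9) translates directly into Python. The text does not say how the per-format averages in Table 2 are formed. The mean of per-file ratios used here is an assumption, since a ratio of summed sizes would weight large files more heavily. Both helper names are hypothetical.

```python
def compression_ratio(compressed_size, original_size):
    """Compression ratio C in percent, as in equation (9); smaller is better."""
    return 100.0 * compressed_size / original_size


def average_ratio(pairs):
    """Mean of per-file ratios for (compressed, original) size pairs
    (one possible averaging rule; an assumption)."""
    ratios = [compression_ratio(c, o) for c, o in pairs]
    return sum(ratios) / len(ratios)
```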

The results show that ALS at maximum level outperforms both FLAC and

Monkey's Audio for all formats, but particularly for high-definition material (i.e. 96 kHz

/ 24-bit and above). Even at medium level ALS delivers the best overall compression.

[Complexity]

The complexity of different codecs strongly depends on the actual

implementation, particularly that of the encoder. As mentioned earlier, the audio

signal encoder of the present invention is just a snapshot of an ongoing development.

Thus, we restrict our analysis to the decoder, a simple C code implementation with

no further optimizations. The compressed data was generated by the currently best

encoder implementation. The average CPU load for real-time decoding of various

audio formats, encoded at different complexity levels, is shown in Table 3. Even for

maximum complexity, the CPU load of the decoder is only around 20-25%, which in turn means that file-based decoding is at least 4-5 times faster than real time.

Table 3: Average CPU load (percentage on a 1.7 GHz Pentium-M), depending on audio format (kHz/bits) and ALS encoder complexity.
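The link between real-time CPU load and file-based decoding speed follows from a short calculation. The Python sketch below assumes that decoding time scales linearly with the measured CPU load. The helper names are hypothetical.

```python
def realtime_factor(cpu_load_percent):
    """How many times faster than real time file-based decoding can run."""
    return 100.0 / cpu_load_percent


def decode_time_seconds(num_samples, sample_rate, cpu_load_percent):
    """Estimated wall-clock decoding time for a file of num_samples samples."""
    duration = num_samples / sample_rate
    return duration * cpu_load_percent / 100.0
```

At 25% and 20% load the factors are 4 and 5. At 25% load, a 60-second 48 kHz file would take about 15 seconds to decode.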

The codec is designed to offer a large range of complexity levels. While the maximum level achieves the highest compression at the expense of the slowest encoding and decoding speed, the faster medium level only slightly degrades compression, but decoding is significantly less complex than for the maximum level (around 5% CPU load for 48 kHz material). Using a low-complexity level (K ≤ 15, Rice coding) degrades compression by only 1-1.5% compared to the medium level, but the decoder complexity is further reduced by a factor of three (less than 2% CPU load for 48 kHz material). Thus, audio data can be decoded even on hardware with very low computing power.

While the encoder complexity may be increased by both higher maximum

orders and a more elaborate block switching algorithm (in accordance with

embodiments), the decoder may be affected by a higher average predictor order.

As the results for a scheme in accordance with embodiments with Z_n 127,

The foregoing embodiments (e.g. hierarchical block switching) and advantages

are merely examples and are not to be construed as limiting the appended claims. The above teachings can be applied to other apparatuses and methods, as would be

appreciated by one of ordinary skill in the art. Many alternatives, modifications, and

variations will be apparent to those skilled in the art.

[Syntax]

The present invention relates to the syntax contained in the encoded bitstream. The syntax is as follows:

File Header: The block_switching field is extended from 1 to 2 bits, and the max_order field is extended from 8 to 10 bits. The frame_length and user_frame_length fields are merged, resulting in a frame_length field of 16 bits, while the user_frame_length field is removed.

Table 4: Syntax of als_header

Frame Data: If block switching is used, the bs_info field is added. Depending on the value of block_switching, it has 8, 16, or 32 bits. The first bit of a CPE's bs_info field holds the independent_bs flag. The number of blocks is implicitly derived from bs_info as well. If block_switching is off, there is no bs_info field; thus the number of blocks is one and independent_bs is zero.
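The following Python sketch shows how a decoder could derive the number and lengths of blocks from bs_info. Several parts of the layout are assumptions, based on our understanding of MPEG-4 ALS, and should be checked against the specification. The MSB is taken as the independent_bs flag. The remaining bits are nodes of a binary tree in breadth-first order: node 0 is the whole frame, and node n has children 2n+1 and 2n+2. A set bit means the block is split in half (cf. claim 7). The mapping of block_switching values 1, 2 and 3 to 8, 16 and 32 bits is also assumed. All names are hypothetical.

```python
# Assumed mapping of the 2-bit block_switching value to the bs_info width.
BS_INFO_WIDTH = {1: 8, 2: 16, 3: 32}


def independent_bs(bs_info, width):
    """MSB of bs_info: the independent_bs flag (meaningful for a CPE)."""
    return (bs_info >> (width - 1)) & 1


def parse_bs_info(bs_info, width, frame_length):
    """Return the block lengths of one channel in time order.

    Assumed layout: below the MSB, bit node n (n = 0 is the whole frame)
    has child nodes 2n+1 and 2n+2; a set bit splits that block into two
    halves, a cleared bit (or running out of bits) keeps it whole.
    Assumes frame_length is divisible by the smallest block length.
    """
    num_nodes = width - 1
    lengths = []

    def walk(node, length):
        if node < num_nodes and (bs_info >> (width - 2 - node)) & 1:
            walk(2 * node + 1, length // 2)  # first half
            walk(2 * node + 2, length // 2)  # second half
        else:
            lengths.append(length)

    walk(0, frame_length)
    return lengths
```

Under this layout, an 8-bit bs_info holds 7 tree nodes, which allows blocks down to N/8. The 16-bit and 32-bit fields allow blocks down to N/16 and N/32. For example, `parse_bs_info(0b01100000, 8, 4096)` splits the frame once and then splits its first half again, giving blocks of 1024, 1024 and 2048 samples.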

In order to improve readability, both new and old syntax are shown separately

in the following table, instead of mixing new with old syntax elements.

Table 5: Syntax of frame_data


Block Header: The short_blocks field is removed, since block switching

information is completely transmitted on frame level (bs_info, see previous

paragraph).

Table 6: Syntax of block_header

Block Data: The opt_order field is extended to a maximum of 10 bits

(previously 8 bits).

Table 7: Syntax of block_data

[Semantics]

File Header:

Table 8: Elements of als_header

Frame Data:

Table 9: Elements of frame_data

Table 10: Elements of block_header

Table 11: Elements of block_data

Industrial Applicability

It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. For example, the present invention can be adopted in other audio signal codecs, such as lossy audio signal codecs. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

[CLAIMS]

1. A method of processing an audio file, the method comprising:

subdividing a channel of an audio data frame included in an audio file into a

plurality of blocks hierarchically at one or more block switching levels, wherein at

least two of the subdivided blocks have different lengths;

generating first block switching information indicating that the audio file is

block switched; and

generating second block switching information indicating how the blocks are

subdivided from the channel at the block switching levels.

2. The method of claim 1, wherein the first block switching information is

included in a file header included in the audio file.

3. The method of claim 2, wherein the first block switching information is

indicated by 2 bits.

4. The method of claim 2, wherein the first block switching information is

defined by any one of "01", "10", and "11" to indicate that the audio file is block

switched.

5. The method of claim 1, wherein a total length of the second block

switching information is determined based on a total number of the block switching

levels.

6. The method of claim 1, wherein the second block switching information

includes a series of information bits representing how the blocks are subdivided from

the channel at the block switching levels, respectively.

7. The method of claim 6, wherein each information bit has a value of 1 when

a block is subdivided at a corresponding block switching level and has a value of 0

when the block is not subdivided at the corresponding block switching level.

8. The method of claim 1, further comprising transmitting the first block

switching information.

9. The method of claim 1, further comprising transmitting the second block

switching information.

10. The method of claim 1, further comprising predicting data samples of the

blocks subdivided from the channel, wherein a first sample of a current block is predicted using the last K samples of a previous block.

11. The method of claim 10, wherein a first sample of the current block is

predicted using prediction with progressive order when the current block is a

foremost block of the channel.

12. A method of encoding an audio file, the method comprising:

subdividing a channel of an audio data frame included in an audio file into a

plurality of blocks hierarchically at one or more block switching levels, wherein the

audio data frame is included in the audio file, each block resulting from a subdivision

of a superordinate block of double length;

generating first block switching information indicating that the audio file is

block switched; and

generating second block switching information indicating how the blocks are

subdivided from the channel at the block switching levels.

13. A method of decoding an audio file, the method comprising:

receiving an audio file having an audio data frame which has at least one

channel, each channel being subdivided into a plurality of blocks hierarchically at one

or more block switching levels, wherein each block results from a subdivision of a superordinate block of double length;

parsing first block switching information from a file header included in the

audio file, the first block switching information indicating that the audio file is block

switched;

parsing second block switching information from the audio data frame to

determine how the blocks are subdivided from the block switching levels; and

identifying and decoding the subdivided blocks using the parsed first and

second block switching information.

14. An apparatus of encoding an audio file, the apparatus comprising:

an encoder configured to subdivide a channel of audio data frame included in

an audio file into a plurality of blocks hierarchically at one or more block switching

levels, to generate first block switching information indicating that the audio file is

block switched, and to generate second block switching information indicating how

the blocks are subdivided from the channel at the block switching levels, wherein

each block results from a subdivision of a superordinate block of double length.

15. An apparatus of decoding an audio file, the apparatus comprising:

a decoder configured to receive an audio file having an audio data frame

which has at least one channel, each channel being subdivided into a plurality of blocks hierarchically at one or more block switching levels, each block resulting from

a subdivision of a superordinate block of double length, wherein the decoder is

and second block switching information, the first block switching information