CA3026283C - Reconstructing audio signals with multiple decorrelation techniques - Google Patents
- Publication number
- CA3026283C
- Authority
- CA
- Canada
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Abstract
Systems and methods of audio signal processing are provided that relate to improved upmixing, whereby N audio channels are derived from M audio channels, a decorrelated version of the M audio channels and a set of spatial parameters.
The set of spatial parameters includes an amplitude parameter, a correlation parameter and a phase parameter. The M audio channels are decorrelated using multiple decorrelation techniques to obtain the decorrelated version of the M audio channels. This can be used, for example, for generating an N audio channel upmix.
Description
73221-92
RECONSTRUCTING AUDIO SIGNALS WITH MULTIPLE DECORRELATION TECHNIQUES
This is a divisional of Canadian Patent Application No. 2,992,051 filed February 28, 2005, which is a divisional of Canadian Patent Application No. 2,917,518 filed February 28, 2005, which is a divisional of Canadian Patent Application Serial No. 2,808,226 filed February 28, 2005, which is a divisional of Canadian National Phase Patent Application Serial No. 2,556,575 filed February 28, 2005.
Technical Field
The invention relates generally to audio signal processing. The invention is particularly useful in low bitrate and very low bitrate audio signal processing. More particularly, aspects of the invention relate to an encoder (or encoding process), a decoder (or decoding process), and to an encode/decode system (or encoding/decoding process) for audio signals in which a plurality of audio channels is represented by a composite monophonic ("mono") audio channel and auxiliary ("sidechain") information. Alternatively, the plurality of audio channels is represented by a plurality of audio channels and sidechain information. Aspects of the invention also relate to a multichannel to composite monophonic channel downmixer (or downmix process), to a monophonic channel to multichannel upmixer (or upmixer process), and to a monophonic channel to multichannel decorrelator (or decorrelation process). Other aspects of the invention relate to a multichannel-to-multichannel downmixer (or downmix process), to a multichannel-to-multichannel upmixer (or upmix process), and to a decorrelator (or decorrelation process).
Background Art
In the AC-3 digital audio encoding and decoding system, channels may be selectively combined or "coupled" at high frequencies when the system becomes starved for bits. Details of the AC-3 system are well known in the art; see, for example: ATSC Standard A/52A: Digital Audio Compression Standard (AC-3), Revision A, Advanced Television Systems Committee, 20 Aug. 2001. The A/52A document is available on the World Wide Web at http://www.atsc.org/standards.html.
The frequency above which the AC-3 system combines channels on demand is referred to as the "coupling" frequency. Above the coupling frequency, the coupled channels are combined into a "coupling" or composite channel. The encoder generates "coupling coordinates" (amplitude scale factors) for each subband above the coupling frequency in each channel. The coupling coordinates indicate the ratio of the original energy of each coupled channel subband to the energy of the corresponding subband in the composite channel. Below the coupling frequency, channels are encoded discretely. The phase polarity of a coupled channel's subband may be reversed before the channel is combined with one or more other coupled channels in order to reduce out-of-phase signal component cancellation. The composite channel, along with sidechain information that includes, on a per-subband basis, the coupling coordinates and whether the channel's phase is inverted, is sent to the decoder. In practice, the coupling frequencies employed in commercial embodiments of the AC-3 system have ranged from about 10 kHz to about 3500 Hz. U.S. Patents 5,583,962; 5,633,981; 5,727,119; 5,909,664; and 6,021,386 include teachings that relate to the combining of multiple audio channels into a composite channel and auxiliary or sidechain information and the recovery therefrom of an approximation to the original multiple channels.
Disclosure of the Invention
Aspects of the present invention may be viewed as improvements upon the "coupling" techniques of the AC-3 encoding and decoding system and also upon other techniques in which multiple channels of audio are combined either to a monophonic composite signal or to multiple channels of audio along with related auxiliary information and from which multiple channels of audio are reconstructed. Aspects of the present invention also may be viewed as improvements upon techniques for downmixing multiple audio channels to a monophonic audio signal or to multiple audio channels and for decorrelating multiple audio channels derived from a monophonic audio channel or from multiple audio channels.
Aspects of the invention may be employed in an N:1:N spatial audio coding technique (where "N" is the number of audio channels) or an M:1:N spatial audio coding technique (where "M" is the number of encoded audio channels and "N" is the number of decoded audio channels) that improve on channel coupling, by providing, among other things, improved phase compensation, decorrelation mechanisms, and signal-dependent variable time-constants. Aspects of the present invention may also be employed in N:x:N and M:x:N spatial audio coding techniques wherein "x" may be 1 or greater than 1. Goals include the reduction of coupling cancellation artifacts in the encode process by adjusting relative interchannel phase before downmixing, and improving the spatial dimensionality of the reproduced signal by restoring the phase angles and degrees of decorrelation in the decoder. Aspects of the invention when embodied in practical embodiments should allow for continuous rather than on-demand channel coupling and lower coupling frequencies than, for example, in the AC-3 system, thereby reducing the required data rate.
According to one aspect of the present invention, there is provided a method performed in an audio decoder for reconstructing N audio channels from an audio signal having M encoded audio channels, the method comprising: receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter; wherein the correlation parameter is differentially encoded across time; decoding the M encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components; extracting the set of spatial parameters from the bitstream; applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter; analyzing the M audio channels to detect a location of a transient; decorrelating the M audio channels to obtain a decorrelated version of the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel; deriving the N audio channels from the M audio channels, the decorrelated version of the M audio channels, and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and synthesizing, by an audio reproduction device, the N audio channels as an output audio signal, wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of a decorrelator, the second decorrelation technique represents a second mode of operation of the decorrelator, and the audio decoder is implemented at least in part in hardware.
According to another aspect of the present invention, there is provided an audio decoder for decoding M encoded audio channels representing N audio channels, the audio decoder comprising: an input interface for receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter; wherein the correlation parameter is differentially encoded across time; an audio decoder for decoding the M encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components; a demultiplexer for extracting the set of spatial parameters from the bitstream; a processor for applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter, and analyzing the M audio channels to detect a location of a transient; a decorrelator for decorrelating the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel; a reconstructor for deriving N audio channels from the M audio channels and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and an audio reproduction device that synthesizes the N audio channels as an output audio signal, wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of the decorrelator, and the second decorrelation technique represents a second mode of operation of the decorrelator.
Description of the Drawings
FIG. 1 is an idealized block diagram showing the principal functions or devices of an N:1 encoding arrangement embodying aspects of the present invention.
FIG. 2 is an idealized block diagram showing the principal functions or devices of a 1:N decoding arrangement embodying aspects of the present invention.
FIG. 3 shows an example of a simplified conceptual organization of bins and subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal) time axis. The figure is not to scale.
FIG. 4 is in the nature of a hybrid flowchart and functional block diagram showing encoding steps or devices performing functions of an encoding arrangement embodying aspects of the present invention.
FIG. 5 is in the nature of a hybrid flowchart and functional block diagram showing decoding steps or devices performing functions of a decoding arrangement embodying aspects of the present invention.
FIG. 6 is an idealized block diagram showing the principal functions or devices of a first N:x encoding arrangement embodying aspects of the present invention.
FIG. 7 is an idealized block diagram showing the principal functions or devices of an x:M decoding arrangement embodying aspects of the present invention.
FIG. 8 is an idealized block diagram showing the principal functions or devices of a first alternative x:M decoding arrangement embodying aspects of the present invention.
FIG. 9 is an idealized block diagram showing the principal functions or devices of a second alternative x:M decoding arrangement embodying aspects of the present invention.
WO 2005/086139 PCT/US2005/00
Best Mode for Carrying Out the Invention
Basic N:1 Encoder
Referring to FIG. 1, an N:1 encoder function or device embodying aspects of the present invention is shown. The figure is an example of a function or structure that performs as a basic encoder embodying aspects of the invention. Other functional or structural arrangements that practice aspects of the invention may be employed, including alternative and/or equivalent functional or structural arrangements described below.
Two or more audio input channels are applied to the encoder. Although, in principle, aspects of the invention may be practiced by analog, digital or hybrid analog/digital embodiments, examples disclosed herein are digital embodiments. Thus, the input signals may be time samples that may have been derived from analog audio signals. The time samples may be encoded as linear pulse-code modulation (PCM) signals. Each linear PCM audio input channel is processed by a filterbank function or device having both an in-phase and a quadrature output, such as a 512-point windowed forward discrete Fourier transform (DFT) (as implemented by a Fast Fourier Transform (FFT)). The filterbank may be considered to be a time-domain to frequency-domain transform.
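As a rough illustration of the filterbank just described, the following Python sketch windows one block of PCM samples and computes a forward DFT, yielding complex bins whose real and imaginary parts correspond to the in-phase and quadrature outputs. The sine window, the direct (non-FFT) DFT loop, the short 32-sample test block, and the name `windowed_dft` are assumptions made for illustration only; the text specifies none of them beyond "windowed" and "512-point".

```python
import cmath
import math

def windowed_dft(block):
    """Return the complex DFT bins of one windowed block of PCM samples."""
    n = len(block)
    # Sine window: an assumed choice, the text only says "windowed".
    windowed = [x * math.sin(math.pi * (i + 0.5) / n) for i, x in enumerate(block)]
    bins = []
    for k in range(n):
        acc = 0j
        for i, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        bins.append(acc)  # acc.real: in-phase part, acc.imag: quadrature part
    return bins

# A short block keeps the direct DFT fast; the text's example uses 512 points.
block = [math.cos(2 * math.pi * 4 * i / 32) for i in range(32)]
bins = windowed_dft(block)
# Index of the strongest bin among the non-negative frequencies.
peak = max(range(17), key=lambda k: abs(bins[k]))
```

A practical embodiment would replace the double loop with an FFT; the direct form is used here only so that the sketch needs nothing beyond the standard library.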
FIG. 1 shows a first PCM channel input (channel "1") applied to a filterbank function or device, "Filterbank" 2, and a second PCM channel input (channel "n") applied, respectively, to another filterbank function or device, "Filterbank" 4. There may be "n" input channels, where "n" is a whole positive integer equal to two or more. Thus, there also are "n" Filterbanks, each receiving a unique one of the "n" input channels. For simplicity in presentation, FIG. 1 shows only two input channels, "1" and "n".
When a Filterbank is implemented by an FFT, input time-domain signals are segmented into consecutive blocks and are usually processed in overlapping blocks. The FFT's discrete frequency outputs (transform coefficients) are referred to as bins, each having a complex value with real and imaginary parts corresponding, respectively, to in-phase and quadrature components. Contiguous transform bins may be grouped into subbands approximating critical bandwidths of the human ear, and most sidechain information produced by the encoder, as will be described, may be calculated and transmitted on a per-subband basis in order to minimize processing resources and to reduce the bitrate. Multiple successive time-domain blocks may be grouped into frames, with individual block values averaged or otherwise combined or accumulated across each frame, to minimize the sidechain data rate. In examples described herein, each filterbank is implemented by an FFT, contiguous transform bins are grouped into subbands, blocks are grouped into frames and sidechain data is sent on a once per-frame basis. Alternatively, sidechain data may be sent on a more than once per frame basis (e.g., once per block). See, for example, FIG. 3 and its description, hereinafter. As is well known, there is a tradeoff between the frequency at which sidechain information is sent and the required bitrate.
A suitable practical implementation of aspects of the present invention may employ fixed length frames of about 32 milliseconds when a 48 kHz sampling rate is employed, each frame having six blocks at intervals of about 5.3 milliseconds each (employing, for example, blocks having a duration of about 10.6 milliseconds with a 50% overlap). However, neither such timings nor the employment of fixed length frames nor their division into a fixed number of blocks is critical to practicing aspects of the invention provided that information described herein as being sent on a per-frame basis is sent no less frequently than about every 40 milliseconds. Frames may be of arbitrary size and their size may vary dynamically. Variable block lengths may be employed as in the AC-3 system cited above. It is with that understanding that reference is made herein to "frames" and "blocks."
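The timing figures above can be checked with a few lines of arithmetic. The sketch below assumes a block length of 512 samples, which matches the 512-point transform mentioned earlier and the stated duration of about 10.6 milliseconds; the constant names are illustrative and do not come from the text.

```python
SAMPLE_RATE_HZ = 48_000
BLOCK_LENGTH = 512            # samples per block (assumed from the 512-point DFT)
HOP = BLOCK_LENGTH // 2       # 50% overlap: a new block starts every 256 samples
BLOCKS_PER_FRAME = 6

block_ms = 1000 * BLOCK_LENGTH / SAMPLE_RATE_HZ   # block duration, about 10.67 ms
hop_ms = 1000 * HOP / SAMPLE_RATE_HZ              # block interval, about 5.33 ms
frame_samples = BLOCKS_PER_FRAME * HOP            # 1536 samples per frame
frame_ms = 1000 * frame_samples / SAMPLE_RATE_HZ  # frame duration in ms

print(f"block {block_ms:.2f} ms, interval {hop_ms:.2f} ms, frame {frame_ms:.1f} ms")
```

Under these assumptions a frame spans 1536 samples, comfortably inside the "about every 40 milliseconds" bound on per-frame information.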
In practice, if the composite mono or multichannel signal(s), or the composite mono or multichannel signal(s) and discrete low-frequency channels, are encoded, as for example by a perceptual coder, as described below, it is convenient to employ the same frame and block configuration as employed in the perceptual coder. Moreover, if the coder employs variable block lengths such that there is, from time to time, a switching from one block length to another, it would be desirable if one or more of the sidechain information as described herein is updated when such a block switch occurs. In order to minimize the increase in data overhead upon the updating of sidechain information upon the occurrence of such a switch, the frequency resolution of the updated sidechain information may be reduced.
FIG. 3 shows an example of a simplified conceptual organization of bins and subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal) time axis. When bins are divided into subbands that approximate critical bands, the lowest frequency subbands have the fewest bins (e.g., one) and the number of bins per subband increases with increasing frequency.
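The bin-to-subband organization can be sketched as a simple partition of the bin list. The band edges below are hypothetical values chosen only so that subband width grows with frequency; they are not taken from the text or from any critical-band table, and the helper names are illustrative.

```python
def group_bins(bins, band_edges):
    """Split bins into subbands; band_edges lists each start index plus a final end index."""
    return [bins[lo:hi] for lo, hi in zip(band_edges, band_edges[1:])]

def subband_energy(subband):
    """Sum of squared magnitudes of the bins in one subband."""
    return sum(abs(b) ** 2 for b in subband)

# Hypothetical edges: one bin per subband at the bottom, widening with frequency.
BAND_EDGES = [0, 1, 2, 3, 5, 8, 12, 18, 26, 38, 54, 78, 110, 158, 256]

bins = [complex(1, 0)] * 256          # placeholder spectrum, unit magnitude everywhere
subbands = group_bins(bins, BAND_EDGES)
widths = [len(s) for s in subbands]   # bins per subband, lowest frequency first
```

Per-subband quantities such as energy are then computed once per subband rather than once per bin, which is what keeps the sidechain data rate low.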
Returning to FIG. 1, a frequency-domain version of each of the n time-domain input channels, produced by each channel's respective Filterbank (Filterbanks 2 and 4 in this example), are summed together ("downmixed") to a monophonic ("mono") composite audio signal by an additive combining function or device "Additive Combiner" 6.
The downmixing may be applied to the entire frequency bandwidth of the input audio signals or, optionally, it may be limited to frequencies above a given "coupling" frequency, inasmuch as artifacts of the downmixing process may become more audible at middle to low frequencies. In such cases, the channels may be conveyed discretely below the coupling frequency. This strategy may be desirable even if processing artifacts are not an issue, in that mid/low frequency subbands constructed by grouping transform bins into critical-band-like subbands (size roughly proportional to frequency) tend to have a small number of transform bins at low frequencies (one bin at very low frequencies) and may be directly coded with as few or fewer bits than is required to send a downmixed mono audio signal with sidechain information. A coupling or transition frequency as low as 4 kHz, 2300 Hz, 1000 Hz, or even the bottom of the frequency band of the audio signals applied to the encoder, may be acceptable for some applications, particularly those in which a very low bitrate is important. Other frequencies may provide a useful balance between bit savings and listener acceptance. The choice of a particular coupling frequency is not critical to the invention. The coupling frequency may be variable and, if variable, it may depend, for example, directly or indirectly on input signal characteristics.
Before downmixing, it is an aspect of the present invention to improve the channels' phase angle alignments vis-a-vis each other, in order to reduce the cancellation of out-of-phase signal components when the channels are combined and to provide an improved mono composite channel. This may be accomplished by controllably shifting over time the "absolute angle" of some or all of the transform bins in ones of the channels. For example, all of the transform bins representing audio above a coupling frequency, thus defining a frequency band of interest, may be controllably shifted over time, as necessary, in every channel or, when one channel is used as a reference, in all but the reference channel.
The "absolute angle" of a bin may be taken as the angle of the magnitude-and-angle representation of each complex valued transform bin produced by a filterbank. Controllable shifting of the absolute angles of bins in a channel is performed by an angle rotation function or device ("Rotate Angle"). Rotate Angle 8 processes the output of Filterbank 2 prior to its application to the downmix summation provided by Additive Combiner 6, while Rotate Angle 10 processes the output of Filterbank 4 prior to its application to the Additive Combiner 6. It will be appreciated that, under some signal conditions, no angle rotation may be required for a particular transform bin over a time period (the time period of a frame, in examples described herein). Below the coupling frequency, the channel information may be encoded discretely (not shown in FIG. 1).
In principle, an improvement in the channels' phase angle alignments with respect to each other may be accomplished by shifting the phase of every transform bin or subband by the negative of its absolute phase angle, in each block throughout the frequency band of interest. Although this substantially avoids cancellation of out-of-phase signal components, it tends to cause artifacts that may be audible, particularly if the resulting mono composite signal is listened to in isolation. Thus, it is desirable to employ the principle of "least treatment" by shifting the absolute angles of bins in a channel only as much as necessary to minimize out-of-phase cancellation in the downmix process and minimize spatial image collapse of the multichannel signals reconstituted by the decoder.
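The cancellation problem, and the "in principle" remedy of rotating each bin by the negative of its absolute angle, can be seen with a single bin in two channels. The sketch below is only a numerical illustration under assumed values (unit magnitudes, angles of 10 and 180 degrees); it does not implement the "least treatment" rule, whose angle-selection details are not given in this passage, and the helper `rotate` is a hypothetical name.

```python
import cmath
import math

def rotate(bin_value, angle):
    """Rotate a complex bin by `angle` radians without changing its magnitude."""
    return bin_value * cmath.exp(1j * angle)

# The same transform bin in two channels, nearly out of phase (170 degrees apart).
ch1 = cmath.rect(1.0, math.radians(10))
ch2 = cmath.rect(1.0, math.radians(180))

# Plain additive downmix: the two components largely cancel.
plain_sum = ch1 + ch2

# Rotate each bin by the negative of its absolute angle before summing.
aligned_sum = rotate(ch1, -cmath.phase(ch1)) + rotate(ch2, -cmath.phase(ch2))

print(f"plain |sum| = {abs(plain_sum):.3f}, aligned |sum| = {abs(aligned_sum):.3f}")
```

Full alignment recovers the whole magnitude but discards every interchannel angle, which is the source of the audible artifacts noted above; a least-treatment scheme would apply smaller rotations than this.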
Techniques for determining such angle shifts are described below. Such techniques include time and frequency smoothing and the manner in which the signal processing responds to the presence of a transient.
Energy normalization may also be performed on a per-bin basis in the encoder to reduce further any remaining out-of-phase cancellation of isolated bins, as described further below. Also as described further below, energy normalization may also be performed on a per-subband basis (in the decoder) to assure that the energy of the mono composite signal equals the sums of the energies of the contributing channels.
Each input channel has an audio analyzer function or device ("Audio Analyzer") associated with it for generating the sidechain information for that channel and for controlling the amount or degree of angle rotation applied to the channel before it is applied to the downmix summation 6. The Filterbank outputs of channels 1 and n are applied to Audio Analyzer 12 and to Audio Analyzer 14, respectively. Audio Analyzer 12 generates the sidechain information for channel 1 and the amount of phase angle rotation for channel 1. Audio Analyzer 14 generates the sidechain information for channel n and the amount of angle rotation for channel n. It will be understood that such references herein to "angle" refer to phase angle.
The sidechain information for each channel generated by an audio analyzer for each channel may include:
- an Amplitude Scale Factor ("Amplitude SF"),
- an Angle Control Parameter,
- a Decorrelation Scale Factor ("Decorrelation SF"),
- a Transient Flag, and
- optionally, an Interpolation Flag.
Such sidechain information may be characterized as "spatial parameters," indicative of spatial properties of the channels and/or indicative of signal characteristics that may be relevant to spatial processing, such as transients. In each case, the sidechain information applies to a single subband (except for the Transient Flag and the Interpolation Flag, each of which applies to all subbands within a channel) and may be updated once per frame, as in the examples described below, or upon the occurrence of a block switch in a related coder. Further details of the various spatial parameters are set forth below. The angle rotation for a particular channel in the encoder may be taken as the polarity-reversed Angle Control Parameter that forms part of the sidechain information.
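The per-channel sidechain information listed above maps naturally onto a small record type: three values per subband, plus two flags that apply to all subbands of the channel. The following Python sketch is one possible in-memory layout, not a bitstream format; the class and field names, the use of unquantized floats, and radians as the angle unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubbandParams:
    """Spatial parameters that apply to a single subband."""
    amplitude_sf: float       # Amplitude Scale Factor
    angle_control: float      # Angle Control Parameter, in radians (assumed unit)
    decorrelation_sf: float   # Decorrelation Scale Factor

@dataclass
class ChannelSidechain:
    """Sidechain information for one channel over one frame."""
    subbands: List[SubbandParams]
    transient_flag: bool = False       # applies to all subbands in the channel
    interpolation_flag: bool = False   # optional; applies to all subbands

    def encoder_rotation(self, subband_index: int) -> float:
        """Encoder angle rotation: the polarity-reversed Angle Control Parameter."""
        return -self.subbands[subband_index].angle_control

# Example: one channel with two subbands.
sidechain = ChannelSidechain(
    subbands=[SubbandParams(0.7, 0.5, 0.2), SubbandParams(0.4, -1.0, 0.6)],
    transient_flag=True,
)
rotations = [sidechain.encoder_rotation(i) for i in range(len(sidechain.subbands))]
```

Keeping the two flags at channel level rather than inside `SubbandParams` mirrors the statement that they apply to all subbands within a channel.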
If a reference channel is employed, that channel may not require an Audio Analyzer or, alternatively, may require an Audio Analyzer that generates only Amplitude Scale Factor sidechain information. It is not necessary to send an Amplitude Scale Factor if that scale factor can be deduced with sufficient accuracy by a decoder from the Amplitude Scale Factors of the other, non-reference, channels. It is possible to deduce in the decoder the approximate value of the reference channel's Amplitude Scale Factor if the energy normalization in the encoder assures that the scale factors across channels within any subband substantially sum square to 1, as described below. The deduced approximate reference channel Amplitude Scale Factor value may have errors as a result of the relatively coarse quantization of amplitude scale factors, resulting in image shifts in the reproduced multi-channel audio. However, in a low data rate environment such artifacts may be more acceptable than using the bits to send the reference channel's Amplitude Scale Factor. Nevertheless, in some cases it may be desirable to employ an audio analyzer for the reference channel that generates, at least, Amplitude Scale Factor sidechain information.
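By way of illustration only, the deduction of the reference channel's Amplitude Scale Factor described above may be sketched as follows. The sketch assumes linear (dequantized) scale factors whose squares sum to approximately 1 within a subband; the function name and the clamping at zero are hypothetical choices made for the example and are not taken from the specification.

```python
import math


def deduce_reference_scale_factor(other_scale_factors):
    """Estimate the reference channel's scale factor for one subband.

    Assumption: the encoder's energy normalization makes the squared
    scale factors of all channels in the subband sum to about 1.
    """
    remaining = 1.0 - sum(sf * sf for sf in other_scale_factors)
    # Coarse quantization may push the sum slightly above 1, so clamp at zero.
    return math.sqrt(max(0.0, remaining))


# Hypothetical dequantized scale factors of the non-reference channels.
reference_sf = deduce_reference_scale_factor([0.6, 0.0])
```

Because the inputs are coarsely quantized, the deduced value is only approximate, which is the source of the image shifts mentioned above.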
FIG. 1 shows in a dashed line an optional input to each Audio Analyzer from the PCM time domain input to the audio analyzer in the channel. This input may be used by the Audio Analyzer to detect a transient over a time period (the period of a block or frame, in the examples described herein) and to generate a transient indicator (e.g., a one-bit "Transient Flag") in response to a transient. Alternatively, as described below in the comments to Step 408 of FIG. 4, a transient may be detected in the frequency domain, in which case the Audio Analyzer need not receive a time-domain input.
The mono composite audio signal and the sidechain information for all the channels (or all the channels except the reference channel) may be stored, transmitted, or stored and transmitted to a decoding process or device ("Decoder"). Preliminary to the storage, transmission, or storage and transmission, the various audio signals and various sidechain information may be multiplexed and packed into one or more bitstreams suitable for the storage, transmission or storage and transmission medium or media. The mono composite audio may be applied to a data-rate reducing encoding process or device such as, for example, a perceptual encoder or to a perceptual encoder and an entropy coder (e.g., arithmetic or Huffman coder) (sometimes referred to as a "lossless" coder) prior to storage, transmission, or storage and transmission. Also, as mentioned above, the mono composite audio and related sidechain information may be derived from multiple input channels only for audio frequencies above a certain frequency (a "coupling" frequency). In that case, the audio frequencies below the coupling frequency in each of the multiple input channels may be stored, transmitted or stored and transmitted as discrete channels or may be combined or processed in some manner other than as described herein. Such discrete or otherwise-combined channels may also be applied to a data reducing encoding process or device such as, for example, a perceptual encoder or a perceptual encoder and an entropy encoder. The mono composite audio and the discrete multichannel audio may all be applied to an integrated perceptual encoding or perceptual and entropy encoding process or device.
The particular manner in which sidechain information is carried in the encoder bitstream is not critical to the invention. If desired, the sidechain information may be carried in such a way that the bitstream is compatible with legacy decoders (i.e., the bitstream is backwards-compatible). Many suitable techniques for doing so are known. For example, many encoders generate a bitstream having unused or null bits that are ignored by the decoder. An example of such an arrangement is set forth in United States Patent 6,807,528 B1 of Truman et al, entitled "Adding Data to a Compressed Data Frame," October 19, 2004. Such bits may be replaced with the sidechain information. Another example is that the sidechain information may be steganographically encoded in the encoder's bitstream. Alternatively, the sidechain information may be stored or transmitted separately from the backwards-compatible bitstream by any technique that permits the transmission or storage of such information along with a mono/stereo bitstream compatible with legacy decoders.
Basic 1:N and 1:M Decoder
Referring to FIG. 2, a decoder function or device ("Decoder") embodying aspects of the present invention is shown. The figure is an example of a function or structure that performs as a basic decoder embodying aspects of the invention. Other functional or structural arrangements that practice aspects of the invention may be employed, including alternative and/or equivalent functional or structural arrangements described below.
The Decoder receives the mono composite audio signal and the sidechain information for all the channels or all the channels except the reference channel. If necessary, the composite audio signal and related sidechain information are demultiplexed, unpacked and/or decoded. Decoding may employ a table lookup. The goal is to derive from the mono composite audio channels a plurality of individual audio channels approximating respective ones of the audio channels applied to the Encoder of FIG. 1, subject to bitrate-reducing techniques of the present invention that are described herein.
Of course, one may choose not to recover all of the channels applied to the encoder or to use only the monophonic composite signal. Alternatively, channels in addition to the ones applied to the Encoder may be derived from the output of a Decoder according to aspects of the present invention by employing aspects of the inventions described in International Application PCT/US 02/03619, filed February 7, 2002, published August 15, 2002, designating the United States, and its resulting U.S. national application S.N. 10/467,213, filed August 5, 2003, and in International Application PCT/US03/24570, filed August 6, 2003, published March 4, 2004 as WO 2004/019656, designating the United States, and its resulting U.S. national application S.N. 10/522,515, filed January 27, 2005.
Channels recovered by a Decoder practicing aspects of the present invention are particularly useful in connection with the channel multiplication techniques of the cited applications in that the recovered channels not only have useful interchannel amplitude relationships but also have useful interchannel phase relationships. Another alternative for channel multiplication is to employ a matrix decoder to derive additional channels. The interchannel amplitude- and phase-preservation aspects of the present invention make the output channels of a decoder embodying aspects of the present invention particularly suitable for application to an amplitude- and phase-sensitive matrix decoder. Many such matrix decoders employ wideband control circuits that operate properly only when the signals applied to them are stereo throughout the signals' bandwidth. Thus, if the aspects of the present invention are embodied in an N:1:N system in which N is 2, the two channels recovered by the decoder may be applied to a 2:M active matrix decoder. Such channels may have been discrete channels below a coupling frequency, as mentioned above. Many suitable active matrix decoders are well known in the art, including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II" decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation). Aspects of Pro Logic decoders are disclosed in U.S. Patents 4,799,260 and 4,941,177. Aspects of Pro Logic II decoders are disclosed in pending U.S. Patent Application S.N. 09/532,711 of Fosgate, entitled "Method for Deriving at Least Three Audio Signals from Two Input Audio Signals," filed March 22, 2000 and published as WO 01/41504 on June 7, 2001, and in pending U.S. Patent Application S.N. 10/362,786 of Fosgate et al, entitled "Method for Apparatus for Audio Matrix Decoding," filed February 25, 2003 and published as US 2004/0125960 A1 on July 1, 2004.
Some aspects of the operation of Dolby Pro Logic and Pro Logic II decoders are explained, for example, in papers available on the Dolby Laboratories' website (www.dolby.com): "Dolby Surround Pro Logic Decoder Principles of Operation," by Roger Dressler, and "Mixing with Dolby Pro Logic II Technology," by Jim Hilson. Other suitable active matrix decoders may include those described in one or more of the following U.S. Patents and published International Applications (each designating the United States): 5,046,098; 5,274,740; 5,400,433; 5,625,696; 5,644,640; 5,504,819; 5,428,687; 5,172,415; and WO 02/19768.
Referring again to FIG. 2, the received mono composite audio channel is applied to a plurality of signal paths from which a respective one of each of the recovered multiple audio channels is derived. Each channel-deriving path includes, in either order, an amplitude adjusting function or device ("Adjust Amplitude") and an angle rotation function or device ("Rotate Angle").
The Adjust Amplitudes apply gains or losses to the mono composite signal so that, under certain signal conditions, the relative output magnitudes (or energies) of the output channels derived from it are similar to those of the channels at the input of the encoder. Alternatively, under certain signal conditions when "randomized" angle variations are imposed, as next described, a controllable amount of "randomized" amplitude variations may also be imposed on the amplitude of a recovered channel in order to improve its decorrelation with respect to other ones of the recovered channels.
The Rotate Angles apply phase rotations so that, under certain signal conditions, the relative phase angles of the output channels derived from the mono composite signal are similar to those of the channels at the input of the encoder. Preferably, under certain signal conditions, a controllable amount of "randomized" angle variations is also imposed on the angle of a recovered channel in order to improve its decorrelation with respect to other ones of the recovered channels.
As discussed further below, "randomized" angle amplitude variations may include not only pseudo-random and truly random variations, but also deterministically-generated variations that have the effect of reducing cross-correlation between channels. This is discussed further below in the Comments to Step 505 of FIG. 5A.
Conceptually, the Adjust Amplitude and Rotate Angle for a particular channel scale the mono composite audio DFT coefficients to yield reconstructed transform bin values for the channel.
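The scaling of DFT coefficients just described may be pictured, purely as a sketch, as a complex multiplication of each mono composite bin by a gain and a unit-magnitude phase term. The function name, the use of a single gain for the whole subband, and the per-bin angle list are assumptions made for the example.

```python
import cmath


def reconstruct_bins(mono_bins, amplitude_sf, angles):
    """Scale and rotate the mono composite DFT bins of one subband.

    mono_bins    -- complex DFT coefficients of the mono composite
    amplitude_sf -- linear gain for the subband (Adjust Amplitude)
    angles       -- per-bin rotation in radians (Rotate Angle)
    """
    return [amplitude_sf * b * cmath.exp(1j * a)
            for b, a in zip(mono_bins, angles)]


# Halve the magnitude of a single bin and rotate it by 90 degrees.
out = reconstruct_bins([1 + 0j], 0.5, [cmath.pi / 2])
```

Since the two operations are both multiplications, their order does not affect the result, consistent with the statement that the Adjust Amplitude and Rotate Angle may appear in either order.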
The Adjust Amplitude for each channel may be controlled at least by the recovered sidechain Amplitude Scale Factor for the particular channel or, in the case of the reference channel, either from the recovered sidechain Amplitude Scale Factor for the reference channel or from an Amplitude Scale Factor deduced from the recovered sidechain Amplitude Scale Factors of the other, non-reference, channels. Alternatively, to enhance decorrelation of the recovered channels, the Adjust Amplitude may also be controlled by a Randomized Amplitude Scale Factor Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel.
The Rotate Angle for each channel may be controlled at least by the recovered sidechain Angle Control Parameter (in which case, the Rotate Angle in the decoder may substantially undo the angle rotation provided by the Rotate Angle in the encoder). To enhance decorrelation of the recovered channels, a Rotate Angle may also be controlled by a Randomized Angle Control Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel. The Randomized Angle Control Parameter for a channel, and, if employed, the Randomized Amplitude Scale Factor for a channel, may be derived from the recovered Decorrelation Scale Factor for the channel and the recovered Transient Flag for the channel by a controllable decorrelator function or device ("Controllable Decorrelator").
Referring to the example of FIG. 2, the recovered mono composite audio is applied to a first channel audio recovery path 22, which derives the channel 1 audio, and to a second channel audio recovery path 24, which derives the channel n audio. Audio path 22 includes an Adjust Amplitude 26, a Rotate Angle 28, and, if a PCM output is desired, an inverse filterbank function or device ("Inverse Filterbank") 30. Similarly, audio path 24 includes an Adjust Amplitude 32, a Rotate Angle 34, and, if a PCM output is desired, an inverse filterbank function or device ("Inverse Filterbank") 36. As with the case of FIG. 1, only two channels are shown for simplicity in presentation, it being understood that there may be more than two channels.
The recovered sidechain information for the first channel, channel 1, may include an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag, as stated above in connection with the description of a basic Encoder. The Amplitude Scale Factor is applied to Adjust Amplitude 26. If the optional Interpolation Flag is employed, an optional frequency interpolator or interpolator function ("Interpolator") 27 may be employed in order to interpolate the Angle Control Parameter across frequency (e.g., across the bins in each subband of a channel). Such interpolation may be, for example, a linear interpolation of the bin angles between the centers of each subband. The state of the one-bit Interpolation Flag selects whether or not interpolation across frequency is employed, as is explained further below. The Transient Flag and Decorrelation Scale Factor are applied to a Controllable Decorrelator 38 that generates a Randomized Angle Control Parameter in response thereto. The state of the one-bit Transient Flag selects one of two modes of randomized angle decorrelation, as is explained further below. The Angle Control Parameter, which may be interpolated across frequency if the Interpolation Flag and the Interpolator are employed, and the Randomized Angle Control Parameter are summed together by an additive combiner or combining function 40 in order to provide a control signal for Rotate Angle 28. Alternatively, the Controllable Decorrelator 38 may also generate a Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation Scale Factor, in addition to generating a Randomized Angle Control Parameter. The Amplitude Scale Factor may be summed together with such a Randomized Amplitude Scale Factor by an additive combiner or combining function (not shown) in order to provide the control signal for the Adjust Amplitude 26.
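As a non-authoritative sketch of the Interpolator and additive combiner just described, the following spreads per-subband Angle Control Parameters over bins, either held constant within each subband or linearly interpolated between subband centers, and then adds a per-bin randomized angle. The subband-edge representation, the function names, and the holding of end values outside the first and last centers are assumptions of the example; angle wrap-around is ignored for simplicity.

```python
def hold_angles(subband_angles, subband_edges):
    """Same angle for every bin of a subband (no interpolation)."""
    return [subband_angles[k]
            for k in range(len(subband_angles))
            for _ in range(subband_edges[k], subband_edges[k + 1])]


def interpolate_angles(subband_angles, subband_edges):
    """Linear interpolation of bin angles between subband centers.

    Subband k covers bins subband_edges[k] .. subband_edges[k + 1] - 1.
    """
    centers = [(subband_edges[k] + subband_edges[k + 1] - 1) / 2.0
               for k in range(len(subband_angles))]
    per_bin = []
    for b in range(subband_edges[0], subband_edges[-1]):
        if b <= centers[0]:
            per_bin.append(subband_angles[0])
        elif b >= centers[-1]:
            per_bin.append(subband_angles[-1])
        else:
            # Last subband whose center lies at or below this bin.
            k = max(i for i, c in enumerate(centers) if c <= b)
            t = (b - centers[k]) / (centers[k + 1] - centers[k])
            per_bin.append(subband_angles[k]
                           + t * (subband_angles[k + 1] - subband_angles[k]))
    return per_bin


def rotate_angle_control(subband_angles, subband_edges,
                         randomized_angles, interpolation_flag):
    """Additive combiner: basic per-bin angle plus randomized per-bin angle."""
    spread = interpolate_angles if interpolation_flag else hold_angles
    basic = spread(subband_angles, subband_edges)
    return [a + r for a, r in zip(basic, randomized_angles)]
```

With two subbands of two bins each and angles of 0 and 1 radian, interpolation yields a ramp across the four bins, whereas holding yields a step at the subband boundary.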
Similarly, recovered sidechain information for the second channel, channel n, may also include an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag, as described above in connection with the description of a basic encoder. The Amplitude Scale Factor is applied to Adjust Amplitude 32. A frequency interpolator or interpolator function ("Interpolator") 33 may be employed in order to interpolate the Angle Control Parameter across frequency. As with channel 1, the state of the one-bit Interpolation Flag selects whether or not interpolation across frequency is employed. The Transient Flag and Decorrelation Scale Factor are applied to a Controllable Decorrelator 42 that generates a Randomized Angle Control Parameter in response thereto. As with channel 1, the state of the one-bit Transient Flag selects one of two modes of randomized angle decorrelation, as is explained further below. The Angle Control Parameter and the Randomized Angle Control Parameter are summed together by an additive combiner or combining function 44 in order to provide a control signal for Rotate Angle 34. Alternatively, as described above in connection with channel 1, the Controllable Decorrelator 42 may also generate a Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation Scale Factor, in addition to generating a Randomized Angle Control Parameter. The Amplitude Scale Factor and Randomized Amplitude Scale Factor may be summed together by an additive combiner or combining function (not shown) in order to provide the control signal for the Adjust Amplitude 32.
Although a process or topology as just described is useful for understanding, essentially the same results may be obtained with alternative processes or topologies that achieve the same or similar results. For example, the order of Adjust Amplitude 26 (32) and Rotate Angle 28 (34) may be reversed and/or there may be more than one Rotate Angle: one that responds to the Angle Control Parameter and another that responds to the Randomized Angle Control Parameter. The Rotate Angle may also be considered to be three rather than one or two functions or devices, as in the example of FIG. 5 described below. If a Randomized Amplitude Scale Factor is employed, there may be more than one Adjust Amplitude: one that responds to the Amplitude Scale Factor and one that responds to the Randomized Amplitude Scale Factor. Because of the human ear's greater sensitivity to amplitude relative to phase, if a Randomized Amplitude Scale Factor is employed, it may be desirable to scale its effect relative to the effect of the Randomized Angle Control Parameter so that its effect on amplitude is less than the effect that the Randomized Angle Control Parameter has on phase angle. As another alternative process or topology, the Decorrelation Scale Factor may be used to control the ratio of randomized phase angle versus basic phase angle (rather than adding a parameter representing a randomized phase angle to a parameter representing the basic phase angle), and if also employed, the ratio of randomized amplitude shift versus basic amplitude shift (rather than adding a scale factor representing a randomized amplitude to a scale factor representing the basic amplitude) (i.e., a variable crossfade in each case).
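The difference between the additive topology and the variable crossfade alternative may be illustrated by the following sketch. The linear crossfade law and the assumption that the Decorrelation Scale Factor lies between 0 and 1 are choices made for the example only; the specification does not fix a particular crossfade law here.

```python
def additive_angle(basic_angle, randomized_angle):
    """Additive combiner: randomized angle is added to the basic angle."""
    return basic_angle + randomized_angle


def crossfade_angle(basic_angle, randomized_angle, decorrelation_sf):
    """Variable crossfade between basic and randomized angle.

    decorrelation_sf is assumed to lie in [0, 1]: 0 gives only the basic
    angle, 1 gives only the randomized angle.
    """
    return ((1.0 - decorrelation_sf) * basic_angle
            + decorrelation_sf * randomized_angle)


# Hypothetical values: basic angle 0.2 rad, randomized angle 1.0 rad.
mixed = crossfade_angle(0.2, 1.0, 0.5)
```

The same form could be applied to amplitude, crossfading between the basic Amplitude Scale Factor and a randomized one.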
If a reference channel is employed, as discussed above in connection with the basic encoder, the Rotate Angle, Controllable Decorrelator and Additive Combiner for that channel may be omitted inasmuch as the sidechain information for the reference channel may include only the Amplitude Scale Factor (or, alternatively, if the sidechain information does not contain an Amplitude Scale Factor for the reference channel, it may be deduced from Amplitude Scale Factors of the other channels when the energy normalization in the encoder assures that the scale factors across channels within a subband sum square to 1). An Amplitude Adjust is provided for the reference channel and it is controlled by a received or derived Amplitude Scale Factor for the reference channel. Whether the reference channel's Amplitude Scale Factor is derived from the sidechain or is deduced in the decoder, the recovered reference channel is an amplitude-scaled version of the mono composite channel. It does not require angle rotation because it is the reference for the other channels' rotations.
Although adjusting the relative amplitude of recovered channels may provide a modest degree of decorrelation, if used alone amplitude adjustment is likely to result in a reproduced soundfield substantially lacking in spatialization or imaging for many signal conditions (e.g., a "collapsed" soundfield). Amplitude adjustment may affect interaural level differences at the ear, which is only one of the psychoacoustic directional cues employed by the ear. Thus, according to aspects of the invention, certain angle-adjusting techniques may be employed, depending on signal conditions, to provide additional decorrelation. Reference may be made to Table 1 that provides abbreviated comments useful in understanding the multiple angle-adjusting decorrelation techniques or modes of operation that may be employed in accordance with aspects of the invention. Other decorrelation techniques as described below in connection with the examples of FIGS. 8 and 9 may be employed instead of or in addition to the techniques of Table 1.
In practice, applying angle rotations and magnitude alterations may result in circular convolution (also known as cyclic or periodic convolution). Although, generally, it is desirable to avoid circular convolution, undesirable audible artifacts resulting from circular convolution are somewhat reduced by complementary angle shifting in an encoder and decoder. In addition, the effects of circular convolution may be tolerated in low cost implementations of aspects of the present invention, particularly those in which the downmixing to mono or multiple channels occurs only in part of the audio frequency band, such as, for example, above 1500 Hz (in which case the audible effects of circular convolution are minimal). Alternatively, circular convolution may be avoided or minimized by any suitable technique, including, for example, an appropriate use of zero padding. One way to use zero padding is to transform the proposed frequency domain variation (representing angle rotations and amplitude scaling) to the time domain, window it (with an arbitrary window), pad it with zeros, then transform back to the frequency domain and multiply by the frequency domain version of the audio to be processed (the audio need not be windowed).
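The zero-padding procedure outlined above may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation contemplated by the specification: a naive DFT is used for self-containment, the audio block is itself zero-padded to the same transform length, and the padded length is assumed to be at least the audio length plus the window length minus one so that no wrap-around occurs. All function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse includes the 1/n normalization."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def padded_variation(variation, window, padded_len):
    """Variation -> time domain -> window -> zero pad -> frequency domain."""
    h = dft(variation, inverse=True)
    h = [hv * w for hv, w in zip(h, window)]
    h = h + [0j] * (padded_len - len(h))
    return dft(h)


def apply_variation(audio_block, variation, window, padded_len):
    """Multiply padded audio spectrum by the padded variation spectrum."""
    x = list(audio_block) + [0.0] * (padded_len - len(audio_block))
    spectrum = dft(x)
    var = padded_variation(variation, window, padded_len)
    return dft([a * b for a, b in zip(spectrum, var)], inverse=True)


# Hypothetical 4-bin variation whose time-domain response is [1, 0.5, 0, 0].
variation = dft([1.0, 0.5, 0.0, 0.0])
y = apply_variation([1.0, 2.0, 3.0], variation, [1.0] * 4, 8)
```

In this usage example the output matches the linear (non-circular) convolution of the audio block with the windowed time-domain response, which is the purpose of the padding. A rectangular window is used only to keep the example easy to check.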
Table 1: Angle-Adjusting Decorrelation Techniques

| | Technique 1 | Technique 2 | Technique 3 |
|---|---|---|---|
| Type of Signal (typical example) | Spectrally static source | Complex continuous signals | Complex impulsive signals (transients) |
| Effect on Decorrelation | Decorrelates low frequency and steady-state signal components | Decorrelates non-impulsive complex signal components | Decorrelates impulsive high frequency signal components |
| Effect of transient present in frame | Operates with shortened time constant | Does not operate | Operates |
| What is done | Slowly shifts (frame-by-frame) bin angle in a channel | Adds to the angle of Technique 1 a time-invariant randomized angle on a bin-by-bin basis in a channel | Adds to the angle of Technique 1 a rapidly-changing (block by block) randomized angle on a subband-by-subband basis in a channel |
| Controlled by or Scaled by | Basic phase angle is controlled by Angle Control Parameter | Amount of randomized angle is scaled directly by Decorrelation SF; same scaling across subband, scaling updated every frame | Amount of randomized angle is scaled indirectly by Decorrelation SF; same scaling across subband, scaling updated every frame |
| Frequency Resolution of angle shift | Subband (same or interpolated shift value applied to all bins in each subband) | Bin (different randomized shift value applied to each bin) | Subband (same randomized shift value applied to all bins in each subband; different randomized shift value applied to each subband in channel) |
| Time Resolution | Frame (shift values updated every frame) | Randomized shift values remain the same and do not change | Block (randomized shift values updated every block) |
For signals that are substantially static spectrally, such as, for example, a pitch pipe note, a first technique ("Technique 1") restores the angle of the received mono composite signal relative to the angle of each of the other recovered channels to an angle similar (subject to frequency and time granularity and to quantization) to the original angle of the channel relative to the other channels at the input of the encoder. Phase angle differences are useful, particularly, for providing decorrelation of low-frequency signal components below about 1500 Hz where the ear follows individual cycles of the audio signal. Preferably, Technique 1 operates under all signal conditions to provide a basic angle shift.
For high-frequency signal components above about 1500 Hz, the ear does not follow individual cycles of sound but instead responds to waveform envelopes (on a critical band basis). Hence, above about 1500 Hz decorrelation is better provided by differences in signal envelopes rather than phase angle differences. Applying phase angle shifts only in accordance with Technique 1 does not alter the envelopes of signals sufficiently to decorrelate high frequency signals. The second and third techniques ("Technique 2" and "Technique 3", respectively) add a controllable amount of randomized angle variations to the angle determined by Technique 1 under certain signal conditions, thereby causing a controllable amount of randomized envelope variations, which enhances decorrelation.
Randomized changes in phase angle are a desirable way to cause randomized changes in the envelopes of signals. A particular envelope results from the interaction of a particular combination of amplitudes and phases of spectral components within a subband. Although changing the amplitudes of spectral components within a subband changes the envelope, large amplitude changes are required to obtain a significant change in the envelope, which is undesirable because the human ear is sensitive to variations in spectral amplitude. In contrast, changing the spectral components' phase angles has a greater effect on the envelope than changing the spectral components' amplitudes: spectral components no longer line up the same way, so the reinforcements and subtractions that define the envelope occur at different times, thereby changing the envelope. Although the human ear has some envelope sensitivity, the ear is relatively phase deaf, so the overall sound quality remains substantially similar. Nevertheless, for some signal conditions, some randomization of the amplitudes of spectral components along with randomization of the phases of spectral components may provide an enhanced randomization of signal envelopes provided that such amplitude randomization does not cause undesirable audible artifacts.
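The effect of phase on the envelope can be seen in a small numerical illustration. The sketch below sums eight equal-amplitude harmonics once with aligned phases and once with randomized phases, and uses the peak of the waveform as a crude stand-in for the envelope; the harmonic count, the seed, and the peak measure are arbitrary assumptions of the example.

```python
import math
import random


def peak_of_sum(amplitudes, phases, n_samples=256):
    """Peak absolute value over one period of a sum of harmonics."""
    peak = 0.0
    for t in range(n_samples):
        s = sum(a * math.cos(2 * math.pi * (k + 1) * t / n_samples + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases)))
        peak = max(peak, abs(s))
    return peak


amps = [1.0] * 8                       # spectral amplitudes left untouched
aligned = peak_of_sum(amps, [0.0] * 8)  # all components reinforce at t = 0
rng = random.Random(0)
randomized = peak_of_sum(
    amps, [rng.uniform(-math.pi, math.pi) for _ in amps])
```

The spectral amplitudes are identical in both cases; only the phases differ, yet the reinforcements no longer coincide, so the peak of the randomized sum cannot exceed that of the aligned sum.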
Preferably, a controllable amount or degree of Technique 2 or Technique 3 operates along with Technique 1 under certain signal conditions. The Transient Flag selects Technique 2 (no transient present in the frame or block, depending on whether the Transient Flag is sent at the frame or block rate) or Technique 3 (transient present in the frame or block). Thus, there are multiple modes of operation, depending on whether or not a transient is present. Alternatively, or in addition, under certain signal conditions, a controllable amount or degree of amplitude randomization also operates along with the amplitude scaling that seeks to restore the original channel amplitude.
Technique 2 is suitable for complex continuous signals that are rich in harmonics, such as massed orchestral violins. Technique 3 is suitable for complex impulsive or transient signals, such as applause, castanets, etc. (Technique 2 time smears claps in applause, making it unsuitable for such signals). As explained further below, in order to minimize audible artifacts, Technique 2 and Technique 3 have different time and frequency resolutions for applying randomized angle variations: Technique 2 is selected when a transient is not present, whereas Technique 3 is selected when a transient is present.
Technique 1 slowly shifts (frame by frame) the bin angle in a channel. The amount or degree of this basic shift is controlled by the Angle Control Parameter (no shift if the parameter is zero). As explained further below, either the same or an interpolated parameter is applied to all bins in each subband and the parameter is updated every frame. Consequently, each subband of each channel may have a phase shift with respect to other channels, providing a degree of decorrelation at low frequencies (below about 1500 Hz). However, Technique 1, by itself, is unsuitable for a transient signal such as applause. For such signal conditions, the reproduced channels may exhibit an annoying unstable comb-filter effect. In the case of applause, essentially no decorrelation is provided by adjusting only the relative amplitude of recovered channels because all channels tend to have the same amplitude over the period of a frame.
Technique 2 operates when a transient is not present. Technique 2 adds to the angle shift of Technique 1 a randomized angle shift that does not change with time, on a bin-by-bin basis (each bin has a different randomized shift) in a channel, causing the envelopes of the channels to be different from one another, thus providing decorrelation of complex signals among the channels. Maintaining the randomized phase angle values constant over time avoids block or frame artifacts that may result from block-to-block or frame-to-frame alteration of bin phase angles. While this technique is a very useful decorrelation tool when a transient is not present, it may temporally smear a transient (resulting in what is often referred to as "pre-noise"; the post-transient smearing is masked by the transient). The amount or degree of additional shift provided by Technique 2 is scaled directly by the Decorrelation Scale Factor (there is no additional shift if the scale factor is zero). Ideally, the amount of randomized phase angle added to the base angle shift (of Technique 1) according to Technique 2 is controlled by the Decorrelation Scale Factor in a manner that minimizes audible signal warbling artifacts. Such minimization of signal warbling artifacts results from the manner in which the Decorrelation Scale Factor is derived and the application of appropriate time smoothing, as described below. Although a different additional randomized angle shift value is applied to each bin and that shift value does not change, the same scaling is applied across a subband and the scaling is updated every frame.
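A minimal sketch of Technique 2 as characterized above follows. It assumes a fixed table of per-bin randomized angles generated once (here from a seeded pseudo-random generator, with a range of plus or minus pi chosen for the example) and a per-subband Decorrelation Scale Factor applied by direct multiplication; the names and the table range are hypothetical.

```python
import math
import random


def make_fixed_bin_angles(n_bins, seed=0):
    """Per-bin randomized angles, generated once and never changed."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(n_bins)]


def technique2_angles(basic_angles, fixed_random_angles,
                      subband_edges, decorrelation_sf):
    """Add a time-invariant per-bin randomized angle to the basic angle.

    The same scale factor is applied to every bin of a subband; only the
    scale factors (not the table) are updated from frame to frame.
    """
    out = []
    for k in range(len(subband_edges) - 1):
        for b in range(subband_edges[k], subband_edges[k + 1]):
            out.append(basic_angles[b]
                       + decorrelation_sf[k] * fixed_random_angles[b])
    return out


table = make_fixed_bin_angles(4)
angles = technique2_angles([0.1] * 4, table, [0, 2, 4], [0.0, 1.0])
```

In the usage example the first subband has a zero scale factor and therefore receives no additional shift, while each bin of the second subband receives its own fixed randomized shift.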
Technique 3 operates in the presence of a transient in the frame or block, depending on the rate at which the Transient Flag is sent. It shifts all the bins in each subband in a channel from block to block with a unique randomized angle value, common to all bins in the subband, causing not only the envelopes, but also the amplitudes and phases, of the signals in a channel to change with respect to other channels from block to block. These changes in time and frequency resolution of the angle randomizing reduce steady-state signal similarities among the channels and provide decorrelation of the channels substantially without causing "pre-noise" artifacts. The change in frequency resolution of the angle randomizing, from very fine (all bins different in a channel) in Technique 2 to coarse (all bins within a subband the same, but each subband different) in Technique 3 is particularly useful in minimizing "pre-noise" artifacts.
Although the ear does not respond to pure angle changes directly at high frequencies, when two or more channels mix acoustically on their way from loudspeakers to a listener, phase differences may cause amplitude changes (comb-filter effects) that may be audible and objectionable, and these are broken up by Technique 3. The impulsive characteristics of the signal minimize block-rate artifacts that might otherwise occur. Thus, Technique 3 adds to the phase shift of Technique 1 a rapidly changing (block-by-block) randomized angle shift on a subband-by-subband basis in a channel. The amount or degree of additional shift is scaled indirectly, as described below, by the Decorrelation Scale Factor (there is no additional shift if the scale factor is zero). The same scaling is applied across a subband and the scaling is updated every frame.
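The difference in resolution between Technique 2 and Technique 3 may be illustrated by the following minimal Python sketch. It is not the implementation of the embodiment: the function names, the use of Python's `random` module as the source of randomized angles, and the direct multiplication of the random angle by a scale value are assumptions made for illustration (the text states only that the scaling by the Decorrelation Scale Factor is indirect).

```python
import math
import random

def wrap(angle):
    """Bring an angle into the range -pi to +pi."""
    return math.atan2(math.sin(angle), math.cos(angle))

def technique2_offsets(num_bins, seed=0):
    """One random angle per bin, fixed over time (same seed gives the same table)."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(num_bins)]

def technique3_offsets(subband_edges, rng):
    """One random angle per subband, shared by all its bins; call once per block."""
    offsets = []
    for lo, hi in zip(subband_edges[:-1], subband_edges[1:]):
        offsets.extend([rng.uniform(-math.pi, math.pi)] * (hi - lo))
    return offsets

def apply_shift(base_angles, offsets, scale_factors):
    """Add scaled random offsets to the Technique 1 base angles.

    Direct linear scaling is an assumption of this sketch.
    """
    return [wrap(b + s * o) for b, o, s in zip(base_angles, offsets, scale_factors)]
```

With a scale of zero for every bin, `apply_shift` returns the base angles of Technique 1 unchanged, which matches the statement that there is no additional shift if the scale factor is zero.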
Although the angle-adjusting techniques have been characterized as three techniques, this is a matter of semantics and they may also be characterized as two techniques: (1) a combination of Technique 1 and a variable degree of Technique 2, which may be zero, and (2) a combination of Technique 1 and a variable degree of Technique 3, which may be zero. For convenience in presentation, the techniques are treated as being three techniques.
Aspects of the multiple mode decorrelation techniques and modifications of them may be employed in providing decorrelation of audio signals derived, as by upmixing, from one or more audio channels even when such audio channels are not derived from an encoder according to aspects of the present invention. Such arrangements, when applied to a mono audio channel, are sometimes referred to as "pseudo-stereo" devices and functions. Any suitable device or function (an "upmixer") may be employed to derive multiple signals from a mono audio channel or from multiple audio channels. Once such multiple audio channels are derived by an upmixer, one or more of them may be decorrelated with respect to one or more of the other derived audio signals by applying the multiple mode decorrelation techniques described herein. In such an application, each derived audio channel to which the decorrelation techniques are applied may be switched from one mode of operation to another by detecting transients in the derived audio channel itself. Alternatively, the operation of the transient-present technique (Technique 3) may be simplified to provide no shifting of the phase angles of spectral components when a transient is present.
Sidechain Information
As mentioned above, the sidechain information may include: an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag. Such sidechain information for a practical embodiment of aspects of the present invention may be summarized in the following Table 2. Typically, the sidechain information may be updated once per frame.

Table 2
Sidechain Information Characteristics for a Channel

Sidechain Information: Subband Angle Control Parameter
Value Range: 0 → +2π
Represents (is "a measure of"): Smoothed time average in each subband of difference between angle of each bin in subband for a channel and that of the corresponding bin in subband of a reference channel
Quantization Levels: 6 bit (64 levels)
Primary Purpose: Provides basic angle rotation for each bin in channel

Sidechain Information: Subband Decorrelation Scale Factor
Value Range: 0 → 1. The Subband Decorrelation Scale Factor is high only if both the Spectral-Steadiness Factor and the Interchannel Angle Consistency Factor are low.
Represents (is "a measure of"): Spectral-steadiness of signal characteristics over time in a subband of a channel (the Spectral-Steadiness Factor) and the consistency in the same subband of a channel of bin angles with respect to corresponding bins of a reference channel (the Interchannel Angle Consistency Factor)
Quantization Levels: 3 bit (8 levels)
Primary Purpose: Scales randomized angle shifts added to basic angle rotation, and, if employed, also scales randomized Amplitude Scale Factor added to basic Amplitude Scale Factor, and, optionally, scales degree of reverberation

Sidechain Information: Subband Amplitude Scale Factor
Value Range: 0 to 31 (whole integer); 0 is highest amplitude, 31 is lowest amplitude
Represents (is "a measure of"): Energy or amplitude in subband of a channel with respect to energy or amplitude for same subband across all channels
Quantization Levels: 5 bit (32 levels); granularity is 1.5 dB, so the range is 31*1.5 = 46.5 dB plus final value = off
Primary Purpose: Scales amplitude of bins in a subband in a channel

Sidechain Information: Transient Flag
Value Range: 1, 0 (True/False) (polarity is arbitrary)
Represents (is "a measure of"): Presence of a transient in the frame or in the block
Quantization Levels: 1 bit (2 levels)
Primary Purpose: Determines which technique for adding randomized angle shifts, or both angle shifts and amplitude shifts, is employed

Sidechain Information: Interpolation Flag
Value Range: 1, 0 (True/False) (polarity is arbitrary)
Represents (is "a measure of"): A spectral peak near a subband boundary or phase angles within a channel have a linear progression
Quantization Levels: 1 bit (2 levels)
Primary Purpose: Determines if the basic angle rotation is interpolated across frequency

In each case, the sidechain information of a channel applies to a single subband (except for the Transient Flag and the Interpolation Flag, each of which applies to all subbands in a channel) and may be updated once per frame. Although the time resolution (once per frame), frequency resolution (subband), value ranges and quantization levels indicated have been found to provide useful performance and a useful compromise between a low bitrate and performance, it will be appreciated that these time and frequency resolutions, value ranges and quantization levels are not critical and that other resolutions, ranges and levels may be employed in practicing aspects of the invention. For example, the Transient Flag and/or the Interpolation Flag, if employed, may be updated once per block with only a minimal increase in sidechain data overhead. In the case of the Transient Flag, doing so has the advantage that the switching from Technique 2 to Technique 3 and vice-versa is more accurate. In addition, as mentioned above, sidechain information may be updated upon the occurrence of a block switch of a related coder.
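As an illustration of the Subband Amplitude Scale Factor entry of Table 2, the following Python sketch converts a 5-bit index into a linear gain. Table 2 does not spell out the exact mapping, so several points are assumptions of this sketch: the function name is hypothetical, each step is treated as 1.5 dB of attenuation of an amplitude (20 log10) relative to index 0, and index 31 is treated as the "off" value.

```python
STEP_DB = 1.5    # granularity given in Table 2
OFF_INDEX = 31   # assumed: the final code means "off" (zero gain)

def amplitude_scale_to_gain(index):
    """Map a 5-bit Subband Amplitude Scale Factor to a linear gain (sketch)."""
    if not 0 <= index <= 31:
        raise ValueError("index must fit in 5 bits (0..31)")
    if index == OFF_INDEX:
        return 0.0
    # Index 0 is the highest amplitude; each step lowers the level by STEP_DB.
    return 10.0 ** (-STEP_DB * index / 20.0)
```

Under these assumptions, index 4 corresponds to 6 dB of attenuation, roughly half the amplitude of index 0.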
It will be noted that Technique 2, described above (see also Table 1), provides a bin frequency resolution rather than a subband frequency resolution (i.e., a different pseudo random phase angle shift is applied to each bin rather than to each subband) even though the same Subband Decorrelation Scale Factor applies to all bins in a subband. It will also be noted that Technique 3, described above (see also Table 1), provides a block frequency resolution (i.e., a different randomized phase angle shift is applied to each block rather than to each frame) even though the same Subband Decorrelation Scale Factor applies to all bins in a subband. Such resolutions, greater than the resolution of the sidechain information, are possible because the randomized phase angle shifts may be generated in a decoder and need not be known in the encoder (this is the case even if the encoder also applies a randomized phase angle shift to the encoded mono composite signal, an alternative that is described below). In other words, it is not necessary to send sidechain information having bin or block granularity even though the decorrelation techniques employ such granularity. The decoder may employ, for example, one or more lookup tables of randomized bin phase angles. The obtaining of time and/or frequency resolutions for decorrelation greater than the sidechain information rates is among the aspects of the present invention. Thus, decorrelation by way of randomized phases is performed either with a fine frequency resolution (bin-by-bin) that does not change with time (Technique 2), or with a coarse frequency resolution (band-by-band) (or a fine frequency resolution (bin-by-bin) when frequency interpolation is employed, as described further below) and a fine time resolution (block rate) (Technique 3).
It will also be appreciated that as increasing degrees of randomized phase shifts are added to the phase angle of a recovered channel, the absolute phase angle of the recovered channel differs more and more from the original absolute phase angle of that channel. An aspect of the present invention is the appreciation that the resulting absolute phase angle of the recovered channel need not match that of the original channel when signal conditions are such that the randomized phase shifts are added in accordance with aspects of the present invention. For example, in extreme cases when the Decorrelation Scale Factor causes the highest degree of randomized phase shift, the phase shift caused by Technique 2 or Technique 3 overwhelms the basic phase shift caused by Technique 1. Nevertheless, this is of no concern in that a randomized phase shift is audibly the same as the different random phases in the original signal that give rise to a Decorrelation Scale Factor that causes the addition of some degree of randomized phase shifts.
As mentioned above, randomized amplitude shifts may be employed in addition to randomized phase shifts. For example, the Adjust Amplitude may also be controlled by a Randomized Amplitude Scale Factor Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel. Such randomized amplitude shifts may operate in two modes in a manner analogous to the application of randomized phase shifts. For example, in the absence of a transient, a randomized amplitude shift that does not change with time may be added on a bin-by-bin basis (different from bin to bin), and, in the presence of a transient (in the frame or block), a randomized amplitude shift that changes on a block-by-block basis (different from block to block) and changes from subband to subband (the same shift for all bins in a subband, different from subband to subband) may be added. Although the amount or degree to which randomized amplitude shifts are added may be controlled by the Decorrelation Scale Factor, it is believed that a particular scale factor value should cause less amplitude shift than the corresponding randomized phase shift resulting from the same scale factor value in order to avoid audible artifacts.
When the Transient Flag applies to a frame, the time resolution with which the Transient Flag selects Technique 2 or Technique 3 may be enhanced by providing a supplemental transient detector in the decoder in order to provide a temporal resolution finer than the frame rate or even the block rate. Such a supplemental transient detector may detect the occurrence of a transient in the mono or multichannel composite audio signal received by the decoder and such detection information is then sent to each Controllable Decorrelator (as 38, 42 of FIG. 2). Then, upon the receipt of a Transient Flag for its channel, the Controllable Decorrelator switches from Technique 2 to Technique 3 upon receipt of the decoder's local transient detection indication. Thus, a substantial improvement in temporal resolution is possible without increasing the sidechain bitrate, albeit with decreased spatial accuracy (the encoder detects transients in each input channel prior to their downmixing, whereas detection in the decoder is done after downmixing).
As an alternative to sending sidechain information on a frame-by-frame basis, sidechain information may be updated every block, at least for highly dynamic signals. As mentioned above, updating the Transient Flag and/or the Interpolation Flag every block results in only a small increase in sidechain data overhead. In order to accomplish such an increase in temporal resolution for other sidechain information without substantially increasing the sidechain data rate, a block-floating-point differential coding arrangement may be used. For example, consecutive transform blocks may be collected in groups of six over a frame. The full sidechain information may be sent for each subband-channel in the first block. In the five subsequent blocks, only differential values may be sent, each the difference between the current-block amplitude and angle, and the equivalent values from the previous block. This results in very low data rate for static signals, such as a pitch pipe note. For more dynamic signals, a greater range of difference values is required, but at less precision. So, for each group of five differential values, an exponent may be sent first, using, for example, 3 bits, then differential values are quantized to, for example, 2-bit accuracy. This arrangement reduces the average worst-case sidechain data rate by about a factor of two. Further reduction may be obtained by omitting the sidechain data for a reference channel (since it can be derived from the other channels), as discussed above, and by using, for example, arithmetic coding. Alternatively or in addition, differential coding across frequency may be employed by sending, for example, differences in subband angle or amplitude.
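The block-floating-point differential arrangement over a group of six blocks may be sketched in Python as follows. Only the example widths (a 3-bit exponent and 2-bit differentials) come from the text. The function names, the signed mantissa range of -2 to 1, the power-of-two step size, and the choice to difference against the reconstructed previous value are all assumptions of this sketch.

```python
EXP_BITS = 3    # example exponent width from the text
MANT_BITS = 2   # example differential precision from the text

def encode_group(values):
    """Encode six quantized sidechain values (integers), one per block.

    Returns (first_value, exponent, mantissas) with five mantissas.
    """
    lo = -(1 << (MANT_BITS - 1))        # -2 (assumed signed range)
    hi = (1 << (MANT_BITS - 1)) - 1     # +1
    max_exp = (1 << EXP_BITS) - 1
    # Choose the shared exponent from the largest block-to-block difference.
    peak = max(abs(b - a) for a, b in zip(values, values[1:]))
    exponent = 0
    while exponent < max_exp and peak > (1 << exponent):
        exponent += 1
    step = 1 << exponent
    mantissas = []
    recon = values[0]
    for v in values[1:]:
        # Difference against the reconstructed value so errors do not accumulate.
        m = max(lo, min(hi, round((v - recon) / step)))
        mantissas.append(m)
        recon += m * step
    return values[0], exponent, mantissas

def decode_group(first, exponent, mantissas):
    """Rebuild the six values from the full first value and the differentials."""
    out = [first]
    for m in mantissas:
        out.append(out[-1] + m * (1 << exponent))
    return out
```

For a static signal every mantissa is zero and the exponent is zero, which corresponds to the very low data rate the text describes for a pitch pipe note.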
Whether sidechain information is sent on a frame-by-frame basis or more frequently, it may be useful to interpolate sidechain values across the blocks in a frame. Linear interpolation over time may be employed in the manner of the linear interpolation across frequency, as described below.
One suitable implementation of aspects of the present invention employs processing steps or devices that implement the respective processing steps and are functionally related as next set forth. Although the encoding and decoding steps listed below may each be carried out by computer software instruction sequences operating in the order of the below listed steps, it will be understood that equivalent or similar results may be obtained by steps ordered in other ways, taking into account that certain quantities are derived from earlier ones. For example, multi-threaded computer software instruction sequences may be employed so that certain sequences of steps are carried out in parallel. Alternatively, the described steps may be implemented as devices that perform the described functions, the various devices having functions and functional interrelationships as described hereinafter.
Encoding
The encoder or encoding function may collect a frame's worth of data before it derives sidechain information and downmixes the frame's audio channels to a single monophonic (mono) audio channel (in the manner of the example of FIG. 1, described above), or to multiple audio channels (in the manner of the example of FIG. 6, described below). By doing so, sidechain information may be sent first to a decoder, allowing the decoder to begin decoding immediately upon receipt of the mono or multiple channel audio information. Steps of an encoding process ("encoding steps") may be described as follows. With respect to encoding steps, reference is made to FIG. 4, which is in the nature of a hybrid flowchart and functional block diagram. Through Step 419, FIG. 4 shows encoding steps for one channel. Steps 420 and 421 apply to all of the multiple channels that are combined to provide a composite mono signal output or are matrixed together to provide multiple channels, as described below in connection with the example of FIG. 6.
Step 401. Detect Transients.
a. Perform transient detection of the PCM values in an input audio channel.
b. Set a one-bit Transient Flag True if a transient is present in any block of a frame for the channel.
Comments regarding Step 401:
The Transient Flag forms a portion of the sidechain information and is also used in Step 411, as described below. Transient resolution finer than block rate in the decoder may improve decoder performance. Although, as discussed above, a block-rate rather than a frame-rate Transient Flag may form a portion of the sidechain information with a modest increase in bitrate, a similar result, albeit with decreased spatial accuracy, may be accomplished without increasing the sidechain bitrate by detecting the occurrence of transients in the mono composite signal received in the decoder.
There is one transient flag per channel per frame, which, because it is derived in the time domain, necessarily applies to all subbands within that channel. The transient detection may be performed in a manner similar to that employed in an AC-3 encoder for controlling the decision of when to switch between long and short length audio blocks, but with a higher sensitivity and with the Transient Flag True for any frame in which the Transient Flag for a block is True (an AC-3 encoder detects transients on a block basis). In particular, see Section 8.2.2 of the above-cited A/52A document. The sensitivity of the transient detection described in Section 8.2.2 may be increased by adding a sensitivity factor F to an equation set forth therein. Section 8.2.2 of the A/52A document is set forth below, with the sensitivity factor added (Section 8.2.2 as reproduced below is corrected to indicate that the low pass filter is a cascaded biquad direct form II IIR filter rather than "form I" as in the published A/52A document; Section 8.2.2 was correct in the earlier A/52 document). Although it is not critical, a sensitivity factor of 0.2 has been found to be a suitable value in a practical embodiment of aspects of the present invention.
Alternatively, a similar transient detection technique described in U.S. Patent 5,394,473 may be employed. The '473 patent describes aspects of the A/52A document transient detector in greater detail.
As another alternative, transients may be detected in the frequency domain rather than in the time domain (see the Comments to Step 408). In that case, Step 401 may be omitted and an alternative step employed in the frequency domain as described below.
Step 402. Window and DFT.
Multiply overlapping blocks of PCM time samples by a time window and convert them to complex frequency values via a DFT as implemented by an FFT.
Step 403. Convert Complex Values to Magnitude and Angle.
Convert each frequency-domain complex transform bin value (a + jb) to a magnitude and angle representation using standard complex manipulations:
a. Magnitude = square root (a² + b²)
b. Angle = arctan (b/a)
Comments regarding Step 403:
Some of the following Steps use or may use, as an alternative, the energy of a bin, defined as the above magnitude squared (i.e., energy = (a² + b²)).
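Steps 402 and 403 may be sketched in Python as follows, using only the standard library. The function names are hypothetical, the sine window is chosen only for illustration (the text does not name a window), and a direct DFT stands in for the FFT that a practical implementation would use. The angle is computed with `math.atan2(b, a)`, which agrees with arctan (b/a) where that is defined and also resolves the quadrant.

```python
import cmath
import math

def window_and_dft(block):
    """Step 402 sketch: apply a time window, then a direct DFT of the block."""
    n = len(block)
    # Sine window: an assumption of this sketch, not taken from the text.
    windowed = [x * math.sin(math.pi * (i + 0.5) / n) for i, x in enumerate(block)]
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        for k in range(n)
    ]

def to_magnitude_angle(bin_value):
    """Step 403 sketch: magnitude = sqrt(a^2 + b^2), angle from atan2(b, a)."""
    a, b = bin_value.real, bin_value.imag
    return math.sqrt(a * a + b * b), math.atan2(b, a)

def bin_energy(bin_value):
    """Energy of a bin: the magnitude squared, a^2 + b^2."""
    return bin_value.real ** 2 + bin_value.imag ** 2
```

For example, the bin value 3 + j4 has magnitude 5 and energy 25.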
Step 404. Calculate Subband Energy.
a. Calculate the subband energy per block by adding bin energy values within each subband (a summation across frequency).
b. Calculate the subband energy per frame by averaging or accumulating the energy in all the blocks in a frame (an averaging / accumulation across time).
c. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated energy to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
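Steps 404a and 404b may be sketched in Python as follows. The function names and the representation of subbands as a list of bin-index edges are assumptions made for illustration; the time smoothing of Step 404c is not included.

```python
def subband_energy_per_block(bins, subband_edges):
    """Step 404a: add bin energies (magnitude squared) within each subband."""
    return [
        sum(abs(b) ** 2 for b in bins[lo:hi])
        for lo, hi in zip(subband_edges[:-1], subband_edges[1:])
    ]

def subband_energy_per_frame(blocks, subband_edges, average=True):
    """Step 404b: average (or accumulate) per-block subband energies over a frame."""
    per_block = [subband_energy_per_block(b, subband_edges) for b in blocks]
    totals = [sum(column) for column in zip(*per_block)]
    if average:
        return [t / len(blocks) for t in totals]
    return totals
```

Here `subband_edges = [0, 2, 3]` would describe two subbands, the first covering bins 0 and 1 and the second covering bin 2.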
Comments regarding Step 404c:
Time smoothing to provide inter-frame smoothing in low frequency subbands may be useful. In order to avoid artifact-causing discontinuities between bin values at subband boundaries, it may be useful to apply a progressively-decreasing time smoothing from the lowest frequency subband encompassing and above the coupling frequency (where the smoothing may have a significant effect) up through a higher frequency subband in which the time smoothing effect is measurable, but inaudible, although nearly audible. A suitable time constant for the lowest frequency range subband (where the subband is a single bin if subbands are critical bands) may be in the range of 50 to 100 milliseconds, for example. Progressively-decreasing time smoothing may continue up through a subband encompassing about 1000 Hz where the time constant may be about 10 milliseconds, for example.
Although a first-order smoother is suitable, the smoother may be a two-stage smoother that has a variable time constant that shortens its attack and decay time in response to a transient (such a two-stage smoother may be a digital equivalent of the analog two-stage smoothers described in U.S. Patents 3,846,719 and 4,922,535). In other words, the steady-state time constant may be scaled according to frequency and may also be variable in response to transients. Alternatively, such smoothing may be applied in Step 412.
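A first-order smoother with a frequency-dependent time constant may be sketched in Python as follows. The names are hypothetical, and several details are assumptions of this sketch rather than statements of the embodiment: the one-pole recursion with coefficient exp(-T/tau), the update once per frame period T, the 75 millisecond default (a value inside the 50 to 100 millisecond example range), and the linear spacing of time constants between the end points given in the text. The two-stage, transient-responsive smoother is not shown.

```python
import math

def smoothing_coefficient(time_constant_s, frame_period_s):
    """Feedback coefficient of a one-pole smoother updated once per frame."""
    return math.exp(-frame_period_s / time_constant_s)

def time_constants(num_subbands, lowest_s=0.075, highest_s=0.010):
    """Time constants decreasing from the lowest smoothed subband upward.

    Linear spacing is assumed; the text gives only the end-point examples.
    """
    if num_subbands == 1:
        return [lowest_s]
    step = (highest_s - lowest_s) / (num_subbands - 1)
    return [lowest_s + i * step for i in range(num_subbands)]

class FirstOrderSmoother:
    """y[n] = c * y[n-1] + (1 - c) * x[n], initialized with the first input."""

    def __init__(self, coefficient):
        self.coefficient = coefficient
        self.state = None

    def update(self, value):
        if self.state is None:
            self.state = value
        else:
            self.state = self.coefficient * self.state + (1.0 - self.coefficient) * value
        return self.state
```

One smoother instance per subband would then be fed the frame-averaged or frame-accumulated subband value of each successive frame.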
Step 405. Calculate Sum of Bin Magnitudes.
a. Calculate the sum per block of the bin magnitudes (Step 403) of each subband (a summation across frequency).
b. Calculate the sum per frame of the bin magnitudes of each subband by averaging or accumulating the magnitudes of Step 405a across the blocks in a frame (an averaging / accumulation across time). These sums are used to calculate an Interchannel Angle Consistency Factor in Step 410 below.
c. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated magnitudes to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 405c: See comments regarding Step 404c except that in the case of Step 405c, the time smoothing may alternatively be performed as part of Step 410.
Step 406. Calculate Relative Interchannel Bin Phase Angle.
Calculate the relative interchannel phase angle of each transform bin of each block by subtracting from the bin angle of Step 403 the corresponding bin angle of a reference channel (for example, the first channel). The result, as with other angle additions or subtractions herein, is taken modulo (π, −π) radians by adding or subtracting 2π until the result is within the desired range of −π to +π.
Step 407. Calculate Interchannel Subband Phase Angle.
For each channel, calculate a frame-rate amplitude-weighted average interchannel phase angle for each subband as follows:
a. For each bin, construct a complex number from the magnitude of Step 403 and the relative interchannel bin phase angle of Step 406.
b. Add the constructed complex numbers of Step 407a across each subband (a summation across frequency).
Comment regarding Step 407b: For example, if a subband has two bins and one of the bins has a complex value of 1 + j1 and the other bin has a complex value of 2 + j2, their complex sum is 3 + j3.
c. Average or accumulate the per block complex number sum for each subband of Step 407b across the blocks of each frame (an averaging or accumulation across time).
d. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated complex value to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 407d: See comments regarding Step 404c except that in the case of Step 407d, the time smoothing may alternatively be performed as part of Steps 407e or 410.
e. Compute the magnitude of the complex result of Step 407d as per Step 403.
Comment regarding Step 407e: This magnitude is used in Step 410a below. In the simple example given in Step 407b, the magnitude of 3 + j3 is square root (9 + 9) = 4.24.
f. Compute the angle of the complex result as per Step 403.
Comments regarding Step 407f: In the simple example given in Step 407b, the angle of 3 + j3 is arctan (3/3) = 45 degrees = π/4 radians. This subband angle is signal-dependently time-smoothed (see Step 413) and quantized (see Step 414) to generate the Subband Angle Control Parameter sidechain information, as described below.
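Steps 406 and 407 may be sketched in Python as follows, using the standard `cmath` module. The function names are hypothetical, and the time smoothing of Step 407d is omitted. The sketch reproduces the worked example of Step 407b: bins of 1 + j1 and 2 + j2 sum to 3 + j3, with magnitude of about 4.24 and angle π/4.

```python
import cmath
import math

def wrap_angle(angle):
    """Step 406: add or subtract 2*pi until the angle lies within -pi to +pi."""
    while angle > math.pi:
        angle -= 2.0 * math.pi
    while angle < -math.pi:
        angle += 2.0 * math.pi
    return angle

def relative_angles(bin_angles, reference_angles):
    """Step 406: bin angle minus the corresponding reference-channel bin angle."""
    return [wrap_angle(a - r) for a, r in zip(bin_angles, reference_angles)]

def subband_complex_sum(magnitudes, rel_angles):
    """Steps 407a-b: build complex numbers and add them across one subband."""
    return sum(cmath.rect(m, a) for m, a in zip(magnitudes, rel_angles))

def frame_subband_angle(block_sums):
    """Steps 407c, 407e, 407f: accumulate over blocks, return (magnitude, angle)."""
    total = sum(block_sums)
    return abs(total), cmath.phase(total)
```

Because each bin contributes a vector whose length is its magnitude, the angle of the sum is an amplitude-weighted average of the bin angles.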
Step 408. Calculate Bin Spectral-Steadiness Factor.
For each bin, calculate a Bin Spectral-Steadiness Factor in the range of 0 to 1 as follows:
a. Let x_m = bin magnitude of present block calculated in Step 403.
b. Let y_m = corresponding bin magnitude of previous block.
c. If x_m > y_m, then Bin Dynamic Amplitude Factor = [expression missing];
d. Else if y_m > x_m, then Bin Dynamic Amplitude Factor = [expression missing];
e. Else if y_m = x_m, then Bin Spectral-Steadiness Factor = 1.
Comment regarding Step 408:
"Spectral steadiness" is a measure of the extent to which spectral components (e.g., spectral coefficients or bin values) change over time. A Bin Spectral-Steadiness Factor of 1 indicates no change over a given time period.
Spectral steadiness may also be taken as an indicator of whether a transient is present. A transient may cause a sudden rise and fall in spectral (bin) amplitude over a time period of one or more blocks, depending on its position with regard to blocks and their boundaries. Consequently, a change in the Bin Spectral-Steadiness Factor from a high value to a low value over a small number of blocks may be taken as an indication of the presence of a transient in the block or blocks having the lower value. A further confirmation of the presence of a transient, or an alternative to employing the Bin Spectral-Steadiness Factor, is to observe the phase angles of bins within the block (for example, at the phase angle output of Step 403). Because a transient is likely to occupy a single temporal position within a block and have the dominant energy in the block, the existence and position of a transient may be indicated by a substantially uniform delay in phase from bin to bin in the block, namely, a substantially linear ramp of phase angles as a function of frequency. Yet a further confirmation or alternative is to observe the bin amplitudes over a small number of blocks (for example, at the magnitude output of Step 403), namely by looking directly for a sudden rise and fall of spectral level.
Alternatively, Step 408 may look at three consecutive blocks instead of one block. If the coupling frequency of the encoder is below about 1000 Hz, Step 408 may look at more than three consecutive blocks. The number of consecutive blocks taken into consideration may vary with frequency such that the number gradually increases as the subband frequency range decreases. If the Bin Spectral-Steadiness Factor is obtained from more than one block, the detection of a transient, as just described, may be determined by separate steps that respond only to the number of blocks useful for detecting transients.
As a further alternative, bin energies may be used instead of bin magnitudes.
As yet a further alternative, Step 408 may employ an "event decision" detecting technique as described below in the comments following Step 409.
Step 409. Compute Subband Spectral-Steadiness Factor.
Compute a frame-rate Subband Spectral-Steadiness Factor on a scale of 0 to 1 by forming an amplitude-weighted average of the Bin Spectral-Steadiness Factor within each subband across the blocks in a frame as follows:
a. For each bin, calculate the product of the Bin Spectral-Steadiness Factor of Step 408 and the bin magnitude of Step 403.
b. Sum the products within each subband (a summation across frequency).
c. Average or accumulate the summation of Step 409b in all the blocks in a frame (an averaging / accumulation across time).
d. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated summation to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 409d: See comments regarding Step 404c except that in the case of Step 409d, there is no suitable subsequent step in which the time smoothing may alternatively be performed.
e. Divide the results of Step 409c or Step 409d, as appropriate, by the sum of the bin magnitudes (Step 403) within the subband.
Comment regarding Step 409e: The multiplication by the magnitude in Step 409a and the division by the sum of the magnitudes in Step 409e provide amplitude weighting. The output of Step 408 is independent of absolute amplitude and, if not amplitude weighted, may cause the output of Step 409 to be controlled by very small amplitudes, which is undesirable.
f. Scale the result to obtain the Subband Spectral-Steadiness Factor by mapping the range from {0.5...1} to {0...1}. This may be done by multiplying the result by 2, subtracting 1, and limiting results less than 0 to a value of 0.
Comment regarding Step 409f: Step 409f may be useful in assuring that a channel of noise results in a Subband Spectral-Steadiness Factor of zero.
. :
. _ = 5 - Commenfn regarding Steps 408 and 409: = = .
= - The goal of Steps 408 and 409 is to Measure- spectral 'steadiness ¨ changes in . - spectral composition over time ma Tubb and of a channel.
AltematiVely, aspects of an . .
=
. "event decision" sensing stih as described in hiternationtil PublicationlimOer WO = .
.
.02/097792 Al (designating the.United States) may be employed to measare spectral .
. .
. 40 steadiness instead of the approach just described in.connection with Steps.408 and 409. .
=
. = - = U.S. Patent Application S.N. 10/478,538, moil November 20, 2003 lathe United States' .
. . . .
.
. = national application of thepublisheciPCT Application WO 02/09772 Al.
.. .
=
= .
. . , .
. .
. . .
.. cAcording to these above-mentioned applications, the magnitudes of the = . -.
.
. =15 cemplexHr.r coefficient i3f each bin are calculated and normali7ed (largest magnitude is ' set fb a value of one, for example). Then the magnitudes of corresponding bins. (itt dB) in . consecutive blocks are subtracted (ignoring signs), the differences between bins ate .
summed, and, if the spin exceeds a threshold, the block boundary is-considered to be. an . .
. anditoiy event boundary: Alternatively; changes in amplitude from block to block may . .
. - 20 also be considered along with spectral magnitude changes (by loOlcing at the.
amounfOf .
_ .
.
. = nomialintdon required). - = .
If aspects of the above-mentioned event-sensing applications are employed to measure spectral steadiness, normalization may not be required and the changes in spectral magnitude (changes in amplitude would not be measured if normalization is omitted) preferably are considered on a subband basis. Instead of performing Step 408 as indicated above, the decibel differences in spectral magnitude between corresponding bins in each subband may be summed in accordance with the teachings of said applications. Then, each of those sums, representing the degree of spectral change from block to block, may be scaled so that the result is a spectral steadiness factor having a range from 0 to 1, wherein a value of 1 indicates the highest steadiness, a change of 0 dB from block to block for a given bin. A value of 0, indicating the lowest steadiness, may be assigned to decibel changes equal to or greater than a suitable amount, such as 12 dB, for example. Those results, a Bin Spectral-Steadiness Factor, may be used by Step 409 in the same manner that Step 409 uses the results of Step 408 as described above. When Step 409 receives a Bin Spectral-Steadiness Factor obtained by employing the just-described alternative event decision sensing technique, the Subband Spectral-Steadiness Factor of Step 409 may also be used as an indicator of a transient. For example, if the range of values produced by Step 409 is 0 to 1, a transient may be considered to be present when the Subband Spectral-Steadiness Factor is a small value, such as, for example, 0.1, indicating substantial spectral unsteadiness.
It will be appreciated that the Bin Spectral-Steadiness Factor produced by Step 408 and by the just-described alternative to Step 408 each inherently provide a variable threshold to a certain degree in that they are based on relative changes from block to block. Optionally, it may be useful to supplement such inherency by specifically providing a shift in the threshold in response to, for example, multiple transients in a frame or a large transient among smaller transients (e.g., a loud transient coming atop mid- to low-level applause). In the case of the latter example, an event detector may initially identify each clap as an event, but a loud transient (e.g., a drum hit) may make it desirable to shift the threshold so that only the drum hit is identified as an event.
Alternatively, a randomness metric may be employed (for example, as described in U.S. Patent Re 36,714) instead of a measure of spectral-steadiness over time.
Step 410. Calculate Interchannel Angle Consistency Factor.
For each subband having more than one bin, calculate a frame-rate Interchannel Angle Consistency Factor as follows:
a. Divide the magnitude of the complex sum of Step 407e by the sum of the magnitudes of Step 405. The resulting "raw" Angle Consistency Factor is a number in the range of 0 to 1.
b. Calculate a correction factor: let n = the number of values across the subband contributing to the two quantities in the above step (in other words, "n" is the number of bins in the subband). If n is less than 2, let the Angle Consistency Factor be 1 and go to Steps 411 and 413.
c. Let r = Expected Random Variation = 1/n. Subtract r from the result of Step 410b.
d. Normalize the result of Step 410c by dividing by (1 - r). The result has a maximum value of 1. Limit the minimum value to 0 as necessary.
Comments regarding Step 410:
Interchannel Angle Consistency is a measure of how similar the interchannel phase angles are within a subband over a frame period. If all bin interchannel angles of the subband are the same, the Interchannel Angle Consistency Factor is 1.0; whereas, if the interchannel angles are randomly scattered, the value approaches zero.
The Subband Angle Consistency Factor indicates if there is a phantom image between the channels. If the consistency is low, then it is desirable to decorrelate the channels. A high value indicates a fused image. Image fusion is independent of other signal characteristics.
It will be noted that the Subband Angle Consistency Factor, although an angle parameter, is determined indirectly from two magnitudes. If the interchannel angles are all the same, adding the complex values and then taking the magnitude yields the same result as taking all the magnitudes and adding them, so the quotient is 1. If the interchannel angles are scattered, adding the complex values (such as adding vectors having different angles) results in at least partial cancellation, so the magnitude of the sum is less than the sum of the magnitudes, and the quotient is less than 1.
Following is a simple example of a subband having two bins:
Suppose that the two complex bin values are (3 + j4) and (6 + j8). (Same angle each case: angle = arctan (imag/real), so angle1 = arctan (4/3) and angle2 = arctan (8/6) = arctan (4/3)). Adding complex values, sum = (9 + j12), magnitude of which is square root (81 + 144) = 15.
The sum of the magnitudes is magnitude of (3 + j4) + magnitude of (6 + j8) = 5 + 10 = 15. The quotient is therefore 15/15 = 1 = consistency (before 1/n normalization, would also be 1 after normalization) (Normalized consistency = (1 - 0.5) / (1 - 0.5) = 1.0).
If one of the above bins has a different angle, say that the second one has complex value (6 - j8), which has the same magnitude, 10. The complex sum is now (9 - j4), which has magnitude of square root (81 + 16) = 9.85, so the quotient is 9.85 / 15 = 0.66 = consistency (before normalization). To normalize, subtract 1/n = 1/2, and divide by (1 - 1/n) (normalized consistency = (0.66 - 0.5) / (1 - 0.5) = 0.32).
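The two-bin example above can be checked with a short Python sketch. It is illustrative only: the function name is hypothetical, the complex values that Steps 405 and 407 would accumulate over a frame are simply passed in as a list, and returning 1 for an all-zero subband is an assumption.

```python
def angle_consistency(bins):
    """Normalized angle consistency for the complex values of one subband.

    bins: list of complex values (one per bin).
    Returns a value in 0..1 following Steps 410a to 410d.
    """
    n = len(bins)
    if n < 2:
        return 1.0  # Step 410b: fewer than two bins
    sum_of_magnitudes = sum(abs(b) for b in bins)
    if sum_of_magnitudes == 0.0:
        return 1.0  # assumption: treat a silent subband as fully consistent
    raw = abs(sum(bins)) / sum_of_magnitudes  # Step 410a
    r = 1.0 / n                               # expected random variation
    return max((raw - r) / (1.0 - r), 0.0)    # Steps 410c and 410d
```

For the values in the text, `angle_consistency([3+4j, 6+8j])` gives 1.0 and `angle_consistency([3+4j, 6-8j])` gives about 0.31; the figure of 0.32 in the text results from rounding the quotient to 0.66 before normalizing.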
Although the above-described technique for determining a Subband Angle Consistency Factor has been found useful, its use is not critical. Other suitable techniques may be employed. For example, one could calculate a standard deviation of angles using standard formulae. In any case, it is desirable to employ amplitude weighting to minimize the effect of small signals on the calculated consistency value.
In addition, an alternative derivation of the Subband Angle Consistency Factor may use energy (the squares of the magnitudes) instead of magnitude. This may be accomplished by squaring the magnitude from Step 403 before it is applied to Steps 405 and 407.
Step 411. Derive Subband Decorrelation Scale Factor.
Derive a frame-rate Decorrelation Scale Factor for each subband as follows:
a. Let x = frame-rate Spectral-Steadiness Factor of Step 409f.
b. Let y = frame-rate Angle Consistency Factor of Step 410e.
c. Then the frame-rate Subband Decorrelation Scale Factor = (1 - x) * (1 - y), a number between 0 and 1.
Comments regarding Step 411:
The Subband Decorrelation Scale Factor is a function of the spectral-steadiness of signal characteristics over time in a subband of a channel (the Spectral-Steadiness Factor) and the consistency in the same subband of a channel of bin angles with respect to corresponding bins of a reference channel (the Interchannel Angle Consistency Factor).
The Subband Decorrelation Scale Factor is high only if both the Spectral-Steadiness Factor and the Interchannel Angle Consistency Factor are low.
As explained above, the Decorrelation Scale Factor controls the degree of envelope decorrelation provided in the decoder. Signals that exhibit spectral steadiness over time preferably should not be decorrelated by altering their envelopes, regardless of what is happening in other channels, as it may result in audible artifacts, namely wavering or warbling of the signal.
Step 412. Derive Subband Amplitude Scale Factors.
From the subband frame energy values of Step 404 and from the subband frame energy values of all other channels (as may be obtained by a step corresponding to Step 404 or an equivalent thereof), derive frame-rate Subband Amplitude Scale Factors as follows:
a. For each subband, sum the energy values per frame across all input channels.
b. Divide each subband energy value per frame (from Step 404) by the sum of the energy values across all input channels (from Step 412a) to create values in the range of 0 to 1.
c. Convert each ratio to dB, in the range of -∞ to 0.
d. Divide by the scale factor granularity, which may be set at 1.5 dB, for example, change sign to yield a non-negative value, limit to a maximum value which may be, for example, 31 (i.e. 5-bit precision) and round to the nearest integer to create the quantized value. These values are the frame-rate Subband Amplitude Scale Factors and are conveyed as part of the sidechain information.
e. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated magnitudes to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 412e: See comments regarding Step 404e except that in the case of Step 412e, there is no suitable subsequent step in which the time smoothing may alternatively be performed.
Comments for Step 412:
Although the granularity (resolution) and quantization precision indicated here have been found to be useful, they are not critical and other values may provide acceptable results.
Alternatively, one may use amplitude instead of energy to generate the Subband Amplitude Scale Factors. If using amplitude, one would use dB = 20*log(amplitude ratio), else if using energy, one converts to dB via dB = 10*log(energy ratio), where amplitude ratio = square root (energy ratio).
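Steps 412b through 412d can be sketched in Python as follows, using the example granularity of 1.5 dB and the example maximum of 31. The function name is hypothetical, and mapping a zero energy (a ratio of -∞ dB) to the largest code is an assumption consistent with the stated limit.

```python
import math

def quantize_amplitude_scale_factor(channel_energy, total_energy,
                                    granularity_db=1.5, max_code=31):
    """Quantize one subband's share of the total energy (Steps 412b to 412d)."""
    if channel_energy <= 0.0 or total_energy <= 0.0:
        return max_code  # assumption: -infinity dB is limited to the largest code
    ratio = channel_energy / total_energy        # Step 412b: value in 0..1
    ratio_db = 10.0 * math.log10(ratio)          # Step 412c: value in -inf..0
    code = int(math.floor(-ratio_db / granularity_db + 0.5))  # Step 412d
    return min(max(code, 0), max_code)
```

With these example values, a channel carrying half of the subband energy (about -3 dB) is coded as 2, and any share more than about 46 dB below the total saturates at 31.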
Step 413. Signal-Dependently Time Smooth Interchannel Subband Phase Angles.
Apply signal-dependent temporal smoothing to subband frame-rate interchannel angles derived in Step 407f:
a. Let v = Subband Spectral-Steadiness Factor of Step 409d.
b. Let w = corresponding Angle Consistency Factor of Step 410e.
c. Let x = (1 - v) * w. This is a value between 0 and 1, which is high if the Spectral-Steadiness Factor is low and the Angle Consistency Factor is high.
d. Let y = 1 - x. y is high if Spectral-Steadiness Factor is high and Angle Consistency Factor is low.
e. Let z = y^exp, where exp is a constant, which may be = 0.1. z is also in the range of 0 to 1, but skewed toward 1, corresponding to a slow time constant.
f. If the Transient Flag (Step 401) for the channel is set, set z = 0, corresponding to a fast time constant in the presence of a transient.
g. Compute lim, a maximum allowable value of z, lim = 1 - (0.1 * w). This ranges from 0.9 if the Angle Consistency Factor is high to 1.0 if the Angle Consistency Factor is low (0).
h. Limit z by lim as necessary: if (z > lim) then z = lim.
i. Smooth the subband angle of Step 407f using the value of z and a running smoothed value of angle maintained for each subband. If A = angle of Step 407f and RSA = running smoothed angle value as of the previous block, and NewRSA is the new value of the running smoothed angle, then: NewRSA = RSA * z + A * (1 - z). The value of RSA is subsequently set equal to NewRSA before processing the following block. NewRSA is the signal-dependently time-smoothed angle output of Step 413.
Comments regarding Step 413:
When a transient is detected, the subband angle update time constant is set to 0, allowing a rapid subband angle change. This is desirable because it allows the normal angle update mechanism to use a range of relatively slow time constants, minimizing image wandering during static or quasi-static signals, yet fast-changing signals are treated with fast time constants.
Although other smoothing techniques and parameters may be usable, a first-order smoother implementing Step 413 has been found to be suitable. If implemented as a first-order smoother/lowpass filter, the variable "z" corresponds to the feed-forward coefficient (sometimes denoted "ff0"), while "(1-z)" corresponds to the feedback coefficient (sometimes denoted "fb1").
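The smoothing of Step 413 may be sketched in Python as below. The function and parameter names are hypothetical, and the sketch applies the update to the angle value directly, as the step describes, without any special handling of angle wraparound.

```python
def smoothing_coefficient(steadiness, consistency, transient, exp=0.1):
    """Compute z for one subband (Steps 413a to 413h)."""
    x = (1.0 - steadiness) * consistency  # high: unsteady spectrum, consistent angles
    y = 1.0 - x
    z = y ** exp                          # skewed toward 1 (slow time constant)
    if transient:
        z = 0.0                           # fast time constant on a transient
    lim = 1.0 - 0.1 * consistency         # ranges from 0.9 to 1.0
    return min(z, lim)

def smooth_angle(running_angle, new_angle, z):
    """Step 413i: NewRSA = RSA * z + A * (1 - z)."""
    return running_angle * z + new_angle * (1.0 - z)
```

In use, the value returned by `smooth_angle` would be stored per subband and passed back as `running_angle` for the following block. With z = 0 the output follows the new angle immediately; with z near 1 it changes slowly.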
Step 414. Quantize Smoothed Interchannel Subband Phase Angles.
Quantize the time-smoothed subband interchannel angles derived in Step 413i to obtain the Subband Angle Control Parameter:
a. If the value is less than 0, add 2π, so that all angle values to be quantized are in the range 0 to 2π.
b. Divide by the angle granularity (resolution), which may be 2π/64 radians, and round to an integer. The maximum value may be set at 63, corresponding to 6-bit quantization.
Comments regarding Step 414:
The quantized value is treated as a non-negative integer, so an easy way to quantize the angle is to map it to a non-negative floating point number (add 2π if less than 0, making the range 0 to (less than) 2π), scale by the granularity (resolution), and round to an integer. Similarly, dequantizing that integer (which could otherwise be done with a simple table lookup) can be accomplished by scaling by the inverse of the angle granularity factor, converting a non-negative integer to a non-negative floating point angle (again, range 0 to 2π), after which it can be renormalized to the range ±π for further use. Although such quantization of the Subband Angle Control Parameter has been found to be useful, such a quantization is not critical and other quantizations may provide acceptable results.
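A minimal Python sketch of the quantization and dequantization just described follows, using the example granularity of 2π/64. The names are hypothetical, the input angle is assumed to lie between -π and +π, and a value that would round up to 64 is limited to 63 in line with the stated maximum.

```python
import math

ANGLE_STEP = 2.0 * math.pi / 64.0  # example granularity: 6-bit quantization

def quantize_angle(angle):
    """Step 414: map an angle in -pi..pi to an integer code in 0..63."""
    if angle < 0.0:
        angle += 2.0 * math.pi            # range is now 0 to 2*pi
    code = int(math.floor(angle / ANGLE_STEP + 0.5))
    return min(code, 63)                  # limit to the maximum code

def dequantize_angle(code):
    """Scale a code back to an angle and renormalize it to -pi..pi."""
    angle = code * ANGLE_STEP
    if angle > math.pi:
        angle -= 2.0 * math.pi
    return angle
```

For example, an angle of -π/2 becomes 3π/2 after the offset, is coded as 48, and dequantizes back to -π/2.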
Step 415. Quantize Subband Decorrelation Scale Factors.
Quantize the Subband Decorrelation Scale Factors produced by Step 411 to, for example, 8 levels (3 bits) by multiplying by 7.49 and rounding to the nearest integer. These quantized values are part of the sidechain information.
Comments regarding Step 415:
Although such quantization of the Subband Decorrelation Scale Factors has been found to be useful, quantization using the example values is not critical and other quantizations may provide acceptable results.
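Steps 411 and 415 can be illustrated together in a brief Python sketch. The function names are hypothetical; the multiplier 7.49 is the example value from Step 415.

```python
import math

def decorrelation_scale_factor(steadiness, consistency):
    """Step 411c: high only if both input factors are low."""
    return (1.0 - steadiness) * (1.0 - consistency)

def quantize_decorrelation_scale_factor(value, multiplier=7.49):
    """Step 415: multiply by 7.49 and round to the nearest integer (0..7)."""
    return int(math.floor(value * multiplier + 0.5))
```

Because 1.0 * 7.49 rounds to 7, an input range of 0 to 1 yields exactly the eight codes 0 through 7.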
Step 416. Dequantize Subband Angle Control Parameters.
Dequantize the Subband Angle Control Parameters (see Step 414), to use prior to downmixing.
Comment regarding Step 416:
Use of quantized values in the encoder helps maintain synchrony between the encoder and the decoder.
Step 417. Distribute Frame-Rate Dequantized Subband Angle Control Parameters Across Blocks.
In preparation for downmixing, distribute the once-per-frame dequantized Subband Angle Control Parameters of Step 416 across time to the subbands of each block within the frame.
Comment regarding Step 417:
The same frame value may be assigned to each block in the frame. Alternatively, it may be useful to interpolate the Subband Angle Control Parameter values across the blocks in a frame. Linear interpolation over time may be employed in the manner of the linear interpolation across frequency, as described below.
Step 418. Interpolate block Subband Angle Control Parameters to Bins.
Distribute the block Subband Angle Control Parameters of Step 417 for each channel across frequency to bins, preferably using linear interpolation as described below.
Comment regarding Step 418:
If linear interpolation across frequency is employed, Step 418 minimizes phase angle changes from bin to bin across a subband boundary, thereby minimizing aliasing artifacts. Such linear interpolation may be enabled, for example, as described below following the description of Step 422. Subband angles are calculated independently of one another, each representing an average across a subband. Thus, there may be a large change from one subband to the next. If the net angle value for a subband is applied to all bins in the subband (a "rectangular" subband distribution), the entire phase change from one subband to a neighboring subband occurs between two bins. If there is a strong signal component there, there may be severe, possibly audible, aliasing. Linear interpolation, between the centers of each subband, for example, spreads the phase angle change over all the bins in the subband, minimizing the change between any pair of bins, so that, for example, the angle at the low end of a subband mates with the angle at the high end of the subband below it, while maintaining the overall average the same as the given calculated subband angle. In other words, instead of rectangular subband distributions, the subband angle distribution may be trapezoidally shaped.
For example, suppose that the lowest coupled subband has one bin and a subband angle of 20 degrees, the next subband has three bins and a subband angle of 40 degrees, and the third subband has five bins and a subband angle of 100 degrees. With no interpolation, assume that the first bin (one subband) is shifted by an angle of 20 degrees, the next three bins (another subband) are shifted by an angle of 40 degrees and the next five bins (a further subband) are shifted by an angle of 100 degrees. In that example, there is a 60-degree maximum change, from bin 4 to bin 5. With linear interpolation, the first bin still is shifted by an angle of 20 degrees, the next 3 bins are shifted by about 30, 40, and 50 degrees; and the next five bins are shifted by about 67, 83, 100, 117, and 133 degrees. The average subband angle shift is the same, but the maximum bin-to-bin change is reduced to 17 degrees.
Optionally, changes in amplitude from subband to subband, in connection with this and other steps described herein, such as Step 417, may also be treated in a similar interpolative fashion. However, it may not be necessary to do so because there tends to be more natural continuity in amplitude from one subband to the next.
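The numerical example above can be reproduced with the following Python sketch. The slope rule used here is an assumption inferred from the example figures, not a rule stated in the text: each subband's line passes through the subband angle at the subband center, and its slope is chosen so that the line continues from the last bin of the subband below. The function name is hypothetical and angle wraparound is not handled.

```python
def interpolate_subband_angles(subband_sizes, subband_angles):
    """Spread per-subband angles across bins with a per-subband linear slope.

    subband_sizes: number of bins in each subband, lowest subband first.
    subband_angles: one angle per subband (any consistent unit).
    """
    bin_angles = []
    start = 0
    for size, angle in zip(subband_sizes, subband_angles):
        center = start + (size - 1) / 2.0  # may fall between two bins
        if bin_angles:
            prev_index = start - 1         # last bin of the subband below
            slope = (angle - bin_angles[prev_index]) / (center - prev_index)
        else:
            slope = 0.0                    # assumption: lowest subband stays flat
        for k in range(start, start + size):
            bin_angles.append(angle + slope * (k - center))
        start += size
    return bin_angles
```

Calling `interpolate_subband_angles([1, 3, 5], [20, 40, 100])` and rounding gives 20, 30, 40, 50, 67, 83, 100, 117 and 133 degrees, matching the example. Because each line is symmetric about its subband center, the average angle within each subband is unchanged.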
Step 419. Apply Phase Angle Rotation to Bin Transform Values for Channel.
Apply phase angle rotation to each bin transform value as follows:
a. Let x = bin angle for this bin as calculated in Step 418.
b. Let y = -x;
c. Compute z, a unity-magnitude complex phase rotation scale factor with angle y, z = cos (y) + j sin (y).
d. Multiply the bin value (a + jb) by z.
Comments regarding Step 419:
The phase angle rotation applied in the encoder is the inverse of the angle derived from the Subband Angle Control Parameter.
Phase angle adjustments, as described herein, in an encoder or encoding process prior to downmixing (Step 420) have several advantages: (1) they minimize cancellations of the channels that are summed to a mono composite signal or matrixed to multiple channels, (2) they minimize reliance on energy normalization (Step 421), and (3) they precompensate the decoder inverse phase angle rotation, thereby reducing aliasing.
The phase correction factors can be applied in the encoder by subtracting each subband phase correction value from the angles of each transform bin value in that subband. This is equivalent to multiplying each complex bin value by a complex number with a magnitude of 1.0 and an angle equal to the negative of the phase correction factor. Note that a complex number of magnitude 1, angle A is equal to cos(A) + j sin(A). This latter quantity is calculated once for each subband of each channel, with A = -phase correction for this subband, then multiplied by each bin complex signal value to realize the phase shifted bin value.
The phase shift is circular, resulting in circular convolution (as mentioned above). While circular convolution may be benign for some continuous signals, it may create spurious spectral components for certain continuous complex signals (such as a pitch pipe) or may cause blurring of transients if different phase angles are used for different subbands. Consequently, a suitable technique to avoid circular convolution may be employed or the Transient Flag may be employed such that, for example, when the Transient Flag is True, the angle calculation results may be overridden, and all subbands in a channel may use the same phase correction factor such as zero or a randomized value.
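The rotation of Step 419 amounts to one complex multiplication per bin, as the following Python sketch shows. The function name is hypothetical.

```python
import math

def rotate_bin(bin_value, bin_angle):
    """Step 419: rotate a complex bin value by the negative of bin_angle (radians)."""
    y = -bin_angle                         # Step 419b
    z = complex(math.cos(y), math.sin(y))  # Step 419c: unity-magnitude factor
    return bin_value * z                   # Step 419d
```

The magnitude of the bin is unchanged by the rotation; only its phase angle is reduced by `bin_angle`. For example, a bin of value j (angle π/2) rotated with `bin_angle` = π/2 becomes approximately 1 (angle 0).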
Step 420. Downmix.
Downmix to mono by adding the corresponding complex transform bins across channels to produce a mono composite channel or downmix to multiple channels by matrixing the input channels, as for example, in the manner of the example of FIG. 6, as described below.
Comments regarding Step 420:
In the encoder, once the transform bins of all the channels have been phase shifted, the channels are summed, bin-by-bin, to create the mono composite audio signal. Alternatively, the channels may be applied to a passive or active matrix that provides either a simple summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple channels. The matrix coefficients may be real or complex (real and imaginary).
Step 421. Normalize.
To avoid cancellation of isolated bins and over-emphasis of in-phase signals, normalize the amplitude of each bin of the mono composite channel to have substantially the same energy as the sum of the contributing energies, as follows:
a. Let x = the sum across channels of bin energies (i.e., the squares of the bin magnitudes computed in Step 403).
b. Let y = energy of corresponding bin of the mono composite channel, calculated as per Step 403.
c. Let z = scale factor = square root (x/y). If x = 0 then y is 0 and z is set to 1.
d. Limit z to a maximum value of, for example, 100. If z is initially greater than 100 (implying strong cancellation from downmixing), add an arbitrary value, for example, 0.01 * square root (x) to the real and imaginary parts of the mono composite bin, which will assure that it is large enough to be normalized by the following step.
e. Multiply the complex mono composite bin value by z.
Comments regarding Step 421:
Although it is generally desirable to use the same phase factors for both encoding and decoding, even the optimal choice of a subband phase correction value may cause one or more audible spectral components within the subband to be cancelled during the encode downmix process because the phase shifting of Step 419 is performed on a subband rather than a bin basis. In this case, a different phase factor for isolated bins in the encoder may be used if it is detected that the sum energy of such bins is much less than the energy sum of the individual channel bins at that frequency. It is generally not necessary to apply such an isolated correction factor to the decoder, inasmuch as isolated bins usually have little effect on overall image quality. A similar normalization may be applied if multiple channels rather than a mono channel are employed.
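The mono downmix of Step 420 and the normalization of Step 421 may be sketched for a single bin as follows. This is an illustrative reading of the steps, not a definitive implementation: the function name is hypothetical, and treating a zero-energy composite bin with non-zero input energy as exceeding the limit is an assumption.

```python
import math

def downmix_and_normalize_bin(channel_bins, z_max=100.0):
    """Sum one bin across channels, then scale it per Steps 421a to 421e."""
    mono = sum(channel_bins)                    # Step 420: bin-by-bin sum
    x = sum(abs(b) ** 2 for b in channel_bins)  # Step 421a: summed energies
    y = abs(mono) ** 2                          # Step 421b: composite energy
    if x == 0.0:
        return mono                             # Step 421c: z = 1 when x = 0
    z = math.sqrt(x / y) if y > 0.0 else float("inf")
    if z > z_max:                               # Step 421d: strong cancellation
        offset = 0.01 * math.sqrt(x)
        mono += complex(offset, offset)
        z = z_max
    return mono * z                             # Step 421e
```

For two identical in-phase bins of value 1, the plain sum has energy 4; after scaling the result has energy 2, equal to the sum of the contributing energies. For two bins that cancel exactly (1 and -1), the added offset keeps the normalized bin from being zero.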
Step 422. Assemble and Pack into Bitstream(s).
The Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale Factors, and Transient Flags side channel information for each channel, along with the common mono composite audio or the matrixed multiple channels are multiplexed as may be desired and packed into one or more bitstreams suitable for the storage, transmission or storage and transmission medium or media.
Comment regarding Step 422:
The mono composite audio or the multiple channel audio may be applied to a data-rate reducing encoding process or device such as, for example, a perceptual encoder or to a perceptual encoder and an entropy coder (e.g., arithmetic or Huffman coder) (sometimes referred to as a "lossless" coder) prior to packing. Also, as mentioned above, the mono composite audio (or the multiple channel audio) and related sidechain information may be derived from multiple input channels only for audio frequencies above a certain frequency (a "coupling" frequency). In that case, the audio frequencies below the coupling frequency in each of the multiple input channels may be stored, transmitted or stored and transmitted as discrete channels or may be combined or processed in some manner other than as described herein. Discrete or otherwise-combined channels may also be applied to a data reducing encoding process or device such as, for example, a perceptual encoder or a perceptual encoder and an entropy encoder. The mono composite audio (or the multiple channel audio) and the discrete multichannel audio may all be applied to an integrated perceptual encoding or perceptual and entropy encoding process or device prior to packing.
Optional Interpolation Flag (Not shown in FIG. 4)
Interpolation across frequency of the basic phase angle shifts provided by the Subband Angle Control Parameters may be enabled in the Encoder (Step 418) and/or in the Decoder (Step 505, below). The optional Interpolation Flag sidechain parameter may be employed for enabling interpolation in the Decoder. Either the Interpolation Flag or an enabling flag similar to the Interpolation Flag may be used in the Encoder. Note that because the Encoder has access to data at the bin level, it may use different interpolation values than the Decoder, which interpolates the Subband Angle Control Parameters in the sidechain information.
The use of such interpolation across frequency in the Encoder or the Decoder may be enabled if, for example, either of the following two conditions is true:
Condition 1. If a strong, isolated spectral peak is located at or near the boundary of two subbands that have substantially different phase rotation angle assignments.
Reason: without interpolation, a large phase change at the boundary may introduce a warble in the isolated spectral component. By using interpolation to spread the band-to-band phase change across the bin values within the band, the amount of change at the subband boundaries is reduced. Thresholds for spectral peak strength, closeness to a boundary and difference in phase rotation from subband to subband to satisfy this condition may be adjusted empirically.
Condition 2. If, depending on the presence of a transient, either the interchannel phase angles (no transient) or the absolute phase angles within a channel (transient), comprise a good fit to a linear progression.
Reason: Using interpolation to reconstruct the data tends to provide a better fit to the original data. Note that the slope of the linear progression need not be constant across all frequencies, only within each subband, since angle data will still be conveyed to the decoder on a subband basis; and that forms the input to the Interpolator Step 418. The degree to which the data provides a good fit to satisfy this condition may also be determined empirically.
Other conditions, such as those determined empirically, may benefit from interpolation across frequency. The existence of the two conditions just mentioned may be determined as follows:
Condition 1. If a strong, isolated spectral peak is located at or near the boundary of two subbands that have substantially different phase rotation angle assignments:
for the Interpolation Flag to be used by the Decoder, the Subband Angle Control Parameters (output of Step 414), and for enabling of Step 418 within the Encoder, the output of Step 413 before quantization may be used to determine the rotation angle from subband to subband. For both the Interpolation Flag and for enabling within the Encoder, the magnitude output of Step 403, the current DFT magnitudes, may be used to find isolated peaks at subband boundaries.
Condition 2. If, depending on the presence of a transient, either the interchannel phase angles (no transient) or the absolute phase angles within a channel (transient), comprise a good fit to a linear progression:
if the Transient Flag is not true (no transient), use the relative interchannel bin phase angles from Step 406 for the fit to a linear progression determination, and if the Transient Flag is true (transient), use the channel's absolute phase angles from Step 403.
Decoding
The steps of a decoding process ("decoding steps") may be described as follows. With respect to decoding steps, reference is made to FIG. 5, which is in the nature of a hybrid flowchart and functional block diagram. For simplicity, the figure shows the derivation of sidechain information components for one channel, it being understood that sidechain information components must be obtained for each channel unless the channel is a reference channel for such components, as explained elsewhere.
Step 501. Unpack and Decode Sidechain Information.
Unpack and decode (including dequantization), as necessary, the sidechain data components (Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale Factors, and Transient Flag) for each frame of each channel (one channel shown in FIG. 5). Table lookups may be used to decode the Amplitude Scale Factors, Angle Control Parameter, and Decorrelation Scale Factors.
Comment regarding Step 501: As explained above, if a reference channel is employed, the sidechain data for the reference channel may not include the Angle Control Parameters, Decorrelation Scale Factors, and Transient Flag.
Step 502. Unpack and Decode Mono Composite or Multichannel Audio Signal.
Unpack and decode, as necessary, the mono composite or multichannel audio signal information to provide DFT coefficients for each transform bin of the mono composite or multichannel audio signal.
Comment regarding Step 502:
Step 501 and Step 502 may be considered to be part of a single unpacking and decoding step. Step 502 may include a passive or active matrix.
Step 503. Distribute Angle Parameter Values Across Blocks.
Block Subband Angle Control Parameter values are derived from the dequantized frame Subband Angle Control Parameter values.
Comment regarding Step 503:
Step 503 may be implemented by distributing the same parameter value to every block in the frame.
Step 504. Distribute Subband Decorrelation Scale Factor Across Blocks.
Block Subband Decorrelation Scale Factor values are derived from the dequantized frame Subband Decorrelation Scale Factor values.
Comment regarding Step 504:
Step 504 may be implemented by distributing the same scale factor value to every block in the frame.
Step 505. Linearly Interpolate Across Frequency.
Optionally, derive bin angles from the block subband angles of decoder Step 503 by linear interpolation across frequency as described above in connection with encoder Step 418. Linear interpolation in Step 505 may be enabled when the Interpolation Flag is used and is true.
Step 506. Add Randomized Phase Angle Offset (Technique 3).
In accordance with Technique 3, described above, when the Transient Flag indicates a transient, add to the block Subband Angle Control Parameter provided by Step 503, which may have been linearly interpolated across frequency by Step 505, a randomized offset value scaled by the Decorrelation Scale Factor (the scaling may be indirect as set forth in this Step):
a. Let y = block Subband Decorrelation Scale Factor.
b. Let z = y^exp, where exp is a constant, for example = 5. z will also be in the range of 0 to 1, but skewed toward 0, reflecting a bias toward low levels of randomized variation unless the Decorrelation Scale Factor value is high.
c. Let x = a randomized number between +1.0 and -1.0, chosen separately for each subband of each block.
d. Then, the value added to the block Subband Angle Control Parameter to add a randomized angle offset value according to Technique 3 is x * pi * z.
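Steps 506a through 506d can be illustrated with the short Python sketch below. The function name and the `rng` parameter are hypothetical, and a pseudo-random generator from the Python standard library is used only as one possible source of "randomized" values.

```python
import math
import random

def technique3_offset(decorrelation_scale_factor, rng=random, exp=5.0):
    """Randomized angle offset in radians for one subband of one block."""
    z = decorrelation_scale_factor ** exp  # Step 506b: skewed toward 0
    x = rng.uniform(-1.0, 1.0)             # Step 506c: one value per subband per block
    return x * math.pi * z                 # Step 506d
```

With the example exponent of 5, a Decorrelation Scale Factor of 0.5 limits the offset to about 3 percent of π, whereas a factor of 1 allows the full range of -π to +π. Passing a seeded `random.Random` instance as `rng` makes the sequence repeatable.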
Comments regarding Step 506:
As will be appreciated by those of ordinary skill in the art, "randomized" angles (or "randomized" amplitudes if amplitudes are also scaled) for scaling by the Decorrelation Scale Factor may include not only pseudo-random and truly random variations, but also deterministically-generated variations that, when applied to phase angles or to phase angles and to amplitudes, have the effect of reducing cross-correlation between channels.
Such "randomized" variations may be obtained in many ways. For example, a pseudo-random number generator with various seed values may be employed.
Alternatively, truly random numbers may be generated using a hardware random number generator.
Inasmuch as a randomized angle resolution of only about 1 degree may be sufficient, tables of randomized numbers having two or three decimal places (e.g. 0.84 or 0.844) may be employed. Preferably, the randomized values (between -1.0 and +1.0 with reference to Step 505c, above) are uniformly distributed statistically across each channel.
Although the non-linear indirect scaling of Step 506 has been found to be useful, it is not critical and other suitable scalings may be employed; in particular, other values for the exponent may be employed to obtain similar results.
When the Subband Decorrelation Scale Factor value is 1, a full range of random angles from -π to +π are added (in which case the block Subband Angle Control Parameter values produced by Step 503 are rendered irrelevant). As the Subband Decorrelation Scale Factor value decreases toward zero, the randomized angle offset also decreases toward zero, causing the output of Step 506 to move toward the Subband Angle Control Parameter values produced by Step 503.
If desired, the encoder described above may also add a scaled randomized offset in accordance with Technique 3 to the angle shift applied to a channel before downmixing. Doing so may improve alias cancellation in the decoder. It may also be beneficial for improving the synchronicity of the encoder and decoder.
Step 507. Add Randomized Phase Angle Offset (Technique 2).
In accordance with Technique 2, described above, when the Transient Flag does not indicate a transient, for each bin, add to all the block Subband Angle Control Parameters in a frame provided by Step 503 (Step 505 operates only when the Transient Flag indicates a transient) a different randomized offset value scaled by the Decorrelation Scale Factor (the scaling may be direct as set forth herein in this step):
a. Let y = block Subband Decorrelation Scale Factor.
b. Let x = a randomized number between +1.0 and -1.0, chosen separately for each bin of each frame.
c. Then, the value added to the block bin Angle Control Parameter to add a randomized angle offset value according to Technique 3 is x * pi * y.
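The direct scaling of sub-steps a through c can be sketched as below. The sketch assumes, in line with the comments on this step, that each bin keeps a fixed randomized value over time while the Subband Decorrelation Scale Factor is refreshed per frame. The helper names, the seeded table, and the `subband_of_bin` mapping are hypothetical.

```python
import math
import random


def make_bin_table(num_bins, seed=0):
    """Fixed table of randomized values in [-1, 1], one per bin (assumed seeded PRNG)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_bins)]


def technique2_offsets(bin_table, subband_of_bin, decorr_sfs):
    """Per-bin angle offsets x * pi * y, where y is the scale factor of the bin's subband."""
    return [x * math.pi * decorr_sfs[subband_of_bin[b]]
            for b, x in enumerate(bin_table)]
```

Because the table is generated once, recomputing the offsets for a new frame changes only the scaling, not the underlying per-bin random values.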
Comments regarding Step 507:
See comments above regarding Step 505 regarding the randomized angle offset.
Although the direct scaling of Step 507 has been found to be useful, it is not critical and other suitable scalings may be employed.
To minimize temporal discontinuities, the unique randomized angle value for each bin of each channel preferably does not change with time. The randomized angle values of all the bins in a subband are scaled by the same Subband Decorrelation Scale Factor value, which is updated at the frame rate. Thus, when the Subband Decorrelation Scale Factor value is 1, a full range of random angles from -π to +π are added (in which case block subband angle values derived from the dequantized frame subband angle values are rendered irrelevant). As the Subband Decorrelation Scale Factor value diminishes toward zero, the randomized angle offset also diminishes toward zero. Unlike Step 504, the scaling in this Step 507 may be a direct function of the Subband Decorrelation Scale Factor value. For example, a Subband Decorrelation Scale Factor value of 0.5 proportionally reduces every random angle variation by 0.5.
The scaled randomized angle value may then be added to the bin angle from decoder Step 506. The Decorrelation Scale Factor value is updated once per frame. In the presence of a Transient Flag for the frame, this step is skipped, to avoid transient prenoise artifacts.
If desired, the encoder described above may also add a scaled randomized offset in accordance with Technique 2 to the angle shift applied before downmixing. Doing so may improve alias cancellation in the decoder. It may also be beneficial for improving the synchronicity of the encoder and decoder.
Step 508. Normalize Amplitude Scale Factors.
Normalize Amplitude Scale Factors across channels so that they sum-square to 1.
Comment regarding Step 508:
For example, if two channels have dequantized scale factors of -3.0 dB (= 2 * granularity of 1.5 dB) (.70795), the sum of the squares is 1.002. Dividing each by the square root of 1.002 = 1.001 yields two values of .7072 (-3.01 dB).
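The normalization in the example above can be reproduced with a short Python sketch. The function names are illustrative, and the decibel conversion assumes amplitude (20 * log10) scaling, which matches the -3.0 dB = .70795 figure in the text.

```python
import math


def db_to_linear(db):
    """Convert an amplitude level in dB to a linear scale factor."""
    return 10.0 ** (db / 20.0)


def normalize_sum_square(scale_factors):
    """Scale the factors across channels so that their squares sum to 1."""
    norm = math.sqrt(sum(s * s for s in scale_factors))
    if norm == 0.0:
        return list(scale_factors)   # nothing to normalize
    return [s / norm for s in scale_factors]
```

Carried out at full floating-point precision, two -3.0 dB factors normalize to about 0.7071 each; the slightly different .7072 quoted in the text follows from rounding the intermediate values to three or four digits.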
Step 509. Boost Subband Scale Factor Levels (Optional).
Optionally, when the Transient Flag indicates no transient, apply a slight additional boost to Subband Scale Factor levels, dependent on Subband Decorrelation Scale Factor levels: multiply each normalized Subband Amplitude Scale Factor by a small factor (e.g., 1 + 0.2 * Subband Decorrelation Scale Factor). When the Transient Flag is True, skip this step.
Comment regarding Step 509:
This step may be useful because the decoder decorrelation Step 507 may result in slightly reduced levels in the final inverse filterbank process.
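A minimal sketch of the optional boost follows. It assumes the example factor of 1 + 0.2 * Subband Decorrelation Scale Factor; the function and parameter names are hypothetical.

```python
def boost_scale_factors(amplitude_sfs, decorr_sfs, transient, boost=0.2):
    """Apply the optional Step 509 boost; return the input unchanged on a transient."""
    if transient:
        return list(amplitude_sfs)
    return [a * (1.0 + boost * d) for a, d in zip(amplitude_sfs, decorr_sfs)]
```

A subband with no decorrelation (scale factor 0) is left untouched, while a fully decorrelated subband (scale factor 1) is raised by 20 percent under this example factor.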
Step 510. Distribute Subband Amplitude Values Across Bins.
Step 510 may be implemented by distributing the same subband amplitude scale factor value to every bin in the subband.
Step 510a. Add Randomized Amplitude Offset (Optional).
Optionally, apply a randomized variation to the normalized Subband Amplitude Scale Factor dependent on Subband Decorrelation Scale Factor levels and the Transient Flag. In the absence of a transient, add a Randomized Amplitude Scale Factor that does not change with time on a bin-by-bin basis (different from bin to bin), and, in the presence of a transient (in the frame or block), add a Randomized Amplitude Scale Factor that changes on a block-by-block basis (different from block to block) and changes from subband to subband (the same shift for all bins in a subband; different from subband to subband). Step 510a is not shown in the drawings.
Comment regarding Step 510a:
Although the degree to which randomized amplitude shifts are added may be controlled by the Decorrelation Scale Factor, it is believed that a particular scale factor value should cause less amplitude shift than the corresponding randomized phase shift resulting from the same scale factor value in order to avoid audible artifacts.
Step 511. Upmix.
a. For each bin of each output channel, construct a complex upmix scale factor from the amplitude of decoder Step 508 and the bin angle of decoder Step 507: (amplitude * (cos (angle) + j sin (angle))).
b. For each output channel, multiply the complex bin value and the complex upmix scale factor to produce the upmixed complex output bin value of each bin of the channel.
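Sub-steps a and b amount to one complex multiplication per bin, which can be sketched as follows. The function names are illustrative, and the per-bin amplitude and angle lists are assumed to have already been produced by the preceding steps.

```python
import cmath


def upmix_bin(mono_bin, amplitude, angle):
    """Multiply a composite-channel bin by amplitude * (cos(angle) + j sin(angle))."""
    scale = amplitude * cmath.exp(1j * angle)   # sub-step a: complex upmix scale factor
    return mono_bin * scale                     # sub-step b: upmixed output bin


def upmix_channel(mono_bins, amplitudes, angles):
    """Apply the per-bin upmix to every bin of one output channel."""
    return [upmix_bin(v, a, th) for v, a, th in zip(mono_bins, amplitudes, angles)]
```

For instance, a bin of 1 + 0j with amplitude 0.5 and an angle of pi/2 becomes approximately 0.5j: the magnitude is halved and the phase is advanced by a quarter turn.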
Step 512. Perform Inverse DFT (Optional).
Optionally, perform an inverse DFT transform on the bins of each output channel to yield multichannel output PCM values. As is well known, in connection with such an inverse DFT transformation, the individual blocks of time samples are windowed, and adjacent blocks are overlapped and added together in order to reconstruct the final continuous time output PCM audio signal.
Comments regarding Step 512:
A decoder according to the present invention may not provide PCM outputs. In the case where the decoder process is employed only above a given coupling frequency, and discrete MDCT coefficients are sent for each channel below that frequency, it may be desirable to convert the DFT coefficients derived by the decoder upmixing Steps 511a and 511b to MDCT coefficients, so that they can be combined with the lower frequency discrete MDCT coefficients and requantized in order to provide, for example, a bitstream compatible with an encoding system that has a large number of installed users, such as a standard AC-3 SP/DIF bitstream for application to an external device where an inverse transform may be performed. An inverse DFT transform may be applied to ones of the output channels to provide PCM outputs.
Section 8.2.2 of the A/52A Document With Sensitivity Factor "F" Added
8.2.2. Transient detection
Transients are detected in the full-bandwidth channels in order to decide when to switch to short length audio blocks to improve pre-echo performance. High-pass filtered versions of the signals are examined for an increase in energy from one sub-block time-segment to the next. Sub-blocks are examined at different time scales. If a transient is detected in the second half of an audio block in a channel, that channel switches to a short block. A channel that is block-switched uses the D45 exponent strategy [i.e., the data has a coarser frequency resolution in order to reduce the data overhead resulting from the increase in temporal resolution].
The transient detector is used to determine when to switch from a long transform block (length 512), to the short block (length 256). It operates on 512 samples for every audio block. This is done in two passes, with each pass processing 256 samples. Transient detection is broken down into four steps: 1) high-pass filtering, 2) segmentation of the block into submultiples, 3) peak amplitude detection within each sub-block segment, and 4) threshold comparison. The transient detector outputs a flag blksw[n] for each full-bandwidth channel, which when set to "one" indicates the presence of a transient in the second half of the 512 length input block for the corresponding channel.
1) High-pass filtering: The high-pass filter is implemented as a cascaded biquad direct form II IIR filter with a cutoff of 8 kHz.
Description
RECONSTRUCTING AUDIO SIGNALS WITH MULTIPLE DECORRELATION TECHNIQUES
This is a divisional of Canadian Patent Application No. 2,992,051 filed February 28, 2005, which is a divisional of Canadian Patent Application No. 2,917,518 filed February 28, 2005, which is a divisional of Canadian Patent Application Serial No. 2,808,226 filed February 28, 2005, which is a divisional of Canadian National Phase Patent Application Serial No. 2,556,575 filed February 28, 2005.
Technical Field
The invention relates generally to audio signal processing. The invention is particularly useful in low bitrate and very low bitrate audio signal processing. More particularly, aspects of the invention relate to an encoder (or encoding process), a decoder (or decoding processes), and to an encode/decode system (or encoding/decoding process) for audio signals in which a plurality of audio channels is represented by a composite monophonic ("mono") audio channel and auxiliary ("sidechain") information. Alternatively, the plurality of audio channels is represented by a plurality of audio channels and sidechain information. Aspects of the invention also relate to a multichannel to composite monophonic channel downmixer (or downmix process), to a monophonic channel to multichannel upmixer (or upmixer process), and to a monophonic channel to multichannel decorrelator (or decorrelation process). Other aspects of the invention relate to a multichannel-to-multichannel downmixer (or downmix process), to a multichannel-to-multichannel upmixer (or upmix process), and to a decorrelator (or decorrelation process).
Background Art
In the AC-3 digital audio encoding and decoding system, channels may be selectively combined or "coupled" at high frequencies when the system becomes starved for bits. Details of the AC-3 system are well known in the art - see, for example: ATSC Standard A52/A: Digital Audio Compression Standard (AC-3), Revision A, Advanced Television Systems Committee, 20 Aug. 2001. The A/52A document is available on the World Wide Web at http://www.atsc.org/standards.html.
The frequency above which the AC-3 system combines channels on demand is referred to as the "coupling" frequency. Above the coupling frequency, the coupled channels are combined into a "coupling" or composite channel. The encoder generates "coupling coordinates" (amplitude scale factors) for each subband above the coupling frequency in each channel. The coupling coordinates indicate the ratio of the original energy of each coupled channel subband to the energy of the corresponding subband in the composite channel. Below the coupling frequency, channels are encoded discretely. The phase polarity of a coupled channel's subband may be reversed before the channel is combined with one or more other coupled channels in order to reduce out-of-phase signal component cancellation. The composite channel along with sidechain information that includes, on a per-subband basis, the coupling coordinates and whether the channel's phase is inverted, are sent to the decoder. In practice, the coupling frequencies employed in commercial embodiments of the AC-3 system have ranged from about 10 kHz to about 3500 Hz. U.S. Patents 5,583,962; 5,633,981; 5,727,119; 5,909,664; and 6,021,386 include teachings that relate to the combining of multiple audio channels into a composite channel and auxiliary or sidechain information and the recovery therefrom of an approximation to the original multiple channels.
Disclosure of the Invention
Aspects of the present invention may be viewed as improvements upon the "coupling" techniques of the AC-3 encoding and decoding system and also upon other techniques in which multiple channels of audio are combined either to a monophonic composite signal or to multiple channels of audio along with related auxiliary information and from which multiple channels of audio are reconstructed. Aspects of the present invention also may be viewed as improvements upon techniques for downmixing multiple audio channels to a monophonic audio signal or to multiple audio channels and for decorrelating multiple audio channels derived from a monophonic audio channel or from multiple audio channels.
Aspects of the invention may be employed in an N:1:N spatial audio coding technique (where "N" is the number of audio channels) or an M:1:N spatial audio coding technique (where "M" is the number of encoded audio channels and "N" is the number of decoded audio channels) that improve on channel coupling, by providing, among other things, improved phase compensation, decorrelation mechanisms, and signal-dependent variable time-constants. Aspects of the present invention may also be employed in N:x:N and M:x:N spatial audio coding techniques wherein "x" may be 1 or greater than 1.
Goals include the reduction of coupling cancellation artifacts in the encode process by adjusting relative interchannel phase before downmixing, and improving the spatial dimensionality of the reproduced signal by restoring the phase angles and degrees of decorrelation in the decoder. Aspects of the invention when embodied in practical embodiments should allow for continuous rather than on-demand channel coupling and lower coupling frequencies than, for example in the AC-3 system, thereby reducing the required data rate.
According to one aspect of the present invention, there is provided a method performed in an audio decoder for reconstructing N audio channels from an audio signal having M encoded audio channels, the method comprising: receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter;
wherein the correlation parameter is differentially encoded across time; decoding the M
encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components; extracting the set of spatial parameters from the bitstream;
applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter; analyzing the M audio channels to detect a location of a transient; decorrelating the M audio channels to obtain a decorrelated version of the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel; deriving the N audio channels from the M audio channels, the decorrelated version of the M audio channels, and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and synthesizing, by an audio reproduction device, the N
audio channels as an output audio signal, wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of a decorrelator, the second decorrelation technique represents a second mode of operation of the decorrelator, and the audio decoder is implemented at least in part in hardware.
According to another aspect of the present invention, there is provided an audio decoder for decoding M encoded audio channels representing N audio channels, the audio decoder comprising: an input interface for receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter; wherein the correlation parameter is differentially encoded across time; an audio decoder for decoding the M
encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components; a demultiplexer for extracting the set of spatial parameters from the bitstream; a processor for applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter, and analyzing the M audio channels to detect a location of a transient; a decorrelator for decorrelating the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel; a reconstructor for deriving N audio channels from the M audio channels and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and an audio reproduction device that synthesizes the N audio channels as an output audio signal, wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of the decorrelator, and the second decorrelation technique represents a second mode of operation of the decorrelator.
Description of the Drawings
FIG. 1 is an idealized block diagram showing the principal functions or devices of an N:1 encoding arrangement embodying aspects of the present invention.
FIG. 2 is an idealized block diagram showing the principal functions or devices of a 1:N decoding arrangement embodying aspects of the present invention.
FIG. 3 shows an example of a simplified conceptual organization of bins and subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal) time axis. The figure is not to scale.
FIG. 4 is in the nature of a hybrid flowchart and functional block diagram showing encoding steps or devices performing functions of an encoding arrangement embodying aspects of the present invention.
FIG. 5 is in the nature of a hybrid flowchart and functional block diagram showing decoding steps or devices performing functions of a decoding arrangement embodying aspects of the present invention.
FIG. 6 is an idealized block diagram showing the principal functions or devices of a first N:x encoding arrangement embodying aspects of the present invention.
FIG. 7 is an idealized block diagram showing the principal functions or devices of an x:M decoding arrangement embodying aspects of the present invention.
FIG. 8 is an idealized block diagram showing the principal functions or devices of a first alternative x:M decoding arrangement embodying aspects of the present invention.
FIG. 9 is an idealized block diagram showing the principal functions or devices of a second alternative x:M decoding arrangement embodying aspects of the present invention.
Best Mode for Carrying Out the Invention
Basic N:1 Encoder
Referring to FIG. 1, an N:1 encoder function or device embodying aspects of the present invention is shown. The figure is an example of a function or structure that performs as a basic encoder embodying aspects of the invention. Other functional or structural arrangements that practice aspects of the invention may be employed, including alternative and/or equivalent functional or structural arrangements described below.
Two or more audio input channels are applied to the encoder. Although, in principle, aspects of the invention may be practiced by analog, digital or hybrid analog/digital embodiments, examples disclosed herein are digital embodiments. Thus, the input signals may be time samples that may have been derived from analog audio signals. The time samples may be encoded as linear pulse-code modulation (PCM) signals. Each linear PCM audio input channel is processed by a filterbank function or device having both an in-phase and a quadrature output, such as a 512-point windowed forward discrete Fourier transform (DFT) (as implemented by a Fast Fourier Transform (FFT)). The filterbank may be considered to be a time-domain to frequency-domain transform.
FIG. 1 shows a first PCM channel input (channel "1") applied to a filterbank function or device, "Filterbank" 2, and a second PCM channel input (channel "n") applied, respectively, to another filterbank function or device, "Filterbank" 4. There may be "n" input channels, where "n" is a whole positive integer equal to two or more. Thus, there also are "n" Filterbanks, each receiving a unique one of the "n" input channels. For simplicity in presentation, FIG. 1 shows only two input channels, "1" and "n".
When a Filterbank is implemented by an FFT, input time-domain signals are segmented into consecutive blocks and are usually processed in overlapping blocks. The FFT's discrete frequency outputs (transform coefficients) are referred to as bins, each having a complex value with real and imaginary parts corresponding, respectively, to in-phase and quadrature components. Contiguous transform bins may be grouped into subbands approximating critical bandwidths of the human ear, and most sidechain information produced by the encoder, as will be described, may be calculated and transmitted on a per-subband basis in order to minimize processing resources and to reduce the bitrate. Multiple successive time-domain blocks may be grouped into frames, with individual block values averaged or otherwise combined or accumulated across each frame, to minimize the sidechain data rate. In examples described herein, each filterbank is implemented by an FFT, contiguous transform bins are grouped into subbands, blocks are grouped into frames and sidechain data is sent on a once per-frame basis. Alternatively, sidechain data may be sent on a more than once per frame basis (e.g., once per block). See, for example, FIG. 3 and its description, hereinafter. As is well known, there is a tradeoff between the frequency at which sidechain information is sent and the required bitrate.
A suitable practical implementation of aspects of the present invention may employ fixed length frames of about 32 milliseconds when a 48 kHz sampling rate is employed, each frame having six blocks at intervals of about 5.3 milliseconds each (employing, for example, blocks having a duration of about 10.6 milliseconds with a 50% overlap). However, neither such timings nor the employment of fixed length frames nor their division into a fixed number of blocks is critical to practicing aspects of the invention provided that information described herein as being sent on a per-frame basis is sent no less frequently than about every 40 milliseconds. Frames may be of arbitrary size and their size may vary dynamically. Variable block lengths may be employed as in the AC-3 system cited above. It is with that understanding that reference is made herein to "frames" and "blocks."
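The timings quoted above are consistent with a 512-sample block, the DFT length given earlier as an example. The following Python sketch works through the arithmetic; the constant and function names are illustrative, and the 512-sample block length is an assumption carried over from that example.

```python
SAMPLE_RATE_HZ = 48_000
BLOCK_LENGTH = 512          # assumed: matches the 512-point DFT example
OVERLAP = 0.5               # 50% overlap between adjacent blocks
BLOCKS_PER_FRAME = 6


def block_duration_ms(block_length=BLOCK_LENGTH, rate=SAMPLE_RATE_HZ):
    """Duration of one block in milliseconds."""
    return 1000.0 * block_length / rate


def block_interval_ms(block_length=BLOCK_LENGTH, rate=SAMPLE_RATE_HZ, overlap=OVERLAP):
    """Spacing between the starts of successive overlapping blocks."""
    return block_duration_ms(block_length, rate) * (1.0 - overlap)


def frame_duration_ms(blocks=BLOCKS_PER_FRAME):
    """Frame length when a frame spans a fixed number of block intervals."""
    return blocks * block_interval_ms()
```

Under these assumptions a block lasts about 10.67 ms, blocks start about 5.33 ms apart, and six block intervals span 32 ms, in agreement with the approximate figures in the text.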
In practice, if the composite mono or multichannel signal(s), or the composite mono or multichannel signal(s) and discrete low-frequency channels, are encoded, as for example by a perceptual coder, as described below, it is convenient to employ the same frame and block configuration as employed in the perceptual coder. Moreover, if the coder employs variable block lengths such that there is, from time to time, a switching from one block length to another, it would be desirable if one or more of the sidechain information as described herein is updated when such a block switch occurs. In order to minimize the increase in data overhead upon the updating of sidechain information upon the occurrence of such a switch, the frequency resolution of the updated sidechain information may be reduced.
FIG. 3 shows an example of a simplified conceptual organization of bins and subbands along a (vertical) frequency axis and blocks and a frame along a (horizontal) time axis. When bins are divided into subbands that approximate critical bands, the lowest frequency subbands have the fewest bins (e.g., one) and the number of bins per subband increases with increasing frequency.
Returning to FIG. 1, a frequency-domain version of each of the n time-domain input channels, produced by each channel's respective Filterbank (Filterbanks 2 and 4 in this example) are summed together ("downmixed") to a monophonic ("mono") composite audio signal by an additive combining function or device "Additive Combiner" 6.
The downmixing may be applied to the entire frequency bandwidth of the input audio signals or, optionally, it may be limited to frequencies above a given "coupling" frequency, inasmuch as artifacts of the downmixing process may become more audible at middle to low frequencies. In such cases, the channels may be conveyed discretely below the coupling frequency. This strategy may be desirable even if processing artifacts are not an issue, in that mid/low frequency subbands constructed by grouping transform bins into critical-band-like subbands (size roughly proportional to frequency) tend to have a small number of transform bins at low frequencies (one bin at very low frequencies) and may be directly coded with as few or fewer bits than is required to send a downmixed mono audio signal with sidechain information. A coupling or transition frequency as low as 4 kHz, 2300 Hz, 1000 Hz, or even the bottom of the frequency band of the audio signals applied to the encoder, may be acceptable for some applications, particularly those in which a very low bitrate is important. Other frequencies may provide a useful balance between bit savings and listener acceptance. The choice of a particular coupling frequency is not critical to the invention. The coupling frequency may be variable and, if variable, it may depend, for example, directly or indirectly on input signal characteristics.
Before downmixing, it is an aspect of the present invention to improve the channels' phase angle alignments vis-a-vis each other, in order to reduce the cancellation of out-of-phase signal components when the channels are combined and to provide an improved mono composite channel. This may be accomplished by controllably shifting over time the "absolute angle" of some or all of the transform bins in ones of the channels. For example, all of the transform bins representing audio above a coupling frequency, thus defining a frequency band of interest, may be controllably shifted over time, as necessary, in every channel or, when one channel is used as a reference, in all but the reference channel.
The "absolute angle" of a bin may be taken as the angle of the magnitude-and-angle representation of each complex valued transform bin produced by a filterbank. Controllable shifting of the absolute angles of bins in a channel is performed by an angle rotation function or device ("Rotate Angle"). Rotate Angle 8 processes the output of Filterbank 2 prior to its application to the downmix summation provided by Additive Combiner 6, while Rotate Angle 10 processes the output of Filterbank 4 prior to its application to the Additive Combiner 6. It will be appreciated that, under some signal conditions, no angle rotation may be required for a particular transform bin over a time period (the time period of a frame, in examples described herein). Below the coupling frequency, the channel information may be encoded discretely (not shown in FIG. 1).
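The effect of rotating bin angles before the additive downmix can be illustrated with a small Python sketch. It is only a toy example: the function names are hypothetical, and the rotation amount is chosen by hand here rather than derived by the Audio Analyzers.

```python
import cmath


def absolute_angle(bin_value):
    """Angle of the magnitude-and-angle representation of a complex bin."""
    return cmath.phase(bin_value)


def rotate(bin_value, angle):
    """Shift the absolute angle of a bin by `angle` radians, keeping its magnitude."""
    return bin_value * cmath.exp(1j * angle)


def downmix(bins):
    """Additive combination of the corresponding bin from each channel."""
    return sum(bins, 0j)
```

Two equal-magnitude bins that are 180 degrees out of phase (1 and -1) cancel completely when summed directly, whereas rotating the second bin by pi before summation yields a composite bin of magnitude 2.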
In principle, an improvement in the channels' phase angle alignments with respect to each other may be accomplished by shifting the phase of every transform bin or subband by the negative of its absolute phase angle, in each block throughout the frequency band of interest. Although this substantially avoids cancellation of out-of-phase signal components, it tends to cause artifacts that may be audible, particularly if the resulting mono composite signal is listened to in isolation. Thus, it is desirable to employ the principle of "least treatment" by shifting the absolute angles of bins in a channel only as much as necessary to minimize out-of-phase cancellation in the downmix process and minimize spatial image collapse of the multichannel signals reconstituted by the decoder. Techniques for determining such angle shifts are described below. Such techniques include time and frequency smoothing and the manner in which the signal processing responds to the presence of a transient.
Energy normalization may also be performed on a per-bin basis in the encoder to reduce further any remaining out-of-phase cancellation of isolated bins, as described further below. Also as described further below, energy normalization may also be performed on a per-subband basis (in the decoder) to assure that the energy of the mono composite signal equals the sums of the energies of the contributing channels.
Each input channel has an audio analyzer function or device ("Audio Analyzer") associated with it for generating the sidechain information for that channel and for controlling the amount or degree of angle rotation applied to the channel before it is applied to the downmix summation 6. The Filterbank outputs of channels 1 and n are applied to Audio Analyzer 12 and to Audio Analyzer 14, respectively. Audio Analyzer 12 generates the sidechain information for channel 1 and the amount of phase angle rotation for channel 1. Audio Analyzer 14 generates the sidechain information for channel n and the amount of angle rotation for channel n. It will be understood that such references herein to "angle" refer to phase angle.
The sidechain information for each channel generated by an audio analyzer for each channel may include:
an Amplitude Scale Factor ("Amplitude SF"),
an Angle Control Parameter,
a Decorrelation Scale Factor ("Decorrelation SF"),
a Transient Flag, and
optionally, an Interpolation Flag.
Such sidechain information may be characterized as "spatial parameters," indicative of spatial properties of the channels and/or indicative of signal characteristics that may be relevant to spatial processing, such as transients. In each case, the sidechain information applies to a single subband (except for the Transient Flag and the Interpolation Flag, each of which apply to all subbands within a channel) and may be updated once per frame, as in the examples described below, or upon the occurrence of a block switch in a related coder. Further details of the various spatial parameters are set forth below. The angle rotation for a particular channel in the encoder may be taken as the polarity-reversed Angle Control Parameter that forms part of the sidechain information.
If a reference channel is employed, that channel may not require an Audio Analyzer or, alternatively, may require an Audio Analyzer that generates only Amplitude Scale Factor sidechain information. It is not necessary to send an Amplitude Scale Factor if that scale factor can be deduced with sufficient accuracy by a decoder from the Amplitude Scale Factors of the other, non-reference, channels. It is possible to deduce in the decoder the approximate value of the reference channel's Amplitude Scale Factor if the energy normalization in the encoder assures that the squares of the scale factors across channels within any subband substantially sum to 1, as described below. The deduced approximate reference channel Amplitude Scale Factor value may have errors as a result of the relatively coarse quantization of amplitude scale factors, resulting in image shifts in the reproduced multi-channel audio. However, in a low data rate environment such artifacts may be more acceptable than using the bits to send the reference channel's Amplitude Scale Factor. Nevertheless, in some cases it may be desirable to employ an audio analyzer for the reference channel that generates, at least, Amplitude Scale Factor sidechain information.
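The deduction of the reference channel's Amplitude Scale Factor may be illustrated with a minimal sketch. It assumes linear (not dB-quantized) scale factors and assumes that the squared scale factors of all channels in a subband sum to about 1; the function name and the clamping of a negative residual are illustrative choices, not taken from the specification.

```python
import math

def deduce_reference_scale_factor(other_scale_factors):
    """Estimate the reference channel's Amplitude Scale Factor for one subband.

    Assumes the encoder's energy normalization makes the squared scale
    factors of all channels in the subband sum to about 1.
    """
    residual = 1.0 - sum(sf * sf for sf in other_scale_factors)
    # Coarse quantization can push the residual slightly negative; clamp it.
    return math.sqrt(max(residual, 0.0))

# Two non-reference channels with linear scale factors 0.6 and 0.48.
reference_sf = deduce_reference_scale_factor([0.6, 0.48])
```

Because the received scale factors are coarsely quantized, the value obtained this way is only approximate, which is the source of the image shifts mentioned above.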
FIG. 1 shows in a dashed line an optional input to each Audio Analyzer from the PCM time domain input to the audio analyzer in the channel. This input may be used by the Audio Analyzer to detect a transient over a time period (the period of a block or frame, in the examples described herein) and to generate a transient indicator (e.g., a one-bit "Transient Flag") in response to a transient. Alternatively, as described below in the comments to Step 40g of FIG. 4, a transient may be detected in the frequency domain, in which case the Audio Analyzer need not receive a time-domain input.
The mono composite audio signal and the sidechain information for all the channels (or all the channels except the reference channel) may be stored, transmitted, or stored and transmitted to a decoding process or device ("Decoder"). Preliminary to the storage, transmission, or storage and transmission, the various audio signals and various sidechain information may be multiplexed and packed into one or more bitstreams suitable for the storage, transmission or storage and transmission medium or media. The mono composite audio may be applied to a data-rate reducing encoding process or device such as, for example, a perceptual encoder or to a perceptual encoder and an entropy coder (e.g., arithmetic or Huffman coder) (sometimes referred to as a "lossless" coder) prior to storage, transmission, or storage and transmission. Also, as mentioned above, the mono composite audio and related sidechain information may be derived from multiple input channels only for audio frequencies above a certain frequency (a "coupling" frequency). In that case, the audio frequencies below the coupling frequency in each of the multiple input channels may be stored, transmitted or stored and transmitted as discrete channels or may be combined or processed in some manner other than as described herein. Such discrete or otherwise-combined channels may also be applied to a data reducing encoding process or device such as, for example, a perceptual encoder or a perceptual encoder and an entropy encoder. The mono composite audio and the discrete multichannel audio may all be applied to an integrated perceptual encoding or perceptual and entropy encoding process or device.
The particular manner in which sidechain information is carried in the encoder bitstream is not critical to the invention. If desired, the sidechain information may be carried in such a way that the bitstream is compatible with legacy decoders (i.e., the bitstream is backwards-compatible). Many suitable techniques for doing so are known. For example, many encoders generate a bitstream having unused or null bits that are ignored by the decoder. An example of such an arrangement is set forth in United States Patent 6,807,528 B1 of Truman et al, entitled "Adding Data to a Compressed Data Frame," October 19, 2004. Such bits may be replaced with the sidechain information. Another example is that the sidechain information may be steganographically encoded in the encoder's bitstream. Alternatively, the sidechain information may be stored or transmitted separately from the backwards-compatible bitstream by any technique that permits the transmission or storage of such information along with a mono/stereo bitstream compatible with legacy decoders.
Basic 1:N and 1:M Decoder
Referring to FIG. 2, a decoder function or device ("Decoder") embodying aspects of the present invention is shown. The figure is an example of a function or structure that performs as a basic decoder embodying aspects of the invention. Other functional or structural arrangements that practice aspects of the invention may be employed, including alternative and/or equivalent functional or structural arrangements described below.
The Decoder receives the mono composite audio signal and the sidechain information for all the channels or all the channels except the reference channel. If necessary, the composite audio signal and related sidechain information are demultiplexed, unpacked and/or decoded. Decoding may employ a table lookup. The goal is to derive from the mono composite audio channels a plurality of individual audio channels approximating respective ones of the audio channels applied to the Encoder of FIG. 1, subject to bitrate-reducing techniques of the present invention that are described herein.
Of course, one may choose not to recover all of the channels applied to the encoder or to use only the monophonic composite signal. Alternatively, channels in addition to the ones applied to the Encoder may be derived from the output of a Decoder according to aspects of the present invention by employing aspects of the inventions described in International Application PCT/US 02/03619, filed February 7, 2002, published August 15, 2002, designating the United States, and its resulting U.S. national application S.N. 10/467,213, filed August 5, 2003, and in International Application PCT/US03/24570, filed August 6, 2003, published March 4, 2004 as WO 2004/019656, designating the United States, and its resulting U.S. national application S.N. 10/522,515, filed January 27, 2005.
Channels recovered by a Decoder practicing aspects of the present invention are particularly useful in connection with the channel multiplication techniques of the cited applications in that the recovered channels not only have useful interchannel amplitude relationships but also have useful interchannel phase relationships. Another alternative for channel multiplication is to employ a matrix decoder to derive additional channels. The interchannel amplitude- and phase-preservation aspects of the present invention make the output channels of a decoder embodying aspects of the present invention particularly suitable for application to an amplitude- and phase-sensitive matrix decoder. Many such matrix decoders employ wideband control circuits that operate properly only when the signals applied to them are stereo throughout the signals' bandwidth. Thus, if the aspects of the present invention are embodied in an N:1:N system in which N is 2, the two channels recovered by the decoder may be applied to a 2:M active matrix decoder. Such channels may have been discrete channels below a coupling frequency, as mentioned above.
Many suitable active matrix decoders are well known in the art, including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II" decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation). Aspects of Pro Logic decoders are disclosed in U.S. Patents 4,799,260 and 4,941,177. Aspects of Pro Logic II decoders are disclosed in pending U.S. Patent Application S.N. 09/532,711 of Fosgate, entitled "Method for Deriving at Least Three Audio Signals from Two Input Audio Signals," filed March 22, 2000 and published as WO 01/41504 on June 7, 2001, and in pending U.S. Patent Application S.N. 10/362,786 of Fosgate et al, entitled "Method for Apparatus for Audio Matrix Decoding," filed February 25, 2003 and published as US 2004/0125960 A1 on July 1, 2004. Some aspects of the operation of Dolby Pro Logic and Pro Logic II decoders are explained, for example, in papers available on the Dolby Laboratories' website (www.dolby.com): "Dolby Surround Pro Logic Decoder Principles of Operation," by Roger Dressler, and "Mixing with Dolby Pro Logic II Technology," by Jim Hilson. Other suitable active matrix decoders may include those described in one or more of the following U.S. Patents and published International Applications (each designating the United States): 5,046,098; 5,274,740; 5,400,433; 5,625,696; 5,644,640; 5,504,819; 5,428,687; 5,172,415; and WO 02/19768.
Referring again to FIG. 2, the received mono composite audio channel is applied to a plurality of signal paths from which a respective one of each of the recovered multiple audio channels is derived. Each channel-deriving path includes, in either order, an amplitude adjusting function or device ("Adjust Amplitude") and an angle rotation function or device ("Rotate Angle").
The Adjust Amplitudes apply gains or losses to the mono composite signal so that, under certain signal conditions, the relative output magnitudes (or energies) of the output channels derived from it are similar to those of the channels at the input of the encoder. Alternatively, under certain signal conditions when "randomized" angle variations are imposed, as next described, a controllable amount of "randomized" amplitude variations may also be imposed on the amplitude of a recovered channel in order to improve its decorrelation with respect to other ones of the recovered channels.
The Rotate Angles apply phase rotations so that, under certain signal conditions, the relative phase angles of the output channels derived from the mono composite signal are similar to those of the channels at the input of the encoder. Preferably, under certain signal conditions, a controllable amount of "randomized" angle variations is also imposed on the angle of a recovered channel in order to improve its decorrelation with respect to other ones of the recovered channels.
As discussed further below, "randomized" angle and amplitude variations may include not only pseudo-random and truly random variations, but also deterministically-generated variations that have the effect of reducing cross-correlation between channels. This is discussed further below in the Comments to Step 505 of FIG. 5A.
Conceptually, the Adjust Amplitude and Rotate Angle for a particular channel scale the mono composite audio DFT coefficients to yield reconstructed transform bin values for the channel.
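As a minimal sketch of that scaling, the Adjust Amplitude and Rotate Angle for one bin may be viewed as a single complex multiplication. The function name and the use of a linear scale factor with an angle in radians are assumptions made for illustration only.

```python
import cmath

def reconstruct_bin(mono_bin, amplitude_sf, angle_radians):
    """Scale and rotate one complex DFT coefficient of the mono composite."""
    return mono_bin * amplitude_sf * cmath.exp(1j * angle_radians)

# A unit-magnitude mono bin, attenuated by half and rotated by 90 degrees.
mono_bin = complex(1.0, 0.0)
channel_bin = reconstruct_bin(mono_bin, 0.5, cmath.pi / 2)
```

Since the two operations are both multiplications, their order does not affect the result, which is consistent with the statement that the Adjust Amplitude and Rotate Angle may appear in either order.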
The Adjust Amplitude for each channel may be controlled at least by the recovered sidechain Amplitude Scale Factor for the particular channel or, in the case of the reference channel, either from the recovered sidechain Amplitude Scale Factor for the reference channel or from an Amplitude Scale Factor deduced from the recovered sidechain Amplitude Scale Factors of the other, non-reference, channels. Alternatively, to enhance decorrelation of the recovered channels, the Adjust Amplitude may also be controlled by a Randomized Amplitude Scale Factor Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel.
The Rotate Angle for each channel may be controlled at least by the recovered sidechain Angle Control Parameter (in which case, the Rotate Angle in the decoder may substantially undo the angle rotation provided by the Rotate Angle in the encoder). To enhance decorrelation of the recovered channels, a Rotate Angle may also be controlled by a Randomized Angle Control Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel. The Randomized Angle Control Parameter for a channel, and, if employed, the Randomized Amplitude Scale Factor for a channel, may be derived from the recovered Decorrelation Scale Factor for the channel and the recovered Transient Flag for the channel by a controllable decorrelator function or device ("Controllable Decorrelator").
Referring to the example of FIG. 2, the recovered mono composite audio is applied to a first channel audio recovery path 22, which derives the channel 1 audio, and to a second channel audio recovery path 24, which derives the channel n audio. Audio path 22 includes an Adjust Amplitude 26, a Rotate Angle 28, and, if a PCM output is desired, an inverse filterbank function or device ("Inverse Filterbank") 30. Similarly, audio path 24 includes an Adjust Amplitude 32, a Rotate Angle 34, and, if a PCM output is desired, an inverse filterbank function or device ("Inverse Filterbank") 36. As with the case of FIG. 1, only two channels are shown for simplicity in presentation, it being understood that there may be more than two channels.
The recovered sidechain information for the first channel, channel 1, may include an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag, as stated above in connection with the description of a basic Encoder. The Amplitude Scale Factor is applied to Adjust Amplitude 26. If the optional Interpolation Flag is employed, an optional frequency interpolator or interpolator function ("Interpolator") 27 may be employed in order to interpolate the Angle Control Parameter across frequency (e.g., across the bins in each subband of a channel). Such interpolation may be, for example, a linear interpolation of the bin angles between the centers of each subband. The state of the one-bit Interpolation Flag selects whether or not interpolation across frequency is employed, as is explained further below. The Transient Flag and Decorrelation Scale Factor are applied to a Controllable Decorrelator 38 that generates a Randomized Angle Control Parameter in response thereto. The state of the one-bit Transient Flag selects one of two modes of randomized angle decorrelation, as is explained further below. The Angle Control Parameter, which may be interpolated across frequency if the Interpolation Flag and the Interpolator are employed, and the Randomized Angle Control Parameter are summed together by an additive combiner or combining function 40 in order to provide a control signal for Rotate Angle 28. Alternatively, the Controllable Decorrelator 38 may also generate a Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation Scale Factor, in addition to generating a Randomized Angle Control Parameter. The Amplitude Scale Factor may be summed together with such a Randomized Amplitude Scale Factor by an additive combiner or combining function (not shown) in order to provide the control signal for the Adjust Amplitude 26.
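The frequency interpolation and the additive combiner just described may be sketched as follows. This is an illustration under stated assumptions rather than the specified implementation: subbands are given as bin-index boundaries, bins outside the first and last subband centers simply hold the nearest subband angle, and phase wrap-around is ignored. All function and parameter names are hypothetical.

```python
def interpolate_bin_angles(subband_edges, subband_angles):
    """Linearly interpolate per-bin angles between subband centers.

    subband_edges: bin index boundaries, e.g. [0, 4, 8] for two 4-bin subbands.
    subband_angles: one Angle Control Parameter (radians) per subband.
    Edge handling (holding the nearest angle) is an assumption.
    """
    centers = [(subband_edges[i] + subband_edges[i + 1] - 1) / 2.0
               for i in range(len(subband_angles))]
    angles = []
    for b in range(subband_edges[0], subband_edges[-1]):
        if b <= centers[0]:
            angles.append(subband_angles[0])
        elif b >= centers[-1]:
            angles.append(subband_angles[-1])
        else:
            # Find the pair of subband centers that bracket this bin.
            k = max(i for i, c in enumerate(centers) if c <= b)
            t = (b - centers[k]) / (centers[k + 1] - centers[k])
            angles.append((1 - t) * subband_angles[k] + t * subband_angles[k + 1])
    return angles

def combine_angle_controls(angle_control, randomized_angle_control):
    """Additive combiner: per-bin sum that drives the Rotate Angle."""
    return [a + r for a, r in zip(angle_control, randomized_angle_control)]

bin_angles = interpolate_bin_angles([0, 4, 8], [0.0, 1.0])
control = combine_angle_controls(bin_angles, [0.1] * 8)
```

When the Interpolation Flag is not set, the same subband angle would instead be repeated for every bin in the subband before the summation.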
Similarly, recovered sidechain information for the second channel, channel n, may also include an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag, as described above in connection with the description of a basic encoder. The Amplitude Scale Factor is applied to Adjust Amplitude 32. A frequency interpolator or interpolator function ("Interpolator") 33 may be employed in order to interpolate the Angle Control Parameter across frequency. As with channel 1, the state of the one-bit Interpolation Flag selects whether or not interpolation across frequency is employed. The Transient Flag and Decorrelation Scale Factor are applied to a Controllable Decorrelator 42 that generates a Randomized Angle Control Parameter in response thereto. As with channel 1, the state of the one-bit Transient Flag selects one of two modes of randomized angle decorrelation, as is explained further below. The Angle Control Parameter and the Randomized Angle Control Parameter are summed together by an additive combiner or combining function 44 in order to provide a control signal for Rotate Angle 34. Alternatively, as described above in connection with channel 1, the Controllable Decorrelator 42 may also generate a Randomized Amplitude Scale Factor in response to the Transient Flag and Decorrelation Scale Factor, in addition to generating a Randomized Angle Control Parameter. The Amplitude Scale Factor and Randomized Amplitude Scale Factor may be summed together by an additive combiner or combining function (not shown) in order to provide the control signal for the Adjust Amplitude 32.
Although a process or topology as just described is useful for understanding, essentially the same results may be obtained with alternative processes or topologies that achieve the same or similar results. For example, the order of Adjust Amplitude 26 (32) and Rotate Angle 28 (34) may be reversed and/or there may be more than one Rotate Angle: one that responds to the Angle Control Parameter and another that responds to the Randomized Angle Control Parameter. The Rotate Angle may also be considered to be three rather than one or two functions or devices, as in the example of FIG. 5 described below. If a Randomized Amplitude Scale Factor is employed, there may be more than one Adjust Amplitude: one that responds to the Amplitude Scale Factor and one that responds to the Randomized Amplitude Scale Factor. Because of the human ear's greater sensitivity to amplitude relative to phase, if a Randomized Amplitude Scale Factor is employed, it may be desirable to scale its effect relative to the effect of the Randomized Angle Control Parameter so that its effect on amplitude is less than the effect that the Randomized Angle Control Parameter has on phase angle. As another alternative process or topology, the Decorrelation Scale Factor may be used to control the ratio of randomized phase angle versus basic phase angle (rather than adding a parameter representing a randomized phase angle to a parameter representing the basic phase angle), and if also employed, the ratio of randomized amplitude shift versus basic amplitude shift (rather than adding a scale factor representing a randomized amplitude to a scale factor representing the basic amplitude) (i.e., a variable crossfade in each case).
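The variable crossfade alternative may be sketched as below. The linear blend and the assumption that the Decorrelation Scale Factor lies in the range 0 to 1 are illustrative choices; the text does not prescribe a particular crossfade law.

```python
def crossfade_angle(basic_angle, randomized_angle, decorrelation_sf):
    """Blend basic and randomized angles; decorrelation_sf assumed in [0, 1]."""
    return (1.0 - decorrelation_sf) * basic_angle + decorrelation_sf * randomized_angle

# A quarter of the way from the basic angle toward the randomized angle.
blended = crossfade_angle(0.4, 2.0, 0.25)
```

With a scale factor of zero the output is the basic angle alone, and with a scale factor of one it is the randomized angle alone, in contrast to the additive topology in which the basic angle is always fully present.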
If a reference channel is employed, as discussed above in connection with the basic encoder, the Rotate Angle, Controllable Decorrelator and Additive Combiner for that channel may be omitted inasmuch as the sidechain information for the reference channel may include only the Amplitude Scale Factor (or, alternatively, if the sidechain information does not contain an Amplitude Scale Factor for the reference channel, it may be deduced from Amplitude Scale Factors of the other channels when the energy normalization in the encoder assures that the scale factors across channels within a subband sum square to 1). An Amplitude Adjust is provided for the reference channel and it is controlled by a received or derived Amplitude Scale Factor for the reference channel. Whether the reference channel's Amplitude Scale Factor is derived from the sidechain or is deduced in the decoder, the recovered reference channel is an amplitude-scaled version of the mono composite channel. It does not require angle rotation because it is the reference for the other channels' rotations.
Although adjusting the relative amplitude of recovered channels may provide a modest degree of decorrelation, if used alone amplitude adjustment is likely to result in a reproduced soundfield substantially lacking in spatialization or imaging for many signal conditions (e.g., a "collapsed" soundfield). Amplitude adjustment may affect interaural level differences at the ear, which is only one of the psychoacoustic directional cues employed by the ear. Thus, according to aspects of the invention, certain angle-adjusting techniques may be employed, depending on signal conditions, to provide additional decorrelation. Reference may be made to Table 1 that provides abbreviated comments useful in understanding the multiple angle-adjusting decorrelation techniques or modes of operation that may be employed in accordance with aspects of the invention. Other decorrelation techniques as described below in connection with the examples of FIGS. 8 and 9 may be employed instead of or in addition to the techniques of Table 1.
In practice, applying angle rotations and magnitude alterations may result in circular convolution (also known as cyclic or periodic convolution). Although, generally, it is desirable to avoid circular convolution, undesirable audible artifacts resulting from circular convolution are somewhat reduced by complementary angle shifting in an encoder and decoder. In addition, the effects of circular convolution may be tolerated in low cost implementations of aspects of the present invention, particularly those in which the downmixing to mono or multiple channels occurs only in part of the audio frequency band, such as, for example, above 1500 Hz (in which case the audible effects of circular convolution are minimal). Alternatively, circular convolution may be avoided or minimized by any suitable technique, including, for example, an appropriate use of zero padding. One way to use zero padding is to transform the proposed frequency domain variation (representing angle rotations and amplitude scaling) to the time domain, window it (with an arbitrary window), pad it with zeros, then transform back to the frequency domain and multiply by the frequency domain version of the audio to be processed (the audio need not be windowed).
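The zero-padding approach may be illustrated with a small sketch. It uses a naive DFT so that it depends only on the standard library, a rectangular window chosen purely for brevity, and a transform size large enough to hold the full linear-convolution tail; these choices, and all names, are assumptions for illustration and not the specified implementation.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def apply_variation_without_wraparound(audio_block, variation_spectrum, keep):
    """Apply a frequency-domain variation as a linear, not circular, convolution.

    audio_block: time-domain samples of the audio to be processed.
    variation_spectrum: desired per-bin complex gains (angle and amplitude).
    keep: number of impulse-response samples retained by the window.
    """
    impulse = dft(variation_spectrum, inverse=True)[:keep]  # to time domain, window
    size = len(audio_block) + keep - 1                      # room for the full tail
    padded_impulse = impulse + [0.0] * (size - keep)        # pad with zeros
    padded_audio = list(audio_block) + [0.0] * (size - len(audio_block))
    # Back to the frequency domain and multiply bin by bin.
    product = [a * h for a, h in zip(dft(padded_audio), dft(padded_impulse))]
    return dft(product, inverse=True)

# Variation whose impulse response is a two-tap average.
variation = dft([0.5, 0.5, 0.0, 0.0])
result = apply_variation_without_wraparound([1.0, 2.0, 3.0, 4.0], variation, 2)
```

In this example the output has five samples and its tail is not folded back onto the start of the block, which is the wrap-around that circular convolution would otherwise produce.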
Table 1: Angle-Adjusting Decorrelation Techniques

Type of Signal (typical example)
    Technique 1: Spectrally static source
    Technique 2: Complex continuous signals
    Technique 3: Complex impulsive signals (transients)
Effect on Decorrelation
    Technique 1: Decorrelates low frequency and steady-state signal components
    Technique 2: Decorrelates non-impulsive complex signal components
    Technique 3: Decorrelates impulsive high frequency signal components
Effect of transient present in frame
    Technique 1: Operates with shortened time constant
    Technique 2: Does not operate
    Technique 3: Operates
What is done
    Technique 1: Slowly shifts (frame-by-frame) bin angle in a channel
    Technique 2: Adds to the angle of Technique 1 a time-invariant randomized angle on a bin-by-bin basis in a channel
    Technique 3: Adds to the angle of Technique 1 a rapidly-changing (block by block) randomized angle on a subband-by-subband basis in a channel
Controlled by or Scaled by
    Technique 1: Basic phase angle is controlled by Angle Control Parameter
    Technique 2: Amount of randomized angle is scaled directly by Decorrelation SF; same scaling across subband, scaling updated every frame
    Technique 3: Amount of randomized angle is scaled indirectly by Decorrelation SF; same scaling across subband, scaling updated every frame
Frequency Resolution of angle shift
    Technique 1: Subband (same or interpolated shift value applied to all bins in each subband)
    Technique 2: Bin (different randomized shift value applied to each bin)
    Technique 3: Subband (same randomized shift value applied to all bins in each subband; different randomized shift value applied to each subband in channel)
Time Resolution
    Technique 1: Frame (shift values updated every frame)
    Technique 2: Randomized shift values remain the same and do not change
    Technique 3: Block (randomized shift values updated every block)
For signals that are substantially static spectrally, such as, for example, a pitch pipe note, a first technique ("Technique 1") restores the angle of the received mono composite signal relative to the angle of each of the other recovered channels to an angle similar (subject to frequency and time granularity and to quantization) to the original angle of the channel relative to the other channels at the input of the encoder. Phase angle differences are useful, particularly, for providing decorrelation of low-frequency signal components below about 1500 Hz where the ear follows individual cycles of the audio signal. Preferably, Technique 1 operates under all signal conditions to provide a basic angle shift.
For high-frequency signal components above about 1500 Hz, the ear does not follow individual cycles of sound but instead responds to waveform envelopes (on a critical band basis). Hence, above about 1500 Hz decorrelation is better provided by differences in signal envelopes rather than phase angle differences. Applying phase angle shifts only in accordance with Technique 1 does not alter the envelopes of signals sufficiently to decorrelate high frequency signals. The second and third techniques ("Technique 2" and "Technique 3", respectively) add a controllable amount of randomized angle variations to the angle determined by Technique 1 under certain signal conditions, thereby causing a controllable amount of randomized envelope variations, which enhances decorrelation.
Randomized changes in phase angle are a desirable way to cause randomized changes in the envelopes of signals. A particular envelope results from the interaction of a particular combination of amplitudes and phases of spectral components within a subband. Although changing the amplitudes of spectral components within a subband changes the envelope, large amplitude changes are required to obtain a significant change in the envelope, which is undesirable because the human ear is sensitive to variations in spectral amplitude. In contrast, changing the spectral components' phase angles has a greater effect on the envelope than changing the spectral components' amplitudes: spectral components no longer line up the same way, so the reinforcements and subtractions that define the envelope occur at different times, thereby changing the envelope. Although the human ear has some envelope sensitivity, the ear is relatively phase deaf, so the overall sound quality remains substantially similar. Nevertheless, for some signal conditions, some randomization of the amplitudes of spectral components along with randomization of the phases of spectral components may provide an enhanced randomization of signal envelopes provided that such amplitude randomization does not cause undesirable audible artifacts.
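The effect of phase on the envelope may be seen in a small numerical sketch. Four equal-amplitude spectral components are summed twice, once with aligned phases and once with arbitrary phase offsets; the amplitudes are identical in both cases, yet the peak of the waveform differs. The particular phases, block length and function name are illustrative assumptions.

```python
import math

def synthesize(amplitudes, phases, n_samples):
    """Sum of cosines at bins 1..K of an n_samples-long block."""
    return [sum(a * math.cos(2 * math.pi * (k + 1) * t / n_samples + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases)))
            for t in range(n_samples)]

amplitudes = [1.0, 1.0, 1.0, 1.0]
aligned = synthesize(amplitudes, [0.0, 0.0, 0.0, 0.0], 64)
shifted = synthesize(amplitudes, [0.0, 1.3, 2.9, 0.7], 64)

# With aligned phases every component reinforces at t = 0.
peak_aligned = max(abs(v) for v in aligned)
# With shifted phases the reinforcements fall at different times.
peak_shifted = max(abs(v) for v in shifted)
```

The aligned case reaches the full sum of the amplitudes at the first sample, whereas the shifted case never does, even though no spectral amplitude was altered.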
Preferably, a controllable amount or degree of Technique 2 or Technique 3 operates along with Technique 1 under certain signal conditions. The Transient Flag selects Technique 2 (no transient present in the frame or block, depending on whether the Transient Flag is sent at the frame or block rate) or Technique 3 (transient present in the frame or block). Thus, there are multiple modes of operation, depending on whether or not a transient is present. Alternatively, in addition, under certain signal conditions, a controllable amount or degree of amplitude randomization also operates along with the amplitude scaling that seeks to restore the original channel amplitude.
Technique 2 is suitable for complex continuous signals that are rich in harmonics, such as massed orchestral violins. Technique 3 is suitable for complex impulsive or transient signals, such as applause, castanets, etc. (Technique 2 time smears claps in applause, making it unsuitable for such signals). As explained further below, in order to minimize audible artifacts, Technique 2 and Technique 3 have different time and frequency resolutions for applying randomized angle variations: Technique 2 is selected when a transient is not present, whereas Technique 3 is selected when a transient is present.
Technique 1 slowly shifts (frame by frame) the bin angle in a channel. The amount or degree of this basic shift is controlled by the Angle Control Parameter (no shift if the parameter is zero). As explained further below, either the same or an interpolated parameter is applied to all bins in each subband and the parameter is updated every frame. Consequently, each subband of each channel may have a phase shift with respect to other channels, providing a degree of decorrelation at low frequencies (below about 1500 Hz). However, Technique 1, by itself, is unsuitable for a transient signal such as applause. For such signal conditions, the reproduced channels may exhibit an annoying unstable comb-filter effect. In the case of applause, essentially no decorrelation is provided by adjusting only the relative amplitude of recovered channels because all channels tend to have the same amplitude over the period of a frame.
Technique 2 operates when a transient is not present. Technique 2 adds to the angle shift of Technique 1 a randomized angle shift that does not change with time, on a bin-by-bin basis (each bin has a different randomized shift) in a channel, causing the envelopes of the channels to be different from one another, thus providing decorrelation of complex signals among the channels. Maintaining the randomized phase angle values constant over time avoids block or frame artifacts that may result from block-to-block or frame-to-frame alteration of bin phase angles. While this technique is a very useful decorrelation tool when a transient is not present, it may temporally smear a transient (resulting in what is often referred to as "pre-noise"; the post-transient smearing is masked by the transient). The amount or degree of additional shift provided by Technique 2 is scaled directly by the Decorrelation Scale Factor (there is no additional shift if the scale factor is zero). Ideally, the amount of randomized phase angle added to the base angle shift (of Technique 1) according to Technique 2 is controlled by the Decorrelation Scale Factor in a manner that minimizes audible signal warbling artifacts. Such minimization of signal warbling artifacts results from the manner in which the Decorrelation Scale Factor is derived and the application of appropriate time smoothing, as described below. Although a different additional randomized angle shift value is applied to each bin and that shift value does not change, the same scaling is applied across a subband and the scaling is updated every frame.
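A minimal sketch of Technique 2 follows. It assumes that "scaled directly" may be modeled as a plain multiplication of each bin's fixed random offset by the subband's Decorrelation Scale Factor, and that the offsets are drawn once from a uniform distribution over plus or minus pi; the function names, the seed and the distribution are hypothetical.

```python
import math
import random

def make_static_bin_offsets(n_bins, seed=0):
    """One fixed pseudo-random angle per bin, generated once and then reused."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(n_bins)]

def technique2_angles(basic_angles, static_offsets, subband_edges, decorrelation_sfs):
    """Add a scaled, time-invariant random offset to each bin's basic angle.

    The offsets differ per bin and never change; only the per-subband
    scaling (one Decorrelation SF per subband) is updated each frame.
    """
    out = list(basic_angles)
    for sb, sf in enumerate(decorrelation_sfs):
        for b in range(subband_edges[sb], subband_edges[sb + 1]):
            out[b] = basic_angles[b] + sf * static_offsets[b]
    return out

offsets = make_static_bin_offsets(8)
basic = [0.2] * 8
# First subband: scale factor 0 (no added shift); second subband: 0.5.
frame_angles = technique2_angles(basic, offsets, [0, 4, 8], [0.0, 0.5])
```

Because the offsets are fixed, consecutive frames differ only through the basic angles and the scaling, which is what avoids block-to-block alteration of the bin phase angles.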
Technique 3 operates in the presence of a transient in the frame or block, depending on the rate at which the Transient Flag is sent. It shifts all the bins in each subband in a channel from block to block with a unique randomized angle value, common to all bins in the subband, causing not only the envelopes, but also the amplitudes and phases, of the signals in a channel to change with respect to other channels from block to block. These changes in time and frequency resolution of the angle randomizing reduce steady-state signal similarities among the channels and provide decorrelation of the channels substantially without causing "pre-noise" artifacts. The change in frequency resolution of the angle randomizing, from very fine (all bins different in a channel) in Technique 2 to coarse (all bins within a subband the same, but each subband different) in Technique 3, is particularly useful in minimizing "pre-noise" artifacts. Although the ear does not respond to pure angle changes directly at high frequencies, when two or more channels mix acoustically on their way from loudspeakers to a listener, phase differences may cause amplitude changes (comb-filter effects) that may be audible and objectionable, and these are broken up by Technique 3. The impulsive characteristics of the signal minimize block-rate artifacts that might otherwise occur. Thus, Technique 3 adds to the phase shift of Technique 1 a rapidly changing (block-by-block) randomized angle shift on a subband-by-subband basis in a channel. The amount or degree of additional shift is scaled indirectly, as described below, by the Decorrelation Scale Factor (there is no additional shift if the scale factor is zero). The same scaling is applied across a subband and the scaling is updated every frame.
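The contrast between Technique 2 and Technique 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the invention: the function name `randomized_angle_shift`, the subband layout, the uniform distribution over -π to +π, and the direct multiplication by the scale factor in both modes are hypothetical (the text states that Technique 3 is scaled indirectly, in a manner described elsewhere).

```python
import math
import random

def randomized_angle_shift(subband_edges, scale, transient, fixed_table, block_rng):
    """Return one additional angle shift (radians) per bin for one block.

    subband_edges: list of (start, stop) bin index pairs, one per subband (hypothetical layout)
    scale:         per-subband Decorrelation Scale Factor in [0, 1]
    transient:     Transient Flag for the current frame or block
    fixed_table:   per-bin random angles that never change over time (Technique 2)
    block_rng:     random.Random used to draw new values every block (Technique 3)
    """
    num_bins = subband_edges[-1][1]
    shift = [0.0] * num_bins
    for (start, stop), s in zip(subband_edges, scale):
        if transient:
            # Technique 3: one new random angle per subband per block, shared by all its bins
            angle = block_rng.uniform(-math.pi, math.pi)
            for k in range(start, stop):
                shift[k] = s * angle
        else:
            # Technique 2: a different angle for every bin, constant over time
            for k in range(start, stop):
                shift[k] = s * fixed_table[k]
    return shift
```

Calling the function twice with `transient=False` returns the same shifts, while each call with `transient=True` draws a new value per subband; in both modes a scale factor of zero adds no shift.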
Although the angle-adjusting techniques have been characterized as three techniques, this is a matter of semantics and they may also be characterized as two techniques: (1) a combination of Technique 1 and a variable degree of Technique 2, which may be zero, and (2) a combination of Technique 1 and a variable degree of Technique 3, which may be zero. For convenience in presentation, the techniques are treated as being three techniques.
Aspects of the multiple mode decorrelation techniques and modifications of them may be employed in providing decorrelation of audio signals derived, as by upmixing, from one or more audio channels even when such audio channels are not derived from an encoder according to aspects of the present invention. Such arrangements, when applied to a mono audio channel, are sometimes referred to as "pseudo-stereo" devices and functions. Any suitable device or function (an "upmixer") may be employed to derive multiple signals from a mono audio channel or from multiple audio channels. Once such multiple audio channels are derived by an upmixer, one or more of them may be decorrelated with respect to one or more of the other derived audio signals by applying the multiple mode decorrelation techniques described herein. In such an application, each derived audio channel to which the decorrelation techniques are applied may be switched from one mode of operation to another by detecting transients in the derived audio channel itself. Alternatively, the operation of the transient-present technique (Technique 3) may be simplified to provide no shifting of the phase angles of spectral components when a transient is present.

Sidechain information
As mentioned above, the sidechain information may include: an Amplitude Scale Factor, an Angle Control Parameter, a Decorrelation Scale Factor, a Transient Flag, and, optionally, an Interpolation Flag. Such sidechain information for a practical embodiment of aspects of the present invention may be summarized in the following Table 2. Typically, the sidechain information may be updated once per frame.

Table 2
Sidechain Information Characteristics for a Channel

Sidechain Information: Subband Angle Control Parameter
Value Range: 0 to +2π
Represents (is "a measure of"): Smoothed time average in each subband of the difference between the angle of each bin in a subband for a channel and that of the corresponding bin in the subband of a reference channel
Quantization Levels: 6 bit (64 levels)
Primary Purpose: Provides basic angle rotation for each bin in a channel

Sidechain Information: Subband Decorrelation Scale Factor
Value Range: 0 to 1. The Subband Decorrelation Scale Factor is high only if both the Spectral-Steadiness Factor and the Interchannel Angle Consistency Factor are low.
Represents (is "a measure of"): Spectral steadiness of signal characteristics over time in a subband of a channel (the Spectral-Steadiness Factor) and the consistency in the same subband of a channel of bin angles with respect to corresponding bins of a reference channel (the Interchannel Angle Consistency Factor)
Quantization Levels: 3 bit (8 levels)
Primary Purpose: Scales randomized angle shifts added to basic angle rotation, and, if employed, also scales randomized Amplitude Scale Factor added to basic Amplitude Scale Factor, and, optionally, scales degree of reverberation

Sidechain Information: Subband Amplitude Scale Factor
Value Range: 0 to 31 (whole integer); 0 is highest amplitude, 31 is lowest amplitude
Represents (is "a measure of"): Energy or amplitude in a subband of a channel with respect to energy or amplitude for the same subband across all channels
Quantization Levels: 5 bit (32 levels). Granularity is 1.5 dB, so the range is 31*1.5 = 46.5 dB plus final value = off.
Primary Purpose: Scales amplitude of bins in a subband in a channel

Sidechain Information: Transient Flag
Value Range: 1, 0 (True/False) (polarity is arbitrary)
Represents (is "a measure of"): Presence of a transient in the frame or in the block
Quantization Levels: 1 bit (2 levels)
Primary Purpose: Determines which technique for adding randomized angle shifts, or both angle shifts and amplitude shifts, is employed

Sidechain Information: Interpolation Flag
Value Range: 1, 0 (True/False) (polarity is arbitrary)
Represents (is "a measure of"): A spectral peak near a subband boundary or phase angles within a channel have a linear progression
Quantization Levels: 1 bit (2 levels)
Primary Purpose: Determines if the basic angle rotation is interpolated across frequency

In each case, the sidechain information of a channel applies to a single subband (except for the Transient Flag and the Interpolation Flag, each of which applies to all subbands in a channel) and may be updated once per frame. Although the time resolution (once per frame), frequency resolution (subband), value ranges and quantization levels indicated have been found to provide useful performance and a useful compromise between a low bitrate and performance, it will be appreciated that these time and frequency resolutions, value ranges and quantization levels are not critical and that other resolutions, ranges and levels may be employed in practicing aspects of the invention. For example, the Transient Flag and/or the Interpolation Flag, if employed, may be updated once per block with only a minimal increase in sidechain data overhead. In the case of the Transient Flag, doing so has the advantage that the switching from Technique 2 to Technique 3 and vice-versa is more accurate. In addition, as mentioned above, sidechain information may be updated upon the occurrence of a block switch of a related coder.
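To make the bit budget of Table 2 concrete, the following Python sketch maps one subband's quantized codes to working values. Table 2 gives only ranges and level counts, so several details here are assumptions: the quantizers are taken to be uniform, the 64 angle levels are spread over 0 to 2π without repeating the endpoint, the 1.5 dB steps are treated as amplitude decibels, and code 31 is read as "off"; the function and constant names are hypothetical.

```python
import math

# Bit widths and step size as listed in Table 2
ANGLE_BITS, DECORR_BITS, AMP_BITS = 6, 3, 5
AMP_STEP_DB = 1.5

def dequantize_subband(angle_code, decorr_code, amp_code):
    """Map the quantized sidechain codes of one subband to working values."""
    # Assumed uniform quantizer over [0, 2*pi)
    angle = 2.0 * math.pi * angle_code / (1 << ANGLE_BITS)
    # Assumed uniform quantizer over [0, 1], both ends included
    decorr = decorr_code / ((1 << DECORR_BITS) - 1)
    if amp_code == (1 << AMP_BITS) - 1:
        gain = 0.0  # assumed reading of "final value = off"
    else:
        # Code 0 is the highest amplitude; each step lowers it by 1.5 dB
        gain = 10.0 ** (-AMP_STEP_DB * amp_code / 20.0)
    return angle, decorr, gain

# Per-subband payload, excluding the per-channel flags
BITS_PER_SUBBAND = ANGLE_BITS + DECORR_BITS + AMP_BITS
```

Under these assumptions each subband of a channel carries 14 bits per update, with the Transient Flag and the optional Interpolation Flag adding one bit each per channel.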
It will be noted that Technique 2, described above (see also Table 1), provides a bin frequency resolution rather than a subband frequency resolution (i.e., a different pseudo-random phase angle shift is applied to each bin rather than to each subband) even though the same Subband Decorrelation Scale Factor applies to all bins in a subband. It will also be noted that Technique 3, described above (see also Table 1), provides a block frequency resolution (i.e., a different randomized phase angle shift is applied to each block rather than to each frame) even though the same Subband Decorrelation Scale Factor applies to all bins in a subband. Such resolutions, greater than the resolution of the sidechain information, are possible because the randomized phase angle shifts may be generated in a decoder and need not be known in the encoder (this is the case even if the encoder also applies a randomized phase angle shift to the encoded mono composite signal, an alternative that is described below). In other words, it is not necessary to send sidechain information having bin or block granularity even though the decorrelation techniques employ such granularity. The decoder may employ, for example, one or more lookup tables of randomized bin phase angles. The obtaining of time and/or frequency resolutions for decorrelation greater than the sidechain information rates is among the aspects of the present invention. Thus, decorrelation by way of randomized phases is performed either with a fine frequency resolution (bin-by-bin) that does not change with time (Technique 2), or with a coarse frequency resolution (band-by-band) (or a fine frequency resolution (bin-by-bin) when frequency interpolation is employed, as described further below) and a fine time resolution (block rate) (Technique 3).
It will also be appreciated that as increasing degrees of randomized phase shifts are added to the phase angle of a recovered channel, the absolute phase angle of the recovered channel differs more and more from the original absolute phase angle of that channel. An aspect of the present invention is the appreciation that the resulting absolute phase angle of the recovered channel need not match that of the original channel when signal conditions are such that the randomized phase shifts are added in accordance with aspects of the present invention. For example, in extreme cases when the Decorrelation Scale Factor causes the highest degree of randomized phase shift, the phase shift caused by Technique 2 or Technique 3 overwhelms the basic phase shift caused by Technique 1. Nevertheless, this is of no concern in that a randomized phase shift is audibly the same as the different random phases in the original signal that give rise to a Decorrelation Scale Factor that causes the addition of some degree of randomized phase shifts.

As mentioned above, randomized amplitude shifts may be employed in addition to randomized phase shifts. For example, the Adjust Amplitude may also be controlled by a Randomized Amplitude Scale Factor Parameter derived from the recovered sidechain Decorrelation Scale Factor for a particular channel and the recovered sidechain Transient Flag for the particular channel. Such randomized amplitude shifts may operate in two modes in a manner analogous to the application of randomized phase shifts. For example, in the absence of a transient, a randomized amplitude shift that does not change with time may be added on a bin-by-bin basis (different from bin to bin), and, in the presence of a transient (in the frame or block), a randomized amplitude shift may be added that changes on a block-by-block basis (different from block to block) and changes from subband to subband (the same shift for all bins in a subband, different from subband to subband). Although the amount or degree to which randomized amplitude shifts are added may be controlled by the Decorrelation Scale Factor, it is believed that a particular scale factor value should cause less amplitude shift than the corresponding randomized phase shift resulting from the same scale factor value in order to avoid audible artifacts.
When the Transient Flag applies to a frame, the time resolution with which the Transient Flag selects Technique 2 or Technique 3 may be enhanced by providing a supplemental transient detector in the decoder in order to provide a temporal resolution finer than the frame rate or even the block rate. Such a supplemental transient detector may detect the occurrence of a transient in the mono or multichannel composite audio signal received by the decoder and such detection information is then sent to each Controllable Decorrelator (as 38, 42 of FIG. 2). Then, upon the receipt of a Transient Flag for its channel, the Controllable Decorrelator switches from Technique 2 to Technique 3 upon receipt of the decoder's local transient detection indication. Thus, a substantial improvement in temporal resolution is possible without increasing the sidechain bitrate, albeit with decreased spatial accuracy (the encoder detects transients in each input channel prior to their downmixing, whereas detection in the decoder is done after downmixing).
As an alternative to sending sidechain information on a frame-by-frame basis, sidechain information may be updated every block, at least for highly dynamic signals. As mentioned above, updating the Transient Flag and/or the Interpolation Flag every block results in only a small increase in sidechain data overhead. In order to accomplish such an increase in temporal resolution for other sidechain information without substantially increasing the sidechain data rate, a block-floating-point differential coding arrangement may be used. For example, consecutive transform blocks may be collected in groups of six over a frame. The full sidechain information may be sent for each subband-channel in the first block. In the five subsequent blocks, only differential values may be sent, each the difference between the current-block amplitude and angle, and the equivalent values from the previous block. This results in very low data rate for static signals, such as a pitch pipe note. For more dynamic signals, a greater range of difference values is required, but at less precision. So, for each group of five differential values, an exponent may be sent first, using, for example, 3 bits, then differential values are quantized to, for example, 2-bit accuracy. This arrangement reduces the average worst-case sidechain data rate by about a factor of two. Further reduction may be obtained by omitting the sidechain data for a reference channel (since it can be derived from the other channels), as discussed above, and by using, for example, arithmetic coding. Alternatively or in addition, differential coding across frequency may be employed by sending, for example, differences in subband angle or amplitude.
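The block-floating-point differential arrangement can be illustrated with a short Python sketch for one integer-coded parameter over a group of six blocks. The text fixes only the group size, the 3-bit exponent and the 2-bit differential values; the rest is assumed for illustration: the exponent is taken as a power-of-two step size, the 2-bit values are read as signed integers from -2 to +1, and each difference is taken against the previously reconstructed value so that quantization error does not accumulate. The function names are hypothetical.

```python
def encode_group(values, exp_bits=3, mant_bits=2):
    """Encode per-block integer codes as (first value, shared exponent, small differences)."""
    lo = -(1 << (mant_bits - 1))      # -2 for a 2-bit signed value
    hi = (1 << (mant_bits - 1)) - 1   # +1 for a 2-bit signed value
    max_exp = (1 << exp_bits) - 1
    # Choose the smallest step size that covers the largest block-to-block change
    peak = max(abs(b - a) for a, b in zip(values, values[1:]))
    exponent = 0
    while exponent < max_exp and peak > (hi << exponent):
        exponent += 1
    step = 1 << exponent
    mantissas, prev = [], values[0]
    for v in values[1:]:
        # Quantize the difference to the reconstructed previous value
        m = max(lo, min(hi, round((v - prev) / step)))
        mantissas.append(m)
        prev += m * step
    return values[0], exponent, mantissas

def decode_group(first, exponent, mantissas):
    """Rebuild the per-block values from the first value and the scaled differences."""
    out = [first]
    for m in mantissas:
        out.append(out[-1] + m * (1 << exponent))
    return out
```

With this layout a 6-bit parameter costs 6 + 3 + 5 × 2 = 19 bits per group of six blocks instead of 36 bits when sent in full every block. A static signal yields an exponent of zero and all-zero differences, while a rapidly changing one uses a larger step at coarser precision.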
Whether sidechain information is sent on a frame-by-frame basis or more frequently, it may be useful to interpolate sidechain values across the blocks in a frame. Linear interpolation over time may be employed in the manner of the linear interpolation across frequency, as described below.
One suitable implementation of aspects of the present invention employs processing steps or devices that implement the respective processing steps and are functionally related as next set forth. Although the encoding and decoding steps listed below may each be carried out by computer software instruction sequences operating in the order of the below listed steps, it will be understood that equivalent or similar results may be obtained by steps ordered in other ways, taking into account that certain quantities are derived from earlier ones. For example, multi-threaded computer software instruction sequences may be employed so that certain sequences of steps are carried out in parallel. Alternatively, the described steps may be implemented as devices that perform the described functions, the various devices having functions and functional interrelationships as described hereinafter.
Encoding

The encoder or encoding function may collect a frame's worth of data before it derives sidechain information and downmixes the frame's audio channels to a single monophonic (mono) audio channel (in the manner of the example of FIG. 1, described above), or to multiple audio channels (in the manner of the example of FIG. 6, described below). By doing so, sidechain information may be sent first to a decoder, allowing the decoder to begin decoding immediately upon receipt of the mono or multiple channel audio information. Steps of an encoding process ("encoding steps") may be described as follows. With respect to encoding steps, reference is made to FIG. 4, which is in the nature of a hybrid flowchart and functional block diagram. Through Step 419, FIG. 4 shows encoding steps for one channel. Steps 420 and 421 apply to all of the multiple channels that are combined to provide a composite mono signal output or are matrixed together to provide multiple channels, as described below in connection with the example of FIG. 6.
Step 401. Detect Transients.
a. Perform transient detection of the PCM values in an input audio channel.
b. Set a one-bit Transient Flag True if a transient is present in any block of a frame for the channel.
Comments regarding Step 401:
The Transient Flag forms a portion of the sidechain information and is also used in Step 411, as described below. Transient resolution finer than block rate in the decoder may improve decoder performance. Although, as discussed above, a block-rate rather than a frame-rate Transient Flag may form a portion of the sidechain information with a modest increase in bitrate, a similar result, albeit with decreased spatial accuracy, may be accomplished without increasing the sidechain bitrate by detecting the occurrence of transients in the mono composite signal received in the decoder.

There is one transient flag per channel per frame, which, because it is derived in the time domain, necessarily applies to all subbands within that channel. The transient detection may be performed in a manner similar to that employed in an AC-3 encoder for controlling the decision of when to switch between long and short length audio blocks, but with a higher sensitivity and with the Transient Flag True for any frame in which the Transient Flag for a block is True (an AC-3 encoder detects transients on a block basis). In particular, see Section 8.2.2 of the above-cited A/52A document. The sensitivity of the transient detection described in Section 8.2.2 may be increased by adding a sensitivity factor F to an equation set forth therein. Section 8.2.2 of the A/52A document is set forth below, with the sensitivity factor added (Section 8.2.2 as reproduced below is corrected to indicate that the low pass filter is a cascaded biquad direct form II IIR filter rather than "form I" as in the published A/52A document; Section 8.2.2 was correct in the earlier A/52 document). Although it is not critical, a sensitivity factor of 0.2 has been found to be a suitable value in a practical embodiment of aspects of the present invention.

Alternatively, a similar transient detection technique described in U.S. Patent 5,394,473 may be employed. The '473 patent describes aspects of the A/52A document transient detector in greater detail.

As another alternative, transients may be detected in the frequency domain rather than in the time domain (see the Comments to Step 408). In that case, Step 401 may be omitted and an alternative step employed in the frequency domain as described below.
Step 402. Window and DFT.
Multiply overlapping blocks of PCM time samples by a time window and convert them to complex frequency values via a DFT as implemented by an FFT.
Step 403. Convert Complex Values to Magnitude and Angle.
Convert each frequency-domain complex transform bin value (a + jb) to a magnitude and angle representation using standard complex manipulations:
a. Magnitude = square root (a² + b²)
b. Angle = arctan (b/a)
Comments regarding Step 403:
Some of the following Steps use or may use, as an alternative, the energy of a bin, defined as the above magnitude squared (i.e., energy = (a² + b²)).
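As a brief illustration of Step 403, the conversion can be written with Python's standard `cmath` module. The function name is hypothetical. One practical point: evaluating arctan (b/a) literally loses the quadrant and fails when a is zero, so the sketch uses the two-argument arctangent that `cmath.phase` provides, which returns an angle between -π and +π.

```python
import cmath

def bin_magnitude_angle_energy(value):
    """Return (magnitude, angle in radians, energy) for one complex bin value a + jb."""
    magnitude = abs(value)                        # square root of (a^2 + b^2)
    angle = cmath.phase(value)                    # atan2(b, a), quadrant-aware
    energy = value.real ** 2 + value.imag ** 2    # magnitude squared
    return magnitude, angle, energy
```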
Step 404. Calculate Subband Energy.
a. Calculate the subband energy per block by adding bin energy values within each subband (a summation across frequency).
b. Calculate the subband energy per frame by averaging or accumulating the energy in all the blocks in a frame (an averaging / accumulation across time).
c. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated energy to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 404c:
Time smoothing to provide inter-frame smoothing in low frequency subbands may be useful. In order to avoid artifact-causing discontinuities between bin values at subband boundaries, it may be useful to apply a progressively-decreasing time smoothing from the lowest frequency subband encompassing and above the coupling frequency (where the smoothing may have a significant effect) up through a higher frequency subband in which the time smoothing effect is measurable, but inaudible, although nearly audible. A suitable time constant for the lowest frequency range subband (where the subband is a single bin if subbands are critical bands) may be in the range of 50 to 100 milliseconds, for example. Progressively-decreasing time smoothing may continue up through a subband encompassing about 1000 Hz where the time constant may be about 10 milliseconds, for example.

Although a first-order smoother is suitable, the smoother may be a two-stage smoother that has a variable time constant that shortens its attack and decay time in response to a transient (such a two-stage smoother may be a digital equivalent of the analog two-stage smoothers described in U.S. Patents 3,846,719 and 4,922,535). In other words, the steady-state time constant may be scaled according to frequency and may also be variable in response to transients. Alternatively, such smoothing may be applied in Step 412.
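A first-order smoother with a frequency-dependent time constant, as contemplated in the comments on Step 404c, might look like the Python sketch below. The text gives only the end points (about 50 to 100 milliseconds at the lowest subband, about 10 milliseconds near 1000 Hz); the interpolation on a logarithmic frequency axis, the default of 100 milliseconds at the low end, the frame duration argument and the function names are assumptions made for illustration. The two-stage, transient-responsive variant is not shown.

```python
import math

def time_constant_ms(center_hz, low_hz, low_tau_ms=100.0, high_hz=1000.0, high_tau_ms=10.0):
    """Time constant that decreases from low_tau_ms at low_hz to high_tau_ms at high_hz."""
    if center_hz <= low_hz:
        return low_tau_ms
    if center_hz >= high_hz:
        return high_tau_ms
    # Assumed law: linear in the logarithm of frequency
    t = math.log(center_hz / low_hz) / math.log(high_hz / low_hz)
    return low_tau_ms + t * (high_tau_ms - low_tau_ms)

def smooth(previous, current, tau_ms, frame_ms):
    """One update of a first-order (one-pole) smoother running at the frame rate."""
    alpha = math.exp(-frame_ms / tau_ms)   # weight kept from the previous output
    return alpha * previous + (1.0 - alpha) * current
```

For a subband centered at 300 Hz with the lowest subband at 100 Hz, `time_constant_ms(300.0, 100.0)` falls between the two end points, and `smooth` then blends the new frame value with the previous smoothed value accordingly.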
Step 405. Calculate Sum of Bin Magnitudes.
a. Calculate the sum per block of the bin magnitudes (Step 403) of each subband (a summation across frequency).
b. Calculate the sum per frame of the bin magnitudes of each subband by averaging or accumulating the magnitudes of Step 405a across the blocks in a frame (an averaging / accumulation across time). These sums are used to calculate an Interchannel Angle Consistency Factor in Step 410 below.
c. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated magnitudes to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 405c: See comments regarding Step 404c except that in the case of Step 405c, the time smoothing may alternatively be performed as part of Step 410.
Step 406. Calculate Relative Interchannel Bin Phase Angle.
Calculate the relative interchannel phase angle of each transform bin of each block by subtracting from the bin angle of Step 403 the corresponding bin angle of a reference channel (for example, the first channel). The result, as with other angle additions or subtractions herein, is taken modulo (π, -π) radians by adding or subtracting 2π until the result is within the desired range of -π to +π.
Step 407. Calculate Interchannel Subband Phase Angle.
For each channel, calculate a frame-rate amplitude-weighted average interchannel phase angle for each subband as follows:
a. For each bin, construct a complex number from the magnitude of Step 403 and the relative interchannel bin phase angle of Step 406.
b. Add the constructed complex numbers of Step 407a across each subband (a summation across frequency).
Comment regarding Step 407b: For example, if a subband has two bins and one of the bins has a complex value of 1 + j1 and the other bin has a complex value of 2 + j2, their complex sum is 3 + j3.
c. Average or accumulate the per block complex number sum for each subband of Step 407b across the blocks of each frame (an averaging or accumulation across time).
d. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated complex value to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 407d: See comments regarding Step 404c except that in the case of Step 407d, the time smoothing may alternatively be performed as part of Steps 407e or 410.
e. Compute the magnitude of the complex result of Step 407d as per Step 403.
Comment regarding Step 407e: This magnitude is used in Step 410a below. In the simple example given in Step 407b, the magnitude of 3 + j3 is square root (9 + 9) = 4.24.
f. Compute the angle of the complex result as per Step 403.
Comments regarding Step 407f: In the simple example given in Step 407b, the angle of 3 + j3 is arctan (3/3) = 45 degrees = π/4 radians. This subband angle is signal-dependently time-smoothed (see Step 413) and quantized (see Step 414) to generate the Subband Angle Control Parameter sidechain information, as described below.
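Steps 406 and 407 can be sketched together in Python for a single subband of one channel. The data layout (a list of blocks, each a list of complex bin values) and the function names are assumptions for illustration, and the optional time smoothing of Step 407d is left out.

```python
import cmath
import math

def wrap_angle(angle):
    """Add or subtract 2*pi until the angle lies within -pi to +pi (Step 406)."""
    while angle > math.pi:
        angle -= 2.0 * math.pi
    while angle < -math.pi:
        angle += 2.0 * math.pi
    return angle

def subband_interchannel_angle(blocks, reference_blocks):
    """Return (magnitude, angle) of the frame-averaged, amplitude-weighted complex sum.

    blocks:           per-block lists of complex bin values for one subband of a channel
    reference_blocks: the corresponding bins of the reference channel
    """
    total = 0j
    for bins, ref_bins in zip(blocks, reference_blocks):
        for value, ref in zip(bins, ref_bins):
            relative = wrap_angle(cmath.phase(value) - cmath.phase(ref))  # Step 406
            total += cmath.rect(abs(value), relative)                     # Steps 407a and 407b
    average = total / len(blocks)                                         # Step 407c
    return abs(average), cmath.phase(average)                             # Steps 407e and 407f
```

With one block holding the bins 1 + j1 and 2 + j2 and a reference channel at zero phase, the function reproduces the worked example: a magnitude of about 4.24 and an angle of π/4 radians.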
Step 408. Calculate Bin Spectral-Steadiness Factor.
For each bin, calculate a Bin Spectral-Steadiness Factor in the range of 0 to 1 as follows:
a. Let xm = bin magnitude of present block calculated in Step 403.
b. Let ym = corresponding bin magnitude of previous block.
c. If xm > ym, then Bin Dynamic Amplitude Factor =
d. Else if ym > xm, then Bin Dynamic Amplitude Factor =
e. Else if ym = xm, then Bin Spectral-Steadiness Factor = 1.
Comment regarding Step 408:
"Spectral steadiness" is a measure of the extent to which spectral components (e.g., spectral coefficients or bin values) change over time. A Bin Spectral-Steadiness Factor of 1 indicates no change over a given time period.

Spectral Steadiness may also be taken as an indicator of whether a transient is present. A transient may cause a sudden rise and fall in spectral (bin) amplitude over a time period of one or more blocks, depending on its position with regard to blocks and their boundaries. Consequently, a change in the Bin Spectral-Steadiness Factor from a high value to a low value over a small number of blocks may be taken as an indication of the presence of a transient in the block or blocks having the lower value. A further confirmation of the presence of a transient, or an alternative to employing the Bin Spectral-Steadiness Factor, is to observe the phase angles of bins within the block (for example, at the phase angle output of Step 403). Because a transient is likely to occupy a single temporal position within a block and have the dominant energy in the block, the existence and position of a transient may be indicated by a substantially uniform delay in phase from bin to bin in the block, namely, a substantially linear ramp of phase angles as a function of frequency. Yet a further confirmation or alternative is to observe the bin amplitudes over a small number of blocks (for example, at the magnitude output of Step 403), namely by looking directly for a sudden rise and fall of spectral level.

Alternatively, Step 408 may look at three consecutive blocks instead of one block. If the coupling frequency of the encoder is below about 1000 Hz, Step 408 may look at more than three consecutive blocks. The number of consecutive blocks taken into consideration may vary with frequency such that the number gradually increases as the subband frequency range decreases. If the Bin Spectral-Steadiness Factor is obtained from more than one block, the detection of a transient, as just described, may be determined by separate steps that respond only to the number of blocks useful for detecting transients.

As a further alternative, bin energies may be used instead of bin magnitudes.

As yet a further alternative, Step 408 may employ an "event decision" detecting technique as described below in the comments following Step 409.
Step 409. Compute Subband Spectral-Steadiness Factor.
Compute a frame-rate Subband Spectral-Steadiness Factor on a scale of 0 to 1 by forming an amplitude-weighted average of the Bin Spectral-Steadiness Factor within each subband across the blocks in a frame as follows:
a. For each bin, calculate the product of the Bin Spectral-Steadiness Factor of Step 408 and the bin magnitude of Step 403.
b. Sum the products within each subband (a summation across frequency).
c. Average or accumulate the summation of Step 409b in all the blocks in a frame (an averaging / accumulation across time).
d. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated summation to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 409d: See comments regarding Step 404c except that in the case of Step 409d, there is no suitable subsequent step in which the time smoothing may alternatively be performed.
e. Divide the results of Step 409c or Step 409d, as appropriate, by the sum of the bin magnitudes (Step 403) within the subband.
Comment regarding Step 409e: The multiplication by the magnitude in Step 409a and the division by the sum of the magnitudes in Step 409e provide amplitude weighting. The output of Step 408 is independent of absolute amplitude and, if not amplitude weighted, may cause the output of Step 409 to be controlled by very small amplitudes, which is undesirable.
f. Scale the result to obtain the Subband Spectral-Steadiness Factor by mapping the range from {0.5...1} to {0...1}. This may be done by multiplying the result by 2, subtracting 1, and limiting results less than 0 to a value of 0.
Comment regarding Step 409f: Step 409f may be useful in assuring that a channel of noise results in a Subband Spectral-Steadiness Factor of zero.
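The amplitude weighting and rescaling of Step 409 can be sketched in Python as follows. The Bin Spectral-Steadiness Factors are taken as given inputs. The sketch accumulates both the weighted products and the magnitudes over all blocks of the frame before dividing, omits the optional smoothing of Step 409d, and returns zero for a silent subband; these choices and the function name are assumptions made for illustration.

```python
def subband_spectral_steadiness(bin_factors, bin_magnitudes):
    """Compute a Subband Spectral-Steadiness Factor in [0, 1] for one subband over one frame.

    bin_factors:    per-block lists of Bin Spectral-Steadiness Factors (Step 408 output)
    bin_magnitudes: per-block lists of the matching bin magnitudes (Step 403 output)
    """
    weighted = 0.0
    total_magnitude = 0.0
    for factors, magnitudes in zip(bin_factors, bin_magnitudes):
        for factor, magnitude in zip(factors, magnitudes):
            weighted += factor * magnitude     # Steps 409a to 409c
            total_magnitude += magnitude       # denominator for Step 409e
    if total_magnitude == 0.0:
        return 0.0                             # assumed convention for a silent subband
    average = weighted / total_magnitude       # Step 409e
    return max(0.0, 2.0 * average - 1.0)       # Step 409f: map 0.5..1 onto 0..1
```

A subband whose bins all have a factor of 1 yields 1, an average factor of 0.5 or less yields 0, and a loud steady bin outweighs a quiet unsteady one because of the magnitude weighting.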
..
. :
. _ = 5 - Commenfn regarding Steps 408 and 409: = = .
= - The goal of Steps 408 and 409 is to Measure- spectral 'steadiness ¨ changes in . - spectral composition over time ma Tubb and of a channel.
AltematiVely, aspects of an . .
=
. "event decision" sensing stih as described in hiternationtil PublicationlimOer WO = .
.
.02/097792 Al (designating the.United States) may be employed to measare spectral .
. .
. 40 steadiness instead of the approach just described in.connection with Steps.408 and 409. .
=
. = - = U.S. Patent Application S.N. 10/478,538, moil November 20, 2003 lathe United States' .
. . . .
.
. = national application of thepublisheciPCT Application WO 02/09772 Al.
.. .
=
= .
. . , .
. .
. . .
According to these above-mentioned applications, the magnitudes of the complex FFT coefficient of each bin are calculated and normalized (largest magnitude is set to a value of one, for example). Then the magnitudes of corresponding bins (in dB) in consecutive blocks are subtracted (ignoring signs), the differences between bins are summed, and, if the sum exceeds a threshold, the block boundary is considered to be an auditory event boundary. Alternatively, changes in amplitude from block to block may also be considered along with spectral magnitude changes (by looking at the amount of normalization required).
If aspects of the above-mentioned event-sensing applications are employed to measure spectral steadiness, normalization may not be required and the changes in spectral magnitude (changes in amplitude would not be measured if normalization is omitted) preferably are considered on a subband basis. Instead of performing Step 408 as indicated above, the decibel differences in spectral magnitude between corresponding bins in each subband may be summed in accordance with the teachings of said application. Then, each of those sums, representing the degree of spectral change from block to block, may be scaled so that the result is a spectral steadiness factor having a range from 0 to 1, wherein a value of 1 indicates the highest steadiness, a change of 0 dB from block to block for a given bin. A value of 0, indicating the lowest steadiness, may be assigned to decibel changes equal to or greater than a suitable amount, such as 12 dB, for example. Those results, a Bin Spectral-Steadiness Factor, may be used by Step 409 in the same manner that Step 409 uses the results of Step 408 as described above. When Step 409 receives a Bin Spectral-Steadiness Factor obtained by employing the just-described alternative event decision sensing technique, the Subband Spectral-Steadiness Factor of Step 409 may also be used as an indicator of a transient. For example, if the range of values produced by Step 409 is 0 to 1, a transient may be considered to be present when the Subband Spectral-Steadiness Factor is a small value, such as, for example, 0.1, indicating substantial spectral unsteadiness.
It will be appreciated that the Bin Spectral-Steadiness Factor produced by Step 408 and by the just-described alternative to Step 408 each inherently provide a variable threshold to a certain degree in that they are based on relative changes from block to block. Optionally, it may be useful to supplement such inherency by specifically providing a shift in the threshold in response to, for example, multiple transients in a frame or a large transient among smaller transients (e.g., a loud transient coming atop mid- to low-level applause). In the case of the latter example, an event detector may initially identify each clap as an event, but a loud transient (e.g., a drum hit) may make it desirable to shift the threshold so that only the drum hit is identified as an event.
Alternatively, a randomness metric may be employed (for example, as described in U.S. Patent Re 36,714) instead of a measure of spectral-steadiness over time.
Step 410. Calculate Interchannel Angle Consistency Factor.
For each subband having more than one bin, calculate a frame-rate Interchannel Angle Consistency Factor as follows:
a. Divide the magnitude of the complex sum of Step 407e by the sum of the magnitudes of Step 405. The resulting "raw" Angle Consistency Factor is a number in the range of 0 to 1.
b. Calculate a correction factor: let n = the number of values across the subband contributing to the two quantities in the above step (in other words, "n" is the number of bins in the subband). If n is less than 2, let the Angle Consistency Factor be 1 and go to Steps 411 and 413.
c. Let r = Expected Random Variation = 1/n. Subtract r from the result of Step 410b.
d. Normalize the result of Step 410c by dividing by (1 - r). The result has a maximum value of 1. Limit the minimum value to 0 as necessary.
Comments regarding Step 410:
Interchannel Angle Consistency is a measure of how similar the interchannel phase angles are within a subband over a frame period. If all bin interchannel angles of the subband are the same, the Interchannel Angle Consistency Factor is 1.0; whereas, if the interchannel angles are randomly scattered, the value approaches zero.
The Subband Angle Consistency Factor indicates if there is a phantom image between the channels. If the consistency is low, then it is desirable to decorrelate the channels. A high value indicates a fused image. Image fusion is independent of other signal characteristics.
It will be noted that the Subband Angle Consistency Factor, although an angle parameter, is determined indirectly from two magnitudes. If the interchannel angles are all the same, adding the complex values and then taking the magnitude yields the same result as taking all the magnitudes and adding them, so the quotient is 1. If the interchannel angles are scattered, adding the complex values (such as adding vectors having different angles) results in at least partial cancellation, so the magnitude of the sum is less than the sum of the magnitudes, and the quotient is less than 1.
Following is a simple example of a subband having two bins:
Suppose that the two complex bin values are (3 + j4) and (6 + j8). (Same angle each case: angle = arctan (imag/real), so angle1 = arctan (4/3) and angle2 = arctan (8/6) = arctan (4/3)). Adding complex values, sum = (9 + j12), magnitude of which is square root (81 + 144) = 15.
The sum of the magnitudes is magnitude of (3 + j4) + magnitude of (6 + j8) = 5 + 10 = 15. The quotient is therefore 15/15 = 1 = consistency (before 1/n normalization, would also be 1 after normalization) (normalized consistency = (1 - 0.5) / (1 - 0.5) = 1.0).
If one of the above bins has a different angle, say that the second one has complex value (6 - j8), which has the same magnitude, 10. The complex sum is now (9 - j4), which has magnitude of square root (81 + 16) = 9.85, so the quotient is 9.85 / 15 = 0.66 = consistency (before normalization). To normalize, subtract 1/n = 1/2, and divide by (1 - 1/n) (normalized consistency = (0.66 - 0.5) / (1 - 0.5) = 0.32).
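The two-bin arithmetic above can be checked with a short Python sketch. The function name `angle_consistency` and its input layout (a list of already amplitude-weighted complex bin values for one subband) are assumptions made for illustration; the treatment of an all-zero subband is likewise an assumption, since the text does not address it.

```python
def angle_consistency(bins):
    """Normalized Interchannel Angle Consistency Factor for one subband.

    `bins` is a list of complex values, one per bin (hypothetical input
    layout; the text obtains these quantities from Steps 405 and 407).
    """
    n = len(bins)
    if n < 2:
        return 1.0  # Step 410b: fewer than two bins, factor is 1
    sum_of_magnitudes = sum(abs(b) for b in bins)
    if sum_of_magnitudes == 0.0:
        return 0.0  # assumption: an all-zero subband is treated as inconsistent
    raw = abs(sum(bins)) / sum_of_magnitudes  # Step 410a, in the range 0..1
    r = 1.0 / n                               # Step 410c: expected random variation
    return max(0.0, (raw - r) / (1.0 - r))    # Step 410d: normalize, floor at 0


same_angle = angle_consistency([3 + 4j, 6 + 8j])       # both bins at arctan(4/3)
different_angle = angle_consistency([3 + 4j, 6 - 8j])  # second bin mirrored
```

Without intermediate rounding the second case comes out near 0.313; the 0.32 in the text results from rounding the quotient to 0.66 before normalizing.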
Although the above-described technique for determining a Subband Angle Consistency Factor has been found useful, its use is not critical. Other suitable techniques may be employed. For example, one could calculate a standard deviation of angles using standard formulae. In any case, it is desirable to employ amplitude weighting to minimize the effect of small signals on the calculated consistency value.
In addition, an alternative derivation of the Subband Angle Consistency Factor may use energy (the squares of the magnitudes) instead of magnitude. This may be accomplished by squaring the magnitude from Step 403 before it is applied to Steps 405 and 407.
Step 411. Derive Subband Decorrelation Scale Factor.
Derive a frame-rate Decorrelation Scale Factor for each subband as follows:
a. Let x = frame-rate Spectral-Steadiness Factor of Step 409f.
b. Let y = frame-rate Angle Consistency Factor of Step 410e.
c. Then the frame-rate Subband Decorrelation Scale Factor = (1 - x) * (1 - y), a number between 0 and 1.
Comments regarding Step 411:
The Subband Decorrelation Scale Factor is a function of the spectral-steadiness of signal characteristics over time in a subband of a channel (the Spectral-Steadiness Factor) and the consistency in the same subband of a channel of bin angles with respect to corresponding bins of a reference channel (the Interchannel Angle Consistency Factor).
The Subband Decorrelation Scale Factor is high only if both the Spectral-Steadiness Factor and the Interchannel Angle Consistency Factor are low.
As explained above, the Decorrelation Scale Factor controls the degree of envelope decorrelation provided in the decoder. Signals that exhibit spectral steadiness over time preferably should not be decorrelated by altering their envelopes, regardless of what is happening in other channels, as it may result in audible artifacts, namely wavering or warbling of the signal.
Step 412. Derive Subband Amplitude Scale Factors.
From the subband frame energy values of Step 404 and from the subband frame energy values of all other channels (as may be obtained by a step corresponding to Step 404 or an equivalent thereof), derive frame-rate Subband Amplitude Scale Factors as follows:
a. For each subband, sum the energy values per frame across all input channels.
b. Divide each subband energy value per frame (from Step 404) by the sum of the energy values across all input channels (from Step 412a) to create values in the range of 0 to 1.
c. Convert each ratio to dB, in the range of -∞ to 0.
d. Divide by the scale factor granularity, which may be set at 1.5 dB, for example, change sign to yield a non-negative value, limit to a maximum value which may be, for example, 31 (i.e. 5-bit precision) and round to the nearest integer to create the quantized value. These values are the frame-rate Subband Amplitude Scale Factors and are conveyed as part of the sidechain information.
e. If the coupling frequency of the encoder is below about 1000 Hz, apply the subband frame-averaged or frame-accumulated magnitudes to a time smoother that operates on all subbands below that frequency and above the coupling frequency.
Comments regarding Step 412e: See comments regarding Step 404e except that in the case of Step 412e, there is no suitable subsequent step in which the time smoothing may alternatively be performed.
Comments for Step 412:
Although the granularity (resolution) and quantization precision indicated here have been found to be useful, they are not critical and other values may provide acceptable results.
Alternatively, one may use amplitude instead of energy to generate the Subband Amplitude Scale Factors. If using amplitude, one would use dB = 20*log(amplitude ratio), else if using energy, one converts to dB via dB = 10*log(energy ratio), where amplitude ratio = square root (energy ratio).
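Steps 412a through 412d can be sketched in Python as follows. The function name, the argument names, and the choice to clamp a zero-energy channel (a ratio of minus infinity in dB) to the maximum code are assumptions for illustration; the 1.5 dB granularity and the limit of 31 are the example values from the text. Step 412e (time smoothing) is not covered.

```python
import math


def amplitude_scale_factors(channel_energies, granularity_db=1.5, max_code=31):
    """Quantized Subband Amplitude Scale Factors for one subband and frame.

    `channel_energies` holds one subband frame energy per input channel
    (hypothetical layout). Returns one integer code per channel.
    """
    total = sum(channel_energies)                        # Step 412a
    codes = []
    for energy in channel_energies:
        if total <= 0.0 or energy <= 0.0:
            codes.append(max_code)                       # assumption: -inf dB clamps to the limit
            continue
        ratio_db = 10.0 * math.log10(energy / total)     # Steps 412b and 412c, at most 0 dB
        code = int(math.floor(-ratio_db / granularity_db + 0.5))  # Step 412d: scale, flip sign, round
        codes.append(min(max_code, code))                # Step 412d: limit to 5-bit range
    return codes


codes = amplitude_scale_factors([1.0, 1.0, 2.0, 0.0])
```

Here a channel carrying one quarter of the subband energy (about -6.02 dB) maps to code 4, and a channel carrying half (about -3.01 dB) maps to code 2.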
Step 413. Signal-Dependently Time Smooth Interchannel Subband Phase Angles.
Apply signal-dependent temporal smoothing to subband frame-rate interchannel angles derived in Step 407f:
a. Let v = Subband Spectral-Steadiness Factor of Step 409d.
b. Let w = corresponding Angle Consistency Factor of Step 410e.
c. Let x = (1 - v) * w. This is a value between 0 and 1, which is high if the Spectral-Steadiness Factor is low and the Angle Consistency Factor is high.
d. Let y = 1 - x. y is high if Spectral-Steadiness Factor is high and Angle Consistency Factor is low.
e. Let z = y^exp, where exp is a constant, which may be = 0.1. z is also in the range of 0 to 1, but skewed toward 1, corresponding to a slow time constant.
f. If the Transient Flag (Step 401) for the channel is set, set z = 0, corresponding to a fast time constant in the presence of a transient.
g. Compute lim, a maximum allowable value of z, lim = 1 - (0.1 * w). This ranges from 0.9 if the Angle Consistency Factor is high to 1.0 if the Angle Consistency Factor is low (0).
h. Limit z by lim as necessary: if (z > lim) then z = lim.
i. Smooth the subband angle of Step 407f using the value of z and a running smoothed value of angle maintained for each subband. If A = angle of Step 407f and RSA = running smoothed angle value as of the previous block and NewRSA is the new value of the running smoothed angle, then: NewRSA = RSA * z + A * (1 - z). The value of RSA is subsequently set equal to NewRSA before processing the following block. NewRSA is the signal-dependently time-smoothed angle output of Step 413.
Comments regarding Step 413:
When a transient is detected, the subband angle update time constant is set to 0, allowing a rapid subband angle change. This is desirable because it allows the normal angle update mechanism to use a range of relatively slow time constants, minimizing image wandering during static or quasi-static signals, yet fast-changing signals are treated with fast time constants.
Although other smoothing techniques and parameters may be usable, a first-order smoother implementing Step 413 has been found to be suitable. If implemented as a first-order smoother/lowpass filter, the variable "z" corresponds to the feed-forward coefficient (sometimes denoted "ff0"), while "(1-z)" corresponds to the feedback coefficient (sometimes denoted "fb1").
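One update of the first-order smoother of Step 413 might look like the following Python sketch. The function and parameter names are hypothetical, and the sketch treats angles as plain numbers, so wrap-around at ±π is not handled; that simplification is an assumption, not something the text specifies.

```python
def smooth_angle(rsa, angle, steadiness, consistency, transient, exponent=0.1):
    """Return NewRSA given the previous running smoothed angle `rsa`.

    `steadiness` is v, `consistency` is w, both in 0..1; `transient`
    is the Transient Flag of Step 401 for the channel.
    """
    x = (1.0 - steadiness) * consistency   # Step 413c
    y = 1.0 - x                            # Step 413d
    z = y ** exponent                      # Step 413e: skewed toward 1 (slow)
    if transient:
        z = 0.0                            # Step 413f: fast time constant
    lim = 1.0 - 0.1 * consistency          # Step 413g: 0.9 .. 1.0
    z = min(z, lim)                        # Step 413h
    return rsa * z + angle * (1.0 - z)     # Step 413i


# A transient makes the output jump straight to the new angle.
jumped = smooth_angle(rsa=0.0, angle=1.0, steadiness=0.5, consistency=0.5, transient=True)
# Without a transient the limit of Step 413g (0.95 here) sets the update rate.
eased = smooth_angle(rsa=0.0, angle=1.0, steadiness=0.5, consistency=0.5, transient=False)
```

The caller would store the returned value as RSA before processing the following block.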
Step 414. Quantize Smoothed Interchannel Subband Phase Angles.
Quantize the time-smoothed subband interchannel angles derived in Step 413i to obtain the Subband Angle Control Parameter:
a. If the value is less than 0, add 2π, so that all angle values to be quantized are in the range 0 to 2π.
b. Divide by the angle granularity (resolution), which may be 2π/64 radians, and round to an integer. The maximum value may be set at 63, corresponding to 6-bit quantization.
Comments regarding Step 414:
The quantized value is treated as a non-negative integer, so an easy way to quantize the angle is to map it to a non-negative floating point number (add 2π if less than 0, making the range 0 to (less than) 2π), scale by the granularity (resolution), and round to an integer. Similarly, dequantizing that integer (which could otherwise be done with a simple table lookup) can be accomplished by scaling by the inverse of the angle granularity factor, converting a non-negative integer to a non-negative floating point angle (again, range 0 to 2π), after which it can be renormalized to the range ±π for further use. Although such quantization of the Subband Angle Control Parameter has been found to be useful, such a quantization is not critical and other quantizations may provide acceptable results.
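The quantize and dequantize mapping described for Step 414 can be sketched in Python. The names `quantize_angle`, `dequantize_angle` and `GRANULARITY` are hypothetical. Clamping a value that rounds up to 64 back to 63 follows the stated maximum of 63; wrapping it to 0 instead would be another plausible reading that the text does not settle.

```python
import math

GRANULARITY = 2.0 * math.pi / 64.0  # example angle resolution from Step 414b


def quantize_angle(angle):
    """Map an angle in the range -pi..pi to an integer code in 0..63."""
    if angle < 0.0:
        angle += 2.0 * math.pi                          # Step 414a: shift into 0..2*pi
    code = int(math.floor(angle / GRANULARITY + 0.5))   # Step 414b: scale and round
    return min(63, code)                                # limit to the 6-bit maximum


def dequantize_angle(code):
    """Scale a code back to radians and renormalize to the range -pi..pi."""
    angle = code * GRANULARITY
    if angle > math.pi:
        angle -= 2.0 * math.pi
    return angle


code = quantize_angle(-math.pi / 2.0)   # -90 degrees becomes 270 degrees, code 48
restored = dequantize_angle(code)       # back to -pi/2
```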
Step 415. Quantize Subband Decorrelation Scale Factors.
Quantize the Subband Decorrelation Scale Factors produced by Step 411 to, for example, 8 levels (3 bits) by multiplying by 7.49 and rounding to the nearest integer.
These quantized values are part of the sidechain information.
Comments regarding Step 415:
Although such quantization of the Subband Decorrelation Scale Factors has been found to be useful, quantization using the example values is not critical and other quantizations may provide acceptable results.
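As a minimal Python sketch, the Step 411 product and the Step 415 quantization with the example multiplier of 7.49 can be written as below. Both function names are hypothetical, and the round-half-up convention is an assumption (the text only says "rounding to the nearest integer").

```python
import math


def decorrelation_scale_factor(steadiness, consistency):
    """Step 411c: (1 - x) * (1 - y), high only when both inputs are low."""
    return (1.0 - steadiness) * (1.0 - consistency)


def quantize_decorrelation(scale_factor):
    """Step 415: map a value in 0..1 to one of 8 levels (codes 0..7)."""
    return int(math.floor(scale_factor * 7.49 + 0.5))


highest = quantize_decorrelation(decorrelation_scale_factor(0.0, 0.0))  # factor 1.0
lowest = quantize_decorrelation(decorrelation_scale_factor(1.0, 0.3))   # factor 0.0
```

With a multiplier of 7.49 an input of 1.0 scales to 7.49 and still rounds to 7, so the code never reaches 8 and fits in 3 bits.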
Step 416. Dequantize Subband Angle Control Parameters.
Dequantize the Subband Angle Control Parameters (see Step 414), to use prior to downmixing.
Comment regarding Step 416:
Use of quantized values in the encoder helps maintain synchrony between the encoder and the decoder.
Step 417. Distribute Frame-Rate Dequantized Subband Angle Control Parameters Across Blocks.
In preparation for downmixing, distribute the once-per-frame dequantized Subband Angle Control Parameters of Step 416 across time to the subbands of each block within the frame.
Comment regarding Step 417:
The same frame value may be assigned to each block in the frame. Alternatively, it may be useful to interpolate the Subband Angle Control Parameter values across the blocks in a frame. Linear interpolation over time may be employed in the manner of the linear interpolation across frequency, as described below.
Step 418. Interpolate Block Subband Angle Control Parameters to Bins.
Distribute the block Subband Angle Control Parameters of Step 417 for each channel across frequency to bins, preferably using linear interpolation as described below.
Comment regarding Step 418:
If linear interpolation across frequency is employed, Step 418 minimizes phase angle changes from bin to bin across a subband boundary, thereby minimizing aliasing artifacts. Such linear interpolation may be enabled, for example, as described below following the description of Step 422. Subband angles are calculated independently of one another, each representing an average across a subband. Thus, there may be a large change from one subband to the next. If the net angle value for a subband is applied to all bins in the subband (a "rectangular" subband distribution), the entire phase change from one subband to a neighboring subband occurs between two bins. If there is a strong signal component there, there may be severe, possibly audible, aliasing.
Linear interpolation, between the centers of each subband, for example, spreads the phase angle change over all the bins in the subband, minimizing the change between any pair of bins, so that, for example, the angle at the low end of a subband mates with the angle at the high end of the subband below it, while maintaining the overall average the same as the given calculated subband angle. In other words, instead of rectangular subband distributions, the subband angle distribution may be trapezoidally shaped.
For example, suppose that the lowest coupled subband has one bin and a subband angle of 20 degrees, the next subband has three bins and a subband angle of 40 degrees, and the third subband has five bins and a subband angle of 100 degrees. With no interpolation, assume that the first bin (one subband) is shifted by an angle of 20 degrees, the next three bins (another subband) are shifted by an angle of 40 degrees and the next five bins (a further subband) are shifted by an angle of 100 degrees. In that example, there is a 60-degree maximum change, from bin 4 to bin 5. With linear interpolation, the first bin still is shifted by an angle of 20 degrees, the next 3 bins are shifted by about 30, 40, and 50 degrees, and the next five bins are shifted by about 67, 83, 100, 117, and 133 degrees. The average subband angle shift is the same, but the maximum bin-to-bin change is reduced to 17 degrees.
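The figures in this example are consistent with one particular reading of the interpolation, sketched below in Python: within each subband the angle follows a straight line that passes through the value given to the last bin of the subband below and through the subband's own angle at its center bin. That rule is inferred from the example rather than stated in the text, the function name is hypothetical, and angle wrap-around is ignored.

```python
def interpolate_subband_angles(bin_counts, subband_angles):
    """Spread per-subband angles (degrees) across bins with a trapezoidal shape.

    `bin_counts[k]` is the number of bins in subband k and
    `subband_angles[k]` its angle; subbands are ordered low to high.
    """
    bin_angles = []
    prev_last = None  # angle given to the last bin of the subband below
    for n, angle in zip(bin_counts, subband_angles):
        half_span = (n - 1) / 2.0  # distance from the first bin to the center
        if prev_last is None:
            slope = 0.0  # lowest subband: nothing below to mate with
        else:
            # Line through the previous subband's last bin (one bin below this
            # subband's first bin) and this subband's angle at its center.
            slope = (angle - prev_last) / (half_span + 1.0)
        values = [angle + slope * (i - half_span) for i in range(n)]
        bin_angles.extend(values)
        prev_last = values[-1]
    return bin_angles


angles = interpolate_subband_angles([1, 3, 5], [20.0, 40.0, 100.0])
rounded = [round(a) for a in angles]
```

Because each line is symmetric about the subband center, the mean over a subband's bins stays equal to the subband angle, which matches the statement that the average shift is unchanged.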
Optionally, changes in amplitude from subband to subband, in connection with this and other steps described herein, such as Step 417, may also be treated in a similar interpolative fashion. However, it may not be necessary to do so because there tends to be more natural continuity in amplitude from one subband to the next.
Step 419. Apply Phase Angle Rotation to Bin Transform Values for Channel.
Apply phase angle rotation to each bin transform value as follows:
a. Let x = bin angle for this bin as calculated in Step 418.
b. Let y = -x.
c. Compute z, a unity-magnitude complex phase rotation scale factor with angle y, z = cos (y) + j sin (y).
d. Multiply the bin value (a + jb) by z.
Comments regarding Step 419:
The phase angle rotation applied in the encoder is the inverse of the angle derived from the Subband Angle Control Parameter.
Phase angle adjustments, as described herein, in an encoder or encoding process prior to downmixing (Step 420) have several advantages: (1) they minimize cancellations of the channels that are summed to a mono composite signal or matrixed to multiple channels, (2) they minimize reliance on energy normalization (Step 421), and (3) they precompensate the decoder inverse phase angle rotation, thereby reducing aliasing.
The phase correction factors can be applied in the encoder by subtracting each subband phase correction value from the angles of each transform bin value in that subband. This is equivalent to multiplying each complex bin value by a complex number with a magnitude of 1.0 and an angle equal to the negative of the phase correction factor.
Note that a complex number of magnitude 1, angle A is equal to cos(A) + j sin(A). This latter quantity is calculated once for each subband of each channel, with A = -phase correction for this subband, then multiplied by each bin complex signal value to realize the phase shifted bin value.
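The rotation of Step 419 amounts to one complex multiplication per bin, as the following Python sketch illustrates. The function name `rotate_bin` is hypothetical, and the sketch computes the rotation factor per bin rather than once per subband.

```python
import math


def rotate_bin(bin_value, bin_angle):
    """Rotate a complex bin by the negative of `bin_angle` (radians)."""
    y = -bin_angle                          # Step 419b
    z = complex(math.cos(y), math.sin(y))   # Step 419c: unity-magnitude factor
    return bin_value * z                    # Step 419d


# A bin at +90 degrees rotated by a 90-degree bin angle lands on the real axis.
aligned = rotate_bin(1j, math.pi / 2.0)
```

The rotation changes only the phase of the bin; its magnitude is preserved up to floating-point rounding.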
The phase shift is circular, resulting in circular convolution (as mentioned above).
While circular convolution may be benign for some continuous signals, it may create spurious spectral components for certain continuous complex signals (such as a pitch pipe) or may cause blurring of transients if different phase angles are used for different subbands. Consequently, a suitable technique to avoid circular convolution may be employed or the Transient Flag may be employed such that, for example, when the Transient Flag is True, the angle calculation results may be overridden, and all subbands in a channel may use the same phase correction factor such as zero or a randomized value.
Step 420. Downmix.
Downmix to mono by adding the corresponding complex transform bins across channels to produce a mono composite channel or downmix to multiple channels by matrixing the input channels, as for example, in the manner of the example of FIG. 6, as described below.
Comments regarding Step 420:
In the encoder, once the transform bins of all the channels have been phase shifted, the channels are summed, bin-by-bin, to create the mono composite audio signal.
Alternatively, the channels may be applied to a passive or active matrix that provides either a simple summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple channels. The matrix coefficients may be real or complex (real and imaginary).
Step 421. Normalize.
To avoid cancellation of isolated bins and over-emphasis of in-phase signals, normalize the amplitude of each bin of the mono composite channel to have substantially the same energy as the sum of the contributing energies, as follows:
a. Let x = the sum across channels of bin energies (i.e., the squares of the bin magnitudes computed in Step 403).
b. Let y = energy of corresponding bin of the mono composite channel, calculated as per Step 403.
c. Let z = scale factor = square root (x/y). If x = 0 then y is 0 and z is set to 1.
d. Limit z to a maximum value of, for example, 100. If z is initially greater than 100 (implying strong cancellation from downmixing), add an arbitrary value, for example, 0.01 * square root (x) to the real and imaginary parts of the mono composite bin, which will assure that it is large enough to be normalized by the following step.
e. Multiply the complex mono composite bin value by z.
Comments regarding Step 421:
Although it is generally desirable to use the same phase factors for both encoding and decoding, even the optimal choice of a subband phase correction value may cause one or more audible spectral components within the subband to be cancelled during the encode downmix process because the phase shifting of Step 419 is performed on a subband rather than a bin basis. In this case, a different phase factor for isolated bins in the encoder may be used if it is detected that the sum energy of such bins is much less than the energy sum of the individual channel bins at that frequency. It is generally not necessary to apply such an isolated correction factor to the decoder, inasmuch as isolated bins usually have little effect on overall image quality. A similar normalization may be applied if multiple channels rather than a mono channel are employed.
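A literal Python reading of Steps 421a through 421e for a single bin is sketched below. The function name and the input layout (one complex value per channel, already phase rotated) are hypothetical. Two details are assumptions: a mono bin that cancels exactly to zero is treated as exceeding the limit, and after the small offset is added the scale factor stays at the limit rather than being recomputed.

```python
import math


def normalize_mono_bin(channel_bins, max_scale=100.0):
    """Downmix one bin across channels and normalize its energy."""
    x = sum(abs(b) ** 2 for b in channel_bins)   # Step 421a: summed bin energies
    mono = sum(channel_bins)                     # Step 420: mono composite bin
    y = abs(mono) ** 2                           # Step 421b: composite energy
    if x == 0.0:
        return mono                              # Step 421c: z is set to 1
    z = math.sqrt(x / y) if y > 0.0 else math.inf  # Step 421c (inf: assumed for y == 0)
    if z > max_scale:                            # Step 421d: strong cancellation
        offset = 0.01 * math.sqrt(x)
        mono += complex(offset, offset)          # nudge real and imaginary parts
        z = max_scale
    return mono * z                              # Step 421e


in_phase = normalize_mono_bin([1 + 0j, 1 + 0j])    # energy restored to 2, not 4
cancelled = normalize_mono_bin([1 + 0j, -1 + 0j])  # would otherwise vanish
```

For the in-phase pair the plain sum would have energy 4; scaling by the square root of 2/4 brings it back to the summed channel energy of 2.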
Step 422. Assemble and Pack into Bitstream(s).
The Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale Factors, and Transient Flags side channel information for each channel, along with the common mono composite audio or the matrixed multiple channels, are multiplexed as may be desired and packed into one or more bitstreams suitable for the storage, transmission or storage and transmission medium or media.
Comment regarding Step 422:
The mono composite audio or the multiple channel audio may be applied to a data-rate reducing encoding process or device such as, for example, a perceptual encoder or to a perceptual encoder and an entropy coder (e.g., arithmetic or Huffman coder) (sometimes referred to as a "lossless" coder) prior to packing. Also, as mentioned above, the mono composite audio (or the multiple channel audio) and related sidechain information may be derived from multiple input channels only for audio frequencies above a certain frequency (a "coupling" frequency). In that case, the audio frequencies below the coupling frequency in each of the multiple input channels may be stored, transmitted or stored and transmitted as discrete channels or may be combined or processed in some manner other than as described herein. Discrete or otherwise-combined channels may also be applied to a data reducing encoding process or device such as, for example, a perceptual encoder or a perceptual encoder and an entropy encoder. The mono composite audio (or the multiple channel audio) and the discrete multichannel audio may all be applied to an integrated perceptual encoding or perceptual and entropy encoding process or device prior to packing.
Optional Interpolation Flag (Not shown in FIG. 4)
Interpolation across frequency of the basic phase angle shifts provided by the Subband Angle Control Parameters may be enabled in the Encoder (Step 418) and/or in the Decoder (Step 505, below). The optional Interpolation Flag sidechain parameter may be employed for enabling interpolation in the Decoder. Either the Interpolation Flag or an enabling flag similar to the Interpolation Flag may be used in the Encoder. Note that because the Encoder has access to data at the bin level, it may use different interpolation values than the Decoder, which interpolates the Subband Angle Control Parameters in the sidechain information.
The use of such interpolation across frequency in the Encoder or the Decoder may be enabled if, for example, either of the following two conditions is true:
Condition 1. If a strong, isolated spectral peak is located at or near the boundary of two subbands that have substantially different phase rotation angle assignments.
Reason: without interpolation, a large phase change at the boundary may introduce a warble in the isolated spectral component. By using interpolation to spread the band-to-band phase change across the bin values within the band, the amount of change at the subband boundaries is reduced. Thresholds for spectral peak strength, closeness to a boundary and difference in phase rotation from subband to subband to satisfy this condition may be adjusted empirically.
Condition 2. If, depending on the presence of a transient, either the interchannel phase angles (no transient) or the absolute phase angles within a channel (transient), comprise a good fit to a linear progression.
Reason: Using interpolation to reconstruct the data tends to provide a better fit to the original data. Note that the slope of the linear progression need not be constant across all frequencies, only within each subband, since angle data will still be conveyed to the decoder on a subband basis; and that forms the input to the Interpolator Step 418. The degree to which the data provides a good fit to satisfy this condition may also be determined empirically.
Other conditions, such as those determined empirically, may benefit from interpolation across frequency. The existence of the two conditions just mentioned may be determined as follows:
Condition 1. If a strong, isolated spectral peak is located at or near the boundary of two subbands that have substantially different phase rotation angle assignments:
for the Interpolation Flag to be used by the Decoder, the Subband Angle Control Parameters (output of Step 414), and for enabling of Step 418 within the Encoder, the output of Step 413 before quantization may be used to determine the rotation angle from subband to subband. For both the Interpolation Flag and for enabling within the Encoder, the magnitude output of Step 403, the current DFT magnitudes, may be used to find isolated peaks at subband boundaries.
Condition 2. If, depending on the presence of a transient, either the interchannel phase angles (no transient) or the absolute phase angles within a channel (transient), comprise a good fit to a linear progression:
if the Transient Flag is not true (no transient), use the relative interchannel bin phase angles from Step 406 for the fit to a linear progression determination, and if the Transient Flag is true (transient), use the channel's absolute phase angles from Step 403.
Decoding
The steps of a decoding process ("decoding steps") may be described as follows.
With respect to decoding steps, reference is made to FIG. 5, which is in the nature of a hybrid flowchart and functional block diagram. For simplicity, the figure shows the derivation of sidechain information components for one channel, it being understood that sidechain information components must be obtained for each channel unless the channel is a reference channel for such components, as explained elsewhere.
Step 501. Unpack and Decode Sidechain Information.
Unpack and decode (including dequantization), as necessary, the sidechain data components (Amplitude Scale Factors, Angle Control Parameters, Decorrelation Scale Factors, and Transient Flag) for each frame of each channel (one channel shown in FIG. 5). Table lookups may be used to decode the Amplitude Scale Factors, Angle Control Parameter, and Decorrelation Scale Factors.
Comment regarding Step 501: As explained above, if a reference channel is employed, the sidechain data for the reference channel may not include the Angle Control Parameters, Decorrelation Scale Factors, and Transient Flag.
Step 502. Unpack and Decode Mono Composite or Multichannel Audio Signal.
Unpack and decode, as necessary, the mono composite or multichannel audio signal information to provide DFT coefficients for each transform bin of the mono composite or multichannel audio signal.
Comment regarding Step 502:
Step 501 and Step 502 may be considered to be part of a single unpacking and decoding step. Step 502 may include a passive or active matrix.
Step 503. Distribute Angle Parameter Values Across Blocks.
Block Subband Angle Control Parameter values are derived from the dequantized frame Subband Angle Control Parameter values.
Comment regarding Step 503:
Step 503 may be implemented by distributing the same parameter value to every block in the frame.
Step 504. Distribute Subband Decorrelation Scale Factor Across Blocks.
Block Subband Decorrelation Scale Factor values are derived from the dequantized frame Subband Decorrelation Scale Factor values.
Comment regarding Step 504:
Step 504 may be implemented by distributing the same scale factor value to every block in the frame.
Step 505. Linearly Interpolate Across Frequency.
Optionally, derive bin angles from the block subband angles of decoder Step 503 by linear interpolation across frequency as described above in connection with encoder Step 418. Linear interpolation in Step 505 may be enabled when the Interpolation Flag is used and is true.
Step 506. Add Randomized Phase Angle Offset (Technique 3).
In accordance with Technique 3, described above, when the Transient Flag indicates a transient, add to the block Subband Angle Control Parameter provided by Step 503, which may have been linearly interpolated across frequency by Step 505, a randomized offset value scaled by the Decorrelation Scale Factor (the scaling may be indirect as set forth in this Step):
a. Let y = block Subband Decorrelation Scale Factor.
b. Let z = y^exp, where exp is a constant, for example = 5. z will also be in the range of 0 to 1, but skewed toward 0, reflecting a bias toward low levels of randomized variation unless the Decorrelation Scale Factor value is high.
c. Let x = a randomized number between +1.0 and -1.0, chosen separately for each subband of each block.
d. Then, the value added to the block Subband Angle Control Parameter to add a randomized angle offset value according to Technique 3 is x * pi * z.
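Steps 506a through 506d reduce to a few lines, sketched here in Python. The function name and the use of Python's `random.Random` as the source of randomized numbers are assumptions for illustration; the exponent of 5 is the example constant from Step 506b.

```python
import math
import random


def technique3_offset(decorrelation_scale_factor, rng, exponent=5.0):
    """Randomized angle offset (radians) for one subband of one block."""
    y = decorrelation_scale_factor   # Step 506a, in the range 0..1
    z = y ** exponent                # Step 506b: skewed toward 0
    x = rng.uniform(-1.0, 1.0)       # Step 506c: one draw per subband per block
    return x * math.pi * z           # Step 506d


rng = random.Random(0)  # seeded pseudo-random generator (illustrative choice)
offset = technique3_offset(0.5, rng)
```

At a scale factor of 0.5 the exponent shrinks z to 0.5^5 = 0.03125, so the offset is confined to about ±5.6 degrees; only scale factors near 1 open up the full ±π range.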
Comments regarding Step 506:
As will be appreciated by those of ordinary skill in the art, "randomized" angles (or "randomized" amplitudes if amplitudes are also scaled) for scaling by the Decorrelation Scale Factor may include not only pseudo-random and truly random variations, but also deterministically-generated variations that, when applied to phase angles or to phase angles and to amplitudes, have the effect of reducing cross-correlation between channels.
Such "randomized" variations may be obtained in many ways. For example, a pseudo-random number generator with various seed values may be employed.
Alternatively, truly random numbers may be generated using a hardware random number generator.
Inasmuch as a randomized angle resolution of only about 1 degree may be sufficient, tables of randomized numbers having two or three decimal places (e.g. 0.84 or 0.844) may be employed. Preferably, the randomized values (between -1.0 and +1.0 with reference to Step 506c, above) are uniformly distributed statistically across each channel.
Although the non-linear indirect scaling of Step 506 has been found to be useful, it is not critical and other suitable scalings may be employed; in particular, other values for the exponent may be employed to obtain similar results.
When the Subband Decorrelation Scale Factor value is 1, a full range of random angles from -π to +π are added (in which case the block Subband Angle Control Parameter values produced by Step 503 are rendered irrelevant). As the Subband Decorrelation Scale Factor value decreases toward zero, the randomized angle offset also decreases toward zero, causing the output of Step 506 to move toward the Subband Angle Control Parameter values produced by Step 503.
If desired, the encoder described above may also add a scaled randomized offset in accordance with Technique 3 to the angle shift applied to a channel before downmixing. Doing so may improve alias cancellation in the decoder. It may also be beneficial for improving the synchronicity of the encoder and decoder.
Step 507. Add Randomized Phase Angle Offset (Technique 2).
In accordance with Technique 2, described above, when the Transient Flag does not indicate a transient, for each bin, add to all the block Subband Angle Control Parameters in a frame provided by Step 503 (Step 505 operates only when the Transient Flag indicates a transient) a different randomized offset value scaled by the Decorrelation Scale Factor (the scaling may be direct as set forth herein in this step):
a. Let y = block Subband Decorrelation Scale Factor.
b. Let x = a randomized number between +1.0 and -1.0, chosen separately for each bin of each frame.
c. Then, the value added to the block bin Angle Control Parameter to add a randomized angle offset value according to Technique 2 is x * pi * y.
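The direct scaling of sub-steps a through c may be sketched as follows. The names `make_bin_randoms` and `technique2_offsets` and the `bin_to_subband` lookup are hypothetical. The sketch draws the per-bin random values once and reuses them, which follows the preference, stated in the comments regarding Step 507, that the value for each bin not change with time; it is one possible reading, not a definitive implementation.

```python
import math
import random

def make_bin_randoms(num_bins, seed=0):
    """Draw one value in [-1, 1] per bin, once, so the values stay fixed over time.

    The seed parameter is an assumption; each channel would use its own values.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_bins)]

def technique2_offsets(bin_randoms, bin_to_subband, decorr_scale):
    """Per-bin angle offsets in radians: x * pi * y.

    y is the Subband Decorrelation Scale Factor of the subband containing the bin,
    so all bins of a subband share the same direct scaling.
    """
    return [x * math.pi * decorr_scale[bin_to_subband[b]]
            for b, x in enumerate(bin_randoms)]
```

Only `decorr_scale` needs to be refreshed at the frame rate; the table returned by `make_bin_randoms` may be kept for the life of the channel.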
Comments regarding Step 507:
See comments above regarding Step 506 regarding the randomized angle offset.
Although the direct scaling of Step 507 has been found to be useful, it is not critical and other suitable scalings may be employed.
To minimize temporal discontinuities, the unique randomized angle value for each bin of each channel preferably does not change with time. The randomized angle values of all the bins in a subband are scaled by the same Subband Decorrelation Scale Factor value, which is updated at the frame rate. Thus, when the Subband Decorrelation Scale Factor value is 1, a full range of random angles from -π to +π are added (in which case block subband angle values derived from the dequantized frame subband angle values are rendered irrelevant). As the Subband Decorrelation Scale Factor value diminishes toward zero, the randomized angle offset also diminishes toward zero. Unlike Step 506, the scaling in this Step 507 may be a direct function of the Subband Decorrelation Scale Factor value. For example, a Subband Decorrelation Scale Factor value of 0.5 proportionally reduces every random angle variation by 0.5.
The scaled randomized angle value may then be added to the bin angle from decoder Step 506. The Decorrelation Scale Factor value is updated once per frame. In the presence of a Transient Flag for the frame, this step is skipped, to avoid transient prenoise artifacts.
If desired, the encoder described above may also add a scaled randomized offset in accordance with Technique 2 to the angle shift applied before downmixing.
Doing so may improve alias cancellation in the decoder. It may also be beneficial for improving the synchronicity of the encoder and decoder.
Step 508. Normalize Amplitude Scale Factors.
Normalize Amplitude Scale Factors across channels so that they sum-square to 1.
Comment regarding Step 508:
For example, if two channels have dequantized scale factors of -3.0 dB (= 2 * granularity of 1.5 dB) (.70795), the sum of the squares is 1.002. Dividing each by the square root of 1.002 = 1.001 yields two values of .7072 (-3.01 dB).
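The normalization and the two-channel example may be reproduced with the short sketch below. The function name `normalize_scale_factors` and the handling of an all-zero input are assumptions made for illustration.

```python
import math

def normalize_scale_factors(scale_factors):
    """Scale per-channel amplitude factors so that their squares sum to 1."""
    root = math.sqrt(sum(a * a for a in scale_factors))
    if root == 0.0:
        return list(scale_factors)  # assumed handling: nothing to normalize
    return [a / root for a in scale_factors]

# Two channels at -3.0 dB, expressed as linear amplitudes (about 0.70795 each).
a = 10 ** (-3.0 / 20)
normalized = normalize_scale_factors([a, a])
```

With full floating-point precision each normalized value is about 0.7071; the value .7072 in the comment follows from rounding the divisor to 1.001.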
Step 509. Boost Subband Scale Factor Levels (Optional).
Optionally, when the Transient Flag indicates no transient, apply a slight additional boost to Subband Scale Factor levels, dependent on Subband Decorrelation Scale Factor levels: multiply each normalized Subband Amplitude Scale Factor by a small factor (e.g., 1 + 0.2 * Subband Decorrelation Scale Factor). When the Transient Flag is True, skip this step.
Comment regarding Step 509:
This step may be useful because the decoder decorrelation Step 507 may result in slightly reduced levels in the final inverse filterbank process.
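The optional boost of Step 509 may be sketched as follows. The function name `boost_scale_factor` and the parameter `k` are hypothetical; the default of 0.2 is assumed to be the example multiplier given in the Step.

```python
def boost_scale_factor(amplitude, decorr_scale, transient, k=0.2):
    """Apply the optional Step 509 boost to one normalized amplitude scale factor.

    k is an assumed example value; the step is skipped when a transient is flagged.
    """
    if transient:
        return amplitude
    return amplitude * (1.0 + k * decorr_scale)
```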
Step 510. Distribute Subband Amplitude Values Across Bins.
Step 510 may be implemented by distributing the same subband amplitude scale factor value to every bin in the subband.
Step 510a. Add Randomized Amplitude Offset (Optional)
Optionally, apply a randomized variation to the normalized Subband Amplitude Scale Factor dependent on Subband Decorrelation Scale Factor levels and the Transient Flag. In the absence of a transient, add a Randomized Amplitude Scale Factor that does not change with time on a bin-by-bin basis (different from bin to bin), and, in the presence of a transient (in the frame or block), add a Randomized Amplitude Scale Factor that changes on a block-by-block basis (different from block to block) and changes from subband to subband (the same shift for all bins in a subband; different from subband to subband). Step 510a is not shown in the drawings.
Comment regarding Step 510a:
Although the degree to which randomized amplitude shifts are added may be controlled by the Decorrelation Scale Factor, it is believed that a particular scale factor value should cause less amplitude shift than the corresponding randomized phase shift resulting from the same scale factor value in order to avoid audible artifacts.
Step 511. Upmix.
a. For each bin of each output channel, construct a complex upmix scale factor from the amplitude of decoder Step 508 and the bin angle of decoder Step 507: amplitude * (cos(angle) + j sin(angle)).
b. For each output channel, multiply the complex bin value and the complex upmix scale factor to produce the upmixed complex output bin value of each bin of the channel.
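Sub-steps a and b amount to a per-bin complex multiplication, which may be sketched as follows. The function name `upmix_bins` and the list-based layout (one amplitude and one angle per bin for a single output channel) are assumptions made for illustration.

```python
import cmath

def upmix_bins(downmix_bins, amplitudes, angles):
    """Produce the complex output bins of one output channel.

    downmix_bins: complex bin values of the received composite channel.
    amplitudes, angles: per-bin amplitude scale factors and angles (radians)
    for this output channel.
    """
    out = []
    for value, amp, ang in zip(downmix_bins, amplitudes, angles):
        scale = cmath.rect(amp, ang)  # amp * (cos(ang) + j sin(ang)), sub-step a
        out.append(value * scale)     # sub-step b
    return out
```

For instance, a bin value of 1 + 0j with an amplitude of 0.5 and an angle of pi/2 gives an output bin close to 0.5j.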
Step 512. Perform Inverse DFT (Optional).
Optionally, perform an inverse DFT transform on the bins of each output channel to yield multichannel output PCM values. As is well known, in connection with such an inverse DFT transformation, the individual blocks of time samples are windowed, and adjacent blocks are overlapped and added together in order to reconstruct the final continuous time output PCM audio signal.
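The windowing and overlap-add described in Step 512 may be sketched as follows. The 50% overlap, the caller-supplied synthesis window, and the names `idft` and `overlap_add` are assumptions made for illustration; the Step itself does not fix a window shape or hop size. The direct-form inverse DFT is used only to keep the sketch self-contained.

```python
import cmath

def idft(bins):
    """Direct-form inverse DFT returning real time samples (sketch only, O(N^2))."""
    n = len(bins)
    return [sum(b * cmath.exp(2j * cmath.pi * k * t / n)
                for k, b in enumerate(bins)).real / n
            for t in range(n)]

def overlap_add(blocks_of_bins, window):
    """Inverse-transform each block, window it, and overlap-add at 50% overlap."""
    n = len(window)
    hop = n // 2  # assumed hop size
    out = [0.0] * (hop * (len(blocks_of_bins) + 1))
    for i, bins in enumerate(blocks_of_bins):
        for t, sample in enumerate(idft(bins)):
            out[i * hop + t] += window[t] * sample
    return out
```

If the same sine window is applied at analysis and synthesis, the squared window values of adjacent blocks sum to 1, so the fully overlapped part of the signal is reconstructed.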
Comments regarding Step 512:
A decoder according to the present invention may not provide PCM outputs. In the case where the decoder process is employed only above a given coupling frequency, and discrete MDCT coefficients are sent for each channel below that frequency, it may be desirable to convert the DFT coefficients derived by the decoder upmixing Steps 511a and 511b to MDCT coefficients, so that they can be combined with the lower frequency discrete MDCT coefficients and requantized in order to provide, for example, a bitstream compatible with an encoding system that has a large number of installed users, such as a standard AC-3 SP/DIF bitstream for application to an external device where an inverse transform may be performed. An inverse DFT transform may be applied to ones of the output channels to provide PCM outputs.
Section 8.2.2 of the A/52A Document With Sensitivity Factor "F" Added
8.2.2. Transient detection
Transients are detected in the full-bandwidth channels in order to decide when to switch to short length audio blocks to improve pre-echo performance. High-pass filtered versions of the signals are examined for an increase in energy from one sub-block time-segment to the next. Sub-blocks are examined at different time scales. If a transient is detected in the second half of an audio block in a channel, that channel switches to a short block. A channel that is block-switched uses the D45 exponent strategy [i.e., the data has a coarser frequency resolution in order to reduce the data overhead resulting from the increase in temporal resolution].
The transient detector is used to determine when to switch from a long transform block (length 512), to the short block (length 256). It operates on 512 samples for every audio block. This is done in two passes, with each pass processing 256 samples. Transient detection is broken down into four steps: 1) high-pass filtering, 2) segmentation of the block into submultiples, 3) peak amplitude detection within each sub-block segment, and 4) threshold comparison. The transient detector outputs a flag blksw[n] for each full-bandwidth channel, which when set to "one" indicates the presence of a transient in the second half of the 512 length input block for the corresponding channel.
1) High-pass filtering: The high-pass filter is implemented as a cascaded biquad direct form II IIR filter with a cutoff of 8 kHz.
2) Block Segmentation: The block of 256 high-pass filtered samples is segmented into a hierarchical tree of levels in which level 1 represents the 256 length block, level 2 is two segments of length 128, and level 3 is four segments of length 64.
3) Peak Detection: The sample with the largest magnitude is identified for each segment on every level of the hierarchical tree. The peaks for a single level are found as follows:
P[j][k] = max(x(n))
for n = (512 * (k-1) / 2^j), (512 * (k-1) / 2^j) + 1, ..., (512 * k / 2^j) - 1
and k = 1, ..., 2^(j-1);
where: x(n) = the nth sample in the 256 length block
j = 1, 2, 3 is the hierarchical level number
k = the segment number within level j
Note that P[j][0] (i.e., k = 0) is defined to be the peak of the last segment on level j of the tree calculated immediately prior to the current tree. For example, P[3][4] in the preceding tree is P[3][0] in the current tree.
4) Threshold Comparison: The first stage of the threshold comparator checks to see if there is significant signal level in the current block. This is done by comparing the overall peak value P[1][1] of the current block to a "silence threshold". If P[1][1] is below this threshold then a long block is forced. The silence threshold value is 100/32768. The next stage of the comparator checks the relative peak levels of adjacent segments on each level of the hierarchical tree. If the peak ratio of any two adjacent segments on a particular level exceeds a pre-defined threshold for that level, then a flag is set to indicate the presence of a transient in the current 256-length block. The ratios are compared as follows:
mag(P[j][k]) x T[j] > (F * mag(P[j][(k-1)])) [Note the "F" sensitivity factor]
where: T[j] is the pre-defined threshold for level j, defined as:
T[1] = .1
T[2] = .075
T[3] = .05
If this inequality is true for any two segment peaks on any level, then a transient is indicated for the first half of the 512 length input block.
The second pass through this process determines the presence of transients in the second half of the 512 length input block.
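Steps 2) through 4) of the transient detector may be sketched for one 256-sample pass as follows. The input block is assumed to be already high-pass filtered (step 1 is omitted), the names `segment_peaks` and `detect_transient` are hypothetical, and the level-1 threshold of 0.1 is assumed to be the value given in the A/52 document. The peaks of the preceding tree are passed in so that P[j][0] is available.

```python
def segment_peaks(block):
    """Peak magnitudes for levels j = 1, 2, 3 of a 256-sample block.

    peaks[j][k - 1] corresponds to P[j][k] for k = 1 .. 2^(j-1).
    """
    assert len(block) == 256
    peaks = {}
    for j in (1, 2, 3):
        n_seg = 2 ** (j - 1)
        seg_len = 256 // n_seg
        peaks[j] = [max(abs(s) for s in block[k * seg_len:(k + 1) * seg_len])
                    for k in range(n_seg)]
    return peaks

def detect_transient(block, prev_peaks, F=1.0,
                     T=(0.1, 0.075, 0.05), silence=100 / 32768):
    """Return (transient_flag, peaks) for one high-pass filtered 256-sample pass.

    prev_peaks: result of segment_peaks for the immediately preceding pass.
    F: sensitivity factor; larger values make detection less sensitive.
    T: per-level thresholds; T[0] = 0.1 is an assumed value for level 1.
    """
    peaks = segment_peaks(block)
    if peaks[1][0] < silence:
        return False, peaks  # below the silence threshold: long block forced
    for j in (1, 2, 3):
        # Prepend P[j][0], the last segment peak of the preceding tree.
        seq = [prev_peaks[j][-1]] + peaks[j]
        for k in range(1, len(seq)):
            if seq[k] * T[j - 1] > F * seq[k - 1]:
                return True, peaks
    return False, peaks
```

The returned peaks may be fed back as `prev_peaks` for the second pass over the other half of the 512-sample block.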
N:M Encoding
Aspects of the present invention are not limited to N:1 encoding as described in connection with FIG. 1. More generally, aspects of the invention are applicable to the transformation of any number of input channels (n input channels) to any number of output channels (m output channels) in the manner of FIG. 6 (i.e., N:M encoding).
Because in many common applications the number of input channels n is greater than the number of output channels m, the N:M encoding arrangement of FIG. 6 will be referred to as "downmixing" for convenience in description.
Referring to the details of FIG. 6, instead of summing the outputs of Rotate Angle 8 and Rotate Angle 10 in the Additive Combiner 6 as in the arrangement of FIG. 1, those outputs may be applied to a downmix matrix device or function 6' ("Downmix Matrix").
Downmix Matrix 6' may be a passive or active matrix that provides either a simple summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple channels. The matrix coefficients may be real or complex (real and imaginary). Other devices and functions in FIG. 6 may be the same as in the FIG. 1 arrangement and they bear the same reference numerals.
Downmix Matrix 6' may provide a hybrid frequency-dependent function such that it provides, for example, m_f1-f2 channels in a frequency range f1 to f2 and m_f2-f3 channels in a frequency range f2 to f3. For example, below a coupling frequency of, for example, 1000 Hz, the Downmix Matrix 6' may provide two channels and above the coupling frequency the Downmix Matrix 6' may provide one channel. By employing two channels below the coupling frequency, better spatial fidelity may be obtained, especially if the two channels represent horizontal directions (to match the horizontality of the human ears).
Although FIG. 6 shows the generation of the same sidechain information for each channel as in the FIG. 1 arrangement, it may be possible to omit certain ones of the sidechain information when more than one channel is provided by the output of the Downmix Matrix 6'. In some cases, acceptable results may be obtained when only the amplitude scale factor sidechain information is provided by the FIG. 6 arrangement.
Further details regarding sidechain options are discussed below in connection with the descriptions of FIGS. 7, 8 and 9.
As just mentioned above, the multiple channels generated by the Downmix Matrix 6' need not be fewer than the number of input channels n. When the purpose of an encoder such as in FIG. 6 is to reduce the number of bits for transmission or storage, it is likely that the number of channels produced by Downmix Matrix 6' will be fewer than the number of input channels n. However, the arrangement of FIG. 6 may also be used as an "upmixer." In that case, there may be applications in which the number of channels m produced by the Downmix Matrix 6' is more than the number of input channels n.
Encoders as described in connection with the examples of FIGS. 2, 5 and 6 may also include their own local decoder or decoding function in order to determine if the audio information and the sidechain information, when decoded by such a decoder, would provide suitable results. The results of such a determination could be used to improve the parameters by employing, for example, a recursive process. In a block encoding and decoding system, recursion calculations could be performed, for example, on every block before the next block ends in order to minimize the delay in transmitting a block of audio information and its associated spatial parameters.
An arrangement in which the encoder also includes its own decoder or decoding function could also be employed advantageously when spatial parameters are not stored or sent only for certain blocks. If unsuitable decoding would result from not sending spatial-parameter sidechain information, such sidechain information would be sent for the particular block. In this case, the decoder may be a modification of the decoder or decoding function of FIGS. 2, 5 or 6 in that the decoder would have both the ability to recover spatial-parameter sidechain information for frequencies above the coupling frequency from the incoming bitstream but also to generate simulated spatial-parameter sidechain information from the stereo information below the coupling frequency.
In a simplified alternative to such local-decoder-incorporating encoder examples, rather than having a local decoder or decoder function, the encoder could simply check to determine if there were any signal content below the coupling frequency (determined in any suitable way, for example, a sum of the energy in frequency bins through the frequency range), and, if not, it would send or store spatial-parameter sidechain information rather than not doing so if the energy were above the threshold.
Depending on the encoding scheme, low signal information below the coupling frequency may also result in more bits being available for sending sidechain information.
M:N Decoding
A more generalized form of the arrangement of FIG. 2 is shown in FIG. 7, wherein an upmix matrix function or device ("Upmix Matrix") 20 receives the 1 to m channels generated by the arrangement of FIG. 6. The Upmix Matrix 20 may be a passive matrix. It may be, but need not be, the conjugate transposition (i.e., the complement) of the Downmix Matrix 6' of the FIG. 6 arrangement. Alternatively, the Upmix Matrix 20 may be an active matrix: a variable matrix or a passive matrix in combination with a variable matrix. If an active matrix decoder is employed, in its relaxed or quiescent state it may be the complex conjugate of the Downmix Matrix or it may be independent of the Downmix Matrix. The sidechain information may be applied as shown in FIG. 7 so as to control the Adjust Amplitude, Rotate Angle, and (optional) Interpolator functions or devices. In that case, the Upmix Matrix, if an active matrix, operates independently of the sidechain information and responds only to the channels applied to it. Alternatively, some or all of the sidechain information may be applied to the active matrix to assist its operation. In that case, some or all of the Adjust Amplitude, Rotate Angle, and Interpolator functions or devices may be omitted. The Decoder example of FIG. 7 may also employ the alternative of applying a degree of randomized amplitude variations under certain signal conditions, as described above in connection with FIGS. 2 and 5.
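The passive case in which the Upmix Matrix is the conjugate transposition of the Downmix Matrix may be illustrated by the sketch below. The helper names `conjugate_transpose` and `apply_matrix` and the example coefficients are hypothetical; in practice the matrices could differ from bin to bin or subband to subband.

```python
def conjugate_transpose(matrix):
    """Return the conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    return [[complex(matrix[r][c]).conjugate() for r in range(rows)]
            for c in range(cols)]

def apply_matrix(matrix, channels):
    """Multiply a matrix by a vector of complex bin values, one per input channel."""
    return [sum(coef * ch for coef, ch in zip(row, channels)) for row in matrix]

# Hypothetical 2-input, 1-output downmix with a 90-degree rotation on channel 2.
downmix = [[0.7071, 0.7071j]]
upmix = conjugate_transpose(downmix)  # 2 rows, 1 column
```

Applying `downmix` to two input bins and then `upmix` to the result yields two output bins whose phase rotations undo those applied in the downmix.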
When Upmix Matrix 20 is an active matrix, the arrangement of FIG. 7 may be characterized as a "hybrid matrix decoder" for operating in a "hybrid matrix encoder/decoder system." "Hybrid" in this context refers to the fact that the decoder may derive some measure of control information from its input audio signal (i.e., the active matrix responds to spatial information encoded in the channels applied to it) and a further measure of control information from spatial-parameter sidechain information. Other elements of FIG. 7 are as in the arrangement of FIG. 2 and bear the same reference numerals.
Suitable active matrix decoders for use in a hybrid matrix decoder may include active matrix decoders such as those mentioned above, including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II" decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation).
Alternative Decorrelation
FIGS. 8 and 9 show variations on the generalized Decoder of FIG. 7. In particular, both the arrangement of FIG. 8 and the arrangement of FIG. 9 show alternatives to the decorrelation technique of FIGS. 2 and 7. In FIG. 8, respective decorrelator functions or devices ("Decorrelators") 46 and 48 are in the time domain, each following the respective Inverse Filterbank 30 and 36 in their channel. In FIG. 9, respective decorrelator functions or devices ("Decorrelators") 50 and 52 are in the frequency domain, each preceding the respective Inverse Filterbank 30 and 36 in their channel. In both the FIG. 8 and FIG. 9 arrangements, each of the Decorrelators (46, 48, 50, 52) has a unique characteristic so that their outputs are mutually decorrelated with respect to each other. The Decorrelation Scale Factor may be used to control, for example, the ratio of decorrelated to correlated signal provided in each channel.
Optionally, the Transient Flag may also be used to shift the mode of operation of the Decorrelator, as is explained below. In both the FIG. 8 and FIG. 9 arrangements, each Decorrelator may be a Schroeder-type reverberator having its own unique filter characteristic, in which the amount or degree of reverberation is controlled by the decorrelation scale factor (implemented, for example, by controlling the degree to which the Decorrelator output forms a part of a linear combination of the Decorrelator input and output). Alternatively, other controllable decorrelation techniques may be employed either alone or in combination with each other or with a Schroeder-type reverberator.
Schroeder-type reverberators are well known and may trace their origin to two journal papers: "'Colorless' Artificial Reverberation" by M.R. Schroeder and B.F. Logan, IRE Transactions on Audio, vol. AU-9, pp. 209-214, 1961 and "Natural Sounding Artificial Reverberation" by M.R. Schroeder, Journal A.E.S., July 1962, vol. 10, no. 2, pp. 219-223.
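A time-domain Decorrelator of this kind may be sketched with a single Schroeder allpass section whose output is blended with its input under control of the Decorrelation Scale Factor. The single-section structure, the linear crossfade, and the names `schroeder_allpass` and `decorrelate` are assumptions made for illustration; a practical reverberator would typically cascade several sections, and each channel would use its own delay and gain so that the outputs are mutually decorrelated.

```python
def schroeder_allpass(x, delay, gain):
    """One Schroeder allpass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D]."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-gain * xn + x_d + gain * y_d)
    return y

def decorrelate(x, delay, gain, decorr_scale):
    """Blend the input with the allpass output.

    decorr_scale in 0..1 sets how much of the Decorrelator output is present in the
    linear combination (0 = input only, 1 = allpass output only); the crossfade law
    is an assumed choice.
    """
    wet = schroeder_allpass(x, delay, gain)
    return [(1.0 - decorr_scale) * d + decorr_scale * w for d, w in zip(x, wet)]
```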
When the Decorrelators 46 and 48 operate in the time domain, as in the FIG. 8 arrangement, a single (i.e., wideband) Decorrelation Scale Factor is required. This may be obtained in any of several ways. For example, only a single Decorrelation Scale Factor may be generated in the encoder of FIG. 1 or FIG. 7. Alternatively, if the encoder of FIG. 1 or FIG. 7 generates Decorrelation Scale Factors on a subband basis, the Subband Decorrelation Scale Factors may be amplitude or power summed in the encoder of FIG. 1 or FIG. 7 or in the decoder of FIG. 8.
When the Decorrelators 50 and 52 operate in the frequency domain, as in the FIG. 9 arrangement, they may receive a decorrelation scale factor for each subband or groups of subbands and, concomitantly, provide a commensurate degree of decorrelation for such subbands or groups of subbands.
The Decorrelators 46 and 48 of FIG. 8 and the Decorrelators 50 and 52 of FIG. 9 may optionally receive the Transient Flag. In the time-domain Decorrelators of FIG. 8, the Transient Flag may be employed to shift the mode of operation of the respective Decorrelator. For example, the Decorrelator may operate as a Schroeder-type reverberator in the absence of the transient flag but upon its receipt and for a short subsequent time period, say 1 to 10 milliseconds, operate as a fixed delay.
Each channel may have a predetermined fixed delay or the delay may be varied in response to a plurality of transients within a short time period. In the frequency-domain Decorrelators of FIG. 9, the transient flag may also be employed to shift the mode of operation of the respective Decorrelator. However, in this case, the receipt of a transient flag may, for example, trigger a short (several milliseconds) increase in amplitude in the channel in which the flag occurred.
In both the FIG. 8 and 9 arrangements, an Interpolator 27 (33), controlled by the optional Transient Flag, may provide interpolation across frequency of the phase angles output of Rotate Angle 28 (33) in a manner as described above.
As mentioned above, when two or more channels are sent in addition to sidechain information, it may be acceptable to reduce the number of sidechain parameters. For example, it may be acceptable to send only the Amplitude Scale Factor, in which case the decorrelation and angle devices or functions in the decoder may be omitted (in that case, FIGS. 7, 8 and 9 reduce to the same arrangement).
Alternatively, only the amplitude scale factor, the Decorrelation Scale Factor, and, optionally, the Transient Flag may be sent. In that case, any of the FIG. 7, 8 or 9 arrangements may be employed (omitting the Rotate Angle 28 and 34 in each of them).
As another alternative, only the amplitude scale factor and the angle control parameter may be sent. In that case, any of the FIG. 7, 8 or 9 arrangements may be employed (omitting the Decorrelator 38 and 42 of FIG. 7 and 46, 48, 50, 52 of FIGS. 8 and 9).
As in FIGS. 1 and 2, the arrangements of FIGS. 6-9 are intended to show any number of input and output channels although, for simplicity in presentation, only two channels are shown.
It should be understood that implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited by these specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true scope of the basic underlying principles disclosed herein.
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
=
= .
. .
=
=
=
= . =
=
=
. .
=
=
each segment on every level of the hierarchical tree. The pealcs for a single level are found as follows:
POP] = max(x(31)) , =
=
_ = WO 2005/086139 .. RCT/US2005/004. .. tr';' =
= and I, ..., r(j71) ; = == =
where: x(n) = the nth sample in the 256 length -block j = 1, 2, 3 is the hierarchical level number k = the segment number within level j Note that P[j][0], (i.e., k---0) is defied to be the peak of the last segment on level j of the tree calculated immediately prior to the current tree. For example, P[3][4] in the preceding tree is P[3][0]' in the current tree.
= 4) Threshold Comparison: The first stage of the threshold comparator checks to see if there is significant signal level in the current block. This is clone by comparing the overall Peak Value P[1ll] of the current block to a "silence threshold". If MEI] is below' this threshold then a long block is forced. The Silence threshold value is 100/32768. The next stage of the comparator checks the relative peak levels of adjacent segments on each level of the hierarchical tree. If the peak =
ratio of any two adjacent segments on apartiCular level exceeds a pre-defined threshold for that level, then a flag is set to indicate the presence of a transient in the current 256-length block. The ratios are compared as follows:
;. mag(P[j][k]) x T[j] > (F * inag(P[j][(k-1)])) [Note the "r sensitivity = factor]
where: T[j] is the pre-defined threshold for level j, defined as:
T[1]
= T[2] = .075 . T[3] ---- .05 = If this inequality is true for any two segment Peaks on any level, =
then a transient is indicated for, the first half of the 512 length; input block.
The second pass through this process detetmines the presence of transients ' in the second half of the 512 length input block.
Nall Encoding-.
Aspects of the present invention are not liinitectto N:1 encoding as described in connection with FIG. 1. More generally, aspects of the invention are applicable to the transformation of any umber of input channels (n input -channels) to any number of . . =
=
. , = 32005/086139 ontput channels (m output channels) in the manner of FIG. 6 (i.e., N:M
encoding).
Because in many common applications the number of input channels n is greater than the number of output channels in, the N:M encoding arrangeMent of FIG. 6 will be referred to as "downmixine for convenience in description.
Referring to the details of FIG. 6, instead of summing the outputs of Rotate Angle and Rotate Angle 10 in the Additive Combiner 6 as in the arrangement of FIG.
1, those outputs may be applied to a dovmmix matrix device or function 6' ("Downmix Matrix").
Downinix Matrix 6' may be a passive or active matrix that provides either a simple summation to one channel, as in the N:1 encoding of FIG. 1, or to multiple 'channels. The matrix coefficients may be real or complex (real and iniaginary). Other devices and functions in FIG. 6 may be the same as in the FIG. 1 arrangement and they bear the same reference numerals.
Downmix Matrix 6' may provide a hybrid frequency-dependent function such that it provides, for example, channels in a frequency range fl to f2 and 131s43 channels in a frequency range la to 3. For example, below &coupling frequency 4 for example, 1000 Hz the Downmix Matrix 6' may provide two channels and above the coupling frequency the Downmix Matrix 6' may provide one channel. By employing two channels below the coupling frequency, better spatial fidelity may be obtained, especially if the two channels represent horizontal directions (to match the horizontality of the human ears).
Although FTG. 6 shows the generation of the same siderbain information for each channel as in the FIG. 1 arrangement, it may be possible to omit certain ones of the sidechsin information when more than one channel is provided by the output of the Downmix Matrix 6'. In some cases, acceptable results may be obtained when only the amplitude scale factor sidechain information is provided by the FIG. 6 arrangement.
Further details regarding sidechain options are discussed below in connection with the descriptions of FIGS. 7,8 and 9.
As just mentioned above, the multiple channels generated by the Downmix Matrix 6' need not be fewer than the number of input channels n. When the purpose of an encoder such as in FIG. 6 is to reduce the number of bits for transmission or storage, it is.
likely that the number of channels produced by downmix matrix 6' will be fewer than the number of input channels a However, the arrangement of FIG. 6 may also be used as an =
=
=
_ =
WO 2005/086139 PCT/U52005/006 ' "upraixer." In that case, there may be applications in which the number of channels m produced by the Downmix Matrix 6' is more than the number of input channels 11.
Encoders as described in connection with the examples of FIGS. 2; 5 and 6 may also include their cnyoi local decoder or decoding function in order to determine if the = 5 audio information and the sidechain information, when decoded by such a decoder, would provide suitable results. The results of such a determination could be used.to improve the parameters by employing, for example, a recursive process. In a block encoding and decoding system, recursion calculation.s could be performed, for example, on every block before the next block ends in order to m1nimi7e the delay in transmitting a block of andio information and its associated spatial parameters.
. An arrangement in which the encoder also includes its own decoder or decoding function could also be employed advantageously when. spatial parameters are not stored or sent only for certain blocks. If tinsuitable decoding would result from not sending -spatial-parameter sidechain information, such sidechain information would be sent for the particular block. In this case, the decoder may be a modification of the decoder or =
decoding function of FIGS. 2, 5 or 6 in that the decoder would have both the ability to recover spatial-parameter sidechain information for frequencies above the coupling *frequency from the incoming bitstream but also to generate simulated spatial-parameter sidechain information from the stereo information below the coupling frequency.
In a simplified alternative to such local-decoder-incorporating encoder examples, rather than having a local decoder or decoder function, the encoder could simply check whether there is any signal content below the coupling frequency (determined in any suitable way, for example, by summing the energy in the frequency bins throughout that frequency range). If there is not, that is, if the energy is below a threshold, the encoder would send or store spatial-parameter sidechain information; if the energy is above the threshold, it would not.
Depending on the encoding scheme, low signal information below the coupling frequency may also result in more bits being available for sending sidechain information.
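By way of illustration only, the simplified energy check described above might look like the following sketch. The function name `should_send_sidechain`, the `coupling_bin` index, and the `threshold` value are assumptions made for the example; the description leaves the exact energy measure and threshold open.

```python
def should_send_sidechain(bins, coupling_bin, threshold):
    """Decide whether to send spatial-parameter sidechain information.

    bins         -- complex spectral coefficients for one block
    coupling_bin -- hypothetical index of the first bin at or above
                    the coupling frequency
    threshold    -- hypothetical energy threshold
    """
    # Sum the energy in the frequency bins below the coupling frequency.
    low_band_energy = sum(abs(x) ** 2 for x in bins[:coupling_bin])
    # Little or no low-band content: send the sidechain information.
    return low_band_energy < threshold


quiet_block = [0.01 + 0.0j, 0.0 + 0.02j, 1.0 + 0.0j, 0.5 + 0.5j]
loud_block = [1.0 + 0.0j, 0.0 + 2.0j, 1.0 + 0.0j, 0.5 + 0.5j]
print(should_send_sidechain(quiet_block, coupling_bin=2, threshold=0.1))
print(should_send_sidechain(loud_block, coupling_bin=2, threshold=0.1))
```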
Decoding
A more generalized form of the arrangement of FIG. 2 is shown in FIG. 7, wherein an upmix matrix function or device ("Upmix Matrix") 20 receives the 1 to m channels generated by the arrangement of FIG. 6. The Upmix Matrix 20 may be a passive matrix. It may be, but need not be, the conjugate transposition (i.e., the complement) of the Downmix Matrix 6' of the FIG. 6 arrangement. Alternatively, the Upmix Matrix 20 may be an active matrix, that is, a variable matrix or a passive matrix in combination with a variable matrix. If an active matrix decoder is employed, in its relaxed or quiescent state it may be the complex conjugate of the Downmix Matrix or it may be independent of the Downmix Matrix. The sidechain information may be applied as shown in FIG. 7 so as to control the Adjust Amplitude, Rotate Angle, and (optional) Interpolator functions or devices. In that case, the Upmix Matrix, if an active matrix, operates independently of the sidechain information and responds only to the channels applied to it. Alternatively, some or all of the sidechain information may be applied to the active matrix to assist its operation. In that case, some or all of the Adjust Amplitude, Rotate Angle, and Interpolator functions or devices may be omitted. The Decoder example of FIG. 7 may also employ the alternative of applying a degree of randomized amplitude variations under certain signal conditions, as described above in connection with FIGS. 2 and 5.
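By way of illustration only, a passive upmix matrix formed as the conjugate transposition of a downmix matrix might be sketched as follows for a single spectral bin. The 2-to-1 downmix coefficients and the helper names `conjugate_transpose` and `apply_matrix` are hypothetical choices for the example, not values taken from FIG. 6 or FIG. 7.

```python
def conjugate_transpose(matrix):
    """Return the conjugate transposition of a matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[r][c].conjugate() for r in range(rows)]
            for c in range(cols)]


def apply_matrix(matrix, channels):
    """Multiply an (outputs x inputs) matrix by per-bin channel values."""
    return [sum(coef * x for coef, x in zip(row, channels))
            for row in matrix]


s = 2 ** -0.5
# Hypothetical downmix: n = 2 input channels to m = 1 composite channel.
downmix = [[complex(s, 0.0), complex(0.0, s)]]
# Passive upmix: m = 1 composite channel back to 2 output channels.
upmix = conjugate_transpose(downmix)

inputs = [1 + 0j, 0 - 1j]
composite = apply_matrix(downmix, inputs)
outputs = apply_matrix(upmix, composite)
print(composite, outputs)
```

The inputs in this example happen to lie along the direction that the downmix preserves, so the passive upmix returns them essentially unchanged. In general a passive matrix cannot restore information discarded by the downmix, which is why the sidechain-controlled Adjust Amplitude and Rotate Angle functions, or an active matrix, are used.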
When Upmix Matrix 20 is an active matrix, the arrangement of FIG. 7 may be characterized as a "hybrid matrix decoder" for operating in a "hybrid matrix encoder/decoder system." "Hybrid" in this context refers to the fact that the decoder may derive some measure of control information from its input audio signal (i.e., the active matrix responds to spatial information encoded in the channels applied to it) and a further measure of control information from spatial-parameter sidechain information. Other elements of FIG. 7 are as in the arrangement of FIG. 2 and bear the same reference numerals.
Suitable active matrix decoders for use in a hybrid matrix decoder may include active matrix decoders such as those mentioned above, including, for example, matrix decoders known as "Pro Logic" and "Pro Logic II" decoders ("Pro Logic" is a trademark of Dolby Laboratories Licensing Corporation).
Alternative Decorrelation
FIGS. 8 and 9 show variations on the generalized Decoder of FIG. 7. In particular, both the arrangement of FIG. 8 and the arrangement of FIG. 9 show alternatives to the decorrelation technique of FIGS. 2 and 7. In FIG. 8, respective decorrelator functions or devices ("Decorrelators") 46 and 48 are in the time domain, each following the respective Inverse Filterbank 30 and 36 in their channel. In FIG. 9, respective decorrelator functions or devices ("Decorrelators") 50 and 52 are in the frequency domain, each preceding the respective Inverse Filterbank 30 and 36 in their channel. In both the FIG. 8 and FIG. 9 arrangements, each of the Decorrelators (46, 48, 50, 52) has a unique characteristic so that their outputs are mutually decorrelated with respect to each other. The Decorrelation Scale Factor may be used to control, for example, the ratio of decorrelated to correlated signal provided in each channel. Optionally, the Transient Flag may also be used to shift the mode of operation of the Decorrelator, as is explained below. In both the FIG. 8 and FIG. 9 arrangements, each Decorrelator may be a Schroeder-type reverberator having its own unique filter characteristic, in which the amount or degree of reverberation is controlled by the decorrelation scale factor (implemented, for example, by controlling the degree to which the Decorrelator output forms a part of a linear combination of the Decorrelator input and output). Alternatively, other controllable decorrelation techniques may be employed either alone or in combination with each other or with a Schroeder-type reverberator.
Schroeder-type reverberators are well known and may trace their origin to two journal papers: "'Colorless' Artificial Reverberation" by M.R. Schroeder and B.F. Logan, IRE Transactions on Audio, vol. AU-9, pp. 209-214, 1961 and "Natural Sounding Artificial Reverberation" by M.R. Schroeder, Journal A.E.S., July 1962, vol. 10, no. 2, pp. 219-223.
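By way of illustration only, a time-domain Decorrelator built from a single Schroeder all-pass section, with the Decorrelation Scale Factor controlling a linear combination of the Decorrelator input and output, might be sketched as follows. The single-section structure, the particular mixing law `(1 - scale_factor) * input + scale_factor * output`, and the names `schroeder_allpass` and `decorrelate` are assumptions made for the example; a practical reverberator would typically cascade several sections.

```python
def schroeder_allpass(signal, delay, gain):
    """One Schroeder all-pass section:
    y[n] = -gain * x[n] + x[n - delay] + gain * y[n - delay]
    """
    out = []
    for n, x in enumerate(signal):
        x_delayed = signal[n - delay] if n >= delay else 0.0
        y_delayed = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_delayed + gain * y_delayed)
    return out


def decorrelate(signal, delay, gain, scale_factor):
    """Mix the input with its all-pass filtered version.

    scale_factor = 0.0 passes the input unchanged; 1.0 yields only the
    filtered signal. The mixing law is an assumption of this sketch.
    """
    wet = schroeder_allpass(signal, delay, gain)
    return [(1.0 - scale_factor) * dry + scale_factor * w
            for dry, w in zip(signal, wet)]


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Giving each channel its own delay is one way to obtain a unique
# filter characteristic per Decorrelator.
left = decorrelate(impulse, delay=2, gain=0.5, scale_factor=1.0)
right = decorrelate(impulse, delay=3, gain=0.5, scale_factor=1.0)
print(left)
print(right)
```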
When the Decorrelators 46 and 48 operate in the time domain, as in the FIG. 8 arrangement, a single (i.e., wideband) Decorrelation Scale Factor is required. This may be obtained by any of several ways. For example, only a single Decorrelation Scale Factor may be generated in the encoder of FIG. 1 or FIG. 7. Alternatively, if the encoder of FIG. 1 or FIG. 7 generates Decorrelation Scale Factors on a subband basis, the Subband Decorrelation Scale Factors may be amplitude or power summed in the encoder of FIG. 1 or FIG. 7 or in the decoder of FIG. 8.
When the Decorrelators 50 and 52 operate in the frequency domain, as in the FIG. 9 arrangement, they may receive a decorrelation scale factor for each subband or groups of subbands and, concomitantly, provide a commensurate degree of decorrelation for such subbands or groups of subbands.
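By way of illustration only, combining Subband Decorrelation Scale Factors into the single wideband factor needed by the time-domain arrangement might be sketched as follows. The description says only that the subband factors may be amplitude or power summed; dividing by the number of subbands so that the result stays in the same range as the inputs is an assumption of this sketch, as is the function name `wideband_scale_factor`.

```python
import math


def wideband_scale_factor(subband_factors, mode="power"):
    """Combine per-subband scale factors into one wideband factor.

    mode="amplitude" -- sum the factors, normalized by the subband count
    mode="power"     -- sum the squared factors, normalized, then take
                        the square root
    The normalization is an assumption made for illustration.
    """
    count = len(subband_factors)
    if mode == "amplitude":
        return sum(subband_factors) / count
    if mode == "power":
        return math.sqrt(sum(f * f for f in subband_factors) / count)
    raise ValueError("mode must be 'amplitude' or 'power'")


factors = [0.0, 1.0]
print(wideband_scale_factor(factors, mode="amplitude"))
print(wideband_scale_factor(factors, mode="power"))
```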
The Decorrelators 46 and 48 of FIG. 8 and the Decorrelators 50 and 52 of FIG. 9 may optionally receive the Transient Flag. In the time-domain Decorrelators of FIG. 8, the Transient Flag may be employed to shift the mode of operation of the respective Decorrelator. For example, the Decorrelator may operate as a Schroeder-type reverberator in the absence of the transient flag but upon its receipt and for a short subsequent time period, say 1 to 10 milliseconds, operate as a fixed delay. Each channel may have a predetermined fixed delay or the delay may be varied in response to a plurality of transients within a short time period. In the frequency-domain Decorrelators of FIG. 9, the transient flag may also be employed to shift the mode of operation of the respective Decorrelator. However, in this case, the receipt of a transient flag may, for example, trigger a short (several milliseconds) increase in amplitude in the channel in which the flag occurred.
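By way of illustration only, the mode shift of a time-domain Decorrelator in response to the Transient Flag might be sketched as follows. The per-sample flag representation, the 5 millisecond default hold time (chosen from within the 1 to 10 millisecond range mentioned above), and the name `select_modes` are assumptions made for the example.

```python
def select_modes(transient_flags, sample_rate, hold_ms=5.0):
    """Return the Decorrelator mode to use for each sample.

    transient_flags -- one boolean per sample; True marks a transient
    sample_rate     -- samples per second
    hold_ms         -- hypothetical time for which the fixed-delay mode
                       persists, counted from the flagged sample
    """
    hold_samples = int(round(sample_rate * hold_ms / 1000.0))
    remaining = 0
    modes = []
    for flag in transient_flags:
        if flag:
            remaining = hold_samples  # (re)start the hold period
        if remaining > 0:
            modes.append("fixed_delay")
            remaining -= 1
        else:
            modes.append("reverberator")
    return modes


flags = [False, True, False, False, False, False]
print(select_modes(flags, sample_rate=1000, hold_ms=3.0))
```

A second transient arriving during the hold period restarts the count in this sketch; varying the delay itself in response to a plurality of transients, as the description also permits, is not shown.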
In both the FIG. 8 and 9 arrangements, an Interpolator 27 (33), controlled by the optional Transient Flag, may provide interpolation across frequency of the phase angles output of Rotate Angle 28 (33) in a manner as described above.
As mentioned above, when two or more channels are sent in addition to sidechain information, it may be acceptable to reduce the number of sidechain parameters. For example, it may be acceptable to send only the Amplitude Scale Factor, in which case the decorrelation and angle devices or functions in the decoder may be omitted (in that case, FIGS. 7, 8 and 9 reduce to the same arrangement).
Alternatively, only the Amplitude Scale Factor, the Decorrelation Scale Factor, and, optionally, the Transient Flag may be sent. In that case, any of the FIG. 7, 8 or 9 arrangements may be employed (omitting the Rotate Angle 28 and 34 in each of them).
As another alternative, only the Amplitude Scale Factor and the angle control parameter may be sent. In that case, any of the FIG. 7, 8 or 9 arrangements may be employed (omitting the Decorrelator 38 and 42 of FIG. 7 and 46, 48, 50, 52 of FIGS. 8 and 9).
As in FIGS. 1 and 2, the arrangements of FIGS. 6-9 are intended to show any number of input and output channels although, for simplicity in presentation, only two channels are shown.
It should be understood that implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited by these specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true scope of the basic underlying principles disclosed herein.
Claims (11)
1. A method performed in an audio decoder for reconstructing N audio channels from an audio signal having M encoded audio channels, the method comprising:
receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter; wherein the correlation parameter is differentially encoded across time;
decoding the M encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components;
extracting the set of spatial parameters from the bitstream;
applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter;
analyzing the M audio channels to detect a location of a transient;
decorrelating the M audio channels to obtain a decorrelated version of the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel;
deriving the N audio channels from the M audio channels, the decorrelated version of the M audio channels, and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and
synthesizing, by an audio reproduction device, the N audio channels as an output audio signal,
wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of a decorrelator, the second decorrelation technique represents a second mode of operation of the decorrelator, and the audio decoder is implemented at least in part in hardware.
2. The method of claim 1 wherein the first mode of operation uses an all-pass filter and the second mode of operation uses a fixed delay.
3. The method of claim 1 wherein the analyzing occurs after the extracting and the deriving occurs after the decorrelating.
4. The method of claim 1 wherein the first subset of the plurality of frequency bands is at a higher frequency than the second subset of the plurality of frequency bands.
5. The method of claim 1 wherein the M audio channels are a sum of the N audio channels.
6. The method of claim 1 wherein the location of the transient is used in the decorrelating to process bands with a transient differently than bands without a transient.
7. The method of claim 6 wherein the N audio channels represent a stereo audio signal where N is two and M is one.
8. The method of claim 1 wherein the N audio channels represent a stereo audio signal where N is two and M is one.
9. The method of claim 1 wherein the first subset of the plurality of frequency bands is non-overlapping but contiguous with the second subset of the plurality of frequency bands.
10. A non-transitory computer readable medium containing instructions that when executed by a processor perform the method of claim 1.
11. An audio decoder for decoding M encoded audio channels representing N audio channels, the audio decoder comprising:
an input interface for receiving a bitstream containing the M encoded audio channels and a set of spatial parameters, wherein the set of spatial parameters includes an amplitude parameter and a correlation parameter; wherein the correlation parameter is differentially encoded across time;
an audio decoder for decoding the M encoded audio channels to obtain M audio channels, wherein each of the M audio channels is divided into a plurality of frequency bands, and each frequency band includes one or more spectral components;
a demultiplexer for extracting the set of spatial parameters from the bitstream;
a processor for applying a differential decoding process across time to the differentially encoded correlation parameter to obtain a differentially decoded correlation parameter, and analyzing the M audio channels to detect a location of a transient;
a decorrelator for decorrelating the M audio channels, wherein a first decorrelation technique is applied to a first subset of the plurality of frequency bands of each audio channel and a second decorrelation technique is applied to a second subset of the plurality of frequency bands of each audio channel;
a reconstructor for deriving N audio channels from the M audio channels and the set of spatial parameters, wherein N is two or more, M is one or more, and M is less than N; and
an audio reproduction device that synthesizes the N audio channels as an output audio signal,
wherein both the analyzing and the decorrelating are performed in a frequency domain, the first decorrelation technique represents a first mode of operation of the decorrelator, and the second decorrelation technique represents a second mode of operation of the decorrelator.
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US57997401P | 2001-06-14 | 2001-06-14 | |
US60/579974 | 2001-06-14 | ||
US54936804P | 2004-03-01 | 2004-03-01 | |
US60/549368 | 2004-03-01 | ||
US58825604P | 2004-07-14 | 2004-07-14 | |
US60/588256 | 2004-07-14 | ||
CA2992051A CA2992051C (en) | 2004-03-01 | 2005-02-28 | Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA2992051A Division CA2992051C (en) | 2001-06-14 | 2005-02-28 | Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters |
Publications (2)
Publication Number | Publication Date |
---|---|
CA3026283A1 CA3026283A1 (en) | 2005-09-15 |
CA3026283C true CA3026283C (en) | 2019-04-09 |
Family
ID=64655307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA3026283A Active CA3026283C (en) | 2001-06-14 | 2005-02-28 | Reconstructing audio signals with multiple decorrelation techniques |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA3026283C (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110619882B (en) * | 2013-07-29 | 2023-04-04 | 杜比实验室特许公司 | System and method for reducing temporal artifacts of transient signals in decorrelator circuits |
CN111192592B (en) * | 2013-10-21 | 2023-09-15 | 杜比国际公司 | Parametric reconstruction of audio signals |
CN112967729B (en) * | 2021-02-24 | 2024-07-02 | 辽宁省视讯技术研究有限公司 | Vehicle-mounted local audio fuzzy processing method and device |
CN117476026A (en) * | 2023-12-26 | 2024-01-30 | 芯瞳半导体技术(山东)有限公司 | Method, system, device and storage medium for mixing multipath audio data |
-
2005
- 2005-02-28 CA CA3026283A patent/CA3026283C/en active Active
Also Published As
Publication number | Publication date |
---|---|
CA3026283A1 (en) | 2005-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11308969B2 (en) | Methods and apparatus for reconstructing audio signals with decorrelation and differentially coded parameters | |
CA3026283C (en) | Reconstructing audio signals with multiple decorrelation techniques | |
AU2012208987B2 (en) | Multichannel Audio Coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request |
Effective date: 20181203 |