Embodiment
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates an encoding device for multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention.
The encoding device 100 for multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention is a 5-channel multichannel audio encoding device based on SSLCC. As shown in Fig. 1, it consists of a preprocessing filter bank 110, an analyzer 120, a downmix processor 130, an audio encoder 140, and a multiplexer 150.
The encoding device 100 can also be extended to content with more than 5 channels.
The preprocessing filter bank 110 preprocesses the multichannel input audio signal supplied to the encoding device 100 and transforms the preprocessed input audio signal into a frequency-domain signal via a filter bank. The filter bank performs a time/frequency transform based on subband analysis, for which the MDCT, MDST, DFT, and the like can be applied.
Here, the input audio signal can comprise input signal LF (Left Front), input signal RF (Right Front), input signal C (Center Front), input signal Ls (Left Surround), and input signal Rs (Right Surround).
The analyzer 120 extracts spatial cues from the input audio signal that was transformed to the frequency domain in the preprocessing filter bank 110, and transmits the spatial cues as an additional information bitstream. The analyzer 120 also compresses the input audio signal and sends it to the downmix processor 130.
Like the analyzer 120, the downmix processor 130 can downmix the input audio signal in the frequency domain. The downmix processor 130 can also downmix according to the ITU-T recommendation.
The audio signal downmixed in the downmix processor 130 can be represented as a bitstream by a conventional stereo audio codec, such as MP3 (MPEG Layer III) or AAC (Advanced Audio Coding).
The audio encoder 140 can encode the audio signal downmixed by the downmix processor 130.
The multiplexer 150 combines the signal encoded by the audio encoder 140 with the additional information bitstream transmitted from the analyzer 120, and transmits the result.
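As a rough illustration of the downmix step, the sketch below folds the five input channels into a stereo pair. The 1/sqrt(2) weights for the center and surround channels are an assumption borrowed from the commonly used ITU-style 5-channel-to-stereo downmix; the description above does not state the coefficients, and the function name `downmix_5_to_2` is hypothetical.

```python
import math

# Assumed ITU-style weight for the centre and surround channels
# (the description does not give the actual coefficients).
K = 1.0 / math.sqrt(2.0)

def downmix_5_to_2(lf, rf, c, ls, rs):
    """Downmix one sample (or one frequency bin) of LF, RF, C, Ls, Rs to stereo."""
    left = lf + K * c + K * ls
    right = rf + K * c + K * rs
    return left, right
```

Because the function only adds and scales, it works equally on real time-domain samples and on complex frequency-domain bins.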
Fig. 2 illustrates a decoding device for multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention.
The decoding device 200 for multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention is a 5-channel multichannel audio decoding device based on SSLCC. As shown in Fig. 2, it consists of a demultiplexer 210, an audio decoder 220, a windowing filter bank 230, a synthesis upmixer 240, and a synthesis filter bank windower 250.
The demultiplexer 210 receives the signal transmitted from the multiplexer 150 and parses the received signal into an audio bitstream and an additional information bitstream.
The audio decoder 220 restores the downmixed signal based on the audio bitstream.
The windowing filter bank 230 applies an analysis filter bank to the downmixed signal to perform the time/frequency transform, then windows the transformed downmixed signal and passes it on.
The synthesis upmixer 240 predicts the multichannel signal using the downmixed signal and the additional information bitstream, and generates an upmixed signal by upmixing the downmixed signal based on the predicted multichannel signal.
Specifically, the synthesis upmixer 240 separates amplitude information and phase information from the downmixed signal, windows a stored random sequence based on the amplitude information in order to apply a weight to the phase information, and can predict the multichannel signal based on the amplitude information and the weighted phase information.
To do so, the synthesis upmixer 240 applies a complex transform to the downmixed signal and separates the amplitude information and the phase information from the complex-transformed downmixed signal.
The synthesis upmixer 240 then models a spectral window for modifying the phase information based on the envelope of the amplitude information, applies the spectral window to the stored random sequence to window it, and uses the windowed random sequence to apply a weight to the phase information.
The synthesis upmixer 240 combines the amplitude information with the weighted phase information and applies an inverse complex transform to the combined information to predict the multichannel signal.
The synthesis filter bank windower 250 applies the synthesis filter bank to the upmixed signal to extract a time-domain signal, and windows the upmixed signal to extract the output signal.
Fig. 3 illustrates a decoding device for multichannel audio based on sound source location cues according to another exemplary embodiment of the present invention.
The decoding device 300 for multichannel audio based on sound source location cues according to this other embodiment is a decoding device to which a real transform is applied. As shown in Fig. 3, it consists of a demultiplexer 310, a TDAC filter bank 320, a synthesis upmixer 330, and a synthesis filter bank windower 340.
SSLCC basically follows a DFT filter bank (transform). However, various filter banks can be used in order to interwork with the core stereo audio codec.
Although the form of the filter bank changes, the operating principle of the SSLCC analyzer 120 or the synthesis upmixer 240, and of the windowed synthesis in the synthesis filter bank windower 250, remains the same.
Because the decorrelator is not applied during stereo transmission, a real transform can be used. The decoding device 300 according to this embodiment uses the MDCT, which is a real transform, to interwork with the core codec.
The TDAC filter bank 320 restores the downmixed signal based on the audio bitstream, skips the steps of applying the analysis filter bank to the downmixed signal and of windowing it, and sends the signal to the synthesis upmixer 330. The signals sent by the TDAC filter bank 320 can be the frequency-domain downmixed signal L and the frequency-domain downmixed signal R.
The synthesis filter bank windower 340 applies the synthesis filter bank to the upmixed signal generated in the synthesis upmixer 330 to extract a time-domain signal, and windows the upmixed signal with a window matched to the analysis window of the core stereo audio codec to extract the output signal.
The demultiplexer 310 and the synthesis upmixer 330 have the same structure as the demultiplexer 210 and the synthesis upmixer 240 of the decoding device 200 according to the embodiment described above, so a detailed explanation is omitted.
The decoding device 300 can selectively change the time/frequency transform applied to the stereo downmixed audio signal. For example, when the decorrelator in the decoding device is switched off, a real time/frequency transform can be applied. A complex time/frequency transform can also be used in this case, and even then the phase information need not be used.
When the decorrelator is switched on, however, the decoding device needs the phase information, so a complex time/frequency transform must be used. For the complex time/frequency transform the DFT is the default, and the MDCT/MDST can also be used as a complex transform pair.
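The selection rule described above can be sketched as a small function. The function name and the `core_codec_uses_mdct` flag are hypothetical and only serve to make the rule concrete; the returned strings are labels, not calls into any codec library.

```python
def select_transform(decorrelator_on, core_codec_uses_mdct=False):
    """Return a label for the time/frequency transform the decoder would use.

    decorrelator_on: True when the decorrelator, and hence phase, is needed.
    core_codec_uses_mdct: hypothetical flag, True when the core stereo codec
    already delivers MDCT coefficients.
    """
    if decorrelator_on:
        # Phase information is required, so the transform must be complex.
        return "MDCT/MDST pair" if core_codec_uses_mdct else "DFT"
    # No phase needed: a real transform suffices (a complex one is still allowed).
    return "MDCT" if core_codec_uses_mdct else "DFT"
```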
Fig. 4 illustrates a decoding device for the additional information bitstream according to an exemplary embodiment of the present invention.
The decoding device for the additional information bitstream according to an exemplary embodiment of the present invention decodes the additional information bitstream parsed by the demultiplexer 210, where the additional information is the VSLA (Virtual Sound Location Angle). As shown in Fig. 4, it can consist of a Huffman decoder 410 and an inverse quantizer 420.
The decoding device for the additional information bitstream can be included in the windowing filter bank 230 and the synthesis filter bank windower 250.
The Huffman decoder 410 uses a Huffman codebook to Huffman-decode the additional information bitstream and can thereby generate a difference index (differential index).
The Huffman decoder 410 comprises an inverse differential coder 411, a differential coder 412, a mapper 413, and a Huffman encoder 414, which generate the Huffman codebook.
The inverse differential coder 411 performs inverse differential coding based on the information of the previously processed frame and the Huffman coding type, and can thereby decode the original index.
The differential coder 412 deletes from the original index the negative-sign information corresponding to the sign bit, and then performs differential coding to generate index information.
The mapper 413 removes from the index the offset information that was used to eliminate the negative-sign information, and then maps the index according to the frequency resolution, dividing it into the first sub-band and the bands other than the first sub-band.
Finally, the Huffman encoder 414 applies Huffman coding to the first sub-band and to the bands other than the first sub-band separately to generate the Huffman codebook.
The Huffman decoder 410 decodes the first sub-band by referring to the Huffman codebook of Table 1.
[table 1]
Index | Num. of bits | Codeword | Index | Num. of bits | Codeword
0 | 5 | 0x17 | 16 | 5 | 0x1d
1 | 8 | 0x64 | 17 | 5 | 0x19
2 | 8 | 0x65 | 18 | 5 | 0x1c
3 | 8 | 0xf0 | 19 | 5 | 0x16
4 | 8 | 0xf1 | 20 | 5 | 0x18
5 | 7 | 0x33 | 21 | 5 | 0x14
6 | 7 | 0x79 | 22 | 5 | 0x13
7 | 6 | 0x18 | 23 | 5 | 0x15
8 | 6 | 0x22 | 24 | 5 | 0x1b
9 | 6 | 0x23 | 25 | 5 | 0x10
10 | 6 | 0x3d | 26 | 5 | 0x0e
11 | 5 | 0x0b | 27 | 5 | 0x0f
12 | 5 | 0x12 | 28 | 5 | 0x0d
13 | 5 | 0x1a | 29 | 5 | 0x0a
14 | 4 | 0x04 | 30 | 2 | 0x00
15 | 5 | 0x1f | | |
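A minimal sketch of how the first-sub-band codebook of Table 1 could be used for decoding is given below. It assumes each codeword is read most-significant bit first and padded to its listed bit length; the bit order, the string-based bit representation, and the function name `huffman_decode` are assumptions made for illustration, not details stated in the description.

```python
# Table 1 as (index, number of bits, codeword); values copied from the table.
TABLE_1 = [
    (0, 5, 0x17), (1, 8, 0x64), (2, 8, 0x65), (3, 8, 0xf0),
    (4, 8, 0xf1), (5, 7, 0x33), (6, 7, 0x79), (7, 6, 0x18),
    (8, 6, 0x22), (9, 6, 0x23), (10, 6, 0x3d), (11, 5, 0x0b),
    (12, 5, 0x12), (13, 5, 0x1a), (14, 4, 0x04), (15, 5, 0x1f),
    (16, 5, 0x1d), (17, 5, 0x19), (18, 5, 0x1c), (19, 5, 0x16),
    (20, 5, 0x18), (21, 5, 0x14), (22, 5, 0x13), (23, 5, 0x15),
    (24, 5, 0x1b), (25, 5, 0x10), (26, 5, 0x0e), (27, 5, 0x0f),
    (28, 5, 0x0d), (29, 5, 0x0a), (30, 2, 0x00),
]

# Map each codeword, written as an MSB-first bit string, to its index.
DECODE = {format(code, "0{}b".format(nbits)): idx
          for idx, nbits, code in TABLE_1}

def huffman_decode(bits):
    """Decode a string of '0'/'1' characters into a list of indices."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return out
```

Under the assumed bit order no codeword in Table 1 is a prefix of another, so the first match found while reading bits is the only possible one.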
When the signal received by the demultiplexer 210 is a 5-bit quantized signal, the Huffman decoder 410 performs Huffman decoding by referring to the Huffman codebook of Table 2.
[table 2]
When the signal received by the demultiplexer 210 is a 4-bit quantized signal, the Huffman decoder 410 performs Huffman decoding by referring to the Huffman codebook of Table 3.
[table 3]
The inverse quantizer 420 uses an inverse quantization table to inverse-quantize the difference index and restore the additional information. Specifically, the inverse quantizer 420 can perform inverse quantization by mapping the VSLA (Virtual Sound Location Angle) information of each frame to the quantization table corresponding to each VSLA. Because multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention is basically decoded frame by frame with the DFT or MDCT, smoothing between frames is mainly achieved by overlap-add via windowing.
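The frame-to-frame smoothing by windowed overlap-add can be illustrated with the sketch below. It assumes a sine window and 50% overlap, which the description does not specify, and all function names are hypothetical. Each frame is windowed twice, standing in for the analysis and synthesis windows, so that the overlapping squared windows sum to one.

```python
import math

def sine_window(n):
    """Sine window of length n; its square overlap-adds to 1 at 50% overlap."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]

def overlap_add(frames, hop):
    """Overlap-add equal-length frames spaced hop samples apart."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for f, frame in enumerate(frames):
        for i, v in enumerate(frame):
            out[f * hop + i] += v
    return out

def window_and_ola(signal, n):
    """Window each frame twice (analysis and synthesis) and overlap-add."""
    hop = n // 2
    w = sine_window(n)
    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        frames.append([signal[start + i] * w[i] * w[i] for i in range(n)])
    return overlap_add(frames, hop)
```

In an actual decoder the transform and the per-frame cue processing would sit between the two windowing steps; they are left out here so that only the smoothing property is visible.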
When the VSLA information is the LHA (Left Half-plane Angle), the inverse quantizer 420 can perform inverse quantization by mapping the quantization table of Table 4.
[table 4]
In the step of restoring the additional information, when the VSLA information is the RHA (Right Half-plane Angle), the inverse quantizer 420 can perform inverse quantization by mapping the quantization table of Table 5.
[table 5]
In the step of restoring the additional information, when the VSLA information is the LSA (Left Subsequent vector Angle), the inverse quantizer 420 can perform inverse quantization by mapping the quantization table of Table 6.
[table 6]
Idx | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5
LSA[idx] | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5
Idx | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6
LSA[idx] | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6
Idx | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
LSA[idx] | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
In the step of restoring the additional information, when the VSLA information is the RSA (Right Subsequent vector Angle), the inverse quantizer 420 can perform inverse quantization by mapping the quantization table of Table 7.
[table 7]
Idx | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5
RSA[idx] | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5
Idx | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6
RSA[idx] | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6
Idx | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
RSA[idx] | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
The inverse quantizer 420 can also extract parameters that satisfy mathematical expression 1 from the VSLA information obtained from each piece of index information.
[mathematical expression 1]
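\[
\begin{aligned}
g_{Ls} &= \sin(\theta_{lh}) \\
g_{L} &= \cos(\theta_{lh}) \cdot \sin\bigl((\mathrm{LSA}[idx] - 2\pi) \times (-3)\bigr) \\
g_{CL} &= \cos(\theta_{lh}) \cdot \cos\bigl((\mathrm{LSA}[idx] - 2\pi) \times (-3)\bigr) \\
g_{Rs} &= \cos(\theta_{rh}) \\
g_{CR} &= \sin(\theta_{rh}) \cdot \cos\bigl(\mathrm{RSA}[idx] \times 3\bigr) \\
g_{R} &= \sin(\theta_{rh}) \cdot \sin\bigl(\mathrm{RSA}[idx] \times 3\bigr)
\end{aligned}
\]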
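The gains of mathematical expression 1 can be evaluated directly, as in the sketch below. It assumes that all four angles are already inverse-quantized and expressed in radians, which the description does not state; the function name and the dictionary layout are hypothetical.

```python
import math

def channel_gains(theta_lh, theta_rh, lsa, rsa):
    """Evaluate mathematical expression 1 for one sub-band (angles in radians)."""
    left_arg = (lsa - 2.0 * math.pi) * -3.0
    right_arg = rsa * 3.0
    return {
        "gLs": math.sin(theta_lh),
        "gL": math.cos(theta_lh) * math.sin(left_arg),
        "gCL": math.cos(theta_lh) * math.cos(left_arg),
        "gRs": math.cos(theta_rh),
        "gCR": math.sin(theta_rh) * math.cos(right_arg),
        "gR": math.sin(theta_rh) * math.sin(right_arg),
    }
```

One consequence of the sine/cosine structure is that the three gains on each side have squares that sum to one, so the split redistributes the power of each downmixed channel without changing its total.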
The number of sub-bands can be obtained from bsFreqRes, the parameter in the additional information that defines the number of sub-bands, and the inverse quantizer 420 can map the number of transmitted VSLA values according to the number of sub-bands. The maximum number of bands is based on 28 bands (Mpar=28), and the frequency resolution and the number of bands differ according to the bit rate or the characteristics of the frame.
The inverse quantizer 420 can perform the mapping through mapsubbands(bsFreqRes, Mpar) as in Table 8 below.
[table 8]
The inverse quantizer 420 can evaluate mathematical expression 1 using a per-band resolution designed on the basis of ERB bands. For the case Mpar=28 in the mapping of Table 8, the resolution of each band can be as in Table 9.
[table 9]
m | Mpar=28 | kHz
0 | 0 | 0.0702
1 | 1 | 0.1639
2 | 2 | 0.2576
3 | 3 | 0.3512
4 | 4 | 0.4449
5 | 5 | 0.5385
6 | 6 | 0.6322
7 | 7 | 0.7259
8 | 8 | 0.9132
9 | 9 | 1.1005
10 | 10 | 1.2878
11 | 11 | 1.4751
12 | 12 | 1.8498
13 | 13 | 2.2244
14 | 14 | 2.599
15 | 15 | 2.9737
16 | 16 | 3.7229
17 | 17 | 4.4722
18 | 18 | 5.2215
19 | 19 | 5.9707
20 | 20 | 6.72
21 | 21 | 7.4693
22 | 22 | 8.5932
23 | 23 | 9.7171
24 | 24 | 11.2156
25 | 25 | 13.0888
26 | 26 | 15.3366
27 | 27 | 24
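To show how Table 9 might be used, the sketch below looks up the parameter band m for a given frequency. It assumes that the kHz column lists the upper edge of each band; the table does not say whether the values are edges or centers, so this reading, like the function name `band_index`, is only an assumption.

```python
from bisect import bisect_left

# kHz column of Table 9 for Mpar = 28, assumed here to be upper band edges.
BAND_EDGES_KHZ = [
    0.0702, 0.1639, 0.2576, 0.3512, 0.4449, 0.5385, 0.6322,
    0.7259, 0.9132, 1.1005, 1.2878, 1.4751, 1.8498, 2.2244,
    2.599, 2.9737, 3.7229, 4.4722, 5.2215, 5.9707, 6.72,
    7.4693, 8.5932, 9.7171, 11.2156, 13.0888, 15.3366, 24.0,
]

def band_index(freq_khz):
    """Return the band m whose assumed upper edge first reaches freq_khz."""
    if freq_khz < 0 or freq_khz > BAND_EDGES_KHZ[-1]:
        raise ValueError("frequency outside the tabulated range")
    return bisect_left(BAND_EDGES_KHZ, freq_khz)
```

The spacing of the tabulated values widens with frequency, which is consistent with the ERB-based design mentioned above.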
At this time, the cyclic prefix remover 210, the inverse BIFORE transformer 220, and the guard band remover 230 are each provided in the same number as the receiving antennas used by the decoder, so that each corresponds to one receiving antenna.
After the windowing filter bank 230 performs the time/frequency transform, a range of frequency-domain bins can be defined as one processing band.
For example, when a 2048-point DFT is performed, as shown in Table 10, a range of frequency-domain bins delimited by the start bin and stop bin positions can be defined as one processing band.
[table 10]
The synthesis upmixer 240 can restore the multichannel signal by localizing the sound image of each sub-band in the downmixed signal based on the VSLA information restored from the additional information bitstream. Specifically, as shown in Fig. 5, it uses the panning angles from the additional information bitstream to predict the dynamic information of each sub-band, and can predict the sub-band signal of each channel by applying that dynamic information.
Fig. 5 illustrates the process by which the synthesis upmixer predicts the gain of each channel according to an exemplary embodiment of the present invention.
As shown in Fig. 5, the synthesis upmixer 240 predicts the gain of each channel by restoring the sound image information of each channel stage by stage.
First, the synthesis upmixer 240 can restore LHA[idx] 510 and RHA[idx] 520.
The synthesis upmixer 240 predicts gLs[idx] 530 and restores LSA[idx] 511 from LHA[idx] 510, and predicts gRs[idx] 540 and restores RSA[idx] 521 from RHA[idx] 520.
Then the synthesis upmixer 240 predicts gL[idx] 550 and gCL[idx] 512 from LSA[idx] 511, and predicts gR[idx] 560 and gCR[idx] 522 from RSA[idx] 521.
Finally, the synthesis upmixer 240 predicts gCL[idx]/sqrt(2) 570 from gCL[idx] 512 and gCR[idx] 522. Here gCL[idx]/sqrt(2), that is, gCL[idx] 512 multiplied by 0.7071, can be the adjusted value of the center channel gain.
The synthesis upmixer 240 can generate the upmixed signal by upmixing the downmixed signal based on the multichannel signal predicted through these steps.
If X_DmxL(m, k) is the k-th frequency bin of the m-th sub-band of the transmitted left downmixed signal, the 'Left upmixing Matrix' can satisfy mathematical expression 2.
[mathematical expression 2]
For the right downmixed signal, the 'Right upmixing Matrix' can satisfy mathematical expression 3.
[mathematical expression 3]
The synthesis upmixer 240 can also include the DFT-based decorrelators D_L and D_R.
D_L and D_R can operate in a high complexity mode and in a low complexity mode, which is the general mode. D_L and D_R are generated only in the decoder; they operate in the high complexity mode when producing high sound quality and in the low complexity mode when reproducing ordinary sound quality.
In the high complexity mode, D_L and D_R perform the matrixing of mathematical expression 4 on L(m, k) and R(m, k) to generate the decorrelated signals.
[mathematical expression 4]
In the general mode, D_L and D_R satisfy mathematical expression 5 and do not generate decorrelated signals.
[mathematical expression 5]
The synthesis upmixer 240 uses mathematical expression 6 to calculate the values to be upmixed in mathematical expressions 2 and 3.
[mathematical expression 6]
m<4
m≥4
Here, α(m) can be a factor that indicates the relation between the L and R signals of each band. δ is a fixed coefficient; it can be the fixed mixing coefficient that the encoder applies to the surround signal when downmixing.
The value of α(m) used in calculating mathematical expressions 4 and 5 is obtained using mathematical expression 7.
[mathematical expression 7]
The coefficient, as a weighting value, can be a value used to adjust how strongly the decorrelated signal is mixed in. Accordingly, α(m) can be defined in the range 0≤α(m)≤γ, with γ in the range 0≤γ≤1.
The signals Wet_L(m, k) and Wet_R(m, k), as decorrelated signals, can be generated via the decorrelation technique performed by the decorrelator.
Fig. 6 illustrates a decorrelator according to an exemplary embodiment of the present invention.
The decorrelator 600 according to the present invention is an element included in the synthesis upmixer 240 that forms the decorrelated signal. As shown in Fig. 6, it can comprise a complex transformer 610, an amplitude information extractor 620, a phase information extractor 630, a random sequence memory 640, a windower 650, a phase converter 660, a synthesizer 670, and an inverse complex transformer 680.
The complex transformer 610 can apply a complex transform to the downmixed signal.
The amplitude information extractor 620 and the phase information extractor 630 extract the amplitude information and the phase information, respectively, from the downmixed signal transformed by the complex transformer 610, thereby separating the downmixed signal.
The windower 650 models a spectral window for modifying the phase information based on the envelope of the amplitude information extracted by the amplitude information extractor 620, and windows the random sequence stored in the random sequence memory 640 by applying that spectral window to it.
The number of random sequences stored in the random sequence memory 640 is determined by the number of downmixed signals. That is, mutually different random sequences are used to generate Wet_L(m, k) and Wet_R(m, k), and the correlation between the two random sequences used is close to 0.
The phase converter 660 can use the random sequence windowed by the windower 650 to apply a weight to the phase information extracted by the phase information extractor 630.
The synthesizer 670 can combine the amplitude information extracted by the amplitude information extractor 620 with the phase information weighted by the phase converter 660.
The inverse complex transformer 680 applies an inverse complex transform to the information combined in the synthesizer 670 to calculate the decorrelated signal.
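The chain of Fig. 6 between the complex transform and the inverse complex transform can be sketched as follows. The sketch operates on bins that have already been complex-transformed and returns modified bins, leaving both transforms out. The way the spectral window is derived (magnitudes normalized by their peak), the additive form of the phase weighting, and the `weight` parameter are assumptions made for illustration; the description only says that the window is modeled from the amplitude envelope.

```python
import cmath

def decorrelate_spectrum(spectrum, random_seq, weight=1.0):
    """Perturb the phase of each bin with a windowed random sequence.

    spectrum: complex bins of the downmixed signal (already transformed).
    random_seq: stored random values in [-1, 1], one per bin.
    weight: hypothetical scale, in radians, of the phase perturbation.
    """
    mags = [abs(x) for x in spectrum]            # amplitude information
    phases = [cmath.phase(x) for x in spectrum]  # phase information
    peak = max(mags) or 1.0
    window = [m / peak for m in mags]            # assumed envelope-shaped window
    windowed = [r * w for r, w in zip(random_seq, window)]
    new_phases = [p + weight * r for p, r in zip(phases, windowed)]
    # Recombine the untouched amplitude with the modified phase.
    return [cmath.rect(m, p) for m, p in zip(mags, new_phases)]
```

Calling the function once per downmixed channel with two different, nearly uncorrelated random sequences would give the left and right wet signals different phase patterns while leaving their magnitude spectra unchanged.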
Fig. 7 illustrates a decoding method for multichannel audio based on sound source location cues according to an exemplary embodiment of the present invention.
In step S710, the demultiplexer 210 receives the signal transmitted by the multiplexer 150 and parses the received signal into a stereo audio bitstream and an additional information bitstream.
In step S720, the audio decoder 220 can restore the downmixed signal based on the audio bitstream parsed in step S710.
In step S730, the Huffman decoder 410 uses the Huffman codebook to Huffman-decode the additional information bitstream parsed in step S710, generating the difference index.
In step S740, the inverse quantizer 420 uses the inverse quantization table to inverse-quantize the difference index generated in step S730, restoring the additional information. Specifically, the inverse quantizer 420 can perform inverse quantization by mapping the VSLA information of each frame to the quantization table corresponding to each VSLA.
In step S750, the synthesis upmixer 240 predicts the multichannel signal using the downmixed signal restored in step S720 and the additional information restored in step S740, and generates the upmixed signal by upmixing the downmixed signal based on the multichannel signal.
In step S760, the synthesis filter bank windower 250 applies the synthesis filter bank to the upmixed signal generated in step S750 to extract a time-domain signal, and can extract the output signal by windowing the upmixed signal.
As described above, the decoding device and method for multichannel audio based on sound source location cues according to exemplary embodiments of the present invention receive and compress a multichannel audio signal, and compress and transmit the stereo signal via a core stereo codec, so that multichannel audio can be transmitted while backward compatibility with existing stereo audio coding is provided.
Although specific exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art can make various modifications, additions, and substitutions without departing from the spirit and scope of the invention as defined by the claims. The scope of the present invention should therefore be defined by the appended claims and their equivalents.