CN101754086B - Decoder and decoding method for multichannel audio coder using sound source location cue - Google Patents
- Publication number
- CN101754086B
- Authority
- CN
- China
- Prior art keywords
- signal
- information
- audio
- sound source
- multichannel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
- G10L19/265—Pre-filtering, e.g. high frequency emphasis prior to encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/02—Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Algebra (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Abstract
The invention provides an apparatus and method for decoding multichannel audio based on sound source location cues. The decoding apparatus includes: a demultiplexer that receives a signal and parses it into an audio bitstream and a side-information bitstream; an audio decoder that restores a downmix signal from the audio bitstream and passes the restored downmix signal to a synthesis upmixer without applying an analysis filter bank to it; the synthesis upmixer, which predicts a multichannel signal using the downmix signal and the side-information bitstream and upmixes the downmix signal based on the multichannel signal to generate an upmix signal; and a synthesis filter bank windower that applies a synthesis filter bank to the upmix signal to extract a time-domain signal and windows the upmix signal to extract an output signal.
Description
Technical Field
The invention relates to an apparatus and method for decoding multichannel audio based on sound source location cues.
Background Art
The basic coding concept of Sound Source Location Cue Coding (SSLCC) derives from Spatial Audio Coding (SAC).
SAC is a compression technique for multichannel audio. It maximizes compression efficiency by removing the redundancy among the channel signals on the basis of the spatial cues by which listeners perceive sound in space.
In addition, the multichannel signal is downmixed, and the transmitted downmix audio signal serves as the core signal. In other words, playback remains possible with an existing stereo audio decoder, which is a basic principle of the SAC approach.
SSLCC is one of the SAC methods. It estimates the position of the sound source, a spatial cue that listeners perceive in space, extracts the source location information from the multichannel signal, and represents and transmits that information.
Because the information extracted by the SAC coding scheme is small in volume, it can be carried in a redundant field of the bitstream. If the receiver does not support SAC, it simply reproduces the stereo signal with its existing stereo audio decoder; if the receiver does support SAC, it uses the transmitted information to restore the multichannel audio signal from the downmixed stereo signal.
However, restoring the multichannel audio signal from the downmixed stereo signal requires the same time-to-frequency (T/F) transform that was used when the information was extracted by the SAC coding scheme. If that transform is not the one best suited to the receiver, the conversion process is adversely affected.
There is therefore a need for an apparatus and method that restore the multichannel audio signal, based on sound source location cues, using the T/F transform best suited to the receiver.
Summary of the Invention
The invention provides a decoding apparatus and method that can deliver a multichannel audio signal while remaining backward compatible with basic stereo audio coding: the multichannel audio signal is received and compressed, and the stereo signal is compressed and transmitted through a basic stereo codec.
The invention also provides a decoding apparatus and method that, by using a Time Domain Aliasing Cancellation (TDAC) filter bank, can selectively change the T/F transform applied to the stereo downmix audio signal.
Technical Solution
According to one exemplary embodiment of the invention, there is provided an apparatus for decoding multichannel audio based on sound source location cues, including: a demultiplexer that receives a signal and parses it into an audio bitstream and a side-information bitstream; an audio decoder that restores a downmix signal from the audio bitstream; a synthesis upmixer that predicts a multichannel signal using the downmix signal and the side-information bitstream and upmixes the downmix signal based on the multichannel signal to generate an upmix signal; and a synthesis filter bank windower that applies a synthesis filter bank to the upmix signal to extract a time-domain signal and windows the upmix signal to extract an output signal.
According to another exemplary embodiment of the invention, there is provided a method for decoding multichannel audio based on sound source location cues, including: parsing a received signal into an audio bitstream and a side-information bitstream; restoring a downmix signal from the audio bitstream; predicting a multichannel signal using the downmix signal and the side information; upmixing the downmix signal based on the multichannel signal to generate an upmix signal; applying a synthesis filter bank to the upmix signal to extract a time-domain signal; and windowing the upmix signal to extract an output signal.
Technical Effects
According to the invention, a multichannel audio signal can be delivered while remaining backward compatible with basic stereo audio coding, because the multichannel audio signal is received and compressed and the stereo signal is compressed and transmitted through a basic stereo codec.
By using a TDAC filter bank, the invention can selectively change the T/F transform applied to the stereo downmix audio signal.
Brief Description of the Drawings
FIG. 1 shows an apparatus for encoding multichannel audio based on sound source location cues according to an exemplary embodiment of the invention;
FIG. 2 shows an apparatus for decoding multichannel audio based on sound source location cues according to an exemplary embodiment of the invention;
FIG. 3 shows an apparatus for decoding multichannel audio based on sound source location cues according to another exemplary embodiment of the invention;
FIG. 4 shows an apparatus for decoding the side-information bitstream according to an exemplary embodiment of the invention;
FIG. 5 shows the process by which the synthesis upmixer predicts the gain of each channel according to an exemplary embodiment of the invention;
FIG. 6 shows a decorrelator according to an exemplary embodiment of the invention;
FIG. 7 shows a method for decoding multichannel audio based on sound source location cues according to an exemplary embodiment of the invention.
Detailed Description
Exemplary embodiments of the invention are described in detail below with reference to the accompanying drawings.
FIG. 1 shows an apparatus for encoding multichannel audio based on sound source location cues according to an exemplary embodiment of the invention.
The encoding apparatus 100 according to the exemplary embodiment is an SSLCC-based five-channel multichannel audio encoder. As shown in FIG. 1, it consists of a preprocessing filter bank 110, an analyzer 120, a downmix processor 130, an audio encoder 140, and a multiplexer 150.
The encoding apparatus 100 may be extended to multichannel content with five or more channels.
The preprocessing filter bank 110 preprocesses the multichannel input audio signals fed to the encoding apparatus 100 and transforms the preprocessed signals into frequency-domain signals through a filter bank. The filter bank performs the T/F transform based on sub-band analysis, for which the MDCT, MDST, DFT, or a similar transform may be used.
The input audio signals may include LF (Left Front), RF (Right Front), C (Center Front), Ls (Left Surround), and Rs (Right Surround).
The analyzer 120 extracts spatial cues from the input audio signals that the preprocessing filter bank 110 has transformed into the frequency domain, and transmits the spatial cues as a side-information bitstream. The analyzer 120 also compresses the input audio signals and passes them to the downmix processor 130.
The downmix processor 130 may downmix the input audio signals in the frequency domain, as the analyzer 120 does, and may perform the downmix in accordance with an ITU-T recommendation.
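As a rough illustration of the downmix step, the sketch below folds five channels into stereo with a weight of 1/sqrt(2) on the center and surround channels. These weights follow the common five-channel-to-stereo downmix of ITU-R BS.775 and are an assumption here; the text only says that an ITU recommendation may be followed. The function name `downmix_5_to_2` is hypothetical.

```python
import numpy as np

def downmix_5_to_2(lf, rf, c, ls, rs, k=1 / np.sqrt(2)):
    """Fold five channels into a stereo pair (assumed 1/sqrt(2) weights)."""
    left = lf + k * c + k * ls
    right = rf + k * c + k * rs
    return left, right

# The operation is linear, so it applies equally to time samples
# and to frequency-domain bins.
n = 4
zeros = np.zeros(n)
left, right = downmix_5_to_2(zeros, zeros, np.ones(n), zeros, zeros)
```

With only the center channel active, both stereo outputs carry the same attenuated copy of it, which is what keeps the center image in the middle on stereo playback.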
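The audio signal downmixed by the downmix processor 130 can be represented as a bitstream by a conventional stereo audio codec, such as MP3 (MPEG Layer III) or AAC (Advanced Audio Coding).
The audio encoder 140 may encode the audio signal downmixed by the downmix processor 130.
The multiplexer 150 combines the signal encoded by the audio encoder 140 with the side-information bitstream from the analyzer 120 and transmits the result.
FIG. 2 shows an apparatus for decoding multichannel audio based on sound source location cues according to an exemplary embodiment of the invention.
The decoding apparatus 200 according to the exemplary embodiment is an SSLCC-based five-channel multichannel audio decoder. As shown in FIG. 2, it consists of a demultiplexer 210, an audio decoder 220, a windowed filter bank 230, a synthesis upmixer 240, and a synthesis filter bank windower 250.
The demultiplexer 210 receives the signal transmitted from the multiplexer 150 and parses it into an audio bitstream and a side-information bitstream.
The audio decoder 220 restores the downmix signal from the audio bitstream.
The windowed filter bank 230 applies an analysis filter bank to the downmix signal to perform the T/F transform, then windows the transformed downmix signal and passes it on.
The synthesis upmixer 240 predicts a multichannel signal using the downmix signal and the side-information bitstream, and upmixes the downmix signal based on the multichannel signal to generate an upmix signal.
Specifically, the synthesis upmixer 240 separates magnitude information and phase information from the downmix signal, windows a stored random sequence based on the magnitude information so as to weight the phase information, and can predict the multichannel signal from the magnitude information and the weighted phase information.
To do so, the synthesis upmixer 240 applies a complex transform to the downmix signal and separates the magnitude information and phase information from the transformed signal.
The synthesis upmixer 240 models, from the envelope of the magnitude information, a spectral window for modifying the phase information, applies that window to the stored random sequence, and uses the windowed random sequence to weight the phase information.
The synthesis upmixer 240 then combines the magnitude information with the weighted phase information and applies the inverse complex transform to the result to predict the multichannel signal.
The synthesis filter bank windower 250 applies a synthesis filter bank to the upmix signal to extract a time-domain signal, and windows the upmix signal to extract the output signal.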
FIG. 3 shows an apparatus for decoding multichannel audio based on sound source location cues according to another exemplary embodiment of the invention.
The decoding apparatus 300 according to this other embodiment applies a real transform. As shown in FIG. 3, it consists of a demultiplexer 310, a TDAC filter bank 320, a synthesis upmixer 330, and a synthesis filter bank windower 340.
SSLCC is fundamentally based on a DFT filter bank (transform). To interoperate with the core stereo audio codec, however, various filter banks may be used.
Although the form of the filter bank changes, the operating principle of the SSLCC analyzer 120, and of the synthesis performed by the synthesis upmixer 240 and the synthesis filter bank windower 250, stays the same.
Because the decorrelator is not applied when stereo is transmitted, a real transform can be used. The decoding apparatus 300 uses the MDCT, a real-valued transform, to interoperate with the core codec.
The TDAC filter bank 320 restores the downmix signal from the audio bitstream and passes it to the synthesis upmixer 330, skipping both the analysis filter bank and the windowing of the downmix signal. The signals passed on by the TDAC filter bank 320 may be the frequency-domain downmix signals L and R.
The synthesis filter bank windower 340 applies a synthesis filter bank to the upmix signal generated by the synthesis upmixer 330 to extract a time-domain signal, and windows the upmix signal with a window matched to the analysis window of the core stereo audio codec to extract the output signal.
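The time domain aliasing cancellation that a TDAC filter bank relies on, and the reason the synthesis window has to match the analysis window, can be illustrated with a small MDCT round trip. The sketch below is a generic textbook MDCT with a sine window, not the filter bank of this patent; the frame size, the window choice, and all function names are assumptions made for illustration.

```python
import numpy as np

def mdct_matrix(n_half):
    """N x 2N MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    return np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))

def sine_window(n_half):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return np.sin(np.pi * (np.arange(2 * n_half) + 0.5) / (2 * n_half))

def mdct_analysis(x, n_half):
    """Windowed MDCT of x (length must be a multiple of n_half), hop n_half."""
    C, w = mdct_matrix(n_half), sine_window(n_half)
    padded = np.concatenate([np.zeros(n_half), x, np.zeros(n_half)])
    starts = range(0, len(padded) - n_half, n_half)
    return np.array([C @ (w * padded[s:s + 2 * n_half]) for s in starts])

def mdct_synthesis(coeffs, n_half):
    """Inverse MDCT, matching synthesis window, then overlap-add."""
    C, w = mdct_matrix(n_half), sine_window(n_half)
    out = np.zeros((len(coeffs) + 1) * n_half)
    for i, X in enumerate(coeffs):
        # 2/N scaling for the windowed case; each frame still contains
        # aliasing, which cancels against the neighbouring frame.
        out[i * n_half:(i + 2) * n_half] += w * (C.T @ X) * 2 / n_half
    return out[n_half:-n_half]

x = np.random.default_rng(0).standard_normal(64)
y = mdct_synthesis(mdct_analysis(x, 8), 8)
```

Here `y` should match `x` up to floating-point rounding. Replacing the synthesis window with one that does not pair with the analysis window breaks the cancellation, which is the practical reason for matching the windows.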
The demultiplexer 310 and the synthesis upmixer 330 have the same structure as the demultiplexer 210 and the synthesis upmixer 240 of the decoding apparatus 200, so a detailed description is omitted.
The decoding apparatus 300 can selectively change the T/F transform applied to the stereo downmix audio signal. For example, when the decorrelator in the decoding apparatus is switched off, a real T/F transform can be applied. A complex T/F transform is also possible in that case, and even then the phase information need not be used.
When an operation that requires phase information is switched on, that is, when the decorrelator is needed, a complex T/F transform must be used. The DFT is the default complex T/F transform, and the MDCT and MDST may also be used together as a complex transform pair.
FIG. 4 shows an apparatus for decoding the side-information bitstream according to an exemplary embodiment of the invention.
The apparatus decodes the side information, namely the VSLA (Virtual Sound Location Angle), from the side-information bitstream parsed by the demultiplexer 210. As shown in FIG. 4, it may consist of a Huffman decoder 410 and an inverse quantizer 420.
The apparatus may be part of the windowed filter bank 230 and the synthesis filter bank windower 250.
The Huffman decoder 410 Huffman-decodes the side-information bitstream with a Huffman codebook to generate a differential index.
To generate the Huffman codebook, the Huffman decoder 410 includes an inverse differential coder 411, a differential coder 412, a mapper 413, and a Huffman coder 414.
The inverse differential coder 411 performs inverse differential coding, based on the previously processed frame and the Huffman coding type information, to decode the original index.
The differential coder 412 removes the negative-value information from the original index according to the sign-bit information, and then performs differential coding to generate index information.
The mapper 413 removes the offset information that was used to eliminate negative values in the index, and then maps the index according to the frequency resolution, dividing it into the first sub-band and the remaining sub-bands.
Finally, the Huffman coder 414 applies Huffman coding to the first sub-band and to the remaining sub-bands separately to generate the Huffman codebook.
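The two decoding ideas described here, prefix-code lookup followed by inverse differential coding, can be sketched as follows. The codebook is a toy prefix code invented for this example, not one of the patent's Huffman tables, and both function names are hypothetical.

```python
def huffman_decode(bits, codebook):
    """Decode a bit string with a prefix-free codebook {symbol: codeword}."""
    inverse = {code: symbol for symbol, code in codebook.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols

def inverse_differential(diffs, previous=0):
    """Rebuild absolute indices by accumulating differences."""
    indices = []
    for d in diffs:
        previous += d
        indices.append(previous)
    return indices

# Toy codebook (assumption): small differences get short codewords.
toy_codebook = {0: "0", 1: "10", -1: "110", 2: "1110", -2: "1111"}
diffs = huffman_decode("0101100", toy_codebook)
indices = inverse_differential(diffs, previous=5)
```

Differential coding pays off when neighbouring indices are close, because the differences cluster around zero and the short codewords dominate.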
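The Huffman decoder 410 decodes the first sub-band by referring to the Huffman codebook of Table 1.
[Table 1]
When the signal received by the demultiplexer 210 is quantized with 5 bits, the Huffman decoder 410 performs Huffman decoding by referring to the Huffman codebook of Table 2.
[Table 2]
When the signal received by the demultiplexer 210 is quantized with 4 bits, the Huffman decoder 410 performs Huffman decoding by referring to the Huffman codebook of Table 3.
[Table 3]
The inverse quantizer 420 inverse-quantizes the differential index with an inverse quantization table to restore the side information. Specifically, the inverse quantizer 420 may perform inverse quantization by mapping the VSLA (Virtual Sound Location Angle) information in each frame to the quantization table corresponding to each VSLA. Because the multichannel audio is decoded frame by frame with the DFT or MDCT, smoothing between frames is achieved mainly by the overlap-add of the windowing.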
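Inverse quantization by table lookup can be sketched as below. The table contents are a hypothetical uniform grid of angles over [0, pi/2], chosen only so the example runs; the actual tables of the patent are not reproduced in this text. The names `build_table` and `dequantize` are likewise assumptions.

```python
import math

def build_table(bits):
    """Hypothetical uniform table: 2**bits angles spread over [0, pi/2]."""
    levels = 2 ** bits
    return [i * (math.pi / 2) / (levels - 1) for i in range(levels)]

def dequantize(indices, table):
    """Map each decoded index to its angle by table lookup."""
    return [table[i] for i in indices]

table_5bit = build_table(5)
angles = dequantize([0, 31, 16], table_5bit)
```

A real decoder would hold one table per angle type and select it by the kind of VSLA value being restored.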
When the VSLA information is an LHA (Left Half-plane Angle), the inverse quantizer 420 may perform inverse quantization using the quantization table of Table 4.
[Table 4]
When the VSLA information is an RHA (Right Half-plane Angle), the inverse quantizer 420 may perform inverse quantization using the quantization table of Table 5.
[Table 5]
When the VSLA information is an LSA (Left Subsequent vector Angle), the inverse quantizer 420 may perform inverse quantization using the quantization table of Table 6.
[Table 6]
When the VSLA information is an RSA (Right Subsequent vector Angle), the inverse quantizer 420 may perform inverse quantization using the quantization table of Table 7.
[Table 7]
The inverse quantizer 420 may also derive, from the VSLA information obtained from each index, the variables that satisfy Equation 1.
[Equation 1]
$$
\begin{aligned}
g_{Ls} &= \sin(\theta_{lh}) \\
g_{L} &= \cos(\theta_{lh}) \cdot \sin\bigl((LSA[idx] - 2\pi) \times (-3)\bigr) \\
g_{CL} &= \cos(\theta_{lh}) \cdot \cos\bigl((LSA[idx] - 2\pi) \times (-3)\bigr) \\
g_{Rs} &= \cos(\theta_{rh}) \\
g_{CR} &= \sin(\theta_{rh}) \cdot \cos\bigl(RSA[idx] \times 3\bigr) \\
g_{R} &= \sin(\theta_{rh}) \cdot \sin\bigl(RSA[idx] \times 3\bigr)
\end{aligned}
$$
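A minimal sketch of Equation 1 follows, transcribing the expressions as printed. It assumes that θ_lh and θ_rh are the dequantized LHA and RHA angles in radians, which the text does not state explicitly; the function name and the sample angles are hypothetical.

```python
import math

def channel_gains(theta_lh, theta_rh, lsa, rsa):
    """Per-band channel gains following Equation 1 as printed."""
    left_angle = (lsa - 2 * math.pi) * -3
    right_angle = rsa * 3
    return {
        "gLs": math.sin(theta_lh),
        "gL": math.cos(theta_lh) * math.sin(left_angle),
        "gCL": math.cos(theta_lh) * math.cos(left_angle),
        "gRs": math.cos(theta_rh),
        "gCR": math.sin(theta_rh) * math.cos(right_angle),
        "gR": math.sin(theta_rh) * math.sin(right_angle),
    }

g = channel_gains(theta_lh=0.3, theta_rh=1.1, lsa=0.2, rsa=0.4)
left_power = g["gLs"] ** 2 + g["gL"] ** 2 + g["gCL"] ** 2
right_power = g["gRs"] ** 2 + g["gCR"] ** 2 + g["gR"] ** 2
```

Whatever the angles, the squared gains on each side sum to one, so two angles per side are enough to split the power of a downmix channel among three outputs without changing its total.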
The number of sub-bands is obtained from bsFreqRes, the variable in the side information that defines the number of sub-bands, and the inverse quantizer 420 maps the received VSLA values according to that number. The maximum number of bands is 28 (Mpar = 28); the resolution and the number of bands vary with the bit rate or the frequency characteristics of the frame.
The inverse quantizer 420 may perform the mapping of Table 8 through mapsubbands(bsFreqRes, Mpar).
[Table 8]
The inverse quantizer 420 can evaluate Equation 1 using a per-band resolution designed around the ERB bands. When the mapping follows Table 8 with Mpar = 28, the resolution of each band may be as shown in Table 9.
[Table 9]
Here, the cyclic prefix remover 210, the inverse Fourier transformer 220, and the guard band remover 230 are each provided in the same number as the receive antennas used by the decoder, so as to correspond to each receive antenna.
After the windowed filter bank 230 performs the T/F transform, a band in the frequency domain can be defined as a processing band.
For example, with a 2048-point DFT, each processing band can be defined by its start-bin and stop-bin positions, as shown in Table 10.
[Table 10]
Based on the VSLA information restored from the side-information bitstream, the synthesis upmixer 240 can restore the multichannel signal by localizing the sound image in each sub-band of the downmix signal. Specifically, as shown in FIG. 5, the power information in each sub-band is predicted from the panning angles in the side-information bitstream, and the sub-band signal of each channel is then predicted by applying that power information.
FIG. 5 shows the process by which the synthesis upmixer predicts the gain of each channel according to an exemplary embodiment of the invention.
As shown in FIG. 5, the synthesis upmixer 240 predicts the gain of each channel by restoring the sound-image information of each channel in stages.
First, the synthesis upmixer 240 restores LHA[idx] 510 and RHA[idx] 520.
From LHA[idx] 510 it predicts gLs[idx] 530 and restores LSA[idx] 511; from RHA[idx] 520 it predicts gRs[idx] 540 and restores RSA[idx] 521.
Next, it predicts gL[idx] 550 and gCL[idx] 512 from LSA[idx] 511, and gR[idx] 560 and gCR[idx] 522 from RSA[idx] 521.
Finally, it predicts gCL[idx]/sqrt(2) 570 from gCL[idx] 512 and gCR[idx] 522. Here gCL[idx]/sqrt(2), that is, gCL[idx] 512 × 0.7071, may be the adjusted gain of the center channel.
The synthesis upmixer 240 upmixes the downmix signal based on the multichannel signal predicted through these steps to generate the upmix signal.
If $X_{dmxL}(m, k)$ is the k-th frequency bin of the m-th sub-band of the received left downmix signal, the left upmixing matrix satisfies Equation 2.
[Equation 2]
Likewise, for the right downmix signal, the right upmixing matrix satisfies Equation 3.
[Equation 3]
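Equations 2 and 3 are not reproduced in this text, so the following is only a guess at their general shape: it assumes that upmixing scales every frequency bin of a downmix channel by the gain of the sub-band that contains it. The band edges, the gain values, and the function name are all hypothetical.

```python
import numpy as np

def apply_band_gains(x_dmx, band_edges, band_gains):
    """Scale each bin of one downmix channel by the gain of its band."""
    out = np.zeros_like(x_dmx)
    for m, gain in enumerate(band_gains):
        start, stop = band_edges[m], band_edges[m + 1]
        out[start:stop] = gain * x_dmx[start:stop]
    return out

x_dmx_left = np.ones(8, dtype=complex)   # stand-in for the left downmix bins
edges = [0, 2, 5, 8]                     # hypothetical start/stop bins of three bands
g_ls = [0.5, 0.8, 0.1]                   # hypothetical per-band gains for Ls
ls = apply_band_gains(x_dmx_left, edges, g_ls)
```

Calling the same routine once per output channel, each with its own gain vector, would produce the left-side channels from the left downmix signal under this assumption.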
The synthesis upmixer 240 may include DFT-based decorrelators, $D_L$ and $D_R$.
$D_L$ and $D_R$ can operate in a high-complexity mode or in a low-complexity mode, which is the normal mode. They are generated only within the decoder; they operate in the high-complexity mode when high sound quality is required and in the low-complexity mode for normal sound quality.
In the high-complexity mode, $D_L$ and $D_R$ apply the matrixing of Equation 4 to L(m, k) and R(m, k) to generate decorrelated signals.
[Equation 4]
In the normal mode, $D_L$ and $D_R$ satisfy Equation 5 and no decorrelated signal is generated.
[Equation 5]
The synthesis upmixer 240 uses Equation 6 to upmix the values computed by Equations 2 and 3.
[Equation 6]
Here, α(m) may be a factor indicating the relationship between the L and R signals in each band. δ is a fixed coefficient, which may be the demixing coefficient for the surround signals corresponding to the encoder's downmix.
α(m) is obtained by applying Equation 7 to the values computed by Equations 4 and 5.
[Equation 7]
λ is a weighting coefficient that adjusts how much of the decorrelated signal is mixed in. Accordingly, α(m) may be defined in the range 0 ≤ α(m) ≤ γ, where 0 ≤ γ ≤ 1.
$wet_L(m, k)$ and $wet_R(m, k)$ are the decorrelated signals, which may be generated through the decorrelation process performed by the decorrelator.
FIG. 6 shows a decorrelator according to an exemplary embodiment of the invention.
The decorrelator 600 is the element within the synthesis upmixer 240 that forms the decorrelated signal. As shown in FIG. 6, it may include a complex transformer 610, a magnitude information extractor 620, a phase information extractor 630, a random sequence memory 640, a windower 650, a phase converter 660, a synthesizer 670, and an inverse complex transformer 680.
The complex transformer 610 may apply a complex transform to the downmix signal.
The magnitude information extractor 620 and the phase information extractor 630 extract the magnitude information and the phase information, respectively, from the downmix signal transformed by the complex transformer 610, thereby separating the downmix signal into the two.
The windower 650 models, from the envelope of the magnitude information extracted by the magnitude information extractor 620, a spectral window for modifying the phase information, and applies that window to the random sequence stored in the random sequence memory 640.
The number of random sequences stored in the random sequence memory 640 is determined by the number of downmix signals. That is, different random sequences are used to generate $wet_L(m, k)$ and $wet_R(m, k)$, and the correlation between the two sequences is close to 0.
The phase converter 660 uses the random sequence windowed by the windower 650 to weight the phase information extracted by the phase information extractor 630.
The synthesizer 670 may combine the magnitude information extracted by the magnitude information extractor 620 with the phase information weighted by the phase converter 660.
The inverse complex transformer 680 applies the inverse complex transform to the information combined by the synthesizer 670 to compute the decorrelated signal.
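The chain of blocks 610 to 680 can be sketched with an FFT as the complex transform. Several details are assumptions that the text leaves open: the spectral window is taken to be the smoothed, normalized magnitude envelope, "weighting the phase" is taken to mean adding the windowed random sequence to the phase, and the names `decorrelate`, `strength`, and `smooth` are hypothetical.

```python
import numpy as np

def decorrelate(frame, random_seq, strength=1.0, smooth=9):
    """Perturb the phase of one frame while keeping its magnitude spectrum."""
    spectrum = np.fft.rfft(frame)                      # complex transform (610)
    magnitude = np.abs(spectrum)                       # magnitude extractor (620)
    phase = np.angle(spectrum)                         # phase extractor (630)
    # Spectral window modeled from the magnitude envelope (650), assumed form.
    kernel = np.ones(smooth) / smooth
    envelope = np.convolve(magnitude, kernel, mode="same")
    window = envelope / (envelope.max() + 1e-12)
    # Windowed random sequence weights the phase (660), assumed additive.
    new_phase = phase + strength * window * random_seq
    wet_spectrum = magnitude * np.exp(1j * new_phase)  # synthesizer (670)
    return np.fft.irfft(wet_spectrum, n=len(frame))    # inverse transform (680)

rng = np.random.default_rng(1)
frame = rng.standard_normal(64)
seq = rng.uniform(-np.pi, np.pi, 64 // 2 + 1)          # stored random sequence (640)
wet = decorrelate(frame, seq)
```

Using a second, independent random sequence for the other downmix channel gives a pair of outputs with the same magnitude spectra as their inputs but largely unrelated phases, which is the point of the two-sequence arrangement described above.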
图7示出出根据本发明的示例性实施例的基于音源位置线索的多频道音频的解码方法。Fig. 7 shows a method for decoding multi-channel audio based on audio source position clues according to an exemplary embodiment of the present invention.
在步骤S710中,解复用器210接收由复用器150传送的信号,将接收的信号剖析成立体音频的比特流和附加信息的比特流。In step S710, the
在步骤S720中,音频解码器220基于在步骤S710中所剖析的所述音频的比特流可以复原下混的信号。In step S720, the
在步骤S730中,霍夫曼解码器410用霍夫曼的编码书将在步骤S710中所剖析的所述附加信息的比特流进行霍夫曼解码,以生成差别指数。In step S730, the
在步骤S740中,逆量子化器420用逆量子化表将在S730中所生成的差别指数进行逆量子化,以复原附加信息。具体地说,逆量子化器420通过对每个帧的VSLA的信息映射与每个VSLA相对应的量子化表来可以进行逆量子化。In step S740, the
在步骤S750中,综合上混器240使用在步骤S720中所复原的所述下混的信号与在步骤S740中所复原的所述附加信息来预测多频道的信号,基于所述多频道的信号上混所述下混的信号来生成上混的信号。In step S750, the integrated up-
在步骤S760中,综合滤波器组的开窗口器250对在步骤S750中所生成的所述上混的信号实行综合滤波器组来抽取在时间领域的信号,且开所述上混的信号的窗口来可以抽取输出信号。In step S760, the
如上所述,根据本发明的示例性实施例的基于音源位置线索的多频道音频的解码装置和方法通过将多频道音频信号接收和压缩,且经由内核立体编解码器(core stereo codec)将立体信号压缩和传送,提供与现有立体音频编码的逆兼容性的同时,可以传送多频道音频。As described above, the multi-channel audio decoding device and method based on sound source location clues according to the exemplary embodiments of the present invention receive and compress the multi-channel audio signal, and convert the stereo Signal compression and transmission, allowing the transmission of multi-channel audio while providing backward compatibility with existing stereo audio coding.
Although specific exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art may make various modifications, additions, and substitutions without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention should therefore be defined by the appended claims and their equivalents.
Claims (10)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20080121709 | 2008-12-03 | ||
KR121709/08 | 2008-12-03 | ||
KR1020090064918A KR101176703B1 (en) | 2008-12-03 | 2009-07-16 | Decoder and decoding method for multichannel audio coder using sound source location cue |
KR64918/09 | 2009-07-16 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101754086A (en) | 2010-06-23 |
CN101754086B (en) | 2012-11-14 |
Family
ID=42363561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009102238140A Expired - Fee Related CN101754086B (en) | 2008-12-03 | 2009-11-23 | Decoder and decoding method for multichannel audio coder using sound source location cue |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101176703B1 (en) |
CN (1) | CN101754086B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012045203A1 (en) * | 2010-10-05 | 2012-04-12 | Huawei Technologies Co., Ltd. | Method and apparatus for encoding/decoding multichannel audio signal |
WO2017132082A1 (en) * | 2016-01-27 | 2017-08-03 | Dolby Laboratories Licensing Corporation | Acoustic environment simulation |
KR20220005379A (en) * | 2020-07-06 | 2022-01-13 | 한국전자통신연구원 | Apparatus and method for encoding/decoding audio that is robust against coding distortion in transition section |
-
2009
- 2009-07-16 KR KR1020090064918A patent/KR101176703B1/en active IP Right Grant
- 2009-11-23 CN CN2009102238140A patent/CN101754086B/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN101754086A (en) | 2010-06-23 |
KR20100063639A (en) | 2010-06-11 |
KR101176703B1 (en) | 2012-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12080307B2 (en) | Stereo audio encoder and decoder | |
EP1934973B1 (en) | Temporal and spatial shaping of multi-channel audio signals | |
CN112786063B (en) | Audio encoders and decoders | |
CN103493128B (en) | A method and apparatus for performing an adaptive down- and up-mixing of a multi-channel audio signal | |
CN100571043C (en) | A spatial parameter stereo encoding and decoding method and device thereof | |
CN101202043B (en) | Method and system for encoding and decoding audio signal | |
JP7493073B2 (en) | Integration of high frequency reconstruction techniques with post-processing delay reduction | |
CN113963706A (en) | Frequency Domain Processors and Audio Encoders and Decoders for Time Domain Processors | |
CN103329197A (en) | Improved stereo parametric encoding/decoding for channels in phase opposition | |
US20230419976A1 (en) | Apparatus for Encoding or Decoding an Encoded Multichannel Signal Using a Filling Signal Generated by a Broad Band Filter | |
CN104838442A (en) | Encoder, decoder and methods for backward compatible multi-resolution spatial-audio-object-coding | |
JP2021507316A (en) | Backwards compatible integration of high frequency reconstruction technology for audio signals | |
CN101223598A (en) | Channel Level Difference Quantization and Dequantization Method Based on Virtual Source Position Information | |
CN101754086B (en) | Decoder and decoding method for multichannel audio coder using sound source location cue | |
JP2021522543A (en) | Integration of high frequency reconstruction technology with post-processing delay reduction | |
RU2798009C2 (en) | Stereo audio coder and decoder | |
RU2832544C2 (en) | Integration of high-frequency reconstruction techniques with reduced post-processing delay | |
RU2024117821A (en) | METHODS AND DEVICES FOR ENCODING OR DECODING SCENE-ORIENTED IMMERSIVE AUDIO CONTENT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20100623 Assignee: Neo Lab Convergence Inc. Assignor: Korea Electronic Communication Institute Contract record no.: 2016990000259 Denomination of invention: Decoder and decoding method for multichannel audio coder using sound source location cue Granted publication date: 20121114 License type: Exclusive License Record date: 20160630 |
|
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20121114 Termination date: 20191123 |
|