KR960003453B1

KR960003453B1 - Stereo digital audio coder with bit assortment

Info

Publication number: KR960003453B1
Application number: KR1019940000741A
Authority: KR
Inventors: 김종일
Original assignee: 대우전자주식회사; 배순훈
Priority date: 1994-01-18
Filing date: 1994-01-18
Publication date: 1996-03-13
Also published as: KR950024450A

Abstract

A stereo digital audio coding device codes a digital audio signal by adaptively allotting bits to each channel and frame. It comprises: a perceptual entropy calculating unit that receives a group of frames composed of a plurality of frames of the left, right, center, left surround and right surround channels, obtains a first power density spectrum, and obtains the perceptual entropy for those channels and for the frames of each channel by using the auditory characteristics of humans; an adaptive channel and frame bit allocation unit that adaptively allots bits to the audio signal of each channel and of each frame of each channel in response to the perceptual entropy; and a coder that codes the audio signal with the bits supplied from the adaptive channel and frame bit allocation unit.

Description

Stereo digital audio encoding apparatus that encodes by adaptively allocating bits to channels and frames

FIG. 1 is a block diagram of a stereo digital audio encoding apparatus according to the present invention.

FIG. 2 is a diagram of the group-of-frames (GOF) units of the stereo digital audio encoding apparatus of FIG. 1, which has five channels (L, R, center, L surround and R surround).

FIG. 3 is a graph of the frame bit allocation state (Index, Y axis) against the perceptual entropy PE_i (X axis) according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

110: L channel frame group unit; 120: R channel frame group unit

130: center channel frame group unit; 140: L surround channel frame group unit

150: R surround channel frame group unit; 160: perceptual entropy calculation unit

170: adaptive channel and frame bit allocation unit; 180 to 220: encoders

230: multiplexer

The present invention relates to a digital audio coder, and more particularly to a stereo digital audio encoding apparatus that encodes digital audio signals input on the L, R, center, L surround and R surround channels by adaptively allocating bits to the channels and to each frame according to the perceptual entropy (PE) measured using human auditory characteristics.

High-definition television (HDTV) transmission systems under development aim to reproduce audio at a quality level comparable to that of the compact disc (CD) and the digital audio tape recorder (DAT) already in practical use. Because such systems must carry both video and audio signals over a relatively narrow transmission channel of about 6 MHz, efficient signal compression is required for the audio signal just as it is for the video signal.

To this end, adaptive transform coding, which reflects human auditory characteristics, has recently been studied actively as a high-quality digital audio signal processing technique for HDTV. The aim is algorithms and hardware that can reproduce sound at the quality level of the digital audio equipment mentioned above, at a low bit rate and with a relatively simple receiver.

In a multi-channel stereo digital audio encoding apparatus that uses such adaptive transform coding, conventional ways of allocating bits to, for example, five channels include encoding each channel independently, and allocating different bit amounts to the three front channels (L, R and center) and the two rear channels (L surround and R surround). The channels are split into two sections with different bit amounts because most of the important information in the audio signal lies in the three front channels, while the two rear channels carry relatively less important information. On the basis of this signal characteristic, the anti-aliasing filter for the two rear channels is built from a band-pass filter with a narrower bandwidth than that used for the three front channels, and the sampling frequency of the rear channels is usually set lower than that of the front channels. These coding schemes work well when the front channels carry more information than the rear channels. When the rear channels, that is, the L and R surround channels, carry the more important information in a surround audio system, however, coding efficiency falls and sound quality degrades.

Accordingly, the main object of the present invention is to provide a stereo digital audio encoding apparatus that, in order to increase coding efficiency and further improve sound quality, encodes by adaptively allocating bits to the L, R, center, L surround and R surround channels and to each frame in response to their perceptual entropy.

To achieve this object, the present invention provides an apparatus that encodes, by adaptive bit allocation, a digital audio signal organized as groups of frames (GOF), each having a plurality of frames, for the left (L), right (R), center, L surround and R surround channels. The apparatus comprises: a perceptual entropy calculation unit that receives the groups of frames of the L, R, center, L surround and R surround channels, calculates a first power density spectrum, and from it calculates, using human auditory characteristics, the perceptual entropy for the five channels and for the frames of each channel; an adaptive channel and frame bit allocation unit that, in response to the perceptual entropy obtained for each channel and for each frame of each channel, adaptively allocates bits to the audio signal of each channel and of each frame of each channel output from the groups of frames; and encoders that encode the audio signal of each channel and of each frame of each channel output from the groups of frames with the bits provided by the adaptive channel and frame bit allocation unit.

The invention rests on the observation that the perceptual entropy of the audio signals input on the L, R, center, L surround and R surround channels usually differs from channel to channel, and also from frame to frame within the same channel. When the perceptual entropy is large, the human ear is more likely to perceive coding error, so more bits should be allocated; when it is small, fewer bits are allocated.

As described below, the invention first obtains the perceptual entropy for each channel and then its mean and variance. Using the mean and variance, it assigns weights to the channels according to their perceptual entropy, likewise assigns weights to the frames within a channel according to their perceptual entropy, and allocates bits accordingly.

A preferred embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a stereo digital audio encoding apparatus according to the present invention that encodes by adaptively allocating bits to the L, R, center, L surround and R surround channels and to the frames within each channel. It comprises L, R, center, L surround and R surround channel frame group units (110, 120, 130, 140 and 150), a perceptual entropy calculation unit (160), an adaptive channel and frame bit allocation unit (170), encoders (180 to 220) that encode the L, R, center, L surround and R surround channels by a known coding scheme, and a multiplexer (230). The apparatus takes the digital audio signal of 5M frames, made up of the frames of the five channels, and encodes it by adaptively allocating bits according to the perceptual entropy measured for each channel and each frame, which increases coding efficiency and improves sound quality.

So that the perceptual entropy of each channel and of each frame within a channel can be calculated, the L, R, center, L surround and R surround channel frame group units (110, 120, 130, 140 and 150) arrange the digital audio signal input on their respective channels into groups of frames of M frames each (where M is the number of frames in the group of frames of one channel), and supply the data to the respective encoders (180, 190, 200, 210 and 220) and to the perceptual entropy calculation unit (160).

FIG. 2 shows the GOF structure of the L, R, center, L surround and R surround channels. As shown, one frame consists of N samples (where N is a positive integer) and usually spans about 10 msec to 40 msec. If the GOF of each channel consists of M frames, the GOF of one channel comprises N×M samples, and the GOF for the five channels of the invention comprises N×5M samples. N can be chosen so that a frame covers 10 msec to 40 msec, the interval over which an audio signal can be treated as a stationary process, making use of the delay that inevitably arises when the video signal from a video camera (not shown) is encoded and decoded. M is determined by the delay of the encoding and decoding (codec) operations.
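The GOF bookkeeping of FIG. 2 can be illustrated with a short Python sketch. It is not part of the patent: the function names are hypothetical, the 48 kHz sampling rate is an assumption, and dropping a trailing partial frame is a choice made only for this illustration.

```python
def split_into_frames(samples, frame_len):
    """Split one channel's samples into consecutive frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


def gof_sample_counts(frame_len, frames_per_gof, n_channels=5):
    """Return (samples in one channel's GOF, samples in the GOF of all channels)."""
    per_channel = frame_len * frames_per_gof          # N x M
    return per_channel, per_channel * n_channels      # N x 5M for five channels


def frame_duration_ms(frame_len, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz


# N = 1152 and M = 8 appear later in the text; 48 kHz is assumed here.
print(gof_sample_counts(1152, 8))        # (9216, 46080)
print(frame_duration_ms(1152, 48000))    # 24.0, inside the 10 to 40 msec range
```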

Referring again to FIG. 1, the perceptual entropy calculation unit (160) analyzes the audio signal during the delay of the video encoding and decoding process and calculates the perceptual entropy, which reflects human auditory characteristics. This lets the adaptive channel and frame bit allocation unit (170), described later, vary the bit allocation for each of the L, R, center, L surround and R surround channels and for each frame of each channel, increasing coding efficiency. The perceptual entropy calculation unit (160) takes the finite digital audio signal x(n) of one frame of N samples supplied by each of the L, R, center, L surround and R surround GOF units (110, 120, 130, 140 and 150), obtains the masking threshold M(w) from the power density spectrum S_xx(w) of x(n) using human auditory characteristics, and then obtains the perceptual entropy as follows. First, the approximate power density spectrum S_xx(w) of the finite digital audio signal x(n) of one frame is calculated by equation (1).

[Equation (1) is not reproduced in the source text.]
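Because equation (1) is not available in this text, the following sketch uses the periodogram, a common approximation of the power density spectrum of a finite frame. Whether the patent uses exactly this form is not known; the 1/N normalization and the function name are assumptions. A direct O(N²) DFT is used to keep the example dependency-free; a practical encoder would use an FFT.

```python
import cmath
import math


def periodogram(frame):
    """Periodogram estimate of the power density spectrum of one frame.

    Returns S[k] = |X[k]|^2 / N for k = 0..N-1, where X is the N-point DFT
    of the frame. The 1/N normalization is an assumption, not taken from
    the patent.
    """
    n = len(frame)
    spectrum = []
    for k in range(n):
        # Direct evaluation of the k-th DFT coefficient.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum
```

For a unit impulse such as `[1, 0, 0, 0]` every DFT coefficient has magnitude 1, so the sketch returns a flat spectrum of 0.25 per bin.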

For a signal with power density spectrum S_xx(w), M(w) corresponds to the power at each frequency component below which the human ear cannot perceive. If the signal is reproduced with an error no greater than M(w) at a given frequency component, the ear cannot detect the error. To obtain the power density spectrum S_ee(w) of this error signal, let the encoder input be x(n) and its output y(n); the error signal e(n) is then given by equation (2).

e(n) = x(n) - y(n)    (2)

Therefore, by replacing S_ee(w) with M(w) for the N-sample finite signal, the perceptual entropy R_PE needed to transmit the signal x(n) so that the human ear cannot detect the error can be calculated by equation (3).

If the masking threshold M(w) is the same in every frame and a sub-band coder with infinitely many frequency bands is used, the value given by equation (3) is the minimum bit rate that is theoretically achievable.

When an encoder is built for real audio data, however, the auditory parameters must be analyzed for each frame of N time-domain samples, the quantization level must be set separately for each of the divided frequency bands to match, and the auditory parameters, which change every N samples, must also be transmitted. The bit rate actually required will therefore be higher than the perceptual entropy calculated by equation (3).

For example, when the input signal x(n) is divided into L frequency bands of uniform bandwidth (where L is a positive integer) and encoded, the power density spectrum S_xx(i) and the masking threshold M(i) of the i-th frequency band can be calculated approximately by equations (4) and (5).

In equation (4), R_i is the frequency range of the i-th sub-band, and S_xx(w_j) is the power density spectrum of the j-th frequency component of the N-point discrete Fourier transform (DFT).

In equation (5), M(w_j) denotes the minimum of the masking threshold values M(w) that belong to the j-th sub-band.

For example, when the power density spectrum is obtained with a 1024-point DFT (that is, N=1024) and divided into 32 frequency bands (that is, L=32), the perceptual entropy R_PE can be obtained by equation (6).
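Equations (3) to (6) are not available in this text, so the sketch below only illustrates the general idea of a band-wise perceptual entropy. It assumes a rate-distortion style measure of 0.5·log2(S/M) bits per band, with bands that already lie below the masking threshold contributing nothing. Taking the band power as the mean of its bins, the band threshold as the minimum of its bins, and all function names are likewise assumptions rather than the patent's definitions.

```python
import math


def band_values(psd, mask, n_bands):
    """Collapse per-bin PSD and masking threshold into n_bands uniform bands.

    Band power is the mean PSD in the band and the band threshold is the
    minimum masking threshold in the band. Both reductions are assumptions.
    """
    width = len(psd) // n_bands
    band_psd, band_mask = [], []
    for i in range(n_bands):
        lo, hi = i * width, (i + 1) * width
        band_psd.append(sum(psd[lo:hi]) / width)
        band_mask.append(min(mask[lo:hi]))
    return band_psd, band_mask


def perceptual_entropy(band_psd, band_mask):
    """Average bits per band needed to keep the error under the threshold.

    Assumed form: mean over bands of max(0, 0.5 * log2(S / M)).
    """
    total = 0.0
    for s, m in zip(band_psd, band_mask):
        if s > m > 0:
            total += 0.5 * math.log2(s / m)
    return total / len(band_psd)


# 1024 bins grouped into 32 bands, as in the example in the text.
s_band, m_band = band_values([16.0] * 1024, [1.0] * 1024, 32)
print(perceptual_entropy(s_band, m_band))  # 2.0
```

In this toy input every band sits 16 times above its threshold, so each band needs 0.5·log2(16) = 2 bits under the assumed measure.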

Next, the calculation of the perceptual entropy for 1 GOF of each channel is described. For the audio data of the i-th frame of N samples (where i is a positive integer greater than 0 and less than the number of frames), the perceptual entropy PE_i is obtained by equation (6) from the power density spectrum and the masking threshold. Then, for one group of frames comprising all 5M frames of the L, R, center, L surround and R surround channels, the mean perceptual entropy PE_m and the standard deviation PE_std, which expresses how far the PE_i values vary about PE_m, are obtained by equations (7) and (8).

[Equations (7) and (8) are not reproduced in the source text.]
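The mean and standard deviation over the 5M frames of one GOF can be sketched as follows. Equations (7) and (8) are not available in this text, so the population form of the standard deviation (dividing by 5M rather than 5M - 1) is an assumption, as are the function name and the dictionary layout.

```python
import math


def gof_statistics(pe_by_channel):
    """Mean and standard deviation of PE over all frames of all channels.

    pe_by_channel maps a channel name to its list of M per-frame PE values.
    The population form (divide by the frame count) is an assumption.
    """
    values = [pe for frames in pe_by_channel.values() for pe in frames]
    mean = sum(values) / len(values)
    variance = sum((pe - mean) ** 2 for pe in values) / len(values)
    return mean, math.sqrt(variance)


# Toy input with two channels of two frames each (illustrative values only).
print(gof_statistics({"L": [1.0, 3.0], "R": [1.0, 3.0]}))  # (2.0, 1.0)
```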

The adaptive channel and frame bit allocation unit (170) then receives the perceptual entropy of the L, R, center, L surround and R surround channels and of each frame of each channel from the perceptual entropy calculation unit (160), adaptively allocates bit amounts by the technique described below, and supplies them to the encoders (180, 190, 200, 210 and 220) for the respective channels.

The method of variably allocating bits according to the perceptual entropy of the channels and of the frames of each channel, as obtained by the perceptual entropy calculation unit (160), is described in detail below.

FIG. 3 is a graph of the frame bit allocation state (Index) as a function of PE_i, where PE_i is the perceptual entropy of the i-th frame within 1 GOF made up of 5M frames in total for the L, R, center, L surround and R surround channels, and PE_m is the mean perceptual entropy of the 1 GOF. In the figure, the Index on the vertical (Y) axis is the bit allocation state, an integer in the range -q to +q, and X_i on the horizontal (X) axis is a predetermined perceptual entropy level that a frame can take, determined by equation (9).

For example, for M=8 the weight δ can be obtained from experimental results as in Table 1, according to the mean perceptual entropy PE_m and the standard deviation PE_std obtained from equations (7) and (8).

[Table 1]

That is, when the mean perceptual entropy PE_m is 0 to 0.315 and PE_std is 0 to 0.625, δ is 1000; δ for other values can be read from Table 1 in the same way.

Using the weight δ obtained from Table 1, the perceptual entropy level X_i can be obtained by equation (9).

[Equation (9) is not reproduced in the source text.]

Here i satisfies -q ≤ i ≤ q, and the following conditions are assumed:

sign(i) = 1 if i > 0

sign(i) = -1 if i < 0

sign(i) = 0 if i = 0

The weight δ is determined by the mean perceptual entropy PE_m and the standard deviation PE_std calculated by equations (7) and (8) from the 5M PE_i values of the 1 GOF.
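The mapping from a frame's perceptual entropy to an Index can be sketched as below. Since equation (9) and Table 1 are not available in this text, the levels X_i are taken as an input rather than computed, and the rule of choosing the Index whose level is nearest to PE_i is an assumption made for illustration. Only `sign` follows the definition given in the text; the function names and the example levels are hypothetical.

```python
def sign(i):
    """sign(i) as defined in the text: 1 for i > 0, -1 for i < 0, 0 for i = 0."""
    return (i > 0) - (i < 0)


def select_index(pe, levels, q):
    """Pick the Index in [-q, q] whose level X_i is nearest to pe.

    levels[k] holds X_i for i = k - q, in ascending order, so len(levels)
    must equal 2 * q + 1. The nearest-level rule is an assumption.
    """
    best = min(range(len(levels)), key=lambda k: abs(levels[k] - pe))
    return best - q


# Hypothetical, evenly spaced levels for q = 2 (not values from the patent).
example_levels = [0.0, 1.0, 2.0, 3.0, 4.0]
print(select_index(2.1, example_levels, 2))   # 0
print(select_index(10.0, example_levels, 2))  # 2
```

Frames whose perceptual entropy lies beyond the outermost level are clamped to -q or +q by this rule.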

As an example of the invention, when q is 8, the number of frame bits for each Index, corresponding to the predetermined perceptual entropy levels X_i obtained from equation (9), can be obtained from the above equations as in Table 2.

[Table 2]

(Units: Frame Bit in bits/frame; Bit Rate in kbit/sec)

Referring to Table 2, when the Index is 0, for example, the number of frame bits is 3072. That is, when frames of 1152 samples are encoded at a bit rate of 128 kbps, 3072 bits are allocated to one frame, which is the number of bits proposed by the audio section of MPEG (Moving Picture Experts Group). As the Index increases, the number of frame bits rises well above this average; as the Index decreases, the number of frame bits falls well below it.
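The figure of 3072 bits per frame can be checked with a short calculation. The text gives the frame length (1152 samples) and the bit rate (128 kbps) but not the sampling rate; 48 kHz is assumed here because it is the rate at which the numbers agree. The function name is hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_len, sample_rate_hz):
    """Average bit budget of one frame at a constant bit rate."""
    return bit_rate_bps * frame_len // sample_rate_hz


# 128 kbps, 1152 samples per frame, 48 kHz sampling (assumed).
print(bits_per_frame(128_000, 1152, 48_000))  # 3072
```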

Referring again to FIG. 1, the encoders (180, 190, 200, 210 and 220) encode, by a known coding scheme, the audio signal of each channel supplied by the L, R, center, L surround and R surround channel GOF units (110, 120, 130, 140 and 150), using the bits allocated to the channels and to the frames of each channel by the adaptive channel and frame bit allocation unit (170). Their outputs are connected to the multiplexer (MUX) (230). The MUX (230) multiplexes the encoded data from the encoders (180, 190, 200, 210 and 220) with the bit allocation information for the channels and the frames of each channel from the adaptive channel and frame bit allocation unit (170), converts the result into a bit stream suited to the characteristics of the transmission channel, and outputs it.

As described above, the stereo digital audio encoding apparatus according to the invention encodes by adaptively allocating bits to the channels and to the frames within each channel in response to the mean and variance (or standard deviation) of the perceptual entropy of the channels and of the frames of each channel, which increases coding efficiency and improves sound quality.

Claims

1. A stereo digital audio encoding apparatus that encodes, by adaptively allocating bits to channels and to each frame, a digital audio signal organized as groups of frames (GOF), each having a plurality of frames, for the left (L), right (R), center, L surround and R surround channels, the apparatus comprising: a perceptual entropy calculation unit that receives the groups of frames of the L, R, center, L surround and R surround channels, calculates a first power density spectrum, and from the first power density spectrum calculates, using human auditory characteristics, the perceptual entropy for the five channels and for the frames of each channel; an adaptive channel and frame bit allocation unit that, in response to the perceptual entropy obtained by the perceptual entropy calculation unit for each channel and for each frame of each channel, adaptively allocates bits to the audio signal of each channel and of each frame of each channel output from the groups of frames; and an encoder that encodes the audio signal of each channel and of each frame of each channel output from the groups of frames with the bits provided by the adaptive channel and frame bit allocation unit.

2. The stereo digital audio encoding apparatus of claim 1, wherein the perceptual entropy calculation unit obtains a masking threshold from the first power density spectrum, obtains a second power density spectrum for the region at or below the masking threshold that the human ear cannot perceive, and then calculates the perceptual entropy of the L, R, center, L surround and R surround channels and of each frame of each channel using the first power density spectrum and the second power density spectrum.

3. The stereo digital audio encoding apparatus of claim 1, wherein the adaptive channel and frame bit allocation unit, for one group of frames comprising 5M frames of the L, R, center, L surround and R surround channels (where M is the number of frames in the GOF of one channel), takes PE_i as the perceptual entropy of the i-th frame (where i is a positive integer denoting any frame of the 1 GOF), obtains a weight (δ) using the mean perceptual entropy PE_m and the standard deviation PE_std of the one group of frames, and obtains perceptual entropy levels (X_i) having predetermined values from the mean perceptual entropy PE_m and the weight by the following equation,

[The equation is not reproduced in the source text.]

where

sign(i) = 1 if i > 0

sign(i) = -1 if i < 0

sign(i) = 0 if i = 0

and i lies within the Index range, and then adaptively allocates bits to the L, R, center, L surround and R surround channels and to each frame of each channel according to the Index that represents the frame bit allocation state corresponding to the predetermined perceptual entropy levels.