KR100294623B1

KR100294623B1 - Real-time processing device of HDTV audio coder and method thereof

Info

Publication number: KR100294623B1
Application number: KR1019980019244A
Authority: KR
Inventors: 이종권; 박창섭; 이준용; 김성윤; 오현오; 윤대희
Original assignee: 박권상; 한국방송공사
Priority date: 1998-05-27
Filing date: 1998-05-27
Publication date: 2001-07-12
Anticipated expiration: 2018-05-27
Also published as: KR19990086309A

Abstract

The present invention relates to a real-time processing apparatus and method for an HDTV audio encoder. The apparatus comprises: five slave boards (1) that perform subband analysis and psychoacoustic modeling; a master board (2) for channel matrixing, bit allocation, quantization, bit packing, and interfacing with a PC (5); a backplane board (3) that connects the slave boards (1) and the master board (2); an interface unit (4) that transfers the bitstream output by the master board (2) to the personal computer; a personal computer (5) that decodes the transferred bitstream with an MPEG standard decoding program to verify the performance of the encoding system; and a six-channel A/D converter (7) that converts the analog audio signal output from a source (6) into 16-bit PCM digital data and sends it to the slave boards (1). With this configuration, all bit rates, sampling rates, and channel combinations are possible; dynamic transmission channel switching and phantom coding of the center channel are supported; the processing can be handled by a single processor when up-clocked to 50 MHz; manufacturing cost is reduced by using fewer processors; the coding delay in applications such as HDTV broadcasting is reduced; and the result can be applied to other processors or ASICs.

Description

Real-time processing device and method of HDTV audio encoder

The present invention relates to a real-time processing apparatus and method for an HDTV audio encoder using general-purpose DSPs. More specifically, it enables real-time processing of up to 5.1 channels at 640 kbps at the 48 kHz sampling rate of MPEG-2 Layer 2, so that every lower bit rate, sampling rate, and channel combination is also possible. To perform multichannel coding more effectively, it supports dynamic transmission channel switching and phantom coding of the center channel, two of the composite coding tools proposed in MPEG-2. By speeding up and optimizing each subroutine, the slave board routines that previously required two processors can be handled by a single processor when up-clocked to 50 MHz, and the multichannel processing on the master board, which would otherwise need four processors for real-time operation, can be handled by only two processors by applying fast algorithms. Using fewer processors reduces manufacturing cost, and the coding delay, an important issue in applications such as HDTV broadcasting, is also reduced. Because the optimized programs are developed on a general-purpose DSP, they can also be applied to other processors or ASICs.

In general, the advantages of digital audio over analog audio are its wide bandwidth and dynamic range and the absence of quality loss when it is copied.

Digital audio technology is currently applied to storage media such as CD (Compact Disc), MD (MiniDisc), DCC (Digital Compact Cassette), and CD-I (Compact Disc-Interactive). It is also used in digital broadcasting via DBS (Direct Broadcasting Satellite) and will be used in the upcoming HDTV (High Definition Television) broadcasting.

However, to ease the burden on transmission and storage, signal compression is unavoidable, and an effective compression technique must be used so that the subjective quality of the compressed audio signal after decoding remains nearly identical to that of the original.

In response, the Moving Picture Experts Group (MPEG) under the International Organization for Standardization (ISO) adopted the MPEG-2 standard ISO/IEC 13818-3 as a final international standard in November 1994 and, in February 1997, published a revised standard that clearly defined the points that had been unclear. The standard is applicable to broadcast media such as digital HDTV, supports additional services such as multichannel audio and sound multiplexing, and can compress CD-quality digital audio, together with video, to a transmission rate of 6 to 40 Mbit/s so that five or more audio channels can be handled.

Countries around the world are researching fast algorithms, single-chip implementation techniques, and related topics to gain an early lead in this technology. Decoders, for which high demand is expected and whose computational load is relatively small, are being actively developed and are already manufactured as single chips. Development of encoders, for which demand is limited to broadcasting stations, recording studios, and the like, has lagged behind.

However, to actually use the MPEG-2 algorithm in broadcasting or multimedia systems, encoding and decoding systems capable of real-time processing are required, and implementing the computationally heavy encoder in real time has emerged as the greatest challenge.

The present invention was devised to solve the problems described above. Its purpose is to provide a real-time processing apparatus and method for an HDTV audio encoder that can process up to 5.1 channels at 640 kbps in real time at the 48 kHz sampling rate of MPEG-2 Layer 2, so that every lower bit rate, sampling rate, and channel combination is possible; that supports dynamic transmission channel switching and phantom coding of the center channel; that can handle the slave board processing with a single processor when up-clocked to 50 MHz; that handles the multichannel processing on the master board, which would otherwise need four processors for real-time operation, with only two processors by applying fast algorithms, thereby reducing manufacturing cost; that reduces the coding delay, an important issue in applications such as HDTV broadcasting; and that can be applied to other processors or ASICs.

This object is achieved by an apparatus comprising: five slave boards, each of which receives the audio data of one channel and performs subband analysis and psychoacoustic modeling; a master board for channel matrixing, bit allocation, quantization, bit packing, and interfacing with a PC; a backplane board that connects the slave boards and the master board; an interface unit that transfers the bitstream output by the master board to a personal computer; a personal computer (PC) that decodes the transferred bitstream with an MPEG standard decoding program to verify the performance of the encoding system; and a six-channel A/D converter that converts the analog audio signal output from a source such as a CD player or DAT (Digital Audio Tape) into 16-bit PCM digital data and sends it to the slave boards.

With this apparatus, the following steps are performed: processing the audio data in a parallel structure; speeding up the subband analysis by computing it with a fast DCT algorithm; exchanging data between processors on a slave board through dual-port RAM; passing the result of each slave board to the master through dual-port RAM; having the master board hardware receive the data from the slave boards and process it as a whole; sharply reducing the computation for bit allocation on the master board by using an efficient algorithm; and finally passing the result through the interface to the PC.
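The fast DCT step mentioned above can be illustrated with the matrixing stage of the Layer 2 analysis filterbank, which in the MPEG audio standard computes 32 subband samples from a 64-element vector using cos((2i+1)(k-16)π/64). The sketch below is only an illustration under stated assumptions: the folding of the 64 inputs into 32 is a commonly used identity, not necessarily the exact procedure of this patent, and all function names are hypothetical. The 32-point cosine sum is left in direct form; a fast DCT routine would replace it.

```python
import math

def matrixing_direct(y):
    """Direct 64-input matrixing: 32 outputs, 64 multiplies each."""
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * y[k] for k in range(64))
        for i in range(32)
    ]

def fold_input(y):
    """Fold 64 inputs into 32 using cosine symmetry (illustrative assumption)."""
    z = [0.0] * 32
    z[0] = y[16]
    for n in range(1, 17):
        # cos is even: k = 16 + n and k = 16 - n share the same coefficient
        z[n] = y[16 + n] + y[16 - n]
    for n in range(17, 32):
        # cos((2i+1)(64-n)pi/64) = -cos((2i+1)n pi/64); y[48] has coefficient 0
        z[n] = y[16 + n] - y[80 - n]
    return z

def matrixing_folded(y):
    """Same result from a 32-point cosine sum over the folded input."""
    z = fold_input(y)
    return [
        sum(math.cos((2 * i + 1) * n * math.pi / 64) * z[n] for n in range(32))
        for i in range(32)
    ]
```

Folding halves the number of multiplications even before a fast transform is applied, which is the kind of saving a real-time budget depends on.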

Accordingly, up to 5.1 channels at 640 kbps can be processed in real time at the 48 kHz sampling rate of MPEG-2 Layer 2, so every lower bit rate, sampling rate, and channel combination is possible, and dynamic transmission channel switching and phantom coding of the center channel are supported. The slave board processing can be handled by a single processor when up-clocked to 50 MHz, and the multichannel processing on the master board, which would otherwise need four processors for real-time operation, can be handled by only two processors by applying fast algorithms. Using fewer processors reduces manufacturing cost, the coding delay that matters in applications such as HDTV broadcasting is reduced, and the result can be applied to other processors or ASICs.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is an overall block diagram of the apparatus of the present invention.

FIG. 2 is a block diagram of a slave board in the apparatus of the present invention.

FIG. 3 is a detailed circuit diagram showing data transfer between two processors.

FIG. 4 is a circuit diagram of the ready-signal generator used to adjust data transfer timing in the apparatus of the present invention.

FIG. 5 is a detailed block diagram of the master board in the apparatus of the present invention.

FIG. 6 is a structural diagram showing data exchange between the master and the slaves.

FIG. 7 is a circuit diagram of data transfer between the master board and the PC.

FIG. 8 shows the task allocation algorithm of the encoding process in the apparatus of the present invention.

FIG. 9 is a flowchart showing the synchronization process using interrupts between processors.

FIG. 10 is a flowchart of the program executed by processor-a on the slave board in FIG. 9.

FIG. 11 is a block diagram of the matrix operation in FIG. 10.

FIG. 12 is a block diagram of the input vector transformation in FIG. 11.

FIG. 13 is a flowchart of the program executed by processor-b on the slave board in FIG. 9.

FIG. 14 is a flowchart of the program executed by processor-a on the master board in FIG. 9.

FIG. 15 shows the buffer structure used in the apparatus of the present invention.

FIG. 16 is a flowchart of the program executed by processor-b on the master board in FIG. 9.

FIG. 17 shows the average distribution of bit allocation indices for each sound source used in the experiments.

FIG. 18 shows the bit pre-allocation distribution at 48 kHz and 128 kbps.

FIGS. 19 to 23 show data from experiments conducted using the present invention:

FIG. 19 shows a representative MNR curve before bit allocation.

FIG. 20 shows the change in the MNR curve due to pre-allocation (electric guitar).

FIG. 21 shows the change in the MNR curve due to pre-allocation (first frame).

FIG. 22 shows the change in the MNR curve due to pre-allocation (violin).

FIG. 23 shows the change in the final MNR curve with bit rate.

Explanation of reference numerals for the main parts of the drawings

1: slave board
2: master board
3: backplane board
4: interface unit
5: personal computer
6: source
7: A/D converter

FIG. 1 is an overall block diagram of the apparatus of the present invention, FIG. 2 is a block diagram of a slave board, FIG. 3 is a detailed circuit diagram showing data transfer between two processors, FIG. 4 is a circuit diagram of the ready-signal generator used to adjust data transfer timing, and FIG. 5 is a detailed block diagram of the master board.

FIG. 6 is a structural diagram showing data exchange between the master and the slaves, FIG. 7 is a circuit diagram of data transfer between the master board and the PC, FIG. 8 shows the task allocation algorithm of the encoding process, FIG. 9 is a flowchart showing the synchronization process using interrupts between processors, FIG. 10 is a flowchart of the program executed by processor-a on the slave board in FIG. 9, and FIG. 11 is a block diagram of the matrix operation in FIG. 10.

FIG. 12 is a block diagram of the input vector transformation in FIG. 11, FIG. 13 is a flowchart of the program executed by processor-b on the slave board in FIG. 9, FIG. 14 is a flowchart of the program executed by processor-a on the master board in FIG. 9, FIG. 15 shows the buffer structure used in the apparatus, FIG. 16 is a flowchart of the program executed by processor-b on the master board in FIG. 9, FIG. 17 shows the average distribution of bit allocation indices for each sound source used in the experiments, and FIG. 18 shows the bit pre-allocation distribution at 48 kHz and 128 kbps.

FIGS. 19 to 23 show data from experiments conducted using the present invention: FIG. 19 shows a representative MNR curve before bit allocation, FIG. 20 shows the change in the MNR curve due to pre-allocation (electric guitar), FIG. 21 shows the change in the MNR curve due to pre-allocation (first frame), FIG. 22 shows the change in the MNR curve due to pre-allocation (violin), and FIG. 23 shows the change in the final MNR curve with bit rate.

Accordingly, the apparatus is basically characterized by comprising: five slave boards (1), each of which receives the audio data of one channel and performs subband analysis and psychoacoustic modeling;

a master board (2) for channel matrixing, bit allocation, quantization, bit packing, and interfacing with the PC (5);

a backplane board (3) that connects the slave boards (1) and the master board (2);

an interface unit (4) that transfers the bitstream output by the master board (2) to the personal computer;

a personal computer (5) that decodes the transferred bitstream with an MPEG standard decoding program to verify the performance of the encoding system; and

a six-channel A/D converter (7) that converts the analog audio signal output from a source (6) such as a CD player or DAT into 16-bit PCM digital data and sends it to the slave boards (1).

Each slave board (1), which handles the subband analysis and psychoacoustic model processing of one channel, uses two processors, processor-a (11) and processor-b (12), both commercially available TMS320C30 devices. In the earlier design, the master board had a single processor. The master board handles bit allocation, quantization, and bitstream formatting, tasks that must process the data of all channels together once subband analysis and psychoacoustic model analysis are complete.

In the system implemented in the present invention, the master board (2) also uses two processors in order to take on the extended role of multichannel processing. To expand its computing capacity, it is designed to accept not only the existing 33 MHz processor but also a processor of up to 50 MHz.

The added processor is dedicated to the multichannel bit allocation process. Several fast algorithms were applied to make multichannel encoding at high bit rates possible within this system, the execution time of each subroutine was improved by optimizing memory placement and loops, and the coding delay was minimized through appropriate division of work and synchronization among the processors.

Because a six-channel A/D converter (7) sits between the source (6) and the slave boards (1), the input source (6) can be the phone output of a CD player or DAT (Digital Audio Tape) deck, or five microphones can be connected through an amplifier to input five channels of data directly.

The D/A output of the CD player or DAT and the accuracy of the A/D converter (7) can introduce slight errors, but experiments showed these errors to be almost negligible.

The 16-bit PCM samples converted in this way are stored in memory through the serial port built into the slave board (1) processors and then encoded. As shown in FIG. 2, the processors on each slave board (1) exchange data with each other through several dual-port RAMs (13) and then execute their respective routines.

The master board (2) likewise receives the processed data from each slave board (1) through dual-port RAM and performs bit allocation, quantization, and bitstream formatting.

The finished bitstream is sent from the master board (2) to the PC (5) using the timer and serial port built into the processor, and the PC (5) stores the received bitstream on its hard disk using a serial-to-parallel converter and DMA (Direct Memory Access).
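The serial-to-parallel conversion on the PC side can be pictured as a shift register that collects incoming bits into fixed-width words before they are written to memory. The following is a minimal software sketch of that idea only; the word width of 8 bits, the MSB-first bit order, and the function name are assumptions, since the text does not specify them.

```python
def serial_to_parallel(bits, width=8):
    """Group a serial bit sequence into words (MSB first; width is assumed)."""
    words = []
    shift_register = 0
    count = 0
    for bit in bits:
        # Shift the new bit in at the least significant position
        shift_register = ((shift_register << 1) | (bit & 1)) & ((1 << width) - 1)
        count += 1
        if count == width:
            # A full word is available; hand it over and start again
            words.append(shift_register)
            shift_register = 0
            count = 0
    # Any incomplete trailing word is discarded in this sketch
    return words
```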

The stored data can be sent to a multichannel D/A converter and played back in the channel configuration the user chooses in order to evaluate sound quality, and it can also be sent to a system that combines it with a video bitstream.

As shown in FIG. 2, the slave board (1) can be broadly divided into five parts.

The first is the two processors a and b (11, 12); the second is the dual-port RAM (13) for passing information between the processors; the third is the memory (14) each processor needs to store programs and results; the fourth is the dual-port RAM (15) for data transfer between the slave board (1) and the master board (2); and the fifth is the decoding circuit that decodes memory addresses and changes the number of wait states.

The MPEG-2 Layer 2 algorithm encodes in frames of 1152 samples, so encoding of the current frame must finish while the next 1152 samples arrive (1152 / 44.1 kHz ≈ 26.1 ms). Because the subband analysis and psychoacoustic stages are computationally heavy, real-time processing is difficult with a single processor driven by the 33 MHz clock circuit (25).
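The frame deadline quoted above follows directly from the frame length and the sampling rate. A minimal sketch of that arithmetic, with illustrative names that do not come from the patent:

```python
SAMPLES_PER_FRAME = 1152  # Layer 2 frame length stated in the text

def frame_period_ms(sample_rate_hz):
    """Time available to encode one frame, in milliseconds."""
    return SAMPLES_PER_FRAME * 1000 / sample_rate_hz

for rate in (44100, 48000):
    print(f"{rate} Hz: {frame_period_ms(rate):.1f} ms per frame")
```

A higher sampling rate shortens the deadline, so 48 kHz is the tighter of the two cases shown.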

Therefore, real-time processing requires a parallel processing scheme in which two processors are used per channel, the processing is pipelined, and both processors operate simultaneously.
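The pipelining described above can be pictured as a two-stage schedule in which processor-b works on the frame that processor-a finished in the previous frame period. The sketch below is an illustration under assumptions: each stage is taken to occupy exactly one frame period, and the function name and data layout are hypothetical.

```python
def pipeline_schedule(n_frames, stages=("processor-a", "processor-b")):
    """For each frame period, list which frame each stage works on (None = idle)."""
    schedule = []
    for period in range(n_frames + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            # A stage at depth d works on the frame that entered d periods ago
            frame = period - depth
            row[stage] = frame if 0 <= frame < n_frames else None
        schedule.append(row)
    return schedule

for period, row in enumerate(pipeline_schedule(3)):
    print(period, row)
```

With this arrangement each stage only has to fit within one frame period, at the cost of one extra frame period of latency per added stage.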

Of the results produced by processor-a (11), the data that processor-b (12) needs for psychoacoustic modeling are the maximum scale factor values and the power spectrum obtained through the FFT. These data are transferred through the shared dual-port RAM (13) located between the slave processors a and b (11, 12).

Dual-port RAM (15) is used for data exchange between the two processors on a slave board (1) and for information transfer between the master and slave boards.

As FIG. 2 shows, a dual-port RAM (13) is connected between the two fully symmetric halves to ease information transfer between the two processors, and one side of the dual-port RAM (15) assigned to each processor is connected to the backplane (16) for information transfer with the master board (2).

The dual-port RAMs (13, 15) are fast enough for zero-wait-state access, which suits real-time processing. They also provide a BUSY pin to prevent the collisions that can occur when both sides access the memory at once, so information can be transferred effectively between the two processors.

They also provide interrupt lines for two-way communication: when the system on one side writes data to a designated address (7FFh for the left side, 7FEh for the right), an interrupt is automatically requested on the system connected to the other side, which makes it easy to keep the whole real-time algorithm synchronized.
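The mailbox behaviour described above can be modelled in a few lines. This is a behavioural sketch only: the class and method names are hypothetical, the memory size of 800h words is an assumption chosen to contain the two addresses, and the reading that the left side writes to 7FFh and the right side to 7FEh is an interpretation of the text.

```python
class DualPortRam:
    """Toy model of a dual-port RAM with interrupt mailboxes (names are illustrative)."""

    SIZE = 0x800  # assumed size; the text only gives the mailbox addresses
    MAILBOX = {"left": 0x7FF, "right": 0x7FE}  # address each side writes to

    def __init__(self):
        self.cells = [0] * self.SIZE
        self.handlers = {"left": None, "right": None}

    def on_interrupt(self, side, handler):
        """Register the routine run when `side` receives an interrupt."""
        self.handlers[side] = handler

    def write(self, side, address, value):
        self.cells[address] = value
        if address == self.MAILBOX[side]:
            # Writing the mailbox address interrupts the opposite side
            other = "right" if side == "left" else "left"
            if self.handlers[other] is not None:
                self.handlers[other](value)

    def read(self, address):
        return self.cells[address]
```

In use, one processor would fill a data area with ordinary writes and then write its mailbox address last, so the interrupt tells the other side that the data is complete.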

With dual-port RAM, each processor can carry out its assigned task as if it were a system with only its own memory, unaware of the other processors, and can still exchange results easily.

FIG. 3 shows the two processors a and b (11, 12) exchanging data and generating interrupts through the dual-port RAM (13).

Each of the processors a and b (11, 12) has its own independent local memory (14): 32K words of RAM and a ROM of the same size.

The ROM holds the program code needed by the slave board (1), various tables, and a program that copies them into RAM, so the system can operate stand-alone. The RAM is used to store intermediate results.

A stand-alone system could be built with fast ROM, but to simplify system construction, readily available and inexpensive slow ROM is used. After reset, all program code and data needed for encoding are copied to fast RAM and executed from there.

A GAL22V10 is used as the address decoding device.

It offers up to 12 inputs and 10 outputs, which is adequate for decoding the various control signals needed by all the memories on the slave board (1), and its maximum propagation delay of 7 ns is adequate for zero-wait-state access to the RAM in the circuit.

Once all programs are complete, the interrupt vector table is placed in ROM. Because the ROM is too slow for zero-wait-state access, the number of wait states must be changed every time the ROM is accessed after an interrupt occurs.

This change cannot be handled by the processor's software-controlled wait-state function, because at the moment an interrupt occurs the processor itself consults the interrupt vector table and jumps to the corresponding interrupt address.

The wait states must therefore be generated in hardware so that the number of wait states for RAM and ROM changes automatically.

The processor has an RDY (ready) signal that allows the number of wait states to be changed in hardware.

Because RAM can be accessed with zero wait states, a circuit that adds wait states only when an address in the ROM region appears can be designed and fed into the RDY input, so that the number of wait states is adjusted automatically.

FIG. 4 shows this ready-signal generator; the number of wait states is set by selecting one of the outputs of several flip-flops.
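The behaviour of such a ready-signal generator can be sketched in software as a chain of flip-flops through which an access strobe propagates one stage per clock, with RDY taken from a selected tap. The sketch below is an assumption about the circuit's general structure, not a transcription of FIG. 4: the ROM base address, the number of flip-flops, and all names are hypothetical, and only the 32K-word ROM size comes from the text.

```python
ROM_BASE = 0x000000   # assumed location of the ROM region
ROM_WORDS = 0x8000    # 32K words, as stated for the slave board ROM

def is_rom(address):
    """Address decode: true only for addresses in the assumed ROM region."""
    return ROM_BASE <= address < ROM_BASE + ROM_WORDS

def cycles_until_ready(address, tap, n_flops=4):
    """Wait cycles inserted before RDY is asserted for one access."""
    if not is_rom(address):
        return 0  # RAM is accessed with zero wait states
    if not 1 <= tap <= n_flops:
        raise ValueError("tap must select one of the flip-flop outputs")
    chain = [0] * n_flops
    waits = 0
    while True:
        # Clock edge: the access strobe shifts one stage down the chain
        chain = [1] + chain[:-1]
        waits += 1
        if chain[tap - 1]:
            return waits  # selected tap went high, so RDY is asserted
```

Selecting a later tap lengthens every ROM access by one more clock, which matches the idea of choosing the wait count by picking a flip-flop output.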

FIG. 5 shows the configuration of the improved master board (2). The master board (2), which performs bit allocation, quantization, and bitstream formatting for up to five channels, consists of two processors a and b (21, 22), a dual-port RAM (27) for exchanging data between the two processors as on the slave board, a 96K ROM (23) and a 64K RAM (24), clock circuits (25, 26), and an address decoding circuit. For data exchange with the slave boards (1), it is connected through the backplane connector to the dual-port RAM of each slave board (1) (FIG. 6).

Compared with the previous master board, one processor has been added, along with logic (28) for accepting user requirements from the PC (5).

In addition, the ROM and RAM were enlarged to build a stand-alone system, and dual clock circuits (25, 26) were added so that the master board (2) can carry anything from a processor running at up to 50 MHz down to the existing 33 MHz processor used on the slave boards (1).

On the existing master board, bit allocation, quantization, and bitstream formatting for one channel at a bit rate of 96 Kbps took about 12 msec.

In multichannel processing, the master board (2) must operate on the data of all channels, so the execution time grows in proportion to the number of channels.

Extrapolating linearly from the single-channel figure, the execution time for five-channel data at 96 Kbps per channel is estimated at about 60 msec.

At a sampling frequency of 48 kHz one frame lasts 24 msec, so the existing single-processor master board cannot process multiple channels in real time.

The improved master board (2) has two processors, so if each processor handles a 24 msec share of the routines, up to 48 msec of work per frame can be processed in real time.

Furthermore, when a 50 MHz processor is substituted, the same workload completes in 33/50 of the time needed by the existing 33 MHz processor.

That is, a computation that took 60 msec on the 33 MHz processor can be completed in about 39.6 msec.

This numerical analysis indicates that real-time processing of a five-channel input at 96 Kbps per channel becomes feasible; multichannel data at higher bit rates can then be handled through software improvements such as a better bit allocation process.
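The timing argument above can be retraced with a few lines of arithmetic. The following is only a sketch: the function names are hypothetical, and it assumes, as the text does, that execution time scales linearly with the number of channels and inversely with the clock frequency.

```python
# Frame duration: 1152 samples at 48 kHz, as stated in the text.
FRAME_MS = 1152 * 1000 / 48000          # 24.0 msec
SINGLE_CHANNEL_MS = 12.0                # measured: one 96 Kbps channel at 33 MHz


def master_time_ms(channels, clock_mhz=33.0, base_clock_mhz=33.0):
    """Linear estimate of the master-board time for a number of channels."""
    return SINGLE_CHANNEL_MS * channels * base_clock_mhz / clock_mhz


def fits_real_time(work_ms, processors):
    """Each processor can absorb at most one frame period of work."""
    return work_ms <= FRAME_MS * processors


t33 = master_time_ms(5)                  # five channels on the 33 MHz part
t50 = master_time_ms(5, clock_mhz=50.0)  # same work on a 50 MHz part

print(f"33 MHz: {t33:.1f} msec, two processors enough: {fits_real_time(t33, 2)}")
print(f"50 MHz: {t50:.1f} msec, two processors enough: {fits_real_time(t50, 2)}")
```

Under these assumptions the 60 msec workload exceeds the 48 msec budget of two 33 MHz processors, while the 39.6 msec figure for 50 MHz fits within it.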

Because the master board (2) must process the results of up to five slave boards (1) together, it must be able to exchange information with every slave board (1).

The master board (2) is connected to the dual-port RAM of each slave board (1) through the backplane (16), so, as shown in FIG. 6, it exchanges information with a slave board by using that board's dual-port RAM as if it were its own local memory.

Thanks to the dual-port RAM linking the master board (2) and the slave boards (1), the master board (2) can operate like an independent system without being aware of the slave boards (1).

In addition, implementing the MPEG-2 encoding algorithm requires every system to share the same synchronization signal, so the reset circuit and clock circuit on the master board (2) are connected to each slave board through the backplane (16).

For transmission of the final bitstream, a serial port of the processor is connected through the backplane (16) to the serial-to-parallel converter of the IBM PC, and another serial port is connected to a separate interface circuit of the PC to receive user settings.

FIG. 7 shows the connections among processor-b (22) on the master board (2), the interface board (4), and the PC (5).

Using the serial port and timer built into the processor, data can be transferred at up to about 8 Mbps.

The master board (2) continuously receives data from the slave boards (1) at fixed, frame-based time intervals and executes the routines required up to bitstream formatting.

Once bitstream packing is complete, the final bitstream held in local RAM is transmitted to the PC (5) every frame.

The 50 MHz system clock supplied to the processor by the 50 MHz clock circuit (26) was divided by 16 in software to produce 3.125 MHz, which was used as the transmit clock for serial data transfer from the processor to the PC.
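As a rough check of what this transmit clock implies, the sketch below computes the divided clock and the time needed to shift out one frame. It assumes one bit is sent per transmit clock cycle, that 1 Kbps means 1000 bit/s, and a total rate of 5 × 96 Kbps; the function names are illustrative only.

```python
def serial_clock_hz(system_clock_hz, divider):
    """Transmit clock obtained by dividing the system clock."""
    return system_clock_hz / divider


def frame_bits(bit_rate_bps, frame_ms):
    """Number of coded bits produced per frame at a given total bit rate."""
    return round(bit_rate_bps * frame_ms / 1000)


def transfer_ms(bits, clock_hz):
    """Time to shift out the bits, assuming one bit per clock cycle."""
    return bits / clock_hz * 1000


clock = serial_clock_hz(50_000_000, 16)   # 3.125 MHz
bits = frame_bits(5 * 96_000, 24)         # five channels, 24 msec frame
print(clock, bits, transfer_ms(bits, clock))
```

Under these assumptions a frame occupies the serial link for only a few milliseconds of each 24 msec frame period.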

The frame synchronization signal that marks each 16-bit transfer was generated with the processor's internal timer. This signal is also sent to the PC, which uses it to store the data on its hard disk.

The processor therefore sends three signals to the PC: DX0, the serial data pin; FSX0, which notifies the PC of each 16-bit data transfer; and CLKX0, the transmit clock.

In the bitstream receiving circuit on the interface board, which is installed in a slot of the PC (5) (an expansion facility for adding parallel ports), two 8-bit serial-to-parallel converters are cascaded to convert the serially transmitted data into 16-bit words.

The FSX0 signal, which indicates that 16 bits of data have arrived, is used directly as the PC's DMA request signal for storing the converted 16-bit data in memory. The processor thus issues a DMA request to the PC each time a 16-bit word is transmitted, and the PC latches the transmitted 16-bit word only while the DACK (DMA acknowledge) signal is active and stores it on the hard disk.
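The cascaded receiver can be modeled in software as two chained 8-bit shift registers whose combined contents are read every 16 clocks. This is a behavioral sketch only: the class and function names are hypothetical, MSB-first bit order is an assumption, and the every-16-clocks read stands in for the FSX0/DMA handshake.

```python
class ShiftRegister8:
    """8-bit serial-in shift register; the bit shifted out feeds the next stage."""

    def __init__(self):
        self.value = 0

    def clock(self, bit_in):
        bit_out = (self.value >> 7) & 1
        self.value = ((self.value << 1) | (bit_in & 1)) & 0xFF
        return bit_out


def receive_words(bits):
    """Cascade two registers and read one 16-bit word every 16 clocks."""
    low, high = ShiftRegister8(), ShiftRegister8()
    words = []
    for count, bit in enumerate(bits, start=1):
        carry = low.clock(bit)      # first stage takes the serial data
        high.clock(carry)           # second stage takes the first stage's overflow
        if count % 16 == 0:         # stands in for the frame-sync / DMA request
            words.append((high.value << 8) | low.value)
    return words


def to_bits(word):
    """Serialize a 16-bit word MSB first (assumed bit order)."""
    return [(word >> i) & 1 for i in range(15, -1, -1)]


stream = to_bits(0xA5C3) + to_bits(0x0001)
print([hex(w) for w in receive_words(stream)])
```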

Note that if a decoder is connected in place of the PC, the overall system can be used unchanged with only the serial-to-parallel converter removed.

The key consideration in structuring the software of the MPEG-2 real-time encoder is to support the best sound quality, high bit rates, and a wide range of options within the capability of the implemented hardware.

This requires applying fast algorithms suited to real-time implementation at each processing stage, optimizing each subroutine, and then dividing the work appropriately among the processors on the basis of the measured execution times.

The existing encoder consisted of slave boards with two processors each and a master board with one processor.

The routines handled by the slave boards (1) can be processed independently for each channel, so multichannel expansion imposes no additional burden on them; the subroutines handled by the master board (2), however, ran over their time budget as channels were added.

One processor was added, as described above, in anticipation of the expanded role of the master board (2) under multichannel operation, but the computing capacity was still insufficient to support the high bit rates needed for sufficiently good sound quality.

As shown in FIG. 8, the MPEG-2 audio encoding algorithm implemented in the present invention divides broadly into tasks performed on the slave boards (1) and tasks performed on the master board (2).

Tasks are assigned according to whether a subroutine can be processed independently for each channel or needs the information of all channels at once: the former are assigned to the slave boards (1), and the latter must be processed on the master board (2).

Bit allocation and bitstream formatting need the information of all channels together and must therefore run on the master board (2), whereas subband analysis and the psychoacoustic model derive their results directly from each channel's input samples and can run on the slave boards (1).

If work that the slave boards (1) could do is moved to the master board (2), the master board (2) has to process the data of up to five channels sequentially, which lengthens the computation time fivefold or more and burdens its workload.

The slave boards (1) should therefore do all the work they can and hand over to the master board (2) only the minimum results needed for transmission.

Bit allocation, which requires the psychoacoustic model results (SMR, the signal-to-mask ratio) of all channels; quantization, which is driven by the bit allocation information; and bitstream formatting for transmission must all be processed on the master board (2). Of these, the execution time of bit allocation rises so sharply as the number of channels increases that real-time processing is impossible even with the two processors on the master board (2).

A pre-allocation algorithm was proposed and applied to handle this efficiently.

Table 1 shows the measured execution time of the bit allocation process as the number of channels is varied.

Table 1. Execution time of the bit allocation process

| Channels, total bit rate | Execution time (msec) | Load |
| --- | --- | --- |
| 1 channel, 96 Kbps | 2.1 | 8.8% |
| 2 channels, 192 Kbps | 10.3 | 42.9% |
| 4 channels, 384 Kbps | 41.7 | 173.7% |
| 5 channels, 640 Kbps | 62.5 | 260.0% |

1. Task assignment and synchronization between processors

The ideal way to assign processor tasks for multichannel processing is to finish as much work as possible on the slave boards (1) and have the master board (2) only gather the per-channel data and pack it into the frame format.

Bit allocation, however, compares the SMRs of all channels together, so it has to be processed on the master board (2).

Quantization, which is applied differentially to the subband samples according to the bit allocation result, could be performed independently for each channel. Because the execution time of bit allocation is not constant, however, synchronization between the master board (2) and the slave boards (1) is difficult, so quantization was assigned to the master board (2).

When the master board (2) performs quantization, the work cannot be parallelized and must run sequentially over all five channels, so it takes five times as long as it would on the slave boards (1).

FIG. 9 is a flowchart showing, on the basis of the task assignment of FIG. 8, the tasks assigned to each processor and the interrupt-driven synchronization between processors.

The path from the input sample buffer to slave DSP processor-b (12) is shown for one channel, while the processing on the master board (2) is shown for five channels.

The size of each region roughly indicates its execution time.

The processing up to DSP processor-b (12) of the slave board (1) is carried out in parallel, in the same way, on the five slave boards (1), each on its own input samples.

An arrow in the figure indicates that, when a subroutine finishes, it hands the data needed for the next step to another processor through the dual-port RAM and raises an interrupt.

Two input buffers are used and switched so that newly arriving input samples can be received while a frame is being processed. If a processor's subroutines do not finish before all 1152 samples of the next frame have arrived, real-time processing fails.
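The two-buffer scheme and its failure condition can be illustrated with a small model. This is a sketch under assumptions: the class name, the `busy` flag, and the use of an exception to signal a missed deadline are all hypothetical and are not taken from the patent.

```python
FRAME_SAMPLES = 1152


class PingPongBuffer:
    """Two input buffers: one fills while the other is being processed."""

    def __init__(self, frame_samples=FRAME_SAMPLES):
        self.frame_samples = frame_samples
        self.buffers = [[], []]
        self.fill_index = 0
        self.busy = False   # True while the processing side still holds a frame

    def push(self, sample):
        """Store one input sample; return a full frame when one completes."""
        buf = self.buffers[self.fill_index]
        buf.append(sample)
        if len(buf) < self.frame_samples:
            return None
        if self.busy:
            # The previous frame was not finished before the next one filled up.
            raise RuntimeError("real-time failure: previous frame still in use")
        self.busy = True
        self.fill_index ^= 1                   # switch to the other buffer
        self.buffers[self.fill_index] = []     # start it empty
        return buf

    def release(self):
        """Called by the processing side when it has finished a frame."""
        self.busy = False
```

In use, the sample interrupt would call `push` for every input sample, and the frame routines would call `release` once their work on the returned frame is done.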

Following one frame through the processing sequence: processor-a (11) on the slave board (1), which receives the input samples, first performs the FFT used by the psychoacoustic model. The subband analysis filter bank, which can start only after all 1152 samples have arrived, and scale factor coding are then also carried out on processor-a (11) of the slave board (1).

The psychoacoustic model on processor-b (12) of the slave board (1) cannot start until the results up to the per-subband sound pressure level, obtained from the subband samples, are available, so processor-a (11) interrupts processor-b (12) only after it has finished all of its work for the frame.

While processor-b (12) runs the psychoacoustic model for the first frame, processor-a (11) performs the FFT and subband analysis for the second frame.

During this time the master board (2) is idle, because it cannot start bit allocation until the SMR produced by the psychoacoustic model is available.

If processor-b (12) on each slave board (1) interrupted the master board (2) immediately after its psychoacoustic model finished, the master could fetch incorrect SMR data for some channels, because the execution time of the psychoacoustic model differs from channel to channel (varying by as much as 5 msec).

To prevent this, as shown in FIG. 9, the master board (2) is interrupted at the moment the psychoacoustic model for the second frame starts, by which time the psychoacoustic modeling of every channel is guaranteed to have finished, and it then begins bit allocation for the first frame.

In this case the second subband analysis finishes before processor-a (11) on the slave board (1) has handed the first frame's subband samples to the master board (2), so a separate measure was taken to preserve the first frame's subband samples.

The dual-port RAMs of processor-a (11) and processor-b (12) on each slave board (1) are connected to processor-a (21) on the master board (2), so all data is received by processor-a (21).

Processor-a (21) on the master board (2) receives the subband samples, scale factor indices, and scale factor selection information from processor-a (11), and the SMR from processor-b (12), of each channel's slave board (1). It passes the SMR and scale factor selection information to processor-b (22) on the master board (2), which is dedicated to bit allocation, and then waits until the next frame.

When the result data of each slave board (1) for the second frame arrives, processor-a (21) passes the data needed for bit allocation to processor-b (22) on the master board (2) and carries out quantization and bitstream formatting for the first frame.

Because processor-a (21) on the master board (2) works on the first frame only after the second frame's data has arrived, it must have two data buffers.

Had bit allocation, the earlier step, been placed on processor-a (21) and quantization and packing on processor-b (22), the subband samples, scale factor indices, and other data of five channels would have had to be passed once more through the dual-port RAM, which is cumbersome and costs time (about 1.6 msec); the arrangement above was chosen for that reason.

The execution time of the bit allocation process on processor-b (22) of the master board (2) varies with the shape of the SMR curve.

When processor-b (22) on the master board (2) finishes, it raises an interrupt to processor-a (21). Depending on the frame, this interrupt reporting the bit allocation result can collide with the PC transfer interrupt of processor-a (21).

When such a collision occurs, that frame alone is not transmitted correctly and cannot be decoded. To prevent this, interrupts from other processors must be masked while processor-a (21) on the master board (2) is transferring the bitstream to the PC.

2. Routines processed by processor-a on the slave board

The slave board (1), designed for parallel processing with two processors, handles subband analysis, scale factor coding, and the psychoacoustic model for a one-channel input.

Table 2 lists the execution times of the subroutines processed on the slave board (1), measured for the purpose of task assignment.

'Load' is the ratio of the execution time to 24 msec, the length of one frame (1152 samples) at a sampling frequency of 48 kHz.

For real-time processing, the current frame must be finished before the whole of the next frame's input has arrived, so real-time processing is impossible once the load on any one processor exceeds 100%.

The total load of the routines handled by the slave board (1) is 166.7%, so real-time processing is possible if the work is divided appropriately between the two processors.

Table 2. Execution time of the slave board processing routines

| Routine | Execution time (msec) | Load | Processing CPU |
| --- | --- | --- | --- |
| Subband analysis | 10.86 | 45.3% | a |
| Scale factor coding | 2.18 | 9.1% | a |
| FFT & power spectrum | 3.95 | 16.5% | a |
| Psychoacoustic model | 17 to 23 | 95.8% | b |
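The load figures can be recomputed from the measured times. The sketch below assumes that the 95.8% quoted for the psychoacoustic model corresponds to its worst case of 23 msec (23/24 ≈ 95.8%), which is an inference rather than something the text states; the names are illustrative.

```python
FRAME_MS = 24.0  # one 1152-sample frame at 48 kHz

# (routine, execution time in msec, assigned processor); 23.0 is the assumed worst case.
ROUTINES = [
    ("subband analysis", 10.86, "a"),
    ("scale factor coding", 2.18, "a"),
    ("FFT & power spectrum", 3.95, "a"),
    ("psychoacoustic model", 23.0, "b"),
]


def load_percent(time_ms):
    """Execution time expressed as a percentage of one frame period."""
    return 100.0 * time_ms / FRAME_MS


def load_by_cpu(routines):
    """Sum the load of the routines assigned to each processor."""
    totals = {}
    for _, time_ms, cpu in routines:
        totals[cpu] = totals.get(cpu, 0.0) + load_percent(time_ms)
    return totals


per_cpu = load_by_cpu(ROUTINES)
total = sum(per_cpu.values())
time_a = sum(t for _, t, cpu in ROUTINES if cpu == "a")
subband_share = 10.86 / time_a

for cpu, load in sorted(per_cpu.items()):
    print(f"processor-{cpu}: {load:.1f}% load")
print(f"total: {total:.1f}%, subband share of processor-a: {subband_share:.0%}")
```

With this split neither processor exceeds 100%, and subband analysis makes up roughly 64% of the work on processor-a. The recomputed total comes out marginally below the quoted 166.7% because the table entries are individually rounded.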

The psychoacoustic model, which needs the most execution time, can run only once the power spectrum obtained from the FFT and the scale factors obtained through subband analysis are available.

With this in mind, processor-b (12) on the slave board (1) was assigned the psychoacoustic model alone, and the remaining subroutines were assigned to processor-a (11).

Because the psychoacoustic model on processor-b (12) can start only after both subband analysis and scale factor coding have finished, subband analysis, the heaviest of the subroutines on processor-a (11), directly affects the overall coding delay.

Processor-a (11) on the slave board (1) receives the input PCM samples of its channel and performs the 1024-point FFT for the psychoacoustic model, the subband analysis filter bank, and scale factor calculation and coding. FIG. 10 is a flowchart of the routine executed on processor-a of the slave board (1).

After the processor and memory are initialized and the serial port is configured, input samples begin to arrive. Processor-a (11) on the slave board (1) must keep receiving input samples while it executes each routine, so it needs a separate interrupt routine for the input that arrives once every 1/44.1 kHz.

The interrupt routine that receives input samples must minimize not only its execution time but also the number of registers it uses.

The program must be written with the constraint that registers used in the interrupt routine cannot be used by the other routines on processor-a (11) of the slave board (1).

Once 576 input samples have arrived, they are combined with 448 samples of the past frame and the 1024-point FFT is performed first. For synchronization with the master board processor, the subband samples, scale factor indices, and scale factor selection information of the previous frame are kept in local RAM and moved to the dual-port RAM after the FFT routine for the current frame finishes.
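The assembly of the FFT input block can be sketched as follows. The function name is hypothetical, and the code assumes that the 448 past samples are the most recent samples of the previous frame, which the text implies but does not state explicitly.

```python
def fft_input_block(previous_frame, current_frame, new_samples=576, fft_size=1024):
    """Build the FFT input from the tail of the past frame plus the newest samples."""
    history = fft_size - new_samples            # 448 samples carried over
    return previous_frame[-history:] + current_frame[:new_samples]


previous = list(range(1152))            # stand-in sample values 0..1151
current = list(range(1152, 2304))       # stand-in sample values 1152..2303
block = fft_input_block(previous, current)
print(len(block), block[0], block[-1])
```

Because the block needs only the first half of the current frame, the FFT can start while the second half is still arriving, which matches the ordering described above.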

Next, once all 1152 samples of a frame have arrived, the subband analysis filter bank is executed. It takes 10.86 msec, which is 64% of the total work assigned to processor-a (11) on the slave board (1) and a load of 45.3% of the 24 msec within which one frame must be finished.

Moreover, the psychoacoustic model and the subsequent routines can be processed only after subband analysis has finished, so it directly affects the overall delay of encoding. In this implementation, the 32×64 matrix operation contained in this time-consuming subband analysis is accelerated by an algorithm that computes it as a 32-point IDCT (inverse discrete cosine transform).

Scale factors are derived from the subband samples produced by the filter bank analysis and coded, and the scale factor selection information is determined as well.

To reduce the processing time on the master board (2), even the scaling of the subband samples is handled by processor-a (11) on the slave board (1).

When this much is done, processor-a (11) interrupts processor-b (12) on the slave board (1) and processor-a (21) on the master board (2) so that they can fetch the data they need and carry on with the subsequent work.

Processor-a (11) on the slave board (1) then waits again for the input samples needed for the FFT of the next frame to fill up.

The fast subband analysis algorithm is described next.

The MPEG subband analysis filter bank divides the 1152 input samples of a frame into blocks of 32 samples, forms an input vector of 72 elements by overlap-add, and feeds it to the subband analysis filter; the analysis yields one subband sample for each of the 32 subbands.

Repeating this procedure 36 times completes the analysis of one frame of 1152 samples (= 32×36).

Subband analysis requires the 32×64 analysis matrix shown in FIG. 11. Having an analysis matrix that needs 32×64 = 2048 multiply-add operations inside a routine that must be repeated 36 times per frame places a heavy burden on processing time.

Even with a DSP processor chip that can perform a multiplication and an addition in one instruction cycle, the theoretical execution time of the analysis matrix operation is 2048×36 = 73728 cycles ≈ 4.42 msec.
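The cycle estimate can be reproduced directly. In this sketch the 60 nsec instruction cycle is an assumption inferred from the quoted figures (73728 cycles ≈ 4.42 msec); the text does not state the cycle time.

```python
SUBBANDS = 32
MATRIX_COLUMNS = 64
BLOCKS_PER_FRAME = 36
CYCLE_NS = 60.0   # assumed instruction cycle, inferred from the quoted 4.42 msec

# One multiply-add per matrix element, repeated for every 32-sample block.
macs_per_frame = SUBBANDS * MATRIX_COLUMNS * BLOCKS_PER_FRAME
matrix_time_ms = macs_per_frame * CYCLE_NS * 1e-6

print(macs_per_frame, round(matrix_time_ms, 2))
```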

The elements M[i,k] of the analysis matrix in FIG. 11 have a form similar to the kernel of the discrete cosine transform (DCT). Using this together with the symmetry of the cosine function, the matrix operation is transformed into a 32-point IDCT, which makes various fast DCT algorithms applicable.
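One way to see this reduction is to fold the 64-element input into 32 values and compare the result with the direct matrix product. The sketch below assumes the analysis matrix of the MPEG audio standard, M[i,k] = cos((2i+1)(k−16)π/64), since FIG. 11 is not reproduced here. The folding shown is a standard derivation, not necessarily the exact procedure of FIG. 12, and the 32-point sum is evaluated directly rather than with a fast algorithm.

```python
import math
import random


def analysis_matrix_direct(y):
    """Direct 32x64 product, assuming M[i,k] = cos((2i+1)(k-16)pi/64)."""
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * y[k] for k in range(64))
        for i in range(32)
    ]


def fold_input(y):
    """Fold 64 inputs into 32 using cos(-a) = cos(a) and cos(m*pi - a) = -cos(a) for odd m."""
    x = [0.0] * 32
    x[0] = y[16]
    for n in range(1, 17):
        x[n] = y[16 + n] + y[16 - n]      # even symmetry about k = 16
    for n in range(17, 32):
        x[n] = y[16 + n] - y[80 - n]      # odd symmetry about k = 48; y[48] drops out
    return x


def idct32(x):
    """Unnormalized 32-point IDCT-type sum, evaluated directly (no fast algorithm)."""
    return [
        sum(x[n] * math.cos((2 * i + 1) * n * math.pi / 64) for n in range(32))
        for i in range(32)
    ]


random.seed(0)
y = [random.uniform(-1.0, 1.0) for _ in range(64)]
direct = analysis_matrix_direct(y)
folded = idct32(fold_input(y))
max_error = max(abs(a - b) for a, b in zip(direct, folded))
print(max_error)
```

The folded form needs a 32×32 kernel instead of 32×64 even before any fast algorithm is applied; a fast IDCT such as the one discussed in the text would replace `idct32`.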

FIG. 12 shows the transformation of the input vector. Fast IDCT algorithms fall broadly into two groups: those that use the FFT (fast Fourier transform), exploiting the similarity between the DCT kernel and the real part of the DFT (discrete Fourier transform) kernel, and those that, much like the derivation of the FFT, apply decimation in time or in frequency to obtain a butterfly structure.

As the DCT has come into wider use, various algorithms for computing the transform quickly have been developed. It was confirmed experimentally that Lee's fast IDCT algorithm is the best suited to implementation on the hardware built here, and it was adopted.

When an algorithm is implemented on real hardware in assembly language, not only the number of multiplications and additions in the transform itself but also the difficulty of pre- and post-processing, the pipeline structure, and similar factors must be taken into account, so the algorithm with the fewest operations is not necessarily the fastest.

Unlike other fast IDCTs, Lee's fast IDCT needs no preprocessing apart from supplying the input in bit-reversed order. Most general-purpose DSP chips have an addressing mode that makes this easy, and the FFT-based methods also need bit reversal to perform the FFT, so it cannot be counted as extra overhead.
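For reference, bit-reversed ordering for a power-of-two block can be sketched as below. On a DSP this would normally be done by the bit-reversed addressing mode rather than in a loop; the function names here are illustrative.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(values):
    """Reorder a power-of-two-length sequence into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [values[bit_reverse(i, bits)] for i in range(n)]


print(bit_reversed_order(list(range(8))))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

For the 32-point transform used here, `bits` is 5, so index 1 maps to 16 and index 3 maps to 24.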

The implementation difficulty with Lee's fast IDCT is that its butterfly structure is somewhat complicated and its output is not in natural order either, so many data pointers are needed to reorder it properly.

3. Routine executed by processor-b on the slave board (psychoacoustic model)

FIG. 13 is a flowchart of the routine executed by processor-b in the slave board (1), which is dedicated to the psychoacoustic model that finds the SMR value for each subband.

After initializing the processor and memory, processor-b waits. When an interrupt arrives from processor-a (11) in the slave board (1), it receives the power spectrum produced by the FFT and the maximum scale factor of each subband from the dual-port RAM and stores them in local RAM. It then executes each subroutine in the order shown in the flowchart.

Computing the global masking curve takes most of the execution time of the psychoacoustic model. The individual masking curves are held as logarithmic values, so each must be exponentiated, the results summed, and the sum converted back to a logarithm, and this is what makes the step so slow.
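The exponentiate, sum, and take-the-logarithm sequence described above can be sketched as follows. The sketch assumes that the individual masking levels are powers expressed in dB (10·log10); the text does not state the exact scaling, and the function name is hypothetical.

```python
import math


def total_mask_db(individual_db):
    """Combine individual masking levels (dB) into one global level (dB)."""
    # Undo the logarithm, add the linear powers, then return to dB.
    power = sum(10.0 ** (level / 10.0) for level in individual_db)
    return 10.0 * math.log10(power)
```

Every masker contributes one exponential per frequency point and every point needs one logarithm, which is why this step dominates on a processor without dedicated instructions for either operation.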

The processor, a general-purpose DSP chip, has no dedicated instructions for logarithms or exponentials. Each operation has to be approximated by a Taylor series, which is why logarithmic and exponential operations take so long.

Several techniques have been proposed for fast logarithm computation in DSP chips and ASIC cores, such as building a logarithm table in advance and looking values up instead of computing them, or finding the value through the convergence condition of a sequence.

These run faster than a Taylor series but are less precise. They suit fixed-point arithmetic, but not this encoder, which uses a floating-point processor for high sound quality.

To shorten the logarithmic and exponential operations in this invention, the order of the Taylor series was reduced as far as possible without introducing any error relative to the results obtained on a PC. In addition, the computation formerly handled by a subroutine was inlined into the program, which removes unnecessary initialization and the time lost to the 'CALL' (4 cycles) and 'RETURN' (4 cycles) instructions and makes better use of the program cache, shortening execution time further.
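The idea of choosing the smallest Taylor order that still matches a reference result can be sketched as below. This is an illustration only: the tolerance, the test range, and the restriction to the exponential function are assumptions, and `math.exp` stands in for the PC reference mentioned in the text.

```python
import math


def exp_taylor(x, order):
    """Approximate exp(x) with a Taylor series truncated at the given order."""
    term, total = 1.0, 1.0
    for k in range(1, order + 1):
        term *= x / k          # term is now x**k / k!
        total += term
    return total


def min_order(tolerance, xs, max_order=40):
    """Smallest order whose worst error over xs stays within the tolerance."""
    for order in range(1, max_order + 1):
        worst = max(abs(exp_taylor(x, order) - math.exp(x)) for x in xs)
        if worst <= tolerance:
            return order
    raise ValueError("no order up to max_order meets the tolerance")
```

A call such as `min_order(1e-6, [i / 10.0 for i in range(-10, 11)])` returns the lowest order that is acceptable on the interval from -1 to 1; every term removed saves one multiply-add per evaluation.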

4. Routine executed by processor-a on the master board

The master board (2) receives from the slave board (1) of each channel the subband samples, scale factor indices, scale factor selection information, and the SMR (Signal-to-Mask Ratio) produced by psychoacoustic modeling. It allocates bits to each subband, quantizes the subband samples according to the bit allocation, and formats and transmits the bitstream.

Unlike the subroutines on the slave boards (1), which run per channel, the subroutines on the master board (2) must process the information of all channels together. Processing time therefore grows with the number of encoded channels, and for a given number of channels it also grows with the bit rate.

FIG. 14 is a flowchart of processor-a (21) in the master board (2). It receives the processed data from the slave board (1) of each channel, passes the SMR values to processor-b (22) in the master board (2), which is dedicated to bit allocation, and performs quantization and bitstream formatting using the bit allocation result.

After initializing, the processor waits for an interrupt from a slave board (1). Because of the hardware structure, it can receive an interrupt from only one slave board (1).

The psychoacoustic model takes a different time on each channel, and the interrupt must come only after it has finished on all channels. The design therefore raises the interrupt at the point where the psychoacoustic model of the next frame starts, as shown in FIG. 9.

On receiving the interrupt, it sends the SMR values produced by the psychoacoustic model of each channel, together with the SCFSI, to processor-b (22) in the master board (2) and moves the other data into local RAM.

The bit allocation handled exclusively by processor-b (22) in the master board (2) is computationally heavy and is the biggest obstacle to multichannel processing.

The basic bit allocation algorithm finds the minimum MNR among the subbands and raises the bit allocation index of that subband by one step. Because the quantization steps are defined differently from subband to subband, a different table must be consulted for each.

Assembly language does not support memory arrays of two or more dimensions, so a memory pointer cannot carry both the channel and the subband information needed to look up the bit allocation table.

To find the minimum MNR across the subbands of all channels efficiently in a one-dimensional array, and to look up the bit allocation table that matches that subband, the MNR and bit allocation index buffers are interleaved by channel as shown in FIG. 15. Consequently, when the SMR of each channel is fetched from processor-b (12) in the slave board (1) and passed to processor-b (22) in the master board (2), it must be stored in the local RAM of processor-b (22) in the order of FIG. 15.
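A channel-interleaved one-dimensional buffer of this kind might look like the sketch below. FIG. 15 is not reproduced here, so the exact ordering is an assumption: the sketch stores the values subband by subband, with the channels adjacent inside each subband, and the helper names are hypothetical.

```python
def interleave(per_channel):
    """Flatten per_channel[ch][sb] into a 1-D list ordered by subband, then channel."""
    num_channels = len(per_channel)
    num_subbands = len(per_channel[0])
    return [per_channel[ch][sb]
            for sb in range(num_subbands)
            for ch in range(num_channels)]


def locate_minimum(flat, num_channels):
    """Return (subband, channel) of the smallest entry in the flat buffer."""
    idx = min(range(len(flat)), key=flat.__getitem__)
    # With this layout one pointer value encodes both coordinates.
    return idx // num_channels, idx % num_channels
```

With such a layout a single linear scan finds the minimum, and the position of the winning entry alone identifies the subband, hence the quantization table to consult, without a second index.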

For processor-a (21) in the master board (2), the buffer of each data type is laid out in the way that best suits the final bitstream packing.

With five channels, the two MPEG-1 compatible channels (Lo, Ro) are packed alternately in subband order, and the remaining three channels (C, LS, RS) are likewise interleaved in order. When the subband samples, scale factor indices, and scale factor selection information of each channel are moved from the dual-port RAM to local RAM, it is therefore advantageous to build the buffers in packing order, separately for the two MPEG-1 compatible channels and the three extension channels.

If the bit allocation indices fetched from processor-b (22) in the master board (2) are stored the same way, quantization need not be run five times, once per channel. It is run only once for the two-channel group and once for the three-channel group, with the subband count increased accordingly.

Also, because the data fetched from the slave boards (1) is used in the next frame, the memory of processor-a (21) in the master board (2) must be split into two buffers and the pointers switched between them.

Once quantization and grouping, arranged to suit formatting, are finished, bitstream packing begins.

An MPEG-2 bitstream can be divided into an MPEG-1 compatible bitstream, which begins with a 32-bit header, and an extension bitstream, which begins with a 40-bit header.

If the bit rate does not exceed 384 Kbps, everything fits in the MPEG-1 compatible bitstream and the extension bitstream is not used.

The bitstream generated by the encoder is cut into 16-bit units, stored in RAM, and transmitted frame by frame to the PC or another transmission channel.

Since the storage unit is 16 bits, appropriate use of the 'OR' instruction during bitstream packing improves execution time.
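Packing variable-width fields into 16-bit words with shift and OR operations can be sketched as follows. The class and method names are hypothetical, and the sketch shows the general technique rather than the assembly routine of the encoder.

```python
class BitPacker16:
    """Accumulate variable-width fields and emit them as 16-bit words."""

    def __init__(self):
        self.words = []   # completed 16-bit words
        self.acc = 0      # bits not yet written out
        self.nbits = 0    # number of valid bits in acc

    def put(self, value, width):
        if width < 0 or value < 0 or value >> width:
            raise ValueError("value does not fit in the given width")
        # Shift the pending bits left and OR the new field in below them.
        self.acc = (self.acc << width) | value
        self.nbits += width
        while self.nbits >= 16:
            self.nbits -= 16
            self.words.append((self.acc >> self.nbits) & 0xFFFF)
            self.acc &= (1 << self.nbits) - 1

    def flush(self):
        """Pad the last partial word with zero bits and return all words."""
        if self.nbits:
            self.words.append((self.acc << (16 - self.nbits)) & 0xFFFF)
            self.acc = 0
            self.nbits = 0
        return self.words
```

Because each field is merged with a single shift and OR, no per-bit loop is needed, which is the kind of saving the paragraph above refers to.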

The extent of the extension bitstream is known only after a whole frame has been packed in 16-bit units, so the CRC check range of the extension bitstream header cannot be fixed before then. The extension bitstream is therefore not formatted separately. The whole bitstream is formatted without the extension header; during transmission, once data equal to the MPEG-1 compatible frame size has been sent, the extension header is inserted and the rest of the bitstream follows.
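The transmission order described above can be sketched at word granularity. The function and parameter names are hypothetical, and the sketch glosses over the fact that a 40-bit header is not a whole number of 16-bit words.

```python
def transmit_order(packed_words, base_frame_words, ext_header_words):
    """Return the words in the order they would be sent.

    packed_words      -- whole frame packed without the extension header
    base_frame_words  -- size of the MPEG-1 compatible part, in words
    ext_header_words  -- extension header, built once the frame is packed
    """
    if not 0 <= base_frame_words <= len(packed_words):
        raise ValueError("base frame size is outside the packed frame")
    return (list(packed_words[:base_frame_words])
            + list(ext_header_words)
            + list(packed_words[base_frame_words:]))
```

The header is spliced in only at send time, so the packing stage never has to reserve space whose contents are not yet known.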

In the actual CRC routine, the bitstream is first packed completely except for the position of the 16-bit CRC code. The bits in the check range are then processed in order, and the resulting CRC-16 code is inserted into the bitstream.
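A bit-serial CRC-16 over the check range can be sketched as below. The text names only "CRC-16"; the generator polynomial x^16 + x^15 + x^2 + 1 (0x8005) and the all-ones initial value used here are assumptions taken from the usual MPEG audio convention, and the function name is hypothetical.

```python
def crc16_update(crc, value, nbits, poly=0x8005):
    """Feed the nbits low-order bits of value, MSB first, into a 16-bit CRC."""
    for i in range(nbits - 1, -1, -1):
        bit = (value >> i) & 1
        msb = (crc >> 15) & 1
        crc = (crc << 1) & 0xFFFF
        if bit ^ msb:
            crc ^= poly
    return crc
```

A caller would start from `crc = 0xFFFF`, feed each protected field in bitstream order, and then write the final 16-bit value into the slot that was left empty during packing.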

The formatted bitstream is sent to the PC or another system through the serial port. All external interrupts must be blocked during serial transmission to prevent errors in the bitstream.

After transmission, the bit allocation result for the next frame is received from processor-b (22) in the master board (2) and rearranged in local RAM to suit formatting, and processing moves on to the next frame.

5. Routine executed by processor-b on the master board (2) (bit allocation)

Bit allocation, the most time-consuming step of MPEG-2 encoding, is handled exclusively by processor-b (22) in the master board (2).

FIG. 16 is the program flowchart of processor-b in the master board (2).

After initializing, when an interrupt arrives from processor-a (21) in the master board (2), the processor moves the data needed for bit allocation into local RAM and allocates bits to each subband of the five channels.

Because bit allocation compares the SMR information of all channels together, its execution time rises sharply as the number of channels increases.

As Table 1 shows, in experiments on the master board (2) processor the bit allocation routine took about 10.3 msec for 2 channels at 192 Kbps but 41.7 msec for 4 channels at 384 Kbps, a fourfold increase. Allocating bits for 5 channels at 640 Kbps, about 128 Kbps per channel, with the algorithm as given in MPEG takes 62.5 msec.

This is 2.6 times the 24 msec available for real-time processing, so high-bit-rate multichannel bit allocation is practically impossible on this hardware.
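The 24 msec budget and the 2.6 ratio can be checked with a few lines of arithmetic. The frame length of 1152 samples per channel is an assumption (the usual MPEG audio Layer II frame size, not stated in this passage), as is the 48 kHz sampling rate for this particular measurement; the function names are hypothetical.

```python
FRAME_SAMPLES = 1152  # assumed samples per channel in one frame


def frame_period_ms(sample_rate_hz):
    """Time available to process one frame in real time, in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / sample_rate_hz


def load_ratio(routine_ms, sample_rate_hz):
    """Execution time of a routine as a multiple of the frame period."""
    return routine_ms / frame_period_ms(sample_rate_hz)
```

Under these assumptions `frame_period_ms(48000)` gives 24 msec, and `load_ratio(62.5, 48000)` is about 2.6, matching the figures quoted above.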

The reason the execution time rises so sharply with the number of channels and the bit rate is evident from the MPEG bit allocation algorithm.

The MNR is obtained as the difference between the SNR, which is determined by the quantization level, and the SMR produced by the psychoacoustic model. One bit is allocated to the subband with the smallest MNR, which improves the SNR of that band. The subband with the smallest MNR is then found again on the updated MNR curve and given one more bit.

The basic algorithm repeats this until all the bits available in one frame have been allocated.

Table 3. Average bit allocation index by sound source

Subband   Electric guitar   Pop    Violin
1         6.5               8.2    9.1
2         5.5               6.1    7.3
3         5.0               4.6    5.7
4         6.0               6.2    7.2
5         5.6               5.9    7.2
6         5.3               5.7    6.7
7         5.1               5.4    5.5
8         4.9               5.1    5.3
9         5.0               5.0    4.8
10        4.7               4.5    4.5
11        4.9               4.4    4.0
12        4.3               4.2    4.5
13        4.9               4.1    3.0
14        4.1               3.6    2.3
15        3.2               3.0    1.6
16        3.1               2.6    0.8
17        2.6               1.7    0.2
18        1.2               0.7    0.0
19        0.3               0.2    0.0
20        0.0               0.0    0.0
21        0                 0      0
22        0                 0      0
23        0                 0      0
24        0                 0      0
25        0                 0      0
26        0                 0      0
27        0                 0      0
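The basic loop (find the subband with the smallest MNR, give it one more step, repeat until the frame's bits run out) can be sketched in simplified form. The sketch is not the MPEG routine itself: it assumes a flat cost per allocation step and a fixed 6 dB SNR gain per step, whereas the standard uses per-subband quantization tables. All names and default values are hypothetical.

```python
def greedy_allocate(smr_db, bit_budget, cost_per_step=36,
                    snr_step_db=6.0, max_index=15):
    """Raise the allocation index of the worst subband until bits run out."""
    index = [0] * len(smr_db)
    # With no bits allocated the SNR is taken as 0 dB, so MNR = -SMR.
    mnr_db = [-smr for smr in smr_db]
    remaining = bit_budget
    iterations = 0
    while remaining >= cost_per_step:
        candidates = [sb for sb in range(len(index)) if index[sb] < max_index]
        if not candidates:
            break
        # One full scan over the candidates per allocated step.
        worst = min(candidates, key=lambda sb: mnr_db[sb])
        index[worst] += 1
        mnr_db[worst] += snr_step_db
        remaining -= cost_per_step
        iterations += 1
    return index, mnr_db, iterations
```

Each pass scans every candidate subband, so the total work is roughly the number of compared subbands times the number of allocation steps, which is why both more channels and a higher bit rate lengthen the routine.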

The bits available for allocation are not divided among the channels beforehand; the MNRs of all subbands of all channels are compared together. The number of subbands compared therefore grows in proportion to the number of channels.

The processor, a general-purpose DSP chip, has no dedicated instruction for finding a minimum efficiently, so the search for the minimum MNR must be built from compare instructions. The branch that follows each comparison breaks the pipeline, the main strength of a high-speed DSP chip, and so takes four times as many cycles as an ordinary instruction.

A large number of compare instructions is therefore very unfavorable to real-time processing. Meanwhile, a higher bit rate means more bits to allocate per frame, so the bit allocation loop runs more times.

Between the 2-channel 192 Kbps case and the 4-channel 384 Kbps case above, both the channel count and the bit rate double, which explains the fourfold execution time observed.

The software implemented in this invention solves the execution time overrun with a pre-allocation algorithm.

A specified amount is allocated in advance against the MNR values that the five received channels have before bit allocation, the MNR is corrected accordingly, and the standard MPEG bit allocation is then run on the remaining bits.

To reduce comparisons further, a loop stripped of the instructions that are compared needlessly during the initial allocation period is run first, and the exact bit allocation is performed at the end to finish the program. When bit allocation is done, an interrupt is sent to processor-a (21) in the master board (2) to signal completion.

The pre-allocation algorithm is described below.

First, to run in real time a bit allocation that takes 62.5 msec for 5 channels at 640 Kbps, a way of simplifying the bit allocation algorithm must be found.

In MPEG the bit allocation information is included in the bitstream and sent to the decoder, so within a given bit rate the encoder has some freedom in how it allocates bits. To reduce the bit allocation workload, the SMR produced by the psychoacoustic model and the number of bits allocated to each subband on that basis need to be analyzed statistically.

One way to exploit the statistics of bit allocation is to run the actual bit allocation routine on many kinds of music, build a table of average allocations, allocate in advance the number of bits that is always allocated for every source, and apply the actual bit allocation algorithm only to the remaining bits.

In other words, the bits that are always allocated are assigned in advance so that the loop is not repeated needlessly every frame, and only the remaining bits go through the allocation loop. For this purpose, Table 3 and FIG. 17 show the average bit allocation index of each subband for sound sources with different sound pressure distributions, allocated at a bit rate of 128 Kbps per channel.

The data are the averages of the bit allocation indices received at the decoder after actually encoding 3000 frames of each sound source.

The bit allocation indices of subbands 1 to 3 map to different quantization levels from those of the other subbands, so the number of bits actually allocated to subbands 1 to 3 is 2 bits per sample more than the same index gives in the other subbands.

The three sound sources chosen for the experiment are an electric guitar, a wideband source; a violin, with high sound pressure in the low-frequency band; and pop music, whose varied instrumentation gives the most typical sound pressure distribution.

FIG. 17 shows that the basic shape of the bit allocation distribution does not change much from source to source. Most bits go to the low-frequency band and the number falls toward higher frequencies, with almost nothing allocated from the 20th subband upward (in the 20th subband, 1 bit was allocated about 3 times per 100 frames for the electric guitar).

For the violin, the allocation in the low-frequency subbands 1 and 2 is very high, while almost nothing is allocated from the 17th subband upward.

The electric guitar shows a relatively flat allocation from the 3rd to the 13th subband and then declines gradually from the 14th.

The allocation curve for pop music lies between those of the other two sources.

Based on the average bit allocation distributions of the three sources, the number of bits to pre-allocate to each subband was chosen so that real-time processing becomes possible (Table 4, FIG. 18).

Table 4. Pre-allocated bit allocation index

Subband          Bit allocation index
1, 2             6
3                4
4, 5, 6          5
7, 8             4
9, 10, 11, 12    3
13, 14           1
15-27            0

These values were determined experimentally from the results in Table 3 so that the execution time load does not exceed 100%, and they can be used only when encoding at 128 Kbps per channel.

This reduces the number of iterations of the MPEG bit allocation loop. To reduce the number of subbands compared as well, subbands 22 to 27, which the experiments showed receive no bits for any source at 128 Kbps per channel, are excluded from the comparison. With the two methods combined, the bit allocation for 5 channels at 640 Kbps, which took 62.5 msec, completes within 17.5 msec, making real-time operation possible. For other bit rates that cannot be processed in real time, execution time can be adjusted the same way, by averaging the bit allocation indices and pre-allocating appropriately.
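The combination of pre-allocation and a reduced comparison range can be sketched for a single channel as follows. The pre-allocation indices are those of Table 4; everything else is an assumption made for illustration: a flat cost per allocation step, a fixed 6 dB gain per step, no special handling of the different quantization levels of subbands 1 to 3, and hypothetical names throughout.

```python
# Table 4 values for subbands 1..27 (list position 0 holds subband 1).
PREALLOC_INDEX = [6, 6, 4, 5, 5, 5, 4, 4, 3, 3, 3, 3, 1, 1] + [0] * 13
COMPARED_SUBBANDS = 21  # subbands 22-27 are left out of the minimum search


def allocate_with_prealloc(smr_db, bit_budget, cost_per_step=36,
                           snr_step_db=6.0, max_index=15):
    """Pre-allocate from the table, then run the greedy loop on what is left."""
    if len(smr_db) != len(PREALLOC_INDEX):
        raise ValueError("expected one SMR value per subband (27)")
    index = list(PREALLOC_INDEX)
    # Correct the MNR for the steps already granted by pre-allocation.
    mnr_db = [snr_step_db * index[sb] - smr_db[sb] for sb in range(len(index))]
    remaining = bit_budget - cost_per_step * sum(index)
    if remaining < 0:
        raise ValueError("bit budget is smaller than the pre-allocation")
    iterations = 0
    while remaining >= cost_per_step:
        candidates = [sb for sb in range(COMPARED_SUBBANDS)
                      if index[sb] < max_index]
        if not candidates:
            break
        worst = min(candidates, key=lambda sb: mnr_db[sb])
        index[worst] += 1
        mnr_db[worst] += snr_step_db
        remaining -= cost_per_step
        iterations += 1
    return index, mnr_db, iterations
```

In this model the 53 steps granted by the table never pass through the loop, and each remaining iteration scans 21 subbands instead of 27, so both factors of the loop cost shrink.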

The performance of the pre-allocation algorithm is verified as follows.

The verification examines how the MNR changes under the bit allocation algorithm with the pre-allocation amounts chosen in Table 4.

The MNR (Mask-to-Noise Ratio) is the difference between the SNR and the SMR, where the noise is the quantization noise.

The MNR is the ratio of mask to noise: if the MNR of a subband is 0 dB or more, all quantization noise in that subband is masked and inaudible to the human ear. Assuming the SMR values from the psychoacoustic model are accurate, bringing the MNR of every subband to 0 dB or more yields a reconstruction perceptually indistinguishable from the original.

Bit allocation ultimately amounts to raising the MNR of each band to 0 dB or above as far as possible. If the final MNR obtained with the proposed method is satisfactory, it does not matter that the allocated index distribution differs slightly from that of the original MPEG algorithm.

FIG. 19 shows three MNR curves before bit allocation, for arbitrary audio signals with different sound pressure distributions sampled at 48 kHz.

As noted earlier, the electric guitar has a wide signal bandwidth, so subbands with an MNR below 0 dB are spread widely. The violin signal is concentrated in the low-frequency band, so fewer subbands fall below 0 dB but the curve dips deeply at low frequencies. For the first-frame signal, the zero padding that accounts for the filter bank delay keeps the sound pressure low, so the MNR curve is gentle as well.

It follows that bits must be allocated suitably to subbands 1 to 12 for the violin and subbands 1 to 19 for the electric guitar to raise the MNR to 0 dB or more.

Each additional bit allocated to a subband sample raises the MNR by about 6 dB. Ideally the final MNR curve is flat.
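The figure of about 6 dB per bit follows from the halving of the quantization step with each added bit, since 20·log10(2) is approximately 6.02 dB. The sketch below assumes uniform quantization and uses hypothetical function names; it also estimates how many extra bits a subband needs to reach 0 dB.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB


def snr_gain_db(extra_bits):
    """SNR (and hence MNR) gain from adding bits to a uniform quantizer."""
    return DB_PER_BIT * extra_bits


def bits_needed(mnr_db):
    """Smallest number of extra bits per sample that lifts the MNR to 0 dB."""
    return max(0, math.ceil(-mnr_db / DB_PER_BIT))
```

A subband sitting at -13 dB, for example, needs three more bits per sample under this model, while one already at or above 0 dB needs none.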

The following examines how each MNR curve changes through pre-allocation.

FIGS. 20, 21, and 22 each show how one of the MNR curves of FIG. 19 changes, first when each subband is pre-allocated by the amount in Table 4 and then when the final bit allocation is complete.

For the electric guitar of FIG. 20, pre-allocation raises the MNR of subbands 5 to 8 to about the 10 dB line, and the remaining allocation then adds bits mainly to subbands 9 to 19.

The MNR raised by pre-allocation does not become excessive and agrees well with the average level after final allocation. After all bits are allocated the MNR is flat at about 10 dB, which shows that the bit rate leaves ample margin.

For the first-frame MNR of FIG. 21, pre-allocation fully covers subbands 1 to 6, and the remaining allocation reinforces the MNR of the middle subbands. The final MNR curve levels off at about 7 dB, and again no MNR value stays below 0 dB, so the quantization noise can be regarded as completely masked.

For the violin of FIG. 22, in contrast to the two previous cases, the MNR near the low frequencies is still below 0 dB after pre-allocation. The remaining allocation therefore concentrates on the low-frequency band, while the middle band already has a sufficiently high MNR from pre-allocation. The average MNR is the highest, above 15 dB, so this is the source that encodes best.

The three cases confirm that the pre-allocation amounts chosen in Table 4 are appropriate for a variety of sources. Pre-allocating more bits to the low-frequency band would favor sources such as the violin but would make the low-frequency MNR needlessly high for sources such as the electric guitar.

Conversely, pre-allocating more to the middle band could leave sources such as the violin short of bits in the low-frequency band.

Many experiments on varied sources showed that the MNR curve before bit allocation does not deviate much from the three types in FIG. 19. Even if an unusual source falls outside these types, at 128 Kbps per channel there are enough bits to spare that the final MNR curve can always stay above 0 dB across the whole band.

When the encoded sound quality of the original MPEG bit allocation and of the pre-allocation method was compared for many kinds of music, the two were almost indistinguishable.

6. Experiments and results

The performance evaluation of the MPEG-2 real-time audio encoder software implemented in this invention has two parts: the execution time of each subroutine, and the sound quality of the decoded final bitstream.

A 2-channel CD player and a THX processor providing a 5-channel analog source are connected to the A/D converter (Analog-to-Digital Converter), and the A/D outputs are connected to the serial port of each slave board to obtain the music data.

16비트 전송을 알리는 프레임 싱크(frame sync)신호와 전송 클락이 내장된 6채널의 A/D를 제작하여 실험에 이용하였는데, A/D 변환기(7)와 슬레이브 보드(1)의 프로세서 직렬 포트간에는 부가 회로가 필요 없이 직접 연결이 가능하도록 설계하였다.A six-channel A / D with a frame sync signal and a transmission clock for 16-bit transmission was fabricated and used in the experiment. Between the A / D converter 7 and the processor serial port of the slave board 1, Designed to allow direct connection without the need for additional circuits.

음원이 되는 아날로그 출력으로는 음악의 시작과 끝을 알리는 신호를 제공할 수 없기 때문에 부호화기의 리셋 스위치를 이용하여 부호화될 음악의 범위를 조절하도록 하였다.Since the analog output that is the sound source cannot provide a signal indicating the beginning and end of the music, the encoder's reset switch is used to adjust the range of music to be encoded.

16비트 PCM 형식의 데이터를 얻으려면 직렬 데이터 선과 프레임 싱크, 그리고 전송 클락 신호를 프로세서의 해당하는 입력핀에 연결하면 된다.To get data in 16-bit PCM format, connect the serial data line, frame sink, and transmit clock signal to the corresponding input pins of the processor.

MPEG-2는 최대 5.1채널을 처리하지만 기본적인 2채널(L, R) 또는 3채널(L, R, S)등 다양한 채널 구성을 지원하므로 가능한 모든 채널 조합에 대하여 실험을 하였다.Although MPEG-2 handles up to 5.1 channels, it supports various channel configurations such as basic 2 channels (L, R) or 3 channels (L, R, S), so we experimented with all possible channel combinations.

각 슬레이브 보드(1)는 채널 구성에 상관없이 한 채널만을 처리하므로 프로그램 변경 없이 사용될 수 있으나 마스터 프로그램은 채널 구성 및 비트율에 따라 변경되어야 한다.Each slave board 1 processes only one channel regardless of the channel configuration, so it can be used without changing the program, but the master program should be changed according to the channel configuration and bit rate.

프로세서에는 프로그램 수행 사이클을 측정할 수 있는 별도의 장치가 없기 때문에 각 서브 루틴별 수행 시간은 일정 시간 간격으로 들어오는 입력 샘플의 개수를 이용하여 측정하였다.Since the processor does not have a separate device for measuring program execution cycles, the execution time of each subroutine is measured using the number of input samples coming in at a predetermined time interval.

측정해야할 서브루틴의 수행전과 수행후의 들어온 입력샘플수 차이를 시간으로 환산하면 수행시간을 얻을 수 있다.The execution time can be obtained by converting the difference in the number of input samples before and after the execution of the subroutine to be measured into time.

When the sampling frequency of the input is 48 KHz and the system clock frequency is 33 MHz, the number of samples and the execution time are related as follows.

▶ 1 Sample = 1/48 KHz ≈ 20.83 μS

▶ 1 Sample ≈ 343.7 Cycle

▶ 1 Frame = 1152 Samples = 1152 * 20.83 μS ≈ 24.0 msec
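The sample-counting measurement described above can be sketched as follows. This is a minimal illustration, not the encoder's own code: the function names are hypothetical, and the factor of two clock periods per counted cycle is an assumption made only because it reproduces the stated figure of about 343.7 cycles per sample at 33 MHz and 48 KHz.

```python
SAMPLE_RATE_HZ = 48_000        # input sampling frequency from the text
CLOCK_HZ = 33_000_000          # system clock frequency from the text
CLOCKS_PER_CYCLE = 2           # ASSUMPTION: one counted cycle = two clock periods


def samples_to_msec(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time in msec that elapses while n_samples input samples arrive."""
    return n_samples / sample_rate_hz * 1000.0


def samples_to_cycles(n_samples, sample_rate_hz=SAMPLE_RATE_HZ,
                      clock_hz=CLOCK_HZ, clocks_per_cycle=CLOCKS_PER_CYCLE):
    """Processor cycles that elapse while n_samples input samples arrive."""
    return n_samples / sample_rate_hz * clock_hz / clocks_per_cycle


def routine_time_msec(samples_before, samples_after):
    """Execution time of a routine from the sample counter read before and after it."""
    return samples_to_msec(samples_after - samples_before)


if __name__ == "__main__":
    print(f"1 sample  = {samples_to_msec(1) * 1000:.2f} us")
    print(f"1 sample  = {samples_to_cycles(1):.2f} cycles")
    print(f"1 frame   = {samples_to_msec(1152):.1f} msec")
```

Because the counter advances once per input sample, the resolution of a measurement made this way is one sample period, about 20.83 μS.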

Table 5 lists the optimized execution times of the subroutines that run on the slave board (1) in the system software implemented in the present invention. For subroutines whose execution time varies, the longest observed time is given.

Table 5. Execution time of slave board processing routines

CPU   Subroutine                 Execution time (msec)   Load
a     Subband analysis           6.22                    25.9%
a     Scale factor coding        1.81                    7.5%
a     FFT & power spectrum       3.65                    15.2%
      CPU-a total                11.68                   48.7%
b     Psychoacoustic model       18.28                   76.2%
      Slave overall run time     29.96                   124.9%

The subband analysis, in which the analysis matrix is replaced by Lee's fast DCT, completes in 6.22 msec, which is 57.3% of the previous execution time of 10.86 msec, making it the most improved routine.
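To illustrate why a DCT can stand in for the analysis matrix, the sketch below folds the 64 inputs of the standard MPEG audio analysis matrixing, M[k][i] = cos((2k+1)(i-16)π/64), into 32 values whose transform is a 32-point DCT-type cosine sum. It is an assumption that the implementation described here uses this particular folding. The 32-point sum is evaluated naively in the sketch, at the place where a fast algorithm such as Lee's would be substituted, and all function names are hypothetical.

```python
import math


def matrixing_direct(y):
    """Reference: 32 subband outputs from 64 inputs with the full 32x64 matrix."""
    return [
        sum(math.cos((2 * k + 1) * (i - 16) * math.pi / 64) * y[i] for i in range(64))
        for k in range(32)
    ]


def fold_64_to_32(y):
    """Fold 64 inputs into 32 using the symmetries of the cosine matrix."""
    f = [0.0] * 32
    f[0] = y[16]                          # cos(0) = 1
    for m in range(1, 17):
        f[m] = y[16 + m] + y[16 - m]      # cos is even around i = 16
    for m in range(17, 32):
        f[m] = y[16 + m] - y[80 - m]      # sign flips around i = 48; y[48] drops out
    return f


def matrixing_folded(y):
    """Same outputs via a 32-point cosine sum (a fast DCT would replace this loop)."""
    f = fold_64_to_32(y)
    return [
        sum(math.cos((2 * k + 1) * m * math.pi / 64) * f[m] for m in range(32))
        for k in range(32)
    ]


if __name__ == "__main__":
    y = [math.sin(0.37 * i) + 0.01 * i for i in range(64)]
    err = max(abs(a - b) for a, b in zip(matrixing_direct(y), matrixing_folded(y)))
    print(f"max difference between direct and folded results: {err:.2e}")
```

Even before a fast algorithm is applied, the folding halves the multiplications per block from 32 x 64 to 32 x 32. A fast DCT reduces the count further.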

Note that while the subroutines on processor-a (11) of the slave board (1) are executing, the interrupt routine that receives input samples keeps running, so their actual execution time is shorter than the measured time.

The psychoacoustic model on processor-b (12) of the slave board (1) was sped up by 3.56 msec through measures such as optimizing the use of assembly instructions. After the improvements, the 'load' values of the subroutines on processor-a (11) of the slave board (1) sum to no more than 50%.

Therefore, if the psychoacoustic model on processor-b (12) of the slave board (1) is improved a little further so that its 'load' falls to 50% or less, the slave board (1) will need only one processor.
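The 'load' figures and the one-processor condition can be checked with a short calculation. The sketch below assumes that 'load' means execution time divided by the 24.0 msec frame period, which matches the tabulated percentages. The dictionary layout and function names are hypothetical.

```python
FRAME_MSEC = 24.0  # one frame of 1152 samples at 48 KHz

# Execution times in msec taken from the slave board table.
SLAVE_ROUTINES_MSEC = {
    "a": {
        "subband analysis": 6.22,
        "scale factor coding": 1.81,
        "FFT & power spectrum": 3.65,
    },
    "b": {"psychoacoustic model": 18.28},
}


def load_percent(time_msec, frame_msec=FRAME_MSEC):
    """Share of one frame period consumed by a routine, in percent."""
    return 100.0 * time_msec / frame_msec


def cpu_totals_msec(routines):
    """Total execution time per processor."""
    return {cpu: sum(times.values()) for cpu, times in routines.items()}


def fits_one_processor(routines, frame_msec=FRAME_MSEC):
    """True if all routines together finish within one frame period."""
    return sum(cpu_totals_msec(routines).values()) <= frame_msec


def psycho_budget_msec(routines, frame_msec=FRAME_MSEC):
    """Time left for the psychoacoustic model if it shared processor-a."""
    return frame_msec - cpu_totals_msec(routines)["a"]


if __name__ == "__main__":
    for cpu, total in cpu_totals_msec(SLAVE_ROUTINES_MSEC).items():
        print(f"CPU-{cpu}: {total:.2f} msec, load {load_percent(total):.1f}%")
    print("fits on one processor:", fits_one_processor(SLAVE_ROUTINES_MSEC))
    print(f"budget for psychoacoustic model: {psycho_budget_msec(SLAVE_ROUTINES_MSEC):.2f} msec")
```

With the tabulated times the two processors together need more than one frame period, so a single processor is not yet sufficient. The 50% target for the psychoacoustic model corresponds to 12.0 msec, which lies within the remaining budget computed here.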

The routines that run on the master board (2) must process an amount of data that depends on the number of channels and the bit rate, so their execution times vary accordingly. Bit allocation, the routine whose execution time grows most as channels are added, can be kept within real-time limits by adjusting its execution time through pre-allocation.

Up to 2 channels can be processed in real time at any bit rate without pre-allocation, so the existing MPEG bit allocation algorithm is used in that case. For 3 or more channels, bits are pre-allocated as appropriate for the bit rate.

Table 6 shows the optimized execution times of the routines that run on the master board (2) for the 2-channel, 192 Kbps case and the 5-channel, 640 Kbps case.

Pre-allocation allows the execution time of bit allocation to be adjusted as needed, and the execution times of the remaining routines change little with bit rate, so the implemented encoder can process virtually any bit rate in real time.

Table 6. Execution time of master board processing routines

                                 2 channels, 192 Kbps     5 channels, 640 Kbps
CPU   Subroutine                 (msec)    Load           (msec)    Load
b     Bit allocation             7.03      29.3%          17.52     73.0%
a     Quantization               1.33      5.5%           3.42      14.3%
a     Bit stream formatting      2.38      9.9%           6.21      25.9%
a     Bit stream transfer        2.31      9.6%           7.52      31.3%
      CPU-a total                6.02      25.0%          17.15     71.5%
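How pre-allocation shortens bit allocation can be seen in a simplified greedy allocator. The sketch below is not the MPEG bit allocation procedure or the encoder's code. It assumes each added bit improves SNR by about 6.02 dB, charges 36 bits per added bit in a subband (1152 samples / 32 subbands), and ignores side information. The SMR values and pre-allocated bit counts in the example are invented for illustration and are not those of Table 4.

```python
SAMPLES_PER_SUBBAND = 36   # 1152 samples per frame / 32 subbands
DB_PER_BIT = 6.02          # ASSUMPTION: SNR gain per quantizer bit


def allocate_bits(smr_db, bit_budget, pre_alloc=None, max_bits=15):
    """Greedy allocation: repeatedly give one bit to the subband with the lowest MNR.

    Returns (bits per subband, number of loop iterations).
    """
    n = len(smr_db)
    bits = list(pre_alloc) if pre_alloc is not None else [0] * n
    remaining = bit_budget - SAMPLES_PER_SUBBAND * sum(bits)
    if remaining < 0:
        raise ValueError("pre-allocation exceeds the bit budget")

    iterations = 0
    while remaining >= SAMPLES_PER_SUBBAND:
        candidates = [sb for sb in range(n) if bits[sb] < max_bits]
        if not candidates:
            break
        # MNR = SNR - SMR; the subband with the smallest MNR is served first.
        worst = min(candidates, key=lambda sb: bits[sb] * DB_PER_BIT - smr_db[sb])
        bits[worst] += 1
        remaining -= SAMPLES_PER_SUBBAND
        iterations += 1
    return bits, iterations


if __name__ == "__main__":
    smr = [20.0, 12.0, 5.0, -3.0]          # hypothetical SMR per subband, in dB
    budget = 8 * SAMPLES_PER_SUBBAND       # hypothetical bit budget
    print(allocate_bits(smr, budget))                          # no pre-allocation
    print(allocate_bits(smr, budget, pre_alloc=[2, 1, 0, 0]))  # with pre-allocation
```

In this example both runs end with the same allocation, but the pre-allocated run needs fewer passes through the search loop, which is the part whose cost grows with the number of channels.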

The time from the moment all input samples of the first frame have arrived until the final bit stream leaves, after passing through the four processors, can be calculated by examining the synchronization between processors shown in Figure 9.

Processor-a (21) on the master board (2), the processor that sends the final bit stream to the PC, starts work on the first frame in synchronization with the moment processor-a (11) on the slave board (1) finishes the subband analysis of the third frame.

The total coding delay, from the input of the first sample to the transmission of the final bit stream, is therefore calculated as follows.

Coding Delay = (1 frame period; time to fill the input buffer) + (2 frame periods) + (subband analysis period) + (Master-a execution period)

= 24.0 + 2 * 24.0 + (4.64 + 0.37) + 17.15

= 94.16 (msec)
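The delay budget above can be reproduced with a few lines of arithmetic. The function and parameter names are hypothetical. The numeric inputs are the values used in the calculation above, with 17.15 msec being the Master-a total for the 5-channel, 640 Kbps case.

```python
def coding_delay_msec(frame_msec, subband_msec, master_a_msec, pipeline_frames=2):
    """Input-buffer fill time + pipelined frames + subband analysis + Master-a time."""
    return frame_msec + pipeline_frames * frame_msec + subband_msec + master_a_msec


if __name__ == "__main__":
    delay = coding_delay_msec(frame_msec=24.0,
                              subband_msec=4.64 + 0.37,
                              master_a_msec=17.15)
    print(f"coding delay = {delay:.2f} msec")
```

Three of the terms are multiples of the frame period, so about 72 msec of the delay is fixed by the frame length and the pipeline depth. Only the last two terms depend on how fast the routines run.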

The bit stream generated by the real-time encoder implemented in the present invention was stored on a hard disk and then decoded with the MPEG multi-channel standard decoding program written in C to verify its performance.

When the bit rate per channel is 64 Kbps, degradation of the decoded music is easy to hear, but at 96 Kbps narrow-band sources such as violin and cello differ little from the original. At 128 Kbps per channel the reconstructed sound is almost indistinguishable from the original.

When the bit rate rises beyond that, the distortion introduced by the custom-built A/D converter and by the speakers and amplifier used for playback outweighs the loss due to coding, so it is hard to perceive a significant improvement over the quality at 128 Kbps.

FIG. 23 shows the final MNR curves obtained when bits are allocated at 64, 96, and 128 Kbps from the SMR curve computed for the same input samples at a 44.1 KHz sampling frequency. Each step up in bit rate raises the MNR by roughly 6 to 10 dB in the bands that receive bits.

At 64 Kbps, the MNR of all allocated bands is still below 0 dB, so quantization noise that is not masked is present.

At 96 Kbps, the allocated bands have all come close to 0 dB, but some subbands are still not completely masked.

At 128 Kbps, the MNR across the whole band lies at 5 to 10 dB or more, so in theory the quantization noise of every subband is masked by the signal.

Comparing the final MNR curves for each bit allocation in FIG. 23 explains the difference in sound quality between bit rates of the encoded bit stream. The bit rate of 128 Kbps per channel, at which no subjective difference from the original is heard, uses an average of 2.67 bits per sample, a compression of about 6:1 relative to PCM samples quantized to 16 bits.
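The bits-per-sample and compression figures follow directly from the bit rate and the sampling rate. The sketch below assumes a 48 KHz sampling rate, which is the rate at which 128 Kbps yields 2.67 bits per sample. The function names are hypothetical.

```python
def bits_per_sample(bit_rate_bps, sample_rate_hz=48_000):
    """Average number of coded bits spent on each input sample of one channel."""
    return bit_rate_bps / sample_rate_hz


def compression_ratio(bit_rate_bps, sample_rate_hz=48_000, pcm_bits=16):
    """Ratio of the PCM bit rate to the coded bit rate for one channel."""
    return pcm_bits / bits_per_sample(bit_rate_bps, sample_rate_hz)


if __name__ == "__main__":
    for kbps in (64, 96, 128):
        rate = kbps * 1000
        print(f"{kbps:3d} Kbps: {bits_per_sample(rate):.2f} bits/sample, "
              f"{compression_ratio(rate):.1f}:1")
```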

The decoded 5-channel PCM samples stored on the hard disk are sent through a custom-built serial transmission card to a D/A converter (Digital-to-Analog Converter), converted to analog signals, and output to an amplifier so that the sound quality can be evaluated.

As described above, the encoder software implemented by the present invention runs in real time at the 48 KHz sampling rate of MPEG-2 Layer 2 for up to 5.1 channels and 640 Kbps, so every lower bit rate, sampling rate, and channel combination is also possible. In addition, as a way to perform multi-channel coding more effectively, it supports dynamic transmission channel switching and phantom coding of the center channel, two of the composite coding techniques proposed in MPEG-2.

Speeding up and optimizing each subroutine allows the slave board routines, previously handled by two processors, to run on a single processor when up-clocked to 50 MHz. Multi-channel processing on the master board, which would otherwise need four processors for real-time operation, can likewise be handled by only two processors by applying fast algorithms. Reducing the number of processors not only lowers production cost but also shortens the coding delay, an important issue in applications such as HDTV broadcasting. Because the optimized program was developed on a general-purpose DSP processor, it can also be applied to other processors or to ASICs.
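The single-processor claim at 50 MHz can be checked roughly against the measured slave board total of 29.96 msec at 33 MHz. The sketch below assumes that execution time scales inversely with clock frequency. That is an idealization which ignores memory wait states and the cost of moving two processors' work onto one, so it is a plausibility check only. The function name is hypothetical.

```python
FRAME_MSEC = 24.0  # real-time deadline: one frame period at 48 KHz


def scaled_time_msec(time_msec, old_clock_mhz, new_clock_mhz):
    """Execution time after a clock change, ASSUMING time is proportional to 1/clock."""
    return time_msec * old_clock_mhz / new_clock_mhz


if __name__ == "__main__":
    slave_total_33 = 29.96  # msec, total of all slave board routines at 33 MHz
    slave_total_50 = scaled_time_msec(slave_total_33, 33, 50)
    print(f"estimated slave total at 50 MHz: {slave_total_50:.2f} msec")
    print(f"estimated load: {100 * slave_total_50 / FRAME_MSEC:.1f}%")
    print("fits in one frame period:", slave_total_50 <= FRAME_MSEC)
```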

Claims

A real-time processing device for an HDTV audio encoder, comprising:

five slave boards (1) that each receive the audio data of one channel and perform subband analysis and psychoacoustic modeling;

a master board (2) for channel matrixing, bit allocation, quantization, bit packing, and interfacing with a PC (5);

a backplane board (3) connecting the slave boards (1) and the master board (2) to each other;

an interface unit (4) for transmitting the data sent as a bit stream from the master board (2) to a personal computer;

a personal computer (5) for verifying the performance of the encoding system by decoding the transmitted bit stream with an MPEG standard decoding program; and

a six-channel A/D converter (7) for converting the analog audio signals output from a source (6), such as a CD player or a DAT, into 16-bit PCM digital data and transmitting them to the slave boards (1).

The device according to claim 1, wherein each slave board (1) comprises: two processors a and b (11, 12); a dual-port RAM (13) for transferring information between the processors; a memory (14) that each processor needs to store a predetermined program or results; a dual-port RAM (15) for data transfer between the slave board (1) and the master board (2); and a decoding circuit (16) that decodes memory addresses to change the number of wait states.

The device according to claim 1, wherein the master board (2) comprises: two processors a and b (21, 22) that perform bit allocation, quantization, and bit stream formatting for up to five channels; a 96K ROM (23); a 64K RAM (24); reset and clock circuits (25, 26); an address decoding circuit; and a backplane connector (27) for exchanging data with the slave boards (1) through the dual-port RAM of each slave board (1).

A real-time processing method for an HDTV audio encoder, comprising the steps of: processing audio data in a parallel structure; speeding up computation by performing subband analysis with a fast DCT algorithm; exchanging data using a dual-port RAM between the slave boards; transferring the result of each slave board to the master board through a dual-port RAM; receiving the data from the slave boards in the master board hardware and processing them together; greatly reducing the amount of computation by using an efficient algorithm for bit allocation in the master board hardware; and finally interfacing with a PC.

The method of claim 4, wherein the step of exchanging data using a dual-port RAM between the slave boards comprises performing the processing up to subband analysis on processor-a in the slave board and then passing the result through the dual-port RAM to processor-b in the slave board, which performs the remaining psychoacoustic modeling.

The method of claim 4, wherein the step of transferring the result of each slave board to the master board hardware through the dual-port RAM comprises collecting the audio data for which subband analysis and psychoacoustic modeling have already been completed and performing bit allocation, quantization, and bit stream formatting on them together.

The method of claim 4, wherein the efficient algorithm used for bit allocation in the master board hardware statistically measures the degree of bit allocation for each subband and then allocates a predetermined number of bits to the subbands in advance, thereby reducing the overall execution time.

The method of claim 4, wherein the step of interfacing with the PC either sends the encoded bit stream directly to a decoder or uses a purpose-built interface card to send the encoded bit stream to the PC so that it can be processed there together with other data.