KR19990026316A

KR19990026316A - VCR Subtitle Processing System and Method

Info

Publication number: KR19990026316A
Application number: KR1019970048386A
Authority: KR
Inventors: Park Jeong-ho (박정호)
Original assignee: Jeon Ju-beom (전주범); Daewoo Electronics Co., Ltd. (대우전자 주식회사)
Priority date: 1997-09-24
Filing date: 1997-09-24
Publication date: 1999-04-15

Abstract

The present invention relates to a VCR subtitle processing system and method using a speech-recognition DSP IC. Conventionally, no method had been devised for converting a VCR's audio signal into on-screen text subtitles on a TV receiver, whether for a user studying a language or for a hearing-impaired viewer who needs to follow the speech visually. This limited how actively the VCR system could be used.

To address this problem, the invention performs the following steps in sequence: determining whether the VCR has been powered on (step 100); once it is powered on, determining whether the VCR is currently in playback mode (step 101); determining whether the caption mode key 40 is on (step 102); when the caption mode key 40 is on, digitizing the analog audio signal (step 103); storing the voice signal contained in the digitized audio signal in memory (step 104); determining whether the current segment of the digitized voice signal is a silent interval (step 105); when the speech-recognition DSP IC 44 detects a silent interval, recognizing the voice data of the continuous-speech segment and converting it into text data (step 106); transmitting the text data converted by the speech-recognition DSP IC 44 to the microcomputer 45 (step 107); when the text data reaches the microcomputer 45, forwarding it to the OSD unit 46 (step 108); when the text data reaches the OSD unit 46, rendering it as subtitles and outputting them to the TV receiver screen (step 109); and, after these steps, determining again whether the VCR is in playback mode, returning to step 103 if playback continues and otherwise ending (step 110).

Description

VCR Subtitle Processing System and Method

The present invention relates to a VCR subtitle processing system and method using a speech-recognition DSP IC. In particular, in the VCR's playback mode, when the user turns on a caption mode key provided on the VCR set or on the key input unit of the remote control, the audio signal is converted into text data by the speech-recognition DSP IC, and the microcomputer, having received the text data from the DSP IC, controls the OSD unit to render the text as subtitles and output them to the TV receiver screen.

In Korea, teletext (character) broadcasting is designed so that KBPS signal data is inserted on the back-porch side of the horizontal sync interval of the video signal, allowing the various kinds of information it carries to be used.

For example, the KBPS broadcast signal transmitted in Korea carries data such as the broadcasting station name, program titles, and time and clock information, and VCR systems use this data for tasks such as timer recording.

A VCR system that uses such a KBPS signal to provide simple timer-recording processing together with OSD screen output is known, as shown in FIG. 1.

This system comprises a microcomputer 31 that controls the VCR system; a servo circuit 32 that drives and controls the motors and other components on the deck side; a timer circuit 33 to which are connected a remote-control receiver 33b, linked wirelessly to a remote-control transmitter 33c, and a key unit 33c on the set; an audio circuit 35 that processes playback and audio recording signals; a video circuit 37 that supplies the tape with a recording signal derived from the RF signal provided by the tuner and IF unit 38, or processes the reproduced video signal; and a DMSS unit 36 that keeps the audio signal in its normal state during multi-speed playback on the deck.

The system further includes a KBPS unit 34 that decodes KBPS character and graphic data and displays on screen the text and graphics of various information, as well as OSD menu screens for function selection; a tuner and IF unit 38 that tunes the signal received from the antenna and generates an intermediate-frequency signal; a power circuit 39 that supplies system power; and audio/video selection switches (ASW, VSW) that select between external audio/video inputs and the broadcast-side signals.

The KBPS unit 34 of this system is shown in FIG. 2.

It includes a first switch 34b that connects the antenna-side signal from the tuner and IF unit 38 to buffers 34c and 34j; a sync separator IC 34d that receives the RF signal from buffer 34c and separates out the horizontal sync signal (HS) and the color sync signal (CS); a data slicer 34e that extracts the KBPS data (DT) from the composite video signal; a KBPS decoder 34a that receives the KBPS data (DT), decodes its content, and supplies an on-screen display (OSD) signal and the like to a second switch 34k; and a programmable ROM 34g, a system RAM 34f, and a character font ROM 34h that store the programs and data executed by the KBPS decoder 34a.

It further includes a switch IC 34L that passes the signal from the second switch 34k either to a D/A (digital-to-analog) converter 34i or to the system's video output terminal; an A/D converter 34i that converts the analog signal from the switch IC 34L into a digital signal and supplies it to the KBPS decoder 34a; and a low-pass filter 34m connected to an external caption section 4.

A system with such an OSD function is designed to present the channel numbers, caption text, symbols, and the like needed by the TV receiver on the OSD screen using KBPS data, but conventionally this capability could not be exploited to its full extent.

For example, no method had been devised for converting the VCR's audio signal into text subtitles on the TV receiver screen, whether for users studying a language or for hearing-impaired viewers who need to follow the speech visually. This limited how actively the VCR system could be used.

An object of the present invention, devised to overcome these problems of the prior art, is to provide a VCR subtitle processing system and method in which, when the user turns on a caption mode key provided on the VCR set or on the key input unit of the remote control during playback, the speech in the audio signal is automatically rendered as text subtitles by the OSD unit and output to the TV receiver screen. This helps users with language study and lets hearing-impaired viewers conveniently follow the VCR's audio output visually.

To achieve this object, the system of the present invention comprises: a caption mode key, provided on the VCR set or on the key input unit of the remote control, that applies an input signal to the VCR's microcomputer when the user turns it on to view text subtitles on the TV receiver screen; a video signal processor and an audio signal processor that respectively process the video and audio signals separated during videotape playback; an A/D converter that receives the analog audio signal from the audio signal processor and digitizes it; a speech-recognition DSP IC that receives the digitized voice data from the A/D converter and stores it in memory, first analyzes the incoming voice data to determine whether it is a silent interval, and, when a silent interval is detected, recognizes during that interval the voice data of the continuous-speech segment and converts it into text data; a microcomputer that receives the text data from the speech-recognition DSP IC and controls the OSD unit; and an OSD unit that, under control of the microcomputer, outputs text subtitles to the TV receiver screen.

The method of the present invention performs the following steps in sequence: determining whether the VCR has been powered on; once it is powered on, determining whether the VCR is currently in playback mode; determining whether the caption mode key is on; when the caption mode key is on, digitizing the analog audio signal; storing the voice signal contained in the digitized audio signal in memory; determining whether the current segment of the digitized voice signal is a silent interval; when the speech-recognition DSP IC detects a silent interval, recognizing the voice data of the continuous-speech segment and converting it into text data; transmitting the text data converted by the speech-recognition DSP IC to the microcomputer; when the text data reaches the microcomputer, forwarding it to the OSD unit; when the text data reaches the OSD unit, rendering it as subtitles and outputting them to the TV receiver screen; and, after these steps, determining again whether the VCR is in playback mode, continuing A/D processing of the audio signal if so and otherwise ending.

The more specific features of the present invention will become clear from the following detailed description with reference to the accompanying drawings.

FIG. 1 is a system circuit block diagram of a VCR (video cassette recorder) with a general KBPS receiving function;

FIG. 2 is a reference diagram showing the KBPS decoder unit of FIG. 1;

FIG. 3 is a block diagram showing the system configuration of the present invention; and

FIG. 4 is a flowchart showing the VCR subtitle processing method according to the present invention.

* Description of reference numerals for main parts of the drawings

40: caption mode key            41: video signal processor

42: audio signal processor      43: A/D converter

44: speech-recognition DSP IC   45: microcomputer

46: OSD unit

FIG. 3 is a block diagram showing the configuration of the VCR subtitle processing system according to the present invention.

First, when a user wishes to view text subtitles on the TV receiver screen, whether to study a language or, in the case of a hearing-impaired viewer, to follow the speech on a videotape visually, the user turns on the caption mode key 40 provided on the VCR set or on the key input unit of the remote control. An input signal is then applied to the VCR's microcomputer 45, which accordingly enters caption mode.

During videotape playback, the separated video and audio signals are processed by the video signal processor 41 and the audio signal processor 42, respectively; the audio signal in particular is then digitized by the A/D converter 43.

Once the voice data digitized by the A/D converter 43 has been stored in memory under the control of the speech-recognition DSP IC 44, the DSP IC 44 first analyzes the incoming voice data to determine whether it is a silent interval. When a silent interval is detected, the DSP IC 44 recognizes, during that silent interval, the voice data of the continuous-speech segment and converts it into text data.

The microcomputer 45, having received the text data from the speech-recognition DSP IC 44, then controls the OSD unit 46 to output text subtitles on the TV receiver screen.

FIG. 4 is a flowchart showing the VCR subtitle processing method according to the present invention. The processing of the method is described in more detail below with reference to the flowchart of FIG. 4.

The microcomputer 45 on the VCR set first determines whether the user has powered on the VCR (step 100). Once the VCR is powered on, it then determines whether the VCR is currently in playback mode (step 101).

The microcomputer 45 then determines whether the user has turned on the caption mode key 40 provided on the VCR set (step 102).

When the caption mode key 40 is on, the A/D converter 43 digitizes the analog audio signal into binary data (step 103).
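As a minimal sketch of what digitizing into binary in step 103 involves, the Python below samples a test tone and quantizes it uniformly to signed 16-bit integers. The 16 kHz sampling rate, the 16-bit word length, and the helper names (`sample_tone`, `quantize_pcm16`, `to_bits`) are assumptions for illustration only; the patent does not specify the parameters of the A/D converter 43.

```python
import math
from typing import List, Sequence


def sample_tone(freq_hz: float, fs_hz: float, n: int) -> List[float]:
    """Sample a unit-amplitude sine n times at rate fs_hz (stand-in for the analog input)."""
    return [math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n)]


def quantize_pcm16(samples: Sequence[float]) -> List[int]:
    """Uniformly quantize samples in [-1.0, 1.0] to signed 16-bit integers (assumed format)."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip to the converter's full-scale range
        out.append(int(round(x * 32767)))   # one of 65535 symmetric levels
    return out


def to_bits(value: int) -> str:
    """Two's-complement bit pattern a 16-bit converter would emit for `value`."""
    return format(value & 0xFFFF, "016b")


pcm = quantize_pcm16(sample_tone(1000, 16000, 16))
```

Each element of `pcm` is an ordinary Python integer; `to_bits` only makes the binary word visible.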

The speech-recognition DSP IC 44, located downstream of the A/D converter 43, stores the digitized voice signal in memory and continuously determines whether the current segment of the digitized voice signal is a silent interval (steps 104 and 105).
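The buffering and silence check of steps 104 and 105 can be sketched as a small segmenter. This sketch assumes some detector has already labeled each frame silent or non-silent. Speech frames accumulate in a buffer that stands in for the memory, and once enough silence follows, the buffered segment is released for recognition as in step 106. The function name and the `min_silence_frames` hangover length are hypothetical, not taken from the patent.

```python
from typing import Iterable, List, Sequence, Tuple

Frame = Sequence[int]


def segment_on_silence(frames: Iterable[Tuple[Frame, bool]],
                       min_silence_frames: int = 3) -> List[List[Frame]]:
    """Group speech frames into segments that end when enough silence follows.

    `frames` yields (frame, is_silent) pairs. Silent frames are not buffered.
    `min_silence_frames` is a hypothetical hangover length, not a patent value.
    """
    segments: List[List[Frame]] = []
    buffer: List[Frame] = []   # plays the role of the memory written in step 104
    silent_run = 0
    for frame, is_silent in frames:
        if is_silent:
            silent_run += 1
            # Silence long enough after speech: release the segment for recognition
            if buffer and silent_run >= min_silence_frames:
                segments.append(buffer)
                buffer = []
        else:
            silent_run = 0
            buffer.append(frame)
    if buffer:                 # flush whatever speech remains when input ends
        segments.append(buffer)
    return segments
```

Requiring several consecutive silent frames keeps a brief pause inside a word from splitting it into two segments.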

For reference, a filter (not shown) is provided upstream of the A/D converter 43 to limit the frequency band to roughly 1 kHz to 4 kHz; that is, the filter confines the audio signal digitized by the A/D converter 43 to the 1–4 kHz voice range.
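The patent places an unspecified filter in front of the A/D converter, so the real component is presumably analog. Purely to illustrate the band-limiting effect, the sketch below models it digitally as a single second-order band-pass biquad using the RBJ "Audio EQ Cookbook" formulas, with an assumed 16 kHz sample rate; the filter type, order, and sample rate are assumptions, not details from the source. Because of bilinear warping, the -3 dB edges land only approximately at 1 kHz and 4 kHz.

```python
import math
from typing import List, Sequence, Tuple


def bandpass_coeffs(f_low: float, f_high: float,
                    fs: float) -> Tuple[Tuple[float, float, float], Tuple[float, float]]:
    """RBJ cookbook band-pass biquad (0 dB peak gain), normalized so a0 = 1."""
    f0 = math.sqrt(f_low * f_high)      # geometric centre of the pass band
    q = f0 / (f_high - f_low)           # Q for the given -3 dB bandwidth (analog prototype)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)   # (a1, a2)
    return b, a


def biquad(x: Sequence[float], b, a) -> List[float]:
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def tone_gain(freq: float, fs: float = 16000.0, n: int = 4000) -> float:
    """Steady-state output amplitude for a unit sine at `freq` through the 1-4 kHz filter."""
    b, a = bandpass_coeffs(1000.0, 4000.0, fs)
    x = [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]
    y = biquad(x, b, a)
    return max(abs(v) for v in y[n // 2:])   # skip the start-up transient
```

A single biquad rolls off only about 6 dB per octave on each side, so a practical front end would likely use a higher-order design.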

When the speech-recognition DSP IC 44 detects a silent interval, it analyzes the voice data of the continuous-speech segment and converts it into text data (step 106).

Representative algorithms for recognizing the voice data of a continuous-speech segment include DTW (Dynamic Time Warping), HMM (Hidden Markov Model), and ANN (Artificial Neural Network), each described below.

DTW computes the similarity between a reference speech pattern and the input speech signal using dynamic programming. Unlike the intuitive DTW algorithm, HMM is grounded in probability theory and has become widely used for tasks ranging from isolated-word recognition to continuous speech recognition. It assumes that speech can be modeled as a Markov process and is divided into a training phase and a recognition phase. In the training phase, the parameters of each HMM λ are estimated from the observation sequence O of the training data so that the conditional probability P(O|λ) satisfies the optimization criterion, which determines the model of each word to be recognized. In the recognition phase, the conditional probability P(O|λ) of the input speech pattern is computed for each HMM, and the word whose HMM yields the highest probability is taken as the recognition result. HMM is used in many current commercial products, offers particular advantages for speaker-independent and continuous speech recognition, and requires less computation than DTW. Its drawbacks are that the models discriminate poorly between one another when training data is scarce, and that it tends to ignore correlations between successive speech signals.
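To make the two matching strategies concrete, the sketch below shows DTW as a dynamic-programming recurrence over two feature sequences, and the HMM recognition phase as the forward algorithm computing P(O|λ) for each word model, followed by taking the word with the highest probability. It assumes scalar features for DTW (real systems compare per-frame spectral feature vectors) and discrete-observation HMMs; the training phase (typically Baum–Welch re-estimation) is not shown, and the model parameters in any example are made up.

```python
from typing import Dict, List, Sequence, Tuple

# A discrete HMM lambda = (pi, A, B): initial, transition and emission probabilities
Hmm = Tuple[List[float], List[List[float]], List[List[float]]]


def dtw_distance(ref: Sequence[float], test: Sequence[float]) -> float:
    """Accumulated DTW cost with |a - b| local distance and match/insert/delete steps."""
    n, m = len(ref), len(test)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - test[j - 1])
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]


def forward_prob(obs: Sequence[int], model: Hmm) -> float:
    """P(O | lambda) computed with the forward algorithm."""
    pi, a, b = model
    states = range(len(pi))
    alpha = [pi[s] * b[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * a[r][s] for r in states) * b[s][o] for s in states]
    return sum(alpha)


def recognize(obs: Sequence[int], models: Dict[str, Hmm]) -> str:
    """Return the word whose HMM assigns the observation sequence the highest probability."""
    return max(models, key=lambda word: forward_prob(obs, models[word]))
```

For long utterances the raw forward probabilities underflow quickly, so practical recognizers work with log probabilities or per-step scaling.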

ANN (artificial neural network) models human information processing: many simple processing elements are interconnected in parallel so that, through learning, the network can discover and process the information inherent in input patterns on its own.

In general, a speech recognition system proceeds through the following stages: speech signal → speech detection → feature extraction → similarity measurement against a set of reference models → final recognition decision → result. Because an ideal speech detector is hard to build in practice, most systems detect speech intervals in real time using only computationally cheap time-domain parameters such as the zero-crossing rate and energy, and they reduce sensitivity to the surroundings by adapting the detection thresholds of these parameters as the environment changes. The algorithms described above are combined so that each stage performs as well as possible.
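The time-domain speech detection described above can be sketched with short-time energy, zero-crossing rate (ZCR), and an adaptive noise floor. This is one common heuristic, not the patent's detector: a frame counts as speech if its energy clearly exceeds the tracked background, or if it has moderately raised energy together with a high ZCR (as weak fricatives often do). The noise floor is updated only on non-speech frames. All thresholds (`k_high`, `k_low`, `zcr_min`, `adapt`) and the assumption that the first frame is background noise are hypothetical.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Short-time energy: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_speech(frames: Sequence[Sequence[float]], k_high: float = 4.0,
                  k_low: float = 2.0, zcr_min: float = 0.3,
                  adapt: float = 0.05) -> List[bool]:
    """Label each frame as speech or non-speech using energy, ZCR and an adaptive noise floor."""
    noise = None
    flags: List[bool] = []
    for frame in frames:
        e = frame_energy(frame)
        if noise is None:
            noise = max(e, 1e-12)   # assume the first frame is background noise
        loud = e > k_high * noise
        # Weak but "hissy" frames (e.g. fricatives): moderate energy plus high ZCR
        weak_fricative = e > k_low * noise and zero_crossing_rate(frame) > zcr_min
        is_speech = loud or weak_fricative
        if not is_speech:
            # Track slow changes in the background level
            noise = max((1 - adapt) * noise + adapt * e, 1e-12)
        flags.append(is_speech)
    return flags
```

The per-frame flags produced here are exactly the kind of input a silence-based segmenter would consume.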

When the speech-recognition DSP IC 44 has converted the voice data of the continuous-speech segment into text data, it transmits the text data to the microcomputer 45 (step 107), which in turn forwards it to the OSD unit 46 (step 108).

The OSD unit 46 renders the received text data as subtitles and outputs them to the TV receiver screen (step 109).

Finally, the microcomputer 45 determines whether the VCR is still in playback mode (step 110). If playback is continuing, the process returns to step 103; once playback mode is off, the process ends.
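The control flow of FIG. 4 (steps 100 to 110) can be sketched as a loop run by the microcomputer. `FakeVcr` and the `recognize` callback are hypothetical stand-ins for the hardware and for the DSP IC's steps 104 to 106 (returning `None` while speech is still being buffered); they are not interfaces described in the patent. The text does not say what happens when a check in steps 100 to 102 fails, so this sketch simply returns.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeVcr:
    """Hypothetical stand-in for the VCR hardware; not an interface from the patent."""
    powered_on: bool = True
    caption_key_on: bool = True
    playback_blocks: int = 3          # audio blocks left before playback stops
    osd_lines: List[str] = field(default_factory=list)

    def is_playing(self) -> bool:
        return self.playback_blocks > 0

    def read_audio_block(self) -> List[int]:
        """Return one block of digitized audio (silence here) and advance the tape."""
        self.playback_blocks -= 1
        return [0] * 160


def caption_loop(vcr: FakeVcr,
                 recognize: Callable[[List[int]], Optional[str]]) -> None:
    """Control flow of FIG. 4; `recognize` stands in for steps 104-106 in the DSP IC."""
    if not vcr.powered_on:            # step 100
        return
    if not vcr.is_playing():          # step 101
        return
    if not vcr.caption_key_on:        # step 102
        return
    while True:
        block = vcr.read_audio_block()   # step 103: digitized audio
        text = recognize(block)          # steps 104-106: buffer, detect silence, recognize
        if text:                         # steps 107-108: DSP IC -> microcomputer -> OSD unit
            vcr.osd_lines.append(text)   # step 109: subtitle shown on the TV screen
        if not vcr.is_playing():         # step 110: loop back to step 103 while playing
            break
```

Passing the recognizer in as a callback keeps the microcomputer's sequencing separate from whatever recognition algorithm the DSP IC actually runs.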

Thus, in the present invention, when the user turns on the caption mode key 40 provided on the VCR set or on the key input unit of the remote control during playback, the speech-recognition DSP IC 44 converts the audio signal into text data, and the microcomputer 45, having received that text data, has the OSD unit 46 render it as subtitles on the TV receiver screen. This not only aids the user's language study but also lets hearing-impaired viewers conveniently follow the VCR's audio output visually.

Claims

A VCR subtitle processing system, comprising: a caption mode key 40, provided on one side of the VCR set or on the key input unit of a remote control, that applies an input signal to a microcomputer 45 of the VCR when the user turns it on to view text subtitles on a TV receiver screen; a video signal processor 41 and an audio signal processor 42 that respectively process the video and audio signals separated during videotape playback; an A/D converter 43 that receives an analog audio signal from the audio signal processor 42 and digitizes it; a speech-recognition DSP IC 44 that stores the digitized voice data received from the A/D converter 43 in memory, first analyzes the voice data to determine whether it is a silent interval, and, when a silent interval is detected, recognizes during the silent interval the voice data of the continuous-speech segment and converts it into text data; the microcomputer 45, which receives the text data from the speech-recognition DSP IC 44 and controls an OSD unit 46; and the OSD unit 46, which outputs text subtitles to the TV receiver screen under the control of the microcomputer 45.

A VCR subtitle processing method, comprising the sequential steps of: determining whether the VCR is powered on (step 100); when the VCR is powered on, determining whether it is currently in playback mode (step 101); determining whether the caption mode key 40 is on (step 102); when the caption mode key 40 is determined to be on, digitizing the analog audio signal (step 103); storing the voice signal contained in the digitized audio signal in memory (step 104); determining whether the current segment of the digitized voice signal is a silent interval (step 105); when the speech-recognition DSP IC 44 detects a silent interval, recognizing the voice data of the continuous-speech segment and converting it into text data (step 106); transmitting the text data converted by the speech-recognition DSP IC 44 to the microcomputer 45 (step 107); when the text data reaches the microcomputer 45, forwarding it to the OSD unit 46 (step 108); when the text data reaches the OSD unit 46, rendering it as subtitles and outputting them to the TV receiver screen (step 109); and, after these steps, determining whether the VCR is still in playback mode, returning to step 103 if so and otherwise ending (step 110).