KR101179915B1

KR101179915B1 - Apparatus and method for cleaning up vocalization data in Voice Recognition System provided Statistical Language Model

Info

Publication number: KR101179915B1
Application number: KR1020110145698A
Authority: KR
Inventors: 김동희
Original assignee: 주식회사 예스피치
Priority date: 2011-12-29
Filing date: 2011-12-29
Publication date: 2012-09-06

Abstract

PURPOSE: A speaking data filtering device of a voice recognizing system using a statistical language model and a method thereof are provided to consistently filter a sentence made by voice recognition and update statistical language model grammar of an existing voice recognizing engine by a trained statistical language model. CONSTITUTION: A speaking voice collecting unit(1) collects speaking voices of users and corresponding keywords. A voice processing unit(100) converts a voice signal outputted from the speaking voice collecting unit into voice data. A voice recognizing unit(200) recognizes the voice data by stored statistical language model grammar. The voice recognizing unit updates the statistical language model grammar. A filtering unit(300) corrects a sentence made by voice recognition. The filtering unit generates an update version of a statistical language model about the corrected sentence.

Description

Apparatus and method for cleaning up vocalization data in Voice Recognition System provided Statistical Language Model

The present invention relates to a speech recognition system to which a statistical language model (SLM) is applied. More particularly, it relates to an apparatus and method for refining utterance data in such a system, in which collected customer utterance (or "vocalization") data is speech-recognized, refined into a consistent form, and then used to train a statistical language model, so that the statistical language model grammar of the speech recognition engine can be updated and applied.

In general, one of the important factors that determines whether a customer purchases a product or is satisfied with a service is the customer service offered by companies and public offices: after-sales service (AS) for the product, resolution of customer complaints, and prompt follow-up service. In recent years, the automatic response system has served as the front-line window for such customer service.

The automatic response system passed through a second generation, which relied on the exchange and on automatic response functions, and evolved into the third-generation CTI automatic response system (also called a "CTI call center system"), which combines the telephone and the computer.

Automatic response systems are developing rapidly because, from a company's standpoint, they reduce the cost of handling customer service, take over the role of a branch office, make it possible to obtain customer information quickly and efficiently, and enable prompt service.

At present, CTI automatic response systems that integrate the telephone and the computer are the mainstream. By combining the two, a CTI automatic response system distributes incoming calls efficiently, retrieves customer information in advance and presents it so that the agent can handle the consultation efficiently, and provides various additional computer-based functions.

Going a step beyond such CTI automatic response systems, automatic response systems that include a speech recognition system have been developed and deployed so that customer requests can be recognized and handled quickly.

The statistical language model (SLM) is widely used as the speech recognition technology in automatic response systems that include such a speech recognition system.

In various fields of natural language processing, the statistical language model (SLM) has the advantage of increasing system accuracy and reducing execution time.

However, even the same sentence is uttered in widely varying patterns depending on the speaker's age, language learning ability, dialect, pronunciation accuracy, and so on. To train a statistical language model on such utterances, they are transcribed as spoken for each of these factors, and the vocalization patterns are analyzed. Because the recognition target is then learned in several forms, the probability that the recognition result comes out in a form other than the original one increases, and speech recognition performance may deteriorate.

For example, if the recognition target is A, the probability that the recognition result for A comes out as B, C, D, or the like rather than A increases depending on the user's age, language learning ability, dialect, pronunciation accuracy, and so on. Speech recognition performance therefore deteriorates.

Accordingly, an object of the present invention is to provide an apparatus and method for refining utterance data in a speech recognition system to which a statistical language model is applied, in which collected customer utterance data is speech-recognized, refined into a consistent form, and then used to train a statistical language model, so that the statistical language model grammar of the speech recognition engine can be updated and applied.

To achieve the above object, the utterance data refining apparatus of a speech recognition system to which a statistical language model is applied according to the present invention comprises: an utterance collection unit that collects and stores users' utterances matched with keywords for those utterances and, when a refinement request event occurs, outputs the voice signals of the collected utterances together with the keywords corresponding to the output voice signals; a speech processing unit that converts the voice signal output from the utterance collection unit into voice data and outputs it; a speech recognition unit that recognizes the voice data output from the speech processing unit using a pre-stored statistical language model grammar, outputs the recognition result, and updates the statistical language model grammar on receiving a statistical language model update version; and a refining unit that receives the speech recognition result and the keyword corresponding to it, corrects the sentence of the recognition result into a consistent form, generates a statistical language model update version for the corrected sentence, and provides it to the speech recognition unit.

The speech recognition unit includes a simulation unit that, when the statistical language model grammar has been updated with the statistical language model update version, runs a simulation on the updated statistical language model and outputs the simulation result.

The refining unit comprises: a phrase detection unit that splits the sentence-level recognition result output from the speech recognition unit into words and phrases (eojeol, the space-delimited units of Korean text) and outputs them; a dictionary unit that stores dictionaries, one per correction type, for correcting at the word, phrase, and sentence level; a correction unit that receives the words and phrases of the recognized sentence from the phrase detection unit together with the keyword, performs the correction for each correction type with reference to the dictionary unit, and outputs the corrected words and phrases per correction type; and a statistical language model generation unit that combines the sentences output per correction type by the correction unit into a single corrected sentence, trains a statistical language model on the generated sentence, and generates and outputs a statistical language model update version.

The dictionary unit comprises: a grammar dictionary that defines the grammar of words for correcting word-level typos; a spacing dictionary that stores spacing rules specialized for each application field, for correcting the spacing of a phrase or a sequence of phrases; a spelling conversion dictionary that stores, matched to each other, the standard spelling and the phonetic spelling (written as pronounced) of words for which the two differ, for word-level phonetic-spelling correction; and an ending unification dictionary that defines, for each of a number of standard endings, its variant endings and the representative ending to which those variants are to be unified at the phrase level.

The correction unit comprises: a typo correction unit that receives the words making up the sentence of the speech recognition result from the phrase detection unit, checks them for typos with reference to the grammar dictionary, and, if a typo is found, corrects it to the proper word and outputs the result to the statistical language model generation unit; a spacing correction unit that receives the phrases of a sentence from the phrase detection unit and the keyword corresponding to that sentence from the utterance collection unit, performs keyword-based spacing correction on the phrases with reference to the spacing dictionary, and outputs the result to the statistical language model generation unit; a phonetic-spelling correction unit that receives the words of the recognized sentence from the phrase detection unit, determines with reference to the spelling conversion dictionary whether each word is one whose standard spelling differs from its phonetic spelling, and, if so, converts it to the phonetic spelling and outputs the result to the statistical language model generation unit; and an ending unification correction unit that receives the phrases of the recognized sentence from the phrase detection unit, unifies the ending of each phrase to the corresponding representative ending with reference to the ending unification dictionary, and outputs the result to the statistical language model generation unit.

The spelling conversion dictionary further defines, for the base vocabulary in standard spelling and in particular for those base words whose pronunciation varies with pronunciation accuracy and dialect, a plurality of additional pronunciations as actually spoken. The phonetic-spelling correction unit receives the words of the recognized sentence from the phrase detection unit and, with reference to the spelling conversion dictionary, when a plurality of pronunciations exist for the base form of a word owing to pronunciation accuracy and dialect, corrects the word to the form of the actual utterance (phonetic spelling) that corresponds to the region where the speech recognition system is deployed, and outputs it to the statistical language model generation unit.

The utterance collection unit collects, by means of an automatic response system, the utterances of users who have called the automatic response system, matched with the keyword of the scenario step that prompted each utterance.

To achieve the above object, the utterance data refining method of a speech recognition system to which a statistical language model is applied according to the present invention comprises: a collection process in which the utterance collection unit collects users' utterances matched with the keywords corresponding to them; an inspection and refinement start process in which, when a refinement request event occurs, the utterance collection unit outputs, for each collected utterance, the utterance and the keyword matched with it; a voice data conversion process in which the speech processing unit receives the voice signal of the utterance, converts it into voice data, and outputs it; a speech recognition process in which the speech recognition unit receives the voice data, performs speech recognition, and outputs the sentence of the recognition result corresponding to the utterance; a refinement process in which the correction unit receives the sentence of the recognition result and, from the utterance collection unit, the keyword matched with the utterance, corrects the sentence into a consistent form, generates a statistical language model update version for the corrected sentence, and provides it to the speech recognition unit; and a statistical language model grammar update process in which the speech recognition unit receives the statistical language model update version and updates the statistical language model grammar.

The method further comprises a simulation process in which the speech recognition unit runs a simulation on the updated statistical language model grammar and outputs the simulation result.

The refinement request event occurs automatically at regular intervals of one or more of the following periods: hour, day, week, month, and year.

The refinement process comprises: a phrase detection step in which the phrase detection unit splits the sentence of the recognition result into words and phrases and outputs them; a correction step in which the correction unit receives the split words and phrases appropriate to each correction type together with the keyword, performs the correction for each correction type with reference to the dictionary unit, and outputs the corrected words and phrases per correction type; and a statistical language model generation step in which the statistical language model generation unit receives from the correction unit the sentences made up of the words or phrases output per correction type, generates a single sentence reflecting the corrections of every type, generates a statistical language model update version for the generated sentence, and outputs it to the speech recognition unit.

The correction step comprises: a typo correction step of receiving the words making up the sentence of the speech recognition result from the phrase detection unit, checking them for typos with reference to the grammar dictionary, and, if a typo is found, correcting it to the proper word and outputting the result to the statistical language model generation unit; a spacing correction step of receiving the phrases of a sentence from the phrase detection unit and the keyword corresponding to that sentence from the utterance collection unit, performing keyword-based spacing correction on the phrases with reference to the spacing dictionary, and outputting the result to the statistical language model generation unit; a phonetic-spelling correction step of receiving the words of the recognized sentence from the phrase detection unit, determining with reference to the spelling conversion dictionary whether each word is one whose standard spelling differs from its phonetic spelling, and, if so, converting it to the phonetic spelling and outputting the result to the statistical language model generation unit; and an ending unification correction step of receiving the phrases of the recognized sentence from the phrase detection unit, unifying the ending of each phrase to the corresponding representative ending with reference to the ending unification dictionary, and outputting the result to the statistical language model generation unit.

The correction step further comprises an additional pronunciation correction step of receiving the words of the recognized sentence from the phrase detection unit and, with reference to the spelling conversion dictionary, when a plurality of pronunciations exist for the base form of a word owing to pronunciation accuracy and dialect, correcting the word to the form of the actual utterance (phonetic spelling) that corresponds to the region where the speech recognition system is deployed, and outputting it to the statistical language model generation unit.

The utterances collected in the collection process are speech uttered by users who have called the automatic response system, and the keyword is the keyword of the scenario step that prompted the utterance.

The present invention refines the sentences obtained by speech-recognizing users' utterance data into as consistent a form as possible, trains a statistical language model on them, and updates the statistical language model grammar of the existing speech recognition engine with the trained model, thereby improving the recognition performance of a speech recognition engine to which a statistical language model is applied.

FIG. 1 is a diagram showing the configuration of an utterance data refining apparatus of a speech recognition system to which a statistical language model is applied according to the present invention.
FIG. 2 is a diagram showing the detailed configuration of the refining unit of the utterance data refining apparatus according to the present invention.
FIG. 3 is a flowchart showing an utterance data refining method of a speech recognition system to which a statistical language model is applied according to the present invention.
FIG. 4 is a flowchart showing in detail the refinement process of the utterance data refining method of a speech recognition system to which a statistical language model is applied according to the present invention.

Hereinafter, the configuration and operation of the utterance data refining apparatus of a speech recognition system to which a statistical language model is applied according to the present invention are described with reference to the drawings, followed by the utterance data refining method performed in the apparatus.

FIG. 1 is a diagram showing the configuration of an utterance data refining apparatus of a speech recognition system to which a statistical language model is applied according to the present invention.

Referring to FIG. 1, the utterance data refining apparatus of a speech recognition system to which a statistical language model is applied according to the present invention includes an utterance collection unit (1), a speech processing unit (100), a speech recognition unit (200), and a refining unit (300).

The utterance collection unit (1) collects users' utterances matched with the keywords corresponding to them. The utterances may be collected by recording the speech of users who call a call center equipped with an automatic response system that uses speech recognition, or may be recorded directly by a developer. The keyword may be the keyword of the scenario step of the automatic response system that prompted the utterance, or may be entered directly by a developer. The scenario is recorded sentence by sentence, and each recorded utterance and its matched keyword are stored under unique identification information.

When a refinement request event occurs, the utterance collection unit (1) outputs the voice signals of the collected utterances to the speech processing unit (100) and outputs the keywords matched with those voice signals to the refining unit (300). The refining unit (300) can tell from the identification information included with each keyword which voice signal output to the speech processing unit (100) the keyword belongs to. The refinement request event may be configured to occur at intervals of at least one period such as an hour, day, week, month, or year, or may be raised at the request of speech recognition developers.
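The hand-off described above, in which audio goes to the speech processing unit and keywords go to the refining unit with a shared identifier linking the two, can be sketched roughly as follows. This is only an illustrative sketch: the class and field names (`Utterance`, `UtteranceCollector`, `utterance_id`) and the sample keywords are assumptions, not names from the patent.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utterance_id: str  # unique identification information (hypothetical format)
    audio: bytes       # recorded voice signal; placeholder bytes in this sketch
    keyword: str       # keyword of the scenario step that prompted the utterance


class UtteranceCollector:
    """Hypothetical stand-in for the utterance collection unit (1)."""

    def __init__(self):
        self._store = {}

    def collect(self, utterance_id, audio, keyword):
        # Store the utterance and its matched keyword under one identifier.
        self._store[utterance_id] = Utterance(utterance_id, audio, keyword)

    def on_refinement_request(self):
        # Audio is routed to the speech processing unit, keywords to the
        # refining unit; both streams carry the identifier for re-pairing.
        audio_out = [(u.utterance_id, u.audio) for u in self._store.values()]
        keyword_out = [(u.utterance_id, u.keyword) for u in self._store.values()]
        return audio_out, keyword_out


collector = UtteranceCollector()
collector.collect("utt-001", b"\x00\x01", "요금 조회")
collector.collect("utt-002", b"\x02\x03", "상담원 연결")
audio_out, keyword_out = collector.on_refinement_request()
keyword_by_id = dict(keyword_out)  # the refining side looks keywords up by id
```

Keying both streams by the same identifier lets the refining side pair each recognized sentence with its keyword even though the two arrive by different paths.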

The speech processing unit (100) converts the voice signal output from the utterance collection unit (1) into voice data and outputs the voice data to the speech recognition unit (200).

The speech recognition unit (200) performs speech recognition on the voice data received from the speech processing unit (100) using the pre-stored statistical language model grammar, which has been trained by a statistical language model, and outputs the recognition result to the refining unit (300). It also receives the statistical language model update version from the refining unit (300) and updates the stored statistical language model grammar. In addition, when the grammar is updated, the speech recognition unit (200) runs a simulation on the updated statistical language model and outputs the simulation result. The simulation result is stored and may be displayed on screen or printed as the developer chooses.

The refining unit (300) receives the sentence of the speech recognition result from the speech recognition unit (200) together with the keyword, corrects the sentence into a consistent form through correction steps such as typo correction, keyword-centered spacing, phonetic spelling, and ending unification, generates a statistical language model update version for the corrected sentence, and provides it to the speech recognition unit (200). The refining unit (300) also stores the corrected sentences, inspects their words and phrases, flags any vocabulary item whose spacing appears in several variants, generates a list of candidates for additional correction rules, and writes the list to an output file.
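The last step above, flagging vocabulary whose spacing appears in several variants, could look something like the following sketch. The function name, the watch list of terms, and the sample sentences are all hypothetical; the patent does not say how the inspection is implemented.

```python
import re


def spacing_variants(sentences, terms):
    """Report terms that occur with more than one spacing in the sentences.

    `terms` is a hypothetical watch list of vocabulary written without spaces.
    """
    report = {}
    for term in terms:
        # Allow at most one optional whitespace character between syllables.
        pattern = re.compile(r"\s?".join(map(re.escape, term)))
        forms = set()
        for sentence in sentences:
            forms.update(pattern.findall(sentence))
        if len(forms) > 1:
            report[term] = sorted(forms)
    return report


corrected_sentences = [
    "요금조회 해 주세요",
    "요금 조회 부탁합니다",
    "상담원 연결",
]
candidates = spacing_variants(corrected_sentences, ["요금조회", "상담원"])
```

Here `candidates` lists only "요금조회", because it occurs both with and without an internal space, while "상담원" occurs in a single form and is left out.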

The description of FIG. 1 above covers only the general operation of the refining unit. The detailed configuration and operation of the refining unit (300) according to the present invention are described below with reference to FIG. 2.

FIG. 2 is a diagram showing the detailed configuration of the refining unit of the utterance data refining apparatus according to the present invention.

The refining unit (300) includes a phrase detection unit (310), a correction unit (320), a dictionary unit (330), a statistical language model generation unit (340), and a refinement rule generation unit (350).

The phrase detection unit (310) receives the sentence-level recognition result output from the speech recognition unit (200), splits it into words and phrases, and outputs words, phrases, sequences of phrases, or all the phrases of the sentence, as required by each correction type.
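As a rough illustration of this splitting step, the sketch below breaks a recognized sentence into space-delimited phrases and adjacent phrase pairs. The function name and output keys are assumptions, and true word-level segmentation, which would need morphological analysis, is deliberately left out.

```python
def detect_units(sentence):
    """Split a recognized sentence into the units the correctors consume."""
    phrases = sentence.split()  # space-delimited phrases (eojeol)
    # Adjacent pairs stand in for "sequences of phrases" in this sketch.
    phrase_pairs = [" ".join(phrases[i:i + 2]) for i in range(len(phrases) - 1)]
    return {"phrases": phrases, "phrase_pairs": phrase_pairs}


units = detect_units("요금 조회 해 주세요")
```

For this input, `units["phrases"]` holds the four phrases and `units["phrase_pairs"]` the three adjacent pairs.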

The dictionary unit (330) holds a number of dictionaries, one per correction type, which the correction unit (320) uses to correct one or more of words, phrases, sequences of phrases, and sentence-level phrases according to the correction type.

Specifically, the dictionary unit (330) includes a grammar dictionary (331), a spacing dictionary (332), a spelling conversion dictionary (333), and an ending unification dictionary (334).

The grammar dictionary (331) defines the grammar of words for correcting word-level typos.

The spacing dictionary (332) stores spacing rules specialized for each application field, for correcting the spacing of a phrase or a sequence of phrases.

The spelling conversion dictionary (333) stores, matched to each other, the standard spelling and the phonetic spelling of words for which the two differ, for word-level phonetic-spelling correction. In addition, because the same word is pronounced in several ways depending on pronunciation accuracy, dialect, and so on, as shown in Table 1 below, actual pronunciations that may occur are added to the basic pronunciation dictionary of each vocabulary item in the utterance data used to build the language model. In other words, the spelling conversion dictionary (333) further defines, for those base words whose pronunciation varies with pronunciation accuracy and dialect, a plurality of additional pronunciations as actually spoken.

[Table 1]
Vocabulary (standard writing) | Pronunciation dictionary | Actual utterance (phonetic writing) | Recognition result
메시지 (message) | mzedZi (basic pronunciation dictionary) | 메세지 | 메세지
메시지 (message) | mezidZi (additional pronunciation dictionary) | 메시지 | 메세지
메시지 (message) | mesedZi (additional pronunciation dictionary) | 메쎄지 | 메세지
메시지 (message) | mesidZi (additional pronunciation dictionary) | 메씨지 | 메세지

The ending matching dictionary 334 is used to unify endings at the phrase level: for each of a plurality of standard endings it defines the variable endings, and it defines the representative ending to which those variable endings are unified. The representative ending may be the same as the standard ending or different from it.

The correction unit 320 includes a typo correction unit 321, a spacing correction unit 322, a phonetic writing correction unit 323, and an ending matching correction unit 324. It corrects the words, phrases, groups of phrases, and sentence-level phrases input from the word detection unit 310 so that they are consistent for each correction type, and outputs the result. The correction types include typo correction, spacing correction, phonetic writing correction, and ending matching correction.

Specifically, the typo correction unit 321 refers to the grammar dictionary 331 of the dictionary unit 330 to check whether any of the words making up the sentence of the speech recognition result contains a typo and, if so, corrects it to the proper word and outputs it to the statistical language model generation unit 340.

The spacing correction unit 322 receives the sentence-level phrases from the word detection unit 310 and the keyword corresponding to the sentence from the spoken voice collection unit 1, performs keyword-based spacing correction with reference to the spacing dictionary 332, and outputs the result.

Table 2 below shows examples of keyword-centered spacing correction. Taking the first sentence of Table 2 as an example, the spacing correction unit 322 receives the keyword '결제대금' (payment amount) for that sentence from the spoken voice collection unit 1 and receives the sentence-level phrases ['이번', '달', '결제대금이', '얼마에요']. Given these phrases and the keyword, the spacing correction unit 322 corrects them to ['이번', '달', '결제대금', '이', '얼마에요'], separating the keyword from the particle attached to it, and outputs the result to the statistical language model generation unit 340. As Table 2 shows, training on the original utterance data gives each sentence the effect of only a single training pass, whereas training on keyword-separated data makes the same keyword recur identically across different sentences, so the keyword is trained more times. Speech recognition performance improves as the number of training instances increases.

[Table 2]
Original utterance data | Training effect | Keyword separation | Training effect
이번 달 결제대금이 얼마에요 (How much is this month's payment?) | Trained once per sentence | 이번 달 결제대금 이 얼마예요 | 결제대금: trained 3 times
결제대금을 좀 알려주세요. (Please tell me the payment amount.) | Trained once per sentence | 결제대금을 좀 알려주세요 |
결제대금 내려고 (I want to pay the bill) | Trained once per sentence | 결제대금 내려고요 |
나의 요금제를 바꿀 거예요 (I will change my plan) | Trained once per sentence | 나의 요금제를 바꿀 거예요 | 나의 요금제: trained 2 times
분실 신고 하려고요 (I want to report a loss) | Trained once per sentence | 분실 신고 하려고요 |
분실신고가 잘 됐는지 확인할라고 (I want to check whether the loss report went through) | Trained once per sentence | 분실 신고 가 잘 됐는지 확인할라고 | 분실 신고: trained 3 times
분실 신고한 거 취소할게요 (I will cancel the loss report) | Trained once per sentence | 분실 신고 한 거 취소할게요 |
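The keyword-centered correction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `separate_keyword` and `keyword_count` are hypothetical, the spacing dictionary 332 is not modeled, and the sketch simply splits every phrase that begins with the keyword.

```python
from typing import List

def separate_keyword(phrases: List[str], keyword: str) -> List[str]:
    """Split any phrase that starts with the keyword into keyword + remainder.

    Hypothetical helper: the spacing dictionary is not consulted here.
    """
    corrected = []
    for phrase in phrases:
        if phrase != keyword and phrase.startswith(keyword):
            corrected.append(keyword)
            corrected.append(phrase[len(keyword):])  # e.g. the particle '이'
        else:
            corrected.append(phrase)
    return corrected

def keyword_count(sentences: List[List[str]], keyword: str) -> int:
    """Count how often the keyword appears as a standalone token."""
    return sum(tokens.count(keyword) for tokens in sentences)

original = [
    ["이번", "달", "결제대금이", "얼마에요"],
    ["결제대금을", "좀", "알려주세요"],
    ["결제대금", "내려고요"],
]
separated = [separate_keyword(s, "결제대금") for s in original]
```

Counting the standalone keyword before and after separation with `keyword_count` shows the intended effect: the keyword becomes an identical token in more sentences, so a language model trained on the separated data sees it more often.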

The phonetic writing correction unit 323 receives the words of the recognized sentence from the word detection unit 310, refers to the writing conversion dictionary 333 to determine whether each word is one whose standard spelling differs from its phonetic spelling and, if so, converts it to the phonetic spelling and outputs it to the statistical language model generation unit 340. Table 3 below shows examples of words whose standard and phonetic spellings differ.

[Table 3]
Standard writing (vocabulary) | Phonetic writing (actual utterance) | Conversion
서비스 (service) | 써비스 | 서비스 -> 써비스
다이너스 (Diners) | 다이너쓰 | 다이너스 -> 다이너쓰
보너스 (bonus) | 뽀나쓰 | 보너스 -> 뽀나쓰
센터 (center) | 쎈터 | 센터 -> 쎈터
보이스피싱 (voice phishing) | 보이쓰 피씽 | 보이스피싱 -> 보이쓰 피씽
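As a rough sketch, the lookup performed against the writing conversion dictionary 333 can be modeled as a plain mapping from standard spelling to phonetic spelling. The entries below are the Table 3 examples; the names `WRITING_CONVERSION` and `to_phonetic` are hypothetical and are not taken from the patent.

```python
from typing import Dict, List

# Standard spelling -> phonetic spelling, using the Table 3 examples.
WRITING_CONVERSION: Dict[str, str] = {
    "서비스": "써비스",
    "다이너스": "다이너쓰",
    "보너스": "뽀나쓰",
    "센터": "쎈터",
    "보이스피싱": "보이쓰 피씽",
}

def to_phonetic(words: List[str]) -> List[str]:
    """Replace each word that has a dictionary entry; leave the rest untouched."""
    return [WRITING_CONVERSION.get(word, word) for word in words]
```

Words without an entry pass through unchanged, which matches the description: only words whose standard and phonetic spellings differ are converted.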

In addition, even when a basic vocabulary item in standard writing has several pronunciations depending on pronunciation accuracy, dialect, and so on, the phonetic writing correction unit 323 refers to the writing conversion dictionary 333, as in Table 1 above, and outputs to the statistical language model generation unit 340 the word for the actual utterance (phonetic writing) that corresponds to the region where the speech recognition system is deployed.
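One way to picture the region-dependent selection is a two-level mapping from a basic vocabulary item to its actual utterances per region. The sketch below is an assumption throughout: the patent does not say how regions are identified, so the labels `region_a`, `region_b`, and `region_c` and their pairing with the Table 1 spellings are invented for illustration, as are the names `REGIONAL_UTTERANCES` and `utterance_for_region`.

```python
from typing import Dict

# Basic vocabulary -> {region label: actual utterance}.
# The region labels and their pairing with spellings are made up for this sketch.
REGIONAL_UTTERANCES: Dict[str, Dict[str, str]] = {
    "메시지": {
        "region_a": "메세지",
        "region_b": "메쎄지",
        "region_c": "메씨지",
    },
}

def utterance_for_region(word: str, region: str) -> str:
    """Return the regional spelling if one is defined, else the word itself."""
    return REGIONAL_UTTERANCES.get(word, {}).get(region, word)
```

Falling back to the input word keeps the standard spelling whenever no regional variant has been registered.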

The ending matching correction unit 324 receives the phrases of the recognized sentence from the word detection unit 310, refers to the ending matching dictionary 334 to unify the ending of each phrase to its representative ending, and outputs the result to the statistical language model generation unit 340. In conversational speech, endings take many forms depending on pronunciation and dialect, which can degrade recognition performance; training on unified endings therefore improves recognition performance. For example, as shown in Table 4 below, the ending matching correction unit 324 unifies [옮기 (stem) + ㄹ려고요 (ending)] to [옮기 (stem) + 려고요 (ending)], and likewise unifies [옮기 (stem) + ㄹ려고용 (ending)] to [옮기 (stem) + 려고요 (ending)].

[Table 4]
Phrase | Variable ending | Representative ending | Unified result
옮길려고요 (to move) | ㄹ려고요 | 려고요 | 옮기려고요
옮길려고용 | ㄹ려고용 | 려고요 | 옮기려고요
바꾸려구 (to change) | 려구 | 려고 | 바꾸려고
바꿀라고 | ㄹ라고 | 려고 | 바꾸려고
바꾸려구요 | 려구요 | 려고 요 | 바꾸려고 요
바꿀라 | ㄹ라 | 려 | 바꾸려
바꿀려 | ㄹ려 | 려 | 바꾸려
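Because several variable endings in Table 4 begin with a syllable-final ㄹ that is fused into the preceding syllable (옮길 = 옮기 + ㄹ), a plain suffix replacement on composed Hangul does not work. The sketch below is one possible approach, not the patent's method: it decomposes syllables with Unicode NFD, replaces the ending, and recomposes with NFC. The rule table covers only a subset of Table 4, and the names `ENDING_RULES` and `match_ending` are hypothetical.

```python
import unicodedata

COMPAT_RIEUL = "\u3139"  # the standalone letter ㄹ as printed in Table 4
FINAL_RIEUL = "\u11af"   # ㄹ as a syllable-final consonant after NFD decomposition

def _nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)

def _ending_key(ending: str) -> str:
    """Turn a table-style ending such as 'ㄹ려고요' into its decomposed form."""
    if ending.startswith(COMPAT_RIEUL):
        return FINAL_RIEUL + _nfd(ending[1:])
    return _nfd(ending)

# Variable ending -> representative ending (a subset of Table 4).
ENDING_RULES = {
    COMPAT_RIEUL + "려고요": "려고요",
    COMPAT_RIEUL + "려고용": "려고요",
    "려구": "려고",
    COMPAT_RIEUL + "라고": "려고",
    "려구요": "려고 요",
}

# Try longer endings first so that the most specific rule wins.
_RULES = sorted(
    ((_ending_key(var), _nfd(rep)) for var, rep in ENDING_RULES.items()),
    key=lambda rule: len(rule[0]),
    reverse=True,
)

def match_ending(phrase: str) -> str:
    """Replace a known variable ending with its representative ending."""
    decomposed = _nfd(phrase)
    for variable, representative in _RULES:
        if decomposed.endswith(variable):
            stem = decomposed[: -len(variable)]
            return unicodedata.normalize("NFC", stem + representative)
    return phrase
```

A production dictionary would need safeguards that this sketch lacks, for example restricting rules to verb stems so that unrelated words ending in the same letters are not rewritten.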

The statistical language model generation unit 340 combines the correction data for one or more of the words, phrases, groups of phrases, and sentence-level phrases output from the correction unit 320, trains the statistical language model on the corrected sentence, generates a statistical language model update version, and outputs it to the speech recognition unit 200.

The refinement rule generation unit 350 receives the words and phrases of each recognized sentence output from the typo correction unit 321, the spacing correction unit 322, the phonetic writing correction unit 323, and the ending matching correction unit 324, and stores them sentence by sentence. It then examines identical vocabulary items across the stored sentences, generates a list of candidates for new correction rules, that is, items that are the same vocabulary yet remain uncorrected and are spaced differently, and writes the list to an output file.

The developer can review the list of candidates for new correction rules and update the rules accordingly.
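The candidate list could be produced along the following lines. This is a simplified sketch under an assumption of my own: a candidate is flagged whenever two adjacent phrases in one stored sentence, when joined, equal a single phrase that occurs in any stored sentence. The patent does not specify the comparison, and the function name and sample sentences are hypothetical.

```python
from typing import List, Tuple

def find_spacing_candidates(sentences: List[str]) -> List[Tuple[str, str]]:
    """Return (joined form, spaced form) pairs that occur with both spacings."""
    tokens = {token for sentence in sentences for token in sentence.split()}
    candidates = set()
    for sentence in sentences:
        parts = sentence.split()
        for left, right in zip(parts, parts[1:]):
            if left + right in tokens:
                candidates.add((left + right, f"{left} {right}"))
    return sorted(candidates)

# Hypothetical stored sentences after correction.
stored = ["분실신고 하려고요", "분실 신고 취소할게요"]
candidates = find_spacing_candidates(stored)
```

Each pair in `candidates` is a spot where the same vocabulary survived correction with inconsistent spacing, which is the kind of entry a developer would turn into a new spacing rule.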

FIG. 3 is a flowchart of a method for refining utterance data in a speech recognition system to which a statistical language model is applied, according to the present invention. The method is described below with reference to FIGS. 1 to 3.

First, while providing a voice guidance service according to a scenario of the automatic response system, the spoken voice collection unit 1 collects and stores the keyword of each step of the scenario together with the user's utterances elicited at that step (S311).

While collecting the user utterances and keywords, the spoken voice collection unit 1 checks whether a refinement processing request event occurs (S312).

When the refinement processing request event occurs, the spoken voice collection unit 1 sequentially plays back the collected keywords and user utterances, outputs the voice signal of each collected utterance to the voice processing unit 100, and outputs the keyword matched to that voice signal to the refinement unit 300 (S313).

The voice processing unit 100 receives the voice signal output by the spoken voice collection unit 1, converts it into voice data, and outputs the voice data to the speech recognition unit 200 (S314).

The speech recognition unit 200 receives the voice data, performs speech recognition using the stored statistical language model grammar 210, and outputs the speech recognition result to the refinement unit 300 (S315).

On receiving the speech recognition result, the refinement unit 300 performs typo correction, spacing correction, phonetic writing correction, and ending matching correction as described above. For each recognized sentence, the versions produced by the individual corrections are combined by the statistical language model generation unit 340; the refinement unit 300 then generates a statistical language model update version from the finally corrected (refined) sentences and outputs it to the speech recognition unit 200 (S317).

When refinement is complete, the speech recognition unit 200 receives the statistical language model update version output from the refinement unit 300 and applies it to the statistical language model grammar 210, thereby updating the grammar (S319).

Once the statistical language model grammar 210 has been updated, the speech recognition unit 200 runs a simulation of the updated grammar through the simulation unit 220 and outputs the simulation result (S321).

FIG. 4 is a flowchart showing in detail the refinement process of the utterance data refining method according to the present invention. The refinement process is described below with reference to FIG. 4.

First, the word detection unit 310 of the refinement unit 300 checks whether a speech recognition result has been input (S411).

When a speech recognition result is input, the word detection unit 310 divides the recognized sentence into words or phrases and outputs them (S412).

The typo correction unit 321 receives the words of the recognized sentence from the word detection unit 310, checks each word for typos with reference to the grammar dictionary 331 (S421), and, depending on whether a typo is found (S423), corrects the misspelled word to the correct word and outputs it to the statistical language model generation unit 340 (S425).

The spacing correction unit 322 receives the phrases of the sentence from the word detection unit 310, performs a spacing check with reference to the spacing dictionary 332 (S427), determines whether any spacing needs to be converted (S429), and, if so, applies the corresponding spacing correction and outputs the result to the statistical language model generation unit 340 (S431).

The phonetic writing correction unit 323 receives the words of the sentence from the word detection unit 310, refers to the writing conversion dictionary 333 to determine whether any word is subject to phonetic writing (S435), and, if so, converts the standard spelling of that word to its phonetic spelling and outputs it to the statistical language model generation unit 340 (S437).

The ending matching correction unit 324 receives the phrases of the recognized sentence from the word detection unit 310, performs an ending matching check with reference to the ending matching dictionary 334 to see whether a representative ending exists for the ending of each phrase (S439), and, depending on the result (S441), unifies the ending to the representative ending and outputs the result to the statistical language model generation unit 340 (S443).

The statistical language model generation unit 340 then receives, for the single recognized sentence, the word-level or phrase-level corrected sentences output from the typo correction unit 321, the spacing correction unit 322, the phonetic writing correction unit 323, and the ending matching correction unit 324, combines them into one corrected sentence that reflects all of the corrections, and stores it (S445).
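The description has each correction unit emit its own output, which the generation unit 340 then merges, but it does not spell out the merge itself. The sketch below swaps in a simpler technique, stated plainly: it applies the correctors one after another to the same token list, so the final list already reflects every correction. The `Corrector` type, `refine_sentence`, and the two toy correctors are all hypothetical stand-ins.

```python
from typing import Callable, List

Corrector = Callable[[List[str]], List[str]]

def refine_sentence(tokens: List[str], correctors: List[Corrector]) -> str:
    """Apply each corrector in turn and join the result into one sentence."""
    for correct in correctors:
        tokens = correct(tokens)
    return " ".join(tokens)

# Toy stand-ins for two of the correction units (single-entry dictionaries).
def fix_phonetic(tokens: List[str]) -> List[str]:
    return [{"센터": "쎈터"}.get(token, token) for token in tokens]

def fix_ending(tokens: List[str]) -> List[str]:
    return [{"바꾸려구": "바꾸려고"}.get(token, token) for token in tokens]

refined = refine_sentence(["센터", "바꾸려구"], [fix_phonetic, fix_ending])
```

Sequential application avoids having to reconcile four independently edited copies of the same sentence, at the cost of making the result depend on the order of the correctors.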

The present invention is not limited to the preferred embodiments described above. Those of ordinary skill in the art will readily understand that it may be practiced with various improvements, modifications, substitutions, or additions without departing from its gist. Where an implementation involving such improvements, modifications, substitutions, or additions falls within the scope of the appended claims, its technical idea should also be regarded as belonging to the present invention.

1: spoken voice collection unit 100: voice processing unit
200: speech recognition unit 210: statistical language model grammar
220: simulation unit 300: refinement unit
310: word detection unit 320: correction unit
321: typo correction unit 322: spacing correction unit
323: phonetic writing correction unit 324: ending matching correction unit
330: dictionary unit 331: grammar dictionary
332: spacing dictionary 333: writing conversion dictionary
334: ending matching dictionary 340: statistical language model generation unit
350: refinement rule generation unit

Claims

An apparatus for refining utterance data in a speech recognition system to which a statistical language model is applied, the apparatus comprising:
a spoken voice collection unit that collects and stores users' spoken voices matched with the keywords of those spoken voices and, when a refinement processing request event occurs, outputs voice signals for the collected spoken voices and outputs the keyword corresponding to each output voice signal;
a voice processing unit that converts the voice signal output from the spoken voice collection unit into voice data and outputs the voice data;
a speech recognition unit that recognizes the voice data output from the voice processing unit using a pre-stored statistical language model grammar and outputs the recognition result, and that receives a statistical language model update version and updates the statistical language model grammar; and
a refinement unit that receives the speech recognition result and the keyword corresponding to it, corrects the sentence of the speech recognition result into a consistent form, generates a statistical language model update version for the corrected sentence, and provides it to the speech recognition unit.

The apparatus of claim 1,
wherein the speech recognition unit
further comprises a simulation unit that, when the statistical language model grammar is updated with the statistical language model update version, runs a simulation of the updated statistical language model grammar and outputs the simulation result.

The apparatus of claim 1,
wherein the refinement unit comprises:
a word detection unit that divides the sentence of the recognition result output from the speech recognition unit into words and phrases and outputs them;
a dictionary unit that stores dictionaries, one per correction type, for correcting the words, phrases, and sentences according to the correction type;
a correction unit that receives the keyword and the words and phrases of the recognized sentence output from the word detection unit, performs correction for each correction type with reference to the dictionary unit, and outputs the corrected words and phrases for each correction type; and
a statistical language model generation unit that combines the sentences output by the correction unit for each correction type into one corrected sentence, trains a statistical language model on the generated sentence, and generates and outputs a statistical language model update version.

The apparatus of claim 3,
wherein the dictionary unit comprises:
a grammar dictionary that defines the grammar of words for correcting typos at the word level;
a spacing dictionary that stores spacing grammars specialized for each application field for correcting the spacing of a phrase or a group of phrases;
a writing conversion dictionary that, for word-level phonetic writing correction, stores the standard and phonetic spellings matched to each other for words in which the two differ; and
an ending matching dictionary that, for unifying endings at the phrase level, defines variable endings for each of a plurality of standard endings and defines a representative ending to which the variable endings are unified.

The apparatus of claim 4,
wherein the correction unit comprises:
a typo correction unit that receives the words making up the sentence of the speech recognition result from the word detection unit, checks with reference to the grammar dictionary whether any of the words contains a typo, and, if so, corrects it to the proper word and outputs it to the statistical language model generation unit;
a spacing correction unit that receives the sentence-level phrases from the word detection unit and the keyword corresponding to the sentence from the spoken voice collection unit, performs keyword-based spacing correction on the phrases of the sentence with reference to the spacing dictionary, and outputs the result to the statistical language model generation unit;
a phonetic writing correction unit that receives the words of the recognized sentence from the word detection unit, determines with reference to the writing conversion dictionary whether each word is one whose standard and phonetic spellings differ, and, if so, converts it to the phonetic spelling and outputs it to the statistical language model generation unit; and
an ending matching correction unit that receives the phrases of the recognized sentence from the word detection unit, unifies the ending of each phrase to the corresponding representative ending with reference to the ending matching dictionary, and outputs the result to the statistical language model generation unit.

The apparatus of claim 5,
wherein the writing conversion dictionary
further defines, for the basic vocabulary items in standard writing whose pronunciation varies with pronunciation accuracy and dialect, a plurality of additional pronunciations as actually spoken according to pronunciation accuracy and dialect, and
the phonetic writing correction unit,
on receiving the words of the recognized sentence from the word detection unit, refers to the writing conversion dictionary and, when a plurality of pronunciations exist for the basic vocabulary item of a word according to pronunciation accuracy and dialect, corrects the word to the word for the corresponding actual utterance (phonetic writing) and outputs it to the statistical language model generation unit.

The apparatus of claim 1,
wherein the spoken voice collection unit
uses an automatic response system to collect the utterances of users who call that system, matched with the keyword of the scenario step that elicited each utterance.

A method for refining utterance data in a speech recognition system to which a statistical language model is applied, the method comprising:
a collection process in which a spoken voice collection unit collects a user's spoken voice and the keyword corresponding to it;
a refinement initiation process in which, when a refinement processing request event occurs, the spoken voice collection unit outputs a voice signal for the collected spoken voice and the keyword matched to that spoken voice;
a voice data conversion process in which a voice processing unit receives the voice signal for the spoken voice and converts it into voice data;
a speech recognition process in which a speech recognition unit receives the voice data, performs speech recognition, and outputs the sentence of the speech recognition result corresponding to the spoken voice;
a refinement process in which a refinement unit receives the sentence of the speech recognition result, receives the keyword matched to the spoken voice from the spoken voice collection unit, corrects the sentence into a consistent form, generates a statistical language model update version for the corrected sentence, and provides it to the speech recognition unit; and
a statistical language model grammar updating process in which the speech recognition unit receives the statistical language model update version and updates the statistical language model grammar.

The method of claim 8,
further comprising a simulation process in which the speech recognition unit runs a simulation of the updated statistical language model grammar and outputs the simulation result.

The method of claim 8,
wherein the refinement processing request event occurs automatically at a period of one or more of an hour, a day, a week, a month, and a year.

The method of claim 8,
wherein the refinement process comprises:
a word detection step in which a word detection unit divides the sentence of the recognition result into words and phrases and outputs them;
a correction step in which a correction unit receives the divided words and phrases and the keyword, performs correction for each correction type with reference to a dictionary unit, and outputs the corrected words and phrases for each correction type; and
a statistical language model generation step in which a statistical language model generation unit receives from the correction unit the sentences composed of the words or phrases output for each correction type, generates one sentence reflecting the correction of every correction type, generates a statistical language model update version for the generated sentence, and outputs it to the speech recognition unit.

The method of claim 11,
wherein the correction step comprises:
a typo correction step of receiving the words making up the sentence of the speech recognition result from the word detection unit, checking with reference to a grammar dictionary whether any word contains a typo, and, if so, correcting it to the proper word and outputting it to the statistical language model generation unit;
a spacing correction step of receiving the sentence-level phrases from the word detection unit and the keyword corresponding to the sentence from the spoken voice collection unit, performing keyword-based spacing correction on the phrases of the sentence with reference to a spacing dictionary, and outputting the result to the statistical language model generation unit;
a phonetic writing correction step of receiving the words of the recognized sentence from the word detection unit, determining with reference to a writing conversion dictionary whether each word is one whose standard and phonetic spellings differ, and, if so, converting it to the phonetic spelling and outputting it to the statistical language model generation unit; and
an ending matching correction step of receiving the phrases of the recognized sentence from the word detection unit, unifying the ending of each phrase to the corresponding representative ending with reference to an ending matching dictionary, and outputting the result to the statistical language model generation unit.

The method of claim 12,
wherein the correction step
further comprises an additional pronunciation correction step of receiving the words of the recognized sentence from the word detection unit, referring to the writing conversion dictionary, and, when a plurality of pronunciations exist for the basic vocabulary item of a word according to pronunciation accuracy and dialect, correcting the word to the word for the corresponding actual utterance (phonetic writing) and outputting it to the statistical language model generation unit.

The method of claim 8,
wherein the spoken voice collected in the collection process
is the voice uttered by a user who has called an automatic response system, collected using that automatic response system, and
the keyword is the keyword of the scenario step that elicited the spoken voice.