KR101854369B1

KR101854369B1 - Apparatus for providing speech recognition service, and speech recognition method for improving pronunciation error detection thereof

Info

Publication number: KR101854369B1
Application number: KR1020110118048A
Authority: KR
Inventors: 김영준
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2011-11-14
Filing date: 2011-11-14
Publication date: 2018-05-04
Also published as: KR20130052800A

Abstract

The present invention provides a speech recognition system for improving pronunciation error detection, comprising: a service device that separately stores a speech-recognition pronunciation dictionary defining a reference pronunciation for each word and an error-detection pronunciation dictionary defining the types of erroneous pronunciation that may occur for each word, and that, when a terminal device requests error detection for a user's voice input, recognizes the word corresponding to the voice input on the basis of the speech-recognition pronunciation dictionary, extracts the erroneous pronunciations for the recognized word from the error-detection pronunciation dictionary, compares them with the voice input, and provides the detected pronunciation errors to the terminal device; and a terminal device that receives the user's voice, transmits it to the service device to request error detection, receives the error detection result for the pronunciation of the voice input from the service device, and outputs it to the user. Because errors are detected with the possible error types already known, speech recognition can be performed quickly and accurately, and errors present in the spoken word can be detected.

Description

Apparatus for providing speech recognition service, and speech recognition method for improving pronunciation error detection thereof

The present invention relates to speech recognition performed on a user's speech, and more particularly to an apparatus for providing a speech recognition service that improves the ability to detect foreign-language pronunciation errors caused by native-language interference, and to a speech recognition method for improving its pronunciation error detection capability.

In general, speech recognition uses an acoustic model to measure similarity to human voice patterns. The phoneme is often used as the smallest unit of the acoustic model. A record of the phoneme sequences with which each word can be spoken is called a lexicon (pronunciation dictionary). Because a single word may be spoken as several different phoneme sequences, such a word is given multiple entries, one for each phoneme sequence.
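
As a minimal sketch of the lexicon just described, assuming phonemes are written as short strings, a word can be mapped to every phoneme sequence it may be spoken as. The entries, the ARPAbet-like symbols, and the helper `words_for` are illustrative assumptions and do not appear in the specification.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical lexicon: each word lists every phoneme sequence it may be spoken as.
LEXICON: Dict[str, List[Phonemes]] = {
    "either": [("IY", "DH", "ER"), ("AY", "DH", "ER")],  # two accepted variants
    "data": [("D", "EY", "T", "AH"), ("D", "AE", "T", "AH")],
    "cat": [("K", "AE", "T")],  # a single variant
}


def words_for(phonemes: Phonemes, lexicon: Dict[str, List[Phonemes]]) -> List[str]:
    """Return every word whose lexicon entry lists the given phoneme sequence."""
    return [word for word, variants in lexicon.items() if phonemes in variants]
```

A word with several pronunciations simply carries several entries, which is the property the following paragraphs build on.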

Conventional speech recognition uses only a pronunciation dictionary of correct (standard) pronunciations to recognize the word corresponding to the user's utterance.

With such conventional speech recognition technology, pronunciation errors caused by a foreign-language learner's native-language interference are difficult to recognize accurately. In particular, it has been impossible to distinguish standard pronunciation from erroneous pronunciation caused by native-language interference.

To recognize words accurately even when they contain pronunciation errors caused by native-language interference, more phoneme sequences must be added to the pronunciation dictionary. However, when too many phoneme sequences are used for a single word, the performance of speech recognition, which relies on probabilistic matching, degrades.
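
One way to picture this degradation, offered only as an illustrative assumption and not as the mechanism stated in the specification, is that added error variants can make different words share a phoneme sequence. The sketch below lists such shared sequences; the two example lexicons and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]


def ambiguous_sequences(lexicon: Dict[str, List[Phonemes]]) -> Dict[Phonemes, List[str]]:
    """Map each phoneme sequence listed under two or more words to those words."""
    owners = defaultdict(list)
    for word, variants in lexicon.items():
        for seq in variants:
            owners[seq].append(word)
    return {seq: sorted(words) for seq, words in owners.items() if len(words) > 1}


# Hypothetical lexicon holding standard pronunciations only.
STANDARD_ONLY = {
    "light": [("L", "AY", "T")],
    "right": [("R", "AY", "T")],
}

# The same lexicon after adding assumed L/R confusion variants to each word.
WITH_ERROR_VARIANTS = {
    "light": [("L", "AY", "T"), ("R", "AY", "T")],
    "right": [("R", "AY", "T"), ("L", "AY", "T")],
}
```

With standard pronunciations only, no sequence is shared; once the error variants are added, both sequences belong to both words, so a single dictionary can no longer separate them.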

The present invention aims to provide an apparatus for providing a speech recognition service, and a speech recognition method for improving its pronunciation error detection capability, in which the pronunciation dictionary used for speech recognition is divided into a speech-recognition dictionary and an error-detection dictionary. Speech recognition is performed with the speech-recognition pronunciation dictionary to find the correct word, and the error-detection pronunciation dictionary is then applied to detect pronunciation errors in the word uttered by the speaker, thereby improving both speech recognition performance and error detection performance.

As a means for solving the above problem, the present invention provides a service device for providing a speech recognition service, comprising: a communication unit that transmits and receives data over a communication network; a storage unit that separately stores a speech-recognition pronunciation dictionary, which defines the reference phoneme sequences that may occur for each word for speech recognition, and an error-detection pronunciation dictionary, which defines the error phoneme sequences that may occur for each word for detecting errors in user utterances; and a service providing unit that, when user speech is received from a terminal device through the communication unit, recognizes the word corresponding to the user speech on the basis of the speech-recognition pronunciation dictionary, detects pronunciation errors in the recognized word using the error-detection pronunciation dictionary, and provides the speech recognition result and the error detection result to the terminal device.

In the service device according to the present invention, the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary both include error phoneme sequences that may arise from native-language interference, but the speech-recognition pronunciation dictionary contains fewer phoneme sequences than the error-detection pronunciation dictionary.

In the service device according to the present invention, the service providing unit may include an error detection module that detects, for the user speech, an error type including one or more of pronunciation, length, intonation, and stress.

In the service device according to the present invention, the error-detection pronunciation dictionary may further include an error occurrence frequency rate for each error phoneme sequence, and the service providing unit may detect errors by comparing the error phoneme sequences with the user speech in descending order of error frequency rate.
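
The frequency-ordered comparison can be sketched as follows. This is a simplified illustration that assumes the user speech has already been decoded into a phoneme sequence and that matching is exact; the specification compares against the speech itself and does not prescribe this representation. The function name and the `(sequence, rate)` entry layout are hypothetical.

```python
from typing import List, Optional, Tuple

Phonemes = Tuple[str, ...]
# Hypothetical entry layout: (error phoneme sequence, occurrence frequency rate).
ErrorEntry = Tuple[Phonemes, float]


def detect_error(
    user: Phonemes,
    standard: List[Phonemes],
    error_entries: List[ErrorEntry],
) -> Tuple[Optional[Phonemes], int]:
    """Return (matched error sequence or None, number of comparisons made)."""
    if user in standard:
        return None, 0  # standard pronunciation: nothing to report
    # Try the most frequent error sequences first so common errors exit early.
    ranked = sorted(error_entries, key=lambda entry: entry[1], reverse=True)
    for count, (seq, _rate) in enumerate(ranked, start=1):
        if seq == user:
            return seq, count
    return None, len(ranked)  # not a known error sequence
```

Ordering does not change which error is found under exact matching; it only reduces the number of comparisons when the utterance contains a common error.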

In the service device according to the present invention, when providing the error detection result to the terminal device, the service providing unit may further provide information on a correction method or an error cause for each error type.
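
A detection result carrying an error type together with optional cause and correction information might be modelled as below. The four error types follow the specification; the class names, field names, and message format are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    # The four error types named in the specification.
    PRONUNCIATION = "pronunciation"
    LENGTH = "length"
    INTONATION = "intonation"
    STRESS = "stress"


@dataclass(frozen=True)
class DetectedError:
    # Hypothetical result record; cause and correction are optional extras.
    word: str
    error_type: ErrorType
    cause: Optional[str] = None
    correction: Optional[str] = None


def format_feedback(err: DetectedError) -> str:
    """Build a one-line message, appending cause and correction when present."""
    parts = [f"{err.word}: {err.error_type.value} error"]
    if err.cause:
        parts.append(f"cause: {err.cause}")
    if err.correction:
        parts.append(f"correction: {err.correction}")
    return "; ".join(parts)
```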

The service providing unit of the service device according to the present invention may further include an error management module that adds the phoneme sequence corresponding to the user speech to the error-detection pronunciation dictionary when the user speech does not match the standard phoneme sequence but its similarity falls within a preset range, and no corresponding phoneme sequence exists in the error-detection pronunciation dictionary.
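
The update rule of the error management module can be sketched as follows, again assuming the user speech is available as a phoneme sequence. The specification does not define the similarity measure or the preset range; the use of Python's `difflib.SequenceMatcher` ratio and the bounds `low=0.5` and `high=1.0` are placeholders chosen for illustration, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]


def similarity(a: Phonemes, b: Phonemes) -> float:
    """Assumed similarity measure: matching-subsequence ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def maybe_add_error_variant(
    word: str,
    user: Phonemes,
    standard: List[Phonemes],
    error_dict: Dict[str, List[Phonemes]],
    low: float = 0.5,   # assumed lower bound of the preset similarity range
    high: float = 1.0,  # assumed upper bound (exclusive): not an exact match
) -> bool:
    """Add `user` as a new error sequence for `word` if it qualifies."""
    # Skip utterances that are standard or already known as errors.
    if user in standard or user in error_dict.get(word, []):
        return False
    best = max((similarity(user, ref) for ref in standard), default=0.0)
    if low <= best < high:
        error_dict.setdefault(word, []).append(user)
        return True
    return False
```

An utterance that is close to, but not identical with, a standard sequence is stored once; a repeat of the same utterance or an unrelated sequence leaves the dictionary unchanged.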

As another means for solving the above problem, the present invention provides a terminal device for providing a speech recognition service, comprising: a storage unit that separately stores a speech-recognition pronunciation dictionary, which defines the reference phoneme sequences that may occur for each word for speech recognition, and an error-detection pronunciation dictionary, which defines the error phoneme sequences that may occur for each word for detecting errors in user utterances; an input unit that receives a user's request; an audio processing unit that receives the user's speech; a control unit that, in response to the user's request received through the input unit, recognizes the word corresponding to the user speech received through the audio processing unit using the speech-recognition pronunciation dictionary, extracts pronunciation errors for the recognized word using the error-detection pronunciation dictionary, and controls output of the speech recognition result and the error detection result; and an output unit that outputs the speech recognition result and the error detection result.

In the terminal device according to the present invention, the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary both include error phoneme sequences that may arise from native-language interference, but the speech-recognition pronunciation dictionary contains fewer phoneme sequences than the error-detection pronunciation dictionary.

In the terminal device according to the present invention, the control unit may further detect, for the user speech, an error type including one or more of pronunciation, length, intonation, and stress.

In the terminal device according to the present invention, the error-detection pronunciation dictionary may further include an error occurrence frequency rate for each error phoneme sequence, and the control unit may detect errors by comparing the error phoneme sequences with the user speech in descending order of error frequency rate.

In the terminal device according to the present invention, when providing the error detection result, the control unit may further provide information on a correction method or an error cause for each error type.

In the terminal device according to the present invention, the control unit may receive the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary from the service device through a communication unit and store them.

As yet another means for solving the above problem, the present invention provides a speech recognition method for improving pronunciation error detection, comprising: recognizing the word corresponding to user speech using a speech-recognition pronunciation dictionary that defines the reference phoneme sequences that may occur for each word for speech recognition; and detecting pronunciation errors contained in the user speech using an error-detection pronunciation dictionary that defines the error phoneme sequences that may occur for each word for detecting errors in user utterances.

In the speech recognition method according to the present invention, the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary both include error phoneme sequences that may arise from native-language interference, but the speech-recognition pronunciation dictionary contains fewer phoneme sequences than the error-detection pronunciation dictionary.

The speech recognition method according to the present invention may further include detecting, for the detected error, an error type including one or more of pronunciation, length, intonation, and stress.

In constructing the pronunciation dictionary that defines the phoneme sequences that may occur for each word, the present invention separates the dictionary into one for speech recognition and one for error detection. The word corresponding to the user speech is recognized through the speech-recognition pronunciation dictionary, and pronunciation errors contained in the user speech are then detected through the error-detection pronunciation dictionary, which improves both speech recognition and error detection performance.

In particular, both the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary are constructed to include error phoneme sequences that may arise from the user's native-language interference, but the speech-recognition pronunciation dictionary holds fewer phoneme sequences than the error-detection pronunciation dictionary. Speech recognition is therefore trained on a minimal, compressed set of phoneme sequences, which improves recognition speed and accuracy, while error detection applies every error phoneme sequence that may occur, which improves error detection performance.
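
The specification does not say how the smaller speech-recognition dictionary is derived. One possible approach, shown purely as an assumption, is to keep every standard sequence and add only the most frequent error sequences from the full error-detection dictionary. The function name, the `top_k` parameter, and the entry layout are hypothetical.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]
# Hypothetical entry layout: (error phoneme sequence, occurrence frequency rate).
ErrorEntry = Tuple[Phonemes, float]


def build_recognition_variants(
    standard: Dict[str, List[Phonemes]],
    errors: Dict[str, List[ErrorEntry]],
    top_k: int = 1,
) -> Dict[str, List[Phonemes]]:
    """Keep all standard sequences plus the top_k most frequent error sequences."""
    compact: Dict[str, List[Phonemes]] = {}
    for word, variants in standard.items():
        ranked = sorted(errors.get(word, []), key=lambda entry: entry[1], reverse=True)
        compact[word] = list(variants) + [seq for seq, _rate in ranked[:top_k]]
    return compact
```

The error-detection dictionary keeps all of its entries, so rarely occurring errors remain detectable even though recognition never considers them.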

FIG. 1 is a block diagram of a speech recognition system for improving pronunciation error detection according to the present invention.
FIG. 2 is a block diagram showing the configuration of the service device in the speech recognition system for improving pronunciation error detection according to the present invention.
FIG. 3 is a flowchart of a speech recognition method for improving pronunciation error detection, performed through the service device of the present invention.
FIG. 4 is a block diagram showing the configuration of a terminal device for providing a speech recognition service for improving pronunciation error detection according to another embodiment of the present invention.
FIG. 5 is a flowchart of a speech recognition method for improving pronunciation error detection according to another embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that could obscure the subject matter of the present invention are omitted. Note also that the same components are denoted by the same reference numerals throughout the drawings wherever possible.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may appropriately define terms in order to describe the invention in the best way, they should be construed with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

FIG. 1 is a block diagram of a speech recognition system for improving pronunciation error detection according to the present invention. Referring to FIG. 1, a speech recognition system according to an embodiment of the present invention may include a terminal device 100, a service device 200, and a network 300.

In an embodiment of the present invention, the speech recognition service may be provided by server-based computing. Here, server-based computing means that the processing of the method for providing the speech recognition service for improving pronunciation error detection according to the present invention is carried out on some device connected through a network, while the terminal device performs only input and output. For convenience of description, the device that provides the speech recognition service for improving pronunciation error detection according to the present invention is referred to below as the service device 200.

The service device 200 provides the speech recognition service for improving pronunciation error detection according to the present invention to the terminal device 100 through the network 300. More specifically, the service device 200 separately stores a speech-recognition pronunciation dictionary, in which the reference phoneme sequence for each word is defined, and an error-detection pronunciation dictionary, which defines the error phoneme sequences that may occur for each word. When the terminal device 100 requests error detection for a user's voice input, the service device 200 recognizes the word corresponding to the voice input on the basis of the speech-recognition pronunciation dictionary, extracts the erroneous pronunciations for the recognized word from the error-detection pronunciation dictionary, and provides them to the terminal device 100.

Both the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary may include error phoneme sequences that may arise from native-language interference, but the speech-recognition pronunciation dictionary preferably contains fewer phoneme sequences than the error-detection pronunciation dictionary. In this way, the present invention can improve not only speech recognition performance but also error detection performance.

The service device 200 may operate in a server-client computing manner or on a cloud computing basis. That is, one or more of the computing resources needed to run the speech recognition service for improving pronunciation error detection, for example hardware and software, may be provided at the service device 200.

The terminal device 100 may be any of the various types of devices used by a user, including, for example, a PC (Personal Computer), a notebook computer, a mobile phone, a tablet PC, a navigation terminal, a smartphone, a PDA (Personal Digital Assistant), a smart TV, a PMP (Portable Multimedia Player), and a digital broadcast receiver. These are only examples, and the term should be understood to cover every communication-capable device that has been developed and commercialized or will be developed in the future.

The terminal device 100 may be used by a user who requests the speech recognition service for improving pronunciation error detection. In the speech recognition system according to the present invention, the terminal device 100 receives the user's speech, transmits it to the service device 200 to request error detection, receives the speech recognition and error detection results for the user speech from the service device 200, and outputs them to the user.
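
The exchange between the terminal device 100 and the service device 200 can be pictured with two message types. The specification does not define a wire format; the class names, fields, and the rendered text below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DetectionRequest:
    # Hypothetical request: the raw user speech captured by the terminal device.
    audio: bytes


@dataclass(frozen=True)
class DetectionResponse:
    # Hypothetical response: recognition result plus any detected errors.
    recognized_word: str
    errors: List[str] = field(default_factory=list)


def render_for_user(resp: DetectionResponse) -> str:
    """Format the response the way a terminal might show it to the user."""
    if not resp.errors:
        return f"Recognized '{resp.recognized_word}'; no pronunciation error detected"
    return f"Recognized '{resp.recognized_word}'; detected errors: " + ", ".join(resp.errors)
```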

The network 300 provides a path for transmitting and receiving data between the service device 200 and a plurality of terminal devices 100. The network 300 is an IP network that provides large-capacity data transmission and reception and uninterrupted data service over the Internet Protocol (IP), and may be an All-IP network, an IP network structure that integrates different networks on the basis of IP. The network 300 may also include one or more of a wired network, a WiBro (Wireless Broadband) network, a third-generation mobile network including WCDMA, a 3.5-generation mobile network including an HSDPA (High Speed Downlink Packet Access) network and an LTE network, a fourth-generation mobile network including LTE Advanced, a satellite network, and a wireless LAN including a Wi-Fi network.

In the present invention, the terminal device 100 performs only the output function for providing the speech recognition service for improving pronunciation error detection, so the following description focuses on the service device 200.

FIG. 2 is a block diagram showing the configuration of the service device 200 for providing the speech recognition service for improving pronunciation error detection according to an embodiment of the present invention. FIG. 2 presents the configuration of the service device 200 in functional units; in an actual implementation, these may be distributed across multiple server devices or implemented in a single server device.

Referring to FIG. 2, in the speech recognition system for improving pronunciation error detection of the present invention, the service device 200 may include a communication unit 210, a storage unit 220, and a service providing unit 230.

The communication unit 210 exchanges data with the terminal device 100 through the network 300.

The storage unit 220 stores data and programs for the operation of the service device 200. In particular, to provide the speech recognition service for improving pronunciation error detection according to the present invention, it separately stores the speech-recognition pronunciation dictionary 221, in which the reference phoneme sequence for each word is defined, and the error-detection pronunciation dictionary 222, which defines the error phoneme sequences that may occur for each word. In addition, the error-detection pronunciation dictionary 222 may further include an occurrence frequency rate for each error phoneme sequence, so that during error detection the error types can be compared with the input speech in descending order of occurrence frequency rate.

When the terminal device 100 requests error detection for a user's voice input, the service providing unit 230 recognizes the word corresponding to the voice input on the basis of the speech-recognition pronunciation dictionary 221, extracts the error phoneme sequences that may occur for the recognized word from the error-detection pronunciation dictionary 222, compares them with the user speech to detect errors in the user's pronunciation, and provides the result to the terminal device 100.

In addition, when the pronunciation of the voice input is not identical to the standard pronunciation but falls within a preset similarity range, and is not present in the error-detection pronunciation dictionary 222, the service providing unit 230 may add the phoneme sequence corresponding to the user speech to the error-detection pronunciation dictionary 222 and store it.

The service providing unit 230 may include a speech recognition module 231, an error detection module 232, and an error management module 233 for providing the speech recognition service for improving pronunciation error detection according to the present invention.

The speech recognition module 231 recognizes the word corresponding to the user's voice input on the basis of the speech-recognition pronunciation dictionary 221.

The error detection module 232 extracts from the error-detection pronunciation dictionary the multiple error phoneme sequences that may occur for the recognized word and compares them with the user speech to detect errors contained in the user speech. During error detection, the error phoneme sequences may be compared with the user speech in descending order of error frequency rate.

When the pronunciation corresponding to the user speech falls within a preset similarity range of the standard pronunciation and corresponds to a new error phoneme sequence not present in the error-detection pronunciation dictionary 222, the error management module 233 adds it to the error-detection pronunciation dictionary 222. The new error phoneme sequence may be added at the user's request.

The speech recognition module 231, the error detection module 232, and the error management module 233 may be implemented in software, in hardware, or in a combination of the two. For example, they may be stored in the storage unit 220 in program form and executed by the service providing unit 230.

FIG. 3 is a flowchart of a method for providing a speech recognition service for improving pronunciation error detection according to the present invention.

Referring to FIG. 3, the service device 200 separately stores, in advance, the speech-recognition pronunciation dictionary 221, in which the reference phoneme sequence for each word is defined, and the error-detection pronunciation dictionary 222, which defines the error phoneme sequences that may occur for each word (S105).

When the user inputs speech to the terminal device 100 (S110), the terminal device 100 may transmit the user speech to the service device 200 to request speech recognition and error detection (S115). The service device 200 then recognizes the word corresponding to the received user speech on the basis of the speech-recognition pronunciation dictionary (S120).

The service device 200 then compares the user's voice with the erroneous phoneme strings corresponding to that word in the error-detection pronunciation dictionary and detects errors present in the user's pronunciation (S125). At this time, the service device 200 may further detect, in relation to the error detected in the user's voice, an error type including one or more of a pronunciation error, a length error, an intonation error, and a stress error. In addition, when comparing the user's voice with the multiple erroneous phoneme strings that may occur for the recognized word, an occurrence frequency rate may be stored for each erroneous phoneme string and the erroneous phoneme strings compared with the user's voice in descending order of occurrence frequency rate, so that errors are detected more quickly.
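The frequency-ordered comparison of step S125 may be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `detect_error`, the tuple layout of each candidate, and the use of exact phoneme-string equality in place of an acoustic comparison are all hypothetical.

```python
def detect_error(user_phonemes, candidates):
    """Compare the user's phoneme string with the candidate erroneous phoneme
    strings in descending order of occurrence frequency rate.

    candidates: list of (error_phonemes, error_type, frequency) tuples.
    Returns (error_type or None, number_of_comparisons_made).
    """
    ordered = sorted(candidates, key=lambda c: c[2], reverse=True)
    for comparisons, (phonemes, error_type, _freq) in enumerate(ordered, start=1):
        # Assumption: exact equality stands in for the real comparison
        if tuple(phonemes) == tuple(user_phonemes):
            return error_type, comparisons
    return None, len(ordered)


# Sample candidates for one recognized word, invented for this sketch
candidates = [
    (("r", "ay", "t", "uh"), "pronunciation", 0.1),
    (("l", "ay", "t"), "pronunciation", 0.7),
    (("r", "ay:", "t"), "length", 0.2),
]
result = detect_error(("l", "ay", "t"), candidates)
```

Because the most frequent error is tried first, the common case ends after a single comparison, which is the speed-up the description attributes to the ordering.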

The service device 200 then provides the speech recognition and error detection results to the terminal device 100 (S130), and the terminal device 100 outputs the speech recognition and error detection results received from the service device 200 to the user (S135).

When providing the error detection result, the service device 200 may further provide a correction method or the cause of the error for the detected error type.

Meanwhile, when the user's voice falls within the preset similarity range of the standard pronunciation and corresponds to a new error type not present in the error-detection pronunciation dictionary (S140), the service device 200 may update the error-detection pronunciation dictionary 222 by adding the phoneme string corresponding to the user's voice as a new erroneous phoneme string (S145). The final selection of the erroneous phoneme string to be added may be made by the user.
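The update of steps S140 and S145 may be sketched as follows. The specification does not state how similarity is measured or what the preset range is, so the `difflib.SequenceMatcher` ratio, the default bounds `low=0.5` and `high=1.0`, and the `confirm` callback (standing in for the user's final selection) are assumptions chosen only to make the example concrete.

```python
from difflib import SequenceMatcher


def maybe_add_error(error_dict, word, reference, user_phonemes,
                    low=0.5, high=1.0, confirm=lambda word, phonemes: True):
    """Add user_phonemes as a new erroneous phoneme string for `word` when it is
    similar to, but not identical with, the reference phoneme string and is not
    already present in the error-detection dictionary.

    error_dict: word -> list of erroneous phoneme strings (tuples).
    Returns True if the dictionary was updated.
    """
    user_phonemes = tuple(user_phonemes)
    # Assumption: sequence-matching ratio as the similarity measure
    similarity = SequenceMatcher(None, tuple(reference), user_phonemes).ratio()
    known = error_dict.setdefault(word, [])
    in_range = low <= similarity < high   # a ratio of 1.0 means no error at all
    if in_range and user_phonemes not in known and confirm(word, user_phonemes):
        known.append(user_phonemes)
        return True
    return False


# Sample usage with invented data
error_dict = {}
added = maybe_add_error(error_dict, "right", ("r", "ay", "t"), ("l", "ay", "t"))
```

The lower bound keeps utterances of an entirely different word out of the dictionary, and the upper bound excludes correct pronunciations.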

In another embodiment of the present invention, the speech recognition service for improving erroneous-pronunciation detection capability may be provided on the basis of the terminal device 100.

FIG. 4 is a block diagram illustrating the configuration of a terminal device 100 for providing a speech recognition service for improving erroneous-pronunciation detection capability according to another embodiment of the present invention.

Referring to FIG. 4, the terminal device 100 according to the present invention may include an input unit 110, an output unit 120, an audio processing unit 130, a storage unit 140, and a control unit 150.

The input unit 110 generates user input signals for controlling or operating the terminal device 100 in response to the user's manipulation, and may be implemented with various kinds of input means. For example, the input unit 110 may include one or more of key input means, touch input means, gesture input means, and voice input means. Key input means, such as a keypad or a keyboard, generate a signal corresponding to the key that is operated. Touch input means, such as a touch pad, a touch screen, or a touch sensor, recognize an input by sensing the user touching a specific area. Gesture input means recognize a designated motion of the user as a specific input signal, for example shaking or moving the terminal device, approaching the terminal device, or blinking, and may include one or more of a geomagnetic sensor, an acceleration sensor, a camera, an altimeter, a gyro sensor, and a proximity sensor.

The output unit 120 is an output means that outputs the interface screen between the terminal device 100 and the user, and displays the speech recognition and error detection results. The output unit 120 may be, for example, any one of an LCD (Liquid Crystal Display), a TFT-LCD (Thin Film Transistor-Liquid Crystal Display), LEDs (Light Emitting Diodes), OLEDs (Organic Light Emitting Diodes), AMOLEDs (Active Matrix Organic Light Emitting Diodes), a flexible display, and a three-dimensional display.

The audio processing unit 130 processes the input and output of voice and sound, and receives the user's voice for speech recognition. The audio processing unit 130 may include a microphone for voice input and a speaker for output.

The storage unit 140 stores the data and programs needed for the operation of the terminal device 100, and may basically store the operating system (OS) of the terminal device 100 and one or more application programs. In addition, in the present invention, the storage unit 140 separately stores, for providing the speech recognition service for improving erroneous-pronunciation detection capability, a speech-recognition pronunciation dictionary 141, in which a reference phoneme string is defined for each word, and an error-detection pronunciation dictionary 142, which defines the erroneous phoneme strings that may occur for each word. The storage unit 140 may include any kind of storage medium, such as RAM (Random Access Memory), ROM (Read Only Memory), a hard disk drive (HDD), flash memory, CD-ROM, and DVD.

The control unit 150 controls the overall operation of the terminal device 100. It basically runs on the operating system stored in the storage unit 140 to establish the basic platform environment of the terminal device 100, and executes application programs according to the user's selection to provide the corresponding functions. In this other embodiment of the present invention, when speech recognition and error detection are requested by the user through the input unit 110, the control unit 150 recognizes the word corresponding to the user's voice input through the audio processing unit 130 on the basis of the speech-recognition pronunciation dictionary 141, extracts the erroneous phoneme strings for the recognized word from the error-detection pronunciation dictionary 142, compares them with the user's voice to detect errors in the user's pronunciation, and outputs the error detection result. The control unit 150 may include a speech recognition module 151, an error detection module 152, and an error management module 153.

The speech recognition module 151 recognizes the word corresponding to the user's voice on the basis of the speech-recognition pronunciation dictionary 141.

The error detection module 152 extracts from the error-detection pronunciation dictionary, for the recognized word, erroneous phoneme strings containing an error in one or more of pronunciation, length, intonation, and stress, and compares the extracted erroneous phoneme strings with the user's voice to detect pronunciation errors contained in the user's voice. An occurrence frequency rate may be stored for each erroneous phoneme string so that the erroneous phoneme strings are compared with the user's voice in descending order of occurrence frequency rate.

When the pronunciation of the input user voice falls within the preset similarity range of the standard pronunciation and corresponds to a new error type not present in the error-detection pronunciation dictionary 142, the error management module 153 adds the phoneme string of the user's voice to the error-detection pronunciation dictionary 142 as a new error type.

The speech recognition module 151, the error detection module 152, and the error management module 153 may be implemented in software, hardware, or a combination of the two. For example, they may be stored in the storage unit 140 in program form and implemented by being executed by the control unit 150.

FIG. 5 is a flowchart illustrating a method of providing a speech recognition service for improving erroneous-pronunciation detection capability according to another embodiment of the present invention.

Referring to FIG. 5, the terminal device 100 separately stores a speech-recognition pronunciation dictionary 141, in which a reference phoneme string is defined for each word, and an error-detection pronunciation dictionary 142, which defines the erroneous phoneme strings that may occur for each word (S205). The speech-recognition pronunciation dictionary 141 and the error-detection pronunciation dictionary 142 may be obtained by receiving them from the service device 200.

When the user's voice is subsequently input (S210), the word corresponding to the user's voice is recognized on the basis of the speech-recognition pronunciation dictionary 141 (S215). Next, multiple erroneous phoneme strings that may occur for the recognized word, covering pronunciation errors, length errors, intonation errors, and stress errors, are extracted from the error-detection pronunciation dictionary 142 and compared with the user's voice to detect erroneous pronunciation contained in the user's voice (S220). At this time, an error type including one or more of a pronunciation error, a length error, an intonation error, and a stress error may further be detected for the detected erroneous pronunciation. In addition, during error detection, the erroneous phoneme strings may be compared with the pronunciation of the user's voice in descending order of occurrence frequency rate, so that errors in the spoken word are detected more quickly.
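The two-stage flow of steps S215 and S220, in which the word is first recognized and only the erroneous phoneme strings for that word are then examined, may be sketched as follows. The nearest-match recognition by `difflib.SequenceMatcher` ratio is an assumption standing in for an actual speech recognizer, and the function names and sample dictionary entries are hypothetical.

```python
from difflib import SequenceMatcher


def recognize(user, recognition_dict):
    """S215 (sketch): pick the word whose reference phoneme string is closest
    to the user's phoneme string."""
    return max(
        recognition_dict,
        key=lambda w: SequenceMatcher(None, recognition_dict[w], user).ratio(),
    )


def detect(user, word, error_dict):
    """S220 (sketch): look only at the erroneous phoneme strings stored for the
    recognized word and return the error type of the one that matches."""
    for phonemes, error_type in error_dict.get(word, []):
        if phonemes == user:
            return error_type
    return None


# Sample dictionaries, invented for this sketch
recognition_dict = {"right": ("r", "ay", "t"), "sheep": ("sh", "iy", "p")}
error_dict = {
    "right": [(("l", "ay", "t"), "pronunciation")],
    "sheep": [(("sh", "ih", "p"), "length")],
}

user = ("sh", "ih", "p")
word = recognize(user, recognition_dict)
error_type = detect(user, word, error_dict)
```

Restricting the second stage to the entries of the recognized word is what allows the error to be detected with the possible error types already known.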

The terminal device 100 outputs the speech recognition and error detection results through the output unit 120 to provide them to the user (S225). The error detection result may further include the error type of each word, a pronunciation correction method for the detected error type, or the cause of the pronunciation error, which can help in foreign-language learning.

Meanwhile, when the input pronunciation falls within the preset similarity range of the normal pronunciation and corresponds to a new error type not present in the error-detection pronunciation dictionary (S230), the terminal device 100 may update the error-detection pronunciation dictionary 142 by adding the new error type (S235).

The speech recognition method for improving erroneous-pronunciation detection capability according to the present invention may be implemented as software readable by various computer means and recorded on a computer-readable recording medium. The recording medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM (Random Access Memory), and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, this specification and the drawings disclose preferred embodiments of the present invention, but it will be apparent to those of ordinary skill in the art to which the invention pertains that other modifications based on the technical idea of the invention can be practiced in addition to the embodiments disclosed here. Furthermore, although specific terms are used in this specification and the drawings, they are used only in a general sense to explain the technical content of the invention easily and to aid understanding of the invention, and are not intended to limit the scope of the invention.

By providing a speech-recognition dictionary and an error-detection dictionary separately, the invention performs speech recognition with the speech-recognition dictionary to find the correct word and then detects errors in the spoken word using the error-detection dictionary, which contains the error types for that word. Because errors are detected with the error types already known, speech recognition can be performed quickly and accurately and errors present in the spoken word can be detected.

100: terminal device 110: input unit 120: output unit
130: audio processing unit 140: storage unit
141: speech-recognition pronunciation dictionary 142: error-detection pronunciation dictionary
150: control unit 151: speech recognition module
152: error detection module 153: error management module
200: service device 210: communication unit 220: storage unit
221: speech-recognition pronunciation dictionary 222: error-detection pronunciation dictionary
230: service providing unit 231: speech recognition module
232: error detection module 233: error management module 300: network

Claims

A service device providing a speech recognition service, comprising:
a communication unit that transmits and receives data through a communication network;
a storage unit that stores a speech-recognition pronunciation dictionary defining, for speech recognition, the phoneme strings that may occur for each word, and an error-detection pronunciation dictionary defining, for detecting errors in user utterances, the erroneous phoneme strings that may occur for each word; and
a service providing unit that, when a user's voice is received from a terminal device through the communication unit, recognizes the word corresponding to the user's voice on the basis of the speech-recognition pronunciation dictionary, then detects pronunciation errors using the error-detection pronunciation dictionary, and provides the speech recognition result and the error detection result to the terminal device,
wherein the service providing unit includes
an error management module that, when the user's voice does not match the standard phoneme string but its similarity is within a predetermined range and no corresponding phoneme string exists in the error-detection pronunciation dictionary, adds the phoneme string corresponding to the user's voice to the error-detection pronunciation dictionary.

The service device of claim 1,
wherein the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary include erroneous phoneme strings that may arise from mother-tongue interference, and the speech-recognition pronunciation dictionary includes fewer of these than the error-detection pronunciation dictionary.

The service device of claim 1, wherein the service providing unit
further includes an error detection module that detects, for the user's voice, an error type including one or more of pronunciation, length, intonation, and stress.

The service device of claim 1,
wherein the error-detection pronunciation dictionary further includes an error occurrence frequency rate for each erroneous phoneme string, and
wherein the service providing unit, when detecting an error, compares the erroneous phoneme strings with the user's voice in descending order of error occurrence frequency rate.

The service device of claim 1, wherein the service providing unit,
when providing the error detection result to the terminal device, further provides information on a correction method or the cause of the error for each error type.

delete

A terminal device providing a speech recognition service, comprising:
a storage unit that stores a speech-recognition pronunciation dictionary defining, for speech recognition, the phoneme strings that may occur for each word, and an error-detection pronunciation dictionary defining, for detecting errors in user utterances, the erroneous phoneme strings that may occur for each word;
an input unit that receives a user request;
an audio processing unit that receives a user's voice;
a control unit that, in response to a user request input through the input unit, recognizes the word corresponding to the user's voice input through the audio processing unit using the speech-recognition pronunciation dictionary, detects pronunciation errors using the error-detection pronunciation dictionary, and outputs the speech recognition result and the error detection result; and
an output unit that outputs the speech recognition result and the error detection result,
wherein the control unit,
when the user's voice does not match the standard phoneme string but its similarity is within a predetermined range and no corresponding phoneme string exists in the error-detection pronunciation dictionary, adds the phoneme string corresponding to the user's voice to the error-detection pronunciation dictionary.

[Claim 8 is abandoned upon payment of the registration fee.]

The terminal device of claim 7,
wherein the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary include erroneous phoneme strings that may arise from mother-tongue interference, and the speech-recognition pronunciation dictionary includes fewer of these than the error-detection pronunciation dictionary.

[Claim 9 is abandoned upon payment of registration fee.]

The terminal device of claim 7, wherein the control unit
further detects, for the user's voice, an error type including one or more of pronunciation, length, intonation, and stress.

[Claim 10 is abandoned upon payment of the registration fee.]

The terminal device of claim 7,
wherein the error-detection pronunciation dictionary further includes an error occurrence frequency rate for each erroneous phoneme string, and
wherein the control unit, when detecting an error, compares the erroneous phoneme strings with the user's voice in descending order of error occurrence frequency rate.

[Claim 11 is abandoned upon payment of the registration fee.]

The terminal device of claim 7, wherein the control unit,
when providing the error detection result, further provides information on a correction method or the cause of the error for each error type.

The terminal device of claim 7, wherein the control unit
receives the speech-recognition pronunciation dictionary and the error-detection pronunciation dictionary from a service device and stores them.

A speech recognition method comprising:
recognizing the word corresponding to a user's voice using a speech-recognition pronunciation dictionary that defines, for speech recognition, the reference phoneme strings that may occur for each word;
detecting pronunciation errors contained in the user's voice using an error-detection pronunciation dictionary that defines, for detecting errors in user utterances, the erroneous phoneme strings that may occur for each word; and
adding the phoneme string corresponding to the user's voice to the error-detection pronunciation dictionary when the user's voice does not match the standard phoneme string but its similarity is within a predetermined range and no corresponding phoneme string exists in the error-detection pronunciation dictionary.

delete