KR102228017B1

KR102228017B1 - Standalone Voice Recognition-Based Agent Module for Precise Motion Control of Robots and Autonomous Vehicles

Info

Publication number: KR102228017B1
Application number: KR1020200051031A
Authority: KR
Inventors: 이덕진
Original assignee: 군산대학교산학협력단
Priority date: 2020-04-27
Filing date: 2020-04-27
Publication date: 2021-03-12
Also published as: KR20200132693A

Abstract

An object of the present invention is to provide a standalone embedded voice recognition technology capable of precisely recognizing and interpreting robot-control voice commands even where no Internet connection is available, and a voice recognition-based agent module capable of precisely controlling the motion of an autonomous vehicle.
To this end, a standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention may include: a voice recognition engine that language-processes an input voice into recognizable text and then converts that text into speed control text for motion control and assigns it; a robot operating system that receives a speed control command through a programming language to which the speed control text is assigned by the voice recognition engine, and transmits a speed value so that the robot or autonomous vehicle is driven; and a driving unit that receives the speed value from the robot operating system and drives the robot or autonomous vehicle.

Description

Standalone Voice Recognition-Based Agent Module for Precise Motion Control of Robots and Autonomous Vehicles

The present invention relates to a standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles, and more particularly to such a module that can precisely control the motion of a robot or autonomous vehicle by recognizing and interpreting voice commands using a voice recognition algorithm comprising an acoustic model, a language model, and a data dictionary.

In general, voice recognition refers to technology that receives speech through a wired or wireless channel, such as a microphone, and converts it into words or sentences. Because it adds convenience for people, voice recognition has recently come into wide use in robotics and in portable-device fields such as mobile phones, smart homes, automobiles, and infotainment.

Accordingly, voice recognition has been studied steadily, and active research is expected to continue.

Recently, open-source modules for voice recognition have been released, and voice recognition-based smart-home and personal-assistant applications using open modules such as Amazon Alexa and Google Home are attracting attention.

However, such open-source modules have the drawback that the service is available only over an Internet connection. There is therefore a need to develop a standalone embedded voice recognition module that can control a robot freely and precisely even where no Internet connection is available.

To solve the above problem, an object of the present invention is to provide a standalone embedded voice recognition technology capable of precisely recognizing and interpreting robot-control voice commands even where no Internet connection is available, and a voice recognition-based agent module capable of precisely controlling the motion of an autonomous vehicle.

To achieve the above object, a standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention may include: a voice recognition engine that language-processes an input voice into recognizable text and then converts that text into speed control text for motion control and assigns it; a robot operating system that receives a speed control command through a programming language to which the speed control text is assigned by the voice recognition engine, and transmits a speed value so that the robot or autonomous vehicle is driven; and a driving unit that receives the speed value from the robot operating system and drives the robot or autonomous vehicle.

Here, the voice recognition engine may include Pocketsphinx or another open-source voice recognition module.

The voice recognition engine may also include a preprocessor that extracts the feature vectors needed for voice recognition, and a recognition unit that stores a voice recognition algorithm and performs language processing by analyzing the feature vectors extracted by the preprocessor with that algorithm.

The voice recognition algorithm may include: an acoustic model that adapts to the voice input through a microphone; a language model that compares the extracted feature vectors with the adapted acoustic model and converts them into recognizable text; and a data dictionary that determines, when the language model compares the extracted feature vectors with the acoustic model, whether they can be converted into recognizable text.

The programming language may be Python or C/C++.

The speed control command may be transmitted from the programming language to the robot operating system over a wired or wireless communication link.

The driving unit may include: a low-level processor that receives the speed value from the robot operating system and inputs the speed as a low-level signal; a PWM generator that pulse-modulates the speed signal input through the low-level processor; and a DC motor that drives the robot or autonomous vehicle according to the pulses modulated by the PWM generator.

The standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention can control a robot freely and precisely even where no Internet connection is available.

In addition, because the acoustic model, which is one component of the present invention, uses the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) speaker adaptation techniques, more accurate voice recognition can be performed.

In addition, because an open-source voice recognition engine such as Pocketsphinx is used, the module can be provided at relatively low cost.

FIG. 1 is a block diagram showing the configuration of a standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the voice recognition engine, which is one component of the present invention.
FIG. 3 is a block diagram showing the configuration of the recognition unit, which is one component of the voice recognition engine of FIG. 2.
FIG. 4 is a block diagram showing the speaker adaptation stage of the acoustic model, which is one component of the present invention.
FIG. 5 is an operational flowchart of the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention.
FIG. 6 is an operational flowchart of step (c) of FIG. 5.
FIG. 7 is an operational flowchart of step (f) of FIG. 5.

The following description of the present invention with reference to the drawings is not limited to specific embodiments; various modifications may be made and various embodiments are possible. The description should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

In the following description, terms such as "first" and "second" are used to describe various components; they carry no limiting meaning in themselves and are used only to distinguish one component from another.

The same reference numerals denote the same components throughout this specification.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include", "comprise", or "have" are to be interpreted as specifying the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and are not to be understood as excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Embodiments of the present invention are described in detail below with reference to the accompanying FIGS. 1 to 7.

FIG. 1 is a block diagram showing the configuration of a standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention, FIG. 2 is a block diagram showing the configuration of the voice recognition engine, which is one component of the present invention, FIG. 3 is a block diagram showing the configuration of the recognition unit, which is one component of the voice recognition engine of FIG. 2, and FIG. 4 is a block diagram showing the speaker adaptation stage of the acoustic model, which is one component of the present invention.

First, referring to FIGS. 1 to 4, the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention is installed in a robot or autonomous vehicle and may include a voice recognition engine 10, a robot operating system 30, and a driving unit 40.

Specifically, the voice recognition engine 10 can recognize speech input from a microphone installed in the robot or autonomous vehicle and from a wired or wireless voice transmission device. The voice recognition engine 10 may be implemented with Pocketsphinx.

Pocketsphinx is an open-source voice recognition engine, which makes it inexpensive. It is highly compatible with the Robot Operating System (ROS) used to control robots and autonomous vehicles and with programming languages such as C/C++ and Python, and it offers high recognition speed. Pocketsphinx is, however, only a preferred example; the invention is not limited to it, and another voice recognition engine 10 may be used.

The following description assumes a voice recognition engine 10 implemented with Pocketsphinx, the preferred example.

The voice recognition engine 10 can perform language processing that outputs, as recognizable text, the speech input from the installed microphone and the wired or wireless voice transmission device, and can convert the language-processed text into speed control text for motion control and assign it.

To this end, the voice recognition engine 10 may include a preprocessor 12 and a recognition unit 14, as shown in FIG. 2.

The preprocessor 12 can extract the feature vectors needed for voice recognition. That is, when speech from the microphone enters the voice recognition engine 10, the preprocessor 12 can extract from it feature vectors that represent its phonetic characteristics well. The preprocessor 12 may extract one feature vector every 1/100 of a second.

The preprocessor 12 may extract the feature vectors using the MFCC (Mel-Frequency Cepstral Coefficients) algorithm. The MFCC algorithm divides the whole input sound into short-time segments and extracts features by analyzing the spectrum of each segment. For example, the sound is divided into segments 20 to 40 ms long, and the spectrum, that is, the frequency content, of each segment is computed.
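The short-time analysis described above can be illustrated with a minimal sketch. It covers only the first stage that MFCC builds on (framing and a per-frame magnitude spectrum) and stops before the mel filterbank and cepstral steps. The 25 ms frame length, the 10 ms hop (one frame every 1/100 s), the function names, and the test tone are assumptions chosen for illustration, not values taken from the patent beyond the stated ranges.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split samples into overlapping frames: frame_ms long, one every hop_ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (O(N^2), acceptable for a short demo)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


# Hypothetical input: 0.1 s of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]

frames = frame_signal(tone, sample_rate)
spectrum = magnitude_spectrum(frames[0])
peak_bin = spectrum.index(max(spectrum))
peak_hz = peak_bin * sample_rate / len(frames[0])  # frequency of the strongest bin
```

A production front end would use an FFT and a window function instead of the naive DFT; the sketch keeps to the standard library so the framing arithmetic stays visible.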

Meanwhile, the preprocessor 12 may extract the feature vectors after noise processing that handles noise, including background noise and channel distortion arising from the microphone, the line, and so on. This is intended to raise the recognition success rate; possible methods include compensating the feature vectors after extraction or adopting noise-robust feature vectors.
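One widely used way to compensate features after extraction is cepstral mean normalization, sketched below. The patent does not name a specific compensation method, so this choice is an assumption. The idea is that a fixed channel (microphone or line) appears as a roughly constant offset in cepstral features, which subtracting the per-dimension mean removes. The feature values are made up.

```python
def cepstral_mean_normalize(features):
    """Subtract the per-dimension mean over all frames from every frame."""
    if not features:
        return []
    dims = len(features[0])
    count = len(features)
    means = [sum(frame[d] for frame in features) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in features]


# Hypothetical 2-dimensional features with a constant channel offset of (+5, -3).
clean = [[1.0, 2.0], [3.0, 4.0], [-4.0, -6.0]]
distorted = [[c0 + 5.0, c1 - 3.0] for c0, c1 in clean]

# The offset is removed because the clean features already have zero mean.
normalized = cepstral_mean_normalize(distorted)
```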

Having extracted the feature vectors in this way, the preprocessor 12 can send them to the recognition unit 14.

The recognition unit 14 can perform language processing by pattern analysis of the feature vectors extracted by the preprocessor 12. The recognition unit 14 can store a voice recognition algorithm 16 and perform the language processing through it. As shown in FIG. 3, the voice recognition algorithm 16 may include an acoustic model 16a, a language model 16b, and a data dictionary 16c.

More specifically, the recognition unit 14 receives the feature vectors of the speech extracted by the preprocessor 12 and obtains a recognition result by pattern comparison with the acoustic model 16a stored in the database of the voice recognition engine 10. As shown in FIG. 4, the acoustic model 16a is built to adapt to the speaker's input voice in order to raise the recognition rate; for this purpose it may use the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation techniques.

Here, the acoustic model 16a may perform MLLR adaptation first and then MAP adaptation. The adapted acoustic model 16a thereby provides samples as close as possible to the speaker's voice, so that the voice recognition engine 10 can recognize the speech accurately.
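The MLLR-then-MAP order can be sketched for a single one-dimensional Gaussian mean. This is a simplified illustration, not the patent's implementation. It assumes the MLLR transform coefficients `a` and `b` have already been estimated, since estimating them from adaptation data is omitted here. It also assumes hard frame alignment and a hand-picked prior weight `tau`. All numeric values are hypothetical.

```python
def mllr_transform(mean, a, b):
    """Apply a shared affine (MLLR-style) transform to one Gaussian mean."""
    return a * mean + b


def map_update(prior_mean, frames, tau):
    """MAP re-estimate of a mean: a weighted blend of the prior and the data.

    tau is the prior weight; each aligned frame counts with weight 1.
    """
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))


# Hypothetical speaker-independent mean and pre-estimated transform.
si_mean = 2.0
after_mllr = mllr_transform(si_mean, a=1.0, b=0.5)

# MAP then pulls the transformed mean toward the speaker's own frames.
adapted = map_update(after_mllr, frames=[3.5, 3.5], tau=2.0)
```

With little adaptation data the result stays near the MLLR-transformed prior; as more frames arrive, the data term dominates.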

The acoustic model 16a may be built for Korean or English.

The language model 16b can perform language processing on the speech recognized through the acoustic model 16a. To this end, the language model 16b may include a word-level search and a sentence-level search.

The word-level search also operates on phonemes: through word-level or phoneme-level pattern comparison with the acoustic model 16a stored in the database, it extracts possible candidate words or candidate phonemes. The candidates that pass this stage are then handed to the sentence-level search.

Based on the information about the candidate words or phonemes, the sentence-level search uses the data dictionary to judge whether each candidate fits the grammatical structure, the sentence context, a specific topic, and so on, and thereby determines the most suitable word or phoneme.

For example, suppose that in the Korean sentence '우리는 바닷가에 간다' ("We go to the beach"), unclear pronunciation makes it hard to distinguish '는' (neun) from '능' (neung). The word-level search may then produce both '는' and '능' as candidates. The sentence-level search, through sentence-structure analysis using the data dictionary 16c, recognizes that '는' functions as a particle in the sentence whereas no particle '능' exists, and can exclude '능' from the candidates.

In other words, the language model 16b performs language processing that improves recognition performance by constraining vocabulary and grammatical structure. In this way the voice recognition engine 10 can recognize speech faster and produce more accurate results.
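The candidate-pruning idea in the example above can be sketched as a dictionary lookup. The dictionary layout (grammatical role mapped to known entries) and the function name are assumptions for illustration; the patent does not specify how the data dictionary 16c is organized.

```python
def filter_candidates(candidates, role, dictionary):
    """Keep only candidates that the dictionary lists for the given grammatical role."""
    allowed = dictionary.get(role, set())
    return [c for c in candidates if c in allowed]


# Hypothetical dictionary: grammatical role -> entries known for that role.
data_dictionary = {"particle": {"는", "은", "이", "가", "에"}}

# The word-level search could not decide between these two candidates.
candidates = ["는", "능"]
survivors = filter_candidates(candidates, "particle", data_dictionary)
```

Restricting the vocabulary this way shrinks the search space, which is the source of the speed and accuracy gains the text describes.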

Here, the language model 16b is based on statistical pattern recognition and may use the HMM (Hidden Markov Model) technique, which integrates the word-level and sentence-level searches into a single process. In this method, statistical information about the patterns corresponding to each speech unit is stored as a probabilistic model; when an unknown input pattern arrives, the probability that each model could have generated it is computed, and the speech unit that best fits the unknown pattern is found.
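The "score every model, keep the best" procedure described above can be sketched with the forward algorithm on small discrete HMMs. The two models, their parameters, and the observation symbols are made up for illustration; a real recognizer would use continuous acoustic features and log probabilities.

```python
def forward_probability(observations, start, trans, emit):
    """Forward algorithm: P(observations | model) for a discrete HMM."""
    states = range(len(start))
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans[prev][s] for prev in states) * emit[s][obs]
            for s in states
        ]
    return sum(alpha)


# Hypothetical two-state, left-to-right models for two speech units.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "go": {"start": [1.0, 0.0], "trans": TRANS, "emit": [[0.9, 0.1], [0.2, 0.8]]},
    "stop": {"start": [1.0, 0.0], "trans": TRANS, "emit": [[0.1, 0.9], [0.8, 0.2]]},
}

unknown_pattern = [0, 0, 1]  # quantized observation symbols
scores = {
    name: forward_probability(unknown_pattern, m["start"], m["trans"], m["emit"])
    for name, m in models.items()
}
best_unit = max(scores, key=scores.get)  # the unit whose model explains the input best
```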

Meanwhile, during language processing the language model 16b can convert the processed language into text and output it so that the motion of the robot or autonomous vehicle can be controlled.

That is, the feature vectors of the speech extracted by the preprocessor 12 are compared with the acoustic model 16a adapted to the speaker's voice to recognize the speech. From the recognized speech, the language model 16b extracts candidate words or phonemes and, based on the data dictionary 16c, determines the most suitable word or phoneme so that the speech is segmented into correct sentences. The speech language-processed into sentences in this way can then be converted into text by the language model 16b.

As described above, the language-processed text can be converted by the voice recognition engine 10 into speed control text that can adjust motor speed for motion control of the robot or autonomous vehicle, and assigned.

The assigned speed control text can be converted, through the programming language 20, into a robot control command for controlling motion and delivered to the robot operating system 30.

The programming language 20 may be C/C++ or Python. The programming language 20 may be connected to the robot operating system 30 by wire or wirelessly. That is, the programming language 20, having been assigned the speed control text by the voice recognition engine 10, generates a robot control command through an algorithm and delivers it to the robot operating system 30 over a wired or wireless link. The wireless link may be, for example, Bluetooth or Wi-Fi.
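A minimal sketch of the text-to-command step might look like the following. The command vocabulary, the speed values, and the names `SpeedCommand`, `COMMAND_TABLE`, and `text_to_speed_command` are all hypothetical; the patent does not list the commands or the algorithm used.

```python
from typing import NamedTuple, Optional


class SpeedCommand(NamedTuple):
    linear: float   # forward speed in m/s
    angular: float  # turn rate in rad/s (positive = counter-clockwise)


# Hypothetical vocabulary and speed values; the patent does not list them.
COMMAND_TABLE = {
    "forward": SpeedCommand(0.2, 0.0),
    "backward": SpeedCommand(-0.2, 0.0),
    "left": SpeedCommand(0.0, 0.5),
    "right": SpeedCommand(0.0, -0.5),
    "stop": SpeedCommand(0.0, 0.0),
}


def text_to_speed_command(recognized_text: str) -> Optional[SpeedCommand]:
    """Return the command for the first known keyword, or None if none matches."""
    for word in recognized_text.lower().split():
        if word in COMMAND_TABLE:
            return COMMAND_TABLE[word]
    return None


command = text_to_speed_command("go forward")
```

In a ROS-based setup, a pair like this would commonly be sent as a velocity message to the robot operating system; that wiring is left out here because it depends on the installation and is not described in the source.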

The robot operating system 30 can receive the speed control command from the programming language 20. It can control the driving unit 40, which drives the robot or autonomous vehicle, by transmitting a speed value to it. That is, the robot operating system 30 adjusts the speed value according to the speed control command from the programming language 20 and controls the driving unit 40 so that the motion of the robot or autonomous vehicle is controlled.

The driving unit 40 receives the speed value from the robot operating system 30 and drives the robot or autonomous vehicle. To this end, the driving unit 40 may include a low-level processor 42, a PWM generator 44, and a DC motor 46.

Specifically, the low-level processor 42 can program the speed value delivered through the robot operating system 30 as a low-level signal and input it.

The PWM generator 44 can modulate the pulses of the low-level signal programmed and input by the low-level processor 42. That is, the speed is controlled through pulse modulation of the low-level signal that represents the speed value.
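As a rough illustration of how a speed value can become a pulse-width-modulated signal, the sketch below maps a signed speed to a duty cycle and then renders one PWM period as samples. The linear speed-to-duty mapping, the `max_speed` parameter, and the sample resolution are assumptions; real hardware would configure a timer peripheral rather than build lists.

```python
def speed_to_duty_cycle(speed, max_speed):
    """Map a signed speed to (duty cycle in 0..1, direction as +1, -1, or 0)."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    duty = min(abs(speed) / max_speed, 1.0)
    direction = (speed > 0) - (speed < 0)
    return duty, direction


def pwm_period(duty, resolution=8):
    """One PWM period as a list of 1/0 samples; the high time is duty * period."""
    high = round(duty * resolution)
    return [1] * high + [0] * (resolution - high)


# Hypothetical request: 0.1 m/s on a motor that tops out at 0.4 m/s.
duty, direction = speed_to_duty_cycle(0.1, 0.4)
waveform = pwm_period(duty)
```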

The DC motor 46 is connected to, for example, a wheel of the robot or autonomous vehicle and drives it according to the pulses modulated by the PWM generator 44. A DC motor 46 is preferably provided for each wheel, and the motion of the robot or autonomous vehicle can be controlled by giving the individual DC motors 46 different speeds.
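One common way to derive different per-wheel speeds is differential-drive kinematics, sketched below. The patent only says that each wheel has its own motor and that the speeds differ, so the two-wheel differential-drive layout, the `track_width` parameter, and the numbers are assumptions.

```python
def wheel_speeds(linear, angular, track_width):
    """Differential-drive kinematics: (left, right) wheel speeds in m/s.

    linear is the body speed in m/s, angular the turn rate in rad/s
    (positive = counter-clockwise), track_width the wheel separation in m.
    """
    half = angular * track_width / 2.0
    return linear - half, linear + half


# Hypothetical command: drive at 0.2 m/s while turning left at 1 rad/s.
left, right = wheel_speeds(0.2, 1.0, track_width=0.4)
```

Equal wheel speeds give straight-line motion, and opposite speeds make the platform turn in place.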

An operating method of the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention is described below with reference to FIGS. 5 to 7.

FIG. 5 is an operational flowchart of the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention, FIG. 6 is an operational flowchart of step (c) of FIG. 5, and FIG. 7 is an operational flowchart of step (f) of FIG. 5.

Referring to FIGS. 5 to 7, the operating method of the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention may proceed in the order of steps (a) to (f) below.

(a) Inputting the user's voice through a microphone (S100)

- The user can input a voice command through the microphone installed in the robot or autonomous vehicle. The voice command is then delivered to the voice recognition engine 10.

(b) Extracting feature vectors from the input speech (S200)

- When the voice command reaches the voice recognition engine 10, the preprocessor 12 of the engine can extract feature vectors from the input speech. The voice recognition engine 10 may be implemented with Pocketsphinx and may extract one feature vector every 1/100 of a second using the MFCC (Mel-Frequency Cepstral Coefficients) algorithm.

(c) Converting the feature vectors into recognizable text using the voice recognition algorithm (S300)

- This step can be performed by the recognition unit 14 of the voice recognition engine 10, in which the voice recognition algorithm 16 can be stored. That is, the recognition unit 14 receives the feature vectors extracted by the preprocessor 12 and converts them into recognizable text using the stored voice recognition algorithm 16.

Specifically, the voice recognition algorithm 16 may include an acoustic model 16a, a language model 16b, and a data dictionary 16c, and may be carried out in the following three sub-steps.

1 단계 : 음향모델(16a)이 화자의 음성에 적응하는 단계(S310)Step 1: Step of adapting the acoustic model 16a to the speaker's voice (S310)

- 음향모델(16a)은 전처리부(12)로부터 추출된 특징 벡터와 비교되어 음성 인식 결과를 도출할 수 있는데, 이때 화자마다 음성학적 특징이 다른점을 고려하여 음향모델(16a)이 화자적응할 수 있도록 형성될 수 있다. 여기서, 음향모델(16a)은 MLLR(Maximum Likelihood Linear Regression) 및 MAP(Maximum A Posteriori) 적응기법을 이용할 수 있으며, MLLR(Maximum Likelihood Linear Regression) 적응 수행 뒤, MAP(Maximum A Posteriori) 적응을 순차적으로 진행할 수 있다.-The acoustic model 16a can be compared with the feature vector extracted from the preprocessing unit 12 to derive the speech recognition result.At this time, the acoustic model 16a can adapt the speaker in consideration of the differences in phonetic characteristics for each speaker. It can be formed to be. Here, the acoustic model 16a can use the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation techniques, and after performing the MLLR (Maximum Likelihood Linear Regression) adaptation, MAP (Maximum A Posteriori) adaptation is sequentially performed. You can proceed.

This sequential adaptation can further increase the recognition rate.
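
The order of the two adaptation passes can be illustrated with a minimal sketch for a single Gaussian mean. It is not the module's implementation: the affine transform (A, b) is assumed to have been estimated already (a real MLLR pass estimates it by maximum likelihood from the speaker's adaptation speech), and all numeric values, the function names, and the prior weight tau are hypothetical.

```python
def mllr_transform(mean, A, b):
    """Apply an MLLR-style affine transform to one Gaussian mean: A * mean + b."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


def map_update(prior_mean, frames, weights, tau=10.0):
    """MAP re-estimate of a Gaussian mean.

    prior_mean: mean before this step (here, the MLLR-adapted mean)
    frames:     feature vectors from the speaker's adaptation speech
    weights:    occupation probability of this Gaussian for each frame
    tau:        prior weight; larger values keep the result closer to prior_mean
    """
    occupancy = sum(weights)
    dim = len(prior_mean)
    weighted_sum = [sum(w * x[d] for w, x in zip(weights, frames))
                    for d in range(dim)]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Hypothetical 2-dimensional example values.
speaker_independent_mean = [1.0, 2.0]
A = [[1.0, 0.0], [0.0, 1.0]]   # assumed to be already estimated
b = [0.5, -0.5]
frames = [[2.0, 1.0], [2.0, 1.0]]
weights = [1.0, 1.0]

mllr_mean = mllr_transform(speaker_independent_mean, A, b)      # MLLR first
adapted_mean = map_update(mllr_mean, frames, weights, tau=2.0)  # then MAP
```

In this sketch the MLLR pass moves the mean with one shared transform, and the MAP pass then pulls it further toward the speaker's own frames in proportion to how much adaptation data is available.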

Step 2: Recognizing speech by comparing the speaker-adapted acoustic model with the feature vector (S320)

- Once the initial acoustic model 16a has adapted to the speaker in Step 1, the feature vector may be compared with the adapted acoustic model 16a. Because of the speaker adaptation, the acoustic model 16a can perform more accurate recognition.

Step 3: The language model 16b extracts candidate phonemes or candidate words according to the recognized speech, determines the correct speech using the data dictionary 16c, and converts it into a recognizable text form for motion control of robots and autonomous vehicles (S330)

- According to the speech recognized in Step 2, the language model 16b may extract candidate phonemes or candidate words through a word-level search and a sentence-level search using the HMM (Hidden Markov Model) technique. The language model 16b may then compare the extracted candidate phonemes or candidate words against the preset data dictionary 16c to determine the most suitable word or phoneme.

The sentence determined through the above process may be converted into a recognizable text form for precise motion control of robots and autonomous vehicles.
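
The candidate selection in Step 3 can be sketched with the HMM forward algorithm: each word has its own model, each model is scored on the observed sequence, and the best-scoring word found in the dictionary is kept. This is a toy illustration under stated assumptions. The discrete two-symbol observations, the model parameters, the word list, and the function names are all hypothetical; a practical recognizer works on continuous feature vectors and log probabilities.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability that a discrete HMM generates the observation sequence."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def best_word(obs, word_models, dictionary):
    """Score every word model that is in the dictionary and return the best word."""
    scores = {
        word: forward_likelihood(obs, *model)
        for word, model in word_models.items()
        if word in dictionary
    }
    return max(scores, key=scores.get), scores


# Hypothetical left-to-right models: (start, transition, emission) per word.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
WORD_MODELS = {
    "go":   ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "stop": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.8, 0.2]]),
    "jump": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.2, 0.8]]),  # not in dictionary
}
DICTIONARY = {"go", "stop"}

word, scores = best_word([0, 1, 1], WORD_MODELS, DICTIONARY)
```

Restricting the search to dictionary entries is one simple way to model the role of the data dictionary 16c; the patent text does not specify how the dictionary comparison is carried out.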

(d) Converting the recognizable text into speed control text (S400)

- The recognizable text derived through the speech recognition algorithm 16 may be converted into text capable of controlling speed and passed to the programming language 20. This step may be performed in the voice recognition engine 10.
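
As an illustration of step S400, the conversion can be pictured as a lookup from a recognized phrase to a fixed speed control text. The phrases, the control texts, and the function name below are hypothetical; the patent does not list the actual command vocabulary.

```python
import re

# Hypothetical vocabulary: recognized phrase -> speed control text.
SPEED_CONTROL_TEXT = {
    "go forward": "FORWARD",
    "go back": "BACKWARD",
    "turn left": "LEFT",
    "turn right": "RIGHT",
    "stop": "STOP",
}


def to_speed_control_text(recognized_text):
    """Return the speed control text for a recognized phrase, or None if unknown."""
    # Normalize case and whitespace so small recognizer variations still match.
    normalized = re.sub(r"\s+", " ", recognized_text.strip().lower())
    return SPEED_CONTROL_TEXT.get(normalized)
```

Returning None for an unknown phrase leaves the decision about what to do with unrecognized speech to the caller, which is a design choice of this sketch rather than something stated in the patent.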

(e) Generating a speed control command from the speed control text through a programming language (S500)

- The speed control text passed to the programming language 20 may be turned into a speed control command by a coded algorithm. The generated speed control command may then be transmitted to the robot operating system, which controls the robot or autonomous vehicle, by a wired method or by a wireless method such as Wi-Fi or Bluetooth.

Meanwhile, the programming language 20 may be C/C++ or Python.
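
A minimal Python sketch of step S500 follows, assuming the speed control text is a short token such as "FORWARD" and that the command carries one linear and one angular speed. The token names, speed values, JSON line format, and function names are all hypothetical and are not taken from the patent. In a ROS-based setup, a command of this shape would typically be carried by a geometry_msgs/Twist message instead.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SpeedCommand:
    linear: float   # forward speed in m/s (negative = backward)
    angular: float  # turn rate in rad/s (positive = counter-clockwise)


# Hypothetical speed values for each speed control text.
COMMAND_TABLE = {
    "FORWARD": SpeedCommand(0.2, 0.0),
    "BACKWARD": SpeedCommand(-0.2, 0.0),
    "LEFT": SpeedCommand(0.0, 0.5),
    "RIGHT": SpeedCommand(0.0, -0.5),
    "STOP": SpeedCommand(0.0, 0.0),
}


def make_speed_command(speed_control_text):
    """Look up the command for a speed control text; unknown text stops the robot."""
    return COMMAND_TABLE.get(speed_control_text, COMMAND_TABLE["STOP"])


def encode_for_link(command):
    """Serialize a command as one newline-terminated JSON line for a wired or wireless link."""
    return (json.dumps(asdict(command)) + "\n").encode("utf-8")


payload = encode_for_link(make_speed_command("FORWARD"))
```

Falling back to a stop command for unknown text is a conservative choice made for this sketch; the bytes in payload could then be written to whichever serial port or socket connects to the robot operating system.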

(f) The robot operating system 30 controlling, according to the speed control command, the driving unit 40 that actuates the motion of the robot or autonomous vehicle (S600)

- Having received the speed control command, the robot operating system 30 may control the driving unit 40, which actuates the motion of the robot or autonomous vehicle, by sending it a speed command (speed value) so that it runs at that speed.

Here, the driving unit 40 may include a low-level processor 42, a PWM generator 44, and a DC motor 46, and may perform the following three steps.

Step 1: The low-level processor 42 receives the speed value (speed command) from the robot operating system 30 and inputs the speed as a low-level signal (S610)

- The low-level processor 42 may receive a speed value from the robot operating system 30. Having received the speed value, the low-level processor 42 may, through programming, input the speed as a low-level signal.

Step 2: The PWM generator 44 pulse-modulates the low-level speed signal input through the low-level processor 42 (S620)

- The PWM generator 44 may receive the low-level speed signal programmed by the low-level processor 42 and pulse-modulate it.

Step 3: The DC motor 46 is driven according to the pulses modulated by the PWM generator 44 so that the robot or autonomous vehicle moves (S630)

- The pulses modulated by the PWM generator 44 may be sent to the DC motors 46 provided for each wheel of the robot or autonomous vehicle. The speed of each DC motor 46 is controlled according to the pulses, thereby controlling the motion of the robot or autonomous vehicle.
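
The three steps of the driving unit 40 can be sketched numerically as follows. The sketch assumes a two-wheel differential-drive platform and an 8-bit PWM generator, neither of which is stated in the patent; the track width, maximum wheel speed, and function names are hypothetical.

```python
def wheel_speeds(linear, angular, track_width):
    """Differential-drive kinematics: body velocity -> (left, right) wheel speeds in m/s."""
    half = angular * track_width / 2.0
    return linear - half, linear + half


def to_pwm(wheel_speed, max_speed, resolution=255):
    """Convert one wheel speed to (direction, duty count) for a PWM generator.

    direction is +1 for forward and -1 for reverse; the duty count is clamped
    to the range 0..resolution.
    """
    ratio = min(abs(wheel_speed) / max_speed, 1.0)
    direction = 1 if wheel_speed >= 0 else -1
    return direction, round(ratio * resolution)


# Hypothetical speed value received from the robot operating system.
left, right = wheel_speeds(linear=0.2, angular=0.5, track_width=0.3)
left_pwm = to_pwm(left, max_speed=0.5)
right_pwm = to_pwm(right, max_speed=0.5)
```

Each (direction, duty count) pair would then be applied to the motor driver of the corresponding wheel, so a larger duty count means a higher average voltage on that DC motor and a higher wheel speed.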

Accordingly, whereas conventional open voice recognition modules provide their service only through an internet connection, the standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles according to an embodiment of the present invention, and its operating method, make it possible to control a robot freely even in a space without an internet connection.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in other specific forms without changing its technical spirit or essential features. Accordingly, the embodiments described above are illustrative in all respects and not restrictive.

10: voice recognition engine
12: preprocessor
14: recognition unit
16: speech recognition algorithm
16a: acoustic model
16b: language model
16c: data dictionary
20: programming language
30: robot operating system
40: driving unit
42: low-level processor
44: PWM generator
46: DC motor

Claims

As a standalone embedded voice recognition-based agent module that can accurately recognize and interpret robot control voice commands even in a space without internet connection,
A speech recognition engine that performs language processing to output a speech input through a microphone installed in a robot or an autonomous mobile body and a speech transmission device based on wired/wireless communication as a recognizable text, and converts and assigns it to a speed control text for motion control;
A robot operating system that receives a speed control command through a programming language to which a speed control text is assigned from the speech recognition engine, and transmits and controls a speed value to drive the robot or autonomous vehicle;
It includes a driving unit that receives a speed value from the robot operating system and drives a robot or an autonomous moving body,
The voice recognition engine,
A preprocessor that, for the feature vectors required for speech recognition, either compensates for noise including channel distortion and background noise after the feature vectors are extracted or handles noise by introducing noise-robust feature vectors, and that uses the MFCC (Mel-frequency cepstral coefficients) algorithm to divide the entire input sound into short time sections, analyze the spectrum of each section, and extract feature vectors in units of 1/100 second, and
A recognition unit storing a speech recognition algorithm, and analyzing a feature vector extracted from the preprocessing unit through the speech recognition algorithm to process a language,
The speech recognition algorithm,
An acoustic model, consisting of Korean or English, that adapts to the speaker's voice by applying MLLR (Maximum Likelihood Linear Regression) adaptation to the voice input through the microphone and then applying the MAP (Maximum A Posteriori) adaptation technique;
A language model that converts the extracted feature vector into a recognizable text form after pattern comparison with the adapted acoustic model, including word-by-word and sentence-by-sentence searches; that processes language by constraining vocabulary and grammar structures so that the processed language can be converted into text; and that, based on the statistical pattern recognition of the HMM (Hidden Markov Model), stores statistical information on the patterns corresponding to speech units in the form of probability models and, when an unknown input pattern arrives, calculates the probability that each model could produce the unknown pattern in order to find the speech unit most suitable for it; and
A data dictionary for determining whether the language model can be converted into a recognizable text form when comparing the extracted feature vector and the acoustic model,
The recognition unit receives the feature vector of the speech extracted from the preprocessor and compares the pattern with the acoustic model to obtain a recognition result,
The word-by-word search of the language model includes a phoneme-by-phoneme search, and the sentence-by-sentence search is performed after possible candidate words or candidate phonemes are extracted through word-by-word or phoneme-by-phoneme pattern comparison with the acoustic model stored in the database,
The sentence-by-sentence search determines the most suitable word or phoneme by determining whether it conforms to a grammar structure, sentence context, and a specific topic using a data dictionary, based on information on candidate words or candidate phonemes,
The voice recognition engine is composed of Pocketsphinx,
The programming language is Python or C/C++,
The speed control command is transmitted from the programming language to the robot operating system through a wired communication method,
The driving unit,
A low-level processor for receiving a speed value from the robot operating system and inputting a speed as a low-level signal;
A PWM generator for pulse-modulating the speed signal input through the low-level processor; and
A standalone voice recognition-based agent module for precise motion control of robots and autonomous vehicles, including a DC motor that drives a robot or an autonomous vehicle according to a pulse modulated from the PWM generator.

delete