
JPS61151706A - Voice recognition control system - Google Patents

Voice recognition control system

Info

Publication number
JPS61151706A
JPS61151706A JP59273136A JP27313684A
Authority
JP
Japan
Prior art keywords
word
voice
recognized
controls
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP59273136A
Other languages
Japanese (ja)
Inventor
Yoshihide Sugino
杉野 芳英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Azbil Corp
Original Assignee
Azbil Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Azbil Corp filed Critical Azbil Corp
Priority to JP59273136A priority Critical patent/JPS61151706A/en
Publication of JPS61151706A publication Critical patent/JPS61151706A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 — Programme-control systems
    • G05B19/02 — Programme-control systems electric
    • G05B19/18 — Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/409 — Numerical control [NC] characterised by using manual data input [MDI] or by using a control panel, e.g. controlling functions with the panel; characterised by control panel details or by setting parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Manufacturing & Machinery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control By Computers (AREA)

Abstract

PURPOSE: To make an automatic device respond to conversational speech by performing the control corresponding to a keyword and a trigger word when the trigger word is recognized after the pre-registered keyword has been recognized. CONSTITUTION: A portable mobile station 1 and a robot 10 communicate by radio waves at frequencies f1 and f2. A voice recognizer 12 in the robot 10 recognizes the received speech, a conversation control section 13 controls a voice synthesizer 14 in response to the recognition output, and a response voice is output. When various commands are given from a keyboard 24, a main control section 21 and the section 13 respond to them and the desired control conditions are set. When the keyboard 24 commands preparation for registering the keyword and the trigger word, the section 13 responds, controls the synthesizer 14 to output a synthesized voice announcing that registration preparation has been set, and controls the recognizer 12 to enter the registration-preparation state. Each word is then transmitted from a microphone 3a and registered, and each word of the same speaker is thereafter recognized individually.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a system for recognizing speech and performing predetermined control.

[Conventional Technology]

As for speech recognition methods themselves, various ones have been developed and are gradually coming into general use, but all of them stop at recognizing words composed of a relatively small number of sounds. Recognizing a series of conversational speech and performing control in response to it is constrained by the configuration of the device and has been impossible to put to practical use.

[Problems to Be Solved by the Invention]

For this reason, it has been impossible to make an automatic device such as a robot perform conversational speech recognition and act accordingly; the automatic device cannot be made to respond to conversational voice commands, and the problem arises that its responses appear unnatural.

[Means for Solving the Problems]

To solve the above problems, the present invention is constituted by the following means.

That is, a keyword and a trigger word registered in advance for speech recognition are defined, and when the keyword is recognized and the trigger word is subsequently recognized, the control corresponding to the keyword and the trigger word is performed in response to the recognition of the trigger word.
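As a rough sketch of these means, the keyword acts as an arming condition and the trigger word fires the control. The following Python fragment is a hypothetical illustration only; the function names, word strings, and action callback are not from the patent:

```python
# Hypothetical sketch of the claimed logic: the control corresponding to a
# keyword/trigger pair runs only when the trigger word is recognized AFTER
# the pre-registered keyword. All names here are illustrative.

def make_controller(keyword, trigger, action):
    """Return a function that consumes one recognized word at a time."""
    armed = {"flag": False}  # set once the keyword has been recognized

    def on_word(word):
        if word == keyword:
            armed["flag"] = True      # keyword recognized: arm the controller
        elif word == trigger and armed["flag"]:
            armed["flag"] = False     # logical AND of both recognitions
            action()                  # control starts at the trigger instant
        # any other word (preceding or intermediate speech) is ignored

    return on_word
```

Feeding the recognized words one by one, the action fires exactly once, at the trigger word, and a trigger heard without a fresh keyword is ignored.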

[Operation]

Accordingly, if the keyword is first inserted into a series of conversational commands and the trigger word is then inserted at the final point, the control corresponding to the keyword and the trigger word is performed in response, so that a situation arises as if the automatic device recognized the conversation continuously and executed the command upon its conclusion.

[Embodiment]

The present invention will now be described in detail with reference to the drawings showing an embodiment.

FIG. 2 is a block diagram showing the configuration. A mobile station 1, such as a portable radio set, comprises a transceiver (hereinafter TRX) 2 and a headset 3. Speech picked up by the microphone 3a of the headset 3 is transmitted via TRX 2 as a radio wave at frequency f1 and received by TRX 11 on the robot 10 side. The reception output is given to a voice recognition device (hereinafter VCE) 12, where speech recognition is performed; the recognition output is given to a dialogue control section (hereinafter CCT) 13, which in response controls a voice synthesizer (hereinafter VSZ) 14. The resulting response voice is given to TRX 11 and to an amplifier (hereinafter PA) 15, and is transmitted via TRX 11 as a radio wave at frequency f2 while driving a speaker 16 through PA 15.

Therefore, when a predetermined voice is given from the microphone 3a, a synthesized voice corresponding to it is emitted from the speaker 16 and is also received by TRX 2, so that it can be heard through the receiver 3b of the headset 3.

From CCT 13, a signal corresponding to the voice recognition is also sent over the bus to a main control section (hereinafter MCT) 21, where the decisions for controlling each part are made and control signals are sent to a display section (hereinafter DP) 22 and to the drive circuits (hereinafter DRV) 23 for the motors driving each part. When various direct instructions are given from a ten-key keyboard (hereinafter KB) 24, MCT 21 and CCT 13 respond to them and the desired control conditions are set.

Accordingly, when preparation for registering the keyword and the trigger word is commanded by operating KB 24, CCT 13 responds: it controls VSZ 14 to send out a synthesized voice announcing that the registration-preparation state for each word has been set, and controls VCE 12 to enter that state. Each word is then sent individually from the microphone 3a and thereby registered in VCE 12; thereafter, each word of the same speaker is recognized individually.
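A minimal model of this registration flow might look as follows. The class name and the exact-string matching are assumptions for illustration; a real recognizer would match stored acoustic templates of the speaker, not strings:

```python
# Hypothetical sketch of the enrollment step: the keyboard puts the
# recognizer into a registration-preparation state, each word is enrolled
# once, and later utterances are matched against the enrolled set.

class WordRegistry:
    def __init__(self):
        self._templates = set()   # stands in for per-speaker acoustic templates
        self.registering = False

    def prepare_registration(self):
        """Commanded from the keyboard (KB 24 in the patent)."""
        self.registering = True

    def enroll(self, word):
        """One word spoken into the microphone while in preparation state."""
        if not self.registering:
            raise RuntimeError("registration preparation not set")
        self._templates.add(word)

    def finish_registration(self):
        self.registering = False

    def recognize(self, utterance):
        """Return the word if it matches an enrolled template, else None."""
        return utterance if utterance in self._templates else None
```

Enrolling outside the preparation state is rejected, mirroring the patent's requirement that registration be explicitly commanded first.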

FIG. 3 is a diagram showing the recognition of the keyword (hereinafter KW) 31 and the trigger word (hereinafter TW) 32. Following the preceding speech 33, a space time ts of, for example, 0.3 to 1.0 sec is provided before KW 31 is given; intermediate speech 34 is then given after another space time ts, and TW 32 is finally given after a further space time ts. The preceding speech 33 and the intermediate speech 34 are ignored: KW 31 is recognized first, then TW 32, and in accordance with the logical AND of both recognition results, the control defined for KW 31 and TW 32 is started with the recognition instant of TW 32 as the reference.
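Under the stated assumptions (recognized words delivered as timestamped segments, and a minimum space time of 0.3 s taken from the 0.3 to 1.0 sec example), the FIG. 3 gating could be sketched as follows; the segment representation is hypothetical:

```python
# Hypothetical sketch of the FIG. 3 timing: an utterance is a list of
# (word, start_sec, end_sec) segments. A word counts as an isolated
# candidate only if a silent gap of at least ts precedes it; preceding
# and intermediate speech is ignored, and control is keyed to the end
# time of the trigger word.

TS_MIN = 0.3  # assumed minimum space time ts

def find_command_time(segments, keyword, trigger, ts=TS_MIN):
    """Return the trigger-word recognition time, or None if no command."""
    armed = False
    prev_end = float("-inf")
    for word, start, end in segments:
        isolated = (start - prev_end) >= ts  # space time before the word
        prev_end = end
        if not isolated:
            continue
        if word == keyword:
            armed = True                      # KW 31 recognized
        elif word == trigger and armed:
            return end                        # control starts at this instant
    return None
```

The return value corresponds to the TW 32 recognition instant that the patent uses as the reference for starting control.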

Accordingly, if, for example, "minasan" ("everyone") is registered as KW 31 and "chōdai" ("please") as TW 32 for speech recognition, then when "Robot, introduce yourself to everyone, please" is spoken into the microphone 3a, "minasan" and "chōdai" are recognized in sequence. In response to "chōdai", the robot, for example, lowers its head, makes one full turn while travelling, and utters through the speaker 16 something like "I am the robot ○○", so that the greeting and self-introduction control is executed.

CCT 13 and MCT 21 each consist of a processor, such as a microprocessor, and a memory; the processor executes the instructions in the memory and performs each of the above controls while accessing predetermined data in the memory.

FIG. 1 is a flow chart of the control performed by the processor in MCT 21. When the signal indicating recognition of KW 31 is given from CCT 13, this routine is STARTed: a standby indicator lamp in DP 22 is made to blink to show that the next voice is awaited, and the program number to be executed is shown on the character display of DP 22 (step 101). If the recognition result of the next voice is N (No) at "Tomare?" ("Stop?", step 102) and N at "TW?" (step 103), a timer configured in the processor checks whether a time T of, for example, 5 sec has elapsed (step 104). While step 104 is N, steps 101 onward are repeated; when step 104 becomes Y (Yes), the routine returns to a separate main routine via EXIT.

On the other hand, when step 103 becomes Y, the standby indicator lamp is turned off (step 111) and control moves to execution of the program corresponding to KW 31 and TW 32 (step 112). Steps 112 onward are repeated on condition that "Program finished?" (step 113) is N and that the recognition result of the given voice is N at "Tomare?" (step 114); when step 113 or 114 becomes Y, the routine returns to the main routine via EXIT.

The space time ts in FIG. 3 is regulated, for example, in CCT 13.

Accordingly, when TW 32 is recognized after KW 31 has been recognized, step 112 is executed immediately. If "Tomare" is recognized after KW 31 has been recognized, the routine of FIG. 1 is interrupted, and the same applies after TW 32 has been recognized, so that the occurrence of unexpected states can be prevented at will; the same also applies when TW 32 cannot be recognized within the time T after KW 31 has been recognized.

In FIG. 1, however, unnecessary steps may be omitted depending on the conditions; the configuration of FIG. 2 may be selected freely according to the situation; and FIG. 3 applies equally with a plurality of keywords KW 31. Various modifications are thus possible.

[Effects of the Invention]

As is clear from the above description, according to the present invention, control responding to conversational commands is achieved without a particularly complicated configuration, and a remarkable effect is obtained in humanoid automatic devices such as exhibition robots.

BRIEF DESCRIPTION OF THE DRAWINGS

The figures show an embodiment of the present invention: FIG. 1 is a flow chart of the control, FIG. 2 is a block diagram, and FIG. 3 is a diagram showing the speech recognition situation. 2, 11: TRX (transceiver); 3a: microphone; 12: VCE (voice recognition device); 13: CCT (dialogue control section); 14: VSZ (voice synthesizer); 16: speaker; 21: MCT (main control section); 22: DP (display section); 23: DRV (drive circuit); 24: KB (keyboard).

Patent applicant: Yamatake-Honeywell Co., Ltd. Agent: Masaki Yamakawa (and 2 others)

Claims (1)

[Claims]

1. A voice recognition control system characterized in that a keyword and a trigger word registered in advance for speech recognition are defined, and when the keyword is recognized and the trigger word is subsequently recognized, control corresponding to the keyword and the trigger word is performed in response to the recognition of the trigger word.
JP59273136A 1984-12-26 1984-12-26 Voice recognition control system Pending JPS61151706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59273136A JPS61151706A (en) 1984-12-26 1984-12-26 Voice recognition control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59273136A JPS61151706A (en) 1984-12-26 1984-12-26 Voice recognition control system

Publications (1)

Publication Number Publication Date
JPS61151706A true JPS61151706A (en) 1986-07-10

Family

ID=17523620

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59273136A Pending JPS61151706A (en) 1984-12-26 1984-12-26 Voice recognition control system

Country Status (1)

Country Link
JP (1) JPS61151706A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747444B2 (en) 2002-06-06 2010-06-29 Nuance Communications, Inc. Multiple sound fragments processing and load balancing
US7788097B2 (en) 2002-06-06 2010-08-31 Nuance Communications, Inc. Multiple sound fragments processing and load balancing

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5818710A (en) * 1981-07-24 1983-02-03 Sumitomo Heavy Ind Ltd Signal generating device



Similar Documents

Publication Publication Date Title
US4426733A (en) Voice-controlled operator-interacting radio transceiver
EP2267695B1 (en) Controlling music players using environment audio analysis
EP3432301B1 (en) Low power detection of an activation phrase
KR100753780B1 (en) Speech input device with attention span
CN106782589B (en) Mobile terminal and voice input method and device thereof
US20020107591A1 (en) "controllable toy system operative in conjunction with a household audio entertainment player"
JPH0782351B2 (en) Method for entering a digit sequence by voice command
CN113127609A (en) Voice control method, device, server, terminal equipment and storage medium
CN110875045A (en) Voice recognition method, intelligent device and intelligent television
CN109671429B (en) Voice interaction method and device
CN113345433A (en) Voice interaction system outside vehicle
CN108694946A (en) A kind of speaker control method and system
CN106992008A (en) Processing method and electronic equipment
CN108899028A (en) Voice awakening method, searching method, device and terminal
JP2018109663A (en) Speech processing unit, dialog system, terminal device, program, and speech processing method
JPS61151706A (en) Voice recognition control system
CN111412587B (en) Voice processing method and device of air conditioner, air conditioner and storage medium
JPH0566793A (en) Speech input device
WO2024055831A1 (en) Voice interaction method and apparatus, and terminal
JP2002132404A (en) Robot system, robot control device and method, and recording medium recorded with robot control program
KR102708294B1 (en) System for Operating and Remotely Controlling a Conversational Service Providing Integration with External AI Conversation Systems using Large-Scale Language Models in AI Hardware Device
US20230305797A1 (en) Audio Output Modification
CN213691430U (en) Device for sound control
CN108630201B (en) Method and device for establishing equipment association
JP7182154B2 (en) Equipment control system