JPH0540498A - Voice recognizing device

Info

Publication number: JPH0540498A
Application number: JP3197542A
Authority: JP
Inventors: Shinichi Tsurufuji; 真一鶴藤; Masayuki Iida; 正幸飯田; Hiroki Onishi; 宏樹大西; Koji Araki; 孝次荒木; Koji Dejima; 浩次出島
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1991-08-07
Filing date: 1991-08-07
Publication date: 1993-02-19
Anticipated expiration: 2015-01-31
Also published as: JP3005330B2

Abstract

PURPOSE: To prevent malfunctions caused by ambient noise by means of a voice input switch, while eliminating the need to operate that switch every time a voice is input. CONSTITUTION: A voice input from a microphone 32 is converted into voice pattern data by a filter bank 34, a multiplexer 36 and an A/D converter 38, and is compared with standard pattern data registered in advance in a standard pattern table. To allow voice input, a voice input switch 30 is turned on and its on-time is set in an on-time timer in a memory 42. When an input voice is recognized within the set time, an extension time is set in the on-time timer, so that continuous voice input is allowed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device, and in particular to a voice recognition device that recognizes a voice by comparing a voice pattern, obtained by analyzing the voice input from a microphone, with preset standard patterns.

[0002]

[Prior Art] When a voice recognition device of this type is used near audio equipment such as a stereo, the sound output by the audio equipment acts as ambient noise for the voice recognition device, and erroneous recognition may occur frequently. In particular, when such audio equipment is itself to be controlled or operated based on the recognition result of the voice recognition device, the voice or music output from the audio equipment enters the voice recognition device at a considerable level, so the device may operate in an undesired way. To prevent such malfunctions, a voice recognition device has been proposed that, whenever voice is input to it, lowers the output of the audio equipment only during the voice input period (see Japanese Patent Laid-Open No. 63-29755).

[0003] In such a voice recognition device, the input voice is generally recognized by comparing a voice pattern, which contains parameters representing the features of the voice obtained by analyzing the voice input from the microphone, with preset standard patterns and selecting the most similar standard pattern. Even when the most similar standard pattern is selected, the result is likely to be a misrecognition if its similarity is extremely low. To prevent this, recognition is generally rejected unless the similarity exceeds a fixed threshold.

[0004]

[Problems to Be Solved by the Invention] In the former, setting the voice-input-enabled period required a cumbersome operation, such as operating a switch every time a voice was input. In the latter, if the similarity threshold is too high, a recognition result often cannot be obtained because of subtle ambiguities in the voice, and if the threshold is lowered, even noise is misrecognized as a voice.

[0005] Therefore, the main object of the present invention is to provide a voice recognition device that can prevent malfunctions caused by ambient noise and the like without cumbersome operations. Another object of the present invention is to provide a voice recognition device that eliminates the difficulties associated with setting the similarity threshold. Still another object of the present invention is to provide a voice recognition device that prevents erroneous recognition when a voice outside the recognition targets is input.

[0006] A further object of the present invention is to provide a voice recognition device that prevents registration errors as far as possible when a plurality of voices are registered as standard patterns for one item.

[0007]

[Means for Solving the Problems] The first invention is a voice recognition device comprising pattern creating means for analyzing a voice input from a microphone to create a voice pattern, and recognition means for recognizing the voice by comparing the voice pattern with standard patterns registered in advance, characterized by further comprising time setting means for setting an input time during which voice input from the microphone is allowed, and extension means for extending the input time when a voice is recognized by the recognition means within the input time set by the time setting means.

[0008] The second invention is a voice recognition device comprising: pattern creating means for analyzing a voice input from a microphone to create a voice pattern; standard pattern setting means in which a plurality of voice patterns are preset as standard patterns; selecting means for comparing the voice pattern created by the pattern creating means with each of the standard patterns set in the standard pattern setting means and selecting the standard pattern showing the highest similarity; and determination means for recognizing the voice by the selected standard pattern when its similarity is greater than a predetermined threshold and rejecting it when the similarity is smaller than the predetermined threshold, characterized by further comprising means for taking a voice as the recognition result when the determination means has rejected the same voice a plurality of times in succession.

[0009] The third invention is a voice recognition device comprising: pattern creating means for analyzing a voice input from a microphone to create a voice pattern; standard pattern setting means in which a plurality of voice patterns are preset as standard patterns; selecting means for comparing the voice pattern created by the pattern creating means with each of the standard patterns set in the standard pattern setting means and selecting the standard pattern showing the highest similarity; and determination means for recognizing the voice by the selected standard pattern when its similarity is greater than a predetermined threshold and rejecting it when the similarity is smaller than the predetermined threshold, characterized by further comprising means for stopping the output of the recognition result from the determination means when the selected standard pattern is one outside the recognition targets.

[0010] The fourth invention is a voice recognition device in which voice patterns obtained by analyzing voices input from a microphone are registered in advance as standard patterns, characterized by further comprising display means for indicating the mode to be registered when standard patterns of different voices are registered in different modes for one item.

[0011]

[Operation] In the first invention, voice input is allowed by, for example, turning on a voice input switch. At the same time, an input time during which voice input is allowed is set by, for example, an on-time timer. When the recognition means recognizes a voice within the input time set in the timer, the extension means extends the input time by, for example, setting an extension time in the timer again.

[0012] In the second invention, a voice is rejected when its similarity to the standard pattern selected by the selecting means is smaller than the threshold. When the same standard pattern has been rejected a plurality of times in succession, however, the recognition result is output based on that standard pattern. A rejected voice therefore becomes recognizable if it is input again several times.
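The behavior described for the second invention can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `RejectTracker`, the default of 3 consecutive rejections and the threshold of 90 are all hypothetical (the text says only "a plurality of times").

```python
class RejectTracker:
    """Accept a below-threshold candidate once it has been rejected
    `max_rejects` times in a row (hypothetical helper)."""

    def __init__(self, threshold=90, max_rejects=3):
        self.threshold = threshold      # similarity needed for direct acceptance
        self.max_rejects = max_rejects  # assumed value; the text says "a plurality"
        self.last_number = None         # pattern number of the last rejected candidate
        self.count = 0                  # consecutive rejections of that candidate

    def decide(self, number, similarity):
        """Return the recognized pattern number, or None on rejection."""
        if similarity > self.threshold:
            self._reset()
            return number
        if number == self.last_number:
            self.count += 1
        else:
            # A different candidate restarts the run of rejections.
            self.last_number = number
            self.count = 1
        if self.count >= self.max_rejects:
            self._reset()
            return number
        return None

    def _reset(self):
        self.last_number = None
        self.count = 0
```

Because a different candidate restarts the count, isolated noises or conversational fragments that match different patterns each time never accumulate enough rejections to be accepted.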

[0013] In the third invention, no recognition result is output when the standard pattern of a voice outside the recognition targets shows the maximum similarity. A voice outside the recognition targets therefore does not cause a malfunction. According to the fourth invention, the mode in which a voice should currently be registered is indicated by different display states of, for example, an LED.
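The output suppression of the third invention can be sketched as below. The function name `recognize`, the dictionary of precomputed similarities and the threshold of 90 are assumptions made for illustration; the text does not specify how non-target patterns are marked.

```python
def recognize(similarities, out_of_target, threshold=90):
    """Return the recognized pattern number, or None when nothing is output.

    similarities: dict mapping pattern number -> similarity (0-100).
    out_of_target: set of pattern numbers registered for non-target voices.
    """
    best = max(similarities, key=similarities.get)
    if best in out_of_target:
        return None      # best match is a non-target pattern: suppress output
    if similarities[best] > threshold:
        return best
    return None          # ordinary rejection: similarity too low
```

Registering likely non-target sounds as patterns of their own lets them absorb matches that would otherwise fall on the nearest command.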

[0014]

[Effects of the Invention] According to the first invention, the voice-input-enabled time is extended once a voice has been recognized, so the input time need not be set again for continuous voice input. The cumbersome switch operation conventionally required to set the voice-input-enabled period for preventing malfunctions is therefore no longer necessary. Moreover, because voices are recognized only during the input time, ambient noise is less likely to be input to the microphone, and, as with the conventional device, noise does not cause malfunctions.

[0015] According to the second invention, it becomes easy to set the threshold used to discriminate the similarity required for determination or recognition. In the prior art, setting the threshold low caused malfunctions due to noise such as surrounding voices, while setting it high to prevent such malfunctions increased the probability of rejection caused by the ambiguity inherent in speech, so the threshold was difficult to set. According to the second invention, when the same word is input by voice a plurality of times and keeps being rejected, the rejected word is recognized. Consequently, even if the threshold is set high to prevent malfunctions, the word becomes recognizable by repeating the same voice input several times. In addition, the same word is rarely input repeatedly in the case of sudden noises or conversational speech, so malfunctions caused by such sudden noises or conversation can be reduced.

[0016] According to the third invention, no recognition result is output when a voice outside the recognition targets is input, so malfunctions caused by such a voice can be prevented. If the device is arranged to prompt re-input, the user can input the voice again. Furthermore, if the device is arranged to recognize the voice based on the most similar standard pattern within the recognition targets when no voice is input again within a predetermined time, a voice that was in fact within the recognition targets but was judged to be outside them is still recognized as an in-target voice input.

[0017] According to the fourth invention, registration errors can be reduced even when, for example, one controlled object is controlled by two or more recognition results. The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of the embodiments given with reference to the drawings.

[0018]

[Embodiment] A car audio system 10 of the embodiment shown in FIG. 1 includes a microcomputer 12, which controls an audio section 14. The audio section 14 includes a stereo sound source 16 comprising a tuner 18, a tape deck 20, a CD player 22 and the like. A right signal R and a left signal L from the stereo sound source 16 are supplied through amplifiers 24R and 24L, respectively, to speakers 26R and 26L arranged at appropriate positions in the cabin of an automobile (not shown). When the stereo sound source 16 is a 4-channel stereo, rear signals are also output. The audio section 14 further includes a controller 28, which has operation switches (not shown) for operating the stereo sound source 16 manually. When the audio section 14, that is, the stereo sound source 16, is to be controlled by a control signal from the microcomputer 12, however, a voice input switch 30 provided in the audio section 14 is operated. In this case, the control signal from the microcomputer 12 is input to the stereo sound source 16 in place of the operation signal from the operation switches. The audio section 14 is also provided with a light emitting diode (LED) 31, which, as described later, notifies the operator of various things, for example that a voice outside the recognition targets has been input, that voice input is therefore required again, or the registration procedure.

[0019] Meanwhile, a microphone 32 for picking up the driver's voice for controlling the audio section 14 is arranged on the dashboard (not shown) of the automobile. The voice signal from the microphone 32 is supplied to a filter bank 34. As is well known, the filter bank 34 includes, for example, 8 channels of bandpass filters, which extract the feature parameters of the voice signal input from the microphone 32. Specifically, the filter bank 34 has a preamplifier, an AGC, a bandpass filter, a rectifier circuit and a lowpass filter for each channel. The feature parameters (analog signals) from the filter bank 34 are input to a multiplexer 36, which outputs the 8 channels of feature parameter signals time-sequentially. The signal output from the multiplexer 36 is converted into feature parameter data by an A/D converter 38.
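The multiplexer and A/D stage can be pictured with the following sketch. It models only the data flow (8 channel levels per frame, serialized and then digitized); the 8-bit resolution, the 0 to 1.0 full-scale range and all function names are assumptions, since the text gives no converter details.

```python
def quantize(level, bits=8, full_scale=1.0):
    """Map an analog level in [0, full_scale] to an unsigned integer code."""
    levels = (1 << bits) - 1
    clipped = min(max(level, 0.0), full_scale)   # clip out-of-range inputs
    return round(clipped / full_scale * levels)


def multiplex(frames, channels=8):
    """Yield channel levels one at a time: all channels of the first frame,
    then all channels of the next frame, and so on."""
    for frame in frames:
        if len(frame) != channels:
            raise ValueError("each frame needs one level per channel")
        for level in frame:
            yield level


def digitize(frames, channels=8, bits=8):
    """Serialize the frames, quantize each level, and regroup per frame."""
    codes = [quantize(level, bits) for level in multiplex(frames, channels)]
    return [codes[i:i + channels] for i in range(0, len(codes), channels)]
```

Serializing the channels lets a single converter serve all 8 filter outputs, which is the usual reason for placing a multiplexer ahead of an A/D converter.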

[0020] The signal from the voice input switch 30 and the output of the A/D converter 38 are input to the microcomputer 12 through an input port 40. As described later, the microcomputer 12 recognizes the voice input from the microphone 32 by comparing the feature parameters input from the input port 40 with each standard pattern in a standard pattern table 42a (FIG. 2) formed in a memory 42. In accordance with the recognition result, it outputs the aforementioned control signal to the audio section 14 through an output port 44.

[0021] Therefore, when a voice for controlling the audio section 14 is input to the microphone 32 while the voice input switch 30 is operated, the microcomputer 12 outputs a control signal corresponding to that voice. In response to this control signal, the controller 28 controls the stereo sound source 16. As shown in FIG. 2, the memory 42 includes the standard pattern table 42a, in which patterns of standard feature parameters of each sound or word, used to recognize a voice based on the feature parameters extracted by the filter bank 34, are registered in advance under respective numbers. The standard pattern table 42a is constituted by, for example, a backup RAM. A start flag 42b is also formed in the memory 42. As shown in FIG. 3, the start flag 42b is turned on when the voice data first exceeds a threshold, that is, when the start point of the voice indicated by "Fh" is detected. The memory 42 further includes a voice data buffer 42c, which stores the voice data from the A/D converter 38 taken in by the microcomputer 12. The voice data buffer 42c has a plurality of addresses so that it can store a series of voice data spanning a plurality of frames, from the start point (indicated by "Fh") to the end point (indicated by "Ft") shown in FIG. 3. One frame is set to, for example, 5 milliseconds. In other words, the voice data buffer 42c stores, frame by frame, the feature parameter data output from the A/D converter 38 for the voice input to the microphone 32.
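The start-point and end-point handling can be sketched as follows. The text states that the start point "Fh" is the first frame exceeding a threshold, but the rule for the end point "Ft" is not given in this passage, so the rule used here (the level dropping back to the threshold or below) is an assumption, as is the function name.

```python
def capture_utterance(frame_levels, threshold):
    """Return the frames from the start point (Fh) to the end point (Ft).

    frame_levels: per-frame signal levels, one per frame (e.g. 5 ms each).
    The start point is the first frame whose level exceeds `threshold`.
    ASSUMPTION: the utterance ends when the level first drops back to
    `threshold` or below; the text does not state the end-point rule.
    """
    start_flag = False   # plays the role of start flag 42b
    buffer = []          # plays the role of voice data buffer 42c
    for level in frame_levels:
        if not start_flag:
            if level > threshold:
                start_flag = True
                buffer.append(level)
        elif level > threshold:
            buffer.append(level)
        else:
            break        # end point reached
    return buffer
```

A practical detector would usually require several quiet frames before declaring the end point, so that short pauses inside a word do not cut it off.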

[0022] The memory 42 further includes a time table 42d having a dedicated setting area for each number of the standard pattern table 42a. An "extension time", determined specifically for each standard pattern set in the standard pattern table 42a, is set in the time table 42d. The extension time is the time by which the on-time of the voice input switch 30 should be extended. For example, when one control action is achieved by a series of two or more voices, the voice input switch 30 must be kept on after the earlier voice has been recognized; the extension time indicating how long the on-time should be extended is set in the time table 42d. As described later, the time read from the time table 42d is set in an on-time timer 42e, which is likewise allocated in the memory 42.

[0023] A reject flag 42f in the memory 42 is turned on when proper recognition could not be performed (when recognition is rejected), and a reject number register 42g stores the number in the standard pattern table 42a that indicates the word rejected in this way. A reject counter 42h counts the number of rejections and is incremented each time a rejection occurs.

[0024] A re-input timer 42i in the memory 42 sets the time during which the operator is allowed to re-input a voice after a word outside the recognition targets has been input. A blink timer 42j sets the time interval at which the LED 31 blinks. The registration mode shown in FIG. 4 is set in response to operation of a registration key (not shown). In the first step S1, a registration number is set using, for example, a numeric keypad (also not shown). The registration number is a number in the standard pattern table 42a, and the standard pattern of a word to be recognized is registered under each number. To do so, the user speaks the word to be registered under that number into the microphone 32 (FIG. 1). In response, sampling of the voice input starts in step S2, and voice (parameter) data is input to the microcomputer 12 through the filter bank 34, the multiplexer 36 and the A/D converter 38 as described above. In step S3, the microcomputer 12 takes in the voice data and temporarily stores it in a buffer (not shown). In the next step S4, the microcomputer 12 determines whether the start point of the voice (corresponding to "Fh" in FIG. 3) has already been detected. If not, it determines in the following step S5 whether the voice data input in step S3 is the start point. If "NO" is determined in step S5, the process returns to step S3. If the input voice data is the start point, the microcomputer 12 sets the start flag 42b (FIG. 2) and, as when "YES" is determined in step S4, executes the next step S7. In step S7, the voice data taken in earlier is stored in the voice data buffer 42c (FIG. 2). In step S8, it is determined whether the input voice data is the end point (corresponding to "Ft" in FIG. 3). If not, the process returns to step S3. Steps S3 to S8 are repeated in this way, and the voice data from the start point to the end point is stored frame by frame in the voice data buffer 42c.

[0025] Then, in step S9, the microcomputer 12 normalizes (specifically, compresses) the data stored in the voice data buffer 42c. In step S10, the normalized voice data is saved in the area of the standard pattern table 42a corresponding to the number set in step S1. In the next step S11, an "extension time" is set in the time table 42d. That is, step S11 sets, for each word individually, the extension time indicating how long the voice-input-enabled time (described later) should be extended when the word whose standard pattern has been set in the standard pattern table 42a is input. In step S12, it is determined whether to end the registration mode, for example according to whether the registration key has been operated again. If registration is to continue, the registration number is changed in step S13 and the process returns to step S2. In this way, the standard pattern data of the words to be recognized is registered in advance in the standard pattern table 42a, and data representing the extension time applied when each word is recognized is registered in advance in the time table 42d.
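The normalization of step S9 is described only as data compression. One common way to compress an utterance of arbitrary length to a fixed-size pattern is segment averaging, sketched below; this technique, the target of 16 frames and the function name are assumptions and are not taken from the patent.

```python
def normalize_length(frames, target=16):
    """Compress a list of feature frames to `target` frames by averaging.

    frames: list of equal-length lists (one feature vector per frame).
    ASSUMPTION: segment averaging; the text says only "data compression".
    """
    n = len(frames)
    if n == 0:
        raise ValueError("no frames to normalize")
    dims = len(frames[0])
    out = []
    for i in range(target):
        lo = i * n // target
        hi = max((i + 1) * n // target, lo + 1)   # at least one frame per segment
        segment = frames[lo:hi]
        # Average each feature dimension over the frames in this segment.
        out.append([sum(f[d] for f in segment) / len(segment) for d in range(dims)])
    return out
```

A fixed pattern length makes every stored standard pattern directly comparable with every later input, regardless of how quickly the word was spoken.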

[0026] In the first step S101 of the recognition mode shown in FIG. 5, the microcomputer 12 determines, from the signal from the input port 40 (FIG. 1), whether the voice input switch 30 is operated, that is, whether the device is in a voice-input-enabled period. When it detects in step S101 that the voice input switch 30 is on, the microcomputer 12 sets, in the next step S102, a predetermined time (for example, 10 seconds) for which the on state of the voice input switch 30 is to continue in the on-time timer 42e (FIG. 2).

[0027] Steps S103, S104, S105, S106 and S108 are then executed. These steps correspond to steps S2, S3, S4, S5 and S6, respectively, described for the registration mode of FIG. 4, so their description is not repeated here. In step S107, it is determined whether the voice data input in step S104 was input within the voice-input-enabled time set in the on-time timer 42e in step S102. If "YES" is determined in step S107, the process returns to step S104. If "NO" is determined, the microcomputer 12 forces the voice input switch 30 off in step S107a and returns to step S101. In other words, if no voice is input within the predetermined time set in the on-time timer 42e after the voice input switch 30 is turned on, the microcomputer 12 turns the voice input switch 30 off, and no further recognition operation is executed.

[0028] Following step S108, steps S109 and S110 shown in FIG. 6 are executed. These steps are the same as steps S7 and S8 in the registration mode, so their description is not repeated here. In step S111, the microcomputer 12 calculates the similarity between the voice data stored in the voice data buffer 42c and each of the standard patterns registered in advance in the standard pattern table 42a. In step S112 it determines the standard pattern showing the maximum similarity, in step S113 it sets a first threshold for discriminating that similarity, and it then proceeds to step S114. The first threshold set in step S113 is relatively high: if the similarity of a perfect match is "100", the first threshold is set to, for example, "90". In step S114, it is determined whether the similarity of the standard pattern selected in step S112 exceeds the first threshold set in step S113. When the maximum similarity is greater than the first threshold, the word indicated by the standard pattern giving that maximum similarity is output as the recognition result (step S115).
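Steps S111 to S115 can be outlined as follows. The patent does not define the similarity measure, so the one used here (100 minus the scaled mean absolute difference) is purely illustrative; the function names and the full-scale value of 255 are likewise assumptions. Only the 0 to 100 scale and the example threshold of 90 come from the text.

```python
def similarity(pattern_a, pattern_b, full_scale=255):
    """Score two equal-length parameter sequences on a 0-100 scale,
    where 100 means identical. ASSUMPTION: the measure is illustrative."""
    if len(pattern_a) != len(pattern_b) or not pattern_a:
        raise ValueError("patterns must be non-empty and of equal length")
    mean_diff = sum(abs(a - b) for a, b in zip(pattern_a, pattern_b)) / len(pattern_a)
    return 100.0 * (1.0 - mean_diff / full_scale)


def match(voice, table, first_threshold=90):
    """Score every standard pattern, pick the best, and output its number
    only if the score exceeds the threshold.

    table: dict mapping pattern number -> standard pattern.
    Returns (recognized number or None, best number, best score).
    """
    scores = {number: similarity(voice, pattern) for number, pattern in table.items()}
    best = max(scores, key=scores.get)
    result = best if scores[best] > first_threshold else None
    return result, best, scores[best]
```

Returning the best candidate even on rejection keeps the information that later rejection handling needs, such as the number of the rejected word.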

[0029] In the following step S116, the extension time data is read from the area of the time table 42d whose number corresponds to that word, and the extension time is set in the on-time timer 42e in the same way as in step S102. That is, when the input voice has been identified in step S115 by a standard pattern registered in advance in the standard pattern table 42a, the on-time timer 42e is reset in step S116 so that voice input continues to be allowed, and the process returns to step S103 (FIG. 5) to wait for the next voice input. Because the voice-input-enabled time is extended whenever an input voice is recognized, the voice input switch 30 need not be operated again even when further voices are input afterwards. For example, to "fast forward" by controlling the tape deck 20 of the car audio system 10, the user need only say "fast forward", "play", "fast forward", ... "play" in succession; even in this case, the voice input switch 30 is turned on only once at the start, after which voices can be input continuously. Moreover, steps S107 and S107a disable voice input after the time set in the on-time timer 42e has elapsed, so malfunctions caused by ambient noise can be prevented.
When you want to "fast forward", "fast forward", "play",
It is only necessary to continuously input the voice by "fast forward", ... "Play", but even in this case, the voice can be continuously input only by first turning on the voice input switch 30 once. Further, in steps S107 and S107a, after the time set in the on-time timer 40e has elapsed, voice input cannot be performed, so that malfunction due to ambient noise can be prevented.
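The timer behaviour described here (set an initial input time, extend it on each recognized word, close the input window on expiry) can be sketched as below. The class and function names, the use of explicit clock values in seconds, and the sample extension times are assumptions made for illustration; they are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple


class OnTimeTimer:
    """Sketch of the on-time timer 42e, driven by explicit clock values."""

    def __init__(self) -> None:
        self.deadline: Optional[float] = None  # None: voice input switch is off

    def start(self, now: float, duration: float) -> None:
        # S102 / S116: (re)set the time during which voice input is allowed
        self.deadline = now + duration

    def is_open(self, now: float) -> bool:
        # S107 / S107a: once the set time has elapsed, force the switch off
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
        return self.deadline is not None


def run_session(
    events: List[Tuple[float, Optional[str]]],
    initial_time: float,
    extension_table: Dict[str, float],
) -> List[str]:
    """events holds (time, recognized word or None); returns the accepted words."""
    timer = OnTimeTimer()
    timer.start(0.0, initial_time)  # switch 30 turned on at t = 0
    accepted: List[str] = []
    for now, word in events:
        if not timer.is_open(now):
            break  # window closed: later input is ignored
        if word is None:
            continue  # nothing recognized: no extension
        accepted.append(word)
        timer.start(now, extension_table[word])  # S116: per-word extension
    return accepted


extensions = {"fast forward": 5.0, "play": 5.0}  # hypothetical time table 42d
events = [(2.0, "fast forward"), (6.0, "play"), (9.0, None), (20.0, "fast forward")]
print(run_session(events, 5.0, extensions))
```

In this run the first two commands are accepted because each recognition pushes the deadline forward; the command at t = 20 arrives after the window has closed and is ignored.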

[0030] Note that if step S116 is executed only when a word indicated by a specific number is recognized in step S115, that is, if the voice input period is extended only when a specific word is recognized, the possibility of malfunction due to ambient noise can be reduced further. If it is determined in step S114 (FIG. 6) that the similarity of the standard pattern selected in step S112 as showing the maximum similarity is smaller than the first threshold, the process proceeds to step S117 shown in FIG. 7. In step S117, it is determined whether the reject flag 42f is on. If the reject flag 42f is off, then in step S118 the reject flag 42f is set, the number of the rejected word (standard pattern) is stored in the reject number register 42g, and the reject counter 42h is incremented, after which the process returns to step S103 (FIG. 5).

[0031] When it is detected in step S117 that the reject flag 42f is already on, the microcomputer 12, in the next step S119, refers to the reject number register 42g and determines whether the number of the standard pattern rejected immediately before is the same as the number of the standard pattern rejected this time, that is, whether the same word has been rejected in succession. If the previously rejected word differs from the word rejected this time, that is, if the result is "NO", then in step S120 the reject number register 42g is updated with the number of the standard pattern rejected this time, the reject counter 42h is incremented, and the process returns to step S103.

[0032] If the previously rejected number and the number rejected this time are the same, that is, if "YES" is determined in step S119, the microcomputer 12 sets, in step S121, a second threshold slightly smaller than the first threshold, for example "80", and determines in step S122 whether the maximum similarity selected in step S112 (FIG. 6) exceeds the second threshold set in step S121. If the maximum similarity exceeds the second threshold, the recognition result is output on the basis of that standard pattern. If, however, the maximum similarity is equal to or less than the second threshold, the microcomputer 12 refers to the reject counter 42h in step S123 and determines whether the number of rejections has reached a predetermined number n (for example, three). If "YES" is determined in step S123, the microcomputer 12 outputs, in step S124, the number loaded in the reject number register 42g as the recognition result. If the number of rejections has not reached the predetermined number, then in step S125 the reject counter 42h is incremented, a third threshold smaller still than the second threshold, for example "70", is set, and the process returns to step S103.

[0033] In this way, when successive voice inputs are identified as the same standard pattern and are rejected in the same way, the similarity threshold is lowered step by step, so the voice can be recognized if it is input again. Therefore, even if the first threshold is initially set relatively high in order to reduce erroneous recognition as far as possible, the user is never left unable to input a voice because of continued rejection. Furthermore, once an input has been rejected in the same way a predetermined number of times (for example, three), the voice is recognized as the one identified by the standard pattern indicated by the rejected number (step S124), so repeating the same voice input several times ensures that the voice is input. In the case of sudden noises or conversation, the same word is rarely repeated, so such sounds do not cause malfunction.
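The reject handling of steps S117 to S125 can be approximated by the small state machine below. It is a simplified reading rather than the flow of FIG. 7 itself: the class name `RejectTracker` is hypothetical, the reject flag 42f is represented by whether a rejected word is stored, the count is restarted when the rejected word changes, and the count is checked after it is incremented. The thresholds 90, 80, 70 and the limit of three rejections are the example values from the text.

```python
from typing import Optional, Tuple


class RejectTracker:
    """Simplified sketch of the repeated-reject logic (steps S117 to S125)."""

    def __init__(
        self,
        thresholds: Tuple[float, ...] = (90.0, 80.0, 70.0),
        max_rejects: int = 3,
    ) -> None:
        self.thresholds = thresholds
        self.max_rejects = max_rejects
        self.last_word: Optional[str] = None  # reject number register 42g
        self.count = 0                        # reject counter 42h

    def _clear(self) -> None:
        self.last_word = None
        self.count = 0

    def submit(self, word: str, score: float) -> Optional[str]:
        """word/score: best-matching pattern and its similarity; None means rejected."""
        if score > self.thresholds[0]:        # S114 -> S115: normal recognition
            self._clear()
            return word
        if word != self.last_word:            # S118 / S120: first or different reject
            self.last_word = word
            self.count = 1                    # assumption: count restarts per word
            return None
        # Same word rejected again: apply the relaxed threshold (S121 / S122)
        level = min(self.count, len(self.thresholds) - 1)
        if score > self.thresholds[level]:
            self._clear()
            return word
        self.count += 1                       # S125
        if self.count >= self.max_rejects:    # S123 / S124: accept the repeated word
            self._clear()
            return word
        return None


tracker = RejectTracker()
for attempt in range(3):
    print(attempt + 1, tracker.submit("rewind", 60.0))
```

With a similarity of 60 the word never clears any threshold, yet the third identical rejection causes it to be accepted, which mirrors the behaviour described for step S124.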

[0034] From step S118, S120, or S125 in FIG. 7, the process returns to step S103 in FIG. 5. The input time set in step S102 remains valid at that point, so the process shown in FIG. 7 takes effect when the same voice is input again and rejected within that input time. If no voice is input again within the input time, the process ends with the voice rejected.

[0035] In another embodiment, step S201 shown in FIG. 8 is executed following step S113 shown in FIG. 6. In step S201, as in step S114, it is determined whether the maximum similarity found in step S112 exceeds the first threshold determined in step S113. If the maximum similarity does not exceed the first threshold, that is, if the input is to be rejected, the process may either move to step S117 in FIG. 7 as in the preceding embodiment or simply end.

[0036] If the maximum similarity exceeds the first threshold, the microcomputer 12 determines in step S202 whether the word giving that maximum similarity is a recognition target. That is, assuming that the embodiment of FIG. 1 has a cassette tape mode and a tuner mode, the words to be recognized are limited in advance for each mode, as shown in Table 1.

【００３７】[0037]

【表１】 [Table 1]

[0038] In this case, for example, when one of the registration numbers "1" to "5" gives the maximum similarity in the tuner mode, or when the standard pattern of one of the registration numbers "6" to "13" gives the maximum similarity in the cassette mode, the microcomputer 12 determines in step S202 that the voice input is outside the recognition targets. When it is so determined, that is, when "NO" is determined in step S202, the microcomputer 12 in step S203 notifies the user, for example by sounding a buzzer (not shown) or lighting the LED 31 (FIG. 1), that a word outside the recognition targets showed the maximum similarity and that re-input is therefore necessary. At the same time, in step S204, a predetermined time, for example 3 seconds, is set in the re-input timer 42i (FIG. 2). If no voice is input within the time set in the re-input timer 42i, the process passes through step S205, and in step S206 the microcomputer 12 determines the standard pattern that gives the maximum similarity among the recognition targets. For example, suppose that "rewind" is spoken in the cassette mode but is uttered indistinctly, so that in step S112 it is judged most similar to the standard pattern for "band change", with the standard pattern for "rewind" the next most similar. In that case, step S206 determines the word showing the maximum similarity among the recognition targets, namely "rewind", and step S207 determines, in the same manner as step S201, whether its similarity exceeds the first threshold.
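The mode-restricted check of steps S201 to S207 can be sketched as follows. Table 1 is not reproduced in this text, so the word lists and scores are hypothetical. The sketch covers only the branch in which no re-input arrives before the re-input timer expires, and it assumes that the in-mode fallback word is accepted only if it also exceeds the first threshold; what follows step S207 is not spelled out in this passage.

```python
from typing import Dict, Optional, Set, Tuple


def recognize_in_mode(
    scores: Dict[str, float],
    allowed: Set[str],
    threshold: float = 90.0,
) -> Tuple[Optional[str], bool]:
    """Return (word or None, whether the user was notified to re-input)."""
    best = max(scores, key=scores.get)          # S112: overall best match
    if scores[best] <= threshold:               # S201: ordinary rejection
        return None, False
    if best in allowed:                         # S202: word valid in this mode
        return best, False
    # S203: notify the user (buzzer or LED) that an out-of-mode word matched.
    # S206: with no re-input, take the best match among the in-mode words.
    in_mode = {w: s for w, s in scores.items() if w in allowed}
    if not in_mode:
        return None, True
    fallback = max(in_mode, key=in_mode.get)
    # S207: the fallback must still exceed the first threshold (assumption)
    if in_mode[fallback] > threshold:
        return fallback, True
    return None, True


cassette_words = {"rewind", "play"}  # hypothetical cassette-mode vocabulary
scores = {"band change": 95.0, "rewind": 92.0, "play": 40.0}
print(recognize_in_mode(scores, cassette_words))
```

Here "band change" scores highest but is not a cassette-mode word, so the user is notified and the next-best in-mode word, "rewind", is selected.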

[0039] If, however, a voice is input within the time set in the re-input timer 42i, the operation from step S103 in FIG. 5 is executed and the re-input voice is evaluated. Next, a modification of the registration mode shown in FIG. 4 will be described with reference to FIG. 9. This modification is a registration method for the case of achieving so-called "multi-function" operation, in which a single key or switch is given a plurality of functions, as shown in Table 2.

【００４０】[0040]

【表２】 [Table 2]

[0041] To achieve such a multi-function effect, two or more voices must be registered in advance for a single indication, but these are difficult to tell apart and have therefore been a cause of erroneous registration and erroneous recognition. To solve this problem, the embodiment shown in FIG. 9 uses a specific indication to inform the user when the voice being registered is for controlling a device controlled by two or more voices, thereby reducing erroneous registration and erroneous recognition. Specifically, in step S301, the microcomputer 12 determines whether different words are to be registered for each mode on a single switch, as with "1/AMSS" and "2/RPT" shown in Table 2. For example, the "1/AMSS" switch is used to set channel 1 of AM broadcasting in the AM radio mode, to set channel 1 of FM broadcasting in the FM radio mode, and to set cueing in the cassette tape mode. In this case, therefore, "YES" is determined in step S301. If that is not the case, the microcomputer 12 keeps the LED 31 (FIG. 1) constantly lit in the next step S302. If "YES" is determined, that is, if a plurality of voices are to be registered for one switch, the microcomputer 12 sets the blinking mode of the LED 31 in the next step S303. Then, in step S304, it is determined whether three or more voices must be registered for one switch, as with "1/AMSS". If only two voice registrations are needed for one switch, that is, if "NO" is determined, the microcomputer 12 sets a first timer time in the blinking timer 42j (FIG. 2) in step S305; conversely, if "YES" is determined, the microcomputer 12 sets a second timer time in step S306. The first and second timer times are predetermined so that the blinking speed or interval of the LED 31 differs. By observing the lighting state of the LED 31 (constantly lit, blink 1, or blink 2), the user can therefore register a voice pattern suited to each mode, and erroneous registration can be eliminated.
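The LED decision of steps S301 to S306 reduces to a mapping from the number of voices a switch needs to one of three indications. The sketch below illustrates this; the enum, the function name, and the idea of passing the count as an integer are assumptions, and the actual blink timings set in timer 42j are not given in the text.

```python
from enum import Enum


class LedMode(Enum):
    STEADY = "constantly lit"
    BLINK_1 = "blink 1 (first timer time)"
    BLINK_2 = "blink 2 (second timer time)"


def led_mode(registrations_needed: int) -> LedMode:
    """Choose the LED 31 indication for a switch during registration."""
    if registrations_needed < 1:
        raise ValueError("a switch needs at least one registered voice")
    if registrations_needed == 1:   # S301 "NO" -> S302: single-function switch
        return LedMode.STEADY
    if registrations_needed == 2:   # S304 "NO" -> S305: two voices to register
        return LedMode.BLINK_1
    return LedMode.BLINK_2          # S304 "YES" -> S306: three or more voices


# "1/AMSS" serves the AM, FM and cassette modes, so it needs three voices.
print(led_mode(3).value)
```

For the "1/AMSS" example in the text, which has a different function in each of three modes, the sketch selects the second blink pattern.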

[0042] In the embodiments described above, the voice input switch 30 is provided to allow voice input. However, instead of providing such a dedicated switch, the voice-input-enabled state may be set by a voice input such as "nyuryoku" ("input").

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is an illustrative view showing the memory of FIG. 1 in more detail.

FIG. 3 is a waveform diagram showing the beginning and end of a recognized voice.

FIG. 4 is a flowchart showing the registration mode in the embodiment of FIG. 1.

FIG. 5 is a flowchart showing a part of the recognition mode in the embodiment of FIG. 1.

FIG. 6 is a flowchart showing a part of the recognition mode in the embodiment of FIG. 1.

FIG. 7 is a flowchart showing a part of the recognition mode in the embodiment of FIG. 1.

FIG. 8 is a flowchart showing a modification of the recognition mode in the embodiment of FIG. 1.

FIG. 9 is a flowchart showing a modification of the registration mode in the embodiment of FIG. 1.

[Explanation of symbols]

10 ... Car audio system
12 ... Microcomputer
14 ... Audio section
16 ... Stereo sound source
30 ... Voice input switch
31 ... LED
32 ... Microphone
34 ... Filter bank
36 ... Multiplexer
38 ... A/D converter
42 ... Memory
42a ... Standard pattern table
42c ... Voice buffer
42d ... Time table

Continuation of the front page
(72) Inventor: Koji Araki, c/o Sanyo Electric Co., Ltd., 2-18 Keihan Hondori, Moriguchi City, Osaka Prefecture
(72) Inventor: Koji Dejima, c/o Sanyo Electric Co., Ltd., 2-18 Keihan Hondori, Moriguchi City, Osaka Prefecture

Claims

[Claims]

1. A voice recognition device comprising pattern creating means for analyzing a voice input from a microphone to create a voice pattern, and recognition means for recognizing the voice by comparing the voice pattern with a standard pattern registered in advance, the voice recognition device further comprising: time setting means for setting an input time during which voice input from the microphone is allowed; and extension means for extending the input time when a voice is recognized by the recognition means within the input time set by the time setting means.

2. A voice recognition device comprising: pattern creating means for analyzing a voice input from a microphone to create a voice pattern; standard pattern setting means in which a plurality of voice patterns are set in advance as standard patterns; selecting means for comparing the voice pattern created by the pattern creating means with each of the standard patterns set in the standard pattern setting means and selecting the standard pattern showing the highest similarity; and determination means for recognizing the voice by the standard pattern selected by the selecting means when the similarity of that standard pattern is larger than a predetermined threshold and rejecting the voice when the similarity is smaller than the predetermined threshold, the voice recognition device further comprising means for, when the same voice is rejected a plurality of times in succession by the determination means, using that voice as a recognition result.

3. A voice recognition device comprising: pattern creating means for analyzing a voice input from a microphone to create a voice pattern; standard pattern setting means in which a plurality of voice patterns are set in advance as standard patterns; selecting means for comparing the voice pattern created by the pattern creating means with each of the standard patterns set in the standard pattern setting means and selecting the standard pattern showing the highest similarity; and determination means for recognizing the voice by the standard pattern selected by the selecting means when the similarity of that standard pattern is larger than a predetermined threshold and rejecting the voice when the similarity is smaller than the predetermined threshold, the voice recognition device further comprising means for stopping the output of the recognition result from the determination means when the selected standard pattern is outside the recognition targets.

4. A voice recognition device in which a voice pattern obtained by analyzing a voice input from a microphone is registered in advance as a standard pattern, the voice recognition device comprising display means for indicating the mode to be registered when standard patterns of different voices are registered for one item in different modes.