JPH0675588A

JPH0675588A - Speech recognition device

Info

Publication number: JPH0675588A
Application number: JP4228109A
Authority: JP
Inventors: Ryosuke Hamazaki (濱崎 良介)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-08-27
Filing date: 1992-08-27
Publication date: 1994-03-18

Abstract

PURPOSE: To let a speaker utter speech at a proper intensity while recognizing the conditions under which the device is being used, in a speech recognition device that recognizes input speech signals through pattern matching. CONSTITUTION: Prior to utterance, the environment in which the device is being used is notified to the speaker by an environment notifying means 60 based on the noise level detected by a noise detection means 28; during utterance, the utterance condition in that environment is notified by the means 60 based on the speech level and the noise level. For example, the noise level prior to utterance is compared with a threshold value. When the noise level does not exceed the threshold value, the speaker is notified that the noise level is within a proper range; when it exceeds the threshold value, the speaker is notified that the noise level is not within the proper range. Moreover, the speech level during utterance is compared with a prescribed threshold value. When the speech level is at or above the threshold value, the speaker is notified that the speech level is within a proper range; when it is below the threshold value, the speaker is notified that the speech level is not within the proper range.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition device that receives a speech signal and recognizes it by pattern matching, and more particularly to a speech recognition device that performs speech recognition while letting the speaker recognize the state of the environment in which the device is used. Speech recognition devices are used in a wide variety of environments, and even when the noise level is high, recognition is sometimes possible if the speaker utters at an intensity matched to the noise. However, it is difficult for the speaker to judge how noisy the usage environment is and how strongly to speak, and an improvement on this point is desired.

[0002]

[Prior Art] A speech recognition device may actually be used in a variety of environments. For example, when recognition is performed where a high level of noise is present, the speaker must speak loudly enough to match the noise level. However, because the speaker does not correctly perceive how much noise is present in the usage environment, there are many cases in which the speech level is insufficient and recognition fails.

[0003] Possible reasons for this are: (1) the physical noise level and the psychological (perceived) noise level do not coincide, so the speaker pays little attention to the utterance environment; (2) the speaker cannot intuitively grasp how much noise the speech recognition device can tolerate; and (3) the speaker cannot judge whether the level of his or her own voice is appropriate.

[0004] To address this problem, Japanese Patent Laid-Open No. 63-603795, for example, assumes that a recognition error is caused by a low SN ratio and, when one occurs, gives the speaker an instruction about the utterance level, such as "Please speak a little louder." Japanese Patent Laid-Open No. 3-9400 detects the noise level, instructs the speaker how loudly to speak according to the magnitude of the noise level, and additionally switches, according to the noise level, between dictionary memories holding a low-noise template and a high-noise template prepared in advance for use in speech recognition.

[0005]

[Problems to Be Solved by the Invention] However, such conventional speech recognition devices merely instruct the speaker to speak more strongly after recognition has failed, or merely instruct the speaker how loudly to speak according to the magnitude of the SN ratio. In either case, recognition is executed as-is regardless of the actual input speech level, so the speaker cannot tell whether the utterance level was appropriate until recognition succeeds or fails.

[0006] That is, even after being told to speak more strongly, the speaker cannot tell at the time of utterance whether the level actually uttered is appropriate; if recognition succeeds, the speaker learns only after the fact that the level was appropriate. If recognition fails, the speaker learns only from the failure that the utterance level was not appropriate for the environment, and must then speak more strongly and wait for the result again, so speech input takes time.

[0007] Thus, with the conventional devices, the speaker cannot adequately learn the environmental conditions that favor the speech recognition device and actively control the strength of his or her own utterance. As a result, a speech level appropriate to the usage environment is hard to obtain, recognition failures occur frequently, and recognition cannot be performed efficiently. An object of the present invention is to provide a speech recognition device that lets the speaker adequately recognize the environment in which the device is used and speak at an appropriate strength.

[0008] Another object of the present invention is to provide a speech recognition device that informs the speaker of the usage environment of the device based on the noise level before utterance and the speech level during utterance. Another object of the present invention is to provide a speech recognition device that informs the speaker of the magnitude of the noise level before utterance and also informs the speaker whether the speech level is within a proper range.

[0009] Another object of the present invention is to provide a speech recognition device that informs the speaker of the magnitude of the noise level before utterance and informs the speaker whether the SN ratio during utterance is within a proper range. Another object of the present invention is to provide a speech recognition device that informs the speaker whether conditions are within the proper range by means of two kinds of indicator lamps, a message display, or a voice message.

[0010] Another object of the present invention is to provide a speech recognition device that, when the usage environment of the device is not within the proper range, inhibits speech recognition of the speech input and prompts the speaker to input speech again.

[0011]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention. The speech recognition device of the present invention first comprises a speech input means 12 that converts the speaker's voice into an electric signal and inputs it, and a speech recognition means 14 that analyzes the speech signal input by the speech input means 12 to obtain an input speech pattern, recognizes the speech content by matching that pattern against standard patterns registered in advance in a dictionary 20, and outputs or displays the recognition result.

[0012] The speech recognition device of the present invention is further characterized by comprising: a noise input means 26 that converts background noise in the usage environment of the device into an electric signal and inputs it; a noise level detection means 28 that detects the noise level of the noise signal input by the noise input means 26; a speech level detection means 34 that detects the speech level of the speech signal input by the speech input means 12; and an environmental state notifying means 60 that, before utterance, informs the speaker of the usage environment of the device based on the noise level detected by the noise level detection means 28 and, during utterance, informs the speaker of the utterance conditions in that environment based on the speech level and the noise level.

[0013] Here, the environmental state notifying means 60 indicates to the speaker whether the noise level before utterance is within the proper range, and also indicates whether the speech level during utterance is within the proper range. Specifically, the noise level before utterance is compared with a predetermined threshold; when it is at or below the threshold, the speaker is informed that the noise level is within the proper range, and when it is greater than the threshold, the speaker is informed that it is not.

[0014] At the same time, the speech level during utterance is compared with a predetermined threshold; when it is at or above the threshold, the speaker is informed that the speech level is within the proper range, and when it is below the threshold, the speaker is informed that it is not. The environmental state notifying means 60 also has a first speech threshold, used when the noise level before utterance is at or below a predetermined noise threshold, and a second speech threshold, larger than the first speech threshold, used when the noise level before utterance is greater than the noise threshold.

[0015] In this case, when the noise level before utterance is at or below the noise threshold, the speech level during utterance is compared with the first speech threshold; when it is at or above the first speech threshold, the speaker is informed that the speech level is within the proper range, and when it is below the first speech threshold, the speaker is informed that it is not. When the noise level before utterance is greater than the noise threshold, the speech level during utterance is compared with the second speech threshold; when it is at or above the second speech threshold, the speaker is informed that the speech level is within the proper range, and when it is below the second speech threshold, the speaker is informed that it is not.
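The threshold selection described above can be sketched as a single function. This is only an illustrative sketch: the function name, parameter names, and the numeric levels in the usage note are assumptions chosen for the example, and the patent does not state units for the levels.

```python
def speech_level_ok(noise_level: float,
                    speech_level: float,
                    noise_threshold: float,
                    speech_threshold_1: float,
                    speech_threshold_2: float) -> bool:
    """Return True when the speech level is within the proper range.

    All names are hypothetical. The comparison operators follow the text:
    noise at or below the noise threshold selects the first speech threshold,
    otherwise the larger second speech threshold is used.
    """
    if not speech_threshold_1 < speech_threshold_2:
        raise ValueError("the second speech threshold must exceed the first")
    # Quiet surroundings: the lower (first) threshold applies.
    # Noisy surroundings: the higher (second) threshold applies.
    if noise_level <= noise_threshold:
        threshold = speech_threshold_1
    else:
        threshold = speech_threshold_2
    return speech_level >= threshold
```

For example, with a noise threshold of 50, a first speech threshold of 60, and a second of 70 (arbitrary units), a speech level of 62 is accepted when the noise level is 40 but rejected when the noise level is 55.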

[0016] The environmental state notifying means 60 also obtains, during utterance, the SN ratio of the speech level to the noise level and informs the speaker whether the SN ratio is within the proper range. Specifically, the SN ratio calculated during utterance is compared with a predetermined threshold; when it is at or above the threshold, the speaker is informed that the speech level is within the proper range, and when it is below the threshold, the speaker is informed that it is not.
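The SN ratio check can be sketched as follows. The patent does not say how the SN ratio is computed; this sketch assumes both levels are positive RMS amplitudes on a linear scale and expresses their ratio in decibels. The function names and the threshold value in the usage note are hypothetical.

```python
import math


def sn_ratio_db(speech_rms: float, noise_rms: float) -> float:
    """SN ratio in dB, assuming linear RMS amplitudes (an assumption)."""
    if speech_rms <= 0 or noise_rms <= 0:
        raise ValueError("RMS levels must be positive")
    return 20.0 * math.log10(speech_rms / noise_rms)


def sn_ratio_ok(speech_rms: float, noise_rms: float,
                sn_threshold_db: float) -> bool:
    """True when the SN ratio is at or above the threshold."""
    return sn_ratio_db(speech_rms, noise_rms) >= sn_threshold_db
```

With a hypothetical threshold of 15 dB, speech at ten times the noise amplitude (20 dB) passes, while speech at 2.5 times the noise amplitude (about 8 dB) does not.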

[0017] Further, a first indicator lamp that lights to indicate that conditions are within the proper range and a second indicator lamp that lights to indicate that they are not are provided. A green lamp is used as the first indicator lamp and a red lamp as the second indicator lamp. There may be only one indicator lamp, or three or more. Of course, the speaker may instead be informed by a message display or a voice message.

[0018] Furthermore, the environmental state notifying means 60 is characterized in that, when it detects during utterance that the usage environment is not within the proper range, it instructs the speech recognition means 14 to inhibit recognition processing and prompts the speaker to input speech again.
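The inhibit-and-reprompt behavior can be sketched as a small gate placed in front of the recognizer. The callables `recognize` and `prompt`, the function name, and the prompt wording are all hypothetical stand-ins; the patent describes the behavior, not an interface.

```python
from typing import Callable, Optional, Sequence


def recognize_if_proper(samples: Sequence[float],
                        in_proper_range: bool,
                        recognize: Callable[[Sequence[float]], str],
                        prompt: Callable[[str], None]) -> Optional[str]:
    """Run recognition only when the environment check passed.

    When the check failed, recognition is skipped entirely and the speaker
    is asked to speak again; None signals that no result was produced.
    """
    if not in_proper_range:
        prompt("Please speak again.")  # hypothetical wording
        return None
    return recognize(samples)
```

Skipping the recognizer avoids spending time on an utterance that is likely to be misrecognized, which is the efficiency gain the text attributes to this arrangement.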

[0019]

[Operation] With the speech recognition device of the present invention configured in this way, before the speaker utters toward the device, the device informs the speaker of the environmental conditions in which it is placed, that is, the state of the background noise. It also checks the speech level during utterance, judges from the noise level and the speech level whether recognition is possible for the device, and continuously feeds this information back to the speaker.

[0020] The speaker can therefore easily judge both the conditions in which the speech recognition device is placed and the state of his or her own utterance, and can speak in a manner suited to the usage environment. In addition, when the judgment indicates that recognition is likely to fail, recognition processing is locked out and the speaker is immediately prompted to speak again, which makes efficient recognition possible.

[0021]

[Embodiments] FIG. 2 is a block diagram showing a first embodiment of the present invention. In FIG. 2, the speech recognition side comprises a microphone 10, a speech input unit 12, an analysis unit 16, a matching unit 18, a dictionary 20, and a result display unit 22. The analysis unit 16 and the matching unit 18 for speech recognition are implemented, for example, by program control of a DSP 14, and can execute speech recognition at high speed using the dictionary 20.

[0022] Meanwhile, to inform the speaker of the noise conditions in the usage environment of the device and of the conditions during speech input, the device is provided with a microphone 24 that converts background noise into an electric signal, a noise input unit 26, a noise level calculation unit 30, a noise level determination unit 32, a speech level calculation unit 34, a speech level determination unit 36, a comprehensive determination unit 38, and a lamp display unit 40. The noise level calculation unit 30, the noise level determination unit 32, the speech level calculation unit 34, the speech level determination unit 36, and the comprehensive determination unit 38 are implemented by program control of an MPU 28. In this embodiment, the lamp display unit 40 has a green lamp (first indicator lamp) indicating that the environment is suitable for speech input and a red lamp (second indicator lamp) indicating that it is not; light-emitting diodes or the like can be used for these lamps.

[0023] Next, the operation of the embodiment of FIG. 2 is described. First, a signal representing the background noise, converted into an electric signal by the microphone 24, is input continuously from the noise input unit 26; the noise level calculation unit 30 calculates the noise level and supplies it to the noise level determination unit 32. A noise threshold THn is set in advance in the noise level determination unit 32, which compares the noise level calculated by the noise level calculation unit 30 with THn and determines whether the level exceeds THn, that is, whether the surroundings are so noisy that speech recognition is difficult.
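One way to picture the noise level calculation and the comparison with THn is sketched below. The patent does not specify how the level is computed; this sketch assumes samples normalized to the range -1 to 1 and uses the mean power of a frame expressed in decibels. The function names, the `floor` guard, and the example threshold are assumptions.

```python
import math
from typing import Sequence


def level_db(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Mean power of one frame in dB (assumed definition of 'level').

    `floor` avoids log10(0) for an all-zero frame.
    """
    if not frame:
        raise ValueError("frame must not be empty")
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, floor))


def is_too_noisy(noise_frame: Sequence[float],
                 noise_threshold_db: float) -> bool:
    """True when the noise level is greater than the noise threshold THn."""
    return level_db(noise_frame) > noise_threshold_db
```

With a hypothetical THn of -30 dB, a frame of amplitude 0.1 (about -20 dB) is judged too noisy, while a frame of amplitude 0.001 (about -60 dB) is not.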

[0024] Meanwhile, the speech signal of the uttered speech picked up by the microphone 10 is input from the speech input unit 12; the speech level calculation unit 34 calculates the speech level and passes it to the speech level determination unit 36. The speech level determination unit 36 determines whether the input speech level is greater than a predetermined threshold, that is, whether the speech is being uttered loudly enough.

[0025] In this embodiment, the speech level determination unit 36 is preset with a first speech threshold TH₁, used when the noise level is lower than the noise threshold THn, and a second speech threshold TH₂, used when the noise level is THn or higher, so that the speech level is compared against a different speech threshold depending on whether the noise level is low or high.

[0026] Of course, the speech threshold TH₁ used when the noise level is low and the threshold TH₂ used when the noise level is high satisfy TH₁ < TH₂. The determination results of the noise level determination unit 32 and the speech level determination unit 36 are supplied to the comprehensive determination unit 38, which makes a logical judgment on the two conditions, noise level and speech level, and outputs to the lamp display unit 40 a determination of whether the environment is suitable for speech recognition. The lamp display unit 40 lights the green lamp when the comprehensive determination unit 38 indicates that the usage environment is suitable for speech recognition, and lights the red lamp when it indicates that the environment is not suitable.

[0027] For example, when the noise level before utterance is at or above the noise threshold THn and the surroundings are noisy, the red lamp is lit to inform the speaker that the usage environment is not suitable for speech recognition. When the red lamp of the lamp display unit 40 lights, the speaker can therefore take corrective action, such as quieting the surroundings or moving to a quieter place.

[0028] Of course, when the noise level is lower than the noise threshold THn and the surroundings are quiet, the green lamp of the lamp display unit 40 lights to indicate that the usage environment is suitable for speech recognition, so the speaker utters while the green lamp is lit and has the speech recognized. If the speaker utters while the surroundings are quiet and the green lamp is lit, but the speech level is lower than the speech threshold TH₁ used when the noise level is low, the red lamp lights, instructing the speaker to speak louder. Accordingly, if the red lamp lights during utterance, the speaker need only speak louder; the speech level then reaches TH₁ or higher and the display switches to the green lamp.

[0029] On the other hand, even when the surroundings are noisy before utterance, so that the noise level is at or above the noise threshold THn and the red lamp is lit, speaking loudly switches the display to the green lamp. That is, when the noise level is at or above THn, the red lamp of the lamp display unit 40 is lit; but if the speaker then speaks loudly, the speech level determination unit 36 finds a speech level at or above the speech threshold TH₂ used when the noise level is high, which is sufficient for speech recognition. The display therefore switches to the green lamp, informing the speaker that speech of a level sufficient for recognition has been input despite the noisy surroundings.

[0030] Meanwhile, the speech signal from the speech input unit 12 is frequency-analyzed in the analysis unit 16 and converted into an input speech pattern. The input speech pattern is supplied to the matching unit 18, which matches it against the standard patterns registered in advance in the dictionary 20 and calculates the distance of the input speech pattern from each standard pattern.
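The matching step can be illustrated with a toy distance calculation. The patent does not name the feature representation or the distance measure; this sketch assumes each pattern is a fixed-length feature vector and uses Euclidean distance as a simple stand-in. A practical matcher would typically also handle differences in utterance length, for example by time alignment. All names and the dictionary contents are hypothetical.

```python
import math
from typing import Dict, Sequence


def pattern_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distances_to_dictionary(
        input_pattern: Sequence[float],
        dictionary: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Distance from the input pattern to every registered standard pattern."""
    return {word: pattern_distance(input_pattern, standard)
            for word, standard in dictionary.items()}
```

A smaller distance means the input is closer to that standard pattern, so the word with the smallest distance is the strongest recognition candidate.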

[0031] The result display unit 22 displays one or more recognition candidates as the recognition result, in ascending order of the distance calculated by the matching unit 18. When multiple recognition candidates are displayed on the result display unit 22, the candidate designated by the speaker's input operation becomes the final recognition result. FIG. 3 is a flowchart showing the environmental state determination processing, based on the noise level and the speech level, performed in the MPU 28 of FIG. 2.

[0032] In FIG. 3, step S1 first monitors for the presence of speech input; before utterance, the process proceeds to step S2 and compares the noise level with the noise threshold THn. If the noise level is at or below THn, the surroundings are quiet, so the usage environment is judged suitable for speech recognition and the green lamp is lit in step S3. If the noise level is greater than THn, the surroundings are noisy and the usage environment is judged unsuitable for speech recognition, and the red lamp is lit in step S4.

[0033] Next, during utterance, step S1 determines that speech input is present and the process proceeds to step S5, where the noise level is compared with the noise threshold THn. When the noise level is at or below THn and the surroundings are quiet, the process proceeds to step S6 and compares the current speech level with the speech threshold TH₁ used when the noise level is low. If the speech level is at or above TH₁, it is judged suitable for speech recognition and the green lamp is lit in step S7.

[0034] If, on the other hand, the speech level is lower than TH₁, the utterance is judged insufficient for speech recognition, and the red lamp is lit in step S8 to prompt the speaker to speak louder. If the noise level is greater than the noise threshold THn in step S5, the process proceeds to step S9 and compares the speech level with the speech threshold TH₂ used when the noise level is high. If the speech level is greater than TH₂, it is judged that the speaker has spoken loudly enough despite the high noise level, and the green lamp is lit in step S10.

[0035] If the speech level is lower than TH₂ in step S9, a sufficient speech level has not been obtained for the high noise level and proper speech recognition is judged impossible, so the red lamp is lit in step S11 to prompt the speaker to speak louder. FIG. 4 is a flowchart showing the speech recognition processing performed by the DSP 14 in the embodiment of FIG. 2.
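The FIG. 3 determination flow (steps S1 through S11) can be condensed into one function that returns the lamp to light. This is a sketch under assumptions: the function and constant names are hypothetical, `speech_level=None` stands for "no speech input yet", and a speech level exactly equal to TH₂ is treated as sufficient, since the flowchart text only covers the greater-than and less-than cases.

```python
from typing import Optional

GREEN = "green"  # environment suitable for recognition
RED = "red"      # environment not suitable


def lamp_for(noise_level: float,
             speech_level: Optional[float],
             thn: float, th1: float, th2: float) -> str:
    """One pass through the FIG. 3 flow; returns the lamp to light."""
    if speech_level is None:
        # S1: no speech input -> S2: judge the noise level alone.
        return GREEN if noise_level <= thn else RED       # S3 / S4
    if noise_level <= thn:
        # S5: quiet surroundings -> S6: compare with TH1.
        return GREEN if speech_level >= th1 else RED      # S7 / S8
    # S5: noisy surroundings -> S9: compare with the larger TH2.
    return GREEN if speech_level >= th2 else RED          # S10 / S11
```

Calling `lamp_for` once per measurement cycle reproduces the behavior described for the lamp display: red while the surroundings are noisy, switching to green as soon as the speaker talks loudly enough for the current noise condition.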

[0036] In FIG. 4, step S1 first checks for the presence of speech input; when speech input is present, the process proceeds to step S2 and the speech data is loaded into the analysis unit 16. In step S3, the analysis unit 16 performs frequency analysis and generates an input speech pattern. In step S4, the matching unit 18 calculates the distance between the input speech pattern and each standard pattern registered in the dictionary 20; in step S5, one or more candidates are selected in ascending order of distance; and in step S6, they are displayed on the result display unit 22.
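The candidate selection of step S5 amounts to sorting the dictionary entries by distance and keeping the closest few. In this sketch the function name, the `max_candidates` parameter, and the example words and distances are assumptions; the patent says only that one or more candidates are chosen in ascending order of distance.

```python
from typing import Dict, List, Tuple


def select_candidates(distances: Dict[str, float],
                      max_candidates: int = 3) -> List[Tuple[str, float]]:
    """Return up to `max_candidates` (word, distance) pairs, closest first."""
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return ranked[:max_candidates]
```

For instance, given distances of 0.4 for "yes", 1.2 for "no", and 0.9 for "maybe", asking for two candidates yields "yes" followed by "maybe"; the speaker would then pick the intended word from the displayed list.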

[0037] When multiple candidates are displayed on the result display unit 22, the candidate designated by the speaker's input instruction is selected as the recognition result, and the process returns to step S1 to wait for the next speech input. In the first embodiment of FIG. 2, the lamp display unit 40 uses two kinds of lamps, a red lamp and a green lamp, to indicate whether the utterance environment is suitable, but three or more indicator lamps may be used, for example by adding a yellow lamp indicating a range intermediate between red and green. Alternatively, only a single indicator lamp may be provided and lit only when the utterance environment is not suitable.

[0038] FIG. 5 is a block diagram showing a second embodiment of the present invention. This embodiment is characterized in that the SN ratio is calculated from the speech level and the noise level to determine, during utterance, whether conditions are suitable for speech recognition. In FIG. 5, the speech recognition side comprises the speech input unit 12, the analysis unit 16, the matching unit 18, the result display unit 22, and the dictionary 20, as in the first embodiment of FIG. 2. On the side that judges the usage environment for speech recognition, the microphone 24, the noise input unit 26, the noise level calculation unit 30, the noise level determination unit 32, the speech level calculation unit 34, and the lamp display unit 40 are likewise the same as in the first embodiment of FIG. 2, but this embodiment newly adds an SN ratio calculation unit 42 and an SN ratio determination unit 44.

[0039] In the first embodiment of FIG. 2, the state of the utterance environment at the time of utterance is determined by judging the background noise and the uttered voice separately and then making a logical overall determination. In the second embodiment of FIG. 5, by contrast, the SN ratio is calculated from the noise level of the background noise and the voice level of the uttered voice, so that the determination takes into account the utterance state relative to the background noise rather than simply judging the uttered voice level.

[0040] The reason for basing the determination on the SN ratio is as follows. For a voice recognition device, a high background noise level and a low voice level can each be regarded as a factor that degrades the recognition rate, but the recognition rate can also be judged by how large the voice level of the uttered voice is relative to the noise level of the background noise.

[0041] That is, even if the background noise level is somewhat high and exceeds the noise threshold of the first embodiment, uttering at a correspondingly higher voice level can be regarded as making the noise level relatively lower, so a determination based on the SN ratio judges the state of the utterance environment more appropriately. A threshold TH such as that shown in FIG. 6 is set in the SN ratio determination unit 44 of FIG. 5. FIG. 6 shows the experimentally obtained error rate of the voice recognition device (vertical axis) against the SN ratio (horizontal axis); for example, the SN ratio corresponding to an error rate of 0.5% is set as the threshold TH. Of course, the SN ratio corresponding to any desired error rate can be set as the threshold as needed.
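The selection of the threshold TH from an error-rate curve such as FIG. 6 can be sketched as follows. This is only an illustrative sketch: the function name `pick_sn_threshold`, the use of decibels, and the sample measurements are assumptions introduced here and are not taken from the patent or from FIG. 6. It also assumes that the error rate falls as the SN ratio rises.

```python
def pick_sn_threshold(curve, target_error_rate):
    """Return the smallest measured SN ratio whose error rate is at or below the target.

    curve: iterable of (sn_ratio, error_rate_percent) pairs from an experiment.
    Assumes the error rate decreases as the SN ratio increases.
    """
    for sn_ratio, error_rate in sorted(curve):
        if error_rate <= target_error_rate:
            return sn_ratio
    raise ValueError("no measured SN ratio reaches the target error rate")


# Placeholder measurements for illustration only (NOT the data of FIG. 6).
example_curve = [(0, 12.0), (5, 6.0), (10, 2.0), (15, 0.5), (20, 0.2)]

# Threshold TH for a 0.5% error rate, as in the example given in the text.
TH = pick_sn_threshold(example_curve, target_error_rate=0.5)
```

Choosing a different target error rate simply moves TH along the same curve, which corresponds to the remark that any desired error rate may be used.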

[0042] FIG. 7 is a flowchart showing how the state of the usage environment for voice recognition is determined on the basis of the SN ratio in the second embodiment of FIG. 5. In FIG. 7, the presence or absence of voice input is first determined in step S1. Before utterance, the process proceeds to step S2, where the noise level is compared with the noise threshold THn; if the noise level is low, the green lamp is lit in step S3, and if the noise level is high, the red lamp is lit in step S4.

[0043] When voice input by utterance then occurs, the process proceeds from step S1 to step S5, where the SN ratio is calculated from the noise level and the voice level at that time, and in step S6 the SN ratio is compared with the threshold TH. If the SN ratio is equal to or greater than the threshold TH, the utterance environment is determined to be suitable for voice recognition, and the green lamp is lit in step S7. If the SN ratio is below the threshold TH, the utterance environment is determined to be unsuitable for voice recognition, and the red lamp is lit in step S8.
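The flow of FIG. 7 (steps S1 to S8) can be sketched roughly as below. The patent does not state how the SN ratio is computed, so the decibel formula in `sn_ratio_db` is an assumption, as are the function names, the lamp strings, and the treatment of a noise level exactly equal to THn as "low".

```python
import math

GREEN, RED = "green", "red"


def sn_ratio_db(voice_level, noise_level):
    """SN ratio in dB from two positive amplitude levels (assumed definition)."""
    return 20.0 * math.log10(voice_level / noise_level)


def judge_environment(voice_present, noise_level, voice_level, th_n, th):
    """Return the lamp colour for one pass through the FIG. 7 flow."""
    if not voice_present:
        # S1 -> S2: before utterance, compare the noise level with THn.
        # S3 lights green when the noise is low, S4 lights red when it is high.
        return GREEN if noise_level <= th_n else RED
    # S5: compute the SN ratio from the current voice and noise levels.
    ratio = sn_ratio_db(voice_level, noise_level)
    # S6: compare with TH; S7 lights green, S8 lights red.
    return GREEN if ratio >= th else RED
```

For instance, with a hypothetical TH of 15 dB, a voice level ten times the noise level (20 dB) yields the green lamp even when the noise level itself is above THn, which is the behaviour the second embodiment is meant to capture.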

[0044] FIG. 8 is a block diagram showing a third embodiment of the present invention. The third embodiment is characterized in that a message display unit 46 and a message registration unit 48 are provided in place of the lamp display unit 40 of the first embodiment of FIG. 2; the rest of the configuration is the same as in the first embodiment of FIG. 2. Messages corresponding to the determination results of the comprehensive determination unit 38 are registered in advance in the message registration unit 48. For example, for the determination result that the noise level before utterance is high, the message "Please make your surroundings quiet" or "Please move to a quiet place" is registered. For the determination result that the voice level is lower than the voice threshold, a message such as "Please speak louder" is registered.

[0045] On the other hand, for the case where the noise level is low and the surroundings are quiet, a guidance message such as "Please input your voice" is registered. For the case where the voice level is within the appropriate range, a message such as "You are speaking at an appropriate level" may be registered. Thus, in the third embodiment, a specific guidance message corresponding to the determination result of the comprehensive determination unit 38 is displayed on the message display unit 46, so that the speaker, by making voice input in accordance with the displayed message, can always have voice recognition performed in the optimum environment.
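The message registration unit 48 behaves like a lookup table from determination results to guidance messages. A minimal sketch follows, assuming four determination results; the `Judgement` names and the `guidance_message` helper are hypothetical, and the message strings are English renderings of the examples given in the text.

```python
from enum import Enum


class Judgement(Enum):
    """Hypothetical labels for the results of the comprehensive determination unit 38."""
    NOISE_OK = "noise_ok"              # before utterance, noise level is low
    NOISE_TOO_HIGH = "noise_too_high"  # before utterance, noise level is high
    VOICE_OK = "voice_ok"              # during utterance, voice level is appropriate
    VOICE_TOO_LOW = "voice_too_low"    # during utterance, voice level is below the threshold


# Contents of the message registration unit 48 (registered in advance).
REGISTERED_MESSAGES = {
    Judgement.NOISE_OK: "Please input your voice",
    Judgement.NOISE_TOO_HIGH: "Please move to a quiet place",
    Judgement.VOICE_OK: "You are speaking at an appropriate level",
    Judgement.VOICE_TOO_LOW: "Please speak louder",
}


def guidance_message(judgement):
    """Return the message that the message display unit 46 would show."""
    return REGISTERED_MESSAGES[judgement]
```

The same table could feed a voice message player instead of a display, since only the output device changes.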

[0046] FIG. 9 is a block diagram showing a fourth embodiment of the present invention. This embodiment is characterized in that a voice message reproduction unit 50 and a voice message registration unit 52 are provided in place of the lamp display unit 40 of the first embodiment of FIG. 2. Voice messages corresponding to the determination results of the comprehensive determination unit 38 are registered in advance in the voice message registration unit 52, and their contents are the same as in the third embodiment of FIG. 8. According to the determination result of the comprehensive determination unit 38 before and during utterance, the voice message reproduction unit 50 reads the corresponding voice message from the voice message registration unit 52 and informs the speaker of the determination result through the loudspeaker 56.

[0047] FIG. 10 is a block diagram showing a fifth embodiment of the present invention. This embodiment is characterized in that, when the comprehensive determination unit 38 determines at the time of utterance that the utterance environment is not within the appropriate range, the voice recognition processing performed on the DSP 14 side is prohibited and the speaker is prompted to make a new voice input. In FIG. 10, a recognition mode lock unit 54 is newly provided in the DSP 14, following the voice input unit 12 on the voice recognition side. While the recognition mode lock unit 54 is locked, the voice recognition processing by the subsequent analysis unit 16 and collation unit 18 is prohibited. Lock and unlock control of the recognition mode lock unit 54 is based on the determination result of the comprehensive determination unit 38: the recognition mode lock unit 54 is locked when the determination result indicates that, at the time of utterance, the environment is not suitable for voice recognition. The rest of the configuration is the same as in the first embodiment of FIG. 2.

[0048] The fifth embodiment is motivated as follows. In the first embodiment of FIG. 2, the noise level of the background noise is monitored before utterance and its state is displayed, and the voice level of the uttered voice is monitored at the time of utterance and its state is displayed; however, even when the utterance situation is not appropriate for the voice recognition device, the recognition mode is entered and voice recognition is executed. This is because, even if the noise level determination unit 32 and the voice level determination unit 36 judge the situation to be unsuitable, recognition is not necessarily impossible and may still succeed.

[0049] However, when the noise level determination unit 32 and/or the voice level determination unit 36 judges the situation to be unsuitable, the uttered content more frequently fails to be recognized as the first-ranked candidate. The speaker then has to utter again or select a lower-ranked candidate from among multiple recognition candidates, and the operability of the voice recognition device declines.

[0050] Moreover, even if the lamp display unit 40 indicates that the usage environment is appropriate, the degree of recognition performance remains unclear to the speaker, who may be unable to tell at what level of background noise or uttered voice recognition becomes difficult. Therefore, when the situation at the time of utterance matches a predetermined determination result in the comprehensive determination unit 38, the recognition mode lock unit 54 is locked, the voice recognition processing by the analysis unit 16 and the collation unit 18 is not executed, and the speaker is prompted to input voice again. For example, when the noise level before utterance exceeds the noise threshold THn, the comprehensive determination unit 38 determines that a proper recognition result cannot be obtained unless the voice level at the time of utterance is equal to or greater than a predetermined value, even if that voice level is at or above the voice threshold TH₂ that applies to a high noise level. In that case the recognition mode lock unit 54 is locked, the red lamp of the lamp display unit 40 is lit, and the speaker is prompted for the next voice input.

[0051] FIG. 11 is a flowchart showing the processing operation of the fifth embodiment of FIG. 10. In FIG. 11, the processing of steps S1 to S11 is basically the same as in the first embodiment shown in FIG. 3. In addition, in the processing of FIG. 11, the flow proceeds to step S12 in two cases: when the red lamp is lit in step S8 because the voice level is too low to be within the appropriate range even though the noise level is low, and when the red lamp is lit in step S11 because the noise level is high and the utterance was not made at a sufficiently high voice level. In step S12, the recognition mode lock unit 54 of the DSP 14 is instructed to lock the recognition mode. In step S13, re-input is requested, for example by blinking the red lamp of the lamp display unit 40. Then, in step S14, the recognition mode lock unit 54 is unlocked when the voice input ends, and the process returns to step S1 to wait for the voice to be input again.
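The lock handling of steps S12 to S14 can be sketched as follows. This is a simplified sketch under stated assumptions: the class `LockableRecognizer`, the functions `voice_level_ok` and `process_utterance`, and the lamp strings are hypothetical names, and the choice between a first and a second voice threshold (`th_1`, `th_2`) depending on the noise threshold `th_n` follows the two-threshold scheme recited in claim 5 rather than the exact steps of FIG. 3, which are not reproduced in this section.

```python
class LockableRecognizer:
    """Stands in for the DSP 14 with its recognition mode lock unit 54."""

    def __init__(self, recognise_fn):
        self.locked = False
        self._recognise_fn = recognise_fn

    def recognise(self, utterance):
        # While locked, analysis and collation are skipped entirely.
        if self.locked:
            return None
        return self._recognise_fn(utterance)


def voice_level_ok(noise_level, voice_level, th_n, th_1, th_2):
    """Pick the voice threshold from the pre-utterance noise level, then compare."""
    voice_threshold = th_1 if noise_level <= th_n else th_2
    return voice_level >= voice_threshold


def process_utterance(dsp, utterance, noise_level, voice_level, th_n, th_1, th_2):
    """Handle one utterance and return (lamp_state, recognition_result)."""
    ok = voice_level_ok(noise_level, voice_level, th_n, th_1, th_2)
    if not ok:
        dsp.locked = True                        # S12: lock the recognition mode
    result = dsp.recognise(utterance)            # returns None while locked
    lamp = "green" if ok else "red (blinking)"   # S13: request re-input
    dsp.locked = False                           # S14: unlock once the input has ended
    return lamp, result
```

Because the lock is released at the end of each utterance, the next pass through step S1 starts from an unlocked state, matching the return to step S1 described above.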

[0052] The recognition mode lock unit 54 provided in the fifth embodiment can be applied in exactly the same way to the second embodiment of FIG. 5, which makes its determination by calculating the SN ratio, to the third embodiment of FIG. 8, which displays messages, and to the fourth embodiment of FIG. 9, which outputs voice messages.

【００５３】[0053]

[Effects of the Invention] As described above, according to the present invention, the speaker can recognize, before utterance, the state of the background noise in the environment in which the voice recognition device is currently being used, and therefore utters with a strength suited to the level of the background noise, which improves the recognition rate. In addition, during utterance the speaker can learn how large the uttered voice is relative to the background noise. When the background noise is high, the speaker can utter sufficiently loudly to improve the recognition rate; alternatively, if voice recognition is not feasible in the current environment, the speaker can move to a more favorable environment and utter there to improve the recognition rate.

[0054] Furthermore, for environmental conditions in which accurate voice recognition cannot be guaranteed, voice recognition is prohibited at the time of utterance and the speaker is prompted to try again. This prevents futile repetition of voice input in such an environment and allows voice recognition to be performed in a more appropriate environment, for example after moving to a different place of use.

[Brief description of drawings]

FIG. 1 is an explanatory diagram of the principle of the present invention.

FIG. 2 is a block diagram showing a first embodiment of the present invention.

FIG. 3 is a flowchart showing the process of determining the device usage environment in FIG. 2.

FIG. 4 is a flowchart showing the voice recognition process of FIG. 2.

FIG. 5 is a block diagram showing a second embodiment of the present invention.

FIG. 6 is a characteristic diagram of the SN ratio versus the error rate, used to determine the threshold TH for the SN ratio determination in FIG. 5.

FIG. 7 is a flowchart showing the process of determining the device usage environment in FIG. 5.

FIG. 8 is a block diagram showing a third embodiment of the present invention.

FIG. 9 is a block diagram showing a fourth embodiment of the present invention.

FIG. 10 is a block diagram showing a fifth embodiment of the present invention.

FIG. 11 is a flowchart showing the process of determining the device usage environment in FIG. 10.

[Explanation of symbols]

10: Microphone (for voice)
12: Voice input unit (voice input means)
14: DSP (voice recognition means)
16: Analysis unit
18: Collation unit
20: Dictionary
22: Result display unit
24: Microphone (for background noise)
26: Noise input unit (noise input means)
28: MPU
30: Noise level calculation unit (noise level detection means)
32: Noise level determination unit
34: Voice level calculation unit (voice level detection means)
36: Voice level determination unit
38: Comprehensive determination unit
40: Lamp display unit
42: SN ratio calculation unit
44: SN ratio determination unit
46: Message display unit
48: Message registration unit
50: Voice message reproduction unit
52: Voice message registration unit
54: Recognition mode lock unit
56: Loudspeaker

Claims

[Claims]

1. A voice recognition device comprising: voice input means (12) for converting the voice of a speaker into an electric signal and inputting it; voice recognition means (14) for analyzing the voice signal input by the voice input means (12) to obtain an input voice pattern and recognizing the voice content by collating the input voice pattern with standard patterns registered in advance in a dictionary (20); noise input means (26) for converting background noise in the usage environment of the device into an electric signal and inputting it; noise level detecting means (28) for detecting the noise level of the noise signal input by the noise input means (26); voice level detecting means (34) for detecting the voice level of the voice signal input by the voice input means (12); and environmental condition notifying means (60) for notifying the speaker, before utterance, of the usage environment of the device based on the noise level detected by the noise level detecting means (28), and for notifying the speaker, at the time of utterance, of the utterance situation in the usage environment of the device based on the voice level and the noise level.

2. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) notifies the speaker whether or not the noise level before utterance is within an appropriate range, and notifies the speaker whether or not the voice level at the time of utterance is within an appropriate range.

3. The voice recognition device according to claim 2, wherein the environmental condition notifying means (60) compares the noise level before utterance with a predetermined threshold, notifies the speaker that the noise level is within the appropriate range when it is less than the threshold, and notifies the speaker that the noise level is not within the appropriate range when it is larger than the threshold.

4. The voice recognition device according to claim 2, wherein the environmental condition notifying means (60) compares the voice level at the time of utterance with a predetermined threshold, notifies the speaker that the voice level is within the appropriate range when it is equal to or higher than the threshold, and notifies the speaker that the voice level is not within the appropriate range when it is lower than the threshold.

5. The voice recognition device according to claim 2, wherein the environmental condition notifying means (60) has a first voice threshold used when the noise level before utterance is equal to or lower than a predetermined noise threshold, and a second voice threshold, larger than the first voice threshold, used when the noise level before utterance is higher than the noise threshold; when the noise level before utterance is equal to or lower than the noise threshold, the voice level at the time of utterance is compared with the first voice threshold, and the speaker is notified that the voice level is within the appropriate range when it is equal to or higher than the first voice threshold and that it is not within the appropriate range when it is lower than the first voice threshold; and when the noise level before utterance is higher than the noise threshold, the voice level at the time of utterance is compared with the second voice threshold, and the speaker is notified that the voice level is within the appropriate range when it is equal to or higher than the second voice threshold and that it is not within the appropriate range when it is lower than the second voice threshold.

6. The voice recognition device according to claim 2, wherein the environmental condition notifying means (60) obtains the SN ratio of the voice level to the noise level at the time of utterance, and notifies the speaker whether or not the SN ratio is within an appropriate range.

7. The voice recognition device according to claim 6, wherein the environmental condition notifying means (60) compares the SN ratio calculated at the time of utterance with a predetermined threshold, notifies the speaker that it is within the appropriate range when it is equal to or higher than the threshold, and notifies the speaker that it is not within the appropriate range when it is lower than the threshold.

8. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) comprises a first display lamp that is lit to notify that the usage environment is within the appropriate range, and a second display lamp that is lit to notify that the usage environment is not within the appropriate range.

9. The voice recognition device according to claim 8, wherein a green display lamp is used as the first display lamp for notifying that the usage environment is within the appropriate range, and a red display lamp is used as the second display lamp for notifying that the usage environment is not within the appropriate range.

10. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) comprises a first display lamp that is lit to notify that the usage environment is within the appropriate range, a second display lamp that is lit to notify that the usage environment is not within the appropriate range, and a third display lamp that is lit to notify that the usage environment is between the appropriate range and the inappropriate range.

11. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) is provided with only a display lamp that is lit to notify that the usage environment is not within the appropriate range.

12. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) notifies the speaker, by displaying a message, that the usage environment is within the appropriate range or is not within the appropriate range.

13. The voice recognition device according to claim 1, wherein the environmental condition notifying means (60) notifies the speaker, by outputting a voice message, that the usage environment is within the appropriate range or is not within the appropriate range.

14. The voice recognition device according to claim 1, wherein, when the usage environment is not within the appropriate range at the time of utterance, the environmental condition notifying means (60) instructs the voice recognition means (14) to prohibit recognition processing and prompts the speaker to input voice again.