JPH0713591A - Device and method for speech recognition

Info

Publication number: JPH0713591A
Application number: JP5150712A
Authority: JP
Inventors: Hiroaki Kokubo; 浩明小窪; Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-06-22
Filing date: 1993-06-22
Publication date: 1995-01-17

Abstract

PURPOSE: To improve recognition performance in a noisy environment even when a large vocabulary is registered, by using all of the previously registered recognition target vocabulary as objects of recognition in a less noisy environment, and by excluding part of the previously registered vocabulary from the objects of recognition in a noisy environment. CONSTITUTION: The device is equipped with a speech input part 101, an A/D conversion part 102 which quantizes the input speech obtained by the speech input part 101, an analysis part 103 which obtains feature components of the input speech, a standard pattern storage part 104 which stores feature vectors of the previously registered recognition target vocabulary, a collation part 105 which recognizes the input speech by obtaining the degree of similarity between the stored feature vectors and the feature vectors obtained by the analysis part 103, a noise state detection part 106, and a recognition vocabulary limiting part 107 which makes the number of recognition target words collated by the collation part 105 smaller than the number of previously registered recognition target words. The recognition vocabulary limiting part 107 limits the recognition target vocabulary according to the noise state detected by the noise state detection part 106.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device, and in particular to a portable speech recognition device that operates stably even under noise conditions that differ from one environment to another.

[0002]

[Prior Art] In small portable terminals, typified by mobile telephones, the number and size of operation buttons are limited by the small size of the device. For such devices, operation by handwriting or speech is therefore preferable to operation by buttons.

[0003] The problem with speech recognition is that recognition performance degrades significantly when the device is used in a noisy environment, because of ambient noise. Recognition errors are especially likely when the vocabulary is large or contains many similar words. To put a speech recognition device into practical use, noise-robust techniques that can correctly recognize speech uttered in noise are therefore indispensable.

[0004]

[Problem to Be Solved by the Invention] Correctly recognizing speech uttered in a noisy environment requires either a preprocessing method that removes the noise from noise-contaminated speech, or a recognition method that can correctly recognize speech even with noise superimposed on it. The former includes methods that remove noise with an adaptive filter, and the spectral subtraction method, which estimates the spectrum of the noise mixed into the speech and subtracts it from the input spectrum. The latter includes methods that use parameters and distance measures that are less susceptible to noise, and the noise superposition method, in which noise is superimposed on the standard patterns in advance. However, although many noise-handling methods have been proposed, their performance still cannot be called sufficient when compared with recognition performance in a quiet environment.

[0005] An object of the present invention is to provide a speech recognition device capable of improving speech recognition accuracy in a noisy environment.

[0006]

[Means for Solving the Problem] To solve the above problem, a speech recognition device according to the present invention comprises: a speech input unit that inputs speech to be recognized; an A/D conversion unit that quantizes the input speech obtained from the speech input unit; an analysis unit that obtains feature components of the input speech; a standard pattern storage unit that stores feature vectors of a recognition target vocabulary registered in advance; a matching unit that recognizes the input speech by obtaining the similarity between the feature vectors stored in the standard pattern storage unit and the feature vector obtained by the analysis unit; a noise state detection unit that detects the noise state at the time of recognition; and a recognition vocabulary limiting unit that makes the number of recognition target words matched by the matching unit smaller than the number of recognition target words registered in advance. The recognition vocabulary limiting unit limits the recognition target vocabulary according to the noise state detected by the noise state detection unit.

[0007] A speech recognition method according to the present invention is a method for a speech recognition device that performs speech recognition by matching input speech against a plurality of words registered in advance. The method detects the ambient noise state at the time of speech recognition and judges the magnitude of the detected noise. When the noise is judged to be small, speech recognition is performed using all of the registered words as recognition targets; when the noise is judged to be large, speech recognition is performed using only part of the registered words as recognition targets.

[0008]

[Operation] The vocabulary that can be recognized is registered in the speech recognition device in advance. From the user's standpoint, it is desirable that as many words as possible can be registered. In general, however, the recognition performance of a speech recognition device falls as the recognition target vocabulary grows and as the number of similar words increases. The degradation caused by similar words is particularly large in a noisy environment. Consequently, if the number of registrable words is set large, sufficient recognition performance is obtained in a noise-free environment, but in a noisy environment recognition errors are frequent and sufficient recognition performance cannot be expected.

[0009] In the present invention, therefore, when the device is used in an environment with little noise, all of the recognition vocabulary registered in advance is used as the recognition target. In a noisy environment, the noise state detection unit and the recognition vocabulary limiting unit exclude part of the registered recognition vocabulary (for example, by removing infrequently used words or similar words from the recognition target) before recognition is performed.

[0010] This operation makes it possible to improve recognition performance in a noisy environment even when many words are registered.

[0011]

[Embodiments] Embodiments of the present invention are described below. The embodiments describe an example in which the present invention is applied to a portable terminal device having a speech recognition function.

[0012] FIG. 15 shows the external appearance of the portable terminal device. In the figure, 101 is a speech input unit, 110 is a speech output unit, 111 is a display unit, 112 is a selection button, and 506 is a sensor unit. The functions of these units are described later.

[0013] FIG. 1 is a block diagram showing one embodiment of the system configuration of the speech recognition device of the present invention. In FIG. 1, 101 is a speech input unit, 102 is an A/D conversion unit, 103 is an analysis unit, 104 is a standard pattern storage unit, 105 is a matching unit, 106 is a noise state detection unit, 107 is a recognition vocabulary limiting unit, 108 is a speech segment detection unit, and 109 is a command execution unit. The speech input unit 101 is the part through which speech such as voice commands is input. The speech signal input from the speech input unit 101 is quantized by the A/D conversion unit 102, and the speech segment is detected by the speech segment detection unit 108. Speech segment detection is described in detail in, for example, Furui, "Digital Speech Processing" (Tokai University Press). A commonly used method extracts the short-time power of the input signal at fixed intervals and determines the speech segment according to whether short-time power at or above a threshold has continued for at least a fixed time. The portions other than the speech segment detected by the speech segment detection unit 201 can almost certainly be regarded as noise-only segments. The quantized speech signal of the speech segment enters the analysis unit 103. The analysis unit 103 uses an analysis method such as LPC analysis to extract the speech feature vectors used for the recognition decision. Speech feature extraction is also described in detail in Furui's "Digital Speech Processing" cited above. LPC cepstra, for example, are often used as speech feature vectors. The standard pattern storage unit 104 stores the feature vectors (standard patterns) of the vocabulary registered in advance. The matching unit 105 computes the similarity between the input speech, converted into feature vectors by the analysis unit 103, and the standard patterns stored in the standard pattern storage unit 104, and outputs the registered word with the highest similarity as the recognition result. The noise state detection unit 106 detects the state of the noise, which is one of the major factors that degrade recognition performance. The recognition vocabulary limiting unit 107 limits the recognition target vocabulary according to the degree of the noise state obtained by the noise state detection unit 106. When the noise level is sufficiently low, the recognition vocabulary limiting unit 107 naturally does nothing.
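As an illustration only, and not part of the original disclosure, the short-time power method of speech segment detection mentioned in the paragraph above might be sketched as follows. The function names, the use of non-overlapping frames, and the parameters `frame_len`, `threshold`, and `min_frames` are all assumptions made for this sketch.

```python
def short_time_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame of frame_len samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech_frames(powers, threshold, min_frames):
    """Find the first run of at least min_frames consecutive frames whose
    power is >= threshold. Returns (start, end) frame indices with end
    exclusive, or None if no such run exists."""
    start = None
    for i, p in enumerate(powers):
        if p >= threshold:
            if start is None:
                start = i  # a candidate speech run begins here
        else:
            if start is not None and i - start >= min_frames:
                return (start, i)
            start = None  # run too short, or no run in progress
    if start is not None and len(powers) - start >= min_frames:
        return (start, len(powers))
    return None
```

Frames outside the returned range would then be treated as noise-only frames, in line with the paragraph above.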

[0014] The processing flow of the system of FIG. 1 is described with reference to FIG. 12. First, when the user activates voice command input, the A/D conversion unit 102 is started and speech capture begins (S10). The speech segment of the input speech signal is then detected (S11), and the noise state is detected (S12). The noise state is compared with a predetermined threshold (S13); if the noise is judged to be small, processing proceeds to step S15. If the noise is judged to be large, the number of recognition target words is limited (S14). This limitation is carried out, for example, by attaching usage frequency information to the feature vectors (standard patterns) of the registered words stored in the standard pattern storage unit 104 and taking only the frequently used words as the recognition target vocabulary. The standard patterns of this limited vocabulary are then matched against the input speech (S15), and the recognition result is presented (S16) for the user to confirm. The result is presented through the display unit 111 or the speech output unit 110 (which includes a speech synthesis unit) shown in FIG. 15. When the user indicates that the recognition result is correct, for example by pressing a confirmation button, the command represented by the recognition result is executed (S18). If the recognition result is wrong, the processing from step S10 is performed again, for example when the user repeats the spoken instruction. When the user indicates that the recognition result is correct, the usage frequency information is also updated (S19).
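The decision logic of steps S13 to S19 described above can be illustrated with the following minimal sketch. It is not taken from the original disclosure: the `Entry` class, the toy similarity measure (negative squared Euclidean distance on a single vector), and the `keep` parameter are assumptions introduced only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    pattern: tuple      # stand-in for the stored standard pattern
    use_count: int = 0  # usage frequency information


def similarity(a, b):
    """Toy similarity (assumption): negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def select_targets(entries, noise, noise_threshold, keep):
    """S13/S14: all entries when noise is low, otherwise only the
    `keep` most frequently used entries."""
    if noise <= noise_threshold:
        return list(entries)
    return sorted(entries, key=lambda e: e.use_count, reverse=True)[:keep]


def recognize(features, targets):
    """S15: return the target entry most similar to the input features."""
    return max(targets, key=lambda e: similarity(features, e.pattern))


def confirm(entry, accepted):
    """S18/S19: on user confirmation, update the usage count and return
    the word whose command should be executed; otherwise return None."""
    if accepted:
        entry.use_count += 1
        return entry.word
    return None
```

In this sketch, a noisy input that lies slightly closer to a rarely used similar word is matched to that word when the full vocabulary is searched, but is matched to the frequently used word once the vocabulary is limited.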

[0015] The input/output sequence of this system is described with reference to FIG. 13, taking voice dialing as an example. First, when voice command input is activated, the system outputs guidance requesting a command (2001). In this example the output from the system is the spoken message "Please input a command," but it may instead be shown on a screen such as a liquid crystal display, or both may be used together. Following the guidance, the user inputs a voice command (2002). In this example the user utters "Ichikawa," the name of the person to be dialed. When the voice command is input, the system returns the recognition result to the user (2003) and asks the user to confirm it. The example shows "Ichikawa" being misrecognized as "Ishikawa." Judging the recognition result to be wrong, the user re-inputs the voice command ("Ichikawa") (2004). The system recognizes the re-input voice command and returns the recognition result ("Ichikawa" in this example) (2005). The user then notifies the system that the recognition result is correct, for example by pressing the confirmation button (2006). As a result, the system executes the recognized command and starts dialing (2007).

[0016] Next, the noise state detection unit 106 is described in detail. FIG. 2 shows a configuration example of the noise state detection unit 106.

[0017] In FIG. 2, 202 is a power calculation unit. The power calculation unit 202 receives the signal outside the speech segment from the speech segment detection unit 201 as a noise segment signal and calculates the short-time power of that signal to obtain the power of the ambient noise. The ambient noise power thus obtained is output to the recognition vocabulary limiting unit 107 as noise state information at the time of recognition. If the speech segment detection unit 108 already calculates the short-time power, the power calculation unit 202 is of course unnecessary.

[0018] FIG. 3 is a diagram for explaining another configuration example of the noise state detection unit 106.

[0019] In FIG. 3, 302 and 302' are power calculation units having the same configuration as the power calculation unit 202 of FIG. 2, and 303 is a speech-to-noise ratio (S/N) calculation unit. As in the case of FIG. 2, the input speech quantized by the A/D conversion unit 102 is separated by the speech segment detection unit 108 into segments in which speech is present and the remaining segments. The power calculation units 302 and 302' calculate the short-time power of the speech segment signal and the short-time power of the noise segment signal, respectively. If the speech segment detection unit 108 already calculates the short-time power, the power calculation units 302 and 302' are of course unnecessary. The S/N calculation unit 303 calculates the S/N of the input speech from the short-time power of the speech segment signal and the short-time power of the noise segment signal, and outputs it to the recognition vocabulary limiting unit 107 as noise state information at the time of recognition.
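A minimal sketch of the S/N calculation described above is given below for illustration. The text does not state how the ratio is expressed; the decibel scale, the function names, and the handling of zero power are assumptions of this sketch.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Ratio of speech-segment power to noise-segment power, in decibels.
    The dB scale is an assumption; the source only says that S/N is
    computed from the two short-time powers."""
    p_speech = mean_power(speech_samples)
    p_noise = mean_power(noise_samples)
    if p_noise == 0:
        return math.inf   # no measurable noise
    if p_speech == 0:
        return -math.inf  # no measurable speech
    return 10.0 * math.log10(p_speech / p_noise)
```

Note that the speech-segment power in such a calculation still contains the ambient noise, so the value is an estimate rather than a true signal-to-noise ratio.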

[0020] FIG. 4 shows the system configuration of another embodiment of the present invention, which has a noise state detection unit 408 of yet another configuration.

[0021] In FIG. 4, as in the embodiment of FIG. 1, 101 is a speech input unit, 102 is an A/D conversion unit, 103 is an analysis unit, 104 is a standard pattern storage unit, 105 is a matching unit, 406 is a recognition result confirmation unit, 107 is a recognition vocabulary limiting unit, 108 is a speech segment detection unit, and 109 is a command execution unit; in addition, 408 is a noise state detection unit. As in the embodiment of FIG. 1, the speech signal input from the speech input unit 101 is quantized by the A/D conversion unit 102 and converted into feature vectors by the analysis unit 103. The matching unit 105 computes the distance between the feature vectors output from the analysis unit 103 and the standard patterns stored in the standard pattern storage unit 104, and outputs the recognition result to the recognition result confirmation unit 406. The recognition result confirmation unit 406 outputs the result recognized by the matching unit 105 as information such as speech or text and asks the user to confirm it. Possible ways of confirming the recognition result include operating selection buttons 112 (FIG. 15) such as "Confirm" and "Cancel," confirming by voice, and restating the utterance. When the recognition result confirmation unit 406 confirms that the recognition result is correct, it outputs the recognition result to the command execution unit 109 and requests execution of the voice command. When the recognition result confirmation unit 406 judges that the recognition result is not correct, it notifies the noise state detection unit 408 that a recognition error has occurred and accepts a new recognition result. On detecting that a recognition error has occurred, the noise state detection unit 408 outputs the frequency of recognition errors to the recognition vocabulary limiting unit 107 as noise state information. The recognition vocabulary limiting unit 107 limits the recognition target vocabulary according to the degree of the noise state obtained by the noise state detection unit 408.
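One way to picture the use of recognition error frequency as noise state information is the sketch below. The text does not specify how the frequency is measured; the sliding window over recent confirmations, the class name `ErrorRateNoiseDetector`, and the window size are assumptions made for illustration.

```python
from collections import deque


class ErrorRateNoiseDetector:
    """Tracks the outcome of recent user confirmations and reports the
    fraction that were recognition errors (hypothetical design)."""

    def __init__(self, window=10):
        # Only the most recent `window` outcomes are kept.
        self.history = deque(maxlen=window)

    def report(self, correct):
        """Record one confirmation result: True if the user accepted it."""
        self.history.append(bool(correct))

    def error_rate(self):
        """Fraction of recent results that were errors (0.0 if none yet)."""
        if not self.history:
            return 0.0
        return self.history.count(False) / len(self.history)
```

The returned error rate would play the role of the noise state value passed to the recognition vocabulary limiting unit 107.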

[0022] FIG. 5 is a diagram for explaining a third embodiment of the noise state detection unit 106.

[0023] In FIG. 5, 101 is a speech input unit, 102 is an A/D conversion unit, 103 is an analysis unit, 104 is a standard pattern storage unit, 105 is a matching unit, 107 is a recognition vocabulary limiting unit, 108 is a speech segment detection unit, 506 is a sensor unit, and 507 is a noise state detection unit. As in the embodiment of FIG. 1, the speech signal input from the speech input unit 101 is quantized by the A/D conversion unit 102 and converted into feature vectors by the analysis unit 103. The matching unit 105 computes the distance between the feature vectors output from the analysis unit 103 and the standard patterns stored in the standard pattern storage unit 104, and obtains the recognition result. The sensor unit 506 measures, for example, the distance between the recognition device and the user (the user's mouth) and outputs it to the noise state detection unit 507. The distance can be measured with a distance sensor such as an infrared sensor or an ultrasonic sensor. The noise state detection unit 507 assumes that the greater the distance between the speech input unit 101 and the user, the worse the S/N of the input speech, and outputs the distance information obtained by the sensor unit 506 to the recognition vocabulary limiting unit 107 as noise state information. The recognition vocabulary limiting unit 107 limits the recognition target vocabulary according to the degree of the noise state obtained by the noise state detection unit 507. As a variation, the recognition device may be provided with a switch for selecting the recognition condition; instead of the sensor unit 506 measuring the distance, the state of the switch is detected, and the noise state detection unit 507 outputs the switch state to the recognition vocabulary limiting unit 107 as noise state information at the time of recognition. The switch may be one that toggles between, for example, a "normal mode" and a "noise mode," or one that can be set in multiple steps according to the usage environment.

[0024] The configurations for noise state detection described so far are merely examples, and any other configuration may be used as long as it provides a similar effect. Several of the configurations described above may also be used in combination.

[0025] Next, the recognition vocabulary limiting unit 107 is described in detail. FIG. 6 shows a configuration example of the recognition vocabulary limiting unit 107.

[0026] In FIG. 6, 104 is the standard pattern storage unit, 602 is an important vocabulary storage unit, and 603 is a recognition vocabulary limiting unit. The important vocabulary storage unit 602 stores those registered words in the standard pattern storage unit 104 that the user has designated in advance, such as frequently used words or important words. Important words may be designated either when the word is registered or afterwards. When the value of the noise information input from the noise state detection unit 106 exceeds a predetermined threshold, the recognition vocabulary limiting unit 603 outputs to the matching unit 105, as the recognition target vocabulary, only those registered words in the standard pattern storage unit 104 that are registered in the important vocabulary storage unit 602. With this configuration, words designated by the user are never excluded from recognition even when the recognition target vocabulary is limited in a noisy environment, so there is no need to worry about the disadvantage of important words being unrecognizable in noise.
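The important-vocabulary limitation described above amounts to a simple filter, sketched below under stated assumptions: the function name, the representation of the vocabulary as a list of word strings, and the representation of the important vocabulary as a set are all hypothetical.

```python
def limit_to_important(registered, important, noise_value, threshold):
    """Return the recognition target vocabulary.

    registered:  all registered words, in storage order
    important:   set of words designated by the user as important
    noise_value: noise information from the noise state detection unit
    threshold:   predetermined noise threshold
    """
    if noise_value > threshold:
        # Noisy: keep only the words the user marked as important.
        return [w for w in registered if w in important]
    # Quiet: every registered word remains a recognition target.
    return list(registered)
```

For example, with `registered = ["home", "office", "taxi"]` and `important = {"home", "taxi"}`, a noise value above the threshold leaves only `"home"` and `"taxi"` as recognition targets.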

[0027] As another configuration example of the recognition vocabulary limiting unit 107, a method is described in which the number of recognition target words is determined according to the noise state at the time of recognition, and the limited vocabulary is selected from the words registered in the standard pattern storage unit 104 in descending order of priority. The case in which the noise state detection unit 106 outputs the S/N value of the input speech is described here, but the other forms of the noise state detection unit 106 can be handled in the same way.

[0028] As a means of realizing such a configuration, FIG. 7 shows a recognition target vocabulary size limit table 700 used by the recognition vocabulary limiting unit 107 to limit the number of recognition target words. In FIG. 7, 701 is the S/N value of the speech input from the noise state detection unit 106, and 702 is the number of recognition target words for that S/N. By referring to the table 700 of FIG. 7, the recognition vocabulary limiting unit 107 obtains the number of recognition target words corresponding to the S/N input from the noise state detection unit 106. The recognition vocabulary limiting unit 107 then selects that number of standard patterns from the standard pattern storage unit 104 in descending order of priority.

[0029] As described above, recognition target words can be selected in descending order of priority by attaching usage frequency information to each registered word stored in the standard pattern storage unit 104 and selecting words as recognition targets in descending order of usage frequency. Using usage frequency information to limit the recognition target vocabulary in this way greatly reduces the probability that the input speech falls outside the recognition target vocabulary.
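The table lookup and frequency-ordered selection described in the two preceding paragraphs might look like the following sketch. The contents of table 700 are not given in the text, so the table rows, the `floor` value used below the lowest S/N row, and the function names are hypothetical.

```python
def target_count(snr, table, floor):
    """Look up the number of recognition target words for a given S/N.

    table: list of (minimum S/N, word count) rows, sorted by descending
           minimum S/N (hypothetical layout for table 700)
    floor: word count used when the S/N is below every row
    """
    for min_snr, count in table:
        if snr >= min_snr:
            return count
    return floor


def select_by_frequency(use_counts, snr, table, floor):
    """Select the most frequently used words, as many as the table allows.

    use_counts: dict mapping each registered word to its usage count
    """
    n = target_count(snr, table, floor)
    ranked = sorted(use_counts, key=lambda w: use_counts[w], reverse=True)
    return ranked[:n]
```

With a table such as `[(20, 4), (10, 2)]` and `floor=1`, an S/N of 25 keeps four words, an S/N of 12 keeps the two most frequently used words, and an S/N of 3 keeps only the single most frequently used word.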

[0030] As a third configuration example of the recognition vocabulary limiting unit 107, a method of attaching similar-word information to each registered word stored in the standard pattern storage unit 104 is described. As a means of realizing this method, FIG. 8 shows a similar-word table 800 for the registered words stored in the standard pattern storage unit 104.

【００３１】 In FIG. 8, 801 denotes a registered vocabulary item, 802 the similar registered vocabulary item for each registered item, and 803 the similarity between the two. In this configuration example, when a new vocabulary item is registered in the standard pattern storage unit 104, the similarity between each previously registered item and the new item is calculated, and the registered item with the highest similarity is entered in the similar word table 800 as the similar registered vocabulary, together with that similarity. The recognition vocabulary limiting unit 107 refers to the similar word table 800 and, for each pair of registered items whose similarity is at or above a given value, excludes one item of the pair from the recognition targets. For the similarity threshold, a table associating the output of the noise state detection unit 106 with a similarity threshold is prepared in advance, as in FIG. 7. Which item of a similar-word pair is kept as a recognition target can be decided by giving priority to frequently used items or to items the user has registered as important.
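A rough sketch of the similar-word table and the pair-based exclusion is given below. Several parts are assumptions for illustration only: the similarity measure is a string ratio from Python's `difflib` standing in for the comparison of standard patterns, the threshold values in `THRESHOLD_BY_NOISE` are invented, and the names `register` and `restrict` are hypothetical. For brevity the table records a similar word only for the newly registered entry.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from detected noise state to similarity threshold;
# the source only says that such a table is prepared in advance.
THRESHOLD_BY_NOISE = {"low": 1.01, "medium": 0.90, "high": 0.80}


def similarity(a, b):
    # Stand-in measure on spellings; the device would compare standard patterns.
    return SequenceMatcher(None, a, b).ratio()


def register(vocabulary, table, new_word):
    """Register new_word and record its most similar existing entry."""
    if vocabulary:
        best = max(vocabulary, key=lambda w: similarity(w, new_word))
        table[new_word] = (best, similarity(best, new_word))
    vocabulary.append(new_word)


def restrict(vocabulary, table, noise, priority):
    """Drop the lower-priority word of every pair at or above the threshold."""
    threshold = THRESHOLD_BY_NOISE[noise]
    active = set(vocabulary)
    for word, (similar, score) in table.items():
        if score >= threshold and word in active and similar in active:
            active.discard(min((word, similar), key=lambda w: priority.get(w, 0)))
    return [w for w in vocabulary if w in active]


vocabulary, table = [], {}
for w in ["ishikawa", "ichikawa", "tanaka"]:
    register(vocabulary, table, w)
priority = {"ishikawa": 12, "ichikawa": 3, "tanaka": 7}  # e.g. usage counts
print(restrict(vocabulary, table, "high", priority))
print(restrict(vocabulary, table, "low", priority))
```

With a threshold above 1.0 for the low-noise state, no pair qualifies and the full vocabulary stays active, which mirrors the behavior described for quiet environments.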

【００３２】 According to the configuration example of FIG. 8, in an environment with little noise, all pre-registered recognition vocabulary is used as recognition targets, whereas in a noisy environment the noise state detection unit and the recognition vocabulary limiting unit exclude infrequently used items and similar words from the recognition targets before recognition is performed. As a result, recognition performance in noisy environments can be improved even when a large vocabulary is registered. In addition, except when the selection switch described with reference to FIG. 5 is provided, the recognition target vocabulary is limited automatically according to the noise state, so the user need not be aware of the usage environment.

【００３３】 In the embodiments described above, vocabulary excluded from the recognition targets by the limitation cannot be recognized. As a modification of this embodiment, a method is therefore described in which the recognition target vocabulary is replaced when limiting it has resulted in an incorrect recognition. The method of confirming the recognition result has already been described for the recognition result confirmation unit 406 of FIG. 4.

【００３４】 FIG. 9 schematically shows the sets of recognition target vocabulary when recognition is performed twice. In FIG. 9, 901 is the set of all registered vocabulary, 902 is the set of first-pass recognition target vocabulary selected by the recognition vocabulary limiting unit 107, and 903 is the set of second-pass recognition target vocabulary.

【００３５】 As shown in the processing flow of FIG. 14, the first recognition under noise (S20) is performed on the vocabulary group 902 (S21, S23). Consequently, if the input word is not in the recognition target vocabulary group 902 (S24, S25), a correct recognition result cannot be obtained. When the first recognition is incorrect, the recognition vocabulary limiting unit 107 therefore limits the recognition target vocabulary again, this time drawing on the vocabulary that remains after the first-pass recognition targets 902 are removed from the full registered vocabulary 901 (S22). The example of FIG. 9 shows the case in which the number of items remaining after removing the set 902 from the full registered vocabulary 901 is larger than the number of recognition items allowed by the recognition vocabulary limiting unit 107, but the remainder may also be smaller than that number. In that case, the second recognition can use all of the remaining vocabulary; as a further variation, first-pass items that are frequently used or registered as important need not be excluded from the second-pass recognition targets. With this modification, vocabulary that was excluded by the limitation can still be recognized correctly in the second and subsequent recognition passes. It is also conceivable to lift the limit on the number of items altogether in the second pass and make all vocabulary recognition targets.
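The choice of the second-pass vocabulary can be sketched as below. This is an illustrative reading of the paragraph, not the flow of FIG. 14 itself: the function name `second_pass_vocabulary`, the `always_keep` parameter, and the assumption that `all_words` is already ordered by priority are introduced only for the example.

```python
def second_pass_vocabulary(all_words, first_pass, max_words, always_keep=()):
    """Choose the vocabulary for the retry after a failed first pass.

    all_words:   every registered word, assumed ordered by priority.
    first_pass:  words that were recognition targets in the first pass.
    max_words:   size limit imposed for the current noise state.
    always_keep: first-pass words (frequent or important) that may be
                 retained when the remainder does not fill the limit.
    """
    tried = set(first_pass)
    remaining = [w for w in all_words if w not in tried]
    selected = remaining[:max_words]           # re-limit the untried words
    spare = max_words - len(selected)          # room left under the limit
    keep = set(always_keep)
    retained = [w for w in first_pass if w in keep]
    return selected + retained[:spare]


all_words = ["a", "b", "c", "d", "e"]
print(second_pass_vocabulary(all_words, ["a", "b"], max_words=2))
print(second_pass_vocabulary(all_words, ["a", "b", "c", "d"], max_words=3,
                             always_keep=["a"]))
```

Passing `max_words=len(all_words)` together with an empty `first_pass` corresponds to the last variation in the text, where the limit is lifted and all vocabulary becomes a recognition target.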

【００３６】 Next, as a second example of recognizing vocabulary that the limitation has excluded from the recognition targets, a fourth embodiment of the present invention is described, in which multiple recognition results are presented when the recognition target vocabulary is limited using similar-word information. FIG. 10 is a system configuration diagram for this embodiment, and FIG. 11 is a schematic diagram of the recognition target vocabulary.

【００３７】 In FIG. 10, 101 is a voice input unit, 102 an A/D conversion unit, 103 an analysis unit, 104 a standard pattern storage unit, 105 a collation unit, 106 a noise state detection unit, 108 a voice section detection unit, 1007 a recognition vocabulary limiting unit, 800 a similar word table, 1009 a recognition result confirmation unit, and 109 a command execution unit. As in the embodiments of FIGS. 1 and 4, the voice signal input from the voice input unit 101 is quantized by the A/D conversion unit 102 and converted into feature vectors by the analysis unit 103. The collation unit 105 calculates the distance between the feature vectors output by the analysis unit 103 and the standard patterns stored in the standard pattern storage unit 104. At this time, the recognition vocabulary limiting unit 1007 limits the recognition target vocabulary based on the information in the similar word table 800, according to the noise state detected by the noise state detection unit 106. As described for FIG. 8, the similar word table 800 stores each registered vocabulary item paired with its similar vocabulary item. The recognition result confirmation unit 1009 outputs the result recognized by the collation unit 105 as speech, text, or similar information and asks the user to confirm it.

【００３８】 In FIG. 11, 1101 is the set of all registered vocabulary, 1102 is an example of a similar-word pair stored in the similar word table 800 of FIG. 10, and 1103 is the set of recognition target vocabulary selected by the recognition vocabulary limiting unit 1007. In the example of FIG. 11, of the similar-word pair "Ichikawa" and "Ishikawa", only "Ichikawa" is a recognition target, so the collation unit 105 recognizes "Ichikawa" even when the input from the voice input unit 101 is "Ishikawa". The recognition result confirmation unit 1009 therefore presents both the item recognized by the collation unit 105 ("Ichikawa") and the other item of its similar pair ("Ishikawa") as recognition candidates on the display unit 111 (FIG. 15) or through the voice output unit 110 (FIG. 15), and prompts the user to choose between them. To highlight the difference between the two, the non-identical parts "chi" and "shi" may be emphasized in the presentation. The candidate can be selected with a button, touch panel, or the like, or by voice input again. As a variation, the recognition result confirmation unit 1009 may omit the similar word from the candidates when the similarity of the pair for the recognized item is low. Once the recognition result is settled, the recognition result confirmation unit 1009 outputs it to the command execution unit 109 and requests execution of the voice command. Needless to say, each component may be realized by any of the methods described for the preceding embodiments.
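The candidate presentation with emphasized differences might look like the following sketch. The bracket notation, the function names `emphasize` and `candidates`, the table layout, and the similarity value 0.9 are all assumptions for illustration; `difflib.SequenceMatcher` is used only to locate the differing characters and is not the method stated in the source.

```python
from difflib import SequenceMatcher


def emphasize(word, other):
    """Wrap the parts of word that differ from other in square brackets."""
    out = []
    for tag, i1, i2, _, _ in SequenceMatcher(None, word, other).get_opcodes():
        segment = word[i1:i2]
        if tag == "equal":
            out.append(segment)
        elif segment:  # "insert" opcodes contribute nothing from word
            out.append("[" + segment + "]")
    return "".join(out)


def candidates(recognized, table, min_similarity):
    """Return the recognized word plus its similar word when similar enough."""
    similar, score = table.get(recognized, (None, 0.0))
    if similar is None or score < min_similarity:
        return [recognized]
    return [emphasize(recognized, similar), emphasize(similar, recognized)]


# Hypothetical table entry: recognized word -> (similar word, similarity).
table = {"いちかわ": ("いしかわ", 0.9)}
print(candidates("いちかわ", table, min_similarity=0.8))
```

Raising `min_similarity` above the stored value suppresses the second candidate, which corresponds to the variation in which a similar word with low similarity is not presented.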

【００３９】 According to the embodiment of FIG. 10 described above, presenting multiple recognition results when the recognition target vocabulary is limited using similar-word information makes it possible to offer as recognition candidates even vocabulary that the limitation has excluded from the recognition targets.

【００４０】[0040]

[Effects of the Invention] According to the present invention, all pre-registered recognition target vocabulary is used as recognition targets in an environment with little noise, while in a noisy environment a part of the pre-registered vocabulary is excluded from the recognition targets before recognition is performed. Recognition performance in noisy environments can thereby be improved even when a large vocabulary is registered.

[Brief description of drawings]

FIG. 1 is a block diagram showing the system configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration example of the noise state detection unit of FIG. 1.

FIG. 3 is a block diagram showing another configuration example of the noise state detection unit of FIG. 1.

FIG. 4 is a block diagram showing the system configuration of a second embodiment of the present invention.

FIG. 5 is a block diagram showing the system configuration of a third embodiment of the present invention.

FIG. 6 is a block diagram showing a configuration example of the recognition target vocabulary number limiting unit of FIG. 1.

FIG. 7 is an explanatory diagram of a recognition target vocabulary number limit table, showing another configuration example of the recognition target vocabulary number limiting unit of FIG. 1.

FIG. 8 is an explanatory diagram of a similar word table that can be used in the embodiment of FIG. 10.

FIG. 9 is an explanatory diagram of the recognition target vocabulary in the embodiment of FIG. 1 and others.

FIG. 10 is a block diagram showing the system configuration of a fourth embodiment of the present invention.

FIG. 11 is an explanatory diagram of the recognition target vocabulary in the embodiment of FIG. 10.

FIG. 12 is a flowchart showing the system processing of the embodiment of FIG. 1.

FIG. 13 is an explanatory diagram of an example input/output sequence in the embodiment of FIG. 1.

FIG. 14 is a flowchart of the system processing corresponding to the description of FIG. 9.

FIG. 15 is an external view of a portable terminal device to which the present invention is applied.

[Explanation of symbols]

101 ... voice input unit, 102 ... A/D conversion unit, 103 ... analysis unit, 104 ... standard pattern storage unit, 105 ... collation unit, 106 ... noise state detection unit, 107 ... recognition vocabulary limiting unit, 108 ... voice section detection unit, 109 ... command execution unit

Claims

[Claims]

1. A voice recognition device comprising: a voice input unit for inputting a voice to be recognized; an A/D conversion unit that quantizes the input voice obtained from the voice input unit; an analysis unit that obtains feature components of the input voice; a standard pattern storage unit that stores feature vectors of recognition target vocabulary registered in advance; a collation unit that recognizes the input voice by obtaining the similarity between the feature vectors stored in the standard pattern storage unit and the feature vectors obtained by the analysis unit; a noise state detection unit that detects the noise state at the time of recognition; and a recognition vocabulary limiting unit that makes the number of recognition target vocabulary items collated by the collation unit smaller than the number of recognition target vocabulary items registered in advance, wherein the recognition vocabulary limiting unit limits the recognition target vocabulary according to the noise state detected by the noise state detection unit.

2. The voice recognition device described, wherein the noise state detection unit detects the noise level at the time the voice recognition device is used.

3. The voice recognition device according to claim 1, wherein the noise state detection unit detects the noise state by detecting the voice-to-noise ratio of the voice input from the voice input unit.

4. The voice recognition device, wherein the noise state detection unit detects the distance between the voice input unit and the mouth of the user and detects the noise state based on the detected distance.

5. The voice recognition device described, further comprising a condition selection unit with which a user selects a usage condition, wherein the noise state detection unit detects the noise state based on the state of the condition selection unit.

6. The voice recognition device according to claim 1, further comprising an erroneous recognition determination unit that determines that the voice input from the voice input unit has not been recognized correctly, wherein the noise state detection unit detects the noise state based on the state of the erroneous recognition determination unit.

7. The voice recognition device according to claim 1, further comprising a designated vocabulary storage unit that stores vocabulary designated by a user from among the vocabulary registered in the standard pattern storage unit, wherein the recognition vocabulary limiting unit limits the recognition target vocabulary based on the vocabulary information stored in the designated vocabulary storage unit.

8. The voice recognition device according to claim 1, further comprising a similar vocabulary storage unit that stores similar-vocabulary relationships for the vocabulary registered in the standard pattern storage unit, wherein the recognition vocabulary limiting unit limits the recognition target vocabulary based on the similar vocabulary information in the similar vocabulary storage unit.

9. The voice recognition device according to claim 1, further comprising a use frequency storage unit that stores the past use frequency of the vocabulary registered in the standard pattern storage unit, wherein the recognition vocabulary limiting unit limits the recognition target vocabulary based on the frequency information in the use frequency storage unit.

10. The voice recognition device according to claim 1, further comprising a similar vocabulary storage unit that stores similar-vocabulary relationships for the vocabulary registered in the standard pattern storage unit, and a use frequency storage unit that stores the past use frequency of the vocabulary registered in the standard pattern storage unit, wherein the recognition vocabulary limiting unit limits the recognition target vocabulary based on the similar vocabulary information in the similar vocabulary storage unit and the frequency information in the use frequency storage unit.

11. The voice recognition device according to claim 1, further comprising an erroneous recognition determination unit that determines that the voice input from the voice input unit has not been recognized correctly, wherein, when the result of recognition performed with the recognition target vocabulary limited by the recognition vocabulary limiting unit is incorrect, the limitation on the recognition target vocabulary is removed and recognition is performed with all the vocabulary registered in the standard pattern storage unit as recognition targets.

12. The voice recognition device according to any one of claims 1 to 10, further comprising an erroneous recognition determination unit that determines that the voice input from the voice input unit has not been recognized correctly, wherein, when the result of recognition performed with the recognition target vocabulary limited by the recognition vocabulary limiting unit is incorrect, recognition is performed with, as recognition targets, the remaining vocabulary obtained by excluding the recognition target vocabulary selected by the recognition vocabulary limiting unit from all the vocabulary registered in the standard pattern storage unit.

13. The voice recognition device according to any one of claims 1 to 10, further comprising an erroneous recognition determination unit that determines that the voice input from the voice input unit has not been recognized correctly, wherein, when the result of recognition performed with the recognition target vocabulary limited by the recognition vocabulary limiting unit is incorrect, recognition is performed with, as recognition targets, a part of the remaining vocabulary obtained by excluding the recognition target vocabulary selected by the recognition vocabulary limiting unit from all the vocabulary registered in the standard pattern storage unit.

14. The voice recognition device according to claim 8, further comprising a presentation unit that presents a recognition result to a user, wherein, when recognition is performed with the recognition vocabulary limited by the recognition vocabulary limiting unit, the presentation unit presents, together with the recognized vocabulary, the similar word of the recognized vocabulary stored in the similar vocabulary storage unit as a recognition candidate.

15. The voice recognition device according to claim 8, further comprising a presentation unit that presents a recognition result to a user, wherein, when recognition is performed with the recognition vocabulary limited by the recognition vocabulary limiting unit, the presentation unit presents the recognized vocabulary and its similar word stored in the similar vocabulary storage unit as recognition candidates only when the similarity between the recognized vocabulary and the similar word is high.

16. The voice recognition device according to claim 15, wherein, when presenting the recognized vocabulary and the similar word, the presentation unit emphasizes the non-identical portions of the presented vocabulary.

17. A voice recognition method for a voice recognition device that performs voice recognition by collating a plurality of pre-registered vocabulary items with an input voice, the method comprising: detecting the surrounding noise state at the time of voice recognition; performing voice recognition with all of the registered vocabulary as recognition targets when the detection determines that the noise is small; and performing voice recognition with only a part of the registered vocabulary as recognition targets when the detection determines that the noise is large.