JPH06208393A

JPH06208393A - Voice recognizing device

Info

Publication number: JPH06208393A
Application number: JP5019641A
Authority: JP
Inventors: Nobuo Hagimoto; 信男萩本
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 1993-01-12
Filing date: 1993-01-12
Publication date: 1994-07-26

Abstract

PURPOSE: To restore a voice level that has been excessively reduced by subtraction processing in a voice recognition device having a noise suppression circuit. CONSTITUTION: A spectrum edit processing section 402 performs spectrum edit processing on the output of a subtraction section 404. A noise level from a noise level measuring section 403 is input to a compensation quantity setting section 500, which sets what percentage of the noise level is to be added back to the analysis data. The set compensation quantity is input to a compensation switch section 502 and is passed to a compensation quantity addition section 503 according to the result from a zero judging section 501. The zero judging section 501 judges whether the data after spectrum edit processing is zero or non-zero. If the data is zero, the compensation switch section 502 is controlled so that the compensation quantity is not input to the compensation quantity addition section 503. If the data is non-zero, it can be judged to be a voice section, so the zero judging section 501 controls the compensation switch section 502 so that the compensation quantity is input to the compensation quantity addition section 503 and the compensation is applied.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device, and in particular to a voice recognition device having a noise suppression circuit.

[0002]

[Prior Art] In recent years, systems that control equipment by means of a voice recognition device have been researched and developed and are being put to practical use. In particular, operating equipment in a vehicle cabin through a voice recognition device requires neither the hands (hands-free) nor a shift of gaze (eyes-free), which has the advantage of securing and improving driving safety during equipment operation. However, current voice recognition devices have the problem that recognition performance drops sharply in noisy environments, particularly under noise such as vehicle running noise.

[0003] FIG. 8 is a block diagram showing the configuration of a conventional voice recognition device. The device is broadly divided into a voice analysis unit 100, a voice recognition unit 200, and a dictionary unit 300. In FIG. 8, the voice input from the microphone 105 undergoes feature extraction in the voice analysis unit 100.

[0004] The voice analysis unit 100 has N channels of bandpass filters, each connected to an absolute value circuit (RECT) and a low-pass filter (LPF). The analysis data is input sequentially to an A/D converter and becomes digital data. The voice analysis result (digital data) is then output to the voice recognition unit 200 via the external bus 110.

[0005] The main reason the recognition performance of a conventional voice recognition device such as that of FIG. 8 deteriorates in a noisy environment is that mixed-in noise deforms the spectral pattern of the voice, which reduces its similarity to the standard patterns registered in advance in the dictionary 300.

[0006] In a word voice recognition device, it is known that voice section detection processing is needed to detect the start and end of an isolated spoken word and cut out the voice signal. When noise deforms the voice signal or buries it, accurate voice detection becomes impossible and detection accuracy drops, which in turn degrades recognition performance.

[0007] Furthermore, as the ambient noise rises, voice section detection fails altogether, the recognition device becomes inoperable, and no recognition result is output even when a word is uttered.

[0008] As a countermeasure against ambient noise, the conventional approach has often been to bring the microphone close to the speaker's mouth (about 5 to 10 cm) and lower the microphone sensitivity, so that sounds other than the speaker's voice are less likely to be picked up.

[0009] However, shortening the microphone distance raises the problem of how to mount the microphone. For example, the speaker must wear the microphone, or the microphone must be placed directly in front of the speaker. In such cases the microphone can get in the way depending on the application: a microphone in front of the driver's face is awkward when voice recognition is used while driving, and a body-worn microphone must be put on and taken off every time the user gets in or out of the car, which makes speaking to the device a nuisance.

[0010] For this reason, for use in a vehicle cabin the microphone is often placed several tens of centimeters away (about 30 to 50 cm), for example on the sun visor. This removes the inconvenience of the mounting method, but the microphone sensitivity must be raised by as much as the distance has increased, so noise is picked up more easily instead. What is needed is therefore a noise countermeasure in the voice recognition device itself rather than a solution based on microphone placement.

[0011] Because the power of running noise is concentrated at low frequencies, one conceivable method is to use an HPF (high-pass filter) as preprocessing for the voice recognition device to reduce part of the noise in relative terms. With this method, however, the HPF also removes some voice information, so the cutoff frequency must be chosen appropriately. Because the reduction in low-frequency noise is larger than the loss of voice information, the S/N ratio improves in relative terms.
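The trade-off described above can be illustrated with a minimal sketch that passes two test tones through a first-order high-pass filter. The filter design, the 300 Hz cutoff, the 8 kHz sampling rate, and the tone frequencies are all assumptions chosen for illustration; the text does not specify any of them.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter (an illustrative choice, not from the source)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

FS = 8000                      # assumed sampling rate in Hz
low = tone(50, FS, FS)         # stand-in for low-frequency running noise
high = tone(1000, FS, FS)      # stand-in for a voice-band component

# Ratio of output level to input level for each tone.
low_gain = rms(high_pass(low, 300, FS)) / rms(low)
high_gain = rms(high_pass(high, 300, FS)) / rms(high)
print(f"50 Hz gain: {low_gain:.2f}, 1000 Hz gain: {high_gain:.2f}")
```

The low-frequency tone is attenuated far more than the voice-band tone, which is the relative S/N improvement the paragraph describes. Raising the cutoff would attenuate the noise further but would also start to remove the voice-band component.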

[0012] Running-noise countermeasures based on an HPF are described, for example, in "Noise-Resistant Speech Recognition System" by Hamada and Takizawa (Matsushita Electric), Shingaku Giho (IEICE Technical Report) SP89-105. That noise-resistant voice recognition system has two drawbacks. First, if the cutoff frequency is set high in order to reduce noise substantially, the loss of voice information also grows and recognition performance drops, so there is a limit to the cutoff frequency setting and hence to the noise reduction. Second, noise also exists in bands other than the noise components suppressed by the HPF, but no countermeasure is taken for that noise, so a dramatic improvement in recognition performance cannot be expected. The system is therefore no more than one practical technique used as preprocessing.

[0013] As described above, the voice analysis data used for voice recognition is heavily deformed when noise is mixed in, which degrades recognition performance. Moreover, suppressing low-frequency noise with an HPF alone (or with no HPF at all) cannot suppress the noise components inside the voice analysis band without losing voice information.

[0014]

[Problems to Be Solved by the Invention] As described above, the voice analysis data from the voice analysis unit 100 is output to the voice recognition unit 200. To solve the above problems, a voice recognition device has been proposed in which a noise countermeasure processing unit is provided between the voice analysis unit 100 and the voice recognition unit 200, and the voice analysis data deformed by noise undergoes waveform shaping (hereinafter referred to as spectrum edit (Spectrum-Edit)), thereby improving noise resistance (voice recognition rate).

[0015] <Proposed voice recognition device> FIG. 9 is a block diagram showing a configuration example of the proposed voice recognition device. In FIG. 9, an environmental noise subtraction unit 401 and a voice waveform shaping unit 402 are shown as the noise countermeasure processing unit 400. The voice waveform shaping unit 402 may or may not be included in the noise countermeasure processing unit 400, and its processing is independent of the environmental noise subtraction unit 401.

[0016] FIG. 3 shows an example of the analysis result, with 7 channels in the voice analysis unit 100, for the noisy utterance "Sapporo" input from the microphone 105 of the voice recognition device of FIG. 8 (in FIGS. 3 to 7, CH1 to CH7 denote channel numbers).

[0017] In the device of FIG. 8, the environmental noise subtraction unit 401 measures the average noise level for the analysis data shown in FIG. 3 and subtracts the noise portion from the original data, thereby performing spectrum subtraction (Spectrum-Subtraction). FIG. 4 shows the resulting output waveform of the utterance "Sapporo" for each channel.
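The subtraction step can be sketched as follows, under two assumptions that the text does not state: the average noise level of each channel is estimated from a few leading frames known to contain noise only, and negative results are floored at zero. The function names and sample values are hypothetical.

```python
def average_noise_level(channel, noise_frames):
    """Mean of the first `noise_frames` values, assumed to contain noise only."""
    return sum(channel[:noise_frames]) / noise_frames

def spectrum_subtract(channels, noise_frames):
    """Subtract each channel's average noise level, flooring the result at zero."""
    results = []
    for channel in channels:
        gamma = average_noise_level(channel, noise_frames)
        results.append([max(0.0, v - gamma) for v in channel])
    return results

# Two hypothetical channels: CH1 carries more low-frequency noise than CH2.
ch1 = [4.0, 4.0, 4.0, 4.0, 9.0, 10.0, 9.0, 4.0]
ch2 = [1.0, 1.0, 1.0, 1.0, 6.0, 7.0, 6.0, 1.0]
subtracted = spectrum_subtract([ch1, ch2], noise_frames=4)
print(subtracted[0])  # [0.0, 0.0, 0.0, 0.0, 5.0, 6.0, 5.0, 0.0]
print(subtracted[1])  # [0.0, 0.0, 0.0, 0.0, 5.0, 6.0, 5.0, 0.0]
```

In this toy data the noise adds linearly to the voice, so subtraction recovers it exactly. Real analysis data does not behave so cleanly, which is why the noisier channel tends to end up with too little voice left after subtraction.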

[0018] The voice waveform shaping unit 402 of FIG. 9 detects ranges where no voice exists, or silent sections of the voice, and replaces those portions with zero data (spectrum edit processing). FIG. 5 shows the resulting waveform of the utterance "Sapporo" for each channel after spectrum edit processing.
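A minimal sketch of spectrum edit processing follows. The text does not say how silent sections are detected, so the rule used here (keep only runs of at least `min_run` consecutive frames at or above `threshold`, and zero everything else) is an assumption, as are the function name and sample values.

```python
def spectrum_edit(channel, threshold, min_run):
    """Zero every frame that is not part of a sufficiently long above-threshold run."""
    data = list(channel)
    edited = [0.0] * len(data)
    start = None
    # One extra iteration past the end closes a run that reaches the last frame.
    for i in range(len(data) + 1):
        above = i < len(data) and data[i] >= threshold
        if above:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                edited[start:i] = data[start:i]
            start = None
    return edited

# Hypothetical channel after subtraction: residual noise around one voiced run.
residual = [0.0, 0.6, 0.0, 0.0, 5.0, 6.0, 5.0, 0.3, 0.0, 0.8]
edited = spectrum_edit(residual, threshold=1.0, min_run=2)
print(edited)  # [0.0, 0.0, 0.0, 0.0, 5.0, 6.0, 5.0, 0.0, 0.0, 0.0]
```

After this step, every non-zero frame can be treated as voice, which is what makes a simple zero or non-zero test usable later on.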

[0019] However, in the proposed voice recognition device, when the average noise level is high the voice level after subtraction becomes lower than it would have been without noise, which causes problems for voice section detection in the downstream voice recognition unit 200. The reason is that the voice section detection algorithm of a word voice recognition device is generally configured to operate according to parameters determined by cut-and-try (Cut-And-Try) or from a large amount of data, so its behavior depends heavily on how the peaks and valleys of the voice pattern appear.

[0020] For this reason, in order for voice section detection to be performed accurately, it is desirable that the shapes and levels of the peaks and valleys of the voice pattern not change as a result of subtraction (Subtraction) processing. In other words, since the ideal is for the noise to be suppressed in the data after subtraction, the ultimate goal is that the voice remaining after processing be the same as the voice in the absence of noise.

[0021] However, when the noise level is high, too much is subtracted, as described above, and the level falls below that of the original voice, so that voice section detection becomes unstable and the similarity of the voice pattern to the dictionary drops. If, to avoid this drop in voice level, the subtraction amount is set to a value smaller than the average noise level, the noise cannot be suppressed sufficiently.

[0022] FIGS. 6 and 7 are examples of such a case. FIG. 6 shows the waveform of the voice data after subtraction, and FIG. 7 shows the waveform of the analysis data of the utterance "Sapporo" without noise. Comparing the waveforms of FIGS. 6 and 7, the amplitude of the CH (channel) 1 data in FIG. 7 is seen to be small. This is because the input noise simulates running noise, whose power is concentrated at low frequencies. The noise level in channel 1, the low-frequency channel, is therefore high, and as a result the amount subtracted is large.

[0023] The present invention has been made in view of the above problems, and its object is to provide a voice recognition device having a noise suppression circuit that can restore a voice level that has been excessively reduced by subtraction processing.

[0024]

[Means for Solving the Problems] To achieve the above object, the voice recognition device according to the first invention comprises voice analysis means for converting a voice captured by voice input means into first voice analysis data, noise detection means for detecting the noise component from the voice analysis data to obtain a noise component signal, subtraction means for obtaining second voice analysis data by subtracting the noise component from the voice analysis data, and voice recognition means for performing recognition processing on the voice on the basis of the second voice analysis data to obtain a voice recognition result, and is characterized by having voice section detection means for detecting a voice section of the second voice analysis data to obtain a detection signal, and correction processing means for applying a predetermined amount of the noise component signal to the second voice analysis data to perform correction processing on the second voice analysis data.

[0025] The second invention is the voice recognition device according to the first invention, further having spectrum edit means for performing spectrum edit processing on the second voice analysis data to obtain a spectrum edit signal, characterized in that the voice section analysis means is zero section detection means for detecting a zero section or a non-zero section of the spectrum edit signal to obtain a detection signal, and the correction processing means has correction amount setting means for obtaining a predetermined amount of the noise component signal from the noise component signal, synthesizing means for synthesizing the spectrum edit signal and the predetermined amount of the noise component signal, and control means for controlling the synthesis processing on the basis of the detection signal from the zero section detection means.

[0026] The third invention is the voice recognition device according to the first invention, characterized in that the correction processing means has correction amount setting means for obtaining a predetermined amount of the noise component signal from the noise component signal, synthesizing means for synthesizing the second voice analysis data and the predetermined amount of the noise component signal, and control means for controlling the synthesis processing on the basis of the detection signal from the voice section detection means.

[0027]

[Operation] With the above configuration, the voice recognition device according to the first invention detects a voice section of the second voice analysis data with the voice section detection means to obtain a detection signal, and applies a predetermined amount of the noise component signal to the second voice analysis data with the correction processing means to perform correction processing on the second voice analysis data.

[0028] In the second invention, in the voice recognition device according to the first invention, the spectrum edit means further performs spectrum edit processing on the second voice analysis data to obtain a spectrum edit signal, and the zero section detection means detects a zero section or a non-zero section of the spectrum edit signal to obtain a detection signal. The correction processing means then obtains a predetermined amount of the noise component signal from the noise component signal with the correction amount setting means, synthesizes the spectrum edit signal and the predetermined amount of the noise component signal with the synthesizing means, and controls the synthesis processing with the control means on the basis of the detection signal from the zero section detection means.

[0029] In the third invention, in the voice recognition device according to the first invention, the correction processing means obtains a predetermined amount of the noise component signal from the noise component signal with the correction amount setting means, synthesizes the second voice analysis data and the predetermined amount of the noise component signal with the synthesizing means, and controls the synthesis processing with the control means on the basis of the detection signal from the voice section detection means.

[0030]

[Embodiments]

<Embodiment 1> FIG. 1 is a block diagram showing an embodiment of a voice recognition device according to the first invention. Reference numeral 400 denotes a voice suppression device provided between the voice analysis unit 100 and the voice recognition unit 200. The voice suppression device 400 comprises an environmental noise subtraction unit 401, a spectrum edit processing unit 402, a correction amount setting unit 500, a zero determination unit 501, a correction switch unit 502, and a correction amount addition unit 503. The environmental noise subtraction unit 401 in turn comprises a noise level measurement unit 403 and a subtraction unit 404.

[0031] The noisy voice input from the microphone 105 to the voice analysis unit 100 (see, for example, the waveform of the utterance "Sapporo" in FIG. 3) is analyzed by the voice analysis unit 100, output as voice analysis data (digital data), and input to the environmental noise subtraction unit 401 channel by channel. The noise level measurement unit 403 measures the average noise level γ for each channel.

[0032] The subtraction unit 404 then subtracts the measured noise level from the original voice analysis data to obtain an output waveform for each channel as shown in FIG. 4. The spectrum edit processing unit 402 performs spectrum edit processing on the output of the subtraction unit 404 to obtain an output waveform for each channel as shown in FIG. 5. The correction amount setting unit 500 receives the noise level γ from the noise level measurement unit 403 and sets what percentage of the noise level is to be added back to the analysis data shown in FIG. 5. If this setting rate is α, the correction amount to be added back is αγ.
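The correction amount αγ is computed per channel, since γ is measured per channel. The sketch below shows only this arithmetic. The function name and the noise levels are hypothetical values chosen for illustration.

```python
def correction_amounts(noise_levels, alpha):
    """Return alpha * gamma for each channel's average noise level gamma."""
    return [alpha * gamma for gamma in noise_levels]

# Hypothetical average noise levels for CH1..CH3 (CH1 is the noisiest).
gammas = [4.0, 1.0, 0.5]
corrections = correction_amounts(gammas, alpha=0.5)
print(corrections)  # [2.0, 0.5, 0.25]
```

A single setting rate α therefore yields a larger add-back on channels where more was subtracted.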

[0033] The correction amount set by the correction amount setting unit 500 is input to the correction switch unit 502 and, depending on the determination result of the zero determination unit 501, is or is not added in the correction amount addition unit 503.

[0034] The zero determination unit 501 determines whether the data after spectrum edit processing (FIG. 5) is zero or non-zero. If the data is zero, it controls the correction switch 502 so that the correction amount αγ is not input to the correction amount addition unit 503. If the data is non-zero, the data can be judged to lie in a voice section, so the zero determination unit 501 controls the correction switch 502 so that the correction amount αγ is input to the correction amount addition unit 503 and the correction is applied.
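The zero-gated correction of Embodiment 1 can be sketched for a single channel as follows. The comments map each step to the units named in the text. The function name and the sample values are hypothetical.

```python
def add_back(edited_channel, gamma, alpha):
    """Add alpha * gamma to non-zero frames only; zero frames pass through unchanged."""
    correction = alpha * gamma                # correction amount setting unit 500
    restored = []
    for value in edited_channel:
        switch_on = value != 0.0              # zero determination unit 501 drives switch 502
        if switch_on:
            restored.append(value + correction)   # correction amount addition unit 503
        else:
            restored.append(value)
    return restored

# Hypothetical CH1 data after spectrum edit, with average noise level gamma = 4.0.
edited = [0.0, 0.0, 5.0, 6.0, 5.0, 0.0]
restored = add_back(edited, gamma=4.0, alpha=0.5)
print(restored)  # [0.0, 0.0, 7.0, 8.0, 7.0, 0.0]
```

Gating on zero keeps the silent sections at exactly zero, so the add-back raises the voice level without reintroducing noise between utterances.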

[0035] FIG. 7 shows the voice analysis data of FIG. 5 after correction processing, in an example where half of the subtraction amount is added back (α = 0.5). In this case, a channel with a high noise level (CH1 in this embodiment) has a large subtraction amount, so its correction amount is also large. In CH1 of FIG. 7, the two indicated sections are non-zero and are the sections to which the correction amount was added. FIG. 7 makes it clear that the voice level has been restored.

[0036] <Embodiment 2> FIG. 2 is a block diagram showing an embodiment of a voice recognition device according to the first invention, in which the correction is applied directly to the waveform after subtraction without spectrum edit processing. In FIG. 2, the voice suppression device 400 comprises an environmental noise subtraction unit 401, a correction amount setting unit 500, a voice section detection processing unit 504, a correction switch 502, and a correction amount addition unit 503. The configuration of the environmental noise subtraction unit 401 is the same as in FIG. 1.

[0037] In FIG. 2, the noisy voice input from the microphone to the voice analysis unit 100 (see, for example, the waveform of the utterance "Sapporo" in FIG. 3) is analyzed by the voice analysis unit 100, output as voice analysis data (digital data), and input to the environmental noise subtraction unit 401 channel by channel. The noise level measurement unit 403 measures the average noise level γ for each channel. The subtraction unit 404 then subtracts the measured noise level from the original voice analysis data to obtain an output waveform for each channel as shown in FIG. 4.

[0038] When spectrum edit processing is not performed, noise that could not be fully subtracted remains outside the voice sections to be corrected, so the sections to be corrected cannot be identified by a zero or non-zero determination. Voice section detection is therefore performed by the voice section detection unit 504. Meanwhile, as in FIG. 1, the correction amount setting unit 500 receives the noise level γ from the noise level measurement unit 403 and sets what percentage of the noise level is to be added back to analysis data such as that shown in FIG. 5. If this setting rate is α, the correction amount to be added back is αγ.

[0039] The correction amount set by the correction amount setting unit 500 is input to the correction switch unit 502 and, depending on the result of voice section detection by the voice section detection unit 504, is or is not added in the correction amount addition unit 503.
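Embodiment 2 can be sketched in the same style. Because residual noise remains after subtraction, the correction is gated by a detected voice section rather than by a zero test. The text does not specify the detection algorithm, so the detector below (the first run of at least `min_run` frames at or above `threshold`) is an assumption, as are all names and values.

```python
def detect_voice_section(channel, threshold, min_run):
    """Return (start, end) of the first qualifying above-threshold run, or None."""
    start = None
    for i, value in enumerate(channel):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                return (start, i)
            start = None
    if start is not None and len(channel) - start >= min_run:
        return (start, len(channel))
    return None

def correct_in_section(channel, section, gamma, alpha):
    """Add alpha * gamma only to frames inside the detected voice section."""
    if section is None:
        return list(channel)
    start, end = section
    return [v + alpha * gamma if start <= i < end else v
            for i, v in enumerate(channel)]

# Hypothetical CH1 data after subtraction only: residual noise is still present.
subtracted = [0.2, 0.0, 0.4, 5.0, 6.0, 5.0, 0.3, 0.0]
section = detect_voice_section(subtracted, threshold=1.0, min_run=2)
corrected = correct_in_section(subtracted, section, gamma=4.0, alpha=0.5)
print(section)    # (3, 6)
print(corrected)  # [0.2, 0.0, 0.4, 7.0, 8.0, 7.0, 0.3, 0.0]
```

The residual noise frames are non-zero but lie outside the detected section, so they receive no correction. A plain zero or non-zero test would have boosted them as well.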

[0040]

[Effects of the Invention] As described above, according to the first invention, the voice level lost through spectrum subtraction (subtraction processing) can be restored, and the waveform after spectrum edit (waveform shaping) can be brought closer to the waveform of the original input voice. This improves the similarity of the waveform to the dictionary, so the recognition rate of the voice recognition unit can be improved. According to the second invention, the voice level lost through spectrum subtraction can be restored without spectrum edit processing, and the similarity of the waveform to the dictionary is improved with a simpler circuit configuration, so the recognition rate of the voice recognition unit can be improved.

[Brief description of drawings]

[FIG. 1] A block diagram showing an embodiment of a voice recognition device according to the first invention.

[FIG. 2] A block diagram showing an embodiment of a voice recognition device according to the first invention.

[FIG. 3] An example of the analysis result produced by the voice analysis unit for the noisy utterance "Sapporo".

[FIG. 4] A waveform diagram of the utterance "Sapporo" showing the result of spectrum subtraction.

[FIG. 5] A waveform diagram of the utterance "Sapporo" showing the result of spectrum edit processing.

[FIG. 6] Voice analysis data of the utterance "Sapporo" without noise.

[FIG. 7] A waveform diagram showing the result of the correction processing applied to the voice analysis data shown in FIG. 5.

[FIG. 8] A block diagram showing a configuration example of a conventional voice recognition device.

[FIG. 9] A block diagram showing the configuration of the proposed voice recognition device.

[Explanation of symbols]

100: voice analysis unit (voice analysis means)
105: microphone (voice input means)
200: voice recognition unit (voice recognition means)
402: spectrum edit processing unit (spectrum edit means)
403: noise level measurement unit (noise detection means)
404: subtraction unit (subtraction means)
500: correction amount setting unit (correction amount setting means)
501: zero determination unit (zero section detection means)
502: correction switch unit (control means)
503: correction amount addition unit (synthesizing means)

Claims

[Claims]

1. A voice recognition device comprising: voice analysis means for converting a voice captured by voice input means into first voice analysis data; noise detection means for detecting a noise component from the voice analysis data to obtain a noise component signal; subtraction means for obtaining second voice analysis data by subtracting the noise component from the voice analysis data; and voice recognition means for performing recognition processing on the voice on the basis of the second voice analysis data to obtain a voice recognition result, the voice recognition device being characterized by having: voice section detection means for detecting a voice section of the second voice analysis data to obtain a detection signal; and correction processing means for applying a predetermined amount of the noise component signal to the second voice analysis data to perform correction processing on the second voice analysis data.

2. The voice recognition device according to claim 1, further having spectrum edit means for performing spectrum edit processing on the second voice analysis data to obtain a spectrum edit signal, wherein the voice section analysis means is zero section detection means for detecting a zero section or a non-zero section of the spectrum edit signal to obtain a detection signal, and the correction processing means has: correction amount setting means for obtaining a predetermined amount of the noise component signal from the noise component signal; synthesizing means for synthesizing the spectrum edit signal and the predetermined amount of the noise component signal; and control means for controlling the synthesis processing on the basis of the detection signal from the zero section detection means.

3. The voice recognition device according to claim 1, wherein the correction processing means has: correction amount setting means for obtaining a predetermined amount of the noise component signal from the noise component signal; synthesizing means for synthesizing the second voice analysis data and the predetermined amount of the noise component signal; and control means for controlling the synthesis processing on the basis of the detection signal from the voice section detection means.