JP2005348173A

JP2005348173A - Noise reduction method, device for executing the same method, program and its recording medium

Info

Publication number: JP2005348173A
Application number: JP2004166216A
Authority: JP
Inventors: Kiyotaka Sakauchi; 澄宇阪内; Yoichi Haneda; 陽一羽田; Akitoshi Kataoka; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-06-03
Filing date: 2004-06-03
Publication date: 2005-12-15
Anticipated expiration: 2024-06-03
Also published as: JP4223441B2

Abstract

PROBLEM TO BE SOLVED: To provide a noise reduction device that reduces the "muffled" quality of processed speech while sufficiently suppressing noise.

SOLUTION: The noise reduction device comprises: a frequency band dividing unit 22, which converts an input speech signal into a frequency-domain signal and divides it into a plurality of frequency bands; an input speech signal power calculation unit 24, which calculates the input speech signal power for each frequency from the frequency band signals of the input speech signal; a noise power estimation unit 51, which estimates the noise power for each frequency from the input speech signal power for each frequency; a raw gain factor calculation unit 61, which calculates a raw gain factor from the input speech signal power and the noise power for each frequency; a gain factor smoothing unit 62, which smooths the raw gain factor; a gain factor insertion unit 28, which multiplies the frequency band signal of the input speech signal by the smoothed gain factor to obtain the frequency band signal of a noise-reduced signal; and a time domain conversion unit 29, which inversely converts the frequency band signal of the noise-reduced signal into the time domain and outputs it.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a noise reduction method, an apparatus for implementing the method, a program, and a recording medium for the program. More particularly, it relates to a noise reduction method that, in voice communication using a microphone, reduces the noise signal picked up by the microphone superimposed on the target speech and thereby improves speech quality, and to an apparatus for implementing the method, a program, and a recording medium for the program.

A conventional example of reducing the noise signal in an input speech signal, in which a noise signal is superimposed on a target speech signal, is described with reference to FIG. 2.
In FIG. 2, the input speech signal X(n) = S(n) + N(n), obtained when unwanted noise mixes with the target speech picked up by the microphone 11, is digitized by the analog/digital (A/D) converter 21 and input to the frequency band dividing unit 22. Here S(n) is the target speech signal and N(n) is the unwanted noise signal mixed into it. In the frequency band dividing unit 22, the A/D-converted input speech signal X(n) is converted into a frequency-domain signal and then divided into a plurality of frequency bands. Each frequency band signal produced by the frequency band dividing unit 22 is input to the input speech signal power calculation unit 24 and the gain factor insertion unit 28. In the following, the processing flow is described for the frequency band signal Xk(n), taking the k-th frequency band signal of the input speech signal X(n) as representative.

The input speech signal power calculation unit 24 calculates the power level of the frequency band signal Xk(n) and inputs it to the S/N ratio estimation unit 27 and the noise power estimation unit 51. The noise power estimation unit 51 estimates the noise power PNk(n) from the input speech signal power PXk(n). The S/N ratio estimation unit 27 estimates the target-speech-to-noise ratio (S/N ratio) SNRk(n) from the input speech signal power PXk(n), the estimated noise power PNk(n), and the noise-reduced signal Y′k(n) obtained by the gain factor insertion unit 28 one processing frame earlier. The estimated S/N ratio SNRk(n) is input to the gain factor calculation unit 30 and the input speech signal addition rate determination unit 52. The gain factor calculation unit 30 determines the gain factor G(SNRk(n)) from the S/N ratio SNRk(n) supplied by the S/N ratio estimation unit 27.
The gain factor G(SNRk(n)) is calculated using a short-time spectral amplitude (STSA) estimation technique such as spectral subtraction, the Wiener filter, the ML estimation method, or the MMSE method. The gain factor G(SNRk(n)) estimated by the gain factor calculation unit 30 is input to the gain factor insertion unit 28, which uses it to perform noise reduction. Specifically, the frequency band signal Xk(n) supplied by the frequency band dividing unit 22 is multiplied by the gain factor G(SNRk(n)) in the frequency domain. The resulting noise-reduced signal Y′k(n) is input to the input speech signal addition unit 53 and the S/N ratio estimation unit 27. In parallel, the input speech signal addition rate determination unit 52 uses the S/N ratio SNRk(n) to determine an input speech signal addition rate α based on the S/N ratio and inputs it to the input speech signal addition unit 53.

The input speech signal addition unit 53 adds the frequency band signal Xk(n) to the noise-reduced signal Y′k(n) in a proportion set by the input speech signal addition rate α, and outputs the following frequency band signal Yk(n) (see Patent Document 1):

$$Y_k(n) = \alpha X_k(n) + (1 - \alpha)\, Y'_k(n)$$

The frequency band signal Yk(n) is input to the time domain conversion unit 29, where all bands are combined and inversely converted into a time-domain signal, which is input to the digital/analog (D/A) converter 34. The D/A converter 34 converts it into an analog signal and outputs the noise-reduced output signal Y(n).
Patent Document 1: Japanese Patent No. 3454402

In the conventional example above, the gain factor G(SNRk(n)) is calculated from the S/N ratio. The S/N ratio corresponds to the power ratio between the target speech signal and the noise signal, but because the target speech signal S(n) and the noise signal N(n) are mixed in the input speech signal X(n), their powers cannot be measured independently. Patent Document 1 therefore estimates each power and the S/N ratio, and these estimates contain estimation errors. Because of these errors, the calculated gain factor does not equal the ideal value based on the true S/N ratio. In other words, compared with the ideal values, gain factors that are adjacent in the frequency domain become discontinuous (they jump from band to band), which causes distortion in the processed output speech signal Y(n).

In view of the above, the present invention provides a noise reduction method that suppresses distortion of the processed output speech signal and achieves sufficient noise reduction even when the estimated power and S/N ratio contain estimation errors, together with an apparatus for implementing the method, a program, and a recording medium for the program.

Claim 1: A noise reduction method for reducing a noise signal in an input speech signal is configured as follows. The input speech signal is converted into a frequency-domain signal; the input speech signal power for each frequency is calculated from the frequency band signals of the input speech signal; the noise power for each frequency is estimated from the input speech signal power for each frequency; a raw gain factor is calculated from the input speech signal power and the noise power for each frequency; the raw gain factor is smoothed; the frequency band signal of the input speech signal is multiplied by the smoothed gain factor to obtain the frequency band signal of the noise-reduced signal; and the frequency band signal of the noise-reduced signal is inversely converted into a time-domain signal and output.

Claim 2: In the noise reduction method according to claim 1, the raw gain factor is smoothed by weighted averaging.
Claim 3: In the noise reduction method according to claim 1 or claim 2, the smoothed gain factor is enhanced, and the frequency band signal of the input speech signal is multiplied by the enhanced gain factor to obtain the frequency band signal of the noise-reduced signal.
Claim 4: In the noise reduction method according to claim 3, the enhancement brings the smoothed gain factor closer to 0 or to 1 depending on the magnitude of the smoothed gain factor.

Claim 5: A noise reduction apparatus for reducing a noise signal in an input speech signal is configured with: a frequency band dividing unit 22 that converts the input speech signal into a frequency-domain signal and divides it into a plurality of frequency bands; an input speech signal power calculation unit 24 that calculates the input speech signal power for each frequency from the frequency band signals of the input speech signal; a noise power estimation unit 51 that estimates the noise power for each frequency from the input speech signal power for each frequency; a raw gain factor calculation unit 61 that calculates a raw gain factor from the input speech signal power and the noise power for each frequency; a gain factor smoothing unit 62 that smooths the raw gain factor; a gain factor insertion unit 28 that multiplies the frequency band signal of the input speech signal by the smoothed gain factor to obtain the frequency band signal of the noise-reduced signal; and a time domain conversion unit 29 that inversely converts the frequency band signal of the noise-reduced signal into the time domain and outputs it.

Claim 6: In the noise reduction apparatus according to claim 5, the raw gain factor is smoothed by weighted averaging.
Claim 7: The noise reduction apparatus according to claim 5 or claim 6 further comprises a gain factor enhancement unit 63 that enhances the smoothed gain factor, and the frequency band signal of the input speech signal is multiplied by the enhanced gain factor to obtain the frequency band signal of the noise-reduced signal.
Claim 8: In the noise reduction apparatus according to claim 7, the enhancement brings the smoothed gain factor closer to 0 or to 1 depending on the magnitude of the smoothed gain factor.

Claim 9: A noise reduction program describes the noise reduction method according to any one of claims 1 to 4 in code that can be written to and read by a computer.
Claim 10: A recording medium stores the noise reduction program according to claim 9.

In the present invention, smoothing the gain factor reduces its discontinuity in the frequency domain and thereby suppresses distortion of the output speech signal. Enhancing the gain factor after smoothing avoids the loss of part of the frequency components of the processed speech, which reduces the "muffled" quality of the speech while still suppressing noise sufficiently.

The best mode for carrying out the present invention is described with reference to FIG. 1.
FIG. 1 illustrates an embodiment of the noise reduction apparatus. Compared with the noise reduction apparatus described with reference to FIG. 1 of Patent Document 1, this embodiment differs only in the configuration of the gain factor calculation unit 30; the rest of the configuration is the same. The values computed in the parts other than the gain factor calculation unit 30 can therefore be calculated by following the method described in Patent Document 1.
The input speech signal X(n) = S(n) + N(n), obtained when unwanted noise mixes with the target speech picked up by the microphone 11, is converted into a digital signal by the analog/digital (A/D) converter 21 and input to the frequency band dividing unit 22. Here S(n) is the target speech signal and N(n) is the unwanted noise signal mixed into it. In the frequency band dividing unit 22, the input speech signal X(n) is converted into a frequency-domain signal and divided into a plurality of frequency bands. Each frequency band signal is input to the input speech signal power calculation unit 24 and the gain factor insertion unit 28. In the following, the processing flow is described for the k-th frequency band signal Xk(n), taken as representative. The input speech signal power calculation unit 24 calculates the power level of the frequency band signal Xk(n) and inputs the result, the input speech signal power PXk(n), to the S/N ratio estimation unit 27 and the noise power estimation unit 51. The noise power estimation unit 51 estimates the noise power PNk(n) from the input speech signal power PXk(n). The S/N ratio estimation unit 27 estimates the S/N ratio SNRk(n) from the input speech signal power PXk(n), the estimated noise power PNk(n), and the noise-reduced frequency band signal Y′k(n) obtained by the gain factor insertion unit 28 one processing frame earlier. The estimated S/N ratio SNRk(n) is input to the gain factor calculation unit 30 and the input speech signal addition rate determination unit 52.

In this embodiment of the invention, the gain factor calculation unit 30 consists of a raw gain factor calculation unit 61, a gain factor smoothing unit 62, and a gain factor enhancement unit 63. In the gain factor calculation unit 30, the raw gain factor calculation unit 61 first calculates the raw gain factor G(SNRk(n)) from the S/N ratio SNRk(n) supplied by the S/N ratio estimation unit 27. The raw gain factor is calculated using a short-time spectral amplitude (STSA) estimation technique such as spectral subtraction, the Wiener filter, the ML estimation method, or the MMSE method. The gain factor is calculated from the S/N ratio; concretely, it expresses, for each frequency band, the proportion of the input speech signal that is target speech. The gain factor exactly as calculated from the S/N ratio, before any further processing, is called the raw gain factor.
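As an illustration of the raw gain factor step, the following minimal sketch uses the Wiener rule, one of the STSA options named above. It assumes the S/N ratio is a linear power ratio and uses the standard Wiener gain SNR / (1 + SNR); the function names are hypothetical, and the patent does not prescribe this particular rule.

```python
from typing import List, Sequence


def wiener_raw_gain(snr: float) -> float:
    """Raw gain factor under the Wiener rule: SNR / (1 + SNR).

    Assumption: `snr` is a linear power ratio (not dB). Negative
    estimates are clamped to 0 so the gain stays in [0, 1).
    """
    snr = max(snr, 0.0)
    return snr / (1.0 + snr)


def raw_gains(snr_per_band: Sequence[float]) -> List[float]:
    """Compute one raw gain factor per frequency band."""
    return [wiener_raw_gain(s) for s in snr_per_band]


if __name__ == "__main__":
    # A high S/N ratio gives a gain near 1, a low one a gain near 0.
    for snr in (0.0, 1.0, 9.0, 99.0):
        print(snr, round(wiener_raw_gain(snr), 3))
```

A gain near 1 passes a band that is mostly target speech, and a gain near 0 suppresses a band that is mostly noise, which matches the interpretation of the gain factor given in the text.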

Next, the gain factor smoothing unit 62 applies weighted smoothing to the raw gain factor G(SNRk(n)). As explained in the section on the problem to be solved by the invention, the raw gain factor calculated by the raw gain factor calculation unit 61 deviates from its ideal value because of S/N ratio estimation errors, so the raw gain factors become discontinuous across the frequency domain. In this embodiment, smoothing is therefore applied to reduce the discontinuity between raw gain factors that are adjacent on the frequency axis, making the gain factor values vary smoothly. The smoothing procedure is described in detail below.

Writing the raw gain factor G(SNRk(n)) of the k-th frequency band as G(k) and the smoothed gain factor after smoothing as Ge(k), one example of the smoothing process can be expressed as follows:

$$G_e(k) = \frac{\sum_{i,j} a(i)\, G(j)}{\sum_{i} a(i)}$$

This equation takes the weighted average of the raw gain factors G(j) of the bands, indexed by j, that neighbor the k-th frequency band, and uses it as the smoothed gain factor Ge(k) of the k-th band. The indices i and j run over the same number of terms, and that number is at most the number of frequency analysis points. The weighting factor a(i) controls how much each raw gain factor contributes to the average, that is, the degree to which the discontinuity is reduced. After this processing, the smoothed gain factor Ge(k), that is, Ge(SNRk(n)), is output.
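The weighted averaging described above could be sketched as follows. This is only an illustration under stated assumptions: the window is taken to be symmetric around band k, the weight list `weights` plays the role of a(i), and bands that would fall outside the spectrum are skipped with the denominator renormalized. None of these choices, nor the function name, comes from the patent.

```python
from typing import List, Sequence


def smooth_gains(raw: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted average of each raw gain with its frequency-axis neighbors.

    `weights` must have odd length; its middle entry applies to band k
    itself. Neighbors outside the spectrum are skipped and the denominator
    is renormalized (an edge-handling assumption).
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    half = len(weights) // 2
    smoothed = []
    for k in range(len(raw)):
        num = 0.0
        den = 0.0
        for i, a in enumerate(weights):
            j = k + i - half  # index of the neighboring band paired with a(i)
            if 0 <= j < len(raw):
                num += a * raw[j]
                den += a
        # Fall back to the raw value if every applicable weight is zero.
        smoothed.append(num / den if den else raw[k])
    return smoothed


if __name__ == "__main__":
    # Three gains of 0.92, 1 and 0.93 with equal weights average to 0.95.
    print(smooth_gains([0.92, 1.0, 0.93], [1.0, 1.0, 1.0])[1])
```

Larger weights on the center band preserve more of the raw value, while flatter weights smooth more aggressively; that is the trade-off a(i) controls.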

Next, the gain factor enhancement unit 63 enhances the smoothed gain factor Ge(k) produced by the gain factor smoothing unit 62. Smoothing removes the discontinuity from Ge(k), but as a trade-off the values become "blunted". For example, when the gain factor of the k-th frequency band is 1, that of the (k−1)-th band is 0.92, and that of the (k+2)-th band is 0.93, taking the average over these three bands as the smoothed gain factor of the k-th band gives 0.95. In this example the weighting factors a(i) are all 1.

As described above, the gain factor is calculated from the S/N ratio; concretely, it is the proportion of the input speech signal, in each frequency band, that is target speech. A gain factor close to 1 means that the input speech signal contains little noise and mostly target speech, and a gain factor close to 0 means that it contains little target speech and mostly noise. As explained above, when the smoothed gain factor is blunted, its value moves away from 0 and 1. When it moves away from 1, for example to 0.95 as in the example above, a frequency component that originally contains only target speech is scaled to 95%, so 5% of the target speech is lost. When it moves away from 0, for example to 0.05, a component that contains only noise and should be reduced by 100% is reduced by only 95%, so noise remains and degrades call quality. The blunted smoothed gain factor Ge(k) is therefore enhanced as follows. Writing the enhanced gain factor of the k-th band as Gg(k), the enhancement moves each gain factor closer to 0 or to 1 depending on the magnitude of the smoothed gain factor Ge(k). That is, when Ge(k) is large and close to 1, it is moved closer to 1 so that the target speech passes more easily, and when Ge(k) is small and close to 0, it is moved closer to 0 so that the noise is reduced more strongly. One specific example of this enhancement is given by the following equations.

When Ge(k) is larger than th1:

$$G_g(k) = \mathrm{th}_1 \times \left( \frac{G_e(k)}{\mathrm{th}_1} \right)^{v_1}$$

When Ge(k) is smaller than th2:

$$G_g(k) = 1 - (1 - \mathrm{th}_2) \left\{ \frac{1 - G_e(k)}{1 - \mathrm{th}_2} \right\}^{v_2}$$

Here v1(k) and v2(k) are integers of 1 or more, and th1 and th2 are integers of 0 or more and 1 or less that satisfy th1 ≥ th2. Because Ge(k) takes values in the range 0 to 1, these equations move it closer to 1 when it is larger than th1 and closer to 0 when it is smaller than th2. After this processing, the enhanced gain factor Gg(k), that is, Gg(SNRk(n)), is output.
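The two enhancement formulas can be tried out with a small sketch such as the one below. It applies the formulas as written and leaves values between th2 and th1 unchanged. Three things are assumptions of this sketch rather than statements of the patent: the thresholds are treated as real values in the range 0 to 1 (here 0.9 and 0.1), the exponents default to 2, and the result is clipped to [0, 1], because the formulas as written can slightly overshoot that range.

```python
def enhance_gain(ge: float, th1: float = 0.9, th2: float = 0.1,
                 v1: int = 2, v2: int = 2) -> float:
    """Push a smoothed gain factor toward 1 (if > th1) or toward 0 (if < th2).

    Assumptions: th1 and th2 are real-valued thresholds, gains between
    them pass through unchanged, and the output is clipped to [0, 1].
    """
    if not (0.0 < th1 <= 1.0 and 0.0 <= th2 < 1.0 and th1 >= th2):
        raise ValueError("need 0 <= th2 <= th1 <= 1, th1 > 0 and th2 < 1")
    if ge > th1:
        # Formula for Ge(k) > th1: th1 * (Ge / th1) ** v1
        gg = th1 * (ge / th1) ** v1
    elif ge < th2:
        # Formula for Ge(k) < th2: 1 - (1 - th2) * ((1 - Ge) / (1 - th2)) ** v2
        gg = 1.0 - (1.0 - th2) * ((1.0 - ge) / (1.0 - th2)) ** v2
    else:
        gg = ge
    # Clipping is an addition of this sketch, not part of the source formulas.
    return min(1.0, max(0.0, gg))


if __name__ == "__main__":
    for ge in (0.05, 0.08, 0.5, 0.92, 0.95):
        print(ge, round(enhance_gain(ge), 4))
```

With these illustrative parameters, a blunted value such as 0.92 moves up toward 1 and a value such as 0.08 moves down toward 0, while mid-range gains are untouched.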

The enhanced gain factor Gg(SNRk(n)) calculated by the gain factor enhancement unit 63 in the gain factor calculation unit 30 is input to the gain factor insertion unit 28, which uses it to perform noise reduction. Specifically, the frequency band signal Xk(n) supplied by the frequency band dividing unit 22 is multiplied by the enhanced gain factor Gg(SNRk(n)) in the frequency domain. The resulting noise-reduced signal Y′k(n) is input to the input speech signal addition unit 53 and the S/N ratio estimation unit 27. In parallel, the input speech signal addition rate determination unit 52 uses the S/N ratio SNRk(n) to determine an input speech signal addition rate α based on the S/N ratio and inputs it to the input speech signal addition unit 53.

The input speech signal addition unit 53 adds the frequency band signal Xk(n) to the noise-reduced signal Y′k(n) in a proportion set by the input speech signal addition rate α, and outputs the following frequency band signal Yk(n), as already described for the conventional example:

$$Y_k(n) = \alpha X_k(n) + (1 - \alpha)\, Y'_k(n)$$

The frequency band signal Yk(n) is input to the time domain conversion unit 29, where all bands are combined and inversely converted into a time-domain signal. This time-domain signal is input to the digital/analog converter 34, converted into an analog signal, and output as the noise-reduced output signal Y(n).
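The gain insertion and the re-addition of the input signal for one frame might be sketched as below. The band signals are modeled as Python complex numbers and a single addition rate `alpha` is used for all bands; both are simplifying assumptions of this sketch, as is the function name. The source determines α from the S/N ratio but does not give the rule, so it is passed in as a parameter here.

```python
from typing import List, Sequence


def apply_gain_and_blend(x_bands: Sequence[complex],
                         gains: Sequence[float],
                         alpha: float) -> List[complex]:
    """Compute Yk = alpha * Xk + (1 - alpha) * Y'k, with Y'k = Gk * Xk.

    Assumption: one addition rate `alpha` in [0, 1] is shared by all bands.
    """
    if len(x_bands) != len(gains):
        raise ValueError("x_bands and gains must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = []
    for x, g in zip(x_bands, gains):
        y_reduced = g * x  # gain factor insertion: Y'k(n)
        out.append(alpha * x + (1.0 - alpha) * y_reduced)
    return out


if __name__ == "__main__":
    # First band fully suppressed (gain 0), second band passed (gain 1).
    print(apply_gain_and_blend([1 + 1j, 2 + 0j], [0.0, 1.0], alpha=0.1))
```

In a full implementation, Y′k(n) would also be kept for the next frame's S/N ratio estimate, and the blended bands would then be passed to the inverse transform.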

The noise reduction apparatus of the present invention can be implemented with a DSP (Digital Signal Processor). It may also be realized by having a computer execute a program. In that case, the program, recorded on a CD-ROM, a floppy (registered trademark) disk, a magnetic disk, or a similar medium, is loaded into program memory in the computer and executed. The program may also be downloaded into this program memory via a communication link.

FIG. 1 is a diagram illustrating the embodiment. FIG. 2 is a diagram illustrating the conventional example.

Explanation of symbols

11: Microphone
21: Analog/digital converter
22: Frequency band dividing unit
24: Input speech signal power calculation unit
27: S/N ratio estimation unit
28: Gain factor insertion unit
29: Time domain conversion unit
30: Gain factor calculation unit
34: Digital/analog converter
51: Noise power estimation unit
52: Input speech signal addition rate determination unit
53: Input speech signal addition unit
61: Raw gain factor calculation unit
62: Gain factor smoothing unit
63: Gain factor enhancement unit

Claims

1. A noise reduction method for reducing a noise signal in an input speech signal, comprising: converting the input speech signal into a frequency-domain signal; calculating the input speech signal power for each frequency from the frequency band signals of the input speech signal; estimating the noise power for each frequency from the input speech signal power for each frequency; calculating a raw gain factor from the input speech signal power and the noise power for each frequency; smoothing the raw gain factor; multiplying the frequency band signal of the input speech signal by the smoothed gain factor to obtain the frequency band signal of a noise-reduced signal; and inversely converting the frequency band signal of the noise-reduced signal into a time-domain signal and outputting it.

2. The noise reduction method according to claim 1, wherein the raw gain factor is smoothed by weighted averaging.

3. The noise reduction method according to claim 1 or claim 2, wherein the smoothed gain factor is enhanced, and the frequency band signal of the input speech signal is multiplied by the enhanced gain factor to obtain the frequency band signal of the noise-reduced signal.

4. The noise reduction method according to claim 3, wherein the enhancement brings the smoothed gain factor closer to 0 or to 1 depending on the magnitude of the smoothed gain factor.

5. A noise reduction apparatus for reducing a noise signal in an input speech signal, comprising: a frequency band dividing unit that converts the input speech signal into a frequency-domain signal and divides it into a plurality of frequency bands; an input speech signal power calculation unit that calculates the input speech signal power for each frequency from the frequency band signals of the input speech signal; a noise power estimation unit that estimates the noise power for each frequency from the input speech signal power for each frequency; a raw gain factor calculation unit that calculates a raw gain factor from the input speech signal power and the noise power for each frequency; a gain factor smoothing unit that smooths the raw gain factor; a gain factor insertion unit that multiplies the frequency band signal of the input speech signal by the smoothed gain factor to obtain the frequency band signal of a noise-reduced signal; and a time domain conversion unit that inversely converts the frequency band signal of the noise-reduced signal into the time domain and outputs it.

6. The noise reduction apparatus according to claim 5, wherein the raw gain factor is smoothed by weighted averaging.

7. The noise reduction apparatus according to claim 5 or claim 6, further comprising a gain factor enhancement unit that enhances the smoothed gain factor, wherein the frequency band signal of the input speech signal is multiplied by the enhanced gain factor to obtain the frequency band signal of the noise-reduced signal.

8. The noise reduction apparatus according to claim 7, wherein the enhancement brings the smoothed gain factor closer to 0 or to 1 depending on the magnitude of the smoothed gain factor.

9. A noise reduction program in which the noise reduction method according to any one of claims 1 to 4 is described in code that can be written to and read by a computer.

10. A recording medium on which the noise reduction program according to claim 9 is recorded.