JP4518817B2 - Sound collection method, sound collection device, and sound collection program - Google Patents
- Publication number
- JP4518817B2 (application JP2004065513A)
- Authority
- JP
- Japan
- Prior art keywords
- original sound
- addition rate
- signal
- voice
- sound addition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Description
The present invention relates to a sound collection method, a sound collection device, and a sound collection program that compute, for a noise-suppressed signal, the speech-section likelihood at each time interval Δt and automatically set the original sound addition rate according to that likelihood, thereby suppressing noise while preserving speech quality.
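Techniques that add the original sound to a noise-suppressed signal, reducing the distortion of the processed sound and raising the perceived quality, have been proposed previously (Non-Patent Document 1).
In the original sound addition method disclosed in Non-Patent Document 1, the original sound addition rate is held constant over all sections (all sections along the time axis).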
However, in an environment where the noise is extremely large (for example, where the noise is louder than the target speech), always adding the original sound at a constant rate high enough to improve speech quality also leaves a large amount of residual noise, so it is difficult to improve the S/N ratio while maintaining quality.
An object of the present invention is to propose a sound collection method, a sound collection device, and a sound collection program that can improve the S/N ratio over all sections along the time axis while maintaining the quality of the speech sections.
In contrast to the conventional method, which uses a constant original sound addition rate in all sections, the present invention computes the speech-section likelihood of the signal after noise suppression and aims to improve the S/N ratio gain over all sections relative to the conventional method while preserving speech quality as far as possible in sections that are likely to be speech.
Specifically, the first embodiment of the present invention proposes a sound collection method that includes: noise suppression processing that suppresses noise contained in an input signal and emphasizes a target signal; speech interval degree calculation processing that computes the speech-section likelihood for each section of the input signal; original sound addition rate determination processing that determines, based on the speech-section likelihood computed by the speech interval degree calculation processing, the rate at which the original sound is added to the noise-suppressed target signal; and original sound addition processing that adds the original sound to the noise-suppressed target signal according to the original sound addition rate so determined.
In this first embodiment, a sound collection method is proposed in which the speech interval degree calculation processing determines a section where the power of the input signal is at or above a predetermined value to be a speech section, and a section where it is at or below the predetermined value to be a noise section.
Furthermore, in the first embodiment, a sound collection method is proposed in which the speech interval degree calculation processing determines a section where the power of the input signal is at or above a first set value TS to be a speech section, determines a section where the power is at or below a second set value TN, which is smaller than the first set value, to be a noise section, and, when the power lies between the first set value TS and the second set value TN, determines the speech likelihood according to the power of the input signal.
Furthermore, the second embodiment of the present invention proposes a sound collection method in which the speech interval degree calculation processing uses at least one pair of voice input means installed at positions whose distances to the target sound source and to the noise source differ from each other. The target signals containing noise captured by the pair of voice input means are band-divided channel by channel, the band components of the channels are level-compared between channels for each common band, each component is judged to be a speech signal component or a noise signal component from the comparison result in relation to the difference in distance between the input means and the target sound source and noise source, the power of the frequency band components judged to be speech signal components is integrated, and the speech likelihood is determined from the integrated power.
Furthermore, the third embodiment of the present invention proposes a sound collection method that adds smoothing processing so that, when the original sound addition rate changes, it changes gradually; this prevents abrupt changes in the original sound addition rate and avoids distorting the speech.
For a received signal in which the target signal and the noise signal are mixed, the present invention computes a physical quantity representing the speech-section likelihood and automatically adjusts the original sound addition rate according to it. It can therefore output speech with little distortion in speech sections and a signal with a large S/N ratio improvement (little residual noise) in noise sections. As a result, it can output a speech signal that is easier to listen to and less noisy than the conventional method, which sets the original sound addition rate uniformly over all sections.
Each embodiment, representing the best mode for carrying out the present invention, is described in detail below. FIG. 1 shows the first embodiment of the present invention. The first embodiment is a sound collection device that operates using the basic sound collection method of the present invention.
The sound collection device 100 proposed in the first embodiment comprises voice input means 1 consisting of a microphone, noise suppression means 2, speech interval degree calculation means 3, original sound addition rate determination means 4, and original sound addition means 5.
The voice input means 1 receives the target signal S1(t) and the noise signal S2(t) from the target sound source S and the noise source N. A single noise source N is assumed here to simplify the description, but in general there may be several. The voice input means 1 outputs a signal x(t) in which the noise signal is superimposed on the target signal (hereinafter, the original sound signal).
The original sound signal x(t) output by the voice input means 1 is input to the noise suppression means 2, which suppresses the noise using a general technique such as spectral subtraction. The noise-suppressed signal is denoted S1′(t).
The noise-suppressed signal S1′(t) is supplied to the speech interval degree calculation means 3, which computes the "speech interval degree" representing the speech-section likelihood.
As a method of computing the speech-section likelihood, the sound collection method proposed in this invention computes the power (Pow) of the noise-suppressed signal S1′(t); if the power exceeds a value TS, the section is judged likely to be speech and is determined to be a speech section. Conversely, if the power is TS or less, the section is determined to be a noise section.
Alternatively, two values TS and TN are set (TS > TN). When Pow > TS, the speech-section likelihood is judged to be 100%. When TN < Pow < TS, the closer Pow is to TS, the higher the speech-section likelihood is judged to be. For example, (Pow − TN) / (TS − TN) is used as the quantity representing the speech-section likelihood (the speech interval degree).
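The mapping from power to speech interval degree described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function names, the mean-square definition of power, the frame representation as a list of samples, and the choice to return 0 for Pow at or below TN are all assumptions made for the example.

```python
def signal_power(frame):
    """Mean squared amplitude of one frame (assumed definition of Pow)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def speech_interval_degree(pow_value, tn, ts):
    """Map power to a speech-likeness value in [0, 1].

    At or above TS the degree is 1.0 (100% speech-like); between TN and TS
    it is (Pow - TN) / (TS - TN). Returning 0.0 at or below TN is an
    assumption; the text only states the upper and middle cases.
    """
    if ts <= tn:
        raise ValueError("TS must be greater than TN")
    if pow_value >= ts:
        return 1.0
    if pow_value <= tn:
        return 0.0
    return (pow_value - tn) / (ts - tn)
```

For instance, with TN = 0.0 and TS = 1.0, a frame whose power is 0.5 would be treated as halfway between a noise section and a speech section.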
As shown in FIG. 2, the original signal x(t) may be used instead of the noise-suppressed signal S1′(t) when computing the signal power Pow for the speech interval degree. The configuration of FIG. 2 is advantageous when, for example, the noise is not very large relative to the target signal (an S/N ratio of about 10 dB) and computing the speech interval degree after noise suppression would make the processing delay too long, for instance for communication use. In such an environment, using the received signal to compute the speech interval degree allows noise suppression and speech interval degree calculation to run in parallel, which shortens the processing delay at the cost of some loss in the accuracy of the speech interval degree.
The original sound addition rate determination means 4 dynamically determines the original sound addition rate according to the speech interval degree computed by the speech interval degree calculation means 3. Two specific methods are given. The first applies when the speech interval degree calculation means 3 classifies each section as either a speech section or a noise section.
In this case, for a section judged to be a speech section (Pow > TS), the original sound addition rate α(t) is set high (for example, 0.3) to prioritize speech quality. For a section judged to be a noise section (Pow < TN), the original sound addition rate α(t) is set low (for example, 0.05) to prioritize the S/N ratio improvement.
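The two-level rule and the subsequent addition of the original sound might look like the sketch below. It assumes a single threshold TS separates speech from noise sections, as in the two-class decision described earlier, and it takes the mixing form y(t) = S1′(t) + α(t)·x(t) from the wording of the device claims; the function and constant names are hypothetical.

```python
ALPHA_SPEECH = 0.3   # example "high" rate given in the text
ALPHA_NOISE = 0.05   # example "low" rate given in the text


def two_level_rate(pow_value, ts):
    """Pick the addition rate for one section from a two-class decision.

    Assumption: a section is speech when Pow > TS and noise otherwise.
    """
    return ALPHA_SPEECH if pow_value > ts else ALPHA_NOISE


def add_original_sound(suppressed, original, alpha):
    """Return S1'(t) + alpha * x(t) for one section, sample by sample."""
    if len(suppressed) != len(original):
        raise ValueError("both signals must cover the same section")
    return [s + alpha * x for s, x in zip(suppressed, original)]
```

A higher rate restores more of the unprocessed signal, which reduces distortion but also brings back more noise; that trade-off is why the rate is lowered in noise sections.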
Next, an example of the operation of the original sound addition rate determination means 4 is described for the case where the speech interval degree calculation means 3 sets values TS and TN (TS > TN) and expresses the speech-section likelihood as (Pow − TN) / (TS − TN).
When Pow > TS, the speech-section likelihood is taken to be 100% and the maximum original sound addition rate αmax (for example, αmax = 0.5) is set as the addition rate α(t). When Pow < TN, the noise-section likelihood is taken to be 100% and the minimum original sound addition rate αmin (for example, αmin = 0.1) is set as α(t). When TN < Pow < TS, α(t) is determined by the method shown in FIG. 3: on a graph with Pow on the horizontal axis and α(t) on the vertical axis, the straight line through the two points (TN, αmin) and (TS, αmax) gives the original sound addition rate α(t) for each Pow. The above operation is expressed in Equation (1) using a programming language.
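Equation (1) itself is not reproduced in this text, so the following is only a sketch of the piecewise-linear rule as described in the paragraph above, not the patent's own listing. The function name is hypothetical, the default values are the examples quoted in the text, and the handling of Pow exactly equal to TN or TS is an assumption.

```python
def original_sound_rate(pow_value, tn, ts, alpha_min=0.1, alpha_max=0.5):
    """Addition rate alpha(t) as a function of power Pow.

    Below TN the rate is alpha_min, above TS it is alpha_max, and in
    between it follows the straight line through (TN, alpha_min) and
    (TS, alpha_max).
    """
    if ts <= tn:
        raise ValueError("TS must be greater than TN")
    if pow_value >= ts:
        return alpha_max
    if pow_value <= tn:
        return alpha_min
    slope = (alpha_max - alpha_min) / (ts - tn)
    return alpha_min + slope * (pow_value - tn)
```

This is equivalent to αmin + (αmax − αmin) × (speech interval degree), so the rate rises in proportion to how speech-like the section is.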
Next, the second embodiment of the present invention is described. The second embodiment corresponds to the sound collection method proposed in claim 1 and the sound collection device proposed in claim 6.
An example of the speech interval degree calculation means 3 is described here. In the first embodiment, the signal used to compute the speech interval degree was either the signal S1′(t) noise-suppressed by the existing noise suppression means 2 or the original signal x(t). When S1′(t) is used, poor performance of the existing noise suppression means 2 can degrade the accuracy of the speech interval degree calculation means 3. When x(t) is used, the speech interval degree is difficult to compute if the S/N ratio of the received signal is very poor (for example, 0 dB or negative).
FIG. 4 shows an example of a method, used in the second embodiment, of computing the speech interval degree as accurately as possible without being affected by the noise suppression means 2 or by the S/N ratio of the original signal x(t). The input means 1A and 1B of the speech interval degree calculation means 3 consist of, for example, two or more microphones. The band dividing means 6 performs frequency analysis of the signals from the input means 1A and 1B, for example by Fourier transform. The inter-channel level difference calculation means 7 computes, for each frequency component, the level difference ΔA(ω) between the channels (the input means 1A side and the 1B side). ΔA(ω) is defined by Equation (2).
ΔA(ω) = 20 log10[|X1(ω)| / |X2(ω)|] (2)
The sound source signal determination means 8 judges, from the value of the inter-channel level difference ΔA(ω), whether each frequency component belongs to the target signal or to the noise signal. For example, when, as in FIG. 4, the target sound source S is closer to the input means 1A than to the input means 1B and the noise source N is closer to the input means 1B than to the input means 1A, a frequency component satisfying ΔA(ω) ≤ 0 is judged to be a noise signal component.
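Equation (2) and the per-component judgment can be illustrated with the short sketch below. It assumes X1(ω) and X2(ω) are the complex spectra of the input means 1A and 1B channels, already computed by some frequency analysis and passed in as equal-length lists; the function names and the small `eps` guard against division by zero are additions for the example, not part of the patent.

```python
import math


def channel_level_difference(x1, x2, eps=1e-12):
    """Delta A(w) in dB for each frequency bin: 20*log10(|X1| / |X2|).

    eps is a hypothetical safeguard that keeps the ratio finite when a
    bin has zero magnitude.
    """
    return [
        20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
        for a, b in zip(x1, x2)
    ]


def is_target_component(delta_a):
    """True where the bin is judged to belong to the target signal.

    Bins with Delta A(w) <= 0 are noise, as stated for the FIG. 4 layout
    in which the target source is nearer to input means 1A.
    """
    return [d > 0.0 for d in delta_a]
```

A bin that is ten times stronger on channel 1A than on channel 1B gives about +20 dB and is kept as target speech, while the reverse gives about −20 dB and is treated as noise.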
In the sound source signal selection means 9, based on the result from the sound source signal determination means 8, target signal components are multiplied by a gain gS and noise signal components by a gain gN. The control rule of the sound source signal selection means 9 is expressed in Equation (3) using a programming language.
if ΔA(ω) > 0 then Ŝ1(ω) = gS · X1(ω)
elseif ΔA(ω) ≤ 0 then Ŝ1(ω) = gN · X1(ω) (3)
As example gain values, gS = 1.0 and gN = 0.0 are given.
The power integration means 10 integrates the power of the signal Ŝ1(ω) weighted by the sound source signal selection means 9 over the entire frequency band and sends the integrated power Pow to the original sound addition rate determination means 4. The original sound addition rate determination means 4 uses this integrated power Pow to determine the addition rate by the same control as described in the first embodiment. The speech interval degree calculation means 3 shown in FIG. 4 uses the direction information of the sound sources to judge whether each frequency component of the original signal belongs to the speech signal or the noise signal; bands judged to be speech spectrum are emphasized by multiplying by a gain of 1.0, and bands judged to be noise spectrum are suppressed by multiplying by a gain of 0. This is effectively equivalent to improving the S/N ratio of the original signal x(t), so the speech interval degree is computed after the S/N ratio has been improved. For a signal with a poor S/N ratio, the speech interval degree can therefore be computed more accurately than by using the original signal as it is. In addition, because the speech interval degree is computed independently of the existing noise suppression processing, it is unaffected even if that processing performs poorly.
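The gain weighting and power integration steps can be combined into one small routine, sketched here under stated assumptions: the spectra are given as equal-length lists of complex bins, power is taken as the squared magnitude of each weighted bin, and the function name and `eps` guard are hypothetical.

```python
import math


def integrated_target_power(x1, x2, g_s=1.0, g_n=0.0, eps=1e-12):
    """Sum |S^1(w)|^2 over all bins, where S^1(w) is the gain-weighted X1(w).

    Bins with a positive inter-channel level difference get gain g_s
    (target); all others get gain g_n (noise).
    """
    total = 0.0
    for a, b in zip(x1, x2):
        delta_a = 20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
        gain = g_s if delta_a > 0.0 else g_n
        total += abs(gain * a) ** 2
    return total
```

The returned value plays the role of Pow and could be passed to the same rate-determination rule used in the first embodiment.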
Next, the third embodiment of the present invention is described. The third embodiment corresponds to the sound collection method proposed in claim 3 and the sound collection device proposed in claim 8.
FIG. 5 shows a configuration example. The third embodiment is characterized by adding, to the original sound addition rate determination means 4 described in the first embodiment, original sound addition rate smoothing means 11 that smooths the time variation of the original sound addition rate α(t) when it changes.
To smooth the time variation of α(t), the difference between the original sound addition rate at the previous time, αpre(t), and the original sound addition rate at the current time, α(t), is computed, and the current original sound addition rate is determined according to the sign (+ or −) of the difference. An example of the determination method is expressed in Equation (4) using a programming language.
if αpre(t) < α(t) then αsmooth(t) = αpre(t) + atk · (α(t) − αpre(t))
else αsmooth(t) = αpre(t) + rls · (α(t) − αpre(t)) (4)
Here, atk and rls are values satisfying 0 < atk < 1 and 0 < rls < 1. With this operation, the original sound addition rate does not change greatly from one time to the next; it only increases or decreases slightly from the previous time, so its time variation is smooth and the distortion of the processed sound is small. Whether it moves up or down is determined by the difference between the current original sound addition rate α(t) and the previous one: it moves slightly upward if α(t) has increased relative to the previous time and slightly downward if it has decreased. The third embodiment can of course also be used together with the second embodiment.
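Equation (4) translates almost directly into code. The sketch below is one possible rendering; the function name and the default values of `atk` and `rls` are illustrative assumptions, since the text only requires both to lie strictly between 0 and 1.

```python
def smooth_rate(alpha_prev, alpha_target, atk=0.5, rls=0.1):
    """One smoothing step for the original sound addition rate.

    Moves from the previous rate toward the newly determined rate by a
    fraction atk when the rate is rising and rls when it is falling or
    unchanged, as in Equation (4).
    """
    if not (0.0 < atk < 1.0 and 0.0 < rls < 1.0):
        raise ValueError("atk and rls must lie strictly between 0 and 1")
    coeff = atk if alpha_prev < alpha_target else rls
    return alpha_prev + coeff * (alpha_target - alpha_prev)
```

Choosing atk larger than rls, as in these defaults, would let the rate rise quickly at the start of speech and decay slowly afterwards; whether that asymmetry is desirable depends on the application and is not specified in the text.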
Thus, by using the above means, a physical quantity representing the speech-section likelihood can be computed from a signal mixed with noise, and the original sound addition rate can be adjusted automatically according to it. Compared with the conventional method of setting the original sound addition rate uniformly over all sections, distortion is reduced in speech sections and the S/N ratio improvement is secured in noise sections, so a signal that is easy to listen to overall and has little residual noise can be extracted and collected.
The sound collection method of the present invention described above is realized by causing a computer to execute the sound collection program according to the present invention. The sound collection program is written in a computer-readable programming language, recorded on a recording medium such as a magnetic disk or CD-ROM, and installed on the computer from the recording medium or through a communication line. The installed sound collection program is interpreted and executed by the central processing unit (CPU) of the computer.
The sound collection method and sound collection device according to the present invention can be applied, for example, to the sound collection device of a speech recognition device, where improving the quality of the collected speech signal can improve the recognition rate.
1, 1A, 1B Voice input means
2 Noise suppression means
3 Speech interval degree calculation means
4 Original sound addition rate determination means
5 Original sound addition means
6 Band dividing means
7 Inter-channel level difference calculation means
8 Sound source signal determination means
9 Sound source signal selection means
10 Power integration means
11 Original sound addition rate smoothing means
100 Sound collection device
Claims (9)
A sound collection method comprising:
noise suppression processing that suppresses noise contained in an input signal and emphasizes a target signal;
speech interval degree calculation processing that computes the speech-section likelihood for each section of the input signal;
original sound addition rate determination processing that determines, for each section, an original sound addition rate that is larger the greater the speech-section likelihood; and
original sound addition processing that, for each section, adds the input signal to the noise-suppressed signal according to the original sound addition rate determined by the original sound addition rate determination processing,
wherein the speech interval degree calculation processing uses at least one pair of input means installed at positions whose distances to the target sound source and to the noise source differ from each other,
band-divides, channel by channel, the target signals containing noise captured by the pair of input means, compares the levels of the band components of the channels between channels for each common band, judges from the comparison result, in relation to the difference in distance between the input means and the target sound source and noise source, whether each component is a speech signal component or a noise signal component, integrates the power of the frequency band components judged to be speech signal components, and determines the speech likelihood from the integrated power.
A sound collection method comprising:
noise suppression processing that suppresses noise contained in an input signal and emphasizes a target signal;
speech interval degree calculation processing that computes the speech-section likelihood for each section of the input signal;
original sound addition rate determination processing that determines, for each section, an original sound addition rate that is larger the greater the speech-section likelihood; and
original sound addition processing that, for each section, adds the input signal to the noise-suppressed signal according to the original sound addition rate determined by the original sound addition rate determination processing,
wherein the difference between the original sound addition rate at the current time determined by the original sound addition rate determination processing and the original sound addition rate determined at the previous time is computed, and the original sound addition rate is changed gradually toward the target original sound addition rate at the current time according to the value and sign of the difference.
A sound collection device comprising:
noise suppression means that suppresses noise in the output signal of voice input means and emphasizes a target signal;
speech interval degree calculation means that computes the speech-section likelihood of the output signal, for each section, of the voice input means or the noise suppression means;
original sound addition rate determination means that determines the original sound addition rate α(t) (0 < α(t) < 1) for each section based on the speech-section likelihood computed by the speech interval degree calculation means; and
original sound addition means that multiplies the output signal of the voice input means for each section by the original sound addition rate α(t) (0 < α(t) < 1) determined by the original sound addition rate determination means and adds it to the output signal of the noise suppression means for that section,
wherein a plurality of microphones provided apart from one another are used as the voice input means, and
the speech interval degree calculation means comprises:
band dividing means that band-divides the plurality of channel signals output by the plurality of microphones;
band-by-band inter-channel level difference calculation means that computes, for each common band of the channel signals divided by the band dividing means, the level difference between channels;
sound source signal determination means that judges, based on the inter-channel level difference for each band, from which sound source the band-divided channel signal in that band was input;
sound source signal selection means that selects, based on the judgment of the sound source signal determination means, the signal input from the target sound source from the band-divided channel signals;
power integration means that integrates the power of the output signal of the sound source signal selection means;
original sound addition rate determination means that determines the original sound addition rate from the power value integrated by the power integration means; and
original sound addition means that adds the original sound signal output by the voice input means to the target signal output by the noise suppression means according to the original sound addition rate determined by the original sound addition rate determination means.
Noise suppression means having a function of suppressing noise in the output signal of the voice input means and emphasizing a target signal;
Voice section degree calculating means for calculating the likelihood of a voice section in the output signal of the voice input means or the noise suppression means for each section;
Original sound addition rate determining means for determining the original sound addition rate α(t) (0 < α(t) < 1) for each section based on the voice-section likelihood calculated by the voice section degree calculating means; and
Original sound adding means for multiplying the output signal of the voice input means for each section by the original sound addition rate α(t) (0 < α(t) < 1) determined by the original sound addition rate determining means and adding the result to the output signal of the noise suppression means for that section,
A sound collecting device characterized by further comprising original sound addition rate smoothing means that obtains the difference between the original sound addition rate at the current time determined by the original sound addition rate determining means and the original sound addition rate determined at the previous time, and gradually changes the original sound addition rate toward the target original sound addition rate at the current time according to the value and polarity of that difference.
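The per-section addition rate and its smoothing can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the voice-section likelihood is taken to be a number between 0 and 1, the mapping to α(t) is assumed linear, and the smoothing uses assumed step limits that differ for a rising rate (positive difference) and a falling rate (negative difference). All names and constants are hypothetical.

```python
def target_rate(speech_likeness, a_min=0.05, a_max=0.95):
    """Map a voice-section likelihood in [0, 1] to a rate with 0 < rate < 1."""
    v = min(max(speech_likeness, 0.0), 1.0)
    return a_min + v * (a_max - a_min)

def smooth_rate(prev, target, step_up=0.2, step_down=0.05):
    """Move the rate gradually from prev toward target.

    The polarity of the difference selects the step limit (assumed: rise
    quickly at voice onset, fall slowly afterwards); the magnitude of the
    difference caps the step so the target is never overshot.
    """
    diff = target - prev
    if diff > 0:
        return prev + min(diff, step_up)
    return prev - min(-diff, step_down)

def process_sections(originals, suppressed, likeness, alpha0=0.05):
    """Apply y(t) = suppressed(t) + alpha(t) * original(t) section by section."""
    alpha = alpha0
    outputs, alphas = [], []
    for orig, supp, v in zip(originals, suppressed, likeness):
        alpha = smooth_rate(alpha, target_rate(v))
        alphas.append(alpha)
        outputs.append([s + alpha * o for o, s in zip(orig, supp)])
    return outputs, alphas

# Three sections judged to be voice: the rate climbs step by step.
originals = [[1.0, 1.0]] * 3
suppressed = [[0.5, 0.5]] * 3
outputs, alphas = process_sections(originals, suppressed, [1.0, 1.0, 1.0])
```

Limiting the change per section avoids an abrupt jump in the residual noise level when the likelihood switches between voice and non-voice sections.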
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004065513A JP4518817B2 (en) | 2004-03-09 | 2004-03-09 | Sound collection method, sound collection device, and sound collection program |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005257748A (en) | 2005-09-22 |
JP4518817B2 (en) | 2010-08-04 |
Family
ID=35083568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2004065513A Expired - Fee Related JP4518817B2 (en) | 2004-03-09 | 2004-03-09 | Sound collection method, sound collection device, and sound collection program |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP4518817B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5526524B2 (en) * | 2008-10-24 | 2014-06-18 | ヤマハ株式会社 | Noise suppression device and noise suppression method |
JP5233914B2 (en) * | 2009-08-28 | 2013-07-10 | 富士通株式会社 | Noise reduction device and noise reduction program |
US9905232B2 (en) * | 2013-05-31 | 2018-02-27 | Sony Corporation | Device and method for encoding and decoding of an audio signal |
CN108648756A (en) * | 2018-05-21 | 2018-10-12 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS58181099A (en) * | 1982-04-16 | 1983-10-22 | 三菱電機株式会社 | Voice identifier |
JPS5999497A (en) * | 1982-11-29 | 1984-06-08 | 松下電器産業株式会社 | Voice recognition equipment |
JPS6267598A (en) * | 1985-09-20 | 1987-03-27 | 株式会社リコー | Voice section detection system |
JPH10161694A (en) * | 1996-11-28 | 1998-06-19 | Nippon Telegr & Teleph Corp <Ntt> | Band split type noise reducing method |
WO1999030315A1 (en) * | 1997-12-08 | 1999-06-17 | Mitsubishi Denki Kabushiki Kaisha | Sound signal processing method and sound signal processing device |
JP2000082999A (en) * | 1998-09-07 | 2000-03-21 | Nippon Telegr & Teleph Corp <Ntt> | Noise reduction processing method/device and program storage medium |
JP2002073061A (en) * | 2000-09-05 | 2002-03-12 | Matsushita Electric Ind Co Ltd | Voice recognition device and its method |
Also Published As
Publication number | Publication date |
---|---|
JP2005257748A (en) | 2005-09-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20060411 |
| | RD03 | Notification of appointment of power of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7423; Effective date: 20060411 |
| | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20090626 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20090804 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20090917 |
| | A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02; Effective date: 20091222 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20100316 |
| | A911 | Transfer to examiner for re-examination before appeal (zenchi) | Free format text: JAPANESE INTERMEDIATE CODE: A911; Effective date: 20100329 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20100506 |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20100518 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20130528; Year of fee payment: 3 |
| | R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20140528; Year of fee payment: 4 |
| | S531 | Written request for registration of change of domicile | Free format text: JAPANESE INTERMEDIATE CODE: R313531 |
| | R350 | Written notification of registration of transfer | Free format text: JAPANESE INTERMEDIATE CODE: R350 |
| | LAPS | Cancellation because of no payment of annual fees | |