JP2013179388A

JP2013179388A - Acoustic signal enhancement device, perspective determination device, method and program therefor

Info

Publication number: JP2013179388A
Application number: JP2012041052A
Authority: JP
Inventors: Yusuke Hioka; 裕輔日岡; Kenichi Furuya; 賢一古家; Yoichi Haneda; 陽一羽田; Kenta Niwa; 健太丹羽
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-02-28
Filing date: 2012-02-28
Publication date: 2013-09-09
Anticipated expiration: 2032-02-28
Also published as: JP5738218B2

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic signal enhancement device that reproduces the acoustic signal of a sound source by precisely determining a direct-to-reverberant ratio estimate of the acoustic signal. SOLUTION: A direct sound direction power estimation unit obtains a power estimate of a direct sound direction signal, which is obtained by passing only the signal component arriving from the direct sound source direction through a predetermined beamformer realized by a microphone array. A reverberant sound direction power estimation unit obtains a power estimate of a reverberant sound direction signal, which is obtained by passing the signal components arriving from directions other than the direct sound source direction through a beamformer that has the same directivity shape as the predetermined beamformer and whose main beam avoids the direct sound source direction. A direct-to-reverberant ratio estimation unit obtains a direct-to-reverberant ratio estimate DRR, which indicates the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal, by using the power estimate of a frequency domain signal and the power estimate of the reverberant sound direction signal.

Description

The present invention relates to a technique for estimating the direct-to-reverberant ratio of an acoustic signal.

In the prior art described in Patent Document 1, the signals received by a microphone array are converted into the frequency domain in order to obtain the direct-to-reverberant ratio, and the powers of the direct sound and of the indirect sound are each obtained from the spatial correlation matrix computed from those signals (see, for example, paragraphs [0034] to [0061] of Example 1).

Patent Document 1: JP 2009-201724 A

The method disclosed in Patent Document 1 cannot distinguish the direct sound from indirect sound arriving from the same direction, so every sound arriving from the direction of the direct sound is judged to be direct sound. As a result, the direct sound power is overestimated (or the indirect sound power is underestimated), and the direct-to-reverberant ratio finally obtained is larger than its true value.

The present invention has been made in view of this problem. By distinguishing the reverberant sound that arrives from the direction of the direct sound and estimating the direct sound power and the reverberant sound power separately, it obtains a direct-to-reverberant ratio estimate (DRR: Direct-to-Reverberation energy Ratio) that is closer to the true value than that of the conventional method. An object of the invention is to provide an acoustic signal enhancement device that accurately reproduces the acoustic signal of a sound source on the basis of this accurate estimate, a perspective (near/far) determination device, and methods and programs therefor.

The acoustic signal enhancement device of the present invention includes a received sound power estimation unit, a direct sound direction power estimation unit, a reverberant sound direction power estimation unit, a subtraction unit, a direct-to-reverberant ratio calculation unit, and a target signal adjustment unit. The received sound power estimation unit obtains a power estimate of a frequency domain signal, which is obtained by converting the signals received by the plurality of microphones of a microphone array into the frequency domain. The direct sound direction power estimation unit obtains a power estimate of a direct sound direction signal, which is obtained either by applying to the frequency domain signal a process that mainly passes the signal component arriving from the direct sound source direction, or by applying that process to the received signal and converting the result into the frequency domain. The reverberant sound direction power estimation unit obtains a power estimate of a reverberant sound direction signal, which is obtained either by a process that passes mainly the signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process used in the direct sound direction power estimation unit, or by applying to the received signal a process that passes mainly the signal components arriving from directions other than the direct sound source direction and converting the result into the frequency domain. The subtraction unit outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal. The direct-to-reverberant ratio calculation unit uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate, which represents the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal. The target signal adjustment unit multiplies a processing target signal obtained from the received signal by a gain that depends on the direct-to-reverberant ratio estimate to obtain a processed signal. The gain applied to a processing target signal whose ratio, as represented by the direct-to-reverberant ratio estimate, is larger than a predetermined threshold is larger than the gain applied to a processing target signal whose ratio is smaller than that threshold.

The perspective determination device of the present invention includes the same received sound power estimation unit, direct sound direction power estimation unit, reverberant sound direction power estimation unit, subtraction unit, and direct-to-reverberant ratio calculation unit as the acoustic signal enhancement device described above, and further includes a perspective determination unit. The perspective determination unit determines whether the direct sound source in a determination section is near or far by comparing a determination value with a reference value. The determination value corresponds to the direct-to-reverberant ratio estimate obtained from the signal received in the determination section, which consists of one or more frames. The reference value corresponds to a plurality of direct-to-reverberant ratio estimates obtained from the signal received in a reference section, which consists of more frames than the determination section.

The acoustic signal enhancement device of the present invention enhances the acoustic signal of a sound source by using the direct-to-reverberant ratio estimate obtained with the estimation method proposed by this invention. This estimation method is a new method that focuses on the isotropy of the arrival direction of reverberant sound, which results from its strongly diffuse nature. Using two or more beamformers that are realized by the microphone array and have the same directivity shape, it separates the direct sound from the reverberant sound among the signals arriving from the direct sound direction and estimates the power of each correctly. This improves the estimation accuracy of the direct-to-reverberant ratio and therefore makes it possible to enhance the acoustic signal of the sound source accurately.

The perspective determination device of the present invention determines whether the sources of sounds emitted at different times are near or far on the basis of the direct-to-reverberant ratio estimate obtained with the estimation method of this invention, and can therefore make more accurate determinations than conventional devices.

A diagram showing an example of a scene in which the acoustic signal enhancement device 400 is used.
A diagram showing the propagation paths of sound indoors.
A diagram showing the relationship between the direct-to-reverberant ratio and the distance from the microphone to the sound source.
A diagram conceptually showing the principle underlying each embodiment.
A diagram showing two beamformers that have the same directivity shape and whose main beams point in different directions: (a) a beamformer with its beam directed toward the sound source, (b) a beamformer with a null directed toward the sound source.
A diagram showing a functional configuration example of the acoustic signal enhancement device 400 of Embodiment 1.
A diagram showing the operation flow of the acoustic signal enhancement device 400.
A diagram showing a functional configuration example of the processing target signal generation unit 43.
A diagram showing a functional configuration example of the direct-to-reverberant ratio calculation unit 44.
A diagram showing a functional configuration example of the direct-to-reverberant ratio calculation unit 44'.
A diagram schematically showing examples of the directivity shapes of the reverberation directivity forming units 4431_1 to 4431_N.
A diagram showing a functional configuration example of the direct-to-reverberant ratio calculation unit 44''.
A diagram showing a functional configuration example of the perspective determination device 130 of Embodiment 2.
A diagram showing the experimental conditions of the experiment that confirmed the effect.
A diagram showing simulation results of direct-to-reverberant ratio estimation.
A diagram showing a functional configuration example of the direct-to-reverberant ratio estimation device 160.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated. In the following description, symbols such as “￣” and “^” used in the text should properly appear directly above the character that follows them, but because of the limitations of text notation they are written immediately before that character. In the equations, these symbols appear in their proper positions.

Before the embodiments are described, the principle underlying each embodiment is explained.
[Principle]
The acoustic signal enhancement device of Embodiment 1 uses a single microphone array to enhance or suppress only the sound within a specific range of distances from the array, with the aim of picking up the sound of sources located within a predetermined range. The perspective determination device of Embodiment 2 determines whether the source of a received signal is near or far.

FIG. 1 illustrates a scene in which the acoustic signal enhancement device 400 of Embodiment 1 is used. Assume a meeting in which, for example, four talkers 12 to 14 sit around a small microphone array 11. A television 16, a telephone 17, and a loudspeaker 18 for public-address announcements are placed in the meeting room. In such a scene, the goal is to pick up only the speech of talkers 12 to 14, who are located within a predetermined range of distances centered on the small microphone array 11 (inside the circle drawn with a broken line), without picking up the public-address announcements, the sound of the telephone, and the like.

To tell how far a sound source is from the microphone array, attention is paid to the ratio of the direct sound to the indirect sound (also called reverberant sound) contained in the received sound (hereinafter, the direct-to-reverberant ratio). FIG. 2 shows the propagation paths of sound from sound source 21 to microphone 22 when sound is recorded with a microphone placed indoors. The direct sound is the sound wave, drawn with a thick solid line, that travels straight from sound source 21 to microphone 22. The reverberant sound is the sound wave, drawn with broken lines, that is emitted by sound source 21 and reaches microphone 22 after being reflected by the walls, floor, ceiling, and so on.

FIG. 3 shows the relationship between the direct-to-reverberant ratio and the distance from the microphone to the sound source. The horizontal axis is the distance from the microphone to the sound source, and the vertical axis is the direct-to-reverberant ratio. In general, the indirect sound has a constant magnitude that does not depend on the distance from the microphone, whereas the direct sound decreases monotonically as the distance from the microphone increases. The direct-to-reverberant ratio, obtained by dividing the direct sound by the indirect sound, therefore also decreases monotonically with distance, just as the direct sound does.
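The trend described for FIG. 3 can be illustrated with a minimal sketch. The inverse-square decay of the direct sound and the `critical_distance_m` parameter (the distance at which direct and reverberant power are equal) are assumptions made here for illustration; the text only states that the direct sound decreases monotonically while the reverberant sound stays constant.

```python
import numpy as np

def drr_db(distance_m, critical_distance_m=1.0):
    """DRR in dB, assuming inverse-square direct power and constant reverberant power."""
    d = np.asarray(distance_m, dtype=float)
    direct = (critical_distance_m / d) ** 2  # direct power, normalized to the reverberant power
    reverberant = 1.0                        # independent of distance (diffuse-field assumption)
    return 10.0 * np.log10(direct / reverberant)

distances = np.array([0.25, 0.5, 1.0, 2.0, 4.0])  # metres, hypothetical values
drr_values = drr_db(distances)                    # decreases as the source moves away
```

Under this assumed model, each doubling of the distance lowers the DRR by about 6 dB, so a DRR threshold maps to a distance boundary around the array.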

From this direct-to-reverberant ratio it is possible to estimate whether a source lies within a predetermined range of distances centered on microphone array 11. Using the ratio therefore makes it possible to enhance only the acoustic signal from the desired sound source.

FIG. 4 conceptually illustrates the principle of direct-to-reverberant ratio estimation according to the present invention. It is known that, when there is sufficient reverberation, the reverberant sound can be assumed to be diffuse and can be modeled as sound that arrives at the microphone with equal magnitude from every direction. Applying an arbitrary beamformer BF1 to the output signals of the small microphone array 11 allows reverberant sound direction power 23 to be received with a predetermined directivity shape D_1. The three arrows of reverberant sound direction power 23 schematically represent the magnitude of the reverberant sound obtained with directivity shape D_1.

Now assume that the position of sound source 21 is known. If beamformer BF0 is given a directivity shape D_0 identical to D_1 and is steered toward sound source 21, it receives direct sound direction power 26, which consists of direct sound power 25 arriving directly from sound source 21 at the small microphone array 11 plus a reverberant component of the same magnitude as reverberant sound direction power 23.

Because direct sound direction power 26 contains the same reverberant component as reverberant sound direction power 23, subtracting reverberant sound direction power 23 from direct sound direction power 26 yields direct sound power 25. This principle is explained theoretically below.

<Isotropic arrival model of reverberant sound>
The proposed method introduces a model that takes the isotropy of reverberant sound into account. The example below uses the power spectral density, or an estimate of it, as the power estimate, but this does not limit the present invention.

When the signal received by the m-th microphone of a microphone array consisting of M (M ≥ 2) microphones is converted into the frequency domain, for example by a short-time Fourier transform, the following frequency domain signal X^(m)(ω, t) is obtained.
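The displayed equation is not reproduced in this text. A form consistent with the symbol definitions that follow (a direct and an indirect transfer function acting on the same source signal) is sketched below; its exact layout is an assumption.

```latex
X^{(m)}(\omega, t) = \bigl( H_D^{(m)}(\omega) + H_R^{(m)}(\omega) \bigr)\, S(\omega, t)
```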

Here, ω is the frequency, H_D^(m)(ω) is the transfer function of the direct sound from the sound source to the m-th microphone, H_R^(m)(ω) is the transfer function of the indirect sound from the sound source to the m-th microphone, S(ω, t) is the signal obtained by converting the sound of the source into the frequency domain, and t is the time frame index.

The direct sound is assumed to be coherent, whereas the indirect sound, whose main component is reverberation, is assumed to be diffuse. In terms of arrival direction, the direct sound arrives only from the direction of the sound source, while the indirect sound arrives with uniform power from every direction (a property referred to below as isotropy). The proposed method exploits this difference in spatial arrival characteristics to estimate the direct sound power and the indirect sound power and thereby obtain the direct-to-reverberant ratio.

As preconditions, the arrival direction of the direct sound (hereinafter the "direct sound source direction") is known, the direct sound and the indirect sound arriving from any direction can be regarded as plane waves, and, by the definition of diffuse sound, the direct sound and the indirect sound are mutually uncorrelated. Under these conditions, the transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct sound and the indirect sound from the sound source to the m-th microphone can be expressed as follows.

Here, H_Dref(ω) is the direct sound component of the transfer function from the sound source to the reference point of the microphone array (the "reference point"), and H_{Rref,θ}(ω) is the indirect sound component arriving from direction θ as seen from the reference point. The reference point may lie inside the microphone array or may coincide with the position of one of its microphones.

Each of the transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct sound and the indirect sound can be decomposed into a transfer function component from the sound source to the reference point and a phase difference component caused by the propagation delay from the reference point to the m-th microphone. Accordingly, the microphone array input vector →x(ω, t) = [X^(1)(ω, t), …, X^(M)(ω, t)]^T, whose elements are the frequency domain signals X^(m)(ω, t) (m ∈ {1, …, M}), is expressed by the following equation, where T denotes transposition.
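The displayed equation is not reproduced in this text. A reconstruction consistent with the definitions of S_D(ω, t), S_{R,θ}(ω, t), and the array manifold vector given in the next paragraph is sketched below. The symbol θ_D for the direct sound source direction and the sign of the exponent in the manifold vector are assumptions.

```latex
\vec{x}(\omega, t) = \vec{a}_{\theta_D}(\omega)\, S_D(\omega, t)
  + \int_{\theta} \vec{a}_{\theta}(\omega)\, S_{R,\theta}(\omega, t)\, d\theta,
\qquad
\vec{a}_{\theta}(\omega) = \bigl[ e^{-j\omega\tau_{\theta}^{(1)}}, \ldots, e^{-j\omega\tau_{\theta}^{(M)}} \bigr]^{T}
```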

Here, S_D(ω, t) = H_Dref(ω) S(ω, t) and S_{R,θ}(ω, t) = H_{Rref,θ}(ω) S(ω, t). →a_θ(ω) is the array manifold vector for direction θ, given by equation (5). Each element of the array manifold vector depends on the propagation delay τ_θ^(m). When the direct sound and the indirect sound can be regarded as plane waves, the propagation delay τ_θ^(m) depends on the position of each microphone relative to the reference point of the microphone array and on the direction θ. For details of the array manifold vector, see, for example, Chapter 1 (pp. 1-26) of Reference 1: Futoshi Asano, "Array Signal Processing of Sound: Localization, Tracking, and Separation of Sound Sources" (Acoustic Technology Series, edited by the Acoustical Society of Japan), Corona Publishing Co., Ltd., February 25, 2011, ISBN978-4-339-01116-6.

When an arbitrary beamformer (BF) is applied to this microphone array input, its output power spectral density (PSD) is, as shown in equation (6), the sum of the PSDs of the direct sound and the indirect sound, each multiplied by the beamformer's power gain |D_θ(ω)|^2.

Here, P_D(ω) = E[|S_D(ω, t)|^2]_t and P_{R,θ}(ω) = E[|S_{R,θ}(ω, t)|^2]_t, →w(ω) is the vector of filter coefficients of the beamformer, and R(ω) is the spatial correlation matrix of the microphone array input signals, whose (i, j) element is R_ij(ω) = E[X_i(ω, t) X_j^*(ω, t)]_t. E[·] denotes the expectation operator.

<Direct-to-reverberant ratio estimation using multiple beamformers>
In a sound field where the indirect sound can be assumed to arrive isotropically, the reverberant sound power P_{R,θ}(ω) in equation (6) can be replaced by a constant ￣P_R(ω) that does not depend on the direction θ, and the output power spectral density can be written as equation (7).
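The displayed equation is not reproduced in this text. The following paragraph describes its right-hand side as a first term carrying the beamformer's power gain for the direct sound and a second term containing ∫_θ |D_θ(ω)|^2 dθ, which suggests the form below. The left-hand symbol P_BF(ω) and the label θ_D for the direct sound source direction are assumptions.

```latex
P_{\mathrm{BF}}(\omega) = \bigl| D_{\theta_D}(\omega) \bigr|^{2} P_D(\omega)
  + \bar{P}_R(\omega) \int_{\theta} \bigl| D_{\theta}(\omega) \bigr|^{2}\, d\theta
```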

Suppose, as shown in FIG. 5, that there are two beamformers BF0 and BF1 that have the same directivity shape and whose main beams point in different directions. The second term on the right-hand side of equation (7), ∫_θ |D_θ(ω)|^2 dθ, is then the same for both, and the outputs of the two beamformers differ only in the first term on the right-hand side, that is, in the beamformer's power gain for the direct sound.

The direct sound power 25 can therefore be obtained by subtracting the output power spectral density P_1(ω) of beamformer BF1, which directs a null (a point of low directional sensitivity) toward the sound source, from the output power spectral density P_0(ω) of beamformer BF0, which directs its beam toward the sound source.
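The subtraction can be written out as a short derivation. It assumes that each beamformer output PSD is the sum of a direct term and an isotropic reverberant term, and that the two beamformers share the same directivity shape so that their reverberant terms cancel. The symbols D_{0,θ_D} and D_{1,θ_D} (the responses of BF0 and BF1 toward the direct sound source direction θ_D) are labels introduced here.

```latex
\begin{aligned}
P_0(\omega) - P_1(\omega)
  &= \Bigl( \bigl| D_{0,\theta_D}(\omega) \bigr|^{2} - \bigl| D_{1,\theta_D}(\omega) \bigr|^{2} \Bigr) P_D(\omega), \\
P_D(\omega)
  &= \frac{P_0(\omega) - P_1(\omega)}
          {\bigl| D_{0,\theta_D}(\omega) \bigr|^{2} - \bigl| D_{1,\theta_D}(\omega) \bigr|^{2}} .
\end{aligned}
```

With an ideal null, |D_{1,θ_D}(ω)|^2 = 0, and the difference P_0(ω) − P_1(ω) equals the direct sound power scaled only by the main-beam gain of BF0.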

By this principle, reverberant sound arriving from the direct sound source direction can be distinguished from the direct sound, and as a result the estimation accuracy of the direct-to-reverberant ratio can be improved.
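The principle can be checked numerically with a minimal sketch. Everything specific in it is an assumption chosen for illustration: a four-microphone circular array, delay-and-sum beamformers, a two-dimensional (horizontal-plane) diffuse field, and the numerical values. Rotating the look direction by a multiple of the microphone spacing gives BF1 the same directivity shape as BF0, so the reverberant terms of the two outputs cancel in the subtraction.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def manifold(theta, mic_angles, radius, freq):
    """Plane-wave array manifold vector(s) of a circular array (theta: scalar or 1-D array)."""
    k = 2.0 * np.pi * freq / SPEED_OF_SOUND
    theta = np.asarray(theta, dtype=float)
    return np.exp(1j * k * radius * np.cos(theta[..., None] - mic_angles))

def power_gain(look, theta, mic_angles, radius, freq):
    """Power gain |D_theta|^2 of a delay-and-sum beamformer steered toward `look`."""
    w = manifold(look, mic_angles, radius, freq) / len(mic_angles)
    return np.abs(manifold(theta, mic_angles, radius, freq) @ np.conj(w)) ** 2

# Hypothetical geometry and frequency.
M, radius, freq = 4, 0.04, 2000.0
mic_angles = 2.0 * np.pi * np.arange(M) / M
theta_d = 0.0            # known direct sound source direction
look0 = theta_d          # BF0: main beam on the source
look1 = theta_d + np.pi  # BF1: same shape, rotated by two microphone spacings

# Integral of |D_theta|^2 over all directions (identical for both beamformers).
n_grid = 360             # multiple of M, so the rotated pattern lands on the same grid
grid = 2.0 * np.pi * np.arange(n_grid) / n_grid
d_theta = 2.0 * np.pi / n_grid
int0 = power_gain(look0, grid, mic_angles, radius, freq).sum() * d_theta
int1 = power_gain(look1, grid, mic_angles, radius, freq).sum() * d_theta

# Power gains of both beamformers toward the source.
g0 = power_gain(look0, theta_d, mic_angles, radius, freq)
g1 = power_gain(look1, theta_d, mic_angles, radius, freq)

# Synthetic "true" PSDs and the resulting beamformer outputs (two-term model).
P_D_true, P_R_true = 1.0, 0.2
P0 = g0 * P_D_true + P_R_true * int0
P1 = g1 * P_D_true + P_R_true * int1

# Subtraction removes the common reverberant term and leaves the direct sound.
P_D_est = (P0 - P1) / (g0 - g1)
P_R_est = (P0 - g0 * P_D_est) / int0
```

In practice P0 and P1 would be estimated from recorded frames rather than generated from known values; the sketch only shows that the algebra recovers both powers when the isotropy and equal-shape assumptions hold.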

FIG. 6 shows a functional configuration example of the acoustic signal enhancement device 400 of Embodiment 1, and FIG. 7 shows its operation flow. The acoustic signal enhancement device 400 includes a microphone array 41, a plurality of frequency domain conversion units 42_1 to 42_M, a processing target signal generation unit 43, a direct-to-reverberant ratio calculation unit 44, a target signal adjustment unit 45, and an inverse frequency domain conversion unit 46. Each functional unit other than the microphone array 41 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The microphone array 41 consists of a plurality of microphones m_1, …, m_M. The frequency domain conversion units 42_1, …, 42_M each receive one of the signals x_m(n) picked up by the microphones m_1, …, m_M and convert it into a frequency domain signal (step S42). Specifically, each unit samples the received signal x_m(n) at a sampling frequency of, for example, 16 kHz to obtain a digital signal, treats, for example, 256 samples as one frame, and applies a discrete Fourier transform to each frame to output the frequency component X_m(ω, t). Here ω is the frequency and t is the frame number. The A/D converter that converts the received signal x_m(n) into a digital signal is omitted from the figure.
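The framing and transform of step S42 can be sketched as follows. The 16 kHz rate and 256-sample frame come from the text; the non-overlapping frames, the absence of a window, and the function name `to_frequency_domain` are assumptions, since the text does not specify them.

```python
import numpy as np

def to_frequency_domain(x, frame_len=256):
    """Split each channel into non-overlapping frames and apply a DFT to each frame.

    x: array of shape (M, n_samples), one row per microphone.
    Returns X of shape (M, n_frames, frame_len // 2 + 1).
    """
    x = np.asarray(x, dtype=float)
    n_mics, n_samples = x.shape
    n_frames = n_samples // frame_len            # trailing partial frame is dropped
    frames = x[:, :n_frames * frame_len].reshape(n_mics, n_frames, frame_len)
    return np.fft.rfft(frames, axis=-1)          # one-sided spectrum of real input

# Hypothetical two-channel input: a 500 Hz tone sampled at 16 kHz.
fs = 16000
n = np.arange(1024)
x = np.stack([np.sin(2 * np.pi * 500 * n / fs), np.cos(2 * np.pi * 500 * n / fs)])
X = to_frequency_domain(x)
```

With 256-sample frames at 16 kHz the bin spacing is 62.5 Hz, so the 500 Hz tone falls on bin 8.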

The processing target signal generation unit 43 combines the frequency domain signals X_m(ω, t) output by the frequency domain conversion units 42_1, …, 42_M to generate a processing target signal Y(ω, t) (step S43).

The direct-to-reverberant ratio calculation unit 44 takes as input the frequency domain signals X_m(ω, t) output by the frequency domain conversion units 42_1, …, 42_M and calculates the direct-to-reverberant ratio estimate DRR(ω, t) of the received signals (step S44). The operation of the direct-to-reverberant ratio calculation unit 44 is described in detail later.

The target signal adjustment unit 45 takes as input the processing target signal Y(ω, t) and the direct-to-reverberant ratio estimate DRR(ω, t), and generates a processed signal Z(ω, t) by adjusting the amplitude of Y(ω, t) according to the value of DRR(ω, t) (step S45).

The inverse frequency domain conversion unit 46 converts the processed signal Z(ω, t) into a time domain signal z(n) (step S46). Steps S41 to S46 are repeated until all of the received signals x_m(n) have been processed.

Adjustment according to the value of the direct-to-reverberant ratio estimate DRR(ω, t) includes threshold processing of DRR(ω, t), processing that increases the amplitude of the processed signal Z(ω, t) as the value increases, and processing that decreases the amplitude of Z(ω, t) as the value increases. Details are given later.
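A minimal sketch of the threshold-processing variant of this adjustment is shown below. The function name and the gain values `gain_high` and `gain_low` are hypothetical; the only property taken from the text is that components whose DRR exceeds the threshold receive a larger gain than those below it.

```python
import numpy as np

def adjust_target_signal(Y, drr, threshold, gain_high=1.0, gain_low=0.1):
    """Scale each time-frequency component of Y by a gain selected from its DRR estimate."""
    gain = np.where(np.asarray(drr) > threshold, gain_high, gain_low)
    return gain * np.asarray(Y)

# Hypothetical values: three time-frequency components.
Y = np.array([1.0 + 1.0j, 2.0, 3.0])
drr = np.array([5.0, 0.5, 2.0])
Z = adjust_target_signal(Y, drr, threshold=1.0)
```

Swapping `gain_high` and `gain_low` would instead suppress nearby sources, which corresponds to the variant that decreases the amplitude as the DRR increases.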

Through the above operation, the microphone array performs noise removal in which, for example, only the sound within a specific range of distances is enhanced and the sound outside that range is suppressed. The present invention is described in more detail below with more specific functional configuration examples of each unit.

[Processing target signal generation unit]
FIG. 8 shows a more specific functional configuration example of the processing target signal generation unit 43. The processing target signal generation unit 43 includes a plurality of weight multiplication means 431_1 to 431_M and an addition means 432. The weight multiplication means 431_1 to 431_M multiply the frequency components X_1(ω, t), …, X_M(ω, t) of the signals x_m(n) received by the M microphones by weight coefficients w_m(ω).

As for the weights used in the weight multiplication means 431_1 to 431_M, when the M microphones are omnidirectional, for example, setting w_m = 1/M averages all frequency components X_1(ω, t), ..., X_M(ω, t) and stabilizes the processing target signal Y(ω, t). When the M microphones are directional, setting w_1 = 1 and w_m = 0 (m = 2, ..., M) uses only the signal of one specific microphone. Alternatively, if beamforming filter coefficients obtained by a method such as that described in Reference 2 (Oga, Yamazaki, and Kaneda, "Acoustic Systems and Digital Signal Processing," IEICE) are used as the weights, the microphone array can form an arbitrary directivity.

The addition means 432 sums all the weighted frequency components X_1(ω, t), ..., X_M(ω, t) and outputs the processing target signal Y(ω, t).
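The weighting and summation described above can be sketched for a single time-frequency bin as follows. This is only an illustrative sketch: the function name `processing_target_signal` and the sample values are assumptions, and the weights are taken to be frequency-independent for brevity.

```python
def processing_target_signal(X, w):
    """Y(omega, t) = sum over m of w_m * X_m(omega, t) for one time-frequency bin."""
    if len(X) != len(w):
        raise ValueError("one weight per microphone is required")
    return sum(wm * xm for wm, xm in zip(w, X))

# Hypothetical frequency components X_1..X_M of M = 4 microphones at one bin.
X = [1 + 1j, 3 - 1j, 2 + 0j, 2 + 4j]
M = len(X)

# Omnidirectional case: w_m = 1/M averages the channels.
Y_avg = processing_target_signal(X, [1 / M] * M)

# Directional case: w_1 = 1 and w_m = 0 otherwise selects one microphone.
Y_one = processing_target_signal(X, [1] + [0] * (M - 1))
```

Passing beamforming filter coefficients as `w` would give the third option mentioned in the text.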

Alternatively, without using the addition means, a microphone separate from the microphone array may be placed near the sound source, and the signal picked up by that microphone may be used as the processing target signal Y(ω, t).

[Direct-to-reverberant ratio calculation unit]
FIG. 9 shows a functional configuration example of the direct-to-reverberant ratio calculation unit 44. The direct-to-reverberant ratio calculation unit 44 includes a received sound power estimation unit 441, a direct sound direction power estimation unit 442, a reverberant sound direction power estimation unit 443, a subtraction unit 444, and a direct-to-reverberant ratio computation unit 445.

The received sound power estimation unit 441 uses the frequency-domain signals X_1(ω, t), ..., X_M(ω, t), obtained by transforming the signals received by the microphones of the microphone array 41 into the frequency domain, to generate and output a power estimate of the frequency-domain signal corresponding to the received signals. This power estimate may be the power estimate of the frequency-domain signal X_m(ω, t) of any single microphone m (m ∈ {1, ..., M}), as in Equation (9), or a weighted average of the power estimates of X_1(ω, t), ..., X_M(ω, t), as in Equation (10). In Embodiment 1, the power spectral density P_{X,L}(ω) is obtained as this power estimate.

Here, L is the number of frames, and α_m is a non-negative weight for microphone m, set so as to satisfy Equation (11). E[·] denotes the expectation operator.
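The two variants of the received sound power estimate can be sketched as below. The equations themselves are not reproduced in this text, so the sketch rests on assumptions: the expectation E[·] is approximated by an average over the L available frames, and Equation (11) is assumed to require that the weights α_m sum to 1. The names `frame_power` and `received_power` are hypothetical.

```python
def frame_power(frames):
    """Approximate E[|X(omega, t)|^2] by the mean over the L frames given."""
    return sum(abs(x) ** 2 for x in frames) / len(frames)

def received_power(X, alpha=None, m=0):
    """X[m][t] holds the frequency-domain samples of microphone m at one frequency.

    alpha=None -> single-microphone estimate (in the spirit of Equation (9)).
    alpha=list -> weighted average over microphones (in the spirit of Equation (10)).
    """
    if alpha is None:
        return frame_power(X[m])
    # Assumed constraint: non-negative weights that sum to 1.
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must be non-negative and sum to 1")
    return sum(a * frame_power(xm) for a, xm in zip(alpha, X))

# Two microphones, L = 2 frames each (hypothetical values).
X = [[1 + 0j, 0 + 1j], [2 + 0j, 0 + 2j]]
p_single = received_power(X)                 # microphone 1 only
p_average = received_power(X, [0.5, 0.5])    # equal-weight average
```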

The direct sound direction power estimation unit 442 obtains the power estimate P_DD(ω) of a direct sound direction signal. This signal is obtained either by applying to the frequency-domain signals X_1(ω, t), ..., X_M(ω, t) processing that passes only the signal components arriving from the direct sound source direction, or by applying that processing to the received signals and transforming the result into the frequency domain. The power P_DD(ω) of the direct sound direction signal is the same as P_0(ω) in Equation (8) above.

The direct sound direction power estimation unit 442 includes a directivity forming unit 4421 and a power estimation unit 4422. The directivity forming unit 4421 forms a directivity whose beam points in a direction given in advance and outputs the signal passed by that directivity. The directivity of the directivity forming unit 4421 is set so that its main beam points in the direct sound direction. As the directivity forming method, for example, delay-and-sum beamforming as described in Reference 1 (Futoshi Asano, "Array Signal Processing of Sound: Localization, Tracking and Separation of Sound Sources," Corona Publishing, pp. 70-79) can be used.

When the output of the directivity forming unit 4421 is denoted Y_BF(ω, t), the power estimate P_DD(ω) of the direct sound direction signal output by the power estimation unit 4422 is given by Equation (12).
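As a rough illustration of the directivity forming unit and the power estimation unit, the following sketch implements a frequency-domain delay-and-sum beamformer and averages |Y_BF(ω, t)|^2 over frames. It is not taken from the cited reference. The far-field plane-wave model, the speed of sound `C`, the array geometry, and all function names are assumptions made for the example.

```python
import cmath
import math

C = 340.0  # assumed speed of sound in m/s

def steering(positions, theta, omega):
    """Per-microphone phase factors of a far-field plane wave from angle theta."""
    return [cmath.exp(1j * omega * (x * math.cos(theta) + y * math.sin(theta)) / C)
            for x, y in positions]

def delay_and_sum(X, positions, theta0, omega):
    """Y_BF for one bin: phase-align the channels on theta0, then average."""
    a = steering(positions, theta0, omega)
    return sum(x * am.conjugate() for x, am in zip(X, a)) / len(X)

def direction_power(frames, positions, theta0, omega):
    """Mean of |Y_BF|^2 over the given frames (stand-in for the expectation)."""
    return sum(abs(delay_and_sum(X, positions, theta0, omega)) ** 2
               for X in frames) / len(frames)

# Hypothetical array: 8 microphones on a circle of radius 5 cm.
positions = [(0.05 * math.cos(2 * math.pi * m / 8),
              0.05 * math.sin(2 * math.pi * m / 8)) for m in range(8)]
omega = 2 * math.pi * 2000.0
wave = steering(positions, 0.0, omega)  # unit-amplitude wave arriving from 0 rad

gain_look = abs(delay_and_sum(wave, positions, 0.0, omega)) ** 2      # beam on the source
gain_away = abs(delay_and_sum(wave, positions, math.pi, omega)) ** 2  # beam turned away
p_dd = direction_power([wave, wave], positions, 0.0, omega)
```

With the beam on the source the power gain is 1, and with the same beam shape turned away from the source the gain drops, which is the property the two-beamformer scheme relies on.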

The output power spectral density corresponding to the power estimate P_DD(ω) of the direct sound direction signal is expressed by Equation (13).

Here, |D_{0,θ}(ω)|^2 corresponds to the power gain of the beamformer BF0 described with reference to FIG. 4.

The reverberant sound direction power estimation unit 443 obtains the power estimate of a reverberant sound direction signal. This signal is obtained either by processing that passes mainly the signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the processing in the direct sound direction power estimation unit 442 that passes mainly the components arriving from the direct sound source direction, or by applying to the received signals processing that passes mainly the components arriving from directions other than the direct sound source direction and transforming the result into the frequency domain.

Ideally, the reverberant sound direction power estimation unit 443 includes a reverberation directivity forming unit 4431 and a reverberation power estimation unit 4432. The directivity of the reverberation directivity forming unit 4431 is set so that its main beam avoids the direct sound direction. Its directivity shape is set to be the same as that of the directivity forming unit 4421, and the two shapes should match as closely as possible. Such a directivity shape can readily be set with conventional techniques. Estimation of the sound source direction is described, for example, in Section 7.2 of Reference 2 (Oga, Yamazaki, and Kaneda, "Acoustic Systems and Digital Signal Processing," IEICE).

The reverberation power estimation unit 4432 takes as input the reverberant sound received while avoiding the direct sound direction and outputs the power estimate P_RD(ω) of the reverberant sound direction signal (Equation (14)). Because P_RD(ω) is obtained from sound received while avoiding the direct sound direction, setting |D_{1,θD}|^2 ≪ 1 makes the direct sound component |D_{0,θ}(ω)|^2 P_D(ω) sufficiently small.

Here, |D_1(ω)|^2 corresponds to the power gain of the beamformer BF1 described with reference to FIG. 4.

The subtraction unit 444 subtracts the power estimate P_RD(ω) of the reverberant sound direction signal output by the reverberation power estimation unit 4432 from the power estimate P_DD(ω) of the direct sound direction signal output by the direct sound direction power estimation unit 442, and outputs the direct sound power estimate ^P_D(ω) (Equation (15)).

The denominator of Equation (15) is a term that normalizes the direct sound power estimate ^P_D(ω) by the difference between the power gains of the beamformers (BF) of the directivity forming unit 4421 and the reverberation directivity forming unit 4431.
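The subtraction and normalization can be illustrated with a small worked sketch. Equation (15) is not reproduced in this text, so the form used here, ^P_D = (P_DD − P_RD) / (|D_0|^2 − |D_1|^2), is an assumption inferred from the description of its denominator. The model that both beams pick up the same amount of diffuse reverberant power, the clamping of negative differences to zero, and the name `direct_power_estimate` are also assumptions.

```python
def direct_power_estimate(p_dd, p_rd, gain0, gain1):
    """Assumed form: (P_DD - P_RD) / (|D_0|^2 - |D_1|^2) at one frequency.

    gain0 and gain1 are the power gains of BF0 and BF1 toward the direct
    sound source direction. Negative differences are clamped to 0, which
    is an addition of this sketch and not stated in the text.
    """
    if gain0 <= gain1:
        raise ValueError("BF0 must have the larger gain toward the source")
    return max(p_dd - p_rd, 0.0) / (gain0 - gain1)

# Synthetic check: true direct power 2.0, reverberant power 0.5, and both
# beams (same shape) pass the diffuse field with the same gain 0.3.
p_d_true, p_r_true, diffuse_gain = 2.0, 0.5, 0.3
gain0, gain1 = 1.0, 0.01
p_dd = gain0 * p_d_true + diffuse_gain * p_r_true
p_rd = gain1 * p_d_true + diffuse_gain * p_r_true
p_d_hat = direct_power_estimate(p_dd, p_rd, gain0, gain1)
```

In this synthetic case the reverberant term cancels in the subtraction, and the normalization recovers the direct power that was put in.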

The direct-to-reverberant ratio computation unit 445 uses the power spectral density P_{X,L}(ω) output by the received sound power estimation unit 441 and the direct sound power estimate ^P_D(ω) to obtain the direct-to-reverberant ratio estimate DRR(ω), which is the ratio of the direct sound power estimate ^P_D(ω) to the power estimate of the reverberant sound direction signal (Equation (16)).

When the received sound power output by the received sound power estimation unit 441 is given by Equation (9) for a single microphone m (m ∈ {1, ..., M}), the direct-to-reverberant ratio can also be estimated by Equation (17).

Furthermore, the direct-to-reverberant ratio can also be estimated as a frequency-independent value by Equations (18) and (19). The value is written DRR(ω) because it is obtained once per L frames; a value obtained for each frame and each frequency is written DRR(ω, t).
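A decibel-valued estimate of this kind might be computed as in the sketch below. Equations (16) to (19) are not reproduced in this text, so the formula is an assumption: the reverberant power is taken as the received power minus the direct sound power estimate, and the frequency-independent variant sums the powers over frequency before taking the ratio. The floor value and the function names are hypothetical.

```python
import math

def drr_db(p_x, p_d_hat, floor=1e-12):
    """Assumed form: 10 * log10(^P_D / (P_X - ^P_D)) at one frequency.

    Both powers are floored to keep the logarithm defined; the floor is an
    implementation detail of this sketch.
    """
    p_direct = max(p_d_hat, floor)
    p_reverb = max(p_x - p_d_hat, floor)
    return 10.0 * math.log10(p_direct / p_reverb)

def drr_db_broadband(p_x_per_bin, p_d_hat_per_bin):
    """Frequency-independent variant: sum powers over bins, then take the ratio."""
    return drr_db(sum(p_x_per_bin), sum(p_d_hat_per_bin))

equal_parts = drr_db(2.0, 1.0)      # direct power equals reverberant power
mostly_direct = drr_db(2.5, 2.0)    # direct power is four times the reverberant power
broadband = drr_db_broadband([2.0, 3.0], [1.0, 1.5])
```

Dropping the `10 * log10` step gives a plain power ratio instead of a decibel value.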

The direct-to-reverberant ratio estimation method described above is a new method based on the observation that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically. Two beamformers with identical directivity shapes, realized by the microphone array, yield a signal containing both direct and reverberant sound and a signal containing only reverberant sound. The direct and indirect sound components can therefore be separated correctly, which improves the estimation accuracy of the direct-to-reverberant ratio.

Equations (16), (17), (18), and (19) may also be replaced by direct-to-reverberant ratio estimates DRR that are not expressed in decibels.

[Modification 1]
FIG. 10 shows a functional configuration example of a direct-to-reverberant ratio calculation unit 44′ in which the functional configuration of the reverberant sound direction power estimation unit 443 of the direct-to-reverberant ratio calculation unit 44 is modified. The direct-to-reverberant ratio calculation unit 44′ obtains the reverberant sound direction power P_RD(ω) by averaging the reverberant sound direction powers P_RD1(ω) to P_RDN(ω) of a plurality of (two or more) beam directions.

The reverberant sound direction power estimation unit 443′ of the direct-to-reverberant ratio calculation unit 44′ differs from that of the direct-to-reverberant ratio calculation unit 44 in that it includes two or more reverberation directivity forming units 4431_1 to 4431_N, two or more reverberation power estimation units 4432_1 to 4432_N, and a reverberation direction power calculation unit 4433. The main beam of the beamformer of the reverberation directivity forming unit 4431_1 points, for example, in direction θ_1 from the reference point. The main beam of the beamformer of unit 4431_2 points in direction θ_2, and that of unit 4431_N points in direction θ_N.

FIG. 11 schematically shows the directivity shapes of the reverberation directivity forming units 4431_1 to 4431_N. The directivities differ only in the main beam direction θ; their shapes are identical. From the signals passed by the directivities of the units 4431_1 to 4431_N, the reverberation power estimation units 4432_1 to 4432_N connected to them obtain the reverberant sound power estimates P_RD1(ω) to P_RDN(ω) for the respective beam directions.

The reverberation direction power calculation unit 4433 computes the reverberant sound direction power P_RD(ω) as a weighted average (Equation (20)) of the power estimates P_RD1(ω) to P_RDN(ω).

Here, β_n is a non-negative weighting coefficient set in advance so as to satisfy Equation (21). Because the reverberant sound direction power P_RD(ω) obtained in this way is an average of the reverberant sound direction powers in several directions, its accuracy improves. As a result, the accuracy of the direct-to-reverberant ratio estimate DRR(ω) also improves.
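The weighted average over beam directions can be sketched as follows. Equations (20) and (21) are not reproduced in this text, so it is assumed here that Equation (21) requires the weights β_n to sum to 1. The function name and the sample values are hypothetical.

```python
def reverberant_direction_power(p_rd_list, beta=None):
    """Weighted average of P_RD1 .. P_RDN at one frequency.

    beta defaults to equal weights 1/N. The constraint that the weights are
    non-negative and sum to 1 is assumed.
    """
    n = len(p_rd_list)
    if n < 2:
        raise ValueError("Modification 1 uses two or more beam directions")
    if beta is None:
        beta = [1.0 / n] * n
    if len(beta) != n or any(b < 0 for b in beta) or abs(sum(beta) - 1.0) > 1e-9:
        raise ValueError("beta must be non-negative, sum to 1, and match N")
    return sum(b * p for b, p in zip(beta, p_rd_list))

# Hypothetical reverberant power estimates from N = 3 beam directions.
p_rd = reverberant_direction_power([0.2, 0.4, 0.3])
p_rd_weighted = reverberant_direction_power([0.2, 0.4, 0.3], [0.5, 0.25, 0.25])
```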

[Modification 2]
FIG. 12 shows a functional configuration example of a direct-to-reverberant ratio calculation unit 44″ in which the functional configuration of the reverberant sound direction power estimation unit 443 of the direct-to-reverberant ratio calculation unit 44 is changed. The direct-to-reverberant ratio calculation unit 44″ allows the main beam directions of the beamformers of the directivity forming unit 4421 and the reverberation directivity forming unit 4431 to be set automatically.

The direct-to-reverberant ratio calculation unit 44″ differs from the direct-to-reverberant ratio calculation unit 44 in that it includes a sound source direction estimation unit 446 and a beamformer generation unit 447. The sound source direction estimation unit 446 takes as input the frequency-domain signals X_1(ω, t), ..., X_M(ω, t), obtained by transforming the signals received by the microphones of the microphone array 41 into the frequency domain, estimates the direction of the sound source, and outputs a sound source direction signal. The direction of the sound source can be obtained with conventional techniques, for example from the phase differences among the frequency-domain signals X_1(ω, t), ..., X_M(ω, t).
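One conventional way to obtain a direction from a phase difference is sketched below for a single pair of microphones. The text does not specify the method, so everything here is an assumption: a far-field plane wave, two microphones a distance `d` apart, an angle measured from the broadside of the pair, a speed of sound of 340 m/s, and a frequency low enough that ω·d/c stays below π so that the phase is unambiguous.

```python
import cmath
import math

def direction_from_phase(x1, x2, omega, d, c=340.0):
    """Estimate the arrival angle (rad, from broadside) from two microphones.

    x1, x2: frequency-domain samples of microphones 1 and 2 at angular
    frequency omega; d: microphone spacing in metres. The sign convention
    is that a positive angle means the wave reaches microphone 2 first.
    """
    phase = cmath.phase(x2 * x1.conjugate())      # phase of X_2 relative to X_1
    s = c * phase / (omega * d)                   # equals sin(theta) under the model
    s = max(-1.0, min(1.0, s))                    # guard against noise pushing |s| > 1
    return math.asin(s)

# Synthetic check: plane wave from 0.5 rad, 5 cm spacing, 1 kHz.
omega, d, theta_true = 2 * math.pi * 1000.0, 0.05, 0.5
x1 = 1 + 0j
x2 = cmath.exp(1j * omega * d * math.sin(theta_true) / 340.0)
theta_est = direction_from_phase(x1, x2, omega, d)
```

A practical estimator would combine several microphone pairs and frequencies; this sketch only shows the basic relation between phase difference and angle.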

The beamformer generation unit 447 takes the sound source direction signal as input and generates a beamformer BF0 whose main beam points in the sound source direction and a beamformer BF1 whose main beam is set to avoid the sound source direction. It outputs BF0 to the direct sound direction power estimation unit 442 and BF1 to the reverberant sound direction power estimation unit 443. The directivity forming unit 4421 of the direct sound direction power estimation unit 442 applies BF0 and outputs the output signal Y_BF(ω, t) described above. The reverberant sound direction power estimation unit 443 applies BF1 and outputs the reverberant sound direction power P_RD(ω).

In this way, the direct-to-reverberant ratio calculation unit 44″ can set the directivity shapes of the direct sound direction power estimation unit 442 and the reverberant sound direction power estimation unit 443 automatically. The operation of the direct-to-reverberant ratio calculation units 44, 44′, and 44″ has been described for the frequency domain, but the technical idea of the invention, including the modifications, can be applied as is to operation in the time domain. The approach of the direct-to-reverberant ratio calculation unit 44″ can also be applied to the direct-to-reverberant ratio calculation unit 44′.

[Target signal adjustment unit]
The target signal adjustment unit 45 takes the processing target signal Y(ω, t) and the direct-to-reverberant ratio estimate DRR(ω, t) as input, and generates and outputs a processed signal Z(ω, t) in which the amplitude of Y(ω, t) is adjusted according to DRR(ω, t). In other words, the target signal adjustment unit 45 multiplies the processing target signal Y(ω, t) by a gain (filter coefficient) that depends on the direct-to-reverberant ratio estimate DRR(ω, t), and thereby generates and outputs the processed signal Z(ω, t) (step S45).

The gain assigned to a given direct-to-reverberant ratio estimate DRR depends on the distance range, measured from the microphone array 41, of the direct sound source whose sound is to be enhanced. For example, to enhance sound from a direct sound source close to the microphone array 41, the gain applied to the processing target signal when the ratio of the direct sound power estimate to the indirect sound power estimate represented by DRR equals a first value is made larger than the gain applied when that ratio equals a second value smaller than the first value. Conversely, to enhance sound from a direct sound source far from the microphone array 41, the gain G(ω, t) applied when the ratio equals the first value is made smaller than the gain applied when the ratio equals a second value smaller than the first value.

The target signal adjustment unit 45 can consist, for example, of a filter coefficient calculation means 451 and a multiplication means 452 (FIG. 6). The filter coefficient calculation means 451 takes the direct-to-reverberant ratio estimate DRR(ω, t) as input and computes and outputs a filter coefficient G(ω, t). The filter coefficient G(ω, t) is computed using, for example, a binary filter based on a threshold, as shown in Equations (22) and (23).

The threshold Th_1 can be set to any value between the minimum and maximum of the direct-to-reverberant ratio estimate DRR(ω, t). Bringing Th_1 closer to the minimum (0) improves sound quality. Conversely, bringing Th_1 closer to the maximum strengthens noise suppression but increases distortion of the received signal and degrades sound quality.

The threshold Th_1 thus involves a trade-off between sound quality and noise suppression. Th_1 is therefore determined empirically according to the intended use, taking this trade-off into account.

When the filter coefficient G(ω, t) is computed so as to emphasize the time-frequency bins in which the direct-to-reverberant ratio estimate falls below a threshold Th_2, as shown in Equations (24) and (25), sound sources farther away than a specific distance range can be enhanced.

Although a binary filter taking the values 0 and 1 was given as an example of the filter coefficient G(ω, t), G(ω, t) does not have to be exactly 0 and 1. Any two sufficiently different values, such as 0.1 and 0.9, will do.
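The threshold-based gains described above can be sketched in one small function. Equations (22) to (25) are not reproduced in this text, so details such as whether the comparison at the threshold is strict are assumptions, as are the function and parameter names.

```python
def binary_gain(drr, threshold, emphasize_near=True, g_pass=1.0, g_stop=0.0):
    """Two-valued filter coefficient G for one time-frequency bin.

    emphasize_near=True : pass bins whose DRR is at or above the threshold
                          (Th_1 case, sound from nearby sources).
    emphasize_near=False: pass bins whose DRR is below the threshold
                          (Th_2 case, sound from distant sources).
    g_pass and g_stop need not be 1 and 0; for example 0.9 and 0.1.
    """
    above = drr >= threshold          # assumed: equality counts as "above"
    return g_pass if above == emphasize_near else g_stop

g_near_hit = binary_gain(5.0, 3.0)                          # DRR above Th_1
g_near_miss = binary_gain(1.0, 3.0)                         # DRR below Th_1
g_far_hit = binary_gain(1.0, 3.0, emphasize_near=False)     # DRR below Th_2
g_soft = binary_gain(1.0, 3.0, g_pass=0.9, g_stop=0.1)      # softer two-valued filter
```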

A real number of 1 or more may also be set as the filter coefficient G(ω, t); that is, a gain G(ω, t) that amplifies the processing target signal Y(ω, t) may be defined. A gain G(ω, t) that strongly suppresses Y(ω, t) (for example, a value of 0.1 or less) may also be defined. Instead of determining the gain G(ω, t) by thresholding, the direct-to-reverberant ratio estimate or a function of it may be used as the gain G(ω, t). For example, G(ω, t) may be defined as in Equations (26) to (29).

Here, F is a function such as a monotonically increasing function or a monotonically decreasing function.

The multiplication means 452 multiplies the processing target signal Y(ω, t) by the filter coefficient G(ω, t) obtained in this way to generate the processed signal Z(ω, t) = G(ω, t)·Y(ω, t). The processed signal Z(ω, t) can thus be composed only of those components of Y(ω, t) whose direct-to-reverberant ratio estimate DRR(ω, t) is large. That is, only the direct sound can be extracted.
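The multiplication Z(ω, t) = G(ω, t)·Y(ω, t), together with a gain derived from a function of the ratio estimate, might look like the sketch below. The logistic function used as F is only one possible monotonically increasing function and is not taken from Equations (26) to (29). Its parameters and the function names are hypothetical.

```python
import math

def soft_gain(drr_db, midpoint_db=0.0, slope=0.5):
    """Example of a monotonically increasing F: a logistic curve in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (drr_db - midpoint_db)))

def apply_gain(Y, G):
    """Z[t][k] = G[t][k] * Y[t][k] for every frame t and frequency bin k."""
    return [[g * y for g, y in zip(g_row, y_row)] for g_row, y_row in zip(G, Y)]

# One frame with two bins: the first bin is kept, the second is removed.
Z = apply_gain([[2 + 0j, 0 + 1j]], [[1.0, 0.0]])

# Gains from the logistic F for a low and a high ratio estimate (in dB).
g_low, g_mid, g_high = soft_gain(-10.0), soft_gain(0.0), soft_gain(10.0)
```

Using `1.0 - soft_gain(...)` would give a monotonically decreasing F, which favors distant sources instead.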

As Embodiment 2, a near-far determination device 120 that determines whether a sound source is near or far using the direct-to-reverberant ratio estimate DRR(ω, t) described in Embodiment 1 is explained. FIG. 13 shows a functional configuration example of the near-far determination device 120. The near-far determination device 120 includes a microphone array 41, a plurality of frequency domain transform units 42_1 to 42_M, a direct-to-reverberant ratio calculation unit 44, and a near-far determination unit 121. The microphone array 41, the frequency domain transform units 42_1 to 42_M, and the direct-to-reverberant ratio calculation unit 44 are the same as those of the noise removal device 400. The near-far determination device 120 is likewise realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

When sound sources at several different distances emit sound at different times, the near-far determination device 120 determines whether the source of the sound received at a given time is far or near. The near-far determination unit 121 of the near-far determination device 120 includes a frequency averaging means 1210, an accumulation means 1211, and a determination means 1212.

The near-far determination unit 121 determines whether the direct sound source in a determination interval is near or far by a comparison between a determination value and a reference value. The determination value corresponds to the direct-to-reverberant ratio estimate obtained from the signals received in a determination interval of one or more frames. The reference value corresponds to the plurality of direct-to-reverberant ratio estimates obtained from the signals received in a reference interval consisting of more frames than the determination interval.

The frequency averaging means 1210 takes the direct-to-reverberant ratio estimate DRR(ω, t) as input, averages it along the frequency axis, and outputs the frequency-averaged direct-to-reverberant ratio estimate ¯E_t (Equation (30)).

Here, K is the total number of frequency bins of the Fourier transform performed by the frequency domain transform units 42_1 to 42_M.

The accumulation means 1211 accumulates the frequency-averaged direct-to-reverberant ratio estimate ¯E_t over the past L time frames and outputs a comparison-target direct-to-reverberant ratio estimate ^E. As ^E, for example, the mean of the accumulated values, ^E = (1/L) Σ_t ¯E_t (summed over the L frames), or the midpoint of their minimum and maximum, ^E = (1/2)(max ¯E_t + min ¯E_t), is used.

The determination means 1212 compares the frequency-averaged value ¯E_t with the comparison-target value ^E. When ¯E_t > ^E, it outputs as the near-far determination result Y_t a value indicating that the source is near, for example 1. When ¯E_t < ^E, it outputs a value indicating that the source is far, for example 0. The near-far determination result Y_t indicates whether the signal received over the most recent L time frames is sound from a relatively near source or from a relatively far source.
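The three means of the near-far determination unit can be sketched together as a small class. Several details are assumptions of this sketch and not stated in the text: the current frame is included in the accumulated history, a tie (¯E_t equal to ^E) is reported as far, and the class and parameter names are hypothetical.

```python
from collections import deque

class NearFarDetector:
    """Frequency averaging, accumulation over L frames, and comparison."""

    def __init__(self, L, reference="mean"):
        self.history = deque(maxlen=L)   # keeps only the last L values of E_t
        self.reference = reference       # "mean" or "midrange"

    def update(self, drr_frame):
        """drr_frame: DRR(omega, t) for the K bins of one frame. Returns 1 (near) or 0 (far)."""
        e_t = sum(drr_frame) / len(drr_frame)            # average over the K bins
        self.history.append(e_t)
        if self.reference == "mean":
            e_ref = sum(self.history) / len(self.history)
        else:
            e_ref = 0.5 * (max(self.history) + min(self.history))
        return 1 if e_t > e_ref else 0                   # assumed: ties count as far

# Hypothetical DRR values (dB) for four frames with K = 2 bins each.
detector = NearFarDetector(L=4)
frames = [[0.0, 0.0], [10.0, 10.0], [-10.0, -10.0], [10.0, 12.0]]
results = [detector.update(frame) for frame in frames]
```

Frames whose averaged ratio lies above the running reference are labeled near, so alternating near and far talkers produce alternating labels.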

Using the near-far determination result Y_t, the successively input received signals can be separated according to the distance between the microphone and the sound source. That is, sounds from several sources can be selected according to their distance from the microphone.

[Experimental results]
To confirm the effect of the invention, a simulation experiment using the image method was carried out.

FIG. 14 shows the simulation conditions. FIG. 14 is a plan view; a room 4 m wide, 6 m deep, and 2.7 m high was assumed. The wall absorption coefficient was set to α = 0.05 (reverberation time T_60 = 1.8 s). A microphone array of eight microphones arranged in a circle was used, with its reference point at a height of 1.5 m. The sound source was also placed at a height of 1.5 m.

FIG. 15 compares, under these conditions, the measured value DRR_actual (□) estimated from the impulse response, the present invention (▽), and the conventional method (○). The DRR estimated by the method of the invention (▽) is closer to the measured value DRR_actual (□) than the conventional method, with an improvement of about 3 dB particularly when the sound source is far away.

In general, the power of the indirect component is constant regardless of the distance to the sound source, whereas the power of the direct component is inversely proportional to the square of the distance. For a distant source, the power of the direct component is therefore very small compared with that of the indirect component, and even a small error in the estimated direct component strongly affects the DRR estimate. In the method of the invention, directivity control of the microphone array suppresses the influence of the signal arriving from the source direction as far as possible when the indirect sound power is obtained. This enables more accurate estimation and allows the DRR to be estimated correctly for more distant sources.
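The distance dependence stated above can be made concrete with a short calculation. Under the stated model (constant indirect power, direct power inversely proportional to the square of the distance), the ratio in decibels falls by 20·log10(d/d_ref) relative to its value at a reference distance. The function name and the reference values below are assumptions chosen for illustration.

```python
import math

def drr_at_distance(drr_ref_db, d_ref, d):
    """DRR in dB at distance d, given the DRR at a reference distance d_ref.

    Assumes constant indirect power and direct power proportional to 1/d^2.
    """
    return drr_ref_db - 20.0 * math.log10(d / d_ref)

drr_2m = drr_at_distance(0.0, 1.0, 2.0)     # doubling the distance
drr_10m = drr_at_distance(3.0, 1.0, 10.0)   # ten times the distance
```

Each doubling of the distance lowers the ratio by about 6 dB, which is why a small absolute error in the direct power estimate matters more for distant sources.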

As described above, the new direct-to-reverberant ratio (DRR) estimation method of the present invention assumes that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically. Two beamformers with the same directivity shape are realized by the microphone array: one whose main beam is directed toward the direct sound source, and one whose main beam is set to avoid the direct sound source direction. With these two beamformers, the direct component arriving from the sound source direction and the indirect component can be separated correctly, and as a result the accuracy of the DRR estimate can be improved.
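The separation can be sketched as follows, assuming an ideally diffuse reverberant field so that two beamformers with the same directivity shape pick up the same reverberant power. The function name `estimate_drr_db`, the `floor` parameter, and the synthetic powers are hypothetical. Normalization by the beamformer gains and the exact forms of equations (16) to (19) are not reproduced here, so this is a simplified illustration rather than the method as specified.

```python
import math


def estimate_drr_db(p_direct_beam, p_reverb_beam, floor=1e-12):
    """Simplified DRR estimate from two beamformer output powers.

    p_direct_beam: power of the beam steered at the direct sound source
                   (direct sound plus reverberation entering the beam).
    p_reverb_beam: power of the same-shaped beam steered away from the source
                   (assumed to contain reverberation only).
    floor:         small positive value that keeps the logarithm defined.
    """
    # Subtract the reverberant power to obtain the direct sound power estimate.
    direct_power = max(p_direct_beam - p_reverb_beam, floor)
    return 10.0 * math.log10(direct_power / max(p_reverb_beam, floor))


# Synthetic check with assumed powers: the same diffuse power enters both beams.
true_direct = 0.5
diffuse_in_beam = 0.1
p_direct_beam = true_direct + diffuse_in_beam
p_reverb_beam = diffuse_in_beam

print(f"estimated DRR: {estimate_drr_db(p_direct_beam, p_reverb_beam):.2f} dB")
print(f"reference DRR: {10.0 * math.log10(true_direct / diffuse_in_beam):.2f} dB")
```

In this idealized case the estimate matches the reference because the subtraction removes exactly the reverberant power that leaked into the source-directed beam. In practice the two beams see slightly different reverberant power and the second beam picks up some direct sound, so the match is only approximate.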

In the above description, the direct-to-reverberant ratio (DRR) estimation method of the present invention was explained using examples in which it is incorporated in the acoustic signal enhancement device 400 or the perspective determination device 130. As shown in FIG. 16, however, it may also be configured as a DRR estimation device 160 that implements only the DRR estimation method. In that case, the DRR estimation device 160 can be composed of the microphone array 41, a plurality of frequency domain conversion units 42_1 to 42_M, and the DRR calculation unit 44.

Equations (16) to (19) show examples in which the direct-to-reverberant ratio estimate DRR is expressed in decibels, but it goes without saying that the estimate may instead be obtained as a ratio of power spectral densities. The DRR value given by the above equations multiplied by some constant may be used as the estimate, and so may the reciprocal of that DRR value multiplied by a constant. The constant may also be the value of a monotonically increasing function. In other words, the DRR estimate of the present invention is not limited to the forms given by equations (16) to (19).
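A minimal sketch of these alternative representations is given below. The function names, the scaling constant `c`, and the example powers are assumptions for illustration only. The point is that the decibel, linear, and scaled forms preserve the ordering of two estimates, while the reciprocal form reverses it, so any threshold comparison must be adapted to the chosen form.

```python
import math


def drr_linear(direct_power, reverb_power):
    """DRR as a plain power ratio."""
    return direct_power / reverb_power


def drr_decibel(direct_power, reverb_power):
    """DRR in decibels."""
    return 10.0 * math.log10(drr_linear(direct_power, reverb_power))


def drr_scaled(direct_power, reverb_power, c=2.0):
    """DRR multiplied by an assumed positive constant c."""
    return c * drr_linear(direct_power, reverb_power)


def drr_reciprocal(direct_power, reverb_power, c=1.0):
    """Reciprocal of the DRR multiplied by an assumed positive constant c."""
    return c / drr_linear(direct_power, reverb_power)


near = (1.0, 0.25)     # assumed (direct, reverberant) powers for a nearby source
far = (0.0625, 0.25)   # assumed powers for a distant source

for name, form in [("linear", drr_linear), ("decibel", drr_decibel),
                   ("scaled", drr_scaled), ("reciprocal", drr_reciprocal)]:
    print(f"{name:10s} near = {form(*near):7.3f}  far = {form(*far):7.3f}")
```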

The processes described for the above methods and devices need not be executed in time series in the order described. They may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed.

When the processing means of the above devices are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer realizes the processing means of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk drive, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be implemented in hardware.

Claims

An acoustic signal enhancement device comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation unit that obtains a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment unit that obtains a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is larger than a predetermined threshold is larger than the gain applied to the processing target signal when the ratio is smaller than the predetermined threshold.

An acoustic signal enhancement device comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation unit that obtains a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment unit that obtains a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is larger than a predetermined threshold is smaller than the gain applied to the processing target signal when the ratio is smaller than the predetermined threshold.

A perspective determination device comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation unit that obtains a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a perspective determination unit that uses the direct-to-reverberant ratio estimate to perform perspective determination of the direct sound source and obtain a perspective determination result,
wherein the direct-to-reverberant ratio estimate is obtained based on the received sound signal received in a frame, which is a predetermined time interval, and
the perspective determination unit performs the perspective determination of the direct sound source in a determination section by a comparison that uses a determination value corresponding to the direct-to-reverberant ratio estimate obtained based on the received sound signal received in the determination section, which includes one or more frames, and a plurality of reference values corresponding to the direct-to-reverberant ratio estimates obtained based on the received sound signal received in a reference section, which includes more frames than the determination section.

An acoustic signal enhancement device comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation unit that obtains a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment unit that obtains a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is a first value is larger than the gain applied to the processing target signal when the ratio is a second value smaller than the first value.

An acoustic signal enhancement device comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation unit that obtains a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment unit that obtains a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is a first value is smaller than the gain applied to the processing target signal when the ratio is a second value smaller than the first value.

An acoustic signal enhancement method comprising:
a received sound power estimation step of obtaining a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation step of obtaining a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation step of obtaining a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation step that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction step of outputting a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation step of using the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment step of obtaining a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is larger than a predetermined threshold is larger than the gain applied to the processing target signal when the ratio is smaller than the predetermined threshold.

An acoustic signal enhancement method comprising:
a received sound power estimation step of obtaining a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation step of obtaining a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation step of obtaining a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation step that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction step of outputting a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation step of using the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a target signal adjustment step of obtaining a processed signal by multiplying a processing target signal obtained from the received sound signal by a gain that depends on the direct-to-reverberant ratio estimate,
wherein the gain applied to the processing target signal when the ratio represented by the direct-to-reverberant ratio estimate is larger than a predetermined threshold is smaller than the gain applied to the processing target signal when the ratio is smaller than the predetermined threshold.

A perspective determination method comprising:
a received sound power estimation step of obtaining a power estimate of a frequency domain signal, the frequency domain signal being obtained by converting into the frequency domain a received sound signal received by a plurality of microphones included in a microphone array;
a direct sound direction power estimation step of obtaining a power estimate of a direct sound direction signal, the direct sound direction signal being obtained either by applying to the frequency domain signal a process that mainly passes signal components arriving from a direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation step of obtaining a power estimate of a reverberant sound direction signal, the reverberant sound direction signal being obtained either by applying a process that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the process of the direct sound direction power estimation step that mainly passes signal components arriving from the direct sound source direction, or by converting into the frequency domain a signal obtained by applying to the received sound signal a process that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction step of outputting a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct-to-reverberant ratio calculation step of using the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct-to-reverberant ratio estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberant sound direction signal; and
a perspective determination step of using the direct-to-reverberant ratio estimate to perform perspective determination of the direct sound source and obtain a perspective determination result,
wherein the direct-to-reverberant ratio estimate is obtained based on the received sound signal received in a frame, which is a predetermined time interval, and
the perspective determination step performs the perspective determination of the direct sound source in a determination section by a comparison that uses a determination value corresponding to the direct-to-reverberant ratio estimate obtained based on the received sound signal received in the determination section, which includes one or more frames, and a plurality of reference values corresponding to the direct-to-reverberant ratio estimates obtained based on the received sound signal received in a reference section, which includes more frames than the determination section.

A program for causing a computer to function as the acoustic signal enhancement device according to claim 1 or 2, the perspective determination device according to claim 3, or the acoustic signal enhancement device according to claim 4 or 5.