JP4928382B2

JP4928382B2 - Specific direction sound collection device, specific direction sound collection method, specific direction sound collection program, recording medium

Info

Publication number: JP4928382B2
Application number: JP2007208936A
Authority: JP
Inventors: 裕輔日岡; 和則小林; 賢一古家; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-08-10
Filing date: 2007-08-10
Publication date: 2012-05-09
Anticipated expiration: 2027-08-10
Also published as: JP2009044588A

Description

The present invention relates to a sound collection device that acquires speech hands-free, for example for voice calls or for operating equipment. In particular, it relates to a specific direction sound collection device, a specific direction sound collection method, a specific direction sound collection program, and a recording medium on which that program is recorded, all of which are suited to cases where only the sound from a source located in a specific direction, as seen from the device, is to be emphasized and collected.

In the prior art, as shown in FIG. 1, microphones mic. 1 to mic. M placed at M different positions (p_1, q_1) to (p_M, q_M) on the x-y plane are used. Sound generated by a source in the direction of an arbitrary angle θ_S is treated as the signal, sound emitted from any other direction is treated as noise, and only the signal is emphasized so that it is collected with a high SNR (signal-to-noise ratio). FIG. 2 is a block diagram showing the configuration of the conventional enhanced sound collection method. A delay D_m is applied, as in equation (1), to the signal x_m(n) (m = 1, ..., M) received by microphone m placed at position (x_m, y_m), which gives the signal y_m(n).

$$ y_m(n) = x_m(n - D_m) \tag{1} $$

The delay D_m can be derived from the direction θ_S of the desired sound source, which is given in advance, using equation (2).

$$ D_m = \frac{d_m}{c} \sin\theta_S \tag{2} $$

Here c is the speed of sound, and d_m is the distance between microphone m and the reference point as seen from a sound wave arriving from the θ_S direction in FIG. 1. It is given by equation (3).

$$ d_m = p_m \sin\theta + q_m \cos\theta \tag{3} $$

Next, the signals y_m(n) obtained in this way are added together as in equation (4), which yields a signal z(n) in which the sound emitted from the desired position is emphasized.
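The delay-and-sum procedure of equations (1) to (3) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the cited prior art: equation (4) is not reproduced in this text, so a plain sum over the microphones is assumed; θ in equation (3) is taken to be equal to θ_S; the value of c, the rounding of delays to whole samples, and the common shift that keeps every delay non-negative are all assumptions, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for c


def delays_in_samples(positions, theta_s, fs):
    """Per-microphone delays D_m from equations (2) and (3), in samples.

    positions: list of (p_m, q_m) microphone coordinates in metres.
    theta_s:   desired source direction in radians (theta = theta_S assumed).
    fs:        sampling frequency in Hz.
    Assumption: delays are rounded to whole samples and shifted by a common
    offset so that none of them is negative.
    """
    raw = []
    for p, q in positions:
        d = p * math.sin(theta_s) + q * math.cos(theta_s)      # equation (3)
        raw.append((d / SPEED_OF_SOUND) * math.sin(theta_s))   # equation (2)
    base = min(raw)
    return [round((t - base) * fs) for t in raw]


def delay_and_sum(signals, delays):
    """Delay each x_m(n) by D_m samples (equation (1)) and add the results.

    A plain sum is assumed here in place of equation (4).
    """
    length = len(signals[0])
    z = [0.0] * length
    for x, d in zip(signals, delays):
        for n in range(d, length):
            z[n] += x[n - d]   # y_m(n) = x_m(n - D_m)
    return z
```

For example, `delay_and_sum(signals, delays_in_samples(positions, theta_s, 16000))` would combine equal-length sample lists captured at 16 kHz.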

The above is the conventional enhanced sound collection method (Non-Patent Document 1). As shown in FIG. 3, the directivity formed by this prior art contains regions called side lobes SB, which lie close to the main beam BM and have a relatively large gain, so noise cannot be suppressed sufficiently. To keep the gain of the side lobes SB as small as possible, the number of microphones must be increased and the microphone array must be made larger (Non-Patent Document 1).

Non-Patent Document 1: Juro Ohga, Yoshio Yamasaki, Yutaka Kaneda, "Acoustic Systems and Digital Processing" (音響システムとディジタル処理), IEICE, pp. 181-186, Section 7.1.2.

When the prior art is used to steer the directivity of a sound collection device toward a specific direction, so that sound emitted in that direction is emphasized and sound emitted in other directions is suppressed, the resulting directivity has side lobes. As a consequence, sound arriving from directions that should be suppressed is collected without being sufficiently attenuated. When a noise source emitting a very loud sound exists in a direction other than that of the desired source, a prior-art sound collection device therefore cannot emphasize the desired source sufficiently. Furthermore, reducing the side lobes in the prior art requires more microphones and a larger microphone array, which makes the device difficult to install and transport in practical use. Finally, because the directivity of a prior-art sound collection device varies with frequency, a sufficient enhancement effect may not be obtained, depending on the frequency structure of the desired sound and the noise.

The present invention has been made to solve the above problems. Its object is to realize a device that emphasizes and collects the sound from a desired source with a higher SNR than the prior art, without enlarging the microphone array.

The specific direction sound collection device of the present invention comprises a plurality of beamformer units, a plurality of frequency domain conversion units, a specific direction selection unit, a signal amount estimation unit, a gain coefficient calculation unit, and a multiplication unit. The beamformer units use the output signals of a microphone array made up of a plurality of microphones, and each of them emphasizes and collects sound arriving from an angular region in a different direction. The frequency domain conversion units convert each of the angular region signals collected by the beamformer units into a frequency domain signal divided into a plurality of band components. The specific direction selection unit selects, from the frequency domain signals output by the frequency domain conversion units, the specific direction frequency domain signal belonging to the angular region in the desired direction. The signal amount estimation unit has region aggregation means, inverse matrix calculation means, and multiplication means. The region aggregation means obtains an aggregate power vector composed of the signal amount of the specific direction frequency domain signal and the signal amount of the angular region signals in the other directions. The inverse matrix calculation means obtains the inverse of an aggregate gain matrix derived from the directivity of the beamformer units. The multiplication means multiplies the aggregate power vector by this inverse matrix to obtain an estimate of the total amount of the frequency domain signals. The gain coefficient calculation unit calculates a gain coefficient for each frequency band from the ratio of the signal amount of the specific direction frequency domain signal to the total amount of the frequency domain signals. The multiplication unit multiplies the signal amount of each corresponding frequency band of the specific direction frequency domain signal by the gain coefficient calculated by the gain coefficient calculation unit.

According to the specific direction sound collection device of the present invention, the enhancement obtained when emphasizing and collecting the sound of a source in a desired direction is improved as follows: the power of the sound signal emitted by each source is estimated from the results of a plurality of beamformer processes applied to the signals received by the microphone array, and the desired sound signal is then emphasized using a nonlinear filter coefficient that emphasizes the signals within the sound collection region. Consequently, neither more microphones nor a larger microphone array is required, and the enhancement effect is improved while keeping a small system that is easy to install and transport in practice.

In the processing of the specific direction sound collection device of the present invention, the inverse matrix calculation means requires the most computation time. Because the signal amount estimation unit uses a two-dimensional aggregate power vector, the aggregate gain matrix obtained from the directivity of the beamformer units is only 2 rows by 2 columns. The computational cost of the whole process is therefore greatly reduced.

Principle

FIG. 4 shows an example of the microphone array arrangement of the specific direction sound collection device of the present invention. As shown in FIG. 4, the area in which sound is collected is divided into a plurality of direction regions, and the device uses the signals received while the directivity of the microphone array is steered toward each of those direction regions. Compared with the signal before processing, the power (also called the "signal amount") of a signal processed by the microphone array changes according to the direction in which the sound source lies. The present invention uses this change in power to estimate the power of the signal arriving from each direction region. From the estimated powers it constructs a nonlinear filter (that is, it obtains a gain coefficient) that emphasizes the signal arriving from a direction region given in advance, and the signal passed through this filter becomes the final output signal. To reduce the amount of computation, the direction regions are aggregated during this power estimation.

Specific embodiments are described below. Components that have the same function or perform the same processing are given the same reference numbers, and duplicate descriptions are omitted.

[First Embodiment]

First, an overall outline of the present invention is given. FIG. 5 shows an example of the overall configuration of the specific direction sound collection device of the present invention, and FIG. 6 shows an example of its processing flow. The signals x_m(n) (m = 1, 2, ..., M) received by the microphone array 11, which consists of M (≥ 2) microphones, are input to the Q beamformer units 12-1 to 12-Q. Here n is the sample index of the discrete-time signal.

Each of the beamformer units 12-1 to 12-Q steers a directional beam BM, such as the one shown in FIG. 7, toward one of the Q direction regions Θ_1 to Θ_Q given in advance in FIG. 4, emphasizes and collects the sound emitted in that direction region, and outputs the result (S12-1 to S12-Q). The output signals y_1(n), y_2(n), ..., y_Q(n) of the beamformer units 12-1 to 12-Q are input to the frequency domain conversion units 13-1 to 13-Q, respectively. The frequency domain conversion units 13-1 to 13-Q divide the input signal into frames of short duration (for example, about 256 samples at a sampling frequency of 16000 Hz), apply a discrete Fourier transform to each frame, and output the resulting Ω frequency components as the output signals Y_1(ω, l), Y_2(ω, l), ..., Y_Q(ω, l) (S13-1 to S13-Q). The frequency domain signals are input to both the signal amount estimation unit 14 and the specific direction selection unit 15.
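The frame decomposition and discrete Fourier transform performed by the frequency domain conversion units can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say whether frames overlap or are windowed, so non-overlapping rectangular frames are assumed, and a direct O(N^2) DFT is used for clarity where a practical implementation would likely use an FFT. All function names are hypothetical.

```python
import cmath


def split_into_frames(y, frame_len=256):
    """Cut y(n) into consecutive non-overlapping frames.

    Assumption: no overlap and no window; trailing samples that do not
    fill a whole frame are dropped.
    """
    count = len(y) // frame_len
    return [y[l * frame_len:(l + 1) * frame_len] for l in range(count)]


def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    n_len = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_len)
            for n in range(n_len))
        for k in range(n_len)
    ]


def to_frequency_domain(y, frame_len=256):
    """Return Y[l][omega] for frame index l and frequency index omega."""
    return [dft(frame) for frame in split_into_frames(y, frame_len)]
```

Each beamformer output y_q(n) would be passed through `to_frequency_domain` separately, giving one spectrum per frame and per direction region.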

From the output signal powers of the beamformer units 12-1 to 12-Q, the signal amount estimation unit 14 obtains the power components of the sum of the sound signals emitted by the sources in the direction regions Θ_1 to Θ_Q, and outputs them collected into a single vector, the signal power vector X_est(ω, l) (S14).

The specific direction selection unit 15 selects the output of the beamformer unit whose directional beam is steered toward the direction region to be emphasized, and outputs it as Y_S(ω, l) (S15).

The gain coefficient calculation unit 16 calculates the gain coefficient R(ω, l) from the input signal power vector X_est(ω, l) and outputs it (S16). The gain coefficient R(ω, l) is input to the multiplication unit 17, which multiplies it by the output Y_S(ω, l) of the specific direction selection unit 15, component by component at each frequency, and outputs the result (S17). The output signal Y_SR(ω, l) of the multiplication unit 17 is input to the inverse frequency domain conversion unit 18, which applies an inverse discrete Fourier transform and outputs the signal y(n) restored to the time domain (S18). This signal y(n) is the signal collected by the device of the present invention with the desired sound emphasized.
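Steps S17 and S18 for a single frame can be sketched as follows. This is only a sketch under assumptions: the gain values are taken as already computed, a direct inverse DFT is used for clarity, the imaginary residue of the inverse transform is discarded, and how successive frames are stitched back into y(n) (for example by overlap-add) is not specified in the text and is left out. The function names are hypothetical.

```python
import cmath


def apply_gain(y_s, gain):
    """Y_SR(omega, l) = R(omega, l) * Y_S(omega, l) for one frame."""
    return [r * y for r, y in zip(gain, y_s)]


def inverse_dft(spectrum):
    """Direct inverse DFT of one frame; returns the real part of each sample."""
    n_len = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_len)
             for k in range(n_len)) / n_len).real
        for n in range(n_len)
    ]
```

A call such as `inverse_dft(apply_gain(y_s_frame, gain_frame))` would return the time samples of one enhanced frame.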

The beamformer units 12-1 to 12-Q, the signal amount estimation unit 14, the specific direction selection unit 15, and the gain coefficient calculation unit 16 are described in detail below, in that order, with reference to separate figures.

(Beamformer unit)

FIG. 8 shows the configuration of one of the beamformer units 12-1 to 12-Q. The same processing is performed in every beamformer unit. The input signals x_m(n) (m = 1, 2, ..., M) are input to the filter processing units FC1 to FCM. Each filter processing unit applies the filter coefficients W_qm(n), which are given in advance (how they are determined is described later), in the convolution of equation (5) and outputs the resulting signal x'_qm(n).

The output signals of the filter processing units FC1 to FCM are input to the adder ADD. The adder ADD sums its input signals as in equation (6) to obtain the beamformer output signal y_q(n) (q = 1, ..., Q).
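The filter-and-sum structure of FIG. 8 can be sketched as follows. Equations (5) and (6) are not reproduced in this text, so the sketch assumes a causal FIR convolution truncated to the input length for equation (5) and a plain sum over microphones for equation (6). The function names and the way the coefficients are passed in are hypothetical.

```python
def fir_filter(x, w):
    """Causal convolution x'(n) = sum_k w(k) * x(n - k), truncated to len(x)."""
    out = [0.0] * len(x)
    for n in range(len(x)):
        for k, coeff in enumerate(w):
            if n - k >= 0:
                out[n] += coeff * x[n - k]
    return out


def beamformer_output(signals, filters):
    """y_q(n): sum over microphones m of x_m(n) filtered by W_qm(n).

    signals: list of M equal-length sample lists x_m(n).
    filters: list of M coefficient lists W_qm(n) for one beamformer q.
    """
    filtered = [fir_filter(x, w) for x, w in zip(signals, filters)]
    return [sum(samples) for samples in zip(*filtered)]
```

Running `beamformer_output` once per coefficient set q = 1, ..., Q would produce the Q beamformer outputs from the same microphone signals.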

Here, the filter coefficients W_qm(n) are designed so that the directivity D_q(ω, θ) of each of the beamformer units 12-1 to 12-Q emphasizes sound emitted in the corresponding q-th direction region Θ_q, given in advance as shown in FIG. 4, and suppresses sound emitted from other directions.

(Signal amount estimation unit)

FIG. 9 shows the configuration of the signal amount estimation unit 14. The frequency components Y_1(ω, l), Y_2(ω, l), ..., Y_Q(ω, l) input to the signal amount estimation unit 14 are fed to the power calculation units PW-1 to PW-Q, respectively, which output the signal power values |Y_1(ω, l)|^2, |Y_2(ω, l)|^2, ..., |Y_Q(ω, l)|^2 (SPA in FIG. 6). These are input to the region aggregation unit 14A. The region aggregation unit 14A obtains the average power of the signals from the predetermined set S of regions to be collected and the average power of the signals from the set N of regions to be suppressed, and outputs the aggregate power vector Y(ω, l) composed of these two results (S14A in FIG. 6).

Here N_S is the number of regions in the set S, and N_N is the number of regions in the set N. Every direction region (1 to Q) is assigned in advance to either the set S or the set N. For example, when Q = 4, the sets may be chosen as S = {1, 2} and N = {3, 4}.
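The region aggregation for one frequency bin can be sketched as follows, using the Q = 4 example with S = {1, 2} and N = {3, 4}. The defining equation of the aggregate power vector is not reproduced in this text, so the sketch simply follows the prose: average the beamformer output powers over each set. The function name and the dictionary layout are hypothetical.

```python
def aggregate_power(spectra, collect, suppress):
    """Two-element aggregate power vector for one frequency bin.

    spectra:  dict mapping region index q to the complex value Y_q(omega, l).
    collect:  region indices in the set S (regions to be collected).
    suppress: region indices in the set N (regions to be suppressed).
    """
    power = {q: abs(v) ** 2 for q, v in spectra.items()}
    p_s = sum(power[q] for q in collect) / len(collect)     # divide by N_S
    p_n = sum(power[q] for q in suppress) / len(suppress)   # divide by N_N
    return [p_s, p_n]
```

Reducing Q powers to two values at this stage is what keeps the later matrix inversion at size 2 by 2.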

The beamformer output power vector Y(ω, l), that is, the aggregate power vector output by the region aggregation unit 14A, is input to the multiplication unit 14B. The other input of the multiplication unit 14B, the power estimation matrix T^{-1}(ω), is the output of the inverse matrix calculation unit 14C. The inverse matrix calculation unit 14C receives the aggregate gain matrix T(ω) defined by equation (8) and outputs its inverse T^{-1}(ω) (S14C in FIG. 6).

As shown in FIG. 10, each element of the aggregate gain matrix T is a parameter obtained from the average value of the directivity of each beamformer unit over each direction region. For example, the average of the directivity with respect to direction, as in equation (9), is used.

α_pq is the average value of the directivity of the beamformer unit 12-p over the q-th direction region. The directivity can be obtained from the filter coefficients W_m(n), for example with the technique described in Non-Patent Document 1.

As shown in equation (10), the multiplication unit 14B multiplies the input beamformer output power vector Y(ω, l) by the power estimation matrix T^{-1}(ω) for each frequency component and outputs the estimated signal power vector X_est(ω, l) (S14B in FIG. 6).

$$ X_{est}(\omega, l) = T^{-1}(\omega)\, Y(\omega, l) \tag{10} $$

The signal amount estimation unit 14 is thus the component that estimates the signal power (signal amount) while aggregating the direction regions, as described in the principle of the present invention.
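Because the aggregate gain matrix is 2 by 2, equation (10) reduces to a closed-form inverse and one matrix-vector product per frequency bin, as in the sketch below. The function names are hypothetical, the matrix is taken as given rather than derived from a directivity pattern, and the handling of a singular matrix is an assumption, since the text does not discuss that case.

```python
def invert_2x2(t):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = t
    det = a * d - b * c
    if det == 0:
        # Assumption: a singular gain matrix is treated as an error.
        raise ValueError("aggregate gain matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def estimate_power(t, y):
    """X_est = T^{-1} Y for one frequency bin (equation (10))."""
    inv = invert_2x2(t)
    return [inv[0][0] * y[0] + inv[0][1] * y[1],
            inv[1][0] * y[0] + inv[1][1] * y[1]]
```

Since T(ω) depends only on the fixed beamformer directivity, its inverse could be computed once per frequency and reused for every frame l.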

(Specific direction selection unit)

FIG. 11 shows the configuration of the specific direction selection unit 15. From the frequency components Y_1(ω, l) to Y_Q(ω, l) input from the frequency domain conversion units 13-1 to 13-Q, the specific direction selection unit 15 selects the one corresponding to the q-th direction region to be emphasized (where q is one value chosen from 1, ..., Q) and outputs it as Y_S(ω, l).

$$ Y_S(\omega, l) = Y_q(\omega, l) \tag{11} $$

(Gain coefficient calculation unit)

FIG. 12 shows the configuration of the gain coefficient calculation unit 16. The estimated signal power vector X_est(ω, l) from the signal amount estimation unit 14 is input to the vector element extraction unit 16A. As shown in equation (12), the first component of X_est(ω, l) is the sound collection region signal estimated power |S(ω, l)|^2, and its second component is the suppression region signal estimated power |N(ω, l)|^2.

$$ X_{est}(\omega, l) = \left[\, |S(\omega, l)|^2 \;\; |N(\omega, l)|^2 \,\right]^{T} \tag{12} $$

The vector element extraction unit 16A outputs the sound collection region signal estimated power |S(ω, l)|^2 and the suppression region signal estimated power |N(ω, l)|^2 and passes them to the SN ratio estimation unit 16B. The SN ratio estimation unit 16B uses equation (13) to calculate and output the gain coefficient R(ω, l) that emphasizes the signal in the desired direction region.

Here α is a parameter that adjusts how strongly the gain coefficient R(ω, l) emphasizes the signal in the desired direction region. For example, α = 1/2 may be used.
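Equation (13) is not reproduced in this text. The sketch below therefore assumes a Wiener-style form, the ratio of the collection-region power to the total estimated power raised to the exponent α, which matches the summary's description of the gain as a ratio of the specific-direction signal amount to the total signal amount but may differ from the actual equation. Clamping negative power estimates to a floor, and returning zero gain when both powers are zero, are further assumptions; the function name is hypothetical.

```python
def gain_coefficient(s_power, n_power, alpha=0.5, floor=0.0):
    """Assumed form of equation (13): R = (|S|^2 / (|S|^2 + |N|^2)) ** alpha.

    s_power: sound collection region signal estimated power |S(omega, l)|^2.
    n_power: suppression region signal estimated power |N(omega, l)|^2.
    """
    # Estimates produced by T^{-1} Y can come out negative, so clamp them.
    s = max(s_power, floor)
    n = max(n_power, floor)
    total = s + n
    if total == 0.0:
        return 0.0
    return (s / total) ** alpha
```

With α = 1/2 this assumed form acts as an amplitude-domain gain between 0 and 1: close to 1 in bins dominated by the collection region and close to 0 in bins dominated by the suppression region.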

As described above, the specific direction sound collection device of this embodiment improves the enhancement obtained when emphasizing and collecting the sound of a source in a desired direction: it estimates the power of the sound signal emitted by each source from the results of the beamformer units 12-1 to 12-Q applied to the signals received by the microphone array 11, and emphasizes the desired sound signal using a gain coefficient (nonlinear filter coefficient) that emphasizes the signals within the sound collection region. Consequently, neither more microphones nor a larger microphone array is required, and the enhancement effect is improved while keeping a small system that is easy to install and transport in practice.

In addition, because the signal amount estimation unit 14 of the specific direction sound collection device of this embodiment uses a two-dimensional aggregate power vector, the aggregate gain matrix obtained from the directivity of the beamformer units 12-1 to 12-Q is only 2 rows by 2 columns. The computational cost of the whole process is therefore greatly reduced.

[Second Embodiment]

The specific direction sound collection device of the second embodiment changes the processing procedure of the signal amount estimation unit 14, the gain coefficient calculation unit 16, and the multiplication unit 17 of the first embodiment. FIG. 13 shows a configuration example of the specific direction sound collection device of the second embodiment. It differs from the first embodiment in two respects. First, band division units 19-1 to 19-Q are provided after the frequency domain conversion units 13-1 to 13-Q, and the processing of the signal amount estimation unit 14, the gain coefficient calculation unit 16, and the multiplication unit 17 is performed separately for each of Ω frequency bands. Second, a band synthesis unit 21 is provided after the multiplication units 17 of the individual frequency bands and combines their outputs. FIG. 14 shows the configuration of the band division unit, and FIG. 15 shows the configuration of the band synthesis unit.

The aggregate gain matrix T_x(ω) of the signal amount estimation unit 14 in the same-band component collection unit 20-x (x = 1, ..., Ω) of this embodiment may be defined as in equation (14).

Here N_x is the number of frequency bins contained in the aggregated x-th band. The other parts are the same as in the first embodiment.

With this configuration, the specific direction sound collection device of the second embodiment obtains the same effects as that of the first embodiment. In addition, because it performs the calculations for each of the Ω frequency bands, it also reduces the amount of computation.

[Experimental Example]

FIG. 16 shows experimental results for the specific direction sound collection device of the present invention. The position of the desired sound source (a female voice) was fixed at 0 degrees, the position of the noise source (a male voice) was moved in steps of 15 degrees through the directions shown in FIG. 4, and FIG. 16 plots the resulting amount of noise suppression in decibels. In FIG. 16, the amount of noise suppression is larger toward the inside of the polar plot. In this experiment the region to be collected was set to 0 to 90 degrees and 270 to 360 degrees, so the remaining directions (90 to 270 degrees) form the region to be suppressed. The microphone array consisted of four unidirectional microphones placed at the vertices of a square with 24 cm sides.

With the prior art, the amount of noise suppression increases only gradually as the noise source moves away from the desired source. With the method of the present invention, the amount of noise suppression is uniformly low within the region to be collected and rises sharply once the boundary with the region to be suppressed is crossed. Moreover, the prior art achieves only about 7 dB of noise suppression even in its best direction, whereas the method of the present invention achieves 10 dB or more in most directions of the region to be suppressed. This confirms that the method of the present invention collects sound uniformly from the region to be collected and, compared with the prior art, provides higher noise suppression performance over the entire region to be suppressed.

FIG. 17 shows an example of the functional configuration of a computer. The sound collection device of the present invention can be realized by loading, into the recording unit 2020 of a computer 2000, a program that causes the computer 2000 to operate as each component of the present invention, and by operating the processing unit 2010, the input unit 2030, the output unit 2040, and so on. Methods for loading the program into the computer include recording it on a computer-readable recording medium and reading it from that medium, and loading a program stored on a server or the like through a telecommunication line or the like.

FIG. 1: A diagram for explaining the sound collection method of a prior-art microphone array.
FIG. 2: A block diagram for explaining a conventional specific direction sound collection device.
FIG. 3: A diagram for explaining the directivity of a conventional specific direction sound collection device.
FIG. 4: A diagram showing an example of the microphone array arrangement of the specific direction sound collection device of the present invention.
FIG. 5: A diagram showing an example of the overall configuration of the specific direction sound collection device of the first embodiment.
FIG. 6: A diagram showing an example of the processing flow of the specific direction sound collection device of the first embodiment.
FIG. 7: A diagram for explaining the directivity of the beamformer unit used in the present invention.
FIG. 8: A diagram showing the configuration of the beamformer unit.
FIG. 9: A diagram showing the configuration of the signal amount estimation unit.
FIG. 10: A diagram for explaining an example of the directivity of the beamformer unit used in the present invention.
FIG. 11: A diagram showing the configuration of the specific direction selection unit.
FIG. 12: A diagram showing the configuration of the gain coefficient calculation unit.
FIG. 13: A diagram showing a configuration example of the specific direction sound collection device of the second embodiment.
FIG. 14: A diagram showing the configuration of the band division unit.
FIG. 15: A diagram showing the configuration of the band synthesis unit.
FIG. 16: A diagram showing experimental results for the specific direction sound collection device of the present invention.
FIG. 17: A diagram showing an example of the functional configuration of a computer.

Explanation of symbols

11 Microphone array
12-1 to 12-Q Beamformer units
13-1 to 13-Q Frequency domain conversion units
14 Signal amount estimation unit
14A Region aggregation unit
14B Multiplication unit
14C Inverse matrix calculation unit
15 Specific direction selection unit
16 Gain coefficient calculation unit
17 Multiplication unit
18 Inverse frequency domain conversion unit
19-1 to 19-Q Band division units
20-1 to 20-Ω Same-band component collection units
21 Band synthesis unit

Claims

A specific direction sound collection device comprising:
a plurality of beamformer units that each emphasize and collect sound arriving from an angular region in a different direction, using the output signals of a microphone array composed of a plurality of microphones;
a plurality of frequency domain conversion units that convert each of the angular region signals collected by the plurality of beamformer units into a frequency domain signal divided into a plurality of band components;
a specific direction selection unit that selects, from among the frequency domain signals output by the plurality of frequency domain conversion units, a specific direction frequency domain signal from the frequency domain signals belonging to a plurality of angular regions in a desired direction;
a signal amount estimation unit comprising: region aggregation means for obtaining an aggregate power vector whose elements are a signal amount that is an average of the frequency domain signals of the plurality of angular regions in the desired direction and a signal amount that is an average of the frequency domain signals of a plurality of angular regions other than the angular regions in the desired direction; inverse matrix calculation means for obtaining an inverse matrix of an aggregate gain matrix whose elements are a parameter obtained from an average value of the directivity characteristics of the plurality of angular regions in the desired direction and a parameter obtained from an average value of the directivity characteristics of the plurality of angular regions other than the angular regions in the desired direction, the directivity characteristics being obtained from the directivity characteristics of the beamformer units; and multiplication means for multiplying the aggregate power vector by the inverse matrix to obtain an estimated signal power vector whose elements are a sound collection region signal estimated power, which is an estimate of the power of the signals of the plurality of angular regions in the desired direction, and a suppression region signal estimated power, which is an estimate of the power of the signals of the plurality of angular regions other than the angular regions in the desired direction;
a gain coefficient calculation unit that calculates a gain coefficient for each frequency band from the sound collection region signal estimated power and the suppression region signal estimated power; and
a multiplication unit that multiplies the signal amount in each corresponding frequency band of the specific direction frequency domain signal by the gain coefficient calculated by the gain coefficient calculation unit.

A specific direction sound collection method comprising:
a plurality of beamformer processing steps that each emphasize and collect sound arriving from an angular region in a different direction, using the output signals of a microphone array composed of a plurality of microphones;
a plurality of frequency domain conversion steps of converting each of the angular region signals collected in the plurality of beamformer processing steps into a frequency domain signal divided into a plurality of band components;
a specific direction selection step of selecting, from among the frequency domain signals output by the plurality of frequency domain conversion steps, a specific direction frequency domain signal from the frequency domain signals belonging to a plurality of angular regions in a desired direction;
a signal amount estimation step comprising: a region aggregation sub-step of obtaining an aggregate power vector whose elements are a signal amount that is an average of the frequency domain signals of the plurality of angular regions in the desired direction and a signal amount that is an average of the frequency domain signals of a plurality of angular regions other than the angular regions in the desired direction; an inverse matrix calculation sub-step of obtaining an inverse matrix of an aggregate gain matrix whose elements are parameters of the directivity characteristics of the plurality of angular regions in the desired direction and parameters of the directivity characteristics of the plurality of angular regions other than the angular regions in the desired direction, the directivity characteristics being obtained from the directivity characteristics of the beamformer processing steps; and a multiplication sub-step of multiplying the aggregate power vector by the inverse matrix to obtain an estimated signal power vector whose elements are a sound collection region signal estimated power, which is an estimate of the power of the signals of the plurality of angular regions in the desired direction, and a suppression region signal estimated power, which is an estimate of the power of the signals of the plurality of angular regions other than the angular regions in the desired direction;
a gain coefficient calculation step of calculating a gain coefficient for each frequency band from the sound collection region signal estimated power and the suppression region signal estimated power; and
a multiplication step of multiplying the signal amount in each corresponding frequency band of the specific direction frequency domain signal by the gain coefficient calculated in the gain coefficient calculation step.

A specific direction sound collection program for causing a computer to operate as the specific direction sound collection device according to claim 1.

A computer-readable recording medium on which the specific direction sound collection program according to claim 3 is recorded.
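
For illustration only, the signal amount estimation, gain coefficient calculation, and multiplication recited in the device and method claims might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the array shapes, the clipping of negative power estimates to zero, and the Wiener-like ratio used as the gain coefficient are all hypothetical choices. The claims state only that the gain coefficient is calculated for each frequency band from the two estimated powers.

```python
import numpy as np


def estimate_region_powers(aggregate_power, aggregate_gain):
    """Estimate per-band powers of the sound collection and suppression regions.

    aggregate_power: shape (n_bands, 2); column 0 is the averaged signal amount
        of the desired-direction regions, column 1 that of the other regions.
    aggregate_gain: shape (n_bands, 2, 2); hypothetical aggregate gain matrix
        derived from the beamformer directivity characteristics.
    """
    inverse = np.linalg.inv(aggregate_gain)  # batched 2x2 inverse, one per band
    estimated = np.einsum("bij,bj->bi", inverse, aggregate_power)
    # Assumption: power cannot be negative, so clip numerical undershoot to zero.
    return np.maximum(estimated, 0.0)


def gain_coefficients(estimated_power, eps=1e-12):
    """Assumed Wiener-like gain: target power over total estimated power."""
    target = estimated_power[:, 0]
    other = estimated_power[:, 1]
    return target / (target + other + eps)


def apply_gain(specific_direction_spectrum, gain):
    """Multiply each band of the selected spectrum by its gain coefficient."""
    return specific_direction_spectrum * gain
```

With an aggregate gain matrix of `[[1.0, 0.2], [0.3, 1.0]]` and true region powers of 4 and 1 in some band, the aggregate power vector is `[4.2, 2.2]`; multiplying by the inverse matrix recovers the powers 4 and 1, and the assumed ratio gives a gain of about 0.8 for that band.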