
JP2018142826A - Non-target sound suppression device, method and program

Info

Publication number: JP2018142826A
Application number: JP2017035348A
Authority: JP
Inventor: Katsuyuki Takahashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-27
Filing date: 2017-02-27
Publication date: 2018-09-13
Anticipated expiration: 2037-02-27
Also published as: JP6903947B2

Abstract

PROBLEM TO BE SOLVED: To keep the sound quality of the target sound good, and to control a suppression coefficient or subtraction coefficient with a low processing load, when suppressing or subtracting non-target sound from input signals.

SOLUTION: A non-target sound suppression device according to the present invention comprises four units:
- A front suppression signal generation unit generates a front suppression signal that has a blind spot in the front direction. It does so from the differences among a plurality of frequency domain input signals, which are obtained by converting the input signals from the time domain to the frequency domain.
- A coherence calculation unit calculates a coherence from signals obtained from the plurality of input signals.
- A feature quantity calculation unit calculates a feature quantity that indicates the relationship between the front suppression signal and the coherence.
- A non-target sound suppression processing unit uses that feature quantity to set a coefficient for suppressing the non-target sound in the input signals. It then uses the coefficient to obtain a post-suppression signal in which that non-target sound is suppressed.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a non-target sound suppression device, method, and program. It can be applied, for example, to communication devices or communication software that use voice, such as telephones and video conference systems. It can also be applied to acoustic signal processing used as preprocessing for speech recognition.

In recent years, devices equipped with various voice processing functions, such as voice calls and speech recognition, have become widespread. Smartphones and car navigation systems are examples. As these devices have spread, their voice processing functions are increasingly used in harsher noise environments than before, such as crowded streets and moving cars. There is therefore a growing demand for signal processing technology that maintains call quality and speech recognition performance even in noisy environments.

Noise that degrades the performance of voice processing functions falls into two broad classes:
- background noise, such as street crowds and the driving noise of automobiles;
- interfering sound, for example interfering speech from people other than the user of the voice processing function.

Various effective suppression methods have been proposed for background noise on the assumption that its frequency characteristics and power are stationary (see Patent Documents 1 to 3 and Non-Patent Document 1).

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2010-532879
Patent Document 2: JP 2014-106337 A
Patent Document 3: JP 2014-164191 A

Non-Patent Document 1: Kazuyuki Hiraoka and Gen Hori, "Probability and Statistics for Programming" (プログラミングのための確率統計), Ohmsha, October 23, 2009.

However, as described above, audio signal processing functions are now used in a rapidly expanding range of environments. As a result, background noise is increasingly non-stationary. A background noise suppression method that can quickly follow fluctuations in background noise characteristics is therefore needed. However, if background noise is suppressed in a signal section in which interfering sound is present, signal components of the target sound may also be lost, and sound quality may degrade.

Patent Document 3 discloses a technique for suppressing interfering sound arriving from the surroundings. It subtracts from the input signal a signal in which components arriving from the front are suppressed (referred to as a front suppression signal). The strength of this subtraction is often controlled by multiplying the front suppression signal by a subtraction coefficient, and this coefficient strongly affects sound quality:
- If the coefficient is too large, suppression is excessive and distortion of the target sound increases.
- If it is too small, interfering sound is not suppressed sufficiently.

However, it is difficult to determine whether interfering sound is superimposed on the target sound. It is therefore difficult to set the subtraction coefficient to an appropriate value.

In view of the above problems, there is a need for a non-target sound suppression device, method, and program for use when suppressing or subtracting non-target sound from an input signal. They should control the suppression coefficient or subtraction coefficient so that the sound quality of the target sound stays good and the processing load stays low.

To solve the above problems, a non-target sound suppression device according to a first aspect of the present invention comprises:

(1) a front suppression signal generation unit that generates a front suppression signal having a blind spot in the front direction. It uses the difference between a plurality of frequency domain input signals, obtained by converting the input signal from each of a plurality of microphones from the time domain to the frequency domain;

(2) a coherence calculation unit that calculates coherence from signals obtained from the plurality of input signals;

(3) a feature quantity calculation unit that calculates a feature quantity indicating the relationship between the front suppression signal and the coherence; and

(4) a non-target sound suppression processing unit that uses this feature quantity to set a coefficient for suppressing non-target sound included in the input signals. It uses the coefficient to obtain a post-suppression signal in which that non-target sound is suppressed.

A non-target sound suppression method according to a second aspect of the present invention comprises the following steps:

(1) A front suppression signal generation unit generates a front suppression signal having a blind spot in the front direction. It uses the difference between a plurality of frequency domain input signals, obtained by converting the input signal from each of a plurality of microphones from the time domain to the frequency domain.

(2) A coherence calculation unit calculates coherence from signals obtained from the plurality of input signals.

(3) A feature quantity calculation unit calculates a feature quantity indicating the relationship between the front suppression signal and the coherence.

(4) A non-target sound suppression processing unit uses this feature quantity to set a coefficient for suppressing non-target sound included in the input signals. It uses the coefficient to obtain a post-suppression signal in which that non-target sound is suppressed.

A non-target sound suppression program according to a third aspect of the present invention causes a computer to function as the following units:

(1) a front suppression signal generation unit that generates a front suppression signal having a blind spot in the front direction. It uses the difference between a plurality of frequency domain input signals, obtained by converting the input signal from each of a plurality of microphones from the time domain to the frequency domain;

(2) a coherence calculation unit that calculates coherence from signals obtained from the plurality of input signals;

(3) a feature quantity calculation unit that calculates a feature quantity indicating the relationship between the front suppression signal and the coherence; and

(4) a non-target sound suppression processing unit that uses this feature quantity to set a coefficient for suppressing non-target sound included in the input signals. It uses the coefficient to obtain a post-suppression signal in which that non-target sound is suppressed.

According to the present invention, non-target sound can be suppressed or subtracted from an input signal while controlling the suppression coefficient or subtraction coefficient. This control keeps the sound quality of the target sound good and requires only a low processing load.

FIG. 1 is a block diagram showing the overall configuration of the non-target sound suppression device according to the first embodiment.
FIG. 2 is an explanatory diagram illustrating an example arrangement of the microphones according to the embodiments.
FIG. 3 is a diagram showing the characteristics of the directional signal used in the acoustic signal processing device according to the embodiments.
FIG. 4 is a block diagram showing the configuration of the WF unit according to the first embodiment.
FIG. 5 is a flowchart showing the processing in the time constant control unit of the WF unit according to the first embodiment.
FIG. 6 is a block diagram showing the overall configuration of the non-target sound suppression device according to the second embodiment.
FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit according to the second embodiment.
FIG. 8 is a flowchart showing the processing in the time constant control unit 23 of the frequency subtraction processing unit according to the second embodiment.

(A) First Embodiment
A first embodiment of the non-target sound suppression device, method, and program according to the present invention is described in detail below with reference to the drawings.

As audio signal processing functions have spread rapidly into new environments, background noise has become increasingly non-stationary. The first embodiment applies the present invention to a background noise suppression device and method (a non-target sound suppression device and method) that quickly follows fluctuations in the characteristics of such noise.

Suppose the background noise suppression function is used where interfering sound occurs nearby. The coefficient adaptation may then be performed by mistake in a signal section where interfering sound is present. When that happens, the characteristics of the human voice that makes up the interfering sound are also reflected in the background noise suppression coefficient (hereinafter the "suppression coefficient"). Suppression using that coefficient may therefore also remove signal components of the target sound and degrade sound quality.

To prevent this, the first embodiment realizes a non-target sound suppression device and method that continuously monitors fluctuations in background noise while limiting the influence of the target sound and interfering sound. Based on the result, it controls the adaptation of the background noise suppression coefficient.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the non-target sound suppression device 1 according to the first embodiment.

As shown in FIG. 1, the non-target sound suppression device 1 acquires input signals s1(n) and s2(n) from a plurality of microphones m_1 and m_2 (FIG. 1 shows the case of two microphones). Here, n is a positive integer that indexes the input order of samples. A smaller n denotes an older input sample and a larger n a newer one.

From the input signals acquired from the microphones m_1 and m_2, the non-target sound suppression device 1 sets parameters (variables) that suppress background noise while following changes in its characteristics. It then supplies the post-suppression signal, in which the background noise is suppressed, to the downstream audio processing device 2.

The audio processing device 2 performs predetermined audio processing using the post-suppression signal from the non-target sound suppression device 1. Its processing is not particularly limited. For example, it may perform voice communication processing for a telephone terminal or video conference system, or speech recognition processing. The non-target sound suppression device 1 and the audio processing device 2 only need to be able to exchange signals. They may be connected by circuit wiring, or they may exchange signals through network communication over a wired or wireless line.

FIG. 2 is an explanatory diagram illustrating an example arrangement of the microphones m_1 and m_2.

As shown in FIG. 2, the microphones m_1 and m_2 are arranged so that the plane containing them is perpendicular to the direction from which the target sound arrives (the direction of the target sound source). The following terms are used with respect to FIG. 2:
- The forward or front direction is the direction from which the target sound arrives, as seen from the midpoint between the two microphones m_1 and m_2.
- The right, left, and rear directions are the directions seen from that midpoint when facing the arrival direction of the target sound.

In this embodiment, the target sound is assumed to arrive from the front of the microphones m_1 and m_2. Non-target sound, including interfering sound, is assumed to arrive from the left and right (the lateral directions).

As shown in FIG. 1, the non-target sound suppression device 1 includes an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation and modGI calculation unit 14, a WF (Wiener filter) unit 15, and an IFFT unit 16.

The non-target sound suppression device 1 may be realized by installing a program (for example, a non-target sound suppression program) on a computer that has a processor, memory, and other components. In that case, the functional configuration of the device 1 can be represented as in FIG. 1. Part or all of the device 1 may instead be implemented in hardware.

The FFT unit 11 receives the input signals s1 and s2 from the microphones m_1 and m_2 through AD converters (not shown). It applies a fast Fourier transform (or a discrete Fourier transform) to them, so that s1 and s2 are represented in the frequency domain.

To perform the fast Fourier transform, the FFT unit 11 forms analysis frames FRAME1(K) and FRAME2(K) from the input signals s1(n) and s2(n). Each frame consists of a predetermined number N of samples (N is an arbitrary integer). Equation (1) below shows an example of forming FRAME1 from the input signal s1.

In equation (1), K is a positive integer that indexes the order of frames. A smaller K denotes an older analysis frame and a larger K a newer one. In the rest of this description, unless otherwise noted, K denotes the index of the latest analysis frame to be analyzed.
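Equation (1) itself is not reproduced in this text. The sketch below shows how frame K could be cut from a sample sequence, under these assumptions:
- Frames are consecutive and do not overlap.
- Sample n = 1 is stored at list index 0.

The helper name `make_frame` is hypothetical. Practical systems commonly add frame overlap and a window function.

```python
def make_frame(s, K, N):
    """Return FRAME(K): the K-th block of N samples of s (K starts at 1).

    Assumption (not from the source): frames are consecutive and
    non-overlapping, and s[0] holds sample n = 1.
    """
    start = (K - 1) * N
    frame = s[start:start + N]
    if len(frame) < N:
        raise ValueError("not enough samples for frame %d" % K)
    return frame
```

With s = [1, 2, ..., 10] and N = 4, frame 2 is [5, 6, 7, 8]. There are not enough samples for a full frame 3.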

The FFT unit 11 applies a fast Fourier transform to each analysis frame. Transforming FRAME1(K), formed from s1, gives the frequency domain signal X1(f, K). Transforming FRAME2(K), formed from s2, gives the frequency domain signal X2(f, K). The FFT unit 11 supplies both signals to the front suppression signal generation unit 12 and the coherence calculation unit 13.
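The per-frame transform can be sketched with a naive discrete Fourier transform. It gives the same values as an FFT, only more slowly, and is written out here so that the definition is visible. The function names are hypothetical. For real-valued input, only bins 0 to N/2 are unique, which is one reason an implementation might keep only a subset f1 to fm of the bins.

```python
import cmath


def dft(frame):
    """X(f) = sum_n x(n) * exp(-2*pi*i*f*n/N) for f = 0..N-1.

    Naive O(N^2) transform for illustration; an FFT yields the same values.
    """
    N = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * f * n / N) for n in range(N))
        for f in range(N)
    ]


def frame_spectra(frame1, frame2):
    """Return (X1(f, K), X2(f, K)) for the K-th pair of analysis frames."""
    return dft(frame1), dft(frame2)
```

For example, a unit impulse [1, 0, 0, 0] has a flat spectrum [1, 1, 1, 1] (up to rounding). Delaying the impulse by one sample rotates bin 1 by -90 degrees, to -i.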

Here, f is an index representing frequency. The frequency domain signal X1(f, K) is not a single value. As shown in equation (2), it consists of m spectral components (m is an arbitrary integer) at the frequencies f1 to fm.

In equation (2), X1(f, K) is a complex number with a real part and an imaginary part. The same applies to X2(f, K) and to the front suppression signal N(f, K), which is described below with the front suppression signal generation unit 12.

For each frequency, the front suppression signal generation unit 12 suppresses the front-direction signal component of the signals supplied from the FFT unit 11. In other words, the front suppression signal generation unit 12 functions as a directional filter that suppresses the front-direction component.

For example, as shown in FIG. 3, the front suppression signal generation unit 12 uses a figure-eight bidirectional pattern with a blind spot in the front direction. This forms a directional filter that suppresses the front component of the signals supplied from the FFT unit 11.

Specifically, the front suppression signal generation unit 12 computes equation (3) below from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 11. This generates the front suppression signal N(f, K) for each frequency. The calculation corresponds to forming a figure-eight bidirectional filter with a blind spot in the front direction, as shown in FIG. 3.

N(f, K) = X1(f, K) − X2(f, K)   …(3)
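The following sketch applies equation (3) and illustrates why the result has a blind spot at the front. It assumes a far-field plane wave that reaches m_2 later than m_1 by d·sin(θ)/c. The microphone spacing d = 2 cm, the speed of sound c = 340 m/s, and the function names are illustrative assumptions, not values from the source. A wave from the front (θ = 0) reaches both microphones at the same time, so the difference cancels.

```python
import cmath
import math


def front_suppression(X1, X2):
    """Equation (3): N(f, K) = X1(f, K) - X2(f, K), bin by bin."""
    return [x1 - x2 for x1, x2 in zip(X1, X2)]


def null_gain(theta_deg, freq_hz, d=0.02, c=340.0):
    """|N| / |X1| for a far-field plane wave from angle theta (0 = front).

    Assumptions: spacing d [m], speed of sound c [m/s], and a delay of
    d*sin(theta)/c at the second microphone relative to the first.
    """
    delay = d * math.sin(math.radians(theta_deg)) / c
    x1 = 1.0 + 0j
    x2 = x1 * cmath.exp(-2j * math.pi * freq_hz * delay)
    return abs(front_suppression([x1], [x2])[0])
```

At 1 kHz the gain is 0 for θ = 0° and grows toward the sides, reaching about 0.18 at 30° and 0.37 at 90°. This is the front-null, lateral-lobe shape described for FIG. 3.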

In this way, the front suppression signal generation unit 12 obtains each frequency component at the frequencies f1 to fm (the power of one frame in each frequency band).

The front suppression signal generation unit 12 also calculates the average front suppression signal AVE_N(K). It does so by averaging the front suppression signal N(f, K) over all frequencies f1 to fm according to equation (4).
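Equation (4) is not reproduced in this text. N(f, K) is complex, and the preceding paragraph speaks of per-band power. The sketch below therefore assumes that the average is taken over the magnitudes |N(f, K)|. Averaging power |N|² would be an equally plausible reading. The function name is hypothetical.

```python
def average_front_suppression(N_bins):
    """AVE_N(K): mean of |N(f, K)| over the bins f1..fm.

    Assumption: equation (4) averages magnitudes; the exact published
    form may differ (e.g. it may average power instead).
    """
    if not N_bins:
        raise ValueError("need at least one frequency bin")
    return sum(abs(v) for v in N_bins) / len(N_bins)
```

For bins [3+4j, 0, 1] the magnitudes are 5, 0, and 1, so AVE_N(K) = 2.0.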

The coherence calculation unit 13 forms signals with strong directivity in specific directions from the frequency domain signals X1(f, K) and X2(f, K) supplied by the FFT unit 11. It then calculates the coherence COH(K) from these signals.

The calculation of the coherence COH(K) in the coherence calculation unit 13 is described below.

From the frequency domain signals X1(f, K) and X2(f, K), the coherence calculation unit 13 forms two directional signals:
- a signal B1(f, K), processed by a filter with strong directivity in a first direction (for example, to the left);
- a signal B2(f, K), processed by a filter with strong directivity in a second direction (for example, to the right).

Existing methods can be used to form the directional signals B1(f) and B2(f). Here, as an example, equation (5) below forms the signal B1 with strong directivity in the first direction, and equation (6) below forms the signal B2 with strong directivity in the second direction.

In equations (5) and (6):
- S is the sampling frequency;
- N is the FFT analysis frame length;
- τ is the difference in sound-wave arrival time between the microphones m_1 and m_2;
- i is the imaginary unit;
- f is the frequency.
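Equations (5) and (6) are not reproduced in this text. The sketch below assumes a common delay-and-subtract form built from the symbols defined above:
- B1(f) = X2(f) − X1(f)·exp(−i2πfSτ/N)
- B2(f) = X1(f) − X2(f)·exp(−i2πfSτ/N)

Here f·S/N converts bin index f to hertz. This form is an assumption, and the function name is hypothetical. Under it, a wave that reaches m_2 exactly τ later than m_1 is cancelled in B1 but not in B2. A wave from the front produces identical B1 and B2.

```python
import cmath
import math


def directional_signals(X1, X2, bins, fs, n_fft, tau):
    """Form B1(f), B2(f) from two spectra (assumed form of eqs. (5), (6)).

    X1, X2 : complex spectral values, one per bin index in `bins`
    fs     : sampling frequency S [Hz]
    n_fft  : analysis frame length N
    tau    : inter-microphone arrival-time difference [s]
    """
    B1, B2 = [], []
    for x1, x2, f in zip(X1, X2, bins):
        # Phase rotation equivalent to a delay of tau at bin f (f*S/N Hz).
        w = cmath.exp(-2j * math.pi * f * fs / n_fft * tau)
        B1.append(x2 - x1 * w)  # cancels a wave that reaches m_2 tau after m_1
        B2.append(x1 - x2 * w)  # mirror image: cancels the opposite side
    return B1, B2
```

Because each signal places its null on one side, B1 and B2 respond differently to lateral sources but identically to a source straight ahead. The coherence described next exploits this difference.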

Next, the coherence calculation unit 13 obtains the coherence COH(K) by applying the operations of equations (7) and (8) below to the signals B1(f) and B2(f) obtained above. In equation (7), B2(f, K)* denotes the complex conjugate of B2(f, K).

Here, coef(f, K) represents the coherence of the component at an arbitrary frequency f (any of the frequencies f1 to fm). It is taken in the frame with index K, that is, in the analysis frames FRAME1(K) and FRAME2(K).
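Equations (7) and (8) are not reproduced in this text. The sketch below assumes the following forms:
- coef(f, K) = |B1·B2*| / (½(|B1|² + |B2|²)) for each bin;
- COH(K) is the mean of coef(f, K) over the bins.

This matches the description of (7) as a per-frequency correlation using the conjugate B2* and of (8) as an average over all frequency components. The small `eps` guarding against division by zero and the function names are hypothetical additions.

```python
def bin_coherence(b1, b2, eps=1e-12):
    """coef(f, K) = |B1 * conj(B2)| / (0.5 * (|B1|^2 + |B2|^2)).

    Assumed form of equation (7); eps (an added assumption) avoids 0/0.
    """
    num = abs(b1 * b2.conjugate())
    den = 0.5 * (abs(b1) ** 2 + abs(b2) ** 2)
    return num / (den + eps)


def frame_coherence(B1, B2):
    """COH(K): mean of coef(f, K) over bins f1..fm (assumed form of eq. (8))."""
    coefs = [bin_coherence(b1, b2) for b1, b2 in zip(B1, B2)]
    return sum(coefs) / len(coefs)
```

By the AM-GM inequality, each coef lies between 0 and 1. It equals 1 when |B1| = |B2|, as for a frontal source, and falls to 0 when one directional signal is cancelled, as for a lateral source. This agrees with the text's statement that a large COH(K) indicates arrival from the front.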

When obtaining coef(f, K), the directivity directions of the signals B1(f) and B2(f) may each be any direction other than the front, as long as they differ from each other. The method of calculating coef(f, K) is also not limited to the one above.

The correlation and modGI calculation unit 14 acquires the coherence COH(K) and the front suppression signal N(f, K) (in the form of the average front suppression signal AVE_N(K)), which has directivity in directions other than the front. It then calculates the correlation coefficient cor(K), a feature quantity that indicates the relationship between AVE_N(K) and COH(K).

From the correlation coefficient cor(K), the correlation and modGI calculation unit 14 also calculates a feature quantity cor_modGI(K). This quantity represents how strongly the sign of the slope of the amplitude of cor(K) fluctuates. The unit outputs cor_modGI(K) to the WF unit 15.

The correlation and modGI calculation unit 14 detects signal sections that contain interfering sound using the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K). The principle of this detection is described first.

Assume that a sound source emitting the target sound is located in front of the microphones m_1 and m_2. Also assume that interfering sound arrives from a direction other than the front, for example from the lateral directions of m_1 and m_2 (that is, from the left or right).

For example, when no interfering sound is present and the target sound is present, the front suppression signal N(f, K) takes a value proportional to the magnitude of the target sound component. However, the gain in the front direction is small compared with the gain in the lateral directions (see FIG. 3). This value is therefore smaller than when interfering sound is present.

The coherence COH(K) is a feature quantity closely related to the arrival direction of the input signal, and it can be read as the correlation between two signal components. This reading follows from the equations: equation (7) calculates the correlation for a single frequency component, and equation (8) averages these correlation values over all frequency components. A small COH(K) therefore indicates a small correlation between the two signal components, and a large COH(K) a large one.
- When COH(K) is small, the arrival direction of the input signal is strongly biased to the right or left, so the signal arrives from a direction other than the front.
- When COH(K) is large, the arrival direction shows little bias, so the signal arrives from the front.

Consequently, when no interfering sound is present and the target sound is present, COH(K) takes a large value. When interfering sound is present together with the target sound, COH(K) takes a small value.

Organizing the above behavior by the presence or absence of interfering sound gives the following relationships.
・When no interfering sound is present and the target sound is present, the coherence COH(K) takes a large value. The front suppression signal N(f, K) (the average front suppression signal AVE_N(K)) takes a value proportional to the magnitude of the target sound component.
・When interfering sound is present, COH(K) takes a small value and N(f, K) (AVE_N(K)) takes a large value.

Now introduce the correlation coefficient cor(K) between the front suppression signal N(f, K) (the average front suppression signal AVE_N(K)) and the coherence COH(K). The behavior above then implies the following.
・When no interfering sound is present, cor(K) is positive (cor(K) > 0).
・When interfering sound is present, cor(K) is zero or negative (cor(K) ≤ 0).

Accordingly, the correlation and modGI calculation unit 14 observes the sign of the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K). It determines that no interfering sound is present when cor(K) is positive, and that interfering sound is present when cor(K) is negative.

The method for calculating cor(K) is not limited; for example, cor(K) can be calculated for each frame using the following equation (9).
cor(K) = cov[AVE_N(K), COH(K)] / (σAVE_N(K) × σCOH(K)) ...(9)

In equation (9), cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K), σAVE_N(K) denotes the standard deviation of AVE_N(K), and σCOH(K) denotes the standard deviation of COH(K). When cor(K) is calculated with equation (9), the standard deviations and covariance may be obtained from the results of a predetermined number i of most recently processed frames. Specifically, the standard deviations (σAVE_N(K) and σCOH(K)) and the covariance (cov[AVE_N(K), COH(K)]) may be computed from the COH and AVE_N values of the most recently processed frames (the (K-i)-th frame, the (K-(i-1))-th frame, ..., the (K-1)-th frame, and the K-th frame). In other words, the i most recently obtained values of AVE_N and COH are used as the samples for the standard deviations and covariance in equation (9). The resulting cor(K) takes a value between -1.0 and 1.0.
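The sliding-window computation of cor(K) described above can be sketched as follows. This is a minimal sketch, not the patent's implementation: it assumes AVE_N(K) and COH(K) arrive as one scalar per frame, uses a hypothetical window length of 8 frames, and returns 0.0 for a window in which either signal is constant (an assumption, since the text does not cover that case). The helper `interference_present` applies the sign rule stated earlier (cor(K) ≤ 0 suggests interfering sound).

```python
import math
from collections import deque


def windowed_correlation(ave_n_samples, coh_samples):
    """Pearson correlation of AVE_N and COH samples, as in equation (9)."""
    n = len(ave_n_samples)
    if n != len(coh_samples) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_a = sum(ave_n_samples) / n
    mean_c = sum(coh_samples) / n
    # Covariance and variances over the window (the 1/n factors cancel in cor).
    cov = sum((a - mean_a) * (c - mean_c)
              for a, c in zip(ave_n_samples, coh_samples)) / n
    var_a = sum((a - mean_a) ** 2 for a in ave_n_samples) / n
    var_c = sum((c - mean_c) ** 2 for c in coh_samples) / n
    if var_a == 0.0 or var_c == 0.0:
        # Assumption: treat a flat window as "no correlation".
        return 0.0
    cor = cov / math.sqrt(var_a * var_c)
    # Clamp tiny floating-point overshoot so cor stays in [-1.0, 1.0].
    return max(-1.0, min(1.0, cor))


class CorrelationTracker:
    """Keeps the most recent `window` frames of AVE_N(K) and COH(K)."""

    def __init__(self, window=8):  # hypothetical window length
        self.ave_n = deque(maxlen=window)
        self.coh = deque(maxlen=window)

    def update(self, ave_n_k, coh_k):
        """Add frame K and return cor(K), or None until 2 frames are stored."""
        self.ave_n.append(ave_n_k)
        self.coh.append(coh_k)
        if len(self.ave_n) < 2:
            return None
        return windowed_correlation(list(self.ave_n), list(self.coh))


def interference_present(cor_k):
    """Sign rule from the text: cor(K) <= 0 suggests interfering sound."""
    return cor_k <= 0.0
```

When AVE_N and COH rise and fall together (target sound only), the tracker returns values near +1; when AVE_N rises while COH falls (interfering sound), it returns negative values.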

Next, the correlation and modGI calculation unit 14 uses cor(K) to calculate a feature quantity representing how violently the slope of the cor(K) amplitude fluctuates between positive and negative.

When background noise is present in the input signal, the behavior of cor(K) changes as follows.

・The macroscopic behavior, in which cor(K) is negative when interfering sound is present and positive when it is absent, is maintained to some extent.

・Background noise makes the rise and fall of the front suppression signal amplitude (average front suppression signal AVE_N(K)) more irregular, whereas the coherence COH(K) merely loses some dynamic range and the irregularity of its amplitude does not change drastically. As a result, the increases and decreases of the front suppression signal (AVE_N(K)) fall out of step with those of COH(K), so the correlation (cor(K)) rises and falls more violently. cor(K) also changes sign more frequently.

・In short, the stronger the influence of background noise, the larger the fluctuations in the value of cor(K) and the more frequently its sign changes.

As described above, when background noise is present, cor(K) rises and falls and changes sign more often, and these fluctuations grow as the influence of background noise increases. This behavior originates only from background noise. Therefore, by observing how violently cor(K) fluctuates, the degree to which background noise affects the target sound, and changes in its characteristics, can be estimated without being affected by the target sound or the interfering sound.

Therefore, in the first embodiment, the correlation and modGI calculation unit 14 calculates a feature quantity called modGI (GI: Gradient Index) to observe the rise and fall and the sign changes of cor(K).

Here, modGI is an index that measures how often the slope of a signal waveform changes direction and by how much (see Patent Document 2). For any signal whose feature quantity is to be calculated, modGI is defined as the power of the second-order difference of that signal, normalized by the power of the signal itself.

In the first embodiment, the correlation and modGI calculation unit 14 calculates modGI according to the method described in Patent Document 2. Using the following equation (10), one example of a formula for modGI as defined above, the unit calculates cor_modGI(K), a feature quantity representing how violently cor(K) fluctuates.

Equation (10) represents how often the slope of cor(K) changes sign. The value of cor_modGI decreases as the sign changes of the signal slope become smaller, and increases as they become larger. In other words, the larger cor_modGI is, the greater the influence of background noise; conversely, the smaller cor_modGI is, the smaller that influence.
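Because equation (10) itself is not reproduced here, the following is only a hedged sketch of the modGI definition given above (power of the second-order difference normalized by the signal power). The use of mean powers over the window, and the return value of 0.0 for an all-zero signal, are assumptions; the exact form in Patent Document 2 may differ. Applied to the last few cor(K) values, it would play the role of cor_modGI(K).

```python
def mod_gi(x):
    """Hypothetical modGI: mean power of the second-order difference of x,
    normalized by the mean power of x itself."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples for a second-order difference")
    # Second-order difference: x[n] - 2*x[n-1] + x[n-2].
    d2 = [x[n] - 2.0 * x[n - 1] + x[n - 2] for n in range(2, len(x))]
    diff_power = sum(v * v for v in d2) / len(d2)
    signal_power = sum(v * v for v in x) / len(x)
    if signal_power == 0.0:
        return 0.0  # assumption: a silent window has no slope changes
    return diff_power / signal_power
```

A straight-line sequence (constant slope) yields 0.0, while a sequence whose slope flips sign every sample, such as [1, -1, 1, -1], yields a large value, matching the qualitative behavior described for equation (10).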

The WF unit 15 sets the value of the time constant λ, which controls the adaptation speed of the suppression coefficient wf_coef(f, K), based on the value of cor_modGI(K) from the correlation and modGI calculation unit 14, and uses this time constant to calculate wf_coef(f, K).

The WF unit 15 also multiplies the frequency domain input signal X1(f, K) by wf_coef(f, K) to obtain the post-suppression signal Y(f, K), which it outputs to the IFFT unit 16.

FIG. 4 is a block diagram showing the configuration of the WF unit 15 according to the first embodiment.

As shown in FIG. 4, the WF unit 15 according to the first embodiment includes an input signal acquisition unit 21, a time constant control unit 23, a suppression coefficient adaptation unit 24, a background noise suppression processing unit 25, and a post-suppression signal output unit 26.

The input signal acquisition unit 21 acquires the frequency domain input signal X1(f, K) and, from the correlation and modGI calculation unit 14, cor_modGI(K).

The time constant control unit 23 sets the value of the time constant λ, which controls the adaptation speed of wf_coef(f, K), based on the value of cor_modGI(K) from the correlation and modGI calculation unit 14.

The role of the time constant λ is briefly described here. In the WF unit 15, the suppression coefficient adaptation unit 24 (described later) calculates wf_coef(f, K), but before doing so the background noise characteristic must be calculated for each frequency. The background noise is estimated, for example, with Equation 1 of Patent Document 1, in which the parameter (time constant) λ is involved.

The time constant λ takes a value between 0.0 and 1.0 and controls how strongly the instantaneous input is reflected in the background noise characteristic. The larger λ is, the stronger the influence of the instantaneous input; the smaller λ is, the weaker that influence. With a large λ, wf_coef(f, K) strongly reflects the current input, enabling fast coefficient adaptation; however, because the instantaneous input dominates, the coefficient fluctuates more and the naturalness of the sound may suffer. With a small λ, adaptation is slower, but wf_coef(f, K) is not strongly affected by instantaneous characteristics and reflects an average of past noise characteristics, so the natural sound quality is less likely to be lost.
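To illustrate the trade-off above, the sketch below assumes a common first-order recursive average for the per-bin noise estimate, noise(f, K) = (1 - λ) × noise(f, K-1) + λ × |X1(f, K)|^2. This form is an assumption standing in for Equation 1 of Patent Document 1, which is not reproduced here; the function name is hypothetical.

```python
def update_noise(prev_noise, frame_power, lam):
    """One step of a first-order recursive noise estimate per frequency bin.

    Assumed form: noise(f, K) = (1 - lam) * noise(f, K-1) + lam * |X1(f, K)|^2.
    A larger lam weights the current frame more heavily (faster tracking).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0.0, 1.0]")
    return [(1.0 - lam) * n + lam * p for n, p in zip(prev_noise, frame_power)]
```

For a noise floor that jumps from 1.0 to 2.0, three updates with λ = 0.5 reach 1.875, whereas λ = 0.1 only reaches about 1.27: the larger λ tracks the change quickly but would equally follow short-lived fluctuations.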

Accordingly, when cor_modGI(K) is greater than the threshold Θ (for example, when cor_modGI(K) ≥ Θ), the influence of background noise is large, so the time constant control unit 23 sets λ to a large value. When cor_modGI(K) is smaller than Θ (for example, when cor_modGI(K) < Θ), the influence of background noise is small, so the unit sets λ to a small value. This enables coefficient adaptation that follows the characteristics of the background noise without being affected by the target sound or the interfering sound.

Although a single threshold Θ for deciding the magnitude of λ is illustrated here, two or more thresholds may be set so that λ is set more finely for each interval into which cor_modGI falls.
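The single-threshold rule and its multi-threshold generalization can be expressed as one lookup. This is a sketch under assumptions: thresholds and λ values are supplied by the caller (the numbers in any usage are hypothetical), and a value exactly equal to a threshold is placed in the upper interval, matching the "cor_modGI(K) ≥ Θ gives the larger λ" reading.

```python
import bisect


def select_time_constant(cor_modgi, thresholds, lambdas):
    """Pick lambda for the current frame from cor_modGI(K).

    thresholds: strictly ascending list of threshold values (Θ).
    lambdas: strictly ascending list with len(thresholds) + 1 entries,
             one per interval, so a larger cor_modGI gives a larger lambda.
    """
    if len(lambdas) != len(thresholds) + 1:
        raise ValueError("need exactly one lambda per interval")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    if any(a >= b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must increase with cor_modGI")
    if not all(0.0 < lam < 1.0 for lam in lambdas):
        raise ValueError("each lambda must satisfy 0.0 < lambda < 1.0")
    # bisect_right puts a value equal to a threshold into the upper interval.
    return lambdas[bisect.bisect_right(thresholds, cor_modgi)]
```

With one threshold, `select_time_constant(x, [theta], [lam1, lam2])` reproduces the two-level rule with λ1 < λ2; adding thresholds only refines the intervals.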

The suppression coefficient adaptation unit 24 calculates wf_coef(f, K) using the time constant λ set by the time constant control unit 23. wf_coef(f, K) can be obtained, for example, with Equation 3 of Patent Document 1.

The background noise suppression processing unit 25 multiplies the frequency domain input signal X1(f, K) by the suppression coefficient wf_coef(f, K) calculated by the suppression coefficient adaptation unit 24, using the following equation (11), to obtain the post-suppression signal Y(f, K).
Y(f, K) = X1(f, K) × wf_coef(f, K) ...(11)
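Equation (11) is a per-bin multiplication. The sketch below pairs it with a simple Wiener-type gain, max(floor, 1 - noise/power), which equals SNR / (1 + SNR) for the a posteriori estimate SNR = power / noise - 1. This gain rule and the floor value of 0.05 are assumptions standing in for Equation 3 of Patent Document 1, which is not reproduced here.

```python
def wiener_gain(frame_power, noise_power, floor=0.05):
    """Hypothetical per-bin gain: max(floor, 1 - noise/power), capped at 1.0."""
    gains = []
    for p, n in zip(frame_power, noise_power):
        g = 1.0 - n / p if p > 0.0 else 0.0
        gains.append(max(floor, min(1.0, g)))
    return gains


def apply_suppression(x1, gain):
    """Equation (11): Y(f, K) = X1(f, K) * wf_coef(f, K) for every bin f."""
    return [x * g for x, g in zip(x1, gain)]
```

Because the gain is real-valued, the multiplication scales each bin's magnitude while leaving its phase unchanged.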

The post-suppression signal output unit 26 outputs the post-suppression signal Y(f, K) to the IFFT unit 16.

The IFFT unit 16 converts the frequency domain signal Y(f, K) into the time domain signal y(n). If the downstream circuit can process the frequency domain signal Y(f, K) directly, the IFFT unit 16 may be omitted.

(A-2) Operation of the First Embodiment
Next, the non-target sound suppression processing performed by the non-target sound suppression device 1 according to the first embodiment will be described in detail with reference to the drawings.

First, one frame (one processing unit) of the input signals s1(n) and s2(n) is supplied from the microphones m_1 and m_2 to the FFT unit 11 via an AD converter (not shown). The FFT unit 11 Fourier-transforms the analysis frames FRAME1(K) and FRAME2(K), which are based on one frame of s1(n) and s2(n), to obtain the frequency domain signals X1(f, K) and X2(f, K). The signals X1(f, K) and X2(f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.

The front suppression signal generation unit 12 calculates the front suppression signal N(f, K) from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 11. It then calculates the average front suppression signal AVE_N(K) from N(f, K) and supplies it to the correlation and modGI calculation unit 14.

The coherence calculation unit 13 generates the coherence COH(K) from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 11 and supplies it to the correlation and modGI calculation unit 14.

The correlation and modGI calculation unit 14 calculates the correlation coefficient cor(K), the feature quantity indicating the relationship between AVE_N(K) and COH(K), using, for example, equation (9).

The correlation and modGI calculation unit 14 also uses cor(K) to calculate cor_modGI(K), the feature quantity representing how violently the slope of the cor(K) amplitude changes sign, and supplies cor_modGI(K) to the WF unit 15.

The WF unit 15 receives cor_modGI(K) from the correlation and modGI calculation unit 14, together with the frequency domain input signal X1(f, K).

FIG. 5 is a flowchart showing the processing in the time constant control unit 23 of the WF unit 15 according to the first embodiment.

First, the time constant control unit 23 compares the value of cor_modGI(K) from the correlation and modGI calculation unit 14 with the threshold Θ (S101). If cor_modGI(K) is greater than Θ, it sets λ to a large value (S102); if cor_modGI(K) is less than Θ, it sets λ to a small value (S102).

The time constant λ takes a value in the range 0.0 < λ < 1.0. As λ approaches 1.0, the estimate is strongly influenced by the instantaneous input signal; as λ approaches 0.0, that influence weakens. The λ values chosen from the comparison of cor_modGI(K) with Θ therefore only need to be relative: if λ1 is the value used when cor_modGI(K) < Θ and λ2 is the value used when cor_modGI(K) ≥ Θ, it suffices that λ1 < λ2.

The suppression coefficient adaptation unit 24 then calculates wf_coef(f, K) using the time constant λ set by the time constant control unit 23.

That is, the larger λ is, the faster wf_coef(f, K) adapts, strongly reflecting the instantaneous input. The smaller λ is, the weaker the influence of the instantaneous input; although wf_coef(f, K) then adapts slowly, it is not strongly affected by instantaneous characteristics and reflects an average of past noise characteristics. In this case, the natural sound quality is therefore less likely to be lost.

The background noise suppression processing unit 25 then multiplies the frequency domain input signal X1(f, K) by the suppression coefficient wf_coef(f, K) calculated by the suppression coefficient adaptation unit 24, using equation (11), to obtain the post-suppression signal Y(f, K), and the post-suppression signal output unit 26 outputs Y(f, K) to the IFFT unit 16.

The IFFT unit 16 converts the frequency domain signal Y(f, K) into the time domain signal y(n) and outputs it to the downstream speech processing device 2.

(A-3) Effects of the First Embodiment
As described above, according to the first embodiment, the time constant of the Wiener filter (WF) can be controlled based on a characteristic behavior: the modGI of the correlation between the front suppression signal and the coherence increases as the influence of background noise increases, and decreases as that influence decreases. This enables appropriate coefficient adaptation according to the influence of background noise and improves the accuracy of background noise suppression.

Applying the present invention to communication devices such as video conference systems and mobile phones, or to preprocessing for speech recognition, can therefore be expected to improve performance.

(B) Second Embodiment
Next, a second embodiment of the non-target sound suppression device, method, and program according to the present invention will be described with reference to the drawings.

The second embodiment illustrates a non-target sound suppression device and method (interfering sound suppression device and method) that uses the present invention to suppress interfering sound arriving from the surroundings, for example by subtracting the front suppression signal from the input signal.

When the front suppression signal is subtracted from the input signal, the subtraction strength is often controlled by multiplying the front suppression signal by a subtraction coefficient. This coefficient strongly affects sound quality: if it is too large, suppression is excessive and distortion of the target speech increases; if it is too small, the interfering speech is insufficiently suppressed. However, it is difficult to determine whether interfering speech is superimposed on the target speech, and therefore difficult to set the subtraction coefficient to an appropriate value.

The second embodiment therefore realizes a non-target sound suppression device and method (interfering sound suppression device and method) that estimates the contribution of interfering sound to the input signal and controls the subtraction coefficient of frequency subtraction according to the result, so that the interfering sound is suppressed neither too much nor too little.

(B-1) Configuration of the Second Embodiment
FIG. 6 is a block diagram showing the overall configuration of the non-target sound suppression device 1A according to the second embodiment.

The non-target sound suppression device 1A according to the second embodiment acquires the input signals s1(n) and s2(n) from a plurality of microphones m_1 and m_2 (FIG. 1 shows the case of two), estimates the contribution of interfering sound to the input signal, controls the subtraction coefficient of frequency subtraction according to the result, and supplies the post-suppression signal, in which the interfering sound has been suppressed, to the downstream speech processing device 2.

As in the first embodiment, the speech processing device 2 performs predetermined speech processing using the post-suppression signal from the non-target sound suppression device 1A.

As shown in FIG. 6, the non-target sound suppression device 1A includes an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 54, a frequency subtraction processing unit 55, and an IFFT unit 16.

The FFT unit 11, the front suppression signal generation unit 12, the coherence calculation unit 13, and the IFFT unit 16 are basically identical or equivalent to the components described in the first embodiment, so their detailed description is omitted.

The non-target sound suppression device 1A may be realized by installing a program (for example, a non-target sound suppression program) on a computer having a processor, memory, and so on; in this case, its functional configuration can be represented by FIG. 6. Part or all of the device 1A may also be realized in hardware.

The correlation calculation unit 54 acquires the front suppression signal (average front suppression signal AVE_N(K)) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between AVE_N(K) and COH(K). It then outputs cor(K) to the frequency subtraction processing unit 55. cor(K) can be calculated in the same way as in the first embodiment, for example with equation (9).

The frequency subtraction processing unit 55 acquires the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12. It sets the subtraction coefficient α based on cor(K), multiplies N(f, K) by α, and subtracts the result from X1(f, K) to obtain the post-suppression signal Y(f, K).
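Equation (12) is not reproduced here, so the following is only a sketch of one common way to realize "subtract α × N(f, K) from X1(f, K)": subtract in the magnitude domain, reuse the phase of X1(f, K), and clamp negative results to a floor. The magnitude-domain choice and the `floor` parameter are assumptions, not details taken from the text.

```python
import cmath


def frequency_subtract(x1, n, alpha, floor=0.0):
    """Hypothetical magnitude-domain frequency subtraction.

    |Y(f, K)| = max(|X1(f, K)| - alpha * |N(f, K)|, floor * |X1(f, K)|),
    with the phase of X1(f, K) reused for Y(f, K).
    """
    if alpha < 0.0:
        raise ValueError("alpha must be non-negative")
    y = []
    for x, nv in zip(x1, n):
        mag = max(abs(x) - alpha * abs(nv), floor * abs(x))
        y.append(cmath.rect(mag, cmath.phase(x)))
    return y
```

For example, a bin X1 = 3 + 4j (magnitude 5) with N = 1 and α = 2 keeps its phase and is reduced to magnitude 3, giving 1.8 + 2.4j; a larger α removes more of each bin.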

FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit 55 according to the second embodiment.

As shown in FIG. 7, the frequency subtraction processing unit 55 includes an input signal acquisition unit 31, a subtraction coefficient control unit 32, a subtraction unit 33, and a post-subtraction signal output unit 34.

The input signal acquisition unit 31 acquires the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12.

The subtraction coefficient control unit 32 sets the subtraction coefficient α based on cor(K).

The principle for estimating the contribution of the interfering sound (here, interfering speech) is as follows. Assume that the target sound arrives from in front of the microphones m_1 and m_2, and that the interfering sound arrives from their side (right or left).

In this case, when no interfering sound is present and the target sound is present, the front suppression signal N(f, K) captures the signal component arriving from the front, so its value is proportional to the magnitude of the target sound component. However, as shown in FIG. 2, the pickup level in the front direction is lower than in the lateral directions, so this value is smaller than when interfering sound is present.

The coherence COH, meanwhile, is a feature quantity closely related to the arrival direction of the input signal. It therefore takes a large value when no interfering sound is present and only the target sound is present, and a small value when interfering sound is present.

Organizing this behavior by the presence or absence of interfering sound gives the following:

・When no interfering sound is present and only the target sound is present, COH is large and the front suppression signal takes a value proportional to the magnitude of the target sound component.

・When interfering sound is present, COH is small and the front suppression signal is large.

Introducing the correlation coefficient cor(K) between the front suppression signal N(f, K) and the coherence COH, this behavior can be expressed as follows:

・When no interfering sound is present, the correlation coefficient cor(K) is positive.

・When interfering speech is present, the correlation coefficient cor(K) is negative.

To reduce over- and under-suppression of the interfering sound, the subtraction coefficient α should preferably be smaller the weaker the influence of the interfering sound, and larger the stronger that influence (see Equation (12), described later).

As described above, the sign of cor(K) changes with the presence or absence of interfering sound. Subtraction-coefficient control that follows the influence of the interfering sound can therefore be achieved by, for example, decreasing α when cor(K) is positive and increasing α when cor(K) is negative.

Therefore, in the second embodiment, the subtraction coefficient control unit 32 controls the subtraction coefficient used in the frequency subtraction process. It does so on the basis of this characteristic behavior of the correlation coefficient cor(K) between the front suppression signal N(f, K) and the coherence COH.

More specifically, the subtraction coefficient control unit 32 sets α to a large value when interfering speech is present, to strengthen the suppression effect. When no interfering sound is present, it sets α to a small value to weaken the suppression effect.

The subtraction coefficient control unit 32 may, for example, be provided with a subtraction coefficient storage unit (not shown) that records the correspondence between correlation coefficient values and set values of α. The control unit then sets α by referring to this storage unit.
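Below is a minimal sketch of such a lookup. It assumes the storage unit holds ascending breakpoints on cor(K) and one α per interval. The breakpoints and α values are hypothetical placeholders, chosen only so that α grows as the correlation becomes more negative.

```python
import bisect

# Hypothetical contents of the subtraction coefficient storage unit.
COR_BREAKPOINTS = [-0.5, 0.0, 0.5]      # ascending interval edges on cor(K)
ALPHA_VALUES = [2.0, 1.5, 1.0, 0.5]     # one alpha per interval, left to right

def alpha_from_table(cor, breakpoints=COR_BREAKPOINTS, alphas=ALPHA_VALUES):
    """Return the stored alpha for the interval that contains cor.

    bisect_right puts cor == 0.0 into the non-negative interval, which
    matches the "not negative" branch of the sign rule.
    """
    if len(alphas) != len(breakpoints) + 1:
        raise ValueError("need exactly one alpha per interval")
    return alphas[bisect.bisect_right(breakpoints, cor)]
```

A table keeps the mapping editable without code changes, at the cost of a coarse, stepwise α.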

The subtraction unit 33 uses the subtraction coefficient α obtained from the subtraction coefficient control unit 32 to perform the subtraction of Equation (12):

$$Y(f,K) = X_1(f,K) - \alpha \times N(f,K) \qquad (12)$$
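Equation (12) can be sketched per frame as shown below. The sketch makes three assumptions that this section does not state: the subtraction is done on magnitudes, negative magnitudes are clipped to a floor, and the phase of X1 is reattached so the result can be passed to an inverse FFT. These are common spectral-subtraction choices.

```python
import numpy as np

def spectral_subtract(x1, n, alpha, floor=0.0):
    """Y(f,K) = X1(f,K) - alpha * N(f,K), applied to magnitude spectra.

    Assumptions beyond Equation (12): magnitude-domain subtraction,
    clipping of negative results to `floor`, and reuse of X1's phase.
    """
    x1 = np.asarray(x1, dtype=complex)
    n = np.asarray(n, dtype=complex)
    mag = np.maximum(np.abs(x1) - alpha * np.abs(n), floor)
    return mag * np.exp(1j * np.angle(x1))
```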

The post-subtraction signal output unit 34 outputs the suppressed signal (post-subtraction signal) Y(f, K) calculated by the subtraction unit 33 to the IFFT unit 16.

(B-2) Operation of the Second Embodiment
Next, the non-target sound suppression operation of the non-target sound suppression device 1A according to the second embodiment is described in detail with reference to the drawings.

Input signals s1(n) and s2(n) for one frame (one processing unit) are supplied from microphones m_1 and m_2, respectively, to the FFT unit 11 via AD converters (not shown). The FFT unit 11 builds the analysis frames FRAME1(K) and FRAME2(K) from these one-frame input signals. It Fourier-transforms them to obtain the frequency-domain signals X1(f, K) and X2(f, K). These signals are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.
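The framing and transform step might look like the following sketch. The frame length of 512 samples, the non-overlapping frames, and the Hann window are hypothetical choices, because this section does not state the actual analysis parameters.

```python
import numpy as np

def analysis_frames(s1, s2, k, frame_len=512):
    """Return (X1(f,K), X2(f,K)) for frame index K = k.

    Hypothetical framing: non-overlapping frames of `frame_len` samples
    with a Hann window; only the non-negative frequency bins are kept.
    """
    start = k * frame_len
    seg1 = np.asarray(s1[start:start + frame_len], dtype=float)
    seg2 = np.asarray(s2[start:start + frame_len], dtype=float)
    if seg1.size < frame_len or seg2.size < frame_len:
        raise ValueError("not enough samples for frame K")
    window = np.hanning(frame_len)
    return np.fft.rfft(seg1 * window), np.fft.rfft(seg2 * window)

# Usage: a 1 kHz tone sampled at 16 kHz, identical at both microphones.
fs = 16000
t = np.arange(2 * 512) / fs
tone = np.sin(2 * np.pi * 1000 * t)
X1, X2 = analysis_frames(tone, tone, k=1)
```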

The front suppression signal generation unit 12 calculates the front suppression signal N(f, K) from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 11. It then calculates the average front suppression signal AVE_N(K) from N(f, K) and supplies it to the correlation calculation unit 54.
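The following sketch shows the two quantities computed by unit 12, under stated assumptions. N(f, K) is taken as the plain difference X1 - X2, which cancels a source that reaches both microphones at the same time (the front, for a broadside pair). AVE_N(K) is taken as the mean magnitude of N over the frequency bins. The patent's own definitions appear outside this section and may differ.

```python
import numpy as np

def front_suppression(x1, x2):
    """N(f,K) as X1 - X2 (assumed form): a blind spot toward the front."""
    return np.asarray(x1) - np.asarray(x2)

def average_front_suppression(n):
    """AVE_N(K): mean magnitude of N(f,K) over frequency bins (assumed)."""
    return float(np.mean(np.abs(n)))
```

A frontal source gives identical spectra at both microphones and therefore AVE_N = 0. A lateral source arrives with an inter-microphone delay and leaves a nonzero residue.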

The coherence calculation unit 13 generates the coherence COH(K) from the signals X1(f, K) and X2(f, K) supplied by the FFT unit 11 and supplies it to the correlation calculation unit 54.
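This section does not give the coherence formula. One construction, assumed here purely for illustration, forms two directional signals B1 and B2 whose nulls point to opposite sides. It then averages their normalized cross term over frequency. The result is about 1 for a frontal source and about 0 for a source exactly to one side. The microphone spacing `mic_distance_m` and sound speed `c` are hypothetical parameters.

```python
import numpy as np

def coherence(x1, x2, freqs_hz, mic_distance_m=0.03, c=340.0, eps=1e-12):
    """COH(K) from two opposite-facing directional signals (assumed form)."""
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    tau = mic_distance_m / c                      # end-fire propagation delay
    shift = np.exp(-2j * np.pi * np.asarray(freqs_hz) * tau)
    b1 = x2 - x1 * shift    # null toward one side (X2 = X1 * shift)
    b2 = x1 - x2 * shift    # null toward the other side
    # Normalized cross term per bin, bounded to [-1, 1].
    num = np.real(b1 * np.conj(b2))
    den = 0.5 * (np.abs(b1) ** 2 + np.abs(b2) ** 2) + eps
    return float(np.mean(num / den))              # average over bins
```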

The correlation calculation unit 54 calculates the correlation coefficient cor(K), for example using Equation (9). cor(K) is a feature quantity indicating the relationship between the average front suppression signal AVE_N(K) and the coherence COH(K).
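AVE_N(K) and COH(K) are each one number per frame, so a correlation between them must be taken over a history of frames. The sketch below assumes Equation (9) is a Pearson correlation over the last `window` frames. The window length and the choice to return 0.0 while the correlation is undefined are assumptions. Returning 0.0 falls into the "not negative" branch, so α stays small.

```python
import math
from collections import deque

class SlidingCorrelation:
    """Streaming cor(K) between AVE_N and COH over the last `window` frames."""

    def __init__(self, window=20):
        self.ave_n = deque(maxlen=window)
        self.coh = deque(maxlen=window)

    def update(self, ave_n, coh):
        """Add frame K's values and return cor(K) (0.0 if undefined)."""
        self.ave_n.append(ave_n)
        self.coh.append(coh)
        m = len(self.ave_n)
        if m < 2:
            return 0.0
        mean_n = sum(self.ave_n) / m
        mean_c = sum(self.coh) / m
        cov = sum((a - mean_n) * (b - mean_c)
                  for a, b in zip(self.ave_n, self.coh))
        var_n = sum((a - mean_n) ** 2 for a in self.ave_n)
        var_c = sum((b - mean_c) ** 2 for b in self.coh)
        denom = math.sqrt(var_n * var_c)
        return cov / denom if denom > 0 else 0.0
```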

The frequency subtraction processing unit 55 receives three inputs: the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12.

FIG. 8 is a flowchart of the processing in the subtraction coefficient control unit 32 of the frequency subtraction processing unit 55 according to the second embodiment.

First, the subtraction coefficient control unit 32 determines whether the correlation coefficient cor(K) from the correlation calculation unit 54 is negative (S201). If cor(K) is negative (that is, interfering speech is present), α is set to a large value to strengthen the suppression effect (S202). If cor(K) is not negative (that is, no interfering sound is present), α is set to a small value to weaken the suppression effect.
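The FIG. 8 decision reduces to a two-level rule. The sketch below mirrors steps S201 and S202. The two α values are hypothetical, since the text only calls them "large" and "small".

```python
# Hypothetical alpha levels; the text only specifies "large" and "small".
ALPHA_LARGE = 2.0
ALPHA_SMALL = 0.5

def control_subtraction_coefficient(cor, alpha_large=ALPHA_LARGE,
                                    alpha_small=ALPHA_SMALL):
    """Choose alpha from the sign of cor(K), following FIG. 8."""
    if cor < 0:              # S201: negative cor(K) -> interfering speech
        return alpha_large   # S202: strengthen suppression
    return alpha_small       # not negative -> weaken suppression
```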

The subtraction unit 33 then obtains the post-subtraction signal Y(f, K) from Equation (12), using the subtraction coefficient α obtained by the subtraction coefficient control unit 32. The post-subtraction signal output unit 34 outputs the suppressed signal (post-subtraction signal) Y(f, K) to the IFFT unit 16.

(B-3) Effects of the Second Embodiment
As described above, the second embodiment relies on a characteristic behavior of the correlation coefficient between the front suppression signal and the coherence. The coefficient is negative when interfering speech is present and positive when it is absent. From this behavior, the presence of interfering speech superimposed on the target speech is detected. Using this result to control the subtraction coefficient of the frequency subtraction process improves the accuracy of interfering-speech suppression.

(C) Other Embodiments
Various modified embodiments were mentioned in the first and second embodiments above. The present invention can also be applied to the following modified embodiments.

(C-1) In the first or second embodiment described above, the suppression coefficient or the subtraction coefficient may be calculated for each frequency bin. This is achieved by also calculating the correlation coefficient for each frequency bin.
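The sketch below shows one way to implement the per-bin variant. It assumes a history of |N(f, K)| and per-bin coherence values is kept for the most recent frames. It computes one correlation per bin along the time axis and applies the sign rule bin by bin. The α levels and the (frames, bins) input layout are hypothetical.

```python
import numpy as np

def per_bin_alpha(n_mag_hist, coef_hist, alpha_large=2.0, alpha_small=0.5):
    """Per-bin correlation and alpha from a short frame history.

    n_mag_hist, coef_hist: arrays of shape (frames, bins) holding |N(f,K)|
    and a per-bin coherence value for recent frames (assumed inputs).
    Returns (cor per bin, alpha per bin); bins with zero variance get cor 0.
    """
    n = np.asarray(n_mag_hist, dtype=float)
    c = np.asarray(coef_hist, dtype=float)
    n_dev = n - n.mean(axis=0)            # deviations along the time axis
    c_dev = c - c.mean(axis=0)
    cov = (n_dev * c_dev).sum(axis=0)
    denom = np.sqrt((n_dev ** 2).sum(axis=0) * (c_dev ** 2).sum(axis=0))
    cor = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    alpha = np.where(cor < 0, alpha_large, alpha_small)
    return cor, alpha
```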

(C-2) In the second embodiment, the sign of the correlation coefficient tells whether interfering sound is present, and its absolute value tells how strong the influence of the interfering sound is. Specifically, when the correlation coefficient is negative, a small absolute value means the interfering sound has little influence, and a large absolute value means it has a large influence. Therefore, a subtraction coefficient matched to the degree of influence of the interfering sound (the magnitude of the absolute value of the correlation) can be set as follows. Prepare any function whose output is small for small inputs and large for large inputs (for example, a quadratic function), feed it the absolute value of the correlation coefficient, and use the result as the subtraction coefficient.
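The sketch below implements the (C-2) mapping with a quadratic function. The text ties |cor| to the interference's influence only for negative correlations, so for non-negative values this sketch returns a fixed minimum α. That fallback and the α range are assumptions.

```python
def alpha_from_magnitude(cor, alpha_min=0.5, alpha_max=2.0):
    """Quadratic map from the strength of a negative cor(K) to alpha.

    Non-negative cor (no interfering sound) returns alpha_min, which is
    an assumption; for cor < 0, alpha grows with |cor| squared.
    """
    if cor >= 0:
        return alpha_min
    strength = min(abs(cor), 1.0)   # a Pearson coefficient satisfies |cor| <= 1
    return alpha_min + (alpha_max - alpha_min) * strength ** 2
```

A quadratic keeps α near its minimum for weak negative correlations, which may be noise. It only ramps up when the correlation is clearly negative.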

DESCRIPTION OF SYMBOLS: 1 and 1A: non-target sound suppression device; 11: FFT unit; 12: front suppression signal generation unit; 13: coherence calculation unit; 14: correlation and modGI calculation unit; 15: WF (Wiener filter) unit; 54: correlation calculation unit; 55: frequency subtraction processing unit; 16: IFFT unit.

Claims

A non-target sound suppression device comprising:
a front suppression signal generation unit that generates a front suppression signal having a blind spot in front, based on differences among a plurality of frequency-domain input signals obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain;
a coherence calculation unit that calculates a coherence based on signals obtained from the plurality of input signals;
a feature quantity calculation unit that calculates a feature quantity indicating the relationship between the front suppression signal and the coherence; and
a non-target sound suppression processing unit that uses the feature quantity indicating the relationship between the front suppression signal and the coherence to set a coefficient relating to suppression of the non-target sound included in the input signals, and obtains a post-suppression signal in which the non-target sound included in the input signals is suppressed using the coefficient.

The non-target sound suppression device according to claim 1, wherein
the feature quantity calculation unit calculates a feature quantity representing the intensity of the positive and negative fluctuations in the slope of the correlation amplitude indicating the relationship between the front suppression signal and the coherence, and
the non-target sound suppression processing unit uses the feature quantity representing the intensity of the positive and negative fluctuations in the slope of the correlation amplitude to set a variable used for background noise suppression processing, and obtains the post-suppression signal by multiplying the input signal by a suppression coefficient obtained using the variable.

The non-target sound suppression device according to claim 2, wherein
the feature quantity calculation unit calculates the feature quantity by normalizing the power of the second-order difference of the correlation indicating the relationship between the front suppression signal and the coherence by the power of the correlation, and
the non-target sound suppression processing unit sets the variable used for the background noise suppression processing according to the result of comparing the feature quantity with a threshold value.

The non-target sound suppression device according to claim 1, wherein
the feature quantity calculation unit calculates a feature quantity representing the correlation between the front suppression signal and the coherence, and
the non-target sound suppression processing unit sets a subtraction coefficient using the feature quantity representing the correlation, and obtains the post-suppression signal by subtracting the product of the front suppression signal and the subtraction coefficient from the input signal.

The non-target sound suppression device according to claim 5, wherein the non-target sound suppression processing unit sets the subtraction coefficient according to the sign of the feature quantity representing the correlation, and obtains the post-suppression signal by subtracting the product of the front suppression signal and the subtraction coefficient from the input signal.

A non-target sound suppression method comprising:
generating, by a front suppression signal generation unit, a front suppression signal having a blind spot in front, based on differences among a plurality of frequency-domain input signals obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain;
calculating, by a coherence calculation unit, a coherence based on signals obtained from the plurality of input signals;
calculating, by a feature quantity calculation unit, a feature quantity indicating the relationship between the front suppression signal and the coherence; and
setting, by a non-target sound suppression processing unit, a coefficient relating to suppression of the non-target sound included in the input signals using the feature quantity indicating the relationship between the front suppression signal and the coherence, and obtaining a post-suppression signal in which the non-target sound included in the input signals is suppressed using the coefficient.

A non-target sound suppression program causing a computer to function as:
a front suppression signal generation unit that generates a front suppression signal having a blind spot in front, based on differences among a plurality of frequency-domain input signals obtained by converting the input signals from a plurality of microphones from the time domain to the frequency domain;
a coherence calculation unit that calculates a coherence based on signals obtained from the plurality of input signals;
a feature quantity calculation unit that calculates a feature quantity indicating the relationship between the front suppression signal and the coherence; and
a non-target sound suppression processing unit that uses the feature quantity indicating the relationship between the front suppression signal and the coherence to set a coefficient relating to suppression of the non-target sound included in the input signals, and obtains a post-suppression signal in which the non-target sound included in the input signals is suppressed using the coefficient.