JP2006067127A

JP2006067127A - Method and apparatus of reducing reverberation

Info

Publication number: JP2006067127A
Application number: JP2004245778A
Authority: JP
Inventors: Keiichi Yoshida; Hiroaki Takeyama; Yasuhisa Ihira; Minoru Fukushima; Akihiro Kikuchi; Satoshi Sugimoto; Akira Terasawa
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2004-08-25
Filing date: 2004-08-25
Publication date: 2006-03-09
Anticipated expiration: 2024-08-25
Also published as: JP4396449B2

Abstract

PROBLEM TO BE SOLVED: To provide a dereverberation method and apparatus that can remove reverberant sound without emitting any audible-range sound.

SOLUTION: An inverse filter processing unit 10 obtains the transfer function H(k) of a reverberant space from the impulse response of the feedback path H_AC that a first echo canceller 30A adaptively identifies with an adaptive filter 31A, namely the filter coefficients h^_i(j). The reverberant speech signal Z'(k) is then divided by the amplitude |H(k)| of the transfer function H(k) to restore the original speech signal (sound source signal) Z"(k), thereby removing the reverberation component.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a dereverberation method and apparatus used to pick up speech clearly in reverberant places such as bathrooms.

In recent years, loudspeaker communication devices that carry out hands-free calls with a microphone and a speaker have become widespread in intercom systems and the like. When such a device is installed in a reverberant place (for example, a bathroom), the microphone also picks up the reverberation of the talker's voice, so the speech may become unclear. To address this, various methods (dereverberation methods) and devices for removing reverberation from the sound picked up by a microphone have been proposed.

For example, Non-Patent Document 1 proposes a method that recovers the signal picked up by a single microphone by removing only the minimum-phase component of the room transfer characteristic. However, this method is effective only when the room sound field has a minimum-phase characteristic. Non-Patent Document 2 proposes a sound-field inverse filter theory in which at least one more microphone than the number of sound sources is used; provided that the zeros of the transfer characteristics between the sources and the microphones do not coincide, the source waveform itself can be restored exactly even if the system is not minimum phase. In these methods, which realize the inverse of the transfer characteristic with inverse filter means, the inverse filter parameters (the reverberation impulse response) must be measured in advance in order to determine the inverse filter. Because the room transfer system changes over time as the room environment changes, the transfer system must be measured and processed adaptively each time in order to maintain high recovery accuracy. Non-Patent Document 3 proposes a method that does not require measurement of the room transfer characteristic, based on the observation that reverberation affects not only the spectrum but also the envelope of the signal waveform. It models the source signal and the transfer system on the basis of the modulation transfer function (MTF) and is realized as power envelope inverse filtering, which aims to recover the power envelope rather than the signal waveform itself.

Patent Document 1: JP-A-9-321860
Non-Patent Document 1: Stephen T. Neely and Jont B. Allen, "Invertibility of a room impulse response," J. Acoust. Soc. Am., Vol. 66, No. 1, July 1979
Non-Patent Document 2: Miyoshi, M. and Kaneda, Y., "Inverse filtering of room acoustics," IEEE Trans. ASSP, Vol. 36, No. 2, pp. 145-152, Feb. 1988
Non-Patent Document 3: Shigeki Hirobayashi, Hiroaki Nomura, Mikio Tohyama, "Recovery of reverberant speech by inverse filtering of the power envelope transfer function," IEICE Transactions A, Vol. J81-A, No. 10, pp. 1323-1330, 2000
Non-Patent Document 4: Masakazu Furukawa, Masashi Unoki, Masato Akagi, "A method for recovering the power envelope of reverberant speech based on the MTF," IEICE Technical Report, EA2002-15, SP2002-15 (2002-04)

However, in the power envelope recovery method disclosed in Non-Patent Document 3, the way to determine the parameters (amplitude and reverberation time) of the modeled room transfer characteristic is unclear. Moreover, common impulse response measurement methods such as the TSP method and the M-sequence method require an audible-range sound to be output from a speaker during measurement, which limits extension to practical applications.

Non-Patent Document 4, building on the power envelope inverse filtering of Non-Patent Document 3, proposes improvements to two problems of principle: (1) the method of extracting the power envelope and (2) the method of determining the parameters of the room impulse response (amplitude term and reverberation time). However, the method disclosed in Non-Patent Document 4 approximates the carrier of the speech signal by white noise and does not take into account the effect of the reverberant sound field on the carrier, so the source signal is restored only insufficiently. In particular, when this method is applied to an intercom call terminal installed in a bathroom, a visitor can tell that the talker is speaking from the bathroom, which infringes on the resident's privacy.

Patent Document 1, on the other hand, discloses a dereverberation apparatus and method that can be applied even when an audible-range sound cannot be measured in advance or when the transfer function changes from moment to moment. This apparatus requires at least two microphones, together with an inverse filter unit and a transfer function simulation filter unit for each of them. Applying this method and apparatus to a loudspeaker communication device such as an intercom therefore requires additional microphones, additional working memory, and a high-performance signal processor. This raises the price to the user and is a barrier to adoption in ordinary households.

The present invention has been made in view of the above circumstances, and its object is to provide a dereverberation method and apparatus that can remove reverberant sound without emitting any audible-range sound.

To achieve the above object, the invention of claim 1 is a dereverberation method that removes the reverberation component from a reverberant speech signal picked up by a microphone in a reverberant space and restores the original sound source signal, the method comprising: a first step of adaptively identifying, with an adaptive filter, the impulse response of the feedback path formed by acoustic coupling between a speaker and the microphone in the reverberant space, and estimating the echo component of the feedback path from the reverberant speech signal picked up by the microphone; a second step of subtracting the echo component estimated by the adaptive filter in the first step from the output signal of the feedback path; a third step of updating the filter coefficients of the adaptive filter so that the estimation error of the echo component estimate contained in the subtraction result of the second step is minimized; a fourth step of using the filter coefficients obtained when the estimation error of the echo component estimate is minimized in the third step as a substitute for the impulse response of the reverberant space, and obtaining the transfer function of the reverberant space from those filter coefficients; and a fifth step of obtaining the original speech signal by an operation on the transfer function of the reverberant space obtained in the fourth step and the reverberant speech signal picked up by the microphone.

The invention of claim 2 is characterized in that, in the invention of claim 1, the fourth step obtains the transfer function in the frequency domain by Fourier-transforming the filter coefficients, and the fifth step Fourier-transforms the reverberant speech signal, divides it by the magnitude of the frequency-domain transfer function obtained in the fourth step, and then applies an inverse Fourier transform.

The invention of claim 3 is characterized in that the adaptive filter in the first step is an FIR filter.

The invention of claim 4 is characterized in that, in the invention of claim 3, the third step updates the filter coefficients of the adaptive filter by the least-mean-square algorithm.

The invention of claim 5 is characterized in that, in the invention of claim 4, the third step determines whether the reverberant speech signal contains speech and updates the filter coefficients of the adaptive filter only when it does.

The invention of claim 6 is characterized in that, in the invention of claim 4, the third step sets the step gain of the adaptive filter to a relatively small value when the ratio of the instantaneous power of the reverberant speech signal to the instantaneous power of the speech signal output from the speaker is larger than a predetermined threshold.

The invention of claim 7 is characterized in that, in the invention of claim 4, the third step determines whether both the signal picked up by the microphone and the signal output from the speaker contain speech, and does not update the filter coefficients of the adaptive filter when both do.

The invention of claim 8 is characterized in that, in the invention of claim 4, the third step initializes the filter coefficients when they diverge.

The invention of claim 9 is characterized in that, in the invention of claim 7, the third step continues updating the filter coefficients when the feedback path has changed, even if both the signal picked up by the microphone and the signal output from the speaker contain speech.

The invention of claim 10 is characterized in that, in the invention of claim 1 or 2, the fifth step determines whether both the signal picked up by the microphone and the signal output from the speaker contain speech, and sets the reverberant speech signal to zero when at least one of the two signals contains no speech and the estimation error of the echo component estimate is smaller than a predetermined threshold.

To achieve the above object, the invention of claim 11 is a dereverberation apparatus that removes the reverberation component from a reverberant speech signal picked up by a microphone in a reverberant space and restores the original sound source signal, the apparatus comprising: an adaptive filter that adaptively identifies the impulse response of the feedback path formed by acoustic coupling between a speaker and the microphone in the reverberant space and estimates the echo component of the feedback path from the reverberant speech signal picked up by the microphone; subtraction means that subtracts the echo component estimated by the adaptive filter from the output signal of the feedback path; filter coefficient updating means that updates the filter coefficients of the adaptive filter so that the estimation error of the echo component estimate contained in the subtraction result of the subtraction means is minimized; transfer function calculation means that uses the filter coefficients obtained when the estimation error of the echo component estimate is minimized in the filter coefficient updating means as a substitute for the impulse response of the reverberant space and obtains the transfer function of the reverberant space from those filter coefficients; and reverberation calculation means that obtains the original speech signal by an operation on the transfer function of the reverberant space obtained by the transfer function calculation means and the reverberant speech signal picked up by the microphone.

The invention of claim 12 is characterized in that, in the invention of claim 11, the transfer function calculation means obtains the transfer function in the frequency domain by Fourier-transforming the filter coefficients, and the reverberation calculation means Fourier-transforms the reverberant speech signal, divides it by the magnitude of the frequency-domain transfer function, and then applies an inverse Fourier transform.

The invention of claim 13 is characterized in that, in the invention of claim 11 or 12, the adaptive filter is an FIR filter.

The invention of claim 14 is characterized in that, in the invention of claim 13, the filter coefficient updating means updates the filter coefficients of the adaptive filter by the least-mean-square algorithm.

The invention of claim 15 is characterized in that, in the invention of claim 4, the filter coefficient updating means includes a voice/silence determination unit that determines whether the reverberant speech signal contains speech and updates the filter coefficients of the adaptive filter only when it does.

The invention of claim 16 is characterized in that, in the invention of claim 14, the filter coefficient updating means includes a step gain switching unit that sets the step gain of the adaptive filter to a relatively small value when the ratio of the instantaneous power of the reverberant speech signal to the instantaneous power of the speech signal output from the speaker is larger than a predetermined threshold.

The invention of claim 17 is characterized in that, in the invention of claim 14, the filter coefficient updating means includes a determination unit that determines whether both the signal picked up by the microphone and the signal output from the speaker contain speech, and does not update the filter coefficients of the adaptive filter when the determination unit judges that both do.

The invention of claim 18 is characterized in that, in the invention of claim 14, the filter coefficient updating means includes a divergence detection unit that detects divergence of the filter coefficients and initializes them when divergence is detected.

The invention of claim 19 is characterized in that, in the invention of claim 17, the filter coefficient updating means includes a feedback path change detection unit that detects changes in the feedback path, and continues updating the filter coefficients when the feedback path change detection unit detects a change, even if the determination unit judges that both signals contain speech.

The invention of claim 20 is characterized in that, in the invention of claim 11 or 12, the apparatus includes nonlinear echo suppression means that detects whether the output signal of the subtraction means and the signal output from the speaker contain speech, compares the estimation error of the echo component estimate with a predetermined threshold, and, when at least one of the two signals contains no speech and the estimation error is smaller than the threshold, judges that the reverberant speech signal contains a nonlinear echo component and sets that signal to zero.

According to the present invention, the filter coefficients of the adaptive filter are used in place of the impulse response of the feedback path formed by acoustic coupling between the speaker and the microphone in the reverberant space, and the reverberation component is removed using those filter coefficients. Reverberant sound can therefore be removed with a single microphone and, unlike the conventional examples, without emitting any audible-range sound.
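The frequency-domain operation described in claims 2 and 12 (Fourier-transform the filter coefficients, divide the spectrum of the reverberant signal by the magnitude of the resulting transfer function, then inverse-transform) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `dereverberate_frame`, the single-frame processing, and the small floor `eps` that guards against division by near-zero magnitudes are all assumptions added here.

```python
import numpy as np

def dereverberate_frame(z, h_hat, eps=1e-3):
    """Divide the spectrum of one reverberant frame by |H(k)|.

    z     : time-domain reverberant frame (1-D array)
    h_hat : converged adaptive-filter coefficients, used in place of
            the room impulse response
    eps   : hypothetical floor on |H(k)|, not taken from the source
    """
    n = len(z)
    # Transfer function H(k): FFT of the coefficients, zero-padded
    # (or truncated) to the frame length.
    H = np.fft.rfft(h_hat, n)
    # Spectrum of the reverberant frame.
    Z = np.fft.rfft(z)
    # Only the magnitude of H(k) is used; the phase of Z is left as is.
    mag = np.maximum(np.abs(H), eps)
    return np.fft.irfft(Z / mag, n)
```

Because only |H(k)| is divided out, a pure gain of 2 in `h_hat` simply halves the frame; a real room response would instead flatten the magnitude coloration while leaving the phase untouched.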

An embodiment of a dereverberation apparatus that implements the dereverberation method of the present invention is described below. This embodiment illustrates the case in which the dereverberation apparatus is built into a loudspeaker communication device that forms part of an intercom system and is installed in a bathroom. The invention is not limited to this case: the dereverberation method and apparatus are applicable to any loudspeaker device that amplifies speech using a microphone and a speaker.

FIG. 2 shows a block diagram of an intercom master unit (hereinafter "base unit") M serving as the loudspeaker communication device and a doorphone slave unit S serving as the far-side call terminal. The base unit M comprises a microphone 1, a speaker 2, a two-wire/four-wire conversion circuit 3, a microphone amplifier G1, a line output amplifier G2 that amplifies the transmit signal to the line (a two-wire transmission path), a line input amplifier G3 that amplifies the received signal from the line, a speaker amplifier G4, a transmit volume adjustment amplifier G5, a receive volume adjustment amplifier G6, and first and second echo cancellers 30A and 30B. The doorphone slave unit S comprises a microphone 1', a speaker 2', a two-wire/four-wire conversion circuit 3', a microphone amplifier G1', and a speaker amplifier G4'.

The first echo canceller 30A consists of an adaptive filter 31A and a subtractor 32A. The adaptive filter 31A adaptively identifies the impulse response of the feedback path (acoustic echo path) H_AC formed by acoustic coupling between the speaker 2 and the microphone 1, and the subtractor 32A subtracts the echo component (acoustic echo) G^(j), estimated from the reference signal X(j) (the input signal to the speaker amplifier G4), from the output signal Y(j) of the microphone amplifier G1, thereby cancelling the echo component. The second echo canceller 30B likewise consists of an adaptive filter 31B and a subtractor 32B. The adaptive filter 31B adaptively identifies the impulse response of the feedback path (line echo path) H_LIN, which is formed by reflection due to impedance mismatch between the two-wire/four-wire conversion circuit 3 and the transmission path and by acoustic coupling between the speaker 2' and the microphone 1' of the doorphone slave unit S. The subtractor 32B subtracts the echo component (line echo), estimated from the reference signal (the input signal to the line output amplifier G2, that is, the transmit signal), from the received signal, thereby cancelling the echo component.

The first and second echo cancellers 30A and 30B thus cancel the echo components of the feedback paths H_AC and H_LIN and break the closed loop, so unpleasant echo and howling can be suppressed. The components of the microphone amplifier G1 output other than echo, that is, the speech uttered toward the base unit M and the noise around the base unit M, are transmitted to the doorphone slave unit S without any loss. Likewise, the components of the received signal other than echo, that is, the speech uttered toward the doorphone slave unit S and the noise around it, are transmitted to the base unit M without any loss. Simultaneous two-way conversation is therefore possible.

Next, the dereverberation apparatus A, which is the gist of the present invention, is described. As shown in FIG. 1, the dereverberation apparatus A of this embodiment comprises the microphone 1, the speaker 2, the first echo canceller 30A, and an inverse filter processing unit 10 placed between the first echo canceller 30A and the transmit volume adjustment amplifier G5 in the transmit-side signal path of the base unit M. The first echo canceller 30A and the inverse filter processing unit 10 are implemented by controlling DSP (Digital Signal Processor) hardware with dedicated software, and the apparatus includes an A/D converter 37 that converts analog speech signals into digital signals and a D/A converter 38 that converts digital speech signals into analog signals.

The adaptive filter 31A is of the FIR type, which, among structures such as the FIR, IIR, and lattice types, is the most stable and the most robust to changes in the characteristics of the input signal. To estimate the echo component of the feedback path (the part of the received signal that leaks back through the feedback path) by adaptively updating variable filter coefficients, it uses the least-mean-square (LMS) algorithm, which minimizes the mean square value of the output signal of the subtractor 32A.

To describe the operation of the adaptive filter 31A in more detail, the LMS algorithm recursively updates the filter coefficients (also called "tap weights") h^_i(j) according to the following equation.

h^_i(j+1) = h^_i(j) + μ·E(j)·X(j−i)

where i is the tap number and j is the sample time.

Here, in the so-called single-talk state, in which only the far-end side (doorphone slave unit S) is speaking and the near-end side (base unit M) is silent, E(j) = G(j) − G^(j), where G(j) is the echo component. E(j) is thus the estimation error (instantaneous error) of the echo component G(j) at sample time j. μ is the step gain (also called the "step size parameter"), a constant that controls the size of the correction made in each iteration (that is, the speed of convergence). The estimate G^(j) of the echo component G(j) is obtained from the filter coefficients h^_i(j) and the received signal X(j) by the following equation (1).

Here, I is the number of filter taps and i is the tap number.
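Equation (1) itself does not survive in this text. Assuming the usual FIR convolution, which is consistent with the X(j−i) term in the update rule, and assuming zero-based tap numbering (the original may number taps from 1), it would take the following form, with h^_i(j) written as \hat{h}_i(j) and G^(j) as \hat{G}(j):

```latex
\hat{G}(j) = \sum_{i=0}^{I-1} \hat{h}_i(j)\, X(j-i) \tag{1}
```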

When recursive updating of the filter coefficients h^_i(j) reaches (converges to) the optimal solution that minimizes the mean square of the estimation error E(j), subtracting the echo estimate G^(j) obtained from those optimal filter coefficients from the transmit signal Y(j) yields an output signal E(j) in which the echo component is cancelled.
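The update rule and the subtraction described above can be put together in a short single-talk simulation. This is a sketch under stated assumptions rather than the DSP code of the embodiment: the function name `lms_echo_canceller`, the zero initial coefficients, and the use of NumPy are choices made here for illustration.

```python
import numpy as np

def lms_echo_canceller(x, y, num_taps, mu):
    """Plain LMS: estimate the echo of x contained in y and subtract it.

    x        : reference (received) signal X(j)
    y        : microphone signal Y(j)
    num_taps : number of filter taps I
    mu       : step gain
    Returns the final coefficients and the error signal E(j).
    """
    h_hat = np.zeros(num_taps)      # assumed initial coefficients
    x_buf = np.zeros(num_taps)      # x_buf[i] holds X(j - i)
    e = np.zeros(len(y))
    for j in range(len(y)):
        x_buf = np.roll(x_buf, 1)   # shift the delay line by one sample
        x_buf[0] = x[j]
        g_hat = h_hat @ x_buf       # echo estimate G^(j)
        e[j] = y[j] - g_hat         # subtractor output E(j)
        h_hat = h_hat + mu * e[j] * x_buf   # h^_i(j+1) = h^_i(j) + mu E(j) X(j-i)
    return h_hat, e
```

With a white-noise reference and a short synthetic echo path, the coefficients should approach the true path and the residual error should shrink toward zero, provided `mu` is small enough for stability.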

The first echo canceller 30A thus cancels the echo component of the acoustic feedback path H_AC and breaks the closed loop, so unpleasant echo can be suppressed. The components of the microphone amplifier G1 output other than echo, that is, the speech uttered toward the base unit M and the noise around the base unit M, are transmitted to the doorphone slave unit S without any loss. Likewise, the components of the received signal other than echo, that is, the speech uttered toward the doorphone slave unit S and the noise around it, are transmitted to the base unit M without any loss.

In the so-called double-talk state, in which speech occurs at the base unit M and the doorphone slave unit S at the same time, the filter coefficients h^_i(j) may diverge instead of converging if the adaptive filters 31A and 31B of the echo cancellers 30A and 30B keep updating them. For example, in the first echo canceller 30A, when a double-talk component N(j) enters through the microphone 1, the transmit signal is Y(j) = N(j) + G(j) and the estimation error is E(j) = N(j) + (G(j) − G^(j)). If the filter coefficients h^_i(j) are then updated recursively in search of the optimal solution that minimizes the mean square of E(j), they may fail to converge and may even diverge, because E(j) contains the term N(j), which is uncorrelated with the reference signal (the received signal X(j)). In other words, the double-talk component N(j) acts as a disturbance on the operation of the adaptive filter 31A.
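The disturbance can be made explicit by substituting the double-talk error into the LMS update rule. The following is only a restatement of the two formulas in the text, with h^_i(j) written as \hat{h}_i(j) and G^(j) as \hat{G}(j):

```latex
\begin{aligned}
\hat{h}_i(j+1) &= \hat{h}_i(j) + \mu\,E(j)\,X(j-i) \\
               &= \hat{h}_i(j) + \mu\,\bigl(G(j)-\hat{G}(j)\bigr)\,X(j-i) + \mu\,N(j)\,X(j-i)
\end{aligned}
```

The last term does not depend on how well the echo is estimated, so it keeps perturbing the coefficients even when G^(j) already matches G(j).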

For this reason, the echo canceller 30A of the present embodiment provides the adaptive filter 31A with a speech/silence determination unit 13, which determines whether the input signal X(j) from the far-end side contains a speech component, and with a double-talk detection unit 14, which detects double talk. As described later, the filter coefficients h^_i(j) are updated only while the speech/silence determination unit 13 judges that speech is present and the double-talk detection unit 14 detects no double talk. In all other states the coefficients are not updated and are held at their previous values, which prevents them from diverging.

The echo canceller 30A of the present embodiment further includes a step gain switching unit 15. This unit determines whether the ratio of the instantaneous power of the near-end signal to the instantaneous power of the far-end signal exceeds a predetermined threshold and, if it does, sets the step gain μ of the adaptive filter 31A to a relatively small value. Whenever the ratio exceeds the threshold, regardless of whether double talk is occurring, the convergence of the filter coefficients h^_i(j) in the adaptive filter 31A is therefore slowed, which suppresses divergence before it occurs.

In a system such as the base unit M of an intercom system, the gain of the echo path changes frequently because a hand is held in front of the microphone 1 or the speaker 2, or a face is brought close to them. In such a system the double-talk state cannot be distinguished from a change in the echo-path gain, so filter coefficients h^_i(j) that ought to be updated to follow the gain change may not be updated. In the present embodiment the echo canceller 30A therefore includes an acoustic echo path fluctuation detection unit 16, and when a fluctuation of the acoustic echo path H_AC is detected, updating of the filter coefficients h^_i(j) continues. This lets the filter coefficients converge quickly, so that only the echo component is suppressed early and accurately.

In the present embodiment the echo canceller 30A also includes an acoustic echo divergence detection unit 17, which detects that the filter coefficients h^_i(j) of the adaptive filter 31A have diverged. Once divergence is detected, it is difficult to make the coefficients converge again simply by continuing to update them, so the filter coefficients h^_i(j) are initialized instead. This lets the filter coefficients converge quickly, so that only the echo component is suppressed early and accurately.

The echo canceller 30A of the present embodiment also includes a nonlinear echo suppression unit 18 to remove residual echo components that the methods described above cannot fully suppress. As described later, the nonlinear echo suppression unit 18 suppresses the residual echo only when the input signal from the near-end side (the transmission signal) Y(j) contains no voice signal to be transmitted, which improves the stability of the call.

Next, the dereverberation apparatus, which is the gist of the present invention, will be described. In general, when the sound source signal is x(t) (t is an index representing time) and the room impulse response is h(t), the reverberant signal (observed signal) is expressed by the following equation (2) (see Non-Patent Document 3).

y(t) = x(t) * h(t)   (2)
where * denotes the convolution operator.
In the frequency domain, equation (2) is expressed by the following equation (3) (see Non-Patent Document 3).

Y(k) = X(k)H(k)   (3)
where k is an index representing frequency.
The sound source signal X(k) is therefore obtained from the following equation (4).

X(k) = Y(k)H^-1(k)   (4)
Estimating the sound source signal from the observed signal in a linear system thus requires an estimate of the transfer system. However, the transfer function H(k) is generally time-varying, so the estimate must be adaptive.
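
Equations (2) to (4) can be checked numerically on a toy case. The following is a minimal sketch, not part of the patent: the signal values, the helper names `dft` and `convolve`, and the use of a naive O(N²) transform instead of an FFT are all assumptions made for illustration. The impulse response is zero-padded to the length of the observed signal so that the product of transforms corresponds to the linear convolution of equation (2).

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse transform if requested)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def convolve(x, h):
    """Linear convolution y(t) = x(t) * h(t), as in equation (2)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1.0, 2.0, 3.0]   # toy source signal (assumed values)
h = [1.0, 0.5]        # toy impulse response (assumed values)
y = convolve(x, h)    # observed reverberant signal

n = len(y)
H = dft(h + [0.0] * (n - len(h)))       # zero-padded transfer function, equation (3)
Y = dft(y)
X = [Yk / Hk for Yk, Hk in zip(Y, H)]   # equation (4): X(k) = Y(k) / H(k)
x_rec = [v.real for v in dft(X, inverse=True)]  # recovered source, zero-padded to length n
```

The division only works here because H(k) has no zeros for the chosen toy response; a practical system would need to handle bins where the magnitude of H(k) is very small.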

Meanwhile, when the filter coefficients h^_i(j) of the echo canceller 30A have converged sufficiently, that is, when the optimal solution minimizing the mean square of the estimation error E(j) has been reached, the filter coefficients h^_i(j) closely approximate the impulse response of the acoustic-side feedback path H_AC. If the room impulse response h(t) in equation (2) is replaced by the impulse response of the acoustic-side feedback path H_AC, the transfer function H(k) can be computed from the filter coefficients h^_i(j) and the sound source signal can be restored from the reverberant signal (reference signal). The inverse filter processing unit 10 performs this computation. In the present embodiment, the inverse filter processing unit 10 thus serves as both the transfer function calculation means and the reverberation calculation means.

When the base unit M is installed in a reverberant place such as a bathroom, the voice signal Z(j) collected by the microphone 1 contains not only the sound source signal (the voice uttered by the talker) but also the reverberant signal. The voice signal Z′(j), in which the first echo canceller 30A has suppressed only the echo component, still contains the reverberation component unchanged, so the inverse filter processing unit 10 removes it to restore a voice signal (sound source signal) Z″(j) free of reverberation. Specifically, the inverse filter processing unit 10 performs the following five steps.
<Step 1: Fast Fourier transform of the room impulse response h(t)>
The filter coefficients h^_i(j), which stand in for the room impulse response h(t), are obtained from the first echo canceller 30A and fast-Fourier-transformed to give the transfer function H(k). The transfer function H(k) is expressed in complex form as follows, where A is a parameter that adjusts the amplitude and i is the imaginary unit.

H(k) = A{h_real(k) + i·h_img(k)}
<Step 2: Fast Fourier transform of the reverberant speech signal Z′(j)>
The voice signal containing the reverberation component (reverberant speech signal) Z′(j) output from the first echo canceller 30A is fast-Fourier-transformed to give the frequency-domain reverberant speech signal Z′(k). This signal is also expressed in complex form as follows.

Z′(k) = z′_real(k) + i·z′_img(k)
<Step 3: Calculation of the magnitude |H(k)| of the transfer function H(k)>
The magnitude |H(k)| of the transfer function H(k) is obtained by the following equation.

|H(k)| = {h_real²(k) + h_img²(k)}^(1/2)
<Step 4: Calculation that recovers the sound source signal Z″(k)>
Using H(k), |H(k)|, and Z′(k) obtained in steps 1 to 3, the sound source signal, that is, the voice signal Z″(k) from which the reverberation component has been removed, is obtained. From equation (4),
Z″(k) = Z′(k)/H(k) = z″_real(k) + i·z″_img(k)
where
z″_real(k) = {z′_real(k)·h_real(k) + z′_img(k)·h_img(k)}/|H(k)|²
z″_img(k) = {z′_img(k)·h_real(k) − z′_real(k)·h_img(k)}/|H(k)|²
<Step 5: Inverse fast Fourier transform of the sound source signal Z″(k)>
The frequency-domain sound source signal Z″(k) is inverse-fast-Fourier-transformed to give the time-domain sound source signal Z″(j).
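
The five steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patent's DSP program: the function name `inverse_filter`, the naive `dft` helper (used in place of an FFT to stay within the standard library), the processing of one whole frame at a time, and the absence of any guard against bins where H(k) is close to zero are all choices made here. The amplitude parameter A is folded into H(k) before its real and imaginary parts are taken.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse transform if requested)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def inverse_filter(z_rev, h_hat, amplitude=1.0):
    """Remove reverberation from one frame z_rev using filter coefficients h_hat."""
    n = len(z_rev)
    if len(h_hat) > n:
        raise ValueError("frame must be at least as long as the filter")
    # Step 1: transfer function H(k) from the zero-padded filter coefficients.
    H = [amplitude * v for v in dft(list(h_hat) + [0.0] * (n - len(h_hat)))]
    # Step 2: spectrum Z'(k) of the reverberant speech frame.
    Z = dft(z_rev)
    restored = []
    for Zk, Hk in zip(Z, H):
        # Step 3: squared magnitude |H(k)|^2 (no guard against zero; assumption).
        mag2 = Hk.real ** 2 + Hk.imag ** 2
        # Step 4: complex division Z'(k) / H(k), written out by components.
        re = (Zk.real * Hk.real + Zk.imag * Hk.imag) / mag2
        im = (Zk.imag * Hk.real - Zk.real * Hk.imag) / mag2
        restored.append(complex(re, im))
    # Step 5: back to the time domain.
    return [v.real for v in dft(restored, inverse=True)]
```

Because the transform is taken over a single frame, the division undoes a circular convolution within that frame; handling a continuous stream would additionally need a block scheme such as overlap processing, which the text does not describe.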

Through steps 1 to 5, the voice signal (sound source signal) Z″(j), obtained by removing the reverberation component from the reverberant speech signal Z′(j), is output from the inverse filter processing unit 10 to the far-end side.

The processing performed by the first echo canceller 30A and the inverse filter processing unit 10 when the base unit M communicates with the other party's device (such as the door phone slave unit S) will now be described with reference to the flowcharts of FIG. 3 and FIG. 4.

For example, when the answer button of the base unit M is operated in response to a call from the door phone slave unit S, a speech path is established between the base unit M and the door phone slave unit S and the base unit M enters the call state. At the same time, the DSP starts executing the program (software) that implements the echo cancellers 30A and 30B and the inverse filter processing unit 10.

As shown in FIG. 3, variable initialization 39 is performed first (filter coefficients h^_i(0) = 0, step gain μ = μ_MAX, estimation error E(0) = 1). Acquisition processes 40 and 41 then obtain the far-end (door phone slave unit S side) input signal X(j+1) and the near-end (microphone 1 side) input signal Y(j+1), and the acquired input signals X(j+1) and Y(j+1) are stored as the newest data in a FIFO memory (not shown).

Next, a determination process (coefficient update determination process 42) decides whether to update the filter coefficients h^_i(j), to stop updating them (leave them unchanged), or to restart from the variable initialization 39. As shown in the flowchart of FIG. 4, the coefficient update determination process 42 consists of a divergence determination, a speech/silence determination, and a double-talk determination. In the divergence determination by the acoustic echo divergence detection unit 17, process 51 first determines the signs of the near-end input signal Y(j) and the echo component estimate G^(j) from their product and increments the count value divcount only when the signs differ. The divergence-determination-time check 52 then determines whether the divergence detection period (for example, 200 milliseconds) has elapsed. If it has not, the speech/silence determination is executed. If it has, the time counter is reset to 0 and it is determined whether the count value divcount, that is, the proportion of opposite signs, exceeds the divergence determination threshold div_slice.
If divcount exceeds div_slice, the state is judged to be divergent, process 54 resets divcount to 0, and the variable initialization 39 is performed. If divcount does not exceed div_slice, the state is judged to be non-divergent, process 56 resets divcount to 0, and the speech/silence determination unit 13 then executes the speech/silence determination.
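
The divergence determination can be sketched as a small counter object. This is a hedged illustration rather than the patent's implementation: the class name `DivergenceDetector`, the per-sample `step` interface, and the default values are assumptions. The default window of 1600 samples corresponds to 200 milliseconds only if the sampling rate is 8 kHz, and the default `div_slice` is a placeholder because the text gives no value for it.

```python
class DivergenceDetector:
    """Counts sign mismatches between Y(j) and G^(j) over a fixed window."""

    def __init__(self, window_samples=1600, div_slice=800):
        self.window_samples = window_samples  # detection period in samples (assumed)
        self.div_slice = div_slice            # hypothetical threshold value
        self.divcount = 0
        self.elapsed = 0

    def step(self, y, g_hat):
        """Feed one sample pair; return True when divergence is declared."""
        if y * g_hat < 0:          # product is negative when the signs differ
            self.divcount += 1
        self.elapsed += 1
        if self.elapsed < self.window_samples:
            return False           # period not yet elapsed: no decision
        diverged = self.divcount > self.div_slice
        self.elapsed = 0           # restart the period
        self.divcount = 0          # processes 54 and 56 both reset the counter
        return diverged
```

A caller would reinitialize the filter coefficients, step gain, and error (variable initialization 39) whenever `step` returns True, and otherwise continue to the speech/silence determination.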

In the speech/silence determination, it is determined whether the mean absolute value LX(j+1) of the stored input signal X(j+1) exceeds the speech/silence determination threshold LX_SLICE. If it does not, the state is judged to be silent and updating of the filter coefficients h^_i(j) is stopped. If it does, the state is judged to contain speech and the double-talk determination is executed. In the double-talk determination, it is determined whether the mean absolute value LY(j+1) of the stored input signal Y(j+1) exceeds the double-talk determination threshold LY_SLICE. If it does, the state is judged to be double talk and updating of the filter coefficients h^_i(j) is stopped. If it does not, the state is judged not to be double talk and the adaptive filter 31A updates the filter coefficients h^_i(j).
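
The two threshold tests reduce to a short decision function. The sketch below assumes the stored samples are available as Python lists; the names `mean_abs` and `update_decision`, the string return values, and the threshold values in the usage are illustrative and do not come from the text.

```python
def mean_abs(samples):
    """Mean absolute value of the stored samples (LX or LY in the text)."""
    return sum(abs(s) for s in samples) / len(samples)

def update_decision(x_recent, y_recent, lx_slice, ly_slice):
    """Decide whether the adaptive filter may update its coefficients."""
    if mean_abs(x_recent) <= lx_slice:
        return "hold: far-end silent"   # LX does not exceed LX_SLICE
    if mean_abs(y_recent) > ly_slice:
        return "hold: double talk"      # LY exceeds LY_SLICE
    return "update"                     # far-end speech and no double talk

# Usage with made-up thresholds: far-end speech present, near end quiet.
decision = update_decision([0.5, -0.5], [0.05, -0.05], lx_slice=0.1, ly_slice=0.2)
```

Returning a label rather than a boolean keeps the two reasons for holding the coefficients distinguishable, which is convenient when logging but is not required by the method.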

As shown in FIG. 3, when the filter coefficients h^_i(j) are to be updated, the step gain switching unit 15 executes the step gain switching process 43. When updating is stopped, process 44′ substitutes the previous filter coefficients h^_i(j) for the current ones, after which the calculation 45 of the echo component estimate G^(j+1) is performed.

In the step gain switching process 43, instantaneous values (X instantaneous value, Y instantaneous value) are obtained by averaging the input signals X(j) and Y(j) from the newest sample back over a predetermined time (for example, 2 milliseconds), and their ratio (= X instantaneous value / Y instantaneous value) is compared with a predetermined threshold α_slice. If the ratio does not exceed α_slice, the step gain μ used in the filter coefficient update process 44 is set to the minimum value μ_MIN; if the ratio is equal to or greater than α_slice, μ is set to the maximum value μ_MAX. This prevents the filter coefficients h^_i(j) from diverging.
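
A minimal sketch of the step gain switching follows. Several details are assumptions: the instantaneous value is taken here as the mean square (power) of the recent samples, the function names are invented, and the numeric values in the usage are arbitrary. The text assigns the case where the ratio equals α_slice to both branches, so the sketch arbitrarily treats equality as the μ_MAX case.

```python
def instantaneous_power(samples):
    """Mean square of the most recent samples (assumed definition)."""
    return sum(s * s for s in samples) / len(samples)

def select_step_gain(x_recent, y_recent, alpha_slice, mu_min, mu_max):
    """Pick the step gain from the far-end/near-end instantaneous ratio."""
    x_inst = instantaneous_power(x_recent)
    y_inst = instantaneous_power(y_recent)
    if y_inst == 0.0:
        return mu_max              # no near-end energy: ratio is unbounded (assumption)
    ratio = x_inst / y_inst
    # Boundary case is ambiguous in the text; equality is treated as "large".
    return mu_max if ratio >= alpha_slice else mu_min

# Usage: a strong near-end signal relative to the far end selects the small gain.
mu = select_step_gain([0.1, 0.1], [1.0, 1.0], alpha_slice=2.0, mu_min=0.01, mu_max=0.5)
```

With a 2 millisecond averaging time and an 8 kHz sampling rate, each list would hold 16 samples.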

In the filter coefficient update process 44, the stored echo estimation error E(j) and input signal X(j) are read and the filter coefficient h^_i(j) is updated for each tap number. Process 45 then computes the echo component estimate G^(j+1) by equation (1), process 46 computes the echo estimation error E(j+1) by subtracting G^(j+1) from the input signal Y(j+1), and the nonlinear echo removal process 47 follows. In the nonlinear echo removal process 47, if the mean absolute value LY(j+1) of the input signal Y(j+1) stored in the memory is smaller than the threshold LY_slice that separates single talk from double talk (that is, the state is single talk) and the echo estimation error E(j+1) is smaller than the clipping threshold Eclip, the error is judged to be a nonlinear echo component and is removed by setting the output signal (reverberant speech signal) Z′(j+1) to 0. Otherwise the echo estimation error E(j+1) is used unchanged as the output signal Z′(j+1).
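
The update, estimate, and clipping stages can be sketched as three small functions. This is an illustration under assumptions, not the patent's update rule: a normalized LMS update is used here as one common member of the least-mean-square family, the FIR form of the echo estimate is assumed to match equation (1), which lies outside this section, and the comparison with Eclip is applied to the absolute value of the error. All function names are hypothetical.

```python
def echo_estimate(h_hat, x_hist):
    """FIR echo estimate: sum over taps of h^_i * X(j - i); x_hist[0] is newest."""
    return sum(h * x for h, x in zip(h_hat, x_hist))

def nlms_update(h_hat, x_hist, error, mu, eps=1e-8):
    """Normalized LMS coefficient update, one value per tap (assumed variant)."""
    power = sum(x * x for x in x_hist) + eps   # eps avoids division by zero
    return [h + mu * error * x / power for h, x in zip(h_hat, x_hist)]

def nonlinear_suppress(error, ly_mean, ly_slice, e_clip):
    """Zero the output when the near end is quiet and the residual is small."""
    if ly_mean < ly_slice and abs(error) < e_clip:
        return 0.0
    return error

# One iteration on made-up numbers: estimate, error, update, then clipping.
h_hat = [0.0, 0.0]
x_hist = [1.0, 0.5]                  # newest far-end sample first
y = 0.4                              # near-end sample containing only echo
e = y - echo_estimate(h_hat, x_hist)
h_hat = nlms_update(h_hat, x_hist, e, mu=0.5)
z_out = nonlinear_suppress(e, ly_mean=0.05, ly_slice=0.2, e_clip=0.05)
```

In the flow described above the step gain passed as `mu` would come from the step gain switching process, and the update would be skipped entirely whenever the coefficient update determination says to hold.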

In the inverse filter processing 48 by the inverse filter processing unit 10, the voice signal (sound source signal) Z″(j) is restored by removing the reverberation component from the reverberant speech signal Z′(j) through steps 1 to 5 described above. After processes 49 and 50 output the restored voice signal Z″(j) to the transmit-side signal path, control returns to process 40, which acquires the input signals X(j+1) and Y(j+1), and the above processing is repeated. Because the coefficient update determination process 42, the step gain switching process 43, and the nonlinear echo removal process 47 allow the filter coefficients h^_i(j) used in the inverse filter processing 48 to approximate accurately the impulse response of a time-varying space (for example, a bathroom), the reverberation component can be removed with high accuracy.

In the present embodiment, it is also determined whether the amount of echo suppression in the first echo canceller 30A exceeds a predetermined reference value. If it does, the filter coefficients h^_i(j) are regarded as closely approximating the room impulse response h(j) and are output to the inverse filter processing unit 10, which executes the inverse filter processing 48 and outputs the voice signal Z″(j), obtained by removing the reverberation component from the reverberant speech signal Z′(j). If it does not, the filter coefficients h^_i(j) are regarded as not approximating the room impulse response h(j) and are not output to the inverse filter processing unit 10, and the reverberant speech signal Z′(j) passes through the inverse filter processing unit 10 and is output unchanged as the voice signal Z″(j).

For example, when the sampling frequency of the A/D converter 37 is 8 kHz and the impulse response length is 256 milliseconds, the number of filter taps I in equation (1) is 2048. With the echo suppression amount defined as 20 log{E(j)/G^(j)}, where E(j)/G^(j) is the ratio of the echo estimation error E(j) to the echo component estimate G^(j), whether the inverse filter processing unit 10 performs the inverse filter processing 48 is decided by comparing that amount with the reference value of −8 dB. When the echo suppression amount falls below −8 dB, the first echo canceller 30A is judged to be sufficiently suppressing the acoustic echo returning through the acoustic-side feedback path H_AC, that is, the filter coefficients h^_i(j) are judged to approximate closely the room impulse response h(j) of the feedback path H_AC. The filter coefficients h^_i(j) are then passed from the first echo canceller 30A to the inverse filter processing unit 10, which performs the inverse filter processing 48.
When the echo suppression amount is −8 dB or more, the acoustic echo is judged not to be sufficiently suppressed, that is, the filter coefficients h^_i(j) are judged not to approximate the room impulse response h(j) of the feedback path H_AC. The filter coefficients are then not passed from the first echo canceller 30A to the inverse filter processing unit 10, the inverse filter processing unit 10 does not perform the inverse filter processing 48, and the reverberant speech signal Z′(j) is output unchanged. FIG. 5 shows the waveform of the reverberant speech signal recorded when a man spoke the phrase "Energy saving is up to you" (省エネルギーは心がけ次第です。) in a bathroom (interior dimensions: 2.0 m × 1.7 m × 2.2 m), FIG. 6 shows the waveform of the voice signal after the inverse filter processing unit 10 removed the reverberation component by the inverse filter processing 48, and FIG. 7 shows the 2048 filter coefficients h^_i(j) after the first echo canceller 30A had converged. A comparison of FIG. 5 and FIG. 6 shows that the dereverberation apparatus A of the present embodiment removes the reverberation component contained in the voice signal and makes the voice signal easier to hear.
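
The arithmetic behind the tap count and the −8 dB gate can be sketched as follows. The function names are hypothetical, the ratio is taken on absolute values so that the logarithm is defined, and the handling of zero-valued inputs is an assumption because the text does not address it.

```python
import math

def filter_taps(sampling_hz, impulse_response_s):
    """Number of FIR taps needed to span the impulse response."""
    return round(sampling_hz * impulse_response_s)

def echo_suppression_db(error, echo_est):
    """Echo suppression amount 20*log10(|E(j)| / |G^(j)|) in dB."""
    return 20.0 * math.log10(abs(error) / abs(echo_est))

def use_inverse_filter(error, echo_est, reference_db=-8.0):
    """True when suppression is below the reference, so the coefficients are trusted."""
    if echo_est == 0.0:
        return False          # no echo estimate to compare against (assumption)
    if error == 0.0:
        return True           # perfect cancellation (assumption)
    return echo_suppression_db(error, echo_est) < reference_db

taps = filter_taps(8000, 0.256)          # 8 kHz sampling, 256 ms response
trusted = use_inverse_filter(0.1, 1.0)   # residual is 20 dB below the echo estimate
```

A residual that is one tenth of the echo estimate corresponds to −20 dB and passes the gate, while a residual of one half corresponds to about −6 dB and does not.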

Thus, when a hands-free call is made between the base unit M installed in a reverberant bathroom and the door phone slave unit S installed at the entrance, the microphone 1 of the base unit M collects a voice signal to which the bathroom's reverberation has been added, the reverberation masks the sound source signal (the talker's voice), and the sound output from the speaker of the door phone slave unit S has been hard to hear. Mounting the dereverberation apparatus A according to the present invention in the base unit M, as described above, transmits a voice signal with the reverberation removed from the base unit M to the door phone slave unit S, so the sound from the speaker of the door phone slave unit S becomes easy to hear and a comfortable call environment is achieved. In addition, if the sound heard from the speaker of the door phone slave unit S contains reverberation, the other party can tell that the talker is in the bathroom, which may violate the resident's privacy. There has also been a security problem: if the other party learns that the resident is bathing, the home may be broken into or burgled. If the sound heard from the speaker of the door phone slave unit S contains no reverberation, the other party cannot tell that the resident is in the bathroom, so the resident's privacy is protected and security is improved.

In public address systems installed in concert halls and auditoriums, too, the speaker's voice may be masked by reverberation reflected from the space, making the content hard for the audience to follow. Applying the dereverberation method and apparatus of the present invention to such a system gives the same effect as installing the base unit M of an intercom system in a bathroom: the reverberation component can be removed by inverse filtering without emitting a reference sound based on the TSP method, the M-sequence method, or the like in the concert hall or auditorium.

FIG. 1 is a block diagram showing an embodiment of the present invention.
FIG. 2 is a block diagram of a base unit equipped with the embodiment and of a door phone slave unit that forms an intercom system together with the base unit.
FIG. 3 is a flowchart for explaining the operation of the embodiment.
FIG. 4 is a flowchart for explaining the operation of the embodiment.
FIG. 5 is a waveform diagram of a reverberant speech signal.
FIG. 6 is a waveform diagram of a voice signal from which the reverberation component has been removed by the embodiment.
FIG. 7 is a diagram showing the filter coefficients of the embodiment.

Explanation of symbols

A: dereverberation apparatus
1: microphone
2: speaker
10: inverse filter processing unit
30A: first echo canceller
31A: adaptive filter
32A: subtractor

Claims

A dereverberation method for restoring an original sound source signal by removing a reverberation component from a reverberant speech signal collected by a microphone in a reverberant space, the method comprising: a first step of estimating, with an adaptive filter, the echo component of a feedback path formed by acoustic coupling between a speaker and the microphone in the reverberant space, by adaptively identifying the impulse response of the feedback path from the reverberant speech signal collected by the microphone; a second step of subtracting the echo component estimated by the adaptive filter in the first step from the output signal of the feedback path; a third step of updating the filter coefficients of the adaptive filter so that the estimation error of the echo component estimate contained in the subtraction result of the second step is minimized; a fourth step of obtaining a transfer function of the reverberant space from the filter coefficients at which the estimation error of the echo component estimate is minimized in the third step, using those filter coefficients in place of the impulse response; and a fifth step of obtaining the original sound source signal by an operation on the transfer function of the reverberant space obtained in the fourth step and the reverberant speech signal collected by the microphone.

The dereverberation method according to claim 1, wherein in the fourth step, a transfer function in the frequency domain is obtained by Fourier transforming the filter coefficients, and in the fifth step, the reverberant speech signal is Fourier transformed, divided by the magnitude of the frequency-domain transfer function obtained in the fourth step, and then inverse Fourier transformed.

3. The dereverberation method according to claim 1, wherein the adaptive filter in the first step is an FIR type filter.

4. The dereverberation method according to claim 3, wherein in the third step, the filter coefficient of the adaptive filter is updated by a least mean square algorithm.
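
The first three steps, with an FIR adaptive filter updated by the least mean square algorithm as in claims 3 and 4, might look like the sketch below. It assumes the conventional echo-canceller arrangement in which the signal fed to the speaker is the filter's reference input; the function name, tap count, and step gain `mu` are hypothetical values chosen for illustration.

```python
def lms_echo_canceller(speaker_signal, mic_signal, taps=4, mu=0.05):
    """Identify the feedback path with an FIR filter updated by plain LMS.

    Returns the final filter coefficients and the per-sample residual
    (the microphone signal minus the estimated echo).
    """
    h = [0.0] * taps        # filter coefficients (estimated impulse response)
    buf = [0.0] * taps      # most recent speaker samples, newest first
    residual = []
    for x, d in zip(speaker_signal, mic_signal):
        buf = [x] + buf[:-1]
        echo_estimate = sum(hi * xi for hi, xi in zip(h, buf))   # first step
        e = d - echo_estimate                                    # second step
        h = [hi + mu * e * xi for hi, xi in zip(h, buf)]         # third step (LMS)
        residual.append(e)
    return h, residual
```

When the microphone signal contains only the echo of a broadband speaker signal, the coefficients converge toward the impulse response of the feedback path, which is what the fourth step then reuses as the impulse response of the reverberation space.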

5. The dereverberation method according to claim 4, wherein in the third step it is determined whether or not the reverberant speech signal contains speech, and the filter coefficients of the adaptive filter are updated only when speech is contained.

6. The dereverberation method according to claim 4, wherein in the third step the step gain of the adaptive filter is set to a relatively small value when the ratio of the instantaneous power of the reverberant speech signal to the instantaneous power of the speech signal output from the speaker is larger than a predetermined threshold.

7. The dereverberation method according to claim 4, wherein in the third step it is determined whether or not both the signal collected by the microphone and the signal output from the speaker contain speech, and the filter coefficients of the adaptive filter are not updated when both contain speech.

8. The dereverberation method according to claim 4, wherein in the third step the filter coefficients are initialized when they diverge.

9. The dereverberation method according to claim 7, wherein in the third step the filter coefficients continue to be updated when the feedback path fluctuates, even when both the signal collected by the microphone and the signal output from the speaker contain speech.
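
The update conditions of claims 5 to 9 can be pictured as a single decision function. The claims present these conditions as separate refinements of claim 4, so combining them in one function is an assumption made here for illustration. The function name, the thresholds, the step gain values, and the coefficient-magnitude test used as a divergence criterion are all hypothetical.

```python
def update_control(mic_active, speaker_active, mic_power, speaker_power,
                   coeffs, path_changed=False, ratio_threshold=1.0,
                   mu_normal=0.05, mu_small=0.005, divergence_limit=10.0):
    """Decide what to do with the adaptive filter for the current sample.

    Returns a pair (action, step_gain), where action is one of
    "reset", "hold" or "update".
    """
    # Claim 8: re-initialize the coefficients when they have diverged
    # (hypothetical criterion: any coefficient magnitude above a limit).
    if any(abs(c) > divergence_limit for c in coeffs):
        return "reset", 0.0
    # Claim 5: update only when the reverberant speech signal contains speech.
    if not mic_active:
        return "hold", 0.0
    # Claim 7: freeze the update when both signals contain speech,
    # unless (claim 9) a fluctuation of the feedback path was detected.
    if speaker_active and not path_changed:
        return "hold", 0.0
    # Claim 6: use a smaller step gain when the microphone signal is
    # strong relative to the speaker signal.
    ratio = mic_power / (speaker_power + 1e-12)
    mu = mu_small if ratio > ratio_threshold else mu_normal
    return "update", mu
```

With these placeholder values, `update_control(True, True, 1.0, 1.0, [0.1])` holds the coefficients, while the same call with `path_changed=True` lets the update proceed.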

10. The dereverberation method, wherein in the fifth step it is determined whether or not both the signal collected by the microphone and the signal output from the speaker contain speech, and the reverberant speech signal is set to zero when at least one of the two signals does not contain speech and the estimation error of the echo component estimate is smaller than a predetermined threshold.

11. A dereverberation device that removes a reverberation component from a reverberant speech signal collected by a microphone in a reverberation space and restores the original sound source signal, the device comprising: an adaptive filter that adaptively identifies an impulse response of a feedback path formed by acoustic coupling between a speaker and the microphone existing in the reverberation space and estimates an echo component of the feedback path from the reverberant speech signal collected by the microphone; subtracting means for subtracting the echo component estimated by the adaptive filter from an output signal of the feedback path; filter coefficient updating means for updating filter coefficients of the adaptive filter so that an estimation error of the echo component estimate contained in the subtraction result of the subtracting means is minimized; transfer function calculating means for substituting the filter coefficients obtained when the estimation error of the echo component estimate is minimized by the filter coefficient updating means for an impulse response of the reverberation space and obtaining a transfer function of the reverberation space from those filter coefficients; and reverberation calculating means for obtaining the original speech signal by a calculation on the transfer function of the reverberation space obtained by the transfer function calculating means and the reverberant speech signal collected by the microphone.

12. The dereverberation device according to claim 11, wherein the transfer function calculating means obtains a frequency-domain transfer function by Fourier transforming the filter coefficients, and the reverberation calculating means Fourier transforms the reverberant speech signal, divides it by the magnitude of the frequency-domain transfer function, and then performs an inverse Fourier transform.

13. The dereverberation device according to claim 11 or 12, wherein the adaptive filter is an FIR type filter.

14. The dereverberation device according to claim 13, wherein the filter coefficient updating means updates the filter coefficients of the adaptive filter by a least mean square algorithm.

15. The dereverberation device according to claim 14, wherein the filter coefficient updating means includes a sound/silence determination unit that determines whether or not the reverberant speech signal contains speech, and updates the filter coefficients of the adaptive filter only when speech is contained.

16. The dereverberation device according to claim 14, wherein the filter coefficient updating means includes a switching unit that sets the step gain of the adaptive filter to a relatively small value when the ratio of the instantaneous power of the reverberant speech signal to the instantaneous power of the speech signal output from the speaker is larger than a predetermined threshold.

17. The dereverberation device according to claim 14, wherein the filter coefficient updating means includes a determination unit that determines whether or not both the signal collected by the microphone and the signal output from the speaker contain speech, and does not update the filter coefficients of the adaptive filter when the determination unit determines that both contain speech.

18. The dereverberation device according to claim 14, wherein the filter coefficient updating means includes a divergence detection unit that detects divergence of the filter coefficients, and initializes the filter coefficients when divergence is detected.

19. The dereverberation device according to claim 17, wherein the filter coefficient updating means includes a feedback path fluctuation detection unit that detects fluctuation of the feedback path, and continues updating the filter coefficients when the feedback path fluctuation detection unit detects fluctuation of the feedback path, even if the determination unit determines that both signals contain speech.

20. The dereverberation device according to claim 11 or 12, further comprising non-linear echo suppression means that detects whether or not the output signal of the subtracting means and the signal output from the speaker contain speech, compares the estimation error of the echo component estimate with a predetermined threshold, and, when speech is contained in at least one of the signals and the estimation error is smaller than the threshold, determines that the reverberant speech signal contains a non-linear echo component and sets the reverberant speech signal to zero.