JPH11168791A - Device and method for detecting sound source

Info

Publication number: JPH11168791A
Application number: JP10249875A
Authority: JP
Inventors: Paeivi Valve; Juha Haekkinen
Original assignee: Nokia Mobile Phones Ltd
Current assignee: Nokia Oyj
Priority date: 1997-09-04
Filing date: 1998-09-03
Publication date: 1999-06-22
Anticipated expiration: 2018-09-03
Also published as: FI114422B; FI973596A0; EP0901267A2; US6707910B1; JP4624503B2; EP0901267B1; EP0901267A3; DE69840119D1; FI973596A

Abstract

PROBLEM TO BE SOLVED: To provide a device for detecting a sound source, comprising a microphone for receiving sound signals and detection means for detecting speech in the received sound signals. SOLUTION: The device has means 15 and 17 for determining the direction of arrival of the received signals, means 17 for storing the estimated direction of arrival of the sound of a certain specific source, and means 18 for comparing the direction of arrival of the received signals with the estimated direction of arrival. The device further has means 18 for indicating that the sound source is that specific source when the comparison shows that the direction of arrival of the received signals matches the estimated direction of arrival within a certain tolerance.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a sound source detection method and detection device comprising microphone means for receiving a sound signal and detection means for detecting speech in the received sound signal.

[0002]

[Prior Art] Telephone conversations are often disturbed by echo. This applies in particular to full-duplex telephones, which have four different call states: idle, near-end speech, far-end speech, and double-talk. Echo typically arises when speech arrives from the far end, the received far-end signal is reproduced by the loudspeaker, and the reproduced sound returns to the far end through the microphone. The echo problem is especially pronounced in hands-free operation, where the loudspeaker reproduces sound at high volume into the surroundings, so that the loudspeaker sound is easily picked up by the microphone.

[0003] Adaptive signal processing is used to remove echo. In hands-free mobile phone applications, known echo cancellers and echo suppressors can be used to effectively remove the highly disturbing acoustic feedback from the loudspeaker to the microphone, i.e. the acoustic echo. An echo canceller can be implemented with an adaptive digital filter that removes the echo signal, i.e. the signal originating from the far end, from the outgoing signal whenever a far-end signal is present on the receiving side. In this way the far-end signal is kept from returning to the far end. The parameters of the adaptive filter are normally updated whenever far-end speech occurs, so that the prevailing conditions are taken into account as accurately as possible. An echo suppressor, in turn, is used to attenuate the near-end signal to be transmitted.

[0004] A situation in which near-end speech and far-end speech occur simultaneously is called a double-talk situation. During double-talk the echo canceller cannot remove the echo signal effectively, because the echo signal is summed with the near-end signal to be transmitted, and the echo canceller then cannot form an accurate model of the echo signal to be removed. In that case the adaptive filter of the echo canceller cannot adapt properly to the acoustic response of the space between the loudspeaker and the microphone, and consequently the acoustic echo cannot be removed from the transmitted signal while a near-end speech signal is present. A double-talk detector is therefore often used to eliminate the disturbing effect of double-talk on the echo canceller. A double-talk situation is normally detected by determining whether near-end speech is present at the same time as far-end speech. During double-talk the parameters of the echo canceller's adaptive filter are not updated; that is, adaptation must be suspended while the near-end person is speaking. An echo suppressor likewise needs information on the speech activity of the near-end talker, so that the transmitted signal is not attenuated inappropriately (excessively) while the near-end person is speaking.

[0005] In addition to echo cancellation and suppression, the discontinuous transmission (DTX) used in GSM mobile phones requires information on near-end speech activity. The idea of discontinuous transmission is to transmit the speech signal only during speech activity; while the near-end talker is silent, the near-end signal is not transmitted, which saves power. To avoid excessive variation of the background-noise level caused by discontinuous transmission, some comfort noise can be transmitted in the idle state while still saving transmission bits. So that discontinuous transmission in GSM does not degrade the quality of the transmitted speech, near-end speech activity must be detected accurately, quickly, and reliably.

[0006] FIG. 1 shows a previously known arrangement 1 for echo cancellation and double-talk detection. The near-end signal 3 arrives from microphone 2 and is detected by near-end speech activity detector 4, a VAD (voice activity detector). The far-end signal 5 arrives from input connection I (which may be the input connector of a hands-free device, the wire connector of a fixed telephone, or, in a mobile phone, the path from the antenna to the receiving branch of the phone), is detected in far-end speech activity detector 6 (VAD), and is finally reproduced by loudspeaker 7. Both the near-end signal 3 and the far-end signal 5 are fed to double-talk detector 8 for detecting double-talk and to adaptive filter 9 for adapting to the acoustic response of echo path 13. Adaptive filter 9 also receives the output of double-talk detector 8 as an input, so that the filter is not adapted (its parameters are not updated) during double-talk. To perform echo cancellation, the model 10 formed by the adaptive filter is subtracted from the near-end signal 3 in adder/subtractor 11. The echo canceller output signal 12, from which (part of) the echo has been cancelled, is fed to output connection O (which may be the output connector of a hands-free device, the wire connector of a fixed telephone, or, in a mobile phone, the path from the transmitting branch to the antenna). The echo canceller of FIG. 1 may be integrated into a telephone (comprising, for example, a loudspeaker and a microphone for hands-free loudspeaker calls) or implemented in a separate hands-free device.
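The arrangement of FIG. 1 can be illustrated with a short sketch. The text only speaks of an adaptive digital filter; the normalized LMS (NLMS) update used here, the function name `nlms_echo_canceller`, and the parameters `taps`, `mu` and `eps` are assumptions chosen for illustration, not something the description specifies. The `double_talk` flags stand in for the output of double-talk detector 8: adaptation is skipped whenever a flag is set.

```python
def nlms_echo_canceller(far_end, mic, double_talk, taps=4, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller (assumed algorithm, not from the text).

    far_end     -- loudspeaker samples (far-end signal 5)
    mic         -- microphone samples (near-end signal 3, echo included)
    double_talk -- per-sample flags; True freezes the filter parameters
    Returns the echo-cancelled output (signal 12) and the final weights.
    """
    weights = [0.0] * taps          # adaptive filter 9
    history = [0.0] * taps          # most recent far-end samples, newest first
    output = []
    for x, d, frozen in zip(far_end, mic, double_talk):
        history = [x] + history[:-1]
        echo_model = sum(w * h for w, h in zip(weights, history))  # model 10
        error = d - echo_model      # subtraction in adder/subtractor 11
        output.append(error)
        if not frozen:              # no parameter update during double-talk
            norm = sum(h * h for h in history) + eps
            weights = [w + mu * error * h / norm
                       for w, h in zip(weights, history)]
    return output, weights
```

With a noise-free echo path and no double-talk the weights approach the echo path response; with every flag set the weights stay at their initial values and the microphone signal passes through unchanged.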

[0007] Several methods have been proposed for detecting double-talk. Many of them, however, are very simple, and some are unreliable. Most double-talk detectors are based on power ratios between the loudspeaker signal and/or the microphone signal and/or the signal after the echo canceller. The advantages of these detectors are simplicity and speed; their disadvantage is lack of reliability.

[0008] Detectors based on the correlation between the loudspeaker signal and/or the microphone signal and/or the signal after the echo canceller are also known. These detectors rest on the idea that the loudspeaker signal and a pure echo signal at the microphone (or the signal after the echo canceller) are strongly correlated, whereas the correlation decreases when a near-end signal is summed into the microphone signal. The disadvantages of these detectors are their slow detection, the (partly incorrect) assumption that the near-end and far-end signals are uncorrelated, the effect of the changes the echo path imposes on the loudspeaker signal, and the fact that the correlation decreases even when no near-end signal is present.

[0009] Double-talk detectors based on comparing autocorrelations of the same signals are also known; in this case the detector recognizes speech in the near-end signal and can thereby detect the presence of a near-end signal. Such a detector needs little computing power, but because it is based on correlation it suffers from the same problems as described above.

[0010] In Kuo S. M., Pan Z., "Acoustic echo cancellation microphone system for large-scale video conferencing", Proceedings of ICSPAT'94, 1994, pp. 7-12, two microphones pointing in opposite directions are used to remove noise and acoustic echo and to recognize the different call states mentioned at the outset. That method, however, brings no particular improvement to double-talk recognition, which is carried out solely on the basis of the output power of the echo canceller.

[0011] In Affes S., Grenier Y., "A source subspace tracking array of microphones for double-talk situations", Proceedings of ICSPAT'96, Vol. 2, 1996, pp. 909-912, an echo and background-noise canceller with a microphone vector structure is proposed. The proposed echo canceller filters out signals arriving from spatially selected directions while preserving signals arriving from a desired direction. This echo canceller can also operate during double-talk. The paper, however, presents neither near-end speech activity detection nor double-talk detection using a multi-microphone solution (also called a microphone vector).

[0012]

[Problem to Be Solved by the Invention] A method and a device have now been invented for detecting near-end speech activity and for recognizing a double-talk situation.

[0013]

[Means for Solving the Problem] The present invention is based on the idea of detecting a near-end speech signal on the basis of the direction from which it arrives. In hands-free applications, where the loudspeaker signal arrives from a direction clearly different from that of the near-end talker's speech signal, the near-end speech signal can be distinguished from the loudspeaker signal by its angle of arrival. In the invention, detection is carried out using several microphones (a microphone vector) that pick up sound from different directions and/or from different points.

[0014] The outputs of the microphone vector are first band-pass filtered into narrowband signals, and the angle of arrival is estimated from the signal matrix formed by the filtered signals. The estimation reconstructs a spatial spectrum, and the directions of arrival are tracked on the basis of the peaks occurring in that spectrum. The estimated directions of arrival of the near-end speech signal and of the loudspeaker signal are updated on the basis of the directions of arrival found. These estimates make the final VAD decision easier. When the direction-of-arrival estimator detects a sufficiently strong spectral peak in a direction sufficiently close to the estimated direction of arrival of the near-end speech signal, the near-end talker is considered to be speaking; that is, near-end speech activity is detected.

[0015] Determining double-talk requires information on far-end speech activity in addition to near-end speech activity. This information can be obtained with a known voice activity detector, for example one based on the power level.
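As a minimal sketch of how the two activity decisions combine, the following maps them onto the four call states named in paragraph [0002]. The function names `power_vad` and `call_state` and the `threshold` parameter are hypothetical; the text only says that a known power-level VAD may supply the far-end decision.

```python
def power_vad(frame, threshold):
    """Power-level VAD sketch: active when the mean power exceeds threshold.

    The threshold is an assumed tuning value, not given in the text.
    """
    if not frame:
        return False
    mean_power = sum(s * s for s in frame) / len(frame)
    return mean_power > threshold


def call_state(near_end_active, far_end_active):
    """Combine the two activity flags into one of the four call states."""
    if near_end_active and far_end_active:
        return "double-talk"
    if near_end_active:
        return "near-end speech"
    if far_end_active:
        return "far-end speech"
    return "idle"
```

In the invention the near-end flag would come from the direction-of-arrival comparison rather than from a power measurement; only the far-end flag is described as coming from a conventional VAD.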

[0016] The device according to the invention is characterized in that it comprises means for determining the direction of arrival of a received signal, means for storing an estimated direction of arrival of the sound of a certain specific source, means for comparing the direction of arrival of the received signal with the estimated direction of arrival, and means for indicating that the sound originated from said specific source when the comparison shows that the direction of arrival of the received signal matches the estimated direction of arrival within a certain tolerance.

[0017]

[Embodiments of the Invention] The invention is described in detail below with reference to the drawings.

[0018] FIG. 2 shows a block diagram of a detector according to the invention for detecting near-end speech activity and recognizing double-talk. In the invention, several microphones 2a, 2b, ..., 2M are used as microphone 2, preferably connected as a so-called microphone vector 2. The vector has at least two microphones, but preferably three, four, or more. Each microphone produces a single signal 3a, 3b, ..., 3M; when M microphones are used (M is an integer), M time-varying signals are obtained, which together form one signal vector of M time-varying elements.

[0019] The outputs 3a, 3b, ..., 3M of microphone vector 2 are first band-pass filtered in band-pass filter 14 into narrowband signals 19a, 19b, ..., 19M. Band-pass filtering is performed for the direction-angle estimation because accurate super-resolution spectral estimation methods work only on narrowband signals. The band-pass filtering can be implemented using, for example, the fast Fourier transform (FFT), windowing, and interleaving. The frequency range of the band-pass filter is determined by the spacing between the microphones in the microphone vector. According to the Nyquist sampling theorem, the spatial sampling frequency must be at least twice the spatial frequency of the signal, so the pass frequency (point frequency) of band-pass filter 14 is obtained as f = C/(2d), where C is the speed of sound in air (343 m/s at 20 °C) and d is the distance between the microphones.
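A small worked example of the relation f = C/(2d) may help. The function name `band_frequency_hz` and the 5 cm spacing are assumptions for illustration; only the formula and the value of C come from the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at 20 degrees C, as in the text


def band_frequency_hz(spacing_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Pass frequency of the band-pass filter, f = C / (2 d)."""
    if spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    return speed_of_sound / (2.0 * spacing_m)


# Assumed example: microphones 5 cm apart give f = 343 / 0.1 = 3430 Hz.
example_frequency = band_frequency_hz(0.05)
```

Halving the spacing doubles the frequency, so a more compact microphone vector works on a higher band.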

[0020] The direction angles (i.e. the directions of arrival) are estimated in estimator 15 from the signal matrix formed by the filtered signals 19a, 19b, ..., 19M, using a known estimation method such as MUSIC (Multiple Signal Classification).

[0021] This estimation method reconstructs a spatial spectrum, from which the directions of arrival of the signals are determined on the basis of the peaks occurring in the spectrum. FIG. 3 shows an example of such a spatial spectrum of the microphone vector signals. The directions of arrival can be determined from the spectrum of FIG. 3 by, for example, examining the derivative of the spectrum curve: the zero crossings at which the derivative changes from positive to negative are returned as directions of arrival, since these, as is well known, mark the peak positions of the curve. In FIG. 3, two signals thus arrive at the microphone vector, one from the direction 10° and the other from the direction 40°. In addition, a spectral peak can be required to have a certain minimum amplitude (e.g. 5 dB) before it is accepted as a direction of arrival. In the figure the coverage of the spectrum is shown as 90°; in practice, detection is possible over the range ±90°. The calculation of the derivative and the check of the minimum-amplitude condition are preferably implemented in software on a digital signal processor. Estimator 15 delivers the directions of arrival 16 of the signals as its output.
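The peak search described above can be sketched on a sampled spectrum. This is a minimal illustration: the function name `find_doa_peaks` is hypothetical, the derivative is approximated by first differences, and the minimum amplitude is taken as an absolute level in dB, which is an assumption since the text does not say what the 5 dB is measured against.

```python
def find_doa_peaks(angles_deg, spectrum_db, min_amplitude_db=5.0):
    """Return (angle, level) pairs where the slope turns from positive to
    non-positive and the level reaches the minimum amplitude."""
    peaks = []
    for i in range(1, len(spectrum_db) - 1):
        slope_before = spectrum_db[i] - spectrum_db[i - 1]
        slope_after = spectrum_db[i + 1] - spectrum_db[i]
        is_peak = slope_before > 0 and slope_after <= 0
        if is_peak and spectrum_db[i] >= min_amplitude_db:
            peaks.append((angles_deg[i], spectrum_db[i]))
    return peaks


# Synthetic spectrum with strong peaks at 10 and 40 degrees (as in FIG. 3)
# and a weak 2 dB bump at -50 degrees that fails the amplitude condition.
angles = list(range(-90, 91, 10))
spectrum = [0.0] * len(angles)
spectrum[angles.index(10)] = 12.0
spectrum[angles.index(40)] = 8.0
spectrum[angles.index(-50)] = 2.0
detected = find_doa_peaks(angles, spectrum)
```

The peak levels of 12 dB and 8 dB are invented for the example; only the two directions are taken from the description of FIG. 3.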

[0022] The estimated directions of arrival of the near-end speech signal 3 and of the loudspeaker signal 5 are updated in block 17 on the basis of the directions of arrival found. The probable directions of arrival are evaluated by averaging the directions obtained from the spectral peaks. Once it is approximately known from which directions the signals arrive, the effect of erroneous peaks occasionally appearing in the spatial spectrum can be minimized: an erroneous peak is disregarded unless it occurs in an estimated direction of arrival. FIG. 4 shows the placement of microphone 2 and loudspeaker 7 of a typical hands-free device in a car, where the loudspeaker is usually located more or less directly in front of microphone vector 2, in the direction 0° ± 40°. The position of the loudspeaker relative to the microphone vector may nevertheless vary considerably. The microphones 2a, 2b, ..., 2M of microphone vector 2 are placed at a certain distance from each other in a certain direction. The distance and direction must be determined by the direction-of-arrival estimation algorithm, which is described below. The averaging of both the far-end and the near-end directions of arrival, performed in the source position and direction determination block 17, is described in more detail in the following.

[0023] The far-end direction of arrival is estimated by averaging the angles of arrival 16 obtained from spectrum estimator 15. Averaging is performed only when there is speech at the far end, which is determined from the output of far-end VAD 6, fed to determination block 17. The averaging is preferably carried out in the time domain, for example by IIR filtering. The basic assumption is that there are two signal sources arriving from different directions, namely the near-end signal 3 and the far-end signal 5. It is further assumed that the directions of arrival of these signals change relatively slowly compared with the rate at which observations are made. When spectrum estimator 15 outputs the direction-of-arrival vector doa (in degrees), the estimate fdoa (in degrees) of the far-end direction-of-arrival vector is updated by averaging, so that each new direction estimate affects the closest component of fdoa. The update is weighted so that a detected direction updates an fdoa component more strongly the closer it lies to that component. The direction of the loudspeaker signal, and the directions of the reverberation signals it induces in the spectrum, change very little, so this weighting reduces the effect of occasional erroneous peaks in the spectrum. At the same time, the probability of occurrence pdoa of the fdoa component in question is updated, the more so the closer the new value lies to that direction estimate. In addition, the strength powdoa of the fdoa component is updated on the basis of the power of the corresponding spectral peak. The far-end direction-of-arrival estimate vector fdoa thus contains the angles of arrival of (M-1) signals. The components of pdoa are the probabilities of the corresponding directions of arrival, in the range [0, 1], and the components of powdoa are the corresponding normalized strengths, in the range [0, 1].

[0024] The direction of arrival of the far-end signal 5 can now be assumed to be that component of the far-end estimate vector fdoa whose probability and strength are highest and which lies closest to the most recently determined far-end direction of arrival. Since the estimates are updated only when speech is present at the far end, it can be assumed that the near-end signal 3 (which in this case means double-talk) occurs less than 50 % of the time. The basic assumption is therefore that double-talk occurs during less than half of the far-end speech activity time. The direction of arrival of the far-end signal (the loudspeaker direction) can be separated from the directions of arrival of the reverberated loudspeaker signals on the basis of the power of the spectral peak corresponding to each direction: a signal arriving directly at the microphones normally produces a stronger peak in the spatial spectrum than signals attenuated along reverberation paths.

[0025] The algorithm for estimating the direction of arrival is described below with reference to FIG. 7. In step 100, initialization is performed, which establishes the following. fdoa, pdoa and powdoa contain (M-1) components. doa contains L components (1 ≤ L ≤ M-1). The fdoa components are initialized with different values:

fdoa(n) = -90 + n * 180 / M;  1 ≤ n ≤ M-1
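The initialization of step 100 spreads the (M-1) direction estimates evenly over the ±90° range. A minimal sketch follows; the function name `init_estimates` is hypothetical, and the zero starting values for pdoa and powdoa are an assumption, since the text gives initial values only for fdoa.

```python
def init_estimates(num_mics):
    """Step 100: initialize fdoa, pdoa and powdoa with (M - 1) components."""
    if num_mics < 2:
        raise ValueError("the microphone vector needs at least two microphones")
    # fdoa(n) = -90 + n * 180 / M for n = 1 .. M-1
    fdoa = [-90.0 + n * 180.0 / num_mics for n in range(1, num_mics)]
    pdoa = [0.0] * (num_mics - 1)    # assumed initial probabilities
    powdoa = [0.0] * (num_mics - 1)  # assumed initial normalized powers
    return fdoa, pdoa, powdoa


# For M = 4 microphones the estimates start at -45, 0 and 45 degrees.
fdoa, pdoa, powdoa = init_estimates(4)
```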

[0026] Step 101: Track the estimates (the components of fdoa) against the detected directions of arrival (doa) as follows. Compute the distance of each direction of arrival from each estimate. Select the direction of arrival doa(i) with the shortest distance and the corresponding closest estimate fdoa(n).

[0027] Step 102: Update the estimate fdoa(n) according to how close the direction of arrival doa(i) is to it. The closer it is, the more the detected direction changes the estimate:

fdoa(n) = α₀ * fdoa(n) + (1 - α₀) * doa(i)

where α₀ is, for example, a linear or exponential function of the distance (see FIG. 5 for the linear dependence). By adjusting the upper and lower limits of the update coefficient α₀ and of the distance d, namely α₀_max, α₀_min, d_max and d_min, one can influence not only the speed of the update but also the distance up to which a peak affects the estimate. For example, if the maximum distance is set to 40° (d_min = 0°, d_max = 40°) and the maximum of the update coefficient is set to 1 (α₀_min = 0.99, α₀_max = 1.0), erroneous spectral peaks more than 40° away do not update the estimate and hence cause no error at all. In this way the effect of such spurious signals on the estimates can be eliminated.
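Steps 101 and 102 can be sketched together. The function names are hypothetical, and the linear shape of α₀ (rising from α₀_min at d_min to α₀_max at d_max) is an assumption about FIG. 5 that is consistent with the numerical example in the text; the default limits are the ones quoted there.

```python
def closest_pair(doa, fdoa):
    """Step 101: indices (i, n) of the detected direction doa[i] and the
    estimate fdoa[n] that lie closest to each other."""
    candidates = ((abs(d - e), i, n)
                  for i, d in enumerate(doa)
                  for n, e in enumerate(fdoa))
    _, i, n = min(candidates)
    return i, n


def alpha0(distance, d_min=0.0, d_max=40.0, a_min=0.99, a_max=1.0):
    """Update coefficient as an assumed linear function of the distance."""
    if distance <= d_min:
        return a_min
    if distance >= d_max:
        return a_max
    return a_min + (a_max - a_min) * (distance - d_min) / (d_max - d_min)


def update_direction(fdoa_n, doa_i):
    """Step 102: fdoa(n) = a0 * fdoa(n) + (1 - a0) * doa(i)."""
    a0 = alpha0(abs(doa_i - fdoa_n))
    return a0 * fdoa_n + (1.0 - a0) * doa_i
```

With these limits a peak 50° away leaves the estimate untouched (α₀ = 1), while a peak 20° away moves it by 0.5 % of the difference.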

[0028] Step 103: Increase the probability of occurrence pdoa of the estimate, again according to how close the direction of arrival is to the estimate. In the following, the function of distance is assumed to be linear; other functions, such as an exponential function, are also possible:

pdoa(n) = α₁ * pdoa(n) + (1 - α₁) * (1 - dist / 180)

where α₁ is, for example, 0.9 and dist is the distance between the observation and the estimate, in the range [0, 180].
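The probability update of step 103 is a first-order recursive average of a closeness score. A short sketch with the linear distance function and α₁ = 0.9 from the text follows; the function name `update_probability` is hypothetical.

```python
def update_probability(pdoa_n, dist, alpha1=0.9):
    """Step 103: pdoa(n) = a1 * pdoa(n) + (1 - a1) * (1 - dist / 180)."""
    if not 0.0 <= dist <= 180.0:
        raise ValueError("dist must lie in [0, 180] degrees")
    return alpha1 * pdoa_n + (1.0 - alpha1) * (1.0 - dist / 180.0)


# Starting from 0.5: an exact hit raises the probability to 0.55,
# while an observation 180 degrees away lowers it to 0.45.
after_hit = update_probability(0.5, 0.0)
after_miss = update_probability(0.5, 180.0)
```

Because both terms lie in [0, 1] and the weights sum to one, pdoa stays in [0, 1] as long as it starts there.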

[0029] Step 104: Also update the power powdoa of the estimate using the power of the detected spectral peak:

powdoa(n) = α₃ * powdoa(n) + (1 - α₃) * Pow / Powmax

where α₃ is, for example, 0.9, Pow is the power of the spectral peak, and Powmax is the maximum power.
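To show how quickly the smoothed power follows the observations, the sketch below applies the step 104 update repeatedly. The function name `update_power` and the ten-frame scenario are assumptions for illustration; α₃ = 0.9 is the example value from the text.

```python
def update_power(powdoa_n, peak_power, max_power, alpha3=0.9):
    """Step 104: powdoa(n) = a3 * powdoa(n) + (1 - a3) * Pow / Powmax."""
    if max_power <= 0:
        raise ValueError("max_power must be positive")
    return alpha3 * powdoa_n + (1.0 - alpha3) * peak_power / max_power


# Assumed scenario: an estimate starting at 0 sees a full-strength peak
# (Pow = Powmax) in ten consecutive frames.
level = 0.0
for _ in range(10):
    level = update_power(level, peak_power=1.0, max_power=1.0)
# After k frames the level equals 1 - 0.9**k, i.e. about 0.65 for k = 10.
```

A larger α₃ makes the estimate steadier against isolated erroneous peaks but slower to follow a real change.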

[0030] In step 105 it is determined whether further direction-of-arrival and estimate pairs can be found; if so, steps 101-104 are repeated for the remaining pairs. Step 106: Reduce the frequency of occurrence and the power of those estimates for which no direction of arrival was detected, for example by setting dist = 180 and Pow = 0.
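Setting dist = 180 and Pow = 0 in the update rules of steps 103 and 104 reduces both quantities to a plain decay by their smoothing coefficients. A minimal sketch follows; the function name `decay_unmatched` and the `matched` index set are hypothetical, and the coefficients default to the 0.9 example values.

```python
def decay_unmatched(pdoa, powdoa, matched, alpha1=0.9, alpha3=0.9):
    """Step 106: decay estimates that received no detected direction.

    matched is the set of estimate indices updated in steps 101-104.
    Returns new pdoa and powdoa lists; the inputs are not modified.
    """
    new_pdoa = list(pdoa)
    new_powdoa = list(powdoa)
    for n in range(len(pdoa)):
        if n in matched:
            continue
        # dist = 180 makes (1 - dist / 180) zero; Pow = 0 makes Pow / Powmax zero.
        new_pdoa[n] = alpha1 * pdoa[n] + (1.0 - alpha1) * (1.0 - 180.0 / 180.0)
        new_powdoa[n] = alpha3 * powdoa[n] + (1.0 - alpha3) * 0.0
    return new_pdoa, new_powdoa


# Estimate 1 was matched in this frame; estimate 0 was not and decays.
decayed_p, decayed_pow = decay_unmatched([0.5, 0.8], [0.4, 0.6], matched={1})
```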

[0031] Then, in step 107, the loudspeaker direction is selected as the direction of the estimate that has the highest probability of occurrence and power and lies closest to the most recent estimate of the loudspeaker direction, for example by maximizing the expression

a * pdoa(k) + b * powdoa(k) + c * distance(k);  k = 1, ..., M-1    (1)

where a, b and c are weighting coefficients, e.g. 1/3, and distance(k) is the distance in degrees between the estimate fdoa(k) and the previously determined loudspeaker direction.
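The selection of step 107 can be sketched as a weighted score over the estimates. One point needs an explicit assumption: taken literally, adding c * distance(k) with a positive c would reward estimates far from the previous loudspeaker direction, which conflicts with the stated aim of choosing the closest one. The sketch therefore lets the distance enter as a normalized closeness, 1 - distance(k)/180. That reading, and the function name `select_speaker_direction`, are interpretations and not something the text states.

```python
def select_speaker_direction(fdoa, pdoa, powdoa, previous_direction,
                             a=1.0 / 3.0, b=1.0 / 3.0, c=1.0 / 3.0):
    """Step 107 sketch: index k of the estimate maximizing
    a * pdoa(k) + b * powdoa(k) + c * closeness(k)."""
    def score(k):
        distance = abs(fdoa[k] - previous_direction)
        closeness = 1.0 - distance / 180.0  # assumed form of the distance term
        return a * pdoa[k] + b * powdoa[k] + c * closeness

    return max(range(len(fdoa)), key=score)


# The estimate at 5 degrees is both probable and strong, and lies close to
# the previous loudspeaker direction of 0 degrees, so index 1 is chosen.
best = select_speaker_direction(
    fdoa=[-45.0, 5.0, 45.0],
    pdoa=[0.2, 0.9, 0.4],
    powdoa=[0.1, 0.8, 0.5],
    previous_direction=0.0,
)
```

With equal probabilities and powers, the closeness term alone decides, which matches the stated preference for the estimate nearest the latest loudspeaker direction.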

[0032] The estimation of the direction of arrival of the far-end speech signal has been described above. The estimation of the direction of arrival of the near-end speech signal is described next. It follows the procedure and algorithm described above, so the near-end direction-of-arrival estimate ndoa is obtained by replacing fdoa with ndoa in the algorithm. The estimation is performed when far-end speech activity detector 6 indicates that no speech is arriving from the far end. The spectrum then detected in estimator 15 contains either no peaks (angles of arrival) at all, or from 1 to (M-1) peaks corresponding to the directions of the near-end signal and/or of spurious signals and reverberation. As described above, the direction that recurs most often and is indicated most strongly in the spatial spectrum is selected as the direction of the near-end talker. Furthermore, the near-end talker can be assumed to sit in a direction of about 0° ± 30° relative to the microphone vector, so the initial value of the near-end talker's direction estimate can be set to 0°, and the previously determined direction can be weighted heavily when the direction is selected.

【００３３】[0033]

These estimated arrival directions fdoa and ndoa are passed to the detection block 18, which makes the final detection. When the arrival direction estimator 15 detects a sufficiently strong spectral peak whose arrival direction is sufficiently close to the estimated arrival direction of the near-end speech signal, the near-end talker is found to be speaking; that is, near-end speech activity is detected. The comparison is made in detector 18 on the basis of the signals arriving from blocks 15 and 17. The final decision on near-end speech activity uses the spectral peaks and the (averaged) arrival direction estimates. If any spectral peak is closer to the near-end arrival direction estimate (or to the estimate of its reverberation) than to the far-end estimate, and is also closer to the near-end estimate than a predetermined error tolerance, speech is detected at the near end. The tolerance is, for example, 10 degrees.
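The decision rule of this paragraph can be sketched in a few lines. The sketch assumes that `peaks` already contains only the directions of sufficiently strong spectral peaks, and it ignores the reverberation estimates mentioned in the text; the function name and sample angles are illustrative.

```python
def near_end_active(peaks, ndoa, fdoa, tolerance=10.0):
    """Return True if some peak indicates near-end speech.

    peaks: directions (degrees) of sufficiently strong spectral peaks.
    ndoa, fdoa: current near-end and far-end direction estimates.
    """
    for peak in peaks:
        d_near = abs(peak - ndoa)
        d_far = abs(peak - fdoa)
        # Closer to the near-end estimate than to the far-end estimate,
        # and within the error tolerance of the near-end estimate.
        if d_near < d_far and d_near < tolerance:
            return True
    return False


# Talker assumed near 0 degrees, loudspeaker estimated at 45 degrees.
talking = near_end_active([4.0, 47.0], ndoa=0.0, fdoa=45.0)
```

A peak at 15° would be closer to the talker than to the loudspeaker but still outside the 10° tolerance, so it would not trigger detection.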

【００３４】[0034]

Double-talk detection requires information on far-end speech activity in addition to near-end speech activity. This information is supplied from the far-end speech activity detector 6 to the double-talk detector 18, which detects a double-talk situation when the near-end speech activity detector (described above) and the far-end speech activity detector 6 detect speech at the same time. Any VAD algorithm may be used to detect speech activity in the far-end signal. The double-talk result is obtained by a simple AND operation on the near-end and far-end speech activity values, 1 (speech) and 0 (no speech).
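As a small illustration, the AND operation and the four call states it implies (idle, near-end speech, far-end speech, double talk) might be coded as below. The function name and state labels are chosen here for illustration only.

```python
def classify_state(near_end_vad, far_end_vad):
    """Map the two activity flags (1 = speech, 0 = no speech) to a state."""
    double_talk = near_end_vad & far_end_vad   # simple AND of the two flags
    if double_talk:
        return "double-talk"
    if near_end_vad:
        return "near-end"
    if far_end_vad:
        return "far-end"
    return "idle"


state = classify_state(near_end_vad=1, far_end_vad=1)
```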

【００３５】[0035]

The function of the transient (transition) detector TD is described below with reference to FIG. 2. This detector is optional in the speech activity/double-talk detector according to the invention and is therefore drawn with a dotted line in the figure. Because the arrival direction is estimated from narrowband signals, rapid changes (transients) in the near-end signal are difficult to detect. A parallel detector TD optimized for transient detection can therefore be used. After each transient is detected, the arrival direction detector is used to check the correctness of the decision. If the detector according to the invention detects sufficiently rapid signal changes, for example shorter than 20 ms, the transient detector TD is not needed.

【００３６】[0036]

In principle, an ordinary VAD can serve as the transient detector. However, since a multi-microphone structure makes it possible to attenuate a particular arrival direction, the transient detector TD can be implemented so that the estimated direction of the loudspeaker signal is attenuated. A detected transient is then more likely to be associated with the near-end signal. Attenuation in the loudspeaker direction can be realized in many ways; the simplest is an adaptive two-microphone structure. In principle any two of the microphones 2a, 2b, ..., 2M of the microphone vector 2 can be used, for example the two that are farthest apart. Two microphone signals suffice to achieve the attenuation. If the adaptation is controlled by the decision of the arrival direction estimator (that is, adaptation takes place only when only the far-end signal is present), attenuation in the desired direction is obtained. Adaptation is easier if detection is carried out in a particular frequency range (for example 1 kHz to 2 kHz). The frequency division can be performed in the transient detector directly on the microphone signals, for example with an FFT or a band-pass filter.
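One way to picture the adaptive two-microphone structure is a filter that predicts one microphone signal from the other and outputs the prediction error. The text does not name an adaptation algorithm; the sketch below uses normalised LMS (NLMS) purely as an assumed stand-in, and the function name, tap count, step size and synthetic signals are all hypothetical. Adaptation is gated by a per-sample "far end only" flag, as the paragraph describes.

```python
import random


def adaptive_null(mic_a, mic_b, far_end_only, taps=4, mu=0.5, eps=1e-8):
    """Subtract from mic_b the part that is predictable from mic_a (NLMS).

    The filter adapts only on samples flagged far_end_only, so the
    attenuation it learns is aimed at the loudspeaker signal.
    Returns the residual (attenuated) signal.
    """
    w = [0.0] * taps
    history = [0.0] * taps            # recent mic_a samples, newest first
    residual = []
    for a, b, adapt in zip(mic_a, mic_b, far_end_only):
        history = [a] + history[:-1]
        predicted = sum(wi * xi for wi, xi in zip(w, history))
        e = b - predicted
        if adapt:
            norm = sum(xi * xi for xi in history) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual


# Synthetic far-end-only example: mic_b hears the loudspeaker one sample
# later and slightly weaker than mic_a.
rng = random.Random(0)
mic_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic_b = [0.0] + [0.8 * s for s in mic_a[:-1]]
residual = adaptive_null(mic_a, mic_b, [True] * len(mic_a))
tail_power = sum(e * e for e in residual[-200:]) / 200
```

Once the filter has converged, the loudspeaker component is largely removed from the residual, so a sudden rise in residual power is more likely to come from the near-end talker.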

【００３７】[0037]

The transient detector TD itself compares the instantaneous power P(n) of the signal at instant n with the noise estimate N(n). Here P(n) is the power of the microphone signal (or of the microphone signal in which the loudspeaker direction has been attenuated), and the noise estimate N(n) is the corresponding power, averaged over its previous values and controlled by the decision of the overall system for the case in which there is no speech at all. Information on the instants without speech can be taken from block 18 to the transient detector TD (dotted arrow). The values P(n) and N(n) can be computed in the transient detector from the signals arriving from the microphones. (Methods for computing the signal power values P(n) and N(n) are well known and can be implemented in the transient detector TD using, for example, IIR (infinite impulse response) filters.) If the difference between the two is sufficiently large, a transient is deemed to have been detected. The noise estimate N(n) is updated by recursive averaging, N(n+1) = αN(n) + (1 − α)P(n), where α is a time constant controlling the averaging (typically about 0.9).
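The comparison and the recursive noise update can be sketched as a small class. The threshold value, the class name and the rule that the noise estimate is frozen whenever speech is present are assumptions for illustration; the text only says that the difference must be "sufficiently large" and that the update is controlled by the overall system decision.

```python
class TransientDetector:
    """Compare instantaneous power with a recursively averaged noise level."""

    def __init__(self, alpha=0.9, threshold=1.0, initial_noise=0.0):
        self.alpha = alpha            # averaging time constant
        self.threshold = threshold    # hypothetical decision threshold
        self.noise = initial_noise    # N(n)

    def step(self, power, speech_present):
        """Process one frame; return True if a transient is detected."""
        transient = (power - self.noise) > self.threshold
        if not speech_present:
            # N(n+1) = alpha * N(n) + (1 - alpha) * P(n)
            self.noise = self.alpha * self.noise + (1.0 - self.alpha) * power
        return transient


det = TransientDetector(alpha=0.9, threshold=1.0, initial_noise=0.1)
quiet = det.step(0.1, speech_present=False)   # background noise only
onset = det.step(5.0, speech_present=True)    # sudden rise in power
```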

【００３８】[0038]

The transient detector complements the spatial detector according to the invention. The transient detector TD can also detect near-end speech itself, but reliable detection is obtained from the direction decision of the arrival direction estimator 15. A false transient detection caused by an echo source alone (not a near-end signal) can be corrected using the direction from the estimator 15. If the directional attenuation works effectively enough, transients induced by the echo during near-end speech need not be considered. Near-end speech that begins during an echo can again be detected as a distinct transient, and the result can be checked with the arrival direction detector. The output of the transient detector TD is fed to block 18 (dotted line).

【００３９】[0039]

Near-end speech activity and double talk can also be determined from the output of the arrival direction estimator 15 by a statistical pattern recognition approach. With this approach, speech activity detection based on direction-of-arrival (DOA) angle estimation could be improved by exploiting statistical information. Pattern recognition techniques such as neural networks and hidden Markov models (HMMs) have been applied successfully to many similar problems. The strength of pattern recognition methods is that they can be trained. Given a sufficient amount of training data, a model can be estimated for each state of the system (near-end speech, far-end speech, double talk, silence). These models can then be used to detect the state of the system optimally. Needless to say, the detection is optimal only insofar as the modelling assumptions are correct.

【００４０】[0040]

The use of HMMs for multi-microphone speech activity detection is outlined below. Since the input to the system is still derived from the spatial spectrum, the DOA angle of the signal(s) remains the decisive factor in accordance with the invention. Moreover, the transient detection component described above (reference TD) can be used as before.

【００４１】[0041]

The first step in HMM-based pattern recognition is to define the model network. As stated above, a full-duplex telephone system has four states (models): near-end speech, far-end speech, double talk and silence. Each model could be a multi-state HMM, but a single-state HMM can serve as a starting point. A further possible refinement is to enforce a minimum duration in each state to prevent oscillation between states.
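The minimum-duration idea can be illustrated by post-processing a sequence of per-frame state decisions. This is only one simple way to enforce such a constraint; the function name and the frame-count parameter are hypothetical, and the text does not say how the constraint would be built into the HMM itself.

```python
def enforce_min_duration(labels, min_frames):
    """Hold each state for at least min_frames before allowing a change."""
    smoothed = []
    current = None
    held = 0
    for label in labels:
        if current is None:
            current, held = label, 1
        elif label != current and held >= min_frames:
            current, held = label, 1      # change allowed, restart the count
        else:
            held += 1                     # stay in the current state
        smoothed.append(current)
    return smoothed


raw = ["near", "far", "near", "near", "far", "far", "far"]
stable = enforce_min_duration(raw, min_frames=3)
```

The isolated "far" decision in the second frame is suppressed because the "near" state has not yet lasted three frames.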

【００４２】[0042]

In theory a transition can occur between any two models, but in practice direct transitions between the silence and double-talk models and between the near-end and far-end models can be ignored, so the transitions actually used are those shown in FIG. 8.
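The restriction described here can be written down as a small lookup. FIG. 8 is not reproduced in this text, so the set below is inferred from the sentence above alone; remaining in the same state is assumed to be allowed, and the state labels are illustrative.

```python
STATES = ("silence", "near-end", "far-end", "double-talk")

# Direct transitions that the text says can be ignored in practice.
IGNORED = {
    frozenset(("silence", "double-talk")),
    frozenset(("near-end", "far-end")),
}


def transition_allowed(src, dst):
    """Return True if the model network permits moving from src to dst."""
    if src not in STATES or dst not in STATES:
        raise ValueError("unknown state")
    return src == dst or frozenset((src, dst)) not in IGNORED


allowed = [(s, d) for s in STATES for d in STATES
           if s != d and transition_allowed(s, d)]
```

Under these assumptions eight directed transitions remain, for example silence to near-end and near-end to double-talk.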

【００４３】[0043]

Once the model structure has been defined, the kind of probability distribution to use must be decided. The standard approach in speech recognition is to model each state with a Gaussian probability density function (pdf), and this is a preferred starting point in this embodiment as well. Any other pdf could be used instead. The model pdfs are ideally trained by estimating maximum-likelihood parameters from labelled training data such as that shown in FIG. 9 (in which the state of the system at each instant is known). An alternative is to start from some basic model and adapt the system online, which is called unsupervised training. Referring again to speech recognition, several online speaker adaptation techniques are applicable here. In short, the state that yields the highest likelihood for the current data is adapted with the greatest weight. The more adaptation data there is, the more weight it receives in the update. The obvious problem with unsupervised training is the risk of adapting the wrong model after a misclassification. If the initial parameters can be estimated from a small number of supervised training samples, the adaptation is more likely to succeed. Furthermore, the far-end channel (loudspeaker) is separate from the other channels, and this information can be exploited: when there is far-end activity, only the far-end and double-talk models are adapted, and so on.

【００４４】[0044]

The actual detection (recognition) is very simple: the most likely model is selected at each instant. Additional information, such as far-end speech activity, can of course be used to improve detection performance further. A logical refinement of this alternative approach is to use HMMs with several states. For example, the HMM representing each system state could consist of three states: a transition state into the model, a state representing the stationary part of the model, and a transition state out of the model. Several Gaussian pdfs could also be combined to make the pdf modelling more accurate.
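A minimal sketch of "select the most likely model" with single-state Gaussian models follows. The two-dimensional feature (normalised spatial-spectrum power near the near-end and far-end directions) and all model parameters are hypothetical values chosen for illustration; in practice they would be estimated from training data. The optional `far_end_active` argument shows one way the additional far-end activity information could narrow the candidates.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Hypothetical (mean, variance) per state for the feature
# (power near near-end direction, power near far-end direction).
MODELS = {
    "silence":     ((0.1, 0.1), (0.02, 0.02)),
    "near-end":    ((0.9, 0.1), (0.02, 0.02)),
    "far-end":     ((0.1, 0.9), (0.02, 0.02)),
    "double-talk": ((0.9, 0.9), (0.02, 0.02)),
}


def classify(x, far_end_active=None, models=MODELS):
    """Return the state whose model gives x the highest likelihood."""
    if far_end_active is None:
        candidates = list(models)
    elif far_end_active:
        candidates = ["far-end", "double-talk"]
    else:
        candidates = ["silence", "near-end"]
    return max(candidates, key=lambda s: log_gaussian(x, *models[s]))


state = classify((0.85, 0.12))
```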

【００４５】[0045]

When the detector according to the invention is used in hands-free applications in a car, the transient detector can be modified to take the direction of the final reverberation of the signal into account. In that case transient detection can be improved by attenuating several estimated arrival directions of the loudspeaker signal instead of a single one.

【００４６】[0046]

The advantages of the spatial speech activity detector according to the invention over conventional methods are that it can recognize both double-talk situations and near-end speech activity, and that it is fast and reliable. Because it is based on the arrival direction of the speech signal, the detector according to the invention is inherently very reliable. Differences between the power levels of the speech signals have little effect on the result, and the detector also recognizes a near-end speech signal whose power is considerably lower than that of the loudspeaker signal. In addition, the decision is not affected by the operation of separate devices, such as an adaptive echo canceller. Double-talk detectors often use a threshold level that depends on the speech signal and the ambient noise level, and the presence of double talk is decided on the basis of that threshold. The parameters of this detector are for the most part constant, so that problem does not arise. An optional transient detector can be used to speed up recognition.

【００４７】[0047]

In any case, hands-free equipment already performs many of the operations that the spatial detector according to the invention requires, such as far-end speech activity detection and ambient noise estimation, so computations that have already been carried out can be reused by the detector according to the invention.

【００４８】[0048]

The detector according to the invention can be used in hands-free equipment, for example in a car kit for a mobile phone or in the hands-free equipment of a car phone (for example as part of the echo canceller and transmission logic). The invention is also suitable for so-called hands-free telephones, in which the hands-free equipment is built into the telephone.

【００４９】[0049]

FIG. 6 shows, as an example, a mobile station according to the invention in which the spatial near-end speech/double-talk detector 80 according to the invention is used. The speech signal to be transmitted, coming from the microphone vector 2, is sampled in the A/D converter 20, after which the baseband signal is processed (for example speech coding, channel coding and interleaving), mixed to radio frequency and modulated, and transmitted in block TX. From block TX the signal passes through the duplex filter DPLX and the antenna ANT to the air interface. The detector 80 can be used, for example, to control an echo canceller or to control the transmitter TX in discontinuous transmission. On the receiving side, the usual operations of the receiver branch RX are performed, such as demodulation, de-interleaving, channel decoding and speech decoding, after which far-end speech activity is detected in detector 6 and the signal is converted to analog form in the D/A converter 23 and reproduced by the loudspeaker 7. The invention can also be implemented in a separate hands-free unit by placing blocks 2, 7, 20, 23 and 80 of FIG. 6 in a separate unit that has connections to the mobile station for the input, output and control signals (30, 50, near-end VAD, DT). The invention can further be used in conference call equipment with one or more microphones and loudspeakers on a table, or in connection with a computer used for calls over the Internet, in which case the microphone and loudspeaker can be built into, for example, a video display unit. The invention is thus suitable for all kinds of hands-free equipment.

【００５０】[0050]

The implementation and embodiments of the invention have been described above by way of examples. It will be evident to a person skilled in the art that the invention is not limited to the details of the embodiments presented above and that it can be implemented in other embodiments without departing from its characteristics. The disclosed embodiments should be regarded as illustrative and not restrictive. The possibilities for implementing and using the invention are therefore limited only by the appended claims. Accordingly, the different embodiments of the invention specified by the claims, including equivalent embodiments, fall within the scope of the invention.

[Brief description of the drawings]

FIG. 1 is a block diagram of a conventional echo canceller.

FIG. 2 is a block diagram of a detector according to the invention.

FIG. 3 is a diagram of the spatial spectrum of the microphone vector signal.

FIG. 4 shows the placement of the microphones and the loudspeaker in a car.

FIG. 5 shows the update coefficient used in estimating the arrival direction, as a function of distance (in degrees).

FIG. 6 shows a mobile station according to the invention.

FIG. 7 shows the estimation of the arrival direction in the form of a flowchart.

FIG. 8 shows the transitions between the different states in an alternative embodiment.

FIG. 9 shows labelled training data.

[Explanation of symbols]

2: microphone vector
3: near-end signal
5: far-end signal
6: far-end speech activity detector
7: loudspeaker
14: band-pass filter
15: arrival direction estimator
17: arrival direction determination block
18: double-talk detector
TD: transient detector

Claims

[Claims]

1. An apparatus for detecting an audio source, comprising microphone means (2; 2a, 2b, 2M) for receiving an audio signal and means for detecting audio from the received audio signal, characterized in that it comprises: means (15, 17) for determining the direction of arrival of the received signal; means (17) for storing an estimated direction of arrival of sound from a particular source; means (18) for comparing the direction of arrival of the received signal with the estimated direction of arrival; and means (18) for indicating that the audio source is the particular source if the comparison shows that the direction of arrival of the received signal matches the estimated direction of arrival within a certain tolerance.

2. The apparatus according to claim 1, characterized in that the microphone means (2; 2a, 2b, 2M) comprises M microphones (2a, 2b), where M is an integer, arranged to produce M microphone signals as output, and that the apparatus comprises means (15) for forming a spatial spectrum on the basis of the microphone signals and for determining the direction of arrival from the spectrum on the basis of the peaks occurring in it.

3. The apparatus according to claim 2, characterized in that it comprises means (15) for calculating the derivative of the spectral curve and for determining the direction of arrival by returning the zero points of the derivative at which the derivative changes from positive to negative.

4. The apparatus according to claim 2, characterized in that it comprises means (7) for reproducing sound, located in a certain direction relative to the microphone means (2; 2a, 2b, 2M); that the means (17) for storing an estimated direction of arrival are arranged to store estimated directions of arrival of sound from at least two different sources, the first source being the user of the apparatus and the second source being the sound reproducing means (7), the first estimated direction of arrival being that of sound from the first source and the second estimated direction of arrival being that of sound from the second source; and that the means (18) for detecting the audio source are arranged to indicate that the user of the apparatus is the audio source when the comparison shows that the direction of arrival of the received signal is closer to the first estimated direction of arrival than to the second estimated direction of arrival.

5. The apparatus according to claim 4, characterized in that the means for detecting an audio source are arranged to indicate a situation in which sound is received simultaneously from the first source and the second source.

6. The apparatus according to claim 4 or 5, characterized in that it comprises means for two-way speech transmission, the sound from the first source being near-end speech to be transmitted and the sound from the second source being received far-end speech that is arranged to be reproduced by the sound reproducing means (7).

7. A method for detecting an audio source, in which an audio signal is received and audio is detected from the received audio signal, characterized in that the direction of arrival of the received signal is determined, an estimated direction of arrival of sound from a particular source is stored, the direction of arrival of the received signal is compared with the estimated direction of arrival, and, if the comparison shows that the direction of arrival of the received signal matches the estimated direction of arrival within a certain tolerance, the audio source is indicated to be the particular source.

8. The method according to claim 7, characterized in that the audio signal is received with M microphones (2a, 2b), where M is an integer, M microphone signals are provided as the outputs of the microphones, a spatial spectrum over the directions of arrival is formed on the basis of the microphone signals, and the direction of arrival is determined from the spectrum on the basis of the peaks occurring in it.

9. The method according to claim 7, characterized in that the direction of arrival is determined by calculating the derivative of the spectral curve and returning the zero points of the derivative at which the derivative changes from positive to negative.

10. The method according to claim 8, characterized in that, for each peak in the spatial spectrum of the source, a parameter (fdoa) describing the direction of arrival, a parameter (pdoa) describing the probability of occurrence of that direction of arrival, and a parameter (powdoa) describing the strength of the sound from the source are determined, and the direction of arrival of the source is determined by averaging each parameter (fdoa, pdoa, powdoa) individually.