JP5611119B2

JP5611119B2 - Acoustic simulator, acoustic consulting apparatus, and processing method thereof

Info

Publication number: JP5611119B2
Application number: JP2011112427A
Authority: JP
Inventors: 真人戸上; 貴志住吉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-05-19
Filing date: 2011-05-19
Publication date: 2014-10-22
Anticipated expiration: 2031-05-19
Also published as: JP2012242597A

Description

The present invention relates to a simulation technique for supporting the appropriate installation of acoustic equipment according to the usage environment, and to a consulting technique that uses the simulation results.

In recent years, video conference systems have become widespread. A video conference system enables two-way communication by voice and video even between remote locations.

In such a system, speech is picked up by a microphone, transmitted to the far end, and reproduced through a speaker. However, various kinds of noise can be mixed into the speech picked up by the microphone, and this noise degrades the quality of the speech reproduced from the speaker.

In general, ambient noise (including the fan noise generated by air conditioners and projectors) and acoustic echo tend to be the main problems. Acoustic echo occurs when sound reproduced from a speaker installed in the same room as the microphone is picked up by that microphone.

If the sound picked up by the microphone is transmitted to the far end without any processing (that is, if the acoustic echo is transmitted as well), the far-end talker hears his or her own voice come back with a delay, which greatly impairs the ease of conversation. For this reason, acoustic signal processing systems with functions for removing ambient noise and acoustic echo have been proposed (see, for example, Non-Patent Document 1).

In addition to ambient noise and acoustic echo, the reverberation of the talker's own voice is also a kind of noise, so suppression of this reverberation component is also required. However, the reverberation component is highly correlated with the talker's voice, so removing it may inadvertently suppress the voice itself. Early reflections are particularly difficult to remove. Nevertheless, techniques have been developed in recent years for removing the late reverberation component, which arrives at the microphone several hundred milliseconds later than the sound travelling directly from the talker's mouth to the microphone (see, for example, Non-Patent Document 2).

Solving these problems requires tuning and optimization before the video conference system is put into use. This work often relies on acoustic characteristics recorded at the actual site. For example, the acoustic environment is regarded as a kind of system, and the relationship between the input sound entering the environment (the sound at the instant it leaves the talker's mouth or a speaker) and the output sound leaving the system (the sound arriving at the microphone) is measured as an impulse response and used for tuning and optimization. A TSP (Time Stretched Pulse) signal, for example, is used to measure the impulse response.

The tuning and optimization techniques proposed so far mainly change the parameters used in acoustic signal processing; the recorded impulse response is used to simulate speech uttered in the target environment, and the simulated speech is used for evaluation.
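As an illustration of how a recorded impulse response can be used to simulate speech in the target environment, the following minimal sketch convolves a dry (reverberation-free) signal with an impulse response. The function name `convolve` and the numeric values are assumptions made for this example and do not come from the specification; a practical implementation would typically use FFT-based convolution on measured data.

```python
def convolve(dry, impulse_response):
    """Discrete convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Toy impulse response (illustrative values, not measured data):
# direct sound, then one weaker reflection arriving two samples later.
impulse_response = [1.0, 0.0, 0.5]

# Toy "dry" signal standing in for speech at the talker's mouth.
dry_speech = [1.0, 2.0, 3.0]

# Signal as it would arrive at the microphone in this toy room.
simulated = convolve(dry_speech, impulse_response)
```

The output is longer than the input by the length of the impulse response minus one sample, which reflects the reverberant tail that continues after the dry signal has ended.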

Incidentally, for acoustic design at the design or construction stage of concert halls and other buildings, systems have been developed that simulate the acoustic environment from CAD data of the target building, so that how sound will be heard can be known in advance (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 2846162

Non-Patent Document 1: Masato Togami et al., "Remote conference system with a tabletop sudden-sound removal function using a vertically arranged microphone array", IEICE Transactions D, Vol. J93-D, No. 10, pp. 2069-2084, October 2010.
Non-Patent Document 2: K. Kinoshita, M. Delcroix, T. Nakatani and M. Miyoshi, "Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction", IEEE Transactions on Audio, Speech and Language Processing, 17(4), pp. 534-545, 2009.

However, the acoustic signal processing techniques described above are easily affected by the room environment (acoustic conditions). Specifically, they are sensitive to the reverberation time, the type and location of noise sources, the positional relationship between the speaker and the microphone, the playback volume of the speaker, and so on.

Consequently, tuning and optimizing a video conference system or the like requires not only tuning the parameters used in acoustic signal processing, but also optimizing the positional relationship between microphones and speakers and the number of microphones according to the usage environment.

Moreover, existing acoustic simulators require that CAD data of the site be obtained in advance; when no CAD data exists, the acoustic characteristics cannot be simulated at all.

Having studied these technical problems intensively, the present inventors devised a simulation technique that can optimize the positional relationship between microphones and speakers and the like according to the usage environment, even where CAD data does not exist or cannot be used, and an acoustic consulting technique that uses the results of that processing.

The acoustic simulator according to the present invention estimates the acoustic characteristics at the installation position of a microphone on the basis of acoustic data actually recorded in the space in which the acoustic system is to be built, measured position information of the microphones and speakers used when recording that acoustic data, and information such as the positions of the sound sources, microphones, and speakers assumed after the system is built.

The acoustic consulting apparatus according to the present invention evaluates whether the acoustic characteristics estimated by the acoustic simulator satisfy a predetermined performance, and presents the evaluation result to the user.

According to the present invention, construction of an acoustic environment suited to a space can be supported even when CAD data for the space in which the acoustic system is to be built does not exist or cannot be used.
Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.

A diagram illustrating the installation environment of the video conference system according to the embodiment.
A diagram showing the hardware configuration of the acoustic consulting apparatus (acoustic simulator) according to the embodiment.
A flowchart showing the processing procedure executed by the acoustic simulator according to the embodiment.
A diagram showing an example of the microphone information table.
A diagram showing an example of the speaker information table.
A diagram showing an example of the table of assumed talker positions.
A diagram showing a configuration example of the acoustic characteristic measurement unit.
A diagram showing a configuration example of the ambient noise measurement unit.
A diagram showing a configuration example of the acoustic simulation unit.
A diagram illustrating the direct sound component and the reverberation component of an impulse response.
A flowchart showing the processing procedure executed by the acoustic consulting apparatus according to the embodiment.
A diagram showing an example of the desired performance table.
A diagram showing a configuration example of the performance evaluation unit.
A flowchart illustrating the processing procedure executed when the number and placement of microphones and the placement of speakers are optimized automatically.

Embodiments of the present invention are described below with reference to the drawings. The embodiments of the invention are not limited to the examples described below, and various modifications are possible within the scope of its technical idea.

The following describes the mechanism of an acoustic consulting apparatus suitable for building a video conference system in a space for which no CAD data exists. FIG. 1 shows an example of the installation environment of a video conference system to which the acoustic consulting apparatus is applied. In this specification, the term "video conference system" (literally, "television conference system") also covers video conferencing systems and Web conferencing systems.

(Components and arrangement of the video conference system)
In this embodiment, the video conference system is assumed to be a general-purpose video conference system equipped with a microphone and a speaker, although it may also be a video conference system optimized for a specific application.

There are no particular restrictions on the video conference system installation environment 101, as long as it is a space (environment) in which a video conference system is built. A conference room is assumed here. In this embodiment, a conference microphone 104 is installed on a desk 102 placed in the conference room, and a conference speaker 103 is placed in the same room. The conference speaker 103 and the conference microphone 104 may be permanently connected to the general-purpose video conference system.

The video conference system of FIG. 1 assumes two talkers. In FIG. 1, the assumed positions of the talkers are shown as assumed talker positions 105-1 and 105-2. The system may, however, have a single talker or three or more talkers. Likewise, the system is not limited to one conference speaker 103 and one conference microphone 104; there may be several of each.

In addition to the conference speaker 103, the conference microphone 104, and the assumed talker positions 105-1 and 105-2, which define the acoustic conditions when the video conference system is in use, FIG. 1 shows four measurement microphone arrays 106 and two measurement speaker arrays 107 used when actually measuring the acoustic characteristics.

In this embodiment, the measurement microphone arrays 106 and the measurement speaker arrays 107 are placed by the user performing the measurement when the acoustic characteristics of the video conference system installation environment 101 are measured. In FIG. 1, the measurement microphone arrays 106 are placed at the four corners of the desk 102, and the measurement speaker arrays 107 are placed behind the assumed talker positions 105-1 and 105-2.

The measurement speaker array 107 is used to emit the reference sound when measuring the acoustic characteristics. In this embodiment, the measurement speaker array 107 is a set of multiple speakers, but it may consist of a single speaker.

In FIG. 1, multiple measurement microphone arrays 106 and multiple measurement speaker arrays 107 are placed, but a single one of each may be used instead. The measurement microphone arrays 106 and the measurement speaker arrays 107 are removed from the conference room after the acoustic consulting apparatus has measured the acoustic characteristics and output the optimal conditions. Some of them may, however, double as the conference speaker 103 or the conference microphone 104.

(Hardware configuration of the acoustic consulting apparatus)
FIG. 2 shows the hardware configuration of the acoustic consulting apparatus according to the embodiment. The acoustic simulation apparatus is realized as part of the functions of the acoustic consulting apparatus, so the two share the same hardware configuration. The description below treats it as the hardware configuration of the acoustic consulting apparatus.

The acoustic consulting apparatus according to the embodiment is configured by connecting the measurement microphone array 106 and the measurement speaker array 107 to a computer.

The acoustic signal captured by the measurement microphone array 106 is converted from analog to digital by a multi-channel AD (analog-to-digital) converter 202, and the resulting digital signal is supplied to a central processing unit 203.

The central processing unit 203 executes various programs. In this embodiment, it executes, among others, processing that simulates the acoustic characteristics obtained when the conference microphone 104 and the conference speaker 103 are placed at arbitrary positions in the conference room, and processing that evaluates the results. In this specification, the program that realizes these processing functions is called the "acoustic signal processing program". The acoustic signal processing program is stored in a nonvolatile memory 204 and is read by the central processing unit 203 as needed. The working memory for executing the program is allocated in a volatile memory 205.

As described above, the acoustic consulting apparatus according to the embodiment actually measures the acoustic characteristics of the video conference system installation environment 101 with the measurement microphone array 106 and the measurement speaker array 107, and uses the measurements as the basic data for acoustic simulation. The acoustic characteristics here are the impulse response characteristics and the ambient noise.

When measuring the acoustic characteristics, the central processing unit 203 sends a reference signal for impulse response measurement, as a digital signal, to a multi-channel DA (digital-to-analog) converter 206. The multi-channel DA converter 206 converts the reference signal into an analog signal and outputs it to the measurement speaker array 107, which radiates the corresponding sound into the video conference system installation environment 101.

A mouse 208 and a keyboard 209 are provided as user interfaces to the central processing unit 203, and the user enters information into the central processing unit 203 through them. The results of the acoustic simulation are shown on a display 210, where the user can visually check the simulation and evaluation results.

(Processing as an acoustic simulator)
First, the basic function (acoustic simulation function) of the acoustic consulting apparatus according to this embodiment is described. The acoustic simulation function simulates the acoustic characteristics that would be recorded if the conference speaker 103 and the conference microphone 104 were virtually placed at arbitrary positions in the video conference system installation environment 101.

This embodiment, however, presupposes that no CAD data exists for the video conference system installation environment 101. The acoustic consulting apparatus operating as an acoustic simulator therefore measures the acoustic characteristics of the installation environment 101 with the measurement microphone array 106 and the measurement speaker array 107, and uses the measurement results to compute the acoustic characteristics at the virtual positions. In the following, the entity that provides the acoustic simulation function is called the acoustic simulator.

FIG. 3 outlines the processing procedure of the acoustic simulator. In processes 301 to 303, the acoustic simulator registers the position information of the conference microphone 104 and the conference speaker 103 installed in the video conference system installation environment 101, and the assumed talker positions 105-1 and 105-2. The execution order of processes 301 to 303 is only an example; they may be executed in any order.

In process 301, the acoustic simulator registers information about the conference microphone 104. The information is entered through the mouse 208, the keyboard 209, or another input device so that the central processing unit 203 can process it. Because no CAD data exists, the registration (setting) is done by hand. The same applies to registering the position information of the other acoustic equipment.

The information about the conference microphone 104 includes its installation location within the video conference system installation environment 101, its directivity, its orientation, and so on.

FIG. 4 shows an example of registered information for the conference microphone 104. As described later, the registration items shown in FIG. 4 are the same for the measurement microphones. However, the conference information and the measurement information are managed in separate tables.

Each row in FIG. 4 corresponds to the information for one microphone. FIG. 4 assumes that three microphones are used. Each row is assigned a microphone ID that uniquely identifies the microphone, and stores the microphone's three-dimensional position (x, y, z) and orientation. Any system of units may be used as long as it is applied consistently. The coordinate system is absolute, and a single coordinate system is assumed to be used throughout each video conference system installation environment 101.
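The row structure described above could be held in memory along the following lines. This is only a sketch: the class name `MicrophoneEntry`, the field names, the choice of a single azimuth angle in degrees for the orientation, the string label for the directivity, and the coordinate values are all assumptions made for illustration, since the specification does not fix a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicrophoneEntry:
    mic_id: int             # uniquely identifies the microphone
    x: float                # absolute coordinates, in one consistent unit
    y: float
    z: float
    orientation_deg: float  # assumption: azimuth of the front direction
    directivity: str        # assumption: label such as "omni" or "cardioid"


def build_table(entries):
    """Index entries by microphone ID and reject duplicate IDs."""
    table = {}
    for entry in entries:
        if entry.mic_id in table:
            raise ValueError(f"duplicate microphone ID: {entry.mic_id}")
        table[entry.mic_id] = entry
    return table


# Three conference microphones, mirroring the three rows assumed in FIG. 4.
# The values themselves are hypothetical.
conference_mics = build_table([
    MicrophoneEntry(1, 1.0, 2.0, 0.75, 0.0, "omni"),
    MicrophoneEntry(2, 1.5, 2.0, 0.75, 90.0, "cardioid"),
    MicrophoneEntry(3, 2.0, 2.0, 0.75, 180.0, "cardioid"),
])
```

Keeping the conference microphones and the measurement microphones in two separate tables built this way matches the separate-table management described in the text.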

If the conference microphone 104 has not yet been installed, the coordinates of its planned location may be entered. Measured values, for example, are entered as the coordinates specifying the location of the conference microphone 104. If information about the installation position is available for reference, it may be entered by hand. In this embodiment, the conference microphone 104 is already installed and is used as the reference point when measuring the installation positions of the measurement microphone arrays 106 and the measurement speaker arrays 107.

Each row also holds information on the microphone's directivity. The directivity information uniquely determines the sound pressure level at each azimuth angle relative to the front. Directivity can generally be obtained from the microphone's catalog or similar documentation.
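To make concrete how directivity information can determine a level for each azimuth angle, the sketch below uses the textbook first-order cardioid pattern, whose gain is 0.5 × (1 + cos θ) for an angle θ measured from the front. The cardioid is an assumption chosen only as a familiar example; the specification does not say which patterns are registered, and a real device's catalog data may differ. The function names and the -120 dB floor are likewise hypothetical.

```python
import math


def cardioid_gain(angle_from_front_deg):
    """Linear gain of an ideal cardioid at the given angle from the front."""
    theta = math.radians(angle_from_front_deg)
    return 0.5 * (1.0 + math.cos(theta))


def relative_level_db(gain, floor_db=-120.0):
    """Level relative to the on-axis response, clamped to a floor value."""
    if gain <= 0.0:
        return floor_db
    return max(20.0 * math.log10(gain), floor_db)


front_db = relative_level_db(cardioid_gain(0.0))    # on-axis reference
side_db = relative_level_db(cardioid_gain(90.0))    # about 6 dB below on-axis
rear_db = relative_level_db(cardioid_gain(180.0))   # null, clamped to floor
```

The same idea applies to a tabulated pattern taken from a catalog: the function would then interpolate in a table of angle and level pairs instead of evaluating a formula.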

In process 302, the acoustic simulator registers information about the conference speaker 103. This information, too, is entered through the mouse 208, the keyboard 209, or another input device so that the central processing unit 203 can process it.

The information about the conference speaker 103 includes its installation location within the video conference system installation environment 101, its radiation characteristics, its orientation, and so on.

FIG. 5 shows an example of registered information for the conference speaker 103. As described later, the registration items shown in FIG. 5 can also be used to register information about the measurement speaker arrays. However, the conference information and the measurement information are managed in separate tables.

Each row in FIG. 5 corresponds to the information for one speaker. FIG. 5 assumes that three speakers are used. Each row is assigned a speaker ID that uniquely identifies the speaker, and stores the speaker's three-dimensional position (x, y, z) and orientation. Any system of units may be used, but the coordinate system is the same as that used for the microphones.

If the conference speaker 103 has not yet been installed, the coordinates of its planned location may be entered. Measured values, for example, are entered as the coordinates specifying the location of the conference speaker 103. If information about the installation position is available for reference, it may be entered by hand. If the conference speaker 103 is already installed, its position may be used as the reference point when measuring the installation positions of the measurement microphone arrays 106 and the measurement speaker arrays 107.

Each row also holds information on the speaker's radiation characteristics. The radiation characteristic information uniquely determines the sound pressure level at each azimuth angle relative to the front. Radiation characteristics can generally be obtained from the speaker's catalog or similar documentation.

In process 303, the acoustic simulator registers information about the assumed talker positions. An assumed talker position is information specifying the range assumed as the seating position of a video conference participant. This information, too, is entered through the mouse 208, the keyboard 209, or another input device so that the central processing unit 203 can process it.

FIG. 6 shows an example of registered assumed talker positions. Each row in FIG. 6 corresponds to one assumed talker position; if there are multiple assumed talker positions, multiple rows are registered. FIG. 6 shows the case of three assumed talker positions. Each row is assigned an assumed talker position ID that uniquely identifies the position, and stores the three-dimensional position (x, y, z) of the center of the assumed talker position and a radius R that defines the range around that center. The coordinate system is absolute and is the same as that used for the microphones.
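A center position plus a radius R can be represented and queried as sketched below. The sketch interprets the range as a sphere of radius R around the center, which is an assumption (the text only says that R defines a range around the center); the class name, field names, and coordinate values are also hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumedTalkerPosition:
    position_id: int  # uniquely identifies the assumed talker position
    x: float          # center, in the same absolute coordinates as the mics
    y: float
    z: float
    radius: float     # R: extent of the range around the center

    def contains(self, point):
        """True if the (x, y, z) point lies within R of the center."""
        return math.dist((self.x, self.y, self.z), point) <= self.radius


# Hypothetical seat: mouth height 1.2, talker may move up to 0.5 from center.
seat = AssumedTalkerPosition(1, 1.0, 2.0, 1.2, 0.5)

inside = seat.contains((1.3, 2.0, 1.2))   # 0.3 from the center
outside = seat.contains((2.0, 2.0, 1.2))  # 1.0 from the center
```

A check of this kind would let a simulation sample candidate source points only within the range in which a participant is expected to sit.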

Next, in processes 304 and 305, the acoustic simulator sets the position information and other details of the measuring equipment (acoustic equipment) used to measure the acoustic characteristics of the video conference system installation environment 101. The order of processes 304 and 305 is only an example; either may be executed first.

In process 304, the acoustic simulator registers information about the measurement microphone arrays 106. The information is entered through the mouse 208, the keyboard 209, or another input device so that the central processing unit 203 can process it.

The information about the measurement microphone arrays 106 includes their installation locations within the video conference system installation environment 101, their directivity, their orientation, and so on. As described above, the information about the measurement microphone arrays 106 is recorded in a table separate from that for the conference microphone 104. The coordinate system is absolute and is the same as that used for the conference microphone. The coordinates of a measurement microphone array 106 may also be entered as a position relative to a reference point set in the video conference system installation environment 101 (for example, the conference microphone 104). When this input method is used, the central processing unit 203 converts the relative position into absolute coordinates.
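The conversion from a position entered relative to a reference point into absolute coordinates can be sketched as a simple vector addition. This assumes that the relative offsets are measured along the same axes as the absolute coordinate system (a pure translation with no rotation), which the specification does not state explicitly; the function name and the numeric values are hypothetical.

```python
def to_absolute(reference_xyz, relative_xyz):
    """Add a relative offset to the reference point's absolute position.

    Assumes the offset axes are aligned with the absolute axes.
    """
    return tuple(r + d for r, d in zip(reference_xyz, relative_xyz))


# Hypothetical reference point: the conference microphone on the desk.
conference_mic_xyz = (2.0, 1.5, 0.75)

# A measurement microphone array measured 0.5 to one side of the reference
# point and 0.5 toward the front of the desk, at the same height.
array_xyz = to_absolute(conference_mic_xyz, (0.5, -0.5, 0.0))
```

If the offsets were measured in a frame rotated with respect to the absolute axes, a rotation would have to be applied before the addition.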

In process 305, the acoustic simulator registers information about the measurement speaker arrays 107. The information is entered through the mouse 208, the keyboard 209, or another input device so that the central processing unit 203 can process it.

測定用スピーカアレイ１０７の情報には、テレビ会議システム設置環境１０１内の設置場所、スピーカの放射特性、スピーカの向き等が含まれる。前述の通り、測定用スピーカアレイ１０７に関する情報は、会議用スピーカ１０３に関する情報とは別のテーブルに記録される。なお、座標系は絶対座標であり、マイクロホンと同じ座標系を使用する。やはり、測定用スピーカアレイ１０７の座標値は、テレビ会議システム設置環境１０１内に設定された基準点（例えば会議用マイクロホン１０４）に対する相対的な位置情報として入力しても良い。この入力手法を採用する場合、中央演算装置２０３によって絶対座標に変換する処理が実行される。 The information of the measurement speaker array 107 includes the installation location in the video conference system installation environment 101, the radiation characteristics of the speaker, the direction of the speaker, and the like. As described above, the information related to the measurement speaker array 107 is recorded in a table different from the information related to the conference speaker 103. Note that the coordinate system is absolute and uses the same coordinate system as the microphone. Similarly, the coordinate values of the measurement speaker array 107 may be input as relative position information with respect to a reference point (for example, the conference microphone 104) set in the video conference system installation environment 101. When this input method is adopted, the central processing unit 203 executes processing for conversion to absolute coordinates.
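The relative-to-absolute conversion mentioned above can be sketched as a plain translation. This is only an illustration: it assumes the relative coordinates use axes parallel to the absolute coordinate system, and the names `to_absolute`, `conference_mic_abs`, and `speaker_array_rel` are hypothetical, not taken from the patent.

```python
from typing import Tuple

Point = Tuple[float, float, float]

def to_absolute(reference_abs: Point, relative: Point) -> Point:
    """Translate a position given relative to a reference point into absolute coordinates.

    Assumption: both coordinate systems share the same axis directions,
    so the conversion is a pure offset with no rotation.
    """
    return tuple(r + d for r, d in zip(reference_abs, relative))

# Hypothetical values in metres: the reference point is the conference microphone.
conference_mic_abs = (2.0, 1.5, 0.75)
speaker_array_rel = (1.0, -0.5, 0.25)
speaker_array_abs = to_absolute(conference_mic_abs, speaker_array_rel)
```

If the reference device were rotated relative to the room axes, a rotation would have to be applied before the offset; the text does not say whether that case is handled.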

処理３０６において、音響シミュレータは、テレビ会議システム設置環境１０１内に設置された測定用マイクロホンアレイ１０６と測定用スピーカアレイ１０７を用い、テレビ会議システム設定環境１０１に固有の音響特性を計測する。処理３０６において、音響シミュレータは、測定用スピーカアレイ１０７と測定用マイクロホンアレイ１０６間の伝達特性（インパルス応答）の測定と周囲雑音の測定を実行する。 In process 306, the acoustic simulator uses the measurement microphone array 106 and the measurement speaker array 107 installed in the video conference system installation environment 101 to measure the acoustic characteristics specific to that environment. Specifically, it measures the transfer characteristic (impulse response) between the measurement speaker array 107 and the measurement microphone array 106, and it also measures the ambient noise.

図７に、処理３０６に対応する処理機能を実現するプログラムの機能ブロック構成を示す。以下の説明では、当該プログラムを音響特性計測部７０１と呼ぶ。音響特性計測部７０１は、インパルス応答測定部７０２と周囲雑音測定部７０３で構成される。 FIG. 7 shows a functional block configuration of a program that realizes a processing function corresponding to the processing 306. In the following description, the program is referred to as an acoustic characteristic measurement unit 701. The acoustic characteristic measurement unit 701 includes an impulse response measurement unit 702 and an ambient noise measurement unit 703.

インパルス応答測定部７０２は、テレビ会議システム設定環境１０１におけるインパルス応答を、例えばＴＳＰ法（例えば特許文献１参照）を用いて測定する。この他、測定用スピーカアレイ１０７から白色雑音などの全周波数成分を含んだ音を放射して測定用マイクロホンアレイ１０６で収録し、マイクロホンで収録された信号と放射音の原信号の相関係数を調べることでインパルス応答を測定しても良い。 The impulse response measurement unit 702 measures the impulse response in the video conference system setting environment 101 using, for example, the TSP method (see, for example, Patent Document 1). Alternatively, a sound containing all frequency components, such as white noise, may be radiated from the measurement speaker array 107 and recorded by the measurement microphone array 106, and the impulse response may be obtained by examining the correlation coefficient between the recorded signal and the original radiated signal.
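The white-noise alternative can be illustrated with a small cross-correlation sketch. It is not the patent's implementation: it assumes a single loudspeaker and a single microphone, an ideal white excitation, and no ambient noise, and `estimate_impulse_response` and `simulate_room` are hypothetical helper names. The "room" here is a toy convolution used only to check the estimator.

```python
import random

def estimate_impulse_response(excitation, recorded, ir_length):
    """Estimate h[k] as the cross-correlation of recorded and excitation at lag k,
    normalized by the excitation power (valid when the excitation is white)."""
    n = len(excitation)
    power = sum(x * x for x in excitation) / n
    h = []
    for k in range(ir_length):
        acc = sum(recorded[i + k] * excitation[i] for i in range(n - k))
        h.append(acc / ((n - k) * power))
    return h

def simulate_room(excitation, true_h):
    """Toy room: convolve the excitation with a known impulse response (check only)."""
    out = []
    for i in range(len(excitation)):
        out.append(sum(true_h[j] * excitation[i - j]
                       for j in range(len(true_h)) if i - j >= 0))
    return out

rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(20000)]
true_h = [0.0, 1.0, 0.5, 0.0, -0.25]
estimate = estimate_impulse_response(white, simulate_room(white, true_h), len(true_h))
```

The estimate converges to the true response as the excitation gets longer; in a real room the excitation would need to be much longer than the reverberation time.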

図７に示すように、インパルス応答の測定時には、測定用マイクロホンアレイ１０６に接続された多チャンネルＡＤ変換装置２０２と、測定用スピーカアレイ１０７に接続された多チャンネルＤＡ変換装置２０６を使用する。 As shown in FIG. 7, when measuring the impulse response, a multi-channel AD converter 202 connected to the measurement microphone array 106 and a multi-channel DA converter 206 connected to the measurement speaker array 107 are used.

多チャンネルＤＡ変換装置２０６は、インパルス応答測定に用いる白色信号やＴＳＰ信号（音響信号Ｓ３）をインパルス応答測定部７０２から入力し、当該音響信号Ｓ３をデジタル信号からアナログ信号に変換する。多チャンネルＡＤ変換装置２０２は、多チャンネルＤＡ変換装置２０６と同期制御され、インパルス応答測定中の音声信号をアナログ信号からデジタル信号（音響信号Ｓ１、Ｓ２）に変換する。変換後のデジタル信号は、インパルス応答測定部７０２及び周囲雑音測定部７０３に与える。 The multi-channel DA converter 206 receives the white-noise or TSP signal used for impulse response measurement (acoustic signal S3) from the impulse response measurement unit 702 and converts it from a digital signal to an analog signal. The multi-channel AD converter 202 is controlled in synchronization with the multi-channel DA converter 206 and converts the audio signal captured during impulse response measurement from an analog signal to a digital signal (acoustic signals S1, S2). The converted digital signals are supplied to the impulse response measurement unit 702 and the ambient noise measurement unit 703.

インパルス応答測定部７０２は、与えられた信号に相関係数推定処理やＴＳＰ（Time Stretched Pulse）逆変換処理を適用し、インパルス応答Ｓ４を得る。これらの処理自体は既知であるため、詳細な説明は省略する。 The impulse response measurement unit 702 applies a correlation coefficient estimation process or a TSP (Time Stretched Pulse) inverse transform process to a given signal to obtain an impulse response S4. Since these processes are already known, detailed description thereof is omitted.

一方、周囲雑音測定部７０３は、与えられた信号からテレビ会議システム設置環境１０１内の周囲雑音を測定する。周囲雑音の収録は、実際のテレビ会議中の雑音にできる限り近い雑音が生じるように、テレビ会議システム設置環境１０１の機器を制御する。例えばテレビ会議システム設置環境１０１に空調機やプロジェクタが配備されている場合、それら機器を動作させた状態で周囲雑音を収録する。勿論、周囲雑音の収録時には、測定用スピーカアレイ１０７から音は出力されない。同じく、周囲雑音の収録時には、話者音も誤って収録されないように注意する。ただし、周囲雑音として紙が擦れる音や卓上をたたく音等を想定する場合には、これらの音が収録中に生じるように収録環境を工夫しても良い。 Meanwhile, the ambient noise measurement unit 703 measures the ambient noise in the video conference system installation environment 101 from the given signal. When ambient noise is recorded, the equipment in the installation environment 101 is controlled so that the noise is as close as possible to the noise present during an actual video conference. For example, if the environment is equipped with an air conditioner or a projector, ambient noise is recorded with those devices running. Naturally, the measurement speaker array 107 outputs no sound while ambient noise is being recorded. Care is likewise taken that talkers' voices are not recorded by mistake. However, if sounds such as rustling paper or tapping on the table are to be treated as ambient noise, the recording conditions may be arranged so that these sounds occur during recording.

図８に、周囲雑音測定部７０３の詳細ブロック構成を示す。周囲雑音測定部７０３は、測定用マイクロホンアレイ１０６で集音された音響信号Ｓ２を音源毎の信号Ｓ１１、Ｓ１２、…、Ｓ１Ｎに分離する音源分離部８０２と、音源毎の信号から各音源の音量と空間的な場所を推定する音源定位部８０３−１、８０３−２、…、８０３−Ｎとで構成される。ここで、音源定位部８０３−１、８０３−２、…、８０３−Ｎは、各音源の音量と音源位置の情報Ｓ５（Ｓ２１、Ｓ２２、…Ｓ２Ｎ）を出力する。 FIG. 8 shows the detailed block configuration of the ambient noise measurement unit 703. The ambient noise measurement unit 703 comprises a sound source separation unit 802, which separates the acoustic signal S2 collected by the measurement microphone array 106 into per-source signals S11, S12, ..., S1N, and sound source localization units 803-1, 803-2, ..., 803-N, which estimate the volume and spatial location of each sound source from its signal. The sound source localization units 803-1, 803-2, ..., 803-N output the volume and position information S5 (S21, S22, ..., S2N) for each sound source.

音源分離部８０２は、独立成分分析や最小分散ビームフォーマ、非負行列分解その他の一般的な音源分離処理技術を用い、複数チャンネルのマイクロホン入力信号を各音源に対応する信号Ｓ１１、Ｓ１２、…、Ｓ１Ｎに分離する。 The sound source separation unit 802 separates the multi-channel microphone input signal into the signals S11, S12, ..., S1N corresponding to the individual sound sources, using independent component analysis, a minimum variance beamformer, non-negative matrix factorization, or another common sound source separation technique.

音源定位部８０３−１、８０３−２、…、８０３−Ｎは、位相差に基づいたSRP-PHAT（Steered Response Power-Phase Transform）方式等を使用して各音源の位置を求める。この他、測定用マイクロホンアレイ１０６を分散的に配置する場合には、マイクロホン間の振幅比から音源位置を推定する方式を用いても良い。 The sound source localization units 803-1, 803-2,..., 803-N obtain the position of each sound source using an SRP-PHAT (Steered Response Power-Phase Transform) method based on the phase difference. In addition, when the measurement microphone arrays 106 are arranged in a distributed manner, a method of estimating the sound source position from the amplitude ratio between the microphones may be used.
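The amplitude-ratio alternative for distributed microphones can be sketched as a grid search. This is an illustrative assumption rather than the patent's method: it supposes free-field propagation in which amplitude falls off as 1/r, so that amplitude times distance is the same constant at every microphone for the true source position. The function name `locate_by_amplitude_ratio` and all positions are hypothetical.

```python
import math

def locate_by_amplitude_ratio(mic_positions, amplitudes, grid):
    """Return the grid point whose predicted 1/r amplitude ratios best match the observations."""
    best, best_err = None, math.inf
    for cand in grid:
        # Clamp the distance so a candidate sitting on a microphone does not divide by zero later.
        dists = [max(math.dist(cand, m), 1e-6) for m in mic_positions]
        # Under a 1/r law, amplitude_i * dist_i is identical for all microphones at the true position.
        products = [a * d for a, d in zip(amplitudes, dists)]
        mean = sum(products) / len(products)
        err = sum((p - mean) ** 2 for p in products) / (mean * mean)
        if err < best_err:
            best, best_err = cand, err
    return best

# Hypothetical 4 m x 3 m room with one microphone in each corner.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
source = (1.0, 2.0)
observed = [1.0 / math.dist(source, m) for m in mics]
grid = [(x * 0.5, y * 0.5) for x in range(9) for y in range(7)]
estimated = locate_by_amplitude_ratio(mics, observed, grid)
```

In a reverberant room the 1/r assumption holds only for the direct sound, so a practical version would have to work on the direct component or accept a coarser estimate.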

処理３０７において、音響シミュレータは、音響シミュレーションのための会議用マイクロホン１０４と会議用スピーカ１０３に関する仮想情報の登録処理を実行する。この処理で登録された情報に基づいて、音響シミュレータは、音響シミュレーションを実行する。ユーザーは、会議用マイクロホン１０４と会議用スピーカ１０３について登録されている情報に対し、仮想値をそれぞれ登録することができる。すなわち、設置位置、向き及び性能等に関する仮想値をそれぞれ登録することができる。例えば、処理３０１及び３０２で登録された位置と向きの情報をそのまま使用し、会議用マイクロホン１０４の指向特性だけを仮想的に変更しても良い。 In the process 307, the acoustic simulator executes a virtual information registration process regarding the conference microphone 104 and the conference speaker 103 for the acoustic simulation. Based on the information registered in this process, the acoustic simulator executes an acoustic simulation. The user can register virtual values for the information registered for the conference microphone 104 and the conference speaker 103. That is, virtual values relating to the installation position, orientation, performance, and the like can be registered. For example, the position and orientation information registered in the processes 301 and 302 may be used as they are, and only the directivity characteristics of the conference microphone 104 may be virtually changed.

この形態例の場合、ユーザーは、これら情報の登録（設定）を、例えばＧＵＩ（Graphical User Interface）を用いて実行する。情報の登録は、数値等の直接入力することにより行っても良いし、予め定義されたリストの中から選択する方式を採用しても良い。 In the case of this embodiment, the user performs registration (setting) of these pieces of information using, for example, a GUI (Graphical User Interface). Information registration may be performed by directly inputting a numerical value or the like, or a method of selecting from a predefined list may be employed.

処理３０８において、音響シミュレータは、テレビ会議システム設定環境１０１について実測された音響特性と、仮想的に設定された会議用マイクロホン１０４や会議用スピーカ１０３に関する情報とに基づいて音響シミュレーションを実行し、シミュレーション結果を出力して処理を終了する。 In process 308, the acoustic simulator executes an acoustic simulation based on the acoustic characteristics actually measured for the video conference system setting environment 101 and the information about the conference microphone 104 and the conference speaker 103 that are virtually set, and the simulation is performed. The result is output and the process is terminated.

図９に、処理３０８に対応する処理機能を実現するプログラムの機能ブロック構成を示す。以下の説明では、当該プログラムを音響シミュレーション部９０１と呼ぶ。音響シミュレーション部９０１は、会議参加者の発話のインパルス応答を推定する機能と、会議用マイクロホン１０４において集音される残留エコー量を推定する機能と、シミュレーション上仮想的に設定されたマイクロホン位置における騒音をシミュレーションする機能を有している。 FIG. 9 shows the functional block configuration of a program that realizes the processing function corresponding to process 308. In the following description, this program is referred to as the acoustic simulation unit 901. The acoustic simulation unit 901 has a function of estimating the impulse response for a conference participant's utterance, a function of estimating the amount of residual echo picked up by the conference microphone 104, and a function of simulating the noise at the microphone position virtually set for the simulation.

音響シミュレーション部９０１は、直接音／残響音分割部９０２、想定話者位置のインパルス応答推定部９０３、残留エコー推定部９０４、想定マイク位置の騒音シミュレーション部９０５で構成される。 The acoustic simulation unit 901 includes a direct sound / reverberation sound division unit 902, an assumed speaker position impulse response estimation unit 903, a residual echo estimation unit 904, and an assumed microphone position noise simulation unit 905.

直接音/残響音分割部９０２は、インパルス応答測定部７０２で測定されたインパルス応答Ｓ４を直接音成分と残響音成分とに分割する。図１０に、インパルス応答Ｓ４の一例を示す。図中上段は、測定用スピーカ１０７と測定用マイクロホン１０６の距離が１ｍの場合に取得されるインパルス応答の波形であり、図中下段は、同距離が３ｍの場合に取得されるインパルス応答の波形である。ここで、横軸は時間であり、縦軸は信号強度である。 The direct sound / reverberation division unit 902 divides the impulse response S4 measured by the impulse response measurement unit 702 into a direct sound component and a reverberation component. FIG. 10 shows an example of the impulse response S4. The upper part of the figure shows the impulse response waveform obtained when the distance between the measurement speaker 107 and the measurement microphone 106 is 1 m, and the lower part shows the waveform obtained when that distance is 3 m. The horizontal axis is time, and the vertical axis is signal intensity.

図に破線で囲んで示すように、インパルス応答の先頭付近に出現する波形が直接音成分に対応し、それ以後に出現する波形が残響成分に対応する。２つの波形を見比べて分かるように、直接音成分は明らかに距離の影響を受けている。直接音成分のピーク値は、１ｍの方が３ｍの場合よりも大きいことが分かる。一方、残響成分については音量が大きくは変化しないことが分かる。 As shown in the figure surrounded by a broken line, the waveform appearing near the head of the impulse response corresponds to the direct sound component, and the waveform appearing thereafter corresponds to the reverberation component. As can be seen by comparing the two waveforms, the direct sound component is clearly affected by the distance. It can be seen that the peak value of the direct sound component is larger at 1 m than at 3 m. On the other hand, it can be seen that the reverberation component does not change greatly.

本例の場合、距離が１ｍと３ｍにおける直接音の比率は約９．５ｄＢ、残響音の比率は約２ｄＢであった。距離が１ｍから３ｍに３倍変化した場合に、約９．５ｄＢだけ音量が小さくなっているので、直接音成分は距離の２乗に反比例して音量が変化していると考えることができる。一方、残響の音量は、距離の変化に対してほぼ無関係に決まると考えられる。 In the case of this example, the ratio of direct sound at a distance of 1 m and 3 m was about 9.5 dB, and the ratio of reverberant sound was about 2 dB. When the distance changes 3 times from 1 m to 3 m, the volume is reduced by about 9.5 dB. Therefore, it can be considered that the volume of the direct sound component changes in inverse proportion to the square of the distance. On the other hand, the volume of reverberation is considered to be determined almost independently of changes in distance.
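As a quick check of the figure quoted above: if the direct sound power is inversely proportional to the square of the distance (amplitude inversely proportional to distance), tripling the distance should lower the level by

```latex
\Delta L = 10\log_{10}\!\left(\frac{r_2}{r_1}\right)^{2}
         = 20\log_{10}\!\left(\frac{3\,\mathrm{m}}{1\,\mathrm{m}}\right)
         \approx 20 \times 0.477
         \approx 9.5\ \mathrm{dB}
```

which agrees with the measured difference of about 9.5 dB. Here r_1 and r_2 are simply labels for the two measurement distances and are not symbols used in the patent.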

まず、直接音／残響音分割部９０２は、インパルス応答Ｓ４の直接音の開始ポイントs_maxを、以下の式を用いて求める。 First, the direct sound / reverberation division unit 902 obtains the start point s_max of the direct sound in the impulse response S4 using the following equation.

なお、直接音の終了ポイントは、s_max+wで与えられる。ここで、wは窓幅であり、固定値に設定する。 The end point of the direct sound is given by s_max + w. Here, w is the window width and is set to a fixed value.

次に、直接音／残響音分割部９０２は、開始ポイントs_maxを用い、インパルス応答の直接音成分h_directを、以下の関係式を用いて求める。 Next, using the start point s_max, the direct sound / reverberation division unit 902 obtains the direct sound component h_direct of the impulse response from the following relational expression.

一方、直接音／残響音分割部９０２は、残響成分h_reverbを、以下の式を用いて求める。 On the other hand, the direct sound / reverberation division unit 902 obtains the reverberation component h_reverb using the following equation.
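The equations for s_max, h_direct, and h_reverb are not reproduced in this text, so the following is only one plausible reading of the surrounding description, not the patent's formulas. It assumes that s_max is the sample with the largest magnitude, that the direct sound occupies the window from s_max up to (but not including) s_max + w, and that everything from s_max + w onward is reverberation. The function name `split_direct_reverb` is hypothetical.

```python
def split_direct_reverb(h, window):
    """Split an impulse response into direct and reverberant parts of equal length.

    Assumptions (not confirmed by the source): s_max is the index of the
    largest-magnitude sample, and the window end s_max + window is exclusive.
    """
    s_max = max(range(len(h)), key=lambda t: abs(h[t]))
    end = s_max + window
    h_direct = [v if s_max <= t < end else 0.0 for t, v in enumerate(h)]
    h_reverb = [v if t >= end else 0.0 for t, v in enumerate(h)]
    return s_max, h_direct, h_reverb

# Hypothetical short impulse response: direct peak at index 2, decaying tail afterwards.
s_max, h_direct, h_reverb = split_direct_reverb(
    [0.0, 0.1, 1.0, 0.6, 0.2, 0.1, 0.05], window=2)
```

Keeping both parts at the original length makes it straightforward to rescale the direct part and add the two back together later.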

想定話者位置のインパルス応答推定部９０３は、インパルス応答の直接音成分h_directと残響成分h_reverbを使用し、想定話者位置からの発話を仮想的に配置された会議用マイクロホンで受音する場合におけるインパルス応答を推定する。なお、インパルス応答推定部９０３には、想定話者位置の情報と想定する会議マイクロホンの情報が与えられている。図９では、これらの情報をＳ４１で示す。インパルス応答h_synthは次式で与えられる。 The assumed speaker position impulse response estimation unit 903 uses the direct sound component h_direct and the reverberation component h_reverb of the impulse response to estimate the impulse response that results when an utterance from the assumed speaker position is picked up by the virtually arranged conference microphone. The impulse response estimation unit 903 is given information on the assumed speaker position and on the assumed conference microphone. In FIG. 9, this information is indicated by S41. The impulse response h_synth is given by the following equation.

ここで、αは直接音成分の減衰率とし、次式で与えられる。 Here, α is the attenuation rate of the direct sound component and is given by the following equation.

ここで、r_preは、インパルス応答を測定した際に用いた測定用スピーカアレイ１０７と測定用マイクロホンアレイ１０６間の距離とする。r_postは、想定話者位置と会議用マイクロホン１０４間の距離とする。想定話者位置は、一点ではなく大きさを持っている。このため、設定された想定話者位置の範囲の中で最も大きなr_postを与えるr_postを設定する。 Here, r_pre is the distance between the measurement speaker array 107 and the measurement microphone array 106 used when the impulse response was measured, and r_post is the distance between the assumed speaker position and the conference microphone 104. The assumed speaker position is not a single point but has an extent. For this reason, r_post is set to the largest distance found within the range of the assumed speaker position.

β_preはインパルス応答の測定に用いた測定用マイクロホンアレイ１０６の指向特性に依存して決まる係数とする。この形態例の場合、測定用マイクロホンアレイ１０６が向いている方向を基準方向とし、当該方向に対する測定用スピーカアレイ１０７の相対的な方向に対応する測定用マイクロホンアレイ１０６の指向特性をβ_preとする。 β_pre is a coefficient determined by the directivity of the measurement microphone array 106 used to measure the impulse response. In this embodiment, the direction in which the measurement microphone array 106 faces is taken as the reference direction, and β_pre is the directivity of the measurement microphone array 106 in the direction of the measurement speaker array 107 relative to that reference direction.

β_postは、仮想的に配置した会議用マイクロホンの指向特性に依存して決まる係数とする。この形態例の場合、会議用マイクロホンの向いた方向を基準方向とし、当該方向に対する想定話者位置の相対的な方向に対応する会議用マイクロホンの指向特性をβ_postとする。 β_post is a coefficient determined by the directivity of the virtually arranged conference microphone. In this embodiment, the direction in which the conference microphone faces is taken as the reference direction, and β_post is the directivity of the conference microphone in the direction of the assumed speaker position relative to that reference direction.

γ_preはインパルス応答の測定に使用した測定用スピーカアレイ１０７の放射特性に依存して決まる係数とする。この形態例の場合、測定用スピーカアレイ１０７の向いた方向を基準方向とし、当該方向に対する測定用マイクロホンアレイ１０６の相対的な方向に対応する測定用スピーカアレイ１０７の放射特性をγ_preとする。 γ_pre is a coefficient determined by the radiation characteristics of the measurement speaker array 107 used to measure the impulse response. In this embodiment, the direction in which the measurement speaker array 107 faces is taken as the reference direction, and γ_pre is the radiation characteristic of the measurement speaker array 107 in the direction of the measurement microphone array 106 relative to that reference direction.

γ_postは仮想的に配置した想定話者位置の放射特性に依存して決まる係数とする。この形態例の場合、想定話者の向いている方向を基準方向とし、当該方向に対する会議用マイクロホンの相対的な方向に対応する想定話者の放射特性をγ_postとする。 γ_post is a coefficient determined by the radiation characteristics at the virtually arranged assumed speaker position. In this embodiment, the direction in which the assumed speaker faces is taken as the reference direction, and γ_post is the radiation characteristic of the assumed speaker in the direction of the conference microphone relative to that reference direction.

一般に、想定話者は、会議室内に設置されたディスプレイを目視できる方向に向いていると考えられる。このため、この形態例の場合、想定話者位置を、ディスプレイの対面位置に設定する。また、想定話者の放射特性は、予めダミーヘッド等で測定し、データベースに保持しておくことが望ましい。 In general, it is considered that an assumed speaker is facing a direction in which a display installed in a conference room can be seen. Therefore, in the case of this embodiment, the assumed speaker position is set to the facing position of the display. Further, it is desirable that the radiation characteristics of the assumed speaker are measured in advance with a dummy head or the like and stored in a database.

想定話者位置のインパルス応答推定部９０３は、想定話者位置毎に生成したインパルス応答h_synthを出力して処理を終了する。測定されたインパルス応答が複数存在する場合、インパルス応答推定部９０３は、r_preとr_postの差が最小となるようなインパルス応答を選択する。 The assumed speaker position impulse response estimation unit 903 outputs the impulse response h_synth generated for each assumed speaker position and ends the process. When there are multiple measured impulse responses, the impulse response estimation unit 903 selects the one that minimizes the difference between r_pre and r_post.
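The equations for h_synth and α are missing from this text. The sketch below shows one form that is consistent with the surrounding description, and it should be read as an assumption: the direct part is rescaled by α and the reverberant part is reused unchanged (since reverberation is described as nearly distance-independent), with α combining a 1/r amplitude law with the ratios of the directivity and radiation coefficients. All function names are hypothetical.

```python
def direct_gain(r_pre, r_post, beta_pre, beta_post, gamma_pre, gamma_post):
    """Assumed form of alpha: distance ratio times the post/pre ratio of the
    microphone directivity (beta) and source radiation (gamma) coefficients."""
    return (r_pre / r_post) * (beta_post * gamma_post) / (beta_pre * gamma_pre)

def synthesize(h_direct, h_reverb, alpha):
    """Assumed form of h_synth: rescaled direct part plus unchanged reverberant part."""
    return [alpha * d + r for d, r in zip(h_direct, h_reverb)]

def pick_measurement(measurements, r_post):
    """Select, from (r_pre, h_direct, h_reverb) tuples, the measurement whose
    r_pre is closest to r_post, as the text describes."""
    return min(measurements, key=lambda m: abs(m[0] - r_post))

# Hypothetical example: omnidirectional devices, talker twice as far as the measured distance.
alpha = direct_gain(1.0, 2.0, 1.0, 1.0, 1.0, 1.0)
h_synth = synthesize([0.0, 1.0, 0.0], [0.0, 0.0, 0.25], alpha)
chosen = pick_measurement([(1.0, [], []), (3.0, [], [])], r_post=2.5)
```

Choosing the measurement with the closest r_pre keeps α near 1, which limits the error introduced by the simple scaling model.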

残留エコー推定部９０４は、仮想的に配置した会議用マイクロホンの位置における残留エコーを推定する。この前処理として、残留エコー推定部９０４は、仮想的に配置した会議用スピーカから仮想的に配置した会議用マイクロホンまでのインパルス応答を、インパルス応答Ｓ４の直接音成分と残響成分に基づいて生成する。この残留エコー推定部９０４によるインパルス応答の生成は、想定話者位置のインパルス応答推定部９０３と同様の処理手順により行う。生成したインパルス応答をh_echoとする。なお、残留エコー推定部９０４には、想定話者位置、想定する会議用マイクロホンと会議用スピーカの情報が与えられている。図９では、これらの情報をＳ４２で示す。 The residual echo estimation unit 904 estimates the residual echo at the position of the virtually arranged conference microphone. As preprocessing, the residual echo estimation unit 904 generates the impulse response from the virtually arranged conference speaker to the virtually arranged conference microphone based on the direct sound component and the reverberation component of the impulse response S4. This impulse response is generated by the same procedure as that used by the assumed speaker position impulse response estimation unit 903. The generated impulse response is denoted h_echo. The residual echo estimation unit 904 is given information on the assumed speaker position, the assumed conference microphone, and the assumed conference speaker. In FIG. 9, this information is indicated by S42.

次に、残留エコー推定部９０４は、残留エコーのインパルス応答h_residualを次式より算出する。 Next, the residual echo estimation unit 904 calculates the impulse response h_residual of the residual echo from the following equation.

ここで、λ及びt_specは、使用するエコーキャンセラの仕様に基づいて決まるパラメータである。例えばエコー消去時間Ｔ秒、２０ｄＢという性能を有するエコーキャンセラの場合、λは0.1、t_specはＴ秒に相当する。これらの情報を図９ではＳ４３で示す。残留エコー推定部９０４は、h_echoとh_residualを出力して処理を終了する。 Here, λ and t_spec are parameters determined by the specifications of the echo canceller used. For example, for an echo canceller rated at an echo cancellation time of T seconds and 20 dB of cancellation, λ is 0.1 and t_spec is T seconds. In FIG. 9, this information is indicated by S43. The residual echo estimation unit 904 outputs h_echo and h_residual and ends the process.
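The value λ = 0.1 follows from reading 20 dB as an amplitude attenuation, since 10^(-20/20) = 0.1. The equation for h_residual itself is not reproduced in this text; the sketch below assumes one simple reading in which the echo canceller attenuates the first t_spec seconds of h_echo by λ and leaves the later part untouched. That reading, the `sample_rate` parameter, and the function names are assumptions for illustration.

```python
def attenuation_from_db(cancellation_db):
    """Convert an echo cancellation figure in dB to a linear amplitude factor."""
    return 10.0 ** (-cancellation_db / 20.0)

def residual_echo(h_echo, lam, t_spec, sample_rate):
    """Assumed model: samples inside the canceller's time span are scaled by lam,
    samples beyond t_spec pass through unchanged."""
    n_spec = int(round(t_spec * sample_rate))
    return [lam * v if t < n_spec else v for t, v in enumerate(h_echo)]

# Hypothetical values: a 2 ms cancellation span at 1 kHz sampling covers two samples.
lam_20db = attenuation_from_db(20.0)
h_residual = residual_echo([1.0, 0.5, 0.25, 0.125], lam=0.5, t_spec=0.002, sample_rate=1000)
```

Under this model, a room whose reverberation outlasts the canceller's time span leaves a residual that no choice of λ can remove, which is why microphone and speaker placement still matter.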

想定マイク位置の騒音シミュレーション部９０５は、音源分離により分離された音源の音量と位置の情報Ｓ２１〜Ｓ２Ｎを利用し、想定マイク位置の騒音レベルP_noiseを推定する。 The assumed microphone position noise simulation unit 905 estimates the noise level P_noise at the assumed microphone position using the volume and position information S21 to S2N of the sound sources obtained by sound source separation.

ここで、Ｎは音源の数とする。P_observed(i)は、音源分離により分離されたi番目の音源の音量とする。r_pre(i)は、i番目の音源位置と音響特性測定用マイクロホンまでの距離とし、r_post(i)は、i番目の音源位置と仮想的に配置した会議用マイクロホンまでの距離とする。なお、図９では、想定する会議用マイクロホンの情報をＳ４４で示す。騒音シミュレーション部９０５は、推定した騒音レベルP_noiseを出力して処理を終了する。 Here, N is the number of sound sources. P_observed(i) is the volume of the i-th sound source obtained by sound source separation. r_pre(i) is the distance from the i-th sound source position to the acoustic characteristic measurement microphone, and r_post(i) is the distance from the i-th sound source position to the virtually arranged conference microphone. In FIG. 9, the information on the assumed conference microphone is indicated by S44. The noise simulation unit 905 outputs the estimated noise level P_noise and ends the process.
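The equation for P_noise is not reproduced in this text. Given the symbols defined above and the inverse-square behaviour described for direct sound, one plausible form is a sum over sources of the observed power rescaled by the squared distance ratio. The sketch below implements that assumed form; it treats P_observed(i) as a linear power rather than a dB value, and the function name `noise_level` is hypothetical.

```python
def noise_level(sources):
    """Assumed form: P_noise = sum_i P_observed(i) * (r_pre(i) / r_post(i)) ** 2.

    sources: iterable of (p_observed, r_pre, r_post) tuples, one per separated source,
    with p_observed as linear power and distances in metres.
    """
    return sum(p * (r_pre / r_post) ** 2 for p, r_pre, r_post in sources)

# Hypothetical example: the first source ends up twice as far from the conference
# microphone as it was from the measurement microphone, the second twice as close.
p_noise = noise_level([(4.0, 1.0, 2.0), (1.0, 3.0, 1.5)])
```

Summing powers assumes the noise sources are mutually uncorrelated, which is a common simplification for sources such as an air conditioner and a projector fan.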

なお、必要に応じ、音響シミュレータは、シミュレーションの結果として算出されたインパルス応答h_synth、h_echo、h_residual及び騒音レベルP_noiseを文字や図形によりディスプレイ２１０上に表示する。ユーザーは、この画面表示の内容を確認することにより、会議用マイクロホン１０４や会議用スピーカ１０３をシミュレーションの対象となった仮想位置で使用した場合にどのような音響特性が得られるかを事前に判断することができる。 If necessary, the acoustic simulator displays the impulse responses h_synth, h_echo, and h_residual and the noise level P_noise calculated by the simulation on the display 210 as text or graphics. By checking this display, the user can judge in advance what acoustic characteristics would be obtained if the conference microphone 104 and the conference speaker 103 were used at the simulated virtual positions.

このように、本形態例に係る音響シミュレータは、設置位置の仮想的な調整によりシミュレーションを実行する点において、音声信号処理のパラメータを調整する従来技術とは明らかに異なっている。 As described above, the acoustic simulator according to the present embodiment is clearly different from the prior art that adjusts the parameters of the audio signal processing in that the simulation is executed by virtual adjustment of the installation position.

（音響コンサルティング装置としての処理１） (Processing 1 as an acoustic consulting apparatus)
続いて、本形態例に係る音響コンサルティング装置としての処理動作を説明する。ここでは、ユーザーにより仮想的に入力された会議用マイクロホンとスピーカの使用条件が所望の性能を満たしているか否かを入力の都度判定し、判定結果をユーザーに通知する場合について説明する。 Next, the processing operation of the acoustic consulting apparatus according to this embodiment is described. The case described here is one in which, each time the user virtually enters usage conditions for the conference microphone and speaker, the apparatus determines whether those conditions satisfy the desired performance and notifies the user of the result.

音響コンサルティング装置は、前述した音響シミュレーションの処理結果（すなわち、推定音響信号）を評価し、テレビ会議システム設定環境１０１に適した会議用マイクロホン１０４や会議用スピーカ１０３の最適な配置であるか否かの評価結果を出力する。勿論、形態例に係る音響コンサルティング装置では、テレビ会議システム設定環境１０１に関するＣＡＤデータを使用できないことが前提である。 The acoustic consulting apparatus evaluates the result of the acoustic simulation described above (that is, the estimated acoustic signal) and outputs an evaluation of whether the arrangement of the conference microphone 104 and the conference speaker 103 is optimal for the video conference system setting environment 101. Of course, the acoustic consulting apparatus according to this embodiment presupposes that CAD data for the video conference system setting environment 101 is not available.

図１１に、音響コンサルティング装置の処理手順の概略を示す。なお、図１１には、図３との対応部分に同一符号を付して示している。図１１と図３の違いは、処理３０９、処理３１０及び処理３１１である。 FIG. 11 shows an outline of the processing procedure of the acoustic consulting apparatus. In FIG. 11, the same reference numerals are given to the portions corresponding to those in FIG. 3. The difference between FIG. 11 and FIG. 3 is processing 309, processing 310, and processing 311.

処理３０９では、音響シミュレーションの結果を評価するための評価性能の設定が実行される。ここでの所望性能の入力も、ユーザーが、マウス２０８、キーボード２０９その他の入力装置の操作を通じて入力する。図１２に、所望性能の一例を示す。図１２においては、会議用マイクロホン１０４で集音される話者発話の残響比量、環境雑音比量、音響エコーキャンセラ後の残留エコー比量が定義されている。いずれもSNR（Signal To Noise Ratio）の形式で定義されている。 In process 309, the desired performance used to evaluate the acoustic simulation result is set. The user again enters the desired performance through the mouse 208, the keyboard 209, or another input device. FIG. 12 shows an example of desired performance. In FIG. 12, three quantities are defined for the talker's speech collected by the conference microphone 104: its ratio to reverberation, its ratio to environmental noise, and its ratio to the residual echo remaining after the acoustic echo canceller. All three are defined in the form of an SNR (signal-to-noise ratio).

なお、図１１の場合、処理３０９は、音響特性の計測処理（処理３０６）とマイクロホン及びスピーカの仮想情報の設定処理（処理３０７）の間に配置されているが、シミュレーション結果の判定処理（処理３１０）を実行する前であれば、どの時点に配置しても良い。 In FIG. 11, process 309 is placed between the acoustic characteristic measurement (process 306) and the setting of virtual information for the microphone and speaker (process 307), but it may be placed at any point before the simulation result determination (process 310) is executed.

また、図１１に示す音響コンサルティング装置の場合、処理３０７で登録される会議用マイクロホン１０４と会議用スピーカ１０３の仮想情報は、シミュレーション結果を評価するための初期条件を与えているのに過ぎない。このため、本形態例の場合には、処理３０１と処理３０２で登録された情報をそのまま読み出して仮想情報として登録しても良い。 In the case of the acoustic consulting apparatus shown in FIG. 11, the virtual information of the conference microphone 104 and the conference speaker 103 registered in the process 307 merely gives an initial condition for evaluating the simulation result. Therefore, in the case of this embodiment, the information registered in the processing 301 and the processing 302 may be read as it is and registered as virtual information.

処理３１０において、音響コンサルティング装置は、仮想的な会議用マイクロホンと会議用スピーカについて実行された音響環境のシミュレーション結果が、ユーザーが予め設定した所望の性能を満たしているか否か判定する。 In process 310, the acoustic consulting apparatus determines whether or not the simulation result of the acoustic environment executed for the virtual conference microphone and conference speaker satisfies a desired performance set in advance by the user.

ここで、性能を満たすと判定された場合、音響コンサルティング装置は、会議用マイクロホン１０４と会議用スピーカ１０３について仮想的に設定されている情報を、所望の性能を満たす条件として出力し、処理を終了する。例えば所望の性能が得られる会議用マイクロホン１０４と会議用スピーカ１０３の位置と向き情報を出力する。 If it is determined that the performance is satisfied, the acoustic consulting apparatus outputs the information virtually set for the conference microphone 104 and the conference speaker 103 as conditions that satisfy the desired performance, and ends the process. For example, it outputs the position and orientation of the conference microphone 104 and the conference speaker 103 with which the desired performance is obtained.

これに対し、性能を満たさないと判定された場合、音響コンサルティング装置は、処理３１１に進む。当該処理３１１において、音響コンサルティング装置は、シミュレーション用に仮想的に登録された会議用マイクロホン１０４と会議用スピーカ１０３の情報に対する変更を受付ける処理を実行する。登録情報に対する変更の入力には、処理３０７で用いたユーザーインターフェースを用いる。なお、登録情報の変更の入力は、ユーザーが個別に手入力する方法と自動設定する方法が考えられる。自動設定については後述する。いずれにしても、設定情報の変更の完了がユーザーから指示入力されると、音響コンサルティング装置は、処理３０８に戻り、変更後の情報に基づいて音響シミュレーションを実行する。 On the other hand, when it is determined that the performance is not satisfied, the acoustic consulting apparatus proceeds to processing 311. In the process 311, the acoustic consulting apparatus executes a process of accepting a change to the information of the conference microphone 104 and the conference speaker 103 that are virtually registered for simulation. The user interface used in processing 307 is used to input changes to the registration information. The registration information change can be input manually by the user or automatically. The automatic setting will be described later. In any case, when the completion of the change of the setting information is input from the user, the acoustic consulting apparatus returns to the processing 308 and executes an acoustic simulation based on the changed information.

図１３に、音響シミュレーション結果が所望の性能を満たしているか否かを判定するために使用するプログラムの機能ブロック構成を示す。以下の説明では、当該プログラムを性能評価部１３０１と呼ぶ。性能評価部１３０１は、以下に示す３つの評価部と１つの比較部で構成される。 FIG. 13 shows a functional block configuration of a program used for determining whether or not an acoustic simulation result satisfies a desired performance. In the following description, the program is referred to as a performance evaluation unit 1301. The performance evaluation unit 1301 includes the following three evaluation units and one comparison unit.

直接音／残響音比率評価部１３０２は、想定話者位置のインパルス応答h_synthから直接音成分h_directと残響音成分h_reverbの比率を評価する。まず、音響コンサルティング装置は、入力のあったインパルス応答h_synthを式２と式３を用いて直接音成分h_direct(t)と残響音成分h_reverb(t)に分離する。これらの成分が得られると、直接音／残響音比率評価部１３０２は、次式に基いて直接音成分と残響音成分の比率P_reverbを算出する。 The direct sound / reverberation ratio evaluation unit 1302 evaluates the ratio of the direct sound component h_direct to the reverberation component h_reverb from the impulse response h_synth for the assumed speaker position. First, the acoustic consulting apparatus separates the input impulse response h_synth into a direct sound component h_direct(t) and a reverberation component h_reverb(t) using Equations 2 and 3. Once these components are obtained, the direct sound / reverberation ratio evaluation unit 1302 calculates the ratio P_reverb of the direct sound component to the reverberation component based on the following equation.
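The equation for P_reverb is not reproduced in this text. Because the desired performance is stated in SNR form, a natural reading is an energy ratio in decibels, which the sketch below implements as an assumption. The function names and the example responses are hypothetical; taking the minimum over assumed speaker positions corresponds to reporting the worst case.

```python
import math

def energy_ratio_db(h_direct, h_reverb):
    """Assumed form: 10 * log10 of direct-sound energy over reverberant energy."""
    e_direct = sum(v * v for v in h_direct)
    e_reverb = sum(v * v for v in h_reverb)
    return 10.0 * math.log10(e_direct / e_reverb)

def worst_case_ratio_db(components_per_position):
    """Evaluate every assumed speaker position and keep the smallest ratio."""
    return min(energy_ratio_db(d, r) for d, r in components_per_position)

# Hypothetical direct/reverberant pairs for two assumed speaker positions.
positions = [([1.0], [0.1]), ([0.5], [0.1])]
p_reverb = worst_case_ratio_db(positions)
```

A position with no reverberant energy at all would make the ratio undefined, so a practical version would need a small floor on the denominator.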

直接音／残響音比率評価部１３０２は、算出した比率P_reverbのうち最小値を出力する。 The direct sound / reverberation ratio evaluation unit 1302 outputs the minimum of the calculated ratios P_reverb.

話者発話／残留エコー比率評価部１３０３は、想定話者位置毎に式９に基づいて比率P_echoを推定する。 The speaker utterance / residual echo ratio evaluation unit 1303 estimates the ratio P_echo for each assumed speaker position based on Equation 9.

ここでのρは、式１０で求める。 Here, ρ is obtained from Equation 10.

なお、h_1,directは、仮想的に設定された会議用マイクロホンの位置と想定話者位置の距離が１ｍの場合におけるインパルス応答を表している。また、Ａは１ｍの距離における想定話者音量である。μは、式１１により求められる。 Here, h_1,direct represents the impulse response when the distance between the virtually set conference microphone position and the assumed speaker position is 1 m. A is the assumed speaker volume at a distance of 1 m. μ is obtained from Equation 11.

ここで、h_spは、仮想的に設定された会議用スピーカの位置から想定話者位置までのインパルス応答を表している。Ｂは、想定話者位置におけるスピーカ出力信号の音圧レベルである。 Here, h_sp represents the impulse response from the virtually set conference speaker position to the assumed speaker position. B is the sound pressure level of the speaker output signal at the assumed speaker position.

話者発話／残留エコー比率評価部１３０３は、P_echoの最小値を求めて出力する。 The speaker utterance / residual echo ratio evaluation unit 1303 obtains and outputs the minimum value of P_echo.

話者発話／騒音比率評価部１３０４は、想定話者位置毎に式１２で定義されるP_nを求め、P_nの最小値を求めて出力する。 The speaker utterance / noise ratio evaluation unit 1304 obtains P_n, defined by Equation 12, for each assumed speaker position, and obtains and outputs the minimum value of P_n.

所望性能比較部１３０５は、ユーザーにより設定された所望性能Ｓ５１と、前段の各部で算出されたP_reverb、P_echo及びP_nとを比較して、各値が所望性能に収まっているか否かを判定する。各値についての比較結果が判定結果Ｓ５２として出力される。 The desired performance comparison unit 1305 compares the desired performance S51 set by the user with P_reverb, P_echo, and P_n calculated by the preceding units, and determines whether each value meets the desired performance. The comparison result for each value is output as the determination result S52.
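The per-value comparison can be sketched as follows. The text does not state the direction of the comparison; because the quantities are in SNR form, this sketch assumes that the desired performance is a lower bound, so a simulated ratio passes when it is at least the desired ratio. The function name, the dictionary keys, and the numbers are hypothetical.

```python
def compare_with_desired(simulated_db, desired_db):
    """Return {metric: bool}; True means the simulated ratio meets or exceeds the desired one.

    Assumption: each desired value is a minimum acceptable ratio in dB.
    """
    return {name: simulated_db[name] >= desired_db[name] for name in desired_db}

# Hypothetical worst-case ratios from the three evaluation units, and hypothetical targets.
simulated = {"P_reverb": 12.0, "P_echo": 18.0, "P_n": 25.0}
desired = {"P_reverb": 10.0, "P_echo": 20.0, "P_n": 20.0}
result = compare_with_desired(simulated, desired)
all_met = all(result.values())
```

Reporting a result per metric, rather than a single pass or fail, tells the user which aspect (reverberation, echo, or noise) needs a different placement.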

The acoustic consulting apparatus displays the determination result on the display 210 using characters and graphics. By checking this screen, the user can learn whether the use of the conference microphone 104 and the conference speaker 103 under the virtually specified conditions satisfies the desired performance. If the desired performance is not satisfied, the user can search for conditions that achieve it by repeatedly specifying new candidates.

(Processing 2 as an acoustic consulting apparatus)
Another example of the processing operation of the acoustic consulting apparatus according to this embodiment is described here. In this example, the acoustic consulting apparatus has a function that automatically corrects the virtual conditions of the conference microphone and speaker so that they satisfy the desired performance.

FIG. 14 shows an outline of the processing procedure of the acoustic consulting apparatus. In FIG. 14, parts corresponding to those in FIG. 11 are given the same reference numerals, and the processing up to process 308 is the same as in FIG. 11. The description below therefore starts from the point after the acoustic simulation of process 308 has been executed.

In process 311, the acoustic consulting apparatus checks the error between the desired performance set in advance by the user and the simulation result. This process is executed by the desired performance comparison unit 1305 shown in FIG. 13. After process 311, the acoustic consulting apparatus proceeds to process 312.

In process 312, the acoustic consulting apparatus determines whether the difference between the error in the previous simulation run and the error in the current simulation run is equal to or less than a predetermined threshold (that is, whether the convergence condition is satisfied). If a negative result is obtained in process 312, the acoustic consulting apparatus proceeds to process 313.

In process 313, the acoustic consulting apparatus changes the virtual information of the conference microphone and speaker so that, for example, the evaluation function C defined in Equation 13 moves in the minimum gradient direction.

Here, a, b, and c represent the weights of the respective performance evaluation measures.
In this specification, ΔC denotes the change in the cost function C when the position of the microphone and the position of the speaker are each shifted by a minute amount. The minute displacements applied when moving in the minimum gradient direction are denoted ΔM and ΔS, respectively.

ΔM is a three-dimensional vector that represents the amount of change in each of the coordinate values x, y, and z specifying the position of the conference microphone. Similarly, ΔS is a three-dimensional vector that represents the amount of change in each of the coordinate values x, y, and z specifying the position of the conference speaker.

As shown in the following equation, the matrix [M S]_new represents the virtual placement of the conference microphone and speaker after moving in the minimum gradient direction, and the matrix [M S]_old represents the virtual placement of the conference microphone and speaker before the move.
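One way to realize the update from [M S]_old to [M S]_new is a finite-difference gradient step, sketched here. This is an illustration under stated assumptions: the "minimum gradient direction" is interpreted as steepest descent, the cost function is passed in as a callable because Equation 13 is not reproduced in this text, and the names `numerical_gradient`, `descent_step`, `step`, and `eps` are hypothetical. The layout is a flat list of six values: microphone x, y, z followed by speaker x, y, z.

```python
def numerical_gradient(cost, layout, eps=1e-3):
    """Estimate dC/d(layout) by central differences.

    Each coordinate is shifted by a small amount eps in turn,
    mirroring the minute displacements that give the change in C.
    """
    grad = []
    for i in range(len(layout)):
        up = list(layout)
        down = list(layout)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2.0 * eps))
    return grad

def descent_step(cost, layout, step=0.1, eps=1e-3):
    """Return the new layout: old layout moved against the gradient."""
    grad = numerical_gradient(cost, layout, eps)
    return [p - step * g for p, g in zip(layout, grad)]


# Toy cost with its minimum at all coordinates equal to 1 (illustrative only).
toy_cost = lambda v: sum((x - 1.0) ** 2 for x in v)
old_layout = [0.0] * 6            # [Mx, My, Mz, Sx, Sy, Sz]
new_layout = descent_step(toy_cost, old_layout)
```

In a real run, each call to `cost` would require an acoustic simulation, so the number of coordinates perturbed per step directly affects run time.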

After automatically changing the virtual information in this way, the acoustic consulting apparatus returns to the acoustic simulation of process 308.

If a positive result is obtained in process 312 (the difference in error is at or below the predetermined threshold, so the convergence condition is satisfied), the acoustic consulting apparatus proceeds to process 310. That is, the acoustic consulting apparatus determines whether the simulation result satisfies the desired performance. This determination is the same as in the case of FIG. 11. If the simulation result satisfies the desired performance, the acoustic consulting apparatus ends the processing at that point.

On the other hand, if a negative result is obtained in process 310, the acoustic consulting apparatus increases the number of conference microphones by one in process 314 and then returns to the acoustic simulation of process 308.
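The control flow of processes 308 to 314 can be summarized in a short sketch. The simulator, error measure, adjustment step, and pass criterion are supplied as callables because their details depend on equations not reproduced in this text. The limits `max_mics` and `max_runs` are safeguards added for this illustration and are not part of the described procedure; all names are hypothetical.

```python
def run_consulting_loop(simulate, error_of, adjust, satisfies,
                        layout, n_mics=1, tol=1e-6,
                        max_mics=4, max_runs=1000):
    """Iterate simulation and adjustment until the target is met.

    Returns (layout, n_mics, result) on success, or None when the
    added safety limits are reached first.
    """
    prev_err = None
    for _ in range(max_runs):
        result = simulate(layout, n_mics)          # process 308
        err = error_of(result)                     # process 311
        # Process 312: converged when the error barely changed.
        converged = prev_err is not None and abs(prev_err - err) <= tol
        prev_err = err
        if not converged:
            layout = adjust(layout)                # process 313
            continue
        if satisfies(result):                      # process 310
            return layout, n_mics, result
        if n_mics >= max_mics:                     # added safeguard
            return None
        n_mics += 1                                # process 314
    return None


# Toy stand-ins: the score grows with microphone count and peaks at x = 2.
outcome = run_consulting_loop(
    simulate=lambda x, n: 10.0 * n - (x - 2.0) ** 2,
    error_of=lambda score: max(0.0, 25.0 - score),
    adjust=lambda x: x + 0.5 * (2.0 - x),
    satisfies=lambda score: score >= 25.0,
    layout=0.0,
)
```

In the toy run, one and two microphones converge without reaching the target score, so the loop adds microphones until three suffice.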

As described above, in this embodiment the acoustic consulting apparatus can automatically set the positions of the conference microphone and the conference speaker (and, as necessary, the number of conference microphones) that are best suited to the video conference. The determination result is, of course, displayed on the display 210 using characters and graphics. As a result, even a user who does not have CAD data of the video conference system installation environment 101 can automatically obtain information on the optimum number and positions of conference microphones and speakers.

In the embodiment described above, the apparatus outputs the information as soon as a condition satisfying the desired performance is found, and then stops changing the virtual information and stops running and evaluating simulations based on the changed information. Alternatively, the apparatus may repeat the change of virtual information and the simulation and evaluation based on the changed information throughout a variable range set in advance by the user, and display on the screen the spatial arrangements and other conditions within that range that satisfy the desired performance. In this case, the user can choose an arrangement that reflects the user's own preferences from among those that satisfy the desired performance, which improves usability.
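The alternative of sweeping the whole user-set range and listing every arrangement that qualifies can be sketched as follows. Discretizing the range into a grid of candidate coordinates is an assumption of this sketch, and `feasible_layouts`, `evaluate`, and `satisfies` are hypothetical names; `evaluate` stands in for a full acoustic simulation and evaluation.

```python
from itertools import product

def feasible_layouts(evaluate, satisfies, axis_values):
    """Evaluate every candidate in the range and keep those that pass.

    axis_values is a list of candidate values per coordinate;
    candidates are all combinations of those values.
    """
    return [
        candidate
        for candidate in product(*axis_values)
        if satisfies(evaluate(candidate))
    ]


# Toy example: two coordinates, score is their sum, target is 2 or more.
passing = feasible_layouts(
    evaluate=lambda pos: pos[0] + pos[1],
    satisfies=lambda score: score >= 2,
    axis_values=[[0, 1, 2], [0, 1]],
)
```

The resulting list is what would be presented to the user for selection; its size grows with the product of the number of values per coordinate, so a coarse grid may be needed when each evaluation is costly.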

(Other embodiments)
The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented in hardware, for example as integrated circuits. They may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a memory, in a storage device such as a hard disk or an SSD (Solid State Drive), or on a storage medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation and do not represent all the control lines and information lines required in a product. In practice, almost all components may be considered to be connected to one another.

DESCRIPTION OF SYMBOLS: 101 ... video conference system installation environment, 102 ... desk, 103 ... conference speaker, 104 ... conference microphone, 105-1 ... assumed speaker position, 105-2 ... assumed speaker position, 106 ... measurement microphone array, 107 ... measurement speaker array, 202 ... multi-channel AD converter, 203 ... central processing unit, 204 ... non-volatile memory, 205 ... volatile memory, 206 ... multi-channel DA converter, 208 ... mouse, 209 ... keyboard, 210 ... display, 701 ... acoustic characteristic measurement unit, 702 ... impulse response measurement unit, 703 ... ambient noise measurement unit, 802 ... sound source separation unit, 803-1, 803-2, 803-N ... sound source localization unit, 901 ... acoustic simulation unit, 902 ... direct sound/reverberation division unit, 903 ... impulse response estimation unit for assumed speaker position, 904 ... residual echo estimation unit, 905 ... noise simulation unit at assumed microphone position, 1301 ... performance evaluation unit, 1302 ... direct sound/reverberation ratio evaluation unit, 1303 ... speaker utterance/residual echo ratio evaluation unit, 1304 ... speaker utterance/noise ratio evaluation unit, 1305 ... desired performance comparison unit

Claims

An acoustic simulator comprising:
a first storage device for storing acoustic data actually recorded in a space for constructing an acoustic system;
a second storage device for storing first information on the performance of a first microphone and a first speaker used when recording the acoustic data and on their actually measured positions in the acoustic system;
a third storage device for storing second information on the position of a sound source assumed when the acoustic system is constructed;
a first setting receiving unit that receives a setting of third information on the position and performance of a second microphone and a second speaker used when the acoustic system is constructed;
a fourth storage device for storing the third information; and
a simulation unit that estimates acoustic characteristics when the second microphone is used in the acoustic system, based on the acoustic data and the first, second, and third information.

The acoustic simulator according to claim 1, further comprising:
an acoustic characteristic measurement unit that measures the acoustic data using the first microphone and the first speaker;
a second setting receiving unit that receives settings of the first and second information; and
a change receiving unit that receives a change to the setting of the third information.

The acoustic simulator according to claim 2,
wherein the first setting receiving unit receives a virtual change to the number of second microphones.

The acoustic simulator according to claim 2,
wherein the first setting receiving unit receives a virtual change to the position of the second microphone.

The acoustic simulator according to claim 1,
wherein the acoustic system is a video conference system.

An acoustic consulting apparatus comprising:
a first storage device for storing acoustic data actually recorded in a space for constructing an acoustic system;
a second storage device for storing first information on the performance of a first microphone and a first speaker used when recording the acoustic data and on their actually measured positions in the acoustic system;
a third storage device for storing second information on the position of a sound source assumed when the acoustic system is constructed;
a fourth storage device for storing third information on the position and performance of a second microphone and a second speaker used when the acoustic system is constructed;
a simulation unit that estimates acoustic characteristics when the second microphone is used in the acoustic system, based on the acoustic data and the first, second, and third information;
a determination unit that determines whether the estimated acoustic characteristics satisfy a desired performance; and
a presentation unit that outputs a determination result of the determination unit to a user interface.

The acoustic consulting apparatus according to claim 6,
wherein the determination unit includes a setting change unit that automatically changes at least a part of the third information when it is determined that the estimated acoustic characteristics do not satisfy the desired performance, and
the simulation unit uses the changed third information to newly estimate the acoustic characteristics when the second microphone is used in the acoustic system.

The acoustic consulting apparatus according to claim 7,
wherein the setting change unit changes the number of second microphones.

The acoustic consulting apparatus according to claim 7,
wherein the setting change unit changes the position of the second microphone.

An acoustic simulation method comprising:
a process of storing, in a storage device, acoustic data actually recorded in a space for constructing an acoustic system;
a process of receiving, through an input device, a setting of first information on the performance of a first microphone and a first speaker used when recording the acoustic data and on their actually measured positions in the acoustic system;
a process of receiving, through the input device, a setting of second information on the position of a sound source assumed when the acoustic system is constructed;
a process of receiving, through the input device, a setting of third information on the position and performance of a second microphone and a second speaker used when the acoustic system is constructed; and
a process of estimating acoustic characteristics when the second microphone is used in the acoustic system, based on the acoustic data and the first, second, and third information.

A process of storing, in a storage device, acoustic data actually recorded in a space for constructing an acoustic system;
A process of receiving, through an input device, a setting of first information on the performance of a first microphone and a first speaker used when recording the acoustic data and on their actually measured positions in the acoustic system;
A process of receiving, through the input device, a setting of second information on the position of a sound source assumed when the acoustic system is constructed;
A process of estimating acoustic characteristics when a second microphone is used in the acoustic system, based on third information on the position and performance of the second microphone and a second speaker used when the acoustic system is constructed, the acoustic data, and the first and second information;
A process of determining whether the estimated acoustic characteristics satisfy a desired performance;
And a process of outputting a result of the determination to a user interface.