JP6998823B2 - Multi-channel objective evaluation device and program

Info

Publication number: JP6998823B2
Application number: JP2018078019A
Authority: JP
Inventors: 知美小倉; 智康小森; 岳大杉本
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2022-02-04
Anticipated expiration: 2038-04-13
Also published as: JP2019184933A

Description

The present invention relates to a multi-channel objective evaluation device and program for objectively evaluating the quality of a multi-channel acoustic signal used in a multi-channel acoustic system having more than two channels.

Methods for evaluating the quality of acoustic signals in multi-channel acoustic systems are known. For example, as a method for subjectively evaluating acoustic signal quality, ITU-R Recommendation BS.1116-3 defines a subjective assessment method for acoustic systems with small impairments, including multi-channel acoustic systems (see, for example, Non-Patent Document 1).

On the other hand, ITU-R Recommendation BS.1387-1 defines an objective evaluation method that objectively measures sound quality corresponding to subjective evaluation performed in accordance with ITU-R Recommendation BS.1116-3 (see, for example, Non-Patent Document 2). The objective evaluation method defined in ITU-R Recommendation BS.1387-1 is called the PEAQ (Perceptual Evaluation of Audio Quality) objective sound quality measurement method.

The PEAQ objective sound quality measurement method is implemented as a standardized algorithm for objectively measuring the quality of acoustic signals. It obtains an objective evaluation value using an auditory model that reflects the perceptual characteristics of the human ear and a cognitive model that has a neural network structure. Details are described later.

In general, a highly reliable subjective evaluation requires many subjects and a great deal of time and effort, so it is not realistic to perform subjective evaluation on every sound source. For this reason, objective evaluation is performed in advance to select the parameters to be used in the subjective evaluation.

However, the objective evaluation method defined in ITU-R Recommendation BS.1387-1 is a method for one-channel or two-channel acoustic systems. It therefore cannot be used for multi-channel acoustic systems having more than two channels, such as 22.2ch (see, for example, Non-Patent Document 3).

Accordingly, a method for objectively evaluating the quality of a multi-channel acoustic signal in a multi-channel acoustic system having more than two channels has been proposed (see, for example, Non-Patent Document 4). This method convolves head-related impulse responses (HRIRs) with the original sound and with the degraded sound of the multi-channel acoustic signal, converts each into a two-channel signal, and then performs objective evaluation.

Non-Patent Document 1: Rec. ITU-R BS.1116-3, “Methods for the subjective assessment of small impairments in audio systems”, 2015
Non-Patent Document 2: Rec. ITU-R BS.1387-1, “Method for objective measurements of perceived audio quality”, 2001
Non-Patent Document 3: Rec. ITU-R BS.2051, “Advanced sound system for programme production”, 2014
Non-Patent Document 4: J. Liebetrau et al., “Standardization of PEAQ-MC: Extension of ITU-R BS.1387-1 to multichannel audio”, J. Audio Eng. Soc. 40th International Conference, 2010

However, when objectively evaluating the quality of a multi-channel acoustic signal used in a multi-channel acoustic system having more than two channels, Non-Patent Document 4 differs from the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2): it uses a cognitive model that also takes into account the interaural time difference, the interaural level difference, and similar cues. In addition, the objective evaluation results obtained by Non-Patent Document 4 do not sufficiently reflect the subjective evaluation results obtained by the subjective evaluation method defined in ITU-R Recommendation BS.1116-3 (Non-Patent Document 1). For this reason, although ITU-R (the Radiocommunication Sector of the International Telecommunication Union) attempted standardization using the method of Non-Patent Document 4, the method has not been approved to date.

Incidentally, when an acoustic signal degraded by coding or the like is evaluated subjectively in a multi-channel acoustic system having more than two channels, humans are not good at concentrating on and comparing the acoustic signals from all directions. Consequently, when the acoustic signal has a large number of channels, the subjective evaluation value tends to be higher.

Likewise, for content in which the sound image moves, humans are not good at concentrating on and comparing the acoustic signals from all directions while keeping them in memory. Consequently, when the number of channels is large, the subjective evaluation value again tends to be higher.

Because a multi-channel acoustic signal is presented to humans, the objective evaluation value should reflect these tendencies of the subjective evaluation value. In other words, a method for objectively evaluating the quality of a multi-channel acoustic signal having more than two channels should be an objective evaluation method that takes into account the influence on the subjective evaluation value.

The objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2) does take into account the influence on the subjective evaluation value, but it applies to two-channel acoustic signals and not to multi-channel acoustic signals having more than two channels.

Here, one can envision a new method in which the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2) is incorporated into the method of Non-Patent Document 4. This assumed method convolves HRIRs with the original sound and with the degraded sound of the multi-channel acoustic signal, sums the convolution results separately for the original sound and for the degraded sound to generate two-channel signals, and obtains an objective evaluation value from these two-channel signals using the PEAQ objective sound quality measurement method.
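The downmix step of this assumed method can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation of the publication: the function name `binauralize`, the array layout, and the use of one left-ear and one right-ear HRIR per channel are assumptions made for the example.

```python
import numpy as np


def binauralize(signals, hrirs):
    """Convolve each channel with its HRIR pair and sum into two channels.

    signals: shape (n_channels, n_samples)
    hrirs:   shape (n_channels, 2, n_taps), index 0 = left ear, 1 = right ear
    returns: shape (2, n_samples + n_taps - 1)
    """
    signals = np.asarray(signals, dtype=float)
    hrirs = np.asarray(hrirs, dtype=float)
    n_channels, n_samples = signals.shape
    n_taps = hrirs.shape[2]
    out = np.zeros((2, n_samples + n_taps - 1))
    for ch in range(n_channels):
        for ear in range(2):
            # Full linear convolution of one channel with one ear's response.
            out[ear] += np.convolve(signals[ch], hrirs[ch, ear])
    return out
```

Under these assumptions, the function would be applied once to the original sound and once to the degraded sound, and the two resulting 2-channel signals would be passed to PEAQ as the reference and the signal under test.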

Although this assumed method uses the PEAQ objective sound quality measurement method, which objectively measures sound quality corresponding to subjective evaluation, its objective evaluation results are not close to the subjective evaluation results, as shown by the experimental results in FIG. 10 described later.

A likely cause of the discrepancy between the subjective and objective evaluation results is that the summed acoustic signal also contains the summed degradation of every channel, whereas it is difficult for an evaluator to assess all of those degradations with the same accuracy as for a stereo signal.

The present invention has been made to solve the above problem, and its object is to provide a multi-channel objective evaluation device and program capable of obtaining, for the quality of a multi-channel acoustic signal having more than two channels, an objective evaluation result close to the subjective evaluation result.

To solve the above problem, the multi-channel objective evaluation device of claim 1 is a multi-channel objective evaluation device that objectively evaluates a multi-channel acoustic signal having more than two channels, comprising: a convolution signal output unit that outputs, as a convolution signal, a head-related impulse response (HRIR) or a binaural room impulse response (BRIR) representing the propagation characteristics of each channel, corresponding to the channel of each acoustic signal constituting the multi-channel acoustic signal; a signal processing unit that receives the original sound and the degraded sound of the multi-channel acoustic signal together with the per-channel convolution signals output by the convolution signal output unit, convolves the convolution signal with the original sound of each channel, generates a basic signal common to all channels based on the convolution results of all channels, and, for each channel, convolves the convolution signal with the degraded sound of one or more channels including that channel to generate a first convolution result, convolves the convolution signal with the original sound of the channels other than the one or more channels among all channels to generate a second convolution result, generates a signal under measurement based on the first convolution result and the second convolution result, and generates, for each channel, a binaural signal consisting of the basic signal and the signal under measurement; an evaluation unit that receives the per-channel binaural signals generated by the signal processing unit and, for each channel, generates an objective evaluation result from the binaural signal of that channel using a predetermined PEAQ (Perceptual Evaluation of Audio Quality) objective sound quality measurement method; and a multi-channel evaluation unit that generates, based on the per-channel objective evaluation results generated by the evaluation unit, an objective evaluation result of the multi-channel acoustic signal as a multi-channel objective evaluation result.

The multi-channel objective evaluation device of claim 2 is the multi-channel objective evaluation device of claim 1, wherein the convolution signal output unit receives information on an acoustic scheme that determines the number and arrangement of the channels of the multi-channel acoustic signal, and reads out from a preset database and outputs the per-channel convolution signals corresponding to the acoustic scheme, the database storing the channels of the acoustic scheme and the convolution signals corresponding to those channels.

The multi-channel objective evaluation device of claim 3 is the multi-channel objective evaluation device of claim 1, wherein the convolution signal output unit receives per-channel angle information that determines the reproduction position of each acoustic signal constituting the multi-channel acoustic signal, and reads out from a preset database and outputs the per-channel convolution signals corresponding to the per-channel angles, the database storing the angles and the convolution signals corresponding to those angles.

The multi-channel objective evaluation device of claim 4 is the multi-channel objective evaluation device of any one of claims 1 to 3, wherein the multi-channel evaluation unit detects the lowest value among the per-channel objective evaluation results generated by the evaluation unit and outputs that lowest value as the multi-channel objective evaluation result.
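The lowest-value selection can be illustrated with a few lines of Python. This is a sketch only: the function name `worst_channel_result` and the decision to also return the channel index are assumptions made for the example, and any scores passed in are hypothetical.

```python
def worst_channel_result(channel_scores):
    """Return (lowest score, zero-based index of the channel that produced it)."""
    if not channel_scores:
        raise ValueError("at least one per-channel score is required")
    # Index of the minimum; ties resolve to the first such channel.
    worst = min(range(len(channel_scores)), key=channel_scores.__getitem__)
    return channel_scores[worst], worst
```

Taking the minimum means the overall result is determined by the single channel whose degradation is judged most severe.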

The multi-channel objective evaluation device of claim 5 is the multi-channel objective evaluation device of any one of claims 1 to 3, wherein the multi-channel evaluation unit multiplies each per-channel objective evaluation result generated by the evaluation unit by a predetermined weighting coefficient for that channel, sums the per-channel products, and outputs the sum as the multi-channel objective evaluation result.
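The weighted combination is a plain weighted sum, which can be sketched as below. The function name `weighted_result` is an assumption for illustration, and this passage does not say how the weighting coefficients are chosen, so any weights used with it are hypothetical.

```python
def weighted_result(channel_scores, weights):
    """Sum of per-channel scores, each multiplied by its channel weight."""
    if len(channel_scores) != len(weights):
        raise ValueError("one weight is required per channel")
    return sum(w * z for w, z in zip(weights, channel_scores))
```

For example, scores of -1.0 and -2.0 with weights 0.25 and 0.75 combine to -1.75.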

The program of claim 6 causes a computer to function as the multi-channel objective evaluation device of any one of claims 1 to 5.

As described above, according to the present invention, an objective evaluation result close to the subjective evaluation result can be obtained for the quality of a multi-channel acoustic signal having more than two channels.

FIG. 1 is a block diagram showing a configuration example of the multi-channel objective evaluation device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a processing example of the multi-channel objective evaluation device.
FIG. 3 is a flowchart showing a processing example of the convolution signal output unit.
FIG. 4 is a diagram showing a data configuration example of the DB.
FIG. 5 is a flowchart showing a first processing example of the signal processing unit.
FIG. 6 is a flowchart showing a second processing example of the signal processing unit.
FIG. 7 is a flowchart showing a first processing example of the multi-channel evaluation unit.
FIG. 8 is a flowchart showing a second processing example of the multi-channel evaluation unit.
FIG. 9 is a flowchart showing an example of the process by which the multi-channel evaluation unit sets the weighting coefficients W_1 to W_24.
FIG. 10 is a diagram showing experimental results.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
[Outline of the invention]
When subjectively evaluating an acoustic signal degraded by coding or the like (hereinafter referred to as "degraded sound"), humans tend to focus on the sound quality degradation of individual sound sources. Moreover, in a multi-channel acoustic system, each sound source in the reproduced multi-channel acoustic signal is mixed at its highest level into a particular channel (for example, a front channel or a pair of channels).

In view of this, the multi-channel objective evaluation device of the embodiment of the present invention treats only a predetermined channel as degraded sound and all other channels as original sound, so that the measured degree of sound quality degradation of a given channel approximates the subjective evaluation. The device then generates a binaural signal from these degraded and original sounds and performs objective evaluation with this binaural signal as the input signal under evaluation.

Specifically, the multi-channel objective evaluation device receives the original sound and the degraded sound of each acoustic signal constituting a multi-channel acoustic signal having more than two channels. Then, for each channel, the device performs convolution processing using, for example, all of the original sounds and the degraded sound of that channel only, and generates a per-channel binaural signal that takes subjective evaluation into account.
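The per-channel processing described above can be sketched as follows for the case in which only the channel under test carries the degraded sound. This is an illustrative sketch in Python with NumPy; the names `render_binaural` and `per_channel_pairs`, the array shapes, and the pairing of a common reference with a per-channel test signal are assumptions made for the example, not details taken from the publication.

```python
import numpy as np


def render_binaural(signals, hrirs):
    """Sum of per-channel HRIR convolutions, one row per ear (left, right)."""
    return np.stack([
        sum(np.convolve(signals[ch], hrirs[ch][ear]) for ch in range(len(signals)))
        for ear in range(2)
    ])


def per_channel_pairs(original, degraded, hrirs):
    """For each channel n, return (reference, test) binaural signals.

    reference: all channels original (the same signal for every n)
    test:      channel n degraded, every other channel original
    """
    original = np.asarray(original, dtype=float)
    degraded = np.asarray(degraded, dtype=float)
    reference = render_binaural(original, hrirs)
    pairs = []
    for n in range(original.shape[0]):
        mixed = original.copy()
        mixed[n] = degraded[n]  # only channel n carries the degradation
        pairs.append((reference, render_binaural(mixed, hrirs)))
    return pairs
```

With 24 input channels this yields 24 reference/test pairs, each isolating the degradation of a single channel against an otherwise clean mix.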

Using the binaural signal as the input signal under evaluation, the multi-channel objective evaluation device obtains an objective evaluation value for each channel by the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2). The device then obtains the multi-channel objective evaluation value from the per-channel objective evaluation values.

Because the binaural signal under evaluation is generated with a focus on the sound quality degradation of individual sound sources and therefore takes subjective evaluation into account, the multi-channel objective evaluation value derived from the objective evaluation values of the binaural signals is close to the subjective evaluation value. It is therefore possible to obtain an objective evaluation result close to the subjective evaluation result for the quality of a multi-channel acoustic signal having more than two channels.

[Multi-channel objective evaluation device]
First, the configuration and processing of the multi-channel objective evaluation device according to the embodiment of the present invention will be described. FIG. 1 is a block diagram showing a configuration example of the multi-channel objective evaluation device according to the embodiment of the present invention.

The multi-channel objective evaluation device 1 objectively evaluates a multi-channel acoustic signal having more than two channels. Using the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2), it obtains a multi-channel objective evaluation value z (the multi-channel objective evaluation result) that is close to the subjective evaluation value obtained by the subjective evaluation method defined in ITU-R Recommendation BS.1116-3 (Non-Patent Document 1). The multi-channel objective evaluation device 1 includes a convolution signal output unit 10, a signal processing unit 11, a PEAQ evaluation unit 12, and a multi-channel evaluation unit 13.

The multi-channel objective evaluation device 1 receives the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 of the multi-channel acoustic signal together with reproduction position information P, and identifies the convolution signal for each channel based on the reproduction position information P. The device then generates a binaural signal for each channel that takes subjective evaluation into account, performs PEAQ evaluation on the binaural signals, and, based on the results, calculates the multi-channel objective evaluation value z that takes subjective evaluation into account.

In the following, the acoustic signal of the 22.2ch acoustic scheme is used as a concrete example of a multi-channel acoustic signal. A 22.2ch multi-channel acoustic signal consists of 24 channels of acoustic signals.

The reproduction position information P is information on the reproduction position of each acoustic signal in the multi-channel acoustic system. It is, for example, information on the acoustic scheme of the multi-channel acoustic signal, or angle information on the reproduction positions. In this example, information indicating the 22.2ch acoustic scheme is input as the reproduction position information P; the acoustic scheme uniquely determines the number and arrangement of the channels. Alternatively, angle information consisting of the elevation angle and the azimuth angle (the angle in the horizontal plane and the angle in the vertical plane) of each acoustic signal, that is, of each channel, constituting the 22.2ch multi-channel acoustic signal is input as the reproduction position information P.

FIG. 2 is a flowchart showing a processing example of the multi-channel objective evaluation device 1. The multi-channel objective evaluation device 1 receives the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 of the acoustic signals constituting the multi-channel acoustic signal, together with the reproduction position information P (step S201). The device objectively evaluates the degraded sounds x'_1 to x'_24 with the original sounds x_1 to x_24 of the multi-channel acoustic signal as the reference.

Based on the reproduction position information P, the multi-channel objective evaluation device 1 identifies the convolution signal for each channel, for example the head-related impulse responses HRIR_1 to HRIR_24 representing the propagation characteristics of each channel (step S202).

Based on the per-channel original sounds x_1 to x_24, degraded sounds x'_1 to x'_24, and head-related impulse responses HRIR_1 to HRIR_24, the multi-channel objective evaluation device 1 performs convolution processing that takes subjective evaluation into account, and generates the per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} (step S203).

For each channel, the multi-channel objective evaluation device 1 performs PEAQ evaluation by the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2), based on the binaural signals of that channel among y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} (step S204). The device thereby obtains the per-channel objective evaluation values z_1 to z_24.

Based on the per-channel objective evaluation values z_1 to z_24, the multi-channel objective evaluation device 1 calculates and outputs the multi-channel objective evaluation value z (step S205).
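The flow of steps S201 to S205 can be summarized as a small orchestration function. The sketch below is in Python and deliberately leaves every stage as an injected callable: `lookup_hrirs`, `make_pairs`, `peaq`, and `aggregate` are hypothetical placeholders for the convolution signal output unit, the signal processing unit, the PEAQ evaluation unit, and the multi-channel evaluation unit. In particular, no PEAQ implementation is provided here; `peaq` stands for whatever BS.1387-1 implementation is available.

```python
def evaluate_multichannel(original, degraded, position,
                          lookup_hrirs, make_pairs, peaq, aggregate):
    """Run the S201-S205 flow and return (z, [z_1, ..., z_N]).

    original, degraded, position are the inputs of step S201.
    """
    hrirs = lookup_hrirs(position)                      # S202: identify HRIRs
    pairs = make_pairs(original, degraded, hrirs)       # S203: binaural signals
    scores = [peaq(ref, test) for ref, test in pairs]   # S204: per-channel PEAQ
    return aggregate(scores), scores                    # S205: combine into z
```

Keeping the stages as parameters makes the two aggregation variants (lowest value or weighted sum) interchangeable without touching the rest of the flow.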

(Convolution signal output unit 10)
Referring to FIG. 1, the convolution signal output unit 10 includes a preset database (DB, not shown). The convolution signal output unit 10 receives the reproduction position information P of the 24-channel acoustic signal and reads out from the DB the per-channel convolution signals corresponding to the reproduction position information P, for example the per-channel head-related impulse responses HRIR_1 to HRIR_24. The convolution signal output unit 10 then outputs the per-channel head-related impulse responses HRIR_1 to HRIR_24 to the signal processing unit 11.

FIG. 3 is a flowchart showing a processing example of the convolution signal output unit 10. The convolution signal output unit 10 receives the reproduction position information P (step S301) and determines whether the reproduction position information P contains acoustic scheme information or angle information (step S302).

The reproduction position information P is assumed to contain either acoustic scheme information or angle information, but not both. For acoustic schemes whose loudspeaker arrangement is standardized as in Non-Patent Document 3, such as 22.2ch, 11.1ch, 7.1ch, and 5.1ch, the reproduction positions are fixed, so presets are registered in advance. In this case, the reproduction position information P contains acoustic scheme information that identifies the scheme, such as 22.2ch. When a fixed acoustic scheme is not used, the reproduction position information P contains angle information that specifies the reproduction position of each channel.

When the convolution signal output unit 10 determines in step S302 that the reproduction position information P contains acoustic scheme information (step S302: acoustic scheme), it reads out from the DB the head-related impulse responses HRIR_1 to HRIR_24 corresponding to the acoustic scheme contained in the reproduction position information P (step S303).

When, on the other hand, the convolution signal output unit 10 determines in step S302 that the reproduction position information P contains angle information (step S302: angle), it reads out from the DB the head-related impulse responses HRIR_1 to HRIR_24 corresponding to the angles contained in the reproduction position information P (step S304).

After step S303 or step S304, the convolution signal output unit 10 outputs the per-channel head-related impulse responses HRIR_1 to HRIR_24 to the signal processing unit 11 (step S305).
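The branch between acoustic scheme information and angle information (steps S302 to S305) can be sketched as follows. This Python example is illustrative only: the `ReproductionPosition` class, the two lookup dictionaries, and the function name `select_hrirs` are hypothetical stand-ins for the reproduction position information P and the DB, and HRIRs are represented by opaque values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ReproductionPosition:
    """Carries either a scheme name or per-channel angles, never both."""
    scheme: Optional[str] = None                             # e.g. "22.2ch"
    angles: Optional[Sequence[Tuple[float, float]]] = None   # (elevation, azimuth)


def select_hrirs(position, by_scheme, by_angle):
    """Return the per-channel HRIRs for the given reproduction position."""
    # S302: exactly one kind of information must be present.
    if (position.scheme is None) == (position.angles is None):
        raise ValueError("provide exactly one of scheme or angles")
    if position.scheme is not None:
        # S303: preset lookup keyed by the acoustic scheme.
        return list(by_scheme[position.scheme])
    # S304: one lookup per channel, keyed by that channel's angle pair.
    return [by_angle[angle] for angle in position.angles]
```

The returned list corresponds to what is handed to the signal processing unit in step S305.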

FIG. 4 is a diagram showing a data configuration example of the DB. The DB consists of the acoustic scheme, the channel number (label), the elevation angle, the azimuth angle, and the convolution signal corresponding to this information, namely the head-related impulse response HRIR (the impulse response corresponding to the transfer function between the loudspeaker position and the position of the human ear).

音響方式は、２２．２ｃｈ、１１．１ｃｈ、７．１ｃｈ、５．１ｃｈ等であり、チャンネル番号は、音響方式の各音響信号に対応した番号である。仰角は、スピーカー位置と人間の耳の位置との間の線が水平面となす角度であり、方位角は、スピーカー位置と人間の耳の位置との間の線が垂直面となす角度である。一般的に正面方向を仰角０度、方位角０度とする。 The acoustic system is 22.2ch, 11.1ch, 7.1ch, 5.1ch, etc., and the channel number is a number corresponding to each acoustic signal of the acoustic system. The elevation angle is the angle formed by the line between the speaker position and the position of the human ear with the horizontal plane, and the azimuth angle is the angle formed by the line between the speaker position and the position of the human ear with the vertical plane. Generally, the front direction is an elevation angle of 0 degrees and an azimuth angle of 0 degrees.

図４に示すＤＢには、音響方式が２２．２ｃｈの場合において、チャンネル番号３（ラベルがＦＣ（フロントセンター））、仰角０°、方位角０°、及びこれらの情報に対応する頭部インパルス応答ＨＲＩＲ₃等が格納されている。また、ＤＢには、２２．２ｃｈ以外の５．１ｃｈ等の音響方式のデータも格納されており、音響方式が５．１ｃｈの場合において、チャンネル番号３（ラベルがＣ（センター））、仰角０°、方位角０°、及びこれらの情報に対応する頭部インパルス応答ＨＲＩＲ₃等が格納されている。 In the DB shown in FIG. 4, when the acoustic method is 22.2ch, the channel number 3 (label is FC (front center)), the elevation angle 0 °, the azimuth angle 0 °, and the head impulse corresponding to these information are displayed. Response HRIR ₃ etc. are stored. In addition, data of acoustic methods such as 5.1ch other than 22.2ch are also stored in the DB, and when the acoustic method is 5.1ch, the channel number 3 (label is C (center)) and the elevation angle is 0. °, azimuth angle 0 °, and head impulse response HRIR ₃ and the like corresponding to this information are stored.

When reproduction position information P containing 22.2ch acoustic-system information is input, the convolution signal output unit 10 searches the DB of FIG. 4 in step S303 using the 22.2ch acoustic system as the key, and reads the head-related impulse responses HRIR_1 to HRIR_24 corresponding to channel numbers 1 to 24 of the 22.2ch acoustic system.

The convolution signal output unit 10 can thus identify the per-channel head-related impulse responses HRIR_1 to HRIR_24 for the 22.2ch acoustic system without needing the angle of each channel. For this use, the DB only needs to store the acoustic system, the channel number (label), and the corresponding head-related impulse response HRIR.

When reproduction position information P containing the elevation and azimuth angles of each channel is input, the convolution signal output unit 10 searches the DB of FIG. 4 in step S304 using each channel's elevation and azimuth angles as the key, and reads the head-related impulse responses HRIR_1 to HRIR_24 corresponding to those angles.

The convolution signal output unit 10 can thus identify the head-related impulse response of each channel from that channel's angles, even for a multi-channel system with no preset speaker layout in which two or more speakers are placed arbitrarily. For this use, the DB only needs to store the elevation angle, the azimuth angle, and the corresponding head-related impulse response HRIR.
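The branch in steps S302 to S305 can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory list `DB`, the dictionary keys `"system"` and `"angles"`, the function names, and all HRIR sample values are assumptions made for the example. Only the two channel-3 rows (FC and C at elevation 0°, azimuth 0°) come from the description of FIG. 4; the third row is a placeholder.

```python
# Hypothetical stand-in for the DB of FIG. 4. Each row is
# (acoustic system, channel number, label, elevation deg, azimuth deg, HRIR).
# HRIR samples are dummy values, not measured responses.
DB = [
    ("22.2ch", 3, "FC", 0.0, 0.0, [1.0, 0.5]),
    ("5.1ch", 3, "C", 0.0, 0.0, [1.0, 0.5]),
    ("5.1ch", 1, "L", 0.0, 30.0, [0.8, 0.3]),  # placeholder row
]


def lookup_by_system(db, system):
    """Step S303: return {channel number: HRIR} for a preset acoustic system."""
    return {ch: hrir for name, ch, _, _, _, hrir in db if name == system}


def lookup_by_angles(db, angles):
    """Step S304: return one HRIR per (elevation, azimuth) pair, in input order."""
    by_angle = {(el, az): hrir for _, _, _, el, az, hrir in db}
    return [by_angle[(el, az)] for el, az in angles]


def read_hrirs(db, position_info):
    """Step S302: branch on the kind of information carried by P."""
    if "system" in position_info:
        return lookup_by_system(db, position_info["system"])
    return lookup_by_angles(db, position_info["angles"])
```

With a preset, the caller never supplies angles; with arbitrary speaker placement, the caller supplies one (elevation, azimuth) pair per channel and the acoustic-system column is not consulted.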

(Signal processing unit 11)
Returning to FIG. 1, the signal processing unit 11 receives the original sounds x_1 to x_24 and the deteriorated sounds x'_1 to x'_24 of the multi-channel acoustic signal, and receives the head-related impulse responses HRIR_1 to HRIR_24 from the convolution signal output unit 10.

The signal processing unit 11 performs convolution processing based on the original sounds x_1 to x_24, the deteriorated sounds x'_1 to x'_24, and the head-related impulse responses HRIR_1 to HRIR_24, and generates per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} that take subjective evaluation into account. Specifically, for each channel, the signal processing unit 11 performs the convolution processing based on, for example, all of the original sounds x_1 to x_24, the deteriorated sounds x' of only predetermined channels that include the channel in question (either the single deteriorated sound of that channel alone, or the deteriorated sounds of several channels including it), and the head-related impulse responses HRIR_1 to HRIR_24. The signal processing unit 11 outputs the per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} to the PEAQ evaluation unit 12.

Let M be the number of channels of the multi-channel acoustic signal (M = 24 in this example). When each of the M per-channel binaural signals y_{1_ori} to y_{M_ori} or y_{1_sig} to y_{M_sig} is generated, deteriorated sounds x' or original sounds x of N channels are used for that channel. M is an integer greater than 2, and N is an integer of 1 or more and smaller than M (1 ≤ N < M).

The number of channels N of the deteriorated sound x' or the original sound x is the number of channels, one or more, that are used when generating the binaural signals y_{k_ori} and y_{k_sig} of the channel with channel number k (k = 1 to M), and these channels always include channel k itself. The channels used may include, in addition to channel k, the channels adjacent to it; alternatively, the inter-channel correlation may be calculated and channels selected in descending order of normalized correlation coefficient. Letting f(t) be the signal of channel number k and g(t) the signal of an adjacent channel, the normalized correlation function σ_fg is calculated by formula (1), where σ_f and σ_g are the standard deviations of the signals f(t) and g(t).
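Formula (1) is not reproduced in this text. The sketch below therefore assumes the standard normalized (Pearson) form, in which the covariance of f(t) and g(t) is divided by the product of the standard deviations σ_f and σ_g; the patent's exact expression may differ. The function name and the use of population (divide-by-n) statistics are choices made for this example.

```python
import math


def normalized_correlation(f, g):
    """Covariance of f and g divided by the product of their standard deviations.

    Assumes equal-length, non-constant sample lists; a constant signal has
    zero standard deviation and would cause a division by zero.
    """
    n = len(f)
    mean_f = sum(f) / n
    mean_g = sum(g) / n
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / n
    std_f = math.sqrt(sum((a - mean_f) ** 2 for a in f) / n)
    std_g = math.sqrt(sum((b - mean_g) ** 2 for b in g) / n)
    return cov / (std_f * std_g)
```

Under this assumption the value lies between -1 and 1, so channels can be ranked directly by it when choosing which ones accompany channel k.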

The binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} consist of the basic signals y_{1_ori} to y_{24_ori}, which correspond to the original sounds x_1 to x_24, and the measured signals y_{1_sig} to y_{24_sig}, which correspond to the deteriorated sounds x'_1 to x'_24. The signal processing unit 11 generates the basic signals and the measured signals by the processing example shown in FIG. 5 or FIG. 6, described below.

FIG. 5 is a flowchart showing a first processing example of the signal processing unit 11. In this example, for each channel, convolution processing is performed based on all of the original sounds x_1 to x_24, the deteriorated sound x' of that channel only, and the head-related impulse responses HRIR_1 to HRIR_24, to generate the binaural signals y_ori and y_{1_sig} to y_{24_sig}. The number of channels of the multi-channel acoustic signal is M = 24, and the number of channels of the deteriorated sound x' is N = 1.

The signal processing unit 11 receives the original sounds x_1 to x_24 and the deteriorated sounds x'_1 to x'_24 of the multi-channel acoustic signal, and receives the head-related impulse responses HRIR_1 to HRIR_24 from the convolution signal output unit 10 (step S501).

The signal processing unit 11 performs convolution processing using all of the original sounds x_1 to x_24 and the head-related impulse responses HRIR_1 to HRIR_24, and generates a common basic signal y_ori (step S502).

Specifically, as shown in formula (2) below, the signal processing unit 11 convolves the original sound of each channel, x_1 to x_24, with the head-related impulse response of the same channel, HRIR_1 to HRIR_24, adds the convolution results of all channels, and takes the sum as the common basic signal y_ori.

$$y_{k\_ori} = \sum_{i=1}^{24} x_i * \mathrm{HRIR}_i \tag{2}$$

Here, y_{k_ori} is the basic signal of channel number k, x_i is the original sound of channel number i, and HRIR_i is the head-related impulse response of channel number i. The right-hand side does not depend on k, so the basic signal y_{k_ori} is identical to y_ori for every channel. k and i are integers from 1 to 24, and * denotes convolution.
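The common basic signal of step S502 (convolve each original sound with the HRIR of its channel, then add all results) can be sketched in plain Python as follows. The function names are hypothetical, each HRIR is treated as a single one-ear sample list, and all channels are assumed to have equal-length signals and equal-length HRIRs; none of these details is specified in the text.

```python
def convolve(x, h):
    """Full linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for m, hv in enumerate(h):
            y[n + m] += xv * hv
    return y


def common_basic_signal(originals, hrirs):
    """Sum over all channels of (original sound of channel i) * (HRIR of channel i)."""
    y_ori = [0.0] * (len(originals[0]) + len(hrirs[0]) - 1)
    for x, h in zip(originals, hrirs):
        for n, v in enumerate(convolve(x, h)):
            y_ori[n] += v
    return y_ori
```

For a real binaural rendering the same routine would presumably be run once per ear with the left-ear and right-ear responses; the description here does not state how the two ears are handled.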

For each channel, the signal processing unit 11 performs convolution processing using the original sounds x of 23 channels (= M − N = 24 − 1), the deteriorated sound x' of one channel (= N), and the head-related impulse responses HRIR_1 to HRIR_24 of all channels, and generates the per-channel measured signals y_{1_sig} to y_{24_sig} (step S503).

Specifically, for each channel (channel number k), the signal processing unit 11 convolves the original sounds x of the 23 channels other than channel k with their respective head-related impulse responses HRIR and adds the 23 convolution results to obtain the sum for the original sounds. It then convolves the deteriorated sound x' of channel k with the head-related impulse response HRIR of that channel.

The signal processing unit 11 adds the convolution result for the deteriorated sound x' of channel k to the sum for the 23 original sounds, and takes the result as the measured signal y_{k_sig} of that channel. Repeating this for every channel yields the measured signals y_{1_sig} to y_{24_sig}.

Alternatively, the signal processing unit 11 may obtain the sum for the 23 original sounds by convolving the original sounds x_1 to x_24 with the head-related impulse responses HRIR_1 to HRIR_24, adding the convolution results of all channels, convolving the original sound x of channel k with its head-related impulse response HRIR, and subtracting the latter from the former. It then adds the convolution result for the deteriorated sound x' of channel k to this sum to generate the per-channel measured signals y_{1_sig} to y_{24_sig}. This corresponds to the calculation of formula (3) described below.

Letting y_{k_sig} be the measured signal of channel number k, x_i and x_k the original sounds of channel numbers i and k, HRIR_i and HRIR_k the head-related impulse responses of channel numbers i and k, and x'_k the deteriorated sound of channel number k, the measured signal y_{k_sig} is expressed by the following formula (3).

$$y_{k\_sig} = \sum_{i=1}^{24} x_i * \mathrm{HRIR}_i - x_k * \mathrm{HRIR}_k + x'_k * \mathrm{HRIR}_k \tag{3}$$

Formula (3) is the formula for N = 1 channel of deteriorated sound x', and assumes that a human listener makes the subjective evaluation while attending to a single channel. In practice, depending on the type of sound source, a listener may attend to two or more channels. In that case, the measured signal y_{k_sig} is calculated with N > 1 channels of deteriorated sound x'. When N > 1, the second term on the right-hand side of formula (3) is evaluated for the original sounds x of the N channels, and each result is subtracted; likewise, the third term is evaluated for the deteriorated sounds x' of the N channels, and each result is added.
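The three-term structure described for formula (3), including its extension to N > 1, can be sketched as follows: start from the all-original mix, then for each attended channel subtract its original-sound term and add its deteriorated-sound term. The function names, the `focus` parameter (a list of 0-based channel indices), and the single-ear HRIR representation are assumptions made for this illustration.

```python
def convolve(x, h):
    """Full linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for m, hv in enumerate(h):
            y[n + m] += xv * hv
    return y


def measured_signal(originals, deteriorated, hrirs, focus):
    """Measured signal for the channels in `focus` (N = len(focus))."""
    y = [0.0] * (len(originals[0]) + len(hrirs[0]) - 1)
    # First term: every original channel rendered through its HRIR.
    for x, h in zip(originals, hrirs):
        for n, v in enumerate(convolve(x, h)):
            y[n] += v
    for k in focus:
        minus = convolve(originals[k], hrirs[k])     # second term, subtracted
        plus = convolve(deteriorated[k], hrirs[k])   # third term, added
        y = [a - b + c for a, b, c in zip(y, minus, plus)]
    return y
```

Calling `measured_signal(originals, deteriorated, hrirs, [k])` for each k reproduces the N = 1 case; because the first term is the same for every k, an implementation could compute it once and reuse it across all 24 channels.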

The signal processing unit 11 outputs the basic signal y_ori generated in step S502 and the measured signals y_{1_sig} to y_{24_sig} generated in step S503 to the PEAQ evaluation unit 12 (step S504).

In this way, the basic signal y_ori is generated by convolution processing using the original sounds x_1 to x_24 of all channels. The measured signals y_{1_sig} to y_{24_sig} are generated, for each channel, by convolution processing using the original sounds x of the 23 channels other than that channel and the deteriorated sound x' of that one channel.

That is, the binaural signals y_{k_ori} and y_{k_sig} of a given channel (channel number k) consist of the basic signal y_ori, which is based on the original sounds x_1 to x_24 of all channels, and the measured signal y_{k_sig}, which is based on the deteriorated sound x'_k of that channel out of the deteriorated sounds x'_1 to x'_24 of all channels. The measured signals y_{1_sig} to y_{24_sig} are therefore binaural signals that reflect how subjective evaluation of multi-channel sound focuses on the quality degradation of individual sound sources.

FIG. 6 is a flowchart showing a second processing example of the signal processing unit 11. In this example, for each channel, convolution processing is performed based on all of the deteriorated sounds x'_1 to x'_24, the original sound x of that channel only, and the head-related impulse responses HRIR_1 to HRIR_24, to generate the binaural signals y_{1_ori} to y_{24_ori} and y_sig. The number of channels of the multi-channel acoustic signal is M = 24, and the number of channels N is 1.

The signal processing unit 11 receives the original sounds x_1 to x_24 and the deteriorated sounds x'_1 to x'_24 of the multi-channel acoustic signal, and receives the head-related impulse responses HRIR_1 to HRIR_24 from the convolution signal output unit 10 (step S601).

The signal processing unit 11 performs convolution processing using all of the deteriorated sounds x'_1 to x'_24 and the head-related impulse responses HRIR_1 to HRIR_24, and generates a common measured signal y_sig (step S602).

Specifically, as shown in formula (4) below, the signal processing unit 11 convolves the deteriorated sound of each channel, x'_1 to x'_24, with the head-related impulse response of the same channel, HRIR_1 to HRIR_24, adds the convolution results of all channels, and takes the sum as the common measured signal y_sig.

$$y_{k\_sig} = \sum_{i=1}^{24} x'_i * \mathrm{HRIR}_i \tag{4}$$

Here, y_{k_sig} is the measured signal of channel number k, x'_i is the deteriorated sound of channel number i, and HRIR_i is the head-related impulse response of channel number i. The right-hand side does not depend on k, so the measured signal y_{k_sig} is identical to y_sig for every channel.

For each channel, the signal processing unit 11 performs convolution processing using the deteriorated sounds x' of 23 channels, the original sound x of one channel, and the head-related impulse responses HRIR_1 to HRIR_24 of all channels, and generates the per-channel basic signals y_{1_ori} to y_{24_ori} (step S603).

Specifically, for each channel (channel number k), the signal processing unit 11 convolves the deteriorated sounds x' of the 23 channels other than channel k with their respective head-related impulse responses HRIR and adds the 23 convolution results to obtain the sum for the deteriorated sounds. It then convolves the original sound x of channel k with the head-related impulse response HRIR of that channel.

The signal processing unit 11 adds the convolution result for the original sound x of channel k to the sum for the 23 deteriorated sounds, and takes the result as the basic signal y_{k_ori} of that channel. Repeating this for every channel yields the basic signals y_{1_ori} to y_{24_ori}.

Alternatively, the signal processing unit 11 may obtain the sum for the 23 deteriorated sounds by convolving the deteriorated sounds x'_1 to x'_24 with the head-related impulse responses HRIR_1 to HRIR_24, adding the convolution results of all channels, convolving the deteriorated sound x' of channel k with its head-related impulse response HRIR, and subtracting the latter from the former. It then adds the convolution result for the original sound x of channel k to this sum to generate the per-channel basic signals y_{1_ori} to y_{24_ori}. This corresponds to the calculation of formula (5) described below.

Letting y_{k_ori} be the basic signal of channel number k, x'_i and x'_k the deteriorated sounds of channel numbers i and k, HRIR_i and HRIR_k the head-related impulse responses of channel numbers i and k, and x_k the original sound of channel number k, the basic signal y_{k_ori} is expressed by the following formula (5).

$$y_{k\_ori} = \sum_{i=1}^{24} x'_i * \mathrm{HRIR}_i - x'_k * \mathrm{HRIR}_k + x_k * \mathrm{HRIR}_k \tag{5}$$

When N > 1, the second term on the right-hand side of formula (5) is evaluated for the deteriorated sounds x' of the N channels, and each result is subtracted. Likewise, the third term on the right-hand side of formula (5) is evaluated for the original sounds x of the N channels, and each result is added.
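For the second processing example, the text gives two routes to the basic signal of channel k: build it directly from 23 deteriorated channels plus the original of channel k, or take the all-deteriorated mix, subtract the deteriorated term of channel k, and add its original term. The sketch below implements both for N = 1 so that their equivalence can be checked numerically. Function names, 0-based indexing, and single-ear HRIRs are assumptions of this example.

```python
def convolve(x, h):
    """Full linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for m, hv in enumerate(h):
            y[n + m] += xv * hv
    return y


def mix(signals, hrirs):
    """Sum of each signal convolved with the HRIR of its channel."""
    out = [0.0] * (len(signals[0]) + len(hrirs[0]) - 1)
    for x, h in zip(signals, hrirs):
        for n, v in enumerate(convolve(x, h)):
            out[n] += v
    return out


def basic_signal_direct(originals, deteriorated, hrirs, k):
    """All channels deteriorated except channel k, which uses its original."""
    signals = [originals[i] if i == k else deteriorated[i]
               for i in range(len(hrirs))]
    return mix(signals, hrirs)


def basic_signal_subtractive(originals, deteriorated, hrirs, k):
    """Common measured signal, minus deteriorated term k, plus original term k."""
    y_sig = mix(deteriorated, hrirs)
    minus = convolve(deteriorated[k], hrirs[k])
    plus = convolve(originals[k], hrirs[k])
    return [a - b + c for a, b, c in zip(y_sig, minus, plus)]
```

The subtractive route needs only two extra convolutions per channel once the common measured signal is available, whereas the direct route repeats all convolutions for every k. With floating-point audio data the two results may differ by rounding error.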

The signal processing unit 11 outputs the measured signal y_sig generated in step S602 and the basic signals y_{1_ori} to y_{24_ori} generated in step S603 to the PEAQ evaluation unit 12 (step S604).

In this way, the measured signal y_sig is generated by convolution processing using the deteriorated sounds x'_1 to x'_24 of all channels. The basic signals y_{1_ori} to y_{24_ori} are generated, for each channel, by convolution processing using the deteriorated sounds x' of the 23 channels other than that channel and the original sound x of that one channel.

That is, the binaural signals y_{k_ori} and y_{k_sig} of a given channel (channel number k) consist of the measured signal y_sig, which is based on the deteriorated sounds x'_1 to x'_24 of all channels, and the basic signal y_{k_ori}, which is based on the original sound x_k of that channel out of the original sounds x_1 to x_24 of all channels. In this case, the basic signal y_{k_ori} serves as the reference, and the measured signal y_sig is the signal that reflects the quality degradation of the sound source of the given channel. The basic signals y_{1_ori} to y_{24_ori} are therefore binaural signals that reflect how subjective evaluation focuses on the quality degradation of individual sound sources.

FIGS. 5 and 6 show examples in which the number of channels of the deteriorated sound x' is N = 1, but the same processing applies when N > 1. When N > 1, the signal processing unit 11 must select the original sounds x of N > 1 channels when generating the basic signal y_{k_ori} of the channel with channel number k.

For the channel with channel number k, the signal processing unit 11 selects, for example, the original sounds x of a predetermined number of channels adjacent to that channel in addition to the original sound x_k of the channel itself. The predetermined number is an integer of 1 or more.

Specifically, when several channels are adjacent to the channel with channel number k, the signal processing unit 11 calculates the normalized correlation coefficient ρ_fg (formula (1) above) between channel k and each of the adjacent channels, and orders the adjacent channels from the largest ρ_fg to the smallest. It then selects, in addition to the original sound x_k of channel k, the original sounds x of the predetermined number of channels with the largest ρ_fg. The channels adjacent to channel k are set in advance from the reproduction position information P.

In this case, the signal processing unit 11 may also select the original sound x of a channel that is not adjacent to channel k. Specifically, for channels other than the adjacent ones (non-adjacent channels), the signal processing unit 11 calculates the normalized correlation coefficient ρ_fg between channel k and the non-adjacent channel. If that coefficient is larger than the coefficient of an adjacent channel, the signal processing unit 11 selects the original sound x of the non-adjacent channel in place of the adjacent channel.
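The selection rule for N > 1 can be sketched as follows. This is one possible reading of the text: adjacent channels are ranked by their normalized correlation coefficient with channel k, the top `count` are taken, and, if non-adjacent channels are allowed, a non-adjacent channel displaces an adjacent one only when its coefficient is strictly larger. The function name, the `rho` dictionary (channel number to coefficient with channel k), and the `allow_non_adjacent` flag are hypothetical.

```python
def select_channels(k, rho, adjacent, count, allow_non_adjacent=False):
    """Return channel k followed by the `count` most correlated candidates.

    rho:      {channel number: normalized correlation coefficient with channel k}
    adjacent: set of channel numbers preset as adjacent to channel k
    """
    candidates = [c for c in rho
                  if c != k and (allow_non_adjacent or c in adjacent)]
    # Sort by coefficient, largest first; on a tie an adjacent channel wins,
    # so a non-adjacent channel is chosen only if strictly more correlated.
    ranked = sorted(candidates,
                    key=lambda c: (rho[c], c in adjacent),
                    reverse=True)
    return [k] + ranked[:count]
```

With `count=2`, the returned list has N = 3 entries: channel k itself plus the two selected neighbours.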

(PEAQ evaluation unit 12)
Returning to FIG. 1, the PEAQ evaluation unit 12 receives the per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} from the signal processing unit 11. For each channel, it then obtains the objective evaluation values z_1 to z_24 by the PEAQ objective sound quality measurement method, which is the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2 above). The PEAQ evaluation unit 12 outputs the per-channel objective evaluation values z_1 to z_24 to the multi-channel evaluation unit 13.

The PEAQ evaluation unit 12 includes PEAQ evaluation means 20-1, PEAQ evaluation means 20-2, ..., and PEAQ evaluation means 20-24. The PEAQ evaluation means 20-k receives the binaural signals y_{k_ori} and y_{k_sig} of channel number k from the signal processing unit 11, obtains the objective evaluation value z_k using the algorithm of the PEAQ objective sound quality measurement method, and outputs z_k to the multi-channel evaluation unit 13. As above, k is an integer from 1 to 24.

Specifically, the PEAQ evaluation means 20-k receives the binaural signals consisting of the basic signal y_{k_ori} and the measured signal y_{k_sig}. Using an auditory model that reflects the perceptual characteristics of the human ear, it generates an auditory model output signal for the basic signal y_{k_ori} and an auditory model output signal for the measured signal y_{k_sig}.

The auditory model is an algorithm that simulates the functions of the outer ear, middle ear, and inner ear. It applies an FFT (Fast Fourier Transform) to the input signal to obtain frequency-component signals, classifies them into groups that reflect the function of the inner ear, adds physiological noise such as blood flow to them, and calculates the neural excitation pattern taking into account spreading along the frequency axis and the time axis, thereby generating the auditory model output signal.

Based on the auditory model output signals for the basic signal y_{k_ori} and the measured signal y_{k_sig}, the PEAQ evaluation means 20-k calculates the auditory distortion characteristics and obtains model output values that represent the degree of acoustic signal degradation. It then obtains the objective evaluation value z_k from the model output values using a recognition model with a neural network structure.
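The per-channel dispatch performed by the PEAQ evaluation means 20-1 to 20-24 can be sketched as below. The scoring function is injected as a parameter because the PEAQ algorithm itself is defined in ITU-R BS.1387-1 and is not reproduced here; `dummy_scorer` is a hypothetical stand-in used only to make the example executable and is not PEAQ in any sense.

```python
def evaluate_channels(basic_signals, measured_signals, scorer):
    """Return [z_1, ..., z_M]: one score per (basic, measured) signal pair.

    scorer(basic, measured) -> float is supplied by the caller; a real
    system would pass a PEAQ implementation here.
    """
    return [scorer(ref, test)
            for ref, test in zip(basic_signals, measured_signals)]


def dummy_scorer(ref, test):
    """NOT PEAQ: negative mean absolute difference, for demonstration only."""
    return -(sum(abs(a - b) for a, b in zip(ref, test)) / len(ref))
```

In the first processing example the same basic signal is paired with every measured signal, so the caller can pass `[y_ori] * 24` as `basic_signals`; in the second, the same measured signal is repeated instead.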

The method of obtaining the objective evaluation value z_k using the algorithm of the PEAQ objective sound quality measurement method is known; for details, see, for example, Non-Patent Document 2 above or the following document.
Kaoru Watanabe, "Evaluation Method for Deterioration of Audio Signals", Journal of the Acoustical Society of Japan, Vol. 63, No. 11 (2007), pp. 686-692

(Multi-channel evaluation unit 13)
The multi-channel evaluation unit 13 receives the per-channel objective evaluation values z_1 to z_24 from the PEAQ evaluation unit 12, obtains the multi-channel objective evaluation value z from them, and outputs the multi-channel objective evaluation value z.

FIG. 7 is a flowchart showing a first processing example of the multi-channel evaluation unit 13. In the first processing example, the lowest value z_L among the per-channel objective evaluation values z_1 to z_24 obtained by the PEAQ objective sound quality measurement method is used as the multi-channel objective evaluation value z.

The multi-channel evaluation unit 13 receives the per-channel objective evaluation values z_1 to z_24 from the PEAQ evaluation unit 12 (step S701) and detects the lowest value z_L among them (step S702).

The multi-channel evaluation unit 13 sets the lowest value z_L detected in step S702 as the multi-channel objective evaluation value z (z = z_L) and outputs the multi-channel objective evaluation value z (step S703).

In this way, the multi-channel evaluation unit 13 outputs the lowest value z_L among the per-channel objective evaluation values z_1 to z_24 obtained by the PEAQ objective sound quality measurement method as the multi-channel objective evaluation value z. As a result, the value output as z is the objective evaluation value of the channel that is rated lowest when a listener focuses on a specific channel of the multi-channel sound. In other words, the multi-channel objective evaluation value z is close to the subjective evaluation value obtained when a listener focuses on the sound quality degradation of individual sound sources.
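The first processing example (steps S701 to S703) can be sketched in a few lines of Python. This is only an illustration: the function name `multichannel_score_min` and the sample values are hypothetical and do not come from the source.

```python
from typing import Sequence


def multichannel_score_min(channel_scores: Sequence[float]) -> float:
    """Return the lowest per-channel objective evaluation value (z = z_L)."""
    if not channel_scores:
        raise ValueError("at least one per-channel score is required")
    # Step S702: detect the lowest value; step S703: use it as z.
    return min(channel_scores)


# Hypothetical per-channel values on the 0 (imperceptible) to -4 (very annoying) scale.
scores = [-0.2, -1.7, -0.4, -3.1]
z = multichannel_score_min(scores)
```

With this rule a single badly degraded channel determines the overall result, regardless of how good the other channels are.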

FIG. 8 is a flowchart showing a second processing example of the multi-channel evaluation unit 13. In the second processing example, the per-channel objective evaluation values z_1 to z_24 obtained by the PEAQ objective sound quality measurement method are multiplied by the weighting coefficients W_1 to W_24, and the products for all channels are added to obtain the multi-channel objective evaluation value z.

The multi-channel evaluation unit 13 receives the per-channel objective evaluation values z_1 to z_24 from the PEAQ evaluation unit 12 (step S801), multiplies each of them by the corresponding predetermined weighting coefficient W_1 to W_24, and obtains the product for each channel (step S802). The weighting coefficients W_1 to W_24 sum to 1.

The multi-channel evaluation unit 13 adds the products for all channels obtained in step S802 (step S803), sets the sum as the multi-channel objective evaluation value z, and outputs the multi-channel objective evaluation value z (step S804).
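Steps S801 to S804 amount to a weighted sum. The sketch below is a minimal illustration; the function name `multichannel_score_weighted` is hypothetical, and the check that the weights sum to 1 reflects only the base case described here, not variants that allow a larger total.

```python
import math
from typing import Sequence


def multichannel_score_weighted(scores: Sequence[float],
                                weights: Sequence[float]) -> float:
    """Multiply each per-channel score by its weight and add the products."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Base case of the second processing example: the weights sum to 1.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    # Step S802: per-channel products; step S803: sum over all channels.
    return sum(w * z for w, z in zip(weights, scores))


# Hypothetical two-channel example: the more degraded channel gets the larger weight.
z = multichannel_score_weighted([-1.0, -3.0], [0.25, 0.75])
```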

Here, the second processing example shown in FIG. 8 is expressed by the following formula:

z = \sum_{i=1}^{24} W_i z_i

where z_i is the objective evaluation value of channel i and W_i is the weighting coefficient of channel i.

The predetermined weighting coefficients W_1 to W_24 are chosen per channel: the larger the objective evaluation value z_i (the smaller the degradation), the smaller the coefficient, and the smaller the objective evaluation value z_i (the larger the degradation), the larger the coefficient. The predetermined weighting coefficients W_1 to W_24 may be set in advance by the user or set automatically by a predetermined process.

An example of setting the weighting coefficients W_1 to W_24 by a predetermined process is described below. FIG. 9 is a flowchart showing an example of the process by which the multi-channel evaluation unit 13 sets the weighting coefficients W_1 to W_24. The multi-channel evaluation unit 13 sets the channel number i (i = 1 to 24) in order (step S901) and determines whether the objective evaluation value z_i is greater than a predetermined value (step S902).

Suppose that, for the objective evaluation value z_i obtained by the PEAQ evaluation unit 12, 0 means "the degradation is imperceptible", -1 means "the degradation is perceptible but not annoying", -2 means "the degradation is slightly annoying", -3 means "the degradation is annoying", and -4 means "the degradation is very annoying". In that case the predetermined value used in step S902 is, for example, -1.

When the multi-channel evaluation unit 13 determines in step S902 that the objective evaluation value z_i is greater than the predetermined value (step S902: Y), it judges that the acoustic signal of channel number i has little degradation and sets the weighting coefficient W_i = 0 (step S903).

On the other hand, when the multi-channel evaluation unit 13 determines in step S902 that the objective evaluation value z_i is not greater than the predetermined value (step S902: N), it sets the weighting coefficient W_i based on the loudness level of the acoustic signal (step S904).

Specifically, the multi-channel evaluation unit 13 receives the loudness level of the acoustic signal of channel number i from a loudness measurement unit that is not shown in FIG. 1. When there are multiple channels for which it determined that the loudness level is not greater than the predetermined value, the multi-channel evaluation unit 13 sets the weighting coefficient W_i based on the loudness level of the acoustic signal, so that W_i becomes larger (closer to 1) as the loudness level increases and smaller (closer to 0) as the loudness level decreases. This yields a weighting coefficient W_i that corresponds to the loudness level of the acoustic signal. The weighting coefficients W_1 to W_24 are assumed to sum to 1.
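The weight-setting flow of FIG. 9 (steps S901 to S904) can be sketched as follows. The source states only that a louder channel gets a larger weight and that the weights sum to 1; the mapping from loudness level to weight used here (conversion from dB to linear power, then normalization) is an assumption made for illustration, as are the function name `set_weights` and the sample values.

```python
from typing import List, Sequence


def set_weights(scores: Sequence[float],
                loudness_db: Sequence[float],
                threshold: float = -1.0) -> List[float]:
    """Set W_i = 0 for channels scoring above the threshold (S903) and a
    loudness-based weight for the others (S904), normalized to sum to 1."""
    if len(scores) != len(loudness_db):
        raise ValueError("scores and loudness_db must have the same length")
    # Assumption: dB -> linear power, so that louder channels weigh more.
    raw = [0.0 if z > threshold else 10.0 ** (level / 10.0)
           for z, level in zip(scores, loudness_db)]
    total = sum(raw)
    if total == 0.0:
        # Every channel is above the threshold; the source does not cover this case.
        raise ValueError("no channel is at or below the threshold")
    return [r / total for r in raw]


# Hypothetical example: channel 0 has little degradation, channel 1 is louder than channel 2.
weights = set_weights([-0.5, -2.0, -3.0], [-20.0, -20.0, -23.0])
```

Only channels at or below the threshold contribute to the weighted sum, and among them the louder channel dominates.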

In this case, the loudness measurement unit (not shown) measures the loudness of each channel using, for example, the method of the following reference.
Rec. ITU-R BS.1770-4, “Algorithms to measure audio programme loudness and true-peak audio level”
The loudness measurement unit outputs the loudness level of each of the channels to the multi-channel evaluation unit 13. The multi-channel evaluation unit 13 sets the weighting coefficient W_i according to the loudness level of each channel.

The multi-channel evaluation unit 13 may instead set a weighting coefficient W_i that corresponds to the normalized correlation coefficient ρ_fg of the channel. Specifically, the multi-channel evaluation unit 13 receives the normalized correlation coefficient ρ_fg of channel number i from a correlation coefficient calculation unit that is not shown in FIG. 1. The multi-channel evaluation unit 13 sets the weighting coefficient W_i so that W_i becomes larger (closer to 1) as ρ_fg increases and smaller (closer to 0) as ρ_fg decreases. This yields a weighting coefficient W_i that corresponds to the normalized correlation coefficient ρ_fg of the channel.

In this case, the correlation coefficient calculation unit (not shown) uses formula (1) above to calculate the normalized correlation coefficient ρ_fg between channel number i and each of the other channels. The correlation coefficient calculation unit then outputs the result to the multi-channel evaluation unit 13 as the normalized correlation coefficient ρ_fg of channel number i.

Alternatively, the multi-channel evaluation unit 13 may detect the lowest value z_L among the per-channel objective evaluation values z_i and, when the objective evaluation values z_i of multiple channels adjacent to the channel having the lowest value z_L are at or below a predetermined value, set the weighting coefficients W_i so that their total exceeds 1. The total of the weighting coefficients W_i must not, however, exceed 2. When the objective evaluation value z_i obtained by the PEAQ evaluation unit 12 is expressed on the scale from 0 to -4 described above, the predetermined value compared with z_i is, for example, -1.

In this way, the multi-channel evaluation unit 13 multiplies the per-channel objective evaluation values z_1 to z_24 obtained by the PEAQ objective sound quality measurement method by the predetermined weighting coefficients W_1 to W_24 and adds all the products to generate and output the multi-channel objective evaluation value z. Because each per-channel value is weighted by a coefficient that matches how strongly a listener attends to that channel, the resulting multi-channel objective evaluation value z reflects the different degree of attention given to each channel. In other words, the multi-channel objective evaluation value z is close to the subjective evaluation value obtained when a listener focuses on the sound quality degradation of individual sound sources.

[Experimental results]
Next, experimental results obtained by computer simulation are described. These results show that the multi-channel objective evaluation value z output by the multi-channel objective evaluation device 1 is close to the subjective evaluation value obtained by the subjective evaluation method defined in ITU-R Recommendation BS.1116-3 (Non-Patent Document 1 mentioned above).

FIG. 10 shows the experimental results, namely the results of evaluating environmental sound in an actually recorded 22.2ch multi-channel acoustic signal. (a) shows the subjective evaluation results obtained by the subjective evaluation method defined in ITU-R Recommendation BS.1116-3 (Non-Patent Document 1), and (b) shows the objective evaluation results according to the embodiment of the present invention (for the case where the number of channels N of the degraded sound x' is 1).

(c) shows the objective evaluation results obtained with the prior art (the assumed method described above), in which the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2) is incorporated into the method of Non-Patent Document 4. Specifically, as described above, the results in (c) were obtained by convolving the head-related impulse responses (HRIRs) with the multi-channel acoustic signal to generate a two-channel signal and then applying the objective evaluation method of Non-Patent Document 2.

The horizontal axes of (a), (b), and (c) show the bit rate [kbit/s] of the acoustic signal. The higher the bit rate, the lower the compression ratio, and the lower the bit rate, the higher the compression ratio. The vertical axis of (a) shows the subjective evaluation value (Diff Grade), and the vertical axes of (b) and (c) show the objective evaluation value (Diff Grade). The objective evaluation value in (b) is the multi-channel objective evaluation value z output by the multi-channel evaluation unit 13 of the multi-channel objective evaluation device 1 shown in FIG. 1.

As above, for both the subjective and the objective evaluation values, 0 means "the degradation is imperceptible", -1 means "the degradation is perceptible but not annoying", -2 means "the degradation is slightly annoying", -3 means "the degradation is annoying", and -4 means "the degradation is very annoying".

Comparing (a), (b), and (c) shows that the objective evaluation results of the embodiment of the present invention in (b) are closer to the subjective evaluation results in (a) than the objective evaluation results of the prior art in (c) are.

Thus, by using the multi-channel objective evaluation device 1 of the embodiment of the present invention, a multi-channel objective evaluation value z can be obtained that is close to the subjective evaluation value obtained by the subjective evaluation method defined in ITU-R Recommendation BS.1116-3 (Non-Patent Document 1).

As described above, in the multi-channel objective evaluation device 1 of the embodiment of the present invention, the convolution signal output unit 10 uses a preset DB to identify and output the per-channel head-related impulse responses HRIR_1 to HRIR_24 based on the reproduction position information P of the 24-channel acoustic signal.

The signal processing unit 11 performs convolution processing based on the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 of the multi-channel acoustic signal and the head-related impulse responses HRIR_1 to HRIR_24, and generates per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} that take subjective evaluation into account. Specifically, for each channel, the signal processing unit 11 performs convolution processing based on, for example, all of the original sounds x_1 to x_24, the degraded sound x' of that channel only, and the head-related impulse responses HRIR_1 to HRIR_24, and generates the binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig}.
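The per-channel signal generation described above can be sketched for one ear and for the case N = 1 (only the channel under test uses its degraded sound). This is a simplified illustration, not the device's implementation: it assumes that the per-channel convolution results are combined by sample-wise summation, that all input signals share one length and all impulse responses share another, and the names `convolve`, `mix`, and `binaural_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def convolve(x: Sequence[float], h: Sequence[float]) -> Signal:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def mix(signals: Sequence[Signal]) -> Signal:
    """Add equal-length signals sample by sample (assumed combination rule)."""
    return [sum(samples) for samples in zip(*signals)]


def binaural_pairs(original: Sequence[Signal],
                   degraded: Sequence[Signal],
                   irs: Sequence[Signal]) -> Tuple[Signal, List[Signal]]:
    """Return the common basic signal and one measured signal per channel."""
    conv_ori = [convolve(x, h) for x, h in zip(original, irs)]
    conv_deg = [convolve(x, h) for x, h in zip(degraded, irs)]
    y_ori = mix(conv_ori)  # basic signal common to all channels
    y_sig = []
    for k in range(len(original)):
        # Channel k uses its degraded sound; every other channel uses the original.
        parts = [conv_deg[i] if i == k else conv_ori[i]
                 for i in range(len(original))]
        y_sig.append(mix(parts))
    return y_ori, y_sig


# Hypothetical two-channel example with trivial one-tap impulse responses.
y_ori, y_sig = binaural_pairs([[1.0, 0.0], [0.0, 1.0]],
                              [[2.0, 0.0], [0.0, 3.0]],
                              [[1.0], [1.0]])
```

A full binaural signal would repeat this once with the left-ear responses and once with the right-ear responses.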

For each channel, the PEAQ evaluation unit 12 obtains the objective evaluation values z_1 to z_24 from the binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} using the PEAQ objective sound quality measurement method, which is the objective evaluation method defined in ITU-R Recommendation BS.1387-1 (Non-Patent Document 2).

The multi-channel evaluation unit 13 obtains the multi-channel objective evaluation value z from the per-channel objective evaluation values z_1 to z_24.

The binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} that the PEAQ evaluation unit 12 evaluates are generated by the signal processing unit 11 with a focus on the sound quality degradation of individual sound sources, so they take subjective evaluation into account. Because the multi-channel objective evaluation value z obtained by the multi-channel evaluation unit 13 is generated from the objective evaluation values z_1 to z_24 of these binaural signals, it is close to the subjective evaluation value. It is therefore possible to obtain objective evaluation results close to the subjective evaluation results for the quality of a multi-channel acoustic signal having more than two channels.

The present invention has been described above with reference to an embodiment, but it is not limited to that embodiment and can be modified in various ways without departing from its technical idea. In the embodiment, the multi-channel objective evaluation device 1 obtains the multi-channel objective evaluation value z for a 22.2ch multi-channel acoustic signal. The present invention is not limited to 22.2ch multi-channel acoustic signals and is also applicable to multi-channel acoustic signals of other acoustic systems such as 11.1ch, 7.1ch, and 5.1ch.

The present invention is also applicable not only to multi-channel acoustic signals of acoustic systems with a preset speaker arrangement, such as 22.2ch, but also to multi-channel acoustic signals for which the speaker arrangement is not preset and two or more speakers are placed arbitrarily.

In the embodiment, the multi-channel objective evaluation device 1 uses the head-related impulse responses HRIR_1 to HRIR_24 as the convolution signals. The present invention does not limit the convolution signals to the head-related impulse responses HRIR_1 to HRIR_24; other impulse responses, for example the binaural room impulse responses (BRIR) BRIR_1 to BRIR_24, may be used.

In this case, referring to FIG. 4, the DB provided in the convolution signal output unit 10 stores the binaural room impulse responses BRIR_1 to BRIR_24 instead of the head-related impulse responses HRIR_1 to HRIR_24. The convolution signal output unit 10 reads from the DB the binaural room impulse responses BRIR_1 to BRIR_24, which represent the per-channel propagation characteristics corresponding to the reproduction position information P. The signal processing unit 11 then performs convolution processing based on the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 of the multi-channel acoustic signal and the binaural room impulse responses BRIR_1 to BRIR_24, and generates per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig} that take subjective evaluation into account.

In the embodiment, the multi-channel objective evaluation device 1 convolves the binaural room impulse responses BRIR_1 to BRIR_24 with the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 of the multi-channel acoustic signal to generate the per-channel binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig}. The present invention does not limit this convolution processing to a time-domain operation. The original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 may be converted to the frequency domain and multiplied by the head-related transfer functions (HRTF) HRTF_1 to HRTF_24, and the products converted back to the time domain to generate the binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig}. Likewise, the frequency components of the original sounds x_1 to x_24 and the degraded sounds x'_1 to x'_24 may be multiplied by the binaural room transfer functions (BRTF) BRTF_1 to BRTF_24, and the products converted back to the time domain to generate the binaural signals y_{1_ori} to y_{24_ori} and y_{1_sig} to y_{24_sig}.
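The equivalence between time-domain convolution and frequency-domain multiplication can be checked with a short Python sketch. For brevity it uses a naive O(n^2) DFT from the standard library instead of an FFT, and it zero-pads both sequences to the full output length so that the circular convolution implied by the DFT equals the linear convolution. The function names and sample values are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)), enough for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def convolve_time(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Linear convolution computed directly in the time domain."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y


def convolve_freq(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Same result via multiplication of spectra and an inverse transform."""
    n = len(x) + len(h) - 1
    xp = list(x) + [0.0] * (n - len(x))  # zero-pad to the output length
    hp = list(h) + [0.0] * (n - len(h))
    product = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [value.real for value in idft(product)]


# Hypothetical short signal and impulse response.
signal = [1.0, 2.0, 3.0]
impulse_response = [0.5, -0.25]
y_time = convolve_time(signal, impulse_response)
y_freq = convolve_freq(signal, impulse_response)
```

The two results agree up to floating-point rounding; in practice an FFT would replace the naive transform for long signals.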

An ordinary computer can be used as the hardware of the multi-channel objective evaluation device 1 according to the embodiment of the present invention. The multi-channel objective evaluation device 1 is configured as a computer that includes a CPU, a volatile storage medium such as RAM, a non-volatile storage medium such as ROM, an interface, and the like.

The functions of the convolution signal output unit 10, the signal processing unit 11, the PEAQ evaluation unit 12, and the multi-channel evaluation unit 13 of the multi-channel objective evaluation device 1 are each realized by having the CPU execute a program that describes those functions.

These programs are stored in the storage medium mentioned above and are read and executed by the CPU. The programs can also be stored and distributed on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROM, DVD, etc.), and semiconductor memories, and can be sent and received over a network.

1 Multi-channel objective evaluation device
10 Convolution signal output unit
11 Signal processing unit
12 PEAQ evaluation unit
13 Multi-channel evaluation unit
20-1 to 20-24 PEAQ evaluation means
x_1 to x_24 Original sound of the multi-channel acoustic signal
x'_1 to x'_24 Degraded sound of the multi-channel acoustic signal
P Reproduction position information
HRIR_1 to HRIR_24 Head-related impulse response
M Number of channels of the multi-channel acoustic signal
N Number of channels of the degraded sound x'
BRIR_1 to BRIR_24 Binaural room impulse response
HRTF_1 to HRTF_24 Head-related transfer function
BRTF_1 to BRTF_24 Binaural room transfer function
y_{1_ori} to y_{24_ori} Basic signal (binaural signal)
y_{1_sig} to y_{24_sig} Measured signal (binaural signal)
y_{_ori} Common basic signal
y_{_sig} Common measured signal
z_1 to z_24 Objective evaluation value for each channel
z Multi-channel objective evaluation value
ρ_fg Normalized correlation coefficient
W_1 to W_24 Weighting coefficient

Claims

A multi-channel objective evaluation device that objectively evaluates a multi-channel acoustic signal having more than two channels, comprising:
a convolution signal output unit that outputs, as a convolution signal corresponding to each channel of the acoustic signals constituting the multi-channel acoustic signal, a head-related impulse response (HRIR) or a binaural room impulse response (BRIR) representing the propagation characteristics of that channel;
a signal processing unit that receives the original sound and the degraded sound of the multi-channel acoustic signal and the convolution signal for each channel output by the convolution signal output unit,
convolves the convolution signal with the original sound for each channel and generates a basic signal common to all channels based on the convolution results of all channels,
for each channel, convolves the convolution signal with the degraded sound of one or more channels including that channel to generate a first convolution result, convolves the convolution signal with the original sound of the channels, among all channels, other than the one or more channels to generate a second convolution result, and generates a measured signal based on the first convolution result and the second convolution result, and
generates, for each channel, a binaural signal composed of the basic signal and the measured signal;
an evaluation unit that receives the binaural signal for each channel generated by the signal processing unit and generates, for each channel, an objective evaluation result based on the binaural signal of that channel using a predetermined PEAQ (Perceptual Evaluation of Audio Quality) objective sound quality measurement method; and
a multi-channel evaluation unit that generates an objective evaluation result of the multi-channel acoustic signal as a multi-channel objective evaluation result, based on the objective evaluation result for each channel generated by the evaluation unit.

The multi-channel objective evaluation device according to claim 1, wherein
the convolution signal output unit
receives information on the acoustic system that determines the number and arrangement of channels of the multi-channel acoustic signal, reads the convolution signal for each channel corresponding to the acoustic system from a preset database, and outputs it, and
the database stores the channels of the acoustic system and the convolution signal corresponding to each channel.

The multi-channel objective evaluation device according to claim 1, wherein
the convolution signal output unit
receives information on the per-channel angle that determines the reproduction position of each acoustic signal constituting the multi-channel acoustic signal, reads the convolution signal for each channel corresponding to the per-channel angle from a preset database, and outputs it, and
the database stores the angle and the convolution signal corresponding to the angle.

The multi-channel objective evaluation device according to any one of claims 1 to 3, wherein
the multi-channel evaluation unit
detects the lowest value among the objective evaluation results for each channel generated by the evaluation unit and generates the lowest value as the multi-channel objective evaluation result.

The multi-channel objective evaluation device according to any one of claims 1 to 3, wherein
the multi-channel evaluation unit
multiplies the objective evaluation result for each channel generated by the evaluation unit by a predetermined weighting coefficient for that channel, adds the multiplication results of the channels, and generates the addition result as the multi-channel objective evaluation result.

A program for making a computer function as the multi-channel objective evaluation device according to any one of claims 1 to 5.