JP5658506B2

JP5658506B2 - Acoustic signal conversion apparatus and acoustic signal conversion program

Info

Publication number: JP5658506B2
Application number: JP2010173946A
Authority: JP
Inventors: 渡辺 馨 (Kaoru Watanabe); 小森 智康 (Tomoyasu Komori)
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-08-02
Filing date: 2010-08-02
Publication date: 2015-01-28
Anticipated expiration: 2030-08-02
Also published as: JP2012034295A

Description

The present invention relates to an acoustic signal conversion device and an acoustic signal conversion program, and more particularly to an acoustic signal conversion device and program that convert a multichannel acoustic signal into optimal sound when it is downmixed.

A "three-dimensional (3D) sound system" has been developed that reproduces sound with a greater sense of presence than the 5.1-channel (ch) surround sound system. Programs for the 3D sound system are produced in an environment with a large number of loudspeakers in a standard arrangement, but many homes cannot accommodate loudspeaker equipment compatible with the 3D sound system, so such programs are usually reproduced on 2-channel stereo or 5.1-channel surround sound equipment.

Two approaches therefore exist for reproducing a multichannel acoustic signal, such as a 3D sound signal, on 2-channel or 5.1-channel surround equipment: a simulcast method, which transmits a 2-channel or 5.1-channel surround signal in parallel with the multichannel signal, and a downmix method, which converts the received multichannel signal on the receiving side into a 2-channel or 5.1-channel surround signal. For the downmix method, prescribed conversion formulas exist (see, for example, Non-Patent Document 1).

In recent years, to provide better sound to viewers (users), two related techniques have been disclosed: a mixing balance display device that, when a narration signal is mixed with a background music (hereinafter "BGM") signal, displays the state of the mixing balance in a form that corresponds to auditory perception and allows it to be adjusted (see, for example, Patent Document 1), and a mixing balance display system that accurately simulates the auditory characteristics of listeners with sensorineural hearing loss without causing discomfort to listeners with normal hearing (see, for example, Patent Document 2).

Patent Document 1: JP 2008-124892 A
Patent Document 2: JP 2009-159083 A

Non-Patent Document 1: ITU-R Doc. 6C/253-E, "Proposed Preliminary Draft New Recommendation ITU-R BS.[3D-SOUND]", 26 October 2009

When acoustic signals are produced, for example in program production, the program producer or sound engineer balances speech that conveys information to the user (speeches, narration, conversation, dialogue, etc.) against sound effects (BGM, ambient sound, etc.) to create optimal multichannel sound content that matches the producer's intention. Even so, when that content is downmixed with a conventional prescribed conversion formula to, for example, a 2-channel or 5.1-channel surround signal, the speech/effect balance can deteriorate. BGM may, for example, make the narration hard to hear, so the resulting balance no longer matches the producer's intention.

Here, deterioration of the speech/effect balance refers, for example, to deterioration of the sound balance (mixing balance) that occurs when two acoustic signals with different signal levels contained in a multichannel signal are downmixed. Specifically, a conventional downmix degrades the speech/effect balance because the spatial auditory masking conditions differ with the number of reproduction channels and the loudspeaker positions, and because the downmix can unintentionally raise signal levels.

In such cases, a separate mix must be made for the 2-channel or 5.1-channel surround signal, which is laborious.

Moreover, no technique had previously been developed to remedy this deterioration of the speech/effect balance when a multichannel signal is downmixed on the receiving side to a 2-channel or 5.1-channel surround signal.

The present invention has been made in view of the problems described above. Its object is to provide an acoustic signal conversion device and an acoustic signal conversion program that, when a multichannel acoustic signal is downmixed, convert it into optimal sound without degrading the balance between speech and sound effects.

To solve the above problems, the present invention employs means having the following features.

The invention according to claim 1 is an acoustic signal conversion device that converts an acoustic signal consisting of a first signal level and a second signal level so that it corresponds to a preset number of channels, the device comprising: balance measurement means that measures the mix balance of signal levels between the acoustic signal of the first signal level and the acoustic signal of the second signal level when an acoustic signal corresponding to a first number of channels is downmixed to an acoustic signal corresponding to a second number of channels; gain adjustment amount calculation means that calculates a gain adjustment amount for the first signal level or the second signal level according to the relative level difference between the first signal level and the second signal level obtained by the balance measurement means; gain adjustment means that adjusts the gain of the first signal level and/or the second signal level based on the gain adjustment amount obtained by the gain adjustment amount calculation means; and synthesis means that, using the gain-adjusted acoustic signal obtained by the gain adjustment means, combines the acoustic signal of the first signal level with the acoustic signal of the second signal level and outputs an acoustic signal corresponding to the second number of channels, wherein the gain adjustment means sets an adjustment time whose length depends on the magnitude of the increase or decrease in the gain adjustment amount obtained by the gain adjustment amount calculation means, and adjusts the gain over the set adjustment time so that the total volume after synthesis by the synthesis means is kept constant.

According to the invention of claim 1, when a multichannel acoustic signal is downmixed, it can be converted into optimal sound without degrading the mix balance between the two different signal levels.

The invention according to claim 2 is an acoustic signal conversion device that converts an acoustic signal consisting of a first signal level and a second signal level, transmitted from a production side, so that it corresponds to a preset number of channels, the device comprising: mixing metadata separation means that separates a multiplexed signal transmitted from the production side into an acoustic signal corresponding to a first number of channels, a gain amount for the first signal level or the second signal level that corresponds to the mix balance between the first and second signal levels when the signal is downmixed from the first number of channels to a second number of channels, and identification metadata for identifying the acoustic signals of the first and second signal levels; channel separation means that, using the identification metadata, separates the acoustic signal corresponding to the first number of channels obtained by the mixing metadata separation means into the acoustic signal of the first signal level and the acoustic signal of the second signal level as downmixed to an acoustic signal corresponding to the second number of channels; gain adjustment means that adjusts the gain of the first signal level and/or the second signal level based on a gain adjustment amount corresponding to the relative level difference between the first signal level and the second signal level; and synthesis means that, using the gain-adjusted acoustic signal obtained by the gain adjustment means, combines the acoustic signal of the first signal level with the acoustic signal of the second signal level and outputs an acoustic signal corresponding to the second number of channels, wherein the gain adjustment means sets an adjustment time whose length depends on the magnitude of the increase or decrease in the gain adjustment amount, and adjusts the gain over the set adjustment time so that the total volume after synthesis by the synthesis means is kept constant.

According to the invention of claim 2, when a multichannel acoustic signal is downmixed, it can be converted into optimal sound without degrading the mix balance between the two different signal levels.

The invention according to claim 3 is characterized in that the synthesis means adjusts the volume of the acoustic signal corresponding to the second number of channels according to the amount of change in the acoustic signal whose gain has been adjusted by the gain adjustment means.

The invention according to claim 4 is characterized in that the acoustic signal consisting of the first signal level and the second signal level is speech and sound effects.

The invention according to claim 5 is characterized in that the input acoustic signal consisting of the first signal level and the second signal level has been separated into the speech or the sound effects using a preset auditory critical bandwidth.

The invention according to claim 6 is an acoustic signal conversion program that causes a computer to function as each of the means of the acoustic signal conversion device according to any one of claims 1 to 5.

According to the invention of claim 6, when a multichannel acoustic signal is downmixed, it can be converted into optimal sound without degrading the mix balance between the two different signal levels. In addition, the acoustic signal conversion processing can easily be realized by installing the executable program on a computer.

Any combination of the components and expressions of the present invention applied to a method, device, system, computer program, recording medium, data structure, or the like is also effective as an aspect of the present invention.

According to the present invention, when a multichannel acoustic signal is downmixed, it can be converted into optimal sound without degrading the mix balance between the two different signal levels.

FIG. 1 is a diagram showing an example of the acoustic signal conversion system in the first embodiment.
FIG. 2 is a diagram showing an example of the loudspeaker arrangement for 22.2 channels.
FIG. 3 is a diagram showing an example of the calculation formulas used when downmixing from a 22.2-channel acoustic signal.
FIG. 4 is a diagram for explaining an example of gain adjustment amount calculation in the present embodiment.
FIG. 5 is a diagram showing an example of the acoustic signal conversion system (transmitting side) in the second embodiment.
FIG. 6 is a diagram showing an example of the acoustic signal conversion system (receiving side) in the second embodiment.
FIG. 7 is a diagram showing an example of the relationship between critical band numbers and frequency.
FIG. 8 is a flowchart showing an example of the acoustic signal conversion procedure in the first embodiment.
FIG. 9 is a sequence diagram showing an example of the acoustic signal conversion procedure in the second embodiment.

<About the present invention>
In parallel with the production of multichannel sound content that has a large number of audio channels (a first number of channels), such as 3D sound content, the present invention automatically converts (downmixes) the multichannel content into acoustic signal content with a different number of channels (a second number of channels), such as 2-channel stereo or 5.1-channel surround content. The present invention also uses program content with many audio channels, together with metadata generated along with it, so that the receiving side can convert (downmix) the signal, based on that metadata, into an acoustic signal suited to its own equipment, such as a 2-channel or 5.1-channel surround signal. In both conversions, the present invention provides a function for adjusting the balance between speech and sound effects.

Preferred embodiments of the acoustic signal conversion device and acoustic signal conversion program of the present invention having the above features are described in detail below with reference to the drawings. The embodiments cover two cases: the production side creates a 2-channel signal by downmixing in parallel with producing the multichannel signal, and the receiving side creates a 2-channel signal by downmixing the multichannel signal. In these embodiments, the 2-channel signal described below can easily be replaced with a signal having another number of channels (for example, a 5.1-channel surround signal).

<Acoustic Signal Conversion System: First Embodiment>
FIG. 1 is a diagram showing an example of the acoustic signal conversion system in the first embodiment. The acoustic signal conversion system 10 shown in FIG. 1 comprises a sound recording/playback device 11, a microphone 12 serving as audio input means, an audio mixing device 13, and an acoustic signal conversion device 14. The acoustic signal conversion device 14 comprises a speech/effect balance measurement device 21, gain adjustment amount calculation means 22, gain adjustment means 23, and synthesis means 24.

The acoustic signal conversion system 10 shown in FIG. 1 illustrates an example of automatically downmixing to a 2-channel signal on the production side, in parallel with producing sound content that has many audio channels.

In the acoustic signal conversion system 10 shown in FIG. 1, the sound recording/playback device 11 takes prerecorded material, such as speech (narration, speeches) and sound effects (BGM, etc.), and generates acoustic signals to which preset attribute data, "speech/effect identification", is attached so that the content of each signal, such as the type of sound, can be identified.

The microphone 12 directly captures speech, such as an announcer's narration or speeches, and sound effects, such as ambient sound, and generates acoustic signals to which the attribute data "speech/effect identification" is attached for each input. In other words, every source signal obtained from the sound recording/playback device 11 and the microphone 12 is output with the attribute "speech/effect identification", which describes its semantic content.

The attribute data "speech/effect identification" is identification information indicating, for each audio channel signal, whether the channel's content is "speech" or "sound effect". In the first embodiment, each input of the sound recording/playback device 11 and the microphone 12 normally carries only one of the two, so the corresponding preset identification information can be attached to each input signal.

For example, when the sound recording/playback device 11 receives a signal corresponding to "sound effect" and the microphone 12 receives a signal corresponding to "speech", the attribute data for the corresponding identification information is attached to each. The present embodiment is not limited to this: a "speech" signal may be input to the sound recording/playback device 11, and a "sound effect" signal may be input to the microphone 12. Because the present embodiment may include one or more sound recording/playback devices 11 and microphones 12, one piece of identification information corresponding to the acoustic signal is attached to each input before it is output to the audio mixing device 13.

The audio mixing device 13 receives one or more acoustic signals (source material) from one or more sound recording/playback devices 11 or microphones 12 and mixes them according to conditions set in advance by the sound engineer, generating sound content for a 3D sound system or similar format that corresponds to the target number of audio channels (for example, 22.2 channels). The audio mixing device 13 outputs the generated multichannel acoustic signal content 31.

In the example of FIG. 1, the multichannel acoustic signal content 31 produced by the audio mixing device 13 is made optimal by the sound engineer, who adjusts the speech/effect balance to match the program producer's intention, which is set, for example, according to the program genre.

In addition, the audio mixing device 13 uses the attribute data "speech/effect identification" to sort the produced multichannel content 31 into speech and sound effects, downmixes each class to a 2-channel signal using a preset conversion formula or the like (acoustic signal conversion), and outputs a speech downmix signal 32 and an effects downmix signal 33.
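The per-class downmix described above can be sketched as follows. This is a minimal illustration, assuming each channel carries exactly one "speech" or "effect" tag and that the conversion formula is a static per-channel gain pair (to L and to R). The channel names and coefficients are hypothetical placeholders, not the 22.2-channel formula of Non-Patent Document 1.

```python
from typing import Dict, List, Tuple

Signal = List[float]


def downmix_by_class(
    channels: Dict[str, Signal],
    tags: Dict[str, str],                     # channel name -> "speech" or "effect"
    coeffs: Dict[str, Tuple[float, float]],   # channel name -> (gain to L, gain to R)
) -> Dict[str, Tuple[Signal, Signal]]:
    """Downmix speech-tagged and effect-tagged channels separately to 2 channels.

    Returns {"speech": (L, R), "effect": (L, R)}. A tag other than
    "speech"/"effect" raises KeyError. All channels must have equal length.
    """
    n = len(next(iter(channels.values())))
    out = {cls: ([0.0] * n, [0.0] * n) for cls in ("speech", "effect")}
    for name, samples in channels.items():
        left, right = out[tags[name]]   # pick the class this channel belongs to
        gl, gr = coeffs[name]           # HYPOTHETICAL downmix coefficients
        for i, x in enumerate(samples):
            left[i] += gl * x
            right[i] += gr * x
    return out
```

Because a matrix downmix is linear, adding the two class outputs sample by sample gives the same result as downmixing all channels together, which is why the speech and effects downmixes can be rebalanced separately and recombined later.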

The conversion formula may be, for example, the prescribed formula given in Non-Patent Document 1, but the present invention is not limited to it. Other formulas may be used depending, for example, on the receiving-side listening environment (the room, the performance of the loudspeakers, etc.) and on the listener (for example, whether the listener has hearing loss or is elderly). A specific example of a downmix using a conversion formula is described later.

As described above, the audio mixing device 13 outputs the multichannel sound content 31, produced with the optimal speech/effect balance intended by the program producer on the basis of the producer's intention and the mixing operations of a trained sound engineer. It also outputs the two kinds of 2-channel downmix signals described above (the speech downmix signal 32 and the effects downmix signal 33).

In the acoustic signal conversion device 14, the speech/effect balance measurement device 21 receives the speech downmix signal 32 and the effects downmix signal 33, feeds them into, for example, the mixing balance display device or display system of Patent Document 1 or Patent Document 2, and measures the speech/effect balance.

Specifically, as in Patent Document 1, the levels of the first and second sound signals (the speech downmix signal 32 and the effects downmix signal 33) are detected for each frame at a predetermined time interval, and the level difference between the two signals is calculated. The level difference is weighted according to the level of the first sound signal to obtain a weighted level difference. Over the span from the current frame back to the n-th past frame, the average of the m largest weighted level differences is calculated, together with the average level of the first sound signal from the current frame back over a predetermined number of past frames. From these results, a display value indicating the state of the mixing balance between the first and second sound signals is determined.
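The frame-based measurement summarized above can be sketched as a running computation. This is a sketch under stated assumptions: frame levels are RMS values in dB, and the weighting function `speech_weight` is a hypothetical stand-in because the text does not give Patent Document 1's actual weighting. The final mapping to a display value is also not reproduced; the generator yields only the two averages it is computed from.

```python
import math
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def frame_level_db(frame: List[float], floor_db: float = -100.0) -> float:
    """RMS level of one frame in dB, relative to a full-scale amplitude of 1.0."""
    ms = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(ms) if ms > 0 else floor_db


def speech_weight(speech_db: float) -> float:
    """HYPOTHETICAL weighting: 0 below -60 dB, 1 above -20 dB, linear in between."""
    return min(1.0, max(0.0, (speech_db + 60.0) / 40.0))


def balance_track(speech_frames: Iterable[List[float]],
                  effect_frames: Iterable[List[float]],
                  n: int = 50, m: int = 10, k: int = 50) -> Iterator[Tuple[float, float]]:
    """Per frame, yield (mean of the m largest weighted level differences over
    the last n frames, mean speech level over the last k frames)."""
    diffs: deque = deque(maxlen=n)   # weighted speech-minus-effect differences (dB)
    levels: deque = deque(maxlen=k)  # speech levels (dB) for the running average
    for s_frame, e_frame in zip(speech_frames, effect_frames):
        ls = frame_level_db(s_frame)
        le = frame_level_db(e_frame)
        diffs.append(speech_weight(ls) * (ls - le))
        levels.append(ls)
        top_m = sorted(diffs, reverse=True)[:m]
        yield sum(top_m) / len(top_m), sum(levels) / len(levels)
```

Taking the mean of the m largest differences, rather than of all n, makes the result track the passages where speech stands out most, and the weighting keeps near-silent speech frames from dominating that selection.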

Alternatively, as in Patent Document 2, the energy levels of the first and second sound signals (the speech downmix signal 32 and the effects downmix signal 33) are calculated for each frequency band. For each band, a masking correction amount that simulates the auditory masking characteristics of a listener with sensorineural hearing loss is calculated from the difference between the energy levels of the first and second sound signals, and a recruitment correction amount that simulates such a listener's loudness recruitment is calculated from the sum of those energy levels. From the masking and recruitment correction amounts, first and second hearing-characteristic simulation signals are calculated, which simulate how a listener with sensorineural hearing loss would hear the first and second sound signals, respectively.
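The per-band quantities used in the Patent Document 2 approach can be sketched as follows, assuming equal-length frames and frequency bands given as hypothetical ranges of DFT bins. The masking and recruitment correction formulas are not given in this text, so the sketch stops at their inputs: the per-band level difference (for masking) and the level of the summed energy (for recruitment). A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def band_energies_db(x: Sequence[float], band_edges: Sequence[int],
                     floor_db: float = -120.0) -> List[float]:
    """Energy per band in dB from a naive DFT; band_edges are DFT bin indices."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n // 2 + 1)]          # non-negative frequencies only
    power = [abs(c) ** 2 for c in spectrum]
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e = sum(power[lo:hi])                       # bins lo .. hi-1
        out.append(10.0 * math.log10(e) if e > 0 else floor_db)
    return out


def masking_and_recruitment_inputs(speech: Sequence[float], effect: Sequence[float],
                                   band_edges: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Per band: (speech dB minus effect dB, dB level of the summed energy)."""
    ls = band_energies_db(speech, band_edges)
    le = band_energies_db(effect, band_edges)
    diff = [a - b for a, b in zip(ls, le)]
    total = [10.0 * math.log10(10 ** (a / 10) + 10 ** (b / 10)) for a, b in zip(ls, le)]
    return diff, total
```

In practice an FFT and perceptually motivated band edges (for example, critical bands) would replace the naive DFT and arbitrary bin ranges used here.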

The speech/effect balance measurement device 21 outputs the balance measurement results for speech and sound effects (signal levels, etc.) to the gain adjustment amount calculation means 22.

The gain adjustment amount calculation means 22 calculates a gain adjustment amount for the level of the speech downmix signal 32, based on the speech/effect balance measured by the balance measurement device 21. The present invention is not limited to this: based on the measured balance, the gain adjustment amount calculation means 22 may instead calculate a gain adjustment amount for the level of the effects downmix signal 33, or for the levels of both the speech downmix signal 32 and the effects downmix signal 33. The method of calculating the gain adjustment amount in this embodiment is described later. The gain adjustment amount calculation means 22 outputs the obtained gain adjustment amount to the gain adjustment means 23.
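Because the embodiment's own calculation method is described later (with FIG. 4) and is not reproduced here, the following is only a minimal sketch of the three options above. It assumes a hypothetical target speech-minus-effect level difference (for example, the balance intended in the multichannel mix) and splits the correction between the two downmix signals with a hypothetical `speech_share` parameter.

```python
from typing import Tuple


def gain_adjustment_db(target_diff_db: float, measured_diff_db: float,
                       speech_share: float = 1.0) -> Tuple[float, float]:
    """Return (speech_gain_db, effect_gain_db) that restore a target
    speech-minus-effect level difference.

    HYPOTHETICAL rule: speech_share = 1.0 adjusts only the speech downmix,
    0.0 only the effects downmix, and values in between split the correction.
    """
    if not 0.0 <= speech_share <= 1.0:
        raise ValueError("speech_share must be in [0, 1]")
    delta = target_diff_db - measured_diff_db   # how far the downmix balance has drifted
    # Raising speech by s*delta and lowering effects by (1-s)*delta
    # changes the difference by exactly delta.
    return speech_share * delta, -(1.0 - speech_share) * delta
```

For instance, if the target difference is 10 dB and the downmix measures 6 dB, adjusting speech alone gives +4 dB, while an even split gives +2 dB for speech and -2 dB for effects.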

The gain adjustment means 23 adjusts the gain of the speech downmix signal 32 based on the gain adjustment amount obtained by the gain adjustment amount calculation means 22. The present invention is not limited to this: if the gain adjustment amount calculation means 22 has calculated a gain adjustment amount for the level of the effects downmix signal 33, the gain adjustment means 23 adjusts the gain of the effects downmix signal 33, and if it has calculated gain adjustment amounts for both signals, the gain adjustment means 23 adjusts the gains of both the speech downmix signal 32 and the effects downmix signal 33.

To preserve temporal continuity when adjusting the gain of the speech downmix signal 32, the gain adjustment means 23 also avoids abrupt changes in the gain adjustment value, for example within a program or within a given period. Specifically, it sets an adjustment time according to the gain adjustment amount. The larger the increase or decrease, the longer the adjustment time, so that the gain changes gradually rather than abruptly.
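The rule that larger gain changes get longer adjustment times can be sketched as a gain ramp whose length is proportional to the size of the change. This is a minimal illustration, not the embodiment's actual implementation. The name `gain_ramp` and the tuning parameter `seconds_per_db` are hypothetical, and the linear ramp shape is an assumption, since the text does not specify one.

```python
def gain_ramp(current_db: float, target_db: float,
              seconds_per_db: float = 0.5, sample_rate: int = 48000) -> list[float]:
    """Per-sample gain values (dB) moving linearly from current_db to target_db.

    The ramp length is proportional to |target_db - current_db|, so a larger
    adjustment is spread over a longer time. seconds_per_db is a hypothetical
    tuning parameter, not a value given in the text.
    """
    delta = target_db - current_db
    # At least one sample, so a zero change still yields the target value.
    n = max(1, round(abs(delta) * seconds_per_db * sample_rate))
    return [current_db + delta * (i + 1) / n for i in range(n)]


# A 6 dB change takes twice as long as a 3 dB change.
short = gain_ramp(0.0, 3.0, seconds_per_db=0.5, sample_rate=100)
long_ = gain_ramp(0.0, 6.0, seconds_per_db=0.5, sample_rate=100)
print(len(short), len(long_))  # 150 300
```

A non-linear ramp, such as an S-curve, would fit the same rule, because only the dependence of the duration on the size of the change is described.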

The synthesis means 24 combines the speech downmix signal 32, gain-adjusted by the gain adjustment means 23, with the effects downmix signal 33. It outputs 2-channel audio signal content 34 whose speech/effects balance matches the program producer's intent. Either or both of the speech downmix signal 32 and the effects downmix signal 33 combined by the synthesis means 24 may have been gain-adjusted.
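Here is a minimal sketch of the gain adjustment and synthesis steps. It assumes the downmix signals are lists of channels (for example [L, R]) holding floating-point samples, and that gain adjustment amounts are expressed in dB. The function names are hypothetical.

```python
def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def synthesize(speech: list[list[float]], effects: list[list[float]],
               speech_gain_db: float = 0.0,
               effects_gain_db: float = 0.0) -> list[list[float]]:
    """Gain-adjust the speech and/or effects downmix, then add them channel by channel.

    speech and effects are lists of channels (e.g. [L, R]) of equal length.
    Leaving a gain at 0 dB leaves that signal unchanged, so either one or
    both signals can be adjusted.
    """
    gs = db_to_linear(speech_gain_db)
    ge = db_to_linear(effects_gain_db)
    return [[gs * s + ge * e for s, e in zip(s_ch, e_ch)]
            for s_ch, e_ch in zip(speech, effects)]
```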

In the embodiment shown in FIG. 1, a large gain adjustment of the speech downmix signal 32 can raise or lower the overall level of the downmix produced by the synthesis means 24. The synthesis means 24 in this embodiment therefore adjusts the volume before or after combining the speech downmix signal 32 and the effects downmix signal 33. The amount depends on how much the gain adjustment means 23 changed the speech downmix signal 32, so that, for example, the total volume after synthesis stays approximately constant. Suppose the gain adjustment raised the level of the speech downmix signal 32. The synthesis means 24 may then lower the level of the effects downmix signal 33 before synthesis in proportion to that increase. Alternatively, it may lower the level of the synthesized 2-channel signal so that the total volume stays approximately constant.
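One way to keep the total volume approximately constant is the "adjust after synthesis" variant: rescale the mix so that its RMS level matches the mix without any speech adjustment. The sketch below makes three assumptions. It uses RMS power as the measure of total volume (a loudness measure such as ITU-R BS.1770 could be substituted), it works on a single channel, and all names are hypothetical.

```python
import math


def rms(x: list[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def mix_with_constant_level(speech: list[float], effects: list[float],
                            speech_gain_db: float) -> list[float]:
    """Apply the speech gain, mix, then rescale so the RMS matches the unadjusted mix."""
    g = 10.0 ** (speech_gain_db / 20.0)
    reference = [s + e for s, e in zip(speech, effects)]   # mix without adjustment
    adjusted = [g * s + e for s, e in zip(speech, effects)]
    actual = rms(adjusted)
    k = rms(reference) / actual if actual > 0 else 1.0      # post-mix correction
    return [k * v for v in adjusted]
```

For a stereo pair, you could compute the RMS over both channels together and apply one factor `k` to both channels, which leaves the stereo image unchanged.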

In the first embodiment described above, 2-channel content can be produced automatically by downmixing. The downmix can start from the source material used to produce multichannel audio content, or from the multichannel audio content itself. The first embodiment also lets the production side create the 2-channel audio signal by downmixing at the same time as the multichannel audio signal.

The multichannel audio signal content 31 and the 2-channel audio signal content 34 obtained in the first embodiment can therefore each be used for different purposes. They can also be transmitted together, so that the audio content is delivered to users by simulcast.

In short, the first embodiment enables automatic downmixing on the production side. While audio content with many channels, such as a three-dimensional audio format, is being produced, 2-channel or 5.1-channel surround content is downmixed automatically in parallel, with an appropriate speech/effects balance maintained. The process runs as follows:

- Speech and effects downmix signals are generated according to the prescribed downmix formulas and the speech/effects identification signal, which describes the semantic content of each audio signal.
- The speech/effects balance is measured from the speech and effects signals.
- The level of the speech signal is gain-adjusted based on that measurement.
- The gain-adjusted speech signal is combined with the effects signal.

The result is a downmix in which an appropriate speech/effects balance is preserved.

＜Specific example of downmix using conversion formulas＞
A specific example of a downmix using the conversion formulas described above is explained below with reference to the drawings. The following description uses 22.2-channel audio as an example of multichannel audio, but the present invention is not limited to this.

FIG. 2 shows an example loudspeaker arrangement (sound system) for 22.2-channel audio. FIG. 3 shows examples of the formulas used to downmix a 22.2-channel audio signal.

FIG. 3(a) shows the downmix formulas from 22.2 channels to 2 channels, together with the base audio channels. FIG. 3(b) shows the downmix formulas from 22.2 channels to 5.1 channels, together with the base audio channels.

As shown in FIG. 2, a 22.2-channel system places loudspeakers in three-dimensional space relative to the TV screen as follows:

- Top layer, nine channels: TpFL, TpFC, TpFR, TpSiL, TpC, TpSiR, TpBL, TpBC, TpBR.
- Middle layer, ten channels: FL, FLc, FC, FRc, FR, SiR, BR, BC, BL, SiL.
- Bottom layer, three channels: BtFL, BtFC, BtFR.
- LFE (low-frequency effects), two channels: LFE1, LFE2.

This gives 22 full-band channels plus 2 LFE channels.

To downmix from 22.2 channels to 2 channels, the audio signals of the two base channels (L, R) can be calculated using, for example, equations (1) and (2) in FIG. 3(a).

This embodiment can downmix to other channel configurations in the same way. For example, to downmix from 22.2 channels to 5.1 channels, the audio signals of the 5.1 base channels (L, R, C, LS, RS, LFE) can be calculated using equations (3) to (7) in FIG. 3(b).
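Downmix formulas of this kind are linear: each output channel is a weighted sum of input channels. The sketch below applies such a coefficient matrix to named channels. The coefficients in `EXAMPLE_TO_STEREO` are placeholders chosen only for illustration. They are not the values of FIG. 3, of Non-Patent Document 1, or of any standard.

```python
Channels = dict[str, list[float]]
Matrix = dict[str, dict[str, float]]


def downmix(inputs: Channels, matrix: Matrix) -> Channels:
    """out[o][n] = sum over i of matrix[o][i] * inputs[i][n].

    Input channels named in the matrix but absent from inputs are skipped.
    """
    length = max((len(x) for x in inputs.values()), default=0)
    out: Channels = {}
    for out_name, row in matrix.items():
        acc = [0.0] * length
        for in_name, coeff in row.items():
            for n, v in enumerate(inputs.get(in_name, [])):
                acc[n] += coeff * v
        out[out_name] = acc
    return out


# Placeholder coefficients for illustration only (NOT those of FIG. 3).
EXAMPLE_TO_STEREO: Matrix = {
    "L": {"FL": 1.0, "FC": 0.707, "SiL": 0.707},
    "R": {"FR": 1.0, "FC": 0.707, "SiR": 0.707},
}
```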

Downmix formulas such as those in FIG. 3 are given in, for example, Non-Patent Document 1 cited above.

＜Examples of channel counts applicable to this embodiment＞
The input audio signal that can be converted (downmixed) in this embodiment is not limited to the 22.2-channel format described above. For example, 12.2-, 10.2-, 9.1-, 8.1-, 7.1-, or 6.1-channel signals can also be used.

The number of downmix output channels should preferably match what home audio equipment can realistically provide. Besides the 2-channel and 5.1-channel configurations described above, the invention can also be applied to, for example, 1-channel, 3-channel, or 5-channel (without LFE) output.

＜Example of gain adjustment amount calculation＞
This section explains, with reference to FIG. 4, an example of how the gain adjustment amount calculation means 22 calculates the gain adjustment amount. FIG. 4 shows an example gain adjustment function for the level of the speech signal (speech downmix signal 32).

In FIG. 4, the horizontal axis is the weighted relative level difference (effects minus speech), and the vertical axis is the gain adjustment amount (dB). In this example, as the weighted relative level difference rises from −6 to 0, the gain adjustment amount increases linearly from 0 dB to 6 dB. With a preset gain adjustment function like this, the adjustment amount for any measured speech/effects balance is easy to compute.
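The FIG. 4 function can be written as a clamped linear ramp. The sketch makes two assumptions. First, the level difference is in dB. Second, the gain stays at 0 dB below −6 and at 6 dB above 0. The text only describes the linear section, so the clamping is an assumption, and the function name is hypothetical.

```python
def speech_gain_db(weighted_diff_db: float,
                   lo: float = -6.0, hi: float = 0.0,
                   max_gain_db: float = 6.0) -> float:
    """Gain for the speech downmix as a function of (effects - speech) level.

    Linear from 0 dB at `lo` to `max_gain_db` at `hi`; clamped outside that
    range (the clamping is an assumption, not stated in the text).
    """
    if weighted_diff_db <= lo:
        return 0.0
    if weighted_diff_db >= hi:
        return max_gain_db
    return max_gain_db * (weighted_diff_db - lo) / (hi - lo)


print(speech_gain_db(-3.0))  # 3.0
```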

FIG. 4 shows a gain adjustment function for the speech signal level, but the invention is not limited to this. For example, a gain adjustment function for the level of the effects signal (effects downmix signal 33) may be defined and used to adjust the effects signal level. Such functions may also be used to adjust both the speech and effects signal levels.

Separate functions like the one in FIG. 4 can also be defined for the speech signal level and for the effects signal level, and for each number of downmix output channels. The level adjustment can then be optimized for each sound type and channel count.

In this embodiment, boosting the speech signal raises the downmix level. This rise is compensated before or after the speech and effects signals are combined (downmixed), so that the total volume is maintained. To preserve temporal continuity, the speech gain adjustment value is also not allowed to change abruptly within a program or within a given period.

＜Acoustic signal conversion system: second embodiment＞
A second embodiment of the acoustic signal conversion system is described next, with reference to the drawings.

FIG. 5 shows an example of the acoustic signal conversion system (transmitting side) in the second embodiment. FIG. 6 shows an example of the acoustic signal conversion system (receiving side). The acoustic signal conversion system 40 of the second embodiment therefore consists broadly of two parts: the transmitting-side system 40-1 shown in FIG. 5 and the receiving-side system 40-2 shown in FIG. 6.

In the second embodiment, the acoustic signal conversion device downmixes the multichannel audio signal to a 2-channel audio signal and provides it to the user. The acoustic signal production device on the transmitting side generates mixing metadata. The acoustic signal conversion device on the receiving side receives this metadata from the production device and uses it to generate a 2-channel audio downmix signal.

In the following description, functional components identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and their detailed description is omitted.

＜Configuration of the transmitting side of the acoustic signal conversion system＞
The acoustic signal conversion system 40-1 shown in FIG. 5 comprises an audio recording/playback device 11, a microphone 12 serving as an audio input means, an audio mixing device 43, and an acoustic signal production device 44. The acoustic signal production device 44 comprises the speech/effects balance measurement device 21, the gain adjustment amount calculation means 22, and a mixing metadata multiplexing means 45.

In the transmitting-side system 40-1 of FIG. 5, as in the first embodiment, one or more audio signals (source material) are input to the audio mixing device 43 from one or more audio recording/playback devices 11 and microphones 12. As described above, each input audio signal in the source material for multichannel audio content production carries a "speech/effects identification" tag attached in advance. This tag is attribute data describing the semantic content of the audio signal.

The audio mixing device 43 creates multichannel audio content with the speech/effects balance best suited to the program producer's intent. During production, the content is mixed to that balance using mixing information entered by a sound engineer trained to realize the producer's intent.

As in the first embodiment, the audio mixing device 43 outputs the multichannel audio signal content 31, the speech downmix signal 32, and the effects downmix signal 33. It additionally outputs speech/effects identification metadata 51. In the second embodiment, this metadata 51 is generated for each channel of the multichannel audio signal produced by the audio mixing device 43, based on the mixing information entered by the sound engineer.

More specifically, the speech/effects identification metadata 51 attached to each channel (which accompanies the speech gain metadata) is generated first. For each configured audio channel, it is derived from the pre-attached "speech/effects identification" attribute data and the mixing information entered by the sound engineer. A channel may contain both speech and effects. In that case, its frequency range can be divided into several bands, for example bands one auditory critical bandwidth wide, and a "speech/effects identification" attached to the signal in each band. An example of attaching the identification is described later.

Next, the audio mixing device 43 uses the pre-attached "speech/effects identification" attribute data to generate two downmix signals: a 2-channel speech downmix and a 2-channel effects downmix. It does so according to the mixing information entered by the sound engineer and prescribed downmix formulas such as those in FIGS. 3(a) and 3(b).
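Because the downmix is linear, the speech and effects downmixes can be produced by applying the same coefficient matrix separately to the speech-labeled channels and to the effects-labeled channels. Their sum then equals the downmix of all channels. The sketch below assumes whole-channel labels `"speech"`/`"effects"`, so the per-band case is not handled. The label strings, function names, and coefficient matrix are all hypothetical.

```python
Channels = dict[str, list[float]]
Matrix = dict[str, dict[str, float]]


def _downmix_subset(inputs: Channels, keep: set[str], matrix: Matrix,
                    length: int) -> Channels:
    """Apply the downmix matrix using only the input channels in `keep`."""
    out: Channels = {}
    for out_name, row in matrix.items():
        acc = [0.0] * length
        for in_name, coeff in row.items():
            if in_name in keep:
                for n, v in enumerate(inputs[in_name]):
                    acc[n] += coeff * v
        out[out_name] = acc
    return out


def speech_and_effects_downmix(inputs: Channels, labels: dict[str, str],
                               matrix: Matrix) -> tuple[Channels, Channels]:
    """Return (speech downmix, effects downmix) from per-channel labels."""
    length = max((len(x) for x in inputs.values()), default=0)
    speech = {n for n in inputs if labels.get(n) == "speech"}
    effects = {n for n in inputs if labels.get(n) == "effects"}
    return (_downmix_subset(inputs, speech, matrix, length),
            _downmix_subset(inputs, effects, matrix, length))
```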

The speech/effects balance measurement device 21 in the acoustic signal production device 44 measures the balance from the input speech downmix signal 32 and effects downmix signal 33. In the second embodiment, the speech and effects signals are fed to a measuring device, for example the mixing balance display device described in Patent Document 1 or Patent Document 2, which measures the speech/effects balance.
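The measurement method of Patent Documents 1 and 2, including its weighting, is not reproduced here. As a simplified stand-in, the sketch below computes an unweighted RMS level difference in dB (effects minus speech), which could feed a gain function like that of FIG. 4. The function names are hypothetical.

```python
import math


def level_db(x: list[float], floor: float = 1e-12) -> float:
    """Mean-square level in dB; `floor` avoids log(0) for silent input."""
    power = sum(v * v for v in x) / len(x) if x else 0.0
    return 10.0 * math.log10(max(power, floor))


def relative_level_db(speech: list[float], effects: list[float]) -> float:
    """Unweighted level difference, effects minus speech (the FIG. 4 x-axis sign)."""
    return level_db(effects) - level_db(speech)
```

For a 2-channel downmix, you could concatenate the samples of both channels before measuring.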

Based on the measured speech/effects balance, the gain adjustment amount calculation means 22 calculates a gain adjustment amount 52, as described above. This amount consists of speech gain metadata for the 2-channel audio downmix signal and is used, for example, to adjust the level of the speech signal.

Here, the "speech gain metadata" consists of, for example, the output of a gain adjustment function for the speech signal level. The function is evaluated on the measured speech/effects balance, as in FIG. 4.

As in the first embodiment, the second embodiment may instead calculate the gain adjustment amount 52 for the level of the effects downmix signal 33. It may also calculate it for the levels of both the speech downmix signal 32 and the effects downmix signal 33.

In the second embodiment, the gain adjustment amount calculation means 22 therefore generates and outputs the gain adjustment amount 52 as speech gain metadata, for example as a single value shared by all audio channels.

The gain adjustment value in the second embodiment may also contain several alternative values. The receiving side can then change the level to suit each user's circumstances, such as hearing impairment, old age, or personal listening preference. With multiple gain adjustment values available, users on the receiving side can choose the sound that suits them.

The mixing metadata multiplexing means 45 multiplexes the mixing metadata with the multichannel audio signal content 31. The mixing metadata consists of the per-channel speech/effects identification metadata 51 and the single gain adjustment amount 52 for all audio channels. The multiplexed mixing signal is then transmitted to the acoustic signal conversion device 60.

The mixing metadata multiplexing means 45 preferably multiplexes the speech/effects identification metadata 51 and the gain adjustment amount 52 into the multichannel audio signal content 31 at preset time intervals. Then, even if a viewer switches to another program partway through, the receiving side can quickly provide appropriately adjusted sound. The invention is not limited to this. The metadata may instead be multiplexed each time the program (audio content) changes, or whenever the receiving side requests a sound adjustment.
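The mixing metadata and its periodic insertion could be represented as shown below. This is a hypothetical structure: the field layout, the `"speech"`/`"effects"` label strings, and the profile keys are illustrative assumptions, since the text does not define a bitstream format.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass
class MixingMetadata:
    # Per-channel identification, e.g. {"FC": "speech", "FL": "effects"}.
    channel_labels: dict[str, str]
    # One speech gain (dB) for all channels; extra keys are optional
    # alternative values a listener could choose between.
    speech_gain_db: dict[str, float] = field(default_factory=lambda: {"default": 0.0})


def multiplex(frames: Iterable, metadata: MixingMetadata,
              every_n_frames: int = 10) -> Iterator[tuple[object, Optional[MixingMetadata]]]:
    """Pair each audio frame with the metadata every N frames, else None."""
    for i, frame in enumerate(frames):
        yield frame, (metadata if i % every_n_frames == 0 else None)
```

With `every_n_frames` set to N, a receiver that joins mid-program waits at most N − 1 frames before metadata arrives. A smaller N shortens that wait at the cost of more overhead.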

＜Configuration of the receiving side of the acoustic signal conversion system＞
Next, the functional configuration of the acoustic signal conversion device 60 is described with reference to FIG. 6. This device is the receiving-side system 40-2. It receives the mixing signal transmitted from the acoustic signal production device 44 of system 40-1 and outputs the received signal through an audio output means such as loudspeakers.

The acoustic signal conversion device 60 shown in FIG. 6 comprises a mixing metadata separation means 61, a channel separation means 62, a gain adjustment means 63, and a synthesis means 64. The device in FIG. 6 outputs a 2-channel audio signal 71, but the invention is not limited to this. It may, for example, output the 5.1-channel audio signal described above.

When the acoustic signal conversion device 60 receives the mixing signal from the transmitting side of the acoustic signal conversion system 40, the mixing metadata separation means 61 separates out the mixing metadata. This yields the gain adjustment amount 52, the speech/effects identification metadata 51, and the multichannel audio signal content 31.

The channel separation means 62 takes the multichannel audio signal content 31 and uses the speech/effects identification metadata 51 to separate it into the speech downmix signal 32 and the effects downmix signal 33.

The gain adjustment means 63 takes the speech downmix signal 32 and adjusts its level according to the gain adjustment amount 52. As described above, this embodiment may instead adjust the level of the effects downmix signal 33, or the levels of both the speech downmix signal 32 and the effects downmix signal 33.

If the received gain adjustment amount 52 contains several adjustment values, the user can select one of them. Hearing-impaired or elderly listeners, or anyone with particular preferences, can thus adjust the gain to the sound that suits them.

The synthesis means 64 combines the speech downmix signal 32 output by the gain adjustment means 63 with the effects downmix signal 33 and outputs the 2-channel audio signal 71.
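On the receiving side, separation, gain adjustment, and synthesis can be folded into a single pass. Scaling the speech-labeled channels before a linear downmix is equivalent to scaling the speech downmix afterwards. The sketch below is based on these assumptions:

- Labels apply to whole channels.
- Channels without a label are treated as effects.
- Gains are keyed by hypothetical profile names such as `"elderly"`.
- An unknown profile falls back to `"default"`.
- The coefficient matrix passed in is illustrative, not taken from FIG. 3.

```python
Channels = dict[str, list[float]]
Matrix = dict[str, dict[str, float]]


def receive_and_downmix(inputs: Channels, labels: dict[str, str], matrix: Matrix,
                        gains_db: dict[str, float],
                        profile: str = "default") -> Channels:
    """Downmix with the selected speech gain applied to speech-labeled channels.

    Scaling speech channels before a linear downmix equals scaling the speech
    downmix afterwards, so separation, gain adjustment and synthesis collapse
    into one weighted sum. Unknown profiles fall back to "default".
    """
    gain_db = gains_db.get(profile, gains_db.get("default", 0.0))
    g = 10.0 ** (gain_db / 20.0)
    length = max((len(x) for x in inputs.values()), default=0)
    out: Channels = {}
    for out_name, row in matrix.items():
        acc = [0.0] * length
        for in_name, coeff in row.items():
            if in_name not in inputs:
                continue
            # Unlabeled channels are treated as effects (an assumption).
            w = coeff * (g if labels.get(in_name) == "speech" else 1.0)
            for n, v in enumerate(inputs[in_name]):
                acc[n] += w * v
        out[out_name] = acc
    return out
```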

In summary, the second embodiment works as follows. On the production side, audio content with many channels, such as a three-dimensional audio format, is produced. In parallel, the mixing metadata needed to keep the speech/effects balance from degrading is generated automatically. It is based on three inputs: the "speech/effects identification" attribute data describing the semantic content of each audio signal, the mixing level information of each audio channel, and the measured speech/effects balance. The multichannel program content is then transmitted together with the accompanying mixing metadata. The receiving side downmixes the program's audio signal to 2 channels based on that mixing metadata, so listeners hear audio with the optimal speech/effects balance.

In other words, the receiving side receives the multichannel audio signal together with its mixing metadata and downmixes it to 2 channels based on that metadata. A 2-channel downmix made this way realizes the speech/effects balance intended by the program producer and avoids degrading that balance.

＜Example of attaching the "speech/effects identification"＞
Next, an example of attaching the "speech/effects identification" attribute data described above is explained in detail.

In this embodiment, each audio channel of the transmitted multichannel audio signal may contain only speech, only effects, or both. If a channel contains only speech or only effects across its entire frequency range, a single "speech/effects identification" is attached to that channel's audio signal and sent. It indicates whether the channel's content is speech or effects.

If a channel contains both speech and effects, its frequency range is divided into several bands, for example each one preset auditory critical bandwidth wide. A "speech/effects identification" indicating speech or effects is then attached to the audio signal for each band and sent. Band signals identified as speech are treated as speech components, and band signals identified as effects are treated as effects components. Each part of the channel's audio signal can thus be assigned to either the speech signal or the effects signal. Which bands are labeled speech and which are labeled effects is set in advance, according to the program producer's intent, the content of the audio signal, and so on.
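Per-band identification can be sketched with a discrete Fourier transform. Each frequency bin is assigned to the critical band that contains it. Bins in bands labeled speech go to the speech output, and the remaining bins go to the effects output. The band edges below are the upper boundaries of Zwicker's critical bands. Several parts are simplifications for illustration: the naive O(N²) DFT, the single unwindowed frame, and the `band_is_speech` rule used in testing. A practical implementation would use an FFT with overlapping windowed frames.

```python
import cmath
import math
from typing import Callable

# Upper edges (Hz) of critical bands 1 to 24 as tabulated by Zwicker.
BAND_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500]


def band_index(freq_hz: float) -> int:
    """0-based critical band containing freq_hz (24 = above the last edge)."""
    for z, edge in enumerate(BAND_EDGES_HZ):
        if freq_hz < edge:
            return z
    return len(BAND_EDGES_HZ)


def dft(x: list[float]) -> list[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: list[complex]) -> list[float]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def split_by_band_labels(frame: list[float], sample_rate: float,
                         band_is_speech: Callable[[int], bool]):
    """Split one frame into (speech, effects) by the label of each bin's band."""
    n = len(frame)
    spectrum = dft(frame)
    speech, effects = [0j] * n, [0j] * n
    for k in range(n):
        # Bins k and n - k share a frequency, keeping both outputs real.
        freq = min(k, n - k) * sample_rate / n
        target = speech if band_is_speech(band_index(freq)) else effects
        target[k] = spectrum[k]
    return idft(speech), idft(effects)
```

Each bin goes to exactly one output, so the two outputs sum back to the original frame.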

The auditory critical bandwidths mentioned above can be taken from, for example, the critical band table in E. Zwicker, Psychoacoustics (Japanese translation by Yukiko Yamada; original title: Psychoakustik), p. 74.

FIG. 7 shows an example of the relationship between critical band number and frequency. It also shows each critical bandwidth Δfg and its center frequency fm, together with the critical band number z that corresponds to each center frequency fm. The boundary frequencies fu and fo between adjacent critical bands correspond to the values in the second column.

As FIG. 7 illustrates, the relationship between band number z and frequency f is essential to understanding how hearing works. Using these critical bandwidths therefore allows a channel's audio signal to be separated accurately into speech and effects signals.

The present invention is not limited to the first and second embodiments described above; for example, an embodiment combining the first and second embodiments is also possible.

<Acoustic Signal Conversion Program>
In the embodiments above, the acoustic signal conversion procedure of the present invention is performed by a dedicated device configuration in the acoustic signal conversion system. Alternatively, the acoustic signal conversion of the present invention can be realized by generating an executable program (acoustic signal conversion program) that causes a computer to perform each step of that procedure and installing it on, for example, a general-purpose personal computer or workstation.

That is, the acoustic signal conversion system 10 and the acoustic signal conversion devices 14 and 44 described above can be implemented on a computer comprising a CPU, a volatile storage medium such as RAM (Random Access Memory), a nonvolatile storage medium such as ROM, input devices such as a mouse, keyboard, or pointing device, a display unit for displaying images and data, and an interface for communicating with external equipment.

Accordingly, each function of the acoustic signal conversion system and the acoustic signal conversion devices 14 and 44 can be realized by having the CPU execute a program that describes that function. Such programs can also be distributed on recording media such as magnetic disks (floppy disks, hard disks, etc.), optical discs (CD-ROM, DVD, etc.), or semiconductor memory.

In other words, by generating an executable program (acoustic signal conversion program) that causes a computer to perform the processing of each component described above, and installing it on, for example, a general-purpose personal computer or server, the computer can be made to function as the acoustic signal conversion system or acoustic signal conversion device described above.

The flow of acoustic signal conversion processing performed by the acoustic signal conversion program for the first and second embodiments is described next, with reference to a flowchart and a sequence diagram.

<Acoustic Signal Conversion Procedure: First Embodiment>
FIG. 8 is a flowchart showing an example of the acoustic signal conversion procedure in the first embodiment. In FIG. 8, one or more acoustic signals (carrying attribute data such as the “speech/sound-effect identification”) are first acquired (S01), and multichannel audio content is produced from them (S02). At this point, the multichannel audio content has been adjusted to an optimal sound by a sound engineer or the like.

Next, in parallel with production of the multichannel audio content, a speech downmix signal and a sound-effect downmix signal are generated using the conversion formula described above (S03), and the balance between them is measured (S04). From the balance measurement obtained in S04, a gain adjustment amount for the downmixed signal (for example, two-channel audio) is calculated (S05), and the gain is adjusted based on that amount (S06).
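Steps S03 to S05 can be sketched as below, under several assumptions this part of the text does not fix. The downmix uses the coefficients commonly used for 5.1-to-stereo fold-down (as in ITU-R BS.775: front channels at unity, centre and surrounds at 1/√2, LFE dropped), which may differ from the formula of Non-Patent Document 1. The balance is taken as the mean-power level difference between the speech and sound-effect stems in dB. The gain adjustment amount is assumed to be the speech gain that restores the balance the engineer set in the multichannel mix. All names are hypothetical.

```python
import math

K = 1 / math.sqrt(2)  # assumed -3 dB fold-down coefficient


def downmix_51_to_stereo(ch):
    """Fold a 5.1 stem (dict of L, R, C, LFE, Ls, Rs sample lists) to Lo/Ro.

    LFE is discarded, as is common in stereo fold-downs (an assumption here).
    """
    n = len(ch["L"])
    lo = [ch["L"][i] + K * ch["C"][i] + K * ch["Ls"][i] for i in range(n)]
    ro = [ch["R"][i] + K * ch["C"][i] + K * ch["Rs"][i] for i in range(n)]
    return {"Lo": lo, "Ro": ro}


def level_db(channels, floor=1e-12):
    """Mean power over all channels and samples, expressed in dB."""
    total = sum(x * x for sig in channels.values() for x in sig)
    count = sum(len(sig) for sig in channels.values())
    return 10 * math.log10(max(total / count, floor))


def balance_db(speech, effect):
    """Relative level difference, speech minus sound effects, in dB (S04)."""
    return level_db(speech) - level_db(effect)


def gain_adjustment_db(speech_mc, effect_mc):
    """Speech gain in dB that restores the multichannel balance (S05)."""
    reference = balance_db(speech_mc, effect_mc)
    downmixed = balance_db(downmix_51_to_stereo(speech_mc),
                           downmix_51_to_stereo(effect_mc))
    return reference - downmixed


# Example: dialogue only in the centre, effects in L, R, Ls and Rs.
zeros, ones = [0.0] * 4, [1.0] * 4
speech_mc = {"L": zeros, "R": zeros, "C": [1.0] * 4,
             "LFE": zeros, "Ls": zeros, "Rs": zeros}
effect_mc = {"L": ones, "R": ones, "C": zeros,
             "LFE": zeros, "Ls": ones, "Rs": ones}
print(round(gain_adjustment_db(speech_mc, effect_mc), 2))  # about +1.63 dB
```

In this example the centre dialogue is attenuated by the fold-down while front and surround effects add together in each output channel, so the speech/effects balance drops and a positive speech gain is needed to restore it.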

Next, the speech downmix signal and the sound-effect downmix signal, gain-adjusted in S06, are combined (S07), and the resulting two-channel audio content is output (S08).
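A minimal sketch of S06 and S07 for one output channel follows, under assumptions the text leaves open. The claims state that the gain adjusting means sets an adjustment time whose length depends on how much the gain adjustment amount increases or decreases, and keeps the total volume after synthesis constant. Here that is approximated by a linear ramp in dB whose length is `ms_per_db` milliseconds per dB of change (a hypothetical constant) and by a single RMS renormalization over the whole block. All names are hypothetical.

```python
import math


def db_to_lin(db):
    return 10 ** (db / 20)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def ramp_gains(gain_db, n, sample_rate, ms_per_db=50.0):
    """Per-sample linear gains moving from 0 dB to gain_db.

    The ramp length is proportional to |gain_db| and capped at n samples.
    """
    ramp_len = min(n, int(round(abs(gain_db) * ms_per_db * sample_rate / 1000)))
    gains = []
    for i in range(n):
        frac = 1.0 if i >= ramp_len else (i + 1) / ramp_len
        gains.append(db_to_lin(gain_db * frac))
    return gains


def synthesize(speech, effect, gain_db, sample_rate):
    """Apply the ramped speech gain (S06), add the effects (S07), then
    rescale so the mix has the same RMS as the unadjusted mix."""
    gains = ramp_gains(gain_db, len(speech), sample_rate)
    adjusted = [g * s + e for g, s, e in zip(gains, speech, effect)]
    reference = [s + e for s, e in zip(speech, effect)]
    target, actual = rms(reference), rms(adjusted)
    scale = target / actual if actual > 0 else 1.0
    return [scale * v for v in adjusted]


speech = [0.5] * 1000
effect = [0.5, -0.5] * 500
out = synthesize(speech, effect, gain_db=3.0, sample_rate=1000)
# The 3 dB change ramps over 150 samples (3 dB x 50 ms/dB at 1 kHz).
```

For a stereo downmix the function would be applied per output channel; a production system would more likely track loudness over short windows rather than renormalizing a whole block at once.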

<Acoustic Signal Conversion Procedure: Second Embodiment>
FIG. 9 is a sequence diagram showing an example of the acoustic signal conversion procedure in the second embodiment. In FIG. 9, the transmitting-side acoustic signal production device 44 first acquires one or more acoustic signals (carrying attribute data such as the “speech/sound-effect identification”) (S11) and produces multichannel audio content from them (S12). At this point, the multichannel audio content has been adjusted to an optimal sound by a sound engineer or the like. Speech/sound-effect identification metadata is also generated (S13).

Next, in parallel with production of the multichannel audio content, a speech downmix signal and a sound-effect downmix signal are generated using the conversion formula described above (S14), and the balance between them is measured (S15). From the balance measurement obtained in S15, a gain adjustment amount for the downmixed signal (for example, two-channel audio) is calculated (S16). The calculated gain adjustment amount, the multichannel audio content, and the speech/sound-effect identification metadata are then multiplexed (S17), and the multiplexed mixing signal is transmitted to the receiving side (S18).
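Multiplexing in S17 (and the matching separation on the receiving side) requires an agreed layout, which the patent does not specify. The sketch below assumes a hypothetical little-endian layout: a 4-byte tag, the gain adjustment amount as a 32-bit float in dB, the number of bands, a bit mask of per-band speech flags (at most 32 bands), the channel and sample counts, and then interleaved 32-bit float samples. The tag `MIXM` and all names are illustrative only.

```python
import struct

MAGIC = b"MIXM"      # hypothetical stream tag
HEADER = "<4sfHIHI"  # tag, gain_db, n_bands, flag mask, n_channels, n_samples


def multiplex(gain_db, band_is_speech, channels):
    """Pack gain adjustment amount, identification flags and audio (S17)."""
    if len(band_is_speech) > 32:
        raise ValueError("this layout holds at most 32 band flags")
    n_ch = len(channels)
    n_samp = len(channels[0]) if channels else 0
    mask = sum(1 << i for i, is_sp in enumerate(band_is_speech) if is_sp)
    header = struct.pack(HEADER, MAGIC, gain_db, len(band_is_speech),
                         mask, n_ch, n_samp)
    interleaved = [ch[i] for i in range(n_samp) for ch in channels]
    return header + struct.pack(f"<{len(interleaved)}f", *interleaved)


def demultiplex(blob):
    """Inverse of multiplex(): recover the three multiplexed parts."""
    magic, gain_db, n_bands, mask, n_ch, n_samp = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a mixing-metadata stream")
    flags = [bool((mask >> i) & 1) for i in range(n_bands)]
    samples = struct.unpack_from(f"<{n_ch * n_samp}f", blob, struct.calcsize(HEADER))
    channels = [list(samples[c::n_ch]) for c in range(n_ch)]
    return gain_db, flags, channels


blob = multiplex(1.5, [True, False, True], [[0.5, -0.25], [1.0, 0.0]])
gain, flags, channels = demultiplex(blob)
```

Storing samples as 32-bit floats means values that are not exactly representable in single precision come back slightly rounded; the example values are chosen to round-trip exactly.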

On the receiving side, the acoustic signal conversion device 60 separates the mixing metadata from the mixing signal transmitted by the acoustic signal production device 44 (S19), recovering the gain adjustment amount, the multichannel audio content, and the speech/sound-effect identification metadata. Channel separation is then performed on the separated multichannel audio content using the speech/sound-effect identification metadata (S20), yielding a speech downmix signal and a sound-effect downmix signal, and the speech downmix signal is further adjusted by the gain adjustment amount obtained in S19 (S21). The speech downmix signal gain-adjusted in S21 and the sound-effect downmix signal are then combined (S22), and the resulting two-channel audio content is output (S23).
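Steps S20 to S22 on the receiving side can be sketched as below. For brevity the sketch assumes channel-level identification metadata (each 5.1 channel flagged as speech or sound effect) rather than per-band flags, assumes the common fold-down coefficients (unity fronts, 1/√2 centre and surrounds, LFE dropped), and takes the gain adjustment amount as already separated in S19. `FOLD`, `downmix`, and `receive` are hypothetical names.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)

# Assumed fold-down weights per channel: (weight into Lo, weight into Ro).
FOLD = {"L": (1.0, 0.0), "R": (0.0, 1.0), "C": (INV_SQRT2, INV_SQRT2),
        "LFE": (0.0, 0.0), "Ls": (INV_SQRT2, 0.0), "Rs": (0.0, INV_SQRT2)}


def downmix(channels, n):
    """Fold the given channels (name -> samples) into a stereo pair."""
    lo, ro = [0.0] * n, [0.0] * n
    for name, sig in channels.items():
        wl, wr = FOLD[name]
        for i, x in enumerate(sig):
            lo[i] += wl * x
            ro[i] += wr * x
    return lo, ro


def receive(channels, is_speech, gain_db):
    """S20: split by identification metadata; S21: gain on speech; S22: combine."""
    n = len(next(iter(channels.values())))
    speech = {k: v for k, v in channels.items() if is_speech[k]}
    effect = {k: v for k, v in channels.items() if not is_speech[k]}
    s_lo, s_ro = downmix(speech, n)
    e_lo, e_ro = downmix(effect, n)
    g = 10 ** (gain_db / 20)
    lo = [g * s + e for s, e in zip(s_lo, e_lo)]
    ro = [g * s + e for s, e in zip(s_ro, e_ro)]
    return lo, ro


lo, ro = receive({"C": [1.0, 1.0], "L": [0.5, 0.5]},
                 {"C": True, "L": False}, gain_db=0.0)
```

Because the downmix is linear, the same output could be obtained by scaling the speech channels before a single downmix; keeping separate speech and sound-effect downmixes mirrors the order of S20 to S22.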

As described above, the acoustic signal conversion processing can be realized easily by installing the executable program on a computer.

As described above, the present invention allows a multichannel acoustic signal to be downmixed into optimal sound without degrading the balance between speech and sound effects. Specifically, a conventional downmix degrades the speech/sound-effect balance, for example by making narration hard to hear over background music (BGM), so the result sometimes fails to match the balance the program producer intended. By applying the present invention, two-channel audio, 5.1-channel surround audio, or the like that maintains an appropriate speech/sound-effect balance can be downmixed automatically, in parallel with the production of program content having many audio channels, such as three-dimensional audio.

Likewise, applying the present invention to an acoustic signal conversion device that receives the acoustic signal of program content with many audio channels (such as three-dimensional audio) together with the associated mixing metadata, and downmixes it to two-channel audio, 5.1-channel surround audio, or the like, reduces the deterioration of the speech/sound-effect balance that occurs with the prescribed downmix.

Preferred embodiments of the present invention have been described in detail above, but the invention is not limited to these specific embodiments; various modifications and changes are possible within the scope of the gist of the invention described in the claims.

Description of Reference Signs
10, 40 Acoustic signal conversion system
11 Sound recording/reproducing device
12 Microphone (audio input means)
13, 43 Audio mixing device
14, 60 Acoustic signal conversion device
21 Speech/sound-effect balance measuring device
22 Gain adjustment amount calculating means
23, 63 Gain adjusting means
24, 64 Synthesizing means
31 Speech downmix signal
32 Sound-effect downmix signal
33 Multichannel audio content
34 Two-channel audio content
44 Acoustic signal production device
45 Mixing metadata multiplexing means
51 Speech/sound-effect identification metadata
52 Gain adjustment amount
61 Mixing metadata separation means
62 Channel separation means

Claims

1. An acoustic signal conversion device that converts an acoustic signal composed of a first signal level and a second signal level in correspondence with a preset number of channels, comprising:
balance measuring means for measuring the mix balance between the acoustic signal of the first signal level and the acoustic signal of the second signal level when an acoustic signal corresponding to a first number of channels is downmixed to an acoustic signal corresponding to a second number of channels;
gain adjustment amount calculating means for calculating a gain adjustment amount for the first signal level or the second signal level according to the relative level difference between the first signal level and the second signal level obtained by the balance measuring means;
gain adjusting means for adjusting the gain of the first signal level and/or the second signal level based on the gain adjustment amount obtained by the gain adjustment amount calculating means; and
synthesizing means for synthesizing the acoustic signal of the first signal level and the acoustic signal of the second signal level, using the gain-adjusted acoustic signal obtained by the gain adjusting means, and outputting an acoustic signal corresponding to the second number of channels,
wherein the gain adjusting means sets an adjustment time whose length corresponds to the amount of increase or decrease in the gain adjustment amount obtained by the gain adjustment amount calculating means, and adjusts the gain over the set adjustment time so that the total volume after synthesis by the synthesizing means is kept constant.

2. An acoustic signal conversion device that converts an acoustic signal composed of a first signal level and a second signal level, transmitted from a production side, in correspondence with a preset number of channels, comprising:
mixing metadata separation means for separating, from a multiplexed signal transmitted from the production side, an acoustic signal corresponding to a first number of channels, a gain amount of the first signal level or the second signal level corresponding to the mix balance between the first signal level and the second signal level when downmixed from the first number of channels to a second number of channels, and identification metadata for distinguishing the acoustic signal of the first signal level from the acoustic signal of the second signal level;
channel separation means for separating, using the identification metadata, the acoustic signal corresponding to the first number of channels obtained by the mixing metadata separation means into an acoustic signal of the first signal level and an acoustic signal of the second signal level, each downmixed to correspond to the second number of channels;
gain adjusting means for adjusting the gain of the first signal level and/or the second signal level based on a gain adjustment amount corresponding to the relative level difference between the first signal level and the second signal level; and
synthesizing means for synthesizing the acoustic signal of the first signal level and the acoustic signal of the second signal level, using the gain-adjusted acoustic signal obtained by the gain adjusting means, and outputting an acoustic signal corresponding to the second number of channels,
wherein the gain adjusting means sets an adjustment time whose length corresponds to the amount of increase or decrease in the gain adjustment amount, and adjusts the gain over the set adjustment time so that the total volume after synthesis by the synthesizing means is kept constant.

3. The acoustic signal conversion device according to claim 1 or 2, wherein the synthesizing means adjusts the volume of the acoustic signal corresponding to the second number of channels according to the amount of change in the acoustic signal gain-adjusted by the gain adjusting means.

4. The acoustic signal conversion device according to any one of claims 1 to 3, wherein the acoustic signal composed of the first signal level and the second signal level consists of speech and sound effects.

5. The acoustic signal conversion device according to claim 4, wherein the input acoustic signal composed of the first signal level and the second signal level is separated into the speech or the sound effects using a preset auditory critical bandwidth.

6. An acoustic signal conversion program for causing a computer to function as each means of the acoustic signal conversion device according to any one of claims 1 to 5.