JP2008233672A

JP2008233672A - Masking sound generation apparatus, masking sound generation method, program, and recording medium

Info

Publication number: JP2008233672A
Application number: JP2007075283A
Authority: JP
Inventors: Atsuko Ito; 敦子伊藤; Yasushi Shimizu; 寧清水; Akira Miki; 晃三木; Masahito Hata; 雅人秦
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-03-22
Filing date: 2007-03-22
Publication date: 2008-10-02
Anticipated expiration: 2027-03-22
Also published as: JP5103974B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for generating a masking sound whose sound characteristics are best suited to masking the sound characteristics of a sound to be masked. SOLUTION: In the masking sound generation apparatus according to the present invention, storage means stores scrambled sound signals that have been processed in advance so that their meaning as language cannot be understood. In operation mode 1, a CPU analyzes the sound characteristics of noise generated in a sound space, reads from the storage means a scrambled sound signal similar to the analysis result, and outputs it as a masking sound. In operation mode 2, when a user directly specifies a scrambled sound signal, or when information on, for example, the attributes of the people using the sound space is received, the CPU selects a scrambled sound signal according to the input, reads it from the storage means, and outputs it as a masking sound. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique for generating a masking sound.

It is generally known that when a certain sound (the target sound) is audible and another sound (a masking sound) with acoustic characteristics (frequency characteristics and the like) close to those of the target sound is present, the target sound becomes harder to hear. This phenomenon is called the masking effect. The masking effect is rooted in human auditory characteristics, and it is known to become more pronounced the closer the frequency of the masking sound is to that of the target sound and the higher the volume level of the masking sound is relative to that of the target sound.

Various acoustic techniques that exploit this masking effect have been proposed, for example those disclosed in Patent Documents 1 and 2. Patent Document 1 discloses a technique that divides an acquired sound into predetermined frames and plays each frame back in reverse time order, thereby rendering the sound meaningless and generating a masking sound. Patent Document 2 discloses a technique that divides a sound signal into a plurality of segments and rearranges the order of the segments, thereby rendering the sound meaningless and generating a masking sound.
Patent Document 1: Japanese Patent Application No. 2006-242344
Patent Document 2: JP-T-2005-554061

The techniques described in Patent Documents 1 and 2 generate the masking sound in real time from the picked-up sound, and therefore demand high sound-signal processing performance.
The present invention has been made in view of the above problem, and its object is to provide a technique for generating a masking sound whose acoustic characteristics are best suited to masking the acoustic characteristics of the sound to be masked.

The masking sound generation apparatus according to the present invention comprises: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; acoustic characteristic analysis means that picks up a sound and analyzes its acoustic characteristics; and output means that compares the acoustic characteristics analyzed by the acoustic characteristic analysis means with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reads the determined scrambled sound signal from the storage means, and outputs it.

In the above configuration, the output means may apply acoustic processing to the scrambled sound signal read from the storage means, based on the acoustic characteristics of the sound analyzed by the acoustic characteristic analysis means, and then output it.

In another configuration, the masking sound generation apparatus according to the present invention comprises: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; receiving means that receives from an operator information on the acoustic characteristics of the sound to be masked; and output means that compares the acoustic characteristics received by the receiving means with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reads the determined scrambled sound signal from the storage means, and outputs it.

In the above configuration, the output means may apply acoustic processing to the scrambled sound signal read from the storage means, based on the information on the acoustic characteristics of the sound to be masked received by the receiving means, and then output it.

In yet another configuration, the masking sound generation apparatus according to the present invention comprises: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed; receiving means that receives from an operator an instruction signal designating one of the scrambled sound signals stored in the storage means; and output means that reads the scrambled sound signal indicated by the received instruction signal from the storage means and outputs it.

In any of the above configurations, the masking sound generation apparatus according to the present invention may further comprise scrambling means that receives a sound signal, divides it into predetermined sections and processes them to generate a scrambled sound signal in which the time series of the sections is changed, and stores the scrambled sound signal in the storage means.

In any of the above configurations, the masking sound generation apparatus according to the present invention may further comprise reception means that receives from an operator information on the acoustic characteristics of the space in which the scrambled sound signal is emitted, and the output means may apply acoustic processing to the scrambled sound signal read from the storage means, based on the information on the acoustic characteristics of the space received by the reception means, and then output it.

The masking sound generation method according to the present invention comprises: a storage step of storing in a storage device a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; an acoustic characteristic analysis step of picking up a sound and analyzing its acoustic characteristics; and an output step of comparing the acoustic characteristics analyzed in the acoustic characteristic analysis step with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reading the determined scrambled sound signal from the storage device, and outputting it.

In another configuration, the masking sound generation method according to the present invention comprises: a storage step of storing in a storage device a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; a receiving step of receiving from an operator information on the acoustic characteristics of the sound to be masked; and an output step of comparing the acoustic characteristics received in the receiving step with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reading the determined scrambled sound signal from the storage device, and outputting it.

In yet another configuration, the masking sound generation method according to the present invention comprises: a storage step of storing in a storage device a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed; a receiving step of receiving from an operator an instruction signal designating one of the scrambled sound signals stored in the storage step; and an output step of reading the scrambled sound signal indicated by the received instruction signal from the storage device and outputting it.

The program according to the present invention causes a computer to function as: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; acoustic characteristic analysis means that picks up a sound and analyzes its acoustic characteristics; and output means that compares the acoustic characteristics analyzed by the acoustic characteristic analysis means with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reads the determined scrambled sound signal from the storage means, and outputs it.

In another configuration, the program according to the present invention causes a computer to function as: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed, together with the acoustic characteristics of each scrambled sound signal; receiving means that receives from an operator information on the acoustic characteristics of the sound to be masked; and output means that compares the acoustic characteristics received by the receiving means with the acoustic characteristics of the scrambled sound signals using a predetermined algorithm to determine a scrambled sound signal, reads the determined scrambled sound signal from the storage means, and outputs it.

In yet another configuration, the program according to the present invention causes a computer to function as: storage means that stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into sections of a predetermined time length and reassembling them so that the time series of the sound signal is changed; receiving means that receives from an operator an instruction signal designating one of the scrambled sound signals stored in the storage means; and output means that reads the scrambled sound signal indicated by the received instruction signal from the storage means and outputs it.

The computer-readable recording medium according to the present invention stores a plurality of scrambled sound signals, each obtained by dividing a sound signal into predetermined sections and processing them so that the time series of the sections is changed, and stores them in such a way that each item of scrambled data can be read out selectively.

The masking sound generation apparatus, masking sound generation method, program, or recording medium according to the present invention makes it possible to generate a masking sound whose acoustic characteristics are best suited to masking the acoustic characteristics of the sound to be masked.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(A: Configuration)
(A-1: Overall configuration)
FIG. 1 is a diagram showing the configuration of a sound masking system 1 according to the present invention. As shown in FIG. 1, a microphone 30 is installed in acoustic space 20A, suspended from the ceiling. A speaker 40 is installed in acoustic space 20B, likewise suspended from the ceiling.

The microphone 30 picks up sound in acoustic space 20A (audible sound such as human speech or the operating noise of air conditioning), converts it into an analog sound signal, and outputs the signal to the masking sound generation apparatus 10.
The speaker 40 receives an analog sound signal from the masking sound generation apparatus 10 and reproduces it in acoustic space 20B.

(A-2: Configuration of the masking sound generation apparatus 10)
Next, the configuration of the masking sound generation apparatus 10 will be described with reference to FIG. 2. The masking sound generation apparatus 10 generates a sound signal representing a masking sound (masker). The masking sound is emitted in acoustic space 20B. It makes the content of conversations in acoustic space 20A harder for users of the other acoustic space 20B to overhear (security protection), and it keeps sound leaking from acoustic space 20A from disrupting the conversations or the concentration of users in acoustic space 20B (noise masking).

A CPU (Central Processing Unit) 100 executes the various programs stored in a storage unit 200, thereby performing the operations characteristic of the present invention and controlling the operation of each part of the masking sound generation apparatus 10.

The audio input unit 300 includes an analog/digital (hereinafter "A/D") converter 310 and an input terminal 320. The microphone 30 is connected to the input terminal 320, and the sound signal generated by the microphone 30 is input to the A/D converter 310 via the input terminal 320. The A/D converter 310 performs A/D conversion on the sound signal received from the microphone 30 and outputs the resulting digital sound signal to the CPU 100.

The audio output unit 400 includes a D/A converter 410, an amplifier 420, and an output terminal 430. The D/A converter 410 performs D/A conversion on the sound signal received from the CPU 100 to convert it into an analog sound signal. The amplifier 420 adjusts the amplitude (master volume) of the sound signal received from the D/A converter 410 to an optimal value so that the masking effect is maximized. The amplification factor of the sound signal is controlled by the CPU 100 based on a signal from the operation unit 500 described later. The output terminal 430 is connected to the speaker 40; the sound signal is output to the speaker 40 and emitted in acoustic space 20B as a masking sound (masker).

The operation unit 500 is an input device with a touch panel. When a user of the masking sound generation apparatus 10 presses the touch panel, the operation unit 500 outputs the content of the operation to the CPU 100. FIG. 3 is a diagram showing the appearance of the operation unit 500. The touch panel of the operation unit 500 has an operation mode selection section 510, a sound signal selection section 520, a sex selection section 530, an age selection section 540, a language selection section 550, an acoustic space selection section 560, and a volume level selection section 570.

When the user presses a particular area on the touch panel, the selected area is shown shaded, as illustrated in the figure, and a signal indicating that the corresponding item has been selected is output to the CPU 100. In the volume level selection section 570, larger numbers correspond to higher volume levels. Below, these signals are called operation mode selection information, sound signal selection information, sex selection information, age selection information, language selection information, acoustic space selection information, and volume level selection information, respectively. The sex selection information, age selection information, language selection information, and acoustic space selection information are collectively called condition setting information.

Returning to FIG. 2, the optical disc playback device 600 reads the data recorded on a loaded optical disc. The data read out is output to the CPU 100.

The storage unit 200 includes a ROM (Read Only Memory) 210 and a RAM (Random Access Memory) 220.
The ROM 210 stores the control program and data that cause the CPU 100 to execute the functions characteristic of the present invention.
The RAM 220 has various storage areas and is used by the CPU 100 as a work area. The RAM 220 also has a sound signal storage area that can hold each sound signal received from the audio input unit 300 for a predetermined time. The longer this predetermined time, the better, since it gives the masking sound generation apparatus higher performance; however, the capacity and performance of the hardware resources impose an upper limit, so in this embodiment it is set to 180 seconds as an example. The RAM 220 also stores various data such as parameters used in generating the sound signal of the masking sound.
The units described above are connected via a bus 700 and exchange data with one another.

(A-3: Control program and data)
Next, the control program stored in the ROM 210 will be described. By executing the control program, the CPU 100 performs various kinds of processing, including the processes described below.

First, the "acoustic characteristic analysis process" will be described. The acoustic characteristic analysis process divides an input sound signal into sections of a predetermined length and analyzes the speech speed, formants, and frequency characteristics of each resulting fragment (hereinafter called a frame).

First, the analysis of speech speed will be described. In this embodiment, "speech speed" (speaking rate) is the speed at which speech is uttered, and is defined, for example, as the number of syllables per unit time. A syllable here means a group of phonemes of a certain duration (for example, vowels), or such a phoneme preceded and/or followed by very short phonemes (for example, consonants). In the acoustic characteristic analysis process, the CPU 100 generates the time-axis waveform of the sound signal for each frame of the received sound signal and applies smoothing to the envelope of that waveform. From the smoothed waveform it then detects, frame by frame, the peak positions of the waveforms that make up the individual syllables and counts the peaks. The peak count is taken as the number of syllables, and the number of syllables per unit time, obtained by dividing the syllable count by the frame length, is calculated as the speech speed. A peak here is the point of maximum level in the waveform that makes up each syllable. The speech speed differs from frame to frame; the CPU 100 analyzes the speech speed at that point in time for each frame, then calculates and outputs the mean of those values and the standard deviation σ, which expresses the frame-to-frame variation about that mean.
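The speech-speed analysis can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the moving-average smoother, the relative threshold used to ignore low-level ripples, and the function names (`smoothed_envelope`, `count_syllable_peaks`, `speech_speed`, `speech_speed_stats`) are all hypothetical choices, since the text does not specify how the envelope is smoothed or how peaks are detected.

```python
from statistics import mean, pstdev


def smoothed_envelope(samples, win):
    """Rectify the waveform, then smooth it with a centered moving average."""
    rect = [abs(s) for s in samples]
    prefix = [0.0]
    for r in rect:
        prefix.append(prefix[-1] + r)
    half, n = win // 2, len(rect)
    env = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        env.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return env


def count_syllable_peaks(env, rel_threshold=0.5):
    """Count local maxima above rel_threshold * max(env) (assumed criterion)."""
    if not env or max(env) == 0:
        return 0
    floor = rel_threshold * max(env)
    return sum(
        1
        for i in range(1, len(env) - 1)
        if env[i] > floor and env[i - 1] < env[i] >= env[i + 1]
    )


def speech_speed(frame, sample_rate, smooth_win=25):
    """Syllables per second for one frame: peak count / frame duration."""
    env = smoothed_envelope(frame, smooth_win)
    return count_syllable_peaks(env) / (len(frame) / sample_rate)


def speech_speed_stats(frames, sample_rate):
    """Mean and standard deviation of the per-frame speech speeds."""
    speeds = [speech_speed(f, sample_rate) for f in frames]
    return mean(speeds), pstdev(speeds)
```

On real speech the smoothing window would need to be long enough to suppress the ripple of the carrier waveform; otherwise a single syllable may be counted more than once.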

Next, the analysis of formants will be described. A formant is a peak on the spectral envelope of speech where energy is concentrated in a particular frequency region. It is a frequency spectrum (a distribution pattern of overtone components) inherent to a human voice or the like, and it does not depend on the pitch or loudness of the voice. It is known that analyzing formants makes it possible to infer the speaker's sex, age, language, and so on. In the acoustic characteristic analysis process, the CPU 100 Fourier-transforms the waveform of each frame of the received sound signal. The CPU 100 then takes the logarithm of the amplitude spectrum obtained by the Fourier transform and applies an inverse Fourier transform to it to generate the spectral envelope of each frame. Starting from the low-frequency end of the resulting spectral envelope, the CPU 100 then extracts the frequencies of the first, second, and third formants. In this embodiment the frequencies of the first to third formants are extracted, but only one or two of them may be analyzed, or the fourth and higher formants may be analyzed as well.
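A common way to realize the log-spectrum and inverse-transform steps is cepstral smoothing, sketched below with NumPy. The text does not say how the envelope is derived from the inverse transform, so the low-quefrency liftering step, the Hann window, the cutoff `n_lifter`, and the function names are assumptions made for illustration; the frame is assumed to have an even length well above `2 * n_lifter`.

```python
import numpy as np


def spectral_envelope(frame, n_lifter=30):
    """Log-magnitude spectral envelope via cepstral smoothing (assumed method)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)        # inverse transform of the log spectrum
    cepstrum[n_lifter + 1:-n_lifter] = 0.0  # keep only low quefrencies
    return np.fft.rfft(cepstrum).real       # back to the frequency domain


def formant_frequencies(frame, sample_rate, n_formants=3, n_lifter=30):
    """Frequencies of the first envelope peaks, counted from low frequency."""
    env = spectral_envelope(frame, n_lifter)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    peaks = [
        k for k in range(1, len(env) - 1)
        if env[k - 1] < env[k] >= env[k + 1]
    ]
    return [float(freqs[k]) for k in peaks[:n_formants]]
```

Simple peak picking on the envelope is only a rough formant estimate; a smaller `n_lifter` gives a smoother envelope with fewer spurious peaks, at the cost of merging closely spaced formants.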

Next, the analysis of frequency characteristics will be described. The CPU 100 reads the received sound signal frame by frame and generates frequency-domain spectrum data for each frame by Fourier transform. Properties such as the pitch of the sound represented by the sound signal can be read from the generated spectrum data.
This concludes the description of the acoustic characteristic analysis process.
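The per-frame spectrum computation can be illustrated as follows. This is a sketch assuming NumPy and non-overlapping frames; `frame_spectra` and `dominant_frequency` are hypothetical names, and taking the strongest bin is only a crude stand-in for reading the pitch from the spectrum, not a method described in the text.

```python
import numpy as np


def frame_spectra(signal, frame_len):
    """Magnitude spectrum of each non-overlapping frame (one row per frame)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))


def dominant_frequency(spectrum, frame_len, sample_rate):
    """Frequency in Hz of the strongest bin in one frame's spectrum."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return float(freqs[int(np.argmax(spectrum))])
```

For example, two 0.1-second frames of a 440 Hz tone sampled at 8000 Hz each yield a dominant frequency of about 440 Hz.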

Next, the "reverse process" applied to a sound signal will be described. In the reverse process, the CPU 100 first converts each frame of the received sound signal into a time-domain signal. It then reads each frame of the sound signal backward along the time axis, converting each sound signal into a new sound signal. In other words, this process generates a new sound signal by reading the stored data out in the temporal order opposite to that in which the original sound signal was produced. The content of the original sound signal cannot be understood from the sound signal generated by the reverse process.
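As a minimal sketch of the reverse process, the following splits a sample sequence into fixed-length frames and reverses the samples inside each frame while keeping the frames in their original order. The function names and the choice to leave a shorter trailing frame in place (also reversed) are assumptions for illustration.

```python
def split_frames(samples, frame_len):
    """Cut a sample sequence into consecutive frames of frame_len samples."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def reverse_frames(samples, frame_len):
    """Reverse the samples inside each frame; the frame order is unchanged."""
    out = []
    for frame in split_frames(samples, frame_len):
        out.extend(reversed(frame))
    return out
```

For instance, `reverse_frames([1, 2, 3, 4, 5, 6, 7], 3)` returns `[3, 2, 1, 6, 5, 4, 7]`.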

Next, the "windowing process" applied to each frame of a sound signal will be described. The windowing process converts the waveform at the junction between frames whose content is not continuous, so that the sound transitions smoothly from one frame to the next.
Specifically, the CPU 100 multiplies the sound signal of each frame by a "shaping function" composed of, for example, trigonometric functions, shaping the signal so that it rises smoothly at the head of each frame and falls smoothly at the tail. When a continuous sound signal is divided into frames by acoustic processing and the frames are joined in an order different from that of the original signal, click noise can occur at the junctions; the windowing process removes this noise.
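One possible shaping function is a raised-cosine ramp at both ends of the frame, sketched below. The text only says the function is built from trigonometric functions, so the specific ramp shape, the `ramp_len` parameter, and the function names are assumptions.

```python
import math


def shaping_function(frame_len, ramp_len):
    """Gain curve: raised-cosine rise at the head, mirrored fall at the tail."""
    if 2 * ramp_len > frame_len:
        raise ValueError("ramp_len must be at most half the frame length")
    gains = [1.0] * frame_len
    for i in range(ramp_len):
        g = 0.5 * (1.0 - math.cos(math.pi * i / ramp_len))
        gains[i] = g                   # smooth rise from 0 at the head
        gains[frame_len - 1 - i] = g   # smooth fall to 0 at the tail
    return gains


def apply_window(frame, ramp_len):
    """Multiply each sample of the frame by the shaping function."""
    gains = shaping_function(len(frame), ramp_len)
    return [s * g for s, g in zip(frame, gains)]
```

Because every frame then starts and ends at zero, frames joined in a new order meet without a step discontinuity, which is what suppresses the click.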

Next, the data stored in the ROM 210 will be described.
First, the "frame length selection table" will be described. FIG. 4 is a diagram showing an example of the frame length selection table. In the frame length selection table, a frame length is associated with each range of the speech speed described above. For example, a frame length of 0.10 [s] is associated with a speech speed of 7.5 or more and less than 12.5 [s^-1]. The length of one frame is set to roughly the duration of one syllable when the speech speed is at the midpoint of its range. That is, at a speech speed of 10 [s^-1] the utterance time of one syllable is 0.10 seconds, so the frame length for the range of 7.5 or more and less than 12.5, which contains the speech speed 10 [s^-1], is set to this one-syllable utterance time (0.10 seconds). The reason is as follows. If one frame is much shorter than one syllable, a single syllable is split across several frames, and the original syllable may still be recognizable even when each frame is played back in reverse. If one frame is much longer than one syllable, the syllables within a frame may be recognized as they are even when the frames are rearranged at random.
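The rule stated above, a frame length equal to one syllable's duration at the midpoint of each speech-speed range, can be sketched as a small lookup. Only the 7.5 to 12.5 row (0.10 s) is given in the text; the other range boundaries below are hypothetical, as are the function and variable names.

```python
def build_frame_length_table(boundaries):
    """Rows of (low, high, frame_len) with frame_len = 1 / midpoint speed."""
    rows = []
    for low, high in zip(boundaries, boundaries[1:]):
        midpoint = (low + high) / 2.0
        rows.append((low, high, 1.0 / midpoint))
    return rows


# Only the 7.5-12.5 range comes from the text; 2.5 and 17.5 are hypothetical.
FRAME_LENGTH_TABLE = build_frame_length_table([2.5, 7.5, 12.5, 17.5])


def select_frame_length(speed, table=FRAME_LENGTH_TABLE):
    """Return the frame length in seconds for a speech speed in syllables/s."""
    for low, high, frame_len in table:
        if low <= speed < high:   # lower bound inclusive, upper bound exclusive
            return frame_len
    raise ValueError(f"speech speed {speed} is outside the table")
```

With this table, `select_frame_length(10.0)` gives 0.10 seconds, matching the example row.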

Next, the “scrambled sound signal” will be described. A scrambled sound signal is a sound signal obtained by scrambling human speech, that is, by making it meaningless or unintelligible. Specifically, human speech is picked up and corresponding waveform data is generated, the data is divided into a plurality of frames at predetermined intervals (for example, 100 milliseconds), and the frames are combined in an order different from that of the original speech to produce a new sound signal. In the present embodiment, a plurality of scrambled sound signals (scrambled sound signals 1, 2, 3, ...) are stored in the ROM 210 in the initial setting process described later. A listener cannot understand any linguistic meaning from a scrambled sound signal.
In addition to the sound signals derived from human voices, the ROM 210 also stores a white noise sound signal as an example of broadband noise. White noise is noise having a uniform power spectral density over the measurement frequency band.

Next, the “scrambled sound signal selection table” will be described. As shown in FIG. 5, the scrambled sound signal selection table holds, for each scrambled sound signal number that identifies a scrambled sound signal stored in the ROM 210, the sound generator attribute information and the acoustic characteristic information of that sound. The sound generator attribute information includes the gender, age, language, and name of the person whose voice is the source of the scrambled sound signal. For example, scrambled sound signal 1 is generated from a voice recorded by “Mr. A”, a 30-year-old Japanese male. The acoustic characteristic information includes data on the speech speed, formants, and frequency characteristics of the scrambled sound signal. The formant and frequency characteristic fields contain file names that uniquely identify the formant and frequency characteristic data, and the data itself is stored separately in the ROM 210.

(B: Operation)
Next, the operation of this embodiment will be described.
(B-1: Initial setting process)
The CPU 100 performs an initial setting process before generating a masking sound. FIG. 6 is a flowchart showing the flow of processing performed by the CPU 100 in the initial setting process.

First, in step SA100, the CPU 100 receives a sound signal. There are two ways for the CPU 100 to receive a sound signal. In the first, a user speaks into the microphone 30 and the CPU 100 receives the sound signal through the sound input unit 300. In the second, the optical disk reproducing device 600 reads the sound signal from an optical disk on which it has been written. In this case, the optical disk may be, for example, a commercially sold disk or a disk on which the user has written a sound signal in advance.

When the user has finished inputting the sound signal by either method, the user enters the sound generator attribute information for that sound signal (the gender, age, language, and name of the person who uttered the voice) via an input unit (not shown). The CPU 100 temporarily writes the received sound signal and the sound generator attribute information in the RAM 220 in association with each other.

In this operation example, both methods are used: the former, in which voice is input through the microphone 30, and the latter, in which a sound signal is read from a storage medium such as an optical disk. The sound signals input by the former method are as follows. As the sources of scrambled sound signals 1 and 2, sound signals representing the speech of “Mr. A”, a 30-year-old Japanese male, and “Ms. B”, a 25-year-old Japanese female, are input, respectively. As the source of scrambled sound signal 3, a sound signal representing the speech of “Group C (5 people)”, consisting of five Japanese men and women with an average age of 25, is input.

The sound signals input by the latter method are as follows. As the source of scrambled sound signal 4, a sound signal representing the speech of a 10-year-old Japanese boy is input. As the source of scrambled sound signal 5, a sound signal generated from the voice of a 30-year-old British male is input.

The sound signals to be input may be chosen with reference to how often each user uses the acoustic space 20A and which languages are used there. For example, if the acoustic space 20A is frequently used by “Mr. A”, “Ms. B”, or “Group C”, or if meetings are frequently held there in English, it is advisable to input the voices of those frequent users and sound signals in the languages used, as described above.

Next, in step SA110, the CPU 100 performs an acoustic characteristic analysis process. Specifically, the CPU 100 analyzes the speech speed, formants, and frequency characteristics of each sound signal written in the RAM 220, and temporarily writes the resulting acoustic characteristic information in the RAM 220 in association with the analyzed sound signal.

In step SA120, the CPU 100 updates the scrambled sound signal selection table stored in the ROM 210. Specifically, the CPU 100 reads the sound generator attribute information and the acoustic characteristics of each sound signal from the RAM 220 and writes them in the scrambled sound signal selection table. As shown in FIG. 5, the sound generator attribute information and the acoustic characteristics of the source sound signals of scrambled sound signals 1, 2, 3, 4, and 5 are written in the fields for scrambled sound signals 1, 2, 3, 4, and 5, respectively.

In step SA130, the CPU 100 performs a sound signal scramble process. FIG. 7 is a flowchart showing the flow of the sound signal scramble process. FIG. 8 shows the waveforms of the sound signal at each stage of the sound signal scramble process.

In step SB100 in FIG. 7, the CPU 100 duplicates the sound signal written in the RAM 220. In this operation example, the CPU 100 makes three copies of the sound signal and writes them in the RAM 220. Hereinafter, these copies are referred to as sound signals A, B, and C. Steps SB110 to SB150 described below are performed on each of the sound signals A, B, and C, which are thereby converted into sound signals that differ from one another. The processing may be executed on the three sound signals simultaneously or sequentially.

In step SB110, the CPU 100 divides the sound signal into frames as follows. The CPU 100 reads the information on the speech speed of the sound signal from the RAM 220. The CPU 100 then reads, from the frame length selection table stored in the ROM 210, the frame lengths associated with the average value, the average value + σ, and the average value − σ of the speech speed. It divides the sound signals A, B, and C written in the RAM 220 by the respective frame lengths and writes the resulting frames in the RAM 220. In FIG. 8, (a)-A, (a)-B, and (a)-C show the sound signals A, B, and C divided by different frame lengths.
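
As a rough sketch of how the three frame lengths in step SB110 might be derived, the function below computes the average speech speed and its standard deviation σ and looks up a frame length for each of the three values. The name `frame_lengths_for_copies`, the `lookup` callable standing in for the frame length selection table, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics

def frame_lengths_for_copies(rates, lookup):
    """Return frame lengths for sound signals A, B, and C.

    `rates`  : speech-speed measurements of one sound signal over time.
    `lookup` : callable mapping a speech speed to a frame length
               (a stand-in for the frame length selection table).
    """
    mean = statistics.mean(rates)
    sigma = statistics.pstdev(rates)  # assumption: population std. deviation
    return [lookup(mean), lookup(mean + sigma), lookup(mean - sigma)]
```

With a lookup that shortens the frame as speech gets faster, the copy framed at the average value + σ receives the shortest frames and the copy framed at the average value − σ the longest, so the three copies are cut at different boundaries.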

In step SB120, the CPU 100 performs the reverse process described above on each frame of the sound signals written in the RAM 220. As a result, each frame of the sound signals A, B, and C is converted into data that is reversed in time within the frame, as shown in (b)-A, (b)-B, and (b)-C of FIG. 8, respectively.

In step SB130, a windowing process is performed on each frame. As a result, the waveform at the head and tail of each frame is shaped.
In step SB140, the CPU 100 randomly rearranges the order of the frames of each sound signal (see FIG. 8(c)).
In step SB150, the CPU 100 joins the rearranged frames to generate a new sound signal.
In step SB160, the CPU 100 mixes the sound signals A, B, and C, which were processed separately in steps SB110 to SB150, and generates a scrambled sound signal (see FIG. 8(d)).
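
Steps SB100 to SB160 can be put together in a compact sketch. This is an illustration under assumptions, not the apparatus's actual implementation: frame lengths are given in samples, the window is a raised-cosine ramp, mixing is done by averaging the copies sample by sample, and every function and parameter name is hypothetical.

```python
import math
import random

def split_frames(signal, frame_len):
    """SB110: cut the signal into consecutive frames of frame_len samples."""
    return [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

def taper(frame, ramp):
    """SB130: shape the head and tail of a frame with a raised-cosine ramp."""
    out = list(frame)
    n = len(out)
    for i in range(min(ramp, n // 2)):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / ramp))
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out

def scramble_copy(signal, frame_len, rng, ramp=8):
    frames = split_frames(signal, frame_len)       # SB110: framing
    frames = [f[::-1] for f in frames]             # SB120: reverse each frame
    frames = [taper(f, ramp) for f in frames]      # SB130: windowing
    rng.shuffle(frames)                            # SB140: random reordering
    return [s for f in frames for s in f]          # SB150: join the frames

def scramble(signal, frame_lens, seed=0):
    rng = random.Random(seed)
    # SB100: one copy per frame length, each processed separately.
    copies = [scramble_copy(signal, n, rng) for n in frame_lens]
    # SB160: mix the copies (assumption: plain average, sample by sample).
    return [sum(samples) / len(copies) for samples in zip(*copies)]
```

A call such as `scramble(samples, [80, 100, 120])` yields a signal of the same length as the input. Fixing the seed makes the shuffle repeatable, which is convenient for testing but is not something the text requires.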

The scrambled sound signal generated by the above processing has the following characteristics. In the generated scrambled sound signal, the fluctuation range of the volume level of the original sound signal is reduced, and the level converges toward the average volume level. This is because the original sound signal is not only divided into short frames that are rearranged in random order, but a plurality of sound signals processed in this way are also superimposed. The volume level of the scrambled sound signal is therefore kept substantially constant, which reduces the instability of the masking effect caused by fluctuations in the volume level of the original sound signal.

In addition, since the frame length used to divide the sound signal is set appropriately according to the speech speed, the phonemes contained in the original sound are divided appropriately, and the resulting signal has a high masking effect. The division of phonemes and the reverse process within each frame render the sound sufficiently meaningless, so the user's privacy and security are protected. Furthermore, because the windowing process is applied at the joints between frames, the generated scrambled sound signal is smoothly connected.

Returning to FIG. 6, in step SA140 the CPU 100 writes the generated scrambled sound signal in the ROM 210.
The CPU 100 also displays, to the right of each option of the sound signal selection unit 520, the “name” associated in the scrambled sound signal selection table with the scrambled sound signal of that number.

Note that a sound signal representing white noise is also stored in the ROM 210 in advance. Therefore, when the initial setting process is complete, the ROM 210 holds the scrambled sound signals and the white noise sound signal as the sound signals from which a masking sound is produced.

(B-2: Masking sound generation process)
Next, the masking sound generation process will be described. FIG. 9 is a flowchart showing the flow of the masking sound generation process.
To execute the masking sound generation process, the user of the masking sound generation apparatus 10 operates the operation mode selection unit 510 of the operation unit 500 and selects operation mode 1 or 2. The operation unit 500 outputs operation mode information indicating the selected operation mode to the CPU 100. The masking sound generation process for each operation mode is described below.

(B-2-1: Operation mode 1)
In this operation mode, a scrambled sound signal suitable for generating a masking sound is selected automatically on the basis of the acoustic characteristics of the sound in the acoustic space 20A.

In step SC100, the CPU 100 receives the operation mode information.
In step SC110, the CPU 100 determines whether the received operation mode information is 1. In this operation mode, the operation mode information is “1”, so the determination result in step SC110 is “Yes” and the process of step SC120 is performed.

In step SC120, the CPU 100 receives a sound signal representing the sound in the acoustic space 20A and performs the acoustic characteristic analysis process on it. This process is the same as the acoustic characteristic analysis process in the initial setting process, so its description is omitted.

In step SC130, the CPU 100 reads one suitable sound signal from the scrambled sound signals written in the ROM 210 on the basis of the result of the acoustic characteristic analysis process in step SC120. That is, the CPU 100 compares the acoustic characteristics obtained in step SC120 (speech speed, formants, and frequency characteristics) with the scrambled sound signal selection table and selects the scrambled sound signal whose acoustic characteristics are most similar.
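
The text does not specify how similarity is measured, so the following is only one possible sketch of the selection in step SC130. It assumes that each table entry has been reduced to a speech speed and two formant frequencies, and that similarity is a Euclidean distance over relative differences. The table values, field names, and function names are all hypothetical.

```python
import math

# Hypothetical acoustic characteristics per scrambled sound signal number.
ACOUSTIC_TABLE = {
    1: {"rate": 9.0, "formants": (650.0, 1100.0)},
    2: {"rate": 11.0, "formants": (850.0, 1400.0)},
    3: {"rate": 10.0, "formants": (750.0, 1250.0)},
}

def feature_distance(observed, stored):
    """Distance between two feature sets, using relative differences."""
    pairs = [(observed["rate"], stored["rate"])]
    pairs += list(zip(observed["formants"], stored["formants"]))
    return math.sqrt(sum(((x - y) / y) ** 2 for x, y in pairs))

def select_most_similar(observed, table):
    """Return the number of the entry closest to the observed features."""
    return min(table, key=lambda number: feature_distance(observed, table[number]))
```

Relative differences are used so that formant frequencies, which are numerically much larger than speech speeds, do not dominate the comparison. A different weighting could be chosen just as reasonably.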

In step SC140, the CPU 100 outputs the read sound signal (180 seconds of data in the present embodiment) as a masking sound. Since the scrambled sound signal is 180 seconds long, once 180 seconds have elapsed from the start of output, the scrambled sound signal is output repeatedly in a loop. The volume level of the output scrambled sound signal is set to an optimum value according to the volume level entered by the user via the volume level selection unit 570, and this setting is executed as an interrupt process.
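
The looped output can be sketched as a generator that hands out fixed-size blocks and wraps around at the end of the stored data. Block-based output, the linear `gain` factor standing in for the selected volume level, and the name `looped_blocks` are assumptions for illustration.

```python
def looped_blocks(signal, block_size, gain=1.0):
    """Yield successive blocks of `signal`, wrapping around endlessly.

    `gain` is a hypothetical linear factor representing the volume level.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    n = len(signal)
    pos = 0
    while True:
        # Indexing modulo n restarts the signal when its end is reached.
        yield [gain * signal[(pos + i) % n] for i in range(block_size)]
        pos = (pos + block_size) % n
```

A block that straddles the end of the data simply continues from the beginning, so playback carries on without a gap when the 180-second signal restarts.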

In this operation mode, the acoustic characteristics of the sound in the acoustic space 20A are analyzed, and the scrambled sound signal whose acoustic characteristics are most similar to that sound is selected from the many scrambled sound signals stored in the ROM 210. As described above, the masking effect is highest when the acoustic characteristics of the masking sound are similar to those of the target sound. The output masking sound therefore has the acoustic characteristics best suited to masking the sound produced in the acoustic space 20A.

(B-2-2: Operation mode 2)
Next, the masking sound generation process in operation mode 2 will be described. In this operation mode, the masking sound is selected automatically according to the user's instructions.

In step SC100, the CPU 100 receives the operation mode information.
In step SC110, the CPU 100 determines whether the received operation mode information is 1. In this operation mode, the operation mode information is “2”, so the determination result in step SC110 is “No” and the process of step SC150 is performed.

The user enters the parameters for generating the masking sound by one of the following two methods. In the first method, the user refers to the “name” displayed to the right of the sound signal selection unit 520 of the operation unit 500 and directly designates one of the sound signals. For example, when “Mr. A” is going to speak in the acoustic space 20A, the user presses “1” on the sound signal selection unit 520, and when a meeting is to be held in English, the user presses “5”.

In the second method, the user selects a specific option in one or more of the gender selection unit 530, the age selection unit 540, the language selection unit 550, and the acoustic space selection unit 560. In this case, the CPU 100 selects a sound signal on the basis of the selected information. For example, when an “adult” “male” is going to speak in “English” in an “office”, the corresponding items of the gender selection unit 530, the age selection unit 540, the language selection unit 550, and the acoustic space selection unit 560 are selected as shown in FIG. 3.

The operation unit 500 outputs sound signal selection information or condition setting information according to the operation described above.
In step SC150, the CPU 100 receives the sound signal selection information or the condition setting information from the operation unit 500.

In step SC130, the CPU 100 selects a sound signal on the basis of the sound signal selection information or the condition setting information received from the operation unit 500. When the CPU 100 has received sound signal selection information, it reads the scrambled sound signal indicated by that information from the ROM 210 and outputs it as a masking sound. When the CPU 100 has received condition setting information, it compares the gender, age, language, and acoustic space type written in that information with the scrambled sound signal selection table and reads a scrambled sound signal that matches the set conditions according to a predetermined algorithm. Examples are the sound signal with the largest number of matching items, the sound signal most recently selected in the past selection history, and the sound signal used most frequently. The predetermined algorithm may be set arbitrarily according to the user's requirements.
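
One of the algorithms mentioned, choosing the sound signal with the largest number of matching items, can be sketched as below. The speakers of signals 1 to 5 follow the operation example in the text, but the field names, the “adult”/“child” age grouping, and the tie-breaking rule (the lowest signal number wins) are assumptions for illustration.

```python
# Attribute rows loosely based on the operation example. The field names and
# the age groups are hypothetical.
SELECTION_TABLE = {
    1: {"gender": "male", "age_group": "adult", "language": "Japanese"},
    2: {"gender": "female", "age_group": "adult", "language": "Japanese"},
    3: {"gender": "mixed", "age_group": "adult", "language": "Japanese"},
    4: {"gender": "male", "age_group": "child", "language": "Japanese"},
    5: {"gender": "male", "age_group": "adult", "language": "English"},
}

def count_matches(conditions, attributes):
    """Count how many of the user's conditions an entry satisfies."""
    return sum(1 for key, value in conditions.items()
               if attributes.get(key) == value)

def select_by_conditions(conditions, table):
    """Return the signal number with the most matching items.

    On a tie, max() keeps the first candidate, i.e. the lowest number here.
    """
    return max(table, key=lambda number: count_matches(conditions, table[number]))
```

For the conditions “male”, “adult”, and “English”, signal 5 matches all three items while signal 1 matches only two, so signal 5 is chosen. Conditions left unset by the user simply do not take part in the count.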

If “house” is written in the acoustic space selection information, the CPU 100 may select the white noise sound signal as the masking sound. In general, a masking sound generated from random noise such as white noise has a lower masking effect than one generated from human voices, but it is also less likely to cause discomfort or a sense of incongruity. In a house, where livability and comfort take priority, masking with white noise is therefore desirable. Needless to say, the white noise sound signal may also be given priority in cases other than “house”.

In step SC140, the CPU 100 outputs the selected scrambled sound signal or the white noise sound signal. The volume level of the output sound signal is set to an optimum value according to the volume level entered by the user via the volume level selection unit 570. This setting is executed as an interrupt process.

In this operation mode, based on information such as the characteristics of the sound in the acoustic space 20A and the type of the acoustic space 20A, the sound signal that best matches the acoustic characteristics of that sound and of the acoustic space 20A is selected from the plurality of scrambled sound signals and the white noise stored in the ROM 210. The user can therefore have a suitable masking sound generated easily, without knowing what sound signals are stored in the ROM 210.

(C: Modifications)
An embodiment of the present invention has been described above, but the embodiment may of course be modified as described below. The modifications described below may also be used in combination.

(1) In the above embodiment, the case where the CPU 100 of the masking sound generation apparatus 10 executes most of the processes characteristic of the present invention has been described. However, hardware modules that perform the respective processes may be provided to carry out the same processing.

(2) In the above embodiment, the case where all of the various processes (framing, reverse processing, windowing, and randomization) are applied to the sound signal in the initial setting process has been described. However, not all of these processes need to be performed. It is sufficient that, by combining some of them, the sound signal is altered to the extent that its linguistic meaning cannot be understood.

(3) In the above embodiment, the case where several items of information about each sound signal (gender, age, language, speech speed, formants, and frequency characteristics) are written in the scrambled sound signal selection table has been described. However, the acoustic characteristic analysis process need not analyze all of the speech speed, formants, and frequency characteristics, and not all of these items need to be written in the initial setting process. Likewise, not all of the sound generator attribute information needs to be written. The CPU 100 may select the scrambled sound signal with the highest degree of matching within the range of the items that have been written.

(4) In the above embodiment, one example of the acoustic characteristic analysis method has been described. However, the method of analyzing each acoustic characteristic is not limited to the one described above, and any method that yields similar analysis results may be used.

(5) In the above embodiment, the process of analyzing the acoustic characteristics of the sound signal picked up in the acoustic space 20A in operation mode 1 has been described. However, the space in which the masking sound is actually emitted is the acoustic space 20B, and an obstacle that changes the acoustic characteristics, such as a wall, that is, a sound insulation structure, exists between the two acoustic spaces. Therefore, before performing the acoustic characteristic analysis process, the CPU 100 may apply to the target sound signal a filtering process that imitates the sound insulation characteristics of the sound insulation structure, thereby giving the signal the acoustic effect of having passed through the wall, and then perform the acoustic characteristic analysis process. The resulting masking sound is then generated from a sound signal that imitates the noise actually heard by users of the acoustic space 20B, so a higher masking effect can be expected.
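
The text does not give the sound insulation characteristics of the structure, so the sketch below uses a deliberately crude stand-in: a one-pole low-pass filter followed by a fixed attenuation, on the assumption that the wall passes low frequencies more readily than high ones. The function name and both parameters are hypothetical. A real implementation would presumably use a filter fitted to the measured characteristics of the wall.

```python
def wall_filter(signal, alpha=0.1, attenuation=0.3):
    """Imitate transmission through a wall (illustrative assumption only).

    alpha       : smoothing factor of the one-pole low-pass (0 < alpha <= 1).
    attenuation : overall level reduction applied after filtering.
    """
    out = []
    state = 0.0
    for x in signal:
        state += alpha * (x - state)   # low-pass: follows slow changes only
        out.append(attenuation * state)
    return out
```

Applying such a filter before the analysis in step SC120 means the speech speed, formants, and frequency characteristics are measured on a signal closer to what is heard in the acoustic space 20B.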

(6) In the above embodiment, the case where the microphone 30 and the speaker 40 are provided in separate acoustic spaces has been described. However, the microphone 30 and the speaker 40 may be installed in the same acoustic space. For example, when the microphone 30 and the speaker 40 are installed in the acoustic space 20A, a masking sound is generated from the conversation of the users in the acoustic space 20A and emitted in the acoustic space 20A, so both the conversation and the masking sound leak into the acoustic space 20B. As a result, it becomes difficult for users in the acoustic space 20B to understand the conversation of the users in the acoustic space 20A. In this case, the microphone 30 and the speaker 40 must of course be arranged, and the signals processed, so that howling does not occur.

(7) In the above embodiment, the case where the microphone 30 and the speaker 40 are installed in separate acoustic spaces has been described. However, the microphone 30 and the speaker 40 may be installed apart from each other in the same space, so that when a highly confidential conversation takes place in the area near the microphone 30, a masking sound is emitted to users in the area near the speaker 40 to prevent them from hearing the conversation.

(8) In the above embodiment, the case where the microphone 30 is installed in the acoustic space 20A and the speaker 40 is installed in the acoustic space 20B has been described. However, both a microphone 30 and a speaker 40 may be installed in each of a plurality of acoustic spaces, for example in each of the acoustic spaces 20A and 20B. In that case, the masking sound generation apparatus 10 has an input means, and a user who is going to hold a highly confidential conversation indicates this via the input means. The masking sound generation apparatus 10 then picks up sound with the microphone 30 in the acoustic space where that input was made and emits the generated masking sound in the other acoustic space.

(9) In the above embodiment, in the sound signal scramble process, the CPU 100 duplicates the input sound signal into three sound signals with different frame lengths, applies different sound signal processing to each, and then mixes them to generate the masking sound. However, the number of sound signal channels handled is not limited to three. It may be one or two, or four or more, although the masking effect increases with the number of channels.

(10) In the above embodiment, when framing the sound signal, the CPU 100 calculates the mean, the mean + σ, and the mean - σ from the mean speech rate and its standard deviation σ (the variation over time), and applies these values to the framing of the respective duplicated sound signals. However, the values used are not limited to the mean and the mean ± σ. For example, the standard error may be used in place of σ, or a predetermined value may be used in place of σ.
Alternatively, the frame length selection table may associate three frame lengths with each speech rate, in which case the CPU 100 reads out the three frame lengths corresponding to the mean speech rate and divides the respective sound signals into frames using them.
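As a rough illustration of the mean and mean ± σ approach, the sketch below derives three speech-rate values and looks up a frame length for each. The table contents, the units (syllables per second, milliseconds), the use of the population standard deviation, and all names are assumptions made for the example; the source does not give the actual contents of the frame length selection table.

```python
import statistics

# Hypothetical frame length selection table: (upper bound of speech rate in
# syllables per second, frame length in milliseconds). All values are invented.
FRAME_LENGTH_TABLE = [(5.5, 250), (6.5, 180), (7.5, 140), (float("inf"), 100)]

def lookup_frame_length(rate):
    """Return the frame length of the first table row whose bound covers rate."""
    for upper_bound, frame_ms in FRAME_LENGTH_TABLE:
        if rate <= upper_bound:
            return frame_ms

def frame_lengths_for(rates):
    """Frame lengths for mean - sigma, mean, and mean + sigma of the speech rate."""
    mean = statistics.mean(rates)
    sigma = statistics.pstdev(rates)  # population standard deviation (assumed)
    return [lookup_frame_length(r) for r in (mean - sigma, mean, mean + sigma)]

measured_rates = [5.0, 7.0, 5.0, 7.0]  # mean 6.0, sigma 1.0
lengths = frame_lengths_for(measured_rates)  # [250, 180, 140] with this table
```

Replacing `sigma` with a standard error or a fixed constant gives the other variants mentioned in this modification.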

(11) In the above embodiment, the duplicated sound signals are each divided using a different frame length. However, the duplicated sound signals may be divided using a common frame length. In that case, the CPU 100 reads out the frame length corresponding to the mean speech rate and divides each sound signal into frames using that frame length.

(12) In the above embodiment, white noise is used as the random noise. However, the random noise is not limited to white noise. Another sound source such as pink noise (noise whose power spectral density is inversely proportional to frequency) may be used, or a sound signal generated in advance from, for example, the noise or vibration actually produced by air-conditioning equipment may be used.
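One simple way to approximate pink noise is to sum sinusoids at evenly spaced frequencies with random phases, giving each an amplitude proportional to 1/√f so that its power is proportional to 1/f. The sketch below does this with the standard library only. It is an approximation over a finite set of partials, and the sample rate, frequency range, and function names are assumptions for illustration rather than anything specified in the source.

```python
import math
import random

def pink_amplitudes(freqs):
    """Amplitude per partial so that power (amplitude squared) is proportional to 1/f."""
    return [1.0 / math.sqrt(f) for f in freqs]

def pink_noise(n_samples, sample_rate=8000, f_min=50.0, f_max=3500.0,
               n_partials=64, seed=0):
    """Approximate pink noise as a sum of random-phase sinusoids, peak-normalized."""
    if n_partials < 2:
        raise ValueError("n_partials must be at least 2")
    rng = random.Random(seed)
    step = (f_max - f_min) / (n_partials - 1)
    freqs = [f_min + k * step for k in range(n_partials)]  # evenly spaced in Hz
    amps = pink_amplitudes(freqs)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(a * math.sin(2.0 * math.pi * f * t + p)
                           for f, a, p in zip(freqs, amps, phases)))
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    return [s / peak for s in samples]

noise = pink_noise(64, n_partials=8)
```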

(13) In the above embodiment, an optical disc playback device is provided in order to write ready-made sound signals to the ROM 210, and sound signals recorded on an optical disc are written to the ROM 210. However, the device for importing sound signals from outside is not limited to an optical disc playback device. For example, sound signals may be downloaded from a server via a communication network such as the Internet, or the masking sound generation apparatus 10 may be provided with an I/O unit that mediates connection to external devices, and sound signals may be transferred to the ROM 210 from flash memory or the like connected to the I/O unit.

(14) In the above embodiment, operation modes 1 and 2 are both selectable. However, the apparatus need not be able to execute the processing of both operation modes. It may support only one of them.

(15) In the above embodiment, the sound signal scramble process is performed during the initial setting process and the scrambled sound signals are written to the ROM 210 in advance. However, the CPU 100 may instead store the received sound signals in the ROM 210 without scrambling them, and output the masking sound while performing the sound signal scramble process during the masking sound generation process.
Also, when the sound signals stored on the optical disc are already scrambled, the sound signal scramble process need not be performed in the initial setting process.

(16) In the above embodiment, a plurality of scrambled sound signals are generated and stored in the ROM 210, and are selected and used when generating the masking sound. Accordingly, a storage medium storing the "set of a plurality of scrambled sound signals" of the above embodiment may be created, and another sound signal playback device may select and output sound signals read from that storage medium.

(17) In the above embodiment, in operation mode 1 the CPU 100 refers to the scrambled sound signal selection table and selects the scrambled sound signal whose acoustic characteristics are most similar to those of the received sound signal. In operation mode 2, the CPU 100 refers to the scrambled sound signal selection table and selects the scrambled sound signal that best matches the conditions input by the user. In either case, however, the CPU 100 may select from among the scrambled sound signals whose degree of match exceeds a certain level, rather than selecting the one with the highest degree of match.
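The threshold-based selection of modification (17) might look like the sketch below. Several points are assumptions: acoustic characteristics are represented as plain feature vectors, the degree of match is computed as cosine similarity, the pick among qualifying candidates is random, and the fallback to the best match when nothing qualifies is added for robustness. None of these details, nor the names used, come from the source.

```python
import math
import random

def match_score(a, b):
    """Cosine similarity between two feature vectors (assumed match measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_above_threshold(target, table, threshold, rng=random):
    """Pick any entry whose match score exceeds threshold, else the best match."""
    scores = {name: match_score(target, features) for name, features in table.items()}
    candidates = [name for name, score in scores.items() if score > threshold]
    if not candidates:
        return max(scores, key=scores.get)  # fall back to the single best match
    return rng.choice(candidates)

# Hypothetical selection table: signal name -> stored acoustic features.
SELECTION_TABLE = {
    "scramble_a": [1.0, 0.0, 0.0],
    "scramble_b": [0.9, 0.1, 0.0],
    "scramble_c": [0.0, 0.0, 1.0],
}
picked = select_above_threshold([1.0, 0.0, 0.0], SELECTION_TABLE, threshold=0.9)
```

Choosing among several sufficiently similar signals, rather than always the top one, lets the output vary between runs while staying close to the analyzed sound.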

(18) In the above embodiment, in operation mode 1, the scrambled sound signal or white noise sound signal with the most similar acoustic characteristics is selected based on the result of the acoustic characteristic analysis process. However, a plurality of sound signals may be selected at the same time. In that case, for operation mode 1, for example, the operation unit 500 may be provided with an input unit for setting the number of sound signals to be selected, and the CPU 100 selects that number of sound signals in descending order of acoustic similarity. When the operator selects sound signals directly in operation mode 2, the plurality of sound signals corresponding to the plurality of options pressed on the sound signal selection unit 520 are selected. In this way, a plurality of sound signals are superimposed and output as the masking sound, so effective masking can be expected.
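A minimal sketch of selecting several signals in order of similarity and superimposing them is shown below. The feature vectors, the squared-distance measure, the averaging mix, and the table layout are all hypothetical choices made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_top_n(target, table, n):
    """Names of the n signals whose features are closest to the target, best first."""
    ranked = sorted(table, key=lambda name: squared_distance(target, table[name]["features"]))
    return ranked[:n]

def mix(signals):
    """Superimpose signals sample by sample, averaging to keep the level bounded."""
    length = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) / len(signals) for i in range(length)]

# Hypothetical table: stored features and samples for each selectable signal.
TABLE = {
    "scramble_1": {"features": [0.9, 0.1], "samples": [1.0, 1.0, 1.0]},
    "scramble_2": {"features": [0.5, 0.5], "samples": [0.0, 2.0, 0.0]},
    "white_noise": {"features": [0.1, 0.9], "samples": [3.0, 3.0, 3.0]},
}
chosen = select_top_n([1.0, 0.0], TABLE, 2)  # n would come from the operation unit
masking = mix([TABLE[name]["samples"] for name in chosen])
```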

(19) In the above embodiment, various acoustic effects may be applied to the output masking sound based on the content of the acoustic space selection information. For example, when the acoustic space selection information is "hall", the CPU 100 may add reverberation to the scrambled sound signal or white noise sound signal that was read out. Conventional techniques can be used to add the reverberation, such as superimposing a plurality of copies of the sound signal, each delayed by a predetermined time (convolution of reflected sound using an FIR filter). The reverberation time and the number of superimposed sound signals may also be varied according to the type of acoustic space selected, such as "conference room" or "hall".
As another acoustic effect, the timbre may be converted, for example by convolving reflected sound. In a conference room, sound is reflected by the walls and desks and reverberates within the room, which gives it a timbre characteristic of that room. Accordingly, when the acoustic space selection information is, for example, "conference room", the CPU 100 may adjust the waveform of the scrambled sound signal or white noise sound signal that was read out so as to convert it to the timbre characteristic of a conference room.
Applying the above acoustic processing produces a masking sound that sounds even less out of place.
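The reverberation method described in (19), superimposing delayed copies of the signal, amounts to a sparse FIR filter. The sketch below shows one way to write it. The delay and gain values per room type are invented placeholders, and the names `ROOM_TAPS`, `add_reverb`, and `apply_space_effect` are hypothetical.

```python
# Hypothetical presets: (delay in samples, gain) for each delayed copy.
# The hall uses more copies and longer delays than the conference room.
ROOM_TAPS = {
    "conference room": [(400, 0.4), (900, 0.2)],
    "hall": [(800, 0.6), (1700, 0.45), (2900, 0.3), (4300, 0.2)],
}

def add_reverb(signal, taps):
    """Superimpose delayed, attenuated copies of the signal on the direct sound."""
    tail = max((delay for delay, _ in taps), default=0)
    out = list(signal) + [0.0] * tail  # room for the longest echo
    for delay, gain in taps:
        for i, sample in enumerate(signal):
            out[i + delay] += gain * sample
    return out

def apply_space_effect(signal, space):
    """Apply the preset for the selected acoustic space, or none if unknown."""
    return add_reverb(signal, ROOM_TAPS.get(space, []))

# An impulse makes the filter taps visible directly in the output.
echoed = add_reverb([1.0, 0.0, 0.0], [(1, 0.5), (2, 0.25)])
```

Feeding in a unit impulse returns the direct sound followed by each tap's gain at its delay, which is a convenient way to check a preset.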

(20) In the above embodiment, the options on the acoustic space selection unit 560 are room types such as "conference room", "residence", "hall", and "office". However, options that indicate the acoustic characteristics of the room, such as "highly reverberant space" or "anechoic room", may be provided instead. In short, the acoustic space selection information need only be information indicating the acoustic characteristics of the acoustic space.

(21) In the above embodiment, in operation mode 2, a sound signal is selected based on the acoustic space selection information. However, input to the acoustic space selection unit 560 may be allowed in either operation mode, not just operation mode 2. This makes it possible to apply various kinds of acoustic processing to the masking sound based on the acoustic characteristics of the acoustic space 20, as described in modification (19) above.

(22) In the above embodiment, in operation mode 1, the result of the acoustic characteristic analysis process is used to select a scrambled sound signal or white noise. In that case, the acoustic characteristic analysis process may additionally measure the reverberation time, the reflected sound characteristics (impulse response), and the like of the acoustic space 20A, and various kinds of acoustic processing may be applied to the read-out sound signal before output based on those results. For example, when the acoustic space 20A is a hall, reverberation processing may be applied to the read-out sound signal, since halls generally have very long reverberation times.

(23) In the above embodiment, in operation mode 2, when condition setting information is input, a sound signal is read out and output based on that information. In that case, various kinds of acoustic processing may be applied to the read-out sound signal based on the condition setting information. For example, when the gender selection information is "male", the sound signal may be equalized to emphasize its low-frequency components, converting it into a sound signal that imitates a male voice. Likewise, when the age selection information is "child", the sound signal may be equalized to emphasize its high-frequency components, converting it into a sound signal that imitates a child's voice.
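The equalization in (23) could be approximated with a very simple shelving scheme: split the signal into a low band (a one-pole low-pass) and the remaining high band, then add extra gain to the band to be emphasized. This is only a sketch under assumptions. The filter type, the `alpha` and `gain` values, and the mapping dictionary are illustrative choices, not details from the source.

```python
def one_pole_lowpass(signal, alpha):
    """Simple recursive low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def emphasize(signal, band, gain=1.0, alpha=0.1):
    """Boost the low or high band of the signal by adding it back with extra gain."""
    low = one_pole_lowpass(signal, alpha)
    if band == "low":
        return [x + gain * l for x, l in zip(signal, low)]
    if band == "high":
        return [x + gain * (x - l) for x, l in zip(signal, low)]
    raise ValueError("band must be 'low' or 'high'")

# Hypothetical mapping from condition setting information to the emphasized band.
BAND_FOR_CONDITION = {"male": "low", "child": "high"}

steady = [1.0] * 200                       # lowest possible frequency (DC)
alternating = [(-1.0) ** n for n in range(200)]  # highest representable frequency
male_like = emphasize(steady, BAND_FOR_CONDITION["male"])
child_like = emphasize(alternating, BAND_FOR_CONDITION["child"])
```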

A diagram showing the configuration of the acoustic space 20 in which the masking sound generation apparatus 10 is installed.
A block diagram showing the configuration of the masking sound generation apparatus 10.
A diagram showing the external appearance of the operation unit 500.
A diagram showing an example of the frame length selection table.
A diagram showing an example of the scrambled sound signal selection table.
A flowchart showing the flow of the initial setting process.
A flowchart showing the flow of the sound signal scramble process.
A diagram showing the waveforms of the sound signal in the sound signal scramble process.
A flowchart showing the flow of the masking sound generation process.

Explanation of symbols

1: sound masking system; 10: masking sound generation apparatus; 20A, 20B: acoustic space; 30: microphone; 40: speaker; 100: CPU; 200: storage unit; 210: ROM; 220: RAM; 300: audio input unit; 310: A/D converter; 320: input terminal; 400: audio output unit; 410: D/A converter; 420: amplifier; 430: output terminal; 500: operation unit; 510: operation mode selection unit; 520: sound signal selection unit; 530: gender selection unit; 540: age selection unit; 550: language selection unit; 560: acoustic space selection unit; 570: volume level selection unit; 600: optical disc playback device; 700: bus

Claims

A masking sound generation apparatus comprising:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and for storing the acoustic characteristics of each of the scrambled sound signals;
acoustic characteristic analysis means for picking up sound and analyzing the acoustic characteristics of the sound; and
output means for determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics analyzed by the acoustic characteristic analysis means with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage means, and outputting it.

A masking sound generation apparatus comprising:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and for storing the acoustic characteristics of each of the scrambled sound signals;
receiving means for receiving, from an operator, information about the acoustic characteristics of the sound to be masked; and
output means for determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics received by the receiving means with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage means, and outputting it.

A masking sound generation apparatus comprising:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it;
receiving means for receiving, from an operator, an instruction signal designating one of the scrambled sound signals stored in the storage means; and
output means for reading out from the storage means, and outputting, the scrambled sound signal designated by the instruction signal received by the receiving means.

The masking sound generation apparatus according to claim 1, further comprising scrambling means for receiving a sound signal, generating a scrambled sound signal in which the time series of the sections is changed by dividing the sound signal into predetermined sections and processing them, and storing the scrambled sound signal in the storage means.

The masking sound generation apparatus, wherein the output means applies acoustic processing to the scrambled sound signal read out from the storage means, based on the acoustic characteristics of the sound analyzed by the acoustic characteristic analysis means, and outputs the result.

The masking sound generation apparatus described in 1, wherein the output means applies acoustic processing to the scrambled sound signal read out from the storage means, based on the information about the acoustic characteristics of the sound to be masked received by the receiving means, and outputs the result.

The masking sound generation apparatus according to claim 1, further comprising receiving means for receiving, from an operator, information on the acoustic characteristics of the space into which the scrambled sound signal is emitted,
wherein the output means applies acoustic processing to the scrambled sound signal read out from the storage means, based on the information on the acoustic characteristics of the space received by the receiving means, and outputs the result.

A masking sound generation method comprising:
a storage step of storing, in a storage device, a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and storing the acoustic characteristics of each of the scrambled sound signals;
an acoustic characteristic analysis step of picking up sound and analyzing the acoustic characteristics of the sound; and
an output step of determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics analyzed in the acoustic characteristic analysis step with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage device, and outputting it.

A masking sound generation method comprising:
a storage step of storing, in a storage device, a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and storing the acoustic characteristics of each of the scrambled sound signals;
a receiving step of receiving, from an operator, information about the acoustic characteristics of the sound to be masked; and
an output step of determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics received in the receiving step with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage device, and outputting it.

A masking sound generation method comprising:
a storage step of storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it;
a receiving step of receiving, from an operator, an instruction signal designating one of the scrambled sound signals stored in the storage step; and
an output step of reading out from the storage device, and outputting, the scrambled sound signal designated by the instruction signal received in the receiving step.

A program for causing a computer to function as:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and for storing the acoustic characteristics of each of the scrambled sound signals;
acoustic characteristic analysis means for picking up sound and analyzing the acoustic characteristics of the sound; and
output means for determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics analyzed by the acoustic characteristic analysis means with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage means, and outputting it.

A program for causing a computer to function as:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it, and for storing the acoustic characteristics of each of the scrambled sound signals;
receiving means for receiving, from an operator, information about the acoustic characteristics of the sound to be masked; and
output means for determining a scrambled sound signal by comparing, using a predetermined algorithm, the acoustic characteristics received by the receiving means with the acoustic characteristics of the scrambled sound signals, reading out the determined scrambled sound signal from the storage means, and outputting it.

A program for causing a computer to function as:
storage means for storing a plurality of scrambled sound signals, in each of which the time series of a sound signal has been changed by dividing the sound signal into sections of a predetermined time length and reconstructing it;
receiving means for receiving, from an operator, an instruction signal designating one of the scrambled sound signals stored in the storage means; and
output means for reading out from the storage means, and outputting, the scrambled sound signal designated by the instruction signal received by the receiving means.

A recording medium storing a plurality of scrambled sound signals, in each of which the time series of the sections has been changed by dividing a sound signal into predetermined sections and processing them, the scrambled sound signals being stored such that each can be selectively read out.