JPS6356698A

JPS6356698A - Multi-speaker speech recognition device

Info

Publication number: JPS6356698A
Application number: JP61199081A
Authority: JP
Inventors: 松下　満次; 辻田　和一郎
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-08-27
Filing date: 1986-08-27
Publication date: 1988-03-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a multi-speaker speech recognition device that recognizes the voices of a large number of speakers.

(Prior Art) FIG. 3 is a block diagram showing a conventional multi-speaker speech recognition device. In the figure, 31 is a single microphone that picks up the voices of many speakers and outputs an audio signal; 32 is an amplifier (hereinafter abbreviated as AMP) that amplifies the audio signal from the microphone 31; 33 is a discrimination switch provided for each speaker, which only the speaker who is speaking presses in order to select, via the standard pattern selection unit 34 described below, a standard pattern registered in advance in the standard pattern unit 35 described below; 34 is a standard pattern selection unit that, based on the signal from the discrimination switch 33, selects a standard pattern registered in advance in the standard pattern unit 35 and has it output to the recognition unit 36 described below; 35 is a standard pattern unit in which a standard pattern for each speaker is registered in advance; and 36 is a recognition unit that matches the audio signal from the AMP 32 against the selected standard pattern from the standard pattern unit 35.

Next, the operation of the conventional multi-speaker speech recognition device will be described with reference to FIG. 3.

First, so that each speaker and that speaker's standard pattern are registered in the standard pattern unit 35 in correspondence with each other, the speaker presses the discrimination switch 33 while speaking whenever a standard pattern is created from the uttered audio signal and registered in the standard pattern unit 35.

Then, for example, when speaker ■ speaks, speaker ■ presses his or her own discrimination switch 33. Based on the signal from the discrimination switch 33, the standard pattern selection unit 34 selects the standard pattern A of speaker ■ from the standard pattern unit 35, and it is supplied to the recognition unit 36. The speech uttered by speaker ■ is therefore compared in the recognition unit 36 with the standard pattern of speaker ■ from the standard pattern unit 35, matching is performed, and the speech is recognized. Note that when speaker ■ continues to speak, there is no need to press the discrimination switch 33 again.

(Problems to Be Solved by the Invention) However, because the conventional device described above detects the speaker who has spoken from the pressing of that speaker's own discrimination switch, and then selects the standard pattern corresponding to that speaker for recognition, it is cumbersome to operate: each speaker must hold his or her own discrimination switch in hand and must press it every time he or she speaks.

The present invention is intended to solve these problems, and its object is to provide a multi-speaker speech recognition device with excellent operability.

(Means for Solving the Problems) To solve the above problems, the present invention provides the following means in a multi-speaker speech recognition device that recognizes the voices of a plurality of speakers.

The first invention provides: a first means in which a plurality of microphones that convert a speaker's voice into audio signals are installed spaced apart from one another, and which identifies the position of the speaker based on the audio signals from the microphones; a second means for detecting the speaker based on the identification result from the first means and selecting a standard pattern, registered in advance, that corresponds to the detected speaker; and a third means for recognizing the speaker's voice based on the standard pattern selected by the second means.

The second invention provides: a first means in which a plurality of microphones that convert a speaker's voice into audio signals are installed spaced apart from one another, and which identifies the position of the speaker based on the audio signals from the microphones; a second means for extracting the frequency characteristic of the speaker's voice and matching it against frequency characteristics registered in advance for each speaker; a third means for detecting the speaker based on the identification results from the first means and the second means and selecting a standard pattern, registered in advance, that corresponds to the detected speaker; and a fourth means for recognizing the speaker's voice based on the standard pattern selected by the third means.

(Operation) The first and second inventions configured as described above operate as follows.

In the first invention, the position of the speaker is first identified based on the audio signals from the plurality of microphones installed spaced apart from one another. The speaker is then detected based on this identification result, and the standard pattern registered in advance that corresponds to the detected speaker is selected. The speaker's voice is then recognized based on the selected standard pattern.

The second invention differs from the first invention in that, when detecting the speaker, identification based on the frequency characteristic of the speaker's voice is added to the speaker position identification used in the first invention. This makes it possible to identify the speaker even more reliably.

Therefore, the present invention can solve the above problems and provide a multi-speaker speech recognition device with excellent operability.

(Embodiment) An embodiment of the present invention will now be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, 1 and 2 are microphones that pick up the voices of many speakers and output audio signals; 3 is an AMP that amplifies the audio signals from the microphones 1 and 2; 4 is a recognition unit that matches the audio signal from the AMP 3 against a standard pattern from the standard pattern unit 8 described below; 5 is a pitch extraction unit that obtains the average pitch frequency of the audio signal from the AMP 3; 6 is a position identification unit that obtains the power difference between the audio signals from the microphones 1 and 2 and registers it as the position information of each speaker, and that matches the position data of the speaker who actually spoke against the registered position information; 7 is a standard pattern selection unit that, based on the speaker identification output derived from the average pitch frequency from the pitch extraction unit 5 and the position identification output from the position identification unit 6, selects the standard pattern corresponding to the speaker from among the standard patterns registered in advance in the standard pattern unit 8 described below and has it output to the recognition unit 4; 8 is a standard pattern unit in which a standard pattern for each speaker is registered in advance; 9 is a key for entering the standard pattern registration name of each speaker; and 10 is a recognition control unit that registers the standard pattern registration name entered from the key 9 in the standard pattern unit 8 together with the standard pattern to be registered.

Next, the operation of this embodiment will be described with reference to FIG. 1.

First, a correspondence list between speakers and standard patterns is created. Using the key 9, the standard pattern registration name of each speaker (A, B, C, ...) is entered, for example in order from the microphone 1 toward the microphone 2. Next, the microphones 1 and 2 are switched on and, in the same order from the microphone 1 toward the microphone 2, each speaker in turn utters several words, separated in time from the other speakers. The position identification unit 6 determines the position of each speaker from the power difference between the audio signals input from the microphone 1 and the microphone 2, and registers it as that speaker's position information.
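The registration and lookup of position information by power difference can be illustrated with a minimal Python sketch. The description does not say how the power difference is computed or compared, so the decibel measure, the nearest-value lookup, and all function names here (`power_db`, `register_positions`, `identify_position`) are assumptions made for illustration only.

```python
import math

def power_db(samples):
    """Mean power of a frame in decibels (a tiny floor avoids log of zero)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square + 1e-12)

def power_difference(mic1, mic2):
    """Power at microphone 1 minus power at microphone 2, in dB."""
    return power_db(mic1) - power_db(mic2)

def register_positions(names, enrollment):
    """Store one power difference per speaker.

    `enrollment` holds one (mic1_frame, mic2_frame) pair per speaker,
    in the same order as `names`.
    """
    return {name: power_difference(m1, m2)
            for name, (m1, m2) in zip(names, enrollment)}

def identify_position(registered, mic1, mic2):
    """Return the speaker whose registered power difference is closest."""
    observed = power_difference(mic1, mic2)
    return min(registered, key=lambda name: abs(registered[name] - observed))

def tone(gain, n=400):
    """Synthetic test frame; the gain stands in for distance to a microphone."""
    return [gain * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

# Hypothetical enrollment: A sits near microphone 1, C near microphone 2.
registered = register_positions(
    ["A", "B", "C"],
    [(tone(1.0), tone(0.2)), (tone(0.6), tone(0.6)), (tone(0.2), tone(1.0))],
)
print(identify_position(registered, tone(0.9), tone(0.25)))
```

With only two microphones the power difference orders speakers along the line between them, which is consistent with registering speakers in order from the microphone 1 toward the microphone 2.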

At this time, if there are many speakers or the speakers are close together, the position may be determined incorrectly, so the pitch extraction unit 5 obtains and registers the average pitch frequency of each speaker. This average pitch frequency is the vibration frequency of the vocal-cord sound source.

For Japanese speakers, the average is 100 to 150 Hz for adult men and 250 to 300 Hz for women, with children in between, so separating male and female speakers is comparatively easy. Obtaining the average pitch frequency of each speaker prevents misidentification of closely positioned speakers better than position identification alone. If the average pitch frequencies are also similar, the speakers' positions can simply be changed. This completes speaker position registration.
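As an illustration of average pitch frequency extraction, the following Python sketch estimates pitch per frame from the strongest autocorrelation lag and averages over frames. The description does not specify the extraction method, so autocorrelation, the 60 to 400 Hz search range, the frame length, and the names `frame_pitch_hz`, `average_pitch_hz`, and `identify_by_pitch` are all assumptions.

```python
import math

def frame_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch of one voiced frame, taken from the best autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def average_pitch_hz(samples, sample_rate, frame_len=800):
    """Average of the per-frame pitch estimates over non-overlapping frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return sum(frame_pitch_hz(f, sample_rate) for f in frames) / len(frames)

def identify_by_pitch(registered_hz, observed_hz):
    """Return the speaker whose registered average pitch is closest."""
    return min(registered_hz, key=lambda name: abs(registered_hz[name] - observed_hz))

def sine(freq_hz, sample_rate=8000, n=1600):
    """Synthetic stand-in for a voiced signal with a known pitch."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# Hypothetical registered values inside the ranges quoted in the text.
registered_hz = {"A": 120.0, "B": 270.0}
observed = average_pitch_hz(sine(125.0), 8000)
print(round(observed), identify_by_pitch(registered_hz, observed))
```

Real speech would need voiced-frame detection and some protection against octave errors, neither of which this sketch attempts.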

During recognition, the audio signals input from the microphones 1 and 2 are fed to the pitch extraction unit 5 and the position identification unit 6, and the position identification output based on power and the speaker identification output based on average pitch frequency are fed to the standard pattern selection unit 7. As shown in FIG. 2, the standard pattern selection unit 7 identifies the speaker from the power-based position identification output and the average pitch frequency output, and selects the standard pattern corresponding to that speaker. In FIG. 2, suppose, for example, that the speakers are located between the microphones 1 and 2 with the position distribution shown in the figure.

At this time, if the audio signals from the microphones 1 and 2 are input to the pitch extraction unit 5 and the position identification unit 6 of FIG. 1 and position distributions such as those shown in FIG. 2 are calculated, the portion where the two position distributions overlap is determined to be the speaker's position, and the speaker (speaker ■ in the figure) is identified.
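One way to picture the overlap rule is as an intersection of two candidate sets, one from the power difference and one from the average pitch frequency. The Python sketch below is an assumption-laden illustration: the tolerances, the dictionary layout, and the names `candidates` and `select_speaker` do not come from the description.

```python
def candidates(registered, observed, tolerance):
    """Speakers whose registered value lies within `tolerance` of the observation."""
    return {name for name, value in registered.items()
            if abs(value - observed) <= tolerance}

def select_speaker(position_db, pitch_hz, observed_db, observed_hz,
                   db_tol=3.0, hz_tol=20.0):
    """Return the single speaker consistent with both cues, or None if ambiguous."""
    by_position = candidates(position_db, observed_db, db_tol)
    by_pitch = candidates(pitch_hz, observed_hz, hz_tol)
    overlap = by_position & by_pitch
    return next(iter(overlap)) if len(overlap) == 1 else None

# Hypothetical registrations: A and B sit close together, A and C have similar pitch.
position_db = {"A": 12.0, "B": 10.0, "C": -8.0}
pitch_hz = {"A": 120.0, "B": 270.0, "C": 125.0}
standard_patterns = {"A": "pattern_A", "B": "pattern_B", "C": "pattern_C"}

speaker = select_speaker(position_db, pitch_hz, observed_db=11.0, observed_hz=124.0)
print(speaker, standard_patterns.get(speaker))
```

In this example neither cue alone is decisive (position cannot separate A from B, pitch cannot separate A from C), yet the overlap contains only A.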

The recognition unit 4 then performs recognition by matching the selected standard pattern against the audio signal from the microphones 1 and 2.
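The description does not state how the recognition unit matches an utterance against a standard pattern. As one plausible illustration, the Python sketch below uses dynamic time warping (DTW) over one-dimensional feature sequences; the choice of DTW, the scalar features, and the names `dtw_distance` and `recognize` are assumptions rather than the method of the patent.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def recognize(features, templates):
    """Return the word whose template is closest to the utterance under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical standard patterns for the selected speaker (toy feature values).
templates = {"start": [1, 3, 4, 3, 1], "stop": [4, 4, 1, 1, 1]}
utterance = [1, 1, 3, 4, 4, 3, 1]  # a slower rendition of "start"
print(recognize(utterance, templates))
```

Because only the selected speaker's templates are searched, the matching step stays speaker-dependent even though several people share the device.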

(Effects of the Invention) As described above, according to the present invention, the speaker can be detected automatically and accurately based on position identification from the audio signals input from a plurality of microphones, and speech recognition can be performed by selecting the standard pattern corresponding to the detected speaker. Furthermore, by extracting the frequency characteristic from the audio signal uttered by the speaker and performing speaker identification, the speaker can be identified even more accurately, so that a multi-speaker speech recognition device with excellent operability can be provided.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a diagram showing an example of the position distribution according to this embodiment, and FIG. 3 is a block diagram showing a conventional multi-speaker speech recognition device. 1, 2: microphone; 3: amplifier; 4: recognition unit; 5: pitch extraction unit; 6: position identification unit; 7: standard pattern selection unit; 8: standard pattern unit; 9: key; 10: recognition control unit.

Claims

[Claims]

(1) A multi-speaker speech recognition device that recognizes the voices of a plurality of speakers, comprising: a first means in which a plurality of microphones that convert a speaker's voice into audio signals are installed spaced apart from one another, and which identifies the position of the speaker based on the audio signals from the microphones; a second means for detecting the speaker based on the identification result from the first means and selecting a standard pattern, registered in advance, that corresponds to the detected speaker; and a third means for recognizing the speaker's voice based on the standard pattern selected by the second means.

(2) A multi-speaker speech recognition device that recognizes the voices of a plurality of speakers, comprising: a first means in which a plurality of microphones that convert a speaker's voice into audio signals are installed spaced apart from one another, and which identifies the position of the speaker based on the audio signals from the microphones; a second means for extracting the frequency characteristic of the speaker's voice and matching it against frequency characteristics registered in advance for each speaker; a third means for detecting the speaker based on the identification results from the first means and the second means and selecting a standard pattern, registered in advance, that corresponds to the detected speaker; and a fourth means for recognizing the speaker's voice based on the standard pattern selected by the third means.