JPS5814199A

JPS5814199A - Voice recognizer

Info

Publication number: JPS5814199A
Application number: JP56112726A
Authority: JP
Inventors: 宏樹大西
Original assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Current assignee: Sanyo Electric Co Ltd; Sanyo Denki Co Ltd
Priority date: 1981-07-17
Filing date: 1981-07-17
Publication date: 1983-01-26

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

The present invention relates to a speech recognition device capable of recognizing human speech.

In general, speech recognition devices of this kind fall into two broad types: speaker-dependent devices, which recognize only the speech of registered speakers, and speaker-independent devices, which also recognize the speech of many other people.

However, no speaker-independent device currently exists that can fully handle the speech of any person. Instead, a multi-speaker device is used in its place: the speech of a specified group of many speakers, chosen to represent individual differences in speech, is registered in advance, and these registered utterances are compared with the input speech. In practice, therefore, there is no essential difference between speaker-dependent and speaker-independent recognizers.

FIG. 1 shows a conventional speech recognition device for unspecified speakers in which the speech of a specified group of many speakers has been registered. In the figure, (1) is a microphone that converts speech into an electrical signal, and (2) is a bank of eight parallel band-pass filters that extracts eight spectral values x_i (i = 1, 2, ..., 8) from that signal; their passbands divide the speech band (about 100 Hz to 4000 Hz) into eight parts. (3) is a speech pattern creation circuit that samples the eight spectral values x_i obtained from the filter bank (2) and collects 16 samples over the interval in which the input speech is present. (4) is an input pattern memory that stores the input speech pattern [x_ij] (i = 1, 2, ..., 8; j = 1, 2, ..., 16), consisting of 16 samples of the eight spectral values x_i from the filter bank (2). (5) is a reference pattern memory that stores, registered in advance, the reference speech patterns [y_ij]_mn (m = 1, 2, ..., M; n = 1, 2, ..., N) for N words spoken by each of M speakers. (6) is a distance calculation circuit that calculates the distance between each reference pattern [y_ij]_mn in the reference pattern memory (5) and the input speech pattern [x_ij] in the input pattern memory (4). (7) is a recognition processing unit that detects the n for which the distance d(m, n) obtained from the distance calculation circuit (6) is smallest, and recognizes the input speech as the n-th recognition word.

In such a device, the distance calculation circuit (6) must compute the distance d(m, n) between the matrix patterns [x_ij] and [y_ij]_mn M × N times. If M is increased to accommodate the speech of more people or to improve the recognition rate, a large amount of computation is required, and real-time recognition may become difficult.
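The conventional FIG. 1 pipeline can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's circuit: the text does not specify the distance measure, so a city-block distance (sum of absolute differences) is assumed, which fits the document's practice of counting the work in subtractions; uniform selection of the 16 samples is also an assumption; and the names `make_pattern`, `city_block`, and `recognize_direct` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = List[List[float]]  # pattern[i][j]: band i (8 rows) at sample j (16 columns)

N_BANDS = 8     # i = 1..8, one value per band-pass filter (2)
N_SAMPLES = 16  # j = 1..16, samples taken over the speech interval


def make_pattern(frames: Sequence[Sequence[float]]) -> Pattern:
    """Circuit (3), assumed behaviour: pick 16 equally spaced frames.

    `frames` holds one 8-value filter-bank output per analysis frame,
    covering only the interval where speech is present.
    """
    if not frames:
        raise ValueError("empty speech interval")
    last = len(frames) - 1
    picks = [round(k * last / (N_SAMPLES - 1)) for k in range(N_SAMPLES)]
    # Transpose so that pattern[i][j] is x_ij.
    return [[frames[p][i] for p in picks] for i in range(N_BANDS)]


def city_block(a: Pattern, b: Pattern) -> Tuple[float, int]:
    """Assumed distance: sum of |a_ij - b_ij|, plus the subtraction count."""
    total, subs = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            subs += 1
    return total, subs


def recognize_direct(x: Pattern, refs: List[List[Pattern]]) -> Tuple[int, int]:
    """Circuits (6) and (7): compare x with every reference [y_ij]_mn.

    refs[m][n] is speaker m's pattern for word n (0-based indices).
    Returns the word index n with the smallest d(m, n) and the total
    number of subtractions performed.
    """
    best_n, best_d, subs = -1, float("inf"), 0
    for speaker in refs:              # m = 1..M
        for n, y in enumerate(speaker):  # n = 1..N
            d, s = city_block(x, y)
            subs += s
            if d < best_d:
                best_n, best_d = n, d
    return best_n, subs
```

Each reference comparison costs i × j = 128 subtractions, so the loop performs 128 × M × N of them; this is the cost that grows with the number of registered speakers M.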

The present invention has been made in view of these circumstances, and provides a speech recognition device that simplifies the computation required for recognition.

FIG. 2 shows the speech recognition device of the present invention. In the figure, (1) to (4), from the microphone to the input pattern memory, are the same as in the conventional device of FIG. 1.

(8) is a standard speech pattern memory in which one typical standard speech pattern [y_ij]_n (n = 1, 2, ..., N) is stored in advance for each of the N recognition words.

(9) is a distance calculation circuit that calculates the distance between each standard pattern [y_ij]_n in the standard pattern memory (8) and the input speech pattern [x_ij] in the input pattern memory (4).

(10) is a distance vector memory that stores the distance vector D = (d(1), d(2), ..., d(N)) obtained from the distance calculation circuit (9). (11) is a reference distance vector memory. For the speech patterns [y_ij]_mn of N words spoken by M specified speakers, it stores in advance the distance vectors D_mn = (d_mn(1), d_mn(2), ..., d_mn(N)) (m = 1, 2, ..., M; n = 1, 2, ..., N), each consisting of the distances between one speech pattern [y_ij]_mn and the standard speech patterns [y_ij]_n (n = 1, 2, ..., N) of the standard pattern memory (8); these vectors are stored as an M × N matrix. (12) is an error calculation circuit that calculates the error δ(m, n) = |D - D_mn| between each reference distance vector D_mn in the reference distance vector memory (11) and the distance vector D in the distance vector memory (10). (13) is a recognition processing unit that detects the n for which the error δ(m, n) obtained from the error calculation circuit (12) is smallest, and thereby determines that the speech input to the microphone (1) is the n-th recognition word.
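A minimal Python sketch of the FIG. 2 flow, split into the offline registration step (filling the reference distance vector memory (11)) and the online recognition step (circuit (9), memory (10), circuit (12), and unit (13)). As before, the city-block distance for both the pattern distance d(n) and the vector error δ(m, n) is an assumption; the text does not say how the standard patterns [y_ij]_n are obtained, so the caller supplies them; indices are 0-based; and all function names are hypothetical.

```python
from typing import List, Sequence

Pattern = List[List[float]]  # [x_ij] or [y_ij]: 8 bands x 16 samples
Vector = List[float]         # a distance vector of N components


def city_block(a: Pattern, b: Pattern) -> float:
    """Assumed pattern distance: sum of |a_ij - b_ij| over all i, j."""
    return sum(abs(va - vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb))


def distance_vector(p: Pattern, standards: Sequence[Pattern]) -> Vector:
    """Map a pattern to (d(1), ..., d(N)) against the N standard patterns."""
    return [city_block(p, s) for s in standards]


def build_reference_vectors(refs: List[List[Pattern]],
                            standards: Sequence[Pattern]) -> List[List[Vector]]:
    """Offline step: D_mn for every speaker m and word n (an M x N matrix)."""
    return [[distance_vector(y, standards) for y in speaker] for speaker in refs]


def recognize(x: Pattern, standards: Sequence[Pattern],
              ref_vectors: List[List[Vector]]) -> int:
    """Online step: compute D from x, then find n minimizing delta(m, n)."""
    d = distance_vector(x, standards)            # circuit (9) -> memory (10)
    best_n, best_err = -1, float("inf")
    for row in ref_vectors:                      # m = 1..M
        for n, d_mn in enumerate(row):           # n = 1..N
            # Circuit (12), assumed error measure: sum of |d(k) - d_mn(k)|.
            err = sum(abs(a - b) for a, b in zip(d, d_mn))
            if err < best_err:
                best_n, best_err = n, err
    return best_n                                # unit (13)
```

Only N pattern-level distances are computed per utterance; the M × N comparisons then operate on short N-component vectors, which is where the saving described in the text comes from.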

In a speech recognition device with this configuration, the distance calculation circuit (9) converts the input speech pattern [x_ij] into an N-dimensional distance vector D = (d(1), d(2), ..., d(N)), whose components d(n) are the distances between the input pattern and the N standard speech patterns [y_ij]_n, one assigned to each word, in the standard pattern memory (8). The reference distance vector memory (11) stores in advance, in matrix form, the reference distance vectors D_mn = (d_mn(1), d_mn(2), ..., d_mn(N)), obtained by using the N standard speech patterns [y_ij]_n to convert each speech pattern [y_ij]_mn of the N words spoken by the M speakers into an N-dimensional vector that expresses individual differences in speech. The error calculation circuit (12) calculates the error δ(m, n) between each reference distance vector D_mn and the distance vector D of the input speech, that is, a similarity measure that includes the component of individual differences in speech, and the recognition processing unit (13) detects the n that gives the smallest δ(m, n).
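Written out as formulas, the recognition procedure above reads as follows. The text does not state which distance measure is used; the city-block (sum of absolute differences) form below is an assumption, chosen because the document counts the work in subtractions. Here y_ij^(n) denotes an element of the standard pattern [y_ij]_n and y_ij^(mn) an element of the reference pattern [y_ij]_mn.

```latex
\begin{aligned}
d(n) &= \sum_{i=1}^{8}\sum_{j=1}^{16} \bigl| x_{ij} - y_{ij}^{(n)} \bigr|,
  &\quad \mathbf{D} &= \bigl(d(1), d(2), \dots, d(N)\bigr) \\
d_{mn}(k) &= \sum_{i=1}^{8}\sum_{j=1}^{16} \bigl| y_{ij}^{(mn)} - y_{ij}^{(k)} \bigr|,
  &\quad \mathbf{D}_{mn} &= \bigl(d_{mn}(1), d_{mn}(2), \dots, d_{mn}(N)\bigr) \\
\delta(m, n) &= \sum_{k=1}^{N} \bigl| d(k) - d_{mn}(k) \bigr|,
  &\quad \hat{n} &= \arg\min_{n} \, \min_{m} \, \delta(m, n)
\end{aligned}
```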

In this way, the matrix speech patterns [x_ij] and [y_ij]_mn are first converted, by means of the standard speech patterns [y_ij]_n, into the distance vectors D and D_mn, which express individual differences in speech, and these vectors are then matched; the result is recognition processing equivalent to matching the speech patterns [x_ij] and [y_ij]_mn directly. As for the arithmetic performed during recognition, the distance calculation circuit (9) computes the distance d(n) between matrix patterns of i rows and j columns N times, so ijN subtractions are performed.

The error calculation circuit (12) computes the error between distance vectors of N components MN times, so MN² subtractions are performed, for a total of ijN + MN² subtractions.

Compared with the conventional example of FIG. 1, in which a total of ijMN subtractions is performed, the number of subtractions is reduced by approximately (ij - N)MN. Thus, with i = 8 and j = 16 as in the embodiment, the number of subtractions is clearly reduced as long as the number of recognition words N is kept at roughly 128 or fewer.
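The subtraction counts above can be checked with a few lines of Python. The function names are hypothetical, and M = 50 and N = 32 in the example are arbitrary illustrative values, not figures from the patent.

```python
def subtractions_conventional(i: int, j: int, m: int, n: int) -> int:
    """FIG. 1: one i x j pattern distance for each of the M x N references."""
    return i * j * m * n


def subtractions_invention(i: int, j: int, m: int, n: int) -> int:
    """FIG. 2: N pattern distances (ijN) plus M x N vector errors of N terms (MN^2)."""
    return i * j * n + m * n * n


def saving(i: int, j: int, m: int, n: int) -> int:
    """Exact reduction; approximately (ij - N) * M * N when ijN is comparatively small."""
    return subtractions_conventional(i, j, m, n) - subtractions_invention(i, j, m, n)


def largest_saving_vocabulary(i: int, j: int, m: int) -> int:
    """Largest N with saving > 0, i.e. N < ij(M - 1)/M (a value below 1 means none)."""
    return (i * j * (m - 1) - 1) // m


if __name__ == "__main__":
    # i = 8, j = 16 as in the embodiment; M = 50 and N = 32 are hypothetical.
    print(subtractions_conventional(8, 16, 50, 32))  # 204800
    print(subtractions_invention(8, 16, 50, 32))     # 55296
    print(largest_saving_vocabulary(8, 16, 50))      # 125
```

The exact break-even point, N < ij(M - 1)/M, approaches ij = 128 as M grows, which is consistent with the approximate bound given in the text.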

As is clear from the above description, the present invention uses the distance calculation circuit to convert the input speech pattern obtained from the pattern creation circuit into a distance vector with respect to the standard speech patterns in the standard pattern memory, uses the error calculation circuit to calculate the error between this distance vector and the distance vectors, prepared in advance, of a plurality of speech patterns of a specified group of many speakers, and recognizes the input speech as the speech corresponding to the distance vector in the reference distance memory for which this error is smallest. Compared with a conventional device that directly matches matrix-form reference speech patterns against the input speech pattern, the amount of computation can therefore be greatly reduced, and the memory capacity can be reduced as well. As a result, by increasing the number of registered speakers while maintaining real-time response, a speech recognition device can be realized that handles unspecified speakers with a high recognition rate.
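The memory reduction claimed above can be made concrete with a simple count, under the assumption (not stated in the text) that each spectral value and each distance occupies one storage word, and ignoring the per-utterance input pattern memory (4) and distance vector memory (10). The conventional device stores M × N reference matrices of ij values; the invention stores N standard matrices plus M × N distance vectors of N values, which gives the same break-even condition as the subtraction count.

```latex
\begin{aligned}
S_{\text{conv}} &= ij \cdot MN
  && \text{(reference pattern memory (5))} \\
S_{\text{inv}} &= ij \cdot N + N \cdot MN = ijN + MN^{2}
  && \text{(memories (8) and (11))} \\
S_{\text{conv}} - S_{\text{inv}} &= N\bigl(ij(M-1) - MN\bigr) > 0
  \iff N < \frac{ij\,(M-1)}{M} \approx ij = 128
\end{aligned}
```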

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of a conventional speech recognition device, and FIG. 2 is a block diagram showing the speech recognition device of the present invention. In the figures, (2) is the band-pass filter bank, (3) is the pattern creation circuit, (4) is the input pattern memory, (6) and (9) are distance calculation circuits, (7) and (13) are recognition processing units, (8) is the standard pattern memory, (10) is the distance vector memory, (11) is the reference distance vector memory, and (12) is the error calculation circuit.

Claims

[Claims]

(1) A speech recognition device comprising: a pattern creation circuit that creates a speech feature pattern based on the speech signal of input speech; a standard pattern memory that stores in advance standard speech patterns for a plurality of utterances; a reference distance memory that, for a plurality of speech patterns of a specified group of many speakers, stores in matrix form, in correspondence with the plurality of utterances of those speakers, distance vectors each consisting of the plurality of distances between one of the speech patterns and the plurality of standard speech patterns; a distance calculation circuit that calculates the distances between the speech pattern obtained from the pattern creation circuit and the plurality of standard patterns in the standard pattern memory; and a distance error calculation circuit that calculates the error between the distance vector consisting of the plurality of distances obtained by the distance calculation circuit and each distance vector in the reference distance memory; the speech recognition device being characterized in that it detects the distance vector in the reference distance memory that minimizes the error obtained from the distance error calculation circuit, and recognizes the input speech as the speech corresponding to this distance vector.