JPH02240700A

JPH02240700A - Voice recognizing device

Info

Publication number: JPH02240700A
Application number: JP1061367A
Authority: JP
Inventors: Makoto Akaha; 誠赤羽
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1989-03-14
Filing date: 1989-03-14
Publication date: 1990-09-25
Anticipated expiration: 2015-06-05
Also published as: JP3049711B2

Abstract

PURPOSE: To perform phoneme recognition with high accuracy and to recognize large-vocabulary continuous speech by comparing and collating the features of the phoneme segments extracted from an input speech signal with the features of the phoneme segments described in a phoneme knowledge base, thereby recognizing the phonemes. CONSTITUTION: The parameters obtained in an acoustic analyzing circuit 5 are supplied to a phoneme recognizing circuit 8 as parameters for recognition processing. The parameters output from circuits 51 to 56 are supplied as parameters for segmentation to a feature point extracting circuit 61 of a first segmentation circuit 6. The parameters, with their feature points attached, are supplied to a second segmentation circuit 7. In the phoneme recognizing circuit 8, the phoneme features of the phoneme segments extracted from the recognition parameters are compared and collated with the phoneme features of the phoneme segments stored in the phoneme knowledge base, and phoneme candidate sequences are output according to the results. Highly accurate phoneme recognition is performed in this way, and large-vocabulary continuous speech is recognized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a speech recognition device, and in particular to a speech recognition device equipped with a phoneme knowledge base and inference means.

[Summary of the invention]

In a speech recognition device, the present invention divides speech into phoneme segments at vowels, consonants, and the steady and transition portions of formants, and provides a phoneme knowledge base described in units of these segments. The features of the phoneme segments extracted from the input speech signal are compared and matched against the features of the phoneme segments in the phoneme knowledge base, and phoneme candidates are obtained from the result of this comparison. This enables highly accurate phoneme recognition and makes large-vocabulary continuous speech recognition possible.

[Conventional technology]

Conventional phoneme recognition has been carried out by so-called pattern matching (see Japanese Unexamined Patent Publication No. Sho 50-96104), in which the spectral pattern of the input speech is compared and matched against standard phoneme spectral patterns. With pattern matching, however, it is difficult to detect subtle differences between consonants, for example between ATA and APA.

To improve on this, attempts have been made to perform phoneme recognition with an expert system whose phoneme knowledge base holds phoneme features, phoneme identification rules, and the like.

[Problem to be solved by the invention]

In phoneme recognition by an expert system, however, there has been the problem of how phoneme features, phoneme identification rules, and the like should be described in the phoneme knowledge base.

Accordingly, an object of the present invention is to provide a speech recognition device in which the description of phoneme features, phoneme identification rules, and the like in the knowledge base is improved.

[Means to solve the problem]

In the present invention, speech is divided into phoneme segments at vowels, consonants, and the steady and transition portions of formants, and a phoneme knowledge base described in units of these segments is provided. The features of the phoneme segments extracted from the input speech signal are compared and matched against the features of the phoneme segments in the phoneme knowledge base, and phoneme candidates are obtained based on the result of this comparison and matching.

[Effect]

Speech is divided into phoneme segments according to predetermined conditions. Features are then obtained for each phoneme segment and input to the inference means.

In the phoneme knowledge base, meanwhile, the features of each phoneme are described in units of phoneme segments, for example as if-then rules. The inference means compares and matches the features of each phoneme segment against the features of the phoneme segments in the phoneme knowledge base, obtains phoneme candidates on that basis, and identifies the phoneme.

As a result, highly accurate phoneme recognition can be performed, and large-vocabulary continuous speech recognition becomes possible.

[Example]

An embodiment of the present invention is described below with reference to FIGS. 1 to 8.

FIG. 1 shows an example of a speech recognition device according to the present invention.

The input speech is converted into a speech signal by a microphone 1 and supplied to an A/D conversion circuit 4 via an amplifier 2 and a low-pass filter 3. In the A/D conversion circuit 4, the speech signal is converted into a 12-bit digital speech signal at a sampling frequency of, for example, 12.5 kHz. This digital speech signal is supplied to an acoustic analysis circuit 5.

The acoustic analysis circuit 5 comprises a transient detection parameter generation circuit 51 having a bandpass filter bank, a logarithmic power detection circuit 52 that detects the speech power, a zero-cross rate calculation circuit 53, a calculation circuit 54 for the first-order PARCOR coefficient, which indicates the correlation between adjacent samples, a calculation circuit 55 for the slope of the power spectrum, a formant detection circuit 56 that determines the change of the formants in the time direction, and a detection circuit 57 for the fundamental period of the speech.
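As a rough illustration of three of the parameters listed above, the following Python sketch computes a log power, a zero-cross rate, and a first-order PARCOR coefficient for a single frame of samples. The function names, the decibel scaling, and the sign convention of the PARCOR coefficient (taken here as the normalized lag-1 autocorrelation) are assumptions made for illustration and are not taken from the specification.

```python
import math


def log_power(frame):
    """Mean squared amplitude of the frame in decibels (scaling is an assumption)."""
    energy = sum(x * x for x in frame) / len(frame)
    # A small floor avoids taking the logarithm of zero for a silent frame.
    return 10.0 * math.log10(max(energy, 1e-12))


def zero_cross_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def first_order_parcor(frame):
    """Normalized lag-1 autocorrelation r(1)/r(0); the sign convention is assumed."""
    r0 = sum(x * x for x in frame)
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0 if r0 > 0 else 0.0
```

A rapidly alternating frame gives a high zero-cross rate and a negative coefficient, while a slowly varying frame gives a low zero-cross rate and a positive coefficient, which is why such parameters help separate fricative-like from vowel-like portions.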

The transient detection parameter is used to detect the transient and steady portions of the input speech. It expresses the amount of change in the speech spectrum and is defined as the sum, over all channels (frequencies), of the variance within a block in the time direction. First, the gain of the speech spectrum S_i(n) is normalized by its average value S_avg(n) in the frequency direction, which is defined by equation (1).

Here, i is the channel number and q is the number of channels (the number of bandpass filters). The information of each of the q channels is sampled in the time direction; the block of information from the q channels at one point in time is called a frame, and n is the number of the frame used for recognition.

The gain-normalized speech spectrum is obtained from S_i(n) and S_avg(n) as given by equation (2).

The transient detection parameter T(n) is defined as the sum of the variances in the time direction of each channel within the block [n-M, n+M], which consists of the frame itself and the M frames before and after it, (2M+1) frames in total (equation (3)).

Here, the quantity given by equation (4) is the average value in the time direction of each channel within the block.

In practice, changes near the center of the block [n-M, n+M] tend to pick up fluctuations of the sound or noise, so they are excluded from the calculation of the transient detection parameter T(n), and equation (3) is modified into equation (5).

Using equation (5), the transient detection parameter T(n) is obtained with, as an example, a = 1, M = 28, m = 3, and q = 32. For the input speech "akyo" (あきよ), for example, the transient detection parameter T(n) shown in FIG. 2A is obtained.
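The equations themselves are not legible in this text, so the following Python sketch follows only the verbal definition: normalize each frame by its average over channels, then sum the per-channel variance over the block [n-M, n+M], skipping the frames closer than m to the center. Several details are assumptions made here and are labeled in the comments: the normalization is taken to be a division, the block mean is computed over all (2M+1) frames, and any scale constant such as a is omitted.

```python
def transient_parameter(spectra, n, M, m=0):
    """Sketch of T(n). spectra[k][i] is the spectrum of channel i at frame k."""
    if n - M < 0 or n + M >= len(spectra):
        raise ValueError("block [n-M, n+M] must lie inside the signal")
    q = len(spectra[0])

    def normalized(k):
        # Assumption: gain normalization divides by the mean over the q channels.
        avg = sum(spectra[k]) / q
        return [s / avg if avg else 0.0 for s in spectra[k]]

    block = {j: normalized(n + j) for j in range(-M, M + 1)}
    total = 0.0
    for i in range(q):
        # Assumption: the time-direction mean uses every frame of the block.
        mean_i = sum(frame[i] for frame in block.values()) / (2 * M + 1)
        # Frames with |j| < m (near the block center) are left out of the sum.
        total += sum((frame[i] - mean_i) ** 2
                     for j, frame in block.items() if abs(j) >= m)
    return total
```

With this definition a block whose frames differ only in overall gain gives a value of zero, whereas a block that spans a change in spectral shape gives a positive value, which is the behavior wanted from a transient detector.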

The other parameters, for example the logarithmic power shown in FIG. 2B, the zero-cross rate shown in FIG. 2C, the first-order PARCOR coefficient shown in FIG. 2D, the slope of the power spectrum shown in FIG. 2E, and the fundamental period shown in FIG. 2H, are obtained in the same way as the transient detection parameter T(n): a window extending M frames before and after a given point in time (frame) is considered, the window is moved in the time direction one sample point at a time, and the calculation is performed within each window. FIGS. 2F and 2J show the waveform of the input speech "akyo" (あきよ), FIG. 2I shows the formant transitions, and FIGS. 2G and 2K show an example of the phoneme boundary candidates obtained from the above parameters. In FIG. 2, F and J, and G and K, show the same content twice for convenience of comparison with the other parameters.
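The windowed calculation described above can be sketched as a small helper that moves a window of fixed half-width along the signal one position at a time and applies a parameter function to each window. The helper name and the decision to skip positions where the window would extend past the ends of the signal are assumptions made for illustration.

```python
def sliding_parameter(samples, half_width, func):
    """Apply func to each window of 2*half_width + 1 samples, stepping by one."""
    values = []
    for center in range(half_width, len(samples) - half_width):
        window = samples[center - half_width:center + half_width + 1]
        values.append(func(window))
    return values
```

Any per-window measure, such as a power or zero-cross computation, can be passed as `func`; the result is one parameter value per window position.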

The parameters obtained in the acoustic analysis circuit 5 are supplied to a phoneme recognition circuit 8 as parameters for recognition processing, and the parameters output from the circuits 51 to 56 are supplied as parameters for segmentation to a feature point extraction circuit 61 of a first segmentation circuit 6.

In order to find phoneme boundary candidates from the segmentation parameters, the first segmentation circuit 6 extracts general feature points. In this example, the following 15 types of feature points are used.

(1) Rising point: a point where the value changes from a flat portion to an increasing direction
(2) Falling point: a point where the value becomes flat after changing in a decreasing direction
(3) Increase change point: a point where the rate of increase changes
(4) Decrease change point: a point where the rate of decrease changes
(5) Peak point: the position of a peak
(6) Positive zero-cross point: a point where the value crosses the zero level in the increasing direction
(7) Negative zero-cross point: a point where the value crosses the zero level in the decreasing direction
(8) Word beginning and word end (rise from silence, fall to silence)
(9) A point where the unstable portion at a word beginning or word end becomes stable
(10) Rise from and fall to a silent interval caused by a pause within a word
(11) A change point between a consonant interval and a vowel interval
(12) Within a vowel interval, the start point of a transition interval following a steady formant interval, or the end point of a transition interval
(13) Within a consonant interval, the start point of a transition interval following a steady formant interval, or the end point of a transition interval
(14) A point where a formant appears or disappears
(15) The start point and end point of a voice bar interval

In this specification, a voice bar is a speech signal consisting only of low-frequency components, produced before a voiced consonant when the vocal cords vibrate while the lips are closed.
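A few of the simpler feature points in the list above (rising, falling, peak, and the two zero-cross points) can be illustrated on a single parameter contour. The following Python sketch is a simplified assumption of how such points might be found from first differences; the threshold `flat_eps`, the label strings, and the function name are hypothetical and do not come from the specification.

```python
def extract_feature_points(values, flat_eps=1e-3):
    """Return sorted (index, label) pairs for a few simple feature points."""
    points = []
    for k in range(1, len(values) - 1):
        prev_d = values[k] - values[k - 1]
        next_d = values[k + 1] - values[k]
        if prev_d > flat_eps and next_d < -flat_eps:
            points.append((k, "peak"))
        elif abs(prev_d) <= flat_eps and next_d > flat_eps:
            # Flat before, increasing after: rising point.
            points.append((k, "rising"))
        elif prev_d < -flat_eps and abs(next_d) <= flat_eps:
            # Decreasing before, flat after: falling point.
            points.append((k, "falling"))
    for k in range(1, len(values)):
        if values[k - 1] < 0 <= values[k]:
            points.append((k, "positive_zero_cross"))
        elif values[k - 1] >= 0 > values[k]:
            points.append((k, "negative_zero_cross"))
    return sorted(points)
```

For the contour `[0, 0, 1, 2, 1, 0, 0]` this yields a rising point at index 1, a peak at index 3, and a falling point at index 5. The context-dependent feature points, such as word boundaries or voice bar intervals, need additional information and are not covered by this sketch.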

The feature point extraction circuit 61 extracts feature points for each parameter by referring to the feature point information from a feature point information storage circuit 62. In each of the parameters of FIGS. 2A to 2E, the positions marked by lines along the time axis are the positions of the feature points.

The parameters obtained from the first segmentation circuit 6, with their feature points attached, are supplied to a second segmentation circuit 7.

The second segmentation circuit 7 comprises a feature point integration processing circuit 71, a phoneme boundary feature detection circuit 72, a feature point integration information storage circuit 73, and a phoneme boundary feature information storage circuit 74.

The feature points obtained by the first segmentation circuit 6 may be displaced or missing from one parameter to another. The feature point integration processing circuit 71 therefore refers to the feature point integration information from the feature point integration information storage circuit 73, merges the feature points of the individual parameters, and determines the phoneme boundary candidates.

The feature point integration information specifies which parameter's feature points are to be given priority.
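One way to picture priority-based integration is the following Python sketch, in which feature points from several parameters are visited in priority order and a point is kept only if no already accepted boundary lies within a few frames of it. The tolerance value, the parameter names, and the function name are hypothetical; the specification states only that the integration information gives the priority among parameters.

```python
def integrate_feature_points(points_by_param, priority, tolerance=2):
    """Merge per-parameter feature point frames into boundary candidates.

    points_by_param: {parameter name: [frame, ...]}
    priority: parameter names, highest priority first
    """
    boundaries = []
    for name in priority:
        for frame in sorted(points_by_param.get(name, [])):
            # Keep the point only if no accepted boundary is within tolerance.
            if all(abs(frame - b) > tolerance for b in boundaries):
                boundaries.append(frame)
    return sorted(boundaries)
```

In this sketch a slightly displaced point from a lower-priority parameter is absorbed by the nearby higher-priority one, while a point that only a lower-priority parameter detected is still retained as a candidate.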

The phoneme boundary feature detection circuit 72 determines the phoneme boundary feature of each phoneme boundary candidate. In this example, the following phoneme boundary features are used.

(1) Rise from silence (SIL-R)
(2) Consonantal to vocalic (C-V)
(3) Vocalic to vocalic (V-V)
(4) Vocalic to vowel transition portion (V-V,T)
(5) Vowel transition portion to consonantal (V,T-C)
(6) Consonantal to vowel transition portion (C-V,T)
(7) Vowel transition portion to vocalic (V,T-V)
(8) Fall to silence (F-SIL)
(9) Sound to silence (SND-SIL)
(10) Consonantal to consonantal (C-C)
(11) Consonantal to consonant transition portion (C-C,T)
(12) Consonant transition portion to consonantal (C,T-C)

These 12 types of phoneme boundary feature information are stored in the phoneme boundary feature information storage circuit 74, and the phoneme boundary feature detection circuit 72 detects the phoneme boundary feature of each phoneme boundary candidate by referring to the information from the phoneme boundary feature information storage circuit 74.

The second segmentation circuit 7 outputs, as phoneme interval information, the phoneme boundary candidate information and the corresponding phoneme boundary feature information. This phoneme interval information is supplied to the phoneme recognition circuit 8.
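As an illustration of what such phoneme interval information might look like in software, the following Python sketch represents each boundary candidate as a frame position with its boundary feature label and pairs consecutive boundaries into segments. The `Boundary` class and `to_segments` function are hypothetical names introduced here; the specification does not describe a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    frame: int     # position of the phoneme boundary candidate
    feature: str   # boundary feature label such as "SIL-R" or "C-V"


def to_segments(boundaries):
    """Pair consecutive boundaries (ordered by frame) into phoneme segments."""
    ordered = sorted(boundaries, key=lambda b: b.frame)
    return list(zip(ordered, ordered[1:]))
```

Each resulting segment is delimited by a start and an end boundary, so it can be referred to by its pair of labels, in the manner of "(SIL-R) ~ (C-V)".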

The phoneme recognition circuit 8 performs phoneme recognition based on the recognition parameters from the acoustic analysis circuit 5 and the phoneme interval information from the second segmentation circuit 7.

In the phoneme recognition circuit 8, the phoneme features of each phoneme segment extracted from the recognition parameters are compared and matched against the phoneme features of the phoneme segments stored in the phoneme knowledge base (hereinafter simply called the knowledge base). A phoneme candidate sequence is then output based on the result. This process is described with reference to the flowchart of FIG. 3.

As described above, the phoneme segments are formed by the first and second segmentation circuits 6 and 7 (step 101).

Next, the phoneme recognition circuit 8 extracts the phoneme features of each phoneme segment. Specifically, for phoneme segments in which the formants are in a steady interval, the manner of articulation is classified according to the statistics of the recognition parameters from the acoustic analysis circuit 5, as vocalic or consonantal, voiced or voiceless, fricative, plosive, or nasal, and so on (step 102).

In addition, phoneme feature detectors provided in the phoneme recognition circuit 8 obtain information such as the burst point, the voice bar, the cutoff frequency of the frequency band in which the frication energy is concentrated, and the formant transition pattern (step 103).

The above processing yields the phoneme features of each phoneme segment. Based on these, if-then inference is performed in the following steps (step 104). The knowledge base used for this inference holds the if-then rules described in step 105. In the following inference, the process returns to the feature extraction of the phoneme segments (step 102) and reprocessing is performed as necessary.

I. First inference

The place of articulation (for example, lips, alveolar ridge, palate) of each preceding and following consonant segment is determined from the formant transition pattern.

Rule (11): If the formant transition toward the following vowel /a/ is as shown in FIG. 4, the consonant is bilabial. In the figure, F1 and F2 denote the first formant and the second formant, respectively.

Rule (12): If the formant transition toward the following vowel /a/ is as shown in FIG. 5, the consonant is alveolar.

Rule (13): If the formant transition toward the following vowel /a/ is as shown in FIG. 6, the consonant is palatal. The other rules are omitted.

II. Second inference

The phoneme of a consonant segment is determined from the manner of articulation and the place of articulation.

Rule (21): If the manner of articulation is a voiceless fricative and the place of articulation determined from the following formant transition is bilabial, the phoneme is /f/.

Rule (22): If the manner of articulation is a voiceless fricative and the place of articulation is alveolar, the phoneme is /s/.

Rule (23): If the manner of articulation is a voiceless fricative and the place of articulation is palatal, the phoneme is /sh/.

Rule (24): If the manner of articulation is a voiceless plosive and the place of articulation is palatal, the phoneme is /k/.

Rule (25): If the manner of articulation is a voiced plosive and the place of articulation is labial, the phoneme is /b/. The other rules are omitted.
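Rules (21) to (25) can be pictured as a lookup from a (manner, place) pair to a phoneme. The following Python sketch encodes only those five rules; the table name, the label strings, and the function name are assumptions made for illustration, and the rules that the specification omits are not filled in.

```python
# Rules (21) to (25) as (manner of articulation, place of articulation) -> phoneme.
CONSONANT_RULES = {
    ("voiceless fricative", "bilabial"): "f",   # rule (21)
    ("voiceless fricative", "alveolar"): "s",   # rule (22)
    ("voiceless fricative", "palatal"): "sh",   # rule (23)
    ("voiceless plosive", "palatal"): "k",      # rule (24)
    ("voiced plosive", "labial"): "b",          # rule (25)
}


def infer_consonant(manner, place):
    """Return the phoneme for a matching rule, or None if no rule applies."""
    return CONSONANT_RULES.get((manner, place))
```

Returning `None` when no rule matches leaves room for a later stage to decide the phoneme from other features such as bursts or the voice bar.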

III. Third inference

When the phoneme cannot be identified from the manner and place of articulation, the phoneme of the consonant segment is determined using phoneme features such as the burst, the voice bar, and the cutoff frequency.

Rule (31): If a voiceless plosive has two or more bursts, the phoneme is /k/.

Rule (32): The length of the burst interval increases in the order /p/ < /t/ < /k/. The other rules are omitted.

IV. Fourth inference

Within vocalic intervals, a semivowel decision is made for intervals with a long formant transition.

Rule (41): If the formant transition toward the following vowel /o/ is as shown in FIG. 7, the phoneme is the consonant /y/.

The other rules are omitted.

The phoneme candidate sequence obtained by the above if-then inference is checked for inconsistencies (step 106) on the basis of the phoneme connection knowledge base described in step 107. In step 106, it is checked whether the phoneme candidate sequence is correctly connected as a sequence of Japanese phonemes. If the phoneme candidate sequence cannot be determined even by this check, the process returns to the earlier steps 102 and 104 and reprocessing is performed.
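The specification does not give the contents of the phoneme connection knowledge base, so the following Python sketch uses a deliberately small, hypothetical stand-in: a consonant must be followed by a vowel or by /y/, and /y/ must be followed by /a/, /u/, or /o/. These toy constraints, the `VOWELS` set, and the function name are assumptions introduced only to show the shape of such a check.

```python
VOWELS = {"a", "i", "u", "e", "o"}


def is_consistent(phonemes):
    """Toy connection check; the constraints here are illustrative assumptions."""
    seq = list(phonemes)
    for cur, nxt in zip(seq, seq[1:] + [None]):
        if cur in VOWELS:
            continue  # in this toy model a vowel may be followed by anything
        if cur == "y":
            if nxt not in {"a", "u", "o"}:
                return False
        elif nxt not in VOWELS and nxt != "y":
            # A consonant with no vowel or /y/ after it is rejected.
            return False
    return True
```

A sequence rejected by such a check would be sent back for reprocessing, as the text describes for steps 102 and 104.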

The above inference is explained using the input speech "akyo" (あきよ) as an example.

I. After phoneme segmentation, the manner of articulation is classified using the various parameters described above.

The phoneme boundary features of the phoneme boundary candidates shown in FIGS. 2G and 2K, and the resulting segments, are determined as follows.

(1) (SIL-R) ~ (C-V): consonantal, voiced
(2) (C-V) ~ (V-V,T): vocalic, voiced
(3) (V-V,T) ~ (V,T-SIL): vocalic, voiced
(4) (V,T-SIL) ~ (SND-SIL): consonantal, voiceless
(5) (SND-SIL) ~ (SIL-R): silence
(6) (SIL-R) ~ (C-V,T): consonantal, voiceless (plosive, two bursts)
(7) (C-V,T) ~ (V,T-V): vocalic, voiced
(8) (V,T-V) ~ (F-SIL): vocalic, voiced
(9) (F-SIL) ~ (SND-SIL): consonantal, voiceless

II. Since the preceding vowel of the formant transition between (V-V,T) and (V,T-SIL) is /a/, the following consonant is found to be palatal from the formant transition shown in FIG. 8 and rule (13).

III. The phoneme segment (SIL-R) ~ (C-V,T) is determined to be /k/ by rule (24), from its plosive and palatal features.

IV. The segment (C-V,T) ~ (V,T-V) has a long formant transition interval, so the semivowel check is performed. Its formant transition matches that of rule (41), so it is determined to be the consonant /y/.

V. As a result of the above inferences, the phoneme recognition circuit 8 outputs the phoneme candidate sequence /a/ + /k/ + /y/ + /o/.

In this way, phonemes are recognized by comparing and matching the phoneme features of the phoneme segments formed from the input speech signal against the phoneme features of the phoneme segments described, in improved form, in the knowledge base. Highly accurate phoneme recognition can therefore be performed, and large-vocabulary continuous speech recognition becomes possible.

[Effect of the invention]

According to the present invention, the description of phoneme features, phoneme identification rules, and the like in the phoneme knowledge base is improved, and phonemes are recognized by comparing and matching the features of the phoneme segments extracted from the input speech signal against the features of the phoneme segments described in the phoneme knowledge base. Highly accurate phoneme recognition can therefore be performed, with the effect that large-vocabulary continuous speech recognition becomes possible.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 shows waveform diagrams of the respective parameters, FIG. 3 is a flowchart showing the inference procedure, and FIGS. 4 to 8 are explanatory diagrams each showing a formant transition pattern.

Explanation of main reference numerals in the drawings

5: acoustic analysis circuit, 6: first segmentation circuit, 7: second segmentation circuit, 8: phoneme recognition circuit.

Claims

[Claims]

A speech recognition device characterized in that speech is divided into phoneme segments at vowels, consonants, the steady and transition portions of formants, and the like; a phoneme knowledge base described in units of these segments is provided; the features of the phoneme segments extracted from an input speech signal are compared and matched against the features of the phoneme segments in the phoneme knowledge base; and phoneme candidates are obtained based on the result of this comparison and matching.