JP2898637B2

JP2898637B2 - Audio signal analysis method

Info

Publication number: JP2898637B2
Application number: JP63047418A
Authority: JP
Inventors: 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1987-12-10
Filing date: 1988-03-02
Publication date: 1999-06-02
Anticipated expiration: 2014-06-02
Also published as: JPH01251000A

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of industrial application) The present invention relates to a speech signal analysis method for coding schemes that compress speech signals with high efficiency, and in particular to a speech signal analysis method that uses pole-zero model analysis.

(Prior art) Methods that analyze speech signals on the basis of linear prediction are widely used in speech coding, analysis-synthesis systems, recognition, and the like. This analysis approximates the envelope of the short-time speech spectrum with an all-pole model (also called an AR model), and it approximates the poles of the spectrum well. It cannot, however, approximate zeros well. Because real speech contains sounds such as consonants and nasals whose spectra have zeros, the all-pole model has been considered insufficient for accurate analysis of speech signals.

Several methods have therefore been proposed that approximate the short-time spectral envelope of speech with a pole-zero model (also called an ARMA model). One example is "Fixing the order of the pole-zero model in speech analysis" (IEICE Transactions (A), Vol. J60-A, No. 4, pp. 423-424, April 1977).

This method approximates the short-time spectrum with the pole-zero transfer function shown below.
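The equation itself is not reproduced in this text. A form consistent with the definitions that follow (a_i, p, b_i, q) and with the inverse-filter notation 1 + A(ω) used later in the description would be the following. The sign convention and the omission of a gain factor are assumptions.

```latex
H(z) = \frac{1 + \sum_{i=1}^{q} b_i z^{-i}}{1 + \sum_{i=1}^{p} a_i z^{-i}} \qquad (1)
```

Setting every b_i to zero reduces this to the all-pole model used in the first analysis step.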

(Here a_i and p are the parameters and order of the pole model, and b_i and q are the parameters and order of the zero model.) This pole-zero model can be said to represent the actual speech production process well. If the method of the above reference is configured as shown in FIG. 9, the input speech signal is first analyzed with an all-pole model (b_i = 0, i = 1, 2, ..., q in equation (1)). Specifically, the autocorrelation method processing unit 101 obtains the all-pole model parameters a_i (i = 1, 2, ..., p). The speech signal is then passed through the all-pole inverse filter 102 and becomes a residual signal from which the spectral poles have been removed.
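The all-pole step just described can be sketched as follows. This is a minimal illustration of the autocorrelation method and the inverse filter, not the circuit of units 101 and 102; the function names and the sign convention (inverse filter 1 + Σ a_i z^-i) are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no normalization)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on the autocorrelation normal equations.

    Returns (a, err) with a = [1, a_1, ..., a_order] and err the residual
    energy. Assumed convention: d(n) = s(n) + sum_i a_i * s(n - i).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply the FIR inverse filter A(z) to x, giving the residual signal."""
    p = len(a) - 1
    return [sum(a[i] * x[n - i] for i in range(min(p, n) + 1)) for n in range(len(x))]
```

For a signal generated by a single pole, for example `x = [0.9 ** n for n in range(200)]`, a first-order analysis recovers a_1 close to -0.9 and the residual collapses to a single impulse.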

Next, the power spectrum unit 103 obtains the power spectrum of the residual signal, and the reciprocal processing unit 104 takes its reciprocal. The zeros that remained in the spectrum of the residual signal are thereby replaced with poles. The all-pole analysis method can therefore be applied to the autocorrelation coefficients obtained when the inverse Fourier transform unit 105 applies an inverse Fourier transform to the reciprocal power spectrum, which yields the zero model parameters b_i (i = 1, 2, ..., q).
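The zero-to-pole conversion can be checked numerically with a short sketch. It uses a naive DFT from the standard library in place of an FFT, and the function names are hypothetical; it illustrates the principle only and does not include any smoothing.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def levinson_durbin(r, order):
    """Returns (a, err) with a = [1, a_1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def zero_params_from_residual(d, order):
    """Power spectrum -> reciprocal -> inverse DFT -> all-pole analysis."""
    power = [abs(v) ** 2 for v in dft(d)]
    inv_power = [1.0 / p for p in power]          # assumes no bin is exactly zero
    r = [v.real for v in dft(inv_power, inverse=True)]
    b, _ = levinson_durbin(r, order)
    return b                                      # [1, b_1, ..., b_order]
```

With a residual whose spectrum has a single zero, such as `d = [1.0, 0.5] + [0.0] * 62`, a first-order analysis of the reciprocal spectrum returns b_1 close to 0.5, which is the zero coefficient of the residual.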

Applying this method to real speech, however, raises a serious problem: the fine structure of the spectrum caused by the pitch produces large errors in the estimated zero model parameters. FIG. 10 shows the spectrum of a male speech signal input through a μ-law PCM codec with a sampling frequency of 8 kHz, and FIG. 11 shows an example of the spectrum of the residual signal after pole prediction. As these figures show, fine structure due to the pitch of the excitation source appears in the spectra of both the speech signal and the residual signal. The deep valleys of this fine structure act like zeros or exaggerate true zeros; point A in the residual spectrum of FIG. 11 is an example. Such zeros are referred to here as false zeros or emphasized zeros. As a result, when the conventional method described above is used, the frequency response of the resulting pole-zero model is far from the spectral envelope of the speech signal, as shown in FIG. 12.

Thus the conventional pole-zero analysis method mis-estimates zeros because of the pitch-induced fine structure of the spectrum, and its spectral approximation can be worse than that of an all-pole model. An obvious remedy is to smooth the power spectrum of the residual signal with a two-point average. The pitch period, however, differs greatly between individuals, especially between men and women, and also varies with the phoneme even within one speaker's voice. Simple averaging of the power spectrum therefore does not always smooth it adequately.

(Problems to be solved by the invention) As described above, the conventional speech analysis method based on the pole-zero model cannot extract zeros accurately because of the influence of pitch, and the spectral approximation of the resulting model deteriorates.

The present invention has been made in view of this problem. Its object is to provide a speech signal analysis method that includes a pole-zero model parameter extraction method capable of always extracting zeros accurately, unaffected by pitch and independent of the speaker or phoneme.

[Configuration of the invention]

(Means for solving the problems) The present invention smooths, in the time domain, the power spectrum of the residual signal of the all-pole model or the reciprocal of that power spectrum, obtains autocorrelation coefficients by an inverse Fourier transform of the reciprocal of the smoothed power spectrum, and extracts the zero parameters by applying the all-pole analysis method to those autocorrelation coefficients. The degree of smoothing is varied adaptively according to the value of the pitch period. When a filter is used for the smoothing, the filter is zero-phase.

(Operation) First, the input signal is analyzed with an all-pole model and the all-pole model parameters are extracted. The input signal is then passed through the all-pole inverse filter to obtain the residual signal, whose spectrum has the poles removed. Next, the reciprocal of the power spectrum is taken, which converts the zeros of the spectrum into poles. Smoothing the spectrum in the time domain at this stage smooths the pitch-induced fine structure. The spacing of the fine structure in the spectrum is proportional to the reciprocal of the pitch period, and it varies between individuals and, for one speaker, with the phoneme. Because the present method varies the degree of spectral smoothing adaptively according to the pitch period, the spectrum is always smoothed well regardless of speaker or phoneme, and false zeros and over-emphasized zeros caused by the fine structure can be removed. In addition, making the smoothing filter zero-phase prevents the spectral zeros from being shifted by the phase characteristic of the filter. As a result, a pole-zero model that closely approximates the speech spectrum can be obtained.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a pole-zero prediction analysis method according to an embodiment of the present invention. In the figure, the speech signal is input to terminal 1 and passed to the pole parameter estimation unit 2. Several methods of estimating the pole parameters are known; for example, the autocorrelation method described in the reference "Digital Speech Processing" (Tokai University Press) can be used. The input signal is then fed to the all-pole inverse filter 3, which uses the pole parameters obtained by the pole parameter estimation unit 2. This filter computes and outputs the prediction residual signal d(n) according to the following equation.

Here S(n) is the input signal sequence, a_i are the parameters of the all-pole model, and p is the prediction order.
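The equation for d(n) is not reproduced in this text. With the inverse filter written as 1 + A(ω), as it is later in the description, a consistent form would be the following. The sign convention is an assumption, and the all-pole parameters are written a_i as in equation (1).

```latex
d(n) = S(n) + \sum_{i=1}^{p} a_i \, S(n - i) \qquad (2)
```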

Next, the fast Fourier transform (FFT) unit 4 and the squaring circuit unit 5 obtain the power spectrum of the residual signal d(n), while the pitch analysis processing unit 6 extracts the pitch period and makes the voiced/unvoiced decision. A discrete Fourier transform (DFT) can be used in place of the FFT unit 4. For the pitch analysis, the modified correlation method described in the above reference "Digital Speech Processing" can be used, for example.

The power spectrum of the residual signal obtained by the FFT unit 4 and the squaring circuit unit 5 is input to the smoothing circuit unit 7, which smooths the power spectrum using the pitch period and the voiced/unvoiced state obtained by the pitch analysis processing unit 6 as parameters.

FIG. 2 is a block diagram showing a specific example of the smoothing circuit unit according to an embodiment of the present invention. The time constant of this circuit, that is, the number of samples T at which the impulse response falls to 1/e, is expressed as T = −1/ln(α) ……(3). This time constant T is varied adaptively according to the value of the pitch period. Let the pitch period be T_p [samples], the sampling frequency f_s [Hz], and the order of the FFT or DFT N. The period m [samples] of the pitch-induced fine structure that appears in the power spectrum of the residual signal can then be written as follows.
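The equation for m is not reproduced in this text. It can be recovered from the definitions just given, on the assumption that m is counted in transform bins: the pitch harmonics are spaced f_s/T_p Hz apart (f_s being the sampling frequency) and an N-point transform has a bin spacing of f_s/N Hz. Equation (3) follows in the same way if the impulse response of the smoothing circuit is assumed to decay as α^n.

```latex
m = \frac{f_s / T_p}{f_s / N} = \frac{N}{T_p},
\qquad
\alpha^{T} = e^{-1} \;\Longrightarrow\; T = -\frac{1}{\ln \alpha}
```

For example, with N = 256 and a pitch period of T_p = 64 samples, the fine structure repeats every m = 4 bins.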

Accordingly, to vary the time constant T adaptively with m, T is set in terms of m and a parameter L, equation (3) is substituted, and the result is solved for α. Here L is a parameter representing the number of fine-structure periods over which smoothing is performed. Because T_p cannot be obtained for unvoiced speech, T_p is set to a suitable predetermined value when the pitch analysis processing unit 6 decides that the frame is unvoiced. Furthermore, when the power spectrum is smoothed with the filter shown in FIG. 2, the filter is made zero-phase. This can be done, for example, by filtering the power spectrum in the forward direction and in the backward direction and averaging the two outputs. Writing the power spectrum of the residual signal as D(nω₀), the output of forward filtering as D(nω₀)_f, and the output of backward filtering as D(nω₀)_b, the smoothing is described as follows.

Here D̄(nω₀) is the smoothed power spectrum, and N is the order of the FFT or DFT.
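The pitch-adaptive, zero-phase smoothing described above can be sketched as follows. Two points are assumptions, because the corresponding equations are not reproduced in this text: the time constant is taken as T = L·m with m = N/T_p, and the filter of FIG. 2 is taken to be the first-order recursion y(n) = (1 − α)x(n) + αy(n − 1). The function and parameter names are hypothetical.

```python
import math


def smoothing_coefficient(pitch_period, n_fft, n_structures):
    """alpha for which T = -1/ln(alpha) equals L * m, with m = N / T_p.

    pitch_period is T_p in samples, n_fft is N, n_structures is L.
    The relation T = L * m is an assumption.
    """
    m = n_fft / pitch_period
    return math.exp(-1.0 / (n_structures * m))


def one_pole(x, alpha):
    """Assumed filter form: y(n) = (1 - alpha) * x(n) + alpha * y(n - 1)."""
    y = []
    state = x[0]                 # start in steady state to avoid an edge ramp
    for v in x:
        state = (1.0 - alpha) * v + alpha * state
        y.append(state)
    return y


def zero_phase_smooth(power, alpha):
    """Average of a forward pass and a backward pass over the spectrum."""
    forward = one_pole(power, alpha)
    backward = one_pole(power[::-1], alpha)[::-1]
    return [0.5 * (f + b) for f, b in zip(forward, backward)]
```

Averaging the two passes gives an overall response that is symmetric about each bin, so the smoothing does not shift the position of spectral valleys. For unvoiced frames the caller would pass the predetermined T_p value mentioned in the text.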

FIG. 3 shows an example of the spectrum of the smoothed residual signal. The spectrum was obtained with a 256-point FFT.

The spectrum smoothed by the above smoothing circuit is converted into an inverse spectrum by the reciprocal circuit unit 8, which converts the zeros of the residual signal spectrum into poles. The inverse FFT processing unit 9 applies an inverse FFT to the inverse spectrum, converting it into an autocorrelation sequence that is input to the zero prediction parameter estimation unit 10. The zero prediction parameter estimation unit 10 obtains the zero prediction parameters from this autocorrelation sequence by the autocorrelation method. The all-zero inverse filter 11 takes the residual signal of the all-pole inverse filter as input, performs prediction with the zero prediction parameters obtained by the zero prediction parameter estimation unit 10, and outputs the prediction residual signal e(n), which is computed according to the following equation.

Here b_i are the zero prediction parameters and Q is the order of the zero prediction.

The above processing carries out the pole-zero prediction analysis of the speech signal. FIG. 4 shows the frequency response of the resulting pole-zero model.

The smoothing circuit unit shown in FIG. 1 can also be realized by detecting the peaks of the power spectrum and interpolating between the detected peaks with a quadratic curve. Specifically, the coefficients of the quadratic passing through three peaks are obtained, and the interval between two of the peaks is interpolated with that curve. This approach requires no pitch analysis, which reduces the amount of computation.
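A minimal sketch of this peak-interpolation alternative follows. The text does not say how peaks are detected or which of the two intervals each quadratic covers, so both choices here are assumptions: a peak is a simple local maximum, and each quadratic through three consecutive peaks fills the interval between the first two (the last quadratic also fills the final interval).

```python
def find_peaks(p):
    """Indices of local maxima, excluding the two end points."""
    return [i for i in range(1, len(p) - 1) if p[i - 1] < p[i] >= p[i + 1]]


def quadratic_through(x0, y0, x1, y1, x2, y2):
    """Lagrange form of the quadratic through three points."""
    def f(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return f


def peak_envelope(p):
    """Replace the spectrum between detected peaks by quadratic interpolation."""
    peaks = find_peaks(p)
    out = list(p)
    for j in range(len(peaks) - 2):
        i0, i1, i2 = peaks[j], peaks[j + 1], peaks[j + 2]
        f = quadratic_through(i0, p[i0], i1, p[i1], i2, p[i2])
        last = i2 if j == len(peaks) - 3 else i1
        for i in range(i0, last + 1):
            out[i] = f(i)
    return out
```

Bins outside the first and last detected peaks are left unchanged, and a spectrum with fewer than three peaks is returned as is.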

The smoothing circuit unit shown in FIG. 1 can also be inserted after the reciprocal circuit; FIG. 5 shows an embodiment of this arrangement. The smoothing that FIGS. 1 and 5 perform in the frequency domain can also be performed in the time domain. Let D′(nω₀) (n = 0, 1, ..., N−1) be the reciprocal of the power spectrum of the residual signal d(n), and let h(n) and H(nω₀) be the impulse response and transfer function of the digital filter of FIG. 2. The smoothing is then carried out by filtering in the frequency domain, as expressed by the following equation.

Here D̄′(nω₀) is the smoothed power spectrum. Let γ̄′(n) and γ′(n) be the inverse Fourier transforms of D̄′(nω₀) and D′(nω₀), respectively. From the properties of the Fourier transform, equation (11) is then written in the time domain as follows.

γ̄′(n) = γ′(n)・H(nω₀) ……(13) In other words, the operation is equivalent to applying the window H(nω₀). In this role H(nω₀) is called a lag window. H(nω₀) changes adaptively according to the pitch period.
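The equivalence between smoothing the spectrum and windowing the autocorrelation sequence can be verified numerically. The sketch below assumes a circular convolution along the bin index and a naive inverse DFT with a 1/N factor; under that convention the lag window is N times the inverse DFT of the smoothing kernel. All names are hypothetical.

```python
import cmath
import math


def idft(spectrum):
    """Naive inverse DFT, including the 1/N factor."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(kernel, spectrum):
    """Smooth a sampled spectrum by circular convolution along the bin index."""
    n = len(spectrum)
    return [sum(kernel[m] * spectrum[(k - m) % n] for m in range(n)) for k in range(n)]


def lag_window(kernel):
    """Lag-domain window equivalent to convolving the spectrum with kernel."""
    n = len(kernel)
    return [n * v for v in idft(kernel)]


def smooth_in_lag_domain(spectrum, kernel):
    """Multiply the autocorrelation sequence by the lag window instead."""
    gamma = idft(spectrum)
    window = lag_window(kernel)
    return [g * w for g, w in zip(gamma, window)]
```

Taking the inverse DFT of `circular_convolve(kernel, spectrum)` gives the same sequence as `smooth_in_lag_domain(spectrum, kernel)`, which is why the smoothing can be moved after the inverse transform at the cost of one multiplication per lag.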

FIG. 6 shows an embodiment in which the smoothing is performed in the time domain.

In the embodiments of FIGS. 1, 5, and 6, the conversion of zeros into poles is performed in the frequency domain, but it can also be performed in the time domain. Let γ(n) be the autocorrelation sequence of the pole-prediction residual signal d(n) and D(nω₀) its Fourier transform, the power spectrum. D(nω₀) and its reciprocal D′(nω₀) are related as follows.

D(nω₀)・D′(nω₀) = 1 ……(14) From the properties of the Fourier transform, this relation is expressed in the time domain as follows.

Because the autocorrelation coefficients are symmetric about γ(0), equation (15) can be written in matrix form as follows.

This equation can be solved recursively with the Levinson algorithm. The method is described, for example, in "Theory of Digital Signal Processing 1: Fundamentals and Control" (Corona Publishing).

FIGS. 7 and 8 show embodiments in which both the zero conversion and the smoothing are performed in the time domain. In these figures, the deconvolution circuit units 57 and 67 solve equation (15) for γ′(n) by computing equation (17).
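Equations (15) and (17) are not reproduced in this text, so the exact matrix cannot be confirmed. As an illustration of the kind of computation involved, the sketch below solves a symmetric Toeplitz system R x = [1, 0, ..., 0]^T built from an autocorrelation sequence, using the Levinson-Durbin recursion. Treating the deconvolution as this particular system is an assumption, and the function names are hypothetical.

```python
def levinson_durbin(r, order):
    """Returns (a, err) with a = [1, a_1, ..., a_order] and R a = [err, 0, ..., 0]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def toeplitz_unit_solve(r):
    """Solve R x = [1, 0, ..., 0]^T, where R[j][i] = r[|j - i|]."""
    a, err = levinson_durbin(r, len(r) - 1)
    return [v / err for v in a]
```

The recursion needs on the order of M² operations for M lags, compared with M³ for general elimination, which is the usual reason for preferring it on Toeplitz systems.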

In FIG. 8, instead of using the deconvolution circuit unit 67, the output of the lag window 66 may be processed by an FFT or DFT, converted to the reciprocal of its squared magnitude, and then processed by an inverse FFT or inverse DFT. This requires even less computation than deconvolution.

Next, experimental results for real speech are presented.

FIGS. 13 and 14 show the analysis results for the word "ame" (rain) uttered by one adult male and one adult female. The speech was input through a μ-law PCM codec with a sampling frequency of 8 kHz, and no preprocessing was applied. The analysis conditions were a frame of 30 ms (240 samples), an analysis length of 32 ms (256 samples), a 256-sample Hamming time window, and a pole order and a zero order of 8. FIGS. 13(a) and 14(a) compare the spectrum of a 16th-order all-pole model, the spectrum of the pole-zero model without spectral smoothing, and the spectrum of the pole-zero model with pitch-period-adaptive smoothing. As these figures show, without smoothing, false zeros and emphasized zeros appear in the spectrum of the pole-zero model and the spectral approximation deteriorates. With smoothing, the zeros of the spectrum are also approximated well. FIGS. 13(b) and 14(b) compare the spectral smoothing methods. For the male voice, the choice of smoothing method makes no large difference to the quality of the spectral approximation. This is thought to be because the spacing of the pitch-induced spectral fine structure of the male voice used in the experiment was small enough to be smoothed by a three-point average. For the female voice, the spectral approximation of the averaging method is poor, because the fine structure is more widely spaced in female speech and the averaging method could not smooth it well. In contrast, peak-to-peak linear interpolation gives a better spectral approximation than averaging, because its smoothing range changes with the spacing of the fine structure. Table 1 shows the segmental prediction gain Gseg and the reciprocal of the segmental SFM, 1/SFMseg, for the different smoothing methods. Gseg and 1/SFMseg are the prediction gain and 1/SFM of each frame averaged in the dB domain. As Table 1 and FIGS. 13 and 14 show, the pole-zero model that uses the pitch-adaptive method for spectral smoothing estimates the input spectrum well regardless of the speaker's gender.

As shown above, in the conventional method the spectral fine structure arising from the periodicity of the voiced excitation produces false zeros and emphasized zeros in the spectrum of the all-pole model's residual signal, which causes the zero parameters to be mis-estimated. To solve this problem, smoothing of the power spectrum of the residual signal was examined, and a method was proposed in which a filter whose time constant is varied adaptively according to the pitch period smooths the power spectrum of the residual signal in the frequency domain, after which the spectrum is inverted and the zero parameters are extracted. With this method the parameters can always be extracted correctly, unaffected by the spectral fine structure. A method of performing the spectral smoothing and spectral inversion in the time domain, rather than in the frequency domain, was also presented.

The proposed analysis method was applied to real speech, and the zeros of the spectrum were shown to be approximated well.

Next, the principle and an embodiment of a different approach to removing the fine structure of the spectrum are described with reference to FIGS. 15 to 20.

As is generally known, the mechanism of speech production can be modeled as shown in FIG. 15. That is, when the pitch pulse train e(t) (pitch period T_p), serving as the glottal signal, is passed through the filter H(ω) (FIG. 18(a)) representing the vocal tract, the speech signal S(t) is output. The speech signal driven by e₀(t), one period of the pitch pulse train e(t), is denoted S₀(t).

Let e₀(t) be one period of the pitch pulse train as above, and let E₀(ω) be its Fourier transform, as shown in FIG. 17(a). The Fourier transform E(ω) of e(t) is then E₀(ω) sampled along the frequency axis at intervals of 2π/T_p, as shown in FIG. 17(b). Consequently, the spectrum S(ω) of the speech signal S(t) is also discretized (sampled) along the frequency axis, as shown in FIG. 18(c). This is what is generally called the fine structure of the spectrum.

As above, let S₀(t) be the speech signal driven by e₀(t), the signal for one pitch period, and let d₀(t) be the residual signal obtained when S₀(t) is analyzed with an all-pole model having the all-pole inverse filter (1 + A(ω)) shown in FIG. 16. The spectrum D₀(ω) of d₀(t) is then S(ω) with the poles removed and the zeros remaining, as shown in FIG. 19. The actual residual signal d(t) (FIG. 16), on the other hand, is the residual obtained by analyzing the speech signal S(t) driven by the periodic pitch pulse train e(t), so its spectrum is D₀(ω) sampled along the frequency axis, as shown in FIG. 19(b).

Let FD₀(t) and FD(t) be the Fourier transforms of D₀(ω) and D(ω), as shown in FIGS. 20(a) and 20(b). FD(t) is then FD₀(t) repeated as a periodic signal with period T_p. Applying an ideal low-pass filter with cutoff T_p/2 to D(ω) therefore restores D₀(ω) and removes the pitch-induced fine structure from D(ω). Because the filter acts on a function of frequency, its cutoff is expressed in the transform variable t. A low-pass filter with cutoff T_p/2 can thus be used as the smoothing processing unit 7 of FIG. 1.

The same holds for the power spectrum: applying a low-pass filter with cutoff T_p/2 to the power spectrum |D(ω)|² of the residual signal d(t) removes the fine structure from |D(ω)|². Applying an ideal low-pass filter with cutoff T_p/2 to |D(ω)|² is equivalent, in the time domain, to applying a rectangular window of T_p/2 to the autocorrelation coefficients of the residual signal. A rectangular window of T_p/2 can therefore be used as the lag window 66 of FIG. 8.
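The effect of the rectangular lag window can be seen on a synthetic residual. The sketch below uses a circular autocorrelation over one analysis block and keeps only lags whose distance from lag 0 is strictly less than T_p/2; whether the boundary lag is kept is an assumption, and the function names are hypothetical.

```python
def circular_autocorrelation(d):
    """Circular autocorrelation of one block; lag k and lag N - k are mirror images."""
    n = len(d)
    return [sum(d[t] * d[(t + k) % n] for t in range(n)) for k in range(n)]


def rectangular_lag_window(r, pitch_period):
    """Zero every lag whose distance from lag 0 is T_p / 2 or more."""
    n = len(r)
    half = pitch_period / 2.0
    return [v if min(k, n - k) < half else 0.0 for k, v in enumerate(r)]
```

For a residual made of the pulse pair (1, 0.5) repeated every 16 samples in a 64-sample block, the autocorrelation has copies of the single-period pattern at lags 0, 16, 32, and 48. The window with T_p = 16 keeps only the copy around lag 0, which is the autocorrelation of one period scaled by the number of periods.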

After the fine structure has been removed from the power spectrum of the residual signal by the above method, the zero parameters are obtained by applying the autocorrelation method to the autocorrelation coefficients produced by taking the reciprocal of the power spectrum and applying an inverse Fourier transform. The zero parameters can thus always be obtained accurately, unaffected by pitch.

When the rectangular window of T_p/2 is applied to the autocorrelation coefficients, the deconvolution amounts to solving the following matrix equation with the Levinson algorithm.

[Effects of the invention] According to the present invention, errors in zero estimation caused by the pitch-induced fine structure of the spectrum are prevented, and a pole-zero model that closely approximates the spectrum of the speech signal can be obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram of a pole-zero model analysis method according to an embodiment of the present invention. FIG. 2 is a block diagram showing a specific example of the smoothing circuit that is the main part of FIG. 1. FIG. 3 shows an example of the spectrum of the smoothed residual signal. FIG. 4 shows an example of the frequency response of the pole-zero model obtained by an embodiment of the present invention. FIGS. 5 to 8 are block diagrams of pole-zero model analysis methods according to other embodiments of the present invention. FIG. 9 is a block diagram of the conventional pole-zero model analysis method. FIG. 10 shows an example of the spectrum of a speech signal. FIG. 11 shows an example of the spectrum of the residual signal output by the all-pole inverse filter. FIG. 12 shows an example of the frequency response of the pole-zero model obtained by the conventional method. FIGS. 13 and 14 show analysis results for real speech. FIG. 15 shows a model of voiced speech production. FIG. 16 shows the pole-zero analysis model. FIG. 17 shows the Fourier transform of the glottal signal (pitch pulse). FIG. 18 shows the characteristics of H(ω), S₀(ω), and S(ω). FIG. 19 shows the spectra D₀(ω) and D(ω). FIG. 20 shows the Fourier transforms of D₀(ω) and D(ω).

1, 20, 20, 41, 52, 62, 100: input terminal
2, 31, 42, 53, 62: pole parameter estimation circuit
3, 32, 43, 54, 63, 102: all-pole inverse filter
4, 33, 44: FFT circuit
5, 34, 45: squaring circuit
6, 35, 46, 56, 65: pitch analysis circuit
7, 37: smoothing circuit
8, 36, 47, 104: reciprocal circuit
9, 38, 48: inverse FFT circuit
10, 39, 50, 59, 68: zero parameter estimation circuit
11, 40, 51, 60, 69, 107: all-pole inverse filter
21, 23: multiplier
22: adder
24: unit delay circuit
49, 58, 66: lag window
55, 64: autocorrelation coefficient calculator
57, 67: deconvolution circuit
101, 106: autocorrelation method execution circuit
103: power spectrum calculator

Claims

(57) [Claims]

1. A speech signal analysis method that performs pole-zero model analysis by estimating the parameters of a pole model of an input speech signal, obtaining the power spectrum of a residual signal produced through an all-pole filter having the parameters of the pole model, and obtaining the parameters of a zero model from autocorrelation coefficients produced by an inverse Fourier transform of the reciprocal of that power spectrum, wherein the power spectrum of the residual signal, or its reciprocal, is smoothed in the time domain according to the value of the pitch period of the input speech signal or of the residual signal.

2. An autocorrelation processing means for performing autocorrelation processing for estimating a polar model parameter of an input speech signal, and an all-pole inverse filter configured based on the polar model parameter. First filter means for obtaining a residual signal from the input audio signal, and arithmetic means for estimating the parameters of the zero model by performing autocorrelation processing on the result of inverse Fourier transform of the reciprocal of the power spectrum of the residual signal In the audio signal analysis method, the arithmetic unit includes a unit that extracts a pitch period of the input audio signal, and calculates a characteristic of a signal including a power spectrum of a residual signal based on a signal related to the pitch period. An audio signal analysis method, comprising means for smoothing a region.

3. The audio signal analysis method according to claim 2, further comprising second filter means for obtaining a residual signal from an output signal of the first filter means via an all-zero inverse filter formed based on the parameters of the zero model estimated by the arithmetic means, whereby a spectrum envelope characteristic of the audio signal is obtained.