JPH0380300A

JPH0380300A - Voice synthesizing system

Info

Publication number: JPH0380300A
Application number: JP1216560A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-08-23
Filing date: 1989-08-23
Publication date: 1991-04-05
Anticipated expiration: 2012-04-16
Also published as: JP2600384B2

Abstract

PURPOSE: To prevent deterioration in sound quality even when the pitch period at synthesis time differs greatly from the pitch period of a unit speech, by determining the pitch period change amount from a delay time and prosody information and synthesizing speech after changing the pitch period of the residual signal by that amount. CONSTITUTION: Unit speech connection information is input from a terminal 100, and prosody information is input from a terminal 150. The unit speech connection information is supplied to a sound source signal storage unit 200 and a spectral parameter storage unit 210, and the prosody information is supplied to a pitch control unit 230 and a time length control unit 240. A pitch change amount calculation unit 220 calculates the accurate pitch period and the delay time for the residual waveforms of adjacent pitch sections and determines the pitch change amount in the voiced section of the residual signal. The pitch control unit 230 changes the pitch period of the residual signal by the pitch change amount in the voiced section. Good synthesized speech with hardly any deterioration in sound quality when the pitch is changed is obtained in this way.

Description

[Detailed Description of the Invention]

(Industrial Field of Application) The present invention relates to a speech synthesis method, and in particular to a speech synthesis method that changes the prosody of the residual signals of unit speech according to unit speech connection information and prosody information, connects them, and obtains synthesized speech by driving a synthesis filter configured from the spectral parameters of the unit speech.

(Prior Art) A residual-control speech synthesis method is known as a rule-based synthesis method that yields relatively good sound quality. In this method, a residual signal representing the sound source signal and spectral parameters representing the spectral envelope are obtained in advance by analyzing each unit speech (for example, CV or VC) over its entire section, and are stored. Using input unit speech connection information and input prosody information such as pitch period, amplitude, and duration, the prosody (pitch period, amplitude, duration) of the residual signal of the relevant unit speech is controlled to the desired values, and the residual signals are connected. Speech is then synthesized by passing the connected residual signal through a synthesis filter configured from the corresponding spectral parameters.

Details of this method are described in, for example, the specification of Japanese Patent Application No. 63-136969 (Reference 1), Japanese Patent Application No. 63-133478 (Reference 2), and Iwata et al., "A Study of a Speech Synthesis System Using Residual Control" (Proceedings of the Acoustical Society of Japan, 3-2-7, October 1988) (Reference 3).

According to this method, the residual signal obtained by analyzing the unit speech in advance is used as the sound source signal over the entire section of the unit speech. The sound quality of the synthesized speech is therefore markedly better than that of methods that use an impulse train in voiced sections and a noise signal in unvoiced sections as the sound source signal.

(Problem to Be Solved by the Invention) In the residual-control speech synthesis method described above, however, synthesized speech of good quality is obtained when the pitch period of the residual signal is changed only slightly at synthesis time, but the sound quality deteriorates when the pitch period is changed greatly.

In the residual-control speech synthesis method, the pitch period is changed as follows. In voiced sections, the pitch period of the residual signal obtained from the unit speech is calculated in advance, and the residual signal is divided in advance into pitch sections whose length equals that pitch period. Next, the pitch change amount of the residual signal is calculated from the desired pitch period given by the input prosody information and the section length (or pitch period), and the pitch period of the residual signal is changed by that amount.

When the residual signal is divided into pitch sections in advance, however, pitch period extraction errors of a few percent usually occur. This causes no problem when the pitch period of the unit speech is left unchanged or is changed only slightly. When the pitch period at synthesis time is changed greatly relative to that of the unit speech, the few-percent extraction errors accumulate, and the pitch period after the change fluctuates around the pitch period specified by the prosody information. This fluctuation degraded the sound quality.

Furthermore, when dividing into pitch sections, it is difficult to align the phase of the peak position of the residual pitch waveform in every pitch section. When speech was synthesized with a greatly changed pitch period, the resulting phase shift between the pitch waveforms of the synthesized speech therefore degraded the sound quality.

An object of the present invention is to provide a speech synthesis method whose sound quality does not deteriorate even when the pitch period at synthesis time is changed greatly relative to the pitch period of the unit speech.

(Means for Solving the Problem) The speech synthesis method according to the present invention stores a residual signal representing the sound source of the entire unit speech and spectral parameters representing the spectral envelope, changes the prosody of the residual signal according to unit speech connection information and prosody information and connects the result, and obtains synthesized speech by driving a synthesis filter configured from the spectral parameters. In this method, the delay time that maximizes the cross-correlation function between the residual signals of adjacent pitch sections is found, a pitch period change amount is determined from the delay time and the prosody information, and the pitch period of the residual signal is changed by that pitch period change amount.

Alternatively, the synthesis filter is driven by the residual signals of adjacent pitch sections to obtain synthesized speech, the delay time that maximizes the cross-correlation function between the synthesized speech of the adjacent pitch sections is found, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized after changing the pitch period of the residual signal by that pitch period change amount.

(Operation) The operation of the first invention is described with reference to FIG. 3.

Here, it is assumed that 300 to 400 kinds of Japanese CV and VC units, for example, are used as the unit speech.

In voiced sections, the pitch period is extracted from the unit speech in advance, and boundaries are placed on the unit speech at every pitch section whose length equals the pitch period. The pitch can be extracted from the autocorrelation of the speech signal or by other known methods. The pitch division boundaries can be determined using, for example, the technique proposed in Japanese Patent Application No. 62-210690 (Reference 4).

In voiced sections, the unit speech is analyzed pitch section by pitch section to obtain the residual signal and the spectral parameters representing the spectral envelope of the speech signal. In unvoiced sections, the analysis is performed at predetermined fixed intervals (for example, 5 ms). The analysis uses improved cepstrum analysis, as in References 2 and 3, but other known, good spectral analysis methods can also be used. As in References 1, 2, and 3, the residual signal is obtained for the entire section of the unit speech.

This processing is performed in advance. The sound source signal storage unit 200 stores, for each unit speech, the residual signal over the entire section of that unit speech. The spectral parameter storage unit 210 stores the spectral parameters obtained for every pitch section in voiced sections and at every predetermined fixed interval in unvoiced sections.

Next, unit speech connection information is input from terminal 100, and prosody information (pitch period, phoneme duration, amplitude) is input from terminal 150. The unit speech connection information is supplied to the sound source signal storage unit 200 and the spectral parameter storage unit 210, and the prosody information is supplied to the pitch control unit 230 and the time length control unit 240.

The pitch change amount calculation unit 220 calculates, in the voiced sections of the residual signal, the accurate pitch period T and the delay time $\tau_{\max}$ for the residual waveforms of adjacent pitch sections, and from them obtains the pitch change amount. As shown in FIG. 4, let $L_1$ and $e_1(n)$ be the section length and residual waveform of divided pitch section 1, and let $L_2$ and $e_2(n)$ be the section length and residual waveform of pitch section 2. The cross-correlation function of $e_1(n)$ and $e_2(n)$ is calculated by the following equation.

$$\Phi(\tau) = \sum_{n} e_1(n + L_1 + \tau)\, e_2(n) \qquad (1)$$

Equation (1) is evaluated for various delay times, and the delay time that maximizes it is denoted $\tau_{\max}$. The accurate pitch period T between the adjacent pitch sections is then given by the following equation.

$$T = L_1 + \tau_{\max} \qquad (2)$$

Therefore, to change the pitch period T of the unit speech to the desired synthesis pitch period T' input as prosody information, the pitch change amount D is obtained from the following equation.

$$D = T' - L_1 - \tau_{\max} = T' - T \qquad (3)$$

The pitch change amount D thus obtained is output to the pitch control unit 230.
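Equations (1) to (3) can be sketched in a few lines of Python. This is an illustrative sketch rather than the patent's implementation: the function names, the search range `max_lag`, and the indexing convention are assumptions made for the example. In particular, the residual is held as one continuous array and the waveform of pitch section 1 is compared with the residual $L_1 + \tau$ samples later, which is one possible reading of equation (1).

```python
def cross_correlation(residual, start, length, lag):
    """Phi(lag): correlate pitch section 1 with the residual (length + lag) samples later."""
    total = 0.0
    for n in range(length):
        m = start + n + length + lag
        if 0 <= m < len(residual):  # samples outside the signal are treated as zero
            total += residual[start + n] * residual[m]
    return total


def best_lag(residual, start, length, max_lag):
    """Return the lag in [-max_lag, max_lag] that maximizes Phi (tau_max)."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(residual, start, length, lag))


def pitch_change_amount(residual, start, length, target_period, max_lag=5):
    """Return (D, T): T = L1 + tau_max (eq. 2), D = T' - T (eq. 3)."""
    tau_max = best_lag(residual, start, length, max_lag)
    period = length + tau_max
    return target_period - period, period


# Toy residual: one pulse every 50 samples, but the stored section length is 48
# (a pitch extraction error of 2 samples).
residual = [0.0] * 200
for k in range(0, 200, 50):
    residual[k] = 1.0
    residual[k + 1] = 0.5

d, t = pitch_change_amount(residual, start=0, length=48, target_period=60)
print(d, t)  # 10 50
```

In this toy case the lag search recovers the 2-sample segmentation error, so D is measured from the true period of 50 samples rather than from the stored section length of 48.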

The pitch control unit 230 uses the pitch change amount D to change the pitch period of the residual signal by that amount in the voiced sections of the residual signal. Specifically, as in References 1 and 2, to lengthen the pitch period by D, D zero-valued samples are appended at the end of the pitch section. To shorten the pitch period by D, D samples of the residual signal are cut off from the end of the pitch section. Other known methods of changing the pitch period can of course also be used.
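The zero-padding and truncation rule can be sketched as follows. The function name `change_pitch_period` and the use of a signed D (positive to lengthen, negative to shorten, matching $D = T' - T$) are assumptions made for illustration.

```python
def change_pitch_period(section, d):
    """Return a copy of one pitch section whose length is changed by d samples.

    d > 0: append d zero samples at the end (longer pitch period).
    d < 0: drop |d| samples from the end (shorter pitch period).
    """
    if d >= 0:
        return list(section) + [0.0] * d
    keep = max(len(section) + d, 0)  # never cut more samples than exist
    return list(section[:keep])


section = [0.9, 0.4, -0.2, 0.1]
print(change_pitch_period(section, 2))   # [0.9, 0.4, -0.2, 0.1, 0.0, 0.0]
print(change_pitch_period(section, -1))  # [0.9, 0.4, -0.2]
```

Editing only the tail of the section leaves the main excitation pulse near the start of the section untouched.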

The time length control unit 240 uses the duration contained in the input prosody information to control the duration of the phonemes obtained by connecting the unit speech. For details, see the time length control unit of References 1 and 2.

The synthesis filter 250 receives the residual signal in which the unit speech has been connected and the prosody (pitch period and duration) has been controlled, synthesizes speech from it using the spectral parameters (the recursion has the form given as equation (6) in the embodiment), and outputs the speech from terminal 260. For ease of control, the spectral parameters used here are linear prediction coefficients $a_i$ converted from the improved cepstrum.

For the conversion from the improved cepstrum to linear prediction coefficients, see, for example, Reference 2.

In the second invention, synthesized speech is first generated, and the pitch change amount D is obtained at the level of the synthesized speech signal.

The procedure is as follows. In FIG. 4, let $x_1(n)$ be the synthesized speech in pitch section 1 and $x_2(n)$ the synthesized speech in pitch section 2. These are obtained by first passing the residual signals of the pitch sections through the synthesis filter 250. The cross-correlation function is then calculated according to the following equation.

$$\Phi(\tau) = \sum_{n} x_1(n + L_1 + \tau)\, x_2(n) \qquad (5)$$

The value of τ that maximizes equation (5) is taken as $\tau_{\max}$, and the accurate pitch period T and the pitch change amount D are obtained from equations (2) and (3).
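A minimal sketch of this second variant: the residual is first passed through an all-pole synthesis filter, and the lag search is run on the synthesized waveform instead of the residual. The function names, the one-pole coefficient 0.5, the search range, and the indexing convention (section 1 is compared with the signal $L_1 + \tau$ samples later) are all assumptions made for the example.

```python
def synthesize(residual, coeffs):
    """All-pole synthesis: x(n) = e(n) + sum_i a_i * x(n - i)."""
    out = []
    for n, e in enumerate(residual):
        acc = e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += a * out[n - i]
        out.append(acc)
    return out


def best_lag(signal, start, length, max_lag):
    """Lag in [-max_lag, max_lag] maximizing the correlation of section 1
    with the signal (length + lag) samples later."""
    def phi(lag):
        total = 0.0
        for n in range(length):
            m = start + n + length + lag
            if 0 <= m < len(signal):
                total += signal[start + n] * signal[m]
        return total
    return max(range(-max_lag, max_lag + 1), key=phi)


# Toy residual with a true period of 50 samples; the stored section length is 48.
residual = [0.0] * 200
for k in range(0, 200, 50):
    residual[k] = 1.0

speech = synthesize(residual, coeffs=[0.5])      # hypothetical one-pole filter
tau_max = best_lag(speech, start=0, length=48, max_lag=5)
period = 48 + tau_max                            # equation (2)
print(tau_max, period)  # 2 50
```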

(Embodiment) FIG. 1 is a block diagram showing an embodiment of the first invention.

The control circuit 510 receives prosody control information (pitch, duration, amplitude) and unit speech connection information from terminal 500 and outputs them to the sound source storage circuit 550, the spectral parameter storage circuit 580, the pitch change amount calculation circuit 555, the amplitude control circuit 570, and the time length control circuit 590.

The sound source storage circuit 550 receives the unit speech connection information and outputs the prediction residual signal corresponding to that unit speech.

The pitch change amount calculation circuit 555 receives the synthesis pitch period T' from the prosody control information. In the voiced sections of the residual signal, it calculates the accurate pitch period T and the delay time $\tau_{\max}$ for the residual waveforms of adjacent pitch sections and obtains the pitch change amount D, which can be calculated according to equations (1) to (3). The resulting D is output to the pitch control circuit 560.

The pitch control circuit 560 receives the pitch change amount D and changes the pitch period of the residual signal in voiced sections, using the pitch division positions specified in advance. The pitch period can be changed by the method described in the Operation section or by other known methods.

The time length control circuit 590 receives duration information from the control circuit 510 and controls the time length so that the duration of the phonemes obtained by connecting the unit speech becomes the desired duration. For details, see the time length control circuit of References 1 and 2.

Next, the amplitude control circuit 570 receives amplitude control information, controls the amplitude of the residual signal accordingly, and outputs e(n).

The spectral parameter storage circuit 580 receives the unit speech connection information and outputs the spectral parameter sequence corresponding to that unit speech. As in the Operation section, the spectral parameters used here are LPC coefficients $a_i$ obtained by conversion from cepstrum coefficients, but other known parameters can also be used.

The synthesis filter circuit 600 receives the residual signal whose pitch period has been changed and calculates the synthesized speech x(n) using the coefficients $a_i$ according to the following equation.

$$x(n) = e(n) + \sum_{i} a_i\, x(n - i) \qquad (6)$$

This concludes the description of the embodiment of the first invention.
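Equation (6) is an all-pole recursion driven by the residual. A minimal sketch follows; the function name and the coefficient values are made up for illustration, and samples before the start of the signal are assumed to be zero.

```python
def synthesis_filter(excitation, coeffs):
    """Compute x(n) = e(n) + sum_{i=1..len(coeffs)} a_i * x(n - i)."""
    x = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:          # x(n - i) is taken as 0 before the signal starts
                acc += a * x[n - i]
        x.append(acc)
    return x


# Impulse response of a hypothetical second-order filter.
print(synthesis_filter([1.0, 0.0, 0.0, 0.0], [0.5, -0.25]))
# [1.0, 0.5, 0.0, -0.125]
```

With the sign convention of equation (6), positive coefficients feed past output samples back with a positive sign, so the coefficients must be chosen so that the recursion stays stable.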

FIG. 2 is a block diagram showing an embodiment of the second invention. Components in FIG. 2 bearing the same numbers as in FIG. 1 operate in the same way as in FIG. 1, so their description is omitted here.

In FIG. 2, the pitch change amount calculation circuit 610 first drives the synthesis filter with the residual signal output from the sound source storage circuit, in the voiced sections of that residual signal, to obtain a synthesized speech signal. The synthesis filter coefficients are read from the spectral parameter storage circuit 580.

Then, for the synthesized speech of adjacent pitch sections, the delay time $\tau_{\max}$ is obtained according to equation (5), the pitch change amount D is obtained according to equation (3), and D is output to the pitch control circuit 560.

The above embodiments are merely one configuration of the present invention, and various modifications are possible.

In the embodiments, the prediction residual signal obtained by predictive analysis is used as the sound source signal over the entire section of the unit speech. To reduce the amount of computation and memory, however, in voiced sections, and vowel sections in particular, the prediction residual signal of one representative pitch section may be used repeatedly while its amplitude and pitch are controlled.
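One way to picture this memory-saving variant is to store a single pitch section and rebuild the excitation by repeating it with a per-period length and gain. The sketch below is only an illustration: the function name, the `(period, gain)` input format, and the pad-or-truncate length change are assumptions.

```python
def repeat_representative(prototype, targets):
    """Build an excitation by repeating one representative pitch section.

    targets is a list of (period_in_samples, gain) pairs, one per output pitch period.
    """
    excitation = []
    for period, gain in targets:
        cycle = list(prototype[:period])             # truncate if the period is shorter
        cycle += [0.0] * (period - len(cycle))       # zero-pad if it is longer
        excitation.extend(gain * s for s in cycle)
    return excitation


prototype = [1.0, 0.5, 0.25]
print(repeat_representative(prototype, [(4, 1.0), (2, 2.0)]))
# [1.0, 0.5, 0.25, 0.0, 2.0, 1.0]
```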

The sound source signal is not limited to the prediction residual signal obtained by predictive analysis. Other good sound source signals can be used, such as a zero-phase signal, a phase-equalized signal, or a multi-pulse sound source.

The accurate pitch period T need not be calculated at synthesis time as in the embodiments above. T may instead be calculated in advance from equations (1), (2), and (5) when the unit speech is analyzed and then stored, so that at synthesis time only the pitch change amount D is calculated from equation (3) to control the pitch.
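With T stored at analysis time, the synthesis-time work reduces to one subtraction per pitch section. A small sketch, in which the storage layout (a dictionary keyed by a unit speech label) and the numeric values are hypothetical:

```python
def pitch_change_amounts(stored_periods, target_periods):
    """Equation (3) applied per pitch section: D = T' - T."""
    if len(stored_periods) != len(target_periods):
        raise ValueError("one target period is needed per pitch section")
    return [t_target - t for t, t_target in zip(stored_periods, target_periods)]


# Hypothetical accurate periods T stored at analysis time for one unit speech.
stored = {"ka": [50, 51, 49]}
print(pitch_change_amounts(stored["ka"], [60, 60, 55]))  # [10, 9, 6]
```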

Furthermore, the pitch sections may be divided so that the length of each divided section equals the accurate pitch period T.

The stored spectral parameters are not limited to those of the embodiments. Other spectral parameters can be used, such as formants, ARMA, PSE, LSP, PARCOR, mel cepstrum, generalized cepstrum, and mel generalized cepstrum.

Although LPC coefficients are stored in the spectral parameter storage circuit 580 as the spectral parameters, the cepstrum or improved cepstrum may be stored directly and used for synthesis.

When the pitch period is changed greatly, the spectral envelope of the synthesized speech may be deformed or distorted compared with the spectral envelope before the change. A correction filter that corrects the spectral envelope may therefore be connected after the synthesis filter 600. The correction filter can be configured as disclosed in References 1 and 2.

In addition, for each unit speech, the correction spectral parameters of the correction filter may be held as a codebook indexed by the pitch change amount, or the changes in the spectral parameters themselves may be held in advance as a codebook or table from which the optimal change in the spectral parameters is looked up. The former simplifies the calculation of the correction filter, and the latter makes that calculation unnecessary.
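The codebook idea can be pictured as a table lookup keyed by unit speech and pitch change amount. The sketch below is purely illustrative: the table layout, the nearest-entry selection rule, and every numeric value are placeholders, not data from the patent.

```python
def lookup_correction(codebook, unit, d):
    """Return the correction parameters stored for the pitch change amount nearest to d."""
    entries = codebook[unit]
    nearest = min(entries, key=lambda stored_d: abs(stored_d - d))
    return entries[nearest]


# Placeholder codebook: unit speech -> {pitch change amount: correction parameters}.
codebook = {
    "ka": {-10: [0.10, -0.02], 0: [0.0, 0.0], 10: [-0.08, 0.03]},
}
print(lookup_correction(codebook, "ka", 7))  # [-0.08, 0.03]
```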

The amplitude control circuit 570 may be omitted for simplicity.

In the embodiments, the prosody control information is input through terminal 500. Alternatively, accent information and intonation information may be input, and the prosody control information may be generated from them by rule.

(Effects of the Invention) As described above, the present invention holds a residual signal and spectral parameters for the entire section of each unit speech, and when changing the pitch period of the residual signal, it obtains the pitch change amount from the cross-correlation function between the residual signals, or between the synthesized speech signals, of adjacent pitch sections. Compared with the conventional method, this has the significant effect of yielding good synthesized speech with almost no deterioration in sound quality when the pitch is changed.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the first invention, FIG. 2 is a block diagram showing an embodiment of the second invention, FIG. 3 is a block diagram showing the operation of the present invention, and FIG. 4 is a diagram showing the residual waveforms of pitch sections in a voiced section.

200, 550: sound source signal storage circuit
210, 580: spectral parameter storage circuit
220, 555, 610: pitch change amount calculation circuit
230, 560: pitch control circuit
240, 590: time length control circuit
250, 600: synthesis filter
570: amplitude control circuit

Claims

[Claims]

(1) A speech synthesis method that stores a residual signal representing the sound source of an entire unit speech and spectral parameters representing the spectral envelope, changes the prosody of the residual signal according to input unit speech connection information and prosody information and connects the result, and obtains synthesized speech by driving a synthesis filter configured from the spectral parameters, characterized in that a delay time giving the maximum value of the cross-correlation function between the residual signals of adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized by changing the pitch period of the residual signal by the pitch period change amount.

(2) A speech synthesis method that stores a residual signal representing the sound source of an entire unit speech and spectral parameters representing the spectral envelope, changes the prosody of the residual signal according to input unit speech connection information and prosody information and connects the result, and obtains synthesized speech by driving a synthesis filter configured from the spectral parameters, characterized in that the synthesis filter is driven by the residual signals of adjacent pitch sections to obtain synthesized speech, a delay time giving the maximum value of the cross-correlation function between the synthesized speech of the adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized by changing the pitch period of the residual signal by the pitch period change amount.