JPH0380300A - Voice synthesizing system - Google Patents

Voice synthesizing system

Info

Publication number
JPH0380300A
Authority
JP
Japan
Prior art keywords
pitch
residual signal
pitch period
speech
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1216560A
Other languages
Japanese (ja)
Other versions
JP2600384B2 (en)
Inventor
Kazunori Ozawa
一範 小澤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP1216560A
Publication of JPH0380300A
Application granted
Publication of JP2600384B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Filters That Use Time-Delay Elements (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

PURPOSE: To prevent deterioration in sound quality even when the pitch period at synthesis differs greatly from the pitch period of the unit speech, by determining the pitch period change amount from a delay time and prosody information and synthesizing speech with the pitch period of the residual signal changed by that amount.
CONSTITUTION: Unit speech connection information is input from terminal 100, and prosody information is input from terminal 150. The unit speech connection information is supplied to a sound source signal storage unit 200 and a spectral parameter storage unit 210, and the prosody information is supplied to a pitch control unit 230 and a duration control unit 240. A pitch change amount calculation unit 220 calculates the accurate pitch period and the delay time for the residual waveforms of adjacent pitch sections and determines the pitch change amount in the voiced sections of the residual signal. The pitch control unit 230 changes the pitch period of the residual signal by that pitch change amount in the voiced sections. Synthesized speech whose sound quality hardly deteriorates when the pitch is changed is obtained in this way.

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a speech synthesis method, and in particular to one that changes the prosody of the residual signals of unit speech according to unit speech connection information and prosody information, concatenates them, and drives a synthesis filter configured from the spectral parameters of the unit speech to obtain synthesized speech.

(Prior Art) A residual-control speech synthesis method is known as a rule-based synthesis method that yields relatively good sound quality. In this method, a residual signal representing the sound source signal and spectral parameters representing the spectral envelope are obtained in advance by analyzing each unit speech (for example, CV or VC) over its entire duration, and are stored. Using the input unit speech connection information and the input prosody information (pitch period, amplitude, duration, and so on), the prosody (pitch period, amplitude, duration) of the residual signal of the corresponding unit speech is controlled to the desired values, and the residual signals are concatenated. The concatenated residual signal is then passed through a synthesis filter configured from the corresponding spectral parameters to synthesize speech.

Details of this method are described in, for example, the specification of Japanese Patent Application No. 63-136969 (Reference 1), Japanese Patent Application No. 63-133478 (Reference 2), and Iwata et al., "A study of a speech synthesis system using residual control" (Proceedings of the Acoustical Society of Japan, 3-2-7, October 1988) (Reference 3).

According to this method, the residual signal obtained by analyzing the unit speech in advance is used as the sound source signal over the entire duration of the unit speech. The quality of the synthesized speech is therefore markedly better than with methods that use an impulse train in voiced sections and a noise signal in unvoiced sections as the sound source signal.

(Problem to be Solved by the Invention) However, the residual-control speech synthesis method described above yields synthesized speech of good quality only when the pitch period of the residual signal is changed over a small range at synthesis; when the pitch period is changed greatly, the sound quality deteriorates.

In the residual-control speech synthesis method, the pitch period is changed as follows. In voiced sections, the pitch period of the residual signal obtained from the unit speech is calculated in advance, and the residual signal is divided in advance into pitch sections whose length equals that pitch period. Next, the pitch change amount of the residual signal is calculated from the desired pitch period given by the input prosody information and the section length or pitch period, and the pitch period of the residual signal is changed by that pitch change amount.

However, when the residual signal is divided into pitch sections in advance, pitch period extraction errors of several percent usually occur. These cause no problem when the pitch period of the unit speech is left unchanged or is changed only slightly. When the pitch period at synthesis is changed greatly relative to that of the unit speech, however, the extraction errors of several percent accumulate, and the pitch period after the change fluctuates around the pitch period specified by the prosody information. This fluctuation degraded the sound quality.

Furthermore, when the signal is divided into pitch sections, it is difficult to align the phase of the peak position of the residual pitch waveform in every pitch section. When speech was synthesized with a greatly changed pitch period, the sound quality was therefore also degraded by phase shifts in the pitch waveform of the synthesized speech.

An object of the present invention is to provide a speech synthesis method in which the sound quality does not deteriorate even when the pitch period at synthesis is changed greatly relative to the pitch period of the unit speech.

(Means for Solving the Problem) The speech synthesis method according to the present invention stores a residual signal representing the sound source of an entire unit speech and spectral parameters representing its spectral envelope, changes the prosody of the residual signal according to unit speech connection information and prosody information and concatenates the result, and drives a synthesis filter configured from the spectral parameters to obtain synthesized speech. In this method, the delay time that maximizes the cross-correlation function between the residual signals of adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and the pitch period of the residual signal is changed by that pitch period change amount.

Alternatively, the synthesis filter is driven by the residual signals of adjacent pitch sections to obtain synthesized speech, the delay time that maximizes the cross-correlation function between the synthesized speech of the adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized with the pitch period of the residual signal changed by that pitch period change amount.

(Operation) The operation of the first invention is described with reference to FIG. 3.

Here, the unit speech is assumed to consist of, for example, 300 to 400 kinds of Japanese CV and VC units.

In voiced sections, the pitch period is extracted from the unit speech in advance, and boundaries are placed in the unit speech for each pitch section whose length equals the pitch period. The pitch can be extracted from the autocorrelation of the speech signal or by other known methods. The pitch division boundaries can be determined with, for example, the technique proposed in Japanese Patent Application No. 62-210690 (Reference 4).
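As an illustration of autocorrelation-based pitch extraction, the following Python sketch picks the lag with the largest autocorrelation inside a search range. The function names, the search-range parameters, and the mean-removal step are assumptions made for this example; the specification leaves the extraction method open.

```python
def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at the given positive lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the autocorrelation."""
    if not 1 <= min_lag <= max_lag < len(signal):
        raise ValueError("need 1 <= min_lag <= max_lag < len(signal)")
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]          # remove DC so it does not dominate
    lags = range(min_lag, max_lag + 1)
    return max(lags, key=lambda lag: autocorrelation(x, lag))


# Usage: a pulse train with one pulse every 50 samples.
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
period = estimate_pitch_period(pulses, min_lag=20, max_lag=80)
print(period)  # 50
```

A practical extractor would also need a voiced/unvoiced decision and some protection against octave errors, which this sketch omits.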

In voiced sections, the unit speech is analyzed for each pitch section to obtain the spectral parameters representing the spectral envelope of the speech signal and the residual signal. In unvoiced sections, the analysis is performed for each predetermined fixed interval (for example, 5 ms). As in References 2 and 3, improved cepstrum analysis is used, although other known good spectral analysis methods can also be used. As in References 1, 2, and 3, the residual signal is obtained for the entire duration of the unit speech.

This processing is performed in advance. The sound source signal storage unit 200 stores, for each unit speech, the residual signal over the entire duration of that unit speech. The spectral parameter storage unit 210 stores the spectral parameters obtained for each pitch section in voiced sections and for each predetermined fixed interval in unvoiced sections.

Next, unit speech connection information is input from terminal 100, and prosody information (pitch period, phoneme duration, amplitude) is input from terminal 150. The unit speech connection information is supplied to the sound source signal storage unit 200 and the spectral parameter storage unit 210, and the prosody information is supplied to the pitch control unit 230 and the duration control unit 240.

The pitch change amount calculation unit 220 calculates, in the voiced sections of the residual signal, the accurate pitch period $T$ and the delay time $\tau_{\max}$ for the residual waveforms of adjacent pitch sections, and determines the pitch change amount. As shown in FIG. 4, let $L_1$ and $e_1(n)$ be the section length and residual waveform of divided pitch section 1, and let $L_2$ and $e_2(n)$ be the section length and residual waveform of pitch section 2. The cross-correlation function of $e_1(n)$ and $e_2(n)$ is calculated by the following equation.

$$\Phi(\tau) = \sum_{n} e_1(n + L_1 + \tau)\, e_2(n) \qquad (1)$$

Equation (1) is calculated for various delay times, and the delay time that maximizes it is denoted $\tau_{\max}$. The accurate pitch period $T$ between the adjacent pitch sections is then given by the following equation.

$$T = L_1 + \tau_{\max} \qquad (2)$$

Therefore, to change the pitch period $T$ of the unit speech to the pitch period $T'$ to be synthesized, which is input as prosody information, the pitch change amount $D$ is obtained from the following equation.

$$D = T' - L_1 - \tau_{\max} = T' - T \qquad (3)$$

The pitch change amount $D$ thus obtained is output to the pitch control unit 230.
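The calculation in equations (1) to (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the specification: both waveforms are read from one continuous residual array, the second waveform is taken L1 + tau samples after the first, and the names `cross_correlation`, `pitch_change_amount`, `start`, and `max_lag` are hypothetical.

```python
def cross_correlation(residual, start, L1, tau):
    """Phi(tau): pitch section 1 against the waveform L1 + tau samples later."""
    offset = L1 + tau
    return sum(residual[start + n] * residual[start + n + offset]
               for n in range(L1))


def pitch_change_amount(residual, start, L1, T_target, max_lag):
    """Return (tau_max, T, D) for the pitch section that begins at `start`."""
    # Keep only delays for which every index stays inside the array.
    candidates = [tau for tau in range(-max_lag, max_lag + 1)
                  if L1 + tau > 0 and start + 2 * L1 + tau <= len(residual)]
    if not candidates:
        raise ValueError("residual too short for the requested search range")
    tau_max = max(candidates,
                  key=lambda tau: cross_correlation(residual, start, L1, tau))
    T = L1 + tau_max          # equation (2)
    D = T_target - T          # equation (3)
    return tau_max, T, D


# Usage: the true period is 83 samples, but the stored section length is 80.
residual = [1.0 if n % 83 == 0 else 0.0 for n in range(500)]
tau_max, T, D = pitch_change_amount(residual, start=0, L1=80,
                                    T_target=100, max_lag=10)
print(tau_max, T, D)  # 3 83 17
```

In this example the stored section length underestimates the true period by 3 samples; the cross-correlation peak recovers that error before D is computed, so the error does not accumulate over successive pitch sections.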

The pitch control unit 230 uses the pitch change amount $D$ to change the pitch period of the residual signal by $D$ in the voiced sections of the residual signal. Specifically, as in References 1 and 2, to lengthen the pitch period by $D$, $D$ zero samples are appended at the end of the pitch section; to shorten the pitch period by $D$, $D$ samples of the residual signal are cut off from the end of the pitch section. Other known methods of changing the pitch period can of course also be used.
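The zero-padding and truncation described above can be sketched as a single function. The name `change_pitch_section` and the use of a signed D (positive to lengthen, negative to shorten) are assumptions made for this example.

```python
def change_pitch_section(section, D):
    """Lengthen (D > 0) or shorten (D < 0) one pitch section by |D| samples."""
    section = list(section)
    if D >= 0:
        return section + [0.0] * D        # append D zeros at the end
    if -D >= len(section):
        raise ValueError("cannot remove the whole pitch section")
    return section[:D]                    # drop the last |D| samples


# Usage
longer = change_pitch_section([1.0, 2.0, 3.0, 4.0], 2)
shorter = change_pitch_section([1.0, 2.0, 3.0, 4.0], -1)
print(longer)   # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
print(shorter)  # [1.0, 2.0, 3.0]
```

Editing only the tail of the section leaves the main excitation pulse at the start of the section untouched, which is presumably why the end of the section is chosen.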

The duration control unit 240 uses the duration given in the input prosody information to control the duration of the phonemes obtained by concatenating the unit speech. For details, see the duration control unit in References 1 and 2.

The synthesis filter 250 receives the residual signal in which the unit speech has been concatenated and the prosody (pitch period and duration) has been controlled, synthesizes speech from it with the spectral parameters according to equation (4), and outputs the result from terminal 260. For ease of control, the spectral parameters used here are linear prediction coefficients $a_i$ converted from the improved cepstrum.
A modified gepnutrum converted into a linear prediction coefficient a1 is used.

For the conversion from the improved cepstrum to linear prediction coefficients, see, for example, Reference 2.

Next, in the second invention, synthesized speech is first obtained, and the pitch change amount $D$ is determined at the level of the synthesized speech signal.

The procedure is as follows. In FIG. 4, let $x_1(n)$ be the synthesized speech in pitch section 1 and $x_2(n)$ the synthesized speech in pitch section 2. These are obtained by passing the residual signals of the pitch sections once through the synthesis filter 250. The cross-correlation function is then calculated according to the following equation.

$$\Phi(\tau) = \sum_{n} x_1(n + L_1 + \tau)\, x_2(n) \qquad (5)$$

The value of $\tau$ that maximizes equation (5) is taken as $\tau_{\max}$, and the accurate pitch period $T$ and the pitch change amount $D$ are obtained from equations (2) and (3).
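The second invention's variant can be sketched in the same way, with the cross-correlation of equation (5) taken on the synthesized speech rather than on the residual. The sketch below makes several assumptions that are not in the specification: the filter is a simple all-pole recursion with hypothetical coefficients `a`, the filter runs continuously over both adjacent pitch sections, and the names `synthesize` and `delay_from_speech` are illustrative.

```python
def synthesize(residual, a):
    """All-pole synthesis: x(n) = e(n) + sum_i a[i-1] * x(n - i)."""
    x = []
    for n, e in enumerate(residual):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * x[n - i]
        x.append(acc)
    return x


def delay_from_speech(residual, a, start, L1, max_lag):
    """Return tau_max found on the synthesized speech, as in equation (5)."""
    speech = synthesize(residual[start:start + 2 * L1 + max_lag], a)

    def phi(tau):
        return sum(speech[n] * speech[n + L1 + tau] for n in range(L1))

    candidates = [tau for tau in range(-max_lag, max_lag + 1)
                  if L1 + tau > 0 and 2 * L1 + tau <= len(speech)]
    if not candidates:
        raise ValueError("residual too short for the requested search range")
    return max(candidates, key=phi)


# Usage: true period 83 samples, stored section length 80, one-tap filter.
residual = [1.0 if n % 83 == 0 else 0.0 for n in range(500)]
tau_max = delay_from_speech(residual, a=[0.5], start=0, L1=80, max_lag=10)
T = 80 + tau_max              # equation (2)
print(tau_max, T)  # 3 83
```

Because the filter spreads each excitation pulse over many samples, the correlation peak on the synthesized speech is broader than on the residual, but its maximum still falls on the true period in this example.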

(Embodiments) FIG. 1 is a block diagram showing an embodiment of the first invention.

The control circuit 510 receives prosody control information (pitch, duration, amplitude) and unit speech connection information from terminal 500 and outputs them to the sound source storage circuit 550, the spectral parameter storage circuit 580, the pitch change amount calculation circuit 555, the amplitude control circuit 570, and the duration control circuit 590.

The sound source storage circuit 550 receives the unit speech connection information and outputs the prediction residual signal corresponding to that unit speech.

The pitch change amount calculation circuit 555 receives the pitch period $T'$ for synthesis from the prosody control information. In the voiced sections of the residual signal, it calculates the accurate pitch period $T$ and the delay time $\tau_{\max}$ for the residual waveforms of adjacent pitch sections and determines the pitch change amount $D$, which can be calculated according to equations (1) to (3). The resulting $D$ is output to the pitch control circuit 560.

The pitch control circuit 560 receives the pitch change amount $D$ and changes the pitch period of the residual signal in voiced sections, using the pitch division positions specified in advance. The pitch period can be changed by the method described in the Operation section or by other known methods.

The duration control circuit 590 receives duration information from the control circuit 510 and controls the duration so that the phonemes obtained by concatenating the unit speech have the desired duration. For details, see the duration control circuit in References 1 and 2.

Next, the amplitude control circuit 570 receives amplitude control information, controls the amplitude of the residual signal accordingly, and outputs $e(n)$.

The spectral parameter storage circuit 580 receives the unit speech connection information and outputs the spectral parameter sequence corresponding to that unit speech. As in the Operation section, the LPC coefficients $a_i$ obtained by conversion from cepstrum coefficients are used here as the spectral parameters, although other known parameters can also be used.

The synthesis filter circuit 600 receives the residual signal whose pitch period has been changed and calculates the synthesized speech $x(n)$ with the coefficients $a_i$ according to the following equation.

$$x(n) = e(n) + \sum_{i} a_i\, x(n - i) \qquad (6)$$

This concludes the description of the embodiment of the first invention.
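Equation (6) is a direct recursion and can be sketched in a few lines of Python. The function name `synthesis_filter`, the zero initial state, and the example coefficients are assumptions for illustration; the specification does not give the filter order or the coefficient values.

```python
def synthesis_filter(e, a):
    """Compute x(n) = e(n) + sum_i a[i-1] * x(n - i), starting from zero state."""
    x = []
    for n, sample in enumerate(e):
        acc = sample
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:                # samples before n = 0 are taken as zero
                acc += coeff * x[n - i]
        x.append(acc)
    return x


# Usage: impulse responses of a one-tap and a two-tap filter.
one_tap = synthesis_filter([1.0, 0.0, 0.0, 0.0], [0.5])
two_tap = synthesis_filter([1.0, 0.0, 0.0], [1.0, -0.5])
print(one_tap)  # [1.0, 0.5, 0.25, 0.125]
print(two_tap)  # [1.0, 1.0, 0.5]
```

In an actual synthesizer the coefficients would change at every pitch section (voiced) or fixed interval (unvoiced), and the filter state would be carried across those boundaries; this sketch keeps one fixed coefficient set.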

FIG. 2 is a block diagram showing an embodiment of the second invention. In FIG. 2, components bearing the same numbers as in FIG. 1 operate in the same way as in FIG. 1, so their description is omitted here.

In FIG. 2, the pitch change amount calculation circuit 610 first drives the synthesis filter with the residual signal, in the voiced sections of the residual signal output from the sound source storage circuit, to obtain a synthesized speech signal. The coefficients of the synthesis filter are read from the spectral parameter storage circuit 580.

Then, for the synthesized speech of adjacent pitch sections, it obtains the delay time $\tau_{\max}$ according to equation (5), obtains the pitch change amount $D$ according to equation (3), and outputs $D$ to the pitch control circuit 560.

The embodiments above are merely one configuration of the present invention, and various modifications are possible.

In these embodiments, the prediction residual signal obtained by predictive analysis is used as the sound source signal over the entire duration of the unit speech. To reduce computation and memory, however, in voiced sections, and in vowel sections in particular, the prediction residual signal of one representative pitch section may be used repeatedly while its amplitude and pitch are controlled.

The sound source signal is not limited to the prediction residual signal obtained by predictive analysis; other good sound source signals, such as a zero-phase signal, a phase-equalized signal, or a multipulse sound source, can also be used.

Also, instead of calculating the accurate pitch period $T$ at synthesis time as in the embodiments above, $T$ may be calculated in advance from equations (1), (2), and (5) when the unit speech is analyzed, and stored. At synthesis time, the pitch change amount $D$ is then calculated from equation (3) to control the pitch.

Furthermore, when the signal is divided into pitch sections, the division may be made so that each section length equals the accurate pitch period $T$.

The stored spectral parameters are not limited to those of the embodiments; other spectral parameters, such as formants, ARMA, PSE, LSP, PARCOR, mel cepstrum, generalized cepstrum, or mel generalized cepstrum, can also be used.

Although LPC coefficients are stored in the spectral parameter storage circuit 580 as the spectral parameters, the cepstrum or the improved cepstrum may instead be stored directly and used for synthesis.

Furthermore, when the pitch period is changed greatly, the spectral envelope of the synthesized speech may be deformed or distorted compared with the spectral envelope before the change. A correction filter that corrects the spectral envelope may therefore be connected after the synthesis filter 600. The configuration disclosed in References 1 and 2 can be used for the correction filter.

In addition, for each unit speech, the correction spectral parameters of the correction filter may be held as a codebook indexed by the amount of pitch change, or the changes in the spectral parameters themselves may be held in advance as a codebook or table so that the optimum change in the spectral parameters can be looked up. The former simplifies the calculation of the correction filter, and the latter makes that calculation unnecessary.

Furthermore, the amplitude control circuit 570 may be omitted for simplicity.

In these embodiments, the prosody control information is input through terminal 500. Alternatively, accent information and intonation information may be input, and the prosody control information may be generated from them by rule.

(Effects of the Invention) As described above, according to the present invention, a residual signal and spectral parameters are held for the entire duration of each unit speech, and when the pitch period of the residual signal is changed, the pitch change amount is obtained by calculating the cross-correlation function between the residual signals, or between the synthesized speech signals, of adjacent pitch sections. Compared with conventional methods, this has the significant effect that good synthesized speech is obtained with almost no deterioration in sound quality when the pitch is changed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the first invention, FIG. 2 is a block diagram showing an embodiment of the second invention, FIG. 3 is a block diagram showing the operation of the present invention, and FIG. 4 is a diagram showing the residual waveforms of pitch sections in a voiced section.

200, 550: sound source signal storage circuit
210, 580: spectral parameter storage circuit
220, 555, 610: pitch change amount calculation circuit
230, 560: pitch control circuit
240, 590: duration control circuit
250, 600: synthesis filter
570: amplitude control circuit

Claims (2)

[Claims]

(1) A speech synthesis method that stores a residual signal representing the sound source of an entire unit speech and spectral parameters representing its spectral envelope, changes the prosody of the residual signal according to input unit speech connection information and prosody information and concatenates the result, and drives a synthesis filter configured from the spectral parameters to obtain synthesized speech, characterized in that the delay time that maximizes the cross-correlation function between the residual signals of adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized with the pitch period of the residual signal changed by the pitch period change amount.

(2) A speech synthesis method that stores a residual signal representing the sound source of an entire unit speech and spectral parameters representing its spectral envelope, changes the prosody of the residual signal according to input unit speech connection information and prosody information and concatenates the result, and drives a synthesis filter configured from the spectral parameters to obtain synthesized speech, characterized in that the synthesis filter is driven by the residual signals of adjacent pitch sections to obtain synthesized speech, the delay time that maximizes the cross-correlation function between the synthesized speech of the adjacent pitch sections is determined, a pitch period change amount is determined from the delay time and the prosody information, and speech is synthesized with the pitch period of the residual signal changed by the pitch period change amount.
JP1216560A 1989-08-23 1989-08-23 Voice synthesis method Expired - Lifetime JP2600384B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1216560A JP2600384B2 (en) 1989-08-23 1989-08-23 Voice synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1216560A JP2600384B2 (en) 1989-08-23 1989-08-23 Voice synthesis method

Publications (2)

Publication Number Publication Date
JPH0380300A (en) 1991-04-05
JP2600384B2 (en) 1997-04-16

Family

ID=16690346

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1216560A Expired - Lifetime JP2600384B2 (en) 1989-08-23 1989-08-23 Voice synthesis method

Country Status (1)

Country Link
JP (1) JP2600384B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04370810A (en) * 1991-06-19 1992-12-24 Daikin Ind Ltd Temperature control method for electric carpet
WO2006114964A1 (en) * 2005-04-22 2006-11-02 Kyushu Institute Of Technology Pitch period equalizing apparatus, pitch period equalizing method, sound encoding apparatus, sound decoding apparatus, and sound encoding method
US7630883B2 (en) 2001-08-31 2009-12-08 Kabushiki Kaisha Kenwood Apparatus and method for creating pitch wave signals and apparatus and method compressing, expanding and synthesizing speech signals using these pitch wave signals
US8346546B2 (en) * 2006-08-15 2013-01-01 Broadcom Corporation Packet loss concealment based on forced waveform alignment after packet loss

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
USD946629S1 (en) 2020-11-24 2022-03-22 Aquastar Pool Products, Inc. Centrifugal pump
US11193504B1 (en) 2020-11-24 2021-12-07 Aquastar Pool Products, Inc. Centrifugal pump having a housing and a volute casing wherein the volute casing has a tear-drop shaped inner wall defined by a circular body region and a converging apex with the inner wall comprising a blocker below at least one perimeter end of one diffuser blade
USD986289S1 (en) 2020-11-24 2023-05-16 Aquastar Pool Products, Inc. Centrifugal pump

Cited By (8)

Publication number Priority date Publication date Assignee Title
JPH04370810A (en) * 1991-06-19 1992-12-24 Daikin Ind Ltd Temperature control method for electric carpet
US7630883B2 (en) 2001-08-31 2009-12-08 Kabushiki Kaisha Kenwood Apparatus and method for creating pitch wave signals and apparatus and method compressing, expanding and synthesizing speech signals using these pitch wave signals
US7647226B2 (en) 2001-08-31 2010-01-12 Kabushiki Kaisha Kenwood Apparatus and method for creating pitch wave signals, apparatus and method for compressing, expanding, and synthesizing speech signals using these pitch wave signals and text-to-speech conversion using unit pitch wave signals
WO2006114964A1 (en) * 2005-04-22 2006-11-02 Kyushu Institute Of Technology Pitch period equalizing apparatus, pitch period equalizing method, sound encoding apparatus, sound decoding apparatus, and sound encoding method
JP2006301464A (en) * 2005-04-22 2006-11-02 Kyushu Institute Of Technology Device and method for pitch cycle equalization, and audio encoding device, audio decoding device, and audio encoding method
JP4599558B2 (en) * 2005-04-22 2010-12-15 国立大学法人九州工業大学 Pitch period equalizing apparatus, pitch period equalizing method, speech encoding apparatus, speech decoding apparatus, and speech encoding method
US7957958B2 (en) 2005-04-22 2011-06-07 Kyushu Institute Of Technology Pitch period equalizing apparatus and pitch period equalizing method, and speech coding apparatus, speech decoding apparatus, and speech coding method
US8346546B2 (en) * 2006-08-15 2013-01-01 Broadcom Corporation Packet loss concealment based on forced waveform alignment after packet loss

Also Published As

Publication number Publication date
JP2600384B2 (en) 1997-04-16

Similar Documents

Publication Title
JP6024191B2 (en) Speech synthesis apparatus and speech synthesis method
KR940002854B1 (en) Sound synthesizing system
US5682502A (en) Syllable-beat-point synchronized rule-based speech synthesis from coded utterance-speed-independent phoneme combination parameters
JP3294604B2 (en) Processor for speech synthesis by adding and superimposing waveforms
JP4469883B2 (en) Speech synthesis method and apparatus
US8195464B2 (en) Speech processing apparatus and program
JPH031200A (en) Regulation type voice synthesizing device
US5381514A (en) Speech synthesizer and method for synthesizing speech for superposing and adding a waveform onto a waveform obtained by delaying a previously obtained waveform
JPH0380300A (en) Voice synthesizing system
JP2002358090A (en) Speech synthesizing method, speech synthesizer and recording medium
JP4225128B2 (en) Regular speech synthesis apparatus and regular speech synthesis method
JPH0247700A (en) Speech synthesizing method
JP4451665B2 (en) How to synthesize speech
EP1093111B1 (en) Amplitude control for speech synthesis
JPH10124082A (en) Singing voice synthesizing device
Fries Hybrid time- and frequency-domain speech synthesis with extended glottal source generation
JP3089940B2 (en) Speech synthesizer
JPH09510554A (en) Language synthesis
EP1505570A1 (en) Singing voice synthesizing method
JP6047952B2 (en) Speech synthesis apparatus and speech synthesis method
JPH01304500A (en) System and device for speech synthesis
JP3284634B2 (en) Rule speech synthesizer
JP3063088B2 (en) Speech analysis and synthesis device, speech analysis device and speech synthesis device
JPH056191A (en) Voice synthesizing device
JP2001312300A (en) Voice synthesizing device