JP2703253B2

JP2703253B2 - Speech synthesizer

Info

Publication number: JP2703253B2
Application number: JP63040627A
Authority: JP
Inventors: 成利斉藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-02-25
Filing date: 1988-02-25
Publication date: 1998-01-26
Anticipated expiration: 2013-01-26
Also published as: JPH01216399A

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Field of Application) The present invention relates to a speech synthesis device and a speech synthesis method that generate highly natural synthesized speech.

(Prior Art) Various speech synthesis techniques have been proposed, one of which is rule-based synthesis. Rule-based synthesis analyzes an arbitrary input character string to obtain its phoneme information and prosody information and, applying predetermined rules to that information, synthesizes and outputs the speech represented by the input character string. Rule-based synthesis makes it easy to generate synthesized speech for any word or phrase, and the resulting speech is of high quality in terms of intelligibility. However, to obtain good synthesized speech for words and sentences that contain vowels subject to devoicing, as in "ki, ku, shi, su, chi, tsu, pi, pu, hi, fu", the speech power in the speech unit parameter file for the devoiced vowels must be set appropriately. This setting is difficult, and creating the speech unit parameter file takes considerable effort because the energy balance of the speech has to be adjusted and clarity taken into account. For a rule-based synthesizer that outputs only female synthesized speech, it suffices to repeat listening tests and create one speech unit parameter file for devoiced vowels. A speech synthesizer that holds several kinds of speech unit parameter files at once and can output several kinds of synthesized speech (for example, three kinds: female voice, male voice, and child's voice), however, needs a devoiced-vowel speech unit parameter file for each kind of synthesized speech, which has required a great deal of labor.

(Problems to Be Solved by the Invention) Thus, to develop a speech synthesizer that uses the conventional rule-based synthesis method and can output several kinds of synthesized speech, a speech unit parameter file has to be created for each kind. These files are difficult to create because of the speech power settings and clarity issues; for syllables with devoiced vowels in particular, they have to be created through repeated listening tests, which takes a great deal of labor. The present invention removes this drawback. Its object is to provide a speech synthesis device and a speech synthesis method that can synthesize several kinds of voices without impairing the smoothness and ease of listening of the rule-synthesized speech, and that reduce the labor of creating the speech unit parameter files for syllables with devoiced vowels.

[Constitution of the Invention] (Means for Solving the Problems) The present invention comprises: character string analysis means for analyzing an input character string to obtain a phoneme symbol string and prosody information; speech unit parameter storage means for storing devoiced-vowel speech unit parameters for at least one of the kinds of synthesized speech, and for storing speech unit parameters other than those for devoiced vowels in a plurality of kinds matching the kinds of synthesized speech; speech parameter sequence generation means for generating a speech parameter sequence from the phoneme symbol string obtained by the character string analysis means, by referring to the speech unit parameter storage means on the basis of information indicating the kind of synthesized speech; prosody parameter sequence generation means for generating a prosody parameter sequence on the basis of the prosody information obtained by the character string analysis means and of a basic pitch and sound source rule determined from the information indicating the kind of synthesized speech; and speech synthesis means for outputting synthesized speech on the basis of the speech parameter sequence and the prosody parameter sequence.

The present invention also consists of a process that: analyzes an input character string to obtain a phoneme symbol string and prosody information; takes out speech unit parameters of a predetermined specific kind when a syllable in the obtained phoneme symbol string is a devoiced vowel, and takes out speech unit parameters of the kind designated from among a plurality of kinds of synthesized speech when the syllable is not a devoiced vowel, thereby generating a speech parameter sequence; generates a prosody parameter sequence in accordance with the obtained prosody information and with a basic pitch and sound source rule based on the designated kind of synthesized speech; and outputs synthesized speech on the basis of the speech parameter sequence and the prosody parameter sequence.

(Operation) In the speech synthesis device and speech synthesis method of the present invention, the speech unit parameter file is prepared in advance so that, for syllables other than devoiced vowels, speech unit parameters are registered for each kind of voice to be synthesized (male voice, female voice, child's voice, and so on), while for syllables with devoiced vowels a single kind of speech unit parameters is registered. The speech parameter sequence generation means creates the speech parameter sequence by referring to that single kind of devoiced-vowel speech unit parameters in common, regardless of the kind of voice being synthesized.
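The lookup behaviour described in this passage can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SpeechUnitStore`, the `_dv` suffix marking a devoiced-vowel syllable, and the numeric parameter vectors are all hypothetical and do not come from the patent, which does not specify a storage layout.

```python
from typing import Dict, List

# Hypothetical notation: a trailing "_dv" marks a syllable whose vowel is devoiced.
# The parameter vectors are placeholders, not real cepstrum or LPC data.
Params = List[float]


class SpeechUnitStore:
    """Per-voice parameters for ordinary syllables, one shared set for devoiced ones."""

    def __init__(self, per_voice: Dict[str, Dict[str, Params]],
                 shared_devoiced: Dict[str, Params]) -> None:
        self.per_voice = per_voice
        self.shared_devoiced = shared_devoiced

    def lookup(self, syllable: str, voice: str) -> Params:
        # Devoiced-vowel syllables ignore the selected voice entirely.
        if syllable in self.shared_devoiced:
            return self.shared_devoiced[syllable]
        return self.per_voice[voice][syllable]


store = SpeechUnitStore(
    per_voice={
        "female": {"so": [0.1, 0.2], "ki": [0.3, 0.4]},
        "male": {"so": [1.1, 1.2], "ki": [1.3, 1.4]},
        "child": {"so": [2.1, 2.2], "ki": [2.3, 2.4]},
    },
    shared_devoiced={"shi_dv": [9.0, 9.1]},
)
```

With this layout, `store.lookup("shi_dv", "male")` and `store.lookup("shi_dv", "child")` return the same object, whereas `store.lookup("so", ...)` depends on the selected voice.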

(Embodiment) An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the speech synthesis device of the present invention. Reference numeral 1 denotes a character string analysis unit that analyzes an input character string 100 to obtain a phoneme symbol string 200, prosody information 300, and selection information. The phoneme symbol string 200 obtained by the character string analysis unit 1 is input, together with the selection information, to a speech parameter sequence generator 2, where it is converted into a speech parameter sequence 400 by reference to a speech unit parameter file 3. The prosody information 300 obtained by the character string analysis unit 1 is supplied, together with the selection information, to a prosody parameter sequence generator 4, where it is converted into a prosody parameter sequence 500. The speech parameter sequence 400 produced by the speech parameter sequence generator 2 and the prosody parameter sequence 500 produced by the prosody parameter sequence generator 4 are input to a speech synthesizer 5, which, in accordance with these parameter sequences 400 and 500 and on the basis of predetermined synthesis rules, generates and outputs synthesized speech 600 corresponding to the input character string.

Next, the operation of this embodiment is described. A speech synthesis device that can output several kinds of synthesized speech, such as a female voice, a male voice, and a child's voice, needs a selection signal accompanying the input character string 100 that designates which kind of synthesized speech to produce. In this example, α, β, and γ serve as the selection signals designating the female voice, the male voice, and the child's voice, respectively. FIG. 2 is a block diagram showing a detailed example of the character string analysis unit 1 shown in FIG. 1. A discriminator 11 determines the kind of the selection signal and sends selection information, indicating which of the male, female, or child's synthesized speech output has been selected, to the speech parameter sequence generator 2 and the prosody parameter sequence generator 4. From the selection information, the speech parameter sequence generator 2 decides which of the male, female, and child's voice unit parameter files in the speech unit parameter file 3 to select, and the prosody parameter sequence generator 4 decides whether to set the basic pitch f₀ to that of the male voice, the female voice, or the child's voice, and decides the corresponding sound source rule. The discriminator 11 further treats any input signal (input character string) 100 other than the selection signals α, β, and γ as Japanese text to be synthesized and passes it to a text analysis unit 12. The text analysis unit 12 analyzes the text of the input signal using a language dictionary 13 to obtain the phoneme symbol string 200 and the prosody information 300, which are supplied to the speech parameter sequence generator 2 and the prosody parameter sequence generator 4. In accordance with the selection information received earlier from the discriminator 11, the speech parameter sequence generator 2 takes the speech unit parameters corresponding to the input phoneme symbol string out of the speech unit parameter file 3 and joins them with interpolation to generate the speech parameter sequence 400. The prosody parameter sequence generator 4 generates the prosody parameter sequence 500 according to the sound source generation rules, from the input prosody information and from the basic pitch f₀ and sound source rule determined by the selection information received earlier from the discriminator 11.
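As a rough illustration of how the selection information could steer the prosody side, the sketch below scales a voice-independent pitch pattern by a per-voice basic pitch. Everything concrete here is an assumption made for the example: the patent gives no numeric f₀ values, does not define the format of the prosody information, and its sound source rules are not modelled at all.

```python
from typing import Dict, List

# Hypothetical basic pitches in Hz; the patent gives no numeric values.
BASE_PITCH_HZ: Dict[str, float] = {"female": 220.0, "male": 110.0, "child": 300.0}


def prosody_parameters(voice: str, relative_pitch: List[float]) -> List[float]:
    """Scale a relative pitch pattern (assumed input format) by the voice's basic pitch."""
    f0 = BASE_PITCH_HZ[voice]
    return [f0 * r for r in relative_pitch]


# The same relative pattern yields a different contour for each selected voice.
male_contour = prosody_parameters("male", [1.0, 1.5, 0.5])
child_contour = prosody_parameters("child", [1.0, 1.5, 0.5])
```

The point of the sketch is only that one piece of selection information chooses the pitch basis while the text-derived prosody pattern stays the same across voices.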

Suppose, for example, that the selection signals and character strings input to the character string analysis unit 1 are "β Smooth speech is obtained. α Please select a function. γ Display the screen as well?". In this case, the selection signal β first causes the speech parameter sequence generator 2 to select the male voice unit parameter file in the speech unit parameter file 3, and the prosody parameter sequence generator 4 to select the male basic pitch f₀ and sound source generation rule. Meanwhile, the text analysis unit 12 analyzes "Smooth speech is obtained." using the language dictionary 13 and obtains the corresponding phoneme symbol string 200 and prosody information 300. The speech parameter sequence generator 2 and the prosody parameter sequence generator 4 then generate the speech parameter sequence 400 and the prosody parameter sequence 500 for this character string, and these parameter sequences are input to the speech synthesizer 5. The speech synthesizer 5 therefore outputs "Smooth speech is obtained." as rule-synthesized speech in the male voice. In the same way, the speech synthesizer 5 outputs "Please select a function." as rule-synthesized speech in the female voice, followed by "Display the screen as well?" as rule-synthesized speech in the child's voice.
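The role of the discriminator 11 in this example can be sketched as a simple scan that separates selection signals from the text to be synthesized. The mapping of α, β, and γ to female, male, and child follows the description; the function name `split_by_selection` and the default voice used for text that precedes any selection signal are assumptions, since the patent does not say what happens in that case.

```python
from typing import List, Tuple

# Mapping taken from the description: α = female, β = male, γ = child.
SELECTION = {"α": "female", "β": "male", "γ": "child"}


def split_by_selection(stream: str, default: str = "female") -> List[Tuple[str, str]]:
    """Split an input stream into (voice, text) pairs at each selection signal.

    `default` is a hypothetical fallback for text before the first signal.
    """
    segments: List[Tuple[str, str]] = []
    voice = default
    buf: List[str] = []
    for ch in stream:
        if ch in SELECTION:
            if buf:  # flush the text collected under the previous voice
                segments.append((voice, "".join(buf)))
                buf = []
            voice = SELECTION[ch]
        else:
            buf.append(ch)
    if buf:
        segments.append((voice, "".join(buf)))
    return segments


pairs = split_by_selection(
    "βSmooth speech is obtained.αPlease select a function.γDisplay the screen as well?"
)
```

Each resulting pair would then be handed to text analysis, with the voice label passed on as selection information to both parameter sequence generators.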

FIG. 3 is a schematic diagram showing configuration examples of the speech unit parameter file 3. The speech unit parameters may be cepstrum parameters, LPC parameters, PARCOR parameters, LSP parameters, formant parameters, and so on. The file shown in FIG. 3(A) consists of a female voice unit parameter file 31, a male voice unit parameter file 32, a child's voice unit parameter file 33, and devoiced-vowel unit parameters 50, and is intended for a rule-based speech synthesizer that can output female, male, and child's voices. For syllables other than devoiced vowels (the hundred Japanese syllables and loanword syllables), the speech unit parameter file 3 registers parameters obtained by analyzing natural speech of female, male, and child's voices for each syllable. For syllables with devoiced vowels, parameters created from the natural speech of a female announcer, with the energy balance and clarity of the speech taken into account, are registered in the speech unit parameter file 3 as the common devoiced-vowel speech unit parameters 50. The devoiced-vowel speech unit parameters could instead be created from the voice of a male announcer or a child, but in this example only one kind is registered.

Here, when the speech synthesis device shown in FIG. 1 outputs [...] as male synthesized speech, the speech parameter sequence generator 2 takes the male voice unit parameters 32 from the speech unit parameter file 3 shown in FIG. 3(A) for "so" and "ki", takes the devoiced-vowel unit parameters 50, which are the common data for devoiced vowels, for [...], and joins them with interpolation to generate the corresponding speech parameter sequence 400. The same applies when female or child's synthesized speech is generated. FIG. 3(B) shows an example in which the speech unit parameter file 3 holds devoiced-vowel speech unit parameters in each of the male, female, and child's voice unit files; because the same data is registered in all three sets of devoiced-vowel unit parameters, this is substantially the same as FIG. 3(A). When, as in FIG. 3(A), the devoiced-vowel unit parameters 50 obtained by analyzing devoiced vowels uttered by a female announcer are shared, however, the devoiced-vowel data for the male and child's voices need not be kept in memory, and that much memory is saved. Among the devoiced vowels, [...] are particularly difficult to create in terms of clarity and energy balance. In this example, therefore, [...] are created as common data in the devoiced-vowel speech unit parameters 50, while for the remaining devoiced-vowel syllables [...] the natural speech of each voice is registered in the respective parameter file.
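The "joining with interpolation" step mentioned above is not detailed in the patent. One plausible reading, shown purely as an assumption, is to concatenate the frames of successive speech units and bridge each boundary with a few linearly interpolated frames; the function name `interpolate_join`, the frame representation, and the number of transition frames are all hypothetical.

```python
from typing import List

Frame = List[float]  # one parameter vector (placeholder for cepstrum, LPC, etc.)


def interpolate_join(segments: List[List[Frame]], n_transition: int = 2) -> List[Frame]:
    """Concatenate segments, inserting linearly interpolated frames at each boundary."""
    out: List[Frame] = []
    for seg in segments:
        if out and seg:
            prev, nxt = out[-1], seg[0]
            for k in range(1, n_transition + 1):
                t = k / (n_transition + 1)  # fraction of the way from prev to nxt
                out.append([a + t * (b - a) for a, b in zip(prev, nxt)])
        out.extend(list(frame) for frame in seg)
    return out


# A voice-specific unit followed by the shared devoiced-vowel unit (dummy values).
joined = interpolate_join([[[0.0, 0.0]], [[3.0, 6.0]]], n_transition=1)
```

Because the shared devoiced-vowel unit is joined to voice-specific neighbours in the same way for every voice, the sketch needs no special case for it.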

According to this embodiment, only one kind of devoiced-vowel unit parameters 50, created for example from a woman's natural speech, is registered in the speech unit parameter file 3. It is therefore no longer necessary to create devoiced-vowel speech unit parameters for each kind of synthesized speech as in the past, and the labor of creating them is greatly reduced. Moreover, because this single kind of devoiced-vowel speech unit parameters 50 is used as common data not only when creating female synthesized speech but also when creating male and child's synthesized speech, every kind of synthesized speech can be rule-synthesized as highly natural, high-quality speech with no loss of smoothness or ease of listening.

[Effects of the Invention] As described above, the speech synthesis device and speech synthesis method of the present invention make it possible to synthesize several kinds of voices without impairing the smoothness and ease of listening of the rule-synthesized speech, and reduce the labor of creating the speech unit parameter files for syllables with devoiced vowels.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the speech synthesis device of the present invention, FIG. 2 is a block diagram showing a detailed example of the character string analysis unit shown in FIG. 1, and FIG. 3 is a schematic diagram showing configuration examples of the speech unit parameter file shown in FIG. 1.
1: character string analysis unit; 2: speech parameter sequence generator; 3: speech unit parameter file; 4: prosody parameter sequence generator; 5: speech synthesizer; 11: discriminator; 12: text analysis unit; 13: language dictionary.

Claims

(57) [Claims]

1. A speech synthesis device comprising: character string analysis means for analyzing an input character string to obtain a phoneme symbol string and prosody information; speech unit parameter storage means for storing devoiced-vowel speech unit parameters for at least one of the kinds of synthesized speech, and for storing speech unit parameters other than those for devoiced vowels in a plurality of kinds matching the kinds of synthesized speech; speech parameter sequence generation means for generating a speech parameter sequence from the phoneme symbol string obtained by the character string analysis means, by referring to the speech unit parameter storage means on the basis of information indicating the kind of synthesized speech; prosody parameter sequence generation means for generating a prosody parameter sequence on the basis of the prosody information obtained by the character string analysis means and of a basic pitch and sound source rule determined from the information indicating the kind of synthesized speech; and speech synthesis means for outputting synthesized speech on the basis of the speech parameter sequence and the prosody parameter sequence.

2. A speech synthesis method characterized by: analyzing an input character string to obtain a phoneme symbol string and prosody information; taking out speech unit parameters of a predetermined specific kind when a syllable included in the obtained phoneme symbol string is a devoiced vowel, and taking out speech unit parameters of the kind designated from among a plurality of kinds of synthesized speech when the syllable is not a devoiced vowel, thereby generating a speech parameter sequence; generating a prosody parameter sequence in accordance with the obtained prosody information and with a basic pitch and sound source rule based on the designated kind of synthesized speech; and outputting synthesized speech on the basis of the speech parameter sequence and the prosody parameter sequence.