JPH0667685A

JPH0667685A - Speech synthesizing device

Info

Publication number: JPH0667685A
Application number: JP4224478A
Authority: JP
Inventors: Yuriko Taga; 百合子多賀; Yumi Honda; 由美本多
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-08-25
Filing date: 1992-08-25
Publication date: 1994-03-11

Abstract

(57) [Abstract] [Object] The present invention relates to a speech synthesizer that converts text consisting of a character string into speech and utters it. Its object is to realize a speech synthesizer whose utterance ends exactly at a predetermined time without losing naturalness. [Constitution] A speech synthesizer for reading aloud the character string of a document comprises a phoneme conversion unit 1, a parameter calculation unit 2, a speech synthesis unit 3, and a speech waveform synthesis unit 4, and is further provided with an input means 10 for designating the utterance time and an expansion/contraction processing unit 11. The expansion/contraction processing unit 11 changes the time-related parameters calculated by the parameter calculation unit 2 so that the character string is read aloud in the designated utterance time.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesizer that converts text consisting of character strings into speech and utters it.

[0002] With the development of multimedia technology, it has become possible to output documents as speech, but the field of presentations demands more natural utterance and an easier way of specifying it. Specifically, users want to specify the utterance time rather than the speaking rate.

[0003]

[Prior Art] FIG. 4 is a block diagram of a conventional system. The phoneme conversion unit 1 converts an input document file written in mixed kanji-kana text into a kana character string using a phoneme dictionary, yielding a phoneme string. This is the reverse of the kana-to-kanji conversion performed by a word processor.

[0004] Accent information is then added, using an accent dictionary, to the phoneme string of each word. Grammatical analysis further adds pause-duration information, which takes into account how strongly adjacent words are connected, together with intonation information, producing the synthesized character string.

[0005] The synthesized character string consists of katakana representing syllables; delimiters marking breath groups, phrases, and accent phrases; and accent marks, nasalization marks, devoicing marks, and the like. As preprocessing for the speech synthesis unit 3, the parameter calculation unit 2 converts the synthesized character string into numerical parameters. These include the duration of each phoneme element (the time parameter), given as its value at the standard speaking rate.

[0006] The speech synthesis unit 3 converts and synthesizes the numerical parameters according to a fixed synthesis rule (for example, the PARCOR method) to obtain parameter time-series data. The parameter time-series data is a sequence of values, one per fixed time unit (frame), which the speech waveform synthesis unit 4 converts into a speech waveform.

[0007] FIG. 5 shows example data at each processing stage, from the input text up to its conversion into numerical parameters. FIG. 5(A) is the input text. The phoneme conversion unit 1 processes it, yielding the synthesized character string shown in FIG. 5(B). Processing this with the parameter calculation unit 2 yields the numerical parameters shown in FIG. 5(C), the stage that precedes the parameter time-series data. In the figure, the "pause" column gives the silence duration, the "c" column the consonant duration, and the "v" column the vowel duration.
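The numerical parameters of FIG. 5(C) can be pictured as one record per mora holding pause, consonant, and vowel durations. The Python sketch below shows such a record and how the per-type totals used later (silence, vowel, consonant) would be summed. The class name, field names, and sample values are assumptions for illustration; they are not the data printed in FIG. 5(C).

```python
from dataclasses import dataclass


@dataclass
class MoraTiming:
    """One row of FIG. 5(C)-style numerical parameters (illustrative only)."""
    kana: str
    pause: float = 0.0      # "pause" column: silence before the mora
    consonant: float = 0.0  # "c" column: consonant duration
    vowel: float = 0.0      # "v" column: vowel duration


def totals_by_type(rows):
    """Return (t1, t2, t3) = (silence, vowel, consonant) duration totals."""
    t1 = sum(r.pause for r in rows)
    t2 = sum(r.vowel for r in rows)
    t3 = sum(r.consonant for r in rows)
    return t1, t2, t3


# Hypothetical values, not those shown in FIG. 5(C).
rows = [
    MoraTiming("ア", pause=100.0, vowel=80.0),
    MoraTiming("サ", consonant=60.0, vowel=70.0),
    MoraTiming("ハ", consonant=50.0, vowel=75.0),
]
t1, t2, t3 = totals_by_type(rows)
```

The ordering (silence, vowel, consonant) mirrors the $t_1$, $t_2$, $t_3$ assignment used in the embodiment.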

[0008] The speech synthesis unit 3 processes the numerical parameters into parameter time-series data, and the speech waveform synthesis unit 4 processes that data into speech as an analog signal, which is output through a speaker or the like.

[0009] To change the utterance time, the conventional system changed the speaking rate by changing the frame duration in the speech waveform synthesis unit 4.
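A minimal sketch of why the conventional frame-time approach changes speed uniformly, assuming a hypothetical 5 ms standard frame period and millisecond durations (the patent specifies neither): the parameter frames themselves are fixed, so replaying them with a longer frame period stretches every segment, consonants included, by the same factor.

```python
STANDARD_FRAME_MS = 5.0  # hypothetical frame period; the patent gives none


def to_frames(durations_ms, frame_ms=STANDARD_FRAME_MS):
    """Quantize each segment duration into a whole number of frames."""
    return [round(d / frame_ms) for d in durations_ms]


def rendered_ms(frame_counts, playback_frame_ms):
    """Duration of each segment when its frames are replayed at another period."""
    return [n * playback_frame_ms for n in frame_counts]


# Hypothetical segment durations in milliseconds: silence, consonant, vowel.
segments = [100.0, 60.0, 80.0]
frames = to_frames(segments)
slowed = rendered_ms(frames, playback_frame_ms=2 * STANDARD_FRAME_MS)
```

Doubling the frame period doubles every segment, which is exactly the uniform behavior the invention sets out to avoid.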

[0010]

[Problems to Be Solved by the Invention] The prior art could change the speaking rate, but it could not make the utterance finish within a predetermined time.

[0011] An object of the present invention is to realize a speech synthesizer in which a time is specified for an entire document or for each sentence, so that the utterance finishes exactly at that time without losing naturalness.

[0012]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the present invention. A conventional speech synthesizer is additionally provided with an input means 10 for designating the utterance time and an expansion/contraction processing unit 11 that changes the time parameters among the numerical parameters.

[0013]

[Operation] The utterance time at the standard time parameters is obtained by summing the time parameters of all phoneme elements. Calling this t and the designated utterance time T, the expansion/contraction ratio is T/t. The expansion/contraction processing unit 11 multiplies the time parameter of each phoneme element by this ratio to obtain stretched time parameters. Passing the numerical parameters, including the stretched time parameters, to the speech synthesis unit 3 makes the utterance time match the designated time.
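The uniform stretch of [0013] amounts to one multiplication per time parameter. This Python sketch assumes the time parameters are a flat list of numbers; the function name is illustrative.

```python
def uniform_stretch(durations, target_total):
    """Multiply every time parameter by T/t so that the sum becomes T."""
    t = sum(durations)             # standard utterance time t
    if t <= 0:
        raise ValueError("standard utterance time must be positive")
    ratio = target_total / t       # expansion/contraction ratio T/t
    return [d * ratio for d in durations]


stretched = uniform_stretch([100, 60, 80], 480)
```

Because every segment receives the same factor, this step on its own reproduces the uniform behavior that [0014] identifies as unnatural.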

[0014] When a person changes speaking rate, experience shows that the utterance does not change uniformly: the expansion/contraction ratio differs by type of speech component. For example, if speech is divided into voiced and silent sections, the silent sections stretch or shrink more than the voiced ones.

[0015] Therefore, to preserve naturalness, it is desirable to determine the time parameters by taking into account how the expansion/contraction ratio differs by component type, rather than changing everything by the same ratio.

[0016] Divide the standard utterance time by component type, and let the sums of the respective time parameters be $t_1, t_2, t_3, \ldots$, so that

$$t = t_1 + t_2 + t_3 + \cdots$$

Let the expansion/contraction ratios for the component types be $1 + k_1, 1 + k_2, 1 + k_3, \ldots$, and fix their mutual relationship in advance.

[0017] For example, if $k_1 : k_2 : k_3 : \cdots = a : b : c : \cdots$, the designated utterance time is

$$\begin{aligned}
T &= (1 + k_1)\,t_1 + (1 + k_2)\,t_2 + (1 + k_3)\,t_3 + \cdots \\
  &= t + k_1 t_1 + k_2 t_2 + k_3 t_3 + \cdots \\
  &= t + k_1 t_1 + \frac{b}{a}\,k_1 t_2 + \frac{c}{a}\,k_1 t_3 + \cdots \\
  &= t + \frac{k_1}{a}\,(a t_1 + b t_2 + c t_3 + \cdots).
\end{aligned}$$

Therefore

$$k_1 = \frac{(T - t)\,a}{a t_1 + b t_2 + c t_3 + \cdots}, \qquad k_2 = \frac{b}{a}\,k_1, \qquad k_3 = \frac{c}{a}\,k_1, \qquad \ldots$$

which gives the expansion/contraction ratio for each component type.
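The derivation in [0017] translates directly into code. This sketch assumes the per-type totals and weights are given in the same component order and that the first weight $a$ is nonzero; it computes $k_1$ first and then $k_i = (w_i / a)\,k_1$, returning the ratios $1 + k_i$. The function name is illustrative.

```python
def component_ratios(T, totals, weights):
    """Return the per-type ratios 1 + k_i from paragraph [0017].

    totals:  standard durations t_1, t_2, ... summed per component type
    weights: a, b, c, ... with k_1 : k_2 : ... = a : b : ...
    """
    if len(totals) != len(weights):
        raise ValueError("need one weight per component type")
    a = weights[0]
    if a == 0:
        raise ValueError("the first weight a must be nonzero")
    t = sum(totals)                                   # standard utterance time
    denom = sum(w * ti for w, ti in zip(weights, totals))
    if denom == 0:
        raise ValueError("weighted total a*t_1 + b*t_2 + ... is zero")
    k1 = (T - t) * a / denom
    # k_i = (w_i / a) * k_1, as in k_2 = b*k_1/a and k_3 = c*k_1/a
    return [1 + (w / a) * k1 for w in weights]
```

For instance, `component_ratios(7000, [1100, 1965, 570], [7, 3, 0])` gives ratios that round to the 2.73, 1.74, and 1 of the embodiment in [0021].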

[0018] Accordingly, the time parameter of each phoneme element is multiplied by $1 + k_1$, $1 + k_2$, $1 + k_3$, ..., according to its component type, to obtain the stretched time parameters. Moreover, as shown in FIG. 2, the further the overall ratio T/t departs from 1, the larger the differences in ratio among component types become. Therefore, tabulating the values $a, b, c, \ldots$ against T/t and using the tabulated values to compute $k_1, k_2, k_3, \ldots$ yields stretching that preserves naturalness even better.
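Paragraph [0018] suggests tabulating $a, b, c$ against T/t. One way to do that is a step table searched with Python's `bisect` module. The breakpoints and weight rows below are hypothetical (the patent publishes only the 7 : 3 : 0 setting used in its embodiment); they were chosen so that the weights become more uneven as T/t moves away from 1, following the trend described for FIG. 2.

```python
import bisect
import math

# Hypothetical step table: row i applies when T/t <= _UPPER[i].
# Each row gives weights (a, b, c) for (silence, vowel, consonant).
_UPPER = [0.6, 0.9, 1.1, 1.6, math.inf]
_WEIGHTS = [(7, 3, 0), (5, 3, 1), (1, 1, 1), (5, 3, 1), (7, 3, 0)]


def weights_for(overall_ratio):
    """Look up (a, b, c) for the overall ratio T/t."""
    if overall_ratio <= 0:
        raise ValueError("T/t must be positive")
    return _WEIGHTS[bisect.bisect_left(_UPPER, overall_ratio)]


def stretch_factors(T, totals):
    """Per-type ratios 1 + k_i using table weights and the [0017] formula."""
    t = sum(totals)
    weights = weights_for(T / t)
    denom = sum(w * ti for w, ti in zip(weights, totals))
    k1 = (T - t) * weights[0] / denom
    return [1 + (w / weights[0]) * k1 for w in weights]
```

Near T/t = 1 the hypothetical table falls back to equal weights, so all component types stretch alike; far from 1 it shifts most of the change onto silence.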

[0019]

[Embodiment] An embodiment of the present invention is described below. The example sentence given earlier, 「朝早く、バンガローに電報が届いた。」 ("A telegram arrived at the bungalow early in the morning."), is stretched to about twice and compressed to about half its standard duration.

[0020] The standard time parameters sum to 3635. Dividing the components into three types, "silence," "vowel," and "consonant," gives $t_1 = 1100$, $t_2 = 1965$, $t_3 = 570$.

[0021] Let $a : b : c = 7 : 3 : 0$.

Case T = 7000 (about twice):

$$k_1 = \frac{(7000 - 3635)\cdot 7}{7 \cdot 1100 + 3 \cdot 1965 + 0 \cdot 570} = 1.733, \qquad k_2 = 0.743, \qquad k_3 = 0$$

The expansion/contraction ratios are therefore 2.73 times for silence, 1.74 times for vowels, and 1 time (unchanged) for consonants.
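The arithmetic of [0021] can be checked with a few lines of Python, using $t_1 = 1100$, $t_2 = 1965$, $t_3 = 570$ and $a : b : c = 7 : 3 : 0$ from the embodiment. The check also confirms that the stretched durations add back up to the designated 7000 (the patent does not state the time unit). The dictionary and function names are illustrative.

```python
T_STD = {"silence": 1100, "vowel": 1965, "consonant": 570}  # t1, t2, t3
WEIGHTS = {"silence": 7, "vowel": 3, "consonant": 0}        # a : b : c


def ratios(T):
    """Ratios 1 + k_i for each component type, per [0017]."""
    t = sum(T_STD.values())                               # 3635
    denom = sum(WEIGHTS[k] * T_STD[k] for k in T_STD)     # 7*1100 + 3*1965 + 0*570
    return {k: 1 + WEIGHTS[k] * (T - t) / denom for k in T_STD}


r = ratios(7000)
stretched_total = sum(r[k] * T_STD[k] for k in T_STD)
for name, value in r.items():
    print(f"{name}: {value:.2f}x")
print(f"stretched total: {stretched_total:.1f}")
```

The printed ratios round to 2.73, 1.74, and 1.00, matching the figures stated in [0021].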

[0022] FIG. 3(A) shows the time parameters modified by these ratios.

Case T = 2000 (about half):

$$k_1 = \frac{(2000 - 3635)\cdot 7}{7 \cdot 1100 + 3 \cdot 1965 + 0 \cdot 570} = -0.842, \qquad k_2 = -0.361, \qquad k_3 = 0$$

The expansion/contraction ratios are therefore 0.16 times for silence, 0.64 times for vowels, and 1 time (unchanged) for consonants. If a ratio would fall below a set minimum, it is clamped to that minimum; for example, the minimum ratio for silence is 0.
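The clamping rule in [0022] can be sketched as a per-type floor applied after computing $1 + k_i$. Only the floor of 0 for silence comes from the patent; defaulting the other types to a floor of 0 as well is an assumption, as are the names used here. With a hypothetical target of T = 1000, the silence ratio would come out negative and is clamped to 0.

```python
T_STD = {"silence": 1100, "vowel": 1965, "consonant": 570}  # t1, t2, t3
WEIGHTS = {"silence": 7, "vowel": 3, "consonant": 0}        # a : b : c
FLOORS = {"silence": 0.0}  # from [0022]; other types default to 0.0 (assumption)


def clamped_ratios(T, t_std=T_STD, weights=WEIGHTS, floors=FLOORS):
    """Ratios 1 + k_i, each raised to its type's minimum if it falls below it."""
    t = sum(t_std.values())
    denom = sum(weights[k] * t_std[k] for k in t_std)
    result = {}
    for k in t_std:
        r = 1 + weights[k] * (T - t) / denom
        result[k] = max(r, floors.get(k, 0.0))
    return result


half = clamped_ratios(2000)     # the T = 2000 case of [0022]: no clamping needed
extreme = clamped_ratios(1000)  # hypothetical target: silence is clamped to 0
```

Once a ratio is clamped, the total no longer equals T exactly (for T = 1000 the stretched sum is about 1392); the patent does not say how any such shortfall is redistributed.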

[0023] FIG. 3(B) shows the time parameters modified by these ratios. The utterance time may be specified for the document as a whole or for each sentence. For per-sentence specification, a time-specification command can be embedded in the document.
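The patent does not define the syntax of the embedded time-specification command. As a sketch, assume a hypothetical `{time=SECONDS}` tag placed in front of a sentence, with sentences delimited by the Japanese full stop 「。」. The parser below pairs each sentence with its designated time, or `None` when the sentence carries no command.

```python
import re

# Hypothetical command syntax, e.g. "{time=3.5}" before a sentence.
TIME_CMD = re.compile(r"\{time=(\d+(?:\.\d+)?)\}")


def parse_timed_sentences(document):
    """Split on 「。」 and return (sentence, designated seconds or None) pairs."""
    pairs = []
    for chunk in document.split("。"):
        if not chunk.strip():
            continue                      # skip the empty tail after the last 「。」
        match = TIME_CMD.search(chunk)
        seconds = float(match.group(1)) if match else None
        text = TIME_CMD.sub("", chunk).strip() + "。"
        pairs.append((text, seconds))
    return pairs


doc = "{time=3.0}朝早く、バンガローに電報が届いた。次の文。"
parsed = parse_timed_sentences(doc)
```

Sentences without a command would then fall back to the document-level time.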

[0024] It is more effective still if the presentation time is specified for the document as a whole while sentences to be particularly emphasized can be given their own times individually.
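The patent does not say how the remaining time is shared when some sentences carry their own times. One plausible policy, assumed here purely for illustration, keeps each individually designated sentence time as given and divides what is left of the document time among the other sentences in proportion to their standard durations. The function name and arguments are hypothetical.

```python
def allocate_times(total_time, standard, designated):
    """Assign a target time to every sentence.

    standard:   standard (unstretched) duration of each sentence
    designated: {sentence index: designated time} for emphasized sentences
    """
    remaining = total_time - sum(designated.values())
    free = [i for i in range(len(standard)) if i not in designated]
    free_std = sum(standard[i] for i in free)
    if remaining < 0:
        raise ValueError("designated sentences exceed the document time")
    if free and free_std <= 0:
        raise ValueError("free sentences need a positive standard duration")
    if not free and abs(remaining) > 1e-9:
        raise ValueError("no free sentence left to absorb the remaining time")
    return [designated[i] if i in designated else standard[i] * remaining / free_std
            for i in range(len(standard))]


targets = allocate_times(60, [10, 20, 30], {1: 30})
```

Each target could then be fed to the per-sentence stretching of [0017] as its designated time T.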

[0025]

[Effects of the Invention] As described in detail above, the present invention realizes a document-reading apparatus whose reading fits exactly within a designated time. Furthermore, even when specifying the reading time changes the speaking rate, the naturalness of the utterance is preserved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the principle of the present invention.

FIG. 2 shows the relative change in expansion/contraction ratio for each type of speech component.

FIG. 3 shows examples of time parameters after expansion/contraction processing in an embodiment of the present invention.

FIG. 4 is a block diagram of a conventional system.

FIG. 5 shows example data from the input text through its conversion into numerical parameters.

[Explanation of symbols]

1 Phoneme conversion unit
2 Parameter calculation unit
3 Speech synthesis unit
4 Speech waveform synthesis unit
10 Utterance time designating means
11 Expansion/contraction processing unit

Claims

[Claims]

1. A speech synthesizer for reading aloud the character string of a document, comprising: a phoneme conversion unit (1) that converts a document into a phoneme string, such as a kana character string, using a phoneme dictionary and adds accent information, pause-duration information, intonation information, and the like to form a synthesized character string; a parameter calculation unit (2) that converts the synthesized character string into numerical parameters including time-related parameters for standard utterance; a speech synthesis unit (3) that converts and synthesizes the numerical parameters according to a fixed rule to create parameter time-series data; and a speech waveform synthesis unit (4) that converts the parameter time-series data into an analog speech signal; the speech synthesizer being characterized by further comprising an input means (10) for designating an utterance time and an expansion/contraction processing unit (11), wherein the expansion/contraction processing unit (11) changes the time-related parameters calculated by the parameter calculation unit (2) so that the character string is read aloud in the designated utterance time.

2. The speech synthesizer according to claim 1, wherein the rate of change of the time-related parameters is varied for each type of constituent element of the uttered speech.

3. The speech synthesizer according to claim 2, wherein the ratio among the rates of change of the time-related parameters for the respective types of speech constituent elements is changed according to the ratio of the designated utterance time to the standard utterance time.