JP3297221B2

JP3297221B2 - Phoneme duration control method

Info

Publication number: JP3297221B2
Application number: JP26350394A
Authority: JP
Inventors: 俊康政川; 泰石川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1994-10-27
Filing date: 1994-10-27
Publication date: 2002-07-02
Anticipated expiration: 2017-07-02
Also published as: JPH08123487A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a phoneme duration control method for use in a speech rule synthesizer that converts arbitrary text into speech.

[0002]

2. Description of the Related Art

In rule-based speech synthesis, phoneme duration is an important factor in the quality, and especially the naturalness, of synthetic speech, and methods for controlling it have long been studied. A first conventional example of phoneme duration control is based on the perceptual rhythm point (rhythm perception point) of each monosyllable. It was proposed in a first reference: Ishikawa, Nakajima, and Imai, "Syllable placement rules in rule synthesis of speech using CV syllables," Proceedings of the IECE Communications Division National Convention, 1984, p. 3-193.

[0003] FIG. 6 shows the configuration of a conventional phoneme duration control method based on this approach (hereinafter, the first conventional example). The first conventional example consists of a monosyllable length determination unit 25, a rhythm perception point storage unit 10, a consonant length determination unit 12, and a phoneme boundary determination unit 14.

[0004] The phoneme duration control method of the first conventional example is described below.

[0005] The monosyllable length determination unit 25 determines the duration of each monosyllable in the monosyllable string 1 and outputs a monosyllable length data string 9, in which the monosyllable length information is attached to the monosyllable string 1. One way for the unit 25 to determine syllable lengths is simply to give every monosyllable the same length. The rhythm perception point storage unit 10 stores, for each monosyllable type, a rhythm perception point position 11 that serves as the reference point for perceived rhythm. The consonant length determination unit 12 determines the duration of each consonant type, for example by using the mean duration of each consonant measured in natural speech. As shown in FIG. 7, the phoneme boundary determination unit 14 reads the rhythm perception point position 11 from the rhythm perception point storage unit 10 according to the monosyllable type, receives the consonant length 13 of each monosyllable from the consonant length determination unit 12, and determines the phoneme boundaries by placing the consonants so that the interval between successive rhythm perception points equals the monosyllable length. The vowel durations then follow from the resulting phoneme boundaries, and the phoneme durations 15 are output.

[0006] A second conventional example of phoneme duration control is disclosed in a second reference, Japanese Patent Application Laid-Open No. 5-281993. It assumes that the length of a Japanese monosyllable is perceived as the interval between vowel centroids, where the vowel centroid is the point at which the integral of the energy of the monosyllable's vowel reaches one half of the integral over the whole vowel. The method exploits the tendency of the interval between vowel centroids to take a fixed duration for each consonant.
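As a rough illustration of the vowel-centroid definition, the sketch below finds the time at which the cumulative vowel energy reaches half of its total. It assumes the vowel's energy is available as non-negative per-frame values with a fixed frame length. The function name, the frame representation, and the linear interpolation inside a frame are assumptions for illustration and are not taken from the cited publication.

```python
def vowel_centroid_ms(frame_energies, frame_ms):
    """Time (ms from vowel onset) at which cumulative energy reaches half the total.

    Assumes non-negative per-frame energies and linear accumulation within a frame.
    """
    total = sum(frame_energies)
    if total <= 0:
        raise ValueError("vowel energy must be positive")
    half = total / 2.0
    accumulated = 0.0
    for i, energy in enumerate(frame_energies):
        if accumulated + energy >= half:
            # Interpolate inside frame i; energy > 0 here because accumulated < half.
            fraction = (half - accumulated) / energy
            return (i + fraction) * frame_ms
        accumulated += energy
    return len(frame_energies) * frame_ms  # fallback for floating-point rounding only
```

For a vowel with flat energy over four 10 ms frames, `vowel_centroid_ms([1, 1, 1, 1], 10)` gives 20.0 ms, the temporal midpoint; skewing the energy toward one frame moves the centroid toward that frame.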

[0007] FIG. 8 shows the configuration of a conventional phoneme duration control method based on this approach (hereinafter, the second conventional example). The second conventional example consists of a vowel-centroid interval determination unit 26, a vowel-centroid interval storage unit 28, a vowel-centroid phoneme boundary determination unit 30, and a vowel-centroid phoneme boundary parameter storage unit 32.

[0008] The phoneme duration control method of the second conventional example is described below with reference to FIG. 9.

[0009] The vowel-centroid interval storage unit 28 stores, for each consonant, the interval 27 between the centroids of the vowels before and after it, obtained for example by analyzing natural speech. The vowel-centroid interval determination unit 26 reads the interval 27 from the storage unit 28 according to the consonant type of each monosyllable in the input monosyllable string 1, thereby determining the interval between the preceding and following vowel centroids, and outputs a vowel-centroid interval data string 29 in which these intervals are attached to the monosyllable string.

[0010] One method for determining the durations of the preceding vowel, the consonant, and the following vowel between two vowel centroids is described, for example, in a third reference: Kato and Hashimoto, "On setting vowel and consonant lengths for the inter-vowel-centroid rhythm rule," Proceedings of the Acoustical Society of Japan, 1-5-13, October 1992, pp. 241-242. In this method, the vowel-centroid phoneme boundary parameter storage unit 32 stores vowel-centroid phoneme boundary parameters 31, which are the coefficients of linear expressions that compute the preceding vowel length, the consonant length, and the following vowel length from the vowel-centroid interval given for each vowel-consonant-vowel combination. The vowel-centroid phoneme boundary determination unit 30 multiplies each vowel-centroid interval in the input data string 29 by the parameters 31 read from storage to obtain the preceding vowel, consonant, and following vowel lengths, and outputs the phoneme durations 15.
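A minimal sketch of the second conventional example's two table lookups follows. It assumes each stored parameter is a (slope, intercept) pair of a linear expression in the inter-centroid interval T; the text above describes multiplying T by the coefficients, which is the special case where every intercept is zero. All table keys and numbers are hypothetical placeholders, not values from the cited references.

```python
# Hypothetical tables; real values would come from analysis of natural speech.
CENTROID_INTERVAL_MS = {"k": 170.0, "s": 190.0}  # keyed by consonant type

# For each (preceding vowel, consonant, following vowel): (slope, intercept)
# for the preceding-vowel, consonant, and following-vowel durations, in that order.
SPLIT_PARAMS = {
    ("a", "k", "a"): ((0.5, -20.0), (0.25, 20.0), (0.25, 0.0)),
}


def split_vcv(prev_vowel, consonant, next_vowel):
    """Return (preceding-vowel ms, consonant ms, following-vowel ms) between centroids."""
    interval = CENTROID_INTERVAL_MS[consonant]  # interval depends only on the consonant
    params = SPLIT_PARAMS[(prev_vowel, consonant, next_vowel)]
    return tuple(slope * interval + intercept for slope, intercept in params)
```

With these placeholder values, `split_vcv("a", "k", "a")` yields (65.0, 62.5, 42.5), which sums back to the 170 ms interval because the slopes sum to 1 and the intercepts sum to 0.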

[0011]

Problems to Be Solved by the Invention

The first conventional example relies on the tendency to utter morae (prosodic units roughly corresponding to monosyllables) at nearly equal intervals, which is one of the most salient features of Japanese phoneme duration. Following fixed rules, it assigns each consonant a fixed position within the monosyllable length and thereby sets the phoneme durations. Methods of this kind usually give every syllable a fixed duration, but analyses of natural speech show that the monosyllable length, as measured by the interval between rhythm perception points, is not always constant.

[0012] However, the first conventional example contains no means of reproducing variation in monosyllable length. The phoneme durations it outputs therefore differ from those of natural speech, and a speech rule synthesizer using it produces synthetic speech of degraded quality.

[0013] In the second conventional example, on the other hand, the syllable length is defined as the interval between vowel centroids and is determined by the type of consonant between the vowels. Consonant type is thus taken into account as a source of syllable length variation. However, because the method restricts the sources of variation to consonant type and otherwise assigns a constant syllable length, it cannot reproduce how syllable length varies across different syllabic contexts. As with the first conventional example, the output phoneme durations differ from those of natural speech, so the degradation of synthetic speech quality in a speech rule synthesizer using the method is not adequately resolved.

[0014] Moreover, because the syllable length differs from consonant to consonant, the total duration (the sum of the resulting phoneme durations) varies greatly with the types of syllables in the input. This creates a new problem: the speaking rate becomes difficult to control.

[0015] The present invention was made to solve the problems described above. Its object is to provide a phoneme duration control method that assigns natural phoneme durations to an input syllable string and thereby improves the naturalness of the synthetic speech produced by a speech rule synthesizer that uses it.

[0016]

Means for Solving the Problems

To achieve the above object, the invention according to claim 1 is a phoneme duration control method for use in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for extracting, from an input monosyllable string, rhythm units each consisting of two syllables or one syllable by examining adjacent syllables, and outputting a rhythm unit string in which rhythm unit type information is attached to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determination means for receiving the rhythm unit string, reading from the rhythm unit reference length storage means the basic duration corresponding to each rhythm unit type, determining the duration of each input rhythm unit, attaching it to the input rhythm unit string, and outputting the result as a rhythm unit length data string; rhythm unit division means for receiving the rhythm unit length data string, dividing the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputting a monosyllable length data string consisting of monosyllable types and monosyllable lengths; rhythm perception point storage means for storing a rhythm perception point position for each monosyllable type; consonant length determination means for determining consonant durations; and phoneme boundary determination means for receiving the monosyllable length data string, reading rhythm perception point positions from the rhythm perception point storage means, determining phoneme boundaries by placing each consonant, with the consonant length given by the consonant length determination means, so that the interval between rhythm perception points matches each monosyllable length, and thereby determining the duration of each phoneme.

[0017] The invention according to claim 2 is a phoneme duration control method for use in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for extracting, from an input monosyllable string, rhythm units each consisting of two syllables or one syllable by examining adjacent syllables, and outputting a rhythm unit string in which rhythm unit type information is attached to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determination means for receiving the rhythm unit string, reading from the rhythm unit reference length storage means the basic duration corresponding to each rhythm unit type, determining the duration of each input rhythm unit, attaching it to the input rhythm unit string, and outputting the result as a rhythm unit length data string; division ratio storage means for storing a division ratio for each type of monosyllable pair that makes up a two-syllable rhythm unit; syllable-type-dependent rhythm unit division means for receiving the rhythm unit length data string, determining each monosyllable length by dividing each rhythm unit at the division ratio read from the division ratio storage means according to the type of monosyllable pair, and outputting a monosyllable length data string consisting of monosyllable types and monosyllable lengths; rhythm perception point storage means for storing a rhythm perception point position for each monosyllable type; consonant length determination means for determining consonant durations; and phoneme boundary determination means for receiving the monosyllable length data string, reading rhythm perception point positions from the rhythm perception point storage means, determining phoneme boundaries by placing each consonant, with the consonant length given by the consonant length determination means, so that the interval between rhythm perception points matches each monosyllable length, and thereby determining the duration of each phoneme.

[0018] The invention according to claim 3 is the phoneme duration control method according to claim 1 or 2, further comprising monosyllable length changing means for changing each monosyllable length in the output monosyllable length data string on the basis of separately input accent information.

[0019] The invention according to claim 4 is a phoneme duration control method for use in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for extracting, from an input monosyllable string, rhythm units each consisting of two syllables or one syllable by examining adjacent syllables, and outputting a rhythm unit string in which rhythm unit type information is attached to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determination means for receiving the rhythm unit string, reading from the rhythm unit reference length storage means the basic duration corresponding to each rhythm unit type, determining the duration of each input rhythm unit, attaching it to the input rhythm unit string, and outputting the result as a rhythm unit length data string; rhythm unit division means for receiving the rhythm unit length data string, dividing the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputting a monosyllable length data string consisting of monosyllable types and monosyllable lengths; vowel length ratio storage means for storing a vowel length ratio for each phoneme sequence; and vowel-length-priority phoneme boundary determination means for receiving the monosyllable length data string, placing each syllable boundary within a vowel interval, determining the time from the vowel-consonant boundary to the syllable boundary from the syllable length and the vowel length ratio, and outputting the duration of each phoneme.

[0020]

Operation

The phoneme duration control method according to claim 1 treats variation in syllable length as arising from utterance groupings longer than the mora. The rhythm unit extraction means examines the types of each pair of adjacent monosyllables in the input monosyllable string, extracts rhythm units (utterance groupings of two syllables or one syllable), and outputs a rhythm unit string in which the rhythm unit types are attached to the monosyllable string. The rhythm unit length determination means reads the rhythm unit reference length from the rhythm unit reference length storage means according to the rhythm unit type, determines the duration of each rhythm unit, and outputs a rhythm unit length data string in which each rhythm unit length is attached to the rhythm unit string. The rhythm unit division means divides the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputs a monosyllable length data string consisting of monosyllable types and monosyllable lengths. The consonant length determination means determines the duration of each consonant. The phoneme boundary determination means receives the monosyllable length data string, reads rhythm perception point positions from the rhythm perception point storage means according to the monosyllable types to fix the rhythm perception point of each monosyllable, obtains consonant durations from the consonant length determination means, and places the consonants so that the interval between rhythm perception points matches each monosyllable length. It thereby determines the phoneme boundaries and outputs the duration of each phoneme.

[0021] The phoneme duration control method according to claim 2 controls the variation caused by the combination of syllables that make up a rhythm unit. For each two-syllable rhythm unit, the syllable-type-dependent rhythm unit division means reads a rhythm unit division ratio from the division ratio storage means, divides the rhythm unit accordingly to determine each monosyllable length, and outputs monosyllable length data consisting of monosyllable types and monosyllable lengths.
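One way the division ratio table of claim 2 might be organized is sketched below. It assumes the ratios are keyed by the (first, second) monosyllable pair and give the first syllable's share of the unit, with a fallback ratio for pairs missing from the table. The keys, the ratio values, and the fallback are all hypothetical.

```python
# Hypothetical first-syllable shares keyed by the (first, second) monosyllable pair.
DIVISION_RATIO = {("na", "N"): 0.6, ("ka", "Q"): 0.55}
DEFAULT_RATIO = 0.5  # hypothetical fallback for pairs not in the table


def divide_unit(first, second, unit_ms, table=DIVISION_RATIO):
    """Split a two-syllable rhythm unit into two (monosyllable, length_ms) entries."""
    share = table.get((first, second), DEFAULT_RATIO)
    first_ms = unit_ms * share
    # The second syllable takes the remainder, so the unit length is preserved.
    return [(first, first_ms), (second, unit_ms - first_ms)]
```

Computing the second length as the remainder keeps the rhythm unit length fixed regardless of the ratio, so the per-pair table only redistributes time inside a unit.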

[0022] The phoneme duration control method according to claim 3 incorporates the effect of accent on syllable length into the control. The monosyllable length changing means changes each monosyllable length in the input monosyllable length data string on the basis of separately input accent information, and outputs the modified monosyllable length data string.
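This section does not specify how accent information alters monosyllable length. As a placeholder only, the sketch below assumes the accent information arrives as a set of indices of accented monosyllables and that each accented monosyllable is scaled by one hypothetical factor.

```python
ACCENT_FACTOR = 1.25  # hypothetical lengthening factor for accented monosyllables


def apply_accent(syllable_lengths, accented_indices, factor=ACCENT_FACTOR):
    """Return a new (monosyllable, ms) list with the accented monosyllables lengthened."""
    return [
        (syllable, ms * factor if i in accented_indices else ms)
        for i, (syllable, ms) in enumerate(syllable_lengths)
    ]
```

For example, `apply_accent([("ha", 120.0), ("na", 140.0)], {0})` lengthens only the first monosyllable, to 150 ms.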

[0023] In the phoneme duration control method according to claim 4, the vowel-length-priority phoneme boundary determination means receives the monosyllable length data string, or the modified monosyllable length data string. Using the vowel length ratio read from the vowel length ratio storage means, it calculates the time from each syllable boundary to the end of the vowel and the time from the start of the vowel to the syllable boundary. It thereby determines the phoneme boundaries and outputs the duration of each phoneme.
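The detailed procedure for claim 4 is not given in this section, so the following is only a hypothetical reading. It assumes each syllable interval contains the tail of the preceding vowel, the syllable's consonant, and the head of its own vowel. The time not occupied by the consonant is split between that tail (share `ratio`, assumed to have been looked up per phoneme sequence already) and the head. The input format and names are illustrative.

```python
def vowel_priority_durations(syllables):
    """syllables: list of (consonant_ms, syllable_ms, ratio) tuples.

    Hypothetical reading of claim 4: syllable boundaries fall inside vowels.
    Returns a list of (consonant_ms, vowel_ms), one per syllable.
    """
    heads, tails = [], []
    for k, (cons_ms, syl_ms, ratio) in enumerate(syllables):
        vowel_time = syl_ms - cons_ms
        if vowel_time <= 0:
            raise ValueError("consonant is longer than its syllable")
        # The first syllable has no preceding vowel, so nothing goes to a tail.
        tail = ratio * vowel_time if k > 0 else 0.0
        tails.append(tail)
        heads.append(vowel_time - tail)
    result = []
    for k, (cons_ms, _, _) in enumerate(syllables):
        # A vowel = its head in this syllable + its tail inside the next syllable.
        next_tail = tails[k + 1] if k + 1 < len(syllables) else 0.0
        result.append((cons_ms, heads[k] + next_tail))
    return result
```

In this sketch the utterance-final vowel receives only its head portion. Treatment of the final vowel is not described in this section.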

[0024]

Embodiments

Preferred embodiments of the phoneme duration control method according to the present invention are described below with reference to the drawings. Identical elements in the conventional examples and in each embodiment are given the same reference numerals.

[0025] Embodiment 1. FIG. 1 is a block diagram showing a first embodiment of the phoneme duration control method according to the present invention. This device consists of the following units. A rhythm unit extraction unit 2, serving as the rhythm unit extraction means, extracts rhythm units of two syllables or one syllable from the input monosyllable string by examining adjacent syllables, and outputs a rhythm unit string in which rhythm unit type information is attached to the monosyllable string. A rhythm unit reference length storage unit 4, serving as the rhythm unit reference length storage means, stores a basic duration for each rhythm unit type. A rhythm unit length determination unit 6, serving as the rhythm unit length determination means, receives the rhythm unit string, reads from the rhythm unit reference length storage unit 4 the basic duration corresponding to each rhythm unit type, determines the duration of each input rhythm unit, attaches it to the input rhythm unit string, and outputs the result as a rhythm unit length data string. A rhythm unit division unit 8, serving as the rhythm unit division means, receives the rhythm unit length data string, divides the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputs a monosyllable length data string consisting of monosyllable types and monosyllable lengths. A rhythm perception point storage unit 10, serving as the rhythm perception point storage means, stores a rhythm perception point position for each monosyllable type. A consonant length determination unit 12, serving as the consonant length determination means, determines consonant durations. A phoneme boundary determination unit 14, serving as the phoneme boundary determination means, receives the monosyllable length data string, reads rhythm perception point positions from the rhythm perception point storage unit 10, determines the phoneme boundaries by placing each consonant, with the consonant length given by the consonant length determination unit 12, so that the interval between rhythm perception points matches each monosyllable length, and thereby determines the duration of each phoneme.

[0026] The distinctive feature of this embodiment is as follows. With the above configuration, the duration of each two-syllable rhythm unit (that is, a long syllable or a pair of monosyllables) is divided at a fixed ratio to determine each monosyllable length. The consonants are then placed so that the interval between the rhythm perception points of the monosyllables matches the monosyllable length, which determines the phoneme boundaries and sets the phoneme durations. This makes it possible to reproduce two-syllable rhythm, an important source of variation in Japanese monosyllable duration, and thus to realize a speech rule synthesizer that outputs high-quality synthetic speech with a natural rhythm.

[0027] Next, the operation of this embodiment is described with reference to the explanatory diagram of FIG. 2. The rhythm unit extraction unit 2 examines adjacent syllables in the input monosyllable string 1 and extracts rhythm units consisting of two syllables or one syllable. As an example of rhythm units, the three types of unit shown on page 5 of a fourth reference are used: Toki and Murata, "Japanese Example Sentences and Exercises for Foreigners 12: Pronunciation and Listening Comprehension" (Aratake Publishing, 1989). The three types are long syllables, pairs of consecutive monosyllables other than long syllables, and isolated monosyllables. As described in the fourth reference, a long syllable is a tightly bound sequence of syllables: (1) a monosyllable (a vowel, or a consonant plus a vowel) followed by the moraic nasal; (2) a monosyllable followed by the geminate (sokuon); (3) a long-vowel syllable; or (4) a monosyllable followed by the vowel /i/. The rhythm unit extraction unit 2 examines adjacent syllables in the monosyllable string and first classifies every occurrence of the above sequences as a long syllable. It then groups the remaining syllables, from the beginning, into pairs of consecutive syllables, and any single syllable left over becomes an isolated monosyllable. For example, as shown in FIG. 2, for the input monosyllable string "hanasanai" (はなさない), "nai" (ない) is detected as a long syllable. The remaining "hanasa" (はなさ) is then scanned from the beginning for consecutive monosyllables, so "hana" (はな) becomes a two-syllable unit and the remaining "sa" (さ) becomes an isolated monosyllable.
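The two-pass extraction procedure above can be sketched as follows. The syllable encoding is an assumption: monosyllables are romanized strings, and the hypothetical symbols "N", "Q", and "R" stand for the moraic nasal, the geminate, and a long-vowel mark. Rule (3) is approximated by the "R" symbol alone and rule (4) by a following bare "i". Neither approximation is taken from the fourth reference.

```python
SPECIAL = {"N", "Q", "R"}  # hypothetical: moraic nasal, geminate, long-vowel mark


def is_long_syllable(first, second):
    """Rules (1)-(4): an ordinary monosyllable followed by N, Q, R, or the vowel /i/."""
    return first not in SPECIAL and (second in SPECIAL or second == "i")


def extract_rhythm_units(syllables):
    """Return (kind, syllables) tuples with kind in {"long", "pair", "single"}."""
    n = len(syllables)
    long_starts = set()
    i = 0
    while i < n - 1:  # pass 1: mark long syllables, scanning left to right
        if is_long_syllable(syllables[i], syllables[i + 1]):
            long_starts.add(i)
            i += 2
        else:
            i += 1

    units, run = [], []

    def flush():  # pass 2: pair pending syllables from the front; a leftover is isolated
        for j in range(0, len(run) - 1, 2):
            units.append(("pair", (run[j], run[j + 1])))
        if len(run) % 2 == 1:
            units.append(("single", (run[-1],)))
        run.clear()

    i = 0
    while i < n:
        if i in long_starts:
            flush()
            units.append(("long", (syllables[i], syllables[i + 1])))
            i += 2
        else:
            run.append(syllables[i])
            i += 1
    flush()
    return units
```

Applied to the FIG. 2 input, `extract_rhythm_units(["ha", "na", "sa", "na", "i"])` gives a pair "ha"+"na", an isolated "sa", and a long syllable "na"+"i", matching the segmentation described above.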

[0028] The rhythm unit length determination unit 6 reads the rhythm unit reference length 5 stored in the rhythm unit reference length storage unit 4 according to the type of each rhythm unit, determines the rhythm unit length, and outputs a rhythm unit length data string 7 in which each rhythm unit length is attached to the input rhythm unit string 3. The rhythm unit division unit 8 divides each two-syllable rhythm unit (that is, each long syllable and each pair of monosyllables) at a fixed ratio to determine each monosyllable length, and outputs a monosyllable length data string 9 consisting of monosyllable types and monosyllable lengths.
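The two steps of this paragraph can be combined as shown below: look up a reference length per rhythm unit type, then split each two-syllable unit at a fixed ratio. The reference lengths and the 0.5 split are hypothetical placeholders because the text gives no numbers. The input is assumed to be a list of (kind, syllables) tuples, an illustrative representation of the rhythm unit string.

```python
# Hypothetical reference lengths (ms) per rhythm unit type and a fixed split ratio.
REFERENCE_MS = {"long": 300.0, "pair": 280.0, "single": 150.0}
FIRST_SHARE = 0.5


def monosyllable_lengths(units, reference=REFERENCE_MS, first_share=FIRST_SHARE):
    """units: (kind, syllables) tuples -> list of (monosyllable, length_ms)."""
    result = []
    for kind, syllables in units:
        unit_ms = reference[kind]  # rhythm unit length determination
        if len(syllables) == 2:    # rhythm unit division at a fixed ratio
            first_ms = unit_ms * first_share
            result.append((syllables[0], first_ms))
            result.append((syllables[1], unit_ms - first_ms))
        else:                      # an isolated monosyllable keeps the whole unit
            result.append((syllables[0], unit_ms))
    return result
```

Because every unit of a given type gets the same reference length, the total duration depends on the number and types of rhythm units rather than on the individual consonants, which is relevant to the speaking-rate concern raised in [0014].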

[0029] The consonant length determination unit 12 determines the consonant length 13, for example by directly using the mean duration of each consonant type in natural speech samples, as in the first reference. As shown in the first reference, the rhythm perception point storage unit 10 stores in advance, for each consonant, its rhythm perception point position: the consonant-specific position that serves as the reference for perceiving syllable duration. The phoneme boundary determination unit 14 reads the rhythm perception point position 11 stored in the rhythm perception point storage unit 10 according to the monosyllable type. As shown in FIG. 2, it uses the rhythm perception point position 11 and the consonant length 13 to fix each consonant's position by placing the monosyllable's rhythm perception point on the given syllable boundary. Each vowel is assigned the remaining interval not occupied by consonants; the phoneme boundaries are thereby determined, and the phoneme durations 15 are output.
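A sketch of this placement step follows, assuming the stored rhythm perception point is expressed as an offset in milliseconds from the consonant onset. That representation, the field names, and the choice that the last vowel ends on the final syllable boundary are assumptions. Each rhythm perception point is put on a syllable boundary, so successive points are exactly one monosyllable length apart, and each vowel fills the gap up to the next consonant.

```python
from dataclasses import dataclass


@dataclass
class Monosyllable:
    name: str
    consonant_ms: float  # from the consonant length determination step (0 for a bare vowel)
    rpp_ms: float        # hypothetical: rhythm perception point, measured from consonant onset
    length_ms: float     # monosyllable length from the monosyllable length data string


def phoneme_durations(syllables):
    """Place each rhythm perception point on a syllable boundary; vowels fill the gaps.

    Returns (name, consonant_ms, vowel_ms) for each monosyllable.
    """
    boundary = syllables[0].rpp_ms  # chosen so that the first consonant starts at t = 0
    onsets = []
    for s in syllables:
        onsets.append(boundary - s.rpp_ms)  # this onset puts the point on the boundary
        boundary += s.length_ms
    final_boundary = boundary  # assumption: the last vowel ends on the final boundary
    result = []
    for k, s in enumerate(syllables):
        vowel_start = onsets[k] + s.consonant_ms
        vowel_end = onsets[k + 1] if k + 1 < len(syllables) else final_boundary
        vowel_ms = vowel_end - vowel_start
        if vowel_ms <= 0:
            raise ValueError(f"no room for the vowel of {s.name!r}")
        result.append((s.name, s.consonant_ms, vowel_ms))
    return result
```

With hypothetical values "ha" (consonant 60 ms, point at 40 ms) and "na" (consonant 50 ms, point at 30 ms), each 150 ms long, the vowels come out as 100 ms and 130 ms. The vowel lengths are thus a by-product of the consonant placement, as the paragraph states.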

【００３０】なお、子音長決定部１２としては、上記子
音種類ごとの平均値を直接用いる方法に替えて、例えば
第５の文献“文音声における子音継続長の設定”（海
木、匂坂日本音響学会講演論文集２−６−２０平
成２年９月ｐｐ２５９−２６０）で提案されているよ
うに、子音種類、隣接母音種類などを要因とし、数量化
１類によって自然音声サンプルを分析して係数を決定し
た重回帰モデルを用いて子音長を決定することもでき
る。Instead of directly using the average for each consonant type as described above, the consonant length determination unit 12 may determine the consonant length with a multiple regression model, as proposed in the fifth document, “Setting of consonant duration in sentence speech” (Kaiki and Sagisaka, Proceedings of the Acoustical Society of Japan, 2-6-20, September 1990, pp. 259-260). The model uses factors such as the consonant type and the adjacent vowel types, and its coefficients are determined by analyzing natural speech samples with Quantification Theory Type I.
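Quantification Theory Type I amounts to linear regression on dummy-coded categorical factors: each factor level contributes an additive score, and the predicted duration is the intercept plus the scores of the levels present. The sketch below fits such a model with NumPy least squares. The two factors, their levels, and the training durations are synthetic placeholders chosen to be exactly additive; they are not data or coefficients from the cited paper.

```python
import numpy as np

# Hypothetical categorical factors; the first level of each is the reference level.
LEVELS = {"consonant": ["k", "s", "t"], "next_vowel": ["a", "i"]}


def design_row(sample):
    """Dummy-code one sample: intercept + one 0/1 column per non-reference level."""
    row = [1.0]
    for factor, levels in LEVELS.items():
        row += [1.0 if sample[factor] == lv else 0.0 for lv in levels[1:]]
    return row


def fit_category_scores(samples, durations):
    """Least-squares fit of additive category scores (Quantification Theory Type I)."""
    X = np.array([design_row(s) for s in samples])
    coef, *_ = np.linalg.lstsq(X, np.asarray(durations, dtype=float), rcond=None)
    return coef


def predict(coef, sample):
    """Predicted consonant duration (ms) for one combination of factor levels."""
    return float(np.dot(design_row(sample), coef))


# Synthetic training data (ms), constructed to be exactly additive.
samples = [{"consonant": c, "next_vowel": v} for c in "kst" for v in "ai"]
durations = [60, 50, 80, 70, 55, 45]
coef = fit_category_scores(samples, durations)
print(round(predict(coef, {"consonant": "s", "next_vowel": "i"}), 3))  # → 70.0
```

Dropping one reference level per factor keeps the design matrix full rank, so each fitted coefficient reads directly as the score of a level relative to its reference.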

【００３１】以上のように、本実施例によれば、日本語
における単音節長の変動要因として特徴的である単音節
より長い発声上の単位の時間構造の特徴を付与する音韻
継続時間長制御を行うことができ、この発明による音韻
継続時間長制御方式を用いる音声規則合成装置では、自
然なリズム感を持った合成音声を生成することができ
る。As described above, this embodiment performs phoneme duration control that imparts the temporal structure of utterance units longer than a single syllable, a characteristic factor in the variation of monosyllable length in Japanese. A speech rule synthesizer using the phoneme duration control method of the invention can therefore generate synthesized speech with a natural sense of rhythm.

【００３２】実施例２．図３は、本発明に係る音韻継続
時間長制御方式の第２実施例を採用した音声規則合成装
置の構成を示す構成図である。本装置は、第１実施例と
同様のリズム単位抽出部２、リズム単位基準長記憶部
４、リズム単位長決定部６、リズム知覚点記憶部１０、
子音長決定部１２及び音韻境界決定部１４と、第１実施
例のリズム単位分割部８に替えて、２音節からなるリズ
ム単位を構成する単音節対種類ごとの分割比率を記憶す
る分割比率記憶手段としての分割比率記憶部１６と、リ
ズム単位長データ列を入力とし、単音節対の種類に従っ
て分割比率記憶部１６から読み出した分割比率によって
リズム単位を分割することにより各単音節長を決定し、
単音節種類と単音節長からなる単音節長データ列を出力
する音節種類別リズム単位分割手段としての音節種類別
リズム単位分割部１８と、で構成される。Embodiment 2. FIG. 3 is a configuration diagram of a speech rule synthesizer employing a second embodiment of the phoneme duration control method according to the present invention. Like the first embodiment, this apparatus includes the rhythm unit extraction unit 2, rhythm unit reference length storage unit 4, rhythm unit length determination unit 6, rhythm perception point storage unit 10, consonant length determination unit 12, and phonological boundary determination unit 14. In place of the rhythm unit dividing unit 8 of the first embodiment, it includes a division ratio storage unit 16, serving as division ratio storage means, which stores a division ratio for each type of monosyllable pair forming a two-syllable rhythm unit; and a syllable-type-specific rhythm unit dividing unit 18, serving as syllable-type-specific rhythm unit dividing means, which takes the rhythm unit length data sequence as input, determines each monosyllable length by dividing the rhythm unit according to the division ratio read from the division ratio storage unit 16 for that monosyllable pair type, and outputs a monosyllable length data sequence consisting of monosyllable types and monosyllable lengths.

【００３３】本実施例において特徴的なことは、以上の
構成により２音節からなるリズム単位すなわち長音節あ
るいは２単音節の時間長を構成する音節種類別の分割比
率で分割して各単音節長を決定し、各単音節におけるリ
ズム知覚点の時間間隔が単音節長と一致するように子音
を配置して音韻境界を決定し、音韻継続時間長を設定す
るようにしたので、日本語単音節の継続時間の変動要因
として重要な２音節のリズムを再現することができるこ
とである。このように、リズム単位内の音節長の変動を
音節種類ごとの分割比率で制御するようにしたので、自
然なリズムを有する品質の高い合成音声を出力する音声
規則合成装置を実現することができる。What is characteristic of this embodiment is that, with the above configuration, the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided by a division ratio specific to the syllable types that make it up, which determines each monosyllable length; consonants are then placed so that the interval between the rhythm perception points of successive monosyllables equals the monosyllable length, the phonological boundaries are determined, and the phoneme durations are set. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration. Because the variation of syllable length within a rhythm unit is controlled by a division ratio for each syllable type, a speech rule synthesizer that outputs high-quality synthesized speech with a natural rhythm can be realized.

【００３４】次に、本実施例における動作について説明
する。Next, the operation of this embodiment will be described.

【００３５】リズム単位抽出部２は、入力された単音節
列１について隣合う音節の判定を行い、長音節以外の連
続する２単音節、あるいは１音節からなるリズム単位を
抽出する。ここで、リズム単位は、例えば前記第１実施
例に示した長音節、２音節、孤立する単音節が用いられ
る。リズム単位長決定部６は、各リズム単位の種類に従
って、リズム単位基準長記憶部４に記憶されているリズ
ム単位基準長５を読み出し、リズム単位長を決定し、入
力されたリズム単位列３に各リズム単位長の情報を付与
したリズム単位長データ列７を出力する。音節種類別リ
ズム単位分割部１８は、２音節からなるリズム単位、す
なわち長音節および２単音節の音節種類によって、予め
分割比率記憶部１６に記憶されている分割比率１７を読
み出し、継続時間を分割して、各単音節長を決定し、音
節種類と各単音節長からなる単音節長データ列９を出力
する。The rhythm unit extraction unit 2 examines adjacent syllables in the input monosyllable string 1 and extracts rhythm units consisting of two consecutive monosyllables other than long syllables, or of a single syllable. As the rhythm units, for example, the long syllables, two-syllable units, and isolated monosyllables described in the first embodiment are used. The rhythm unit length determination unit 6 reads the rhythm unit reference length 5 stored in the rhythm unit reference length storage unit 4 according to the type of each rhythm unit, determines the rhythm unit length, and outputs a rhythm unit length data sequence 7 in which each rhythm unit length is added to the input rhythm unit sequence 3. For each two-syllable rhythm unit, that is, a long syllable or two monosyllables, the syllable-type-specific rhythm unit dividing unit 18 reads the division ratio 17 stored in advance in the division ratio storage unit 16 according to its syllable types, divides the duration to determine each monosyllable length, and outputs a monosyllable length data sequence 9 consisting of the syllable types and monosyllable lengths.
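The only change from the first embodiment is that the split ratio is looked up per syllable pair instead of being a single constant. A minimal sketch, assuming the table is keyed by the (first, second) syllables of a two-syllable unit and stores the share given to the first syllable; the table entries and the fallback `DEFAULT_RATIO` for unlisted pairs are hypothetical.

```python
# Hypothetical division ratios: share of the unit given to the first syllable,
# keyed by the (first, second) syllable types of a two-syllable rhythm unit.
DIVISION_RATIO = {("ka", "ta"): 0.5625, ("o", "N"): 0.625}
DEFAULT_RATIO = 0.5  # fallback for pairs not in the table (an assumption)


def divide_unit(first, second, unit_length):
    """Split one two-syllable rhythm unit using the ratio stored for its syllable pair."""
    ratio = DIVISION_RATIO.get((first, second), DEFAULT_RATIO)
    first_len = unit_length * ratio
    return [(first, first_len), (second, unit_length - first_len)]


print(divide_unit("ka", "ta", 320.0))  # → [('ka', 180.0), ('ta', 140.0)]
print(divide_unit("sa", "i", 240.0))   # → [('sa', 120.0), ('i', 120.0)]
```

With the fallback, pairs that have no stored ratio behave exactly like the fixed-ratio division of the first embodiment.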

【００３６】子音長決定部１２は、子音長１３を決定す
る。音韻境界決定部１４は、単音節種類に従って、リズ
ム知覚点記憶部１０に記憶されているリズム知覚点位置
１１を読み出すとともに、前記子音長１３とにより、図
２に示したように、与えられた音節境界に各単音節のリ
ズム知覚点位置を置くことで、子音位置を定め、母音は
子音が割り当てられた残りの区間で与え、音韻境界を決
定し、音韻継続時間長１５を出力する。The consonant length determination unit 12 determines the consonant length 13. The phonological boundary determination unit 14 reads the rhythm perception point position 11 stored in the rhythm perception point storage unit 10 according to the monosyllable type and, as shown in FIG. 2, uses it together with the consonant length 13 to place the rhythm perception point of each monosyllable on the given syllable boundary, which fixes the consonant position. The vowel is assigned the remaining interval not occupied by the consonant. The unit thereby determines the phonological boundaries and outputs the phoneme durations 15.

【００３７】以上のように、本実施例によれば、日本語
における単音節長の変動要因として特徴的である単音節
より長い発声上の単位の時間構造の特徴を付与する音韻
継続時間長制御を行うことができ、更に、リズム単位内
の音節長比率の音節種類別の変動を考慮した音韻継続時
間長制御を行うことができ、本実施例における音韻継続
時間長制御方式を用いる音声規則合成装置では、自然な
リズム感を持った合成音声を生成することができる。As described above, this embodiment performs phoneme duration control that imparts the temporal structure of utterance units longer than a single syllable, a characteristic factor in the variation of monosyllable length in Japanese, and further takes into account how the ratio of syllable lengths within a rhythm unit varies with syllable type. A speech rule synthesizer using the phoneme duration control method of this embodiment can therefore generate synthesized speech with a natural sense of rhythm.

【００３８】実施例３．図４は、本発明に係る音韻継続
時間長制御方式の第３実施例を採用した音声規則合成装
置の構成を示す構成図である。本装置は、前記第１実施
例の構成要素に加え、リズム単位分割部８と音韻境界決
定部１４との間に、アクセント情報１９に基づき各単音
節長を変更する単音節長変更手段としての単音節長変更
部２０を設けたことを特徴とする。これにより、単音節
長をアクセント情報１９に基づいて変更し、各単音節に
おけるリズム知覚点の時間間隔が変更した単音節長と一
致するように子音を配置して音韻境界を決定し、音韻継
続時間長を設定するようにしたので、日本語単音節の継
続時間の変動要因として重要な２音節のリズムを再現す
ることができ、自然なリズムを有する品質の高い合成音
声を出力する音声規則合成装置を実現することができ
る。Embodiment 3. FIG. 4 is a configuration diagram of a speech rule synthesizer employing a third embodiment of the phoneme duration control method according to the present invention. In addition to the components of the first embodiment, this apparatus is characterized by a monosyllable length changing unit 20, serving as monosyllable length changing means, placed between the rhythm unit dividing unit 8 and the phonological boundary determination unit 14, which changes each monosyllable length based on accent information 19. The monosyllable lengths are thus changed according to the accent information 19, consonants are placed so that the interval between the rhythm perception points of successive monosyllables equals the changed monosyllable length, the phonological boundaries are determined, and the phoneme durations are set. This reproduces two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration, and makes it possible to realize a speech rule synthesizer that outputs high-quality synthesized speech with a natural rhythm.

【００３９】次に、本実施例における動作について説明
する。Next, the operation of this embodiment will be described.

【００４０】リズム単位抽出部２は、単音節列１から例
えば長音節、２単音節、孤立する単音節の３種類のリズ
ム単位を抽出し、単音節列１にリズム単位種類の情報を
付与したリズム単位列３を出力する。リズム単位長決定
部６は、各リズム単位の種類に従って、リズム単位基準
長記憶部４に記憶されているリズム単位基準長５を選択
し、各リズム単位長を決定し、リズム単位列３に各リズ
ム単位長の情報を付与したリズム単位長データ列７を出
力する。リズム単位分割部８は、２音節からなるリズム
単位すなわち長音節あるいは２単音節のリズム単位長を
一定比率に分割して、各単音節長を決定し、音節種類と
単音節長とからなる単音節長データ列９を出力する。単
音節長変更部２０は、入力されたアクセント情報１９に
従って、例えば「単音節列末尾から２番目の単音節にア
クセント核がある場合には、末尾の単音節長は１５％短
くなる」といった、自然音声に見られる現象を反映する
ように各単音節長を変更し、変更単音節長データ列２１
を出力する。子音長決定部１２は子音長１３を決定す
る。音韻境界決定部１４は、単音節種類に従って、リズ
ム知覚点記憶部１０に記憶されているリズム知覚点位置
１１を読み出し、図２に示したように、リズム知覚点位
置１１と子音長１３とにより、与えられた音節境界に各
単音節のリズム知覚点位置を置くことで子音位置を定
め、母音は子音が割り当てられた残りの区間で与え、音
韻境界を決定し、音韻継続時間長１５を出力する。The rhythm unit extraction unit 2 extracts from the monosyllable string 1 three types of rhythm units, for example long syllables, pairs of monosyllables, and isolated monosyllables, and outputs a rhythm unit sequence 3 in which rhythm unit type information is added to the monosyllable string 1. The rhythm unit length determination unit 6 selects the rhythm unit reference length 5 stored in the rhythm unit reference length storage unit 4 according to the type of each rhythm unit, determines each rhythm unit length, and outputs a rhythm unit length data sequence 7 in which each rhythm unit length is added to the rhythm unit sequence 3. The rhythm unit dividing unit 8 divides the length of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, at a fixed ratio to determine each monosyllable length, and outputs a monosyllable length data sequence 9 consisting of syllable types and monosyllable lengths. The monosyllable length changing unit 20 changes each monosyllable length according to the input accent information 19 so as to reflect phenomena observed in natural speech, for example "if the accent nucleus is on the second-to-last monosyllable of the string, the final monosyllable becomes 15% shorter", and outputs a changed monosyllable length data sequence 21. The consonant length determination unit 12 determines the consonant length 13. The phonological boundary determination unit 14 reads the rhythm perception point position 11 stored in the rhythm perception point storage unit 10 according to the monosyllable type and, as shown in FIG. 2, uses it together with the consonant length 13 to place the rhythm perception point of each monosyllable on the given syllable boundary, which fixes the consonant position. The vowel is assigned the remaining interval not occupied by the consonant. The unit thereby determines the phonological boundaries and outputs the phoneme durations 15.
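The accent-based adjustment can be sketched with the one rule the text gives as an example: when the accent nucleus is on the second-to-last monosyllable, shorten the final monosyllable by 15%. A practical system would presumably hold a table of such rules; the function name, the 0-based `nucleus_index` convention, and the use of `None` for an unaccented string are assumptions made for illustration.

```python
SHORTENING = 0.15  # the example figure quoted in the text


def apply_accent_rule(lengths, nucleus_index):
    """Shorten the final monosyllable by 15% when the accent nucleus is on the
    second-to-last monosyllable; otherwise return the lengths unchanged.

    nucleus_index is 0-based (hypothetical convention); None means unaccented."""
    changed = list(lengths)
    if len(changed) >= 2 and nucleus_index == len(changed) - 2:
        changed[-1] *= 1.0 - SHORTENING
    return changed


print(apply_accent_rule([150.0, 140.0, 160.0], nucleus_index=1))
print(apply_accent_rule([150.0, 140.0, 160.0], nucleus_index=0))
```

The function returns a new list rather than modifying its input, mirroring the separate changed monosyllable length data sequence 21 that the unit outputs.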

【００４１】以上のように、本実施例によれば、日本語
における単音節長の変動要因として特徴的である単音節
より長い発声上の単位の時間構造の特徴を付与する音韻
継続時間長制御を行うことができ、更に、アクセントに
よる音節長の変動を考慮した音韻継続時間長制御を行う
ことができ、本実施例における音韻継続時間長制御方式
を用いる音声規則合成装置では、自然なリズム感を持っ
た合成音声を生成することができる。As described above, this embodiment performs phoneme duration control that imparts the temporal structure of utterance units longer than a single syllable, a characteristic factor in the variation of monosyllable length in Japanese, and further takes into account the variation of syllable length caused by accent. A speech rule synthesizer using the phoneme duration control method of this embodiment can therefore generate synthesized speech with a natural sense of rhythm.

【００４２】実施例４．図５は、本発明に係る音韻継続
時間長制御方式の第４実施例を採用した音声規則合成装
置の構成を示す構成図である。本装置は、第１実施例と
同様のリズム単位抽出部２、リズム単位基準長記憶部
４、リズム単位長決定部６及びリズム単位分割部８と、
第１実施例のリズム知覚点記憶部１０、子音長決定部１
２及び音韻境界決定部１４に替えて、音韻連鎖毎の母音
長比率を記憶する母音長比率記憶手段としての母音長比
率記憶部２２と、単音節長データ列を入力とし、音節境
界位置を母音区間に配置し、母音子音境界から音節境界
位置までの時間長を音節長と母音長比率から決定し、各
音韻の継続時間長を出力する母音長優先音韻境界決定手
段としての母音長優先音韻境界決定部２４と、で構成さ
れる。Embodiment 4. FIG. 5 is a configuration diagram of a speech rule synthesizer employing a fourth embodiment of the phoneme duration control method according to the present invention. Like the first embodiment, this apparatus includes the rhythm unit extraction unit 2, rhythm unit reference length storage unit 4, rhythm unit length determination unit 6, and rhythm unit dividing unit 8. In place of the rhythm perception point storage unit 10, consonant length determination unit 12, and phonological boundary determination unit 14 of the first embodiment, it includes a vowel length ratio storage unit 22, serving as vowel length ratio storage means, which stores a vowel length ratio for each phoneme sequence; and a vowel-length-priority phonological boundary determination unit 24, serving as vowel-length-priority phonological boundary determining means, which takes the monosyllable length data sequence as input, places each syllable boundary within a vowel interval, determines the duration from the vowel-consonant boundary to the syllable boundary from the syllable length and the vowel length ratio, and outputs the duration of each phoneme.

【００４３】本実施例において特徴的なことは、以上の
構成により２音節からなるリズム単位すなわち長音節あ
るいは２単音節の時間長を一定比率で分割して各単音節
長を決定し、単音節境界が母音区間にあるとし、単音節
境界から母音子音境界までの時間長を決定することで音
韻継続時間長を設定するようにしたので、日本語単音節
の継続時間の変動要因として重要な２音節のリズムを母
音の重心位置間隔が音節長として知覚されるという仮定
の下で再現することができ、自然なリズムを有する品質
の高い音声規則合成装置を実現することができる。What is characteristic of this embodiment is that, with the above configuration, the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided at a fixed ratio to determine each monosyllable length; each monosyllable boundary is assumed to lie within a vowel interval; and the phoneme durations are set by determining the duration from each monosyllable boundary to the vowel-consonant boundary. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration, under the assumption that the interval between vowel centroid positions is perceived as the syllable length, and thus to realize a high-quality speech rule synthesizer with a natural rhythm.

【００４４】次に、本実施例における動作について説明
する。Next, the operation of this embodiment will be described.

【００４５】リズム単位抽出部２は、単音節列１から例
えば長音節、２単音節、孤立する単音節の３種類のリズ
ム単位を抽出し、単音節列１にリズム単位種類の情報を
付与したリズム単位列３を出力する。リズム単位長決定
部６は、各リズム単位の種類に従って、リズム単位基準
長記憶部４に記憶されているリズム単位基準長５を選択
し、各リズム単位長を決定し、リズム単位列３に各リズ
ム単位長の情報を付与したリズム単位長データ列７を出
力する。リズム単位分割部８は、２音節からなるリズム
単位すなわち長音節あるいは２単音節のリズム単位長を
一定比率に分割して、各単音節長を決定し、単音節種類
と単音節長とからなる単音節長データ列９を出力する。
母音優先音韻境界決定部２４は、予め記憶されている母
音長比率記憶部２２から音節境界を母音区間内に配置し
たとき音節境界から母音子音境界までの時間長を単音節
長から算出するための係数である母音長比率２３を読み
出し、音節境界から母音子音境界までの時間長を決定
し、音韻継続時間長１５を出力する。The rhythm unit extraction unit 2 extracts from the monosyllable string 1 three types of rhythm units, for example long syllables, pairs of monosyllables, and isolated monosyllables, and outputs a rhythm unit sequence 3 in which rhythm unit type information is added to the monosyllable string 1. The rhythm unit length determination unit 6 selects the rhythm unit reference length 5 stored in the rhythm unit reference length storage unit 4 according to the type of each rhythm unit, determines each rhythm unit length, and outputs a rhythm unit length data sequence 7 in which each rhythm unit length is added to the rhythm unit sequence 3. The rhythm unit dividing unit 8 divides the length of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, at a fixed ratio to determine each monosyllable length, and outputs a monosyllable length data sequence 9 consisting of monosyllable types and monosyllable lengths.
The vowel-length-priority phonological boundary determination unit 24 reads the vowel length ratio 23 stored in advance in the vowel length ratio storage unit 22. This ratio is a coefficient for calculating, from the monosyllable length, the duration from the syllable boundary to the vowel-consonant boundary when the syllable boundary is placed within the vowel interval. Using it, the unit determines the duration from each syllable boundary to the vowel-consonant boundary and outputs the phoneme durations 15.
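A possible reading of the vowel-length-priority rule, sketched under explicit assumptions: each syllable boundary sits inside a vowel (consistent with the vowel-centroid assumption above), so the interval between two successive syllable boundaries contains the tail of one vowel, the next consonant, and the head of the next vowel. The text stores "a vowel length ratio for each phoneme sequence" but does not spell out its exact form here, so using a hypothetical pair of ratios `(r_vc, r_cv)` per vowel-consonant-vowel chain is an assumption, as are the numbers in the example. The leading consonant of the first syllable and the tail of the final vowel are not handled.

```python
def vowel_priority_boundaries(lengths, ratios):
    """Place syllable boundaries inside vowels and derive V->C and C->V boundaries.

    lengths[i]: monosyllable length L, the interval from boundary b_i to b_{i+1} (ms).
    ratios[i]:  hypothetical (r_vc, r_cv) for the V-C-V chain spanning that interval:
                r_vc * L = time from b_i to the end of the preceding vowel,
                r_cv * L = time from the start of the following vowel to b_{i+1}.
    Returns one (vc_time, cv_time) pair of boundary times per interval."""
    b, out = 0.0, []
    for L, (r_vc, r_cv) in zip(lengths, ratios):
        vc = b + r_vc * L       # vowel -> consonant boundary
        cv = b + L - r_cv * L   # consonant -> vowel boundary
        if cv < vc:
            raise ValueError("ratios leave no room for the consonant")
        out.append((vc, cv))
        b += L
    return out


bounds = vowel_priority_boundaries([160.0, 128.0], [(0.375, 0.25), (0.375, 0.25)])
consonants = [cv - vc for vc, cv in bounds]
# A medial vowel runs from one interval's C->V boundary to the next interval's V->C boundary.
medial_vowels = [bounds[i + 1][0] - bounds[i][1] for i in range(len(bounds) - 1)]
print(bounds, consonants, medial_vowels)
```

Unlike the rhythm-perception-point method, no separate consonant length model is needed here: consonant and vowel durations both follow from the monosyllable lengths and the stored ratios.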

【００４６】以上のように、本実施例によれば、日本語
における単音節長の変動要因として特徴的である単音節
より長い発声上の単位の時間構造の特徴を付与する音韻
継続時間長制御を行うことができ、更に、得られた音節
境界に対して音声規則合成装置が母音重心を割り当てる
ようにすれば、単音節の継続時間の知覚が母音重心であ
ると仮定するリズム単位制御法となり、本実施例におけ
る音韻継続時間長制御方式を用いる音声規則合成装置で
は、自然なリズム感を持った合成音声を生成することが
できる。As described above, this embodiment performs phoneme duration control that imparts the temporal structure of utterance units longer than a single syllable, a characteristic factor in the variation of monosyllable length in Japanese. Furthermore, if the speech rule synthesizer assigns the vowel centroid to each syllable boundary obtained, the result is a rhythm unit control method that assumes the duration of a monosyllable is perceived with reference to the vowel centroids. A speech rule synthesizer using the phoneme duration control method of this embodiment can therefore generate synthesized speech with a natural sense of rhythm.

【００４７】実施例５．上記第４実施例におけるリズム
単位分割手段としてのリズム単位分割部８は、上記第２
実施例に示した音節種類別リズム単位分割部１８及び分
割比率記憶部１６に置き換えることもできる。Embodiment 5. The rhythm unit dividing unit 8, serving as the rhythm unit dividing means in the fourth embodiment, may also be replaced by the syllable-type-specific rhythm unit dividing unit 18 and the division ratio storage unit 16 shown in the second embodiment.

【００４８】実施例６．上記実施例４における母音優先
音韻境界決定部２４は、リズム単位分割部８からの出力
である単音節長データ列９を入力するが、上記第３実施
例における単音節長変更手段としての単音節長変更部２
０を母音長優先音韻境界決定部２４とリズム単位分割部
８との間に設け、アクセント情報に従って単音節長を変
更し単音節長変更部２０から出力される変更単音節長デ
ータ列を、単音節長データ列に替えて母音長優先音韻境
界決定部２４の入力とすることもできる。Embodiment 6. The vowel-length-priority phonological boundary determination unit 24 of the fourth embodiment receives the monosyllable length data sequence 9 output by the rhythm unit dividing unit 8. Alternatively, the monosyllable length changing unit 20, serving as the monosyllable length changing means of the third embodiment, may be placed between the rhythm unit dividing unit 8 and the vowel-length-priority phonological boundary determination unit 24, so that the changed monosyllable length data sequence, in which the monosyllable lengths have been changed according to the accent information, is input to unit 24 in place of the monosyllable length data sequence.

【００４９】実施例７．上記第３実施例及び第６実施例
に示したリズム単位分割部８を第２実施例における音節
種類別リズム単位分割部１８及び分割比率記憶部１６に
置き換えることもできる。Embodiment 7. The rhythm unit dividing unit 8 shown in the third and sixth embodiments may also be replaced by the syllable-type-specific rhythm unit dividing unit 18 and the division ratio storage unit 16 of the second embodiment.

【００５０】[0050]

【発明の効果】以上説明したように、請求項１記載の発
明によれば、単音節列から隣接する音節の判定により、
例えば長音節、２単音節、孤立する単音節に分類される
リズム単位を抽出し、リズム単位種類ごとに時間長を決
定し、２音節からなるリズム単位すなわち長音節あるい
は２単音節の時間長を一定比率で分割して各単音節長を
決定し、各単音節におけるリズム知覚点の時間間隔が単
音節長と一致するように子音を配置して音韻境界を決定
し音韻継続時間長を設定するようにしたので、日本語単
音節の継続時間の変動要因として重要な２音節のリズム
を再現することが可能となる。As described above, according to the invention of claim 1, adjacent syllables in a monosyllable string are examined to extract rhythm units classified, for example, as long syllables, pairs of monosyllables, and isolated monosyllables; a duration is determined for each rhythm unit type; the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided at a fixed ratio to determine each monosyllable length; and consonants are placed so that the interval between the rhythm perception points of successive monosyllables equals the monosyllable length, which determines the phonological boundaries and sets the phoneme durations. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration.

【００５１】また、これを用いることで、自然なリズム
を有する品質の高い合成音声を出力する音声規則合成装
置を実現することが可能となる。Using this method also makes it possible to realize a speech rule synthesizer that outputs high-quality synthesized speech with a natural rhythm.

【００５２】請求項２記載の発明によれば、単音節列か
ら隣接する音節の判定により、例えば長音節、２単音
節、孤立する単音節に分類されるリズム単位を抽出し、
リズム単位種類ごとに時間長を決定し、２音節からなる
リズム単位すなわち長音節あるいは２単音節の時間長を
構成する音節種類別の分割比率で分割して各単音節長を
決定し、各単音節におけるリズム知覚点の時間間隔が単
音節長と一致するように子音を配置して音韻境界を決定
し、音韻継続時間長を設定するようにしたので、日本語
単音節の継続時間の変動要因として重要な２音節のリズ
ムを再現することが可能となる。According to the invention of claim 2, adjacent syllables in a monosyllable string are examined to extract rhythm units classified, for example, as long syllables, pairs of monosyllables, and isolated monosyllables; a duration is determined for each rhythm unit type; the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided by a division ratio specific to its constituent syllable types to determine each monosyllable length; and consonants are placed so that the interval between the rhythm perception points of successive monosyllables equals the monosyllable length, which determines the phonological boundaries and sets the phoneme durations. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration.

【００５３】更に、リズム単位内の音節長の変動を音節
種類ごとの分割比率で制御するようにしたので、自然な
リズムを有する品質の高い合成音声を出力する音声規則
合成装置を実現することが可能となる。Furthermore, because the variation of syllable length within a rhythm unit is controlled by a division ratio for each syllable type, it becomes possible to realize a speech rule synthesizer that outputs high-quality synthesized speech with a natural rhythm.

【００５４】請求項３記載の発明によれば、単音節列か
ら隣接する音節の判定により、例えば長音節、２単音
節、孤立する単音節に分類されるリズム単位を抽出し、
リズム単位種類ごとに時間長を決定し、２音節からなる
リズム単位すなわち長音節あるいは２単音節の時間長を
分割して各単音節長を決定し、単音節長をアクセント情
報に基づいて変更し、各単音節におけるリズム知覚点の
時間間隔が変更した単音節長と一致するように子音を配
置して音韻境界を決定し、音韻継続時間長を設定するよ
うにしたので、日本語単音節の継続時間の変動要因とし
て重要な２音節のリズムを再現することが可能となる。According to the invention of claim 3, adjacent syllables in a monosyllable string are examined to extract rhythm units classified, for example, as long syllables, pairs of monosyllables, and isolated monosyllables; a duration is determined for each rhythm unit type; the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided to determine each monosyllable length; the monosyllable lengths are changed based on accent information; and consonants are placed so that the interval between the rhythm perception points of successive monosyllables equals the changed monosyllable length, which determines the phonological boundaries and sets the phoneme durations. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration.

【００５５】更に、アクセントによる音節長の変動を制
御可能としたので、これを用いることで、自然なリズム
を有する品質の高い合成音声を出力する音声規則合成装
置を実現することが可能となる。Further, since the variation of the syllable length due to the accent can be controlled, it is possible to realize a speech rule synthesizing apparatus which outputs a high-quality synthesized speech having a natural rhythm by using this.

【００５６】請求項４記載の発明によれば、単音節列か
ら隣接する音節の判定により、例えば長音節、２単音
節、孤立する単音節に分類されるリズム単位を抽出し、
リズム単位種類ごとに時間長を決定し、２音節からなる
リズム単位すなわち長音節あるいは２単音節の時間長を
一定比率で分割して各単音節長を決定し、単音節境界が
母音区間にあるとし、単音節境界から母音子音境界まで
の時間長を決定することで音韻継続時間長を設定するよ
うにしたので、日本語単音節の継続時間の変動要因とし
て重要な２音節のリズムを母音の重心位置間隔が音節長
として知覚されるという仮定の下で再現することが可能
となる。According to the invention of claim 4, adjacent syllables in a monosyllable string are examined to extract rhythm units classified, for example, as long syllables, pairs of monosyllables, and isolated monosyllables; a duration is determined for each rhythm unit type; the duration of each two-syllable rhythm unit, that is, a long syllable or two monosyllables, is divided at a fixed ratio to determine each monosyllable length; each monosyllable boundary is assumed to lie within a vowel interval; and the phoneme durations are set by determining the duration from each monosyllable boundary to the vowel-consonant boundary. This makes it possible to reproduce two-syllable rhythm, an important factor in the variation of Japanese monosyllable duration, under the assumption that the interval between vowel centroid positions is perceived as the syllable length.

【００５７】また、これを用いることで、自然なリズム
を有する品質の高い音声規則合成装置を実現することが
可能となる。By using this, it is possible to realize a high-quality speech rule synthesizing apparatus having a natural rhythm.

[Brief description of the drawings]

【図１】この発明に係る音韻継続時間長制御方式の第
１実施例を示した構成図である。FIG. 1 is a configuration diagram showing a first embodiment of a phoneme duration control method according to the present invention.

【図２】第１実施例の動作を説明するための図であ
る。FIG. 2 is a diagram for explaining the operation of the first embodiment.

【図３】この発明に係る音韻継続時間長制御方式の第
２実施例を示した構成図である。FIG. 3 is a configuration diagram showing a second embodiment of the phoneme duration control method according to the present invention.

【図４】この発明に係る音韻継続時間長制御方式の第
３実施例を示した構成図である。FIG. 4 is a configuration diagram showing a third embodiment of a phoneme duration control method according to the present invention.

【図５】この発明に係る音韻継続時間長制御方式の第
４実施例を示した構成図である。FIG. 5 is a configuration diagram showing a fourth embodiment of the phoneme duration control method according to the present invention.

【図６】第１の従来例を示す構成図である。FIG. 6 is a configuration diagram showing a first conventional example.

【図７】第１の従来例の動作を示す説明図である。FIG. 7 is an explanatory diagram showing the operation of the first conventional example.

【図８】第２の従来例を示す構成図である。FIG. 8 is a configuration diagram showing a second conventional example.

【図９】第２の従来例の動作を示す説明図である。FIG. 9 is an explanatory diagram showing the operation of the second conventional example.

[Explanation of symbols]

１単音節列、２リズム単位抽出部、３リズム単位
列、４リズム単位基準長記憶部、５リズム単位基準
長、６リズム単位長決定部、７リズム単位長データ
列、８リズム単位分割部、９単音節長データ列、１
０リズム知覚点記憶部、１１リズム知覚点位置、１
２子音長決定部、１３子音長、１４音韻境界決定
部、１５音韻継続時間長、１６分割比率記憶部、１
７分割比率、１８音節種類別リズム単位分割部、１
９アクセント情報、２０単音節長変更部、２１変
更単音節長データ列、２２母音長比率記憶部、２３
母音長比率、２４母音長優先音韻境界決定部、２５
単音節長決定部、２６母音重心点間時間長決定部、２
７母音重心点間時間長、２８母音重心点間時間長記
憶部、２９母音重心点間長データ列、３０母音重心
点間音韻境界決定部、３１母音重心点間音韻境界決定
パラメータ、３２母音重心点間音韻境界決定パラメー
タ記憶部。1 monosyllable string, 2 rhythm unit extraction unit, 3 rhythm unit sequence, 4 rhythm unit reference length storage unit, 5 rhythm unit reference length, 6 rhythm unit length determination unit, 7 rhythm unit length data sequence, 8 rhythm unit dividing unit, 9 monosyllable length data sequence, 10 rhythm perception point storage unit, 11 rhythm perception point position, 12 consonant length determination unit, 13 consonant length, 14 phonological boundary determination unit, 15 phoneme duration, 16 division ratio storage unit, 17 division ratio, 18 syllable-type-specific rhythm unit dividing unit, 19 accent information, 20 monosyllable length changing unit, 21 changed monosyllable length data sequence, 22 vowel length ratio storage unit, 23 vowel length ratio, 24 vowel-length-priority phonological boundary determination unit, 25 monosyllable length determination unit, 26 inter-vowel-centroid duration determination unit, 27 inter-vowel-centroid duration, 28 inter-vowel-centroid duration storage unit, 29 inter-vowel-centroid length data sequence, 30 inter-vowel-centroid phonological boundary determination unit, 31 inter-vowel-centroid phonological boundary determination parameter, 32 inter-vowel-centroid phonological boundary determination parameter storage unit.

フロントページの続き (56)参考文献特開昭61−11798（ＪＰ，Ａ) 特開昭62−234200（ＪＰ，Ａ) 特開昭62−284397（ＪＰ，Ａ) 特開昭62−294295（ＪＰ，Ａ) 特開平１−262599（ＪＰ，Ａ) 特開平５−281993（ＪＰ，Ａ) 特開平６−222793（ＪＰ，Ａ) 特開昭60−205497（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 13/06 - 13/08 Continuation of front page (56) References: JP-A-61-11798 (JP, A), JP-A-62-234200 (JP, A), JP-A-62-284397 (JP, A), JP-A-62-294295 (JP, A), JP-A-1-262599 (JP, A), JP-A-5-281993 (JP, A), JP-A-6-222793 (JP, A), JP-A-60-205497 (JP, A). (58) Fields searched (Int.Cl.⁷, DB name): G10L 13/06 - 13/08

Claims

(57) [Claims]

1. A phoneme duration control method used in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for examining adjacent syllables in an input monosyllable string, extracting rhythm units each consisting of two syllables or one syllable, and outputting a rhythm unit sequence in which rhythm unit type information is added to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determining means for taking the rhythm unit sequence as input, reading the basic duration corresponding to each rhythm unit type from the rhythm unit reference length storage means, determining the duration of each input rhythm unit, adding it to the rhythm unit sequence, and outputting the result as a rhythm unit length data sequence; rhythm unit dividing means for taking the rhythm unit length data sequence as input, dividing the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputting a monosyllable length data sequence consisting of monosyllable types and monosyllable lengths; rhythm perception point storage means for storing a rhythm perception point position for each monosyllable type; consonant length determining means for determining the duration of each consonant; and phonological boundary determining means for taking the monosyllable length data sequence as input, reading rhythm perception point positions from the rhythm perception point storage means, determining the phonological boundaries by placing each consonant, with the consonant length given by the consonant length determining means, so that the interval between rhythm perception points equals each monosyllable length, and outputting the duration of each phoneme.

2. A phoneme duration control method used in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for examining adjacent syllables in an input monosyllable string, extracting rhythm units each consisting of two syllables or one syllable, and outputting a rhythm unit sequence in which rhythm unit type information is added to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determining means for taking the rhythm unit sequence as input, reading the basic duration corresponding to each rhythm unit type from the rhythm unit reference length storage means, determining the duration of each input rhythm unit, adding it to the rhythm unit sequence, and outputting the result as a rhythm unit length data sequence; division ratio storage means for storing a division ratio for each type of monosyllable pair forming a two-syllable rhythm unit; syllable-type-specific rhythm unit dividing means for taking the rhythm unit length data sequence as input, determining each monosyllable length by dividing each rhythm unit according to the division ratio read from the division ratio storage means for the type of its monosyllable pair, and outputting a monosyllable length data sequence consisting of monosyllable types and monosyllable lengths; rhythm perception point storage means for storing a rhythm perception point position for each monosyllable type; consonant length determining means for determining the duration of each consonant; and phonological boundary determining means for taking the monosyllable length data sequence as input, reading rhythm perception point positions from the rhythm perception point storage means, determining the phonological boundaries by placing each consonant, with the consonant length given by the consonant length determining means, so that the interval between rhythm perception points equals each monosyllable length, and outputting the duration of each phoneme.

3. The phoneme duration control method according to claim 1, further comprising monosyllable length changing means for changing, based on separately input accent information, each monosyllable length output by the rhythm unit dividing means.

4. A phoneme duration control method used in a speech rule synthesizer that converts arbitrary text into speech, comprising: rhythm unit extraction means for examining adjacent syllables in an input monosyllable string, extracting rhythm units each consisting of two syllables or one syllable, and outputting a rhythm unit sequence in which rhythm unit type information is added to the monosyllable string; rhythm unit reference length storage means for storing a basic duration for each rhythm unit type; rhythm unit length determining means for taking the rhythm unit sequence as input, reading the basic duration corresponding to each rhythm unit type from the rhythm unit reference length storage means, determining the duration of each input rhythm unit, adding it to the rhythm unit sequence, and outputting the result as a rhythm unit length data sequence; rhythm unit dividing means for taking the rhythm unit length data sequence as input, dividing the duration of each two-syllable rhythm unit at a fixed ratio to determine the duration of each monosyllable, and outputting a monosyllable length data sequence consisting of monosyllable types and monosyllable lengths; vowel length ratio storage means for storing a vowel length ratio for each phoneme sequence; and vowel-length-priority phonological boundary determining means for taking the monosyllable length data sequence as input, placing each syllable boundary position within a vowel interval, determining the duration from the vowel-consonant boundary to the syllable boundary position from the syllable length and the vowel length ratio, and outputting the duration of each phoneme.