JPH11249678A

JPH11249678A - Voice synthesizer and its text analytic method

Info

Publication number: JPH11249678A
Application number: JP10049635A
Authority: JP
Inventors: Eiji Komatsu; 英二小松
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1998-03-02
Filing date: 1998-03-02
Publication date: 1999-09-17
Anticipated expiration: 2018-03-02
Also published as: JP4218075B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesizer, and a text analysis method for it, that output synthesized speech generated from supplied data as natural, easy-to-hear speech. SOLUTION: In the speech synthesizer 10, a division candidate generation unit 460 of a phonological processing unit 40 divides the output of a language processing unit 30 according to a first rule, ICRLB (immediate constituent with recursively left-branching structure) division. It judges by a second rule whether each resulting prosodic phrase is at or below a prescribed number of beats and, when a prosodic phrase is re-divided, takes the combinations of prosodic phrases produced by the possible cut positions as division candidates. A division candidate selection unit 462 calculates an evaluation value indicating the suitability of each division, using the speech-expression information contained in the candidate and parameters supplied from a parameter storage unit 464, and selects the candidate that satisfies a third rule, namely the one with the minimum evaluation value. The selected candidate yields prosodic phrases divided to an optimum length.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesizer that applies various analyses to supplied text and artificially generates synthesized speech from an intermediate language produced from the analysis results according to prosodic rules and the like, and to a text analysis method for that synthesizer. In particular, it relates to a speech synthesizer that performs morphological and syntactic analysis on supplied Japanese text and then generates synthesized speech based on an intermediate language produced by phonological processing of the results, and to a text analysis method suitable for application to this speech synthesizer.

[0002]

[Prior Art] Speech can be synthesized artificially either by a record-and-playback method, which forms waveforms by combining and playing back prerecorded speech, or by a pure synthesis method, which synthesizes speech entirely artificially. For controlling waveform formation in the record-and-playback method there are likewise several techniques: an edit control method, which uses combinations of information prepared in advance, and a rule control method, which generates control signals entirely artificially. Among these, speech synthesizers that apply the rule control method to provide high-quality synthesized speech in various fields, such as reading e-mail aloud, weather forecast information services, and news such as professional baseball results, have recently attracted attention.

[0003] To obtain high-quality synthesized speech, prosodic rules must be constructed that accurately reflect linguistic information in Japanese, such as word meaning, syntactic structure, and discourse structure, and that can generate natural prosody. One example of a technique that generates Japanese synthesized speech with such a rule control method is disclosed in the Journal of the Acoustical Society of Japan, Vol. 50, No. 6 ("Prosodic rules for the synthesis of Japanese text speech"). That technique performs morphological and syntactic analysis on the supplied text to generate prosodic phrases and, when a generated prosodic phrase is too large, divides it into nearly equal parts. In this way it generates an intermediate language that takes the semantic structure of the sentence into account while letting the sentence be read with smooth prosody.

[0004] The speech synthesizer generates control parameters based on this intermediate language, then synthesizes and outputs speech according to those control parameters.

[0005]

[Problems to Be Solved by the Invention] When a generated prosodic phrase is too large, as described above, applying the rules may divide the phrase too finely, or may divide it in a way that produces a reading contrary to the natural prosody. Such divisions sometimes made the output synthesized speech sound unnatural.

[0006] The present invention aims to eliminate these drawbacks of the prior art and to provide a speech synthesizer, and a text analysis method for it, that can output synthesized speech generated from supplied data as more natural, easier-to-hear speech.

[0007]

[Means for Solving the Problems] To solve the above problems, the present invention provides a speech synthesizer that artificially synthesizes information into speech and comprises: language analysis means for analyzing language-level features of text supplied as information, namely the morphemes contained in the text and its syntax, based on the language used for processing; phonological analysis means for analyzing the output of the language analysis means based on spoken-language-level features and generating, from the analysis results, an intermediate language that serves as commands for speech synthesis; control parameter generation means for generating control parameters according to the output of the phonological analysis means; and speech signal generation means for synthesizing a speech signal based on the output of the control parameter generation means. The synthesizer is characterized in that the phonological analysis means includes the following. Division candidate generation means divides the output of the language analysis means according to a first rule based on the modification relations of prosodic phrases, each prosodic phrase being a group of prosodic word elements (chains of morphemes); when it re-divides a prosodic phrase according to a second rule, which judges whether a prosodic phrase obtained by this division is no larger than a preset size, it generates as division candidates the combinations of prosodic phrases obtained from the positions at which the phrase is cut. Division candidate selection means calculates an evaluation value that rates the validity of each division, using the speech-expression information contained in the generated candidate, and selects a division candidate by a third rule that selects the candidate with the smallest evaluation value. Parameter storage means stores the parameters that the division candidate selection means uses to calculate the evaluation value.

[0008] Here, the division candidate generation means desirably includes first-rule division means that divides the output of the language analysis means using the first rule, second-rule processing means that judges by the second rule the lengths of the pieces in the output of the first-rule division means, and re-division means that, according to this division-length judgment, re-divides the prosodic phrases output by the first-rule division means to generate a plurality of division candidates.

[0009] The parameter storage means preferably stores a first weighting coefficient based on the speech-expression information in the division candidates, and a second weighting coefficient by which the average number of beats of the prosodic word elements for the candidate's number of divisions is multiplied. This makes it possible to generate an intermediate language that accurately expresses designations such as emphasis and suppression for the generated prosodic phrases.

[0010] Preferably, the apparatus includes division example storage means that stores examples of break positions of prosodic phrases that satisfy the rules, and division example retrieval means that searches the stored examples for one in which the supplied text is optimally divided. With this configuration, an intermediate language that yields the optimum synthesized speech can be output in accordance with actual examples.

[0011] The division example retrieval means preferably searches the examples stored in the division example storage means using emphasis information, prosodic word elements, the number of beats of each prosodic word element, the total number of beats, the dependency type of each prosodic word element, accent commands, and/or part-of-speech types. This improves the accuracy of example matching and shortens the search time.
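A minimal sketch of how such a keyed example search might be organized. The patent lists the kinds of keys but not a data layout, so the `DivisionExample` fields, the `ExampleStore` class, and the choice to index by total beat count and then compare per-element beats and emphasis labels are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DivisionExample:
    """One stored example (hypothetical layout, not from the patent)."""
    beats: tuple      # beats of each prosodic word element
    emphasis: tuple   # "+emph" / "-emph" / "0emph" for each element
    cuts: tuple       # indices of elements that start a new prosodic phrase


class ExampleStore:
    """Index examples by total beat count so a lookup scans few entries."""

    def __init__(self):
        self._by_total = defaultdict(list)

    def add(self, example):
        self._by_total[sum(example.beats)].append(example)

    def find(self, beats, emphasis):
        # Narrow by the total number of beats first, then compare the
        # per-element keys; return None when no example matches.
        for example in self._by_total.get(sum(beats), []):
            if example.beats == tuple(beats) and example.emphasis == tuple(emphasis):
                return example
        return None


store = ExampleStore()
store.add(DivisionExample((4, 3, 5), ("0emph", "+emph", "0emph"), (2,)))
hit = store.find([4, 3, 5], ["0emph", "+emph", "0emph"])
miss = store.find([4, 3, 6], ["0emph", "+emph", "0emph"])
```

Using a cheap key (the total beat count) to narrow the search before comparing the finer keys is one plausible way to get the shorter search time the paragraph mentions; other keys from the list, such as dependency type or part of speech, could be added to the comparison in the same way.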

[0012] The language analysis means preferably includes emphasis information setting means that sets the rules for emphasis information and stores the emphasis information that has been set.

[0013] In the speech synthesizer of the present invention, the division candidate generation means divides the output of the language analysis means based on the first rule, judges the divided prosodic phrases by the second rule, and, when re-dividing a prosodic phrase, takes the combinations of prosodic phrases produced by the cut positions as division candidates. The division candidate selection means calculates an evaluation value that rates the validity of each division, using the speech-expression information contained in the candidate and the parameters supplied from the parameter storage means, and then uses these evaluation values to select the candidate that satisfies the third rule. Because the selected candidate consists of prosodic phrases cut to optimum lengths, the phonological analysis means can carry out the intermediate processing of the speech synthesizer, generating an intermediate language corresponding to the selected candidate.
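The flow of the second and third rules can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: a prosodic phrase is represented only by the beat counts of its prosodic word elements, candidates are all contiguous splits of that list, and the evaluation function is passed in by the caller. The names `split_at`, `division_candidates`, `select_division`, and the toy `deviation` score are hypothetical.

```python
from itertools import combinations


def split_at(beats, cuts):
    """Split a list of per-element beat counts at the given cut indices."""
    bounds = [0, *cuts, len(beats)]
    return [beats[a:b] for a, b in zip(bounds, bounds[1:])]


def division_candidates(beats):
    """Yield every way of cutting the phrase into two or more sections."""
    n = len(beats)
    for k in range(1, n):
        for cuts in combinations(range(1, n), k):
            yield split_at(beats, cuts)


def select_division(beats, max_beats, evaluate):
    # Second rule: a phrase at or below the preset size is kept whole.
    if sum(beats) <= max_beats:
        return [beats]
    # Third rule: choose the candidate with the smallest evaluation value.
    # A single-element phrase has no candidates, so it is returned as is.
    return min(division_candidates(beats), key=evaluate, default=[beats])


def deviation(sections):
    """Toy score (assumption): total deviation from equal section lengths."""
    lengths = [sum(s) for s in sections]
    equal = sum(lengths) / len(lengths)
    return sum(abs(length - equal) for length in lengths)


best = select_division([3, 4, 5, 2], max_beats=10, evaluate=deviation)
```

With these inputs the phrase has 14 beats, exceeds the limit of 10, and is cut into two sections of 7 beats each. The toy score stands in for the evaluation value described in the text, which also uses speech-expression information and stored parameters.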

[0014] The text analysis method of the present invention applies to a speech synthesizer that analyzes the language-level features of text supplied as information (the morphemes contained in the text and its syntax) based on the language used for processing, analyzes the prosody of the text based on spoken-language-level features using those results, generates from the analysis results an intermediate language serving as commands for speech synthesis, generates control parameters according to that intermediate language, and then artificially synthesizes the speech corresponding to the control parameters. The method is characterized by analyzing the supplied text through the following steps: a rule division step of dividing the analysis result for the text using a first rule, which divides information according to the modification relations of prosodic phrases, each being a group of prosodic word elements (chains of morphemes); a division length judgment step of judging the result of the rule division step by a second rule, which judges whether the size of a prosodic phrase is no larger than a preset size; a division candidate generation step of generating, as division candidates, the combinations of break positions of the prosodic phrases obtained by re-dividing the information according to the result of the division length judgment step; a parameter storage step of storing parameters used to evaluate division validity based on the speech-expression information contained in the division candidates; an evaluation value calculation step of calculating an evaluation value from the speech-expression information contained in each generated candidate and the parameters; and a division candidate selection step of selecting a division candidate based on a third rule, which selects the candidate whose evaluation value is the smallest among those obtained in the evaluation value calculation step.

[0015] Here, the division candidates are preferably generated by re-dividing a prosodic phrase when the beat length of the phrase exceeds a preset re-division beat length that triggers re-division, and a symbol that commands speech tone, one of the spoken-language-level features, is preferably inserted at the boundaries of a division candidate. Defining division candidates in this way makes it possible to handle long prosodic phrases properly, for example by avoiding dividing a prosodic phrase too finely.

[0016] The evaluation value is advantageously calculated using the following steps: an error sum calculation step of calculating the sum of the absolute values of the differences between the beat length of each section obtained by dividing the prosodic phrase and the beat length obtained by dividing the beat length of the whole prosodic phrase equally; a feature sum calculation step of calculating, according to the presence of feature quantities expressed by the speech-expression information included in the spoken-language-level features of the prosodic phrase, the sum of the beat lengths of the sections obtained by dividing the prosodic phrase; a weight calculation step of calculating weighting coefficients based on the speech-expression information contained in the prosodic word elements of the division candidate; a weighted feature sum calculation step of multiplying the result of the feature sum calculation step by the weighting coefficients obtained in the weight calculation step and calculating the sum of the products; a product calculation step of calculating the product of the average beat length after the prosodic phrase is divided and the beat length obtained by dividing the beat length of the whole prosodic phrase equally; and an evaluation value calculation step of adding the result of the error sum calculation step to the result of the weighted feature sum calculation step and subtracting the result of the product calculation step from that sum to obtain the evaluation value of the division candidate in question.
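The steps of this paragraph can be written compactly as a formula. The symbols are introduced here for illustration and do not appear in the patent: a candidate divides a prosodic phrase of total beat length $M$ into $n$ sections with beat lengths $m_1, \dots, m_n$; $S_k$ is the sum of the beat lengths of the sections that carry feature $k$ (for example emphasis or suppression); $w_k$ is the weighting coefficient for that feature; and $\bar{m}$ is the average beat length after division. How $\bar{m}$ is averaged is not pinned down in this passage, so it is left as a separate symbol.

```latex
\begin{align*}
E_{\mathrm{err}}  &= \sum_{i=1}^{n} \left| m_i - \frac{M}{n} \right|
  && \text{(error sum)} \\
E_{\mathrm{feat}} &= \sum_{k} w_k \, S_k
  && \text{(weighted feature sum)} \\
P &= \bar{m} \cdot \frac{M}{n}
  && \text{(product)} \\
E &= E_{\mathrm{err}} + E_{\mathrm{feat}} - P
  && \text{(evaluation value)}
\end{align*}
```

Read this way, a candidate scores lower (better) when its sections are close to equal length and when few beats fall in sections whose weighted features are penalized, while the subtracted product term offsets the score according to how long the resulting sections are.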

[0017] For the speech-expression information described above, values are preferably set by classifying each prosodic word element as emphasized, suppressed, or neither. The feature quantities are obtained from these values.

[0018] As classification criteria, it is desirable to classify a prosodic clause represented by a division candidate as emphasized when its core is a proper noun or a numeral, to classify it as suppressed when its core is a formal noun or a verb, or when the clause is at the beginning of the sentence and ends with a binding particle, and to assign preset values to emphasis and suppression.

[0019] It is desirable to classify a prosodic clause of a division candidate as emphasized when it contains a proper noun or a numeral, to classify it as suppressed when it contains a formal noun or a verb, or when it is at the beginning of the sentence and contains a binding particle at its end, and to assign preset values to emphasis and suppression.

[0020] For the division candidates of text to which this text analysis method is applied, it is preferable that the supplied information and/or the information attached to appropriately divided examples be stored in advance as search keys, that the division length judgment step select an output destination according to its judgment of the supplied prosodic phrase, and that the method include an example retrieval step which, at the selected output destination, searches for an example matching the division of the supplied prosodic phrase, divides the phrase according to the search result, and attaches the various commands. This search enables accurate analysis of the text.

[0021] In the example storage step, it is desirable to store various examples in advance in a learning manner. This shortens the period needed to accumulate experience and allows a wider range of text to be handled.

[0022] The parameters are preferably obtained by statistical analysis or multivariate analysis. This allows the parameters to be changed semi-automatically.

[0023] The language to which this text analysis method is applied is advantageously Japanese. This makes it easier to generate an intermediate language based on the analysis of Japanese text, which contains ambiguous expressions and syntax.

[0024] In the text analysis method for speech synthesis of the present invention, the analysis result for the text is processed in order by the first rule of the rule division step and the second rule of the division length judgment step, and the combinations obtained when the information is re-divided according to the result are taken as division candidates. An evaluation value of division validity is then calculated from the parameters obtained in the parameter storage step and the speech-expression information contained in each candidate (evaluation value calculation step), and the division candidate selection step selects the candidate that satisfies the third rule. As a result, even when the prosodic phrases of the supplied text are further subdivided for phonological analysis, the optimum division candidate is selected and an intermediate language corresponding to that division can be generated.

[0025]

[Embodiments of the Invention] Embodiments of the speech synthesizer and its text analysis method according to the present invention are described in detail below with reference to the accompanying drawings.

[0026] The speech synthesizer of the present invention performs morphological and syntactic analysis on supplied Japanese text, generates control parameters based on the intermediate language produced by phonological processing of the results, and generates and outputs synthesized speech based on those control parameters. The speech synthesizer and its text analysis method are described with reference to FIGS. 1 to 19. As shown in FIG. 1, the speech synthesizer 10 has a data input unit 20, a language analysis unit 30, a phonological processing unit 40, a control parameter generation unit 50, and a speech signal generation unit 60.

[0027] The data input unit 20 converts text in various forms into a data format that the speech synthesizer 10 can process and inputs it. The text may take various forms, for example a handwritten manuscript, or data recorded on a recording medium in a predetermined format created by document-preparation software such as a word processor. For a handwritten manuscript, the data input unit 20 is equipped with a device that optically reads the text written on the paper. When stored data is read from a floppy disk, the data input unit 20 has a disk drive installed inside the apparatus as the data reading device.

[0028] As shown in FIG. 1, the language analysis unit 30 includes a morphological analysis unit 32, a syntax analysis unit 34, and an emphasis information setting unit 36. In the language analysis unit 30, the data supplied from the data input unit 20, that is, the digital data of the input sentence, is supplied to the morphological analysis unit 32. The morphological analysis unit 32 contains the word segmentation analysis unit 32a and the word dictionary unit 32b shown in FIG. 2. A morpheme is the smallest form that carries meaning. The word segmentation analysis unit 32a stores a processing program that performs word segmentation, and performs search processing that matches the words contained in the input sentence against the words in the word dictionary unit 32b. Following the program stored in the word segmentation analysis unit 32a, the morphological analysis unit 32 divides the input sentence into words while searching the word dictionary unit 32b.
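As a rough illustration of dictionary-driven word segmentation, the sketch below uses greedy longest-match lookup. The patent does not say which matching strategy the word segmentation analysis unit 32a uses, so longest-match is an assumption, as are the function name `segment` and the tiny dictionary.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (assumed strategy, not the patent's)."""
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest dictionary entry that could start at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as a one-character word.
            words.append(text[i])
            i += 1
    return words


toy_dictionary = {"東日本", "東", "日本", "で", "にわか雨", "が"}
words = segment("東日本でにわか雨が", toy_dictionary)
```

Here the lookup prefers "東日本" over the shorter entry "東" because the longer match is tried first. A production morphological analyzer would normally also use part-of-speech connectivity or costs to resolve ambiguity, which this sketch leaves out.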

[0029] The terminology of language analysis is explained here. For processing, a sentence is handled in terms of prosodic units, syntactic units, and units intermediate between the two (see FIG. 3(a)). The prosodic units are the prosodic word, the prosodic phrase, the prosodic clause, and the prosodic sentence. A prosodic word is a chain of phonemes (that is, the smallest units distinguishable as sounds) that corresponds to one accent component and exhibits a fixed accent type. A prosodic phrase is a chain of prosodic words corresponding to one phrase component. A prosodic clause is a chain of prosodic phrases delimited by pauses in which no reset of the phrase component (that is, no occurrence of a negative phrase command) takes place at the final part of a prosodic phrase. Finally, a prosodic sentence is a chain of prosodic phrases in which, as in the definition of the prosodic phrase, no reset of the phrase component takes place at the final part. For example, the sentence 「低気圧は伊豆半島の南にあり、東日本でにわか雨が降っています。」 ("The low-pressure system is south of the Izu Peninsula, and showers are falling in eastern Japan.") is classified under these definitions as shown in FIG. 3(b).

[0030] The syntactic units are, in order of size, the bunsetsu, the (phrase), the clause, and the sentence. In particular, a clause is defined as the chain consisting of a predicate that does not modify any other word or phrase (a verb predicate, adjective predicate, noun predicate, and so on) and all the words and phrases that directly or indirectly modify that predicate. However, in the sentence 「付近の500世帯の家庭に電気を供給している変電所です」 ("It is a substation that supplies electricity to 500 nearby households"), the portion 「〜している」 adnominally modifies 「変電所」 ("substation"), so that portion of the sentence is not classified as a clause.

[0031] The intermediate units are the prosodic word element and the ICRLB. A prosodic word element is a chain of morphemes that is never split into multiple prosodic words by syntactic conditions or by emphasis or suppression in reading. An ICRLB (Immediate Constituent with Recursively Left-Branching Structure) is a chain of prosodic words that is delimited by right-branching boundaries and contains only left-branching boundaries. In a syntax tree, a left-branching boundary is a boundary between prosodic word elements that are in a modification relation, and a right-branching boundary is defined as a boundary between prosodic word elements that are not in a modification relation; the latter is called an ICRLB boundary. An ICRLB may consist of a single word, and dividing the syntax tree at its right-branching boundaries turns one sentence into a chain of ICRLBs. The first rule described later is the rule that performs this ICRLB division.
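A minimal sketch of ICRLB division over a dependency analysis. It assumes each prosodic word element records the index of the element it modifies, and it treats the boundary between two adjacent elements as left-branching only when the first modifies the second. The function name `icrlb_split`, the `heads` encoding, and the sample dependency assignment are illustrative assumptions, not taken from the patent's figures.

```python
def icrlb_split(elements, heads):
    """Group prosodic word elements into ICRLB chunks.

    heads[i] is the index of the element that elements[i] modifies
    (None for the root). This encoding is an assumption of the sketch.
    """
    chunks = []
    for i, element in enumerate(elements):
        # Left-branching boundary: the previous element modifies this one,
        # so both stay in the same chunk. Any other boundary is treated as
        # right-branching (an ICRLB boundary) and starts a new chunk.
        if i > 0 and heads[i - 1] == i:
            chunks[-1].append(element)
        else:
            chunks.append([element])
    return chunks


# Hand-made dependencies for illustration:
# 低気圧は -> あり, 伊豆半島の -> 南に, 南に -> あり.
elements = ["低気圧は", "伊豆半島の", "南に", "あり"]
heads = [3, 2, 3, None]
chunks = icrlb_split(elements, heads)
```

In this example the first element modifies a non-adjacent element, so a right-branching boundary follows it and it forms an ICRLB on its own, while the remaining three elements chain through adjacent modification into a single ICRLB.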

[0032] Under these definitions, a clause boundary, for example, is also treated as an ICRLB boundary even when a modification relation exists across it, because it can be distinguished from an ordinary modification relation. To summarize the boundary definitions: a clause boundary is a boundary between clauses; an ICRLB boundary is a boundary between ICRLBs; and an enumeration boundary is the word boundary after a comma in an expression where nouns are listed in parallel and separated by commas, such as 「司法、立法、行政」 ("judicial, legislative, administrative").

[0033] The syntax analysis unit 34, which performs analysis using these definitions, receives the morphological analysis results from the morphological analysis unit 32 as data and analyzes the syntax tree or the dependency structure of the input sentence. From these analysis results, the syntax analysis unit 34 also determines ICRLBs, clauses, parallel relations, and so on, thereby performing syntactic analysis, a language-level feature, on the input sentence. When the result of this syntactic analysis is a dependency structure rather than a syntax tree, ICRLB boundaries are defined from the modification relations defined above, and when there is a compound word containing multiple accent nuclei, the position immediately before that compound word is defined as an ICRLB boundary.

[0034] As shown in FIG. 2, the emphasis information setting unit 36 includes an information setting unit 36a, which sets emphasis information (emphasis or suppression) on a word or bunsetsu phrase based on the analysis results of the morphological analysis unit 32 and the syntax analysis unit 34, and a standard storage unit 36b, which stores the emphasis information rules that serve as the criteria supplied to the information setting unit 36a. The information setting unit 36a checks the input data against the rules supplied from the standard storage unit 36b and sets the emphasis information accordingly.

[0035] Here, emphasis information is information used for setting accents and some phrase commands, and is one kind of speech-expression information. Emphasis information is classified into three types: +emph for emphasis, -emph for suppression, and 0emph for cases classified as neither. As examples of the setting, a bunsetsu phrase centered on a proper noun, a numeral, or the like is classified as +emph, whereas a bunsetsu phrase centered on a formal noun such as "koto" (こと) or "mono" (もの) or on a verb such as "naru" (なる) or "iru" (いる), and a sentence-initial bunsetsu phrase marked with the binding particle "wa" (は), are classified as -emph.
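
The three-way classification above can be sketched as a small rule lookup. This is only an illustration under stated assumptions: the `Bunsetsu` record, its field names, the part-of-speech labels, and the order in which the rules are tried are all hypothetical, and the rule tables contain only the examples given in the text, not the contents of the standard storage unit 36b.

```python
from dataclasses import dataclass

# Hypothetical rule tables holding only the examples named in the text.
FORMAL_NOUNS = {"こと", "もの"}
LIGHT_VERBS = {"なる", "いる"}


@dataclass
class Bunsetsu:
    head: str                       # surface form of the head word
    head_pos: str                   # assumed coarse tag, e.g. "proper_noun", "numeral", "noun"
    particle: str = ""              # trailing particle, if any
    sentence_initial: bool = False  # True for the first bunsetsu of a sentence


def classify_emphasis(b: Bunsetsu) -> str:
    """Return "+emph", "-emph", or "0emph" for one bunsetsu phrase."""
    # Rule order is an assumption; the text does not state a priority.
    if b.head_pos in ("proper_noun", "numeral"):
        return "+emph"
    if b.head in FORMAL_NOUNS or b.head in LIGHT_VERBS:
        return "-emph"
    if b.sentence_initial and b.particle == "は":
        return "-emph"
    return "0emph"
```

For instance, under these assumptions a sentence-initial common noun followed by は is suppressed, while a bunsetsu phrase headed by a place name is emphasized.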

[0036] The language analysis unit 30 outputs to the phonological processing unit 40 the analysis results for the input sentence from the morphological analysis unit 32 and the syntax analysis unit 34, together with the emphasis information set by the emphasis information setting unit 36. The phonological processing unit 40 generates an intermediate language for speech synthesis based on the output of the language analysis unit 30. As shown in FIG. 1, the phonological processing unit 40 includes a prosodic word generation unit 42, an accent command generation unit 44, and a pause/phrase command generation unit 46.

[0037] The prosodic word generation unit 42 generates prosodic words by combining the accents of the words contained in a bunsetsu phrase. Where necessary, it also combines accents across bunsetsu phrases to generate prosodic words. The prosodic word generation unit 42 also sets the emphasis information of the bunsetsu phrase on each generated prosodic word.

[0038] As shown in FIG. 4, the accent command generation unit 44 includes an accent command setting unit 44a, which generates accent commands specifying how accents are to be applied based on the ICRLB information in the output of the language analysis unit 30, and a standard storage unit 44b, which stores the accent command rules that serve as the criteria supplied to the accent command setting unit 44a. The accent command setting unit 44a checks the result of the language analysis against the accent command rules supplied from the standard storage unit 44b and sets the accent commands accordingly.

[0039] Here, an accent command is an instruction indicating the magnitude of an accent and the times of its onset and offset. In generating accent commands, the range to be processed is the range of accent sandhi. Accent sandhi is the interaction between prosodic words in which the accent components of several consecutive prosodic words influence one another, depending on the accent type (for example, the flat type with a level accent or the undulating type with a rise and fall) and on the syntactic and discourse conditions that divide the text at each boundary. As this relationship shows, the range of accent sandhi is the ICRLB.

[0040] The pause/phrase command generation unit 46 has the function of generating pause lengths and phrase commands. To provide this function, the pause/phrase command generation unit 46 includes, as shown in FIG. 4, a pause/phrase command setting unit 46A, which generates pause commands and phrase commands, and an ICRLB division unit 46B, which divides the output of the language analysis unit 30 into ICRLBs of optimal length.

[0041] Here, pause commands are represented by the pause symbols S₁, S₂, and S₃, which mark sentence, clause, and ICRLB breaks, respectively. Phrase commands are represented by the phrase symbols P₀, P₁, P₂, and P₃, which are used, respectively, for resetting the phrase component, restarting it, adding to it at the beginning of a clause, and adding to it between ICRLBs or within an ICRLB.
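
As a compact reference, the symbol inventory described above could be written down as two enumerations. This is a sketch only: the class names, the boundary labels, and the lookup function are hypothetical and merely restate the roles given in the text.

```python
from enum import Enum


class Pause(Enum):
    S1 = "sentence break"
    S2 = "clause break"
    S3 = "ICRLB break"


class Phrase(Enum):
    P0 = "reset the phrase component"
    P1 = "restart the phrase component"
    P2 = "add at the beginning of a clause"
    P3 = "add between ICRLBs or within an ICRLB"


# Hypothetical boundary labels mapped to the pause symbol that marks them.
PAUSE_FOR_BOUNDARY = {"sentence": Pause.S1, "clause": Pause.S2, "icrlb": Pause.S3}


def pause_symbol(boundary: str) -> Pause:
    """Look up the pause symbol for a boundary label."""
    return PAUSE_FOR_BOUNDARY[boundary]
```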

[0042] The pause/phrase command setting unit 46A includes a command setting unit 46a, which generates pause commands and phrase commands, and a standard storage unit 46b, which stores the pause command and phrase command rules that serve as the criteria supplied to the command setting unit 46a. In this embodiment, the command setting unit 46a is supplied not with the output of the language analysis unit 30 as it is but with the output of the ICRLB division unit 46B.

[0043] The ICRLB division unit 46B further includes a division candidate generation unit 460, a division candidate selection unit 462, and a parameter storage unit 464. The output of the language analysis unit 30 is supplied to the division candidate generation unit 460. The division candidate generation unit 460 includes a rule division unit 460a, which divides the supplied data (bunsetsu phrases) at ICRLB boundaries (the first rule); a division length judgment unit 460b, which judges whether the length of each prosodic phrase output by the rule division unit 460a (a prosodic phrase divided at ICRLB boundaries, hereinafter an ICRLB prosodic phrase) is no longer than a preset length (the second rule); and a re-division unit 460c, which, depending on the judgment of the division length judgment unit 460b, re-divides the prosodic phrase divided by the rule division unit 460a to generate a plurality of division candidates.

[0044] Here, re-dividing an ICRLB prosodic phrase yields, depending on the positions at which it is cut, a combination of prosodic phrases; each such combination is called a division candidate.
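
To make the notion of a division candidate concrete, the following sketch enumerates every combination of cut positions for one ICRLB prosodic phrase. It assumes that cuts may fall only between adjacent prosodic words and that the phrase is given as a list of words; the function name is hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence


def division_candidates(words: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every way of cutting `words` into two or more contiguous phrases."""
    boundaries = range(1, len(words))       # positions between adjacent words
    for k in range(1, len(words)):          # number of cuts to make
        for cuts in combinations(boundaries, k):
            edges = (0, *cuts, len(words))
            yield [list(words[a:b]) for a, b in zip(edges, edges[1:])]
```

A phrase of n prosodic words has 2^(n-1) - 1 such candidates, so exhaustive enumeration is practical only because an ICRLB prosodic phrase contains few prosodic words.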

[0045] When the data is first supplied and satisfies the judgment condition of the second rule, the division length judgment unit 460b supplies its output to the command setting unit 46a. If, on this first pass, the judgment condition is not satisfied, the command setting unit 46a sets the output destination to the re-division unit 460c. On the second and subsequent judgments, the command setting unit 46a additionally judges whether any combination of division candidates still remains for the divided ICRLB prosodic phrase.

[0046] So that the second rule can be judged, the re-division unit 460c feeds the divided data back to itself and repeats the re-division until no further combinations of division candidates remain. Of the division candidates produced, the re-division unit 460c outputs only those satisfying the judgment condition of the second rule to the calculation unit 462a of the division candidate selection unit 462.

[0047] However, if the information has always been finely divided in accordance with the rule for division at ICRLB boundaries before it is supplied to the ICRLB division unit 46B, the rule division unit 460a may be omitted. The division length judgment unit 460b supplies its output to the re-division unit 460c when a divided ICRLB prosodic phrase is longer than the predetermined length. Only otherwise (that is, when the input sentence satisfies the second rule on first being supplied) does it supply this output to the command setting unit 46a of the pause/phrase command setting unit 46A.

[0048] Accordingly, when a supplied ICRLB prosodic phrase is longer than the predetermined length, the division candidate generation unit 460 re-divides it in its internal re-division unit 460c, repeating until the resulting phrases are no longer than the predetermined length, and supplies the output to the division candidate selection unit 462.
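
The interplay of the length check and the repeated re-division can be sketched as a recursive generator that only ever emits divisions whose parts all fit the limit. This is an illustration, not the flow of the embodiment: it assumes the phrase is given as the beat counts of its prosodic words, and the names `split_to_limit` and `candidates_for` are hypothetical.

```python
from typing import Iterator, List, Sequence


def split_to_limit(beats: Sequence[int], limit: int) -> Iterator[List[List[int]]]:
    """Yield divisions of `beats` into contiguous parts, each at most `limit` beats."""
    if not beats:
        yield []
        return
    total = 0
    for i, b in enumerate(beats, start=1):
        total += b
        if total > limit:
            break                            # a longer first part can only be longer
        for rest in split_to_limit(beats[i:], limit):
            yield [list(beats[:i])] + rest


def candidates_for(beats: Sequence[int], limit: int) -> List[List[List[int]]]:
    """Pass a short phrase through undivided; otherwise list the valid divisions."""
    if sum(beats) <= limit:
        return [[list(beats)]]
    return list(split_to_limit(beats, limit))
```

Note that under this sketch a phrase containing a single prosodic word longer than the limit has no valid division, so the result is empty; how the device handles that case is not stated in this passage.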

[0049] The division candidate selection unit 462 includes a calculation unit 462a, which calculates a cost as an evaluation value indicating the suitability of a division, obtained from the speech-expression information contained in each generated division candidate, and a selection unit 462b, which selects the division candidate with the minimum cost among those calculated by the calculation unit 462a (the third rule). This cost corresponds to the sum of errors in the conventional method and includes that sum of errors as one of its terms; the details are described later. The division candidate selection unit 462 supplies the division candidate selected by the selection unit 462b to the command setting unit 46a of the pause/phrase command setting unit 46A. The parameter storage unit 464 has a memory that stores the parameters used for the cost calculation in the calculation unit 462a of the division candidate selection unit 462.
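
The third rule amounts to taking the minimum of a cost over the candidates. The sketch below shows only that selection step. The cost function is a placeholder invented for illustration (squared deviation from a target length plus a per-cut penalty); it is not the cost described later in the specification, and the parameter names `target_beats` and `cut_penalty` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Candidate = List[List[int]]  # beat counts of the prosodic words in each phrase


def imbalance_cost(candidate: Candidate, params: Dict[str, float]) -> float:
    """Placeholder cost (an assumption, not the patent's formula)."""
    lengths = [sum(phrase) for phrase in candidate]
    deviation = sum((n - params["target_beats"]) ** 2 for n in lengths)
    return deviation + params["cut_penalty"] * (len(lengths) - 1)


def select_candidate(
    candidates: Sequence[Candidate],
    params: Dict[str, float],
    cost: Callable[[Candidate, Dict[str, float]], float] = imbalance_cost,
) -> Candidate:
    """Return the candidate with the smallest cost (the third rule)."""
    return min(candidates, key=lambda c: cost(c, params))
```

Passing the parameters in separately mirrors the split between the calculation unit 462a and the parameter storage unit 464, so the parameters can be replaced without touching the selection logic.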

[0050] Although not shown, the parameter storage unit 464 also has a functional unit that changes the parameters by applying a method such as statistical analysis or multivariate analysis. Where simplification of the device is a consideration, the parameters may be changed outside the device and the changed parameters simply supplied to the parameter storage unit 464. In the statistical or multivariate analysis, rules for converting speech-expression information into numerical values are provided in advance, and the tendency of a sentence is evaluated using the numerical values obtained for the information under those rules. The value corresponding to this evaluation becomes a parameter indicating the tendency resulting from re-division.

[0051] For a long ICRLB prosodic phrase, the ICRLB division unit 46B selects the minimum-cost candidate from among the division candidates whose phrases are no longer than the predetermined length and supplies it to the command setting unit 46a of the pause/phrase command setting unit 46A, so that the pause commands and phrase commands output are more accurate than in the conventional method. An ICRLB prosodic phrase can thereby be divided into appropriate ranges. After the word readings, accent commands, pause commands, and phrase commands have been determined, the phonological processing unit 40 applies processing such as vowel lengthening and gemination (not shown) and sends the resulting intermediate-language data to the control parameter generation unit 50. This phonological processing avoids generating an intermediate language that would lead to unnatural synthesized speech.

[0052] The control parameter generation unit 50 generates the control parameters used for speech synthesis based on the intermediate-language data supplied from the phonological processing unit 40. The generated control parameters are supplied to the speech signal generation unit 60, which includes a speech waveform generation unit 62 and a speech output unit 64. The speech waveform generation unit 62 applies D/A conversion to the control parameters supplied from the control parameter generation unit 50 to generate a speech waveform, which it outputs to the speech output unit 64. The speech output unit 64 outputs the speech waveform through, for example, a speaker, thereby rendering the input information (for example, text) as speech. Configured in this way, the speech synthesizer 10 generates accurate intermediate-language data and outputs synthesized speech based on that data.

[0053] Next, the control and operation of the speech synthesizer 10 of this embodiment are described with reference to the flowcharts of FIGS. 5 to 13 and to tables based on various examples. The flowchart of FIG. 5 shows the control of the speech synthesizer 10 and the main operating procedure under that control. When the speech synthesizer 10 is powered on, it starts operating, performs initialization, and then proceeds to step S10. In step S10, text supplied through the data input unit 20 of the speech synthesizer 10 is digitized, or data that has already been digitized is temporarily stored in memory. After this data input, the process proceeds to subroutine SUB1.

[0054] In subroutine SUB1, the language analysis unit 30 performs language analysis on the supplied data. The language analysis performed here includes morphological analysis, syntax analysis, emphasis information setting, and the like. The result of the language analysis is passed to subroutine SUB2.

[0055] Next, in subroutine SUB2, the phonological processing unit 40 performs phonological analysis based on the result of subroutine SUB1. The phonological analysis includes prosodic word generation, accent command generation, pause/phrase command generation, and the like. The final result of the phonological analysis (that is, the intermediate-language data) is sent to the control parameter generation unit 50, and the procedure proceeds to step S11.

[0056] Subroutine SUB2 is provided with a rule for issuing pause and phrase commands accurately. Let L₁ be a preset minimum number of beats and L₂ the limit on the number of beats contained in a prosodic phrase. If there is an ICRLB prosodic phrase longer than the beat limit L₂, the phrase symbol P₃ is inserted at prosodic-word boundaries so that every prosodic phrase is no longer than L₂. However, insertion of P₃ is omitted where the distance from the immediately preceding phrase symbol P₁, P₂, or P₃ is L₁ beats or less. Furthermore, when a long ICRLB prosodic phrase can be divided in more than one way, the cost function described later is applied to the possible divisions, and the division that minimizes the value of the cost function is selected. The pause and phrase commands are therefore generated for the prosodic phrases selected in this way.
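
The minimum-distance condition on P₃ can be sketched as a left-to-right pass over the proposed cut positions. The sketch makes several assumptions that the text does not state: the phrase is given as beat counts per prosodic word, a phrase symbol is taken to sit at the start of the phrase (offset 0), and a proposed cut that is too close is simply dropped. All function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def place_p3(beats: Sequence[int], cuts: Sequence[int], l1: int) -> List[int]:
    """Keep only the cuts lying more than `l1` beats after the previous phrase symbol."""
    offsets = [0, *accumulate(beats)]        # beat offset of each word boundary
    kept: List[int] = []
    last_symbol = 0                          # assumed phrase symbol at the phrase start
    for cut in sorted(cuts):
        if offsets[cut] - last_symbol <= l1:
            continue                         # too close: omit this P3
        kept.append(cut)
        last_symbol = offsets[cut]
    return kept


def phrase_beats(beats: Sequence[int], cuts: Sequence[int]) -> List[int]:
    """Beat length of each phrase produced by cutting at `cuts`."""
    edges = [0, *sorted(cuts), len(beats)]
    return [sum(beats[a:b]) for a, b in zip(edges, edges[1:])]
```

After `place_p3` has dropped any cuts, `phrase_beats` can be used to confirm that every remaining phrase still fits within L₂ before the division is treated as a valid candidate.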

[0057] In step S11, the control parameter generation unit 50 generates the control parameters required for speech synthesis based on the result of the phonological analysis in subroutine SUB2. The process then proceeds to step S12 with the generated control parameters.

[0058] In step S12, the speech signal generation unit 60 performs speech signal generation based on the control parameters obtained in step S11. This processing finally generates synthesized speech corresponding to the supplied text and outputs it as speech. With this speech output, the series of processes of the speech synthesizer 10 ends.

[0059] Subroutines SUB1 and SUB2 are now described in more detail. When performing language analysis in subroutine SUB1 of FIG. 6, the speech synthesizer 10 first proceeds to sub-step SS10, in which morphological analysis is performed. In the word decomposition analysis unit 32a, the morphological analysis unit 32 recognizes sentence boundaries in the supplied data and divides the text into sentences. The morphological analysis unit 32 then searches the word dictionary 32b of FIG. 2 for words that match the substrings making up the character string of each sentence. The morphological analysis unit 32 also checks grammatical connectability and divides the sentence into a word string. After this processing, the process proceeds to sub-step SS11.

[0060] In sub-step SS11, the syntax analysis unit 34 performs syntax analysis using the result of sub-step SS10 (the word string). The syntax analysis groups the supplied word string into bunsetsu phrases and analyzes the modification relations between them. For example, when the sentence 「低気圧は伊豆半島の南にあり、東日本でにわか雨が降っています。」 ("A low-pressure system lies south of the Izu Peninsula, and showers are falling in eastern Japan.") is divided into ICRLBs, this analysis produces the syntax tree shown in FIG. 7(a) or the dependency structure shown in FIG. 7(b). The symbol "/" in FIG. 7 indicates an ICRLB boundary of the sentence. At this time, the syntax analysis unit 34 also determines ICRLBs, clauses, enumeration expressions, parallel relations, and the like. After this processing, the process proceeds to sub-step SS12.

[0061] In sub-step SS12, the results of the morphological analysis and the syntax analysis are supplied to the emphasis information setting unit 36, which applies emphasis information setting to the supplied data. The setting rule storage unit 36b of the emphasis information setting unit 36 stores rules for classification into three cases: emphasis, suppression, and neither. The information setting unit 36a checks each supplied bunsetsu phrase against the rules from the setting rule storage unit 36b and sets on that bunsetsu phrase the emphasis information corresponding to the matching rule.

[0062] Next, in sub-step SS13, the processing results obtained in sub-steps SS10, SS11, and SS12 are supplied to the phonological processing unit 40. After this, the process moves to the return, and subroutine SUB1 ends.

[0063] Immediately after subroutine SUB1 ends, the process moves to subroutine SUB2 of FIG. 8. In subroutine SUB2, phonological analysis is performed based on the result of the language analysis in subroutine SUB1. The phonological analysis includes prosodic word generation, accent command generation, pause/phrase command generation, and the like. The process first proceeds to sub-step SS20.

[0064] In sub-step SS20, the data from the language analysis unit 30 is supplied to the prosodic word generation unit 42, which generates prosodic words. The prosodic word generation unit 42 combines the accents of the words within the supplied data (bunsetsu phrases). Where necessary, it also combines accents across bunsetsu phrases to generate prosodic words. Although not shown in FIG. 4, the prosodic word generation unit 42 also has a function for setting the emphasis information of the bunsetsu phrase on the generated prosodic words. After this processing, the process proceeds to sub-step SS21.

[0065] In sub-step SS21, the magnitude of the accent of each prosodic word is determined based on the supplied ICRLB information. This processing is performed by the accent command generation unit 44. In the accent command generation unit 44, the accent command setting unit 44a checks the supplied ICRLB information against the rules stored in advance in the standard storage unit 44b, such as the rules of accent sandhi. The accent command setting unit 44a assigns to the ICRLB information the accent symbols that match the rules. After this setting, the process moves to subroutine SUB3.

[0066] In subroutine SUB3 shown in FIG. 9, pause commands and phrase commands, which represent prosodic features, are set on the supplied data (pause/phrase command generation). To generate the pause commands and phrase commands, this processing first performs ICRLB division in subroutine SUB4. The ICRLB division in turn includes subroutines SUB5, SUB6, and SUB7, which perform the division candidate generation, parameter storage, and division selection described later. The optimally divided ICRLB prosodic phrases (that is, the selected division candidate) obtained through these processes are supplied to the command setting unit 46a. The process then proceeds to sub-step SS30.

[0067] In sub-step SS30, the command setting unit 46a sets pause commands and phrase commands on the supplied division candidate and outputs the result to an intermediate language creation unit (not shown). After this processing, the process proceeds to the return, where the setting of the pause commands and phrase commands ends and the process moves to sub-step SS22.

[0068] In sub-step SS22, processing such as vowel lengthening and gemination is performed based on the data obtained in sub-steps SS20 and SS21 and subroutine SUB3, and the intermediate language is then created. The created intermediate-language data is supplied to the control parameter generation unit 50. After this output, the process proceeds to the return, and the processing of subroutine SUB3 ends.

[0069] The ICRLB division of subroutine SUB4, which is performed within subroutine SUB3, is briefly described with reference to the flowchart of FIG. 10. When the process has moved to subroutine SUB3 of FIG. 9, subroutine SUB4 is started to perform the ICRLB division, and division candidate generation, parameter storage, and division selection are performed in sequence. Division candidate generation is performed in subroutine SUB5 (see FIG. 11), parameter storage in subroutine SUB6 (see FIG. 12), and division selection in subroutine SUB7 (see FIG. 13). After this series of processes, the data is passed to subroutine SUB3. At this point, corresponding to the processing of subroutine SUB3, the data is supplied to the command setting unit 46a of FIG. 4. As described above, the command setting unit 46a sets pause commands and phrase commands on the input data in accordance with the rules in the standard storage unit 46b.

[0070] Next, subroutine SUB5, which performs division candidate generation, is described with reference to FIG. 11. When the process has moved to subroutine SUB4, subroutine SUB5 starts immediately and proceeds to sub-step SS50.

[0071] In sub-step SS50, the processing branches according to the type of data supplied from the language analysis unit 30. When the data consists, for example, of bunsetsu phrases (Yes), the process proceeds to sub-step SS51. When the data has already been divided at appropriate ICRLB boundaries, sub-step SS51 is skipped and the process proceeds to sub-step SS52.

[0072] In sub-step SS51, the supplied data is divided into ICRLBs. This processing is performed by the rule division unit 460a in accordance with the first rule described above.

[0073] In sub-step SS52, the length of the supplied data (an ICRLB prosodic phrase) is compared with a preset division length. The basic unit of the division length is one beat, assigned according to the reading of the words. The comparison is therefore made between the number of beats of the division length and the number of beats of the ICRLB prosodic phrase. Under the second rule, the comparison condition is whether the number of beats of the ICRLB prosodic phrase is no greater than that of the division length. Because the destination of the result depends on how many times the comparison has been performed, a flag F, for example, may be provided. The comparison is first performed on the data supplied from the language analysis unit 30 (flag F = 0). If the condition is satisfied at this point (Yes), the division length judgment unit 460b of FIG. 4 supplies the ICRLB prosodic phrase to the command setting unit 46a. If the condition is not satisfied, that is, if the ICRLB prosodic phrase has more beats than the division length (No), the ICRLB prosodic phrase is supplied to the re-division unit 460c. The process then proceeds to sub-step SS53.

【００７４】In sub-step SS53, the supplied ICRLB prosodic phrase is re-divided, and data consisting of the divided ICRLB prosodic phrases is output to the division length determination unit 460b.

【００７５】Next, in sub-step SS54, it is determined whether the returned data satisfies the comparison condition described above. When the condition is satisfied (Yes), the process proceeds to sub-step SS55. When the condition is not satisfied (No), the process returns to sub-step SS53. This determination is made by the division length determination unit 460b.

【００７６】In sub-step SS55, it is determined whether there are division candidates that can form a combination of division candidates. When such a combination exists for the ICRLB prosodic phrase (Yes), the division candidates that satisfy the comparison condition are handed over to subroutine SUB7, and after this handover the process returns to sub-step SS53. In this way, the division candidate generation unit 460 supplies data from the re-division unit 460c to the division candidate selection unit 462. When no such combination exists for the ICRLB prosodic phrase (No), the process proceeds to return.

【００７７】Here, the presence or absence of division candidates is obtained by calculation. In this calculation, first, after ICRLB division, the ICRLB prosodic phrases are divided so that their beat counts are at or below the set value, and the minimum number of phrase commands resulting from this division is determined. That is, the minimum number is the integer quotient obtained by dividing the total number of beats by the set value; if there is a remainder, the minimum number is the integer quotient + 1. This minimum number is assigned to the variable PH_NUM. In addition, the maximum number of phrase commands, MAX_PH_NUM, obtained when the data is divided most finely by ICRLB division, is determined. Each time it is judged that a division is required, the division length determination unit 460b increments the value of PH_NUM by 1. Because of this arrangement, the re-divided data is actually handed over to subroutine SUB7 after the division into PH_NUM phrases has been performed. Combinations of division candidates continue to be generated until PH_NUM exceeds the maximum number of phrase commands MAX_PH_NUM. In the flowchart of FIG. 11, the data handover is represented, for convenience, by the symbol for subroutine SUB7. The data handover is not limited to this method; the combinations of division candidates may instead be stored in memory and handed over together. These determinations are made by the division length determination unit 460b. This series of processes generates the combinations of division candidates.
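The counting described in this paragraph can be sketched as follows. This is a minimal illustration, not the flowchart of FIG. 11 itself: the function names are hypothetical, and the sketch assumes that every prosodic word boundary is an ICRLB boundary, so that MAX_PH_NUM equals the number of prosodic words. It enumerates every grouping of consecutive words for each phrase count from PH_NUM up to MAX_PH_NUM.

```python
from itertools import combinations

def min_phrase_count(total_beats, beat_limit):
    """Initial PH_NUM: integer quotient, plus one when a remainder exists."""
    quotient, remainder = divmod(total_beats, beat_limit)
    return quotient + 1 if remainder else quotient

def division_candidates(word_beats, beat_limit):
    """Yield each grouping of consecutive words, for PH_NUM..MAX_PH_NUM phrases."""
    n = len(word_beats)
    ph_num = min_phrase_count(sum(word_beats), beat_limit)
    max_ph_num = n  # assumption: every word boundary is an ICRLB boundary
    while ph_num <= max_ph_num:
        # Choose ph_num - 1 cut positions among the n - 1 inner boundaries.
        for cuts in combinations(range(1, n), ph_num - 1):
            bounds = (0, *cuts, n)
            yield [word_beats[a:b] for a, b in zip(bounds, bounds[1:])]
        ph_num += 1  # step PH_NUM by +1 and generate the next set
```

For word beat counts of 6, 6, 7, 4, 6 and a set value of 15, the total is 29 beats, so the initial PH_NUM is 2 and groupings are generated for 2, 3, 4 and 5 phrases.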

【００７８】Before describing subroutine SUB7 for the division selection processing, the parameters used in this processing are briefly described with reference to FIG. 12. The parameters are set in subroutine SUB6. The parameters are a weighting coefficient A for the feature value I used in equation (1), described later, and a coefficient B by which the average number of beats after division is multiplied.

【００７９】Here, the feature value I is represented, for example, by the emphasis information (+emp, -emp, 0emp) of the prosodic word at the head of a prosodic phrase. In sub-step SS60, the rule for the feature value I, that is, the relationship among the emphasis information (+emp, -emp, 0emp), is stored.

【００８０】Next, in sub-step SS61, the weighting coefficient A is stored according to the numerical value of the feature value I. In this embodiment, the weighting coefficients A = 7 and A = -3 are assigned to the feature values I = -1 and I = -2, respectively, and these values are stored.

【００８１】Next, in sub-step SS62, the value of the coefficient B described above is stored. The weighting coefficients A and B may be calculated in the parameter storage unit 464 shown in FIG. 4. These coefficients may be calculated using, for example, statistical analysis of the feature value I in speech synthesis, or techniques such as multivariate analysis. Alternatively, the parameter values may be calculated in advance outside the device, with the parameter storage unit 464 simply storing the resulting values.

【００８２】After processing in this manner, the process proceeds to return, the storing of parameters ends, and the process moves to subroutine SUB7 as shown in FIG. 10. In subroutine SUB7, a cost is calculated, as shown in FIG. 13, for each combination of division candidates (ICRLB prosodic phrases) supplied from subroutine SUB5. Subroutine SUB7 then selects the optimal division candidate from the costs obtained by the procedure of FIG. 13. Here, the cost is an error that serves as an index of accurate speech expression when the phrases produced by a division are evaluated.

【００８３】First, in sub-step SS70 of FIG. 13, for each combination of supplied division candidates, the number of beats in each range delimited by ICRLB boundaries is counted and stored by the calculation unit 462a of FIG. 4. The process then proceeds to sub-step SS71. In sub-step SS71, the variable MIN used for selecting a division candidate is set; the set value is, for example, 9999. In addition, the memory MIN_DIV[ ], which stores the ICRLB boundary positions of the minimum-cost division candidate, and the variable VAL, which stores the cost value, are cleared. The process then proceeds to sub-step SS72.

【００８４】In sub-step SS72, a cost is calculated for each division candidate at each division number m obtained by dividing at ICRLB boundaries. This calculation is performed by the calculation unit 462a according to the cost function F(D). The cost function F(D) is given by equation (1)

【００８５】[0085]

【数１】 [Equation 1], where D(m,n) is a variable indicating the n-th combination among the supplied division candidates, divided into m phrases; L is the number of beats of the whole ICRLB; L(m,n,i) is a variable representing the number of beats of the i-th prosodic phrase when the n-th combination is divided into m phrases; Ph(m,n,k) is the k-th prosodic phrase when the n-th combination is divided into m phrases; count(Ph) is a function indicating the presence or absence of the feature value I in the k-th prosodic phrase; A_k is the weighting coefficient for the feature value I in the k-th prosodic phrase; and B is a weighting coefficient for the average number of beats after division. In equation (1), expressed with these variables and weighting coefficients, the first term is the sum of the errors described above, the second term is the cost of the feature value I, and the third term is the cost of the average number of beats after division. The result of evaluating these terms is the cost of the n-th division candidate.
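Because the image of equation (1) is not reproduced in this text, the following sketch uses a reconstructed form that is consistent with the term descriptions above and with the numerical values quoted in the worked examples later in the description: F(D(m,n)) = Σ_i |L(m,n,i) - L/m| + Σ_k A_k·count(Ph(m,n,k)) - B·(L/m). This form, the function name, and the argument layout are assumptions made for illustration, not the patent's own notation.

```python
def cost(phrase_beats, head_features, weights_a, coeff_b):
    """Cost of one division candidate under the reconstructed equation (1).

    phrase_beats  -- beats of each prosodic phrase, L(m,n,i)
    head_features -- feature value I at the head of each phrase (0 if none)
    weights_a     -- mapping from feature value I to weighting coefficient A
    coeff_b       -- coefficient B for the average number of beats
    """
    mean = sum(phrase_beats) / len(phrase_beats)                # L / m
    error = sum(abs(b - mean) for b in phrase_beats)            # first term
    feature = sum(weights_a.get(f, 0) for f in head_features)   # second term
    return error + feature - coeff_b * mean                     # third term (assumed sign)
```

With phrases of 10, 14, 15 and 13 beats, no feature at any phrase head, and B = 0.2, this gives 6 - 2.6 = 3.4, which agrees with the sum of errors (6) and the total cost (3.4) quoted for the 52-beat example in the description.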

【００８６】Next, in sub-step SS73, the calculated cost is stored in the variable VAL. The process then proceeds to sub-step SS74.

【００８７】In sub-step SS74, the value of the variable VAL is compared with the value of the variable MIN. When the value of MIN is greater than the value of VAL (VAL < MIN), the process proceeds to sub-step SS75. When the value of VAL is equal to or greater than the value of MIN (VAL ≥ MIN), the process proceeds to sub-step SS76.

【００８８】In sub-step SS75, the value of the variable MIN is replaced with the value of the variable VAL, and the data indicating the ICRLB boundary positions at which this n-th division candidate was divided replaces the data previously stored in the memory MIN_DIV[ ].

【００８９】In sub-step SS76, it is determined whether any combinations of division candidates remain. When candidates remain (Yes), the process proceeds to sub-step SS77. When no candidates remain (No), the process proceeds to sub-step SS78. This determination follows from the relationship between the variables MAX_PH_NUM and PH_NUM used above to determine the presence or absence of division candidates: once MAX_PH_NUM and PH_NUM become equal, no division candidates remain.

【００９０】In sub-step SS77, the number of beats of the newly supplied (n+1)-th division candidate is counted. After this counting, the process returns to sub-step SS71. In sub-step SS78, the cost calculation for the division candidates is complete and the minimum-cost division candidate has been selected at this point, so the ICRLB boundary position data held in the memory MIN_DIV[ ], now fixed at the minimum, is supplied to the command setting unit 46a of FIG. 4 as the optimal division candidate. After this, the process proceeds to return, through which the division candidate selection processing ends.
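The selection loop of sub-steps SS71 to SS78 amounts to a running minimum over the candidates. The sketch below is a hedged illustration: the function names are hypothetical, MIN is initialised once before the loop (an assumption about the intended flow), and `first_term` is only a stand-in cost (the sum of deviations from the average phrase length) so that the block runs on its own.

```python
def first_term(phrase_beats):
    """Stand-in cost: sum of deviations from the average phrase length."""
    mean = sum(phrase_beats) / len(phrase_beats)
    return sum(abs(b - mean) for b in phrase_beats)

def select_min_cost(candidates, cost_fn):
    """Return (MIN_DIV, MIN) for the candidate with the smallest cost."""
    min_val = 9999   # MIN, set as in sub-step SS71
    min_div = None   # MIN_DIV[ ], cleared as in sub-step SS71
    for cand in candidates:
        val = cost_fn(cand)   # VAL (sub-steps SS72 and SS73)
        if val < min_val:     # comparison of sub-step SS74
            min_val = val     # replacement of sub-step SS75
            min_div = cand
    return min_div, min_val   # result supplied in sub-step SS78
```

Because the comparison is strict (VAL < MIN), the first of two equal-cost candidates is kept, so an evaluation based on the first term alone cannot distinguish between mirrored divisions such as (9, 13) and (13, 9).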

【００９１】Through this processing, the optimally divided division candidate is selected, and pause commands and phrase commands are set on the selected candidate by checking it against the rules for those commands.

【００９２】Next, the processing is described using more concrete examples, together with a comparison with conventional processing. Text analysis of the input sentence is performed based on the rules described above. Here, the basic parameters used in the second rule are set to a shortest beat count L₁ = 5 and a limit on the number of beats contained in a prosodic phrase L₂ = 15. The symbol 「↓」 appearing in the prosodic words indicates an accent command, the symbol 「，」 indicates a prosodic word boundary, and the symbol 「P」 indicates a phrase command. In preparation for the calculation, the parameter storage unit 464 supplies the calculation unit 462a with the stored weighting coefficients A₁ = 7 and A₂ = -3 for the feature values I = -1 and I = -2, and the weighting coefficient B = 0.2.

【００９３】When the input sentence 「私たちの生活から切り離せない道具となっています。」 ("It has become a tool that cannot be separated from our lives."), supplied via the data input unit 20 and the language analysis unit 30 of FIG. 1, is supplied to the phoneme processing unit 40, the accent commands and the division into prosodic words turn the input sentence into 「ワタシ↓タチノ，セーカツヲ，キリハナセナイ，ドーグ↓ト，ナ↓ッテイマス」. Writing the number of beats of each prosodic word in parentheses, the beat counts are (6), (6), (7), (4), (6), respectively; the maximum number of divisions in this case is 5, and the number of beats of the whole ICRLB is (29). As shown in FIG. 14, the prosodic phrases of the input sentence are partitioned between the minimum number of phrase commands, 2, and the maximum number, 5. The minimum number follows from the relationship between the number of beats of the whole ICRLB and the beat limit L₂ = 15. The table of FIG. 14 shows, for each division number, the partition positions at which the cost is smallest.

【００９４】For the cost calculation, the calculation unit 462a computes a value for each term of the cost function F and sums these values (see equation (1)). Having determined, for each division number, the partition positions at which the cost is smallest, the table of FIG. 14 shows that the cost is minimized when the division number is 2 and the division is (6+6, 7+4+6). When the command setting unit 46a sets pause commands and phrase commands according to this division, the data generated for the input sentence is 「P₁ワタシ↓タチノ，セーカツヲ，P₃キリハナセナイ，ドーグ↓ト，ナ↓ッテイマスP₀」. This showed that data with natural prosody is generated, in particular by taking the length of the prosodic phrases after division into account in the third term of equation (1).

【００９５】By contrast, when the division number was determined using only the first term of the cost function, the sum of errors took its minimum value of 3.6 for the 5-way division, that is, (6,6,7,4,6). As a result, phrase commands are added to the input sentence as 「P₁ワタシ↓タチノ，P₃セーカツヲ，P₃キリハナセナイ，P₃ドーグ↓ト，P₃ナ↓ッテイマスP₀」. However, because the input sentence is divided too finely, the prosody of the synthesized speech obtained from this data was unnatural. Processing that takes the length of the prosodic phrases after division into account thus makes it possible to ultimately generate synthesized speech with appropriate prosody.
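The contrast drawn in this example can be checked numerically with a short sketch. It rests on several assumptions: the cost takes the reconstructed form "sum of deviations from the average phrase length, minus B times the average", the feature term is taken as zero for every candidate of this sentence, and every prosodic word boundary may serve as a division point. All names are hypothetical.

```python
from itertools import combinations

WORD_BEATS = [6, 6, 7, 4, 6]   # beat counts of the five prosodic words
B = 0.2                        # coefficient B from the description

def groupings(beats, m):
    """Yield the phrase lengths of every division into m consecutive groups."""
    n = len(beats)
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0, *cuts, n)
        yield [sum(beats[a:b]) for a, b in zip(bounds, bounds[1:])]

def first_term(phrases):
    """Sum of deviations from the average phrase length."""
    mean = sum(phrases) / len(phrases)
    return sum(abs(p - mean) for p in phrases)

def full_cost(phrases):
    """First term minus B times the average (feature term assumed zero)."""
    return first_term(phrases) - B * sum(phrases) / len(phrases)

# Division numbers 2..5, as stated for this sentence.
candidates = [g for m in range(2, 6) for g in groupings(WORD_BEATS, m)]
by_first_term = min(candidates, key=first_term)
by_full_cost = min(candidates, key=full_cost)
```

Under these assumptions the first term alone selects the 5-way division with a sum of errors of 3.6, as stated in the text, while the full cost selects phrase lengths (12, 17), that is, the division (6+6, 7+4+6).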

【００９６】In another case, the input sentence 「開幕連勝を支えた望月が打たれた。」 ("Mochizuki, who had sustained the opening winning streak, was hit.") was supplied to the speech synthesizer 10. In this case, prosodic processing and accent commands give 「カイマク，レ↓ンショーヲ，ササエタ，モチ↓ズキガ，ウタ↓レタ」. Writing the number of beats of each prosodic word in parentheses, the beat counts are (4), (5), (4), (5), (4), respectively; the minimum number of divisions in this case is 2, the maximum number is 3, and the number of beats of the whole ICRLB is (22). The table shown in FIG. 15 gives, for each division number, the positions of the phrase commands that minimize the cost.

【００９７】In this case as well, the cost function F is obtained by calculating a value for each term and summing these values. The result shows that, among the minimum values for each division number, the cost is smallest, at -1.2, when the division number is 2 and the division is (4+5+4, 5+4). When the command setting unit 46a sets pause commands and phrase commands according to this division, the data generated for the input sentence is 「P₁カイマク，レ↓ンショーヲ，ササエタ，P₃モチ↓ズキガ，ウタ↓レタP₀」. This showed that data with natural prosody is generated, in particular by taking the emphasis information and the length of the prosodic phrases after division into account in the second and third terms of equation (1).

【００９８】When, as in the comparison above, the division number was determined using only the first term of the cost function, the sum of errors took its minimum value of 4 for a 2-way division in which the prosodic phrases were delimited as either (4+5, 4+5+4) or (4+5+4, 5+4). Two divisions are thus obtained for this input sentence. The data generated with the (4+5, 4+5+4) candidate is 「P₁カイマク，レ↓ンショーヲ，P₃ササエタ，モチ↓ズキガ，ウタ↓レタP₀」, whereas the data generated with the (4+5+4, 5+4) candidate is 「P₁カイマク，レ↓ンショーヲ，ササエタ，P₃モチ↓ズキガ，ウタ↓レタP₀」.

【００９９】In the data generated for the (4+5, 4+5+4) candidate, a phrase command is set before the verb 「支えた」 ("sustained"), and no phrase command precedes the proper noun 「望月」 (Mochizuki). Emphasis of an accented word is achieved by enlarging its accent command, but because the phrase command has been attached to the preceding word, the emphasized word ends up not being emphasized. That is, the problem stems from performing an ICRLB division in which the emphasized word does not come at the head of a prosodic phrase. Consequently, if synthesized speech is generated using the former division candidate, it is uttered with unnatural prosody. For this input sentence, because the sum of errors is 4 for both candidates, there was a risk that the former candidate would be adopted if the judgment were based on the sum of errors alone. From this standpoint as well, the speech synthesizer 10 to which the present invention is applied can select the division candidate that reproduces the prosody naturally, and was thus able to avoid adopting this inappropriate candidate.
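The tie between the two 2-way candidates, and the way the feature term resolves it, can be illustrated with the sketch below. The cost uses the reconstructed form of equation (1) (first term, plus feature weights, minus B times the average), and the assignment of the weight -3 to an emphasized (+emp) phrase head and 7 to a suppressed (-emp) one is an assumption inferred from the values quoted in the text; the names are hypothetical.

```python
A = {"+emp": -3, "-emp": 7, "0emp": 0}   # assumed mapping of emphasis to weight A
B = 0.2                                  # coefficient B from the description

def cost(phrase_beats, head_emphasis):
    """Reconstructed cost: deviations + head-feature weights - B * average."""
    mean = sum(phrase_beats) / len(phrase_beats)
    error = sum(abs(b - mean) for b in phrase_beats)
    feature = sum(A[e] for e in head_emphasis)
    return error + feature - B * mean

# (4+5, 4+5+4): the second phrase starts at the verb, which carries no emphasis.
former = cost([9, 13], ["0emp", "0emp"])
# (4+5+4, 5+4): the second phrase starts at the emphasized proper noun.
latter = cost([13, 9], ["0emp", "+emp"])
```

Both candidates have a first term of 4, but under these assumptions the latter evaluates to 4 - 3 - 2.2 = -1.2, matching the minimum cost quoted above, while the former evaluates to 1.8 and is therefore not selected.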

【０１００】In some cases, ICRLB division places a suppressed word at the head of a prosodic phrase. An example sentence for such a case is 「電話が繋がってしまう事があるかもしれません。」 ("The call may sometimes end up being connected."). In this example, prosodic processing and accent commands give 「デンワガ，ツナガッテシマウ，コト］ガ，ア｝ルカモ，シレマセ｝ン」. Writing the number of beats of each prosodic word in parentheses, the beat counts are (4), (8), (3), (4), (5), respectively, and the number of beats of the whole ICRLB is (24). The table shown in FIG. 16 gives, for each division number, the positions of the phrase commands that minimize the cost.

【０１０１】The cost function F is obtained by calculating a value for each term and summing these values. As a result of this cost calculation, the cost for the input sentence was smallest, at 3.6, when the division number is 2 and the division is (4+8+3, 4+5). When the command setting unit 46a sets pause commands and phrase commands according to this division, the data generated for the input sentence is 「P₁デンワガ，ツナガッテシマウ，コト］ガ，P₃，ア｝ルカモ，シレマセ｝ン」. Data with natural prosody was thus generated, in particular by taking the emphasis information and the length of the prosodic phrases after division into account in the second and third terms of equation (1).

【０１０２】On the other hand, when the division candidates are evaluated using only the first term of the cost function F, the error is smallest, at 0, when the division number is 2 and the division is (4+8, 3+4+5). The data generated for the input sentence is then 「P₁デンワガ，ツナガッテシマウ，P₃コト］ガ，ア｝ルカモ，シレマセ｝ン」. As this data shows, a phrase command is set immediately before 「コト」. However, formal nouns such as 「コト」 are known to be normally suppressed and not pronounced strongly. If a division candidate that places a suppressed word at the head of a prosodic phrase is selected because it minimizes the sum of errors given by the first term of equation (1) alone, the synthesized speech ends up with unnatural prosody. Evaluating the input sentence data with all terms of the cost function therefore makes it possible to select an appropriate division candidate.
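A small end-to-end sketch of this example follows. It is an illustration under stated assumptions: the cost takes the reconstructed form of equation (1), the suppressed formal noun is tagged -emp with weight 7 and all other words are tagged 0emp, and every prosodic word boundary may serve as a division point. The tags and names are hypothetical.

```python
from itertools import combinations

# (beats, emphasis tag) per prosodic word; the third word is the formal noun.
WORDS = [(4, "0emp"), (8, "0emp"), (3, "-emp"), (4, "0emp"), (5, "0emp")]
A = {"+emp": -3, "-emp": 7, "0emp": 0}   # assumed mapping of emphasis to weight A
B = 0.2

def candidates(words):
    """Yield every division into 2..len(words) groups of consecutive words."""
    n = len(words)
    for m in range(2, n + 1):   # 24 beats with a limit of 15 gives a minimum of 2
        for cuts in combinations(range(1, n), m - 1):
            bounds = (0, *cuts, n)
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]

def beats_of(phrases):
    return [sum(b for b, _ in p) for p in phrases]

def cost(phrases, use_all_terms=True):
    beats = beats_of(phrases)
    mean = sum(beats) / len(beats)
    error = sum(abs(b - mean) for b in beats)       # first term
    if not use_all_terms:
        return error
    feature = sum(A[p[0][1]] for p in phrases)      # weight of each phrase head
    return error + feature - B * mean

best_first = min(candidates(WORDS), key=lambda c: cost(c, False))
best_full = min(candidates(WORDS), key=cost)
```

Under these assumptions the first term alone picks phrase lengths (12, 12) with an error of 0, placing the suppressed word at a phrase head, whereas the full cost picks (15, 9) with a cost of 3.6, in agreement with the values quoted in the text.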

【０１０３】When, for example, the input sentence 「国有地の処分に適用されているのと同じ転売規則を設けるという基本方針を決めました。」 ("We have decided on a basic policy of establishing the same resale rules as those applied to the disposal of state-owned land.") is input to the speech synthesizer 10, the prosodic word processing and accent command processing turn the input sentence into 「コクユ］ーチノ，ショ↓ブンニ，テキヨーサレ↓テイルノト，オナジ，テンバイキソ］クヲ，モーケ↓ルトイユー，キホンホ↓ーシンヲ，キメマ｝シタ」. The beat counts of the prosodic words were (6), (4), (11), (3), (8), (7), (8), (5), for a total of 52 beats. When evaluated with only the first term of the cost function F, as shown in FIG. 17, the sum of errors is smallest, at 6, for the candidate with division number 4 and division (6+4, 11+3, 8+7, 8+5). When the costs obtained by summing all terms of the cost function are compared across the combinations of division candidates, the cost is likewise smallest, at 3.4, for division number 4 and (6+4, 11+3, 8+7, 8+5). The two evaluations can thus give the same result.

【０１０４】Next, a modification of the speech synthesizer of the present invention is described with reference to FIGS. 18 and 19. This speech synthesizer 10 has basically the same configuration as the embodiment described above, except that the configuration of the pause/phrase command generation unit 46 of the phoneme processing unit 40 is changed. FIG. 18 shows the configuration of the pause/phrase command generation unit 46, which is the main part of the modified phoneme processing unit 40.

【０１０５】As in the embodiment described above, the pause/phrase command generation unit 46 comprises a pause/phrase command setting unit 46A and an ICRLB division unit 46B. In this case, in the pause/phrase command setting unit 46A, the output from the language analysis unit 30 is supplied to the command setting unit 46a. The pause/phrase command setting unit 46A includes this command setting unit 46a and a standard storage unit 46b. The standard storage unit 46b stores the standards for attaching pause commands and phrase commands to the supplied data. The command setting unit 46a has a function equivalent to that of the rule division unit 460a of the above embodiment, and also has a function of determining whether the supplied data satisfies, among these standards, the limit L₂ on the number of beats contained in a prosodic phrase. Based on this condition, the command setting unit 46a selects the output destination of the data. The condition is detailed in the description of the operation below.

【０１０６】The ICRLB division unit 46B comprises an example search unit 466 and a division example storage unit 468. The example search unit 466 includes a prosodic word division unit 466a that decomposes the data supplied from the command setting unit 46a into prosodic words; a search key collation unit 466b that extracts search key information from the output of the prosodic word division unit 466a and collates it with the examples stored in the division example storage unit 468; and an example-based division unit 466c that divides the output from the language analysis unit 30 based on the collation result of the search key collation unit 466b and attaches pause commands and phrase commands to that output. The search key collation unit 466b performs the collation using the same kinds of information as the collation search keys held in the division example storage unit 468.

【０１０７】The division example storage unit 468 is a memory that stores, as search keys for collation, data of examples already judged to be appropriately divided. The search keys for collation use the actual input sentence data, emphasis information, prosodic words, the number of beats of each prosodic word, the total of these beat counts, the dependency type of the prosodic words, and/or accent commands, among others.
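One way to picture such a search key is as an immutable record that can be looked up directly. The sketch below is purely illustrative: the class, its field names, and the dictionary-based store are hypothetical, only a subset of the listed kinds of information is modelled, and only the exact-match case is shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class SearchKey:
    """Hypothetical search key built from one analysed sentence."""
    words: Tuple[str, ...]      # prosodic words (readings)
    beats: Tuple[int, ...]      # number of beats of each prosodic word
    emphasis: Tuple[str, ...]   # emphasis information per prosodic word

    @property
    def total_beats(self) -> int:
        return sum(self.beats)

def find_exact_example(
    key: SearchKey, examples: Dict[SearchKey, Tuple[int, ...]]
) -> Optional[Tuple[int, ...]]:
    """Return the stored phrase-boundary positions for an exact match, else None."""
    return examples.get(key)
```

Because the record is frozen, it is hashable and can serve as a dictionary key; a partial-match search would need a similarity measure, which is not sketched here.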

【０１０８】The operation of this pause/phrase command generation unit 46 is briefly described using subroutine SUB8 of FIG. 19. Subroutine SUB8 is a pause/phrase command generation routine that replaces the processing of subroutine SUB3 shown in FIG. 8. For this processing, it is preferable to store in advance, in the division example storage unit 468, a plurality of appropriately divided examples. These examples also include pause commands and phrase commands.

【０１０９】In sub-step SS80 of subroutine SUB8 shown in FIG. 19, it is determined whether the length of the supplied prosodic phrase satisfies the prescribed beat limit L₂. For this purpose, the value of the beat limit L₂ stored in the standard storage unit 46b is supplied to the command setting unit 46a. Based on this value, the command setting unit 46a determines whether the length of the prosodic phrase satisfies the (second) rule (division length determination step). When the rule is satisfied (Yes), the process proceeds to sub-step SS81. When the rule is not satisfied (No), the process proceeds to sub-step SS82.

[0110] In sub-step SS81, because the length of the prosodic phrase is at or below the beat-count limit L₂ and therefore satisfies the rule, the corresponding pause symbols and phrase symbols are added between prosodic phrases. The command setting unit 46a adds the pause symbols and phrase symbols in accordance with the pause command and phrase command rules stored in the standard storage unit 46b. After this processing, the flow proceeds to return.

[0111] In sub-step SS82, the supplied data is sent to the example search unit 466 of the ICRLB division unit 46B and divided into prosodic word elements. This division (that is, ICRLB division) is performed by the prosodic word element division unit 466a of the example search unit 466. The prosodic word element division unit 466a outputs the divided data to the search key collation unit 466b. After this processing, the flow proceeds to sub-step SS83.

[0112] In sub-step SS83, a search is made for a division example that matches the information contained in the divided data, and the degree of match of the search result is determined. In practice, this search is performed by the search key collation unit 466b, using the output of the prosodic word element division unit 466a as the search key and the information held in the division example storage unit 468 as the collation keys. Examples of these search keys are given later. When a completely matching example is found (Yes), the process proceeds to sub-step SS84. When no complete match is found (No), the search key collation unit 466b proceeds to sub-step SS85.

[0113] In sub-step SS84, the matching example from the division example storage unit 468 is output via the search key collation unit 466b and the example-corresponding division unit 466c. Here the example-corresponding division unit 466c simply passes the information of this example through and supplies it to the same output destination that the command setting unit 46a uses when the rule is satisfied. After this supply, the flow proceeds to return.

[0114] In sub-step SS85, the output of the language analysis unit 30 is divided according to the closest-matching example among those held in the division example storage unit 468, and pause commands and phrase commands are added. The search key collation unit 466b supplies the closely matching example to the example-corresponding division unit 466c. In this case the input sentence may not be identical to the example, so the example-corresponding division unit 466c divides the output of the language analysis unit 30 according to the information from the search key collation unit 466b and adds information (pause commands, phrase commands, and so on), including the division position. The example-corresponding division unit 466c supplies its output to the same destination as in sub-step SS84, namely the output destination of the command setting unit 46a. After this processing, the flow proceeds to return. The processing of sub-steps SS84 and SS85 corresponds to the example search step.

[0115] At return, subroutine SUB8, which selects a division candidate matching an example, ends and the process proceeds to sub-step SS22. Thereafter, the remaining processing of subroutine SUB2 is performed and the main routine generates the synthesized speech. Configuring the device as in this modification makes it possible to simplify the speech synthesizer 10.
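The branching of subroutine SUB8 described in paragraphs [0109] to [0115] can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Example` record, the `similarity` score, and the limit value used in the usage note are hypothetical, since this section does not specify how the degree of match is computed or what value L₂ takes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    """Hypothetical stored division example (stand-in for an entry in unit 468)."""
    beats: Tuple[int, ...]   # beat count of each prosodic word element
    split_after: int         # number of elements before the "/" division position


def similarity(a: Tuple[int, ...], b: Tuple[int, ...]) -> float:
    """Assumed match score: fraction of positions whose beat counts agree."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sub8(beats: Tuple[int, ...], limit_l2: int,
         examples: List[Example]) -> Tuple[str, Optional[int]]:
    """Return the sub-step taken and the split index (None when no split is needed)."""
    # SS80: does the phrase length satisfy the beat-count limit L2?
    if sum(beats) <= limit_l2:
        return ("SS81", None)          # SS81: only pause/phrase symbols are added
    if not examples:
        raise LookupError("no stored division examples")
    # SS82/SS83: the element beat counts serve as the search key here
    best = max(examples, key=lambda ex: similarity(beats, ex.beats))
    if similarity(beats, best.beats) == 1.0:
        return ("SS84", best.split_after)   # exact match: pass the example through
    # SS85: reuse the closest example, clamped to a valid split position
    return ("SS85", min(best.split_after, len(beats) - 1))
```

With stored examples `[Example((4, 4, 3, 4, 4), 2), Example((5, 3, 5, 7, 3), 3)]` and a hypothetical limit of 15 beats, the key `(4, 4, 3, 4, 4)` takes the SS84 branch, while `(5, 3, 5, 6, 3)` has no exact match and takes the SS85 branch using the second example.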

[0116] Specific examples of the information used for this search follow. For each input sentence, the division example storage unit 468 stores information such as the prosodic word elements, the division position (indicated by the symbol "/"), the number of prosodic word elements, the total number of beats, the emphasis information, the number of beats of each prosodic word element, the dependency type, the position of the accent command, and/or the part of speech, all converted into numerical values.

[0117] For example, for the input sentence 「遠くの海まで漁に出かけている漁師が」 ("the fisherman who has gone out fishing to a distant sea"), the prosodic word elements and division position are トークノ, ウミマデ, (/) リョーニ, デテイル, リョーシガ; the number of prosodic word elements is 5; the total number of beats is 19; the emphasis information of each prosodic phrase is 0emph, 0emph, 0emph, 0emph, 0emph; and the number of beats of each prosodic phrase is (4), (4), (3), (4), (4). The P symbols in this case are (P₁) トークノ, ウミマデ, (P₃) リョーニ, デテイル, リョーシガ (P₀).

For a sentence containing emphasis, 「国境に近いである島である魚釣り島に着いた」 ("arrived at Uotsuri Island, an island near the border"), the prosodic word elements and division position are コッキョーニ, チカ↓イ, シマ↓デアル, (/) ウオツリジマニ, ツ↓イタ; the number of prosodic word elements is 5; the total number of beats is 23; the emphasis information of each prosodic phrase is 0emph, 0emph, 0emph, +emph, 0emph; and the number of beats of each prosodic phrase is (5), (3), (5), (7), (3). The P symbols in this case are (P₁) コッキョーニ, チカ↓イ, シマ↓デアル, (P₃) ウオツリジマニ, ツ↓イタ (P₀).

[0118] For a sentence containing suppression, 「会長を務めたことも強みの一つです。」 ("Having served as chairman is also one of the strengths."), the prosodic word elements and division position are カイチョーヲ, ツト↓メタ, コト↓モ, (/) ツヨミノ, ヒト↓ツデス; the number of prosodic word elements is 5; the total number of beats is 19; the emphasis information of each prosodic phrase is 0emph, 0emph, -emph, 0emph, 0emph; and the number of beats of each prosodic phrase is (4), (4), (3), (4), (4). The P symbols in this case are (P₁) カイチョーヲ, ツト↓メタ, コト↓モ, (P₃) ツヨミノ, ヒト↓ツデス (P₀).

[0119] Finally, as an example in which the dependency type is also used as information, the previously mentioned input sentence 「私たちの生活から欠かせない道具となっています。」 ("It has become an indispensable tool in our lives.") becomes ワタシ↓タチノ, セーカツカラ, (/) カカセナイ, ドーグ↓ト, ナ↓ッテイマイス; the number of prosodic word elements is 5; the total number of beats is 27; the dependency types for the respective prosodic phrases are adnominal, adverbial, adnominal, adverbial, and none; the emphasis information of each prosodic phrase is 0emph, 0emph, 0emph, 0emph, 0emph; and the number of beats of each prosodic phrase is (6), (6), (5), (4), (6). The P symbols in this case are (P₁) ワタシ↓タチノ, セーカツカラ, (P₃) カカセナイ, ドーグ↓ト, ナ↓ッテイマイス (P₀). By converting this information into numerical values, storing it in the division example storage unit 468, and using it as collation keys during a search, examples can be retrieved, and the retrieval accuracy improved, without setting parameters or the like.
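The stored records described in paragraphs [0116] to [0119] might be represented as in the sketch below. The class name, field names, and the integer encoding of the emphasis information (+emph as 1, 0emph as 0, -emph as -1) are assumptions made for illustration; the text says only that the information is converted into numerical values. The element readings, division positions, and beat counts are taken from the two examples in paragraph [0117].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DivisionExample:
    elements: Tuple[str, ...]   # prosodic word elements (katakana readings)
    split_after: int            # number of elements before the "/" division position
    beats: Tuple[int, ...]      # beat count of each element
    emphasis: Tuple[int, ...]   # assumed encoding: +emph=1, 0emph=0, -emph=-1

    @property
    def total_beats(self) -> int:
        return sum(self.beats)

    def key(self) -> tuple:
        """Numeric collation key: element count, total beats, split, beats, emphasis."""
        return (len(self.elements), self.total_beats, self.split_after,
                self.beats, self.emphasis)

    def with_p_symbols(self) -> str:
        """Render the element sequence with P symbols around the two halves."""
        head = ", ".join(self.elements[:self.split_after])
        tail = ", ".join(self.elements[self.split_after:])
        return f"(P₁) {head}, (P₃) {tail} (P₀)"


FISHERMAN = DivisionExample(
    elements=("トークノ", "ウミマデ", "リョーニ", "デテイル", "リョーシガ"),
    split_after=2, beats=(4, 4, 3, 4, 4), emphasis=(0, 0, 0, 0, 0))

ISLAND = DivisionExample(
    elements=("コッキョーニ", "チカ↓イ", "シマ↓デアル", "ウオツリジマニ", "ツ↓イタ"),
    split_after=3, beats=(5, 3, 5, 7, 3), emphasis=(0, 0, 0, 1, 0))
```

Summing the per-element beat counts reproduces the totals given in the text (19 and 23), which is a simple consistency check that could be run when examples are stored.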

[0120] With the above configuration, appropriate ICRLB division can be performed, so phonological processing that produces unnatural synthesized speech is avoided and more natural, easier-to-hear synthesized speech can be output. The speech synthesizer can also handle semi-automatically the parameter value changes that accompany rule changes. This makes the device easier to operate and reduces manual labor.

[0121] In addition, searching for examples and using the selected example eliminates processing such as parameter setting, as well as the arithmetic processing, and the simplified device configuration also reduces cost.

[0122] The present invention is not limited to the embodiment described above; for example, the embodiment and this modification may be combined. In the initial period after the speech synthesizer 10 begins operating, in order to accumulate information, the optimum division candidate is obtained with the configuration of FIG. 4 and the intermediate language is created, while the information of the obtained division candidate is supplied to the division example storage unit 468 so that various examples are stored in advance in a learning manner. After information has been accumulated in the division example storage unit 468 in this way, processing of the supplied data may be switched to the modification described above (the configuration shown in FIG. 18), so that the information is retrieved by search and the pause commands and phrase commands are added. Because data for reliably synthesizing the speech the user actually uses is accumulated in the speech synthesizer 10 first, storage of useless data is avoided. Once processing has been switched to search, the desired division candidate and the information to be added to it can easily be obtained by searching with a combination of search keys, without performing the calculations.
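The combined operation in paragraph [0122], computing divisions at first while storing them and later switching to search, could look roughly like the sketch below. Everything named here is hypothetical: `balanced_split` is only a stand-in for the FIG. 4 evaluation path (it is not the patent's evaluation formula), `warmup` is an assumed switching criterion, and `distance` is an assumed closeness measure.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[int, ...]  # beat counts of the prosodic word elements


def distance(a: Key, b: Key) -> int:
    """Assumed closeness measure: sum of absolute beat differences, zero-padded."""
    n = max(len(a), len(b))
    pa = a + (0,) * (n - len(a))
    pb = b + (0,) * (n - len(b))
    return sum(abs(x - y) for x, y in zip(pa, pb))


def balanced_split(beats: Key) -> int:
    """Stand-in for the computed path: split where both halves are most even.

    Requires at least two elements.
    """
    total = sum(beats)
    return min(range(1, len(beats)),
               key=lambda i: abs(2 * sum(beats[:i]) - total))


class HybridDivider:
    def __init__(self, compute: Callable[[Key], int], warmup: int) -> None:
        self.compute = compute             # computed path used during the initial period
        self.warmup = warmup               # hypothetical number of examples to learn first
        self.store: Dict[Key, int] = {}    # stand-in for division example storage unit 468

    def divide(self, beats: Key) -> Tuple[str, int]:
        if len(self.store) < self.warmup:
            split = self.compute(beats)    # initial period: compute and learn
            self.store[beats] = split
            return ("computed", split)
        if beats in self.store:            # search mode: exact match
            return ("exact", self.store[beats])
        nearest = min(self.store, key=lambda k: distance(k, beats))
        return ("nearest", min(self.store[nearest], len(beats) - 1))
```

A counter-based switch is just one possible design; the text leaves open how the device decides that enough examples have been accumulated.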

[0123] In the embodiment described above, the number of beats is used in equation (1) for selecting the division position, but the calculation may instead use variables such as the head word of a phrase (bunsetsu) or the type of dependency between phrases.
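Equation (1) is not reproduced in this part of the text, but claim 10 recites the order of the arithmetic: a sum of absolute errors against the equal-division beat length, plus a weighted feature sum, minus a product term. The sketch below follows that order under explicit assumptions: each section carries one emphasis code (+1, 0, or -1), the feature weights and `average_weight` are made-up values, and the product term is assumed to be a weight times the mean section length, which is one reading of claims 3 and 10 rather than a confirmed formula.

```python
from typing import Dict, Sequence, Tuple

Section = Tuple[int, int]  # (beat length of the section, emphasis code: +1, 0 or -1)


def evaluation_value(sections: Sequence[Section],
                     feature_weights: Dict[int, float],
                     average_weight: float) -> float:
    """Evaluation value of one division candidate; smaller is treated as better."""
    n = len(sections)
    total = sum(beats for beats, _ in sections)
    mean = total / n                                   # equal-division beat length
    # Error sum: |section length - equal-division length| over all sections
    error_sum = sum(abs(beats - mean) for beats, _ in sections)
    # Weighted feature sum: beat lengths of sections that carry a feature
    weighted_feature_sum = sum(feature_weights.get(code, 0.0) * beats
                               for beats, code in sections if code != 0)
    # Product term (assumed form): weight times the mean section length
    product = average_weight * mean
    return error_sum + weighted_feature_sum - product


def select_candidate(candidates: Sequence[Sequence[Section]],
                     feature_weights: Dict[int, float],
                     average_weight: float) -> Sequence[Section]:
    """Third rule: choose the candidate with the minimum evaluation value."""
    return min(candidates,
               key=lambda c: evaluation_value(c, feature_weights, average_weight))
```

For instance, with the hypothetical weights `{1: 0.5, -1: -0.5}` and `average_weight = 1.0`, the candidate `[(13, 0), (10, 1)]` evaluates to -3.5 and the candidate `[(8, 0), (15, 1)]` to 3.0, so the first is selected.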

[0124]

[Effects of the Invention] As described above, in the speech synthesizer of the present invention, the division candidate generation means divides the output of the language analysis means based on the first rule, judges the resulting prosodic phrases by the second rule, and, when a prosodic phrase is to be re-divided, takes as division candidates the combinations of prosodic phrases produced by the positions at which it is split. The division candidate selection means calculates an evaluation value indicating division validity, using the speech expression information contained in the obtained division candidates and the parameters supplied from the parameter storage means, and then uses this evaluation value to select the division candidate that satisfies the third rule. The selected candidate yields prosodic phrases delimited at the optimum length, and the phonological analysis means generates an intermediate language according to this division. Because the intermediate processing of the speech synthesizer is performed in this way, phonological processing that produces unnatural synthesized speech is avoided, more natural and easier-to-hear synthesized speech can be output, and parameter value changes accompanying rule changes can be handled semi-automatically. This further makes the device easier to operate and reduces manual labor.

[0125] Further, when the speech synthesizer stores examples in the division example storage means, has the division example search means search the stored information for a match with the externally supplied information, and uses the search result to add commands and the like to the corresponding division candidate, high-quality synthesized speech can be obtained with a simple configuration, without processing parameters or computing the cost of division candidates. The cost of the device can also be reduced.

[0126] According to the text analysis method of the speech synthesizer of the present invention, the analysis result of a sentence is processed and judged in turn according to the first rule and the second rule, and the combinations obtained when the information is re-divided according to this result are taken as division candidates. An evaluation value of division validity is calculated using the stored parameters and the speech expression information contained in the division candidates, phonological analysis selects the division candidate that satisfies the third rule, and an intermediate language corresponding to this division candidate is generated. This increases the likelihood that more natural, easier-to-hear synthesized speech is output.

[0127] Further, as to the division candidates, the supplied information and/or the information attached to appropriately divided examples is stored in advance as search keys; an output destination is selected according to the judgment made on the supplied prosodic phrase; an example corresponding to the division of the supplied prosodic phrase is then searched for at the selected output destination; and the various commands are assigned according to the search result. In this way too, more natural, high-quality synthesized speech can be generated by simple operation while greatly reducing unnecessary computation.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of the speech synthesizer of the present invention.

FIG. 2 is a block diagram showing an example of the schematic configuration of the language analysis unit in the speech synthesizer of FIG. 1.

FIG. 3 is a diagram showing the relationship among the language processing units handled by the language analysis unit of FIG. 2, together with an example of their classification.

FIG. 4 is a block diagram showing an example of the schematic configuration of the phonological processing unit in the speech synthesizer of FIG. 1.

FIG. 5 is a main flowchart explaining the operation of the speech synthesizer of FIG. 1.

FIG. 6 is a flowchart explaining the operation of the language analysis processing (subroutine SUB1) shown in the main flowchart of FIG. 5.

FIG. 7 is a diagram showing a specific example of a syntax tree and dependency structure produced by the syntax analysis in the processing routine of FIG. 6.

FIG. 8 is a flowchart explaining the operation of the phonological analysis processing (subroutine SUB2) shown in the main flowchart of FIG. 5.

FIG. 9 is a flowchart explaining the operation of the pause/phrase command setting (subroutine SUB3) shown in subroutine SUB2 of FIG. 8.

FIG. 10 is a flowchart explaining the operation of the ICRLB division processing (subroutine SUB4) shown in subroutine SUB3 of FIG. 9.

FIG. 11 is a flowchart explaining the operation of the division candidate generation processing (subroutine SUB5) shown in subroutine SUB4 of FIG. 10.

FIG. 12 is a flowchart explaining the operation of the parameter storage processing (subroutine SUB6) shown in subroutine SUB4 of FIG. 10.

FIG. 13 is a flowchart explaining the operation of the division selection processing (subroutine SUB7) shown in subroutine SUB4 of FIG. 10.

FIG. 14 is a diagram showing the values of the variables obtained for the division candidates into which an input sentence supplied to the configuration of FIG. 1 is divided, and the selection of a division candidate.

FIG. 15 is a diagram showing the values of the variables obtained for the division candidates into which an input sentence supplied to the configuration of FIG. 1 is divided, and the selection of a division candidate.

FIG. 16 is a diagram showing the values of the variables obtained for the division candidates into which an input sentence supplied to the configuration of FIG. 1 is divided, and the selection of a division candidate.

FIG. 17 is a diagram showing the values of the variables obtained for the division candidates into which an input sentence supplied to the configuration of FIG. 1 is divided, and the selection of a division candidate.

FIG. 18 is a block diagram showing a schematic pause/phrase command generation unit, which is the main part of another configuration of the phonological processing unit of FIG. 4.

FIG. 19 is a flowchart explaining the operation of the example-based pause/phrase command processing (subroutine SUB8) in the pause/phrase command generation unit of FIG. 18.

[Explanation of symbols]

10 speech synthesizer
20 data input unit
30 language analysis unit
40 phonological processing unit
42 prosodic word generation unit
44 accent command generation unit
46 pause/phrase command generation unit
50 control parameter generation unit
60 speech signal generation unit
460 division candidate generation unit
462 division candidate selection unit
464 parameter storage unit

Claims

[Claims]

1. A speech synthesizer comprising: language analysis means for analyzing language-level features based on the morphemes contained in a sentence supplied as information and on the language used for processing the syntax of the sentence; phonological analysis means for analyzing the output of the language analysis means based on features at the spoken-language level and generating, from the obtained analysis result, an intermediate language serving as commands for speech synthesis; control parameter generation means for generating control parameters corresponding to the output of the phonological analysis means; and means for synthesizing a speech signal based on the output of the control parameter generation means, the speech synthesizer artificially synthesizing speech from the information, wherein the phonological analysis means includes: division candidate generation means for dividing the output of the language analysis means according to a first rule based on the dependency relations of a prosodic phrase in which a plurality of prosodic words, each a chain of morphemes, are grouped, and, when a prosodic phrase is re-divided in accordance with a second rule that judges whether the prosodic phrase obtained by the division is no larger than a preset size, generating as division candidates the combinations of prosodic phrases obtained from the positions at which the prosodic phrase is split; division candidate selection means for calculating an evaluation value for evaluating the validity of the division using the speech expression information contained in the division candidates, and selecting a division candidate according to a third rule that selects the division candidate having the minimum evaluation value; and parameter storage means for storing the parameters used by the division candidate selection means to calculate the evaluation value.

2. The apparatus according to claim 1, wherein the division candidate generation means includes: first rule processing means for dividing the output of the language analysis means using the first rule; second rule processing means for judging, by the second rule, the divided length in the output of the first rule processing means; and re-division means for re-dividing the prosodic phrase that is output in accordance with the judgment of the second rule processing means, thereby generating a plurality of division candidates.

3. The apparatus according to claim 1, wherein the parameter storage means stores: a first weighting factor based on the speech expression information in the division candidate; and a second weighting factor by which the average number of beats of the prosodic words, according to the number of divisions of the division candidate, is multiplied.

4. The apparatus according to claim 1, wherein the speech expression information is emphasis information that designates emphasis of the prosodic word element, suppression of the prosodic word element, or neither.

5. The apparatus according to claim 1, wherein the phonological analysis means includes: division example storage means for storing examples of the division positions of prosodic phrases that satisfy the rules; and division example search means for searching the examples stored in the division example storage means for an example in which the supplied sentence is optimally divided.

6. The apparatus according to claim 5, wherein the division example search means searches the examples stored in the division example storage means using the emphasis information, the prosodic word elements, the number of beats of each prosodic word element, the total number of beats, the dependency type of the prosodic word elements, the accent commands, and/or the part of speech.

7. The apparatus according to claim 1, wherein the language analysis means includes emphasis information setting means that sets rules for the emphasis information and stores the emphasis information so set.

8. A text analysis method for a speech synthesizer that analyzes language-level features based on the morphemes contained in a sentence supplied as information and on the language used for processing the syntax of the sentence, analyzes the prosody of the sentence from that analysis result based on features at the spoken-language level, generates from the obtained analysis result an intermediate language serving as commands for speech synthesis, generates control parameters corresponding to the generated intermediate language, and artificially synthesizes speech corresponding to the information, the method comprising: a rule division step of dividing the analysis result of the sentence using a first rule that divides the information according to the dependency relations of a prosodic phrase in which a plurality of prosodic words, each a chain of the morphemes, are grouped; a division length determination step of judging the result of the rule division step by a second rule that determines whether the size of the prosodic phrase is smaller than a preset size; a division candidate generation step of generating, as division candidates, the prosodic phrase delimiter positions obtained by re-dividing the information according to the judgment of the division length determination step; a parameter storage step of storing parameters used to evaluate division validity based on the speech expression information contained in the division candidates obtained in the division candidate generation step; an evaluation value calculation step of calculating an evaluation value from the parameters and the speech expression information contained in the division candidates generated in the division candidate generation step; and a division candidate selection step of selecting a division candidate based on a third rule that selects the division candidate having the smallest of the evaluation values obtained in the evaluation value calculation step.

9. The method according to claim 8, wherein the division candidates are generated by re-dividing the prosodic phrase when the beat length of the prosodic phrase is larger than a preset re-division beat length, and a speech tone command symbol, which is one of the features at the spoken-language level, is inserted at the boundary of each division candidate.

10. The method according to claim 8, wherein the evaluation value is calculated using: an error sum calculation step of calculating the sum of the absolute values of the differences between the beat length of each section obtained by dividing the prosodic phrase and the beat length obtained by equally dividing the beat length of the entire prosodic phrase; a feature amount sum calculation step of calculating the sum of the beat lengths of the sections obtained by dividing the prosodic phrase, according to the presence in the prosodic phrase of a feature amount represented by the speech expression information included in the features at the spoken-language level; a weight calculation step of calculating a weighting factor based on the speech expression information contained in the prosodic words of the division candidate; a weighted feature amount sum calculation step of multiplying the result of the feature amount sum calculation step by the weighting factor obtained in the weight calculation step and calculating the sum of the multiplication results; a product calculation step of calculating a product with the average beat length after division of the prosodic phrase, that is, the beat length obtained by equally dividing the beat length of the entire prosodic phrase; and an evaluation value calculation step of adding the result of the error sum calculation step and the result of the weighted feature amount sum calculation step and subtracting the result of the product calculation step from that sum to obtain the evaluation value of the division candidate.

11. The method according to claim 8, wherein the speech expression information is set by classifying the prosodic words into emphasis, suppression, and a case that is neither.

12. The method according to claim 11, wherein a prosodic phrase of the division candidate is classified as emphasis when it contains a proper noun or a number, and is classified as suppression when it contains a formal noun or a verb, or is at the beginning of the sentence, and has a particle at its end, and predetermined values are assigned to the emphasis and the suppression.

13. The method according to claim 8, wherein, for the division candidates, the supplied information and/or the information attached to appropriately divided examples is stored in advance as search keys, the method further comprising an example search step in which, after an output destination is selected according to the judgment made on the supplied prosodic phrase in the division length determination step, an example corresponding to the division of the supplied prosodic phrase is searched for at the selected output destination, and the prosodic phrase is divided and given various commands according to the search result.

14. The text analysis method according to claim 13, wherein in the example storing step, various examples are stored in a learning manner in advance.

15. The method according to claim 8, wherein said parameter is obtained by statistical analysis or multivariate analysis.

16. The method according to claim 8, wherein the language is Japanese.