JPH11282494A

JPH11282494A - Speech synthesizer and storage medium

Info

Publication number: JPH11282494A
Application number: JP10100646A
Authority: JP
Inventors: Hideyuki Hoshikawa; 英之星川
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1998-03-27
Filing date: 1998-03-27
Publication date: 1999-10-15

Abstract

(57) [Abstract]

[Problem] To make synthesized speech easier to make out by devoicing, or not devoicing, the vowels contained in an accent phrase according to the length of that accent phrase when the accent phrase is subject to a devoicing rule.

[Solution] In step 4, Japanese analysis is performed with reference to a dictionary 12 to convert the input into a synthesis character string. In step 6, the devoicing rules stored in the dictionary 12 are applied to the synthesis character string. If it is determined in step 8 that the syllables in the accent phrase include a syllable to which a devoicing rule applies, the number of beats (morae) n in the accent phrase is counted in step 10. If it is then determined in step 12 that the number of beats n is smaller than 3, phoneme data in which the vowels in the accent phrase are not devoiced is set in step 16. Accent, pitch, and other parameters are then set in step 18, and speech synthesis is performed in step 20.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesizer and to a storage medium storing a computer program for operating the speech synthesizer, and is well suited to converting text data into speech and uttering it.

[0002]

[Prior Art] A speech synthesizer configured as shown in FIG. 3, for example, is conventionally known. This speech synthesizer 30 has a dictionary 12 in which information such as words, word readings, accents, grammar, and devoicing rules is registered. The devoicing rules are described here with reference to FIG. 4, which shows part of their content. A devoicing rule specifies the conditions under which a vowel is devoiced; devoicing vowels yields crisp synthesized speech. The first row of FIG. 4 gives part of a devoicing rule stating that [shi] is devoiced when it is followed by a ta-row syllable (タ行). For example, when the sentence-final 「でした」 [de·shi·ta] is read aloud, the vowel [i] of [shi] lies between the voiceless consonants [sh] and [t], so the vowel [i] of [shi] is devoiced. With this devoicing rule, 「でした」 is therefore read aloud as [de·sh·ta].
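The rule in the first row of FIG. 4 can be illustrated with a minimal Python sketch. The rule encoding (a tuple of target syllable, set of following syllables, and devoiced unit), the names `TA_ROW`, `DEVOICING_RULES`, and `apply_devoicing_rules`, and the romanized input are assumptions made for illustration; the patent does not specify how the rules are stored in the dictionary 12.

```python
# Romanized ta-row syllables (タ行): ta, chi, tsu, te, to.
TA_ROW = {"ta", "chi", "tsu", "te", "to"}

# One illustrative rule modeled on the first row of FIG. 4 as described in the
# text: [shi] followed by a ta-row syllable is devoiced to [sh].
# The tuple layout is a hypothetical encoding, not the patent's format.
DEVOICING_RULES = [
    ("shi", TA_ROW, "sh"),
]


def apply_devoicing_rules(syllables):
    """Return the speech units obtained by applying the rules to each syllable."""
    units = []
    for i, syllable in enumerate(syllables):
        # The syllable that follows, or None at the end of the sequence.
        following = syllables[i + 1] if i + 1 < len(syllables) else None
        unit = syllable
        for target, next_set, devoiced in DEVOICING_RULES:
            if syllable == target and following in next_set:
                unit = devoiced
                break
        units.append(unit)
    return units


print(apply_devoicing_rules(["de", "shi", "ta"]))  # ['de', 'sh', 'ta']
```

Only the single rule quoted in the text is modeled; a real rule table would contain further rows.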

[0003] The speech synthesizer 30 also has a phoneme data file 20 storing phoneme data, which are the basic units of synthesis. A Japanese analysis unit 14 refers to the dictionary 12 to convert input Japanese text data, written in mixed kanji and kana, into a synthesis character string consisting of katakana representing the phonemes and prosodic symbols representing prosodic information, and outputs the converted synthesis character string to a speech parameter setting unit 32. The prosodic symbols consist of delimiters marking accent phrases and phrases, and symbols marking accents and pauses.

[0004] The speech parameter setting unit 32 sets the speech units to be synthesized using the input synthesis character string and the devoicing rules stored in the dictionary 12, and sets parameters such as the pitch and prosodic duration of the synthesized speech based on the accents and accent-phrase symbols contained in the synthesis character string. In the 「でした」 example above, the speech units are set to [de][sh][ta]. A speech synthesis unit 34 then selects the speech units set by the speech parameter setting unit 32 from the phoneme data file 20 and synthesizes speech based on the selected speech units and the parameters set above. The electrical signal output from the speech synthesis unit 34 is converted into sound by a speaker 24.

[0005]

[Problem to Be Solved by the Invention] In the conventional speech synthesizer described above, however, devoicing vowels by applying the devoicing rules can actually make the synthesized speech harder to make out. For example, when the two-beat word 「七」 [shi·chi] is synthesized, the vowel [i] of both [shi] and [chi] is devoiced and the word is read aloud as [sh·ch], which is hard to make out.

[0006] An object of the present invention is therefore to realize a speech synthesizer and a storage medium that produce synthesized speech that is easy to make out by performing processing that, when a predetermined accent phrase in text data is subject to a devoicing rule, devoices or does not devoice the vowels contained in that accent phrase according to the length of the accent phrase.

[0007]

[Means for Solving the Problem] To achieve the above object, the invention according to claim 1 adopts the technical means of providing: storage means storing speech synthesis unit data, which are the units used when synthesizing speech; first determination means for determining, in accordance with a devoicing rule, whether to devoice a vowel contained in a predetermined accent phrase in text data; detection means for detecting the length of the predetermined accent phrase; second determination means for determining whether the length of the accent phrase detected by the detection means is longer than a preset predetermined length; processing means for performing, when the first determination means determines that the vowel contained in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel according to the determination result of the second determination means; and synthesis means for reading speech synthesis unit data from the storage means based on the processing result of the processing means and the determination result of the first determination means, and synthesizing the read speech synthesis unit data.

[0008] In the invention according to claim 2, in the speech synthesizer according to claim 1, the processing means performs processing that devoices the vowel contained in the predetermined accent phrase when the first determination means determines that the vowel is to be devoiced and the second determination means determines that the detected length of the accent phrase is longer than the preset predetermined length, and performs processing that does not devoice the vowel when the first determination means determines that the vowel is to be devoiced and the second determination means determines that the detected length of the accent phrase is not longer than the preset predetermined length.

[0009] In the invention according to claim 3, in the speech synthesizer according to claim 1 or claim 2, the length of the predetermined accent phrase is determined by at least one of the number of beats in the predetermined accent phrase, the number of speech synthesis units, and the phoneme duration.

[0010] The invention according to claim 4 adopts the technical means of a storage medium storing: a storage area in which speech synthesis unit data, which are the units used when synthesizing speech, are stored; a first determination program for determining, in accordance with a devoicing rule, whether to devoice a vowel contained in a predetermined accent phrase in text data; a detection program for detecting the length of the predetermined accent phrase; a second determination program for determining whether the length of the accent phrase detected by the detection program is longer than a preset predetermined length; a processing program for performing, when the first determination program determines that the vowel contained in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel according to the determination result of the second determination means; and a data synthesis program for reading speech synthesis unit data from the storage means based on the processing result of the processing program and the determination result of the first determination program, and synthesizing the read speech synthesis unit data.

[0011]

[Operation] In the inventions according to claims 1 to 3, the storage means stores speech synthesis unit data, which are the units used when synthesizing speech; the first determination means determines, in accordance with a devoicing rule, whether to devoice a vowel contained in a predetermined accent phrase in text data; and the second determination means determines whether the length of the accent phrase, as detected by the detection means that detects the length of the predetermined accent phrase, is longer than a preset predetermined length. When the first determination means determines that the vowel contained in the predetermined accent phrase is to be devoiced, the processing means performs processing that devoices or does not devoice that vowel according to the determination result of the second determination means. The synthesis means then reads speech synthesis unit data from the storage means based on the processing result of the processing means and the determination result of the first determination means, and synthesizes the read speech synthesis unit data. In other words, when a predetermined accent phrase in the text data is subject to a devoicing rule, whether to devoice the vowel contained in that accent phrase is decided according to the length of the accent phrase, and the speech synthesis unit data are synthesized based on that decision. This solves the problem that, depending on the length of the accent phrase, devoicing a vowel can make the speech hard to make out.

[0012] In particular, in the invention according to claim 2, the processing means performs processing that devoices the vowel contained in the predetermined accent phrase when the first determination means determines that the vowel is to be devoiced and the second determination means determines that the detected length of the accent phrase is longer than the preset predetermined length, and performs processing that does not devoice the vowel when the first determination means determines that the vowel is to be devoiced and the second determination means determines that the detected length of the accent phrase is not longer than the preset predetermined length. In other words, when the accent phrase is shorter than (not longer than) a certain predetermined length, so that devoicing its vowels would make the synthesized speech hard to make out, the vowels in the accent phrase are not devoiced; when the accent phrase is longer than the predetermined length, the vowels in the accent phrase are devoiced. This solves the problem that, because an accent phrase is short, devoicing a vowel can make the speech hard to make out.
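The two-stage decision described for claim 2 can be written as a small truth function. This is a sketch under stated assumptions: the name `should_devoice`, its parameters, and the default threshold of 2 are illustrative. The default is chosen so that "longer than 2 beats" matches the embodiment's condition that devoicing is performed when the number of beats is 3 or more.

```python
def should_devoice(rule_applies: bool, phrase_length: int,
                   predetermined_length: int = 2) -> bool:
    """Decide whether the vowel in an accent phrase is devoiced.

    rule_applies: result of the first determination (devoicing rule matched).
    phrase_length: detected length of the accent phrase (here, in beats).
    predetermined_length: preset length; devoice only if strictly longer.
    """
    if not rule_applies:
        # The rule does not apply, so the vowel stays voiced.
        return False
    # Second determination: is the phrase longer than the preset length?
    return phrase_length > predetermined_length


for applies in (True, False):
    for beats in (2, 3):
        print(applies, beats, should_devoice(applies, beats))
```

With these defaults, a two-beat phrase is never devoiced, and a phrase of three or more beats is devoiced only when the rule applies.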

[0013] As in the invention according to claim 3, the length of the predetermined accent phrase is preferably determined by at least one of the number of beats in the predetermined accent phrase, the number of speech synthesis units, and the phoneme duration. This is because, in speech synthesizers, the number of beats in an accent phrase, the number of speech synthesis units, and the phoneme duration are used as representative parameters indicating the length of an accent phrase.

[0014] The invention according to claim 4 is a storage medium storing: a storage area in which speech synthesis unit data, which are the units used when synthesizing speech, are stored; a first determination program for determining, in accordance with a devoicing rule, whether to devoice a vowel contained in a predetermined accent phrase in text data; a detection program for detecting the length of the predetermined accent phrase; a second determination program for determining whether the length of the accent phrase detected by the detection program is longer than a preset predetermined length; a processing program for performing, when the first determination program determines that the vowel contained in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel according to the determination result of the second determination means; and a data synthesis program for reading speech synthesis unit data from the storage means based on the processing result of the processing program and the determination result of the first determination program, and synthesizing the read speech synthesis unit data. Using this storage medium therefore makes it possible to carry out the invention according to claim 1. That is, because the speech synthesizer is controlled by a CPU built into the speech synthesizer or by a computer connected to the speech synthesizer, as described for example in the embodiment below, a speech synthesizer and storage medium producing synthesized speech that is easy to make out can be realized by providing a storage unit serving as the storage medium in the speech recognition device, or by installing the computer program stored on the storage medium on a computer.

[0015]

[Embodiment of the Invention] An embodiment of the speech synthesizer of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the speech synthesizer of this embodiment. Components identical to those of the conventional device are given the same reference numerals, and their description is omitted. As shown in FIG. 1, the speech synthesizer 10 of this embodiment includes a dictionary 12, a Japanese analysis unit 14, a beat counting unit 16, a speech parameter setting unit 18, a phoneme data file 20, a speech synthesis unit 22, and a speaker 24. The beat counting unit 16 counts, as the number of beats, the syllables corresponding to the katakana within an accent phrase, using the accent-phrase delimiters and end-of-sentence symbols contained in the synthesis character string output from the Japanese analysis unit 14. The speech parameter setting unit 18 sets the appropriate phoneme data based on the number of beats counted by the beat counting unit 16, the devoicing rules 12a stored in the dictionary 12 (see FIG. 4), and the synthesis character string output from the Japanese analysis unit 14. The speech parameter setting unit 18 also sets parameters such as the pitch and prosodic duration of the synthesized speech based on the accents and accent-phrase symbols contained in the synthesis character string. The speech parameter setting unit 18 then outputs the phoneme data and parameters thus set to the speech synthesis unit 22.
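As a rough illustration of what the beat counting unit 16 does, the following Python sketch counts beats per accent phrase in a synthesis character string. The concrete symbols are assumptions: the patent does not give the delimiter or end-of-sentence characters, so `/` and `.` are hypothetical placeholders. Treating small kana such as ャ, ュ, ョ as merging with the preceding kana into one beat is also an assumption about how syllables map to katakana.

```python
# Small kana assumed to merge with the preceding kana into a single beat.
SMALL_COMBINING = set("ャュョァィゥェォヮ")

PHRASE_DELIMITER = "/"  # hypothetical accent-phrase delimiter
SENTENCE_END = "."      # hypothetical end-of-sentence symbol


def is_katakana(ch):
    """True for katakana letters and the long-vowel mark."""
    return "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def count_beats(accent_phrase):
    """Count beats in one accent phrase, ignoring non-katakana symbols."""
    return sum(
        1 for ch in accent_phrase
        if is_katakana(ch) and ch not in SMALL_COMBINING
    )


def beats_per_accent_phrase(synthesis_string):
    """Split on the delimiter and count the beats of each accent phrase."""
    body = synthesis_string.rstrip(SENTENCE_END)
    return [count_beats(p) for p in body.split(PHRASE_DELIMITER) if p]


print(count_beats("シチ"))                      # 2
print(beats_per_accent_phrase("シチ/デシタ."))  # [2, 3]
```

Accent and pause symbols, whatever their actual form, are skipped here simply because they are not katakana.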

[0016] The speech synthesis unit 22 sequentially reads from the phoneme data file 20 the phoneme data output by the speech parameter setting unit 18, synthesizes the read phoneme data based on the parameters output from the speech parameter setting unit 18, and outputs the synthesized phoneme data to the speaker 24 as a speech signal. The speaker 24 converts the input speech signal into sound.

[0017] Next, the flow of processing when the speech synthesizer 10 performs speech synthesis is described with reference to the flowchart of FIG. 2. The dictionary 12 and the phoneme data file 20 are stored on a storage medium built into the device, such as a ROM or an HDD (hard disk drive), and the computer program with which the speech synthesizer 10 executes the processing shown in FIG. 2 is stored on the same storage medium. Here, speech synthesis of the two-beat word 「七」 [shi·chi] is taken as the example, and the word 「七」 is assumed to be subject to the devoicing rules. It is further assumed that, when the devoicing rules apply, devoicing is performed if the number of beats n is 3 or more.

[0018] First, when the word 「七」 is input as text data (step 2), Japanese analysis of the word 「七」 is performed with reference to the dictionary 12, and the word is converted into a synthesis character string (step 4). Next, the devoicing rules stored in the dictionary 12 are applied to the synthesis character string obtained in step 4 (step 6), and it is determined whether the syllables in the accent phrase include a syllable to which a devoicing rule applies (step 8). Because the word 「七」 is subject to the devoicing rules, the process proceeds to step 10 (step 8: Yes), where the number of beats n of the word 「七」 is counted. It is then determined whether the number of beats n is 3 or more (step 12). Because the number of beats n of the word 「七」 is 2, the process proceeds to step 16, where phoneme data [shi][chi], in which the vowel [i] is not devoiced, is set for the word 「七」 (step 16).

[0019] Next, parameters such as the accent, pitch, and phoneme duration contained in the synthesis character string obtained in step 4 are set (step 18). The phoneme data set in step 16 is then read from the phoneme data file 20 and synthesized with the parameters set in step 18 applied (step 20). Here the synthesis result is [shi·chi], a crisp synthesized sound in which the vowel [i] is voiced. When the number of beats n in the accent phrase to be synthesized is 3 or more (step 12: Yes), phoneme data in which the vowel is devoiced is set (step 14). When applying the devoicing rules in step 6 results in a determination that no vowel is to be devoiced (step 8: No), phoneme data in which the vowels are not devoiced is set without counting the number of beats n (step 16).
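Steps 8 to 16 of the flowchart can be sketched in Python as follows. The function name `select_phoneme_units`, the `DEVOICED_FORM` table, and the representation of the step 6 result as a set of syllable indices are assumptions made for illustration; the beat count is taken to be the number of romanized syllables in the accent phrase.

```python
# Devoiced forms of the syllables used in the document's examples.
DEVOICED_FORM = {"shi": "sh", "chi": "ch"}


def select_phoneme_units(syllables, rule_targets, min_beats_to_devoice=3):
    """Choose phoneme units for one accent phrase (steps 8 to 16 of FIG. 2).

    syllables: romanized syllables of the accent phrase.
    rule_targets: indices of syllables flagged by the devoicing rules in
        step 6 (hypothetical representation of that result).
    min_beats_to_devoice: devoice only when the beat count reaches this value.
    """
    # Step 8: does any syllable fall under a devoicing rule?
    if not rule_targets:
        return list(syllables)          # Step 16: no devoicing, no counting

    n = len(syllables)                  # Step 10: count the beats
    if n >= min_beats_to_devoice:       # Step 12: long enough to devoice?
        return [                        # Step 14: devoiced phoneme data
            DEVOICED_FORM.get(s, s) if i in rule_targets else s
            for i, s in enumerate(syllables)
        ]
    return list(syllables)              # Step 16: keep the vowels voiced


print(select_phoneme_units(["shi", "chi"], {0, 1}))    # ['shi', 'chi']
print(select_phoneme_units(["de", "shi", "ta"], {1}))  # ['de', 'sh', 'ta']
```

The two-beat word 「七」 keeps its vowels even though both syllables are flagged, while the three-beat 「でした」 is devoiced as in the conventional device.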

[0020] As described above, with the speech synthesizer 10 of this embodiment, even when the devoicing rules apply, vowels are not devoiced if the number of beats n in the accent phrase is 2 or less, which prevents the speech from becoming hard to make out because of devoiced vowels. The embodiment was described for the representative case in which the number of beats n in the accent phrase used as the criterion for deciding whether to devoice a vowel when the devoicing rules apply is set to 3, but a value other than 3 may be used. In addition to the number of beats, the number of speech synthesis units, the phoneme duration, and the like may also be used as the parameter for judging the length of an accent phrase.
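The alternative length measures mentioned above can be sketched as a selectable criterion. Everything named here is hypothetical: the `AccentPhraseLength` fields, the `measure` keys, the millisecond unit, and the sample values are illustrative assumptions, since the patent gives no thresholds for speech synthesis unit counts or phoneme durations.

```python
from dataclasses import dataclass


@dataclass
class AccentPhraseLength:
    """Candidate length measures of one accent phrase (illustrative fields)."""
    beats: int
    unit_count: int      # number of speech synthesis units
    duration_ms: float   # total phoneme duration, unit assumed


def is_longer_than_threshold(length, measure="beats", threshold=2):
    """Compare the chosen measure with a preset threshold (strictly longer)."""
    value = {
        "beats": length.beats,
        "units": length.unit_count,
        "duration_ms": length.duration_ms,
    }[measure]
    return value > threshold


# Hypothetical values for a two-beat phrase.
seven = AccentPhraseLength(beats=2, unit_count=2, duration_ms=300.0)
print(is_longer_than_threshold(seven))                        # False
print(is_longer_than_threshold(seven, "duration_ms", 250.0))  # True
```

Selecting a single measure keeps the sketch simple; how several measures would be combined is left open because the text does not say.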

[0021] In the above embodiment, the computer program with which the speech synthesizer 10 performs speech synthesis is stored on a storage medium inside the device, such as a ROM or an HDD. However, the computer program may instead be stored on a CD-ROM, a floppy disk, or the like and installed using a reading device (not shown) provided in the speech synthesizer 10, so that speech synthesis is performed. In this case, the CD-ROM, FD, or the like functions as the storage medium according to claim 4. The computer program may also be loaded from an external information processing device via wired or wireless communication means and run.

[0022] The phoneme data file 20 functions as the storage means or storage area of the present invention, the beat counting unit 16 functions as the detection means, and the speech synthesis unit 22 functions as the synthesis means. The speech parameter setting unit 18 functions as the first determination means, the second determination means, and the processing means of the present invention. Further, when the CPU provided in the speech synthesizer 10 executes the processing shown in FIG. 2, the computer program for executing step 8 corresponds to the first determination program according to claim 4, the computer program for executing step 10 corresponds to the detection program, the computer program for executing step 12 corresponds to the second determination program, the computer program for executing steps 14 to 18 corresponds to the processing program, and the computer program for executing step 20 corresponds to the data synthesis program.

[0023]

[Effects of the Invention] As described above, the inventions according to claims 1 to 3 include: storage means storing speech synthesis unit data, which are the units used when synthesizing speech; first determination means for determining, in accordance with a devoicing rule, whether to devoice a vowel contained in a predetermined accent phrase in text data; second determination means for determining whether the length of the accent phrase, as detected by detection means that detects the length of the predetermined accent phrase, is longer than a preset predetermined length; processing means for performing, when the first determination means determines that the vowel contained in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel according to the determination result of the second determination means; and synthesis means for reading speech synthesis unit data from the storage means based on the processing result of the processing means and the determination result of the first determination means, and synthesizing the read speech synthesis unit data. Therefore, when a predetermined accent phrase in the text data is subject to a devoicing rule, processing that devoices or does not devoice the vowel contained in that accent phrase can be performed according to the length of the accent phrase, and the speech synthesis unit data can be synthesized based on the result of that processing. This solves the problem that, depending on the length of the accent phrase, devoicing a vowel can make the speech hard to make out.

【００２４】特に、請求項２に記載の発明によれば、上
記処理手段は、第１の判定手段による判定結果が、所定
のアクセント句に含まれる母音を無声化するという判定
結果であり、かつ、第２の判定手段による判定結果が、
検出されたアクセント句の長さが、予め設定された所定
の長さよりも長いという判定結果である場合は、上記所
定のアクセント句に含まれる母音を無声化する処理を行
い、第１の判定手段による判定結果が、上記所定のアク
セント句に含まれる母音を無声化するという判定結果で
あり、かつ、第２の判定手段による判定結果が、上記検
出されたアクセント句の長さが、予め設定された所定の
長さよりも長くないという判定結果である場合は、上記
所定のアクセント句に含まれる母音を無声化しない処理
を行うため、アクセント句が短いことにより、母音が無
声化されると聞き取り難くなる場合があるという問題を
解決できる。
In particular, according to the invention of claim 2, the processing means devoices the vowel included in the predetermined accent phrase when the first determining means determines that the vowel is to be devoiced and the second determining means determines that the length of the detected accent phrase is longer than the predetermined length set in advance. Conversely, the processing means does not devoice the vowel when the first determining means determines that the vowel is to be devoiced and the second determining means determines that the length of the detected accent phrase is not longer than the predetermined length set in advance. This solves the problem that, in a short accent phrase, a devoiced vowel can make the synthesized speech hard to hear.
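The two-stage decision described for claim 2 can be sketched as follows. This is only an illustrative sketch: the function name `should_devoice`, its parameter names, and the default threshold are assumptions, not part of the patent text. The default of 2 is chosen because the abstract's embodiment leaves vowels voiced when the beat count n is smaller than 3, which for integer beat counts matches "longer than 2".

```python
def should_devoice(rule_applies: bool, phrase_length: int, min_length: int = 2) -> bool:
    """Decide whether a vowel in an accent phrase is devoiced.

    rule_applies  -- result of the first determination (the devoicing rule matched)
    phrase_length -- detected length of the accent phrase, e.g. in beats
    min_length    -- hypothetical threshold; devoice only if strictly longer
    """
    if not rule_applies:
        # The rule does not target this vowel, so it stays voiced.
        return False
    # Second determination: only sufficiently long phrases are devoiced.
    return phrase_length > min_length
```

Under these assumptions, a two-beat accent phrase keeps its vowel voiced even when the devoicing rule matches, while a phrase of three or more beats is devoiced.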

【００２５】また、請求項３に記載の発明によれば、上
記所定のアクセント句の長さは、その所定のアクセント
句内の拍数、音声合成単位数および音韻継続時間のうち
の少なくとも１つによって決定されるため、音声合成装
置において代表的なパラメータをアクセント句の長さを
決定するために用いることができる。
According to the invention of claim 3, the length of the predetermined accent phrase is determined by at least one of the number of beats (morae), the number of speech synthesis units, and the phoneme duration within that accent phrase. Parameters that are typical of a speech synthesizer can therefore be used to determine the length of the accent phrase.
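As a hedged illustration of the three length measures named for claim 3, the sketch below selects one of them by name. The `AccentPhrase` class, its field names, the millisecond unit, and the `phrase_length` helper are all hypothetical; the patent does not specify a data layout or how multiple measures would be combined.

```python
from dataclasses import dataclass


@dataclass
class AccentPhrase:
    beats: int          # number of beats (morae) in the phrase
    units: int          # number of speech synthesis units in the phrase
    duration_ms: float  # total phoneme duration (unit is an assumption)


def phrase_length(phrase: AccentPhrase, measure: str = "beats") -> float:
    """Return the accent phrase length under the chosen measure."""
    if measure == "beats":
        return phrase.beats
    if measure == "units":
        return phrase.units
    if measure == "duration_ms":
        return phrase.duration_ms
    raise ValueError(f"unknown length measure: {measure}")
```

The returned value would then be compared against a threshold expressed in the same measure, for example `phrase_length(p, "beats") > 2`.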

【００２６】そして、請求項４に記載の発明によれば、
音声を合成する際の単位となる音声合成単位データが記
憶された記憶領域と、無声化規則にしたがってテキスト
データ中の所定のアクセント句に含まれる母音を無声化
するか否かを判定する第１の判定プログラムと、上記所
定のアクセント句の長さを検出する検出プログラムと、
この検出プログラムによって検出されたアクセント句の
長さが、予め設定された所定の長さよりも長いか否かを
判定する第２の判定プログラムと、上記第１の判定プロ
グラムによる判定結果が、上記所定のアクセント句に含
まれる母音を無声化するという判定結果である場合は、
上記第２の判定手段による判定結果に応じて、上記所定
のアクセント句に含まれる母音を無声化し、もしくは、
無声化しない処理を行う処理プログラムと、この処理プ
ログラムによる処理結果および上記第１の判定プログラ
ムによる判定結果に基づいて上記記憶手段から音声合成
単位データを読出すとともに、その読出した音声合成単
位データを合成するデータ合成プログラムと、が記憶さ
れた記憶媒体という構成であるため、その記憶媒体を音
声合成装置内の記憶媒体として設け、もしくは、その記
憶媒体に格納されているコンピュータプログラムを音声
合成装置あるいは音声合成装置に接続されたコンピュー
タにインストールすることによって請求項１に記載の音
声合成装置を実現できる。
According to the invention of claim 4, the storage medium stores: a storage area in which speech synthesis unit data, which serve as the units from which speech is synthesized, are stored; a first determination program for determining, in accordance with a devoicing rule, whether a vowel included in a predetermined accent phrase in text data is to be devoiced; a detection program for detecting the length of the predetermined accent phrase; a second determination program for determining whether the length of the accent phrase detected by the detection program is longer than a predetermined length set in advance; a processing program for performing, when the first determination program determines that the vowel included in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel in accordance with the determination result of the second determining means; and a data synthesis program for reading the speech synthesis unit data from the storage means on the basis of the processing result of the processing program and the determination result of the first determination program, and for synthesizing the read speech synthesis unit data. The speech synthesizer according to claim 1 can therefore be realized either by providing this storage medium as a storage medium inside a speech synthesizer, or by installing the computer programs stored on it in a speech synthesizer or in a computer connected to a speech synthesizer.

[Brief description of the drawings]

【図１】本発明実施形態の音声合成装置の概略構成をブ
ロックで示す説明図である。
FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to an embodiment of the present invention.

【図２】本発明実施形態の音声合成装置の処理の流れを
示すフローチャートである。
FIG. 2 is a flowchart showing the processing flow of the speech synthesizer according to the embodiment of the present invention.

【図３】従来の音声合成装置の概略構成をブロックで示
す説明図である。
FIG. 3 is a block diagram showing the schematic configuration of a conventional speech synthesizer.

【図４】無声化規則の内容の一部を示す説明図である。
FIG. 4 is an explanatory diagram showing part of the contents of the devoicing rules.

[Explanation of symbols]

１０音声合成装置１２辞書１４日本語解析部１６拍数計数部（検出手段）１８音声パラメータ設定部（第１および第２の判定手段、処理手段）２０音韻データファイル（記憶手段）２２音声合成部（合成手段）２４スピーカ
10 Speech synthesizer
12 Dictionary
14 Japanese analysis part
16 Beat counting part (detecting means)
18 Speech parameter setting part (first and second determining means, processing means)
20 Phoneme data file (storage means)
22 Speech synthesis part (synthesizing means)
24 Speaker

Claims

[Claims]

1. A speech synthesizer comprising: storage means storing speech synthesis unit data, which serve as the units from which speech is synthesized; first determining means for determining, in accordance with a devoicing rule, whether a vowel included in a predetermined accent phrase in text data is to be devoiced; detecting means for detecting the length of the predetermined accent phrase; second determining means for determining whether the length of the accent phrase detected by the detecting means is longer than a predetermined length set in advance; processing means for performing, when the first determining means determines that the vowel included in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel in accordance with the determination result of the second determining means; and synthesizing means for reading the speech synthesis unit data from the storage means on the basis of the processing result of the processing means and the determination result of the first determining means, and for synthesizing the read speech synthesis unit data.

2. The speech synthesizer according to claim 1, wherein the processing means: performs processing that devoices the vowel included in the predetermined accent phrase when the first determining means determines that the vowel is to be devoiced and the second determining means determines that the length of the detected accent phrase is longer than the predetermined length set in advance; and performs processing that does not devoice the vowel included in the predetermined accent phrase when the first determining means determines that the vowel is to be devoiced and the second determining means determines that the length of the detected accent phrase is not longer than the predetermined length set in advance.

3. The speech synthesizer according to claim 1 or 2, wherein the length of the predetermined accent phrase is determined by at least one of the number of beats, the number of speech synthesis units, and the phoneme duration within the predetermined accent phrase.

4. A storage medium having stored therein: a storage area in which speech synthesis unit data, which serve as the units from which speech is synthesized, are stored; a first determination program for determining, in accordance with a devoicing rule, whether a vowel included in a predetermined accent phrase in text data is to be devoiced; a detection program for detecting the length of the predetermined accent phrase; a second determination program for determining whether the length of the accent phrase detected by the detection program is longer than a predetermined length set in advance; a processing program for performing, when the first determination program determines that the vowel included in the predetermined accent phrase is to be devoiced, processing that devoices or does not devoice that vowel in accordance with the determination result of the second determining means; and a data synthesis program for reading the speech synthesis unit data from the storage means on the basis of the processing result of the processing program and the determination result of the first determination program, and for synthesizing the read speech synthesis unit data.