JPH03276194A

JPH03276194A - Text/sound converter

Info

Publication number: JPH03276194A
Application number: JP2078338A
Authority: JP
Inventors: Takashi Yato; 隆矢頭
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1990-03-27
Filing date: 1990-03-27
Publication date: 1991-12-06

Abstract

PURPOSE: To obtain, with a simple configuration, a desired synthesized voice having the reading, accent, intonation, breath pauses, etc. that the user has adjusted, by providing a mode setting unit and a control unit. CONSTITUTION: When the mode setting unit 40 selects, according to an external designation, one of a text-to-speech conversion mode, a phoneme and prosodic symbol string output mode, and a phoneme and prosodic symbol string synthesis mode, the control unit 50 determines the mode and controls the input and output of the text and of the phoneme and prosodic symbol string. When the text-to-speech conversion mode is set, the control unit 50 inputs the text, the text analysis unit 30 analyzes it, and the analysis result is passed through the speech synthesis unit 60 to the speaker 61. When the phoneme and prosodic symbol string output mode is set, the control unit 50 has the text analysis unit 30 analyze the text and outputs the generated phoneme and prosodic symbol string to the outside in the form of character codes. When the phoneme and prosodic symbol string synthesis mode is set, the control unit 50 passes the phoneme and prosodic symbol string input from the outside directly to the speech synthesis unit 60 for output. The user can thus freely modify the phoneme and prosodic symbol string and easily obtain the desired synthesized voice.

Description

[Detailed Description of the Invention] (Industrial Application Field) The present invention relates to a text-to-speech conversion device that receives input such as text data consisting of character codes (referred to here as "text"), converts it into speech, and outputs it.

(Prior Art) Conventional technology in this field includes that described in Japanese Patent Application Laid-Open No. 1-119822.

Conventional speech synthesis devices include character-to-speech conversion devices, that is, text-to-speech conversion devices, which take character information as input, convert it into speech, and output it, and storage-and-playback speech synthesis devices, which store speech in a memory or the like and play it back. The latter can produce only the stored speech, so its output vocabulary is limited. The former converts whatever character information is input into speech, so its output vocabulary is unlimited. It is therefore expected to replace the latter and find application in a variety of fields.

An example of this type of device is shown in FIG. 2.

FIG. 2 is a block diagram showing a configuration example of a conventional text-to-speech conversion device described in the above-mentioned publication and elsewhere.

The text-to-speech conversion device 10, connected to the output side of the host computer 1, comprises an interface unit 11, a text analysis unit 12, a speech synthesis unit 13, and a speaker 14.

In this device, when the host computer 1 outputs text, the text is passed to the text analysis unit 12 via the interface unit 11. The text analysis unit 12 segments the input text into words and generates the phoneme and prosodic symbol string needed to read the sentence aloud, such as the reading and accent of each word and intonation symbols representing the intonation of the whole sentence, and sends it to the speech synthesis unit 13. The speech synthesis unit 13 synthesizes speech corresponding to the input sentence, that is, the input text, based on the phoneme and prosodic symbol string, and outputs it from the speaker 14.

(Problems to be Solved by the Invention) However, the device configured as described above has the following problems.

Conventional text-to-speech conversion devices that read sentences aloud have the advantages of a simple configuration, small circuit scale, and low cost, and are therefore used for a variety of purposes.

However, it is at present nearly impossible for this type of text-to-speech conversion device to analyze the mixed kanji and kana sentences used in everyday writing and obtain perfectly correct readings, accents, and intonation. Obtaining an accurate phoneme and prosodic symbol string requires very sophisticated analysis, including semantic analysis, which complicates the device configuration and in turn increases circuit scale and cost. Consequently, in conventional text-to-speech conversion devices that read sentences aloud, the phoneme and prosodic symbol string produced by text analysis often contains errors in word readings, accents, or sentence intonation, and it has been difficult to obtain the synthesized speech the user expects. Moreover, because a conventional device takes in text, analyzes it automatically inside the device, and reads it aloud, the user cannot fine-tune the synthesized speech as intended and must settle for the ready-made synthesized speech determined by the device's analysis. A technically satisfactory device had therefore not yet been obtained.

The present invention provides a text-to-speech conversion device that solves the problem of the prior art described above, namely the difficulty of obtaining, with a simple configuration, a desired synthesized voice having the reading, accent, intonation, breath pauses, and so on that the user has individually adjusted.

(Means for Solving the Problems) To solve the above problem, the present invention provides a mode setting unit and a control unit in a text-to-speech conversion device that comprises a text analysis unit, which analyzes text data of mixed kanji and kana and generates a phoneme and prosodic symbol string for that text data, and a speech synthesis unit, which outputs synthesized speech corresponding to the text data based on the phoneme and prosodic symbol string.

Here, the mode setting unit has the function of setting, from the outside, any one of three modes: a text-to-speech conversion mode, in which the text data is input to the text analysis unit; a phoneme and prosodic symbol string output mode, in which the phoneme and prosodic symbol string generated by the text analysis unit is output to the outside in the form of character codes; and a phoneme and prosodic symbol string synthesis mode, in which a phoneme and prosodic symbol string from the outside is input directly to the speech synthesis unit. The control unit has the function of determining the mode set by the mode setting unit and, based on the determination result, controlling the input and output of the text data and the phoneme and prosodic symbol string.

(Function) According to the present invention, with the text-to-speech conversion device configured as described above, when one of the text-to-speech conversion mode, the phoneme and prosodic symbol string output mode, and the phoneme and prosodic symbol string synthesis mode is selected in the mode setting unit by external designation, the control unit determines which mode has been set and, based on the determination result, controls the input and output of the text and the phoneme and prosodic symbol string.

For example, when the text-to-speech conversion mode is set, the control unit takes in text, passes it to the text analysis unit for analysis, and has the speech synthesis unit output synthesized speech based on the analysis result. When the phoneme and prosodic symbol string output mode is set, the control unit takes in text, has the text analysis unit analyze it, and outputs the phoneme and prosodic symbol string generated by the text analysis unit to the outside in the form of character codes. When the phoneme and prosodic symbol string synthesis mode is set, the control unit passes the phoneme and prosodic symbol string input from the outside directly to the speech synthesis unit and has the speech synthesis unit output synthesized speech.
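The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Mode, analyze, synthesize, and control are hypothetical and do not appear in the patent, and the analysis and synthesis stages are stand-in stubs that merely tag their input.

```python
from enum import Enum, auto


class Mode(Enum):
    TEXT_TO_SPEECH = auto()    # text in, speech out
    SYMBOL_OUTPUT = auto()     # text in, symbol string returned to the host
    SYMBOL_SYNTHESIS = auto()  # symbol string in, speech out


def analyze(text: str) -> str:
    """Stub for the text analysis unit: returns a fake symbol string."""
    return f"<sym:{text}>"


def synthesize(symbols: str) -> str:
    """Stub for the speech synthesis unit: returns a description of the audio."""
    return f"audio({symbols})"


def control(mode: Mode, payload: str) -> dict:
    """Route the payload according to the selected mode.

    Returns a dict with 'speech' (what goes to the speaker) and
    'to_host' (what goes back through the interface); unused outputs are None.
    """
    if mode is Mode.TEXT_TO_SPEECH:
        return {"speech": synthesize(analyze(payload)), "to_host": None}
    if mode is Mode.SYMBOL_OUTPUT:
        return {"speech": None, "to_host": analyze(payload)}
    if mode is Mode.SYMBOL_SYNTHESIS:
        # The analyzer is bypassed: the payload is already a symbol string.
        return {"speech": synthesize(payload), "to_host": None}
    raise ValueError(f"unknown mode: {mode!r}")
```

The point of the sketch is that the only new logic is the three-way branch; the analysis and synthesis stages themselves are unchanged between modes.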

Because the three modes can be selected by external designation in this way, the user can freely modify the phoneme and prosodic symbol string produced by text analysis, and the desired synthesized speech is thereby obtained with a simple configuration. The above problem can therefore be solved.

(Embodiment) FIG. 1 is a block diagram showing the configuration of a text-to-speech conversion device according to an embodiment of the present invention.

This text-to-speech conversion device 20 is connected to the output side of a host computer 1 having arithmetic and control functions. It comprises an interface unit 21, which exchanges signals between the host computer 1 and the inside of the device, and a word dictionary memory 22, consisting of a ROM (read-only memory) or the like, which stores word readings, accent positions, part-of-speech information, and so on. A text analysis unit 30 is connected to the word dictionary memory 22.

The text analysis unit 30 analyzes the input text with reference to the contents of the word dictionary memory 22 and generates the phoneme and prosodic symbol string needed to synthesize the text data as speech, such as word readings, accent positions, and intonation symbols representing sentence intonation. The text analysis unit 30 comprises word division processing means 31, which divides the text into words; reading assignment processing means 32, which assigns a reading to each divided word; accent assignment processing means 33, which assigns accents; pause and intonation setting means 34, which sets pauses and intonation; and so on.
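As a rough illustration of how the four processing means might chain together, the following Python sketch runs a toy sentence through word division, reading assignment, accent assignment, and pause and intonation setting. Everything specific here is an assumption made for the example: the dictionary entries, the greedy longest-match segmentation, the "reading/accent" notation, the "," pause mark, and the "." sentence-final mark are invented and are not the symbol format of the patent.

```python
# Toy word dictionary: surface form -> (reading, accent position, part of speech).
# The entries and the output notation are invented for illustration.
WORD_DICT = {
    "今日": ("kyo:", 1, "noun"),
    "は": ("wa", 0, "particle"),
    "雨": ("ame", 1, "noun"),
    "です": ("desu", 1, "auxiliary"),
}


def split_words(text):
    """Word division (means 31): greedy longest match against the dictionary."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in WORD_DICT:
                words.append(text[i:j])
                i = j
                break
        else:
            raise KeyError(f"no dictionary entry at position {i}")
    return words


def add_readings(words):
    """Reading assignment (means 32): look up each word's reading."""
    return [(w, WORD_DICT[w][0]) for w in words]


def add_accents(read_words):
    """Accent assignment (means 33): append the accent position to each reading."""
    return [(w, f"{reading}/{WORD_DICT[w][1]}") for w, reading in read_words]


def set_pauses_and_intonation(accented):
    """Pause and intonation setting (means 34): toy rules only."""
    tokens = []
    for w, symbol in accented:
        tokens.append(symbol)
        if WORD_DICT[w][2] == "particle":
            tokens.append(",")  # pause after a particle
    tokens.append(".")  # sentence-final intonation mark
    return " ".join(tokens)


def analyze(text):
    """Run the four stages in order and return the symbol string."""
    return set_pauses_and_intonation(add_accents(add_readings(split_words(text))))


print(analyze("今日は雨です"))
```

A real analyzer would need far more than dictionary lookup to resolve ambiguous readings, which is exactly the limitation the description cites as the motivation for letting the user edit the result.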

The text-to-speech conversion device 20 is also provided with a mode setting unit 40, and a control unit 50 is connected to the mode setting unit 40 and the text analysis unit 30. A speaker 61 is connected to the output side of the control unit 50 via a speech synthesis unit 60.

The mode setting unit 40 has the function of setting, from the outside, any one of three modes: the text-to-speech conversion mode, in which text is input to the text analysis unit 30; the phoneme and prosodic symbol string output mode, in which the phoneme and prosodic symbol string generated by the text analysis unit 30 is output to the external host computer 1 in the form of character codes; and the phoneme and prosodic symbol string synthesis mode, in which a phoneme and prosodic symbol string from the external host computer 1 is input directly to the speech synthesis unit 60. The mode setting unit 40 has, for example, a changeover switch with which the mode is switched manually, or it switches the mode upon receiving a command from the host computer 1 via the interface unit 21 and the control unit 50.
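The two ways of setting the mode, a manual switch or a host command, could be modeled as below. This is a hypothetical sketch: the class name ModeSetting, the switch positions 0 to 2, and the command strings "MODE T", "MODE O", and "MODE S" are all invented for the example, since the patent does not define a command format.

```python
class ModeSetting:
    """Holds the current mode; it can be changed by switch position or host command."""

    MODES = ("text_to_speech", "symbol_output", "symbol_synthesis")
    # Hypothetical command strings; the patent does not specify any.
    COMMANDS = {
        "MODE T": "text_to_speech",
        "MODE O": "symbol_output",
        "MODE S": "symbol_synthesis",
    }

    def __init__(self):
        self.mode = "text_to_speech"

    def set_by_switch(self, position: int) -> None:
        """Manual selection: switch positions 0, 1, 2 map to the three modes."""
        if not 0 <= position < len(self.MODES):
            raise ValueError(f"invalid switch position: {position}")
        self.mode = self.MODES[position]

    def set_by_command(self, command: str) -> bool:
        """Selection by host command; returns False and keeps the mode if unknown."""
        mode = self.COMMANDS.get(command.strip().upper())
        if mode is None:
            return False
        self.mode = mode
        return True
```

Rejecting unknown commands without changing state keeps the device in a well-defined mode even if the host sends something unexpected.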

The control unit 50 comprises mode determination means 51, which determines the mode set by the mode setting unit 40, and input/output selection means 52. Based on the determination result of the mode determination means 51, the input/output selection means 52 handles the input of text or a phoneme and prosodic symbol string from the interface unit 21, the output to the interface unit 21 of the phoneme and prosodic symbol string generated by the text analysis unit 30, and the forwarding of the output of the text analysis unit 30 or of the interface unit 21 to the speech synthesis unit 60.

The speech synthesis unit 60 has the function of outputting synthesized speech corresponding to the text data to the speaker 61, based on the input phoneme and prosodic symbol string.

Next, the operation will be described.

The following description assumes, as an example, that the mode has been set manually in advance with the mode setting unit 40.

When the text-to-speech conversion mode is set by the mode setting unit 40, the text-to-speech conversion device 20 inputs text from the host computer 1 to the control unit 50 via the interface unit 21. In the control unit 50, the mode determination means 51 checks the mode state of the mode setting unit 40 and determines that it is the text-to-speech conversion mode. Based on that determination, the input/output selection means 52 sends the text received from the interface unit 21 to the text analysis unit 30.

In the text analysis unit 30, the word division processing means 31 divides the input text into words with reference to the contents of the word dictionary memory 22. For each of the divided words, the reading assignment processing means 32 assigns a reading and the accent assignment processing means 33 assigns an accent. The pause and intonation setting means 34 then sets pauses, intonation symbols representing the intonation of the whole sentence, and so on. The resulting phoneme and prosodic symbol string is sent to the input/output selection means 52.

Based on the output of the mode determination means 51, the input/output selection means 52 sends the phoneme and prosodic symbol string output from the text analysis unit 30 to the speech synthesis unit 60. The speech synthesis unit 60 synthesizes speech based on the received phoneme and prosodic symbol string and outputs it from the speaker 61.

Next, the operation when the phoneme and prosodic symbol string output mode is set by the mode setting unit 40 will be described.

In the phoneme and prosodic symbol string output mode, as in the text-to-speech conversion mode described above, text from the host computer 1 is input to the control unit 50 via the interface unit 21. The mode determination means 51 in the control unit 50 checks the mode state of the mode setting unit 40 and, on determining that it is the phoneme and prosodic symbol string output mode, passes the determination result to the input/output selection means 52. The input/output selection means 52 sends the text received from the interface unit 21 to the text analysis unit 30. The text analysis unit 30 then analyzes the input text in the same way as described above, generates a phoneme and prosodic symbol string, and sends it to the input/output selection means 52.

The input/output selection means 52 sends the phoneme and prosodic symbol string obtained by the analysis of the text analysis unit 30, expressed in the form of character codes, to the host computer 1 via the interface unit 21.

If the host computer 1 saves the phoneme and prosodic symbol string received from the interface unit 21 as a file, the string can easily be corrected with, for example, an ordinary editor (a program for editing and modifying source programs).
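On the host side, the save-and-correct step might look like the following Python sketch. It is an assumption-laden illustration: the file name sentence.sym, the helper names save_symbols and correct_reading, and the "reading/accent" notation are all invented, and a simple string replacement stands in for whatever interactive editing a user would actually do.

```python
import tempfile
from pathlib import Path


def save_symbols(path: Path, symbols: str) -> None:
    """Store the symbol string received from the device as a plain text file."""
    path.write_text(symbols + "\n", encoding="utf-8")


def correct_reading(path: Path, wrong: str, right: str) -> str:
    """Replace a misanalyzed token in the file and return the corrected string."""
    symbols = path.read_text(encoding="utf-8").strip()
    corrected = symbols.replace(wrong, right)
    path.write_text(corrected + "\n", encoding="utf-8")
    return corrected


with tempfile.TemporaryDirectory() as tmp:
    symbol_file = Path(tmp) / "sentence.sym"
    # Suppose the device read 今日 as "konnichi" where the user wanted "kyo:".
    save_symbols(symbol_file, "konnichi/1 wa/0 , ame/1 desu/1 .")
    fixed = correct_reading(symbol_file, "konnichi/1", "kyo:/1")
    print(fixed)
```

Because the symbol string is exchanged as character codes, no special tooling is needed; the corrected file contents can be sent back to the device as they are.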

Next, the operation when the phoneme and prosodic symbol string synthesis mode is set by the mode setting unit 40 will be described.

The phoneme and prosodic symbol string synthesis mode is a mode in which speech is synthesized from a phoneme and prosodic symbol string that has been modified, for example by the user with an editor or the like, so as to produce the desired synthesized speech. In this mode, the phoneme and prosodic symbol string output from the host computer 1 is input to the control unit 50 via the interface unit 21. The mode determination means 51 in the control unit 50 checks the mode state of the mode setting unit 40 and, on determining that it is the phoneme and prosodic symbol string synthesis mode, passes the determination result to the input/output selection means 52.

The input/output selection means 52 sends the phoneme and prosodic symbol string received from the interface unit 21 directly to the speech synthesis unit 60, without passing it through the text analysis unit 30, and the speech synthesis unit 60 generates the synthesized speech.

This embodiment has the following advantages.

In this embodiment, in addition to the function of taking in ordinary text (text data of mixed kanji and kana) and converting it into speech, the device has a function of outputting the phoneme and prosodic symbol string, which is the intermediate result of analysis by the text analysis unit 30, to the host computer 1 in the form of character codes, and a function of taking in a phoneme and prosodic symbol string from the host computer 1 and converting it directly into speech. These three functions can be selected by external designation through the mode setting unit 40 and the control unit 50. The device can therefore output synthesized speech based on its own text analysis, and the user can also adjust the phoneme and prosodic symbol string produced by that analysis to obtain the desired synthesized speech, with the reading, accent, intonation, breath pauses, and so on that the user has chosen. Moreover, since only the mode setting unit 40 and the control unit 50 need to be added, the device configuration remains simple, which allows a smaller circuit scale and lower cost.

The present invention is not limited to the illustrated embodiment, and various modifications are possible. For example, other processing means may be added to the text analysis unit 30, the control unit 50, and so on, and these units may be built from discrete circuits or implemented by a computer program or the like.

(Effects of the Invention) As described in detail above, the present invention provides a mode setting unit that sets the text-to-speech conversion mode, the phoneme and prosodic symbol string output mode, or the phoneme and prosodic symbol string synthesis mode, and a control unit that controls the input and output of the text data and the phoneme and prosodic symbol string based on the output of the mode setting unit.

Therefore, when the text-to-speech conversion mode is set with the mode setting unit, ordinary sentences of mixed kanji and kana can be input and converted into speech. When the phoneme and prosodic symbol string output mode is set, the phoneme and prosodic symbol string generated by the text analysis unit is output to the outside via the control unit in the form of character codes, so that the user can freely modify this text analysis result. When the phoneme and prosodic symbol string synthesis mode is then set and the modified phoneme and prosodic symbol string is input from the outside, the user obtains the desired synthesized speech with the reading, accent, intonation, breath pauses, and so on that the user has individually adjusted.

Moreover, since the present invention can be realized simply by adding the mode setting unit and the control unit, the device configuration remains simple, and a smaller circuit scale and lower cost can also be expected.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a text-to-speech conversion device according to an embodiment of the present invention, and FIG. 2 is a block diagram showing the configuration of a conventional text-to-speech conversion device.

1: host computer, 20: text-to-speech conversion device, 21: interface unit, 22: word dictionary memory, 30: text analysis unit, 40: mode setting unit, 50: control unit, 51: mode determination means, 52: input/output selection means, 60: speech synthesis unit, 61: speaker.

Claims

[Scope of Claims] A text-to-speech conversion device comprising a text analysis unit that analyzes text data of mixed kanji and kana and generates a phoneme and prosodic symbol string for the text data, and a speech synthesis unit that outputs synthesized speech corresponding to the text data based on the phoneme and prosodic symbol string, the device being characterized by further comprising: a mode setting unit that sets, from the outside, any one of a text-to-speech conversion mode in which the text data is input to the text analysis unit, a phoneme and prosodic symbol string output mode in which the phoneme and prosodic symbol string generated by the text analysis unit is output to the outside in the form of character codes, and a phoneme and prosodic symbol string synthesis mode in which a phoneme and prosodic symbol string from the outside is input directly to the speech synthesis unit; and a control unit that determines the mode set by the mode setting unit and, based on the determination result, controls the input and output of the text data and the phoneme and prosodic symbol string.