JPH07334507A

JPH07334507A - Human body action and voice generation system from text

Info

Publication number: JPH07334507A
Application number: JP12600294A
Authority: JP
Inventors: San Ro (山呂)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-06-08
Filing date: 1994-06-08
Publication date: 1995-12-22
Anticipated expiration: 2014-11-10
Also published as: JP2976811B2

Abstract

PURPOSE: To accept text written in a natural language, generate human body motion that matches the content of the text, and output the content of the text as voice. CONSTITUTION: The system comprises: a natural language analysis device 1 that extracts words from the text; a verb/motion pattern dictionary 21 describing the correspondence between words denoting motions and human body motion patterns; a motion pattern generation device 2 that searches the verb/motion pattern dictionary 21 and generates a human body motion pattern; a modifier/motion degree dictionary 31 describing the correspondence between modifiers and the degree of motion; a motion degree generation device 3 that searches the modifier/motion degree dictionary 31 and generates the motion degree of the human body motion pattern; a motion-voice synchronization device 4 that outputs motion time data, obtained by associating the position at which a verb appears in the text with the start time of motion generation, together with voice time data; a motion video generation device 5 that generates time-series video data of the human body motion and outputs it as video; and a voice synthesis device 6 that outputs voice by rule-based speech synthesis.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a system for generating human body motion and voice from text, and more particularly to a system that uses a computer to generate human body motion and voice, and to create animations of human movement and speech, from a text file written in a natural language.

[0002]

[Description of the Related Art] As a conventional technique, a method of generating human body motion using computer-based natural-language syntax analysis is disclosed in Japanese Patent Laid-Open No. 4-264972 (1992). This method uses natural-language syntax analysis to identify specific words in the text that denote motions, generates a motion program from them, and thereby generates human body motion.

[0003] As another conventional technique, Japanese Patent Laid-Open No. 2-234285 discloses a method of generating changes in human mouth shape using phonemes from a speech synthesizer. This method takes the phonemes of voice generated from a text file by a rule-based speech synthesizer and controls the mouth-shape feature parameters corresponding to each phoneme, thereby generating changes in the shape of a human mouth.

[0004]

[Problems to be Solved by the Invention] The conventional techniques described above have the following problems.

[0005] (1) The former technique can generate motion from text, but cannot output voice synchronized with that motion.

[0006] (2) Furthermore, the former technique generates a dedicated animation motion program that describes the motion. Because this program is written in a format similar to a computer program, it is not suited to editing by general users who are not programmers.

[0007] (3) The latter technique generates, from the voice output of the text, only changes in mouth shape, which are just one part of human motion. It is difficult for that technique to generate movement of other parts, such as the body, which is needed to produce more natural video of human motion.

[0008]

[Means for Solving the Problems] To solve problems (1) and (3) above, the system for generating human body motion and voice from text according to the present invention comprises: natural language analysis means for extracting words such as verbs and adverbs from text; a verb/motion pattern dictionary describing the correspondence between words that denote motions, such as verbs, and human body motion patterns; motion pattern generation means for searching the verb/motion pattern dictionary using a verb extracted by the natural language analysis means and generating a human body motion pattern; a modifier/motion degree dictionary describing the correspondence between modifiers, such as adverbs that modify verbs, and the degree of motion; motion degree generation means for searching the modifier/motion degree dictionary using a modifier extracted by the natural language analysis means and generating the motion degree of the human body motion pattern; motion-voice synchronization means for outputting, in order to synchronize the motion video output with the synthesized voice output, motion time data that associates the position at which a verb appears in the text with the start time of motion generation, and voice time data that includes the reading time of the text calculated from the length of the text; motion video generation means for receiving a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generating time-series video data of the human body motion, and outputting it to display means; and voice synthesis means for receiving a voice generation command including the text and the voice time data and outputting voice by rule-based speech synthesis.

[0009] To solve problem (2) above, the configuration described above further includes: motion/voice generation command-to-text conversion means for converting the motion generation command and the voice generation command output from the motion pattern generation means, the motion degree generation means, and the motion-voice synchronization means into a sentence motion description file, which is human-readable text; sentence motion description file storage means for storing the sentence motion description file produced by the conversion means; and text-to-motion/voice generation command conversion means for converting the sentence motion description file from the storage means back into the motion generation command and the voice generation command, and outputting them to the motion video generation means and the voice synthesis means. This makes it possible to modify the sentence motion description file with an external editor.

[0010]

[Operation] In the present invention, the input text is analyzed and divided into phrases, and words such as verbs and verb modifiers are extracted. The timing of motion generation is then determined by a rule under which the position where a verb appears serves as the signal to start a motion.

[0011] The key points are that the human body motion pattern is determined according to the type of verb, and that the degree of movement is determined using modifiers and similar words. As a result, when text is given, voice output and smooth human motion synchronized with that voice can be created automatically.

[0012]

[Embodiments] Next, the present invention will be described with reference to the drawings.

[0013] FIG. 1 is a block diagram showing an embodiment of the system for generating human body motion and voice from text according to the present invention, and shows the configuration of an embodiment of the first invention.

[0014] As shown in FIG. 1, the system of this embodiment consists of an A device 100 and a B device 200. The A device 100 comprises the following components:
- a natural language analysis device 1 that extracts words such as verbs and adverbs from text;
- a verb/motion pattern dictionary 21 describing the correspondence between words that denote motions, such as verbs, and human body motion patterns;
- a motion pattern generation device 2 that searches the verb/motion pattern dictionary 21 using a verb extracted by the natural language analysis device 1 and generates a human body motion pattern;
- a modifier/motion degree dictionary 31 describing the correspondence between modifiers, such as adverbs that modify verbs, and the degree of motion;
- a motion degree generation device 3 that searches the modifier/motion degree dictionary 31 using a modifier extracted by the natural language analysis device 1 and generates the motion degree of the human body motion pattern; and
- a motion-voice synchronization device 4 that, in order to synchronize the motion video output with the synthesized voice output, outputs two kinds of data: motion time data associating the position at which a verb appears in the text with the start time of motion generation, and voice time data including the reading time of the text calculated from the length of the text.

[0015] The B device 200 consists of a motion video generation device 5 and a voice synthesis device 6. The motion video generation device 5 receives a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generates time-series video data of the human body motion, and outputs it to a display device. The voice synthesis device 6 receives a voice generation command including the text and the voice time data and outputs voice by rule-based speech synthesis.

[0016] The natural language analysis device 1 extracts individual words from externally input text, using conventional syntax analysis techniques. Here, a syntax analysis method based on the CYK method is used, among others (Ryoichi Sugimura, Koji Akasaka, Yukihiro Kubo: "Logic-Based Morphological Analysis LAX," Proc. of the Logic Programming Conf., ICOT, pp. 213-222, 1988). Because the natural language analysis device 1 uses existing technology, its detailed description is omitted here.

[0017] Using the natural language analysis device 1, a sentence such as 「彼が気持ち良く笑った」 ("He laughed pleasantly") can be decomposed into individual independent words, 「彼・が・気持ち良く・笑った」 (he / が / pleasantly / laughed). The grammatical role of each word in the sentence is also obtained, as shown below.

[0018]
彼 (he): subject
が (ga): particle
気持ち良く (pleasantly): modifier
笑った (laughed): verb

The verb output from the natural language analysis device 1 is input to the motion pattern generation device 2, which generates a human body motion pattern. Specifically, the device consults the verb/motion pattern dictionary 21, which stores the correspondence between verbs and human body motion patterns. It then generates and outputs the motion pattern corresponding to the input verb.
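The patent does not specify the output format of the natural language analysis device 1. As a minimal sketch, assume the analyzer returns hypothetical (word, role) pairs like the list above. The analyzer output could then be routed as follows: the verb goes to device 2, the modifier goes to device 3, and the verb's character offset within the phrase goes to device 4. `Token` and `route_tokens` are illustrative names only.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    surface: str  # word as it appears in the text
    role: str     # grammatical role from the analyzer (hypothetical labels)


def route_tokens(tokens: list[Token]) -> dict:
    """Split hypothetical analyzer output into inputs for devices 2, 3 and 4."""
    verb: Optional[str] = None
    modifier: Optional[str] = None
    verb_offset: Optional[int] = None
    offset = 0  # running character position within the phrase
    for tok in tokens:
        if tok.role == "verb" and verb is None:
            verb, verb_offset = tok.surface, offset  # characters before the verb
        elif tok.role == "modifier" and modifier is None:
            modifier = tok.surface
        offset += len(tok.surface)
    return {"verb": verb, "modifier": modifier,
            "verb_offset": verb_offset, "phrase_length": offset}
```

For 「彼が気持ち良く笑った」, seven characters precede 笑った, and the phrase is ten characters long. These counts are the kind of character positions that the timing rule of the synchronization device works with.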

[0019] Next, the process of generating a human body motion pattern will be described in detail.

[0020] In the verb/motion pattern dictionary 21, multiple human body motion patterns are associated with each verb, and each pattern is assigned a priority. Therefore, when human body motion patterns are retrieved for a verb, they are output in descending order of priority.
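A minimal sketch of a priority-ordered dictionary follows. It assumes that each verb maps to (pattern name, priority) pairs and that a larger number means higher priority. The entries and pattern names are hypothetical and are not taken from the patent.

```python
# Hypothetical verb/motion pattern dictionary 21: verb -> [(pattern, priority)].
# Entries are deliberately stored unsorted; a larger priority wins.
VERB_MOTION_DICT: dict[str, list[tuple[str, int]]] = {
    "笑った": [("laugh_shoulders", 2), ("smile_nod", 3), ("head_tilt_left", 1)],
    "歩いた": [("walk_slow", 1), ("walk_forward", 2)],
}


def lookup_patterns(verb: str) -> list[str]:
    """Return the motion patterns for a verb in descending order of priority."""
    entries = VERB_MOTION_DICT.get(verb, [])
    return [name for name, _ in sorted(entries, key=lambda e: e[1], reverse=True)]
```

For example, `lookup_patterns("笑った")` yields `["smile_nod", "laugh_shoulders", "head_tilt_left"]`. An unknown verb yields an empty list.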

[0021] FIG. 5 is a block diagram showing details of the motion pattern generation device 2 of FIG. 1. In FIG. 5, the motion pattern constraint dictionary 26 stores information on whether an ordered pair of consecutive human body motion patterns (a preceding pattern and the one that follows it) is appropriate. For example, nodding the head back and forth while the head is tilted to the left is defined as clearly unnatural and inappropriate. The motion pattern constraint dictionary 26 holds knowledge of this kind, namely whether the currently generated human body motion pattern is appropriate given the preceding one.

[0022] FIG. 6 is a flowchart showing the flow of processing by which the motion pattern generation device 2 of FIG. 5 generates a human body motion pattern.

[0023] First, the verb obtained from the natural language analysis device 1 is input to the motion pattern search unit 22. At this time, the count generator 23 adds "1" to the current count value (initial value "0") and sends the count value to the motion pattern search unit 22.

[0024] The operation of the motion pattern generation device 2 is described below separately for the case where the count value = 1 and the case where the count value > 1.

When the count value = 1: The motion pattern search unit 22 first retrieves, based on the input verb, the human body motion pattern with the highest priority from the verb/motion pattern dictionary 21. It then outputs the retrieved pattern and, at the same time, stores that pattern together with the count value in the history storage unit 24.

When the count value > 1: The motion pattern search unit 22 retrieves, based on the input verb, the human body motion pattern with the highest priority, $MP_i$, from the verb/motion pattern dictionary 21. Next, the human body motion pattern stored for the count value immediately before the current one, $MP_{i-1}$, is retrieved from the history storage unit 24. The patterns $MP_i$ and $MP_{i-1}$ are then sent to the motion pattern matching unit 25.

[0025] The motion pattern matching unit 25 refers to the motion pattern constraint dictionary 26. It determines whether the current human body motion pattern $MP_i$ is appropriate as a successor to the preceding human body motion pattern $MP_{i-1}$.

[0026] If the pattern is judged appropriate, the current human body motion pattern $MP_i$ is output, and the current count value and $MP_i$ are stored in the history storage unit 24. If it is judged inappropriate, the motion pattern search unit 22 retrieves the human body motion pattern with the next-highest priority, $MP'_i$.

[0027] Next, this human body motion pattern $MP'_i$ is matched against the preceding human body motion pattern $MP_{i-1}$. The search and matching are repeated until a pattern is judged appropriate. If every retrieved human body motion pattern is judged inappropriate, the human body motion pattern with the highest priority is output.
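The search-and-match loop of paragraphs [0023] to [0027] can be sketched as below. Several representations are assumptions, not taken from the patent: the verb dictionary is a list of patterns already sorted by priority, the constraint dictionary 26 is a set of forbidden (previous, current) pairs, and the history is a plain dict keyed by count. The patent does not say whether the fallback pattern is stored in the history; this sketch stores it.

```python
class MotionPatternGenerator:
    """Sketch of device 2 (FIG. 5/6): priority search plus constraint check."""

    def __init__(self, verb_dict: dict[str, list[str]],
                 forbidden_pairs: set[tuple[str, str]]):
        # verb_dict: verb -> patterns, highest priority first (assumed format)
        # forbidden_pairs: (previous, current) pairs judged inappropriate
        self.verb_dict = verb_dict
        self.forbidden = forbidden_pairs
        self.count = 0                     # count generator 23, initial value 0
        self.history: dict[int, str] = {}  # history storage unit 24

    def generate(self, verb: str) -> str:
        self.count += 1
        candidates = self.verb_dict[verb]  # assumes the verb is in the dictionary
        if self.count == 1:
            chosen = candidates[0]         # highest priority, no check needed
        else:
            previous = self.history[self.count - 1]  # MP_{i-1}
            # Try MP_i, MP'_i, ... in priority order; if all are rejected,
            # fall back to the highest-priority pattern.
            chosen = next(
                (mp for mp in candidates if (previous, mp) not in self.forbidden),
                candidates[0],
            )
        self.history[self.count] = chosen
        return chosen
```

Using the [0021] example, suppose a hypothetical "tilt" verb yields `head_tilt_left`. A following "nod" verb whose top pattern `nod_front_back` is forbidden after the tilt then falls through to its next candidate.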

[0028] In addition, the modifier output from the natural language analysis device 1 is input to the motion degree generation device 3, which generates motion degree data describing the degree of the human body motion pattern. Specifically, the device searches the modifier/motion degree dictionary 31, which stores the correspondence between modifiers and the motion degrees of human body motion patterns. It then generates and outputs the motion degree corresponding to the input modifier. Numerical data can be used to express the motion degree.
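The source says only that numerical data can express the motion degree. The sketch below therefore assumes a hypothetical scale on which 1.0 is neutral, with made-up dictionary entries. It also assumes that a missing or unknown modifier falls back to the neutral degree; that rule is not stated in the patent.

```python
from typing import Optional

# Hypothetical modifier/motion degree dictionary 31 (1.0 = neutral amplitude).
MODIFIER_DEGREE: dict[str, float] = {"気持ち良く": 1.5, "少し": 0.5, "大きく": 2.0}
DEFAULT_DEGREE = 1.0  # assumed value when no known modifier is present


def motion_degree(modifier: Optional[str]) -> float:
    """Look up the numeric motion degree for a modifier."""
    if modifier is None:
        return DEFAULT_DEGREE
    return MODIFIER_DEGREE.get(modifier, DEFAULT_DEGREE)
```

A downstream motion generator could, for example, multiply a joint-angle amplitude by this value. How the degree is actually applied is not described in the source.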

[0029] The motion-voice synchronization device 4 aligns the timing of motion generation with the timing of synthesized voice output, based on the words and phrases obtained from the natural language analysis device 1. In the present invention, a segment of text delimited by punctuation marks, as obtained from the natural language analysis device 1, is treated as the basic unit of motion generation and voice synthesis. The position in this segment at which a verb denoting a motion appears is taken as the start position of motion generation. The specific processing is described below.
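A minimal sketch of forming the punctuation-delimited units follows. The set of delimiter characters is an assumption. Locating the verb by a plain substring search is also a simplification: in the patent the verb position comes from the analyzer.

```python
import re

# Assumed unit boundaries: common Japanese and ASCII punctuation marks.
_DELIMS = re.compile(r"[。、，．,.!?！？]")


def split_units(text: str) -> list[str]:
    """Split text into punctuation-delimited units, dropping empty pieces."""
    return [seg for seg in _DELIMS.split(text) if seg.strip()]


def verb_position(unit: str, verb: str) -> int:
    """Number of characters before the verb in the unit (-1 if not found)."""
    return unit.find(verb)
```

For 「彼が気持ち良く笑った。そして歩いた。」 this yields two units, and 笑った starts after seven characters of the first unit.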

[0030] FIG. 4 shows an example of the motion and voice time data output by the motion-voice synchronization device 4. Referring to FIG. 4, the total time needed to output one phrase is first calculated from the voice output rate of the voice synthesis device 6. For example, if the voice synthesis device 6 takes $t$ seconds to output one character, the output time of a phrase of $n$ characters is

$$T_S = n \times t \ \text{seconds}.$$

Let the voice start time corresponding to the first word of the phrase be $t_{S0}$ seconds (a relative time of 0 seconds). The voice end time is then

$$t_{S0} + T_S \ \text{seconds}.$$

If the verb appears at the $m$-th character of the phrase, the motion generation start time $t_{m0}$ and the motion duration $T_m$ are

$$t_{m0} = t_{S0} + m \times t \ \text{seconds}, \qquad T_m = (n - m) \times t \ \text{seconds}.$$

FIG. 4 shows the motion time data and voice time data calculated in this way.
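The [0030] formulas translate directly into code. This sketch assumes a constant per-character time `t`, as the source does. The `TimeData` container and its field names are illustrative choices, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class TimeData:
    voice_start: float    # t_S0
    voice_end: float      # t_S0 + T_S
    motion_start: float   # t_m0
    motion_length: float  # T_m


def compute_time_data(n: int, m: int, t: float, t_s0: float = 0.0) -> TimeData:
    """Phrase of n characters, verb at character m, t seconds per character."""
    if not 0 <= m <= n:
        raise ValueError("verb position m must lie within the phrase")
    T_s = n * t  # total voice output time of the phrase
    return TimeData(
        voice_start=t_s0,
        voice_end=t_s0 + T_s,
        motion_start=t_s0 + m * t,
        motion_length=(n - m) * t,
    )
```

With n = 10, m = 7 and t = 0.25 s, the voice runs from 0 s to 2.5 s, and the motion runs from 1.75 s for 0.75 s. The motion therefore ends exactly when the voice does, because $t_{m0} + T_m = t_{S0} + n \times t$.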

[0031] Next, the motion video generation device 5 receives a motion generation command made up of three inputs: the human body motion pattern from the motion pattern generation device 2, the motion degree from the motion degree generation device 3, and the motion time data from the motion-voice synchronization device 4. It outputs time-series images of the human body motion to a display device, a VTR, or the like. The motion video generation device 5 uses a method that generates human body motion patterns by combining multiple motion modules (for example, San Ro (呂山), 吉坂主旬, Hitoshi Miyai: 「人体動作生成システムの提案座化」 (a proposal for a human body motion generation system), Proceedings of the 47th National Convention of the Information Processing Society of Japan (IPSJ), Vol. 2, pp. 345-346, 1993).

[0032] For the voice synthesis device 6, an existing rule-based speech synthesis technique can be used (Seiichi Yamamoto, Yoshio Higuchi, Toru Shimizu: "Prototype of a Rule-Based Speech Synthesizer with a Text Editing Function," IEICE Technical Report SP87-137, March 1988). The device receives a voice output command including the phrase from the natural language analysis device 1 and the time data from the motion-voice synchronization device 4, and synthesizes and outputs voice.

[0033] FIG. 2 is a block diagram showing the configuration of an embodiment of the second invention. As shown in FIG. 2, this embodiment comprises the following components:
- a motion/voice generation command-to-text conversion device 7 that converts the motion generation command and the voice generation command output from the A device 100 into a sentence motion description file, which is human-readable text;
- a sentence motion description file storage device 8 that stores the sentence motion description file produced by the conversion device 7; and
- a text-to-motion/voice generation command conversion device 9 that converts the sentence motion description file from the storage device 8 into a motion generation command and a voice generation command and outputs them to the B device 200.

[0034] The A device 100 includes the motion pattern generation device 2, the motion degree generation device 3, and the motion-voice synchronization device 4. The B device 200 includes the motion video generation device 5 and the voice synthesis device 6. Because the A device 100 and the B device 200 have already been described in the embodiment of the first invention, their description is omitted to avoid duplication, and only the other parts of FIG. 2 are described below.

[0035] In this embodiment, the motion/voice generation command-to-text conversion device 7 converts the motion generation command and the voice generation command output from the A device 100 into a text file that conforms to the format of the sentence motion description file.

[0036] FIG. 3 shows an example of this format. In FIG. 3, an underline mark is placed under each verb in the text written to the text file, at the same position as the verb. Beneath the underline mark, three items are written: the name of the human body motion pattern generated by the A device 100, the motion degree parameter $p$ of that pattern, and the motion duration $t$.
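FIG. 3 itself is not reproduced here. The sketch below therefore uses a hypothetical three-line layout for one unit: the text, an underline of `_` characters under the verb, and an annotation line giving the pattern name, `p` and `t`. Alignment is assumed to count characters rather than display width.

```python
def to_description(text: str, verb_offset: int, verb_len: int,
                   pattern: str, p: float, t: float) -> str:
    """Render one unit in a hypothetical FIG. 3-style layout."""
    pad = " " * verb_offset  # align the marks with the verb (character-based)
    return "\n".join([
        text,
        pad + "_" * verb_len,          # underline mark under the verb
        f"{pad}{pattern} p={p} t={t}",  # pattern name, degree p, duration t
    ])
```

For instance, `to_description("he laughed", 3, 7, "laugh", 1.5, 0.75)` places seven underscores and the annotation under "laughed". A plain text editor can then change `p` or `t` directly.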

[0037] Next, the sentence motion description file produced by the conversion device 7 is stored in the sentence motion description file storage device 8, which consists of an external storage device such as a magnetic disk device. Because the stored sentence motion description file is a readable text file, motions can easily be corrected using a commercially available text editor.

[0038] The text-to-motion/voice generation command conversion device 9 performs the reverse of the conversion performed by device 7. It reads the human body motion pattern, the motion degree of the pattern, and the motion time data written in the sentence motion description file. It then converts them into a motion generation command to be input to the motion video generation device 5 in the B device 200. Next, it reads the text written in the sentence motion description file. Using the same method for generating voice time data as the motion-voice synchronization device 4, it generates the text for voice output and the voice time data, and inputs them to the voice synthesis device 6 for voice output.
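The reverse conversion can be sketched against the same hypothetical three-line layout (text, underline, annotation), which is assumed and not taken from FIG. 3. The motion duration is read from the file as written. The voice duration is recomputed as $T_S = n \times t$ with an assumed per-character time, and $t_{S0} = 0$ is also assumed.

```python
import re

# Annotation line of the hypothetical layout: "<pattern> p=<degree> t=<duration>"
_ANNOT = re.compile(r"^\s*(\S+) p=([0-9.]+) t=([0-9.]+)$")


def from_description(block: str, char_time: float) -> tuple[dict, dict]:
    """Parse one unit back into motion and voice command dicts (t_S0 = 0)."""
    text, underline, annotation = block.rstrip("\n").split("\n")
    match = _ANNOT.match(annotation)
    if match is None:
        raise ValueError("annotation line not in the expected format")
    pattern, p, t = match.groups()
    offset = len(underline) - len(underline.lstrip(" "))  # characters before verb
    motion_cmd = {
        "pattern": pattern,
        "degree": float(p),
        "start": offset * char_time,  # t_m0 = m x t
        "length": float(t),           # T_m as edited in the file
    }
    voice_cmd = {"text": text, "duration": len(text) * char_time}  # T_S = n x t
    return motion_cmd, voice_cmd
```

Because the motion values come from the edited file rather than being recomputed, any change a user makes to `p` or `t` in a text editor carries through to the regenerated commands.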

[0039]

[Effects of the Invention] As described above, the system for generating human body motion and voice from text according to the present invention outputs input natural-language text as synthesized voice and automatically generates human body motion synchronized with that voice.

[0040] In addition, a motion-voice description file can be created in a form close to the original text. By editing that file with an ordinary text editor, the user can adjust the human body motion that is finally generated.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the first invention.

FIG. 2 is a block diagram showing the configuration of an embodiment of the second invention.

FIG. 3 is a diagram showing an example of the format of the sentence motion description file.

FIG. 4 is a diagram showing an example of the motion and voice time data output by the motion-voice synchronization device of this embodiment.

FIG. 5 is a block diagram showing details of the motion pattern generation device of FIG. 1.

FIG. 6 is a flowchart showing the flow of processing by which the motion pattern generation device of FIG. 5 generates a human body motion pattern.

[Explanation of symbols]

1 Natural language analysis device
2 Motion pattern generation device
3 Motion degree generation device
4 Motion-voice synchronization device
5 Motion video generation device
6 Voice synthesis device
7 Motion/voice generation command-to-text conversion device
8 Sentence motion description file storage device
9 Text-to-motion/voice generation command conversion device
21 Verb/motion pattern dictionary
22 Motion pattern search unit
23 Count generator
24 History storage unit
25 Motion pattern matching unit
26 Motion pattern constraint dictionary
31 Modifier/motion degree dictionary
100 A device
200 B device

Claims

[Claims]

1. A system for generating human body motion and voice from text, comprising: natural language analysis means for extracting words such as verbs and adverbs from text; a verb/motion pattern dictionary describing the correspondence between words that denote motions, such as verbs, and human body motion patterns; motion pattern generation means for searching the verb/motion pattern dictionary using a verb extracted by the natural language analysis means and generating a human body motion pattern; a modifier/motion degree dictionary describing the correspondence between modifiers, such as adverbs that modify verbs, and the degree of motion; motion degree generation means for searching the modifier/motion degree dictionary using a modifier extracted by the natural language analysis means and generating the motion degree of the human body motion pattern; motion-voice synchronization means for outputting, in order to synchronize motion video output with synthesized voice output, motion time data associating the position at which a verb appears in the text with the start time of motion generation, and voice time data including the reading time of the text calculated from the length of the text; motion video generation means for receiving a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generating time-series video data of the human body motion, and outputting it to display means; and voice synthesis means for receiving a voice generation command including the text and the voice time data and outputting voice by rule-based speech synthesis.

2. The system for generating human body motion and voice from text according to claim 1, further comprising: motion/voice generation command-to-text conversion means for converting the motion generation command and the voice generation command, output from the motion pattern generation means, the motion degree generation means, and the motion-voice synchronization means, into a sentence motion description file; sentence motion description file storage means for storing the sentence motion description file produced by the motion/voice generation command-to-text conversion means; and text-to-motion/voice generation command conversion means for converting the sentence motion description file from the sentence motion description file storage means into the motion generation command and the voice generation command and outputting them to the motion video generation means and the voice synthesis means; wherein the sentence motion description file can be modified with an external editor.

3. The system for generating human body motion and voice from text according to claim 2, wherein the sentence motion description file is human-readable text.