JPH0216681A

JPH0216681A - Winking signal generating system for face animation picture synthesizing

Info

Publication number: JPH0216681A
Application number: JP63168482A
Authority: JP
Inventors: Eiji Morimatsu; 映史森松; Toshitaka Tsuda; 俊隆津田; Kiichi Matsuda; 松田　喜一
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-07-05
Filing date: 1988-07-05
Publication date: 1990-01-19

Abstract

(57) [Abstract] This publication contains application data from before electronic filing was introduced, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Table of Contents]
Overview
Industrial Field of Application
Prior Art (Fig. 19)
Problem to Be Solved by the Invention
Means for Solving the Problem (Fig. 1)
Operation (Fig. 1)
Embodiments
Description of the First Embodiment (Figs. 2 to 12)
Description of the Second Embodiment (Fig. 13)
Description of the Third Embodiment (Figs. 14 to 18)
Effects of the Invention

[Overview] The invention concerns a system for generating a blink signal for facial moving-image synthesis in an apparatus that, using a small amount of initialization data transmitted at initialization, synthesizes and displays a moving image of a face on the receiving side according to the audio information transmitted during communication. Its object is to let the blinking behavior differ between when the person is speaking and when the person is not, so that a more natural moving image can be synthesized. At initialization, the mean and standard deviation of the blink time interval while speaking and while not speaking are transmitted as initialization data. During communication, a speech pulse train signal and a non-speech pulse train signal are each generated at time intervals that follow normal distributions determined by the means and standard deviations of the blink intervals while speaking and while not speaking. On the receiving side, according to the detection signal from a voice input detector, the speech pulse train signal is output as the blink signal while the person is speaking, and the non-speech pulse train signal is output as the blink signal while the person is not speaking.

[Industrial Field of Application] The present invention relates to a system for generating a blink signal for facial moving-image synthesis in an apparatus that, using a small amount of initialization data transmitted at initialization, synthesizes and displays a moving image of a face on the receiving side according to the audio information transmitted during communication.

For videophones, video conferencing, and similar systems, the eventual goal is to adopt a transmission method that uses public telephone lines. The image information obtained therefore needs to be compressed as far as possible.

[Prior Art] The image transmitted in a videophone or similar system is normally the original moving image of a person. Conventionally, as shown in Fig. 19, this moving-image information is transmitted independently of the audio information. That is, on the transmitting side a TV camera 61 generates the input image as an analog image signal, and an image encoding device 62 converts this signal into a digital signal, encodes and compresses it, and sends it to the receiving side. On the receiving side, an image decoding device 63 decodes the received image back into the original signal, which is shown on a display 64 as the output image.

The input voice is picked up as audio information by a microphone 65 on the transmitting side and compressed by an audio encoding device 66 using speech-specific coding. On the receiving side, it is decoded by an audio decoding device 67 and output from a speaker 68 as the output voice.

However, because moving images carry a large amount of information, this conventional moving-image transmission method cannot use low-bit-rate communication lines. The cost is therefore high, and the method is far from applicable to videophones and similar systems that use public telephone lines.

One conceivable approach is therefore to send, for example, still-image information of the face from the transmitting side in advance, and to have the receiving side reproduce the image by deforming only the mouth portion so that it matches the audio information sent from the transmitting side.

With this approach, however, the eyelids, which play an important part in facial expression, do not move at all, which makes the image look more unnatural.

An image transmission method has therefore also been proposed in which, in addition to deforming the mouth portion, blinks are generated at random, so that the original moving-image information can be compressed further while keeping the facial expression from looking unnatural.

[Problem to Be Solved by the Invention] However, with this conventional means, which makes the face blink at random in addition to deforming the mouth portion, blinks occur entirely at random. The blinking behavior is therefore the same whether or not the person is speaking, some unnaturalness remains, and an improvement is desired.

The present invention was devised in view of this situation. Its object is to provide a blink signal generation system for facial moving-image synthesis that lets the blinking behavior differ between when the person is speaking and when the person is not, thereby enabling a more natural moving image to be synthesized.

[Means for Solving the Problem] Fig. 1 is a block diagram showing the principle of the present invention.

In Fig. 1, reference numeral 28 denotes a blink signal generator for facial moving-image synthesis. The blink signal generator 28 comprises a standard normal random number table 281, first and second random number converters 282 and 283, first and second pulse generators 284 and 285, a voice input detector 286, and a pulse train selector 287.

The standard normal random number table 281 is a table that stores the values of a random number sequence U_i (i = 1, 2, 3, ..., n; n is a sufficiently large integer) that follows a normal distribution with mean 0 and standard deviation 1.

The first random number converter 282 receives, at initialization, the mean m_1 and standard deviation σ_1 of the blink time interval while speaking. When communication starts, it reads a random value U_i from address i of the standard normal random number table 281 and applies the conversion of equation (1) to obtain a random value X that follows a normal distribution with mean m_1 and standard deviation σ_1. Likewise, the second random number converter 283 receives, at initialization, the mean m_2 and standard deviation σ_2 of the blink time interval while not speaking. When communication starts, it reads a random value U_i from address i of the standard normal random number table 281 and applies the conversion of equation (2) to obtain a random value X that follows a normal distribution with mean m_2 and standard deviation σ_2.

$$X = U_i \times \sigma_1 + m_1 \quad (X > 0) \tag{1}$$

$$X = U_i \times \sigma_2 + m_2 \quad (X > 0) \tag{2}$$

When a random value X is input from the first random number converter 282, the first pulse generator 284 counts clock pulses and generates a pulse when the count equals X. It then issues a control signal to the first random number converter 282 to obtain the next random value X and repeats the same process, thereby outputting the pulse train signal P1. Likewise, when a random value X is input from the second random number converter 283, the second pulse generator 285 counts clock pulses and generates a pulse when the count equals X. It then issues a control signal to the second random number converter 283 to obtain the next random value X and repeats the same process, thereby outputting the pulse train signal P2.

The voice input detector 286 samples the energy of the transmitted voice at fixed time intervals. It turns on when the energy is greater than a preset threshold and off when it is smaller, thereby detecting whether or not the person is speaking.
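The threshold test described for the voice input detector can be sketched as follows. This is only an illustration: the patent does not say how the energy is measured, so the mean squared amplitude over a fixed-length frame, the function name `detect_voice`, and all numeric values are assumptions.

```python
from typing import List, Sequence


def detect_voice(samples: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Return one on/off flag per frame: True when the frame energy exceeds the threshold."""
    flags = []
    # Walk over the signal in non-overlapping frames of frame_len samples.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Assumed energy measure: mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / frame_len
        flags.append(energy > threshold)
    return flags


# Hypothetical signal: one near-silent frame followed by one loud frame.
signal = [0.0, 0.01, -0.01, 0.0, 0.5, -0.6, 0.4, -0.5]
print(detect_voice(signal, frame_len=4, threshold=0.01))
```

A real detector would probably add hysteresis or a hold time so that short pauses between words do not toggle the output, but the text describes only the plain comparison.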

The pulse train selector 287 switches so that it outputs the pulse train P1 from the first pulse generator 284 while the voice input detector 286 detects that the person is speaking, and the pulse train P2 from the second pulse generator 285 while the voice input detector 286 detects that the person is not speaking.

[Operation] With this configuration, at initialization the mean m_1 and standard deviation σ_1 of the blink time interval while speaking are transmitted to the first random number converter 282 as initialization data, and the mean m_2 and standard deviation σ_2 of the blink time interval while not speaking are transmitted to the second random number converter 283.

During communication, the first and second pulse generators 284 and 285 generate the speech pulse train signal P1 and the non-speech pulse train signal P2, respectively, at time intervals that follow normal distributions determined by the means m_1, m_2 and standard deviations σ_1, σ_2 of the blink intervals while speaking and while not speaking.

On the receiving side, the pulse train selector 287 then switches according to the detection signal from the voice input detector 286, so that the speech pulse train signal P1 is output as the blink signal while the person is speaking and the non-speech pulse train signal P2 is output as the blink signal while the person is not speaking.

As a result, the blinking behavior can differ between when the person is speaking and when the person is not.

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

(a) Description of the First Embodiment. Fig. 2 is a block diagram showing the first embodiment of the present invention. The first embodiment has a transmitting unit 10 and a receiving unit 20. The transmitting unit 10 includes an image processing unit 11, which processes the face image input, and a voice encoding unit 12, which encodes the voice input.

The receiving unit 20 has a background image memory 19, a voice decoding unit 21, a speech recognition unit 22, a codebook 23A, a mouth shape model deformation unit (mouth shape model image storage means) 24A, a control point coordinate memory (table) 23B, a shadow model deformation unit (eyelid shape model image storage means) 24B, a synthesis unit 25, an interpolation point calculation unit 27, a blink signal generator 28 for facial moving-image synthesis, and a coordinate table control unit 29.

The background image memory 19 stores one frame of still-image data of the face, sent from the transmitting side at initialization.

The voice decoding unit 21 decodes the voice code encoded by the transmitting unit 10, and the speech recognition unit 22 performs speech recognition on the voice signal output from the voice decoding unit 21. The codebook 23A successively selects a set of mouth shape parameter values for each phoneme code (a code for a vowel, consonant, or other basic unit of speech) output one after another by the speech recognition unit 22. The mouth shape model deformation unit (mouth shape model image storage means) 24A deforms the mouth shape model image according to the set of mouth shape parameter values successively selected by the codebook 23A.

As shown in Fig. 4, the codebook 23A stores in advance, as personal information, a table that quantifies the shape of the mouth when a specific speaker utters each phoneme I, II, ..., m as parameters I (for example, the width of the mouth), II (for example, the thickness of the lips), ..., n (for example, the height of the mouth). Examples of the mouth images for phonemes I, II, and III are shown schematically in Figs. 6(a), (b), and (c).
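The codebook lookup described above amounts to a table keyed by phoneme. The sketch below is illustrative only: the phoneme labels, the three parameter names, the numeric values, and the fallback to a closed-mouth entry for unknown phonemes are all assumptions, not taken from Fig. 4.

```python
from typing import Dict, NamedTuple


class MouthParams(NamedTuple):
    width: float          # assumed to correspond to parameter I (mouth width)
    lip_thickness: float  # assumed to correspond to parameter II (lip thickness)
    height: float         # assumed to correspond to parameter n (mouth height)


# Hypothetical speaker-specific codebook; every label and value is made up.
CODEBOOK: Dict[str, MouthParams] = {
    "a": MouthParams(width=40.0, lip_thickness=8.0, height=25.0),
    "i": MouthParams(width=48.0, lip_thickness=6.0, height=8.0),
    "u": MouthParams(width=25.0, lip_thickness=10.0, height=10.0),
    "sil": MouthParams(width=38.0, lip_thickness=8.0, height=0.0),
}


def select_mouth_params(phoneme: str) -> MouthParams:
    """Return the parameter set for a phoneme, or the closed mouth if it is unknown."""
    return CODEBOOK.get(phoneme, CODEBOOK["sil"])


# Phoneme codes arrive one after another from the recognizer.
for code in ["a", "i", "sil"]:
    print(code, select_mouth_params(code))
```

Because the table is specific to one speaker, it would be prepared once and reused for every call with that speaker.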

The mouth shape model deformation unit 24A receives in advance, as personal information, one screen (one frame) of mouth image data for that specific speaker via the background image memory 19, maps it onto a patch model that forms the framework of the geometric shape of the mouth, and stores the result as the mouth shape model image. Even when one screen of the mouth-portion image is first sent from the transmitting unit 10 in this way, the codebook 23A must still be prepared in advance.

The interpolation point calculation unit 27 receives, at initialization, the coordinate data of all vertices P1 to P8 of the eyelid shape model (see Fig. 7) corresponding to the still-image data. It calculates by linear interpolation the coordinates of the control points P2, P3, and P4 at each frame from the start to the end of a blink, and sends the data to the control point coordinate memory 23B.

That is, as shown in Fig. 7, the eyelid shape model consists of eight vertices P1 to P8 (each with two-dimensional coordinates x_i, y_i) and six triangular patches T1 to T6 formed by connecting these vertices. To synthesize the blinking motion, P2, P3, and P4 are treated as control points (points whose x and y coordinates change), and the other five points are fixed.

At initialization, the interpolation point calculation unit 27 is given, in addition to the coordinates of the eight vertices P1 to P8, the coordinates of three points P2', P3', and P4' that indicate the lowest positions of P2, P3, and P4. Using the number of frames N per blink, which is given in advance, it linearly interpolates each of the paths P2 → P2' → P2, P3 → P3' → P3, and P4 → P4' → P4.
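The open, closed, open interpolation of one control point can be sketched as below. The patent states only that each path is linearly interpolated over N frames; how the frames are divided between closing and opening is not specified, so the symmetric split used here, the function name `blink_trajectory`, and the coordinates are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def blink_trajectory(p_open: Point, p_closed: Point, n_frames: int) -> List[Point]:
    """Positions of one control point over n_frames: open -> closed -> open."""
    if n_frames < 3:
        raise ValueError("need at least 3 frames to close and reopen the eyelid")
    half = (n_frames - 1) / 2.0
    path = []
    for k in range(n_frames):
        # t rises linearly from 0 to 1 in the first half and falls back to 0.
        t = 1.0 - abs(k - half) / half
        x = p_open[0] + t * (p_closed[0] - p_open[0])
        y = p_open[1] + t * (p_closed[1] - p_open[1])
        path.append((x, y))
    return path


# Hypothetical control point P2 at (10, 20) with lowest position P2' at (10, 30).
for frame, point in enumerate(blink_trajectory((10.0, 20.0), (10.0, 30.0), 5)):
    print(frame, point)
```

With an even number of frames this split never lands exactly on the lowest point; an implementation that must reach P' exactly would need an odd N or an asymmetric split.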

The control point coordinate memory 23B stores the blinking motion of the eyelid on the basis of the eyelid parameters of the shadow model image. Specifically, the coordinates of the three control points P2, P3, and P4 at each frame from the start to the end of a blink, as interpolated by the interpolation point calculation unit 27, are kept in table form in the storage area of the control point coordinate memory 23B. Fig. 5 shows an example of the structure of this control point coordinate table.

The blink signal generator 28 generates the blink signal (a pulse signal). As shown in Fig. 3, it comprises a random number generator 280, the standard normal random number table 281, the first and second random number converters 282 and 283, the first and second pulse generators 284 and 285, the voice input detector 286, and the pulse train selector 287.

The random number generator 280 randomly generates pointer values i_1 and i_2 (1 ≤ i_1, i_2 ≤ n), which set the starting positions in the random number table, in response to a signal input at initialization.

The standard normal random number table 281 is a table (memory) that stores the values of a random number sequence U_i (i = 1, 2, 3, ..., n; n is a sufficiently large integer) that follows a normal distribution with mean 0 and standard deviation 1, as shown in Fig. 9.

The first random number converter 282 receives, at initialization, the mean m_1 and standard deviation σ_1 of the blink time interval while speaking, together with the pointer value i_1 from the random number generator 280 that sets the starting position. When communication starts, it reads from address i_1 of the standard normal random number table 281 the random value U_i corresponding to that address and applies the conversion of equation (1), repeated below, to obtain a random value X that follows a normal distribution with mean m_1 and standard deviation σ_1, as shown in Fig. 10.

$$X = U_i \times \sigma_1 + m_1 \quad (X > 0) \tag{1}$$

The first random number converter 282 then waits for a control signal from the first pulse generator 284, described later, increments i_1 by 1, and repeats the same process.

This process is shown in Fig. 11. In step a1, the initial values m_1, σ_1, and i_1 are set. In step a2, the value U_i corresponding to i_1 is read from the standard normal random number table 281. In step a3, the random value X = U_i × σ_1 + m_1 is calculated. In step a4, it is determined whether X > 0. If YES, the random value X is supplied to the first pulse generator 284 in step a5, and in step a6 it is determined whether a control signal has been input from the first pulse generator 284. When the control signal has been input, i_1 is set to i_1 + 1 in step a7, and the next step a8 determines whether i_1 ≤ n. This process continues until i_1 = n + 1, at which point step a9 resets i_1 to 1 and the same process is repeated.

If the random value X is negative in step a4, steps a5 and a6 are skipped.

In step a6, the process does not move on to the next step until the control signal is input.
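The flow of steps a1 to a9 can be sketched as a Python generator. This is an interpretation rather than the patent's implementation: resuming the generator with `next()` stands in for the control signal awaited in step a6, and the name `interval_stream` and the tiny three-entry table are hypothetical.

```python
import itertools
from typing import Iterator, Sequence


def interval_stream(table: Sequence[float], mean: float, sigma: float, start: int) -> Iterator[float]:
    """Yield positive blink intervals X = U_i * sigma + mean, reading the table cyclically."""
    n = len(table)
    if not 1 <= start <= n:
        raise ValueError("start pointer must satisfy 1 <= start <= n")
    i = start                              # step a1: initial pointer (1-based)
    while True:
        x = table[i - 1] * sigma + mean    # steps a2 and a3: read U_i and convert
        if x > 0:                          # step a4: discard non-positive intervals
            yield x                        # steps a5 and a6: hand over X, wait to be resumed
        i += 1                             # step a7
        if i > n:                          # steps a8 and a9: wrap around to address 1
            i = 1


# Hypothetical table; the middle entry gives a negative X and is skipped.
demo_table = [0.0, -10.0, 1.0]
stream = interval_stream(demo_table, mean=3.0, sigma=1.0, start=1)
print(list(itertools.islice(stream, 4)))
```

Note that if every table entry produced a non-positive X, this loop would never yield, so a real table and parameter set would need to rule that out.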

Likewise, the second random number converter 283 receives, at initialization, the mean m_2 and standard deviation σ_2 of the blink time interval while not speaking, together with the pointer value i_2 from the random number generator 280 that sets the starting position. When communication starts, it reads from address i_2 of the standard normal random number table 281 the random value U_i corresponding to that address and applies the conversion of equation (2), repeated below, to obtain a random value X that follows a normal distribution with mean m_2 and standard deviation σ_2, much like the one shown in Fig. 10.

$$X = U_i \times \sigma_2 + m_2 \quad (X > 0) \tag{2}$$

The second random number converter 283 likewise waits for a control signal from the second pulse generator 285, described later, increments i_2 by 1, and repeats the same process.

The processing flow in the second random number converter 283 is the same as that shown in Fig. 11.

The first pulse generator 284 comprises a counter 284a that counts clock pulses, a comparator 284b that compares the count value from the counter 284a with the random value X from the first random number converter 282, and a pulse generator 284c that outputs a pulse when the comparator 284b issues a match pulse. When a random value X is input from the first random number converter 282, the first pulse generator 284 counts clock pulses and generates a pulse when the count equals X. It then issues a control signal to the first random number converter 282 to obtain the next random value X and repeats the same process, thereby outputting a pulse train signal P1 such as that shown in Fig. 12(a).

Likewise, the second pulse generator 285 comprises a counter 285a that counts clock pulses, a comparator 285b that compares the count value from the counter 285a with the random value X from the second random number converter 283, and a pulse generator 285c that outputs a pulse when the comparator 285b issues a match pulse. When a random value X is input from the second random number converter 283, the second pulse generator 285 counts clock pulses and generates a pulse when the count equals X. It then issues a control signal to the second random number converter 283 to obtain the next random value X and repeats the same process, thereby outputting a pulse train signal P2 such as that shown in Fig. 12(b).
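The counter and comparator behavior can be simulated tick by tick, as in the sketch below. Two points are assumptions rather than statements from the patent: the random value X is taken to be already expressed as a whole number of clock ticks, and the counter is reset to zero after each match. The function name `pulse_train` is hypothetical.

```python
from typing import Iterable, List


def pulse_train(intervals: Iterable[int], n_ticks: int) -> List[int]:
    """Return one value per clock tick: 1 where a pulse is emitted, 0 elsewhere."""
    source = iter(intervals)
    target = next(source, None)          # first random value X (None if there is none)
    count = 0
    out = []
    for _ in range(n_ticks):
        count += 1                       # counter: count one clock tick
        if count == target:              # comparator: count equals X
            out.append(1)                # pulse generator: emit a pulse
            target = next(source, None)  # control signal: request the next X
            count = 0                    # assumed: counter restarts after a match
        else:
            out.append(0)
    return out


# Hypothetical intervals of 3, 2 and 4 ticks over a 10-tick window.
print(pulse_train([3, 2, 4], n_ticks=10))
```

In practice the intervals would come from the random number converter, with each X rounded to a positive tick count before comparison.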

The voice input detector 286 samples the energy of the transmitted voice at fixed time intervals. It turns on when the energy is greater than a preset threshold and off when it is smaller [see Fig. 12(c)], thereby detecting whether or not the person is speaking.

The pulse train selector 287 switches so that it outputs the pulse train P1 from the first pulse generator 284 as the blink start signal while the voice input detector 286 detects that the person is speaking, and the pulse train P2 from the second pulse generator 285 as the blink start signal while the voice input detector 286 detects that the person is not speaking. A multiplexer is used for this purpose.

The output pulse train from the pulse train selector 287 is therefore as shown in Fig. 12(d). Different pulse train signals can thus be generated depending on whether or not the person is speaking, which changes the blinking behavior.
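The selection step is a two-input multiplexer driven by the speech flag. A minimal sketch, assuming all three signals are sampled on the same clock and represented as equal-length lists (the name `select_blink_signal` and the sample values are hypothetical):

```python
from typing import List, Sequence


def select_blink_signal(p1: Sequence[int], p2: Sequence[int], speaking: Sequence[bool]) -> List[int]:
    """Per tick, pass P1 through while speaking and P2 otherwise."""
    if not (len(p1) == len(p2) == len(speaking)):
        raise ValueError("all three signals must have the same length")
    return [a if s else b for a, b, s in zip(p1, p2, speaking)]


# Hypothetical signals: speech during the first three ticks, silence afterwards.
p1 = [0, 1, 0, 0, 1, 0]
p2 = [1, 0, 0, 1, 0, 0]
speaking = [True, True, True, False, False, False]
print(select_blink_signal(p1, p2, speaking))
```

Both pulse generators keep running regardless of the flag; only the choice of which train reaches the output changes.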

The coordinate table control unit 29 in Fig. 2, from the moment it receives a blink start signal from the blink signal generator 28, sequentially reads out all the vertex data in the coordinate table of the control point coordinate memory 23B and transfers it frame by frame to the shadow model deformation unit 24B.

The shadow model deformation unit 24B stores a shadow model image defined by shadow parameters that represent the geometric shape of the eyelid portion of the face. It takes the eyelid parameters from the control point coordinate memory 23B and deforms the shadow model image on the basis of these parameters.

Specifically, through the action of the coordinate table control unit 29, it takes in the eyelid parameters sent sequentially from the control point coordinate memory 23B and deforms the shadow model image accordingly.

Figs. 8(a) to (c) show schematically how the shadow model image is deformed.

The synthesis unit 25 combines the mouth image generated by the mouth shape model deformation unit 24A and the eyelid image generated by the shadow model deformation unit 24B with the parts of the still face image stored in the background image memory 19 other than the mouth and eyelid portions.

Next, the operation of the first embodiment is described.

The voice input is encoded by the voice encoding unit 12 and transmitted to the receiving unit 20, where the voice decoding unit 21 decodes the voice code and outputs it as sound.

At the same time, this voice output is sent to the speech recognition unit 22, where its phoneme codes are extracted one after another and sent to the codebook 23A. Based on each input phoneme code, the codebook 23A selects the corresponding set of mouth shape parameter values I, II, ..., n from the codebook shown in Fig. 4.

Using the selected set of parameter values, the mouth shape model deformation unit 24A generates a mouth image by deforming the previously stored mouth shape model image. The resulting correspondence between the generated mouth images and the phonemes extracted by the speech recognition unit 22 is, for example, as shown in Figs. 6(a), (b), and (c).

Meanwhile, the blink signal generator 28 issues blink start signals at random time intervals that differ depending on whether or not the person is speaking.

That is, at initialization, the mean m_1 and standard deviation σ_1 of the blink time interval while speaking and the pointer value i_1 that sets the random number starting position are transmitted as initialization data to the first random number converter 282 shown in Fig. 3, and the mean m_2 and standard deviation σ_2 of the blink time interval while not speaking and the pointer value i_2 that sets the random number starting position are transmitted to the second random number converter 283.

During communication, the first and second pulse generators 284 and 285 generate the speech pulse train signal P1 and the non-speech pulse train signal P2, respectively, at time intervals that follow normal distributions determined by the means m_1, m_2 and standard deviations σ_1, σ_2 of the blink intervals while speaking and while not speaking.

Furthermore, on the receiving side, the pulse train selection section 287 switches according to the detection signal from the voice input detection section 286. During vocalization, the pulse train signal P1 for vocalization, as shown in FIG. 12(a), is output as the blink start signal; during non-vocalization, the pulse train signal P2 for non-vocalization, as shown in FIG. 12(b), is output as the blink start signal.

As a result, different pulse train signals are output depending on whether the speaker is talking or not [see FIGS. 12(c) and (d)].
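The switching between the two pulse trains can be illustrated as below. The sketch assumes, as the claims describe, that the voice input detector compares the sampled voice energy with a preset threshold. Taking the energy to be the mean squared amplitude of a frame is an assumption, and the frame data, threshold value, and function names are hypothetical.

```python
def detect_voice(samples, threshold):
    """Return True (on) when the frame energy exceeds the threshold.

    Energy is taken here as the mean squared amplitude (an assumption).
    """
    if not samples:
        return False
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold

def select_blink_pulses(p1, p2, voiced):
    """Output P1 while voice is detected and P2 otherwise, tick by tick.

    p1, p2: lists of 0/1 pulse values per clock tick.
    voiced: list of booleans (detector output) per clock tick.
    """
    return [a if v else b for a, b, v in zip(p1, p2, voiced)]

p1 = [0, 1, 0, 0, 1, 0]  # pulse train for vocalization
p2 = [0, 0, 1, 0, 0, 1]  # pulse train for non-vocalization
frames = [[0.5, -0.4], [0.6, 0.3], [0.0, 0.01],
          [0.02, 0.0], [0.7, -0.6], [0.0, 0.0]]
voiced = [detect_voice(f, threshold=0.01) for f in frames]
blink = select_blink_pulses(p1, p2, voiced)
print(voiced)  # [True, True, False, False, True, False]
print(blink)   # [0, 1, 1, 0, 1, 1]
```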

When the blink signal generator 28 outputs the pulse train signal in this way, the coordinate table control section 29, starting from the moment it receives a blink start signal, reads out all vertex data in the coordinate table of the control point coordinate memory 23B and transfers it to the shadow model deformation section 24B frame by frame. This transfer ends when the number of frames per unit blink has elapsed since the blink start signal was generated. The shadow model deformation section 24B then generates an eyelid image by deforming the shadow model image stored in advance according to this vertex data.

The mouth image and eyelid image generated by deformation in this way are combined in the synthesis section 25 with the portions of the still face image stored in the background image memory 19 other than the mouth and eyelids, and the result is output as a moving image of the entire face.

This allows the information in the original moving image to be compressed further, greatly reducing the amount of information. As a result, an inexpensive image transmission system using a low-bit-rate line can be realized. In addition, because the eyelids in the face blink at different intervals during a conversation depending on whether the person is speaking or not, the facial expression looks more natural.

The technique used to deform the mouth shape model image in the mouth shape model deformation section 24A and the shadow model image in the shadow model deformation section 24B is described in IEICE Technical Report IE87-2, Vol. 87, No. 19, 1987.

(b) Description of the second embodiment

FIG. 13 is a block diagram showing a second embodiment of the present invention. It differs from the first embodiment of FIG. 2 described above in that a speech recognition section 13 is provided in the transmitter 10. The transmitting side separates the speech into phoneme codes and other information (intonation, pitch, etc.) and sends both to the receiver 20. The receiver 20 uses the phoneme codes directly in the codebook 23A, and the speech synthesis section 26 combines the phoneme codes with the intonation and other information to generate the audio output. The rest of the configuration and operation (including the configuration and operation of the blink signal generator) is the same as in FIGS. 2 and 3. Accordingly, the second embodiment provides the same effects and advantages as the first embodiment.

(c) Description of the third embodiment

In each of the above embodiments, the pre-stored codebook 23A is specific to a predetermined speaker. To transmit mouth images of an unspecified number of people, it would therefore be necessary either to rewrite all the mouth shape codes stored in the codebook to suit the new speaker every time the speaker changes, or to provide the codebook with a huge memory area that holds the codebook information of every registered speaker.

In the third embodiment described below, the codebook is therefore made adaptable to unspecified speakers.

That is, as shown in FIG. 14, a standard codebook is created by measuring each parameter value of the mouth shape model for the mouth shapes of a standard person pronouncing every phoneme. Each parameter value in this codebook is then normalized (divided) by the corresponding parameter value of a predetermined basic phoneme code (for example, the silence code), producing a codebook normalized parameter by parameter (see FIG. 15).

Then, as shown in FIG. 16, one set of parameters is measured from the individual's mouth image corresponding to the basic phoneme code. Multiplying each parameter of every phoneme code in the normalized codebook of FIG. 15 by the measured value for that parameter yields a personal codebook.

For example, if the measured set of personal mouth image parameters is b11 to b1n, the normalized code a21/a11 in FIG. 15 is multiplied by the parameter b11 and converted into the code (a21/a11)·b11. In the same way, for parameter 1 the parameter b11 is multiplied into the entries of all phoneme codes.
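The normalization and personalization steps can be sketched as follows. This is a minimal illustration: the phoneme labels, parameter values, and function names are hypothetical, and it assumes every parameter of the basic phoneme code is nonzero so that the division is defined.

```python
def normalize_codebook(standard, base_code):
    """Divide each phoneme's parameters by those of the basic phoneme code."""
    base = standard[base_code]
    return {code: [a / b for a, b in zip(params, base)]
            for code, params in standard.items()}

def personalize_codebook(normalized, personal_base):
    """Multiply each normalized parameter by the speaker's measured value
    of the same parameter for the basic phoneme code."""
    return {code: [r * b for r, b in zip(ratios, personal_base)]
            for code, ratios in normalized.items()}

# Hypothetical standard codebook with two parameters per phoneme code.
standard = {"silence": [2.0, 4.0], "a": [6.0, 5.0], "i": [3.0, 8.0]}
normalized = normalize_codebook(standard, "silence")
# Parameters measured from one speaker's mouth image for the silence code.
personal = personalize_codebook(normalized, [1.0, 2.0])
print(normalized["a"])  # [3.0, 1.25]
print(personal["a"])    # [3.0, 2.5]
```

Only one set of parameters per speaker has to be measured and transmitted, which is what lets a single normalized codebook serve an unspecified number of speakers.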

FIG. 17 is a block diagram showing the third embodiment, which is provided with an initialization device 30 for creating such a personal codebook. By initializing the codebook 23A for the individual with this initialization device 30, moving images of an unspecified number of speakers can be reproduced.

The specific configuration of the initialization device 30 is shown in FIG. 18. First, when the mouth image for the basic phoneme code (in this case, the silence code) in the face image is sent from the image processing section 11 of the transmitter 10, the feature point extraction section 31 of the initialization device 30 extracts the feature points of that mouth image. The parameter calculation section 32 then calculates one set of parameters from the distances between the feature points and similar measurements. The multiplier 34 multiplies this set of parameters, parameter by parameter, into the normalized codebook prepared in advance in the normalized codebook memory 33, as shown in FIG. 15. The resulting personal codebook is held in the personal codebook memory 35 and stored in the codebook 23A.

From then on, it is referenced whenever that individual's mouth images are transmitted.

Because the initialization device 30 is provided so that the prepared codebook can be updated for each speaker, an unspecified number of speakers can be handled easily.

The initialization device 30 can be applied in the same way to the embodiment shown in FIG. 13.

[Effects of the Invention]

As described above, the blink signal generation method for facial moving image synthesis of the present invention can change the frequency at which blink signals are generated depending on whether the person is talking or not. The manner of blinking therefore changes accordingly, which has the advantage that a more natural moving image can be synthesized.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the principle of the present invention; FIG. 2 is a block diagram showing a first embodiment of the present invention; FIG. 3 is a block diagram of the blink signal generator; FIG. 4 is a configuration diagram of the codebook; FIG. 5 is a configuration diagram of the control point coordinate table; FIGS. 6(a), (b), and (c) are diagrams showing mouth images corresponding to phoneme codes; FIG. 7 is a diagram showing the shape model configuration of the eyelid region; FIGS. 8(a), (b), and (c) are diagrams explaining the concept of deforming the shadow model image; FIG. 9 is a diagram showing a normal distribution with mean 0 and standard deviation 1; FIG. 10 is a diagram showing a normal distribution with mean m and standard deviation σ; FIG. 11 is a flowchart showing the random number value calculation procedure; FIG. 12 is a diagram showing the waveforms at each part of the blink signal generator; FIG. 13 is a block diagram showing a second embodiment of the present invention; FIG. 14 is a diagram showing the procedure for creating the normalized codebook in the third embodiment of the present invention; FIG. 15 is a configuration diagram of the normalized codebook; FIG. 16 is a diagram showing the procedure for creating the personal codebook in the third embodiment of the present invention; FIG. 17 is a block diagram showing the third embodiment of the present invention; FIG. 18 is a block diagram of the initialization device; and FIG. 19 is a system diagram showing a conventional general image transmission system.

In the figures, 10 is a transmitter, 11 is an image processing section, 12 is a speech encoding section, 13 is a speech recognition section, 19 is a background image memory, 20 is a receiver, 21 is a speech decoding section, 22 is a speech recognition section, 23A is a codebook, 23B is a control point coordinate memory (table), 24A is a mouth shape model deformation section (mouth shape model image storage means), 24B is a shadow model deformation section (eyelid model image storage means), 25 is a synthesis section, 26 is a speech synthesis section, 27 is an interpolation point calculation section, 28 is a blink signal generator, 29 is a coordinate table control section, 30 is an initialization device, 31 is a feature point extraction section, 32 is a parameter calculation section, 33 is a normalized codebook memory, 34 is a multiplier, 35 is a personal codebook memory, 280 is a random number generator, 281 is a standard normal random number table, 282 is a first random number converter, 283 is a second random number converter, 284 is a first pulse generator, 284a is a counter, 284b is a comparator, 284c is a pulse generator, 285 is a second pulse generator, 285a is a counter, 285b is a comparator, 285c is a pulse generator, 286 is a voice input detection section, and 287 is a pulse train selection section.

Claims

[Claims]

(1) In a system in which, by using a small amount of initialization data transmitted at initialization, a moving image of a face is synthesized and displayed on the receiving side in accordance with audio information transmitted during communication, a blink signal generation method for facial moving image synthesis, characterized in that: at initialization, the mean values (m_1, m_2) and standard deviations (σ_1, σ_2) of the blink time intervals during vocalization and during non-vocalization are transmitted as initialization data; during communication, a pulse train signal for vocalization (P_1) and a pulse train signal for non-vocalization (P_2) are generated at time intervals following normal distributions determined by the mean values (m_1, m_2) and standard deviations (σ_1, σ_2) of the blink time intervals during vocalization and during non-vocalization; and, in accordance with a detection signal (S) obtained by detecting voice input, the pulse train signal for vocalization (P_1) is output as the blink signal during vocalization and the pulse train signal for non-vocalization (P_2) is output as the blink signal during non-vocalization.

(2) A blink signal generation method for facial moving image synthesis, characterized by comprising:
a standard normal random number table (281) holding a random number sequence Ui (i = 1, 2, 3, ...);
a first random number converter (282) which receives, at initialization, the mean value (m_1) and standard deviation (σ_1) of the blink time interval during vocalization and which, when communication starts, reads a random value Ui from the standard normal random number table (281) and applies the required conversion to turn it into a random value (X) following a normal distribution with mean value (m_1) and standard deviation (σ_1);
a second random number converter (283) which receives, at initialization, the mean value (m_2) and standard deviation (σ_2) of the blink time interval during non-vocalization and which, when communication starts, reads a random value Ui from the standard normal random number table (281) and applies the required conversion to turn it into a random value (X) following a normal distribution with mean value (m_2) and standard deviation (σ_2);
a first pulse generator (284) which, when a random value (X) is input from the first random number converter (282), counts the clock, generates a pulse when the count value equals the random value (X), then issues a control signal to the first random number converter (282), inputs the next random value (X), and repeats the same process;
a second pulse generator (285) which, when a random value (X) is input from the second random number converter (283), counts the clock, generates a pulse when the count value equals the random value (X), then issues a control signal to the second random number converter (283), inputs the next random value (X), and repeats the same process;
a voice input detection section (286) which samples the energy of the transmitted voice at fixed time intervals and turns on if the energy is greater than a preset threshold and off if it is smaller, thereby detecting whether vocalization is taking place; and
a pulse train selection section (287) which switches so as to output the pulse (P_1) from the first pulse generator (284) while the voice input detection section (286) detects that vocalization is taking place, and to output the pulse (P_2) from the second pulse generator (285) while the voice input detection section (286) detects that no vocalization is taking place.