JP2002009963A

JP2002009963A - Communication system device and communication system

Info

Publication number: JP2002009963A
Application number: JP2000186284A
Authority: JP
Inventors: Hiroshi Mukai; 弘向井; Hitoshi Hagimori; 仁萩森; Minoru Kuwana; 稔桑名; Tsutomu Honda; 努本田; Kazuhiko Ishimaru; 和彦石丸; Hideki Osada; 英喜長田
Original assignee: Minolta Co Ltd
Current assignee: Minolta Co Ltd
Priority date: 2000-06-21
Filing date: 2000-06-21
Publication date: 2002-01-11

Abstract

PROBLEM TO BE SOLVED: Conventional videophone systems require camera-equipped terminals on both sides, which raises cost and reduces portability, so videophones have not become widespread despite their convenience. SOLUTION: A communication terminal is provided with a microphone 17; a voice recognition device 15 that receives a voice signal from the microphone 17 and recognizes the voice; a mapping device 12a that collates the voice recognized by the voice recognition device 15 with predetermined basic sounds; a pseudo image database 14 that stores pseudo image data corresponding to the basic sounds; a pseudo image acquisition device 12b that acquires the pseudo image data corresponding to the collated basic sound from the database 14; and a communication control device 11 that transmits the voice signal input from the microphone 17 and the pseudo image data to a specified communication terminal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to the configuration of a communication terminal, a communication server, and a communication system that use pseudo images corresponding to uttered sounds.

[0002]

[Prior Art] Videophones exist that enable two-way communication by voice and image. A videophone has a handset with a microphone and a speaker, a camera that captures images of the person making the call, and a monitor that displays the received images. When both parties to a call use videophones, they can converse while seeing each other.

[0003]

[Problems to Be Solved by the Invention] However, for the parties to converse while seeing each other on such videophones, both must own a videophone equipped with a camera. That is, to show yourself to the other party, the terminal you are using needs a camera; conversely, to see the other party, the terminal the other party is using needs a camera.

[0004] For this reason, a videophone inevitably has many components and a relatively large device configuration. In addition, equipping it with a camera raises the cost, so despite their convenience, videophones have not become widespread.

[0005] Portability has also recently become an important consideration for communication terminals, but a videophone equipped with a camera is too large to carry around comfortably. Mounting a camera also raises the cost. Furthermore, portable communication terminals (mobile phones), which continue to become more compact, need as few components as possible, and mounting a camera works against this.

[0006] Under these circumstances, many obstacles stand in the way of the spread of camera-equipped videophones with good portability, and as a result few people own one. In other words, even someone who owns a videophone capable of transmitting voice and images (whether stationary or portable) can use it with only a limited set of call partners, and so cannot take full advantage of it.

[0007] In addition, because a videophone transmits and receives image data as well as voice, it places a heavy load on the communication line.

[0008] In view of the above problems, an object of the present invention is to provide a highly portable, low-cost communication terminal by realizing a pseudo videophone function.

[0009]

[Means for Solving the Problems] To solve the above problems, the invention of claim 1 is a communication system device that enables communication by voice and image, wherein the communication system device is configured as a communication terminal, and the communication terminal comprises: a microphone for inputting voice; a voice recognition device that receives a voice signal from the microphone and recognizes the voice; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic sounds; a pseudo image database that stores pseudo image data corresponding to the predetermined basic sounds; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic sound; and a communication control device that transmits the voice signal of the voice input from the microphone and the pseudo image data acquired by the pseudo image acquisition device to a designated communication destination terminal.

[0010] The invention of claim 2 is a communication system device that enables communication by voice and image, wherein the communication system device is configured as a communication terminal, and the communication terminal comprises: a communication control device that receives a voice signal transmitted from a communication destination terminal; a voice recognition device that recognizes voice from the voice signal received by the communication control device; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic sounds; a pseudo image database that stores pseudo image data corresponding to the predetermined basic sounds; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic sound; a speaker that reproduces the voice signal received by the communication control device as voice; and a monitor that outputs the pseudo image data acquired by the pseudo image acquisition device.

[0011] The invention of claim 3 is a communication system device that enables communication by voice and image, wherein the communication system device is configured as a communication server, and the communication server comprises: a reception control device that receives a voice signal transmitted from a transmitting-side communication terminal; a voice recognition device that recognizes voice from the voice signal received by the reception control device; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic sounds; a pseudo image database that stores pseudo image data corresponding to the predetermined basic sounds; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic sound; and a transmission control device that transmits the voice signal received by the reception control device and the pseudo image data acquired by the pseudo image acquisition device to the transmitting-side communication terminal.

[0012] The invention of claim 4 is the communication system device according to any one of claims 1 to 3, wherein the pseudo image data is video information that includes the facial expression of a person uttering the basic sound.

[0013] The invention of claim 5 is the communication system device according to any one of claims 1 to 3, wherein the pseudo image data is video information representing the shape of the mouth of a person uttering the basic sound.

[0014] The invention of claim 6 is the communication system device according to claim 4 or claim 5, wherein the pseudo image database holds, for the same basic sound, a plurality of types of pseudo image data with different backgrounds.

[0015] The invention of claim 7 is the communication system device according to any one of claims 4 to 6, wherein the basic sounds include a set of five vowels of a given language, and the pseudo image database holds at least pseudo image data corresponding to each of the vowels.

[0016] The invention of claim 8 is the communication system device according to claim 7, wherein the mapping device includes a function of collating the voice recognized by the voice recognition device with the vowel of that voice.

[0017] The invention of claim 9 is the communication system device according to any one of claims 4 to 6, wherein the basic sounds include the five vowels of Japanese and the "ん" sound, and the pseudo image database holds at least pseudo image data corresponding to each of the vowels and the "ん" sound.

[0018] The invention of claim 10 is the communication system device using pseudo images according to claim 9, wherein the mapping device includes a function of collating a voice recognized by the voice recognition device with the vowel of that voice when the voice is a sound other than the "ん" sound, and of collating the voice with the "ん" sound as it is when the voice is the "ん" sound.

[0019] The invention of claim 11 is a communication system that enables communication by voice and image, wherein a specific communication device among the plurality of communication devices involved in communication from a transmitting-side communication terminal to a receiving-side communication terminal is constituted by the communication system device according to any one of claims 1 to 10.

[0020] The invention of claim 12 is the communication system according to claim 11, wherein the specific communication device is a communication server that relays communication between the transmitting-side communication terminal and the receiving-side communication terminal.

[0021]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the accompanying drawings. Three configurations are described: Embodiment 1 (FIG. 1), in which the transmitting side generates the pseudo image; Embodiment 2 (FIG. 3), in which the receiving side generates the pseudo image; and Embodiment 3 (FIG. 5), in which an intermediate server generates the pseudo image.

[0022] {Embodiment 1} First, an embodiment in which the transmitting side generates the pseudo image is described with reference to FIG. 1. In Embodiment 1, for convenience, the communication terminal that generates pseudo images is called the image generation communication terminal 10, and a conventional communication terminal that enables communication by voice and image is called the TV phone (videophone) 50. The image generation communication terminal 10 may be either stationary or portable; in this embodiment it is described as a portable terminal, for which the effects of the present invention are more pronounced.

[0023] A TV phone 50 ordinarily has a speaker, a microphone, a monitor, and a camera, and enables two-way communication by voice and image. In this embodiment, however, the TV phone 50 has no camera. That is, the TV phone 50 comprises a speaker 53, a microphone 54, and a monitor 55.

[0024] As shown in the figure, the TV phone 50 includes a communication control device 51 and a control device 52, and is connected to the communication network 1 via the communication control device 51. The communication control device 51 exchanges voice signals and image data with the communication destination terminal. Voice and images are output from the speaker 53 and the monitor 55, which are connected to the control device 52, and voice input from the microphone 54 is transmitted to the communication destination terminal via the communication control device 51. The connection between the TV phone 50 and the communication network 1 may be wired or wireless.

[0025] As shown in the figure, the image generation communication terminal 10 includes a speaker 16, a microphone 17, a monitor 18, and a camera 19, although, as described later, a configuration without the camera 19 is also possible. The speaker 16, the microphone 17, the monitor 18, and the camera 19 are each connected to the control device 12, and two-way communication by voice and image with the communication destination terminal takes place via the communication control device 11 connected to the control device 12.

[0026] A voice recognition device 15 is interposed between the microphone 17 and the control device 12 and recognizes voice from the voice signal received from the microphone 17. The control device 12 therefore receives both the voice signal of the voice input by the microphone 17 and the signal of the voice recognized by the voice recognition device 15.

[0027] The control device 12 also includes a mapping device 12a and a pseudo image acquisition device 12b. The mapping device 12a collates the voice recognized by the voice recognition device 15 with preset basic sounds. Collation here means both determining which basic sound a recognized voice should be associated with (mapped to) and performing that association; the mapping device 12a can collate every voice input by the microphone 17 with a basic sound. In the description below, "mapping process" refers to this collation process. The pseudo image acquisition device 12b acquires the pseudo image data corresponding to the voice mapped to a basic sound.

[0028] The basic sounds are described in detail later. Here, as shown in FIG. 7, the basic sounds for Japanese consist of six sounds, "あ" (a), "い" (i), "う" (u), "え" (e), "お" (o), and "ん" (n), that is, the group made up of the vowels and the "ん" sound. The mapping device 12a is set so that a sound other than the "ん" sound is mapped (collated) to its vowel, and the "ん" sound is mapped (collated) to the "ん" sound as it is. As a result, every voice input by the microphone 17 is mapped by the mapping device 12a to a vowel or the "ん" sound.

[0029] For example, as shown in FIG. 8, when the utterance "こんにちは" (konnichiwa, "hello") is input through the microphone 17, the mapping device 12a performs the mapping "こ" → "お", "ん" → "ん", "に" → "い", "ち" → "い", "は" → "あ".
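The mapping rule of paragraphs [0028] and [0029] can be sketched in a few lines. This is only an illustration under stated assumptions: the recognizer is assumed to deliver the utterance as a hiragana string, and the names `VOWEL_COLUMNS`, `KANA_TO_BASIC`, and `map_to_basic_sounds` are hypothetical, not taken from the patent. Small kana and the long-vowel mark are not handled and are simply skipped.

```python
# Hypothetical sketch of the mapping rule: every kana is reduced to its vowel,
# except "ん", which is kept as it is.

# Each entry lists the hiragana whose vowel is the key (columns of the kana table).
VOWEL_COLUMNS = {
    "あ": "あかさたなはまやらわがざだばぱ",
    "い": "いきしちにひみりぎじぢびぴ",
    "う": "うくすつぬふむゆるぐずづぶぷ",
    "え": "えけせてねへめれげぜでべぺ",
    "お": "おこそとのほもよろをごぞどぼぽ",
}

# Flatten the columns into a kana -> basic sound lookup table.
KANA_TO_BASIC = {
    kana: vowel for vowel, column in VOWEL_COLUMNS.items() for kana in column
}
KANA_TO_BASIC["ん"] = "ん"  # the "ん" sound maps to itself


def map_to_basic_sounds(utterance: str) -> list:
    """Map each recognized kana to one of the six basic sounds.

    Characters outside the table (small kana, punctuation) are skipped,
    which is a simplification made for this sketch.
    """
    return [KANA_TO_BASIC[kana] for kana in utterance if kana in KANA_TO_BASIC]


print(map_to_basic_sounds("こんにちは"))
```

With the table above, the call reproduces the sequence given in the text for "こんにちは": お, ん, い, い, あ.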

[0030] The image generation communication terminal 10 also includes a pseudo image database 14. The pseudo image database 14 stores pseudo image data corresponding to the basic sounds, in the form of video information capturing the facial expression of a person uttering each basic sound. That is, as shown in FIG. 7, each vowel and the "ん" sound has pseudo image data of the facial expression of a person uttering that sound.

[0031] In this way, the mapping device 12a maps every voice to a basic sound, and each basic sound has corresponding pseudo image data in the pseudo image database 14, so pseudo image data can be associated with every voice.
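As a rough illustration of the pseudo image database described above, the following sketch keeps one image per basic sound and checks that all six sounds are covered, which is the condition under which every mapped voice has an image. The class name `PseudoImageDatabase`, its methods, and the use of raw `bytes` for image data are assumptions made for this example; the patent does not specify a storage format.

```python
from typing import Dict

# The six basic sounds named in the text.
BASIC_SOUNDS = ("あ", "い", "う", "え", "お", "ん")


class PseudoImageDatabase:
    """Hypothetical in-memory store: one pseudo image per basic sound."""

    def __init__(self) -> None:
        self._images: Dict[str, bytes] = {}

    def register(self, basic_sound: str, image: bytes) -> None:
        # Only the predetermined basic sounds may carry an image.
        if basic_sound not in BASIC_SOUNDS:
            raise ValueError(f"not a basic sound: {basic_sound!r}")
        self._images[basic_sound] = image

    def get(self, basic_sound: str) -> bytes:
        # Raises KeyError if no image was registered for this sound.
        return self._images[basic_sound]

    def is_complete(self) -> bool:
        # True once every basic sound has an image.
        return all(sound in self._images for sound in BASIC_SOUNDS)


db = PseudoImageDatabase()
for sound in BASIC_SOUNDS:
    db.register(sound, f"image-for-{sound}".encode("utf-8"))
print(db.is_complete())
```

Because the mapping step only ever produces one of the six basic sounds, a complete database guarantees that `get` succeeds for any mapped voice.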

[0032] The operation of the image generation communication terminal 10 in the above configuration is now described. First, as a preliminary step, the pseudo image database 14 described above is created. This is done, for example, by using the camera 19 to photograph the facial expressions of the person who will make calls with the image generation communication terminal 10. The captured pseudo image data for each vowel and the "ん" sound is then saved in the pseudo image database 14. The image generation communication terminal 10 is provided with an operation device 10a consisting of operation keys and the like, which allows settings such as which basic sound a captured image is associated with.

[0033] The pseudo image data can also be captured with, for example, a digital camera and input through an external terminal 10b provided on the image generation communication terminal 10. With this arrangement, the image generation communication terminal 10 can be configured without a built-in camera.

[0034] In this embodiment, the mapping rule of the mapping device 12a is set so that a sound other than the "ん" sound is associated with its vowel and the "ん" sound is associated with the "ん" sound as it is. As described later, however, other mapping rules can be adopted, and the device can also hold a plurality of mapping rules. When a plurality of mapping rules is held, the operation device 10a can be used to change which mapping rule is in effect.
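One way to picture a mapping device that holds several rules and switches between them is sketched below. For brevity the recognized sounds are assumed to arrive as romanized syllables such as "ko" or "n". The rule `vowel_and_n_rule` follows the rule stated in the text; `vowels_only_rule` is a made-up alternative included purely to show the switching, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A mapping rule turns one recognized syllable into a basic sound (or None).
MappingRule = Callable[[str], Optional[str]]


def vowel_and_n_rule(syllable: str) -> Optional[str]:
    """Rule from the embodiment: "n" stays "n", anything else maps to its vowel."""
    return "n" if syllable == "n" else syllable[-1]


def vowels_only_rule(syllable: str) -> Optional[str]:
    """Hypothetical alternative rule: "n" gets no image of its own."""
    return None if syllable == "n" else syllable[-1]


class MappingDevice:
    """Holds several named rules; the active one can be changed at any time."""

    def __init__(self, rules: Dict[str, MappingRule], active: str) -> None:
        self._rules = rules
        self.select(active)

    def select(self, name: str) -> None:
        # Stands in for a setting change made through the operation device.
        if name not in self._rules:
            raise KeyError(name)
        self._active = name

    def map(self, syllables: List[str]) -> List[Optional[str]]:
        rule = self._rules[self._active]
        return [rule(s) for s in syllables]


device = MappingDevice(
    {"vowel_and_n": vowel_and_n_rule, "vowels_only": vowels_only_rule},
    active="vowel_and_n",
)
print(device.map(["ko", "n", "ni", "chi", "wa"]))
device.select("vowels_only")
print(device.map(["ko", "n", "ni", "chi", "wa"]))
```

Keeping the rules as plain functions behind a name lookup means a new rule can be added without touching the rest of the terminal logic.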

[0035] With the preliminary settings in place, the user 2 of the image generation communication terminal 10 dials with the operation device 10a and establishes communication with the TV phone 50, the call destination. Alternatively, the user 3 of the TV phone 50 places the call from the TV phone 50 side and communication is established. When the user 2 then speaks into the microphone 17, the voice input to the microphone 17 is converted into a voice signal, the voice recognition device 15 recognizes the voice, and the signal of the recognized voice is sent to the control device 12 together with the voice signal. In the control device 12, the mapping device 12a associates the recognized voice with a basic sound, and the pseudo image acquisition device 12b acquires the corresponding pseudo image data from the pseudo image database 14. The control device 12 then passes the voice signal of the voice input from the microphone 17 and the pseudo image data acquired from the pseudo image database 14 to the communication control device 11, which transmits the voice signal and the pseudo image data to the TV phone 50 via the communication network 1. FIG. 2 shows this flow of the voice signal and the pseudo image data.
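The transmitting-side flow just described (recognize, map, look up the image, then send voice and image together) can be sketched as follows. All names here (`OutgoingPacket`, `build_outgoing_packet`, and the stub recognizer) are hypothetical, the recognizer is a stand-in that returns romanized syllables, and the packet layout is an assumption; the patent does not define a wire format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OutgoingPacket:
    audio_signal: bytes         # microphone audio, forwarded unchanged
    pseudo_images: List[bytes]  # one pseudo image per recognized sound


def build_outgoing_packet(
    audio_signal: bytes,
    recognize: Callable[[bytes], List[str]],
    map_to_basic: Callable[[str], str],
    database: Dict[str, bytes],
) -> OutgoingPacket:
    """Assemble what the sender hands to its communication control device."""
    recognized = recognize(audio_signal)            # role of the voice recognition device
    basic = [map_to_basic(s) for s in recognized]   # role of the mapping device
    images = [database[b] for b in basic]           # role of the pseudo image acquisition device
    return OutgoingPacket(audio_signal, images)


# Stub components, for illustration only.
def stub_recognizer(_audio: bytes) -> List[str]:
    return ["ko", "n", "ni", "chi", "wa"]


def vowel_and_n(syllable: str) -> str:
    return "n" if syllable == "n" else syllable[-1]


image_db = {s: f"IMG_{s.upper()}".encode("ascii") for s in ("a", "i", "u", "e", "o", "n")}

packet = build_outgoing_packet(b"raw-audio", stub_recognizer, vowel_and_n, image_db)
print(packet.pseudo_images)
```

Note that the original voice signal is sent as it is; recognition is used only to choose which stored image accompanies it.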

[0036] When the communication control device 51 of the TV phone 50 receives the voice signal and the pseudo image data from the image generation communication terminal 10, it passes them to the control device 52, which extracts the voice signal and the image signal and outputs them from the speaker 53 and the monitor 55, respectively.

[0037] In this way, the TV phone 50 outputs the voice of the user 2 of the image generation communication terminal 10 and shows the pseudo image corresponding to that voice on the monitor 55, so the conversation can proceed as if the user 2 were being shown on the monitor 55 in real time.

[0038] Meanwhile, the voice signal input from the microphone 54 of the TV phone 50 is transmitted to the image generation communication terminal 10 via the communication control devices 51 and 11 and is output from the speaker 16 of the image generation communication terminal 10.

[0039] With this configuration, even when both parties use communication terminals that have no camera, a conversation in which a party sees a simulated image of the other becomes possible. For example, the image generation communication terminal 10 shown in FIG. 1 is effective when used by a sales representative. By transmitting pseudo image data when a customer calls with an inquiry, the sales representative can get the customer to remember his or her face. Moreover, the sales representative does not always have to be ready to be photographed by a camera. Even while running around in work clothes, the representative can deal with customers through a pseudo image showing a suit and tie. A plurality of pseudo images may also be registered in the database in advance so that one can be selected to suit the situation.

[0040] If the TV phone 50 is equipped with a camera, the user 2 of the image generation communication terminal 10 can converse while seeing the user 3 of the TV phone 50.

[0041] {Embodiment 2} Next, an embodiment in which the receiving side generates the pseudo image is described with reference to FIG. 3. In Embodiment 2, for convenience, the communication terminal that creates pseudo images is called the image generation communication terminal 20, and a conventional communication terminal that enables communication by voice and image is called the TV phone 50. The TV phone 50 has the same configuration as the terminal in the embodiment shown in FIG. 1, and the same reference numbers are used for its devices.

[0042] As shown in the figure, the image generation communication terminal 20 includes a speaker 26, a microphone 27, a monitor 28, and a camera 29, although, as explained for Embodiment 1, a configuration without the camera 29 is also possible. The speaker 26, the microphone 27, the monitor 28, and the camera 29 are each connected to the control device 22, and two-way communication by voice and image with the communication destination terminal takes place via the communication control device 21 connected to the control device 22.

[0043] A voice recognition device 25 is connected to the control device 22 and recognizes voice from the voice signal that the control device 22 receives from the communicating party. As in Embodiment 1, the control device 22 includes a mapping device 22a and a pseudo image acquisition device 22b. As in Embodiment 1, the basic sounds consist of the six sounds "あ" (a), "い" (i), "う" (u), "え" (e), "お" (o), and "ん" (n), that is, the group made up of the vowels and the "ん" sound. As a result, every voice received by the control device 22 is associated by the mapping device 22a with a vowel or the "ん" sound. The image generation communication terminal 20 also includes a pseudo image database 24.

[0044] In this way, the mapping device 22a maps (collates) every voice to a basic sound, and each basic sound has corresponding pseudo image data in the pseudo image database 24, so pseudo image data can be associated with every voice.

[0045] The operation of the image generation communication terminal 20 in the above configuration is now described. First, as a preliminary step, the pseudo image database 24 is created and the mapping rule is set, as in Embodiment 1. Like the terminal of Embodiment 1, the image generation communication terminal 20 includes an operation device 20a consisting of operation keys and the like and an external terminal 20b. The operation device 20a can be used to associate images captured by the camera 29 with basic sounds and to change the mapping rule setting. In addition, images captured with a digital camera or the like can be input through the external terminal 20b, so externally generated pseudo image data can be registered in the pseudo image database 24.

[0046] With the preliminary settings in place, the user 4 of the image generation communication terminal 20 dials with the operation device 20a and establishes communication with the TV phone 50, the call destination. Alternatively, communication is established by a dial operation by the user 5 of the TV phone 50. When the user 4 then speaks into the microphone 27, the voice input by the microphone 27 is converted into a voice signal, transmitted to the TV phone 50 via the communication control devices 21 and 51, and output from the speaker 53 of the TV phone 50.

[0047] If the image generation communication terminal 20 is equipped with the camera 29, it can also transmit images together with the voice, so that both voice and image data are output on the TV phone 50 side.

[0048] Meanwhile, the voice uttered by the user 5 of the TV phone 50 is converted into a voice signal by the microphone 54 and transmitted to the image generation communication terminal 20 via the communication control devices 51 and 21. The voice signal is passed to the control device 22, and the voice recognition device 25 recognizes the voice. The control device 22 then has the mapping device 22a map the recognized voice to a basic voice, and has the pseudo image acquisition device 22b acquire the corresponding pseudo image data from the pseudo image database 24. The control device 22 sends the voice signal received from the communication control device 21 to the speaker 26, and sends the pseudo image data acquired from the pseudo image database 24 to the monitor 28. FIG. 4 shows the flow of the voice signal and the pseudo image data described above.
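The receiving-side flow can be illustrated with a small sketch. It is only a model under stated assumptions: the class `ReceivingTerminal`, its method names, and the toy recognizer that treats each signal as one romanized syllable are hypothetical; the comments indicate which device of the specification each part stands in for.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReceivingTerminal:
    """Hypothetical stand-in for the image generation communication terminal 20."""
    recognize: Callable[[bytes], str]                    # voice recognition device 25
    database: Dict[str, str]                             # pseudo image database 24
    speaker: List[bytes] = field(default_factory=list)   # speaker 26
    monitor: List[str] = field(default_factory=list)     # monitor 28

    @staticmethod
    def map_to_basic_voice(syllable: str) -> str:
        # Mapping device 22a: "n" maps to itself, anything else to its vowel.
        return "n" if syllable == "n" else syllable[-1]

    def on_voice_signal(self, signal: bytes) -> None:
        syllable = self.recognize(signal)
        basic_voice = self.map_to_basic_voice(syllable)
        image = self.database[basic_voice]               # acquisition device 22b
        self.speaker.append(signal)                      # voice is passed through unchanged
        self.monitor.append(image)                       # matching pseudo image is displayed


# Toy recognizer: each "signal" already carries one romanized syllable.
terminal = ReceivingTerminal(
    recognize=lambda signal: signal.decode("ascii"),
    database={voice: f"user5_{voice}.png" for voice in "aiueon"},
)
for signal in (b"ka", b"e", b"ru"):
    terminal.on_voice_signal(signal)
```

The point of the sketch is that only the voice signal arrives over the network; the image shown on the monitor is chosen locally from data held by the receiving terminal.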

[0049] In this way, on the image generation communication terminal 20 side, the voice of the user 5 of the TV phone 50 is reproduced and the pseudo image corresponding to that voice is output to the monitor 28, so video that looks as if the user 5 were speaking is shown to the user 4 in real time.

[0050] With this configuration, even when both parties use communication terminals that are not equipped with cameras, they can converse while seeing a simulated image of the other party. For example, the image generation communication terminal 20 is installed at home. In this case, the image generation communication terminal 20 may be a stationary type. Suppose a father who works at a company carries a mobile phone (this corresponds to the TV phone 50 shown in FIG. 3, but here an ordinary mobile phone without a TV phone function suffices). If the father, on his way home from the office, calls home on the mobile phone and says something like "I'll be home soon", the image generation communication terminal 20 at home can use the father's pseudo image data, stored in advance in the pseudo image database 24, to show video as if the father were speaking. A small child waiting for the father to come home can then talk with him while watching his image.

[0051] Another advantage is that the father can carry an ordinary mobile phone. That is, in the embodiment shown in FIG. 3 the terminal used by the user 5 was described as the TV phone 50, but in the present embodiment the terminal used by the user 5 may be a telephone (mobile phone) capable only of voice conversation.

[0052] Furthermore, in the configuration shown in FIG. 3, only the voice signal is transmitted from the TV phone 50 to the image generation communication terminal 20. This has the additional advantage of reducing the load on the communication network 1.

[0053] {Third Embodiment} Next, the configuration of an embodiment in which an intermediate server creates the pseudo image is described with reference to FIG. 5. In the third embodiment, conventional communication terminals capable of voice and image communication are referred to as TV phones 50A and 50B; the TV phones 50A and 50B have the same configuration as the TV phone 50 shown in FIG. 1.

[0054] The communication server 30 includes a communication control device 31 having a reception control device 36 and a transmission control device 37, a control device 32, a mapping device 32a, a pseudo image acquisition device 32b, a pseudo image database 34, a voice recognition device 35, and so on. The configurations and functions of the control device 32, the mapping device 32a, the pseudo image acquisition device 32b, the pseudo image database 34, and the voice recognition device 35 are the same as those of the corresponding devices in the image generation communication terminals 10 and 20 shown in FIGS. 1 and 3.

[0055] The basic voices are likewise the six voices "a", "i", "u", "e", "o", and "n", that is, a group of voices made up of the vowels and the "n" sound. As a result, every voice received by the communication server 30 is mapped to a vowel or the "n" sound by the mapping device 32a.

[0056] In this way, the mapping device 32a maps every voice to a basic voice, and the pseudo image database 34 holds pseudo image data for each basic voice, so pseudo image data can be associated with every voice.

[0057] The operation of the communication server 30 configured as above is described next. First, as a preliminary step, the pseudo image database 34 is created and the mapping rules are set, as in the first and second embodiments.

[0058] With the preliminary settings in place, the user 6 of the TV phone 50A dials to establish communication with the TV phone 50B, the call destination. Alternatively, communication is established by a dial operation by the user 7 of the TV phone 50B. When the user 6 speaks into the microphone 54A, the voice input to the microphone 54A is converted into a voice signal and transmitted from the communication control device 51A. The voice signal is not sent directly to the TV phone 50B; it is sent to the communication server 30.

[0059] The voice signal transmitted to the communication server 30 is passed to the control device 32 via the reception control device 36. The voice recognition device 35 performs voice recognition on the voice signal input to the control device 32, and the recognized voice is returned to the control device 32. The control device 32 then has the mapping device 32a map the recognized voice to a basic voice, and the pseudo image acquisition device 32b acquires the pseudo image data for the mapped basic voice from the pseudo image database 34.

[0060] In this way, the communication server 30 acquires pseudo image data based on the voice signal received from the TV phone 50A, and transmits the acquired pseudo image data, together with the received voice signal, from the transmission control device 37 to the TV phone 50B.

[0061] The TV phone 50B receives the voice signal and the pseudo image data from the communication server 30 at the communication control device 51B and passes them to the control device 52B. The voice signal extracted by the control device 52B is output from the speaker 53B, and the image data is output to the monitor 55B.

[0062] Conversely, voice input at the microphone 54B of the TV phone 50B is recognized and mapped at the communication server 30, and is transmitted to the TV phone 50A together with the pseudo image data.

[0063] In this way, on the TV phone 50B side the voice of the user 6 of the TV phone 50A is reproduced by the speaker 53B and the pseudo image corresponding to that voice is output to the monitor 55B, while on the TV phone 50A side the voice of the user 7 of the TV phone 50B is reproduced by the speaker 53A and the corresponding pseudo image is output to the monitor 55A. Both users can therefore talk as if they were watching video of the person they are speaking with.

[0064] In the present embodiment, the pseudo image database 34 of the communication server 30 holds pseudo image data for both the user 6 and the user 7. The pseudo image data sent with the voice signal from the user 6 to the user 7 is a pseudo image of the user 6 uttering the basic voices, and the pseudo image data sent with the voice signal from the user 7 to the user 6 is a pseudo image of the user 7 uttering the basic voices.
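The server-side relay, with pseudo image data held per user, might be modeled as below. This is a sketch under assumptions: `CommunicationServer`, `relay`, `make_database`, the user keys, and the toy recognizer are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Tuple


class CommunicationServer:
    """Hypothetical stand-in for the communication server 30."""

    def __init__(self, recognize: Callable[[bytes], str],
                 databases: Dict[str, Dict[str, str]]) -> None:
        self.recognize = recognize
        self.databases = databases    # one set of pseudo images per user

    def relay(self, sender: str, signal: bytes) -> Tuple[bytes, str]:
        """Return what the server forwards to the other party."""
        syllable = self.recognize(signal)
        basic_voice = "n" if syllable == "n" else syllable[-1]
        image = self.databases[sender][basic_voice]   # the sender's own image
        return signal, image


def make_database(user: str) -> Dict[str, str]:
    # Placeholder file names stand in for stored pseudo image data.
    return {voice: f"{user}_{voice}.png" for voice in "aiueon"}


server = CommunicationServer(
    recognize=lambda signal: signal.decode("ascii"),
    databases={"user6": make_database("user6"), "user7": make_database("user7")},
)
to_user7 = server.relay("user6", b"ha")   # user 6 speaks, user 7 receives
to_user6 = server.relay("user7", b"n")    # user 7 speaks, user 6 receives
```

Keying the lookup on the sender is what makes each party see the other's face rather than their own.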

[0065] With this configuration, even when both parties use communication terminals not equipped with cameras, they can converse while seeing a simulated image of the other party. Moreover, according to the present embodiment, the terminals used by the users do not need a voice recognition device or a pseudo image database, so the cost of the system as a whole can be reduced.

[0066] {Basic Voices and Mapping Device} In each of the embodiments above, the basic voices were described, as one example, as consisting of the five vowels "a", "i", "u", "e", "o" and the "n" sound. The basic voices are not limited to this, however. When the facial expressions of a person uttering the sounds of a given language are classified into several patterns, the basic voices may consist of the sounds that represent those patterns.

[0067] Taking the case in which the basic voices consist of the vowels and the "n" sound as an example, this approach exploits the fact that the facial expression, that is, the mouth shape, of a person uttering a sound resembles the expression of a person uttering the vowel of that sound. In other words, the expressions of a person uttering the vowels allow the expressions of a speaker to be classified into five patterns. However, none of the five vowels has an expression with the mouth closed, so the "n" sound is added to the basic voices; the vowels and the "n" sound together then cover all the expressions of a person speaking.

[0068] Similarly, the expressions of a speaker may be classified by their characteristics into patterns such as "sounds uttered with the lips pursed" and "sounds uttered with the mouth nearly closed". A sound representative of each pattern is set as a basic voice, sounds belonging to the same pattern are mapped (collated) to that basic voice, and the pseudo image of the basic voice is used.

[0069] With this method of classifying sounds into patterns, each represented by a basic voice, and using the pseudo images of the basic voices, it does not matter if the language the user speaks is not Japanese. If the input speech is English, for example, English pronunciations are classified into patterns that each include a basic voice, and every sound is mapped to a basic voice. Even when the input speech is English, if some loss of accuracy is acceptable, Japanese voice recognition can be applied so that the mapping method described above, with the vowels and the "n" sound as basic voices, is used as is.
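A pattern-based mapping of this kind could be represented as a table from each basic voice to the sounds assumed to share its mouth shape. The grouping below is purely an assumption made for illustration (the specification gives no such table for English), and the phoneme labels and the name `map_sound` are hypothetical.

```python
# Illustrative only: these phoneme groupings are assumptions made for this
# sketch, not classifications given in the specification.
PATTERNS = {
    # representative basic voice -> sounds assumed to share its mouth shape
    "a": {"aa", "ae", "ah"},
    "i": {"iy", "ih", "y"},
    "u": {"uw", "uh", "w"},
    "e": {"eh", "ey"},
    "o": {"ao", "ow"},
    "n": {"m", "b", "p"},        # lips closed
}

# Invert the table once so each sound can be looked up directly.
SOUND_TO_BASIC_VOICE = {
    sound: basic_voice
    for basic_voice, sounds in PATTERNS.items()
    for sound in sounds
}


def map_sound(sound: str) -> str:
    """Map a recognized sound to the basic voice that represents its pattern."""
    return SOUND_TO_BASIC_VOICE[sound]
```

Changing the language or the pattern set then only means replacing the table; the lookup of pseudo image data by basic voice stays the same.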

[0070] {Pseudo Image Data} In the embodiments described above, the pseudo image data was video information including the facial expression of a person uttering a basic voice, as shown in FIG. 7. However, because the expression of a speaker is most distinctive around the mouth, the pseudo image data may instead be video information of the mouth area only. In that case, as shown in FIG. 9, the parts other than the mouth, that is, other parts of the face such as the eyes and nose and the background behind the person, are prepared as separate video information for the common part (common image data), and the video of this common part is combined with the video information of the mouth area (mouth image data).

[0071] Structuring the pseudo image data in this way reduces the size of the image data for the varying part (the mouth area), so the size of the pseudo image database can be reduced. The characteristic part may also be a combination of multiple pieces of image data, such as the mouth and eyes, or the mouth and eyebrows.
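Combining common image data with mouth image data can be sketched with tiny images held as nested lists. The function `composite`, the 4x4 image size, the pixel values, and the paste position are all assumptions chosen to keep the example small.

```python
from typing import List

Image = List[List[int]]   # a tiny grayscale image as rows of pixel values


def composite(common: Image, mouth: Image, top: int, left: int) -> Image:
    """Return a copy of the common image with the mouth image pasted in."""
    frame = [row[:] for row in common]            # leave the common image untouched
    for r, mouth_row in enumerate(mouth):
        for c, pixel in enumerate(mouth_row):
            frame[top + r][left + c] = pixel
    return frame


common_image = [[0] * 4 for _ in range(4)]        # face and background, 4x4 pixels
mouth_images = {"a": [[9, 9]], "n": [[1, 1]]}     # 1x2 mouth regions per basic voice

frame_a = composite(common_image, mouth_images["a"], top=2, left=1)
frame_n = composite(common_image, mouth_images["n"], top=2, left=1)
```

With these toy sizes, six full 4x4 frames would take 96 pixel values, while one common image plus six 1x2 mouth images takes 16 + 12 = 28, which is the kind of saving the paragraph above refers to.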

[0072] {Application Examples} The communication terminals and other devices using pseudo images described above provide a simulated TV phone function even when the two communicating parties do not both own camera-equipped TV phones. Regardless of whether a camera is present, however, various applications such as the following are possible.

[0073] (1) Answering machine mode
When the user cannot answer the phone, for example during a meeting or on a train, the mobile phone may be set to answering machine mode. If a call arrives in this mode, an answering message that was recorded in advance, or that the mobile phone holds in advance, is played. The voice of this answering message can also be mapped to the basic voices and pseudo image data acquired, so that the pseudo image data is sent to the caller together with the answering message. This makes it possible to send an answering message that looks as if the user were speaking in person.

[0074] Also, when a caller leaves a message while the phone is in answering machine mode, if the terminal holds pseudo image data for that caller, the message can be checked later as if the caller were speaking in person.

[0075] (2) Agent mode
In the embodiments described above, the pseudo image data was the facial expression of the person uttering the voice, and sending the pseudo image data of the person actually on the call produced the effect of that person appearing to speak. The pseudo image may instead be replaced with video of an entirely different person, or of an animated character or the like.

[0076] For example, when the user has to tell the other party something that is hard to say, the conversation can be held using video of an animated character instead. As another use, a woman who is being harassed by nuisance calls can use the pseudo image of an intimidating-looking man to deter the caller. Such agent characters may be made available by, for example, downloading data provided over the Internet or the like.

[0077] (3) Background mode
For the pseudo image data, it suffices to prepare one set of images corresponding to each of the basic voices, but multiple kinds of pseudo image data may be stored in the pseudo image database for a single basic voice. That is, for the basic voice "a", multiple kinds of pseudo image data with different backgrounds are stored.

[0078] For example, if pseudo image data for the basic voices shot inside the office is sent to the other party as "in-house" pseudo image data, the call appears to be made from the office even when the user is elsewhere. Likewise, if images for the basic voices are shot at a tourist spot as "sightseeing" pseudo image data, the call appears to be made from the tourist spot even after the user has returned.

[0079] Using multiple kinds of pseudo image data with different backgrounds in this way enables uses suited to various purposes. Taking the embodiment shown in FIG. 1 as an example, the background mode can be implemented by storing multiple kinds of pseudo image data, such as "in-house" and "sightseeing", in the pseudo image database 14 and allowing the background mode to be switched with the operation device 10a. The pseudo image acquisition device 12b then acquires the pseudo image data that matches the mapped basic voice and the background mode, and a pseudo image corresponding to the background mode is transmitted to the communication destination.
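One way to picture the background mode is a database keyed by the pair of background mode and basic voice. The class `PseudoImageDatabase`, its methods, and the placeholder file names below are hypothetical; only the mode labels "in-house" and "sightseeing" come from the description.

```python
from typing import Dict, Tuple


class PseudoImageDatabase:
    """Hypothetical store keyed by (background mode, basic voice)."""

    def __init__(self) -> None:
        self._images: Dict[Tuple[str, str], str] = {}

    def register(self, mode: str, basic_voice: str, image: str) -> None:
        self._images[(mode, basic_voice)] = image

    def acquire(self, mode: str, basic_voice: str) -> str:
        return self._images[(mode, basic_voice)]


database = PseudoImageDatabase()
for mode in ("in-house", "sightseeing"):
    for voice in "aiueon":
        # Placeholder file names stand in for stored pseudo image data.
        database.register(mode, voice, f"{mode}_{voice}.png")

current_mode = "sightseeing"          # selected with the operation device
image = database.acquire(current_mode, "a")
```

Switching the mode changes only the first element of the key, so the mapping from recognized voice to basic voice is unaffected.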

【００８０】[0080]

[Effects of the Invention] As described above, in the invention of claim 1, the transmitting communication terminal performs voice recognition, generates pseudo image data, and transmits the voice and the pseudo image to the communication destination. Therefore, even if the transmitting side is a terminal without a camera, an effect similar to that of a videophone is obtained in a simulated manner.

[0081] In the invention of claim 2, the receiving communication terminal performs voice recognition on the received voice, generates pseudo image data, and outputs the voice to the speaker and the pseudo image data to the monitor. Therefore, even if the transmitting side is a terminal without a camera, an effect similar to that of a videophone is obtained in a simulated manner. In addition, because only the voice signal is transmitted over the communication network, the load on the line can be reduced.

[0082] In the invention of claim 3 or claim 12, the intermediate communication server performs voice recognition on the voice transmitted from the transmitting communication terminal, generates pseudo image data, and transmits the voice and the pseudo image data to the receiving communication terminal. Therefore, even if the transmitting side is a terminal without a camera, an effect similar to that of a videophone is obtained in a simulated manner. Furthermore, because only the communication server needs a voice recognition device and a pseudo image database, the cost of the system as a whole can be reduced.

[0083] In the invention of claim 4, the pseudo image data is video information including the facial expression of a person uttering the basic voice, so the other party viewing the pseudo image data has the impression that the person is actually speaking.

[0084] In the invention of claim 5, the pseudo image data is video information representing the shape of the mouth of a person uttering the basic voice, so the other party viewing the pseudo image data has the impression that the person is actually speaking. In addition, reducing the size of the image data for the varying part saves capacity in the pseudo image database.

[0085] In the invention of claim 6, the pseudo image database holds, for a single basic voice, multiple kinds of pseudo image data with different backgrounds, so the call can be made to appear to come from a place other than where the caller actually is.

[0086] In the invention of claim 7 or claim 8, the basic voices include the five vowels, so the expressions of a person uttering a voice can be classified with a minimal set of basic voices.

[0087] In the invention of claim 9 or claim 10, the basic voices include the five vowels and the "n" sound, so the expressions of a person uttering a voice can be classified with a minimal set of basic voices and, moreover, classified exhaustively.

[0088] In the invention of claim 11, using the apparatus for a communication system according to any one of claims 1 to 10 makes it possible to build a variety of systems that use pseudo images, and the effects described above are obtained.

[Brief description of the drawings]

[FIG. 1] A terminal and system configuration diagram of the embodiment in which pseudo image data is created at the transmitting terminal.

[FIG. 2] A diagram showing the data flow in the embodiment shown in FIG. 1.

[FIG. 3] A terminal and system configuration diagram of the embodiment in which pseudo image data is created at the receiving terminal.

[FIG. 4] A diagram showing the data flow in the embodiment shown in FIG. 2.

[FIG. 5] A terminal and system configuration diagram of the embodiment in which pseudo image data is created at an intermediate server.

[FIG. 6] A diagram showing the data flow in the embodiment shown in FIG. 5.

[FIG. 7] A diagram showing the correspondence between basic voices and pseudo images.

[FIG. 8] A diagram showing an outline of the processing of the mapping device.

[FIG. 9] A diagram showing the structure of pseudo image data using mouth images.

[Explanation of symbols]

10: image generation communication terminal
11: communication control device
12: control device
12a: mapping device
12b: pseudo image acquisition device
14: pseudo image database
15: voice recognition device
16: speaker
17: microphone
18: monitor
19: camera
50: TV phone
51: communication control device
52: control device
53: speaker
54: microphone
55: monitor

Continued from the front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): H04N 5/76; G10L 5/06 D
(72) Inventor: Minoru Kuwana, c/o Minolta Co., Ltd., Osaka International Building, 2-3-13 Azuchi-cho, Chuo-ku, Osaka-shi, Osaka
(72) Inventor: Tsutomu Honda, c/o Minolta Co., Ltd., Osaka International Building, 2-3-13 Azuchi-cho, Chuo-ku, Osaka-shi, Osaka
(72) Inventor: Kazuhiko Ishimaru, c/o Minolta Co., Ltd., Osaka International Building, 2-3-13 Azuchi-cho, Chuo-ku, Osaka-shi, Osaka
(72) Inventor: Hideki Osada, c/o Minolta Co., Ltd., Osaka International Building, 2-3-13 Azuchi-cho, Chuo-ku, Osaka-shi, Osaka
F-term (reference): 5C052 AA12 AB04 AC02 DD02 DD06 EE03; 5D015 AA01 BB02 CC18 KK02; 5K101 KK04 LL12 MM07 NN07 NN08 NN16 NN18 NN23 NN36 NN37

Claims

[Claims]

1. An apparatus for a communication system enabling communication by voice and image, wherein the apparatus for a communication system is configured as a communication terminal, the communication terminal comprising: a microphone for inputting voice; a voice recognition device that receives a voice signal from the microphone and performs voice recognition; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic voices; a pseudo image database that stores pseudo image data corresponding to the predetermined basic voices; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic voice; and a communication control device that transmits the voice signal of the voice input from the microphone and the pseudo image data acquired by the pseudo image acquisition device to a designated communication destination terminal.

2. An apparatus for a communication system enabling communication by voice and image, wherein the apparatus for a communication system is configured as a communication terminal, the communication terminal comprising: a communication control device that receives a voice signal transmitted from a communication destination terminal; a voice recognition device that performs voice recognition on the voice signal received by the communication control device; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic voices; a pseudo image database that stores pseudo image data corresponding to the predetermined basic voices; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic voice; a speaker that reproduces the voice signal received by the communication control device as voice; and a monitor that outputs the acquired pseudo image data.

3. An apparatus for a communication system enabling communication by voice and image, wherein the apparatus for a communication system is configured as a communication server, the communication server comprising: a reception control device that receives a voice signal transmitted from a transmitting communication terminal; a voice recognition device that performs voice recognition on the voice signal received by the reception control device; a mapping device that collates the voice recognized by the voice recognition device with predetermined basic voices; a pseudo image database that stores pseudo image data corresponding to the predetermined basic voices; a pseudo image acquisition device that acquires, from the pseudo image database, the pseudo image data corresponding to the collated basic voice; and a transmission control device that transmits the voice signal received by the reception control device and the pseudo image data acquired by the pseudo image acquisition device to a receiving communication terminal.

4. The communication system device according to claim 1, wherein the pseudo image data is video information including a facial expression of a person uttering the basic voice.

5. The communication system device according to claim 1, wherein the pseudo image data is video information representing the mouth shape of a person uttering the basic voice.

6. The communication system device according to claim 4, wherein the pseudo image database includes a plurality of types of pseudo image data having different backgrounds for the same basic voice.

7. The communication system device according to claim 4, wherein the basic voice includes a combination of the five vowels of a predetermined language, and the pseudo image database includes at least pseudo image data corresponding to each of the vowels.

8. The communication system device according to claim 7, wherein the mapping device includes a function of collating the voice recognized by the voice recognition device with the vowel of that voice.

9. The communication system device according to claim 4, wherein the basic voice includes the five vowels and the "n" sound of Japanese, and the pseudo image database includes at least pseudo image data corresponding to each of the vowels and the "n" sound.

10. The communication system device according to claim 9, wherein the mapping device includes a function of collating the voice recognized by the voice recognition device with the vowel of that voice when the recognized voice is other than the "n" sound, and with the "n" sound as it is when the recognized voice is the "n" sound.

11. A communication system capable of voice and image communication, wherein a specific communication device among a plurality of communication devices involved in communication from a transmitting-side communication terminal to a receiving-side communication terminal is the communication system device according to any one of claims 1 to 10.

12. The communication system according to claim 11, wherein the specific communication device is a communication server that relays communication between the transmitting-side communication terminal and the receiving-side communication terminal.
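
For illustration only, the collation of a recognized voice with a basic voice (the five Japanese vowels plus the "n" sound) and the subsequent lookup of pseudo image data might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes the voice recognition device has already produced romanized syllables, and the names `BASIC_VOICES`, `PSEUDO_IMAGE_DB`, `collate`, `acquire_pseudo_images`, and `relay`, as well as the image file names, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical basic voices: the five Japanese vowels plus the "n" sound.
BASIC_VOICES: Tuple[str, ...] = ("a", "i", "u", "e", "o", "n")

# Hypothetical pseudo image database: one mouth-shape image per basic voice.
PSEUDO_IMAGE_DB: Dict[str, str] = {v: "mouth_%s.png" % v for v in BASIC_VOICES}


def collate(syllable: str) -> str:
    """Mapping device sketch: reduce a recognized syllable to a basic voice.

    Assumes the syllable is given in romanized form (e.g. "ka", "shi", "n").
    The "n" sound is collated as it is; any other syllable is collated
    with its trailing vowel.
    """
    s = syllable.strip().lower()
    if s == "n":
        return "n"
    if s and s[-1] in "aiueo":
        return s[-1]
    raise ValueError("cannot collate syllable: %r" % syllable)


def acquire_pseudo_images(syllables: List[str]) -> List[str]:
    """Pseudo image acquiring device sketch: look up one image per syllable."""
    return [PSEUDO_IMAGE_DB[collate(s)] for s in syllables]


def relay(voice_signal: bytes, syllables: List[str]) -> Tuple[bytes, List[str]]:
    """Server-side sketch: pair the unchanged voice signal with the images.

    In the claimed server the pair would be transmitted to the
    receiving-side terminal; here it is simply returned.
    """
    return voice_signal, acquire_pseudo_images(syllables)


if __name__ == "__main__":
    signal, images = relay(b"\x00\x01", ["ko", "n", "ni", "chi", "wa"])
    print(images)
```

In this sketch the voice signal passes through unchanged and only the image sequence is derived from the recognized voice, so the receiving side needs a monitor but neither side needs a camera. Variants with different backgrounds for the same basic voice could be modeled by keying the database on a (basic voice, background) pair, which is likewise an assumption of this sketch.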