JP2001145103A

JP2001145103A - Transmission device and communication system

Info

Publication number: JP2001145103A
Application number: JP32763599A
Authority: JP
Inventor: Yoichi Yamada (山田陽一)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1999-11-18
Filing date: 1999-11-18
Publication date: 2001-05-25

Abstract

PROBLEM TO BE SOLVED: To improve the reliability of a multimedia communication system. SOLUTION: The transmission device comprises: a band recognizing means that recognizes the amount of bandwidth the network allows for the transmission according to its congestion state; a video encoding means that generates video information by receiving and encoding a video signal; an audio data encoding means that generates audio data information corresponding to the audio information by receiving and encoding an audio signal; an audio text encoding means that generates audio text information corresponding to the audio information by converting the input audio signal into a text signal through speech recognition and encoding that text; and a transmission information selecting means that sends the video information and audio data information as the transmission information when the allowed bandwidth is large, and sends the audio text information when it is small.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a communication system and is applicable, for example, when multimedia data such as video and audio is communicated over a network.

[0002] The present invention also relates to a transmission device serving as a component of such a communication system.

[0003]

2. Description of the Related Art

Conventional techniques for communicating video information and audio information over a network include those described in Japanese Patent Application Laid-Open No. 10-164533 (Document 1) and Japanese Patent Application Laid-Open No. 10-285275 (Document 2).

[0004] In the image communication method and apparatus of Document 1, multimedia data is communicated in real time over a network without bandwidth guarantees, such as an IP (Internet Protocol) network. When the network becomes congested and the usable bandwidth decreases, the optimum multimedia data is communicated within that limited allowable bandwidth.

[0005] In this communication, the usable bandwidth is estimated from the network's data loss rate and similar measures, and the application performs image-priority or audio-priority control based on a per-data priority setting that the user makes in advance (that is, whether audio is to be transmitted with priority).

[0006] Thus, for example, with audio priority, the audio data is allocated the bandwidth it requests, and the remaining bandwidth is allocated to the video data.

[0007] Document 1 also adjusts image parameters such as frame rate, image quality (compression parameters), and image size (resolution) so as to achieve the video quality requested by the user within the transmission capacity.

[0008] Next, in the telephone call method of Document 2, the transmitting side first converts voice data corresponding to the speaker's voice into character data by speech recognition, packetizes the character data, and sends it to the network. The receiving side, which receives the packets from the network, converts the character data extracted from the packets back into voice data by speech synthesis and outputs it as voice.

[0009] By this means, Document 2 aims to minimize voice degradation, unnatural breaks between words, sound dropouts, and the like caused by communication instability and load fluctuation, thereby enabling conversation that is both easy to understand and natural.

[0010]

Problems to be Solved by the Invention

However, the method of Document 1, which optimally sets the spatial resolution, frame rate, and compression parameters of an image, does not reflect the importance of the content of individual frames; it merely prescribes how a time series of many frames is handled as a whole.

[0011] Document 1 can perform audio-priority or image-priority processing, but here too, with regard to images for example, the processing does not reflect the importance of each frame; the only question is whether audio is given priority over images as a whole.

[0012] Such a method may cope when network congestion is low, but it is expected to have difficulty as congestion increases, for example in a situation where the requested bandwidth cannot be allocated even to video for which video priority has been set.

[0013] If the requested bandwidth cannot be allocated, then even for high-priority video information there is a high likelihood that the packet loss ratio will rise during transmission through the network and normal transmission will not be possible.

[0014] In such a situation, because the allowable bandwidth is extremely small, transmitting a packet from the transmitting communication device may itself act as a factor that raises the loss ratio of subsequent packets sent from that same device. The problem is therefore how to make efficient use of an extremely small allowable bandwidth.

[0015] Next, in Document 2, audio information is transmitted by converting voice data into character data through speech recognition, packing the character data into packets, and sending them over the network; however, the relationship between the transmission of video information and the transmission of this audio information is not made clear. Document 2 merely states that video data such as the speaker's face may be transferred, and that video data may be sent to the other party together with the voice.

[0016] The problem, therefore, is how to relate the transmission of video information to the transmission of audio information.

[0017]

Means for Solving the Problems

To solve these problems, a first invention provides a transmission device that takes video information and/or audio information as transmission information and transmits that transmission information to a receiving device via a network, the transmission device comprising: (1) band recognizing means for recognizing the amount of allowable bandwidth that the network allows for the transmission according to its congestion state; (2) video encoding means for generating the video information by receiving and encoding a video signal; (3) audio data encoding means for generating audio data information corresponding to the audio information by receiving and encoding an audio signal; (4) audio text encoding means for generating audio text information corresponding to the audio information by converting the input audio signal into a text signal through speech recognition and encoding the text signal; and (5) transmission information selecting means for transmitting the video information and the audio data information as the transmission information when the allowable bandwidth is large, and transmitting the audio text information as the transmission information when the allowable bandwidth is small.

[0018] The second and third inventions each provide a transmission device that forms at least video information into transmission information and transmits that transmission information to a receiving device via a network, and are characterized by the following elements.

[0019] In the second invention, the transmission device comprises: (1) band recognizing means for recognizing the amount of allowable bandwidth that the network allows for the transmission according to its congestion state; (2) important screen selecting means for, when the video information is a moving image, judging that among the time-series screens (frames) making up the moving image, a screen with a larger amount of image change has higher importance, and selecting the screens of high importance; and (3) video information selecting means for transmitting all the time-series screens as the transmission information when the allowable bandwidth is large, and transmitting only the screens selected by the important screen selecting means when the allowable bandwidth is small.
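The importance rule of the second invention (a screen that changes more from its predecessor is more important) can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: frames are assumed to be equal-sized flat lists of grayscale pixel values, the change amount is assumed to be the sum of absolute pixel differences from the previous frame, and the names `select_important_frames` and `keep_ratio`, as well as always keeping the first frame, are hypothetical choices.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical: flattened grayscale pixel values


def change_amount(prev: Frame, cur: Frame) -> int:
    """Sum of absolute pixel differences between consecutive frames (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(prev, cur))


def select_important_frames(frames: List[Frame], band_is_large: bool,
                            keep_ratio: float = 0.5) -> List[int]:
    """Return the indices of the frames to transmit.

    Large allowable band: send every frame.
    Small allowable band: send only the frames with the largest change amount.
    """
    if band_is_large or len(frames) <= 1:
        return list(range(len(frames)))
    # The first frame has no predecessor; treat it as most important (assumption).
    scores = [float("inf")] + [change_amount(frames[i - 1], frames[i])
                               for i in range(1, len(frames))]
    n_keep = max(1, int(len(frames) * keep_ratio))
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])  # restore the original time order
```

With four frames and a small band, the default `keep_ratio` keeps the first frame plus the single frame that differs most from its predecessor.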

[0020] In the third invention, the transmission device comprises: (1) band recognizing means for recognizing the amount of allowable bandwidth that the network allows for the transmission according to its congestion state; (2) photographing means for photographing a subject to generate time-series captured screens; (3) captured-screen encoding means for encoding the captured screens generated by the photographing means to obtain the time-series screen information constituting the video information; (4) screen information amount detecting means for detecting the information amount of each piece of screen information; and (5) photographing control means for, when the allowable bandwidth is small, changing the optical or spatial relationship between the photographing means and the subject while referring to the information amount of the screen information, thereby causing the photographing means to generate captured screens whose screen information has a small information amount.

[0021] Further, a fourth invention provides a communication system including a receiving device that receives, as reception information via a network, at least the video information transmitted by a transmission device, wherein the receiving device comprises: (1) band-related information acquiring means for obtaining, according to the content of the reception information actually received, band-related information relating to the amount of allowable bandwidth that the network allows for transmission of the video information; and (2) control information transmitting means for transmitting control information including the band-related information to the transmission device so that the transmission device can recognize the amount of allowable bandwidth that the network allows for the transmission according to its congestion state; and (3) the transmission device is the transmission device according to any one of claims 1 to 3.

[0022]

BEST MODE FOR CARRYING OUT THE INVENTION

(A) Embodiments

First to third embodiments of the transmission device and communication system of the present invention are described below.

[0023] In the first embodiment, the type of media used for communication is changed to suit the allowable bandwidth that can be secured.

[0024]

(A-1) Configuration of the First Embodiment

FIG. 1 shows the configuration of the multimedia communication system 10 of this embodiment. The multimedia transmitting unit 101 may be part of a general-purpose communication terminal such as a personal computer, or part of a dedicated communication device. The same applies to the multimedia receiving unit 120, which communicates with the transmitting unit 101 via the IP network 11.

[0025] The video input signal VS and the audio input signal AS supplied to the transmitting unit 101 may be closely related video and audio signals that together form a moving picture with sound, such as movie software, or they may be entirely independent, unrelated video and audio signals that merely happen to be transmitted simultaneously by the same transmitting unit 101.

[0026] In FIG. 1, the multimedia transmitting unit 101 comprises a video encoding unit 102, an audio encoding unit 103, a speech-to-text conversion unit 104, and a multiplexing control unit 105.

[0027] The video encoding unit 102 receives the video input signal VS, encodes it so as to compress the data amount, and outputs the result to the multiplexing control unit 105 as video encoded data CV. The video encoding method is not particularly limited, and MPEG or the like can be used; in this embodiment, JPEG is used as an example.

[0028] The audio encoding unit 103 receives the audio input signal AS, encodes it so as to compress the data amount, and outputs the result to the multiplexing control unit 105 as audio encoded data CA. The audio encoding method is not particularly limited; in this embodiment, the ITU-T standard G.723.1 is used as an example. With this encoding method, the audio input signal AS can be compressed to a data rate of 5 to 6 kbps.

[0029] The speech-to-text conversion unit 104 receives the audio input signal AS, performs recognition processing on the input phoneme sequence, converts it into a text signal, and outputs the result to the multiplexing control unit 105 as text conversion data CT.

[0030] The recognition method is not particularly limited. In this embodiment, as an example, standard frequency spectrum information for each phoneme to be recognized and for each syllable transition portion is prepared in advance; the frequency spectrum of each frame of the audio input signal AS is calculated, its correlation with the prepared standard frequency spectra is computed, and a phoneme sequence giving a correlation at or above a predetermined threshold is taken as the recognition result.
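The template-matching recognizer described above can be sketched as follows. This is a toy illustration, not a practical speech recognizer: it assumes a naive O(N²) DFT for the per-frame spectrum, Pearson correlation as the correlation measure, templates keyed by hypothetical labels, and a hypothetical default threshold of 0.8. It emits one label per frame and does not model syllable-transition templates separately.

```python
import cmath
import math
from typing import Dict, List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitude for bins 0..N/2 (adequate for a sketch, O(N^2))."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length spectra (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0


def recognize(frames: List[Sequence[float]],
              templates: Dict[str, Sequence[float]],
              threshold: float = 0.8) -> List[str]:
    """For each frame, emit the best-matching template label if its correlation
    with the frame's spectrum reaches the threshold; otherwise skip the frame."""
    result = []
    for frame in frames:
        spec = magnitude_spectrum(frame)
        label, score = max(((name, correlation(spec, tmpl))
                            for name, tmpl in templates.items()),
                           key=lambda item: item[1])
        if score >= threshold:
            result.append(label)
    return result
```

A silent frame has a flat spectrum, so it correlates with nothing and produces no output.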

[0031] When the audio signal AS is converted into a text signal, assuming for example eight syllables per second and an 8-bit character code per syllable, the data amount is 64 bits per second. Even if an equal amount of control information such as accent and intonation is added, the rate remains at or below 0.5 kbps, which is an order of magnitude less than with the G.723.1 encoding described above.
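The arithmetic of the preceding paragraph can be checked in a few lines. The figures (8 syllables/s, 8 bits per syllable, control information of equal size, the 0.5 kbps bound, and the 5 kbps lower end of the G.723.1 rate) come from the text; the helper name `text_rate_bps` and the constant names are illustrative only.

```python
# Figures from paragraphs [0028] and [0031]; names are illustrative.
def text_rate_bps(syllables_per_s: int = 8, bits_per_syllable: int = 8,
                  control_factor: int = 2) -> int:
    """Bit rate of recognized text; control_factor=2 models accent/intonation
    information of the same size as the character codes."""
    return syllables_per_s * bits_per_syllable * control_factor


G7231_MIN_BPS = 5_000        # lower end of the 5-6 kbps quoted for G.723.1
TEXT_UPPER_BOUND_BPS = 500   # the "0.5 kbps or less" bound

rate = text_rate_bps()
print(rate)                                  # 128
print(rate <= TEXT_UPPER_BOUND_BPS)          # True
print(G7231_MIN_BPS / TEXT_UPPER_BOUND_BPS)  # 10.0
```

Even the 0.5 kbps upper bound is a tenth of the lowest G.723.1 rate, which is the "order of magnitude" claimed; the example rate of 128 bit/s leaves a wide margin below that bound.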

[0032] Next, as shown in FIG. 4, the multiplexing control unit 105 comprises a selector 12, a comparison/determination unit 13, a transmission data amount storage unit 14, and a header control unit 15 (referred to as the header generation unit 15 in the following paragraphs).

[0033] The selector 12 switches its selection among three input data streams according to a selector control signal SE supplied from the comparison/determination unit 13. The three inputs are the video encoded data CV from the video encoding unit 102, the audio encoded data CA from the audio encoding unit 103, and the text conversion data CT from the speech-to-text conversion unit 104.

[0034] The comparison/determination unit 13 receives the transmission control signal SC from the multimedia receiving unit 120 via the network 11. It recognizes, in real time, the size of the bandwidth usable in the network 11 (the allowable bandwidth), for example by comparing the transmitted data amount (or transmission data rate) of past transmission encoded data CD held in the transmission data amount storage unit 14 with the data amount actually received on the receiving side (the receiving unit 120), or the received data rate, packet loss ratio, and so on, as reported by the transmission control signal SC.

[0035] That is, the transmission control signal SC carries information about the allowable bandwidth of the communication network 11, consisting, for example, of the packet loss ratio and the encoded data rate that the receiving unit 120 was actually able to receive.

[0036] In the processing of the comparison/determination unit 13, the lower the data rate actually received by the receiving unit 120 relative to the data rate sent by the transmitting unit 101, or the smaller the received data amount relative to the transmitted data amount (that is, the larger the packet loss ratio), the higher the congestion of the network 11 is judged to be, and the smaller the allowable bandwidth.
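The comparison made by the comparison/determination unit 13 can be sketched as below. This is a minimal sketch under assumptions: `ReceiverReport` is a hypothetical stand-in for the contents of the transmission control signal SC, and taking the allowable bandwidth to be no more than the rate that actually got through is an assumed estimation rule. The text itself states only the monotonic relationship (more loss, or a lower received rate, means a smaller allowable bandwidth).

```python
from dataclasses import dataclass


@dataclass
class ReceiverReport:
    """Hypothetical contents of the transmission control signal SC."""
    received_bytes: int
    received_rate_bps: float


def loss_ratio(sent_bytes: int, report: ReceiverReport) -> float:
    """Fraction of the sent data that did not arrive (0.0 when nothing was sent)."""
    if sent_bytes <= 0:
        return 0.0
    return max(0.0, 1.0 - report.received_bytes / sent_bytes)


def estimate_allowable_bps(sent_rate_bps: float, report: ReceiverReport) -> float:
    """Assumed rule: the network currently allows no more than what actually arrived."""
    return min(sent_rate_bps, report.received_rate_bps)
```

For example, if 1000 bytes were sent at 64 kbps and the receiver reports 750 bytes at 48 kbps, the loss ratio is 0.25 and the allowable bandwidth is estimated at 48 kbps.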

[0037] When the allowable bandwidth is judged too small to transmit the video encoded data CV, the comparison/determination unit 13 stops outputting the video encoded data CV and the audio encoded data CA as transmission encoded data CD, and instead outputs the text conversion data CT as transmission encoded data CD.

[0038] In general, when the packet loss ratio exceeds 20%, the video encoded data CV cannot be output to the communication path, and packets of the audio encoded data CA are also likely to be lost sporadically. In this case, sound dropouts occur on the receiving unit 120 side in the audio the user is listening to, causing the user discomfort.

[0039] The selector control signal SE therefore causes the selector 12 to select the text conversion data CT. As described above, for an audio input signal AS of the same content, the data amount of the text conversion data CT is, for example, about one tenth of that of the audio encoded data CA.

[0040] In other words, converting the input audio signal AS into text data CT by speech recognition can be regarded as raising the compression ratio until the data amount is about one tenth of that produced by standard speech coding.

[0041] When the packet loss ratio exceeds 20%, the allowable bandwidth is extremely small, so sending a packet from the transmitting unit 101 may itself act as a factor that raises the loss ratio of subsequent packets from the same unit. To maintain communication quality, it is therefore effective to select as the transmission encoded data CD the text conversion data CT, which has a small data amount and uses little bandwidth.
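The selection rule of paragraphs [0037] to [0041] reduces to a two-way decision on the packet loss ratio. This is a minimal sketch assuming exactly two modes and using the 20% figure from the text as a hard threshold; the `Media` enum and `select_media` function are hypothetical names, and the header output from the header generation unit 15 is not modeled.

```python
from enum import Enum
from typing import List


class Media(Enum):
    VIDEO = "CV"   # video encoded data
    AUDIO = "CA"   # G.723.1 audio encoded data
    TEXT = "CT"    # speech-to-text conversion data


LOSS_THRESHOLD = 0.20  # "packet loss ratio exceeds 20%"


def select_media(loss_ratio: float) -> List[Media]:
    """Streams the selector 12 should pass through as transmission encoded data CD."""
    if loss_ratio > LOSS_THRESHOLD:
        return [Media.TEXT]            # band too small: text only
    return [Media.VIDEO, Media.AUDIO]  # band large enough: video and audio
```

A real controller would likely add hysteresis so that the mode does not flap when the loss ratio hovers near 20%; the embodiment does not describe this.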

[0042] The header generation unit 15 provided in the multiplexing control unit 105 generates and outputs the headers of the packets sent from the multiplexing control unit 105. Accordingly, the header portion of a packet is sent to the network 11 while the selector control signal SE causes the selector 12 to select the header generation unit 15.

[0043] That is, a packet header is output while the selector 12 is selecting the header generation unit 15, and the data portion of the packet that follows the header is output while the selector is selecting the video encoding unit 102, the audio encoding unit 103, or the speech-to-text conversion unit 104.

[0044] In this packetization, the transmission encoded data CD is packetized in suitable units, for example frame units.

[0045] The header of each packet generated by the header generation unit 15 carries control information such as a packet number indicating the output order among packets, media identification information, and the packet length.
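A packet header carrying the three fields named above can be sketched with Python's `struct` module. The patent names the fields but not their sizes, order, or encoding, so this layout is an assumption: a 32-bit packet number, an 8-bit media ID, and a 16-bit payload length, in network byte order. The media ID values and the function names are likewise hypothetical.

```python
import struct

# Hypothetical layout: packet number (uint32), media ID (uint8),
# payload length (uint16), big-endian with no padding (7 bytes in total).
HEADER = struct.Struct("!IBH")
MEDIA_IDS = {"CV": 1, "CA": 2, "CT": 3}  # hypothetical media identification values


def build_packet(packet_number: int, media: str, payload: bytes) -> bytes:
    """Prepend a header, as the header generation unit 15 would, to one payload unit."""
    return HEADER.pack(packet_number, MEDIA_IDS[media], len(payload)) + payload


def parse_packet(packet: bytes):
    """Split a packet back into (packet_number, media_id, payload)."""
    number, media_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return number, media_id, payload
```

For instance, `build_packet(7, "CT", b"konnichiwa")` yields a 17-byte packet whose parsed form is `(7, 3, b"konnichiwa")`.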

[0046] Meanwhile, the multimedia receiving unit 120, whose functions and structure correspond to those of the multimedia transmitting unit 101, comprises a separation control unit 121, a video decoding unit 122, an audio decoding unit 123, and a text-to-speech conversion unit 124.

[0047] As shown in FIG. 5, the separation control unit 121 comprises an identification/distribution unit 21, a transmission control signal sending unit 22, and a received data amount detection unit 23.

[0048] The identification/distribution unit 21 receives, via the network 11, packets containing the transmission encoded data CD, disassembles them, and uses the media identification information in each packet header to identify whether the packet's data portion contains video (video encoded data CV), audio (audio encoded data CA), or text (text conversion data CT).

[0049] According to the identification result, the identification/distribution unit 21 supplies the video encoded data CV to the video decoding unit 122, the audio encoded data CA to the audio decoding unit 123, and the text conversion data CT to the text-to-speech conversion unit 124.
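The routing performed by the identification/distribution unit 21 amounts to a lookup from media ID to decoder. This sketch assumes that packets have already been split into `(media_id, payload)` pairs, and that unknown IDs are dropped and counted; the ID values, the `distribute` function, and the handler mapping are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical media IDs; the patent says only that the header identifies the media.
VIDEO, AUDIO, TEXT = 1, 2, 3


def distribute(packets: List[Tuple[int, bytes]],
               handlers: Dict[int, Callable[[bytes], None]]) -> int:
    """Route each (media_id, payload) pair to its decoder.

    Returns the number of packets whose media ID had no registered decoder.
    """
    unknown = 0
    for media_id, payload in packets:
        handler = handlers.get(media_id)
        if handler is None:
            unknown += 1  # unrecognized media: drop the packet
        else:
            handler(payload)
    return unknown
```

In a receiver, the handlers would be the entry points of the video decoding unit 122, the audio decoding unit 123, and the text-to-speech conversion unit 124.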

[0050] The transmission encoded data CD actually received by the multimedia receiving unit 120 constitutes the received data. The received data amount and received data rate of this data are detected by the received data amount detection unit 23.

[0051] The received data amount detection unit 23 examines the header of each packet that carries transmission encoded data CD in its data portion, calculates the packet loss ratio from the packet numbers, and calculates the received data rate from the packet lengths.
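The two calculations of the received data amount detection unit 23 can be sketched as below. This is a minimal sketch assuming that packet numbers increase by one per packet and do not wrap within a measurement interval, and that the interval length is known. Gaps are detected only between the lowest and highest numbers seen, so losses at the edges of an interval go uncounted. The `receive_stats` name is hypothetical.

```python
from typing import List, Tuple


def receive_stats(packets: List[Tuple[int, int]],
                  interval_s: float) -> Tuple[float, float]:
    """Return (packet_loss_ratio, received_bps) from the
    (packet_number, packet_length_bytes) pairs seen in one interval."""
    if not packets:
        return 0.0, 0.0
    unique = dict(packets)  # packet number -> length; duplicates collapse
    expected = max(unique) - min(unique) + 1
    loss = 1.0 - len(unique) / expected
    rate_bps = sum(unique.values()) * 8 / interval_s
    return loss, rate_bps
```

With packets 10, 11, 13, and 14 (500 bytes each) received in one second, packet 12 is counted as lost, giving a loss ratio of 0.2 and a received rate of 16 kbit/s.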

[0052] The transmission control signal sending unit 22 sends the transmission control signal SC. To do so, it takes the packet loss ratio, received data rate, received data amount, and similar values calculated and detected by the received data amount detection unit 23, and constructs the transmission control signal SC from them.

[0053] Just as the transmission encoded data CD is packetized in the transmitting unit 101, the transmission control signal SC is packetized by a packet assembly unit (not shown) and sent to the network 11.

[0054] The video decoding unit 122, which receives the video encoded data CV from the identification/distribution unit 21, decodes it using a method corresponding to the encoding method used by the video encoding unit 102 in the transmitting unit 101, and outputs a video output signal VS. Because the video encoding unit 102 uses JPEG in this embodiment, the video decoding unit 122 reproduces the moving picture using Motion-JPEG.

[0055] In this embodiment, even if the supply of video encoded data CV contained in the transmission encoded data CD is interrupted, output of the video output signal VS continues, based on the video encoded data CV received immediately before the interruption.

[0056] When the image in the frame is still, or its motion is slow or slight, such an interruption of the video encoded data CV rarely causes the viewing user discomfort. Even when an interruption occurs while the image is moving substantially, displaying the immediately preceding image is considered better than displaying nothing.

[0057] The audio decoding unit 123, which receives the audio encoded data CA from the identification/distribution unit 21, likewise decodes it using a method corresponding to the ITU-T G.723.1 standard used by the audio encoding unit 103 of the transmitting unit 101, and outputs an audio output signal AS.

[0058] In this embodiment, the audio signal AS is output only while the audio encoded data CA contained in the transmission encoded data CD is being input. Consequently, if input of the audio encoded data CA is interrupted, a sound dropout occurs on the receiving unit 120 side.

[0059] The reason is that repeated display of the same image usually causes the user little discomfort, whereas repeated output of the same sound is likely to cause great discomfort. Therefore, in this embodiment, when data reception is interrupted, the immediately preceding image remains on display while audio output stops.
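The asymmetric handling of interruptions (hold the last image, silence the audio) can be expressed as a tiny per-tick state machine. This is a sketch under assumptions: output is driven by a hypothetical periodic `tick`, a missing frame or audio chunk is represented as `None`, silence is represented as empty bytes, and the `ReceiverOutput` class is an illustrative name.

```python
from typing import Optional, Tuple


class ReceiverOutput:
    """Per output tick: video repeats the last decoded frame; audio goes silent."""

    def __init__(self) -> None:
        self.last_frame: Optional[bytes] = None

    def tick(self, frame: Optional[bytes],
             audio: Optional[bytes]) -> Tuple[Optional[bytes], bytes]:
        """frame/audio are the data decoded for this tick, or None if nothing arrived."""
        if frame is not None:
            self.last_frame = frame
        video_out = self.last_frame  # hold the previous image during an interruption
        audio_out = audio if audio is not None else b""  # never repeat audio: silence
        return video_out, audio_out
```

Keeping only the last frame as state is enough here, because audio is never replayed.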

[0060] Finally, on receiving the text conversion data CT from the identification/distribution unit 21, the text-to-speech conversion unit 124 converts it into voice data and outputs an audio output signal AS. Speech synthesis technology can be used for this conversion from text conversion data CT to the audio signal AS.

[0061] In this embodiment, the audio signal AS is output only while the text conversion data CT contained in the transmission coded data CD is being input.

[0062] If necessary, the text conversion data CT may be shown as text on a CRT or similar screen instead of being converted into the audio signal AS. Alternatively, both the conversion into the audio signal AS and the screen display may be performed at the same time.

[0063] When the text is displayed on the screen and the supply of the text conversion data CT in the transmission coded data CD is interrupted, the screen content shown immediately before the interruption should preferably remain displayed.

[0064] (A-2) Overall Operation of the First Embodiment
When the allowable bandwidth of the IP network 11 for the transmitting unit 101 and the receiving unit 120 is sufficiently large, the comparison/determination unit 13 issues the selector control signal SE. This signal makes the selector choose the output of the header generation unit 15 together with both the video coded data CV and the audio coded data CA, in an appropriate time allocation.

[0065] As a result, the video coded data CV and the audio coded data CA are both selected as the transmission coded data CD in an appropriate time allocation, and the header generation unit 15 is selected for each packet. The time allocation is determined by the data rate of the video coded data CV and the data rate of the audio coded data CA.
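The patent says only that the time allocation follows the two data rates. One plausible reading is sketched below: each stream gets selector time in proportion to its rate, and the packets are interleaved with a smooth weighted round-robin. Both the proportional rule and the round-robin scheduler are assumptions of this sketch rather than the patent's stated method.

```python
def time_allocation(rates: dict[str, float]) -> dict[str, float]:
    """Fraction of selector time given to each stream, proportional to its data rate."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


def schedule(rates: dict[str, float], n_slots: int) -> list[str]:
    """Smooth weighted round-robin over packet slots.

    In every slot each stream earns credit equal to its share. The stream with the
    most credit sends the next packet and gives back one slot's worth of credit.
    """
    shares = time_allocation(rates)
    credit = {name: 0.0 for name in rates}
    order = []
    for _ in range(n_slots):
        for name in credit:
            credit[name] += shares[name]
        pick = max(credit, key=credit.get)
        credit[pick] -= 1.0
        order.append(pick)
    return order


order = schedule({"CV": 3, "CA": 1}, 8)
print(order.count("CV"), order.count("CA"))   # 6 2
```

Interleaving keeps audio packets spread evenly among the far more numerous video packets instead of bunching them together.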

[0066] Suppose, for example, that the picture of a movie title is used as the video input signal VS and its soundtrack as the audio input signal AS. The video then has a far higher data rate, so the video coded data CV is normally selected most of the time.

[0067] In this state the receiving unit 120 outputs both the video output signal VS and the audio output signal AS without problems, and a user watching the movie can continue viewing comfortably.

[0068] However, when congestion on the network 11 increases and the allowable bandwidth becomes extremely small, the packet loss ratio becomes very high for both the video coded data CV and the audio coded data CA. The same image then remains on the screen, and the audio tends to skip frequently.

[0069] At this point, the comparison/determination unit 13 switches to selecting the speech-to-text conversion unit 104, based on the content of the received transmission control signal SC. The text conversion data CT output by the speech-to-text conversion unit 104 has a much lower data rate than even the audio coded data CA. Communication with almost no packet loss is therefore possible even when the allowable bandwidth is extremely small.
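The switching rule of [0064] and [0069] can be sketched as a single decision function. The patent describes only two regimes: "sufficiently large" selects CV and CA, and "extremely small" selects CT. The `headroom` factor and the example rates below are assumptions, and `select_streams` is a hypothetical name.

```python
# Hypothetical sketch of comparison/determination unit 13 driving selector 12.
# The patent specifies only the two regimes; the headroom factor is an assumption.

def select_streams(allowed_bps: float, rate_cv: float, rate_ca: float,
                   headroom: float = 1.2) -> list[str]:
    """Choose which coded streams the selector should pass, given the estimated band."""
    if allowed_bps >= headroom * (rate_cv + rate_ca):
        return ["CV", "CA"]   # band is ample: send video and coded audio
    return ["CT"]             # band is very small: send speech-to-text data only


# Assumed example rates: 384 kbit/s video, 6.3 kbit/s G.723.1 audio.
print(select_streams(1_000_000, 384_000, 6_300))   # ['CV', 'CA']
print(select_streams(20_000, 384_000, 6_300))      # ['CT']
```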

[0070] As described above, the data rate of the text conversion data CT can be reduced to about one tenth of that of the audio coded data CA. Here, however, the larger effect is considered to come from stopping the video coded data CV, whose data rate is far higher than that of the audio coded data CA. Compared with the combined data rate of the video coded data CV and the audio coded data CA, the data rate of the text conversion data CT is extremely small. It can be set far below one part in several hundred.
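The ratios in [0070] can be checked with a rough calculation. G.723.1 runs at 6.3 kbit/s (or 5.3 kbit/s), and the text rate is taken as one tenth of the audio rate, as stated above. The patent gives no video rate, so the 384 kbit/s figure is purely an assumption. `rate_ratios` is a hypothetical helper.

```python
def rate_ratios(rate_cv: float, rate_ca: float, text_fraction: float = 0.1):
    """Return (CT/CA, CT/(CV+CA)) when CT runs at text_fraction of the audio rate."""
    rate_ct = rate_ca * text_fraction
    return rate_ct / rate_ca, rate_ct / (rate_cv + rate_ca)


G7231_BPS = 6_300            # G.723.1 higher-rate mode
ASSUMED_VIDEO_BPS = 384_000  # assumption, not a figure from the patent

vs_audio, vs_av = rate_ratios(ASSUMED_VIDEO_BPS, G7231_BPS)
print(f"CT/CA = 1/{1 / vs_audio:.0f}, CT/(CV+CA) = 1/{1 / vs_av:.0f}")
# prints: CT/CA = 1/10, CT/(CV+CA) = 1/620
```

Under these assumptions, dropping the video accounts for a reduction of roughly a factor of 60, far more than the factor of 10 gained by replacing coded audio with text.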

[0071] Even when the allowable bandwidth is this small, the user on the receiving unit 120 side can keep hearing uninterrupted synthesized speech from the audio signal AS output by the text-to-speech conversion unit 124. Assuming almost no packet loss, the meaning conveyed by the synthesized speech progresses smoothly with the movie or other content, without interruption or repetition.

[0072] (A-3) Effects of the First Embodiment
As described above, this embodiment makes relatively comfortable communication of multimedia information possible even on a congested network, such as the Internet during daytime hours.

[0073] In this embodiment, the speech recognition rate of the speech-to-text conversion unit (104) in the transmitting unit does not necessarily reach 100%. With a recognition rate of 80% to 90%, however, the receiving side can still understand the content adequately. It is therefore possible to build a high-performance multimedia communication device that operates in a wide range of network environments.

[0074] (B) Second Embodiment
In this embodiment, frames of the input video with a large amount of scene change are preferentially compressed and encoded. This lowers the data rate of the transmission coded data and makes effective use of the limited allowable bandwidth.

[0075] (B-1) Configuration and Operation of the Second Embodiment
FIG. 2 shows the configuration of the multimedia communication system 30 of this embodiment. The multimedia transmitting unit 201 of the system 30 corresponds to the multimedia transmitting unit 101 of the first embodiment, and the multimedia receiving unit 220 corresponds to the multimedia receiving unit 120.

[0076] In FIG. 2, the multimedia transmitting unit 201 comprises a video encoding unit 202, an audio encoding unit 203, a multiplexing control unit 205, and a change amount detecting unit 206. The multimedia receiving unit 220 comprises a separation control unit 221, a video decoding unit 222, and an audio decoding unit 223.

[0077] Each part of this embodiment shown in FIG. 2 corresponds to the part of the first embodiment bearing the corresponding reference numeral.

[0078] The correspondences are as follows:
- the video encoding unit 202 corresponds to the video encoding unit 102;
- the audio encoding unit 203 corresponds to the audio encoding unit 103;
- the multiplexing control unit 205 corresponds to the multiplexing control unit 105;
- the separation control unit 221 corresponds to the separation control unit 121;
- the video decoding unit 222 corresponds to the video decoding unit 122;
- the audio decoding unit 223 corresponds to the audio decoding unit 123;
- the network 31 corresponds to the network 11.

[0079] The following pairs are functionally identical: the audio encoding units 203 and 103, the networks 31 and 11, the video decoding units 222 and 122, and the audio decoding units 223 and 123. In this embodiment as well, the video coding method can therefore be, for example, JPEG or MPEG, and the audio coding method can be, for example, ITU-T Recommendation G.723.1.

[0080] Many of the signals in this embodiment also correspond to signals of the first embodiment:
- the video coded data CV1 corresponds to CV;
- the audio coded data CA1 corresponds to CA;
- the transmission coded data CD1 corresponds to CD;
- the transmission control signal SC1 corresponds to SC;
- the video input (output) signal VS1 corresponds to VS;
- the audio input (output) signal AS1 corresponds to AS.

[0081] FIG. 6 shows the internal configuration of the multiplexing control unit 205. It comprises a transmission data amount storage unit 14, a header generation unit 15, a selector 32, and a comparison/determination unit 33.

[0082] The selector 32 corresponds to the selector 12, and the comparison/determination unit 33 corresponds to the comparison/determination unit 13. Their behavior differs, however:
- The selector 12 of the first embodiment switches according to the allowable bandwidth. When the bandwidth is large, it selects the video coded data CV or the audio coded data CA together with the header generation unit 15. When the bandwidth becomes small, it selects the text conversion data CT together with the header generation unit 15.
- The selector 32 of this embodiment switches only to packetize the data. Its switching does not change substantially when the allowable bandwidth varies.

[0083] Within the multiplexing control unit 205, the header generation unit 15 and the transmission data amount storage unit 14 work exactly like the corresponding units of the first embodiment. They therefore keep the same reference numerals.

[0084] The other components also have much in common functionally. Only the points in which this embodiment differs from the first embodiment are described below.

[0085] In substance, these differences are confined mainly to the parts related to the change amount detecting unit 206.

[0086] In FIG. 6, the comparison/determination unit 33 receives the transmission control signal SC1 from the multimedia receiving unit 220 via the network 31. Like the comparison/determination unit 13, it recognizes the size of the allowable bandwidth of the network 31 in real time. It does so, for example, by comparing two quantities:
- the transmitted amount (or transmission data rate) of past transmission coded data CD1, held in the transmission data amount storage unit 14;
- the amount actually received on the receiving side (receiving unit 220), as reported by SC1 (or the received data rate, packet loss ratio, and so on).
Unlike the unit 13, however, it supplies its output, the band information signal BS1, to the change amount detecting unit 206 rather than to the selector 32.
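The comparison in [0086] can be sketched as a comparison of byte counts over one reporting interval. The layout of the band information signal (`BandInfo`) and the loss formula 1 - received/sent are illustrative assumptions; the patent lists the loss ratio and received data rate only as examples of what may be compared.

```python
from dataclasses import dataclass


@dataclass
class BandInfo:
    """Hypothetical contents of band information signal BS1."""
    loss_ratio: float      # fraction of transmitted data that did not arrive
    received_bps: float    # throughput actually achieved at the receiver


def estimate_band(sent_bytes: int, received_bytes: int, interval_s: float) -> BandInfo:
    """Compare what storage unit 14 says was sent with what SC1 says was received."""
    if sent_bytes <= 0 or interval_s <= 0:
        return BandInfo(loss_ratio=0.0, received_bps=0.0)
    loss = max(0.0, 1.0 - received_bytes / sent_bytes)
    return BandInfo(loss_ratio=loss, received_bps=8.0 * received_bytes / interval_s)


info = estimate_band(sent_bytes=10_000, received_bytes=9_000, interval_s=1.0)
print(round(info.loss_ratio, 3), info.received_bps)   # 0.1 72000.0
```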

[0087] The band information signal BS1 carries information corresponding to the allowable-bandwidth information in the transmission control signal SC1, and it controls the operation of the change amount detecting unit 206. BS1 matters only when the allowable bandwidth is small and the transmission data rate must be reduced. When the allowable bandwidth is sufficient, the comparison/determination unit 33 may therefore omit sending BS1 altogether.

[0088] The change amount detecting unit 206 receives the band information signal BS1 and the same video input signal VS1 that is supplied to the video encoding unit 202. Based on VS1 and the content of BS1, it varies the coding control signal CC, which controls the operation of the video encoding unit 202.

[0089] The change amount detecting unit 206 operates as follows.
- When the packet loss ratio contained in the band information signal BS1 is below a predetermined value (1 to 2%), the amount of data sent onto the channel does not need to be reduced. The coding control signal CC stays on.
- When the packet loss ratio exceeds the predetermined value, the unit finds the frames of the video input signal VS1 that change greatly from the adjacent frame. It turns CC on only for those frames and keeps CC off for the frames around them.
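The gating rule of [0089] reduces to a small decision function. The patent gives the loss threshold only as "1 to 2%", so the value 0.02 is an assumption. How `change` is measured is left to the caller; one option is the histogram difference of [0090] to [0092].

```python
LOSS_THRESHOLD = 0.02   # assumption: the patent gives "1 to 2%"


def coding_control(loss_ratio: float, change: float, change_threshold: float) -> bool:
    """Return the state of coding control signal CC for one frame (True = encode)."""
    if loss_ratio < LOSS_THRESHOLD:
        return True                     # band is fine: encode every frame
    return change > change_threshold    # congested: encode only large-change frames


print(coding_control(0.005, change=0, change_threshold=100))    # True
print(coding_control(0.05, change=50, change_threshold=100))    # False
print(coding_control(0.05, change=150, change_threshold=100))   # True
```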

[0090] One way to detect the amount of change is to compute the density distribution (density histogram) of the pixels in each frame. A frame whose histogram differs from that of the immediately preceding frame by more than a predetermined threshold is judged to have a large change. A density histogram plots gray-level values on the horizontal axis and the frequency of each gray level (the number of pixels in the picture with that value) on the vertical axis. A change in the image generally shows up as a change in its density histogram, and the larger the histogram change, the larger the image change.

[0091] The number of gray levels may be, for example, 256 (8 bits).

[0092] The inter-frame difference of the density histograms is the sum, over all gray levels, of the absolute differences in occurrence counts between the two frames. A frame with a large inter-frame histogram difference has generally changed a lot from the adjacent (immediately preceding) frame. Such a frame carries much of the information that is useful to the user in the time-series video scene.

[0093] Put differently, consider the multimedia receiving unit 220 side. If the pictures barely change from frame to frame, receiving and decoding many frames of video coded data CV1 looks little different to the viewer from receiving one frame and keeping it on screen. If a frame that differs greatly from its predecessor cannot be received and decoded, however, the user is likely to feel great discomfort.

[0094] For example, when packet loss exceeds 10%, the IP network 31 cannot be expected to deliver all of the video coded data CV1 for correct reception and decoding in the first place. If every frame is transmitted, frames with large and small changes from their predecessors are lost to packet loss with equal probability. The small allowable bandwidth is then used much less efficiently, and the user's perceived satisfaction with the displayed images falls. If only frames with a large change are transmitted, only packets carrying the frames that truly matter to the user are sent. The transmission data rate then falls, packet loss itself decreases, and user satisfaction can be expected to rise.

[0095] The video encoding unit 202 receives the coding control signal CC from the change amount detecting unit 206. When CC is on, it encodes the frame and outputs video coded data CV1. When CC is off, it performs no encoding and outputs no video coded data CV1.

[0096] The predetermined threshold for the difference from the immediately preceding frame's density histogram can therefore be made proportional to the packet loss ratio. Then, when the packet loss ratio reported by the transmission control signal SC1 is high and the allowable bandwidth is small, the data rate of the video coded data CV1 falls. This lowers the data rate of the transmission coded data CD1 that the transmitting unit 201 sends to the network 31.
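The proportional threshold of [0096] can be sketched as threshold = gain * loss_ratio. The patent does not give a proportionality constant, so `gain` is a hypothetical tuning parameter. The example shows fewer frames passing as the reported loss rises.

```python
def adaptive_threshold(loss_ratio: float, gain: float) -> float:
    """Histogram-difference threshold proportional to the reported packet loss ratio."""
    return gain * loss_ratio


def frames_to_encode(diffs: list[int], loss_ratio: float, gain: float = 2000.0) -> list[int]:
    """Indices of frames whose histogram difference exceeds the adaptive threshold."""
    t = adaptive_threshold(loss_ratio, gain)
    return [i for i, d in enumerate(diffs) if d > t]


diffs = [5, 30, 120, 400]                # example inter-frame histogram differences
print(frames_to_encode(diffs, 0.001))    # [0, 1, 2, 3]  (threshold 2)
print(frames_to_encode(diffs, 0.02))     # [2, 3]        (threshold 40)
print(frames_to_encode(diffs, 0.10))     # [3]           (threshold 200)
```

A proportional rule needs only one parameter and degrades smoothly. A stepped or capped rule would work equally well as an illustration.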

[0097] In this embodiment, the separation control unit 221 in the multimedia receiving unit 220 may have the same internal configuration as the separation control unit 121 shown in FIG. 5. Since this embodiment does not use the text conversion data CT, the function of the identification/distribution unit 21 that identifies and distributes CT can be omitted.

[0098] (B) Effects of the Second Embodiment
As described above, when network congestion reduces the usable bandwidth, this embodiment does not simply thin out frames. It selects the frames that contain information important to the user and encodes them preferentially. This makes it possible to build a high-performance multimedia communication device that uses the allowed bandwidth effectively.

[0099] (C) Third Embodiment
In this embodiment, the camera's zoom, pan, and similar settings are controlled to lower the data rate of the transmission coded data and make effective use of the limited allowable bandwidth.

[0100] (C-1) Configuration and Operation of the Third Embodiment
FIG. 3 shows the configuration of the multimedia communication system 40 of this embodiment. The multimedia transmitting unit 301 of the system 40 corresponds to the multimedia transmitting unit 201 of the second embodiment, and the multimedia receiving unit 320 corresponds to the multimedia receiving unit 220.

[0101] In FIG. 3, the multimedia transmitting unit 301 comprises a video encoding unit 302, an audio encoding unit 303, a multiplexing control unit 305, a camera control unit 306, and a camera 307. The multimedia receiving unit 320 comprises a separation control unit 321, a video decoding unit 322, and an audio decoding unit 323.

[0102] Each part of this embodiment shown in FIG. 3 corresponds to the part of the second embodiment bearing the corresponding reference numeral.

[0103] The correspondences are as follows:
- the video encoding unit 302 corresponds to the video encoding unit 202;
- the audio encoding unit 303 corresponds to the audio encoding unit 203;
- the multiplexing control unit 305 corresponds to the multiplexing control unit 205;
- the separation control unit 321 corresponds to the separation control unit 221;
- the video decoding unit 322 corresponds to the video decoding unit 222;
- the audio decoding unit 323 corresponds to the audio decoding unit 223;
- the network 41 corresponds to the network 31.

[0104] The following pairs are functionally identical: the audio encoding units 303 and 203, the networks 41 and 31, the video decoding units 322 and 222, the audio decoding units 323 and 223, the multiplexing control units 305 and 205, and the separation control units 321 and 221. In this embodiment as well, the video coding method can therefore be, for example, JPEG or MPEG, and the audio coding method can be, for example, ITU-T Recommendation G.723.1.

[0105] Many of the signals in this embodiment also correspond to signals of the second embodiment:
- the video coded data CV2 corresponds to CV1;
- the audio coded data CA2 corresponds to CA1;
- the transmission coded data CD2 corresponds to CD1;
- the transmission control signal SC2 corresponds to SC1;
- the band information signal BS2 corresponds to BS1;
- the video input (output) signal VS2 corresponds to VS1;
- the audio input (output) signal AS2 corresponds to AS1.

[0106] As noted above, the multiplexing control unit 305 is functionally identical to the multiplexing control unit 205, so FIG. 6 also shows the internal configuration of the multiplexing control unit 305.

[0107] The other components also have much in common functionally. Only the points in which this embodiment differs from the second embodiment are described below.

[0108] In substance, these differences are confined mainly to the parts related to the camera control unit 306.

[0109] In the second embodiment, the video input signal VS1 could be a video signal obtained by playing back previously recorded images. In this embodiment, the video input signal VS2 is assumed to come from shooting a subject T with a video camera 307 for moving images.

[0110] The camera 307 responds to the camera control signal VC2 supplied from the camera control unit 306. It automatically adjusts its optical system and its own orientation, so that the control data for zoom, pan, tilt, focus, and similar settings can be changed.

[0111] The video encoding unit 302 receives the video input signal VS2 that the camera 307 produces by shooting the subject T, and encodes it. It sends the resulting video coded data CV2 (obtained, for example, frame by frame) to the multiplexing control unit 305. It also outputs the code amount COV of each frame to the camera control unit 306.

[0112] The camera control unit 306 receives the band information signal BS2 and the code amount COV, and outputs the camera control signal VC2 to the camera 307. As with BS1, BS2 need not be sent at all when the allowable bandwidth is sufficiently large.

[0113] The camera control unit 306 behaves as follows.
- When the packet loss ratio contained in the band information signal BS2 is below a predetermined value (1 to 2%), it judges that the video signal is being communicated stably and leaves the camera control signal VC2 unchanged.
- When the packet loss ratio exceeds the predetermined value, it looks for camera input conditions that further reduce the code amount COV. To do so, it changes control data such as zoom and rotation (pan and tilt), and with it the content of the camera control signal VC2.

[0114] This changes the conditions under which the camera 307 shoots the subject T, that is, the camera input conditions. For example, the zoom may be stepped through four settings and the pan through about four settings, and video is then captured under whichever condition gives the smallest code amount COV. Where a somewhat blurred image is acceptable, the focus point may also be shifted to blur the picture on purpose.
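The search in [0113] and [0114] can be sketched as an exhaustive pass over a 4 x 4 grid of zoom and pan steps. Measuring COV for each setting would require moving the camera and encoding a test frame. Here that step is abstracted as a caller-supplied function, `measure_cov`, which is hypothetical, and the 0.02 loss threshold is an assumed value within the patent's 1 to 2%.

```python
from itertools import product
from typing import Callable, Tuple

ZOOM_STEPS = range(4)   # four zoom settings, per the example in [0114]
PAN_STEPS = range(4)    # about four pan settings


def choose_camera_condition(loss_ratio: float,
                            current: Tuple[int, int],
                            measure_cov: Callable[[int, int], float],
                            loss_threshold: float = 0.02) -> Tuple[int, int]:
    """Return the (zoom, pan) step for camera control signal VC2."""
    if loss_ratio < loss_threshold:
        return current                       # stable: leave VC2 unchanged
    # Congested: try every combination and keep the one with the smallest COV.
    return min(product(ZOOM_STEPS, PAN_STEPS), key=lambda zp: measure_cov(*zp))


def fake_cov(zoom: int, pan: int) -> float:
    """Stand-in code-amount model for illustration only."""
    return 100 + (zoom - 2) ** 2 + (pan - 1) ** 2


print(choose_camera_condition(0.05, (0, 0), fake_cov))   # (2, 1)
print(choose_camera_condition(0.01, (0, 0), fake_cov))   # (0, 0)
```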

[0115] Shifting the focus point removes high-frequency components, which helps reduce the code amount COV. Pan, tilt, and similar operations also change the angle and brightness of the subject T as seen from the camera 307. Frames of the same subject T then differ in properties such as their density histograms, which makes it possible to obtain frames with a lower code amount COV.
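The defocus argument in [0115] can be illustrated with a crude proxy. A box blur stands in for defocus, and the total variation (the sum of absolute differences between neighboring pixels) stands in for high-frequency content. The sketch does not run a codec or measure COV. It shows only that smoothing a sharp pattern sharply reduces the pixel-to-pixel variation that encoders spend bits on.

```python
def total_variation(row: list[int]) -> int:
    """Sum of absolute differences between neighboring pixels (high-frequency proxy)."""
    return sum(abs(b - a) for a, b in zip(row, row[1:]))


def box_blur(row: list[int], width: int = 3) -> list[int]:
    """Moving-average filter as a stand-in for defocus (valid region only)."""
    return [sum(row[i:i + width]) // width for i in range(len(row) - width + 1)]


sharp = [0, 255] * 8              # a maximally detailed 1-D stripe pattern
blurred = box_blur(sharp)
print(total_variation(sharp), total_variation(blurred))   # 3825 1105
```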

[0116] When the code amount COV of the frames making up the video coded data CV2 decreases, the data rate of the transmission coded data CD2 decreases, and the packet loss ratio can be reduced.

[0117] In this embodiment, the transmitting unit 301 controls the camera 307. A user on the multimedia receiving unit 320 side, watching the picture from the video output signal VS2, will therefore see the position, size, or focus of the subject T change regardless of the user's own wishes. The advantage is that the user is more likely to be able to keep watching the subject T even when the allowable bandwidth is small.

[0118] (C) Effects of the Third Embodiment
This embodiment suits systems, such as video conferencing applications, in which changing the camera input conditions somewhat is unlikely to lower user satisfaction much. In such systems it finds optimal input conditions without increasing the system scale. Even when the allowable bandwidth is small, the user can keep watching the subject without loss of real-time performance.

[0119] (D) Other Embodiments
In the first embodiment, neither the video coded data CV nor the audio coded data CA was selected once the allowable bandwidth became extremely small. A certain effect can also be expected from a variant that keeps selecting the video coded data CV and replaces only the audio coded data CA with the text conversion data CT. In that case the audio data rate still falls.

[0120] A multimedia transmitting unit may also combine the audio encoding unit 103 and the speech-to-text conversion unit 104 of the first embodiment with the video encoding unit 202 and the change amount detecting unit 206 of the second embodiment.

[0121] When the allowable band is sufficiently large, this multimedia transmitting unit encodes every frame in the video encoding unit 202. Its multiplexing control unit (corresponding to the multiplexing control unit 105) then selects the output of the audio encoding unit 103. When the allowable band becomes small, the unit may instead encode only frames with a large change amount, and the multiplexing control unit may select the output of the audio-to-text conversion unit 104.
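
The combined behavior of [0121] can be sketched as follows. The sketch assumes:
- frames are flat lists of pixel values;
- the change amount is the mean absolute difference from the previous frame;
- the first frame is always kept as a reference.

The threshold and the function names are hypothetical. The patent only states that frames with a large change amount are encoded.

```python
def mean_abs_change(prev: list[int], cur: list[int]) -> float:
    """Assumed change metric: mean absolute difference of pixel values."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)


def plan_transmission(frames: list[list[int]], band_is_large: bool,
                      change_threshold: float = 10.0) -> tuple[list[int], str]:
    """Return (indices of frames to encode, audio stream to multiplex)."""
    if band_is_large:
        # Every frame goes to the video encoder; audio is sent as CA (unit 103).
        return list(range(len(frames))), "CA"
    # Small band: keep the first frame as a reference (an assumption), then
    # only frames whose change from the previous frame reaches the threshold.
    selected = [0] if frames else []
    for i in range(1, len(frames)):
        if mean_abs_change(frames[i - 1], frames[i]) >= change_threshold:
            selected.append(i)
    # Audio is replaced by the text conversion output CT (unit 104).
    return selected, "CT"
```

Comparing each frame against the previously transmitted frame, rather than the previous input frame, would be an equally plausible reading of "change amount".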

[0122] The main part of the first embodiment and the main part of the third embodiment can also coexist in the same multimedia transmitting unit.

[0123] That is, in the multimedia transmitting unit 301 of the third embodiment, the audio encoding unit 303 may be replaced by the audio encoding unit 103 and the audio-to-text conversion unit 104 of the first embodiment.

[0124] Likewise, the main part of the second embodiment and the main part of the third embodiment can coexist in the same multimedia transmitting unit.

[0125] In this case, when the allowable band becomes small, only frames with a large image change amount can be selectively encoded. At the same time, the camera input conditions can be controlled so as to reduce the transmission data rate.
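
A per-frame controller combining both measures of [0125] might look like the sketch below. The following are all assumptions for illustration:
- the adjustment order;
- the per-frame code amount budget;
- the `"restore"` request;
- the names `CombinedController` and `on_frame`.

The patent names zoom, pan, tilt and focus as controllable, but gives neither an order nor a budget.

```python
from dataclasses import dataclass, field

# Hypothetical order of camera adjustments, each assumed to lower the frame
# code amount COV; the patent lists zoom, pan, tilt and focus but no order.
ADJUSTMENTS = ("focus", "zoom", "pan_tilt")


@dataclass
class CombinedController:
    change_threshold: float  # minimum change amount for a frame to be sent
    cov_budget_bits: int     # hypothetical per-frame code amount budget
    step: int = 0            # index of the next adjustment to request
    requests: list[str] = field(default_factory=list)  # sent to camera control

    def on_frame(self, change: float, cov_bits: int, band_is_small: bool) -> bool:
        """Return True if this frame should be encoded and transmitted."""
        if not band_is_small:
            if self.step > 0:
                self.requests.append("restore")  # undo earlier adjustments
                self.step = 0
            return True
        # Frame too expensive: ask camera control for one more adjustment.
        if cov_bits > self.cov_budget_bits and self.step < len(ADJUSTMENTS):
            self.requests.append(ADJUSTMENTS[self.step])
            self.step += 1
        # Independently, send only frames whose change amount is large enough.
        return change >= self.change_threshold
```

Requesting one adjustment per over-budget frame keeps the camera from jumping through several condition changes at once.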

[0126] In the third embodiment, zoom, pan, tilt, focus and similar operations are performed to change the camera input conditions. Not all of them need to be controlled, however; for example, only the focus state may be changed. Conversely, conditions not listed here may also be controlled.

[0127] In the third embodiment, the system changes the camera input conditions automatically. Alternatively, an operator who operates the camera according to instructions from the system may make the changes manually.

[0128] Further, the transmission control signal SC2 of the third embodiment may include more than the information actually observed by the receiving unit 320, such as the amount of received data, the received data rate and the packet loss ratio. It may also include a camera input condition change request, by which the user on the receiving unit 320 side asks the transmitting unit 301 for specific camera input conditions of the camera 307. In this case, when the allowable band is small, the camera control unit 306 arbitrates between two requests: the camera input condition change request, and the above-described request, generated within the transmitting unit 301, to reduce the frame code amount COV. The result determines the actual camera input conditions.
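
The arbitration in [0128] could follow many policies, and the patent does not fix one. The sketch below assumes one simple rule. When the band is small, the internal COV-reduction request overrides the user's request on any parameter both touch; the user's other changes still apply. Parameter names and values are hypothetical.

```python
def arbitrate(current: dict[str, float],
              user_request: dict[str, float],
              reduction_request: dict[str, float],
              band_is_small: bool) -> dict[str, float]:
    """Return the camera input conditions to apply (hypothetical policy).

    current: conditions now set on the camera (e.g. zoom, pan, focus).
    user_request: change request carried in the control signal from the receiver.
    reduction_request: internal request to lower the frame code amount COV.
    """
    result = dict(current)        # never mutate the caller's state
    result.update(user_request)   # the receiving-side user's wishes first
    if band_is_small:
        # Small band: the COV reduction wins wherever the two requests conflict.
        result.update(reduction_request)
    return result
```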

[0129] In the first to third embodiments, the multimedia transmitting unit transmits both video and audio to the network. However, the components for transmitting video may be omitted in the first embodiment. Likewise, the components for transmitting audio may be omitted in the second and third embodiments.

[0130] That is, the present invention can be widely applied to any of the following:
(a) a transmitting device that uses video information and/or audio information as transmission information and transmits it to a receiving device via a network;
(b) a transmitting device that uses at least video information as transmission information and transmits it to a receiving device via a network;
(c) a communication system including a receiving device that receives, via a network, at least the video information transmitted by a transmitting device as reception information.

[0131]

[Effects of the Invention] As described above, the first to fourth aspects of the invention keep a high probability of continuing normal communication of video or audio information. This holds even when the allowable band shrinks because of network congestion, so communication reliability improves.

[0132] Further, according to the first or fourth aspect, communication can be continued even when the allowable band is extremely small.

[0133] Further, the second or fourth aspect gives priority to communicating the screens that are truly important to the user. This substantially improves the utilization efficiency of the allowable band.

[0134] Furthermore, the third or fourth aspect is relatively small in scale and also excels in real-time performance and similar respects.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating a schematic configuration of a multimedia communication system according to a first embodiment.

FIG. 2 is a block diagram illustrating a schematic configuration of a multimedia communication system according to a second embodiment.

FIG. 3 is a block diagram illustrating a schematic configuration of a multimedia communication system according to a third embodiment.

FIG. 4 is a block diagram illustrating a schematic configuration of the multiplexing control unit according to the first embodiment.

FIG. 5 is a block diagram illustrating a schematic configuration of the separation control unit according to the first embodiment.

FIG. 6 is a block diagram illustrating a schematic configuration of the multiplexing control unit according to the second or third embodiment.

[Description of Reference Numerals]
10, 30, 40: multimedia communication system
11, 31, 41: network
13, 33: comparison/determination unit
14: transmission data amount storage unit
101, 201, 301: multimedia transmitting unit
102, 202, 302: video encoding unit
103, 203, 303: audio encoding unit
104: audio-to-text conversion unit
120, 220, 320: multimedia receiving unit
206: change amount detecting unit
306: camera control unit
307: camera
VS, VS1, VS2: video signal
AS, AS1, AS2: audio signal
SC, SC1, SC2: transmission control signal
CD, CD1, CD2: transmission encoded data
T: subject

Claims

[Claims]

1. A transmitting device that uses video information and/or audio information as transmission information and transmits the transmission information to a receiving device via a network, the transmitting device comprising:
band recognizing means for recognizing the amount of allowable band that the network allows for the transmission according to its congestion state;
video encoding means for receiving and encoding a video signal to generate the video information;
audio data encoding means for receiving and encoding an audio signal to generate audio data information corresponding to the audio information;
audio text encoding means for converting the input audio signal into a text signal by speech recognition and encoding the text signal to generate audio text information corresponding to the audio information; and
transmission information selecting means for transmitting the video information and the audio data information as the transmission information when the allowable band is large, and transmitting the audio text information as the transmission information when the allowable band is small.

2. A transmitting device that uses at least video information as transmission information and transmits the transmission information to a receiving device via a network, the transmitting device comprising:
band recognizing means for recognizing the amount of allowable band that the network allows for the transmission according to its congestion state;
important screen selecting means for, when the video information is a moving image, judging that a screen whose image changes more has higher importance among the plurality of time-series screens constituting the moving image, and selecting the screens of high importance; and
transmitting means for transmitting all of the time-series screens as the transmission information when the allowable band is large, and, when the allowable band is small, transmitting as the transmission information only those of the plurality of time-series screens selected by the important screen selecting means.

3. A transmitting device that uses at least video information as transmission information and transmits the transmission information to a receiving device via a network, the transmitting device comprising:
band recognizing means for recognizing the amount of allowable band that the network allows for the transmission according to its congestion state;
photographing means for photographing a subject to generate time-series photographed screens;
photographed screen encoding means for encoding the photographed screens generated by the photographing means to obtain time-series screen information constituting the video information;
screen information amount detecting means for detecting the information amount of each piece of screen information; and
photographing control means for, when the allowable band is small, changing the optical or spatial relationship between the photographing means and the subject while referring to the information amount of the screen information, so that the photographing means generates photographed screens whose screen information has a small information amount.

4. A communication system including a receiving device that receives, via a network, at least video information transmitted by a transmitting device as reception information, wherein:
the receiving device comprises:
band-related information obtaining means for obtaining, according to the content of the reception information actually received, band-related information relating to the amount of allowable band that the network allows for transmission of the video information; and
control information transmitting means for transmitting control information including the band-related information to the transmitting device, so that the transmitting device recognizes the amount of allowable band that the network allows for the transmission according to its congestion state;
and the transmitting device is the transmitting device according to any one of claims 1 to 3.
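
The feedback loop of claim 4 can be sketched in two steps:
- the receiver summarizes what it actually received (received data amount, received data rate and packet loss ratio, the quantities listed in [0128]);
- the transmitter turns that report into an allowable-band estimate.

The following are illustrative assumptions, not the patent's method:
- the sequence-number loss inference;
- the back-off/probe rule, a simple rate-adaptation heuristic substituted here;
- all names and constants (`BandReport`, `make_report`, `update_allowable_band`, `loss_limit`, `step_bps`).

```python
from dataclasses import dataclass


@dataclass
class BandReport:
    """Band-related information the receiver returns in the control signal."""
    received_bytes: int
    received_rate_bps: float
    loss_ratio: float


def make_report(seq_numbers: list[int], sizes: list[int],
                interval_s: float) -> BandReport:
    """Summarize one measurement interval of actually received packets.

    Assumes packets carry sequence numbers, no reordering across intervals,
    and that the lowest and highest numbers in the interval both arrived.
    """
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))
    total = sum(sizes)
    return BandReport(total, total * 8 / interval_s, 1 - received / expected)


def update_allowable_band(current_bps: float, report: BandReport,
                          loss_limit: float = 0.02,
                          step_bps: float = 8000.0) -> float:
    """Hypothetical sender-side rule for recognizing the allowable band."""
    if report.loss_ratio > loss_limit:
        # Congestion: fall back below what actually got through.
        return min(current_bps, report.received_rate_bps) * 0.9
    # No significant loss: probe upward by a fixed step.
    return current_bps + step_bps
```

For example, four 100-byte packets numbered 0, 1, 2 and 4 in one second give a 20% loss ratio and 3200 bit/s received. The sender would then lower its allowable band toward that figure.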