JP2007318283A

JP2007318283A - Packet communication system, data receiver

Info

Publication number: JP2007318283A
Application number: JP2006143706A
Authority: JP
Inventors: Ken Yoshii; 謙吉井
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2006-05-24
Filing date: 2006-05-24
Publication date: 2007-12-06

Abstract

PROBLEM TO BE SOLVED: To provide a packet communication system and a data receiver capable of reducing the difference in delay time between audio packets and video packets while maintaining audio quality.

SOLUTION: The packet communication system comprises a data transmitter and a data receiver. The data receiver comprises an accumulation amount detection means, an information amount calculation means, an information amount determination means, and a reproduction control means. The accumulation amount detection means detects the amount of video packets accumulated in a video jitter buffer. The information amount calculation means calculates the information amount of audio packets. The information amount determination means determines whether the information amount calculated by the information amount calculation means is at or below a predetermined value. The reproduction control means changes the reproduction time of the audio packets that the information amount determination means has determined to be at or below the predetermined value, according to the amount of video packets detected by the accumulation amount detection means.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a packet communication system and a data receiving device.

In a system that transmits and receives video and audio over a communication line, such as a video conference system, synchronizing the movement of the talker's mouth with the voice reproduced from the loudspeaker is important for a natural-feeling call. For this reason, so-called lip-sync adjustment, which synchronizes video and audio, is performed.

For example, a method has been proposed in which the delay of the video signal is detected and the delay of the audio signal is changed to match it, so that the output timing of video and audio stays aligned despite changes in image quality or image content (see, for example, Patent Document 1).

In another proposed method, timing signals are recorded in the video signal and the audio signal at the same time, and the two signals are transmitted over separate lines. The receiving side then reproduces, in step with the video signal, the audio signal whose timing signal was recorded at the same time as the timing signal in the received video signal (see, for example, Patent Document 2).

However, because Patent Documents 1 and 2 reproduce the audio signal in step with the video signal, the sound is interrupted whenever the audio signal lags behind the video signal, and the reproduced audio becomes hard to listen to.

To address this problem, a method has been proposed in which a video signal and an audio signal generated at the same time are given the same unique mark, packetized, and transmitted, and the receiving side discards video packets that lag behind the audio packets. This reproduces audio and video in synchronization while maintaining audio quality (see, for example, Patent Document 3).

Patent Document 1: JP-A-7-184182
Patent Document 2: JP-A-2001-24992
Patent Document 3: JP-A-7-50818

However, the method of Patent Document 3 gives priority to reproducing the audio packets. When the audio packets lag behind the video packets, the audio is reproduced but the video is not reproduced at all.

The present invention has been made in view of the above problems. Its object is to provide a packet communication system and a data receiving device that can reproduce video in synchronization with audio while maintaining audio quality, even when the audio packets lag behind the video packets.

1.
A packet communication system comprising:
a data transmission device including
a video packet encoder that packetizes a video signal into video packets,
an audio packet encoder that packetizes an audio signal into audio packets,
a time stamp assigning unit that attaches time information to the video packets and the audio packets, and
a transmission unit that sequentially transmits the video packets and the audio packets to which the time information has been attached; and
a data receiving device including
a receiving unit that receives the video packets and the audio packets transmitted via a transmission line,
a video jitter buffer that sequentially stores the video packets received by the receiving unit,
an audio jitter buffer that sequentially stores the audio packets received by the receiving unit,
a video packet decoder that decodes and reproduces the video packets,
an audio packet decoder that decodes and reproduces the audio packets,
time information detection means for detecting the time information, and
reproduction control means for instructing the video packet decoder and the audio packet decoder to perform reproduction based on the time information detected by the time information detection means,
wherein the data receiving device further includes
accumulation amount detection means for detecting the amount of video packets accumulated in the video jitter buffer,
information amount calculation means for calculating the information amount of an audio packet to be reproduced by the audio packet decoder, and
information amount determination means for evaluating the information amount calculated by the information amount calculation means, and
wherein the reproduction control means changes, according to the amount of video packets detected by the accumulation amount detection means, the reproduction time of an audio packet that the information amount determination means has determined to have an information amount equal to or less than a predetermined value.

2.
The packet communication system according to 1, wherein,
when the amount of video packets detected by the accumulation amount detection means is equal to or greater than a predetermined value,
the reproduction control means does not reproduce audio packets that the information amount determination means has determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces audio packets whose information amount is equal to or greater than the predetermined value.

3.
The packet communication system according to 2, wherein
the reproduction control means discards video packets accumulated in the video jitter buffer whose time information is older than that of the audio packet to be reproduced.

4.
The packet communication system according to 1, wherein,
when the amount of video packets detected by the accumulation amount detection means is equal to or less than a predetermined value,
the reproduction control means repeats, according to the amount of video packets, the reproduction of an audio packet that the information amount determination means has determined to have an information amount equal to or less than the predetermined value, and then sequentially reproduces the subsequent audio packets.

5.
The packet communication system according to any one of 1 to 4, wherein the information amount calculation means calculates the information amount based on an average value of the audio signal.

6.
The packet communication system according to any one of 1 to 4, wherein the information amount calculation means calculates each frequency component of the audio signal, and calculates the information amount based on the difference between the arithmetic mean of the values of each frequency component calculated for the audio packets up to the previous time and the value of each frequency component calculated this time.

7.
A data receiving device comprising:
a receiving unit that receives video packets and audio packets,
a video jitter buffer that sequentially stores the video packets received by the receiving unit,
an audio jitter buffer that sequentially stores the audio packets received by the receiving unit,
a video packet decoder that decodes and reproduces the video packets,
an audio packet decoder that decodes and reproduces the audio packets,
time information detection means for detecting time information attached to the video packets and the audio packets, and
reproduction control means for instructing the video packet decoder and the audio packet decoder to perform reproduction based on the time information detected by the time information detection means,
wherein the data receiving device further includes
accumulation amount detection means for detecting the amount of video packets accumulated in the video jitter buffer,
information amount calculation means for calculating the information amount of an audio packet to be reproduced by the audio packet decoder, and
information amount determination means for evaluating the information amount calculated by the information amount calculation means, and
wherein the reproduction control means changes, according to the amount of video packets detected by the accumulation amount detection means, the reproduction time of an audio packet that the information amount determination means has determined to have an information amount equal to or less than a predetermined value.

8.
The data receiving device according to 7, wherein,
when the amount of video packets detected by the accumulation amount detection means is equal to or greater than a predetermined value,
the reproduction control means does not reproduce audio packets that the information amount determination means has determined to have an information amount equal to or less than the predetermined value, and sequentially reproduces audio packets whose information amount is equal to or greater than the predetermined value.

9.
The data receiving device according to 8, wherein
the reproduction control means discards video packets accumulated in the video jitter buffer whose time information is older than that of the audio packet to be reproduced.

10.
The data receiving device according to 7, wherein,
when the amount of video packets detected by the accumulation amount detection means is equal to or less than a predetermined value,
the reproduction control means repeats, according to the amount of video packets, the reproduction of an audio packet that the information amount determination means has determined to have an information amount equal to or less than the predetermined value, and then sequentially reproduces the subsequent audio packets.

11.
The data receiving device according to any one of 7 to 10, wherein the information amount calculation means calculates the information amount based on an average value of the audio signal.

12.
The data receiving device according to any one of 7 to 10, wherein the information amount calculation means calculates each frequency component of the audio signal, and calculates the information amount based on the difference between the arithmetic mean of the values of each frequency component calculated for the audio packets up to the previous time and the value of each frequency component calculated this time.

According to the present invention, the difference in delay time between audio packets and video packets is adjusted by changing the reproduction time of audio packets that carry little information. It is therefore possible to provide a packet communication system and a data receiving device that can reproduce video in synchronization with audio while maintaining audio quality, even when the audio packets lag behind the video packets.
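The adjustment policy stated in items 2 and 4 can be sketched roughly as follows. This is only an illustration: the function name `playback_count`, the 10% lower threshold, and the linear mapping from buffer fill to repeat count are assumptions, since the text says only that repetition depends on the amount of video packets. The 70% upper threshold is the example value the description gives for the overflow check.

```python
def playback_count(low_information, video_fill_pct,
                   high_pct=70, low_pct=10, max_extra=3):
    """Return how many times one audio packet is played.

    0 means the packet is skipped, 1 means normal playback, and larger
    values mean the packet is repeated. Only low-information packets are
    ever skipped or repeated, so speech itself is left untouched.
    """
    if not low_information:
        return 1                      # normal packet: always played once
    if video_fill_pct >= high_pct:
        return 0                      # video backlog: skip the quiet packet
    if video_fill_pct <= low_pct:
        # Assumed mapping: the emptier the video buffer, the more repeats.
        extra = 1 + (low_pct - video_fill_pct) * (max_extra - 1) // low_pct
        return 1 + extra
    return 1                          # buffer level is healthy


print(playback_count(True, 80))   # quiet packet, video buffer nearly full
print(playback_count(True, 0))    # quiet packet, video buffer empty
```

Skipping shortens the audio timeline and repeating lengthens it, which is how the delay difference is absorbed without altering packets that carry speech.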

Hereinafter, the present invention will be described in detail by way of an embodiment, but the invention is not limited to it.

FIG. 1 is a block diagram showing an example of the packet communication system according to the present invention.

FIG. 1(a) is a block diagram showing the overall configuration, and FIG. 1(b) is a detailed block diagram of the main microcomputer unit 713.

The data transmission device 701 and the data receiving device 702 in FIG. 1(a) are connected to a transmission line 717. The data transmission device 701 encodes the input video and audio into video packets and audio packets and transmits them over the transmission line 717. The data receiving device 702, which receives the audio packets and video packets via the transmission line 717, reproduces the video and audio.

The data transmission device 701 comprises an audio input unit 703, an audio packet encoder 704, a video packet encoder 705, a video input unit 706, a time stamp assigning unit 707, and a transmission unit 708.

The audio signal input from the audio input unit 703 is packetized by the audio packet encoder 704 into audio packets. Likewise, the video signal input from the video input unit 706 is packetized by the video packet encoder 705 into video packets. The audio packets and video packets are then input to the time stamp assigning unit 707, which records the time at which each packet was input as a time stamp in that packet. The transmission unit 708 transmits the time-stamped audio packets and video packets to the transmission line 717.
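The time-stamping step can be pictured with a short sketch. The `Packet` and `TimeStamper` names, the millisecond unit, and the injected clock are illustrative assumptions; the text does not specify a packet layout or a clock resolution.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Packet:
    kind: str           # "audio" or "video"
    timestamp_ms: int   # time information recorded as the time stamp
    payload: bytes      # encoded media data


class TimeStamper:
    """Hypothetical stand-in for the time stamp assigning unit 707."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in ms

    def stamp(self, kind, payload):
        # Record the time at which the packet reaches this unit.
        return Packet(kind, self._clock(), payload)


# A deterministic fake clock that advances 20 ms per call, for illustration.
ticks = count(start=0, step=20)
stamper = TimeStamper(lambda: next(ticks))
audio = stamper.stamp("audio", b"\x00" * 160)
video = stamper.stamp("video", b"frame-0")
print(audio.timestamp_ms, video.timestamp_ms)
```

Because audio and video packets take their stamps from the same clock, the receiver can compare them directly even when the two streams travel over different routes.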

The transmission line 717 is a wired or wireless network and may be either the Internet or an intranet. The video packets and audio packets need not follow the same route within the transmission line 717; they may reach the data receiving device 702 by different routes.

The data receiving device 702 comprises an audio output unit 709, an audio packet decoder unit 710, a video packet decoder unit 711, a video output unit 712, a main microcomputer unit 713, a receiving unit 714, a video jitter buffer 715, and an audio jitter buffer 716.

The audio packets and video packets received by the receiving unit 714 via the transmission line 717 are sequentially stored in the audio jitter buffer 716 and the video jitter buffer 715, respectively. The audio jitter buffer 716 and the video jitter buffer 715 are buffer memories provided to absorb variations in the arrival times of the audio packets and video packets received via the transmission line 717. Received audio packets and video packets are held temporarily in the audio jitter buffer 716 and the video jitter buffer 715, respectively.

The audio packet decoder unit 710 and the video packet decoder unit 711 decode the audio packets and video packets, respectively, reproduce the decoded audio and video signals, and output them from the audio output unit 709 and the video output unit 712, respectively.

Next, the main microcomputer unit 713 will be described with reference to FIG. 1(b).

The main microcomputer unit 713 is built around a microcomputer. It includes a CPU (Central Processing Unit) 20 that performs various arithmetic operations, a RAM 21 that serves as a work area for those operations, and a ROM 22 that stores a control program and the like, and it supervises the operation of each processing unit of the data receiving device 702. The ROM 22, a non-volatile memory, is implemented, for example, as an electrically rewritable EEPROM.

The CPU 20 of this embodiment has a time information detection unit 10, a reproduction control unit 11, an accumulation amount detection unit 12, an information amount calculation unit 13, and an information amount determination unit 14. These units correspond, respectively, to the time information detection means, the reproduction control means, the accumulation amount detection means, the information amount calculation means, and the information amount determination means of the present invention.

The time information detection unit 10 detects time information from the time stamps of the video packets and audio packets.

Based on the time information detected by the time information detection unit 10, the reproduction control unit 11 instructs the video packet decoder unit 711 and the audio packet decoder unit 710 to decode and reproduce.

The accumulation amount detection unit 12 detects the amount of video packets accumulated in the video jitter buffer 715. Because video packets are stored in the video jitter buffer 715 in order, the accumulation amount detection unit 12 can determine the amount of accumulated video packets from the address of the head video packet stored in the video jitter buffer 715.
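As a rough sketch of how an accumulation amount can be read off a FIFO buffer, the class below keeps a read index and a packet count in a fixed-size ring. The class name `JitterBuffer` and its methods are hypothetical, and it counts slots rather than inspecting raw memory addresses as the embodiment does.

```python
class JitterBuffer:
    """Fixed-capacity FIFO of packets (illustrative, not the patented design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.read_index = 0   # position of the oldest stored packet
        self.count = 0        # number of packets currently stored

    def push(self, packet):
        if self.count == self.capacity:
            raise OverflowError("jitter buffer is full")
        write_index = (self.read_index + self.count) % self.capacity
        self.slots[write_index] = packet
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("jitter buffer is empty")
        packet = self.slots[self.read_index]
        self.slots[self.read_index] = None
        self.read_index = (self.read_index + 1) % self.capacity
        self.count -= 1
        return packet

    def fill_ratio(self):
        # Accumulated amount as a fraction of the total capacity.
        return self.count / self.capacity


buf = JitterBuffer(capacity=10)
for n in range(7):
    buf.push(f"video-{n}")
print(buf.fill_ratio())
```

Because packets are stored strictly in order, the position of the newest packet relative to the oldest one is enough to know how full the buffer is.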

The information amount calculation unit 13 calculates the information amount of an audio packet, and the information amount determination unit 14 determines whether the calculated information amount is equal to or less than a predetermined value. The information amount is, for example, the level of the audio signal, and the predetermined value is a level at which the packet can be judged to be silent. When the level of the audio signal is this low, the information amount is also low, and the packet is considered to contain no important information.
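A level-based measure of this kind might look like the sketch below. Using the mean absolute sample value as the "level" and 100.0 as the silence threshold are assumptions made for illustration (the threshold presumes 16-bit PCM samples); the text only says that the level is compared against a value at which the packet can be judged silent.

```python
SILENCE_THRESHOLD = 100.0  # hypothetical level for 16-bit PCM samples


def information_amount(samples):
    """Mean absolute amplitude of one decoded audio packet."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def is_low_information(samples, threshold=SILENCE_THRESHOLD):
    # "Equal to or less than the predetermined value" in the text.
    return information_amount(samples) <= threshold


print(information_amount([10, -10, 20, -20]))
print(is_low_information([1000, -2000, 1500]))
```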

In this embodiment, as an example, the data transmission device 701 packetizes each frame of a 15 fps moving image as a still image and transmits it as a video packet, and packetizes each 20 msec of audio into an audio packet. The following processing is described on the assumption that the time stamp assigning unit 707 attaches time information obtained from its built-in clock to the video packets and audio packets as time stamps, and that the transmission unit 708 transmits them to the data receiving device 702.

A specific example of the processing performed by the CPU 20 of this embodiment is described below.

The reproduction control unit 11 does not start reproduction until a predetermined amount of received audio packets has accumulated in the audio jitter buffer 716. For example, the reproduction control unit 11 starts reproduction once the amount of audio packets accumulated in the audio jitter buffer 716, as detected by the accumulation amount detection unit 12, reaches 50% of the storage capacity of the audio jitter buffer 716.

The reproduction control unit 11 bases its reproduction control on the time information recorded in the time stamp of the audio packet, as detected by the time information detection unit 10. From the time information of the audio packet about to be reproduced, the reproduction control unit 11 searches the video jitter buffer 715 for the video packet that should be reproduced while that audio is playing, sends it to the video packet decoder unit 711, and reproduces the video.

For example, if the time stamp of the audio packet A about to be reproduced is T1, the video packet to be reproduced while this audio is playing is one whose time stamp lies in the range from T1 to T1 + 20 msec. If the main microcomputer unit 713 finds in the video jitter buffer 715 a video packet A with a time stamp of T1 + 10 msec, the reproduction control unit 11 controls the video packet decoder unit 711 so that the video of the video packet A is reproduced 10 msec after reproduction of the audio packet A starts. In this way, the reproduction control unit 11 synchronizes audio and video.
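The search for a matching video packet can be sketched as below. The function name `find_synced_video` is hypothetical, and treating the window as half-open (T1 inclusive, T1 + 20 msec exclusive) is an assumption; the text gives only the range.

```python
AUDIO_PACKET_MS = 20  # playback duration of one audio packet in the embodiment


def find_synced_video(audio_ts_ms, video_timestamps_ms,
                      duration_ms=AUDIO_PACKET_MS):
    """Return (video timestamp, delay in ms) for the first video packet whose
    time stamp falls inside the audio packet's playback window, else None."""
    for ts in sorted(video_timestamps_ms):
        if audio_ts_ms <= ts < audio_ts_ms + duration_ms:
            return ts, ts - audio_ts_ms
    return None


# Audio packet stamped T1 = 1000 ms; one video packet is stamped T1 + 10 ms.
print(find_synced_video(1000, [980, 1010, 1080]))
```

The returned delay is the offset from the start of audio playback at which the video frame should be shown, which corresponds to the 10 msec in the example above.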

Network propagation delay and the data processing time of the data receiving device 702 can cause the audio jitter buffer 716 and the video jitter buffer 715 to underflow or overflow. This phenomenon is described with reference to FIG. 2.

FIG. 2 is an explanatory diagram that schematically illustrates underflow and overflow. In FIG. 2, 715 schematically represents the state of the video packets stored in the video jitter buffer 715, and 716 schematically represents the state of the audio packets stored in the audio jitter buffer 716. The outlines of 715 and 716 in FIG. 2 represent the maximum memory capacities of the video jitter buffer 715 and the audio jitter buffer 716, respectively. The horizontal axis is the memory address, and audio packets and video packets are stored first-in, first-out (FIFO) in order from the memory address at the left end of FIG. 2. The hatched areas represent the portions of the audio jitter buffer 716 and the video jitter buffer 715 that hold audio packets and video packets. Accordingly, packets stored toward the left arrow in FIG. 2 are "newer" packets, and packets stored toward the right arrow are "older" packets.

FIG. 2(a) is an explanatory diagram illustrating the overflow state.

FIG. 2(a) shows a state in which more video packets are accumulated in the video jitter buffer 715 than audio packets in the audio jitter buffer 716. Arrow A1 indicates the head audio packet to be reproduced next. The video packet corresponding to the time information of the audio packet indicated by arrow A1 is the one indicated by arrow V1, and to the right of arrow V1 old video packets remain in the video jitter buffer 715 without having been reproduced.

This occurs, for example, when the processing time of the video packet decoder unit 711 becomes longer than the time interval indicated by the time stamps. Processing then cannot keep up, video packets pile up in the video jitter buffer 715, and overflow results.

FIG. 2(b) shows a state in which more audio packets are accumulated in the audio jitter buffer 716 than video packets in the video jitter buffer 715. Arrow A2 indicates the head audio packet to be reproduced next. The video packet corresponding to the time information of the audio packet indicated by arrow A2 is the head video packet indicated by arrow V2, which shows that few video packets are stored in the video jitter buffer 715.

This occurs, for example, when video packets that initially arrived without delay begin to arrive late because the network has become congested. In this case, the video packets accumulated in the video jitter buffer 715 are exhausted and no video packet remains to be reproduced.

Such underflow and overflow are unavoidable because data is transmitted and received over an IP network. They also arise from individual differences in the processing times of encoders and decoders.

Next, the processing performed in this embodiment when there is a risk of underflow or overflow is described.

FIG. 3 is a flowchart showing the flow of reproduction control by the main microcomputer unit 713.

The flowchart of FIG. 3 covers the processing after reproduction has started, that is, after the amount of received audio packets has reached, for example, 50% of the storage capacity of the audio jitter buffer 716.

S101: The audio packet with the oldest time information is decoded.

The reproduction control unit 11 instructs the time information detection unit 10 to search the audio jitter buffer 716 for the audio packet with the oldest time information, and sends the packet found to the audio packet decoder unit 710, which decodes it into audio data.

S500: The information amount of the decoded audio packet is calculated.

The information amount calculation unit 13 calls a subroutine that calculates the information amount of the decoded audio packet and obtains its value. For example, the information amount of audio packets is high while participants are actively speaking during a conference, and low when speech stops and the room is nearly silent or only background sound such as air conditioning can be heard. The algorithm for calculating the information amount from the audio data is described in detail later.
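One basis for the information amount named in items 6 and 12 is the difference between the current frequency components and their running average over earlier packets. The sketch below illustrates that idea only and is not the detailed algorithm the description gives later: the naive DFT, the sum of absolute differences, the class name `SpectralInformation`, and the all-zero starting average are all assumptions, and every packet is assumed to have the same number of samples.

```python
import cmath


def magnitude_spectrum(samples):
    """Magnitude of DFT bins 0..n/2, computed with a naive O(n^2) transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


class SpectralInformation:
    """Keeps the arithmetic mean spectrum of the packets seen so far."""

    def __init__(self):
        self.mean = None   # mean magnitude per frequency bin
        self.seen = 0      # number of packets folded into the mean

    def amount(self, samples):
        spectrum = magnitude_spectrum(samples)
        if self.mean is None:
            # Assumption: with no history, compare against an all-zero mean.
            self.mean = [0.0] * len(spectrum)
        # Information amount: total deviation from the running mean.
        diff = sum(abs(s - m) for s, m in zip(spectrum, self.mean))
        # Fold the new spectrum into the arithmetic mean.
        self.seen += 1
        self.mean = [m + (s - m) / self.seen
                     for s, m in zip(spectrum, self.mean)]
        return diff


tracker = SpectralInformation()
tone = [0, 1000, 0, -1000] * 4          # steady tone, 16 samples
first = tracker.amount(tone)            # new sound: large difference
second = tracker.amount(tone)           # same sound again: no difference
print(first > second)
```

Under this measure a steady background hum scores low once it has been heard for a while, whereas a level-based measure would keep scoring it by its loudness.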

S102: Audio reproduction is started.

The reproduction control unit 11 instructs the audio packet decoder unit 710 to reproduce the decoded audio data and output it from the audio output unit 709.

S103: A video packet synchronized with the audio being reproduced is searched for.

For example, if audio packet A has a reproduction time of 20 msec and a time stamp of T1, the video packets to be reproduced during this audio are those whose time stamps fall in the range from T1 to T1 + 20 msec. The reproduction control unit 11 instructs the time information detection unit 10 to search the video jitter buffer 715 for video packets with time stamps in this range.

If there is no synchronized video packet (step S103; No), the video jitter buffer 715 is in an underflow state, and the process proceeds to step S122.

If there is a synchronized video packet (step S103; Yes), the process proceeds to step S104.
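The time stamp search of step S103 can be illustrated with a minimal Python sketch. The `Packet` class, the `find_synced_video` function, and the choice of a half-open window (T1 inclusive, T1 + 20 msec exclusive) are assumptions made for illustration; the specification only states that the time stamp lies in the range of T1 + 20 msec.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp_ms: int      # time information attached by the transmitter
    payload: bytes = b""   # encoded data (unused in this sketch)


def find_synced_video(video_buffer, audio_timestamp_ms, audio_duration_ms=20):
    """Return video packets whose time stamps fall inside the audio packet's window.

    The window is assumed to be [T1, T1 + duration); the boundary handling is
    not specified in the source.
    """
    start = audio_timestamp_ms
    end = audio_timestamp_ms + audio_duration_ms
    return [p for p in video_buffer if start <= p.timestamp_ms < end]


video_buffer = [Packet(990), Packet(1000), Packet(1015), Packet(1020)]
synced = find_synced_video(video_buffer, audio_timestamp_ms=1000)
print([p.timestamp_ms for p in synced])  # prints [1000, 1015]
```

An empty result corresponds to the "No" branch of step S103.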

S104: It is determined whether the video jitter buffer 715 is at risk of overflowing with video packets.

The accumulation amount detection unit 12 checks the head address of the video packets accumulated in the video jitter buffer 715 to detect the accumulated amount. The reproduction control unit 11 compares this amount with a predetermined value to determine whether the video jitter buffer 715 is at risk of overflow. For example, with a predetermined value of 70%, a risk of overflow is determined when the video packets accumulated in the video jitter buffer 715 occupy 70% or more of its capacity.

If there is no risk of overflow (step S104; No), the process proceeds to step S105.

If there is a risk of overflow (step S104; Yes), the process proceeds to step S108.

S108: The video packet is reproduced.

The reproduction control unit 11 instructs the video packet decoder unit 711 to decode the video packet found in step S103 and reproduce it at a predetermined timing.

S109: Old video packets are discarded.

The reproduction control unit 11 discards those video packets accumulated in the video jitter buffer 715 that are older than the audio packet currently being reproduced. When there is a risk of overflow, video reproduction is lagging and old video packets keep accumulating, so the old video packets that will not be reproduced are discarded.
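As a rough illustration of step S109, the following Python sketch drops every buffered video time stamp that is older than the audio packet being reproduced. The function name and the representation of the buffer as a list of millisecond time stamps are hypothetical.

```python
def discard_stale_video(video_timestamps_ms, playing_audio_timestamp_ms):
    """Keep only video packets that are not older than the audio now playing.

    Returns the surviving time stamps and the number of packets dropped.
    """
    kept = [t for t in video_timestamps_ms if t >= playing_audio_timestamp_ms]
    dropped = len(video_timestamps_ms) - len(kept)
    return kept, dropped


kept, dropped = discard_stale_video([940, 960, 980, 1000, 1020], 1000)
print(kept, dropped)  # prints [1000, 1020] 3
```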

S110: It is determined whether the audio information amount is equal to or less than a predetermined value.

The information amount determination unit 14 determines whether the audio information amount calculated in step S500 is equal to or less than a predetermined value. The predetermined value is a threshold for identifying, for example, silence or a state in which only background sound is heard, and is set in advance according to the usage environment.

If the value exceeds the predetermined value (step S110; No), audio reproduction continues and the process proceeds to step S112.

If the value is equal to or less than the predetermined value (step S110; Yes), the process proceeds to step S111.

S111: Audio reproduction is stopped.

The reproduction control unit 11 instructs the audio packet decoder unit 710 to stop audio reproduction, and the process proceeds to step S112. In this case, because the arrival of the audio packets is delayed, the audio delay is large, which makes conversation, for example, feel unnatural. Reproduction of audio packets with a small information amount, such as those containing silence or only background sound, is therefore stopped to reduce the audio delay. This shortens the audio delay time with little sense of unnaturalness for the listener and without important information being missed.

S105: It is determined whether the video jitter buffer 715 is at risk of underflow.

The accumulation amount detection unit 12 checks the head address of the video packets accumulated in the video jitter buffer 715 to detect the amount of video packets accumulated there. The reproduction control unit 11 compares this amount with a predetermined value to determine whether there is a risk of underflow. For example, with a predetermined value of 30%, the reproduction control unit 11 determines that there is a risk of underflow when the video packets accumulated in the video jitter buffer 715 occupy 30% or less of its capacity.

If there is no risk of underflow (step S105; No), the process proceeds to step S130.

If there is a risk of underflow (step S105; Yes), the process proceeds to step S121.
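The two fill-level checks of steps S104 and S105 can be sketched together as follows. The 70% and 30% figures are the example values from the text; the function name, the return labels, and the use of packet counts to measure the fill level are assumptions.

```python
OVERFLOW_PERCENT = 70   # example threshold for step S104
UNDERFLOW_PERCENT = 30  # example threshold for step S105


def classify_fill(stored_packets, capacity):
    """Classify the video jitter buffer fill level.

    Integer arithmetic is used so the boundary cases (exactly 70% or 30%)
    are not affected by floating-point rounding.
    """
    if stored_packets * 100 >= OVERFLOW_PERCENT * capacity:
        return "overflow_risk"
    if stored_packets * 100 <= UNDERFLOW_PERCENT * capacity:
        return "underflow_risk"
    return "normal"


print(classify_fill(70, 100))  # prints overflow_risk
print(classify_fill(30, 100))  # prints underflow_risk
print(classify_fill(50, 100))  # prints normal
```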

S130: The video packet is reproduced.

The reproduction control unit 11 instructs the video packet decoder unit 711 to decode the video packet found in step S103 and reproduce it at a predetermined timing. The process then proceeds to step S112.

S121: The video packet is reproduced.

S122: It is determined whether the audio information amount is equal to or less than a predetermined value.

The reproduction control unit 11 determines whether the information amount of the audio data calculated in step S500 is equal to or less than a predetermined value. The predetermined value is a threshold for identifying, for example, silence or a state in which only background sound is heard, and is set in advance according to the usage environment.

If the value exceeds the predetermined value (step S122; No), the process proceeds to step S112.

If the value is equal to or less than the predetermined value (step S122; Yes), the process proceeds to step S123.

S123: The audio is reproduced repeatedly.

The reproduction control unit 11 instructs the audio packet decoder unit 710 to repeatedly reproduce the audio data determined in step S122 to have an information amount equal to or less than the predetermined value, until a predetermined amount of video packets has accumulated in the video jitter buffer 715. In this case, because the video packets are arriving later than the audio packets, an audio packet with a small information amount, such as one containing silence or only background sound, is reproduced repeatedly while waiting for the predetermined amount of video packets to accumulate in the video jitter buffer 715. When the accumulation amount detection unit 12 detects that the predetermined amount of video packets has accumulated in the video jitter buffer 715, the main microcomputer unit 713 proceeds to step S112.
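The repetition in step S123 can be sketched as a small simulation: a low-information audio packet is replayed once for every observation in which the video jitter buffer is still below the resume level. The resume level of 50%, the 20 msec packet length, and the function name are assumptions; the specification only refers to a "predetermined amount".

```python
def repeats_needed(observed_fill_percent, resume_percent=50, packet_ms=20):
    """Count how often the low-information audio packet is replayed.

    observed_fill_percent: video buffer fill level seen before each replay.
    Returns (number of replays, extra audio delay in msec). If the resume
    level is never reached, every observation counts as one replay.
    """
    repeats = 0
    for fill in observed_fill_percent:
        if fill >= resume_percent:
            break
        repeats += 1
    return repeats, repeats * packet_ms


print(repeats_needed([20, 30, 40, 55, 60]))  # prints (3, 60)
```

Each replay lengthens the audio delay by one packet duration, which is how the audio is held back until the late video catches up.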

S112: It is determined whether there is a next audio packet.

The reproduction control unit 11 searches the audio jitter buffer 716 for the audio packet to be reproduced next.

If there is an audio packet (step S112; Yes), the process returns to step S101 and the next audio packet is reproduced.

If there is no audio packet (step S112; No), the process ends.

In this way, the delay of the audio relative to real time is adjusted by stopping or repeating the reproduction of audio packets with a small information amount, for example those containing silence or only background sound, and the amount of video packets accumulated in the video jitter buffer 715 is kept at a constant level so that video packets to be synchronized with the audio are available. As a result, important audio information is not lost, and a packet communication system can be provided in which the parties to the call rarely find the reproduced audio unnatural.
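The branching of FIG. 3 described above can be condensed into a single decision function. This is only a sketch of the audio-side decision: the function name, the action labels, and the default thresholds (100 for the information amount, 70% and 30% for the buffer) are illustrative assumptions based on the example values in the text, and video reproduction and packet discarding are omitted.

```python
def audio_action(synced_video_found, video_fill_percent, audio_info,
                 info_threshold=100, overflow_percent=70, underflow_percent=30):
    """Return what happens to the audio packet currently being reproduced."""
    low_info = audio_info <= info_threshold  # S110 / S122 test

    if not synced_video_found:
        # S103: No -> S122 (video buffer has underflowed)
        return "repeat_audio" if low_info else "play_audio"
    if video_fill_percent >= overflow_percent:
        # S104: Yes -> S108, S109, S110 (stop low-information audio, S111)
        return "stop_audio" if low_info else "play_audio"
    if video_fill_percent <= underflow_percent:
        # S105: Yes -> S121, S122 (repeat low-information audio, S123)
        return "repeat_audio" if low_info else "play_audio"
    # S105: No -> S130 (normal synchronized reproduction)
    return "play_audio"


print(audio_action(True, 80, 80))   # prints stop_audio
print(audio_action(True, 20, 80))   # prints repeat_audio
print(audio_action(True, 50, 80))   # prints play_audio
```

Audio with a large information amount is always played unchanged, which is how speech is protected while the delay is adjusted.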

Next, the routine for calculating the audio information amount, introduced in step S500, will be described with reference to FIGS. 4 and 5.

FIG. 4 shows a first embodiment of the audio information amount calculation routine of the present invention.

S501: The average signal level of the audio signal is calculated.

The information amount calculation unit 13 calculates the average signal level of the audio data decoded by the audio packet decoder unit 710 and uses this value as the audio information amount.

For example, suppose that the audio data is 12-bit and that complete silence corresponds to a digital value of 0. If the average value calculated in step S500 is a digital value of 80, then 80 is the value of the audio information amount. If the predetermined value used in step S110 is 100, an audio information amount of 80 is judged to be a silent state.
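The first embodiment can be illustrated with a short Python sketch. It assumes signed samples centred on 0 and takes the mean absolute amplitude as the "average signal level"; the specification does not say exactly how the average is formed, so this is one plausible reading. The function names and the sample values are hypothetical.

```python
def average_level(samples):
    """Mean absolute amplitude of one decoded audio packet (assumed metric)."""
    return sum(abs(s) for s in samples) / len(samples)


def is_low_information(samples, threshold=100):
    """Test of steps S110 / S122: information amount at or below the threshold."""
    return average_level(samples) <= threshold


quiet = [80, -80, 75, -85]          # near-silent 12-bit samples
print(average_level(quiet))         # prints 80.0
print(is_low_information(quiet))    # prints True
```

With the example threshold of 100 from the text, a level of 80 is treated as silence.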

FIG. 5 shows a second embodiment of the audio information amount calculation routine of the present invention.

S601: The audio signal is frequency-transformed.

The information amount calculation unit 13 applies a frequency transform to the audio data decoded by the audio packet decoder unit 710 and obtains the signal level for each frequency.

S602: The result is averaged with the values up to the previous time.

The main microcomputer unit 713 weights and adds the per-frequency signal levels up to the previous time, stored in its built-in memory, and the current values to obtain an average value for each frequency.

Averaging the frequency distribution at every iteration in this way yields the average frequency distribution of background sound that is always present, such as the sound of air conditioning.

S603: The difference between the averaged value and the current value is calculated.

The information amount calculation unit 13 calculates, for each frequency, the difference between the averaged value obtained in step S602 and the current value.

S604: The sum of the per-frequency differences is calculated.

The information amount calculation unit 13 calculates the sum of the per-frequency differences obtained in step S603.

The sum obtained in step S604 is taken as the information amount, and the process returns to the calling routine.

For example, when a silent state with a low information amount continues, the value calculated in this way is 0. When several people speak, as in a conference, the frequency distribution usually differs each time, so the value obtained is large.
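Steps S601 to S604 can be sketched in Python as follows. Several details are assumptions, because the specification leaves them open: a plain discrete Fourier transform is used as the frequency transform, the weighting is an exponential moving average with a weight of 0.9 on the stored values, and absolute differences are summed. The class and function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """S601: per-frequency signal level via a naive DFT (assumed transform)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]


class SpectralInfoEstimator:
    def __init__(self, weight=0.9):
        self.weight = weight    # weight given to the stored average (assumed)
        self.average = None     # per-frequency running average

    def update(self, samples):
        current = magnitude_spectrum(samples)
        if self.average is None:
            self.average = current  # first packet: nothing to compare against
        else:
            # S602: weighted average of the stored values and the current values
            self.average = [self.weight * a + (1 - self.weight) * c
                            for a, c in zip(self.average, current)]
        # S603: per-frequency difference; S604: sum of the differences
        return sum(abs(c - a) for a, c in zip(self.average, current))


est = SpectralInfoEstimator()
silence = [0] * 16
tone = [100 * math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]
print(est.update(silence))        # prints 0.0
print(est.update(tone) > 40)      # prints True
```

A steady sound is gradually absorbed into the running average, so repeating the same tone yields a smaller value each time, which matches the description of background sound being averaged out.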

As described above, according to the present invention, it is possible to provide a packet communication system and a data receiving device that can reproduce video in synchronization with audio while maintaining audio quality, even when the audio packets lag behind the video packets.

FIG. 1 is a block diagram showing one embodiment of the configuration of the packet communication system according to the present invention.
FIG. 2 is an explanatory diagram schematically illustrating underflow and overflow.
FIG. 3 is a flowchart showing the flow of reproduction control by the main microcomputer unit 713.
FIG. 4 shows a first embodiment of the audio information amount calculation routine of the present invention.
FIG. 5 shows a second embodiment of the audio information amount calculation routine of the present invention.

Explanation of symbols

10 Time information detection unit
11 Reproduction control unit
12 Accumulation amount detection unit
13 Information amount calculation unit
14 Information amount determination unit
20 CPU
21 RAM
22 ROM
701 Data transmission device
713 Main microcomputer unit
702 Data receiving device
717 Transmission line

Claims

1. A packet communication system comprising:
a data transmission device comprising:
a video packet encoder that packetizes video signals into video packets;
an audio packet encoder that packetizes audio signals into audio packets;
a time stamp giving unit that attaches time information to the video packets and the audio packets; and
a transmission unit that sequentially transmits the video packets and the audio packets to which the time information is attached; and
a data receiving device comprising:
a receiving unit that receives the video packets and the audio packets transmitted via a transmission line;
a video jitter buffer that sequentially stores the video packets received by the receiving unit;
an audio jitter buffer that sequentially stores the audio packets received by the receiving unit;
a video packet decoder that decodes and reproduces the video packets;
an audio packet decoder that decodes and reproduces the audio packets;
time information detection means for detecting the time information; and
reproduction control means for instructing the video packet decoder and the audio packet decoder to reproduce, based on the time information detected by the time information detection means,
wherein the data receiving device further comprises:
accumulated amount detection means for detecting the amount of video packets accumulated in the video jitter buffer;
information amount calculation means for calculating the information amount of an audio packet to be reproduced by the audio packet decoder; and
information amount determination means for evaluating the information amount calculated by the information amount calculation means, and
wherein the reproduction control means changes, according to the amount of video packets detected by the accumulated amount detection means, the reproduction time of an audio packet determined by the information amount determination means to have an information amount equal to or less than a predetermined value.

2. The packet communication system according to claim 1, wherein, when the amount of video packets detected by the accumulated amount detection means is equal to or greater than a predetermined value, the reproduction control means does not reproduce audio packets determined to have an information amount equal to or less than a predetermined value, and sequentially reproduces audio packets whose information amount is equal to or greater than the predetermined value.

3. The packet communication system according to claim 2, wherein the reproduction control means discards video packets stored in the video jitter buffer that have time information older than that of the audio packet to be reproduced.

The reproduction control means includes
When the amount of video packets detected by the accumulated amount detection means is below a predetermined value,
2. The audio packet determined by the information amount determination means to be an information amount equal to or less than a predetermined value is repeated according to the amount of the video packet, and then the next audio packet is sequentially reproduced. The packet communication system described in 1.

5. The packet communication system according to claim 1, wherein the information amount calculation means calculates the information amount based on an average value of the audio signal.

6. The packet communication system according to any one of claims 1 to 4, wherein the information amount calculation means calculates each frequency component of the audio signal, and calculates the information amount based on the difference between the averaged value of each frequency component of the audio packets calculated up to the previous time and the value of each frequency component calculated this time.

7. A data receiving device comprising:
a receiving unit that receives video packets and audio packets;
a video jitter buffer that sequentially stores the video packets received by the receiving unit;
an audio jitter buffer that sequentially stores the audio packets received by the receiving unit;
a video packet decoder that decodes and reproduces the video packets;
an audio packet decoder that decodes and reproduces the audio packets;
time information detection means for detecting time information attached to the video packets and the audio packets; and
reproduction control means for instructing the video packet decoder and the audio packet decoder to reproduce, based on the time information detected by the time information detection means,
wherein the data receiving device further comprises:
accumulated amount detection means for detecting the amount of video packets accumulated in the video jitter buffer;
information amount calculation means for calculating the information amount of an audio packet to be reproduced by the audio packet decoder; and
information amount determination means for evaluating the information amount calculated by the information amount calculation means, and
wherein the reproduction control means changes, according to the amount of video packets detected by the accumulated amount detection means, the reproduction time of an audio packet determined by the information amount determination means to have an information amount equal to or less than a predetermined value.

8. The data receiving device according to claim 7, wherein, when the amount of video packets detected by the accumulated amount detection means is equal to or greater than a predetermined value, the reproduction control means does not reproduce audio packets determined to have an information amount equal to or less than a predetermined value, and sequentially reproduces audio packets whose information amount is equal to or greater than the predetermined value.

9. The data receiving device according to claim 8, wherein the reproduction control means discards video packets stored in the video jitter buffer that have time information older than that of the audio packet to be reproduced.

The reproduction control means includes
When the amount of video packets detected by the accumulated amount detection means is below a predetermined value,
8. The reproduction of audio packets determined by the information amount determination means to be an information amount of a predetermined value or less is repeated according to the amount of video packets, and then the next audio packets are sequentially reproduced. The data receiving device described in 1.

11. The data receiving device according to claim 7, wherein the information amount calculation means calculates the information amount based on an average value of the audio signal.

12. The data receiving device according to claim 7, wherein the information amount calculation means calculates each frequency component of the audio signal, and calculates the information amount based on the difference between the averaged value of each frequency component of the audio packets calculated up to the previous time and the value of each frequency component calculated this time.