JP2005094755A

JP2005094755A - Method for processing a plurality of videos

Info

Publication number: JP2005094755A
Application number: JP2004260001A
Authority: JP
Inventors: Anthony Vetro; Huifang Sun
Original assignee: Mitsubishi Electric Research Laboratories Inc
Current assignee: Mitsubishi Electric Research Laboratories Inc
Priority date: 2003-09-11
Filing date: 2004-09-07
Publication date: 2005-04-07

Abstract

PROBLEM TO BE SOLVED: To provide a method for processing a plurality of videos that minimizes a combined distortion and satisfies a combined frame rate constraint for all videos.

SOLUTION: Intra- or inter-compressed frames of each compressed video are acquired at a fixed sampling rate. Joint analysis is applied concurrently and in parallel to the compressed videos to determine a variable and non-uniform temporal sampling rate for each compressed video, so that a combined distortion is minimized and a combined frame rate constraint is satisfied. Each compressed video is then sampled at its associated variable and non-uniform temporal sampling rate to produce output compressed videos having variable temporal resolutions.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates generally to sampling compressed videos, and more particularly to sampling compressed videos subject to resource constraints.

The encoding of multiple videos has been considered primarily in two areas: transmission and recording. Transmission of multiple videos is applied mainly in broadcast and distribution applications. Recording of multiple videos is typically applied in surveillance applications. Video recording is common in many consumer electronics products, but such recorders typically encode a single video rather than multiple simultaneous videos.

In television broadcast applications, it is common practice to encode multiple videos so that the encoded bitstreams can be broadcast together over a single channel with a fixed bandwidth. For example, suppose there are N programs and a total channel bandwidth of 45 Mbps, which is typical of a satellite link. The problem is then to encode the N programs with maximum overall quality and multiplex them onto the single channel. The bandwidth is fixed and each program has a different complexity, so each program is encoded at a variable bit rate (VBR). In this way, a substantially constant distortion can be maintained across all programs. Thus, the more complex portions of some videos can receive more bits. This is done by reducing the bits allocated to the less complex portions of other videos that are encoded at the same time.
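The sketch below gives a rough illustration of the VBR trade-off described above. It splits a fixed 45 Mbps channel among N programs in proportion to a per-program complexity score. The proportional rule and the function name `allocate_bitrates` are assumptions made for illustration. The statistical-multiplexing schemes cited below use more elaborate rate control.

```python
def allocate_bitrates(complexities, channel_mbps=45.0):
    """Split a fixed channel bandwidth among programs in proportion to complexity.

    Hypothetical proportional rule: a program twice as complex gets twice the bits,
    and the allocations always sum to the channel bandwidth.
    """
    total = sum(complexities)
    if total <= 0:
        raise ValueError("at least one program must have positive complexity")
    return [channel_mbps * c / total for c in complexities]


# Three programs; the middle one is currently the most complex.
rates = allocate_bitrates([1.0, 3.0, 2.0])
print(rates)  # [7.5, 22.5, 15.0]
```

When one program becomes more complex, its share rises and the others shrink, while the total stays at the channel rate.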

The encoding process described above is called statistical multiplexing. Techniques related to this process are described in the following works:
- Haskell, "Multiplexing of Variable Rate Encoded Streams," IEEE Transactions on Circuits and Systems for Video Technology, 1994.
- Wang et al., "Joint Rate Control For Multi-Program Video Coding," IEEE Transactions on Consumer Electronics, Vol. 42, No. 3, August 1996.
- U.S. Patent No. 6,091,455, "Statistical Multiplexer for Recording Video," issued to Yang on July 18, 2000.
- U.S. Patent No. 6,195,388, "Apparatus and Method for Encoding Multiple Video Programs," issued to Choi et al. on February 27, 2001.
- The references cited therein.

Along similar lines, U.S. Patent No. 5,969,764, issued to Sun and Vetro on October 19, 1999, describes encoding multiple objects in a scene subject to a fixed bandwidth constraint. That method allocates bits to each object. U.S. Patent Application No. 09/579,889, "Method for encoding and transcoding multiple video objects with variable temporal resolution," was filed by Vetro et al. on May 26, 2000. It describes a method that satisfies the total bandwidth constraint while coding each object in the scene at a different temporal rate. The method thereby minimizes the artifacts that arise when the objects in a scene are encoded at different temporal rates.

The prior-art methods described above encode multiple videos or objects subject to the total bandwidth constraint of a single transmission channel.

The prior art has also considered resource constraints other than bandwidth when processing multiple videos. See, for example, U.S. Patent No. 6,052,384, "Using a Receiver Model to Multiplex Variable Bit-Rate Streams Having Timing Constraints," issued to Huang et al. on August 18, 2000. That patent describes a technique for determining the output bit rate of each stream. The goal is that neither the bitstream queues in the multiplexer nor the decoder buffers overflow or underflow. The rates are determined from timing information read from the bitstreams, together with a receiver model that accounts for the behavior of the decoder buffer.

Transcoding of multiple videos under both delay and processing timing constraints is described in U.S. Patent No. 6,275,536, "Implementation architectures of a multi-channel MPEG video transcoder using multiple programmable processors," issued to Chen et al. on August 14, 2001. The input bitstream is first partitioned into processing units.
- In one architecture, the processing units are split into different sub-bitstreams, each with its own queue, and each sub-stream is processed in a corresponding branch.
- In a second architecture, processing units are assigned from a common queue to any available processor.

Independent processing units are processed concurrently according to a queuing system model, so as to minimize the average processing time. The first architecture processes multiple branches in parallel. The second architecture, by contrast, is a single branch served by multiple processors.

Like Chen et al., U.S. Patent No. 6,008,848, "Video Compression Using Multiple Computing Agents," issued to Tiwari et al. on December 28, 2001, also describes a system and method that use multiple processors. Tiwari et al., however, describe a technique for video encoding. It achieves encoding through coarse-grain parallelism across multiple processors or compression agents.

FIG. 1 shows a typical system model for encoding multiple videos in surveillance applications. Cameras 101 acquire videos 102 for a video recorder 110. Typically, the recorder 110 compresses the videos. The compressed videos are then stored in a memory 120. A video player 130 can later play back the stored videos.

FIG. 2 shows details of the recorder 200. The acquired videos 102 are sent to a high-speed switch 210, which samples the analog video signals in time. The samples are supplied to a decoder 220. A still-image encoder 230 then encodes the digitized images to produce compressed images. A memory controller 240 writes the compressed images to allocated space in the memory 120. The stored video can be played back later.

The main problem with the recorder of FIG. 2 is that it encodes still images, so the temporal redundancy of the video is not exploited. As a result, the memory 120 must be very large when weeks or months of surveillance video are stored. Note that much surveillance video, especially video captured at night, shows completely static scenes. Significant events are rare.

FIG. 3 shows a straightforward solution to the above problem. In this scheme, there is one encoding channel per video. In each encoding channel, the video is first NTSC decoded 220. The video coder uses predictive coding such as MPEG, so a frame memory 310 is maintained for each video encoder to store reference pictures. The input frames from the different cameras 101 are then encoded separately. The memory controller 240 writes the results to the memory 120. The temporal rate of the input frames can be controlled by uniform sampling with a fixed period T 301, which lets the memory 120 be used more efficiently.

This scheme has two drawbacks. The main one is that the video encoders 320 are underutilized, even though they are designed to process full-rate video. In addition, the large number of decoders and encoders increases the cost of the system.

U.S. Patent No. 6,314,137, "Video data compression system, video recording/playback system, and video data compression encoding method," was issued to Ono et al. on November 6, 2001. It describes a system and method, shown in FIG. 4, that overcome the above drawbacks. There, a single video encoder 420 encodes all the videos. Digitized video frames from each camera input 101 are subsampled with period T and buffered in a respective frame memory 310.

To achieve predictive coding with a single encoder 420, the encoder is fed a consecutive series of frames from one camera input. Predictive coding from frames of the same camera input is therefore performed consecutively. The group-of-pictures (GOP) structure of MPEG coding makes it possible to form such independent units. Accordingly, the memory controller 240 becomes a GOP selector. The GOPs from each camera input are time-multiplexed into the encoder under the control of the controller 410. That scheme produces a single bitstream for all camera inputs. A camera identifier 401 is multiplexed into the encoded bitstream to identify the portion of video that belongs to a given camera.

In the above solution, one GOP's worth of data must be stored in each frame memory. This is a much larger requirement than in the system of FIG. 2, which needs at most one or two reference pictures. Thus, although the encoding hardware is greatly reduced, the memory requirement remains large and expensive.

This drawback cannot be overcome simply by sampling the input videos more aggressively. That lowers the temporal resolution of the videos, but the same amount of GOP data must still be buffered. Only a shorter GOP period reduces the memory requirement. However, a shorter GOP implies more frequent intra-frame coding, which lowers coding efficiency. In the most extreme case, a GOP of one frame degenerates into the still-image coding system shown in FIG. 2.

The high memory requirement is only one drawback of the system of FIG. 4. The problem grows in proportion to the number of videos as the system is scaled up.
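The back-of-the-envelope sketch below puts the difference in scale. It compares the frame memory needed to buffer one GOP per camera with the memory needed to hold two reference pictures per camera. The following values are assumptions chosen for illustration, not figures from the text:
- an 8-bit 4:2:0 frame of 720x480;
- a 15-frame GOP;
- four cameras.

```python
def frame_bytes(width, height):
    # 8-bit 4:2:0 video: one luma sample per pixel plus two quarter-size chroma planes.
    return width * height * 3 // 2


def buffer_bytes(num_videos, frames_per_video, width=720, height=480):
    """Total frame-memory size when each video buffers `frames_per_video` frames."""
    return num_videos * frames_per_video * frame_bytes(width, height)


gop_buffer = buffer_bytes(num_videos=4, frames_per_video=15)  # one 15-frame GOP per camera
ref_buffer = buffer_bytes(num_videos=4, frames_per_video=2)   # at most two reference pictures
print(gop_buffer, ref_buffer, gop_buffer / ref_buffer)  # 31104000 4147200 7.5
```

Under these assumptions, the GOP-buffering design needs 7.5 times as much memory. Both totals grow linearly with the number of cameras.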

In large systems, compressed video is often used to reduce bandwidth and storage. It is therefore desirable to provide a system and method for subsampling multiple compressed videos simultaneously.

One object of the present invention is to provide a system for storing multiple videos under resource constraints, in particular constraints on memory, encoding hardware, and delay. The system should be low-cost, compression-efficient, scalable, and flexible.

A further object of the invention is to provide a method for achieving variable and non-uniform temporal resolution among multiple uncorrelated compressed videos.

A method acquires compressed input videos. The compressed frames of each input video are acquired at a fixed sampling rate.

Joint analysis is applied to the compressed videos simultaneously and in parallel. It determines a variable and non-uniform temporal sampling rate for each compressed video, such that the total distortion is minimized and the total frame rate constraint is satisfied.

Each compressed video is then sampled at its associated variable and non-uniform temporal sampling rate. This produces compressed output videos with variable temporal resolutions.

FIG. 5 shows individual frames 501 of multiple input videos (videos 1-4) 510, acquired simultaneously by cameras at a fixed sampling rate. After the joint analysis 600 according to the invention, the frames 502 of the multiple output videos 520 have variable and non-uniform temporal sampling rates. The goal of the joint analysis is to minimize the total distortion of all output videos while satisfying the total frame rate constraint.

Three factors affect the rate of output frames 502 and whether a particular output frame is present at a given time:
- Compression efficiency, which can depend on the type of encoder used, e.g., MPEG-2 or MPEG-4.
- Resource constraints, which account for memory, processing speed, and encoding rate.
- The detection of significant events. A significant event is a short-term, local event that differs from the longer-term background of the scene.

For example, in surveillance video, a person entering a normally static, empty entrance scene is considered significant. See U.S. Patent Application No. 10/610,467, "Method for Detecting Short Term Unusual Events in Videos," filed by Divakaran et al. on June 30, 2003, incorporated herein by reference. Similarly, in traffic video, a change from smooth motion to no motion indicates an accident.

System Architecture

FIG. 6A shows a block diagram of a video encoding system for multiple videos according to the invention. Videos 510 from multiple cameras 601 are first NTSC decoded 602. The digitized videos are then input to a joint analysis circuit 600, which determines a temporal sampling rate 604 for each video.

The temporal sampling rates 604 are determined by a method that accounts for three things: the presence of significant events in the videos, compression efficiency, and resource constraints. The method and the constraints are described in more detail below.

The variable and non-uniform temporal sampling rate of each video is passed to a controller 610, which triggers the sampling circuit 604. This circuit can sample the frames of each video input non-uniformly at time $t$. The sampling times $T_1(t)$, $T_2(t)$, $T_3(t)$, and $T_4(t)$ need not run at the same rate as one another. Together, these sampling times determine the total frame rate across all videos.
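The minimal sketch below shows how per-video sampling instants add up to the total frame rate. The dictionary layout, the function name, and the example instants are hypothetical. In the described system, the joint analysis would produce these instants.

```python
def total_frame_rate(sampling_instants, window_s):
    """Aggregate frames per second over all videos in a window of `window_s` seconds.

    `sampling_instants` maps a video id to the (hypothetical) list of times, in seconds,
    at which the controller triggers sampling of that video; spacing may be non-uniform.
    """
    count = sum(
        1 for times in sampling_instants.values() for t in times if 0 <= t < window_s
    )
    return count / window_s


# Four videos sampled at different, non-uniform instants during one second.
instants = {
    1: [0.0, 0.5],                  # mostly static scene
    2: [0.0],                       # static scene, minimal rate
    3: [0.0, 0.1, 0.2, 0.25, 0.3],  # activity: sampled densely and unevenly
    4: [0.0, 0.9],
}
print(total_frame_rate(instants, window_s=1.0))  # 10.0
```

Only the sum across videos is bounded by the system. Each individual video is free to be sampled sparsely or densely.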

The sampled frames of the videos 520 are then stored in frame buffers 620. The controller 610 selects 630 the input to be encoded 640. Frames are read from the corresponding frame buffer and passed to the video encoder 640. The video encoder can receive the following information 611 from the controller:
- The encoding parameters for the video, derived by the joint analysis 600.
- Identification information that can be encoded into the bitstream itself.

The identification can be encoded directly into the bitstream using the MPEG-4 video object plane coding syntax described in ISO/IEC 14496-2, "Coding of audio-visual objects - Part 2: Visual" (2nd Edition, 2001).

The system produces a separate bitstream 650 for each video input. Each bitstream is then stored in a permanent memory 660 for later playback.

In this system, only a minimal number of reference frames, e.g., one or two frames, is stored per video. This low memory requirement follows from the nature of the video encoding process. The prior-art system of Ono et al. shown in FIG. 4 encodes GOPs of data. The system according to the invention, by contrast, encodes individual frames of data.

This is made possible by a video encoder 640 that can mark the different video inputs and produce separate output bitstreams 650. The controller 610 can also select 630 the correct reference frame for motion compensation of the current video being encoded. None of these features is part of the system described by Ono et al. Together, they provide a scalable system that can encode multiple videos simultaneously.
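Frame-level interleaving can be modeled with a toy encoder. It keeps a small reference store per video and routes each coded frame to that video's own bitstream. The class and method names are hypothetical. The "encoding" here only records the frame type and the reference used, and no real codec API is implied.

```python
from collections import defaultdict, deque


class MultiVideoEncoder:
    """Toy model of one encoder shared by several videos (hypothetical, not a real codec).

    Each video keeps at most `max_refs` reference frames, and encoded output is
    appended to a separate per-video bitstream instead of one multiplexed stream.
    """

    def __init__(self, max_refs=2):
        self.refs = defaultdict(lambda: deque(maxlen=max_refs))  # video id -> recent reference frames
        self.bitstreams = defaultdict(list)                      # video id -> list of coded units

    def encode(self, video_id, frame):
        refs = self.refs[video_id]
        # Predict only from this video's own references; its first frame is intra-coded.
        kind = "P" if refs else "I"
        ref = refs[-1] if refs else None
        self.bitstreams[video_id].append((kind, frame, ref))
        refs.append(frame)  # deque(maxlen=...) drops the oldest reference automatically


enc = MultiVideoEncoder()
for vid, frame in [(1, "a0"), (2, "b0"), (1, "a1"), (1, "a2"), (2, "b1")]:
    enc.encode(vid, frame)
print(enc.bitstreams[1])  # [('I', 'a0', None), ('P', 'a1', 'a0'), ('P', 'a2', 'a1')]
```

The memory per video is bounded by `max_refs` frames, whatever order the controller chooses for interleaving the inputs.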

The system of FIG. 6B is a slightly different configuration of the system of FIG. 6A. The main difference is that it uses a single shared frame buffer 621.

FIG. 7 shows an extension of the core architecture of FIGS. 6A and 6B. The bitstreams stored in the permanent memory 660 can be transferred to an archival memory 703 for long-term storage. In this case, a transcoder 701 can further change the bit rate, spatial resolution, and/or temporal resolution of the stored bitstreams. An analysis circuit 702 configures and sets the transcoding parameters. Any prior-art transcoding technique can be used, such as those described by Vetro et al. in "An overview of video transcoding architectures and techniques," IEEE Signal Processing Magazine, March 2003.

The above architecture is shown for four input videos and one video encoder. However, the design makes it easy to scale the system up to more input videos and video encoders, e.g., 16 inputs and two video encoders, or 64 inputs and four video encoders.

Joint Analysis

The following factors affect the operation of the joint analysis 600:
- the total distortion caused by skipping frames;
- the number of encoders incorporated in the system;
- the number of input videos;
- the detection and classification of significant events in the scenes;
- user priorities for particular camera inputs;
- the minimum temporal encoding rate of each video;
- the acceptable delay;
- the storage required for the analysis.

The total distortion caused by skipped frames drives the operation of the joint analysis. U.S. Patent Application No. 09/835,650, "Estimating Total Average Distortion in a Video with Variable Frameskip," was filed by Vetro et al. on April 16, 2001. In that application, the distortion of a skipped frame is determined from two parts:
- the coding error due to quantization of the last reference frame;
- the frame interpolation error due to the change in the video signal between the two time instants.

That method considers the distortion of a single video only. Here, the total distortion summed over all output videos is minimized.

In the following, distortion caused by coding error is called spatial distortion, and distortion caused by interpolation error is called temporal distortion. Video recording systems typically encode all frames at the same quality. The distortion estimate can therefore exclude the spatial distortion of the frames and focus on temporal distortion alone. We emphasize that this differs substantially from the formulations considered in the prior art.
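The sketch below shows what "temporal distortion only" can look like in practice. It scores a skipped frame by the mean squared error between that frame and the frame the decoder would display in its place. Two simplifications are assumed: the decoder simply repeats the last coded frame (a zero-order hold), and frames are flat lists of luma samples. This is not necessarily the estimator of the cited application.

```python
def temporal_distortion(skipped_frame, last_coded_frame):
    """Mean squared error between a skipped frame and the frame displayed in its place.

    Frames are flat lists of luma samples. Assumes the decoder repeats the last
    coded frame (zero-order hold) and ignores quantization (spatial) error.
    """
    if len(skipped_frame) != len(last_coded_frame):
        raise ValueError("frames must have the same size")
    total = sum((a - b) ** 2 for a, b in zip(skipped_frame, last_coded_frame))
    return total / len(skipped_frame)


static = [100] * 16
moving = [140] * 8 + [100] * 8  # half of the scene brightens by 40 levels
print(temporal_distortion(static, static))  # 0.0: skipping a static frame costs nothing
print(temporal_distortion(moving, static))  # 800.0
```

This shows why static surveillance scenes can be sampled very sparsely: skipping their frames adds almost no temporal distortion.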

The number of encoders in the system determines the total frame rate, i.e., the maximum number of frames that can be encoded per unit time. For example, if a single encoder can encode 30 frames per second and the system includes four encoders, the total frame rate is 120 frames/s. This total frame rate can then be allocated among the videos. With a total frame rate of 120 fps, for example, 60 surveillance videos of static scenes can each be sampled at 1 frame/s, and two videos with activity can each be sampled at 30 frames/s.

To formulate the problem around this constraint, let $N_{\text{cap}}(T)$ denote the total number of frames that can be encoded per time unit $T$. The average temporal sampling rate across all videos then follows from the number of input videos and their frame rates.
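The capacity arithmetic in the example above can be checked directly. `frame_capacity` is a hypothetical helper. Computing $N_{\text{cap}}(T)$ as encoders × per-encoder rate × $T$ assumes that all encoders have the same throughput.

```python
def frame_capacity(num_encoders, fps_per_encoder, interval_s=1.0):
    """N_cap(T): frames the system can encode in an interval of `interval_s` seconds."""
    return num_encoders * fps_per_encoder * interval_s


n_cap = frame_capacity(num_encoders=4, fps_per_encoder=30)  # 120 frames per second

# The allocation from the example: 60 static videos at 1 fps, 2 active videos at 30 fps.
allocation = [1] * 60 + [30] * 2
assert sum(allocation) <= n_cap  # total frame rate constraint holds (here with equality)

average_rate = n_cap / len(allocation)  # average temporal sampling rate per video
print(n_cap, sum(allocation), round(average_rate, 3))  # 120.0 120 1.935
```

The average works out to under 2 fps per video. Concentrating the budget on the two active scenes nevertheless lets them run at 30 fps.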

The objective function of the joint analysis can be written as follows.

$$\min \sum_i D_i \quad \text{subject to} \quad N_{\text{coded}}(T) \le N_{\text{cap}}(T) \qquad (1)$$

Here, $D_i$ is the temporal distortion of video $i$, and $N_{\text{coded}}(T)$ is the total number of frames encoded per unit time $T$. To make full use of the encoding capacity, $N_{\text{coded}}(T) = N_{\text{cap}}(T)$.

The temporal distortion in (1) over the time interval $[t, t+\tau]$ is expressed as

$$D_i[t, t+\tau] = w_i \sum D_{\text{skipped frames}}[t, t+\tau] \qquad (2)$$

The above expression sums the distortions of the frames skipped in the given time interval and applies a multiplicative weighting factor $w_i$ to each video. The purpose of this weighting, and the factors that affect the weight values, are described below.
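The sketch below evaluates (1) and (2) for one interval, given the per-frame distortions of the skipped frames. The function names and the example values, including $w_{\max} = 4$, are assumptions. How the skipped-frame distortions are estimated is left open.

```python
def video_distortion(weight, skipped_distortions):
    """Equation (2): weighted sum of skipped-frame distortions for one video in [t, t + tau]."""
    return weight * sum(skipped_distortions)


def objective(videos, n_coded, n_cap):
    """Equation (1): total distortion, or None if the frame-rate constraint is violated.

    `videos` is a list of (weight, skipped_distortions) pairs, one per video.
    """
    if n_coded > n_cap:
        return None  # infeasible: more frames encoded than the system can handle
    return sum(video_distortion(w, d) for w, d in videos)


videos = [
    (1.0, [2.0, 3.0]),  # no event: weight 1
    (4.0, [1.5]),       # event detected: weight w_max = 4 (assumed value)
]
print(objective(videos, n_coded=6, n_cap=6))  # 11.0
print(objective(videos, n_coded=7, n_cap=6))  # None
```

A candidate schedule is compared with others by this total. Only candidates that respect $N_{\text{cap}}(T)$ are admissible.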

As noted above, this formulation is novel because it accounts only for temporal distortion. It is also novel because it imposes no rate constraint on individual videos, although such per-video constraints are typical of prior-art formulations. Instead, the total frame rate of all videos is constrained. This is the second novel aspect of the formulation. To the best of the inventors' knowledge, this feature is not found in prior-art systems.

Detection of significant events in a scene could imply a simple binary operation: record when an event is detected, and stop recording when there is no significant event. However, this is not a very good strategy. Unless the detector is highly accurate and robust to noise, it may miss some significant events and record some insignificant ones.

Instead, it is preferable to include a weighting factor in the objective function to be optimized. Assume that the objective function is based on minimizing the total distortion. A weight of 1 is then used when there is no significant event, and a larger weight (e.g., $w_i = w_{\max}$) is used when there is an event. The value of $w_{\max}$ can be tuned. A large value makes the joint analysis encode more frames of the videos in which events are detected, in order to minimize the total distortion. When no event is detected and $w_i = 1$, the default mode applies: the objective function selects the encoded frames purely from the distortion of the videos. If the event detection process can distinguish the significance of different events, the weight $w_i$ can take values in the range $[1, w_{\max}]$.
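The sketch below maps a graded event detector onto the weight range $[1, w_{\max}]$. It assumes the detector reports a significance score in $[0, 1]$. The linear mapping, the clamping, and the default $w_{\max} = 4$ are illustrative assumptions, not values from the text.

```python
def event_weight(significance, w_max=4.0):
    """Map an event-significance score in [0, 1] to a weight in [1, w_max].

    significance = 0 (no event) gives weight 1; a fully significant event gives w_max.
    The linear mapping and the default w_max are assumptions for illustration.
    """
    s = min(max(significance, 0.0), 1.0)  # clamp detector output to [0, 1]
    return 1.0 + s * (w_max - 1.0)


print(event_weight(0.0), event_weight(0.5), event_weight(1.0))  # 1.0 2.5 4.0
```

A soft weight like this degrades more gracefully than the binary record/stop policy. A misjudged event only shifts frame allocation instead of switching recording off entirely.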

The weight $w_i$ in the objective (1) can also express the priority of a particular video. For example, if a video covers an important area such as an entrance, $w_i$ can be set greater than 1. As with event detection, this biases the analysis toward always encoding more frames from that input video. This priority can also be combined with the weights produced by event detection. In other words, the priority can add a constant value to the weight output by the event detection process.

Instead of giving a video priority, the weights can also de-emphasize a particular video, reducing the relative number of frames encoded from it. This is achieved with a weight less than 1. The distortion caused by skipping frames of that input video then accumulates more slowly, so fewer of its frames are encoded.
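The simplified sketch below illustrates how the weight changes the rate at which skipped-frame distortion accumulates. It counts how many consecutive frames can be skipped before the weighted distortion reaches a fixed level. The threshold view, the constant per-frame distortion, and the function name are all assumptions. The joint analysis itself compares totals across videos rather than using a threshold.

```python
def skips_before_threshold(weight, per_frame_distortion, threshold):
    """Count consecutive skipped frames before w_i * sum(D) would reach `threshold`.

    Assumes a constant distortion per skipped frame, which is a simplification.
    """
    step = weight * per_frame_distortion
    if step <= 0:
        raise ValueError("weight and distortion must be positive")
    skipped, accumulated = 0, 0.0
    while accumulated + step < threshold:
        accumulated += step
        skipped += 1
    return skipped


# Same scene content, three different weights.
for w in (2.0, 1.0, 0.5):  # priority camera, default, de-emphasized camera
    print(w, skips_before_threshold(w, per_frame_distortion=10.0, threshold=100.0))
# 2.0 4
# 1.0 9
# 0.5 19
```

A higher weight, whether from priority or an added offset, lets fewer frames be skipped. A weight below 1 lets more frames be skipped.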

However, the system may impose a minimum temporal coding rate. This setting forces a frame of a given input to be encoded once a predetermined period has elapsed since that input's last encoded frame. Setting the weight of a particular input to zero always enforces exactly this minimum temporal rate for that input, because its skipped frames then contribute nothing to the weighted distortion.
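The effect of priority weights above and below 1, and of the minimum temporal rate, can be illustrated with a simple greedy allocator. This is a simplification of ours, not the optimization of Equation (1). The cost of skipping video `v` is assumed to grow linearly with the number of steps since its last encoded frame, scaled by a hypothetical per-video `growth` factor and by its weight. At each step, `budget` frames are encoded. Any video whose gap has reached the hypothetical `max_gap` is forced first, and the highest-cost videos fill the remaining slots.

```python
from typing import Optional


def allocate_frames(weights: list[float], growth: list[float], budget: int,
                    steps: int, max_gap: Optional[int] = None) -> list[int]:
    """Greedy per-step frame allocation across videos (illustrative only).

    Skipping video v costs weights[v] * growth[v] * gap[v], where gap[v] is
    the number of steps since v was last encoded. Videos whose gap reached
    max_gap (the minimum temporal rate) are encoded first; forced frames take
    precedence even if they exceed the budget.
    """
    n = len(weights)
    gap = [1] * n          # steps since each video's last encoded frame
    encoded = [0] * n      # number of frames encoded per video
    for _ in range(steps):
        forced = [v for v in range(n) if max_gap is not None and gap[v] >= max_gap]
        cost = [weights[v] * growth[v] * gap[v] for v in range(n)]
        # sorted() is stable, so ties go to the lower video index.
        rest = sorted((v for v in range(n) if v not in forced),
                      key=lambda v: cost[v], reverse=True)
        chosen = set(forced) | set(rest[:max(0, budget - len(forced))])
        for v in range(n):
            if v in chosen:
                encoded[v] += 1
                gap[v] = 1
            else:
                gap[v] += 1
    return encoded


# Priority 2.0 vs 1.0 vs 0.5: the highest-weight video gets the most frames.
print(allocate_frames([2.0, 1.0, 0.5], [1.0, 1.0, 1.0], budget=1, steps=12))  # [7, 3, 2]
# A zero-weight video is encoded only when the minimum rate forces it.
print(allocate_frames([1.0, 0.0], [1.0, 1.0], budget=1, steps=10, max_gap=4))  # [8, 2]
```

With a weight of zero and no minimum rate, the second video would never be encoded. The `max_gap` rule is the only thing that keeps it alive.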

The last factor affecting the operation of the composite analysis 600 is the acceptable delay (τ). If a larger delay is allowed, the problem is solved over a longer time interval. Although a longer interval can reduce the distortion, it has two drawbacks. The first is memory size: a larger time window requires more storage to hold the incoming frames being analyzed. The second is computation: the larger the time window, the more candidate solutions must be evaluated. This is an ordinary engineering trade-off to be made during system design. Note, however, the flexibility that the described system provides.
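The two costs of a longer delay can be put into numbers with a small calculation. This sketch assumes the analysis window equals the tolerated delay, all videos share one frame rate, and frames have a fixed average compressed size. It also assumes every buffered frame is a candidate for encoding, so exhaustive search is bounded by $2^N$ subsets. `window_cost` and its parameters are hypothetical.

```python
def window_cost(num_videos: int, fps: float, delay_s: float,
                bytes_per_frame: int) -> tuple[int, int, int]:
    """Return (N, buffer bytes, upper bound on candidate solutions) for one window.

    Assumes the window length equals the tolerated delay and every buffered
    frame may be either encoded or skipped, giving at most 2**N subsets.
    """
    frames_per_video = int(round(fps * delay_s))
    n = num_videos * frames_per_video
    return n, n * bytes_per_frame, 2 ** n


n, buffer_bytes, candidates = window_cost(4, 30.0, 0.5, 20_000)
print(n, buffer_bytes)    # 60 1200000
print(candidates == 2 ** 60)  # True
```

Doubling τ only doubles the buffer, but it squares the bound on the search space, since $2^{2N} = (2^N)^2$.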

Suppose the problem is decomposed into fixed time intervals $T$, and $N$ denotes the total number of frames from all input videos that could be encoded in such an interval. An optimal solution then requires the system to evaluate up to $2^N$ candidate solutions, one for each choice of which of the $N$ frames to encode.

The solution with the minimum total distortion across all videos for the given interval is selected. Note that each candidate solution is a combination of distortions computed between pairs of frames. Let $M$ be the total number of input videos and $K$ the total number of frames of each video within the interval $T$. The number of distinct frame-pair distortions that must be computed then depends only on $M$ and $K$. Assuming one distortion for each pair of frames $i < k$ within the same video, it is $M \cdot K(K-1)/2$, so these values can be computed once and reused across all candidate solutions.
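The exhaustive search can be sketched over a small window. Several assumptions here are ours:

* A hypothetical precomputed table `D[v][i][k]` holds the distortion of showing encoded frame `i` of video `v` in place of skipped frame `k`.
* The first frame of each video in the window is always encoded.
* A skipped frame is compared with the last encoded frame before it.
* A total frame `budget` stands in for the frame-rate constraint.

The enumeration is exponential, but every candidate only looks up entries of the same pairwise table.

```python
from itertools import combinations, product


def video_options(k_frames: int) -> list[tuple[int, ...]]:
    """All encode sets for one video of K frames; frame 0 is always encoded."""
    rest = range(1, k_frames)
    return [(0,) + c for r in range(k_frames) for c in combinations(rest, r)]


def skip_distortion(encoded: tuple[int, ...], table: list[list[float]]) -> float:
    """Sum table[i][k] over skipped frames k, with i the last encoded frame before k."""
    enc = set(encoded)
    total, last = 0.0, 0
    for k in range(len(table)):
        if k in enc:
            last = k
        else:
            total += table[last][k]
    return total


def exhaustive_search(D, weights, budget):
    """Try every combination of per-video encode sets within the frame budget."""
    per_video = [video_options(len(table)) for table in D]
    best, best_cost = None, float("inf")
    for choice in product(*per_video):
        if sum(len(c) for c in choice) > budget:
            continue  # violates the frame budget
        cost = sum(w * skip_distortion(c, table)
                   for c, table, w in zip(choice, D, weights))
        if cost < best_cost:
            best, best_cost = choice, cost
    return best, best_cost


D_a = [[0, 1, 3], [0, 0, 1], [0, 0, 0]]   # video A: skipping is cheap
D_b = [[0, 5, 6], [0, 0, 5], [0, 0, 0]]   # video B: skipping is expensive
print(exhaustive_search([D_a, D_b], [1.0, 1.0], budget=4))  # (((0,), (0, 1, 2)), 4.0)
```

With equal weights, both extra frames go to video B. Raising A's weight to 10 moves them to video A instead.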

In the embodiment shown in FIG. 8, the parallel input videos 801 to the composite analysis 800 are compressed videos made of individually compressed frames, i.e., intra-coded frames. Examples in this intra-frame compression category include videos whose frames are compressed with JPEG, JPEG 2000, MPEG-1/2 I-frames, MPEG-4 I-VOPs (video object planes), or any other intra-frame compression technique.

In networked surveillance systems, camera inputs are compressed in this way to reduce bandwidth, while editing and review operations can still access each frame individually. Intra-frame compression is also very common in broadcast studios, where video is usually sent internally to different editing stations for modification before broadcast.

In this embodiment, the composite analysis 800 still operates according to the objective function and constraints given by Equation (1). In this case, however, the temporal distortion is obtained directly from the compressed-domain information of the intra-coded frames, which can include discrete cosine transform (DCT) coefficients. Frame differences and inter-frame correlations can be computed very efficiently in the compressed domain, so the frames need not be fully decoded and reconstructed before analysis. Instead, the analysis 800 is performed on partially decoded frames.
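One reason the compressed-domain route is cheap follows from the transform itself. The 8x8 DCT defined for JPEG has orthonormal scaling, so by Parseval's theorem the sum of squared differences between two blocks equals the sum of squared differences between their DCT coefficients. A frame-difference measure can therefore be taken from dequantized coefficients without an inverse transform. The sketch below checks this numerically with a naive orthonormal DCT-II. The rounding and clipping in a real decoder make the equality approximate for decoded pixels.

```python
import math


def dct2(block: list[list[float]]) -> list[list[float]]:
    """Naive orthonormal 2-D DCT-II of an N x N block."""
    n = len(block)

    def scale(u: int) -> float:
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def sse(a: list[list[float]], b: list[list[float]]) -> float:
    """Sum of squared element-wise differences between two blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


a = [[(x * 8 + y) % 17 for y in range(8)] for x in range(8)]
b = [[(x * 3 + y * 5) % 11 for y in range(8)] for x in range(8)]
# Same value in the pixel domain and the DCT domain (up to float error).
print(round(sse(a, b), 6), round(sse(dct2(a), dct2(b)), 6))
```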

The controller 610 operates as described above: its output triggers the sampling circuit 604 to sample the compressed frames of each video input non-uniformly at times $t$. Advantageously, the sampling rates $T_1(t)$, $T_2(t)$, $T_3(t)$, and $T_4(t)$ can differ from one another. Once the sampling rates are determined, the intra-coded compressed videos are sampled accordingly, and the sampled frames are recorded directly in the permanent memory 660.

In the embodiment shown in FIG. 9, the input videos 901 to the composite analysis 900 are compressed videos whose frames are compressed relative to one another, i.e., inter-coded frames. Videos in this category include those whose frames are compressed as MPEG-1/2 P/B-frames or MPEG-4 P/B-VOPs. They also include videos compressed with any other technique that predicts the information in the current frame from another past or future reference frame.

In networked camera systems, inter-frame compression achieves much higher compression than is possible with intra-frame compression. This is especially true for surveillance systems, where scenes with little motion have very high temporal redundancy. One major advantage of this embodiment is that many cameras can share the same network and bandwidth. In this case, however, some frames serve as reference frames for predicting other frames, so care must be taken to process these frames correctly.
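The care needed with reference frames shows up in a small dependency check. An intra-coded frame can be dropped on its own, but dropping a reference frame makes every frame predicted from it, directly or transitively, undecodable. The sketch uses a hypothetical `refs` map from each frame to the frames it is predicted from. The example group of pictures (I B B P B B P, in display order) is our own choice.

```python
def undecodable(dropped: set[int], refs: dict[int, list[int]]) -> set[int]:
    """Frames that cannot be decoded once the `dropped` frames are removed.

    refs[f] lists the reference frames that frame f is predicted from
    (empty for intra-coded frames). A frame is lost if it is dropped or if
    any of its references is lost; iterate until nothing changes.
    """
    lost = set(dropped)
    changed = True
    while changed:
        changed = False
        for f, rs in refs.items():
            if f not in lost and any(r in lost for r in rs):
                lost.add(f)
                changed = True
    return lost


gop = {0: [], 1: [0, 3], 2: [0, 3], 3: [0], 4: [3, 6], 5: [3, 6], 6: [3]}
print(sorted(undecodable({1}, gop)))  # [1]: a B-frame can simply be dropped
print(sorted(undecodable({3}, gop)))  # [1, 2, 3, 4, 5, 6]: a P anchor cannot
```

This is why inter-coded streams are passed through a temporal-rate transcoder rather than simply having frames removed.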

In this embodiment, the composite analysis 900 estimates the temporal distortion from the compressed video. One way to achieve this is based on the technique described in U.S. Patent Application No. 09/835,650, "Estimating Total Average Distortion in a Video with Variable Frameskip," filed by Vetro et al. on April 16, 2001.

The temporal distortion $E\{\Delta^2 z_{i,k}\}$ between frame $i$ and frame $k$ is estimated by the following equation (3):

Here, $(\sigma^2_{x_i}, \sigma^2_{y_i})$ represent the variances of the spatial gradients in x and y in frame $i$, and $(\sigma^2_{\Delta x_{i,k}}, \sigma^2_{\Delta y_{i,k}})$ represent the variances of the motion vectors between the two frames in the x and y directions.
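A first-order approximation can be built from the quantities just defined, for illustration only. If frame $k$ is approximated by frame $i$ displaced by the motion field, a Taylor expansion gives $z_k - z_i \approx \partial_x z\,\Delta x + \partial_y z\,\Delta y$. Assuming the gradients and motion components are zero-mean and mutually independent, this yields $E\{\Delta^2 z_{i,k}\} \approx \sigma^2_{x_i}\sigma^2_{\Delta x_{i,k}} + \sigma^2_{y_i}\sigma^2_{\Delta y_{i,k}}$. This form is our assumption and is not necessarily the exact Equation (3). The sketch uses forward differences for the gradients and hypothetical per-block motion vectors `(dx, dy)`.

```python
from statistics import pvariance


def gradient_variances(frame: list[list[float]]) -> tuple[float, float]:
    """Population variances of horizontal (x) and vertical (y) forward differences."""
    gx = [row[c + 1] - row[c] for row in frame for c in range(len(row) - 1)]
    gy = [frame[r + 1][c] - frame[r][c]
          for r in range(len(frame) - 1) for c in range(len(frame[0]))]
    return pvariance(gx), pvariance(gy)


def mv_variances(motion_vectors: list[tuple[float, float]]) -> tuple[float, float]:
    """Population variances of the x and y motion-vector components."""
    return (pvariance([mv[0] for mv in motion_vectors]),
            pvariance([mv[1] for mv in motion_vectors]))


def estimate_temporal_distortion(frame_i, motion_vectors_ik) -> float:
    """Assumed first-order form: var(gx)*var(dx) + var(gy)*var(dy)."""
    sx, sy = gradient_variances(frame_i)
    sdx, sdy = mv_variances(motion_vectors_ik)
    return sx * sdx + sy * sdy


frame = [[0, 0], [0, 4]]
print(estimate_temporal_distortion(frame, [(1, 0), (-1, 0)]))  # 4
```

Under this form, a uniform motion field (all vectors equal) gives zero, which is a direct consequence of the zero-mean assumption.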

Vetro et al. acknowledge the practical difficulty of applying this technique to video encoding. There, the goal is to estimate the distortion for different frame-skip factors and select the factor that minimizes distortion. However, the motion vectors of many candidate frames are not yet available at encoding time, so estimating them is computationally demanding.

In the embodiment of FIG. 9, however, the motion vectors are obtained directly from the compressed video 901. The spatial gradients of the reference frame, e.g., texture and edge information, can be determined directly from the DCT coefficients; see Wang et al., "Survey of compressed-domain features used in audio-visual indexing and analysis," Journal of Visual Communication and Image Representation, vol. 14, no. 2, pp. 150-183, June 2003.
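One simple compressed-domain feature of this kind is block texture. With orthonormal DCT scaling, the DC coefficient of an N x N block equals N times its mean. By Parseval's theorem, the AC coefficient energy divided by $N^2$ equals the block's pixel variance, a common texture measure, so no inverse transform is needed. This is only a texture proxy of our choosing: it does not separate horizontal and vertical gradients as $\sigma^2_{x_i}$ and $\sigma^2_{y_i}$ do. Applied to dequantized coefficients, it describes the reconstructed block.

```python
def block_mean_and_variance(coeffs: list[list[float]]) -> tuple[float, float]:
    """Mean and population variance of an N x N block from its orthonormal 2-D DCT.

    DC = N * mean, and the AC energy equals the sum of squared deviations
    from the mean (Parseval), so variance = AC energy / N**2.
    """
    n = len(coeffs)
    dc = coeffs[0][0]
    energy = sum(c * c for row in coeffs for c in row)
    return dc / n, (energy - dc * dc) / (n * n)


# The 2x2 orthonormal DCT of the pixel block [[1, 0], [0, 1]] is [[1, 0], [0, 1]];
# the pixels have mean 0.5 and population variance 0.25.
print(block_mean_and_variance([[1.0, 0.0], [0.0, 1.0]]))  # (0.5, 0.25)
```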

According to the invention, the estimated distortion given by Equation (3), or any other means of measuring temporal distortion, can be used. The objective of the invention is to minimize the total distortion over all videos while satisfying the total frame-rate constraint.

Based on the results of the composite analysis 900, information about which frames to drop from the input videos is passed to the controller 910. The controller signals this information to the temporal-rate transcoders 920, one of which is provided for each video.

Note that transcoding is significantly less complex than encoding. Encoding requires motion vectors to be estimated, whereas transcoding obtains them from the input stream. The system can therefore process many videos simultaneously.

Various transcoding techniques for reducing the temporal rate are described by Vetro et al., "An Overview of Transcoding Architectures and Techniques," IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 18-29, March 2003.
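One simple idea used in temporal-rate transcoding is to reuse the incoming vectors rather than re-estimate motion. When reference frames are dropped, the motion vectors of a kept frame must point to a new, older reference. These vectors can be approximated by composing the vectors of the dropped frames. The sketch sums the vectors of co-located blocks along the chain of dropped frames. It is a deliberately simplified variant that assumes every block is inter-coded with one forward vector, and it ignores intra blocks, B-frames, vector refinement, and the prediction residual, which must still be recomputed.

```python
def compose_vectors(mv_kept: list[tuple[int, int]],
                    mv_dropped_chain: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Approximate vectors of a kept frame relative to a new, older reference.

    mv_kept[b] is block b's vector in the kept frame; mv_dropped_chain holds
    the per-block vectors of each dropped frame, nearest first. Each vector
    is extended by adding the co-located block's vector in every dropped frame.
    """
    composed = []
    for b, (dx, dy) in enumerate(mv_kept):
        for frame_mvs in mv_dropped_chain:
            ddx, ddy = frame_mvs[b]
            dx, dy = dx + ddx, dy + ddy
        composed.append((dx, dy))
    return composed


kept = [(1, 0), (0, 2)]
chain = [[(2, 1), (0, 0)], [(1, 1), (-1, 0)]]
print(compose_vectors(kept, chain))  # [(4, 2), (-1, 2)]
```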

After the temporal rate of each video has been reduced accordingly, the compressed output streams 902 are recorded directly in the permanent memory 660.

It should be understood that the input videos can include both compressed and uncompressed videos at the same time.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

A block diagram of a prior-art system for recording and playing back multiple videos.
A block diagram of a prior-art still-image encoding system for multiple videos.
A block diagram of a prior-art video encoding system for multiple videos that uses multiple encoders.
A block diagram of a prior-art video encoding system for multiple videos that uses a single encoder.
A block diagram of simultaneous full-frame-rate input videos that are analyzed to produce output videos with non-uniform temporal resolution according to the invention.
A diagram of a video encoding system for multiple videos that uses a single encoder and separate memories according to the invention.
A diagram of a video encoding system for multiple videos that uses a single encoder and a shared memory according to the invention.
A block diagram of a video encoding system for multiple videos that uses a single encoder and a transcoder according to the invention.
A block diagram of a video processing system that operates on intra-coded frames of compressed video.
A block diagram of a video processing system that operates on inter-coded frames of compressed video.

Claims

A method for processing a plurality of videos, comprising: acquiring a plurality of compressed videos in parallel, the compressed frames of each input video being acquired at a fixed sampling rate;
applying a composite analysis simultaneously and concurrently to the plurality of compressed videos to determine a variable, non-uniform temporal sampling rate for each of the compressed videos, so that a total distortion over the plurality of compressed videos is minimized while a total frame-rate constraint is satisfied; and
sampling the compressed frames of each compressed video at the associated variable, non-uniform temporal sampling rate to generate a plurality of compressed output videos having variable temporal resolutions.

The method of processing a plurality of videos of claim 1, further comprising: storing the plurality of compressed output videos in a permanent memory.

The method of processing a plurality of videos according to claim 1, wherein the compressed frame is an intra-frame encoded frame.

The method of processing a plurality of videos according to claim 3, wherein the compressed video is a JPEG video.

The method of processing a plurality of videos according to claim 3, wherein the compressed video is an MPEG video.

The method of processing a plurality of videos of claim 1, further comprising: acquiring the plurality of compressed videos using a plurality of surveillance cameras.

The method of processing a plurality of videos of claim 1, further comprising: acquiring the plurality of compressed videos using a plurality of broadcast studio cameras.

The method of processing a plurality of videos according to claim 3, wherein the total distortion includes a temporal distortion.

The method of processing a plurality of videos according to claim 8, wherein the temporal distortion is obtained from compressed-domain information of the intra-frame encoded frames.

The method of processing a plurality of videos according to claim 9, wherein the compressed-domain information includes DCT coefficients.

The method of processing a plurality of videos of claim 1, further comprising: partially decoding the plurality of compressed videos before applying the composite analysis.

The method of processing a plurality of videos according to claim 1, wherein the compressed frames are inter-frame encoded frames.

The method of processing a plurality of videos according to claim 12, wherein the temporal distortion is determined directly from motion vectors of the inter-frame encoded frames.

The method of processing a plurality of videos according to claim 12, wherein the compressed frames are MPEG-1/2 P/B-frames.

The method of processing a plurality of videos according to claim 12, wherein the compressed frames are MPEG-4 P/B video object planes.

The method of processing a plurality of videos according to claim 12, wherein the total distortion includes a temporal distortion.

The method of processing a plurality of videos according to claim 16, wherein the temporal distortion $E\{\Delta^2 z_{i,k}\}$ between frame $i$ and frame $k$ is estimated by the following equation:

where $(\sigma^2_{x_i}, \sigma^2_{y_i})$ represent the variances of the spatial gradients in x and y in frame $i$, and $(\sigma^2_{\Delta x_{i,k}}, \sigma^2_{\Delta y_{i,k}})$ represent the variances of the motion vectors between frame $i$ and frame $k$ in the x and y directions.

The method of processing a plurality of videos according to claim 17, wherein the spatial gradient is directly determined from DCT coefficients of the frame.

The method of processing a plurality of videos of claim 12, further comprising transcoding the compressed output video.